Hi,

By receiver I meant the Spark Streaming receiver architecture - meaning worker nodes are different from receiver nodes. So there is no direct/low-level consumer for Kinesis in Spark Streaming, like there is for Kafka?
Is there any limitation on the checkpoint interval - a minimum of 1 second in Spark Streaming with Kinesis? And is there no such limit on the checkpoint interval on the KCL side?

Thanks

On Tue, Oct 25, 2016 at 8:36 AM, Takeshi Yamamuro <linguin....@gmail.com> wrote:

> I'm not exactly sure about the receiver you pointed though,
> if you point the "KinesisReceiver" implementation, yes.
>
> Also, we currently cannot disable the interval checkpoints.
>
> On Tue, Oct 25, 2016 at 11:53 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>
>> Thanks!
>>
>> Are Kinesis streams receiver-based only? Is there a non-receiver-based
>> consumer for Kinesis?
>>
>> And instead of having a fixed checkpoint interval, can I disable auto
>> checkpointing and, once my worker has processed the data after the last
>> record of mapPartitions, checkpoint the sequence number using some API?
>>
>> On Tue, Oct 25, 2016 at 7:07 AM, Takeshi Yamamuro <linguin....@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> The only thing you can do for Kinesis checkpoints is tune their interval:
>>> https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala#L68
>>>
>>> Whether data loss occurs or not depends on the storage level you set;
>>> if you set StorageLevel.MEMORY_AND_DISK_2, Spark may continue processing
>>> in case of data loss because the stream data Spark receives is
>>> replicated across executors.
>>> However, if all the executors that have the replicated data crash,
>>> IIUC data loss occurs.
>>>
>>> // maropu
>>>
>>> On Mon, Oct 24, 2016 at 4:43 PM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>>
>>>> Does the Spark Streaming consumer for Kinesis use the Kinesis Client
>>>> Library, and does it mandate checkpointing the sequence numbers of
>>>> shards in DynamoDB?
>>>>
>>>> Will it lead to data loss if consumed data records are not yet
>>>> processed, Kinesis has checkpointed the consumed sequence numbers in
>>>> DynamoDB, and the Spark worker crashes - Spark then launches the worker
>>>> on another node, but it starts consuming from DynamoDB's checkpointed
>>>> sequence number, which is ahead of the processed sequence number.
>>>>
>>>> Is there a way to checkpoint the sequence numbers ourselves in Kinesis,
>>>> as in the Kafka low-level consumer?
>>>>
>>>> Thanks
>>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>
>
> --
> ---
> Takeshi Yamamuro
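
For reference, the checkpoint interval the thread discusses is the `Duration` argument passed to `KinesisUtils.createStream` (the API at the GitHub link above); it can only be tuned, not disabled. Below is a minimal sketch of wiring it together with `StorageLevel.MEMORY_AND_DISK_2`. The application name, stream name, endpoint, and region are placeholder assumptions, and this needs a real Spark cluster plus AWS credentials to actually run:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

val conf = new SparkConf().setAppName("KinesisCheckpointSketch")
val ssc  = new StreamingContext(conf, Seconds(2))  // batch interval

// Placeholder names: "myApp" becomes the DynamoDB table the KCL uses
// for shard checkpoints; "myStream" is the Kinesis stream to read.
val stream = KinesisUtils.createStream(
  ssc,
  "myApp",                                       // KCL application name
  "myStream",                                    // Kinesis stream name
  "https://kinesis.us-east-1.amazonaws.com",     // endpoint URL
  "us-east-1",                                   // region
  InitialPositionInStream.LATEST,
  Seconds(10),              // checkpoint interval written to DynamoDB;
                            // tunable, but cannot be disabled
  StorageLevel.MEMORY_AND_DISK_2)  // replicate received blocks to a
                                   // second executor to survive one crash
```

As the thread notes, if all executors holding a replicated block crash before it is processed, while the KCL has already checkpointed past its sequence numbers, those records are lost; the replication level only reduces the window for that.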