Hi

By receiver I meant the Spark Streaming receiver architecture, where
receiver nodes are separate from worker nodes. Is there no direct/low-level
consumer for Kinesis in Spark Streaming, like the one for Kafka?

Also, is there a limitation on the checkpoint interval (a minimum of 1
second) in Spark Streaming with Kinesis? As far as I know, the KCL itself
imposes no such limit on the checkpoint interval.
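For reference, the checkpoint interval is just a parameter passed when the
stream is created. A minimal sketch, assuming the spark-streaming-kinesis-asl
module circa Spark 2.0 (the stream name, region, and app name below are
hypothetical placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

val conf = new SparkConf().setAppName("kinesis-checkpoint-example")
val ssc = new StreamingContext(conf, Seconds(1))

// The interval below is the KCL checkpoint interval (how often processed
// sequence numbers are written to DynamoDB); it is independent of the
// streaming batch interval set on the StreamingContext above.
val stream = KinesisUtils.createStream(
  ssc,
  "kinesis-checkpoint-example",               // KCL application name (also the DynamoDB table name)
  "myStream",                                 // Kinesis stream name (hypothetical)
  "https://kinesis.us-east-1.amazonaws.com",  // endpoint URL
  "us-east-1",                                // region
  InitialPositionInStream.LATEST,
  Seconds(10),                                // KCL checkpoint interval
  StorageLevel.MEMORY_AND_DISK_2              // replicate received blocks across executors
)
```

This is a configuration sketch, not runnable standalone: it needs a Spark
cluster, the kinesis-asl dependency, and AWS credentials on the classpath.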

Thanks

On Tue, Oct 25, 2016 at 8:36 AM, Takeshi Yamamuro <linguin....@gmail.com>
wrote:

> I'm not exactly sure which receiver you are referring to, but if you mean
> the "KinesisReceiver" implementation, yes.
>
> Also, we currently cannot disable the interval checkpoints.
>
> On Tue, Oct 25, 2016 at 11:53 AM, Shushant Arora <
> shushantaror...@gmail.com> wrote:
>
>> Thanks!
>>
>> Are Kinesis streams receiver-based only? Is there a non-receiver-based
>> consumer for Kinesis?
>>
>> Also, instead of having a fixed checkpoint interval, can I disable auto
>> checkpointing and, once my worker has processed the last record of a
>> mapPartitions call, checkpoint the sequence number explicitly via some API?
>>
>>
>>
>> On Tue, Oct 25, 2016 at 7:07 AM, Takeshi Yamamuro <linguin....@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> The only thing you can do for Kinesis checkpoints is tune their interval:
>>> https://github.com/apache/spark/blob/master/external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisUtils.scala#L68
>>>
>>> Whether data loss occurs depends on the storage level you set; if you
>>> set StorageLevel.MEMORY_AND_DISK_2, Spark may continue processing after
>>> a failure because the stream data Spark receives are replicated across
>>> executors. However, if all the executors that hold the replicated data
>>> crash, IIUC data loss occurs.
>>>
>>> // maropu
>>>
>>> On Mon, Oct 24, 2016 at 4:43 PM, Shushant Arora <
>>> shushantaror...@gmail.com> wrote:
>>>
>>>> Does the Spark Streaming consumer for Kinesis use the Kinesis Client
>>>> Library, and does it mandate checkpointing the shard sequence numbers
>>>> in DynamoDB?
>>>>
>>>> Will it lead to data loss if consumed records are not yet processed,
>>>> Kinesis has already checkpointed the consumed sequence numbers in
>>>> DynamoDB, and the Spark worker crashes? Spark then launches the worker
>>>> on another node, but it starts consuming from DynamoDB's checkpointed
>>>> sequence number, which is ahead of the last processed sequence number.
>>>>
>>>> Is there a way to checkpoint the sequence numbers ourselves in Kinesis,
>>>> as with the Kafka low-level consumer?
>>>>
>>>> Thanks
>>>>
>>>>
>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
