expected Kinesis checkpoint behavior when driver restarts

2015-10-27 Thread Hster Geguri
We are using Kinesis with Spark Streaming 1.5 on a YARN cluster. When we enable checkpointing in Spark, where in the Kinesis stream should a restarted driver continue? I ran a simple experiment as follows: 1. In the first driver run, the Spark driver processes 1 million records starting from

Re: expected Kinesis checkpoint behavior when driver restarts

2015-10-27 Thread Tathagata Das
Your observation is correct! The current implementation of checkpointing to DynamoDB is tied to the presence of new data from Kinesis (I think that emulates the KCL behavior). If there is no data for a while, the checkpointing does not occur. That explains your observation. I have filed a JIRA to
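
To make the behavior described in this reply concrete, here is a toy model in plain Python. It is only a sketch of the idea that a checkpoint is attempted when records arrive and never while the stream is idle. It is not the Spark or KCL code, and the names ToyCheckpointer, on_records, and interval_s are made up for illustration.

```python
class ToyCheckpointer:
    """Toy model (hypothetical, not Spark's implementation): a checkpoint is
    only attempted inside on_records(), i.e. when new data arrives."""

    def __init__(self, interval_s):
        self.interval_s = interval_s      # minimum seconds between checkpoints
        self.last_checkpoint_time = 0.0   # time of the last persisted checkpoint
        self.checkpointed_seq = None      # last sequence number persisted
        self.latest_seq = None            # last sequence number processed

    def on_records(self, now, seqs):
        """Process a batch and checkpoint if the interval has elapsed."""
        if not seqs:
            return
        self.latest_seq = seqs[-1]
        if now - self.last_checkpoint_time >= self.interval_s:
            self.checkpointed_seq = self.latest_seq
            self.last_checkpoint_time = now


cp = ToyCheckpointer(interval_s=10)
cp.on_records(now=0, seqs=list(range(1, 101)))     # interval not elapsed yet
cp.on_records(now=5, seqs=list(range(101, 201)))   # still too early
cp.on_records(now=12, seqs=list(range(201, 301)))  # checkpoint at seq 300
cp.on_records(now=15, seqs=list(range(301, 401)))  # only 3s since checkpoint
# The stream now goes idle. No further on_records() calls happen, so the
# checkpoint never catches up to the last processed record.

print("processed up to:", cp.latest_seq)
print("checkpointed up to:", cp.checkpointed_seq)
```

In this model, a driver that restarts after the idle period would resume from the persisted checkpoint (300) and see records 301 to 400 again, even though they were processed before the restart.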