Hi, I am using the Spark Streaming checkpointing mechanism and reading data from Kafka. The window duration for my application is 2 hours, with a sliding interval of 15 minutes.
So my batches run at the following intervals: 09:45, 10:00, 10:15, 10:30, and so on.

Suppose my running batch dies at 09:55 and I restart the application at 12:05. The flow would then be something like:

- At 12:05 it would run the 10:00 batch. Would this read the Kafka offsets from the time it went down (or from 09:45) up to 12:00, or only up to 10:00?
- Next would be the 10:15 batch. What offsets would be the input for this batch?
- ...and so on for all the queued batches.

Basically, my requirement is that when the application is restarted at 12:05, the first recovered batch should read the Kafka offsets only up to 10:00, then the next queued batch should take offsets from 10:00 to 10:15, and so on until all the queued batches are processed. If this is the way offsets are handled for the queued batches, I am fine. Otherwise, please suggest how this can be done. Thanks!
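To make the semantics I am hoping for concrete, here is a toy plain-Python sketch (not actual Spark or Kafka API calls; all function names are made up for illustration) that models which batches would be queued after the restart, and the 15-minute slice of new offsets each one should consume. Times are minutes since midnight, and "offsets" are represented simply as time ranges.

```python
# Toy model of the desired recovery behaviour: after a restart, each
# queued 15-minute batch consumes only the new offsets belonging to
# its own sliding interval, not everything up to the restart time.

SLIDE_MIN = 15  # sliding interval in minutes


def queued_batches(crash_time, restart_time):
    """Batch times that were missed between the crash and the restart."""
    # First batch boundary strictly after the crash time.
    first = (crash_time // SLIDE_MIN + 1) * SLIDE_MIN
    return list(range(first, restart_time, SLIDE_MIN))


def offset_range(batch_time):
    """Hoped-for input range for one recovered batch: its own 15-min slice."""
    return (batch_time - SLIDE_MIN, batch_time)


# Crash at 09:55 (595 minutes), restart at 12:05 (725 minutes).
batches = queued_batches(595, 725)
for b in batches:
    print(b, offset_range(b))
# The first recovered batch is 10:00 (600), reading only 09:45-10:00;
# the next is 10:15 (615), reading 10:00-10:15; and so on.
```

This is only a model of the behaviour I want confirmed, not how Spark necessarily schedules the recovered batches internally.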