Many thanks, Cody, that explains quite a bit.
I had a couple of problems with checkpointing and graceful shutdown when moving
working code from Spark 1.3.0 to 1.5.0: InterruptedExceptions, the Kafka
direct stream failing to initialize, and some exceptions regarding the WAL even
though I'm using the direct stream. Meanwhile
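(A side note on the shutdown part: since Spark 1.4 there is a configuration flag that asks the streaming context to stop gracefully on JVM shutdown, which avoids hand-rolled shutdown hooks. A minimal sketch, with the app name and batch interval as placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("my-streaming-app") // placeholder name
  // Finish in-flight batches before stopping when the JVM shuts down,
  // instead of killing receivers/jobs mid-batch.
  .set("spark.streaming.stopGracefullyOnShutdown", "true")

val ssc = new StreamingContext(conf, Seconds(10))
// ... build the DStream graph ...
ssc.start()
ssc.awaitTermination()
```

The alternative is calling `ssc.stop(stopSparkContext = true, stopGracefully = true)` explicitly from your own shutdown logic.)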
TD can correct me on this, but I believe checkpointing is done after a set
of jobs is submitted, not after they are completed. If you fail while
processing the jobs, restarting from that checkpoint should put you in
the correct state.
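That recovery path goes through `StreamingContext.getOrCreate`: on restart it rebuilds the context from the checkpoint, so batches that were submitted but not finished are re-run. A minimal sketch, assuming a SparkConf is already defined and with the checkpoint path as a placeholder:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("my-streaming-app") // placeholder

// The whole DStream graph must be defined inside this factory function,
// so it can be reconstructed when recovering from the checkpoint.
def createContext(): StreamingContext = {
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint("hdfs:///checkpoints/my-app") // placeholder path
  // ... define streams and outputs here ...
  ssc
}

// First run: calls createContext(); after a crash: restores from checkpoint.
val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/my-app", createContext _)
ssc.start()
ssc.awaitTermination()
```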
In any case, are you actually observing a loss of messages?
Hi,
I have 2 streams and checkpointing, with code based on the documentation. One
stream transforms data from Kafka and saves it to a Parquet file. The
other uses the same stream and does updateStateByKey to compute some
aggregations. There is no graceful shutdown.
Both use roughly this code:
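(The original snippet is cut off here. A hypothetical reconstruction of the setup described above, for Spark 1.5; broker address, topic, paths, and the `parse` helper are placeholders, and `conf`/`sqlContext` are assumed to be defined:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///checkpoints/two-streams") // placeholder path

val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events")) // placeholder topic

// Stream 1: transform each batch and persist it as Parquet.
// parse is a placeholder that turns a (key, value) pair into a case class,
// so createDataFrame can infer the schema.
stream.map(parse _).foreachRDD { rdd =>
  sqlContext.createDataFrame(rdd).write.mode("append").parquet("hdfs:///data/events")
}

// Stream 2: keyed running count over the same input, kept in checkpointed state.
stream.map { case (k, _) => (k, 1L) }
  .updateStateByKey[Long] { (counts: Seq[Long], state: Option[Long]) =>
    Some(state.getOrElse(0L) + counts.sum)
  }
  .print()

ssc.start()
ssc.awaitTermination()
```

Note that updateStateByKey requires the checkpoint directory to be set, which is consistent with seeing checkpoint-related errors in this setup.)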