Re: Checkpoint files are saved before stream is saved to file (rdd.toDF().write ...)?

2015-09-25 Thread Petr Novak
Many thanks Cody, it explains quite a bit. I had a couple of problems with checkpointing and graceful shutdown when moving from working code in Spark 1.3.0 to 1.5.0: InterruptedExceptions, KafkaDirectStream couldn't initialize, and some exceptions regarding the WAL even though I'm using the direct stream. Meanwhile

Re: Checkpoint files are saved before stream is saved to file (rdd.toDF().write ...)?

2015-09-23 Thread Cody Koeninger
TD can correct me on this, but I believe checkpointing is done after a set of jobs is submitted, not after they are completed. If you fail while processing the jobs, starting over from that checkpoint should put you in the correct state. In any case, are you actually observing a loss of messages

Checkpoint files are saved before stream is saved to file (rdd.toDF().write ...)?

2015-09-23 Thread Petr Novak
Hi, I have 2 streams and checkpointing with code based on the documentation. One stream transforms data from Kafka and saves it to a Parquet file. The other uses the same input stream and does updateStateByKey to compute some aggregations. There is no gracefulShutdown. Both use about this code t