Hi, I'm no expert, but here is my understanding.
Short answer: yes. After restarting, your application will reread the failed messages.

Longer answer: it seems like you're mixing several things together. Let me try to explain:

- The WAL (write-ahead log) prevents your application from losing data: the executor first writes the data it receives from Kafka into the WAL, and only then tells the Kafka high-level consumer (which the receiver-based approach uses) that it actually received the data. That ordering is what makes it at-least-once.
- Checkpoints are a mechanism that helps your *driver* recover from failures by saving driver information into HDFS (or S3, or whatever).

The reason I explained these is this: you asked about "... one bug caused the streaming application to fail and exit", so the failure you're trying to solve is in the driver. When you restart your application, your driver will fetch the information it last saved in the checkpoint (in HDFS) and instruct new executors (when the previous driver died, so did its executors) to continue consuming data.
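To make the WAL ordering concrete, here is a toy Python sketch. It is not Spark code and uses no Spark or Kafka API: `ToyReceiver`, `receive_one` and `run_demo` are hypothetical names made up for illustration, the `wal` list stands in for the write-ahead log, and `committed_offset` stands in for the offset the consumer keeps in Zookeeper. It only shows why "write to the WAL first, advance the offset second" gives at-least-once behaviour after a crash.

```python
# Toy model only (assumption: none of these names exist in Spark or Kafka).
class ToyReceiver:
    def __init__(self, source, wal, committed_offset=0):
        self.source = source                      # stands in for a Kafka partition
        self.wal = wal                            # stands in for the WAL on HDFS/S3
        self.committed_offset = committed_offset  # stands in for the offset in Zookeeper

    def receive_one(self, crash_before_commit=False):
        record = self.source[self.committed_offset]
        self.wal.append(record)                   # step 1: persist the record first
        if crash_before_commit:
            raise RuntimeError("simulated crash between WAL write and offset commit")
        self.committed_offset += 1                # step 2: only now advance the offset


def run_demo():
    source = ["m0", "m1", "m2"]
    wal = []
    receiver = ToyReceiver(source, wal)
    receiver.receive_one()                        # m0 is written and committed
    try:
        receiver.receive_one(crash_before_commit=True)  # m1 is written, commit is lost
    except RuntimeError:
        pass
    # Restart: the WAL and the committed offset both survive the crash.
    restarted = ToyReceiver(source, wal, committed_offset=receiver.committed_offset)
    replayed = list(wal)                          # what can be recovered from the WAL
    while restarted.committed_offset < len(source):
        restarted.receive_one()                   # m1 is read again, then m2
    return replayed, wal, restarted.committed_offset


if __name__ == "__main__":
    print(run_demo())
```

In this toy run no message is lost, but `m1` shows up twice, because it was written to the WAL before the crash and read again after the restart. That duplicate is the price of at-least-once.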
Since your executors are using the receiver-based approach (as opposed to the direct Kafka stream) with the WAL, when the new executors start they will first process what was saved in the WAL and then read from the latest offsets saved in Kafka (Zookeeper). In your case that means you won't lose data, because the executors first save the data to the WAL and only then advance their offsets in Kafka.

If you decide to go for the direct approach <https://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers>, then your driver will be the one (and only one) managing the Kafka offsets, which means that part of the data the driver saves in the checkpoint will be the Kafka offsets.

I hope this helps :)