When you enable checkpointing, your offsets get written to the checkpoint directory along with the rest of the streaming state (the direct stream API does not use ZooKeeper for this). If your program dies or shuts down and is later restarted, the Kafka direct stream API knows where to start by reading those offsets back from the checkpoint.
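A minimal sketch of the recovery pattern (against the spark-streaming-kafka 0.8 API): the whole program is wrapped in a factory function, and `StreamingContext.getOrCreate` rebuilds the context, including stored offsets, from the checkpoint if one exists. The broker address, topic, and checkpoint path below are placeholders.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamRecovery {
  // Illustrative values; substitute your own.
  val checkpointDir = "hdfs:///tmp/my-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("kafka-direct-example")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir) // offsets are saved here with each batch

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("my-topic"))
    stream.foreachRDD(rdd => println(s"batch size: ${rdd.count()}"))
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart, getOrCreate recovers the context (and its offsets) from the
    // checkpoint if present; otherwise it calls createContext() and starts fresh.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that everything the stream does must be set up inside the factory function; transformations added outside it will not be recovered from the checkpoint.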
This is as easy as it gets. However, reusing the same checkpoint folder across different Spark versions is currently not supported. In that case you might want to write the offsets and topic to your favorite database instead. Assuming that DB is highly available, you can later retrieve the previously processed offsets and start from there. Take a look at the blog post by Cody Koeninger (the author of the Kafka direct stream API): https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
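A sketch of that "store offsets yourself" approach, again against the 0.8 API. The `loadOffsets`/`saveOffsets` helpers are hypothetical stand-ins for your database access; the exactly-once variant in the linked post commits the offsets in the same transaction as the results.

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object DbBackedOffsets {
  // Hypothetical helpers backed by your database (names are illustrative),
  // e.g. SELECT / UPDATE on an offsets(topic, partition, off) table.
  def loadOffsets(): Map[TopicAndPartition, Long] = ???
  def saveOffsets(topic: String, partition: Int, offset: Long): Unit = ???

  def startFromDb(ssc: StreamingContext, kafkaParams: Map[String, String]): Unit = {
    // Resume from the offsets last recorded in the database, using the
    // createDirectStream overload that takes explicit fromOffsets.
    val fromOffsets = loadOffsets()
    val messageHandler =
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
    val stream = KafkaUtils
      .createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
        ssc, kafkaParams, fromOffsets, messageHandler)

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process rdd and write results; doing this in the same DB
      // transaction as the offset update is what gives exactly-once ...
      offsetRanges.foreach(o => saveOffsets(o.topic, o.partition, o.untilOffset))
    }
  }
}
```

Because the offsets live in your own store rather than in a checkpoint, this approach survives Spark upgrades and code changes that would invalidate a checkpoint.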