When you enable checkpointing, your offsets get written to the checkpoint directory (the direct stream does not use ZooKeeper). If your program dies or is shut down and later restarted, the Kafka direct stream API knows where to start by reading those offsets back from the checkpoint.

This is as easy as it gets.
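
For example, here is a minimal sketch of that pattern with the Spark 1.x Scala API (the checkpoint path, broker and topic names below are just placeholders for your own values):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object CheckpointedDirectStream {
  // placeholder checkpoint location; use a reliable filesystem such as HDFS
  val checkpointDir = "hdfs:///tmp/my-app-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpointed-direct-stream")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)  // offsets are stored here as part of the checkpoint

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // placeholder broker
    val topics = Set("my-topic")                                     // placeholder topic
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.map(_._2).print()  // define all of your processing inside this function
    ssc
  }

  def main(args: Array[String]): Unit = {
    // first run: builds a new context; after a restart: rebuilds it
    // (including the Kafka offsets) from the checkpoint directory
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}

The important part is that everything defining the stream lives inside the creating function, so getOrCreate can rebuild it from the checkpoint after a failure.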
However, re-using the same checkpoint folder across different Spark versions is currently not supported.
In that case you might want to write the topic, partition, and offset to your favorite database. Assuming that DB is highly available, you can later retrieve the last successfully processed offsets and start from there.
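
A rough sketch of that approach (loadOffsets/saveOffsets are placeholders for whatever your own database layer looks like; the relevant hooks are the fromOffsets overload of createDirectStream and the HasOffsetRanges cast):

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

// placeholders: replace with reads/writes against your own DB
def loadOffsets(): Map[TopicAndPartition, Long] = ???
def saveOffsets(offsets: Array[OffsetRange]): Unit = ???

def createStreamFromDb(ssc: StreamingContext, kafkaParams: Map[String, String]) = {
  // e.g. Map(TopicAndPartition("my-topic", 0) -> 42L) read back from the DB
  val fromOffsets = loadOffsets()
  val messageHandler =
    (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

  val stream = KafkaUtils.createDirectStream[
    String, String, StringDecoder, StringDecoder, (String, String)](
    ssc, kafkaParams, fromOffsets, messageHandler)

  stream.foreachRDD { rdd =>
    val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    // ... process rdd, ideally writing results and offsets in the same transaction ...
    saveOffsets(ranges)
  }
  stream
}

Cody's blog post below walks through the full version of this, including storing the offsets transactionally together with your results.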

Take a look at Cody Koeninger's blog post (he wrote the Kafka direct stream):
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md


