Re: [streaming] KafkaUtils.createDirectStream - how to start streaming from checkpoints?

Cody Koeninger Mon, 07 Dec 2015 12:41:38 -0800

Just to be clear, Spark checkpoints have nothing to do with ZooKeeper;
they're stored in the checkpoint directory you specify, on whatever filesystem it lives on.


On Sun, Dec 6, 2015 at 1:25 AM, manasdebashiskar <poorinsp...@gmail.com>
wrote:

> When you enable checkpointing, your offsets get written to ZooKeeper. If
> your program dies or shuts down and is later restarted, the direct stream
> API knows where to start by looking up those offsets in ZooKeeper.
>
> This is as easy as it gets.
> However, if you are planning to reuse the same checkpoint folder across
> different Spark versions, that is currently not supported.
> In that case you might want to write the topic and offsets to your
> favorite database instead. Assuming that database is highly available, you
> can later retrieve the last processed offsets and start from there.
>
> Take a look at Cody's blog post (he wrote the Kafka direct stream API):
> https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/streaming-KafkaUtils-createDirectStream-how-to-start-streming-from-checkpoints-tp25461p25597.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

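A minimal sketch of the database approach from the quoted message, using Python's built-in sqlite3 module as a stand-in for "your favorite database". The table name kafka_offsets, its columns, and the functions open_store, save_offsets and load_offsets are hypothetical and exist only for illustration. The idea: after each batch, save the end offset of every topic/partition range you finished processing; on restart, load them and hand them to the direct stream as its starting offsets (the Scala KafkaUtils.createDirectStream has an overload that accepts a fromOffsets map).

```python
import sqlite3

# Hypothetical schema: one row per (consumer group, topic, partition),
# holding the offset up to which that partition has been processed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS kafka_offsets (
    group_id        TEXT    NOT NULL,
    topic           TEXT    NOT NULL,
    kafka_partition INTEGER NOT NULL,
    until_offset    INTEGER NOT NULL,
    PRIMARY KEY (group_id, topic, kafka_partition)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the offset store; ':memory:' is only for demos."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_offsets(conn, group_id, offset_ranges):
    """Persist (topic, partition, until_offset) tuples in one transaction.

    'with conn' commits on success and rolls back on error, so a crash
    mid-write leaves the previously saved offsets untouched.
    """
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO kafka_offsets VALUES (?, ?, ?, ?)",
            [(group_id, topic, part, until) for (topic, part, until) in offset_ranges],
        )


def load_offsets(conn, group_id):
    """Return {(topic, partition): offset} to resume from; empty if none saved."""
    rows = conn.execute(
        "SELECT topic, kafka_partition, until_offset FROM kafka_offsets "
        "WHERE group_id = ?",
        (group_id,),
    ).fetchall()
    return {(topic, part): offset for (topic, part, offset) in rows}


if __name__ == "__main__":
    store = open_store()
    save_offsets(store, "my-app", [("events", 0, 100), ("events", 1, 250)])
    save_offsets(store, "my-app", [("events", 0, 180)])  # a later batch
    print(load_offsets(store, "my-app"))
```

Unlike a checkpoint folder, a table like this doesn't depend on the Spark version, which is the point the quoted message makes. For exactly-once output, the linked blog post goes further and discusses committing the results and the offsets together.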