Re: Retrieving offsets from previous spark streaming checkpoint

Cody Koeninger Thu, 13 Aug 2015 16:04:07 -0700

Access the offsets using HasOffsetRanges, save them in your own datastore,
and provide them as the fromOffsets argument to createDirectStream when
starting the stream.


See https://github.com/koeninger/kafka-exactly-once
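
A minimal sketch of that approach, assuming the Spark 1.x
spark-streaming-kafka direct stream API. The topic name "events", the
broker address, and the loadOffsets/saveOffsets helpers are hypothetical
placeholders for whatever datastore you use; they are not part of Spark
or of the linked repository.

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils, OffsetRange}

object ResumeFromSavedOffsets {

  // Hypothetical: read the last saved offset per partition from your datastore.
  // Hard-coded here so the sketch is self-contained.
  def loadOffsets(): Map[TopicAndPartition, Long] =
    Map(TopicAndPartition("events", 0) -> 0L)

  // Hypothetical: persist the offsets. untilOffset is exclusive, so it is
  // the offset the next run should start from.
  def saveOffsets(ranges: Array[OffsetRange]): Unit =
    ranges.foreach { r =>
      println(s"topic=${r.topic} partition=${r.partition} next=${r.untilOffset}")
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("resume-from-saved-offsets")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Assumed broker address; replace with your own.
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

    // Start the direct stream at the saved offsets instead of relying on
    // the checkpoint directory.
    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder, (String, String)](
      ssc,
      kafkaParams,
      loadOffsets(),
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))

    stream.foreachRDD { rdd =>
      // The cast only works on the RDD produced directly by the direct
      // stream, before any transformation changes the partitioning.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // Placeholder processing step.
      rdd.foreach { case (_, value) => println(value) }

      // Save the offsets only after the batch has been processed.
      saveOffsets(offsetRanges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Saving offsets after processing gives at-least-once behavior in this
sketch; writing results and offsets in a single transaction, as the
linked repository demonstrates, is what tightens that to exactly-once.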

On Thu, Aug 13, 2015 at 3:53 PM, Stephen Durfey <sjdur...@gmail.com> wrote:

> When deploying a spark streaming application I want to be able to
> retrieve the latest kafka offsets that were processed by the pipeline, and
> create my kafka direct streams from those offsets. Because the checkpoint
> directory isn't guaranteed to be compatible between job deployments, I
> don't want to re-use the checkpoint directory from the previous job
> deployment. I also don't want to have to re-process everything in my kafka
> queues. Is there any way to retrieve this information from the checkpoint
> directory, or has anyone else solved this problem already?
>
> * I apologize if this is a duplicate message. I didn't see it go through
> earlier today, and I didn't see it in the archive.
>
