Access the offsets using HasOffsetRanges, save them in your datastore, provide them as the fromOffsets argument when starting the stream.
See https://github.com/koeninger/kafka-exactly-once On Thu, Aug 13, 2015 at 3:53 PM, Stephen Durfey <sjdur...@gmail.com> wrote: > When deploying a spark streaming application I want to be able to > retrieve the lastest kafka offsets that were processed by the pipeline, and > create my kafka direct streams from those offsets. Because the checkpoint > directory isn't guaranteed to be compatible between job deployments, I > don't want to re-use the checkpoint directory from the previous job > deployment. I also don't want to have to re-process everything in my kafka > queues. Is there any way to retrieve this information from the checkpoint > directory, or has anyone else solved this problem already? > > * I apologize if this is a duplicate message. I didn't see it go through > earlier today, and I didn't see it in the archive. >