When deploying a Spark Streaming application, I want to be able to retrieve
the latest Kafka offsets that were processed by the pipeline and create my
Kafka direct streams from those offsets. Because the checkpoint directory
isn't guaranteed to be compatible between job deployments, I don't want to
reuse the checkpoint directory from the previous deployment. I also don't
want to have to reprocess everything in my Kafka topics. Is there any way to
retrieve this information from the checkpoint directory, or has anyone else
already solved this problem?
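
One possible approach can be sketched as follows, assuming the offsets are kept somewhere other than the checkpoint directory. Spark's Kafka direct stream exposes each batch's offset ranges (`HasOffsetRanges`) and can be started from an explicit per-partition offset map. A job could therefore commit those ranges to its own store after each batch's output succeeds, then read the map back on the next deployment. The Python below only models that bookkeeping. `OffsetRange` and `InMemoryOffsetStore` are hypothetical names, not Spark APIs, and a real store would need to be durable.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class OffsetRange:
    """One batch's slice of a Kafka partition (until_offset is exclusive)."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int


class InMemoryOffsetStore:
    """Hypothetical stand-in for a durable external store (database, ZooKeeper, ...)."""

    def __init__(self) -> None:
        # (topic, partition) -> next offset to read, i.e. first unprocessed record
        self._next: Dict[Tuple[str, int], int] = {}

    def commit(self, ranges: Iterable[OffsetRange]) -> None:
        # Call only after the batch's output has been written successfully.
        # max() keeps a replayed older batch from moving a partition backwards.
        for r in ranges:
            key = (r.topic, r.partition)
            self._next[key] = max(self._next.get(key, 0), r.until_offset)

    def from_offsets(self) -> Dict[Tuple[str, int], int]:
        # Per-partition offsets a new deployment would start its direct stream from.
        return dict(self._next)


if __name__ == "__main__":
    store = InMemoryOffsetStore()
    store.commit([OffsetRange("events", 0, 0, 100), OffsetRange("events", 1, 0, 80)])
    store.commit([OffsetRange("events", 0, 100, 150)])
    print(store.from_offsets())  # {('events', 0): 150, ('events', 1): 80}
```

Committing only after output succeeds gives at-least-once behavior: a crash between writing output and committing means that batch is reprocessed after redeployment, but nothing is skipped.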

* I apologize if this is a duplicate message. I didn't see it go through
earlier today, and I didn't see it in the archive.