When deploying a Spark Streaming application, I want to be able to retrieve the latest Kafka offsets that were processed by the pipeline and create my Kafka direct streams from those offsets. Because the checkpoint directory isn't guaranteed to be compatible between job deployments, I don't want to reuse the checkpoint directory from the previous deployment. I also don't want to have to reprocess everything in my Kafka queues. Is there any way to retrieve this information from the checkpoint directory, or has anyone else already solved this problem?
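One way around the checkpoint-compatibility issue is to persist the processed offsets yourself, outside the checkpoint directory, and read them back at startup. The sketch below is a minimal, hypothetical file-based offset store (`save_offsets` and `load_offsets` are illustrative names, not part of any Spark API). It records, per `(topic, partition)`, the exclusive end offset of the last successfully processed batch, which is also the offset to resume from. It assumes a single driver writes the file and that the file lives on storage that survives redeployments.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, Tuple

# (topic, partition) -> next offset to read (the exclusive end of the last processed range)
OffsetKey = Tuple[str, int]


def load_offsets(path: str) -> Dict[OffsetKey, int]:
    """Return the stored offsets, or an empty dict if nothing has been saved yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        entries = json.load(f)
    return {(e["topic"], e["partition"]): e["offset"] for e in entries}


def save_offsets(path: str, ranges: Iterable[Tuple[str, int, int]]) -> None:
    """Merge (topic, partition, until_offset) triples into the store.

    Call this only after a batch's output has been written successfully,
    so a crash causes reprocessing rather than data loss (at-least-once).
    """
    offsets = load_offsets(path)
    for topic, partition, until_offset in ranges:
        offsets[(topic, partition)] = until_offset
    entries = [
        {"topic": t, "partition": p, "offset": o}
        for (t, p), o in sorted(offsets.items())
    ]
    # Write to a temporary file in the same directory, then atomically
    # replace the store so a crash never leaves a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(entries, f)
    os.replace(tmp_path, path)
```

In the Spark 1.x/2.x Kafka 0.8 integration for PySpark, each batch RDD from `KafkaUtils.createDirectStream` exposes `offsetRanges()`, whose `OffsetRange` objects carry `topic`, `partition`, and `untilOffset`. Those values are what you would pass to `save_offsets` after the batch's output completes. On startup, `load_offsets` could then be converted into the `fromOffsets` dict (keyed by `TopicAndPartition`) that `createDirectStream` accepts. If you need exactly-once results rather than at-least-once, the offsets would have to be committed in the same transaction as the output instead of in a separate file.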
* I apologize if this is a duplicate message. I didn't see it go through earlier today, and I didn't see it in the archive.