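Saving offsets to ZooKeeper is the old approach. With the direct Kafka stream, checkpointing saves the offsets internally to the checkpoint location (e.g. a directory on HDFS), so you don't need to manage them in ZooKeeper yourself.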
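To illustrate the idea only (this is not Spark's actual implementation), here is a minimal Python sketch. It assumes a hypothetical FileOffsetCheckpoint class and run_batch helper. The sketch stores the next offset per partition in a checkpoint directory and reads it back after a "restart", instead of going to ZooKeeper. Offsets are saved only after the batch's output has been written, so a crash in between replays records rather than losing them.

```python
import json
import os
import tempfile


class FileOffsetCheckpoint:
    """Hypothetical, simplified stand-in for a checkpoint directory that
    stores the next offset to read for each Kafka partition."""

    def __init__(self, directory):
        self.path = os.path.join(directory, "offsets.json")

    def load(self):
        # Fresh start: no checkpoint yet, so every partition begins at 0.
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            # JSON object keys are strings; convert partition ids back to int.
            return {int(p): o for p, o in json.load(f).items()}

    def save(self, offsets):
        # Write to a temp file, then rename atomically, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(offsets, f)
        os.replace(tmp, self.path)


def run_batch(log, checkpoint, output, max_records=2):
    """Hypothetical batch: read up to max_records per partition starting at
    the checkpointed offsets, write them to output, then save new offsets."""
    offsets = checkpoint.load()
    for partition, records in log.items():
        start = offsets.get(partition, 0)
        end = min(start + max_records, len(records))
        output.extend(records[start:end])  # the batch's output "side effect"
        offsets[partition] = end
    # Offsets are recorded only after the output above has been written.
    checkpoint.save(offsets)
    return offsets


if __name__ == "__main__":
    log = {0: ["a", "b", "c"], 1: ["x", "y"]}  # fake topic: partition -> records
    out = []
    checkpoint_dir = tempfile.mkdtemp()
    run_batch(log, FileOffsetCheckpoint(checkpoint_dir), out)
    # Simulate a driver restart: a new object reads the same directory.
    run_batch(log, FileOffsetCheckpoint(checkpoint_dir), out)
    print(out)
```

In this toy model a crash between writing the output and saving the checkpoint means those records are read again after restart. That is why the Spark Kafka integration guide notes that, for exactly-once results, the output operation should be idempotent or save results and offsets atomically.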
More details here: http://spark.apache.org/docs/latest/streaming-kafka-integration.html

On Tue, Aug 23, 2016 at 10:30 AM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:

> Hi Experts,
>
> I am looking for some information on how to achieve zero data loss while working with Kafka and Spark. I have searched online and blogs have different answers. Please let me know if anyone has an idea on this.
>
> Blog 1:
> https://databricks.com/blog/2015/01/15/improved-driver-fault-tolerance-and-zero-data-loss-in-spark-streaming.html
>
> Blog 2:
> http://aseigneurin.github.io/2016/05/07/spark-kafka-achieving-zero-data-loss.html
>
> Blog 1 simply says it is a configuration change with a checkpoint directory, and blog 2 gives details on how to save offsets to ZooKeeper. Can you please help me out with the right approach?
>
> Thanks,
> Asmath