Have a look at this: https://github.com/koeninger/kafka-exactly-once
especially:
https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalPerBatch.scala
https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalPerPartiti
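
As a rough sketch of the general idea (this is not code from that repository, and it uses neither the Spark nor the Kafka API): write each batch's output and its ending offsets into the same temporary directory, then publish both with a single rename, so the data and the offsets are committed together or not at all. The function names, the directory layout and the "topic-partition" offset keys below are hypothetical, and the sketch assumes a directory rename is atomic on the target filesystem.

```python
import json
import os
import tempfile


def commit_batch(out_dir, batch_id, records, offsets):
    """Write one batch's records and its ending offsets, then publish both at once.

    `offsets` is a hypothetical mapping such as {"events-0": 120}, i.e. the
    offset to resume from for each topic-partition.
    """
    # Stage everything in a hidden temp directory inside out_dir so the
    # final rename stays within one filesystem.
    tmp = tempfile.mkdtemp(prefix=".tmp-", dir=out_dir)
    with open(os.path.join(tmp, "data.txt"), "w") as f:
        for record in records:
            f.write(record + "\n")
    with open(os.path.join(tmp, "offsets.json"), "w") as f:
        json.dump(offsets, f)

    # Zero-padded batch ids keep lexicographic order equal to numeric order.
    final = os.path.join(out_dir, f"batch-{batch_id:010d}")
    # Assumption: this rename is atomic, so a crash leaves either no
    # batch directory or a complete one (data plus offsets).
    os.rename(tmp, final)
    return final


def last_committed_offsets(out_dir):
    """Return the offsets stored with the newest published batch, or {} if none."""
    batches = sorted(d for d in os.listdir(out_dir) if d.startswith("batch-"))
    if not batches:
        return {}
    with open(os.path.join(out_dir, batches[-1], "offsets.json")) as f:
        return json.load(f)
```

On restart or redeployment, the job would call something like last_committed_offsets() and resume reading from those offsets; leftover ".tmp-" directories belong to batches that never completed and can be discarded.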
Hi,
I have written a Spark Streaming application that reads from a Kafka stream
and writes to HDFS every hour (the batch time). I would like to know how to
get or commit the offsets of the Kafka stream while writing to HDFS, so that
if there is any issue or redeployment, I can start from the point where I did a