If it's a batch job, don't use a stream. You have to store the offsets reliably somewhere either way, so it sounds like your only remaining issue is identifying the offsets for each partition? Look at KafkaCluster.scala, specifically the methods getEarliestLeaderOffsets and getLatestLeaderOffsets.
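For illustration, here is a minimal sketch of the per-partition bookkeeping in plain Scala, with no Spark or Kafka dependencies. It assumes you already have the earliest and latest offset for each partition (for example, from getEarliestLeaderOffsets / getLatestLeaderOffsets) and the offsets saved by the previous run. PartitionRange is a hypothetical stand-in for Spark's OffsetRange. planRanges, encode and decode are hypothetical helper names. The line-based text format is only one assumption about what you might write to HDFS or ZK.

```scala
// Hypothetical stand-in for Spark's OffsetRange: read [fromOffset, untilOffset).
final case class PartitionRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

object BatchOffsets {
  // stored:   next offset to read per partition, as saved by the previous run
  // earliest: earliest available offset per partition (assumed input)
  // latest:   latest offset per partition (assumed input); this run reads up to it
  def planRanges(topic: String,
                 stored: Map[Int, Long],
                 earliest: Map[Int, Long],
                 latest: Map[Int, Long]): Seq[PartitionRange] =
    latest.keys.toSeq.sorted.map { p =>
      val low  = earliest.getOrElse(p, 0L)
      val high = latest(p)
      // No stored offset (first run, or a new partition) or one already removed
      // by retention: start from the earliest available offset instead.
      // A stored offset beyond `high` is clamped, giving an empty range.
      val from = math.min(math.max(stored.getOrElse(p, low), low), high)
      PartitionRange(topic, p, from, high)
    }

  // Assumed storage format: one "partition,nextOffset" line per partition.
  def encode(ranges: Seq[PartitionRange]): String =
    ranges.map(r => s"${r.partition},${r.untilOffset}").mkString("\n")

  def decode(saved: String): Map[Int, Long] =
    saved.split("\n").filter(_.nonEmpty).map { line =>
      val parts = line.split(',')
      parts(0).trim.toInt -> parts(1).trim.toLong
    }.toMap
}
```

Each run would turn the planned ranges into OffsetRange values for KafkaUtils.createRDD. It would write encode(ranges) back to storage only after the job succeeds, so a failed run is simply retried from the previously saved offsets.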
On Tue, Jul 5, 2016 at 7:40 AM, Bruckwald Tamás <tamas.bruckw...@freemail.hu> wrote:
> Thanks for your answer. Unfortunately I'm bound to Kafka 0.8.2.1.
> --Bruckwald
>
> nihed mbarek <nihe...@gmail.com> wrote:
>
> Hi,
>
> Are you using a new version of Kafka? If so, since 0.9 the
> auto.offset.reset parameter takes:
>
> - earliest: automatically reset the offset to the earliest offset
> - latest: automatically reset the offset to the latest offset
> - none: throw an exception to the consumer if no previous offset is found
>   for the consumer's group
> - anything else: throw an exception to the consumer.
>
> https://kafka.apache.org/documentation.html
>
> Regards,
>
> On Tue, Jul 5, 2016 at 2:15 PM, Bruckwald Tamás <tamas.bruckw...@freemail.hu> wrote:
>>
>> Hello,
>>
>> I'm writing a Spark (v1.6.0) batch job that reads from a Kafka topic.
>> For this I can use org.apache.spark.streaming.kafka.KafkaUtils#createRDD;
>> however, I need to set the offsets for all the partitions and also need to
>> store them somewhere (ZK? HDFS?) to know where to start the next batch
>> job.
>> What is the right approach to reading from Kafka in a batch job?
>>
>> I'm also thinking about writing a streaming job instead, which starts with
>> auto.offset.reset=smallest, saves the checkpoint to HDFS, and then resumes
>> from that checkpoint in the next run.
>> But in that case, how can I fetch just once and stop streaming after the
>> first batch?
>>
>> I posted this question on StackOverflow recently
>> (http://stackoverflow.com/q/38026627/4020050) but got no answer there, so
>> I'm asking here as well, hoping to get some ideas on how to resolve this
>> issue.
>>
>> Thanks - Bruckwald
>
> --
> M'BAREK Med Nihed,
> Fedora Ambassador, TUNISIA, Northern Africa
> http://www.nihed.com
> <http://tn.linkedin.com/in/nihed>