Thanks for your answer. Unfortunately I'm bound to Kafka 0.8.2.1.

-- Bruckwald

nihed mbarek <nihe...@gmail.com> wrote:

>Hi,
>
>Are you using a new version of Kafka? If yes, since 0.9 the auto.offset.reset
>parameter takes:
>
>earliest: automatically reset the offset to the earliest offset
>latest: automatically reset the offset to the latest offset
>none: throw an exception to the consumer if no previous offset is found for
>the consumer's group
>anything else: throw an exception to the consumer
>
>https://kafka.apache.org/documentation.html
>
>Regards,
>
>On Tue, Jul 5, 2016 at 2:15 PM, Bruckwald Tamás <tamas.bruckw...@freemail.hu> wrote:
>>Hello,
>>
>>I'm writing a Spark (v1.6.0) batch job which reads from a Kafka topic.
>>For this I can use org.apache.spark.streaming.kafka.KafkaUtils#createRDD;
>>however, I need to set the offsets for all the partitions, and I also need
>>to store them somewhere (ZK? HDFS?) to know where to start the next batch
>>job from. What is the right approach to read from Kafka in a batch job?
>>
>>I'm also thinking about writing a streaming job instead, which reads with
>>auto.offset.reset=smallest and saves the checkpoint to HDFS, and then in
>>the next run it starts from that. But in this case, how can I fetch just
>>once and stop streaming after the first batch?
>>
>>I posted this question on StackOverflow recently
>>(http://stackoverflow.com/q/38026627/4020050) but got no answer there, so
>>I'd ask here as well, hoping that I get some ideas on how to resolve this
>>issue.
>>
>>Thanks - Bruckwald

>--
>M'BAREK Med Nihed,
>Fedora Ambassador, TUNISIA, Northern Africa
>http://www.nihed.com
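[Note on the auto.offset.reset values discussed above: the accepted values differ between the pre-0.9 (old) consumer config and the 0.9+ (new) consumer config, which matters here since the poster is pinned to 0.8.2.1. A minimal properties sketch of the two spellings:]

```properties
# Kafka 0.9+ (new consumer)
auto.offset.reset=earliest   # or: latest, none

# Kafka 0.8.x (old consumer) uses different value names
# auto.offset.reset=smallest  # or: largest
```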
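[The batch approach in the question boils down to per-partition offset bookkeeping: load the stored offsets at the start of a run, and persist the new upper bounds after it. A minimal sketch of that bookkeeping, with everything hypothetical (file name, layout, and the fixed per-partition consumption) rather than any Spark or Kafka API; in a real job the loaded map would feed the OffsetRange arguments to KafkaUtils.createRDD, and the file would live on HDFS instead of local disk:]

```python
import json
import os

# Hypothetical offset store: (topic, partition) -> next offset to read,
# serialized as JSON. In the real job this would live on HDFS or in ZooKeeper.
OFFSETS_FILE = "offsets.json"

def load_offsets(path=OFFSETS_FILE):
    """Return {(topic, partition): offset}; empty on the very first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        raw = json.load(f)
    return {(t, int(p)): o for t, parts in raw.items() for p, o in parts.items()}

def save_offsets(offsets, path=OFFSETS_FILE):
    """Persist {(topic, partition): offset} as {topic: {partition: offset}}."""
    raw = {}
    for (topic, partition), offset in offsets.items():
        raw.setdefault(topic, {})[str(partition)] = offset
    with open(path, "w") as f:
        json.dump(raw, f)

def run_batch(topic, num_partitions, consumed):
    """One batch run: start each partition from its stored offset (or 0),
    pretend `consumed` messages were read per partition, store the new bounds."""
    offsets = load_offsets()
    new_offsets = {}
    for p in range(num_partitions):
        start = offsets.get((topic, p), 0)  # fromOffset for this partition
        end = start + consumed              # untilOffset after the batch
        new_offsets[(topic, p)] = end
    save_offsets(new_offsets)
    return new_offsets
```

[For example, run_batch("events", 2, 100) records offset 100 for both partitions, and a second run picks up from there, which is exactly the "know where to start the next batch job from" requirement.]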