Thanks for your answer. Unfortunately I'm bound to Kafka 0.8.2.1.

-- Bruckwald

nihed mbarek <nihe...@gmail.com> wrote:

>Hi,
>
>Are you using a new version of Kafka? If yes, since 0.9 the auto.offset.reset
>parameter takes:
>
>earliest: automatically reset the offset to the earliest offset
>latest: automatically reset the offset to the latest offset
>none: throw an exception to the consumer if no previous offset is found for
>the consumer's group
>anything else: throw an exception to the consumer
>
>https://kafka.apache.org/documentation.html
>
>Regards,
>
>On Tue, Jul 5, 2016 at 2:15 PM, Bruckwald Tamás <tamas.bruckw...@freemail.hu> wrote:
>>Hello,
>>
>>I'm writing a Spark (v1.6.0) batch job which reads from a Kafka topic.
>>For this I can use org.apache.spark.streaming.kafka.KafkaUtils#createRDD;
>>however, I need to set the offsets for all the partitions, and I also need
>>to store them somewhere (ZK? HDFS?) to know where to start the next batch
>>job from. What is the right approach to read from Kafka in a batch job?
>>
>>I'm also thinking about writing a streaming job instead, which reads with
>>auto.offset.reset=smallest and saves the checkpoint to HDFS, and then in
>>the next run it starts from that. But in this case, how can I fetch just
>>once and stop streaming after the first batch?
>>
>>I posted this question on StackOverflow recently
>>(http://stackoverflow.com/q/38026627/4020050) but got no answer there, so
>>I'd ask here as well, hoping that I get some ideas on how to resolve this
>>issue.
>>
>>Thanks - Bruckwald

>--
>M'BAREK Med Nihed,
>Fedora Ambassador, TUNISIA, Northern Africa
>http://www.nihed.com
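[Note on the auto.offset.reset values discussed above: the accepted values differ between the pre-0.9 (old) consumer config and the 0.9+ (new) consumer config, which matters here since the poster is pinned to 0.8.2.1. A minimal properties sketch of the two spellings:]

```properties
# Kafka 0.9+ (new consumer)
auto.offset.reset=earliest   # or: latest, none

# Kafka 0.8.x (old consumer) uses different value names
# auto.offset.reset=smallest  # or: largest
```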
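[The batch approach in the question boils down to per-partition offset bookkeeping: load the stored offsets at the start of a run, and persist the new upper bounds after it. A minimal sketch of that bookkeeping, with everything hypothetical (file name, layout, and the fixed per-partition consumption) rather than any Spark or Kafka API; in a real job the loaded map would feed the OffsetRange arguments to KafkaUtils.createRDD, and the file would live on HDFS instead of local disk:]

```python
import json
import os

# Hypothetical offset store: (topic, partition) -> next offset to read,
# serialized as JSON. In the real job this would live on HDFS or in ZooKeeper.
OFFSETS_FILE = "offsets.json"

def load_offsets(path=OFFSETS_FILE):
    """Return {(topic, partition): offset}; empty on the very first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        raw = json.load(f)
    return {(t, int(p)): o for t, parts in raw.items() for p, o in parts.items()}

def save_offsets(offsets, path=OFFSETS_FILE):
    """Persist {(topic, partition): offset} as {topic: {partition: offset}}."""
    raw = {}
    for (topic, partition), offset in offsets.items():
        raw.setdefault(topic, {})[str(partition)] = offset
    with open(path, "w") as f:
        json.dump(raw, f)

def run_batch(topic, num_partitions, consumed):
    """One batch run: start each partition from its stored offset (or 0),
    pretend `consumed` messages were read per partition, store the new bounds."""
    offsets = load_offsets()
    new_offsets = {}
    for p in range(num_partitions):
        start = offsets.get((topic, p), 0)  # fromOffset for this partition
        end = start + consumed              # untilOffset after the batch
        new_offsets[(topic, p)] = end
    save_offsets(new_offsets)
    return new_offsets
```

[For example, run_batch("events", 2, 100) records offset 100 for both partitions, and a second run picks up from there, which is exactly the "know where to start the next batch job from" requirement.]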