If it's a batch job, don't use a stream.

You have to store the offsets reliably somewhere regardless.  So it sounds
like your only issue is with identifying offsets per partition?  Look at
KafkaCluster.scala, methods getEarliestLeaderOffsets /
getLatestLeaderOffsets.
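
A rough sketch of the per-partition bookkeeping, in plain Scala with no Spark
or Kafka dependency. BatchOffsets, PartitionRange and nextRanges are
hypothetical names, not part of any library. The earliest and latest maps stand
in for whatever you get back from the leader-offset lookups mentioned above,
and stored is whatever you saved after the previous run (ZK, HDFS, a database).

```scala
// Hypothetical helper for illustration only; not part of Spark or Kafka.
object BatchOffsets {
  // (topic, partition) identifies one Kafka partition.
  type TopicPartition = (String, Int)

  // Half-open range: fromOffset inclusive, untilOffset exclusive.
  final case class PartitionRange(
      topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

  // Decide what the next batch run should read from each partition.
  //   stored   - offsets saved after the previous successful run (may be empty)
  //   earliest - earliest offset still available per partition
  //   latest   - latest offset per partition at the time the job starts
  def nextRanges(
      stored: Map[TopicPartition, Long],
      earliest: Map[TopicPartition, Long],
      latest: Map[TopicPartition, Long]): Seq[PartitionRange] =
    latest.toSeq.sortBy(_._1).map { case (tp @ (topic, partition), until) =>
      val floor = earliest.getOrElse(tp, 0L)
      // Start from the stored offset if there is one, otherwise from the
      // earliest available offset. Clamp into [floor, until] in case
      // retention already deleted data past the stored offset.
      val from = math.min(math.max(stored.getOrElse(tp, floor), floor), until)
      PartitionRange(topic, partition, from, until)
    }
}
```

Each resulting range would then map onto OffsetRange.create(topic, partition,
fromOffset, untilOffset) and be passed to KafkaUtils.createRDD. Save the
untilOffset values as the new stored offsets only after the batch's output has
been written successfully.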

On Tue, Jul 5, 2016 at 7:40 AM, Bruckwald Tamás <tamas.bruckw...@freemail.hu> wrote:

> Thanks for your answer. Unfortunately I'm bound to Kafka 0.8.2.1.
> --Bruckwald
>
> nihed mbarek <nihe...@gmail.com> wrote:
>
> Hi,
>
> Are you using a recent version of Kafka? If so, since 0.9 the
> auto.offset.reset parameter accepts:
>
>    - earliest: automatically reset the offset to the earliest offset
>    - latest: automatically reset the offset to the latest offset
>    - none: throw exception to the consumer if no previous offset is found
>    for the consumer's group
>    - anything else: throw exception to the consumer.
>
> https://kafka.apache.org/documentation.html
>
>
> Regards,
>
> On Tue, Jul 5, 2016 at 2:15 PM, Bruckwald Tamás <
> tamas.bruckw...@freemail.hu> wrote:
>>
>> Hello,
>>
>> I'm writing a Spark (v1.6.0) batch job which reads from a Kafka topic.
>> For this I can use org.apache.spark.streaming.kafka.KafkaUtils#createRDD
>> however, I need to set the offsets for all the partitions and also need to
>> store them somewhere (ZK? HDFS?) to know from where to start the next batch
>> job.
>> What is the right approach to read from Kafka in a batch job?
>>
>> I'm also thinking about writing a streaming job instead, which starts with
>> auto.offset.reset=smallest, saves the checkpoint to HDFS, and resumes from
>> that checkpoint on the next run.
>> But in this case, how can I fetch just once and stop streaming after the
>> first batch?
>>
>> I posted this question on StackOverflow recently (
>> http://stackoverflow.com/q/38026627/4020050) but got no answer there, so
>> I'd ask here as well, hoping that I get some ideas on how to resolve this
>> issue.
>>
>> Thanks - Bruckwald
>>
>
>
>
> --
>
> M'BAREK Med Nihed,
> Fedora Ambassador, TUNISIA, Northern Africa
> http://www.nihed.com
>
> <http://tn.linkedin.com/in/nihed>
>
>
>
