For specific offsets, you can directly pass the offset ranges to
KafkaUtils.createRDD to get the events that were missed in the DStream.
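A minimal sketch of that approach, assuming the Spark 1.x direct Kafka API (spark-streaming-kafka), an existing SparkContext `sc`, and illustrative broker/topic/offset values:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

// Hypothetical broker address; replace with your own.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

// One OffsetRange per topic-partition, covering [fromOffset, untilOffset)
// for the events the DStream missed. Topic name and offsets are illustrative.
val offsetRanges = Array(
  OffsetRange("my-topic", 0, fromOffset = 100L, untilOffset = 200L),
  OffsetRange("my-topic", 1, fromOffset = 150L, untilOffset = 250L)
)

// Build a one-off batch RDD containing exactly those offsets.
val missedEvents = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
  sc, kafkaParams, offsetRanges)
```

Since createRDD is a batch call, it can be run from the same driver as the streaming job to backfill a gap without touching the DStream itself.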
- Thanks, via mobile, excuse brevity.
On Jan 25, 2016 3:33 PM, "Raju Bairishetti" wrote:
Hi Yash,
Basically, my question is how to avoid storing the kafka offsets in the
spark checkpoint directory. The streaming context is getting built from the
checkpoint directory and proceeding with the offsets in the checkpointed RDD.
I want to consume data from kafka from specific offsets along with the
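The behavior being described comes from StreamingContext.getOrCreate: if a checkpoint exists, the setup function is skipped entirely, so any fromOffsets passed inside it never take effect. A minimal sketch, assuming a hypothetical checkpoint path and setup function:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical checkpoint location.
val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("kafka-direct-example")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... create the Kafka DStream (with any fromOffsets) here ...
  ssc
}

// On restart, if checkpointDir already holds a checkpoint, createContext()
// is NOT called: the context, including the stored Kafka offsets, is rebuilt
// from the checkpoint, overriding whatever offsets createContext would set.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
```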
Hi Raju,
Could you please explain your expected behavior with the DStream? The
DStream will have events only from the 'fromOffsets' that you provided in
createDirectStream (which I think is the expected behavior).
As for the smaller files, you will have to deal with them if you
intend to
Thanks for the quick reply.
I am creating the Kafka DStream by passing an offsets map. I have pasted a
code snippet in my earlier mail. Let me know if I am missing something.
I want to use the spark checkpoint for handling only driver/executor failures.
On Jan 22, 2016 10:08 PM, "Cody Koeninger" wrote:
Hi,
I am very new to spark & spark-streaming. I am planning to use spark
streaming for real-time processing.
I have created a streaming context and am checkpointing to an hdfs directory
for recovery purposes in case of executor & driver failures.
I am creating the DStream with an offset map
Offsets are stored in the checkpoint. If you want to manage offsets
yourself, don't restart from the checkpoint; specify the starting offsets
when you create the stream.
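A minimal sketch of specifying starting offsets yourself, assuming the Spark 1.x direct API, an existing StreamingContext `ssc`, and illustrative broker/topic/offset values:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Hypothetical broker address; replace with your own.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

// Explicit starting offset per topic-partition, e.g. read back from
// your own offset store (ZooKeeper, a database, etc.). Values are illustrative.
val fromOffsets = Map(
  TopicAndPartition("my-topic", 0) -> 100L,
  TopicAndPartition("my-topic", 1) -> 150L
)

// Extract a (key, value) pair from each Kafka message.
val messageHandler =
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

val stream = KafkaUtils.createDirectStream[
  String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets, messageHandler)
```

Because the stream is created with explicit fromOffsets each time, the job does not depend on the checkpoint to recover its position in Kafka.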
Have you read / watched the materials linked from
https://github.com/koeninger/kafka-exactly-once ?
Regarding the small files