Re: [Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint

2016-01-25 Thread Yash Sharma
For specific offsets you can directly pass the offset ranges and use KafkaUtils.createRDD to get the events that were missed in the DStream. - Thanks, via mobile, excuse brevity. On Jan 25, 2016 3:33 PM, "Raju Bairishetti" wrote: > Hi Yash, > Basically, my question is
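
As a rough illustration of the "pass the offset ranges" idea above, the sketch below computes which ranges were skipped between the last offsets an application recorded and the offsets the stream actually resumed from. It is plain Python with no Spark dependency: the `OffsetRange` tuple is a hypothetical stand-in for Spark's class of the same name, `missed_offset_ranges` is an illustrative helper, and the topic name and offsets are made up. It assumes each recorded offset is the next offset to read, so ranges are inclusive at the start and exclusive at the end.

```python
from collections import namedtuple

# Hypothetical stand-in for Spark's OffsetRange (topic, partition, start, end).
OffsetRange = namedtuple(
    "OffsetRange", ["topic", "partition", "from_offset", "until_offset"]
)


def missed_offset_ranges(last_processed, stream_start):
    """Return the ranges that lie between recorded progress and the stream start.

    Both arguments map (topic, partition) -> offset. A recorded offset is
    assumed to be the next offset to read, so from_offset is inclusive and
    until_offset is exclusive.
    """
    ranges = []
    for (topic, partition), start in sorted(stream_start.items()):
        # Partitions with no recorded progress are treated as having no gap.
        done = last_processed.get((topic, partition), start)
        if done < start:
            ranges.append(OffsetRange(topic, partition, done, start))
    return ranges


# Made-up example data: partition 0 has a gap, partition 1 does not.
last_processed = {("events", 0): 100, ("events", 1): 250}
stream_start = {("events", 0): 140, ("events", 1): 250}

gaps = missed_offset_ranges(last_processed, stream_start)
for gap in gaps:
    print(gap)
```

In a real job, each resulting range would be converted to Spark's own OffsetRange and handed to KafkaUtils.createRDD to read the missed events as a batch.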

Re: [Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint

2016-01-24 Thread Raju Bairishetti
Hi Yash, Basically, my question is how to avoid storing the Kafka offsets in the Spark checkpoint directory. The streaming context is being built from the checkpoint directory and proceeds with the offsets in the checkpointed RDD. I want to consume data from Kafka from specific offsets along with the

Re: [Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint

2016-01-23 Thread Yash Sharma
Hi Raju, Could you please explain your expected behavior with the DStream? The DStream will have events only from the 'fromOffsets' that you provided in the createDirectStream (which I think is the expected behavior). For the smaller files, you will have to deal with smaller files if you intend to

Re: [Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint

2016-01-23 Thread Raju Bairishetti
Thanks for the quick reply. I am creating the Kafka DStream by passing an offsets map. I have pasted the code snippet in my earlier mail. Let me know if I am missing something. I want to use the Spark checkpoint for handling only driver/executor failures. On Jan 22, 2016 10:08 PM, "Cody Koeninger"

[Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint

2016-01-22 Thread Raju Bairishetti
Hi, I am very new to Spark & Spark Streaming. I am planning to use Spark Streaming for real-time processing. I have created a streaming context and am checkpointing to an HDFS directory for recovery purposes in case of executor failures & driver failures. I am creating a DStream with an offset map

Re: [Streaming-Kafka] How to start from topic offset when streamcontext is using checkpoint

2016-01-22 Thread Cody Koeninger
Offsets are stored in the checkpoint. If you want to manage offsets yourself, don't restart from the checkpoint; specify the starting offsets when you create the stream. Have you read / watched the materials linked from https://github.com/koeninger/kafka-exactly-once ? Regarding the small files
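
A minimal sketch of the "manage offsets yourself" option described above: on every start, the application builds its own starting-offsets map from an external store instead of recovering offsets from the checkpoint. This is plain Python with no Spark dependency; `build_from_offsets`, the `default_offset` fallback, and the sample data are assumptions made for illustration and are not part of any Spark API.

```python
def build_from_offsets(stored_offsets, partitions, default_offset=0):
    """Build a (topic, partition) -> starting offset map for a new stream.

    stored_offsets holds offsets the application saved itself (for example in
    a database). Partitions missing from the store fall back to
    default_offset, which is an assumption of this sketch.
    """
    return {tp: stored_offsets.get(tp, default_offset) for tp in partitions}


# Made-up example: partition 1 has no saved offset yet.
stored = {("events", 0): 140}
partitions = [("events", 0), ("events", 1)]

from_offsets = build_from_offsets(stored, partitions)
for tp in sorted(from_offsets):
    print(tp, from_offsets[tp])
```

The resulting map plays the role of the 'fromOffsets' argument to createDirectStream mentioned earlier in the thread. The trade-off is that the application, not the checkpoint, becomes responsible for saving offsets after each batch.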