Re: OffsetOutOfRangeError with Kafka-Spark streaming

2015-08-10 Thread Cassa L
OK. The problem was resolved when I increased the retention policy for the topic. But now I see that whenever I restart the Spark job, some old messages are pulled in again by the Spark stream. With the new Spark streaming API, do we need to keep track of offsets ourselves? LCassa On Thu, Aug 6, 2015 at 4:58 PM, Grant Henke
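
On the question of tracking offsets: one common approach is to record, per topic and partition, the next offset to read once a batch has been processed, and to resume from that record after a restart. The following is only a minimal sketch of that bookkeeping in plain Python. The JSON file store, the function names `save_offsets` and `load_offsets`, and the topic name `events` are all hypothetical; a real job would take the offsets from the stream it consumes and hand the saved values back when the stream is created.

```python
import json
import os
import tempfile

def save_offsets(path, offsets):
    """Persist {(topic, partition): next_offset} to a JSON file (hypothetical store)."""
    serializable = {f"{topic}:{partition}": offset
                    for (topic, partition), offset in offsets.items()}
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(serializable, f)
    # Replace the old file in one step so a crash cannot leave a half-written record.
    os.replace(tmp_path, path)

def load_offsets(path):
    """Return the saved offsets, or an empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        raw = json.load(f)
    offsets = {}
    for key, offset in raw.items():
        # Split on the last colon so topic names containing ':' still parse.
        topic, partition = key.rsplit(":", 1)
        offsets[(topic, int(partition))] = offset
    return offsets

# Example: remember that partition 0 of the assumed topic should resume at 2824.
store = os.path.join(tempfile.mkdtemp(), "offsets.json")
save_offsets(store, {("events", 0): 2824})
print(load_offsets(store))
```

Saving the offsets only after a batch has been fully processed means a restart may replay the last batch but should not skip messages.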

Re: OffsetOutOfRangeError with Kafka-Spark streaming

2015-08-07 Thread Cassa L
That would be great if you can also try it! As for the retention policy, I had come across an issue with version 0.8.1 where retention.ms is in milliseconds but the actual server property is log.retention.minutes, so servers would take it as minutes. Is that true? Anyway, I have updated retention to 2

Re: OffsetOutOfRangeError with Kafka-Spark streaming

2015-08-06 Thread Parth Brahmbhatt
In Apache Storm some users reported the same issue a few months ago [1][2][3]. This was an unusual situation which in our experience only happened when the Storm topology was asking for offsets that had already been trimmed by Kafka. Multiple pathological cases (too low retention period, too slow topology,

Re: OffsetOutOfRangeError with Kafka-Spark streaming

2015-08-06 Thread Parth Brahmbhatt
retention.ms is actually in milliseconds; you want a value much larger than 1440, which translates to about 1.4 seconds. On 8/6/15, 4:35 PM, Cassa L lcas...@gmail.com wrote: Hi Grant, Yes, I saw exception in Spark and Kafka. In Kafka server logs I get this exception:
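
The unit mix-up described above can be checked with a little arithmetic. This is a minimal sketch in Python, assuming the intended retention was one day (1440 minutes); the helper name `minutes_to_ms` is hypothetical and not part of Kafka.

```python
# Hypothetical helper, not a Kafka API: convert minutes to the
# milliseconds that the topic-level retention.ms setting expects.
def minutes_to_ms(minutes: int) -> int:
    return minutes * 60 * 1000

# 1440 interpreted as milliseconds is under two seconds of retention.
seconds_if_ms = 1440 / 1000

# 1440 minutes (one day, the assumed intent) expressed in milliseconds.
one_day_ms = minutes_to_ms(1440)

print(seconds_if_ms)  # 1.44
print(one_day_ms)     # 86400000
```

With retention that short, log segments can be deleted before a slow consumer reaches them, which matches the out-of-range errors reported in this thread.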

Re: OffsetOutOfRangeError with Kafka-Spark streaming

2015-08-06 Thread Cassa L
Hi Grant, Yes, I saw exception in Spark and Kafka. In Kafka server logs I get this exception: kafka.common.OffsetOutOfRangeException: Request for offset 2823 but we only have log segments in the range 2824 to 2824. at kafka.log.Log.read(Log.scala:380) at
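
The message above says the consumer asked for offset 2823 after the broker had already deleted it, leaving only offset 2824. Below is a rough Python sketch of that range check and of one possible recovery policy (resuming from the earliest retained offset). It is a simplified illustration, not Kafka's or Spark's actual code; `check_offset` and `resolve_offset` are hypothetical names.

```python
class OffsetOutOfRange(Exception):
    """Raised when a requested offset is outside the retained log range."""

def check_offset(requested, earliest, latest):
    # Simplified model: the log retains offsets earliest..latest inclusive.
    if requested < earliest or requested > latest:
        raise OffsetOutOfRange(
            f"Request for offset {requested} but we only have "
            f"log segments in the range {earliest} to {latest}."
        )
    return requested

def resolve_offset(requested, earliest, latest):
    # Assumed policy: on an out-of-range request, restart from the oldest
    # message still retained. Messages trimmed before that point are lost.
    try:
        return check_offset(requested, earliest, latest)
    except OffsetOutOfRange:
        return earliest

# The numbers from the broker log in this thread.
print(resolve_offset(2823, 2824, 2824))  # 2824
```

Falling back to the earliest offset keeps the job running but silently skips whatever was trimmed, so it is a trade-off rather than a fix for retention that is too short.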

Re: OffsetOutOfRangeError with Kafka-Spark streaming

2015-08-06 Thread Grant Henke
This looks very similar to the case Parth mentioned that Storm users have seen, where processing falls behind the retention period. Perhaps Spark and Kafka can handle this scenario more gracefully. I would be happy to do some investigation/testing and report back with findings and

OffsetOutOfRangeError with Kafka-Spark streaming

2015-08-06 Thread Cassa L
Hi, Has anyone tried the streaming API of Spark with Kafka? I am experimenting with the new Spark API to read from Kafka: KafkaUtils.createDirectStream(...) Every now and then, I get the following error in Spark, kafka.common.OffsetOutOfRangeException, and my Spark script stops working. I have a simple topic with just

Re: OffsetOutOfRangeError with Kafka-Spark streaming

2015-08-06 Thread Grant Henke
Does this Spark Jira match up with what you are seeing or sound related? https://issues.apache.org/jira/browse/SPARK-8474 What versions of Spark and Kafka are you using? Can you include more of the spark log? Any errors shown in the Kafka log? Thanks, Grant On Thu, Aug 6, 2015 at 1:17 PM, Cassa