Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Charles vinodh
Thanks Dhaval, that fixed the issue. The constant resetting of Kafka offsets had misled me about the root cause. Please feel free to answer the SO question here if you would like to..

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Dhaval Patel
Hi Charles, Can you check whether any of the cases related to the output directory and checkpoint location mentioned in the link below apply to your situation? https://kb.databricks.com/streaming/file-sink-streaming.html Regards, Dhaval On Wed, Sep 11, 2019 at 9:29 PM Burak Yavuz wrote: > Hey Charles, >
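One of the common pitfalls covered by that link is sharing or mixing the sink path and the checkpoint directory between queries. A minimal sketch of the recommended setup, assuming an existing PySpark streaming DataFrame `df` (the paths and sink format here are placeholders):

```python
# Sketch: each streaming query needs its own checkpointLocation,
# kept separate from the sink's output path. Paths are hypothetical.
query = (df.writeStream
         .format("parquet")
         .option("path", "/data/output/events")                     # sink directory
         .option("checkpointLocation", "/data/checkpoints/events")  # separate, per-query
         .start())
```

Reusing a checkpoint directory across different queries, or pointing it inside the output path, can make a restarted query appear to make progress while producing no readable data.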

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Burak Yavuz
Hey Charles, If you are using maxOffsetsPerTrigger, you will likely reset the offsets every microbatch, because: 1. Spark will figure out a range of offsets to process (let's call them x and y). 2. If these offsets have fallen out of the retention period, Spark will try to set the offset to x
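The loop Burak describes can be sketched in plain Python. This is a hypothetical simulation of how the source plans each microbatch range, not Spark's actual implementation: the planned start offset `x` is clamped forward to Kafka's earliest retained offset whenever retention has already expired it, which is why a permanently lagging consumer sees a reset on every batch.

```python
def plan_batch(committed, max_offsets_per_trigger, earliest, latest):
    """Return (start, end, reset) for the next microbatch.

    If the committed offset has fallen behind Kafka's earliest retained
    offset, the start is reset to `earliest` and the records in between
    are lost. `end` is capped by maxOffsetsPerTrigger.
    """
    start = committed
    reset = False
    if start < earliest:        # offsets expired by the retention policy
        start = earliest
        reset = True
    end = min(start + max_offsets_per_trigger, latest)
    return start, end, reset

# Consumer lagging behind retention: committed=100, but Kafka only
# retains offsets from 10_000 onward, so the batch is forced forward.
print(plan_batch(100, 500, earliest=10_000, latest=50_000))
# A healthy consumer inside the retained range is not reset.
print(plan_batch(20_000, 500, earliest=10_000, latest=50_000))
```

If the topic's ingest rate outpaces what `maxOffsetsPerTrigger` lets each batch consume, the committed offset keeps falling behind `earliest`, and the reset repeats indefinitely.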

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Charles vinodh
Hi Sandish, as I said, if the offset reset happened only once that would make sense. But I am not sure how to explain why the offset reset is happening for every micro-batch... Ideally, once the offset reset happens the app should move to a valid offset and start consuming data, but in my case

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Sandish Kumar HN
You can see this kind of error if the consumer lag exceeds the Kafka retention period. You will not see any failures if the option below is not set. Set failOnDataLoss=true to see the failures. On Wed, Sep 11, 2019 at 3:24 PM Charles vinodh wrote: > The only form of rate limiting I have set
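A minimal sketch of setting that option on the Kafka source, assuming an existing SparkSession `spark`; the bootstrap server and topic names are placeholders:

```python
# Sketch (PySpark): with failOnDataLoss=true, offsets that have expired
# from Kafka retention fail the query instead of being silently skipped.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
      .option("subscribe", "events")                     # placeholder topic
      .option("failOnDataLoss", "true")                  # surface lost data as errors
      .load())
```

The default is true in recent Spark versions, so a silently resetting query usually means the option was explicitly set to false somewhere in the job's configuration.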

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Charles vinodh
The only form of rate limiting I have set is *maxOffsetsPerTrigger* and *fetch.message.max.bytes*. "may be that you are trying to process records that have passed the retention period within Kafka." If the above were true, then my offsets should have been reset only once, ideally when my
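For reference, those two settings are passed differently: maxOffsetsPerTrigger is a Spark source option, while fetch.message.max.bytes is a Kafka consumer property and must carry the `kafka.` prefix to reach the consumer. A sketch assuming an existing SparkSession `spark`, with placeholder names and values:

```python
# Sketch (PySpark): the two rate-limiting knobs mentioned above.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")      # placeholder
      .option("subscribe", "events")                         # placeholder topic
      .option("maxOffsetsPerTrigger", "10000")               # cap records per microbatch
      .option("kafka.fetch.message.max.bytes", "10485760")   # consumer fetch size
      .load())
```

If maxOffsetsPerTrigger is set low relative to the topic's produce rate, the stream can fall further behind every batch until its committed offsets age out of retention, which matches the repeated-reset symptom in this thread.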

Re: Spark Kafka Streaming making progress but there is no data to be consumed

2019-09-11 Thread Burak Yavuz
Do you have rate limiting set on your stream? It may be that you are trying to process records that have passed the retention period within Kafka. On Wed, Sep 11, 2019 at 2:39 PM Charles vinodh wrote: > > Hi, > > I am trying to run a spark application ingesting data from Kafka using the > Spark