Could any of the experts kindly advise? On Fri, May 19, 2017 at 6:00 PM, Jayadeep J <jayade...@gmail.com> wrote:
> Hi,
>
> I would appreciate some advice regarding an issue we are facing in the
> Streaming Kafka Direct Consumer.
>
> We recently upgraded our application with the Kafka Direct Stream to
> Spark 2 (spark-streaming-kafka-0-10 - 2.1.0) with Kafka version 0.10.0.0.
> We see abnormal delays after the application has run for a couple of
> hours and consumed roughly 10 million records. There is a sudden jump in
> processing time from ~15 seconds (usual for our app) to ~3 minutes, and
> from then on the processing time keeps degrading, though without any
> failure.
>
> We have seen that the delay is due to certain tasks taking exactly the
> duration of the configured 'request.timeout.ms' for the Kafka consumer.
> We verified this by varying the timeout property across different values.
> It looks like get(offset: Long, timeout: Long): ConsumerRecord[K, V] and
> the subsequent poll(timeout) call in CachedKafkaConsumer.scala are
> actually timing out on some of the partitions without reading the data,
> yet the executor logs those tasks as successfully completed after exactly
> the timeout duration. Note that most other tasks complete successfully in
> milliseconds. The DEBUG logs contain
> "org.apache.kafka.common.errors.DisconnectException" without any actual
> failure. The Kafka issue 'KafkaConsumer susceptible to FetchResponse
> starvation' [KAFKA-4753] seems to be the underlying cause.
>
> Could anyone kindly confirm whether this is normal behaviour for Spark?
> Shouldn't Spark throw a timeout error, or perhaps fail the tasks, in such
> cases? Currently the tasks appear successful and the job progresses, but
> very slowly. Thanks for your help.
>
> Thanks
> Jay
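For anyone taking a look: below is roughly how a stream like this is wired up with the standard spark-streaming-kafka-0-10 API, to show where 'request.timeout.ms' enters the picture. This is a minimal sketch, not our actual job — the broker address, group id, topic name, batch interval, and the 40s timeout value are placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

val conf = new SparkConf().setAppName("kafka-direct-example")
val ssc  = new StreamingContext(conf, Seconds(15)) // ~15s batches, placeholder

// request.timeout.ms is the consumer setting whose value matches the
// duration of the stalled tasks described above; we varied it for testing.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker1:9092",            // placeholder
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "example-group",           // placeholder
  "auto.offset.reset"  -> "latest",
  "request.timeout.ms" -> (40000: java.lang.Integer) // placeholder value
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("example-topic"), kafkaParams)
)

// Each batch's records are fetched on the executors via the cached
// consumers, which is where the per-partition poll(timeout) happens.
stream.foreachRDD { rdd => println(s"records in batch: ${rdd.count()}") }
```

The point of interest is that the timeout is a plain consumer property passed through kafkaParams; the executors' cached consumers inherit it when polling each partition.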