This is running in YARN cluster mode. It was restarted automatically and continued fine. I was trying to see what went wrong. AFAIK there were no task failure. Nothing in executor logs. The log I gave is in driver.
After some digging, I did see that there was a rebalance in kafka logs around this time. So will driver fail and exit in such cases? I've seen drivers exit after a job has hit max retry attempts. This is different though rt? Srikanth On Tue, Feb 7, 2017 at 5:25 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote: > Does restarting after a few minutes solves the problem? Could be a > transient issue that lasts long enough for spark task-level retries to all > fail. > > On Tue, Feb 7, 2017 at 4:34 PM, Srikanth <srikanth...@gmail.com> wrote: > >> Hello, >> >> I had a spark streaming app that reads from kafka running for a few hours >> after which it failed with error >> >> *17/02/07 20:04:10 ERROR JobScheduler: Error generating jobs for time >> 1486497850000 ms >> java.lang.IllegalStateException: No current assignment for partition >> mt_event-5 >> at >> org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:251) >> at >> org.apache.kafka.clients.consumer.internals.SubscriptionState.needOffsetReset(SubscriptionState.java:315) >> at >> org.apache.kafka.clients.consumer.KafkaConsumer.seekToEnd(KafkaConsumer.java:1170) >> at >> org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:179) >> at >> org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:196) >> at >> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) >> at >> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)* >> >> .... >> .... >> >> 17/02/07 20:04:10 ERROR ApplicationMaster: User class threw exception: >> java.lang.IllegalStateException: No current assignment for partition >> mt_event-5 >> java.lang.IllegalStateException: No current assignment for partition >> mt_event-5 >> >> .... >> .... >> >> 17/02/07 20:05:00 WARN JobGenerator: Timed out while stopping the job >> generator (timeout = 50000) >> >> >> Driver did not recover from this error and failed. The previous batch ran >> 5sec back. There are no indications in the logs that some rebalance happened. >> As per kafka admin, kafka cluster health was good when this happened and no >> maintenance was being done. >> >> Any idea what could have gone wrong and why this is a fatal error? >> >> Regards, >> Srikanth >> >> >