Hi Cody,

What other options do I have besides monitoring and restarting the job? Can the job recover automatically?

Thanks,
Sweth

On Thu, Oct 1, 2015 at 7:18 AM, Cody Koeninger <c...@koeninger.org> wrote:

> Did you check your Kafka broker logs to see what was going on during that
> time?
>
> The direct stream will handle normal leader loss / rebalance by retrying
> tasks.
>
> But the exception you got indicates that something was wrong with Kafka,
> such that offsets were being re-used.
>
> i.e. your job already processed up through beginning offset 15027734702
>
> but when asking Kafka for the highest available offsets, it returns ending
> offset 15027725493
>
> which is lower; in other words, Kafka lost messages. This might happen
> because you lost a leader and recovered from a replica that wasn't in sync,
> or someone manually screwed up a topic, or ... ?
>
> If you really want to just blindly "recover" from this situation (even
> though something is probably wrong with your data), the most
> straightforward thing to do is monitor and restart your job.
>
> On Wed, Sep 30, 2015 at 4:31 PM, swetha <swethakasire...@gmail.com> wrote:
>
>> Hi,
>>
>> I see this sometimes in our Kafka Direct approach in our Streaming job.
>> How do we make sure that the job recovers from such errors and works
>> normally thereafter?
>>
>> 15/09/30 05:14:18 ERROR KafkaRDD: Lost leader for topic x_stream partition 19, sleeping for 200ms
>> 15/09/30 05:14:18 ERROR KafkaRDD: Lost leader for topic x_stream partition 5, sleeping for 200ms
>>
>> Followed by every task failing with something like this:
>>
>> 15/09/30 05:26:20 ERROR Executor: Exception in task 4.0 in stage 84281.0 (TID 818804)
>> kafka.common.NotLeaderForPartitionException
>>
>> And:
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 15
>> in stage 84958.0 failed 4 times, most recent failure: Lost task 15.3 in
>> stage 84958.0 (TID 819461, 10.227.68.102): java.lang.AssertionError:
>> assertion failed: Beginning offset 15027734702 is after the ending offset
>> 15027725493 for topic hubble_stream partition 12. You either provided an
>> invalid fromOffset, or the Kafka topic has been damaged
>>
>> Thanks,
>> Swetha
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Lost-leader-exception-in-Kafka-Direct-for-Streaming-tp24891.html
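
For illustration only, here is a minimal Python sketch of the two ideas discussed in this thread: the inverted offset range that the assertion reports, and a "monitor and restart" wrapper. It does not use Spark or Kafka. The names `check_offset_range` and `run_with_restarts`, and the parameters `max_restarts` and `backoff_seconds`, are hypothetical and not part of any Spark or Kafka API. Which offsets a restarted job resumes from is outside the scope of this sketch.

```python
import time
from typing import Callable


def check_offset_range(from_offset: int, until_offset: int) -> int:
    """Return the number of messages in [from_offset, until_offset).

    Raises ValueError when the range is inverted, i.e. the job has already
    processed past the highest offset the broker now reports.
    """
    if from_offset > until_offset:
        raise ValueError(
            f"Beginning offset {from_offset} is after the ending offset "
            f"{until_offset} ({from_offset - until_offset} offsets missing)"
        )
    return until_offset - from_offset


def run_with_restarts(
    job: Callable[[], None],
    max_restarts: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run job() and restart it after a failure; return the restarts used.

    Hypothetical supervisor: once max_restarts is exhausted, the last error
    is re-raised so that an operator still gets alerted.
    """
    restarts = 0
    while True:
        try:
            job()
            return restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1
            sleep(backoff_seconds * restarts)  # linear backoff between restarts
```

With the offsets from the stack trace, `check_offset_range(15027734702, 15027725493)` raises, because the beginning offset is 9209 ahead of the ending offset. A wrapper like `run_with_restarts` only automates the restart; it does not repair a damaged topic, so the broker logs still need to be checked.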