Hi Cody,

What options do I have besides monitoring and restarting the job?
Can the job recover automatically?

Thanks,
Sweth

On Thu, Oct 1, 2015 at 7:18 AM, Cody Koeninger <c...@koeninger.org> wrote:

> Did you check your Kafka broker logs to see what was going on during that
> time?
>
> The direct stream will handle normal leader loss / rebalance by retrying
> tasks.
>
> But the exception you got indicates that something was wrong with Kafka,
> such that offsets were being re-used.
>
> i.e., your job had already processed up through beginning offset 15027734702,
>
> but when asked for the highest available offset, Kafka returned ending
> offset 15027725493,
>
> which is lower. In other words, Kafka lost messages.  This might happen
> because you lost a leader and recovered from a replica that wasn't in sync,
> or someone manually screwed up a topic, or ... ?
>
> If you really want to just blindly "recover" from this situation (even
> though something is probably wrong with your data), the most
> straightforward thing to do is monitor and restart your job.
>
>
>
>
> On Wed, Sep 30, 2015 at 4:31 PM, swetha <swethakasire...@gmail.com> wrote:
>
>>
>> Hi,
>>
>> I see this sometimes in our Kafka Direct approach in our Streaming job. How
>> do we make sure that the job recovers from such errors and works normally
>> thereafter?
>>
>> 15/09/30 05:14:18 ERROR KafkaRDD: Lost leader for topic x_stream partition 19,  sleeping for 200ms
>> 15/09/30 05:14:18 ERROR KafkaRDD: Lost leader for topic x_stream partition 5,  sleeping for 200ms
>>
>> Followed by every task failing with something like this:
>>
>> 15/09/30 05:26:20 ERROR Executor: Exception in task 4.0 in stage 84281.0 (TID 818804)
>> kafka.common.NotLeaderForPartitionException
>>
>> And:
>>
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 15
>> in stage 84958.0 failed 4 times, most recent failure: Lost task 15.3 in
>> stage 84958.0 (TID 819461, 10.227.68.102): java.lang.AssertionError:
>> assertion failed: Beginning offset 15027734702 is after the ending offset
>> 15027725493 for topic hubble_stream partition 12. You either provided an
>> invalid fromOffset, or the Kafka topic has been damaged
>>
>>
>> Thanks,
>> Swetha
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Lost-leader-exception-in-Kafka-Direct-for-Streaming-tp24891.html
>>
>
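
The two ideas discussed in this thread can be sketched in plain Python. This is
only an illustration under stated assumptions, not Spark or Kafka code: the
names offsets_consistent and run_with_restart are hypothetical, and "job" stands
in for whatever launches the streaming application. The first function mirrors
the condition behind the assertion in the stack trace (beginning offset must not
be after the ending offset); the second is a bare-bones "monitor and restart"
loop with a restart limit.

```python
import time
from typing import Callable


def offsets_consistent(from_offset: int, until_offset: int) -> bool:
    """Return True when the beginning offset is not after the ending offset."""
    return from_offset <= until_offset


def run_with_restart(job: Callable[[], None],
                     max_restarts: int,
                     backoff_s: float = 0.0) -> int:
    """Run `job`, restarting it after a failure at most `max_restarts` times.

    Returns the number of restarts that were needed. If the job still fails
    once the limit is reached, the last exception is re-raised.
    """
    restarts = 0
    while True:
        try:
            job()  # hypothetical stand-in for launching the streaming job
            return restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1
            time.sleep(backoff_s)  # wait before the next attempt
```

With the offsets from the error message, offsets_consistent(15027734702,
15027725493) is False, which is the situation the assertion reports. Note that a
restart loop like this only retries; it does not repair whatever happened to
the topic.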
