Re: KafkaInputDStream mapping of partitions to tasks

Evgeny Shishkin Thu, 27 Mar 2014 16:28:32 -0700

On 28 Mar 2014, at 02:10, Scott Clasen <scott.cla...@gmail.com> wrote:


> Thanks everyone for the discussion.
> 
> Just to note, I restarted the job yet again, and this time there are indeed
> tasks being executed by both worker nodes. So the behavior does seem
> inconsistent/broken atm.
> 
> Then I added a third node to the cluster, and a third executor came up, and
> everything broke :|
> 
> 

This comes from Kafka's high-level consumer. Try raising the rebalance retries (rebalance.max.retries in the consumer properties).

Also, since this consumer is threaded, it has some protection against this
failure: it first waits for a while, and then rebalances.
But for a Spark cluster I think this wait is not long enough.
If there were a way to wait for every Spark executor to start, rebalance, and
only then start consuming, this issue would be less visible.
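A rough way to reason about "this time is not enough": if each failed rebalance attempt is followed by one backoff sleep, a consumer keeps retrying for about rebalance.max.retries * rebalance.backoff.ms before giving up. The sketch below is a toy model under that assumption, not Kafka's actual rebalance code. The helper names and the executor start times are hypothetical; it just checks whether that retry window outlasts the spread between the first and last executor joining the group.

```python
from typing import Sequence


def rebalance_window_ms(max_retries: int, backoff_ms: int) -> int:
    """Approximate time a consumer keeps retrying a failed rebalance.

    Toy model: each failed attempt is followed by one backoff sleep, so the
    total is about max_retries * backoff_ms (the attempts themselves are ignored).
    """
    return max_retries * backoff_ms


def window_covers_startup(max_retries: int, backoff_ms: int,
                          executor_start_ms: Sequence[int]) -> bool:
    """True if the retry window outlasts the spread of executor start times.

    executor_start_ms is a hypothetical list of times (ms) at which each
    Spark executor's Kafka receiver joined the consumer group.
    """
    spread = max(executor_start_ms) - min(executor_start_ms)
    return rebalance_window_ms(max_retries, backoff_ms) > spread


if __name__ == "__main__":
    starts = [0, 12_000, 30_000]  # illustrative: third node joins 30 s after the first
    print(window_covers_startup(4, 2_000, starts))   # 8000 ms window, too short
    print(window_covers_startup(10, 5_000, starts))  # 50000 ms window, long enough
```

Running it prints False and then True. The Kafka FAQ gives a related rule of thumb for the same failure: keep rebalance.max.retries * rebalance.backoff.ms larger than zookeeper.session.timeout.ms.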



> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/KafkaInputDStream-mapping-of-partitions-to-tasks-tp3360p3391.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
