Re: remote Akka client disassociated - some timeout?
Try setting the following property:

    .set("spark.akka.frameSize", "50")

Also make sure that Spark is able to read from HBase (you can try it with a small amount of data).

Thanks
Best Regards

On Fri, Jan 16, 2015 at 11:30 PM, Antony Mayi antonym...@yahoo.com.invalid wrote:

> Hi,
>
> I believe this is some kind of timeout problem but I can't figure out how to increase it. I am running Spark 1.2.0 on YARN (all from CDH 5.3.0). I submit a Python task which first loads a big RDD from HBase - I can see in the screen output that all executors fire up, then there is no more logging output for the next two minutes, after which I get plenty of:
>
> 15/01/16 17:35:16 ERROR cluster.YarnClientClusterScheduler: Lost executor 7 on node01: remote Akka client disassociated
> 15/01/16 17:35:16 INFO scheduler.TaskSetManager: Re-queueing tasks for 7 from TaskSet 1.0
> 15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 32.0 in stage 1.0 (TID 17, node01): ExecutorLostFailure (executor 7 lost)
> 15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 1.0 (TID 25, node01): ExecutorLostFailure (executor 7 lost)
>
> This points to some timeout of ~120 secs while the nodes are loading the big RDD? Any ideas how to get around it? FYI, I already use the following options without any success:
>
> spark.core.connection.ack.wait.timeout: 600
> spark.akka.timeout: 1000
>
> thanks,
> Antony.
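Since the job is submitted from Python, the frame-size and timeout properties discussed in this thread could be applied at context creation. This is only a sketch: the app name is a placeholder, and the values (50 MB frame size, the timeouts Antony already tried) are illustrative, not confirmed fixes.

```python
# Hedged sketch: applying the properties discussed in this thread via
# SparkConf (PySpark, Spark 1.2-era API). Values are illustrative.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("hbase-load")                             # placeholder name
        .set("spark.akka.frameSize", "50")                    # max Akka message size, MB
        .set("spark.core.connection.ack.wait.timeout", "600") # seconds
        .set("spark.akka.timeout", "1000"))                   # seconds
sc = SparkContext(conf=conf)
```

The same properties can also be passed on the command line with `spark-submit --conf key=value`, which avoids hard-coding them in the script.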
Re: remote Akka client disassociated - some timeout?
Antony:

Please check the HBase master log to see if there was anything noticeable in that period of time. If the HBase cluster is not big, check the region server logs as well.

Cheers

On Jan 16, 2015, at 10:00 AM, Antony Mayi antonym...@yahoo.com.INVALID wrote:

> Hi,
>
> I believe this is some kind of timeout problem but I can't figure out how to increase it. I am running Spark 1.2.0 on YARN (all from CDH 5.3.0). I submit a Python task which first loads a big RDD from HBase - I can see in the screen output that all executors fire up, then there is no more logging output for the next two minutes, after which I get plenty of:
>
> 15/01/16 17:35:16 ERROR cluster.YarnClientClusterScheduler: Lost executor 7 on node01: remote Akka client disassociated
> 15/01/16 17:35:16 INFO scheduler.TaskSetManager: Re-queueing tasks for 7 from TaskSet 1.0
> 15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 32.0 in stage 1.0 (TID 17, node01): ExecutorLostFailure (executor 7 lost)
> 15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 1.0 (TID 25, node01): ExecutorLostFailure (executor 7 lost)
>
> This points to some timeout of ~120 secs while the nodes are loading the big RDD? Any ideas how to get around it? FYI, I already use the following options without any success:
>
> spark.core.connection.ack.wait.timeout: 600
> spark.akka.timeout: 1000
>
> thanks,
> Antony.
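One way to verify that Spark can read from HBase with a small amount of data, as suggested earlier in the thread, is the `newAPIHadoopRDD` pattern from the Python examples shipped with Spark 1.2 (`hbase_inputformat.py`). This is a sketch: the ZooKeeper quorum host and table name are placeholders, and the converter classes come from the Spark examples jar, which must be on the classpath.

```python
# Hedged sketch: reading a few HBase rows from PySpark via TableInputFormat,
# following the hbase_inputformat.py example from Spark 1.2.
# "zk-host" and "test_table" are placeholders for the real cluster values.
from pyspark import SparkContext

sc = SparkContext(appName="hbase-sanity-check")
hbase_conf = {"hbase.zookeeper.quorum": "zk-host",
              "hbase.mapreduce.inputtable": "test_table"}
rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters."
                 "ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters."
                   "HBaseResultToStringConverter",
    conf=hbase_conf)
print(rdd.take(3))  # a handful of rows is enough to confirm connectivity
```

If this small read succeeds but the full load still loses executors, that points away from HBase connectivity and back toward executor memory or the timeouts discussed above.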