Try setting the following property on your SparkConf (the value is in MB): .set("spark.akka.frameSize", "50")
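
If you would rather not change the code, the same property can be passed at submit time with --conf flags. Below is a minimal sketch that only builds the spark-submit command line. The script name load_hbase_rdd.py and the helper build_submit_command are made up for illustration, yarn-client mode is assumed from the YarnClientClusterScheduler lines in your log, and the values are the ones mentioned in this thread, not tuned recommendations.

```python
import shlex

# Settings mentioned in this thread; values are illustrative, not tuned.
spark_settings = {
    "spark.akka.frameSize": "50",  # maximum Akka message size, in MB
    "spark.core.connection.ack.wait.timeout": "600",
    "spark.akka.timeout": "1000",
}


def build_submit_command(script, settings, master="yarn-client"):
    """Return a spark-submit argument list with one --conf flag per setting.

    `script` is a hypothetical path to the PySpark job; `master` defaults to
    yarn-client, which is an assumption based on the log output.
    """
    args = ["spark-submit", "--master", master]
    for key, value in sorted(settings.items()):
        args += ["--conf", "{}={}".format(key, value)]
    args.append(script)
    return args


# Hypothetical script name, used only to show the shape of the command.
command = build_submit_command("load_hbase_rdd.py", spark_settings)
print(" ".join(shlex.quote(arg) for arg in command))
```

Settings passed this way behave the same as calling .set(...) on the SparkConf, as long as the code does not override them afterwards.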

Also make sure that Spark is able to read from HBase (you can try it with a small amount of data).

Thanks
Best Regards

On Fri, Jan 16, 2015 at 11:30 PM, Antony Mayi <antonym...@yahoo.com.invalid> wrote:

> Hi,
>
> I believe this is some kind of timeout problem but can't figure out how to
> increase it.
>
> I am running spark 1.2.0 on yarn (all from cdh 5.3.0). I submit a python
> task which first loads big RDD from hbase - I can see in the screen output
> all executors fire up then no more logging output for next two minutes
> after which I get plenty of
>
> 15/01/16 17:35:16 ERROR cluster.YarnClientClusterScheduler: Lost executor 7 on node01: remote Akka client disassociated
> 15/01/16 17:35:16 INFO scheduler.TaskSetManager: Re-queueing tasks for 7 from TaskSet 1.0
> 15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 32.0 in stage 1.0 (TID 17, node01): ExecutorLostFailure (executor 7 lost)
> 15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 1.0 (TID 25, node01): ExecutorLostFailure (executor 7 lost)
>
> this points to some timeout ~120secs while the nodes are loading the big
> RDD? any ideas how to get around it?
>
> fyi I already use following options without any success:
>
> spark.core.connection.ack.wait.timeout: 600
> spark.akka.timeout: 1000
>
> thanks,
> Antony.