Re: remote Akka client disassociated - some timeout?

2015-01-17 Thread Akhil Das
Try setting the following property:

.set("spark.akka.frameSize", "50")
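
In a PySpark driver that would look roughly like this (a minimal sketch; the
app name is made up, and the value is in MB, up from the 1.2 default of 10):

from pyspark import SparkConf, SparkContext

# Bump the max Akka message size from the default 10 MB to 50 MB.
conf = (SparkConf()
        .setAppName("hbase-load")  # hypothetical app name
        .set("spark.akka.frameSize", "50"))
sc = SparkContext(conf=conf)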

Also make sure that Spark is able to read from HBase (you can try it with a
small amount of data).
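
For example, a quick smoke test along these lines (a sketch based on the
hbase_inputformat.py example shipped with the Spark 1.2 sources, so the
converter classes need the spark-examples jar on the classpath; the quorum
and table names are placeholders, and sc is the SparkContext from above):

conf = {"hbase.zookeeper.quorum": "zk-host",         # placeholder quorum
        "hbase.mapreduce.inputtable": "test_table"}  # placeholder table
rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
    conf=conf)
print(rdd.take(5))  # pull just a few rows to confirm the read path works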

Thanks
Best Regards

On Fri, Jan 16, 2015 at 11:30 PM, Antony Mayi antonym...@yahoo.com.invalid
wrote:

 Hi,

 I believe this is some kind of timeout problem but can't figure out how to
 increase it.

 I am running Spark 1.2.0 on YARN (all from CDH 5.3.0). I submit a Python
 task which first loads a big RDD from HBase. I can see in the screen output
 that all executors fire up, then there is no more logging output for the
 next two minutes, after which I get plenty of:

 15/01/16 17:35:16 ERROR cluster.YarnClientClusterScheduler: Lost executor
 7 on node01: remote Akka client disassociated
 15/01/16 17:35:16 INFO scheduler.TaskSetManager: Re-queueing tasks for 7
 from TaskSet 1.0
 15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 32.0 in stage
 1.0 (TID 17, node01): ExecutorLostFailure (executor 7 lost)
 15/01/16 17:35:16 WARN scheduler.TaskSetManager: Lost task 34.0 in stage
 1.0 (TID 25, node01): ExecutorLostFailure (executor 7 lost)

 This points to some timeout of ~120 seconds while the nodes are loading the
 big RDD? Any ideas how to get around it?

 FYI, I already use the following options, without any success:

 spark.core.connection.ack.wait.timeout: 600
 spark.akka.timeout: 1000
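
 For reference, options like these can be passed at submit time, roughly
 like this (a sketch; the script name is a placeholder):

 spark-submit --master yarn-client \
   --conf spark.core.connection.ack.wait.timeout=600 \
   --conf spark.akka.timeout=1000 \
   my_task.py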


 thanks,
 Antony.





Re: remote Akka client disassociated - some timeout?

2015-01-17 Thread Ted Yu
Antony:
Please check the HBase master log to see if there was anything noticeable in
that period of time.
If the HBase cluster is not big, check the region server logs as well.
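
Something along these lines, for example (the log paths are a guess for a
CDH 5.3 layout; adjust hostnames and locations for your cluster):

grep '2015-01-16 17:3' /var/log/hbase/hbase-hbase-master-*.log
grep -E 'timeout|expired|disconnect' /var/log/hbase/hbase-hbase-regionserver-*.log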

Cheers


