Hi Koert,

It seems there is currently no API you can use to detect a lost block manager. 
This is mainly caused by a full GC or something else that blocks communication 
between the client driver and the executors' block managers. When the executors 
recover from the block, they re-register themselves with the client driver, so 
users do not need to take special steps to recover. You can also set 
"spark.storage.blockManagerTimeoutIntervalMs" to a larger value to avoid this 
warning; the default is "60000".
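
As a rough sketch of the setting mentioned above, assuming a Spark release of this era that reads its settings from Java system properties: the property has to be set in the driver JVM before the SparkContext is created. The class name, the helper method and the 120000 value are only illustrative, and the property name is the one quoted in this mail, so please check it against your Spark version.

```java
public class BlockManagerTimeoutConfig {
    // Property name as quoted in this thread; verify it for your Spark version.
    static final String TIMEOUT_KEY = "spark.storage.blockManagerTimeoutIntervalMs";

    // Sets the value as a JVM system property.
    // Returns the previous value, or null if it was not set before.
    static String raiseTimeout(long millis) {
        return System.setProperty(TIMEOUT_KEY, Long.toString(millis));
    }

    public static void main(String[] args) {
        // Illustrative value only: twice the default quoted above.
        raiseTimeout(120000L);

        // Create the SparkContext only after this point, so that it
        // sees the property when it starts up.
        System.out.println(TIMEOUT_KEY + "=" + System.getProperty(TIMEOUT_KEY));
    }
}
```

The same value could also be passed on the driver's command line as a -D JVM option instead of being set in code.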


Thanks
Jerry

From: Koert Kuipers [mailto:[email protected]]
Sent: Sunday, December 22, 2013 2:05 AM
To: [email protected]
Subject: how to detect a disconnect

with long running apps i see this at times:

13/12/21 12:57:59 INFO scheduler.Stage: Stage 1 is now unavailable on executor 10 (0/66, false)
13/12/21 12:58:19 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, node10, 33734, 0) with no recent heart beats: 50227ms exceeds 45000ms

typically this would be because of a spark service restart. is there a way to 
detect this programmatically so that the client can take the correct steps to 
recover?
