Hi Koert,

It seems there is currently no API you can use to detect that a block manager has been lost. This is mainly caused by a full GC or something else that blocks communication between the client driver and the executors' block managers. When the executors recover from the block, they re-register themselves with the client driver, so users do not need to take any special steps to recover. You can also set "spark.storage.blockManagerTimeoutIntervalMs" to a larger value to avoid this warning; the default is "60000".
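
For illustration, here is a minimal sketch of raising that setting from the driver program. It assumes a Spark release of this era that reads its settings from JVM system properties, so the property has to be set before the SparkContext is created. The class name, the helper method and the value 120000 are only examples, not a recommendation.

```java
public class BlockManagerTimeoutConfig {
    // Property name as given in this thread.
    static final String KEY = "spark.storage.blockManagerTimeoutIntervalMs";

    // Hypothetical helper: stores the timeout as a JVM system property
    // and returns the value that is now in effect.
    static String configure(long timeoutMs) {
        System.setProperty(KEY, Long.toString(timeoutMs));
        return System.getProperty(KEY);
    }

    public static void main(String[] args) {
        // 120000 ms is an illustrative value, twice the default quoted above.
        String applied = configure(120000L);
        System.out.println(KEY + "=" + applied);

        // The SparkContext would be created here, after the property is set.
        // It is omitted so the sketch compiles without Spark on the classpath.
    }
}
```

The same property can also be passed on the driver's JVM command line with -D, which avoids touching the application code.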
Thanks
Jerry

From: Koert Kuipers [mailto:[email protected]]
Sent: Sunday, December 22, 2013 2:05 AM
To: [email protected]
Subject: how to detect a disconnect with long running apps

i see this at times:

13/12/21 12:57:59 INFO scheduler.Stage: Stage 1 is now unavailable on executor 10 (0/66, false)
13/12/21 12:58:19 WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, node10, 33734, 0) with no recent heart beats: 50227ms exceeds 45000ms

typically this would be because of a spark service restart. is there a way to detect this programmatically so that the client can take the correct steps to recover?
