No reply so far, so maybe I didn't make the problem clear. Let me add more information. When a worker node attempts to launch a problematic executor, not only does the executor fail to launch, but the worker is also removed by the master. The worker then tries to re-register with the master but is rejected. In the master log, the following WARN message keeps popping up:
14/05/05 09:55:00 WARN Master: Got heartbeat from unregistered worker worker-20140504213545-spark-host007-7078

Can anyone shed some light on the issue?
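In case it helps anyone check whether they are hitting the same thing, here is a minimal sketch that counts how often each worker shows up in that WARN message. The regular expression is based only on the single log line quoted above, so the exact layout is an assumption, and the function name and the log path argument are hypothetical, not part of Spark.

```python
import re
import sys
from collections import Counter

# Assumption: the message layout matches the one WARN line quoted in this post.
PATTERN = re.compile(
    r"WARN Master: Got heartbeat from unregistered worker (\S+)"
)


def count_unregistered_heartbeats(lines):
    """Return a Counter mapping worker ID to the number of matching WARN lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path to the master log as the first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for worker_id, n in count_unregistered_heartbeats(log_file).most_common():
            print(f"{worker_id}\t{n}")
```

A worker whose count keeps growing between runs is still sending heartbeats after the master dropped it, which is the situation described above.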