I've seen this happen before when the driver went into long GC pauses because the driver machine was heavily memory-constrained. In that case, simply freeing up memory used by other applications fixed the problem.
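If you want to check whether GC is a plausible culprit on your side, here is a minimal sketch that reports how much time a JVM has spent in garbage collection, using the standard `java.lang.management` beans. The class name `GcCheck` and its methods are hypothetical helpers of mine, not part of Spark, and the code only measures the JVM it runs in, so it would have to be called from inside the driver program to say anything about the driver.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Spark): reports time this JVM has spent in GC.
final class GcCheck {

    // Sum of accumulated collection time, in milliseconds, over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Fraction of JVM uptime spent in GC; 0.0 when uptime is not positive.
    static double gcFraction(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long gcMillis = totalGcMillis();
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %d ms of %d ms uptime (%.1f%%)%n",
                gcMillis, uptimeMillis, 100.0 * gcFraction(gcMillis, uptimeMillis));
    }
}
```

A large GC fraction in the driver around the time the executors give up would be consistent with what I saw, though it would not prove that GC is the cause in your setup.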
On Fri, Nov 1, 2013 at 12:14 AM, Liu, Raymond <[email protected]> wrote:

> Hi
>
> I am encounter an issue that the executor actor could not connect to
> Driver actor. But I could not figure out what's the reason.
>
> Say the Driver actor is listening on :35838
>
> root@sr434:~# netstat -lpv
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address    Foreign Address  State   PID/Program name
> tcp        0      0 *:50075          *:*              LISTEN  18242/java
> tcp        0      0 *:50020          *:*              LISTEN  18242/java
> tcp        0      0 *:ssh            *:*              LISTEN  1325/sshd
> tcp        0      0 *:50010          *:*              LISTEN  18242/java
> tcp6       0      0 sr434:35838      [::]:*           LISTEN  9420/java
> tcp6       0      0 [::]:40390       [::]:*           LISTEN  9420/java
> tcp6       0      0 [::]:4040        [::]:*           LISTEN  9420/java
> tcp6       0      0 [::]:8040        [::]:*           LISTEN  28324/java
> tcp6       0      0 [::]:60712       [::]:*           LISTEN  28324/java
> tcp6       0      0 [::]:8042        [::]:*           LISTEN  28324/java
> tcp6       0      0 [::]:34028       [::]:*           LISTEN  9420/java
> tcp6       0      0 [::]:ssh         [::]:*           LISTEN  1325/sshd
> tcp6       0      0 [::]:45528       [::]:*           LISTEN  9420/java
> tcp6       0      0 [::]:13562       [::]:*           LISTEN  28324/java
>
> while the executor driver report errors as below :
>
> 13/11/01 13:16:43 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka://spark@sr434:35838/user/CoarseGrainedScheduler
> 13/11/01 13:16:43 ERROR executor.CoarseGrainedExecutorBackend: Driver terminated or disconnected! Shutting down.
>
> Any idea?
>
> Best Regards,
> Raymond Liu
