RE: Executor could not connect to Driver?

2013-11-03 Thread Liu, Raymond
Thanks. My case does not seem to be caused by GC: CPU usage is pretty low, and 
both YGC and FGC seem to behave quite normally. Hmm, weird.
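
For anyone wanting to put numbers on "quite normal": below is a minimal sketch, assuming the YGC/FGC figures come from `jstat -gcutil <pid>` on the driver JVM. It reads the header and the last sample row and reports the average pause per young and full collection. The function name `gc_pause_summary` and the sample text are made up for illustration; the column layout differs between JDK versions, so the sketch looks columns up by name rather than by position.

```python
def gc_pause_summary(jstat_text):
    """Average young/full GC pause (seconds) from `jstat -gcutil` output.

    Assumes the first non-empty line is the header and the last line is
    the most recent sample. Columns are looked up by name.
    """
    rows = [ln.split() for ln in jstat_text.strip().splitlines() if ln.strip()]
    row = dict(zip(rows[0], rows[-1]))
    ygc, fgc = int(row["YGC"]), int(row["FGC"])
    ygct, fgct = float(row["YGCT"]), float(row["FGCT"])
    return {
        "young_avg_s": ygct / ygc if ygc else 0.0,
        "full_avg_s": fgct / fgc if fgc else 0.0,
    }


# Hypothetical sample in the Java 7 column layout, not real measurements.
SAMPLE = """\
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  12.50  40.00  30.00  60.00     20    0.400     2    0.300    0.700
"""
print(gc_pause_summary(SAMPLE))
```

Averages in the tens or hundreds of milliseconds would be consistent with GC not being the culprit; multi-second full GC averages would point back at the driver's memory pressure.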

Best Regards,
Raymond Liu




Re: Executor could not connect to Driver?

2013-11-01 Thread Aaron Davidson
I've seen this happen before due to the driver doing long GCs when the
driver machine was heavily memory-constrained. For this particular issue,
simply freeing up memory used by other applications fixed the problem.




Executor could not connect to Driver?

2013-11-01 Thread Liu, Raymond
Hi

I am encountering an issue where the executor actor cannot connect to the 
driver actor, and I cannot figure out the reason.

The driver actor is listening on port 35838:

root@sr434:~# netstat -lpv
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address   Foreign Address State   PID/Program name
tcp        0      0 *:50075         *:*             LISTEN  18242/java
tcp        0      0 *:50020         *:*             LISTEN  18242/java
tcp        0      0 *:ssh           *:*             LISTEN  1325/sshd
tcp        0      0 *:50010         *:*             LISTEN  18242/java
tcp6       0      0 sr434:35838     [::]:*          LISTEN  9420/java
tcp6       0      0 [::]:40390      [::]:*          LISTEN  9420/java
tcp6       0      0 [::]:4040       [::]:*          LISTEN  9420/java
tcp6       0      0 [::]:8040       [::]:*          LISTEN  28324/java
tcp6       0      0 [::]:60712      [::]:*          LISTEN  28324/java
tcp6       0      0 [::]:8042       [::]:*          LISTEN  28324/java
tcp6       0      0 [::]:34028      [::]:*          LISTEN  9420/java
tcp6       0      0 [::]:ssh        [::]:*          LISTEN  1325/sshd
tcp6       0      0 [::]:45528      [::]:*          LISTEN  9420/java
tcp6       0      0 [::]:13562      [::]:*          LISTEN  28324/java
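
In the listing above, the driver port is bound to `sr434:35838` rather than to a wildcard address, so an executor can only reach it if `sr434` resolves, on the executor's machine, to the address the driver is bound to. A minimal sketch for checking this from an executor node follows; the helper name `can_connect` and the 3-second timeout are assumptions, not anything Spark provides.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Name-resolution failures and refused or timed-out connections are all
    subclasses of OSError, so each of them is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example, to be run on an executor node (host and port from the listing):
#   can_connect("sr434", 35838)
```

A `False` result here would suggest a name-resolution or network problem rather than anything inside Spark; `True` only shows that the TCP layer is fine.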


Meanwhile, the executor reports the following errors:

13/11/01 13:16:43 INFO executor.CoarseGrainedExecutorBackend: Connecting to driver: akka://spark@sr434:35838/user/CoarseGrainedScheduler
13/11/01 13:16:43 ERROR executor.CoarseGrainedExecutorBackend: Driver terminated or disconnected! Shutting down.
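
The INFO line carries the exact address the executor is trying to reach, which makes it easy to compare against what the driver machine is actually listening on. A small sketch, assuming the log format shown here, pulls the host and port out of the `akka://` URL; `driver_endpoint` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def driver_endpoint(log_line):
    """Return (host, port) from the first akka:// URL in a log line, or None."""
    for token in log_line.split():
        if token.startswith("akka://"):
            parts = urlsplit(token)
            return parts.hostname, parts.port
    return None


line = ("13/11/01 13:16:43 INFO executor.CoarseGrainedExecutorBackend: "
        "Connecting to driver: "
        "akka://spark@sr434:35838/user/CoarseGrainedScheduler")
print(driver_endpoint(line))  # → ('sr434', 35838)
```

Here the result matches the `sr434:35838` entry in the netstat output, so the executor is at least targeting the right host name and port.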

Any ideas?

Best Regards,
Raymond Liu