Re: Why is my spark executor terminated?

2015-10-14 Thread Jean-Baptiste Onofré

Hi Ningjun,

I just wanted to check that the master didn't "kick out" the worker, as the "Disassociated" message can come from the master.

Here it looks like the worker killed the executor before shutting itself down.


What's the Spark version?

Regards
JB

On 10/14/2015 04:42 PM, Wang, Ningjun (LNG-NPV) wrote:

I checked the master log before and did not find anything wrong. Unfortunately I have lost the master log now.

So you think the master log will tell us why the executor is down?

Regards,

Ningjun Wang


-----Original Message-----
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
Sent: Tuesday, October 13, 2015 10:42 AM
To: user@spark.apache.org
Subject: Re: Why is my spark executor terminated?

Hi Ningjun,

Nothing special in the master log?

Regards
JB

On 10/13/2015 04:34 PM, Wang, Ningjun (LNG-NPV) wrote:

We use Spark on Windows 2008 R2 servers. We use one Spark context, which creates one Spark executor. We run the Spark master, slave, driver, and executor on a single machine.
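
For reference, a minimal Scala sketch of that kind of setup: one SparkContext pointed at a standalone master on the same machine. The master URL, app name, and the spark.cores.max cap are illustrative assumptions, not taken from our actual configuration; it also prints sc.version, which answers the version question asked earlier in the thread.

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: one SparkContext against a standalone master on the
// same machine. Master URL, app name, and the cores cap are assumptions.
object SingleExecutorApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("single-executor-app")      // hypothetical app name
      .setMaster("spark://QA1-CAS01:7077")    // assumed standalone master URL
      .set("spark.cores.max", "1")            // keep the app to one executor's core
    val sc = new SparkContext(conf)
    println(s"Spark version: ${sc.version}")  // the version JB asks about
    try {
      println(s"sum = ${sc.parallelize(1 to 100).sum()}")
    } finally {
      sc.stop()  // a clean stop is one normal reason the worker kills the executor
    }
  }
}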

 From time to time, we found that the executor Java process was terminated. I cannot figure out why. Can anybody help me find out why the executor was terminated?
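
One concrete place to look, for what it's worth: in standalone mode each executor writes its own stdout/stderr under the worker's work directory (work/<app-id>/<executor-id>/), and those files often record the termination reason. Below is a small sketch that lists them; the work-dir path is a hypothetical Windows example, not our actual install path.

import java.io.File

// Sketch: enumerate executor stdout/stderr files under the worker's work
// directory (standalone mode layout: <workDir>/<app-id>/<executor-id>/).
// The path below is a hypothetical example for a Windows install.
object FindExecutorLogs {
  def main(args: Array[String]): Unit = {
    val workDir = new File("""C:\spark\work""")  // assumed SPARK_HOME\work
    for {
      app  <- Option(workDir.listFiles()).getOrElse(Array.empty[File])
      exec <- Option(app.listFiles()).getOrElse(Array.empty[File])
      log  <- Option(exec.listFiles()).getOrElse(Array.empty[File])
      if log.getName == "stdout" || log.getName == "stderr"
    } println(s"${log.getAbsolutePath} (${log.length()} bytes)")
  }
}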

The Spark slave (worker) log shows that the worker killed the executor process:

2015-10-13 09:58:06,087 INFO  [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Asked to kill executor app-20151009201453-/0

But why does it do that?

Here are the detailed logs from the Spark slave:

2015-10-13 09:58:04,915 WARN  [sparkWorker-akka.actor.default-dispatcher-16] remote.ReliableDeliverySupervisor (Slf4jLogger.scala:apply$mcV$sp(71)) - Association with remote system [akka.tcp://sparkexecu...@qa1-cas01.pcc.lexisnexis.com:61234] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].

2015-10-13 09:58:05,134 INFO  [sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef (Slf4jLogger.scala:apply$mcV$sp(74)) - Message [akka.remote.EndpointWriter$AckIdleCheckTimer$] from Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234-2/endpointWriter#-175670388] to Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234-2/endpointWriter#-175670388] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:05,134 INFO  [sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef (Slf4jLogger.scala:apply$mcV$sp(74)) - Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125680] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:05,134 INFO  [sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef (Slf4jLogger.scala:apply$mcV$sp(74)) - Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125680] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.

2015-10-13 09:58:06,087 INFO  [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Asked to kill executor app-20151009201453-/0

2015-10-13 09:58:06,103 INFO  [ExecutorRunner for app-20151009201453-/0] worker.ExecutorRunner (Logging.scala:logInfo(59)) - Runner thread for executor app-20151009201453-/0 interrupted

2015-10-13 09:58:06,118 INFO  [ExecutorRunner for app-20151009201453-/0] worker.ExecutorRunner (Logging.scala:logInfo(59)) - Killing process!

2015-10-13 09:58:06,509 INFO  [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Executor app-20151009201453-/0 finished with state KILLED exitStatus 1

2015-10-13 09:58:06,509 INFO  [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Cleaning up local directories for application app-20151009201453-
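
As an aside on the dead-letter noise above: 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown' are plain Akka settings. A minimal standalone sketch of applying them when building an ActorSystem follows; it is illustrative only, since Spark configures its internal "sparkWorker" actor system itself, and the system name here is hypothetical.

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

// Sketch: apply the two settings named in the dead-letter log entries to a
// standalone ActorSystem. Illustrative only -- Spark's internal actor
// system is configured by Spark, not by user code like this.
object QuietDeadLetters {
  def main(args: Array[String]): Unit = {
    val overrides = ConfigFactory.parseString(
      "akka.log-dead-letters = off\n" +
      "akka.log-dead-letters-during-shutdown = off")
    val system = ActorSystem("example", overrides.withFallback(ConfigFactory.load()))
    // ... actors would run here ...
    system.shutdown()  // Akka 2.3-era shutdown, matching Spark 1.x's Akka
  }
}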

Thanks

Ningjun Wang



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

