Re: Why is my spark executor terminated?
Hi Ningjun,

I just wanted to check that the master didn't "kick out" the worker, as the "Disassociated" can come from the master. Here it looks like the worker killed the executor before shutting itself down. What's the Spark version?

Regards
JB

On 10/14/2015 04:42 PM, Wang, Ningjun (LNG-NPV) wrote:
> I checked the master log before and did not find anything wrong. Unfortunately I have lost the master log now. So you think the master log will tell us why the executor is down?
>
> Regards,
> Ningjun Wang
>
> -----Original Message-----
> From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
> Sent: Tuesday, October 13, 2015 10:42 AM
> To: user@spark.apache.org
> Subject: Re: Why is my spark executor terminated?
>
> Hi Ningjun,
>
> Nothing special in the master log?
>
> Regards
> JB
>
> On 10/13/2015 04:34 PM, Wang, Ningjun (LNG-NPV) wrote:
>> We use Spark on Windows 2008 R2 servers. We use one Spark context, which creates one Spark executor. We run the Spark master, slave, driver, and executor on a single machine.
>>
>> From time to time, we find that the executor Java process has been terminated, and I cannot figure out why. Can anybody help me find out why the executor was terminated?
>>
>> The Spark slave log shows that it killed the executor process:
>>
>> 2015-10-13 09:58:06,087 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Asked to kill executor app-20151009201453-/0
>>
>> But why does it do that? Here are the detailed logs from the Spark slave:
>>
>> 2015-10-13 09:58:04,915 WARN [sparkWorker-akka.actor.default-dispatcher-16] remote.ReliableDeliverySupervisor (Slf4jLogger.scala:apply$mcV$sp(71)) - Association with remote system [akka.tcp://sparkexecu...@qa1-cas01.pcc.lexisnexis.com:61234] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>>
>> 2015-10-13 09:58:05,134 INFO [sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef (Slf4jLogger.scala:apply$mcV$sp(74)) - Message [akka.remote.EndpointWriter$AckIdleCheckTimer$] from Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234-2/endpointWriter#-175670388] to Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234-2/endpointWriter#-175670388] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>>
>> 2015-10-13 09:58:05,134 INFO [sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef (Slf4jLogger.scala:apply$mcV$sp(74)) - Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125680] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>>
>> 2015-10-13 09:58:05,134 INFO [sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef (Slf4jLogger.scala:apply$mcV$sp(74)) - Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125680] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>>
>> 2015-10-13 09:58:06,087 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Asked to kill executor app-20151009201453-/0
>>
>> 2015-10-13 09:58:06,103 INFO [ExecutorRunner for app-20151009201453-/0] worker.ExecutorRunner (Logging.scala:logInfo(59)) - Runner thread for executor app-20151009201453-/0 interrupted
>>
>> 2015-10-13 09:58:06,118 INFO [ExecutorRunner for app-20151009201453-/0] worker.ExecutorRunner (Logging.scala:logInfo(59)) - Killing process!
>>
>> 2015-10-13 09:58:06,509 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Executor app-20151009201453-/0 finished with state KILLED exitStatus 1
>>
>> 2015-10-13 09:58:06,509 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Cleaning up local directories for application app-20151009201453-
>>
>> Thanks
>>
>> Ningjun Wang

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
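When sifting worker logs like the ones quoted above, it can help to reduce them to a timeline of executor lifecycle events. A minimal sketch in Python, using sample lines copied from the log above (the regex and the keyword list are only illustrative, not part of Spark):

```python
import re

# Three sample lines taken verbatim from the worker log quoted above
# (the truncated app id is kept exactly as it appears in the thread).
LOG = """\
2015-10-13 09:58:04,915 WARN [sparkWorker-akka.actor.default-dispatcher-16] remote.ReliableDeliverySupervisor (Slf4jLogger.scala:apply$mcV$sp(71)) - Association with remote system [akka.tcp://sparkexecu...@qa1-cas01.pcc.lexisnexis.com:61234] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
2015-10-13 09:58:06,087 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Asked to kill executor app-20151009201453-/0
2015-10-13 09:58:06,509 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Executor app-20151009201453-/0 finished with state KILLED exitStatus 1
"""

# Capture timestamp, level, and everything after the last " - " as the message.
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) +(\w+) .* - (.*)$")

def lifecycle_events(text):
    """Yield (timestamp, level, message) for lines marking executor state changes."""
    keywords = ("Disassociated", "Asked to kill", "KILLED")
    for raw in text.splitlines():
        m = LINE.match(raw)
        if m and any(k in m.group(3) for k in keywords):
            yield m.groups()

for ts, level, msg in lifecycle_events(LOG):
    print(f"{ts}  {level:5}  {msg}")
```

Run against the full worker log, this prints the disassociation warning followed by the kill sequence in order, which makes the ~1.2-second gap between the "Disassociated" event and the kill request easy to see.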
RE: Why is my spark executor terminated?
I checked the master log before and did not find anything wrong. Unfortunately I have lost the master log now. So you think the master log will tell us why the executor is down?

Regards,
Ningjun Wang

-----Original Message-----
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
Sent: Tuesday, October 13, 2015 10:42 AM
To: user@spark.apache.org
Subject: Re: Why is my spark executor terminated?

Hi Ningjun,

Nothing special in the master log?

Regards
JB

On 10/13/2015 04:34 PM, Wang, Ningjun (LNG-NPV) wrote:
> We use Spark on Windows 2008 R2 servers. We use one Spark context, which creates one Spark executor. We run the Spark master, slave, driver, and executor on a single machine.
>
> From time to time, we find that the executor Java process has been terminated, and I cannot figure out why. Can anybody help me find out why the executor was terminated?
>
> The Spark slave log shows that it killed the executor process:
>
> 2015-10-13 09:58:06,087 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Asked to kill executor app-20151009201453-/0
>
> But why does it do that? Here are the detailed logs from the Spark slave:
>
> [...]
>
> Thanks
>
> Ningjun Wang
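As an aside, the repeated "dead letters" INFO entries in the worker log earlier in the thread are informational noise from Akka's shutdown path, and the log message itself names the settings that control them. A sketch of those two settings in Akka's HOCON configuration format (whether and how they can be injected into the Akka system that Spark 1.x embeds depends on the Spark version, so treat this as illustrative):

```hocon
# Silence Akka's dead-letter logging, per the settings named in the log message.
akka.log-dead-letters = off
akka.log-dead-letters-during-shutdown = off
```

Silencing them only reduces noise; the underlying "Disassociated" event, which is what actually precedes the kill, is unaffected.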
Re: Why is my spark executor terminated?
Hi Ningjun,

Nothing special in the master log?

Regards
JB

On 10/13/2015 04:34 PM, Wang, Ningjun (LNG-NPV) wrote:
> We use Spark on Windows 2008 R2 servers. We use one Spark context, which creates one Spark executor. We run the Spark master, slave, driver, and executor on a single machine.
>
> From time to time, we find that the executor Java process has been terminated, and I cannot figure out why. Can anybody help me find out why the executor was terminated?
>
> The Spark slave log shows that it killed the executor process:
>
> 2015-10-13 09:58:06,087 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Asked to kill executor app-20151009201453-/0
>
> But why does it do that? Here are the detailed logs from the Spark slave:
>
> [...]
>
> Thanks
>
> Ningjun Wang

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
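On JB's point about the master possibly "kicking out" the worker: in standalone mode, the liveness thresholds involved are configurable. A sketch of a spark-defaults.conf fragment for Spark 1.x (the values are purely illustrative, not recommendations; check the defaults documented for your version before changing anything):

```
# Seconds without a heartbeat before the standalone master marks a worker as lost
# (default 60 in Spark 1.x standalone mode).
spark.worker.timeout      120

# Default timeout, in seconds, for Akka-based communication between Spark 1.x
# components (default 100).
spark.akka.timeout        200
```

Raising these can mask transient network hiccups on a single-machine setup, but it does not explain them; correlating the master log with the worker log around the "Disassociated" timestamp is still the more direct diagnostic.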