I checked the master log before and did not find anything wrong. Unfortunately, I have lost the master log now.
So you think the master log will tell me why the executor is down?

Regards,
Ningjun Wang

-----Original Message-----
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
Sent: Tuesday, October 13, 2015 10:42 AM
To: user@spark.apache.org
Subject: Re: Why is my spark executor is terminated?

Hi Ningjun,

Nothing special in the master log?

Regards
JB

On 10/13/2015 04:34 PM, Wang, Ningjun (LNG-NPV) wrote:
> We use Spark on Windows 2008 R2 servers. We use one Spark context,
> which creates one Spark executor. We run the Spark master, slave,
> driver, and executor on a single machine.
>
> From time to time, we find that the executor Java process has been
> terminated. I cannot figure out why it was terminated. Can anybody
> help me find out why the executor was terminated?
>
> The Spark slave log shows that it killed the executor process:
>
> 2015-10-13 09:58:06,087 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Asked to kill executor app-20151009201453-0000/0
>
> But why does it do that?
>
> Here are the detailed logs from the Spark slave:
>
> 2015-10-13 09:58:04,915 WARN [sparkWorker-akka.actor.default-dispatcher-16] remote.ReliableDeliverySupervisor (Slf4jLogger.scala:apply$mcV$sp(71)) - Association with remote system [akka.tcp://sparkexecu...@qa1-cas01.pcc.lexisnexis.com:61234] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
>
> 2015-10-13 09:58:05,134 INFO [sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef (Slf4jLogger.scala:apply$mcV$sp(74)) - Message [akka.remote.EndpointWriter$AckIdleCheckTimer$] from Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234-2/endpointWriter#-175670388] to Actor[akka://sparkWorker/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FsparkExecutor%40QA1-CAS01.pcc.lexisnexis.com%3A61234-2/endpointWriter#-175670388] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>
> 2015-10-13 09:58:05,134 INFO [sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef (Slf4jLogger.scala:apply$mcV$sp(74)) - Message [akka.remote.transport.AssociationHandle$Disassociated] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125680] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>
> 2015-10-13 09:58:05,134 INFO [sparkWorker-akka.actor.default-dispatcher-16] actor.LocalActorRef (Slf4jLogger.scala:apply$mcV$sp(74)) - Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkWorker/deadLetters] to Actor[akka://sparkWorker/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkWorker%4010.196.116.184%3A61236-3#-1210125680] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>
> 2015-10-13 09:58:06,087 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Asked to kill executor app-20151009201453-0000/0
>
> 2015-10-13 09:58:06,103 INFO [ExecutorRunner for app-20151009201453-0000/0] worker.ExecutorRunner (Logging.scala:logInfo(59)) - Runner thread for executor app-20151009201453-0000/0 interrupted
>
> 2015-10-13 09:58:06,118 INFO [ExecutorRunner for app-20151009201453-0000/0] worker.ExecutorRunner (Logging.scala:logInfo(59)) - Killing process!
>
> 2015-10-13 09:58:06,509 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Executor app-20151009201453-0000/0 finished with state KILLED exitStatus 1
>
> 2015-10-13 09:58:06,509 INFO [sparkWorker-akka.actor.default-dispatcher-16] worker.Worker (Logging.scala:logInfo(59)) - Cleaning up local directories for application app-20151009201453-0000
>
> Thanks
>
> Ningjun Wang

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
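
The worker log quoted above already suggests the sequence of events: the Akka association to the executor drops at 09:58:04,915, and roughly a second later the worker is asked to kill the executor. With the master log lost, pulling these worker-side events out in timestamp order is the next best way to confirm that ordering. Below is a minimal sketch of that correlation; the event markers are taken from the excerpt above, but the sample lines are condensed (not verbatim) and the function name is illustrative:

```python
import re

# Event markers observed in the worker log excerpt in this thread.
PATTERNS = [
    r"Reason is: \[Disassociated\]",
    r"Asked to kill executor",
    r"finished with state",
]

def executor_events(lines):
    """Return (timestamp, line) pairs for log lines matching any marker.

    Standalone worker log lines begin with 'YYYY-MM-DD HH:MM:SS,mmm'
    (23 characters), so a prefix slice recovers the timestamp.
    """
    combined = re.compile("|".join(PATTERNS))
    return [(line[:23], line.rstrip())
            for line in lines if combined.search(line)]

# Condensed lines based on the excerpt above (message bodies shortened):
sample = [
    "2015-10-13 09:58:04,915 WARN remote.ReliableDeliverySupervisor - "
    "Association with remote system has failed. Reason is: [Disassociated].",
    "2015-10-13 09:58:05,134 INFO actor.LocalActorRef - dead letters encountered.",
    "2015-10-13 09:58:06,087 INFO worker.Worker - Asked to kill executor "
    "app-20151009201453-0000/0",
    "2015-10-13 09:58:06,509 INFO worker.Worker - Executor "
    "app-20151009201453-0000/0 finished with state KILLED exitStatus 1",
]

for ts, _ in executor_events(sample):
    print(ts)  # three timestamps: the disassociation, the kill request, the KILLED state
```

In Spark standalone mode it is also worth reading the executor's own stdout/stderr files under the worker's work directory (by default `SPARK_HOME/work/<app-id>/<executor-id>/`), since a crash or OOM in the executor JVM usually leaves its trace there rather than in the worker log.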