Whenever an Executor ends, it enters one of three states: KILLED, FAILED, or LOST (see: <https://github.com/falaki/incubator-spark/blob/79868fe7246d8e6d57e0a376b2593fabea9a9d83/core/src/main/scala/org/apache/spark/deploy/ExecutorState.scala>). None of these sounds like "exited cleanly," which I agree is weird, but I don't believe this is a regression, since it has been this way for quite some time. Of the three, KILLED sounds the most reasonable for normal termination.
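To make the point concrete, here is a minimal standalone sketch of the situation, not the actual ExecutorState source. The object name, the RUNNING value, and the terminalStates/isFinished helpers are assumptions for illustration; only KILLED, FAILED, and LOST come from the discussion above.

```scala
// Illustrative model only: this is NOT Spark's ExecutorState.scala.
// KILLED, FAILED and LOST are the three end states named in this thread;
// RUNNING and both helpers are assumed here for the sake of the example.
object ExecutorStateSketch extends Enumeration {
  val RUNNING, KILLED, FAILED, LOST = Value

  // Every state an executor can end in. Note that none of them
  // means "exited cleanly".
  val terminalStates: Set[Value] = Set(KILLED, FAILED, LOST)

  // True once an executor has reached any of the end states.
  def isFinished(state: Value): Boolean = terminalStates.contains(state)
}
```

With a model like this, an executor that shuts down after a successful job still has to be reported as one of the three terminal values, which is why a job that ran fine can show up as KILLED.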
I've gone ahead and created https://spark-project.atlassian.net/browse/SPARK-937 to fix this.

On Fri, Oct 18, 2013 at 7:56 AM, Ameet Kini <[email protected]> wrote:

> Jey,
>
> I don't see a "close()" method on SparkContext.
>
> http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.SparkContext
>
> I tried the "stop()" method but still see the job is reported KILLED. Btw,
> I don't recall getting this behavior in 0.7.3, my standalone programs used
> to cleanly shutdown without requiring any further operations on
> SparkContext. Also, I notice that none of the examples do a stop() or any
> other closing method calls on the SparkContext, so I'm not sure what I
> could be doing differently with the SparkContext that jobs get reported as
> KILLED even though they run through successfully.
>
> Ameet
>
>
> On Thu, Oct 17, 2013 at 5:59 PM, Jey Kottalam <[email protected]> wrote:
>
>> You can try calling the "close()" method on your SparkContext, which
>> should allow for a cleaner shutdown.
>>
>> On Thu, Oct 17, 2013 at 2:38 PM, Ameet Kini <[email protected]> wrote:
>> >
>> > I'm using the scala 2.10 branch of Spark in standalone mode, and am
>> > seeing the job reports itself as KILLED in the UI with the below
>> > message in each of the executors log, even though the job processes
>> > correctly and returns the correct result. The job is triggered by a
>> > .count on an RDD and the count seems right. The only thing I can think
>> > of is I'm doing a System.exit(0) at the end of the main method. If I
>> > remove that call, I don't see the below message but the job hangs, and
>> > the UI reports it as still running.
>> >
>> > 13/10/17 15:31:52 INFO actor.LocalActorRef: Message
>> > [akka.remote.transport.AssociationHandle$Disassociated] from
>> > Actor[akka://spark/deadLetters] to
>> > Actor[akka://spark/system/transports/akkaprotocolmanager.tcp1/akkaProtocol-tcp%3A%2F%2Fspark%40ec2-cdh4u2-dev-master.geoeyeanalytics.ec2%3A47366-1#136073268]
>> > was not delivered. [1] dead letters encountered. This logging can be
>> > turned off or adjusted with configuration settings
>> > 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>> >
>> > 13/10/17 15:31:52 ERROR executor.StandaloneExecutorBackend: Driver
>> > terminated or disconnected! Shutting down.
>> >
>> > 13/10/17 15:31:52 INFO actor.LocalActorRef: Message
>> > [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
>> > Actor[akka://spark/deadLetters] to
>> > Actor[akka://spark/system/transports/akkaprotocolmanager.tcp1/akkaProtocol-tcp%3A%2F%2Fspark%40ec2-cdh4u2-dev-master.geoeyeanalytics.ec2%3A47366-1#136073268]
>> > was not delivered. [2] dead letters encountered. This logging can be
>> > turned off or adjusted with configuration settings
>> > 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>> >
>> > 13/10/17 15:31:52 INFO actor.LocalActorRef: Message
>> > [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from
>> > Actor[akka://sparkExecutor/deadLetters] to
>> > Actor[akka://sparkExecutor/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2Fspark%40ec2-cdh4u2-dev-master.geoeyeanalytics.ec2%3A47366-1#593252773]
>> > was not delivered. [1] dead letters encountered. This logging can be
>> > turned off or adjusted with configuration settings
>> > 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
>> >
>> > 13/10/17 15:31:52 ERROR remote.EndpointWriter: AssociationError
>> > [akka.tcp://sparkExecutor@ec2-cdh4u2-dev-slave1:46566] ->
>> > [akka.tcp://spark@ec2-cdh4u2-dev-master:47366]: Error [Association failed
>> > with [akka.tcp://[email protected]:47366]] [
>> > akka.remote.EndpointAssociationException: Association failed with
>> > [akka.tcp://spark@ec2-cdh4u2-dev-master:47366]
