> to get the logs from the data nodes

Minor correction: the logs are collected from the machines where the node managers run.
Cheers

On Wed, Dec 3, 2014 at 3:39 PM, Ganelin, Ilya <[email protected]> wrote:

> You want to look further up the stack (there are almost certainly other
> errors before this happens) and those other errors may give you a better
> idea of what is going on. Also, if you are running on yarn, you can run
> "yarn logs -applicationId <yourAppId>" to get the logs from the data nodes.
>
> -----Original Message-----
> *From: *S. Zhou [[email protected]]
> *Sent: *Wednesday, December 03, 2014 06:30 PM Eastern Standard Time
> *To: *[email protected]
> *Subject: *Spark executor lost
>
> We are using Spark job server to submit spark jobs (our spark version is
> 0.91). After running the spark job server for a while, we often see the
> following errors (executor lost) in the spark job server log. As a
> consequence, the spark driver (allocated inside spark job server) gradually
> loses executors, and finally the spark job server is no longer able to
> submit jobs. We tried to google for solutions but so far no luck. Please
> help if you have any ideas. Thanks!
>
> [2014-11-25 01:37:36,250] INFO parkDeploySchedulerBackend [] [akka://JobServer/user/context-supervisor/next-staging] - Executor 6 disconnected, so removing it
> [2014-11-25 01:37:36,252] ERROR cheduler.TaskSchedulerImpl [] [akka://JobServer/user/context-supervisor/next-staging] - Lost executor 6 on XXXX: remote Akka client disassociated
> [2014-11-25 01:37:36,252] INFO ark.scheduler.DAGScheduler [] [] - *Executor lost*: 6 (epoch 8)
> [2014-11-25 01:37:36,252] INFO ge.BlockManagerMasterActor [] [] - Trying to remove executor 6 from BlockManagerMaster.
> [2014-11-25 01:37:36,252] INFO storage.BlockManagerMaster [] [] - Removed 6 successfully in removeExecutor
> [2014-11-25 01:37:36,286] INFO ient.AppClient$ClientActor [] [akka://JobServer/user/context-supervisor/next-staging] - Executor updated: app-20141125002023-0037/6 is now FAILED (Command exited with code 143)
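
A side note on the "Command exited with code 143" line in the quoted log: by the usual POSIX shell convention, an exit status above 128 means the process was terminated by signal (status - 128). Under that assumption, 143 corresponds to signal 15, SIGTERM, which suggests the executor was asked to stop rather than exiting by itself. The log does not say who sent the signal. A minimal Python sketch of the arithmetic follows; the helper name `describe_exit_code` is hypothetical and not part of Spark.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status, assuming the common 128 + N signal convention."""
    if code > 128:
        # Assumption: statuses above 128 encode "killed by signal (code - 128)".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"terminated by unknown signal {signum}"
        return f"terminated by {name} (signal {signum})"
    return f"exited on its own with status {code}"


# The status reported in the quoted log.
print(describe_exit_code(143))
```

If the decoded signal is SIGTERM, the logs on the machine that ran the executor are the next place to look, as suggested in the quoted reply.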
