GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/14202
[SPARK-16230] [CORE] CoarseGrainedExecutorBackend to self kill if there is
an exception while creating an Executor
## What changes were proposed in this pull request?
With the fix from SPARK-13112, I see that `LaunchTask` is always processed
after `RegisteredExecutor` is done, so the backend gets a chance to complete all
of its retries to start up an executor. There is still a problem: if `Executor`
creation itself fails with an exception, the failure goes unnoticed, and the
executor is killed later when it tries to process `LaunchTask` because
`executor` is null:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L88
As a result, the logs do not show that there was a problem during `Executor`
creation, or that this is why the executor was killed.
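The following Scala sketch shows the failure mode. It is a simplified model and not Spark's code. `FlakyExecutor`, `PreFixBackend`, `deliver` and `exitLog` are hypothetical names. The sketch assumes, as the description above reports, that an exception thrown while handling `RegisteredExecutor` is dropped by the message loop instead of being surfaced. Exiting is simulated by recording the reason rather than calling `System.exit`.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical stand-in for Executor whose constructor can fail.
class FlakyExecutor(fail: Boolean) {
  if (fail) throw new IllegalStateException("failed to initialize executor")
}

// Simplified model of the backend before the fix (not Spark's actual code).
class PreFixBackend(failOnCreate: Boolean) {
  var executor: FlakyExecutor = null
  // Records exit reasons instead of terminating the JVM.
  val exitLog = ArrayBuffer.empty[String]

  // Hypothetical message loop: an exception thrown by a handler is dropped,
  // so the creation failure never reaches any exit reason.
  def deliver(handler: => Unit): Unit =
    try handler catch { case NonFatal(_) => () }

  def onRegisteredExecutor(): Unit = {
    // No try/catch: if the constructor throws, `executor` stays null.
    executor = new FlakyExecutor(failOnCreate)
  }

  def onLaunchTask(): Unit = {
    if (executor == null) {
      // The only reason recorded; it says nothing about why `executor` is null.
      exitLog += "Received LaunchTask command but executor was null"
    }
  }
}
```

Build a backend with `failOnCreate = true`, then call `deliver(onRegisteredExecutor())` followed by `deliver(onLaunchTask())`. `exitLog` ends up holding only the generic null-executor message, which matches the confusing log output the PR describes.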
This PR explicitly catches exceptions thrown during `Executor` creation, logs a
proper message and then exits the JVM. It also changes the `exitExecutor`
method to accept a `reason`, so that backends can use that reason for things
like logging to a DB, giving an aggregate view of such exits at the cluster
level.
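Here is a minimal Scala sketch of the approach described above. It is not the PR's actual diff. Only the concepts `Executor`, `RegisteredExecutor`, `LaunchTask` and an `exitExecutor` that takes a reason come from the description. `FlakyExecutor`, `FixedBackend`, `RecordingBackend` and the message strings are hypothetical. Catching `NonFatal` is also an assumption about which exceptions should trigger the self-exit.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical stand-in for Executor whose constructor can fail.
class FlakyExecutor(fail: Boolean) {
  if (fail) throw new IllegalStateException("failed to initialize executor")
}

// Simplified model of the backend with the fix (not Spark's actual code).
class FixedBackend(failOnCreate: Boolean) {
  @volatile var executor: FlakyExecutor = null

  // Exit hook that now receives a reason; subclasses may override it.
  protected def exitExecutor(code: Int, reason: String, throwable: Throwable = null): Unit = {
    System.err.println(s"Executor self-exiting due to: $reason")
    if (throwable != null) throwable.printStackTrace()
    System.exit(code)
  }

  def onRegisteredExecutor(): Unit = {
    try {
      executor = new FlakyExecutor(failOnCreate)
    } catch {
      case NonFatal(e) =>
        // Fail fast with a reason that names the real cause.
        exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
    }
  }

  def onLaunchTask(): Unit = {
    if (executor == null) {
      exitExecutor(1, "Received LaunchTask command but executor was null")
    }
  }
}

// Hypothetical backend that collects (code, reason) pairs, standing in for
// writing them to a DB. Unlike a real backend, it does not terminate the JVM.
class RecordingBackend(failOnCreate: Boolean) extends FixedBackend(failOnCreate) {
  val reasons = ArrayBuffer.empty[(Int, String)]

  override protected def exitExecutor(code: Int, reason: String, throwable: Throwable): Unit =
    reasons += ((code, reason))
}
```

Here the creation failure produces an exit reason that names the actual cause, instead of surfacing later as a null `executor`. In a real backend, an override like `RecordingBackend` would presumably still terminate the JVM after recording the reason.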
## How was this patch tested?
I am relying on existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tejasapatil/spark exit_executor_failure
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14202.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14202
----
commit 0c71699894d4b7920388056a1d05d2277a79cf38
Author: Tejas Patil <[email protected]>
Date: 2016-07-14T14:36:36Z
CoarseGrainedExecutorBackend to self kill if there is an exception while
creating an Executor
----