Github user tgravescs commented on a diff in the pull request:
https://github.com/apache/spark/pull/14916#discussion_r77202121
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -222,7 +222,9 @@ private[spark] class ApplicationMaster(
if (!unregistered) {
// we only want to unregister if we don't want the RM to retry
-        if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
+        if (finalStatus == FinalApplicationStatus.SUCCEEDED ||
+            exitCode == ApplicationMaster.EXIT_EARLY ||
+            exitCode == ApplicationMaster.EXIT_EXCEPTION_USER_CLASS || isLastAttempt) {
--- End diff --
You can't do this. These exit codes can happen for various reasons, and if any of them are retryable by YARN, you are now preventing that retry by unregistering. A kill may cause them, but other things could too. EXIT_EXCEPTION_USER_CLASS covers any throwable from the user code, and EXIT_EARLY means the cause is unknown, so in both cases we would want YARN to retry.
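To make the distinction concrete, here is a minimal sketch of the condition I'd be comfortable with: unregister only when we positively know YARN should not retry. The `shouldUnregister` helper and the `knownKilledByUser` flag are hypothetical, not the actual ApplicationMaster code.

```scala
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus

object UnregisterPolicySketch {
  // Hypothetical helper illustrating the point: only unregister when we know
  // the attempt is terminal. EXIT_EARLY / EXIT_EXCEPTION_USER_CLASS don't tell
  // us that, so those paths should fall through and let YARN retry.
  def shouldUnregister(
      finalStatus: FinalApplicationStatus,
      isLastAttempt: Boolean,
      knownKilledByUser: Boolean): Boolean = {
    finalStatus == FinalApplicationStatus.SUCCEEDED ||
      isLastAttempt ||
      knownKilledByUser // only usable if we can actually detect a deliberate kill
  }
}
```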
I'm fine with adding something in if we know it was a kill, but I think that's hard here because YARN doesn't tell us. Ideally we'd have a Spark command to kill the application nicely, and then we could do the cleanup ourselves.
The client should try to clean this up if it sees the application has been killed, assuming the client is still running.
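For the client side, something along these lines is what I mean. This is only a rough sketch using the stock YarnClient API; the object name, the `cleanupIfKilled` helper, and the cleanup callback are made up for illustration.

```scala
import org.apache.hadoop.yarn.api.records.{ApplicationId, YarnApplicationState}
import org.apache.hadoop.yarn.client.api.YarnClient

object KilledAppCleanupSketch {
  // If the application report shows KILLED, run our own cleanup instead of
  // relying on the (already dead) ApplicationMaster to do it.
  def cleanupIfKilled(yarnClient: YarnClient, appId: ApplicationId)(cleanup: () => Unit): Unit = {
    val report = yarnClient.getApplicationReport(appId)
    if (report.getYarnApplicationState == YarnApplicationState.KILLED) {
      cleanup() // e.g. remove the staging directory, release external resources
    }
  }
}
```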