tgravescs commented on a change in pull request #34366:
URL: https://github.com/apache/spark/pull/34366#discussion_r744864128
##########
File path:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1277,10 +1277,14 @@ private[spark] class Client(
} else {
val YarnAppReport(appState, finalState, diags) = monitorApplication(appId)
if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) {
+ var amContainerSuccess = false
diags.foreach { err =>
+ amContainerSuccess = err.contains("AM Container") && err.contains("exitCode: 0")
logError(s"Application diagnostics message: $err")
}
- throw new SparkException(s"Application $appId finished with failed status")
+ if (!amContainerSuccess) {
+ throw new SparkException(s"Application $appId finished with failed status")
+ }
Review comment:
> There is a shutdown hook timeout of 30s.
This doesn't answer my question. How long does it take for YARN to actually
respond in the cases this is slow?
The way the YARN API is supposed to work, the application master is meant to
unregister. It is not meant to go off of exit codes: if you can't unregister,
YARN doesn't really know what happened. Yes, perhaps YARN should expose
container exit status better; feel free to propose a better API in YARN that
we can use.
Honestly, this is kind of a YARN cluster setup issue in my opinion. If your
ResourceManager is frequently unresponsive for long periods of time, you
need to fix that. It would be great if applications could work around that, but
I don't want to put something into Spark that is brittle and could cause other
issues.
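
To make the brittleness concern concrete, here is a minimal standalone sketch
of the substring check from the diff. It assumes `diags` is an
`Option[String]`. The object name, the helper, and the diagnostic strings are
made up for illustration; they were not captured from a real cluster and the
actual YARN wording may differ.

```scala
// Standalone sketch; no Spark or Hadoop dependency.
object DiagnosticsCheckSketch {
  // Mirrors the substring test from the diff, assuming diags: Option[String].
  def amContainerSuccess(diags: Option[String]): Boolean =
    diags.exists { err =>
      err.contains("AM Container") && err.contains("exitCode: 0")
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical diagnostic messages, for illustration only.
    val matching = Some("AM Container for appattempt_0001 exited with exitCode: 0")
    val reworded = Some("AM container for appattempt_0001 exited with exit code 0")
    val nonZero  = Some("AM Container for appattempt_0001 exited with exitCode: 1")

    println(amContainerSuccess(matching)) // true
    println(amContainerSuccess(reworded)) // false: same meaning, different wording
    println(amContainerSuccess(nonZero))  // false
    println(amContainerSuccess(None))     // false: no diagnostics at all
  }
}
```

The second case is the worrying one: if the diagnostics text is ever reworded,
the check stops matching without any error and the client goes back to
throwing, or worse, a future message could match by accident.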