tgravescs commented on a change in pull request #34366:
URL: https://github.com/apache/spark/pull/34366#discussion_r743114402
##########
File path:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1277,10 +1277,14 @@ private[spark] class Client(
} else {
val YarnAppReport(appState, finalState, diags) = monitorApplication(appId)
if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) {
+ var amContainerSuccess = false
diags.foreach { err =>
+ amContainerSuccess = err.contains("AM Container") && err.contains("exitCode: 0")
logError(s"Application diagnostics message: $err")
}
- throw new SparkException(s"Application $appId finished with failed status")
+ if (!amContainerSuccess) {
+ throw new SparkException(s"Application $appId finished with failed status")
+ }
Review comment:
Yeah, unfortunately the final application status is what YARN actually
uses and advertises, and it really wants you to unregister.
How often does this really happen? It seems like if you are timing out
talking to the RM very often, you have other problems on your cluster. Or does
this happen during a rolling upgrade or something?
Note, if we were to do this, the YARN final status also wouldn't match:
doesn't that show up as failed when it can't unregister? That could
confuse the user.
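
To make the mismatch concern concrete, here is a minimal, self-contained sketch of the check in the diff above. It is only an illustration: `FinalStatus`, `Report`, `amContainerExitedCleanly`, and `clientOutcome` are hypothetical stand-ins rather than Spark or YARN classes, and the sample diagnostics string is made up for the example.

```scala
// Hypothetical stand-ins for YARN's final status and Spark's app report.
// These are NOT the real classes; they only model the logic under discussion.
object FinalStatusSketch {
  sealed trait FinalStatus
  case object Succeeded extends FinalStatus
  case object Failed extends FinalStatus

  final case class Report(finalStatus: FinalStatus, diagnostics: Option[String])

  // Same string test as in the diff: the diagnostics mention the AM container
  // and an exit code of 0.
  def amContainerExitedCleanly(diags: Option[String]): Boolean =
    diags.exists(d => d.contains("AM Container") && d.contains("exitCode: 0"))

  // What the client would surface to the user under the proposed change:
  // a failed report is only treated as a failure if the AM did not exit with 0.
  def clientOutcome(report: Report): FinalStatus =
    if (report.finalStatus == Failed && !amContainerExitedCleanly(report.diagnostics)) Failed
    else Succeeded

  def main(args: Array[String]): Unit = {
    // Made-up diagnostics text, assumed to resemble the unregister-failure case.
    val report = Report(Failed, Some("AM Container for attempt exited with exitCode: 0"))
    println(s"YARN reports ${report.finalStatus}, client reports ${clientOutcome(report)}")
  }
}
```

In this sketch the same application ends up as `Failed` on the YARN side and `Succeeded` on the client side, which is the inconsistency a user could run into.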
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]