tgravescs commented on a change in pull request #34366:
URL: https://github.com/apache/spark/pull/34366#discussion_r746690053
##########
File path:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1277,10 +1277,14 @@ private[spark] class Client(
} else {
val YarnAppReport(appState, finalState, diags) =
monitorApplication(appId)
if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) {
+ var amContainerSuccess = false
diags.foreach { err =>
+ amContainerSuccess = err.contains("AM Container") && err.contains("exitCode: 0")
logError(s"Application diagnostics message: $err")
}
- throw new SparkException(s"Application $appId finished with failed status")
+ if (!amContainerSuccess) {
+ throw new SparkException(s"Application $appId finished with failed status")
+ }
Review comment:
I guess I'm wondering whether we should make any change here at all.
This doesn't fail the application. The occasional retry may be unwanted
behavior in your case, but if it's happening too often, that seems like a
cluster issue to me. The Hadoop client has built-in retries and timeouts that
are supposed to be configured to account for some of these cases, so if you are
going outside of those, that should be an out-of-the-norm occurrence. The
Hadoop philosophy, IMHO, is to retry on failures, so if this happens
occasionally it shouldn't be a big deal. If it is, it's up to the application
to handle it. We are already using the Hadoop API for properly checking
application status, whereas this change tries to infer the status from the
diagnostics message, which could be brittle and just adds maintenance for
Spark if we add yet another config.
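To make the brittleness concern concrete, here is a minimal sketch. It uses stand-in enumerations rather than the real `YarnApplicationState` and `FinalApplicationStatus` so that it compiles without Hadoop on the classpath. `Report`, `StatusCheck`, and both diagnostics strings are hypothetical and made up for illustration; they are not actual YARN output.

```scala
// Stand-ins for YARN's YarnApplicationState and FinalApplicationStatus
// (assumption: only the values needed for this sketch are modelled).
object AppState extends Enumeration { val RUNNING, FINISHED, FAILED, KILLED = Value }
object FinalStatus extends Enumeration { val UNDEFINED, SUCCEEDED, FAILED, KILLED = Value }

// Hypothetical, trimmed-down version of what monitorApplication returns.
final case class Report(
    appState: AppState.Value,
    finalState: FinalStatus.Value,
    diagnostics: Option[String])

object StatusCheck {
  // Status-based check: depends only on the states reported for the application.
  def failedByStatus(r: Report): Boolean =
    r.appState == AppState.FAILED || r.finalState == FinalStatus.FAILED

  // Diagnostics-based inference, mirroring the diff: depends on the exact
  // wording of a free-form message.
  def amExitedCleanly(r: Report): Boolean =
    r.diagnostics.exists(d => d.contains("AM Container") && d.contains("exitCode: 0"))

  def main(args: Array[String]): Unit = {
    // Both messages are invented; the second only differs in wording.
    val a = Report(AppState.FAILED, FinalStatus.FAILED,
      Some("AM Container for attempt_1 exited with exitCode: 0"))
    val b = a.copy(diagnostics = Some("AM container for attempt_1 exited with exit code 0"))

    println(s"a: failedByStatus=${failedByStatus(a)} amExitedCleanly=${amExitedCleanly(a)}")
    println(s"b: failedByStatus=${failedByStatus(b)} amExitedCleanly=${amExitedCleanly(b)}")
  }
}
```

Both reports carry the same states, so the status-based check agrees on them, while the string-based inference flips as soon as the message wording changes.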
Note that Hadoop also has a shutdown hook timeout configuration, which I think
is the 30 seconds you are referring to here: `hadoop.service.shutdown.timeout`.
You could increase that along with the Hadoop retries/timeouts.
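As a rough sketch of what that could look like, assuming the client picks these files up from its Hadoop configuration directory: the values below are illustrative only, and the two `yarn.resourcemanager.connect.*` properties are my guess at which retry/timeout settings are relevant, so treat them as an assumption rather than a recommendation.

```xml
<!-- core-site.xml: raise the shutdown hook timeout (the default is 30s) -->
<property>
  <name>hadoop.service.shutdown.timeout</name>
  <value>60s</value>
</property>

<!-- yarn-site.xml: how long and how often the client retries connecting to the RM -->
<property>
  <name>yarn.resourcemanager.connect.max-wait.ms</name>
  <value>900000</value>
</property>
<property>
  <name>yarn.resourcemanager.connect.retry-interval.ms</name>
  <value>30000</value>
</property>
```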
It would be nice to get more feedback on whether this is a problem for other
users, or whether other devs have opinions.