tgravescs commented on a change in pull request #34366:
URL: https://github.com/apache/spark/pull/34366#discussion_r744864128



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1277,10 +1277,14 @@ private[spark] class Client(
     } else {
       val YarnAppReport(appState, finalState, diags) = monitorApplication(appId)
       if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) {
+        var amContainerSuccess = false
         diags.foreach { err =>
+          amContainerSuccess = err.contains("AM Container") && err.contains("exitCode: 0")
           logError(s"Application diagnostics message: $err")
         }
-        throw new SparkException(s"Application $appId finished with failed status")
+        if (!amContainerSuccess) {
+          throw new SparkException(s"Application $appId finished with failed status")
+        }

Review comment:
       > There is a shutdown hook timeout of 30s.
   
   This doesn't answer my question. How long does it take for YARN to actually respond in the cases where this is slow?
   
   The way the YARN API is supposed to work, the application master is meant to unregister; the final status is not meant to be inferred from exit codes. If the AM can't unregister, YARN doesn't really know what happened. Yes, perhaps YARN should expose container exit status better; feel free to propose a better API in YARN that we can use.
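   
   A toy Scala model of the contract described above may make the point concrete. It is a sketch, not YARN's real classes: `FinalStatus`, `AmAttempt`, and `reportedStatus` are hypothetical names, and it assumes (as the `AM Container ... exitCode: 0` diagnostics targeted by this diff suggest) that an AM which exits without unregistering is reported as FAILED. In real code the AM declares its outcome through Hadoop's `AMRMClient.unregisterApplicationMaster(FinalApplicationStatus, String, String)`.
   
   ```scala
   // Toy model of the unregister contract; NOT YARN's actual classes.
   // Assumption: an AM that never unregisters is reported as FAILED,
   // regardless of its container exit code.
   object UnregisterContract {
     sealed trait FinalStatus
     case object Succeeded extends FinalStatus
     case object Failed extends FinalStatus
   
     // unregisteredWith: what the AM told the ResourceManager (None = never unregistered).
     // exitCode: the AM container's process exit code.
     final case class AmAttempt(unregisteredWith: Option[FinalStatus], exitCode: Int)
   
     // The reported status follows the unregister call; exitCode is deliberately ignored.
     def reportedStatus(attempt: AmAttempt): FinalStatus =
       attempt.unregisteredWith.getOrElse(Failed)
   
     def main(args: Array[String]): Unit = {
       // Clean shutdown: the AM unregistered, so its declared status is reported.
       println(reportedStatus(AmAttempt(Some(Succeeded), exitCode = 0))) // prints "Succeeded"
       // The case in this PR: exit code 0, but the unregister never reached the RM.
       println(reportedStatus(AmAttempt(None, exitCode = 0))) // prints "Failed"
     }
   }
   ```
   
   The model only encodes the reviewer's point: the exit code carries no weight once the unregister step is the source of truth.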
   
   Honestly, in my opinion this is a YARN cluster setup issue: if your ResourceManager frequently stops responding for long periods of time, you need to fix that. It would be great if applications could work around that, but I don't want to put something into Spark that is brittle and could cause other issues.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]