tgravescs commented on a change in pull request #34366:
URL: https://github.com/apache/spark/pull/34366#discussion_r743114402



##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
##########
@@ -1277,10 +1277,14 @@ private[spark] class Client(
     } else {
       val YarnAppReport(appState, finalState, diags) = monitorApplication(appId)
       if (appState == YarnApplicationState.FAILED || finalState == FinalApplicationStatus.FAILED) {
+        var amContainerSuccess = false
         diags.foreach { err =>
+          amContainerSuccess = err.contains("AM Container") && err.contains("exitCode: 0")
           logError(s"Application diagnostics message: $err")
         }
-        throw new SparkException(s"Application $appId finished with failed status")
+        if (!amContainerSuccess) {
+          throw new SparkException(s"Application $appId finished with failed status")
+        }

Review comment:
   Yeah, unfortunately that final application status is what YARN actually
uses and advertises, and it really wants you to unregister.
   
   How often does this really happen? It seems like if you are timing out
talking to the RM very often, you have other problems on your cluster. Or does
this happen during a rolling upgrade or something?
   
   Note, if we were to do this, the YARN final status also wouldn't match,
because doesn't that show up as failed when the AM can't unregister? That could
confuse the user.
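
   To make the mismatch concrete, here is a minimal, dependency-free sketch of
the check from the diff and of the two statuses drifting apart. It is not the
actual Client.scala code: `AmDiagnosticsSketch`, `FinalStatus`, and
`clientOutcome` are hypothetical names, the diagnostics are assumed to arrive
as an `Option[String]` (as the `diags.foreach` in the diff suggests), and the
sample diagnostics string is made up for illustration.

```scala
// Hypothetical stand-alone model of the proposed check; no YARN or Spark types.
object AmDiagnosticsSketch {
  // Assumption: diagnostics are an optional string.
  // Same substring test as the diff, written without a mutable var.
  def amContainerExitedCleanly(diags: Option[String]): Boolean =
    diags.exists(d => d.contains("AM Container") && d.contains("exitCode: 0"))

  // Stand-in for YARN's final application status; not the real enum.
  sealed trait FinalStatus
  case object Succeeded extends FinalStatus
  case object Failed extends FinalStatus

  // What the client would report under the proposed change:
  // a FAILED final status is ignored when the AM container exited with 0.
  def clientOutcome(yarnFinal: FinalStatus, diags: Option[String]): FinalStatus =
    if (yarnFinal == Failed && !amContainerExitedCleanly(diags)) Failed
    else Succeeded

  def main(args: Array[String]): Unit = {
    // Made-up diagnostics text, only shaped like the strings the diff matches on.
    val diags = Some("AM Container for attempt exited with exitCode: 0")
    val yarnFinal: FinalStatus = Failed // what YARN would still advertise
    println(s"YARN final status: $yarnFinal, client outcome: ${clientOutcome(yarnFinal, diags)}")
  }
}
```

   In this model YARN keeps advertising a failed final status while the client
returns without an exception, which is the inconsistency a user would see.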
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


