tgravescs commented on a change in pull request #33780:
URL: https://github.com/apache/spark/pull/33780#discussion_r719454162



##########
File path: docs/running-on-yarn.md
##########
@@ -441,6 +441,20 @@ To use a custom metrics.properties for the application master and executors, upd
   </td>
   <td>1.6.0</td>
 </tr>
+<tr>
+  <td><code>spark.yarn.am.clientModeTreatDisconnectAsFailed</code></td>
+  <td>false</td>
+  <td>
+  In managed yarn-client mode, when am disconnect with driver, am will finish application with SUCCESS final status since in 

Review comment:
       How about something like this:
   
   Treat yarn-client unclean disconnects as failures. In yarn-client mode,
   the application normally always finishes with a final status of SUCCESS
   because, in some cases, it is not possible to know whether the application
   was terminated intentionally by the user or whether there was a real error.
   This config changes that behavior so that if the Application Master
   disconnects from the driver uncleanly (i.e., without the proper shutdown
   handshake), the application terminates with a final status of FAILED. This
   allows the caller to decide whether it was truly a failure. Note that if
   this config is set and the user terminates the client application abruptly,
   the application may show a status of FAILED when it did not really fail.
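
   To make the intended behavior concrete, here is a minimal sketch of the decision described above, mirroring the `shutdown || !clientModeTreatDisconnectAsFailed` condition in the diff. It is an illustration only, not the ApplicationMaster code: `DisconnectStatusSketch`, `FinalStatus`, `Succeeded`, `Failed`, and `statusOnDisconnect` are hypothetical names standing in for YARN's `FinalApplicationStatus` and the AM's `finish(...)` call.

```scala
object DisconnectStatusSketch {
  // Hypothetical stand-ins for YARN's FinalApplicationStatus values.
  sealed trait FinalStatus
  case object Succeeded extends FinalStatus
  case object Failed extends FinalStatus

  // cleanShutdown: the driver completed the shutdown handshake before disconnecting.
  // treatDisconnectAsFailed: the value of spark.yarn.am.clientModeTreatDisconnectAsFailed.
  def statusOnDisconnect(cleanShutdown: Boolean, treatDisconnectAsFailed: Boolean): FinalStatus =
    if (cleanShutdown || !treatDisconnectAsFailed) Succeeded else Failed

  def main(args: Array[String]): Unit = {
    // Print the outcome for every combination of the two inputs.
    for (clean <- Seq(true, false); flag <- Seq(true, false))
      println(s"cleanShutdown=$clean treatDisconnectAsFailed=$flag -> ${statusOnDisconnect(clean, flag)}")
  }
}
```

   Only the combination of an unclean disconnect with the config enabled yields a failed status; with the default of `false`, the existing behavior is unchanged.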

##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
##########
@@ -784,6 +784,9 @@ private[spark] class ApplicationMaster(
    */
   private class AMEndpoint(override val rpcEnv: RpcEnv, driver: RpcEndpointRef)
     extends RpcEndpoint with Logging {
+    private var shutdown = false

Review comment:
       make shutdown @volatile
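
       The concern, presumably, is that the flag may be written by one thread and read by another (for example, a message handler versus the disconnect callback); without `@volatile` the reader is not guaranteed to see the write. A minimal sketch of the pattern, assuming that threading model; `VolatileFlagSketch`, `Endpoint`, `markShutdown`, `isShutdown`, and `readerSeesShutdown` are hypothetical names, not the AMEndpoint API:

```scala
object VolatileFlagSketch {
  final class Endpoint {
    // @volatile guarantees that a write by one thread is visible to later reads
    // on other threads.
    @volatile private var shutdown = false

    def markShutdown(): Unit = { shutdown = true }
    def isShutdown: Boolean = shutdown
  }

  // Starts a reader thread that spins until it observes the flag, sets the flag
  // from the calling thread, and reports whether the reader stopped in time.
  def readerSeesShutdown(timeoutMs: Long): Boolean = {
    val endpoint = new Endpoint
    val reader = new Thread(new Runnable {
      def run(): Unit = while (!endpoint.isShutdown) {}
    })
    reader.setDaemon(true) // do not keep the JVM alive if the reader never stops
    reader.start()
    endpoint.markShutdown()
    reader.join(timeoutMs)
    !reader.isAlive
  }
}
```

       Without the annotation, the JIT is allowed to hoist the read out of the spin loop, so the reader could spin indefinitely.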

##########
File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
##########
@@ -843,8 +848,13 @@ private[spark] class ApplicationMaster(
       // In cluster mode or unmanaged am case, do not rely on the disassociated event to exit
       // This avoids potentially reporting incorrect exit codes if the driver fails
       if (!(isClusterMode || sparkConf.get(YARN_UNMANAGED_AM))) {
-        logInfo(s"Driver terminated or disconnected! Shutting down. $remoteAddress")
-        finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+        if (shutdown || !clientModeTreatDisconnectAsFailed) {
+          logInfo(s"Driver terminated or disconnected! Shutting down. $remoteAddress")
+          finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
+        } else {
+          logError(s"Application Master lose connection with driver! Shutting down. $remoteAddress")

Review comment:
       Replace "lose" with "lost".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


