Ngone51 commented on a change in pull request #28258:
URL: https://github.com/apache/spark/pull/28258#discussion_r419464952



##########
File path: core/src/main/scala/org/apache/spark/deploy/Client.scala
##########
@@ -124,38 +127,57 @@ private class ClientEndpoint(
     }
   }
 
-  /* Find out driver status then exit the JVM */
+  /**
+   * Find out driver status then exit the JVM. If the waitAppCompletion is set to true, monitors
+   * the application until it finishes, fails or is killed.
+   */
   def pollAndReportStatus(driverId: String): Unit = {
     // Since ClientEndpoint is the only RpcEndpoint in the process, blocking the event loop thread
     // is fine.
     logInfo("... waiting before polling master for driver state")
     Thread.sleep(5000)
     logInfo("... polling master for driver state")
-    val statusResponse =
-      activeMasterEndpoint.askSync[DriverStatusResponse](RequestDriverStatus(driverId))
-    if (statusResponse.found) {
-      logInfo(s"State of $driverId is ${statusResponse.state.get}")
-      // Worker node, if present
-      (statusResponse.workerId, statusResponse.workerHostPort, statusResponse.state) match {
-        case (Some(id), Some(hostPort), Some(DriverState.RUNNING)) =>
-          logInfo(s"Driver running on $hostPort ($id)")
-        case _ =>
-      }
-      // Exception, if present
-      statusResponse.exception match {
-        case Some(e) =>
-          logError(s"Exception from cluster was: $e")
-          e.printStackTrace()
-          System.exit(-1)
-        case _ =>
-          System.exit(0)
+    while (true) {

Review comment:
       Hey, please pay attention to my comment here. I believe the current implementation could block `ClientEndpoint` because it's a `ThreadSafeRpcEndpoint`, which processes its messages one at a time. When `waitAppCompletion` is enabled, `ClientEndpoint` would keep handling the `SubmitDriverResponse` message until the application finishes, so it cannot handle other messages such as `RemoteProcessDisconnected` or `RemoteProcessConnectionError` in the meantime, which breaks the current behaviour. Furthermore, it could also block messages from backup masters, though that is not fatal in this case.
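To illustrate the concern, here is a self-contained Scala sketch that does not depend on Spark. It assumes a simplified model of a `ThreadSafeRpcEndpoint`: a single-thread executor acts as the endpoint's inbox, so handlers run strictly one after another. `SubmitDriverResponse` and `RemoteProcessDisconnected` are hypothetical case objects that only borrow the message names above, and `SingleThreadedEndpoint`, `pollInBackground` and `pollUntilFinished` are illustrative names, not Spark APIs. The sketch compares polling inside the handler with handing the polling loop to a separate thread.

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}
import scala.collection.mutable.ArrayBuffer

object EndpointBlockingSketch {
  // Hypothetical stand-ins that borrow the message names from the review;
  // these are NOT Spark's real RPC classes.
  sealed trait Message
  case object SubmitDriverResponse extends Message
  case object RemoteProcessDisconnected extends Message

  /** Simplified model of a ThreadSafeRpcEndpoint: a single inbox thread
    * processes messages strictly one after another. */
  final class SingleThreadedEndpoint(pollInBackground: Boolean) {
    private val inbox: ExecutorService = Executors.newSingleThreadExecutor()
    // Separate thread, used only by the non-blocking variant.
    private val poller: ExecutorService = Executors.newSingleThreadExecutor()
    private val events = ArrayBuffer.empty[String]

    private def record(event: String): Unit = events.synchronized { events += event }
    def recorded: List[String] = events.synchronized { events.toList }

    /** Enqueue a message; it is handled later on the inbox thread. */
    def send(msg: Message): Unit = inbox.execute(() => receive(msg))

    private def receive(msg: Message): Unit = msg match {
      case SubmitDriverResponse =>
        if (pollInBackground) {
          // Hand the monitoring loop to another thread; the inbox is free again immediately.
          poller.execute(() => pollUntilFinished())
        } else {
          // Blocking variant: the inbox thread itself loops until the app ends,
          // so every message queued behind this one has to wait.
          pollUntilFinished()
        }
      case RemoteProcessDisconnected =>
        record("disconnect handled")
    }

    // Simulated "poll the master for driver state" loop: the app ends after 3 polls.
    private def pollUntilFinished(): Unit = {
      var polls = 0
      while (polls < 3) {
        Thread.sleep(50)
        polls += 1
      }
      record("app finished")
    }

    /** Drain the inbox first (so any poller task is already submitted), then the poller. */
    def awaitIdle(): Unit = {
      inbox.shutdown()
      inbox.awaitTermination(5, TimeUnit.SECONDS)
      poller.shutdown()
      poller.awaitTermination(5, TimeUnit.SECONDS)
    }
  }

  /** Send the two messages back to back and return the order in which they took effect. */
  def run(pollInBackground: Boolean): List[String] = {
    val endpoint = new SingleThreadedEndpoint(pollInBackground)
    endpoint.send(SubmitDriverResponse)
    endpoint.send(RemoteProcessDisconnected)
    endpoint.awaitIdle()
    endpoint.recorded
  }

  def main(args: Array[String]): Unit = {
    println(s"blocking handler:   ${run(pollInBackground = false)}")
    println(s"background polling: ${run(pollInBackground = true)}")
  }
}
```

With `pollInBackground = false`, the disconnect is only handled after the simulated application has finished, which is the blocking behaviour described in the comment. Moving the loop onto its own thread lets the endpoint react to connection events while monitoring continues. This is one possible way to keep the endpoint responsive, not necessarily the approach the PR ends up taking.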




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.