viirya commented on a change in pull request #28245: [SPARK-31472][CORE] Make sure Barrier Task always return messages or exception with abortableRpcFuture check
URL: https://github.com/apache/spark/pull/28245#discussion_r410977586
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
 ##########
 @@ -85,28 +84,28 @@ class BarrierTaskContext private[spark] (
         // BarrierCoordinator on timeout, instead of RPCTimeoutException from the RPC framework.
         timeout = new RpcTimeout(365.days, "barrierTimeout"))
 
-      // messages which consist of all barrier tasks' messages
-      var messages: Array[String] = null
       // Wait the RPC future to be completed, but every 1 second it will jump out waiting
       // and check whether current spark task is killed. If killed, then throw
       // a `TaskKilledException`, otherwise continue wait RPC until it completes.
-      try {
-        while (!abortableRpcFuture.toFuture.isCompleted) {
+
+      // import scala Success locally to avoid conflict with org.apache.spark.Success
+      import scala.util.{Failure, Success, Try}
+      while (!abortableRpcFuture.future.isCompleted) {
+        try {
           // wait RPC future for at most 1 second
-          try {
-            messages = ThreadUtils.awaitResult(abortableRpcFuture.toFuture, 1.second)
-          } catch {
-            case _: TimeoutException | _: InterruptedException =>
-              // If `TimeoutException` thrown, waiting RPC future reach 1 second.
-              // If `InterruptedException` thrown, it is possible this task is killed.
-              // So in this two cases, we should check whether task is killed and then
-              // throw `TaskKilledException`
-              taskContext.killTaskIfInterrupted()
+          Thread.sleep(1000)
+        } catch {
+          case _: InterruptedException => // task is killed by driver
+        } finally {
+          Try(taskContext.killTaskIfInterrupted()) match {
+            case Success(_) => // task is still running healthily
+            case Failure(e) => abortableRpcFuture.abort(e)
           }
         }
-      } finally {
-        abortableRpcFuture.abort(taskContext.getKillReason().getOrElse("Unknown reason."))
 
 Review comment:
   Just out of curiosity, why did we previously call `abortableRpcFuture.abort` even when the RPC future completed normally?
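
   To make the question concrete, here is a minimal sketch of the polling pattern in the new code. `AbortableFuture`, `AbortDemo`, `waitFor`, and `checkKilled` are hypothetical stand-ins built on a plain `scala.concurrent.Promise`; they are not Spark's `AbortableRpcFuture` or `TaskContext`, and whether the real `abort` is a no-op on an already completed future is an assumption of this model, not something the diff confirms.

```scala
import scala.concurrent.{Future, Promise}
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for an abortable RPC future (NOT Spark's class).
final class AbortableFuture[T] {
  private val promise = Promise[T]()
  def future: Future[T] = promise.future
  // Both methods return false if the promise was already completed.
  def complete(value: T): Boolean = promise.trySuccess(value)
  def abort(t: Throwable): Boolean = promise.tryFailure(t)
}

object AbortDemo {
  // Poll until the future completes. `checkKilled` is assumed to throw
  // when the task has been killed, mirroring killTaskIfInterrupted().
  def waitFor[T](f: AbortableFuture[T],
                 checkKilled: () => Unit,
                 pollMillis: Long = 10L): Try[T] = {
    while (!f.future.isCompleted) {
      try Thread.sleep(pollMillis)
      catch { case _: InterruptedException => () } // possibly killed; checked below
      finally {
        Try(checkKilled()) match {
          case Success(_) => ()          // task still running
          case Failure(e) => f.abort(e)  // abort only on the kill path
        }
      }
    }
    // The loop exits only once the future is completed, so value is defined.
    f.future.value.get
  }
}
```

   In this model, calling `abort` after a normal completion returns `false` and changes nothing, so an unconditional abort in an outer `finally` would be redundant rather than harmful. If the real implementation behaves differently, that would be the reason to keep the abort on the failure path only.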

