jjayadeep06 commented on code in PR #50020:
URL: https://github.com/apache/spark/pull/50020#discussion_r1980609388


##########
core/src/main/scala/org/apache/spark/BarrierCoordinator.scala:
##########
@@ -122,23 +124,40 @@ private[spark] class BarrierCoordinator(
     // Init a TimerTask for a barrier() call.
     private def initTimerTask(state: ContextBarrierState): Unit = {
       timerTask = new TimerTask {
-        override def run(): Unit = state.synchronized {
-          // Timeout current barrier() call, fail all the sync requests.
-          requesters.foreach(_.sendFailure(new SparkException("The coordinator didn't get all " +
-            s"barrier sync requests for barrier epoch $barrierEpoch from $barrierId within " +
-            s"$timeoutInSecs second(s).")))
-          cleanupBarrierStage(barrierId)
-        }
+        override def run(): Unit =
+          try {
+            state.synchronized {
+              if (!Thread.currentThread().isInterrupted()) {
+                // Timeout current barrier() call, fail all the sync requests.
+                requesters.foreach(
+                  _.sendFailure(new SparkException("The coordinator didn't get all " +
+                    s"barrier sync requests for barrier epoch +" +
+                    s" $barrierEpoch from $barrierId within " +
+                    s"$timeoutInSecs second(s).")))
+                cleanupBarrierStage(barrierId)
+              }
+            }
+          } catch {
+            case _: InterruptedException =>
+              // Handle interruption gracefully
+              Thread.currentThread().interrupt()
+            case e: Exception => new SparkException("Error during " +
+              s"running of barrier tasks for " +
+              s"$barrierId", e)
+          } finally {
+            // Ensure cleanup happens even if interrupted or exception occurs
+            cleanupBarrierStage(barrierId)
+            state.clear()
+          }

Review Comment:
   I have removed `cleanupBarrierStage(barrierId)` from the `try` block and 
moved it to `finally`. I have also added comments describing what the new logic 
does. Please take a look. Essentially, this new block of code handles the thread 
interruption that is triggered when `timeTask.cancel(true)` is invoked.
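
   For reference, here is a minimal standalone sketch of the interruption path described above. It is not the code in this PR. It assumes the timeout task is scheduled on a `ScheduledExecutorService`, so that `cancel(true)` is the `Future.cancel(mayInterruptIfRunning)` call (`java.util.TimerTask.cancel()` takes no argument). The names `InterruptSketch` and `runSketch` are hypothetical, and `Thread.sleep` stands in for whatever blocking work the real task does.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical sketch: shows that cancel(true) interrupts a running task
// and that the finally block still runs afterwards.
object InterruptSketch {
  // Returns (task saw the interrupt, cleanup in finally ran).
  def runSketch(): (Boolean, Boolean) = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val started = new CountDownLatch(1)
    val done = new CountDownLatch(1)
    val sawInterrupt = new AtomicBoolean(false)
    val cleanedUp = new AtomicBoolean(false)

    val task: Runnable = new Runnable {
      override def run(): Unit =
        try {
          started.countDown()
          // Stand-in for blocking work inside the timeout task.
          Thread.sleep(10000)
        } catch {
          case _: InterruptedException =>
            sawInterrupt.set(true)
            // Restore the interrupt flag for code further up the stack.
            Thread.currentThread().interrupt()
        } finally {
          // Stand-in for cleanupBarrierStage(barrierId) and state.clear().
          cleanedUp.set(true)
          done.countDown()
        }
    }

    val future = scheduler.schedule(task, 0L, TimeUnit.MILLISECONDS)
    started.await()      // wait until the task is actually running
    future.cancel(true)  // mayInterruptIfRunning = true
    done.await(5, TimeUnit.SECONDS)
    scheduler.shutdownNow()
    (sawInterrupt.get, cleanedUp.get)
  }

  def main(args: Array[String]): Unit = {
    val (interrupted, cleaned) = runSketch()
    println(s"interrupted=$interrupted, cleanedUp=$cleaned")
  }
}
```

   The point of the sketch is only that cleanup placed in `finally` runs whether the task finishes normally or is interrupted, which is why the cleanup call needs to appear in just one place.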



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

