WeichenXu123 commented on a change in pull request #25235: [SPARK-28483][Core]
Fix canceling a spark job using barrier mode but barrier tasks blocking on
BarrierTaskContext.barrier()
URL: https://github.com/apache/spark/pull/25235#discussion_r309979408
##########
File path: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala
##########
@@ -117,12 +118,30 @@ class BarrierTaskContext private[spark] (
timer.schedule(timerTask, 60000, 60000)
try {
 -    barrierCoordinator.askSync[Unit](
 +    val cancelableRpcFuture = barrierCoordinator.askCancelable[Unit](
        message = RequestToSync(numTasks, stageId, stageAttemptNumber, taskAttemptId,
          barrierEpoch),
        // Set a fixed timeout for RPC here, so users shall get a SparkException thrown by
        // BarrierCoordinator on timeout, instead of RPCTimeoutException from the RPC framework.
        timeout = new RpcTimeout(365.days, "barrierTimeout"))
 +
 +    // Wait for the RPC future to complete, but break out of the wait every second to
 +    // check whether the current Spark task has been killed. If it has, throw a
 +    // `TaskKilledException`; otherwise keep waiting until the RPC completes.
 +    while (!taskContext.isCompleted()) {
 +      if (taskContext.isInterrupted()) {
 +        val reason = taskContext.getKillReason().get
Review comment:
No. See the `isInterrupted` implementation:
```scala
def isInterrupted(): Boolean = reasonIfKilled.isDefined
```
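
For readers following the thread, here is a minimal, self-contained sketch (plain Scala `Future`s, no Spark RPC) of the wait-and-poll pattern the diff adds, and of why `getKillReason().get` is safe once `isInterrupted()` has returned true. `FakeTaskContext`, `KilledException`, and `waitWithKillCheck` are hypothetical stand-ins for `TaskContext`, `TaskKilledException`, and the loop inside `BarrierTaskContext.barrier()`; they are not the actual Spark code.

```scala
import java.util.concurrent.TimeoutException
import java.util.concurrent.atomic.AtomicReference
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object PollingWaitSketch {

  // Stand-in for TaskKilledException.
  final case class KilledException(reason: String) extends RuntimeException(reason)

  // Stand-in for TaskContext. Mirrors the reviewer's point: isInterrupted() is just
  // reasonIfKilled.isDefined, so getKillReason().get cannot throw once
  // isInterrupted() has returned true.
  class FakeTaskContext {
    private val reasonIfKilled = new AtomicReference[Option[String]](None)
    def markKilled(reason: String): Unit = reasonIfKilled.set(Some(reason))
    def isInterrupted(): Boolean = reasonIfKilled.get().isDefined
    def getKillReason(): Option[String] = reasonIfKilled.get()
  }

  /** Wait for `rpcFuture`, but re-check the kill flag every second. */
  def waitWithKillCheck[T](taskContext: FakeTaskContext, rpcFuture: Future[T]): T = {
    while (true) {
      if (taskContext.isInterrupted()) {
        // Safe: isInterrupted() == true implies the reason is defined.
        val reason = taskContext.getKillReason().get
        throw KilledException(reason)
      }
      try {
        // Jump out of the wait after one second so the kill flag gets re-checked.
        return Await.result(rpcFuture, 1.second)
      } catch {
        case _: TimeoutException => // not done yet; loop and re-check
      }
    }
    throw new IllegalStateException("unreachable")
  }

  def main(args: Array[String]): Unit = {
    val ctx = new FakeTaskContext
    val never = Promise[Unit]().future // an "RPC" that never completes
    // Kill the "task" from another thread after ~2.5 seconds.
    new Thread(() => { Thread.sleep(2500); ctx.markKilled("stage cancelled") }).start()
    try waitWithKillCheck(ctx, never)
    catch { case KilledException(r) => println(s"task killed: $r") }
  }
}
```

Slicing the blocking wait into one-second chunks keeps the barrier call responsive to task kills without requiring the RPC layer itself to support interruption, which appears to be the intent of the change under review.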