hujiahua created SPARK-37300:
--------------------------------

             Summary: TaskSchedulerImpl should ignore a task-finished event if 
its task is already in a finished state
                 Key: SPARK-37300
                 URL: https://issues.apache.org/jira/browse/SPARK-37300
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.2.0
            Reporter: hujiahua


When an executor finishes a task of some stage, the driver receives a 
StatusUpdate event to handle it. If at the same time the driver finds that the 
executor's heartbeat has timed out, the driver also has to handle an 
ExecutorLost event. There is a race condition here that can leave 
TaskSetManager.successful and TaskSetManager.tasksSuccessful with wrong results.

The problem is that TaskResultGetter.enqueueSuccessfulTask handles the 
successful task on an asynchronous thread, which means the synchronized lock of 
TaskSchedulerImpl is released midway through 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala#L61.
 So TaskSchedulerImpl may handle executorLost first, after which the 
asynchronous thread goes on to handle the successful task. This leaves 
TaskSetManager.successful and TaskSetManager.tasksSuccessful with wrong results.
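A minimal sketch of the proposed guard, assuming a simplified stand-in for the task set manager (TinyTaskSetManager, TaskState, and the method names here are illustrative, not Spark's actual classes): the task-finished handler checks the task's state and ignores the event if executorLost has already moved the task out of the running state.

```scala
// Illustrative stand-ins only; Spark's real TaskSetManager/TaskState differ.
object TaskState extends Enumeration {
  val Running, Finished, Failed = Value
}

class TinyTaskSetManager {
  private val states = scala.collection.mutable.Map[Int, TaskState.Value]()
  var tasksSuccessful = 0

  def launch(taskId: Int): Unit = synchronized {
    states(taskId) = TaskState.Running
  }

  // Driver handles the heartbeat timeout: every running task on the lost
  // executor is marked Failed (and would later be resubmitted).
  def executorLost(taskIds: Seq[Int]): Unit = synchronized {
    taskIds.foreach { id =>
      if (states(id) == TaskState.Running) states(id) = TaskState.Failed
    }
  }

  // A late StatusUpdate delivered by the asynchronous TaskResultGetter
  // thread. Without the state check, it would still bump tasksSuccessful
  // even though executorLost already marked the task Failed.
  def taskFinished(taskId: Int): Unit = synchronized {
    if (states(taskId) == TaskState.Running) { // ignore if already finished
      states(taskId) = TaskState.Finished
      tasksSuccessful += 1
    }
  }
}
```

In the buggy interleaving described above, executorLost runs while the result getter's thread is still deserializing the result outside the lock; with the guard, the late taskFinished call becomes a no-op and tasksSuccessful stays correct.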



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
