Ngone51 commented on issue #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#issuecomment-459942841

Oh, yes. I can see that this is definitely a potential race condition between the `dag-scheduler-event-loop` and `task-result-getter` threads, since both access `tasksSuccessful` and `successful`.

> What if we make successful into a ConcurrentHashMap?

That would resolve the race condition, but the scheduling-throughput problem @squito pointed out would still remain.

How about this approach (actually, it was already mentioned above): first go back to #21131 and keep #21131's behavior unchanged. In `TaskSchedulerImpl` we maintain a map, e.g. called `stageIdToFinishedPartitions`. Each time we call `sched.markPartitionCompletedInAllTaskSets(stageId, tasks(index).partitionId, info)`, we do one extra thing: record the finished partitionId in `stageIdToFinishedPartitions`. Then, whenever `TaskSchedulerImpl` creates a new `TaskSetManager`, we first exclude the `Task`s corresponding to finished partitions by looking them up in `stageIdToFinishedPartitions` (rough sketch at the end of this comment).

This way, if an active `tsm` exists when we call `sched.markPartitionCompletedInAllTaskSets`, it learns about the finished partition and won't launch a duplicate task (or kills the task if it is already running). If no active `tsm` exists at that moment, the new one is still notified of the finished partitions when we create it later.

WDYT? @squito
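To make the idea concrete, here is a minimal, standalone sketch of the bookkeeping I have in mind. The class and method names mirror the real `TaskSchedulerImpl` / `TaskSetManager`, but the bodies are simplified stand-ins rather than the actual Spark internals (e.g. the `TaskInfo` argument of `markPartitionCompletedInAllTaskSets` and the stage-attempt bookkeeping are dropped for brevity):

```scala
import scala.collection.mutable

// Simplified stand-in for the real TaskSetManager; only the piece relevant to
// the stageIdToFinishedPartitions bookkeeping is modeled here.
class TaskSetManager(val stageId: Int, numPartitions: Int) {
  private val finished = new Array[Boolean](numPartitions)

  // Called when some (possibly earlier, zombie) attempt finished this partition,
  // so this manager neither launches a duplicate task nor waits for one.
  def markPartitionCompleted(partitionId: Int): Unit = {
    finished(partitionId) = true
  }

  def isPartitionFinished(partitionId: Int): Boolean = finished(partitionId)
}

// Simplified stand-in for the real TaskSchedulerImpl.
class TaskSchedulerImpl {
  // stageId -> task set managers for the attempts of that stage
  private val taskSetsByStageId =
    new mutable.HashMap[Int, mutable.ArrayBuffer[TaskSetManager]]

  // The extra map from the proposal: remembers finished partitions per stage even
  // when no active TaskSetManager exists at the time the completion arrives.
  private val stageIdToFinishedPartitions =
    new mutable.HashMap[Int, mutable.HashSet[Int]]

  def markPartitionCompletedInAllTaskSets(stageId: Int, partitionId: Int): Unit = synchronized {
    // 1. Record the completion so future attempts can see it.
    stageIdToFinishedPartitions.getOrElseUpdate(stageId, mutable.HashSet.empty[Int]) += partitionId
    // 2. Tell every currently active attempt about it, as #21131 already does.
    taskSetsByStageId.getOrElse(stageId, mutable.ArrayBuffer.empty[TaskSetManager])
      .foreach(tsm => tsm.markPartitionCompleted(partitionId))
  }

  def createTaskSetManager(stageId: Int, numPartitions: Int): TaskSetManager = synchronized {
    val tsm = new TaskSetManager(stageId, numPartitions)
    // A new attempt starts with everything the scheduler already knows is done,
    // so completions that arrived before this attempt was created are not re-run.
    stageIdToFinishedPartitions.getOrElse(stageId, mutable.HashSet.empty[Int])
      .foreach(p => tsm.markPartitionCompleted(p))
    taskSetsByStageId.getOrElseUpdate(stageId, mutable.ArrayBuffer.empty[TaskSetManager]) += tsm
    tsm
  }
}
```

(We would also need to clean up `stageIdToFinishedPartitions` once the stage itself is done, otherwise the map grows without bound.)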
