GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/22923
[SPARK-25910][CORE] accumulator updates from previous stage attempt should not log error

## What changes were proposed in this pull request?

For shuffle map stages, we may have multiple attempts, while only the latest attempt is active. However, the scheduler still accepts successful tasks from previous attempts, to speed up the execution.

Each stage attempt has a `StageInfo` instance, which contains `TaskMetrics`. `TaskMetrics` has a bunch of accumulators to track metrics such as CPU time. However, a stage only keeps the `StageInfo` of the latest attempt, which means the `StageInfo` instances of previous attempts will be GCed, and the accumulators in their `TaskMetrics` will be cleaned up.

This causes a problem: when the scheduler accepts a successful task from a previous attempt and tries to update accumulators, it may fail to get the accumulators from `AccumulatorContext`, because they have already been cleaned. We may then hit an error log like:

```
18/10/21 15:30:24 INFO ContextCleaner: Cleaned accumulator 2868 (name: internal.metrics.executorDeserializeTime)
18/10/21 15:30:24 ERROR DAGScheduler: Failed to update accumulators for task 7927
org.apache.spark.SparkException: attempted to access non-existent accumulator 2868
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1267)
	...
```

This PR proposes a simple fix: when the scheduler receives successful tasks from previous attempts, don't update accumulators. Accumulators of previous stage attempts are no longer tracked, so we don't need to update them.

## How was this patch tested?

A new test.
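The failure and the fix described in this PR can be illustrated with a minimal sketch. This is a simplified, self-contained model, not Spark's actual `DAGScheduler` code: `AccumulatorRegistry`, `StageTracker`, `TaskSucceeded`, and `handleSuccess` are hypothetical names, the registry stands in for `AccumulatorContext`, and `IllegalStateException` stands in for `SparkException`. It shows that applying a late task's updates after its accumulator was cleaned fails, while comparing the task's stage attempt with the latest attempt lets the scheduler skip those updates.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical stand-in for AccumulatorContext: accumulator ids map to running
// totals, and an entry disappears once its owner (an old StageInfo) is cleaned.
final class AccumulatorRegistry {
  private val accums = mutable.Map.empty[Long, Long]

  def register(id: Long): Unit = accums(id) = 0L
  def clean(id: Long): Unit = accums -= id
  def get(id: Long): Option[Long] = accums.get(id)

  // Mirrors the error in the log: updating a cleaned accumulator throws.
  def add(id: Long, delta: Long): Unit = accums.get(id) match {
    case Some(current) => accums(id) = current + delta
    case None =>
      throw new IllegalStateException(s"attempted to access non-existent accumulator $id")
  }
}

// Hypothetical completion event: the stage attempt that ran the task and the
// accumulator updates it reports (accumulator id -> delta).
final case class TaskSucceeded(taskId: Long, stageAttemptId: Int, accumUpdates: Map[Long, Long])

final class StageTracker(registry: AccumulatorRegistry) {
  // Only the latest attempt's accumulators are still registered.
  var latestAttemptId: Int = 0

  // Returns true if the accumulator updates were applied.
  def handleSuccess(event: TaskSucceeded): Boolean =
    if (event.stageAttemptId == latestAttemptId) {
      event.accumUpdates.foreach { case (id, delta) => registry.add(id, delta) }
      true
    } else {
      // Late success from a previous attempt: the task result can still be
      // used, but its accumulators are no longer tracked, so skip the update.
      false
    }
}

// Attempt 0 owns accumulator 1; the retry (attempt 1) owns accumulator 2,
// and attempt 0's accumulator has already been cleaned.
val registry = new AccumulatorRegistry
val tracker = new StageTracker(registry)
registry.register(1L)
registry.register(2L)
tracker.latestAttemptId = 1
registry.clean(1L)

val lateTask = TaskSucceeded(taskId = 7927L, stageAttemptId = 0, accumUpdates = Map(1L -> 5L))
val currentTask = TaskSucceeded(taskId = 8000L, stageAttemptId = 1, accumUpdates = Map(2L -> 3L))

// Without the attempt check, applying the late task's updates fails.
val naive = Try(lateTask.accumUpdates.foreach { case (id, d) => registry.add(id, d) })

val lateApplied = tracker.handleSuccess(lateTask)
val currentApplied = tracker.handleSuccess(currentTask)

println(naive.isFailure)   // true
println(lateApplied)       // false: skipped, no exception
println(currentApplied)    // true
println(registry.get(2L))  // Some(3)
```

The design mirrors the PR's reasoning: the late task's output is still accepted, and only its metric updates are dropped, because nothing reads the previous attempt's accumulators anymore.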
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark late-task

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22923.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22923

----
commit 07f900cf845662186f8d1daea3be9abe2633d5c0
Author: Wenchen Fan <wenchen@...>
Date:   2018-11-01T15:40:14Z

    accumulator updates from previous stage attempt

commit 4d9cbe043604e76b6367e4ecb42d0d36437d1792
Author: Wenchen Fan <wenchen@...>
Date:   2018-11-01T16:04:41Z

    different fix

----