GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/19877
[SPARK-22681] Accumulator should only be updated once for each task in result stage

## What changes were proposed in this pull request?

As the doc says, "For accumulator updates performed inside actions only, Spark guarantees that each task's update to the accumulator will only be applied once, i.e. restarted tasks will not update the value." But currently the code doesn't guarantee this.

## How was this patch tested?

Newly added tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/carsonwang/spark fixAccum

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19877.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19877

----

commit 882126c2671e1733d572350af9749e9f8bdca1c2
Author: Carson Wang <carson.w...@intel.com>
Date: 2017-12-04T12:23:14Z

    Do not update accumulator for resubmitted task in result stage

----

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
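The at-most-once guarantee the PR targets can be sketched in a few lines: a driver-side scheduler should apply a result-stage task's accumulator update only for the first successful attempt of each partition, so a resubmitted (restarted) task's update is dropped. The following is a minimal Python sketch of that idea, not Spark's actual scheduler code; the `Scheduler` class and `task_end` method names are hypothetical.

```python
# Hypothetical sketch of the dedup logic described in the PR: apply a
# result-stage task's accumulator update at most once per partition.
class Scheduler:
    def __init__(self, num_partitions):
        self.accumulator = 0
        self.num_partitions = num_partitions
        # Partitions whose task has already contributed an accumulator update.
        self.updated_partitions = set()

    def task_end(self, partition, accum_update):
        # A resubmitted task for an already-completed partition must not
        # update the accumulator again.
        if partition in self.updated_partitions:
            return
        self.updated_partitions.add(partition)
        self.accumulator += accum_update

sched = Scheduler(num_partitions=2)
sched.task_end(0, 10)  # first attempt of partition 0
sched.task_end(1, 5)   # first attempt of partition 1
sched.task_end(0, 10)  # resubmitted task for partition 0: ignored
print(sched.accumulator)  # 15, not 25
```

The key design point is that deduplication is keyed on the partition (i.e. the task's identity within the stage), not on the task attempt, so retries cannot double-count.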