GitHub user carsonwang opened a pull request:
https://github.com/apache/spark/pull/19877
[SPARK-22681] Accumulator should only be updated once for each task in result stage
## What changes were proposed in this pull request?
As the documentation says: "For accumulator updates performed inside actions only,
Spark guarantees that each task's update to the accumulator will only be
applied once, i.e. restarted tasks will not update the value."
However, the current code does not guarantee this.
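To illustrate the documented behavior, here is a minimal Scala sketch (not part of this PR; the app name and accumulator name are illustrative assumptions). Updates made inside an action fall under the exactly-once guarantee, while updates made inside a transformation may be reapplied if a task is re-executed:

```scala
import org.apache.spark.sql.SparkSession

object AccumulatorExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical setup for illustration only.
    val spark = SparkSession.builder().appName("AccumulatorExample").getOrCreate()
    val sc = spark.sparkContext

    val acc = sc.longAccumulator("counted")

    // Inside an action (foreach): each task's update should be applied
    // exactly once, even if the task is restarted.
    sc.parallelize(1 to 100).foreach(_ => acc.add(1))
    println(s"count = ${acc.value}") // expected: 100

    // Inside a transformation (map): a resubmitted task may apply its
    // update again, so the accumulated value can overcount.
    val mapped = sc.parallelize(1 to 100).map { x => acc.add(1); x }
    mapped.count()

    spark.stop()
  }
}
```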
## How was this patch tested?
Newly added tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/carsonwang/spark fixAccum
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19877.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19877
----
commit 882126c2671e1733d572350af9749e9f8bdca1c2
Author: Carson Wang <[email protected]>
Date: 2017-12-04T12:23:14Z
Do not update accumulator for resubmitted task in result stage
----