GitHub user squito opened a pull request:
https://github.com/apache/spark/pull/9214
[SPARK-8029][core][wip] first successful shuffle task always wins
Shuffle writers now write to temp files and, when they are done, atomically
move those files into the final location *only if those files don't already
exist*. This way, if one executor ends up running more than one attempt of the
task that generates shuffle output for a partition, the first attempt to
succeed "wins" and the output of all other attempts is ignored.
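A minimal Python sketch of the "first one wins" commit step described above. It assumes a single executor process in which competing task attempts run as threads. The class name mirrors the `ShuffleOutputCoordinator` named in the first commit, but the `commit_outputs` method, its list of `(tmp_path, dest_path)` pairs, and the rule that any existing destination file means the partition is already committed are hypothetical illustrations, not the PR's actual Scala code.

```python
import os
import threading


class ShuffleOutputCoordinator:
    """Hypothetical sketch: the first task attempt to commit its files wins."""

    # One lock per process: makes "check destination, then rename" atomic
    # with respect to other threads (task attempts) in the same executor.
    _lock = threading.Lock()

    @classmethod
    def commit_outputs(cls, tmp_to_dest):
        """tmp_to_dest: list of (tmp_path, dest_path) written by one attempt.

        Returns True if this attempt's files were moved into place, or False
        if an earlier attempt already committed output for this partition.
        """
        with cls._lock:
            # Assumption: any existing destination means another attempt won.
            if any(os.path.exists(dest) for _, dest in tmp_to_dest):
                # Losing attempt: discard its temp files, keep the winner's.
                for tmp, _ in tmp_to_dest:
                    if os.path.exists(tmp):
                        os.remove(tmp)
                return False
            for tmp, dest in tmp_to_dest:
                # Atomic rename when tmp and dest share a filesystem.
                os.replace(tmp, dest)
            return True
```

The lock is what turns the existence check plus rename into one atomic step; it only coordinates threads inside one process, which matches the single-executor case the PR targets. `os.replace` is an atomic rename only when the temp file and the final location are on the same filesystem.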
TODO
- [ ] make sure I'm using the right compression / temp block sizes, per
SPARK-3426
- [ ] run some fault-injection tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/squito/spark SPARK-8029_first_wins
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9214.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9214
----
commit 6140e426f045967e107451336005887e144f6e39
Author: Imran Rashid
Date: 2015-10-21T19:26:26Z
ShuffleWriters write to temp file, then go through
ShuffleOutputCoordinator to atomically move w/ "first one wins"
commit 5854ac8a68474b595c9f02d895f2bb3c2eb59c5a
Author: Imran Rashid
Date: 2015-10-22T03:17:17Z
assorted cleanup
commit c3e4456788e4f6a10d07f5ff47eb4d6a8d19f543
Author: Imran Rashid
Date: 2015-10-22T03:19:07Z
style
----