GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/8789
[SPARK-10381] Fix mixup of taskAttemptNumber & attemptId in
OutputCommitCoordinator (branch-1.4 backport)
This is a backport of #8544 to `branch-1.4` for inclusion in 1.4.2.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark SPARK-10381-1.4
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8789.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8789
----
commit 4bfbfd6fa59f544d034a7abc5cd31428342053eb
Author: Josh Rosen <[email protected]>
Date: 2015-09-16T00:11:21Z
[SPARK-10381] Fix mixup of taskAttemptNumber & attemptId in
OutputCommitCoordinator
Consider a scenario with speculative execution enabled in which the
authorized committer of a particular output partition fails during the
OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator is
supposed to release that committer's exclusive commit lock once the task
fails. However, due to an identifier mismatch (the task attempt number was used
in one place and the task attempt id in another), the lock is never released,
causing Spark to go into an infinite retry loop.
This bug was masked because the OutputCommitCoordinator does not have enough
end-to-end tests (the current tests use many mocks). Another contributing
factor is that we have many similarly named identifiers with different
semantics but the same data type (e.g. attemptNumber and taskAttemptId), and
inconsistent variable naming makes them difficult to distinguish.
This patch adds a regression test and fixes this bug by always using task
attempt numbers throughout this code.
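The failure mode can be illustrated with a toy model. This is a minimal sketch
written in Python for brevity; it is not Spark's OutputCommitCoordinator. The
class, the method names, and the values 0 and 17 are hypothetical, and which
side of the real code used which identifier is not stated here, so the sketch
picks one direction arbitrarily.

```python
from typing import Dict


class ToyCommitCoordinator:
    """Toy per-partition commit lock (hypothetical, not Spark's real class)."""

    def __init__(self) -> None:
        # Maps partition -> identifier of the attempt holding the commit lock.
        self.authorized: Dict[int, int] = {}

    def can_commit(self, partition: int, attempt: int) -> bool:
        # The first attempt to ask gets the lock; everyone else is denied
        # until the lock is released.
        if partition in self.authorized:
            return False
        self.authorized[partition] = attempt
        return True

    def task_failed(self, partition: int, attempt: int) -> None:
        # Release the lock only if the failed attempt is the lock holder.
        if self.authorized.get(partition) == attempt:
            del self.authorized[partition]


def retry_can_commit(release_key: str) -> bool:
    """Return True if a retry may commit after the first committer fails."""
    coordinator = ToyCommitCoordinator()
    # Made-up values: attempt number 0 within the task, globally unique id 17.
    first = {"attempt_number": 0, "task_attempt_id": 17}

    # The lock is acquired using the attempt number.
    assert coordinator.can_commit(0, first["attempt_number"])
    # The committer fails; the failure is reported with the chosen identifier.
    coordinator.task_failed(0, first[release_key])
    # A retry (attempt number 1) now asks for permission to commit.
    return coordinator.can_commit(0, 1)


buggy = retry_can_commit("task_attempt_id")  # mismatched ids: lock is stuck
fixed = retry_can_commit("attempt_number")   # consistent ids: lock released
print(buggy, fixed)
```

Because both identifiers are plain integers, nothing flags the mismatched
call; the comparison simply never matches, so every retry is denied.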
Author: Josh Rosen <[email protected]>
Closes #8544 from JoshRosen/SPARK-10381.
(cherry picked from commit 38700ea40cb1dd0805cc926a9e629f93c99527ad)
Signed-off-by: Josh Rosen <[email protected]>
----