GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/3849
[SPARK-4014] Change TaskContext.attemptId to return attempt number instead
of task ID
This patch modifies `TaskContext.attemptId` to return an attempt number,
which conveys how many times a task has been attempted, instead of a taskId,
which uniquely identifies a particular task attempt within a particular
SparkContext. Before this change, a task had no way to tell whether it was
a re-attempt or a speculative copy, which made it difficult to write unit
tests for tasks that fail on early attempts or for speculative tasks that
complete faster than the original tasks.
I've introduced a new `TaskContext.taskId` field which returns the old
value.
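To illustrate the distinction between the two values, here is a minimal toy sketch in Python. It is not Spark code: `ToyTaskContext`, `run_with_retries`, and `flaky_task` are hypothetical names, and the 0-based attempt numbering is an assumption made for the example. It shows why a test that fails only on the first attempt needs an attempt number and cannot be written against a globally unique task ID.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ToyTaskContext:
    """Hypothetical stand-in for a task context; not Spark's class."""
    task_id: int         # unique for every attempt run by this toy scheduler
    attempt_number: int  # attempts made before this one (assumed 0-based)


# Global counter so every attempt, of any task, gets a distinct task_id.
_next_task_id = count()


def run_with_retries(task, max_attempts=4):
    """Run task(ctx) until it succeeds; each attempt gets a fresh task_id."""
    for attempt_number in range(max_attempts):
        ctx = ToyTaskContext(task_id=next(_next_task_id),
                             attempt_number=attempt_number)
        try:
            return task(ctx), ctx
        except RuntimeError:
            continue  # retry with the next attempt number
    raise RuntimeError("task failed on every attempt")


def flaky_task(ctx):
    """Fails on the first attempt only, keyed on the attempt number."""
    if ctx.attempt_number == 0:
        raise RuntimeError("simulated failure on first attempt")
    return "ok"
```

Calling `run_with_retries(flaky_task)` twice succeeds on the second attempt both times, while the `task_id` values keep increasing across the two runs, so the task ID alone says nothing about retries.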
Most of this patch is fairly straightforward, but there is a bit of
trickiness related to Mesos tasks: since there's no field in MesosTaskInfo to
encode the attemptId, I packed it into the `data` field alongside the task
binary.
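The general idea of packing an extra integer alongside a binary payload can be sketched as follows. This is only an illustration in Python under assumed conventions: the 4-byte big-endian header and the names `pack_task_data` and `unpack_task_data` are hypothetical, and the patch's actual wire format may differ.

```python
import struct

# Assumed layout: a 4-byte big-endian signed int followed by the task bytes.
_HEADER = struct.Struct(">i")


def pack_task_data(attempt_number: int, task_binary: bytes) -> bytes:
    """Prefix the serialized task with its attempt number."""
    return _HEADER.pack(attempt_number) + task_binary


def unpack_task_data(data: bytes):
    """Split a packed payload back into (attempt_number, task_binary)."""
    if len(data) < _HEADER.size:
        raise ValueError("data too short to contain an attempt number")
    (attempt_number,) = _HEADER.unpack_from(data, 0)
    return attempt_number, data[_HEADER.size:]
```

The receiving side has to strip the header before deserializing the task, which is why both ends of the channel must agree on the layout.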
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark SPARK-4014
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/3849.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3849
----
commit fd515a56e9e66a2f7ab70187fbd463046f1cb330
Author: Josh Rosen <[email protected]>
Date: 2014-12-30T22:00:46Z
Add failing test for SPARK-4014
commit 1e7a933f41936e134e07c32aaf47bbcf8938167c
Author: Josh Rosen <[email protected]>
Date: 2014-12-30T22:01:21Z
[SPARK-4014] Change TaskContext.attemptId to return attempt number instead
of task ID.
commit 9d8d4d115449f48ce1714497e16704d4f12442d8
Author: Josh Rosen <[email protected]>
Date: 2014-12-30T22:06:19Z
Doc typo
----