Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/13603#discussion_r68641914
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala ---
@@ -30,6 +30,7 @@ import org.apache.spark.annotation.DeveloperApi
@DeveloperApi
class TaskInfo(
val taskId: Long,
+ /** the index of this task in its TaskSet. *Not* necessarily the same
as the partitionid */
--- End diff --
Yeah, sorry, at first I thought they were the same, and in fact I even made
a bunch of changes to reflect that (which is probably what you are referring
to).
Then the next morning I was poking around some more and discovered that I
was completely wrong. They are definitely not the same, so I reverted those
earlier changes and updated the names some more. (I dropped another comment
later on the PR saying as much, but it probably got lost in all the noise, sorry.)
The names here are pretty confusing. Part of the problem is that
`taskIndex` makes me think of `taskId` aka `TID`, which it definitely is *not*.
There are 4 different numbers identifying a task now:
a) TID / `taskId`
b) index within a taskset (sometimes called `index`, sometimes called
`taskIndex`, sometimes called `taskId`, depending on where in the code)
c) attempt number
d) partition
In the logs, you'll see "task [b].[c]", e.g. in
[`TaskSetManager.handleSuccessfulTask`](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L630).
[b] and [d] are *usually* exactly the same. However, they differ when a
stage is retried and only a subset of its partitions is rerun: the new
taskset still numbers its tasks from 0 to n, but the partitions can be arbitrary.
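To make the [b] vs. [d] difference concrete, here is a rough sketch. It is not Spark's actual code: `TaskIds`, `TaskNumbering`, and `taskSet` are made-up names for illustration, and the TID and attempt-number handling is deliberately simplified.

```scala
// Hypothetical names for illustration only; this is not Spark's TaskInfo.
final case class TaskIds(
    taskId: Long,        // (a) TID
    indexInTaskSet: Int, // (b) position of the task within its TaskSet
    attemptNumber: Int,  // (c) attempt number (always 0 in this sketch)
    partitionId: Int)    // (d) partition the task computes

object TaskNumbering {
  // Build the ids for one TaskSet: indices always run 0, 1, 2, ... while the
  // partition ids are whatever partitions this TaskSet has to compute.
  // Assumption: TIDs are handed out sequentially starting at firstTid.
  def taskSet(partitions: Seq[Int], firstTid: Long): Seq[TaskIds] =
    partitions.zipWithIndex.map { case (partition, index) =>
      TaskIds(firstTid + index, index, attemptNumber = 0, partitionId = partition)
    }

  // First run of a stage: every partition is computed, so [b] == [d].
  val firstRun: Seq[TaskIds] = taskSet(0 until 4, firstTid = 0L)

  // Retry that only recomputes partitions 1 and 3:
  // [b] is 0, 1 but [d] is 1, 3.
  val retry: Seq[TaskIds] = taskSet(Seq(1, 3), firstTid = 4L)
}
```

In `retry`, a log line built from [b].[c] would say "task 0.0" for the task that is actually computing partition 1, which is where the confusion comes from.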
We don't have a good name for [b] right now, and I'd like to come up with
something better. We could at least use `taskIndex` consistently, so that
`taskId` stops meaning two different things. Another option is
`indexInTaskSet`, but that's kinda long ...