Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13603#discussion_r68641914
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala ---
    @@ -30,6 +30,7 @@ import org.apache.spark.annotation.DeveloperApi
     @DeveloperApi
     class TaskInfo(
         val taskId: Long,
    +    /** the index of this task in its TaskSet.  *Not* necessarily the same 
as the partitionid */
    --- End diff --
    
    Yeah, sorry, at first I thought they were the same, and in fact I even made 
a bunch of changes to reflect that (which is probably what you are referring 
to).
    
    Then the next morning I was poking around some more and discovered that I 
was completely wrong.  They are definitely not the same, so I reverted those 
earlier changes and updated the names some more. (I dropped another comment 
later on the PR saying as much, but it probably got lost in all the noise, sorry.)
    
    The names here are pretty confusing.  Part of the problem is that 
`taskIndex` makes me think of `taskId` aka `TID`, which it definitely is *not*. 
 There are 4 different numbers identifying a task now:
    
    a) TID / `taskId`
    b) index within a taskset (sometimes called `index`, sometimes called 
`taskIndex`, sometimes called `taskId`, depending on where in the code)
    c) attempt number
    d) partition
    
    In the logs, you'll see "task [b].[c]", e.g. in 
[`TaskSetManager.handleSuccessfulTask`](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L630).
  [b] and [d] are *usually* exactly the same.  However, they differ when a 
stage is retried and only some of its partitions are rerun: the new taskset 
still numbers its tasks starting from 0, but the partitions can be arbitrary.
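
    To make the retry case concrete, here is a minimal sketch. It is a toy model, not Spark's actual classes: `TaskIds`, `buildTaskSet`, and the sequential TID assignment are all assumptions made for illustration, and the attempt number is simply left at 0.

```scala
// Hypothetical model of the four identifiers; these are NOT Spark's real classes.
final case class TaskIds(
    taskId: Long,        // (a) TID / taskId
    index: Int,          // (b) index within the taskset
    attemptNumber: Int,  // (c) attempt number
    partitionId: Int)    // (d) partition

object TaskIndexVsPartition {
  /** Builds one task per partition; indices are numbered from 0 in taskset order. */
  def buildTaskSet(partitions: Seq[Int], firstTid: Long): Seq[TaskIds] =
    partitions.zipWithIndex.map { case (partition, index) =>
      // Assumption for this sketch: TIDs are handed out sequentially from firstTid.
      TaskIds(firstTid + index, index, attemptNumber = 0, partitionId = partition)
    }

  // First run of the stage: every partition is computed, so index == partition.
  val firstAttempt: Seq[TaskIds] = buildTaskSet(Seq(0, 1, 2, 3), firstTid = 0L)

  // Retry of the stage: only partitions 1 and 3 are rerun.
  // The indices restart at 0, so they no longer match the partitions.
  val retry: Seq[TaskIds] = buildTaskSet(Seq(1, 3), firstTid = 4L)

  def main(args: Array[String]): Unit =
    retry.foreach { t =>
      println(s"task ${t.index}.${t.attemptNumber} (TID ${t.taskId}) computes partition ${t.partitionId}")
    }
}
```

    In this sketch the retried taskset would log "task 0.0" and "task 1.0" even though the work is for partitions 1 and 3, which is why reading [b] as a partition id is misleading.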
    
    We don't have a good name for [b] right now; I'd like to come up with 
something better.  We could at least use `taskIndex` consistently, so that 
`taskId` stops meaning two different things.  Another option is 
`indexInTaskSet`, but that's kinda long ...

