mridulm commented on a change in pull request #35185:
URL: https://github.com/apache/spark/pull/35185#discussion_r785399113
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala
##########
@@ -35,12 +35,33 @@ class TaskInfo(
*/
val index: Int,
val attemptNumber: Int,
+ /**
+ * The actual RDD partition ID in this task.
+ * The ID of the RDD partition is always same even task retries.
Review comment:
`The ID of the RDD partition is always the same across task or stage retries.`
Also, add that this will be `-1` for historical data, and available for all Spark applications since `3.3`.
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala
##########
@@ -35,12 +35,33 @@ class TaskInfo(
*/
val index: Int,
val attemptNumber: Int,
+ /**
+ * The actual RDD partition ID in this task.
+ * The ID of the RDD partition is always same even task retries.
+ */
+ val partitionId: Int,
val launchTime: Long,
val executorId: String,
val host: String,
val taskLocality: TaskLocality.TaskLocality,
val speculative: Boolean) {
+ def this(
+ taskId: Long,
+ /**
+ * The index of this task within its task set. Not necessarily the same
as the ID of the RDD
+ * partition that the task is computing.
+ */
+ index: Int,
+ attemptNumber: Int,
+ launchTime: Long,
+ executorId: String,
+ host: String,
+ taskLocality: TaskLocality.TaskLocality,
+ speculative: Boolean) = {
+     this(taskId, index, attemptNumber, -1, launchTime, executorId, host, taskLocality, speculative)
+ }
Review comment:
All uses of `TaskInfo` within Spark should pass the `partitionId`; this auxiliary constructor should be left only for backward compatibility for end users.
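The pattern discussed above can be sketched as follows. This is a minimal stand-in class (`TaskInfoLike` is a hypothetical name, not Spark's actual `TaskInfo`, and it omits most fields): internal callers pass `partitionId` explicitly, while the old-arity constructor remains for end users and defaults `partitionId` to `-1`.

```scala
// Simplified sketch of the auxiliary-constructor compatibility pattern.
// Not Spark's real TaskInfo; fields are trimmed down for illustration.
class TaskInfoLike(
    val taskId: Long,
    val index: Int,
    val attemptNumber: Int,
    val partitionId: Int,
    val launchTime: Long) {

  // Backward-compatible constructor for callers built against the old
  // signature: partitionId is not known, so it defaults to -1.
  def this(taskId: Long, index: Int, attemptNumber: Int, launchTime: Long) =
    this(taskId, index, attemptNumber, -1, launchTime)
}

object TaskInfoLikeDemo {
  def main(args: Array[String]): Unit = {
    // Spark-internal call site: partitionId is passed explicitly.
    val internal = new TaskInfoLike(4L, 2, 1, 7, 0L)
    // End-user call site compiled against the old constructor.
    val legacy = new TaskInfoLike(4L, 2, 1, 0L)
    println(internal.partitionId) // 7
    println(legacy.partitionId)   // -1
  }
}
```

Keeping the old constructor as an overload (rather than a default parameter) preserves binary compatibility for code compiled against the previous signature.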
##########
File path: core/src/test/resources/HistoryServerExpectations/excludeOnFailure_for_stage_expectation.json
##########
@@ -631,6 +642,7 @@
"taskId" : 4,
"index" : 2,
"attempt" : 1,
+ "partitionId": -1,
"launchTime" : "2018-01-09T10:21:18.943GMT",
"duration" : 16,
Review comment:
Since we expect this to default to `-1`, can we revert the changes to this (and other similar) file?
Similarly for the `partitionId = -1` entries in the other JSON files as well.
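The reasoning behind reverting those expectation files can be sketched as below. This is a hedged illustration using a plain `Map` as a stand-in for a deserialized task record (it is not Spark's actual JSON protocol code): when a historical record lacks the `partitionId` field, the reader falls back to `-1`, so old expectation files do not need the field added.

```scala
// Stand-in for reading a task record that may predate the partitionId field.
// A plain Map models the deserialized JSON; missing keys fall back to -1.
object PartitionIdFallbackDemo {
  def readPartitionId(record: Map[String, Long]): Long =
    record.getOrElse("partitionId", -1L)

  def main(args: Array[String]): Unit = {
    // Record written before the field existed (historical data).
    val oldRecord = Map("taskId" -> 4L, "index" -> 2L, "attempt" -> 1L)
    // Record written after the change, with the field present.
    val newRecord = oldRecord + ("partitionId" -> 2L)

    println(readPartitionId(oldRecord)) // -1: field absent, default applies
    println(readPartitionId(newRecord)) // 2: field present, value used
  }
}
```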
##########
File path: core/src/main/scala/org/apache/spark/status/storeTypes.scala
##########
@@ -286,6 +289,7 @@ private[spark] class TaskDataWrapper(
taskId,
index,
attempt,
+ partitionId,
new Date(launchTime),
Review comment:
+CC @thejdeep, please take a look to make sure there are no backward-incompatible changes.
Existing event files and/or LevelDB stores should remain readable after this change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]