toujours33 commented on code in PR #38711: URL: https://github.com/apache/spark/pull/38711#discussion_r1036003754
########## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala:
##########
@@ -383,8 +383,8 @@ private[spark] class DAGScheduler(
   /**
    * Called by the TaskSetManager when it decides a speculative task is needed.
    */
-  def speculativeTaskSubmitted(task: Task[_]): Unit = {
-    eventProcessLoop.post(SpeculativeTaskSubmitted(task))
+  def speculativeTaskSubmitted(task: Task[_], taskIndex: Int = -1): Unit = {
+    eventProcessLoop.post(SpeculativeTaskSubmitted(task, taskIndex))

Review Comment:
   I constructed a case with a shuffle retry and logged the `taskIndex` and `partitionId` recorded in the task events `SparkListenerTaskStart` and `SparkListenerTaskEnd`. When a stage is submitted normally (not a retry stage or, more broadly, not a partial task submission), `taskIndex` is equal to `partitionId`:
   <img width="1284" alt="image" src="https://user-images.githubusercontent.com/20420642/204814905-59625694-2e37-4484-8b10-522af8f0d61e.png">
   But when a shuffle stage is resubmitted with only a subset of its tasks, `taskIndex` is not necessarily equal to `partitionId`:

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
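The divergence described above can be sketched in a few lines. This is a minimal illustration (hypothetical `TaskInfo` and `submit` helpers, not Spark's actual `TaskSetManager` classes): a task's index is just its position within the submitted task set, so on a full submission index and partition line up, while on a partial resubmission of only the missing partitions they drift apart.

```scala
// Hypothetical sketch, not Spark internals: a task's index is its
// position in the submitted task set, independent of its partition id.
case class TaskInfo(index: Int, partitionId: Int)

// Build tasks for the given partitions; index = position in the set.
def submit(partitions: Seq[Int]): Seq[TaskInfo] =
  partitions.zipWithIndex.map { case (pid, idx) => TaskInfo(idx, pid) }

// Normal submission over all partitions: index == partitionId everywhere.
val full = submit(0 to 3)
// Retry submitting only the missing partitions 1 and 3:
// TaskInfo(0, 1) and TaskInfo(1, 3) -- index != partitionId.
val retry = submit(Seq(1, 3))
```

This is why passing only a `taskIndex` (or only a partition id) to `SpeculativeTaskSubmitted` is ambiguous for partially resubmitted stages: the two numbers identify different things once the task set no longer covers every partition.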