Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/6750#discussion_r34054828
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1126,39 +1127,47 @@ class DAGScheduler(
case FetchFailed(bmAddress, shuffleId, mapId, reduceId, failureMessage) =>
val failedStage = stageIdToStage(task.stageId)
val mapStage = shuffleToMapStage(shuffleId)
+ if (failedStage.attemptId - 1 > task.stageAttemptId) {
--- End diff --
stage attempt ids get incremented when a task set is submitted (see
[here](https://github.com/apache/spark/blob/1cb2629f1aa466f92246828c562ea6f35c89ab87/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L941)).
So it's normal for the attempt id of tasks to be one behind `stage.attemptId`.
This magically still gives all Spark listener events the right id, because [we
store `stage.latestInfo` before the attemptId gets
incremented](https://github.com/apache/spark/blob/1cb2629f1aa466f92246828c562ea6f35c89ab87/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L875).
Honestly that logic is pretty weird to me -- I could change that instead,
but I thought it better not to mess with it now. I suppose the test could
be `failedStage.attemptId - 1 != task.stageAttemptId`. Maybe it would actually
make the most sense to change to `failedStage.latestInfo.attemptId !=
task.stageAttemptId`, with a comment explaining why `failedStage.attemptId`
can't be used (and maybe add a comment to `Stage.attemptId` itself)?
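
To make the off-by-one concrete, here is a tiny self-contained sketch of the pattern described above (the `TinyStage` / `TinyStageInfo` classes and `submitTaskSet` are invented for illustration, not the real `Stage` / `StageInfo` API):

```scala
// Toy model of the behavior described above: the attempt id is bumped when the
// task set is submitted, but latestInfo is captured first, so it matches the
// attempt id that the submitted tasks actually carry.
case class TinyStageInfo(attemptId: Int)

class TinyStage {
  private var nextAttemptId = 0
  var latestInfo: TinyStageInfo = TinyStageInfo(-1)

  // Simplified stand-in for Stage.attemptId as described above:
  // one past the attempt id of the currently running tasks.
  def attemptId: Int = nextAttemptId

  // Simulates submitting a task set; returns the attempt id the tasks carry.
  def submitTaskSet(): Int = {
    latestInfo = TinyStageInfo(nextAttemptId) // captured before the increment
    val submitted = nextAttemptId
    nextAttemptId += 1
    submitted
  }
}

object AttemptIdDemo extends App {
  val failedStage = new TinyStage
  val taskStageAttemptId = failedStage.submitTaskSet()

  // A fetch failure from the current attempt: neither form of the test fires.
  assert(failedStage.attemptId - 1 == taskStageAttemptId)
  assert(failedStage.latestInfo.attemptId == taskStageAttemptId)

  // The stage gets resubmitted, then a straggler from the old attempt fails.
  failedStage.submitTaskSet()
  assert(failedStage.attemptId - 1 != taskStageAttemptId)        // needs the "- 1"
  assert(failedStage.latestInfo.attemptId != taskStageAttemptId) // reads directly
  println("an old-attempt failure is detected by both forms of the test")
}
```

Both forms of the test agree; the `latestInfo.attemptId` version just avoids having to remember the `- 1`.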