Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/9274#discussion_r43177740
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -949,7 +949,13 @@ class DAGScheduler(
// serializable. If tasks are not serializable, a SparkListenerStageCompleted event
// will be posted, which should always come after a corresponding SparkListenerStageSubmitted
// event.
- outputCommitCoordinator.stageStart(stage.id)
+ stage match {
+   case s: ShuffleMapStage =>
+     outputCommitCoordinator.stageStart(stage = s.id, maxPartitionId = s.numPartitions - 1)
--- End diff ---
As I was reviewing this, I was wondering whether a `ShuffleMapStage` could have
a different maximum partitionId if it came from a skipped stage. I'm now
convinced it cannot, but it might be a bit clearer if we changed the
[constructor](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala#L39)
to not take a `numTasks` argument at all, since it should always be
`rdd.partitions.length`. Not necessary for this change, just a thought
while you are touching this.
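To make that concrete, here is a rough standalone sketch of the idea (simplified stand-in types, not the real `Stage`/`ShuffleMapStage` signatures): derive the task count from the RDD rather than threading a separate `numTasks` through the constructor, so the two can never disagree.

```scala
// Simplified stand-ins for illustration only; the real constructors take more
// parameters (parents, firstJobId, callSite, shuffleDep, ...).
class SimpleRDD(val partitions: Array[Int])

// Roughly the current shape: numTasks is passed in alongside the rdd, so every
// caller has to keep the two consistent by hand.
class ShuffleMapStageToday(val id: Int, val rdd: SimpleRDD, val numTasks: Int)

// Suggested shape: drop the parameter and derive it, so a stage can never be
// constructed with a task count that disagrees with its rdd.
class ShuffleMapStageDerived(val id: Int, val rdd: SimpleRDD) {
  val numTasks: Int = rdd.partitions.length
}
```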
Also -- isn't the output commit coordinator irrelevant for
`ShuffleMapStage`s anyway? If not, then I think there might be another bug
there for skipped stages. Since it indexes by stageId, you can have two
different stages that really represent the exact same shuffle, so you could
have two different tasks authorized to commit that are handling the same
shuffle. (That wouldn't be a problem introduced by this change, but I thought
it was worth mentioning.)
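To spell out that concern a bit more, here is a toy model (hypothetical names, not the real `OutputCommitCoordinator` internals) of why keying the committer state by stageId alone could let two different attempts commit output for the same shuffle partition:

```scala
import scala.collection.mutable

// Toy model of the concern: commit authorization keyed by stageId only.
object ToyCommitCoordinator {
  // stageId -> (partitionId -> task attempt that is allowed to commit)
  private val authorizedByStage =
    mutable.Map.empty[Int, mutable.Map[Int, Int]]

  def stageStart(stageId: Int): Unit = {
    authorizedByStage(stageId) = mutable.Map.empty[Int, Int]
  }

  // The first attempt to ask for a given (stageId, partition) wins.
  def canCommit(stageId: Int, partition: Int, attempt: Int): Boolean =
    authorizedByStage(stageId).getOrElseUpdate(partition, attempt) == attempt
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Imagine stage 3 and stage 7 really describe the same shuffle (say one of
    // them came from a re-submission where the other was skipped). Each gets
    // its own entry, so a different task attempt can be authorized to commit
    // the same partition under each stageId.
    ToyCommitCoordinator.stageStart(3)
    ToyCommitCoordinator.stageStart(7)
    println(ToyCommitCoordinator.canCommit(stageId = 3, partition = 0, attempt = 0)) // true
    println(ToyCommitCoordinator.canCommit(stageId = 7, partition = 0, attempt = 1)) // true
  }
}
```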