Victsm commented on a change in pull request #30691:
URL: https://github.com/apache/spark/pull/30691#discussion_r647710370
##########
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##########
@@ -1298,7 +1329,7 @@ private[spark] class DAGScheduler(
// `findMissingPartitions()` returns all partitions every time.
stage match {
case sms: ShuffleMapStage if stage.isIndeterminate && !sms.isAvailable =>
- mapOutputTracker.unregisterAllMapOutput(sms.shuffleDep.shuffleId)
+ mapOutputTracker.unregisterAllMapAndMergeOutput(sms.shuffleDep.shuffleId)
Review comment:
If we make this change, is SPARK-32923 (properly handling
indeterminate stage retries) still needed as part of SPARK-30602?
With this change, a retry of an indeterminate stage always recomputes all
partitions.
Should we also reset the other metadata here, for example
`sms.shuffleDep.shuffleMergeEnabled`?
That way, the later call to `prepareShuffleServicesForShuffleMapStage`
would not be affected by state left over from the previous attempt of this
indeterminate stage.
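
To make the suggestion concrete, here is a minimal, self-contained sketch of the reset being asked about. It is only a toy model: `ToyShuffleDep`, `ToyMapOutputTracker`, `mergerLocs`, `resetForRetry`, and `mergeEnabledByConf` are hypothetical names made up for illustration, not Spark's actual classes or fields, and the assumption that "reset" means "restore the initially configured value" is mine.

```scala
import scala.collection.mutable

// Toy stand-in for a shuffle dependency; NOT Spark's ShuffleDependency.
final class ToyShuffleDep(val shuffleId: Int) {
  // Per-attempt merge state that a previous stage attempt may have changed.
  var shuffleMergeEnabled: Boolean = true
  var mergerLocs: Seq[String] = Seq.empty // hypothetical field
}

// Toy stand-in for the tracker; NOT Spark's MapOutputTrackerMaster.
final class ToyMapOutputTracker {
  private val mapOutputs = mutable.Map.empty[Int, mutable.Set[Int]]
  private val mergeOutputs = mutable.Map.empty[Int, mutable.Set[Int]]

  def registerMapOutput(shuffleId: Int, mapId: Int): Unit =
    mapOutputs.getOrElseUpdate(shuffleId, mutable.Set.empty[Int]) += mapId

  def registerMergeOutput(shuffleId: Int, reduceId: Int): Unit =
    mergeOutputs.getOrElseUpdate(shuffleId, mutable.Set.empty[Int]) += reduceId

  // Drops both map outputs and merge results for the shuffle.
  def unregisterAllMapAndMergeOutput(shuffleId: Int): Unit = {
    mapOutputs.remove(shuffleId)
    mergeOutputs.remove(shuffleId)
  }

  // Total number of registered map and merge outputs for the shuffle.
  def numOutputs(shuffleId: Int): Int =
    mapOutputs.get(shuffleId).map(_.size).getOrElse(0) +
      mergeOutputs.get(shuffleId).map(_.size).getOrElse(0)
}

object IndeterminateRetrySketch {
  // Clears the outputs AND the per-attempt merge state, so that a later
  // "prepare shuffle services" step starts from a clean slate.
  def resetForRetry(
      tracker: ToyMapOutputTracker,
      dep: ToyShuffleDep,
      mergeEnabledByConf: Boolean): Unit = {
    tracker.unregisterAllMapAndMergeOutput(dep.shuffleId)
    dep.mergerLocs = Seq.empty
    dep.shuffleMergeEnabled = mergeEnabledByConf
  }
}
```

In this model, calling `resetForRetry` in the `ShuffleMapStage` branch would leave no map or merge outputs and no stale merge flags behind for the next attempt; whether the real `DAGScheduler` should reset exactly these fields is the open question above.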