mridulm commented on a change in pull request #30691:
URL: https://github.com/apache/spark/pull/30691#discussion_r648897593
##########
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##########
@@ -1678,38 +1717,16 @@ private[spark] class DAGScheduler(
}
if (runningStages.contains(shuffleStage) &&
shuffleStage.pendingPartitions.isEmpty) {
- markStageAsFinished(shuffleStage)
- logInfo("looking for newly runnable stages")
- logInfo("running: " + runningStages)
- logInfo("waiting: " + waitingStages)
- logInfo("failed: " + failedStages)
-
- // This call to increment the epoch may not be strictly necessary, but it is retained
- // for now in order to minimize the changes in behavior from an earlier version of the
- // code. This existing behavior of always incrementing the epoch following any
- // successful shuffle map stage completion may have benefits by causing unneeded
- // cached map outputs to be cleaned up earlier on executors. In the future we can
- // consider removing this call, but this will require some extra investigation.
- // See https://github.com/apache/spark/pull/17955/files#r117385673 for more details.
- mapOutputTracker.incrementEpoch()
-
- clearCacheLocs()
-
- if (!shuffleStage.isAvailable) {
- // Some tasks had failed; let's resubmit this shuffleStage.
- // TODO: Lower-level scheduler should also deal with this
- logInfo("Resubmitting " + shuffleStage + " (" + shuffleStage.name +
- ") because some of its tasks had failed: " +
- shuffleStage.findMissingPartitions().mkString(", "))
- submitStage(shuffleStage)
+ if (!shuffleStage.isMergeFinalized &&
Review comment:
Adding the `isAvailable` check defers finalization of the current stage
attempt to the next attempt.
In the current incarnation of the code this is fine once we analyze all the
corner cases, but in general it is not robust to future changes.
Since we always invoke `processShuffleMapStageCompletion` after finalization,
any partitions missing at that time will result in stage resubmission,
whether they were already missing at this point or were lost later while we
waited for finalization.
Let us keep the side effects simpler and revisit this in the future if we
see the merge ratio drop noticeably.
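To make the argued-for control flow concrete, here is a toy Scala model. It is only a sketch, not Spark's code: `ToyShuffleStage`, `ToyScheduler`, `handleTaskSuccess`, `onMergeFinalized` and `onOutputLost` are hypothetical stand-ins, and only `pendingPartitions`, `isAvailable`, `isMergeFinalized`, `findMissingPartitions` and `processShuffleMapStageCompletion` echo names from the diff and comment. It shows finalization being requested once `pendingPartitions` is empty without consulting `isAvailable`, with missing partitions (already missing, or lost while waiting) caught by the single completion path that runs after finalization.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a shuffle map stage (not Spark's ShuffleMapStage).
final class ToyShuffleStage(val id: Int, numPartitions: Int) {
  // Partitions whose map tasks have not yet reported success.
  val pendingPartitions: mutable.Set[Int] = mutable.Set.empty[Int] ++= (0 until numPartitions)
  // Partitions whose map output is currently registered and usable.
  val availableOutputs: mutable.Set[Int] = mutable.Set.empty[Int]
  var isMergeFinalized: Boolean = false

  def isAvailable: Boolean = availableOutputs.size == numPartitions
  def findMissingPartitions(): Seq[Int] =
    (0 until numPartitions).filterNot(availableOutputs.contains)
}

// Hypothetical scheduler that records its decisions instead of acting on them.
final class ToyScheduler {
  val finalizationRequests = mutable.ArrayBuffer.empty[Int]
  val resubmitted = mutable.ArrayBuffer.empty[Int]

  // A map task for `partition` succeeded; `outputRecorded = false` models a
  // success whose output is not usable (e.g. it came from a lost executor).
  def handleTaskSuccess(stage: ToyShuffleStage, partition: Int, outputRecorded: Boolean): Unit = {
    stage.pendingPartitions -= partition
    if (outputRecorded) stage.availableOutputs += partition
    if (stage.pendingPartitions.isEmpty) {
      if (!stage.isMergeFinalized) {
        // Finalize this attempt without checking isAvailable.
        finalizationRequests += stage.id
      } else {
        processShuffleMapStageCompletion(stage)
      }
    }
  }

  // Map output lost while we wait for finalization.
  def onOutputLost(stage: ToyShuffleStage, partition: Int): Unit =
    stage.availableOutputs -= partition

  // Completion processing always runs after finalization.
  def onMergeFinalized(stage: ToyShuffleStage): Unit = {
    stage.isMergeFinalized = true
    processShuffleMapStageCompletion(stage)
  }

  // Single place where missing partitions trigger resubmission.
  def processShuffleMapStageCompletion(stage: ToyShuffleStage): Unit =
    if (!stage.isAvailable) resubmitted += stage.id
}

object ToyDemo {
  def main(args: Array[String]): Unit = {
    val scheduler = new ToyScheduler
    val stage = new ToyShuffleStage(id = 1, numPartitions = 2)
    scheduler.handleTaskSuccess(stage, 0, outputRecorded = true)
    scheduler.handleTaskSuccess(stage, 1, outputRecorded = false) // partition 1 unusable
    scheduler.onMergeFinalized(stage)
    println(s"finalization requested: ${scheduler.finalizationRequests.mkString(", ")}")
    println(s"resubmitted: ${scheduler.resubmitted.mkString(", ")}")
  }
}
```

The point of the design is that only `processShuffleMapStageCompletion` decides on resubmission, so both failure cases (missing before finalization, lost during it) go through one path instead of adding a second, earlier check.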