squito commented on a change in pull request #25620: [SPARK-25341][Core] 
Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/25620#discussion_r322464691
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
 ##########
 @@ -1105,7 +1105,16 @@ private[spark] class DAGScheduler(
   private def submitMissingTasks(stage: Stage, jobId: Int) {
     logDebug("submitMissingTasks(" + stage + ")")
 
 -    // First figure out the indexes of partition ids to compute.
 +    // Before finding the missing partitions, clean up the intermediate 
 state first.
 +    // This ensures that for an intermediate stage, 
 `findMissingPartitions()`
 +    // returns all partitions every time.
+    stage match {
 
 Review comment:
   I'm a little concerned about putting this here: as you'll see lower down in 
this method, there is some handling for the case where `submitMissingTasks` is 
called but there are actually no tasks to run.  I'm not seeing how that happens 
now, but your change would make those cases always re-evaluate all partitions 
of the stage.
   
   I think @jiangxb1987's suggestion makes sense -- couldn't you do the 
unregistering there?  I agree the current logic is insufficient, as it's not 
building up the full set of stages that need to be recomputed, but maybe we 
need to combine both approaches.
   
   Or maybe the old cases of submitting a stage with no missing partitions are 
well understood, and my concern is not relevant?
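
  To make the concern concrete, here is a self-contained sketch (stub types, not Spark's actual `Stage`/`ShuffleMapStage` classes) of why unconditionally clearing intermediate state at the top of `submitMissingTasks` forces a full re-evaluation, even for a stage that previously had no tasks to run:

  ```scala
  // Stub model of the scheduler types (hypothetical, for illustration only).
  sealed trait Stage { def numPartitions: Int }
  final case class ShuffleMapStage(numPartitions: Int,
                                   var availableOutputs: Set[Int]) extends Stage
  final case class ResultStage(numPartitions: Int) extends Stage

  // A partition is "missing" for a shuffle map stage if its output
  // is not registered as available.
  def findMissingPartitions(stage: Stage): Seq[Int] = stage match {
    case s: ShuffleMapStage =>
      (0 until s.numPartitions).filterNot(s.availableOutputs.contains)
    case s: ResultStage =>
      0 until s.numPartitions
  }

  // The PR's approach modeled: drop all registered outputs before
  // computing the missing partitions.
  def submitMissingTasksWithRollback(stage: Stage): Seq[Int] = {
    stage match {
      case s: ShuffleMapStage => s.availableOutputs = Set.empty // rollback
      case _ =>
    }
    findMissingPartitions(stage)
  }

  // A stage whose outputs are all available -- previously nothing to run.
  val s = ShuffleMapStage(numPartitions = 4, availableOutputs = Set(0, 1, 2, 3))
  val missing = submitMissingTasksWithRollback(s)
  println(missing.mkString(","))  // 0,1,2,3 -- all partitions recomputed
  ```

  Before the rollback, `findMissingPartitions` would have returned an empty sequence for this stage; with the unconditional clean-up it always returns every partition, which is exactly the behavior change in the no-tasks-to-run path.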

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
