xuanyuanking opened a new pull request #25508: 
[SPARK-28699][CORE][BACKPORT-2.3] Fix a corner case for aborting indeterminate stage
URL: https://github.com/apache/spark/pull/25508
 
 
   ### What changes were proposed in this pull request?
   Change how the indeterminate stage is collected: when handling a 
FetchFailed, we should look at stages starting from mapStage, not failedStage.
   
   ### Why are the changes needed?
   In the FetchFailed handling logic, the original code collected the 
indeterminate stage starting from the stage that hit the fetch failure 
(failedStage). When the fetch failure happens in the first task of that stage, 
this logic causes the indeterminate stage to be resubmitted only partially, 
which can eventually lead to a correctness bug.
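   
   To illustrate why the starting point matters, here is a minimal sketch of a 
parent-chain traversal over a toy stage graph. It is not the code of this patch 
or of DAGScheduler: `ToyStage`, `RollbackSketch`, and `stagesToRollback` are 
hypothetical names, and the traversal shape is an assumption made for 
illustration only.

   ```scala
   import scala.collection.mutable

   // Hypothetical stand-in for a scheduler stage; not Spark's Stage class.
   final case class ToyStage(id: Int, parents: List[ToyStage])

   object RollbackSketch {
     // Walk from the leaf stage toward its ancestors. Whenever a chain reaches
     // a stage already in the set, every stage later in that chain is added.
     // The result is `root` plus the stages between `root` and `leaf`.
     def stagesToRollback(root: ToyStage, leaf: ToyStage): Set[Int] = {
       val found = mutable.HashSet[Int](root.id)
       def visit(chain: List[ToyStage]): Unit = {
         if (found.contains(chain.head.id)) {
           chain.tail.foreach(s => found += s.id)
         } else {
           chain.head.parents.foreach(p => visit(p :: chain))
         }
       }
       visit(List(leaf))
       found.toSet
     }

     def main(args: Array[String]): Unit = {
       // Linear toy graph: s0 -> s1 (plays mapStage) -> s2 (plays failedStage) -> s3
       val s0 = ToyStage(0, Nil)
       val s1 = ToyStage(1, List(s0))
       val s2 = ToyStage(2, List(s1))
       val s3 = ToyStage(3, List(s2))
       println(stagesToRollback(s1, s3).toList.sorted) // List(1, 2, 3)
       println(stagesToRollback(s2, s3).toList.sorted) // List(2, 3)
     }
   }
   ```

   In this toy model, starting from the stand-in for failedStage leaves the 
stand-in for mapStage out of the collected set, so any check applied to the 
collected stages would never examine it. Starting from mapStage includes it.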
   
   ### Does this PR introduce any user-facing change?
   In this corner case, the indeterminate stage now aborts as expected.
   
   ### How was this patch tested?
   Added a new UT in DAGSchedulerSuite.
   Also ran the integration test below with `local-cluster[5, 2, 5120]` and 
`spark.sql.execution.sortBeforeRepartition=false`; it aborts the 
indeterminate stage as expected:
   ```
   import scala.sys.process._
   import org.apache.spark.TaskContext
   
   val res = spark.range(0, 10000 * 10000, 1).map { x => (x % 1000, x) }
   // kill an executor in the stage that performs repartition(239)
   val df = res.repartition(113).map { x => (x._1 + 1, x._2) }.repartition(239).map { x =>
     // Only the first attempt of partition 0 in the first stage attempt fails.
     if (TaskContext.get.attemptNumber == 0 &&
         TaskContext.get.partitionId < 1 &&
         TaskContext.get.stageAttemptNumber == 0) {
       // `pkill -f -n java` signals the newest process whose command line matches "java".
       throw new Exception("pkill -f -n java".!!)
     }
     x
   }
   val r2 = df.distinct.count()
   ```
   
   Authored-by: Yuanjian Li <xyliyuanj...@gmail.com>
   Signed-off-by: Wenchen Fan <wenc...@databricks.com>
   (cherry picked from commit 0d3a783cc57ed09650ee31851a19728d8f16cd0c)
   Signed-off-by: Yuanjian Li <xyliyuanj...@gmail.com>

