Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/15213#discussion_r80297729
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2105,6 +2109,52 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with Timeou
assert(scheduler.getShuffleDependencies(rddE) === Set(shuffleDepA, shuffleDepC))
}
+ test("The failed stage never resubmitted due to abort stage in another thread") {
+ implicit val executorContext = ExecutionContext
+ .fromExecutorService(Executors.newFixedThreadPool(5))
+ val duration = 60.seconds
+
+ val f1 = Future {
+ try {
val rdd1 = sc.makeRDD(Array(1, 2, 3, 4), 2).map(x => (x, 1)).groupByKey()
+ val shuffleHandle =
rdd1.dependencies.head.asInstanceOf[ShuffleDependency[_, _, _]].shuffleHandle
+ rdd1.map {
+ case (x, _) if (x == 1) =>
+ throw new FetchFailedException(
BlockManagerId("1", "1", 1), shuffleHandle.shuffleId, 0, 0, "test")
+ case (x, _) => x
+ }.count()
+ } catch {
+ case e: Throwable =>
+ logInfo("expected abort stage1: " + e.getMessage)
+ }
+ }
+ ThreadUtils.awaitResult(f1, duration)
+ val f2 = Future {
--- End diff --
Could you add a comment to explain why two identical jobs are needed here? It took me
a while to figure out. E.g.,
```
The following job, which fails due to a fetch failure, will hang without the fix for SPARK-17644
```
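For context, the suggested comment would sit in the test body directly above the second job, something like the following (a placement sketch reusing the names from the diff above, not a complete test):

```scala
ThreadUtils.awaitResult(f1, duration)

// The following job, which fails due to a fetch failure, will hang
// without the fix for SPARK-17644.
val f2 = Future {
  // ... same fetch-failing job as in f1 ...
}
```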