Github user kayousterhout commented on a diff in the pull request:
https://github.com/apache/spark/pull/12655#discussion_r62944090
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -322,6 +322,53 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with Timeou
assert(sparkListener.stageByOrderOfExecution(0) < sparkListener.stageByOrderOfExecution(1))
}
+ /**
+ * This test ensures that DAGScheduler builds the stage graph correctly.
+ *
+ * Suppose you have the following DAG:
+ *
+ * [A] <--(s_A)-- [B] <--(s_B)-- [C] <--(s_C)-- [D]
+ *             \                /
+ *               <-------------
+ *
+ * Here, RDD B has a shuffle dependency on RDD A, RDD C has shuffle dependencies on both
+ * B and A, and RDD D has a shuffle dependency on C. The DAGScheduler identifies shuffle
+ * dependencies by integer IDs, but to keep the example readable, we call the shuffled data
+ * from A s_A, the shuffled data from B s_B, and the shuffled data from C s_C.
+ *
+ * Note: [] means an RDD, () means a shuffle dependency.
+ */
+ test("[SPARK-13902] not to create duplicate stage.") {
--- End diff --
Can you change this to "[SPARK-13902] Ensure no duplicate stages are created"?
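For context on what the suggested name promises: in the DAG from the doc comment, s_A is reachable from D twice (through s_B and directly from C), so a traversal that never checks for an already-built stage would create the s_A stage twice. The following is a minimal, hypothetical Scala sketch of that property, assuming a scheduler that keeps at most one map stage per shuffle ID. `Rdd`, `ShuffleDep`, `StageBuilder`, `getOrCreateStage`, and `naiveStageCount` are illustrative names invented here, not Spark's DAGScheduler API, and the string "stages" stand in for real stage objects.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the DAG from the doc comment; these are NOT Spark classes.
final case class Rdd(name: String, shuffleDeps: Seq[ShuffleDep])
final case class ShuffleDep(shuffleId: Int, parent: Rdd)

// Hypothetical stage builder that keeps at most one map stage per shuffle ID.
final class StageBuilder {
  // Insertion-ordered, so parent stages are listed before the stages that depend on them.
  private val shuffleIdToStage = mutable.LinkedHashMap.empty[Int, String]

  def getOrCreateStage(dep: ShuffleDep): String =
    shuffleIdToStage.get(dep.shuffleId) match {
      case Some(existing) =>
        existing // already built: reuse it instead of creating a duplicate
      case None =>
        // Build (or reuse) the parent stages first, then register this one.
        dep.parent.shuffleDeps.foreach(getOrCreateStage)
        val stage = s"ShuffleMapStage(s_${dep.parent.name})"
        shuffleIdToStage(dep.shuffleId) = stage
        stage
    }

  def stages: Seq[String] = shuffleIdToStage.values.toSeq
}

// Without the lookup, every path to a shuffle dependency creates a new stage.
def naiveStageCount(dep: ShuffleDep): Int =
  1 + dep.parent.shuffleDeps.map(naiveStageCount).sum

// The DAG from the comment: A <-s_A- B <-s_B- C <-s_C- D, plus C <-s_A- A.
val a  = Rdd("A", Nil)
val sA = ShuffleDep(0, a)
val b  = Rdd("B", Seq(sA))
val sB = ShuffleDep(1, b)
val c  = Rdd("C", Seq(sB, sA))
val sC = ShuffleDep(2, c)
val d  = Rdd("D", Seq(sC))

val builder = new StageBuilder
d.shuffleDeps.foreach(builder.getOrCreateStage) // parent stages of D's final stage
println(builder.stages.mkString(", ")) // ShuffleMapStage(s_A), ShuffleMapStage(s_B), ShuffleMapStage(s_C)
println(naiveStageCount(sC))           // 4: the s_A stage is counted twice
```

The lookup keyed by shuffle ID is what makes the traversal idempotent. Dropping the `shuffleIdToStage.get` check makes the sketch behave like `naiveStageCount`, which is the duplicate-stage situation the test name describes.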