Github user ivoson commented on a diff in the pull request:
https://github.com/apache/spark/pull/20244#discussion_r161145538
--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2399,6 +2417,93 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi
}
}
+ /**
+ * In this test, we simulate a scenario in which concurrent jobs use the same
+ * RDD, and that RDD is marked for checkpointing:
+ * Job one has already finished its Spark job and has started doCheckpoint;
+ * Job two is submitted, and submitMissingTasks is called.
+ * In submitMissingTasks, if task serialization happens before doCheckpoint is done,
+ * but the partitions are taken from stage.rdd.partitions after doCheckpoint is done,
+ * executing the task may throw a ClassCastException, because some RDDs cast the
+ * Partition they receive to their own Partition type.
+ *
+ * This test is meant to show that task serialization and the partition calculation
+ * in submitMissingTasks should observe the same RDD checkpoint status.
+ */
+ test("task part misType with checkpoint rdd in concurrent execution scenes") {
--- End diff --
Thanks for the suggestion.
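The race described in the test comment can be modeled without Spark. This is a minimal sketch under stated assumptions: `Part`, `SourcePart`, `CheckpointPart`, `MiniRDD` and `CheckpointRaceSketch` are hypothetical names, not Spark APIs; `snapshot()` stands in for task serialization and `markCheckpointed()` for job one's doCheckpoint finishing. It only illustrates why the task binary and the partition list must agree on checkpoint status; it does not claim to reproduce how this PR changes submitMissingTasks.

```scala
// Hypothetical, Spark-free model of the checkpoint race (not Spark APIs).
trait Part { def index: Int }
final case class SourcePart(index: Int, data: Seq[Int]) extends Part
final case class CheckpointPart(index: Int) extends Part

final class MiniRDD(data: Seq[Seq[Int]]) {
  @volatile private var checkpointed = false

  // Stands in for doCheckpoint() completing in another job.
  def markCheckpointed(): Unit = synchronized { checkpointed = true }

  // Like RDD.partitions: the Partition objects change once checkpointed.
  def partitions: Array[Part] =
    if (checkpointed) data.indices.map(i => CheckpointPart(i): Part).toArray
    else data.zipWithIndex.map { case (d, i) => SourcePart(i, d): Part }.toArray

  // Stands in for task serialization: freezes the current checkpoint status.
  def snapshot(): MiniRDD = synchronized {
    val copy = new MiniRDD(data)
    if (checkpointed) copy.markCheckpointed()
    copy
  }

  // Like compute(): the pre-checkpoint lineage casts to its own Partition type.
  def compute(p: Part): Seq[Int] =
    if (checkpointed) data(p.index)
    else p.asInstanceOf[SourcePart].data
}

object CheckpointRaceSketch {
  def runTask(taskBinary: MiniRDD, part: Part): Either[String, Seq[Int]] =
    try Right(taskBinary.compute(part))
    catch { case _: ClassCastException => Left("ClassCastException") }

  // Serialize first, let the checkpoint finish, then compute partitions.
  def buggy(): Either[String, Seq[Int]] = {
    val rdd = new MiniRDD(Seq(Seq(1, 2), Seq(3)))
    val taskBinary = rdd.snapshot()
    rdd.markCheckpointed()
    val parts = rdd.partitions
    runTask(taskBinary, parts(0))
  }

  // Take both under one lock so they observe the same checkpoint status.
  def consistent(): Either[String, Seq[Int]] = {
    val rdd = new MiniRDD(Seq(Seq(1, 2), Seq(3)))
    val (taskBinary, parts) = rdd.synchronized { (rdd.snapshot(), rdd.partitions) }
    rdd.markCheckpointed()
    runTask(taskBinary, parts(0))
  }

  def main(args: Array[String]): Unit = {
    println(buggy())      // Left(ClassCastException)
    println(consistent()) // Right(List(1, 2))
  }
}
```

In this model, holding one lock while taking both the snapshot and the partition list is just one way to make them consistent; any ordering that guarantees both see the same checkpoint status avoids the mismatched cast.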