GitHub user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/21927#discussion_r207108174
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -340,6 +340,22 @@ class DAGScheduler(
}
}
+ /**
+ * Check to make sure we don't launch a barrier stage with unsupported RDD chain pattern. The
+ * following patterns are not supported:
+ * 1. Ancestor RDDs that have different number of partitions from the resulting RDD (eg.
+ * union()/coalesce()/first()/PartitionPruningRDD);
+ * 2. An RDD that depends on multiple barrier RDDs (eg. barrierRdd1.zip(barrierRdd2)).
+ */
+ private def checkBarrierStageWithRDDChainPattern(rdd: RDD[_], numPartitions: Int): Unit = {
--- End diff --
It would be nice to rename `numPartitions` to `numTasksInStage` (or a
better name).
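
To illustrate why the name matters, here is a minimal sketch of the two checks from the doc comment, with the parameter spelled `numTasksInStage`. It is not the code in this PR: `Node`, `BarrierCheckSketch`, and `checkBarrierChain` are hypothetical names, `Node` is a toy stand-in for an RDD lineage node rather than Spark's `RDD`, and the sketch assumes every parent belongs to the same stage (it ignores shuffle boundaries).

```scala
// Toy stand-in for an RDD lineage node. This is NOT Spark's RDD API;
// all names here are hypothetical and exist only for illustration.
final case class Node(
    name: String,
    numPartitions: Int,
    isBarrier: Boolean = false,
    parents: Seq[Node] = Seq.empty)

object BarrierCheckSketch {
  /**
   * Returns None if the chain matches neither unsupported pattern,
   * or Some(reason) describing the first violation found.
   * Assumption: all parents are in the same stage as `rdd`.
   */
  def checkBarrierChain(rdd: Node, numTasksInStage: Int): Option[String] = {
    if (rdd.numPartitions != numTasksInStage) {
      // Pattern 1: an RDD in the chain whose partition count differs
      // from the number of tasks the stage will launch.
      Some(s"${rdd.name} has ${rdd.numPartitions} partitions, expected $numTasksInStage")
    } else if (rdd.parents.count(_.isBarrier) > 1) {
      // Pattern 2: an RDD that depends on more than one barrier RDD.
      Some(s"${rdd.name} depends on multiple barrier RDDs")
    } else {
      // Walk up the lineage and report the first violation, if any.
      rdd.parents.iterator
        .map(parent => checkBarrierChain(parent, numTasksInStage))
        .collectFirst { case Some(reason) => reason }
    }
  }
}
```

With the rename, the comparison reads as "each RDD's partition count versus the stage's task count", which is harder to confuse with the partition count of any single RDD in the chain.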