Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22112#discussion_r212332473
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1865,6 +1876,39 @@ abstract class RDD[T: ClassTag](
   // RDD chain.
   @transient protected lazy val isBarrier_ : Boolean =
     dependencies.filter(!_.isInstanceOf[ShuffleDependency[_, _, _]]).exists(_.rdd.isBarrier())
+
+  /**
+   * Returns the random level of this RDD's computing function. Please refer to
+   * [[RDD.RandomLevel]] for the definition of random level.
+   *
+   * By default, an RDD without parents (a root RDD) is IDEMPOTENT. For RDDs with
+   * parents, the random level of the current RDD is the random level of its most
+   * random parent.
+   */
+  // TODO: make it public so users can set the random level of their custom RDDs.
+  // TODO: this can be per-partition. e.g. UnionRDD can have a different random
+  // level for different partitions.
+  private[spark] def computingRandomLevel: RDD.RandomLevel.Value = {
+    val parentRandomLevels = dependencies.map {
+      case dep: ShuffleDependency[_, _, _] =>
+        if (dep.rdd.computingRandomLevel == RDD.RandomLevel.INDETERMINATE) {
+          RDD.RandomLevel.INDETERMINATE
--- End diff ---
> If checkpointed already - then always same order
`RDD#dependencies` is defined as:
```scala
final def dependencies: Seq[Dependency[_]] = {
  checkpointRDD.map(r => List(new OneToOneDependency(r))).getOrElse {
    if (dependencies_ == null) {
      dependencies_ = getDependencies
    }
    dependencies_
  }
}
```
So we don't need to handle checkpointing here: once an RDD is checkpointed, `dependencies` short-circuits to a single `OneToOneDependency` on the checkpoint RDD, and the original shuffle lineage is never visited.
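As a hypothetical miniature of that short-circuit (the names `ModelRDD`, `Dep`, and `checkpoint` are illustrative, not Spark's actual types), this shows how a checkpoint collapses the whole lineage into one one-to-one dependency:

```scala
// Illustrative sketch only: models how RDD#dependencies ignores the original
// lineage once a checkpoint exists. Not the real Spark implementation.
object CheckpointDepsSketch {
  case class Dep(desc: String)

  class ModelRDD(lineageDeps: Seq[Dep]) {
    // Set once the RDD has been materialized to a checkpoint.
    var checkpoint: Option[String] = None

    // Mirrors the structure of RDD#dependencies quoted above: if a checkpoint
    // exists, return a single one-to-one dependency on it; otherwise fall back
    // to the computed lineage dependencies.
    def dependencies: Seq[Dep] =
      checkpoint
        .map(cp => List(Dep(s"OneToOneDependency($cp)")))
        .getOrElse(lineageDeps)
  }

  def main(args: Array[String]): Unit = {
    val rdd = new ModelRDD(Seq(Dep("ShuffleDependency(parent)")))
    assert(rdd.dependencies == Seq(Dep("ShuffleDependency(parent)")))

    rdd.checkpoint = Some("ckpt-0") // after checkpointing...
    // ...the shuffle lineage is no longer reachable through dependencies.
    assert(rdd.dependencies == Seq(Dep("OneToOneDependency(ckpt-0)")))
    println("lineage collapsed to checkpoint dependency")
  }
}
```

Because the random-level computation walks `dependencies`, a checkpointed RDD is automatically seen through its checkpoint, which is why no explicit checkpoint handling is needed in `computingRandomLevel`.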
> All other shuffle cases, we dont know the output order in spark.
Actually, we do know something. As long as the shuffle map stage RDD is IDEMPOTENT or UNORDERED, the reduce-side RDD is UNORDERED instead of INDETERMINATE: on a retry, each reducer receives the same set of records, just in a possibly different order.
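That propagation rule can be sketched as a standalone model. This is a hypothetical illustration (the names `RandomLevelModel`, `NarrowDep`, and `ShuffleDep` are invented for the sketch, and it is not the actual Spark code): a root RDD defaults to IDEMPOTENT, a shuffle from an IDEMPOTENT or UNORDERED parent is capped at UNORDERED, a shuffle from an INDETERMINATE parent stays INDETERMINATE, and otherwise an RDD is as random as its most random parent.

```scala
// Illustrative model of the random-level propagation discussed above.
// Names mirror the PR but this is not the real Spark implementation.
object RandomLevelModel {
  object RandomLevel extends Enumeration {
    // Ordered from least to most random, so `max` picks the "most random" level.
    val IDEMPOTENT, UNORDERED, INDETERMINATE = Value
  }
  import RandomLevel._

  sealed trait Dep { def parentLevel: RandomLevel.Value }
  case class NarrowDep(parentLevel: RandomLevel.Value) extends Dep
  case class ShuffleDep(parentLevel: RandomLevel.Value) extends Dep

  def computingRandomLevel(deps: Seq[Dep]): RandomLevel.Value = {
    val parentLevels = deps.map {
      // Shuffling an indeterminate map output stays indeterminate.
      case ShuffleDep(INDETERMINATE) => INDETERMINATE
      // The map output is reproducible as a set, so only the order varies.
      case ShuffleDep(_)             => UNORDERED
      // Narrow dependencies pass the parent's level through unchanged.
      case NarrowDep(level)          => level
    }
    // A root RDD (no parents) is IDEMPOTENT by default.
    if (parentLevels.isEmpty) IDEMPOTENT else parentLevels.max
  }

  def main(args: Array[String]): Unit = {
    assert(computingRandomLevel(Nil) == IDEMPOTENT)
    assert(computingRandomLevel(Seq(ShuffleDep(IDEMPOTENT))) == UNORDERED)
    assert(computingRandomLevel(Seq(ShuffleDep(UNORDERED))) == UNORDERED)
    assert(computingRandomLevel(Seq(ShuffleDep(INDETERMINATE))) == INDETERMINATE)
    println("ok")
  }
}
```

The key case is the second one: a shuffle of an IDEMPOTENT or UNORDERED map stage produces UNORDERED, not INDETERMINATE, which is exactly the distinction made above.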
---