Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22112#discussion_r212385688

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---

```scala
@@ -1865,6 +1876,39 @@ abstract class RDD[T: ClassTag](
   // RDD chain.
   @transient protected lazy val isBarrier_ : Boolean =
     dependencies.filter(!_.isInstanceOf[ShuffleDependency[_, _, _]]).exists(_.rdd.isBarrier())
+
+  /**
+   * Returns the random level of this RDD's computing function. Please refer to
+   * [[RDD.RandomLevel]] for the definition of random level.
+   *
+   * By default, an RDD without parents (a root RDD) is IDEMPOTENT. For RDDs with parents,
+   * the random level of the current RDD is the random level of its most random parent.
+   */
+  // TODO: make it public so users can set the random level of their custom RDDs.
+  // TODO: this can be per-partition, e.g. UnionRDD can have different random levels for
+  // different partitions.
+  private[spark] def computingRandomLevel: RDD.RandomLevel.Value = {
+    val parentRandomLevels = dependencies.map {
+      case dep: ShuffleDependency[_, _, _] =>
+        if (dep.rdd.computingRandomLevel == RDD.RandomLevel.INDETERMINATE) {
+          RDD.RandomLevel.INDETERMINATE
```

--- End diff --

RE: checkpoint. I wanted to handle two cases:

* The checkpoint is being done as part of the current job (and not a previous job which forced materialization of the checkpointed RDD).
* The checkpoint is happening to a reliable store, not local storage, where we are subject to failures on node failure.

It looks like `dep.rdd.isCheckpointed` is the wrong way to go about it (relying on `dependencies` is insufficient for both cases). A better option seems to be:

```scala
// If checkpointed already - then always the same order
case dep: Dependency[_] if dep.rdd.getCheckpointFile.isDefined =>
  RDD.RandomLevel.IDEMPOTENT
```

> Actually we know. As long as the shuffle map stage RDD is IDEMPOTENT or UNORDERED, the reduce RDD is UNORDERED instead of INDETERMINATE.
It does not matter what the output order of the map stage was; after we shuffle the map output, it is always in an indeterminate order, except for the specific cases I referred to above.
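The propagation rules being debated above can be sketched as a small standalone model. This is only an illustration, not Spark's actual implementation: the names `Level` and `Node` are hypothetical, and the real logic lives in `RDD.computingRandomLevel`. The sketch encodes three rules from the discussion: a checkpointed (materialized) RDD is always `IDEMPOTENT`; a shuffle makes the output `UNORDERED` unless the map-stage side is `INDETERMINATE`, in which case indeterminacy propagates; and a narrow dependency simply inherits the parent's level, with the overall level being that of the most random contributor.

```scala
// Hedged sketch of determinism-level propagation, loosely modeled on the
// RDD.computingRandomLevel logic discussed in this PR. Not the Spark API.
object DeterminismSketch {

  object Level extends Enumeration {
    // Ordered from most to least deterministic; Enumeration's ordering
    // lets us take the "most random" parent with .max.
    val IDEMPOTENT, UNORDERED, INDETERMINATE = Value
  }

  // A Node stands in for an RDD: its parents (with a flag marking whether
  // the edge is a shuffle dependency), an intrinsic level for its own
  // compute function, and whether it has been checkpointed to reliable storage.
  case class Node(
      parents: Seq[(Node, Boolean)] = Nil, // (parent, reachedViaShuffle)
      ownLevel: Level.Value = Level.IDEMPOTENT,
      checkpointed: Boolean = false) {

    def level: Level.Value = {
      if (checkpointed) {
        // Already materialized: replaying it always yields the same order.
        Level.IDEMPOTENT
      } else {
        val parentLevels = parents.map {
          case (p, true) =>
            // After a shuffle, the map-side output order no longer matters:
            // the reduce side is UNORDERED unless the map stage itself
            // was INDETERMINATE.
            if (p.level == Level.INDETERMINATE) Level.INDETERMINATE
            else Level.UNORDERED
          case (p, false) =>
            p.level // narrow dependency: inherit the parent's level
        }
        // A root Node (no parents) falls back to its own level,
        // IDEMPOTENT by default.
        (ownLevel +: parentLevels).max
      }
    }
  }
}
```

For example, shuffling an idempotent root yields `UNORDERED`, while checkpointing that shuffled node pins its level back to `IDEMPOTENT`, matching the `getCheckpointFile.isDefined` case proposed above.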