Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22112#discussion_r212376990
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1865,6 +1876,39 @@ abstract class RDD[T: ClassTag](
// RDD chain.
@transient protected lazy val isBarrier_ : Boolean =
dependencies.filter(!_.isInstanceOf[ShuffleDependency[_, _, _]]).exists(_.rdd.isBarrier())
+
+ /**
+ * Returns the random level of this RDD's computing function. Please refer to [[RDD.RandomLevel]]
+ * for the definition of random level.
+ *
+ * By default, an RDD without parents(root RDD) is IDEMPOTENT. For RDDs with parents, the random
+ * level of current RDD is the random level of the parent which is random most.
+ */
+ // TODO: make it public so users can set random level to their custom RDDs.
+ // TODO: this can be per-partition. e.g. UnionRDD can have different random level for different
+ // partitions.
+ private[spark] def computingRandomLevel: RDD.RandomLevel.Value = {
+ val parentRandomLevels = dependencies.map {
+ case dep: ShuffleDependency[_, _, _] =>
+ if (dep.rdd.computingRandomLevel == RDD.RandomLevel.INDETERMINATE) {
+ RDD.RandomLevel.INDETERMINATE
--- End diff --
> > All other shuffle cases, we don't know the output order in Spark.
>
> Actually we know. As long as the shuffle map stage RDD is IDEMPOTENT or
> UNORDERED, the reduce RDD is UNORDERED instead of INDETERMINATE.
IIUC, the shuffle map itself works as follows:
- If both an Aggregator and a key ordering are specified:
  - the output becomes idempotent.
- If either the Aggregator or the key ordering is not specified:
  - if the input is indeterminate, the output becomes indeterminate;
  - if the input is idempotent or unordered, the output becomes unordered.

We also have to cover the case @mridulm raised, where the shuffle map stage may
be skipped.
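
To make the rules above concrete, here is a minimal Scala sketch of how I read them. Everything in it is an assumption for illustration: `RandomLevel` is a stand-in for the `RDD.RandomLevel` enumeration proposed in this PR, and `ShuffleOutputLevel`, `hasAggregator`, and `hasKeyOrdering` are hypothetical names, not existing Spark API. It does not handle the skipped shuffle map stage case.

```scala
// Hypothetical stand-in for the RDD.RandomLevel enumeration proposed in the PR.
object RandomLevel extends Enumeration {
  val IDEMPOTENT, UNORDERED, INDETERMINATE = Value
}

// Hypothetical helper: derives the random level of a shuffle's output from the
// random level of its input and from whether an aggregator and a key ordering
// are both present. This only encodes the rules listed in the comment above.
object ShuffleOutputLevel {
  import RandomLevel._

  def apply(input: Value, hasAggregator: Boolean, hasKeyOrdering: Boolean): Value =
    if (hasAggregator && hasKeyOrdering) {
      // Aggregation plus key ordering is taken to fix the output.
      IDEMPOTENT
    } else if (input == INDETERMINATE) {
      // An indeterminate input stays indeterminate.
      INDETERMINATE
    } else {
      // IDEMPOTENT or UNORDERED input: same data, arbitrary order.
      UNORDERED
    }
}
```

For example, `ShuffleOutputLevel(RandomLevel.IDEMPOTENT, hasAggregator = false, hasKeyOrdering = false)` evaluates to `RandomLevel.UNORDERED` under these assumptions.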