Github user markhamstra commented on a diff in the pull request:
https://github.com/apache/spark/pull/22112#discussion_r213061324
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1918,3 +1980,19 @@ object RDD {
new DoubleRDDFunctions(rdd.map(x => num.toDouble(x)))
}
}
+
+/**
+ * The random level of RDD's output (i.e. what `RDD#compute` returns), which indicates how the
+ * output will diff when Spark reruns the tasks for the RDD. There are 3 random levels, ordered
+ * by the randomness from low to high:
--- End diff --
Again, please remove "random" and "randomness". The issue is not
randomness, but rather determinism. For example, the output of `RDD#compute`
could be completely non-random but still dependent on state not contained in
the RDD. That would still make it problematic when Spark recomputes only some
partitions and then aggregates the results.
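
To make that concrete, here is a minimal sketch (not from this PR; the
`IndeterminateOutputSketch` object and its `counter` are hypothetical) of
`compute` output that involves no randomness at all, yet changes when the
same RDD is re-evaluated, because it depends on mutable state outside the
RDD's lineage:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import java.util.concurrent.atomic.AtomicLong

object IndeterminateOutputSketch {
  // Hypothetical external state that is not captured by the RDD's lineage.
  val counter = new AtomicLong(0L)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("indeterminate-sketch"))

    // Nothing random here, but each compute reads (and advances) `counter`,
    // so re-running a task yields different elements than the original run.
    val rdd = sc.parallelize(1 to 4, numSlices = 2)
      .map(x => x + counter.getAndIncrement())

    println(rdd.collect().toSeq) // first evaluation
    println(rdd.collect().toSeq) // same lineage, different output

    sc.stop()
  }
}
```

If only one partition of such an RDD were lost and recomputed, its elements
would no longer be consistent with the partitions computed earlier, so
aggregating across them would silently mix two different versions of the
output, even though nothing random is involved.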
---