[GitHub] spark pull request #22112: [SPARK-23243][Core] Fix RDD.repartition() data co...

squito Mon, 27 Aug 2018 08:35:50 -0700

Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22112#discussion_r213010846
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1918,3 +1980,19 @@ object RDD {
         new DoubleRDDFunctions(rdd.map(x => num.toDouble(x)))
       }
     }
    +
    +/**
    + * The random level of RDD's output (i.e. what `RDD#compute` returns), which indicates how the
    + * output will diff when Spark reruns the tasks for the RDD. There are 3 random levels, ordered
    + * by the randomness from low to high:
    + * 1. IDEMPOTENT: The RDD output is always same (including order) when rerun.
    --- End diff --
    
    here too, idempotent is the wrong word for this ... deterministic? partition-ordered? (I guess "ordered" could make it seem like the entire data is ordered ...)


