[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

mengxr Mon, 20 Aug 2018 23:21:42 -0700

Github user mengxr commented on the issue:

    https://github.com/apache/spark/pull/22112
  
    Then it doesn't meet the requirements of the following operations used by MLlib:
    * sampling
    * zipWithIndex, zipWithUniqueId
    * we also use zip, assuming the ordering from the source RDD is preserved, e.g., https://github.com/apache/spark/blob/e50192494d1ae1bdaf845ddd388189998c1a2403/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L403
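
To illustrate why these operations depend on a stable element order, here is a minimal pure-Python sketch. It is not Spark code and does not use the Spark API: `compute_partition`, `fetch_offset`, `zip_by_position` and `is_aligned` are hypothetical names, and the rotation is only an assumed stand-in for a partition whose element order changes when it is recomputed.

```python
def compute_partition(rows, fetch_offset):
    """Return the rows of one partition in an order that depends on
    fetch_offset (a hypothetical stand-in for nondeterministic fetch order)."""
    rows = list(rows)
    k = fetch_offset % len(rows)
    return rows[k:] + rows[:k]


def zip_by_position(left, right):
    """Pair elements by position within a partition, the way zip does."""
    if len(left) != len(right):
        raise ValueError("partitions must have the same number of elements")
    return list(zip(left, right))


def is_aligned(pairs):
    """True if every derived value is paired with the point it came from."""
    return all(derived == point * 10 for point, derived in pairs)


points = [1, 2, 3, 4, 5, 6]

# First evaluation of the source partition.
first = compute_partition(points, fetch_offset=0)

# Derived values from a second evaluation with the same order ...
derived_same = [p * 10 for p in compute_partition(points, fetch_offset=0)]
# ... and from a recomputation (e.g. a retried task) with a different order.
derived_moved = [p * 10 for p in compute_partition(points, fetch_offset=1)]

stable = is_aligned(zip_by_position(first, derived_same))
unstable = is_aligned(zip_by_position(first, derived_moved))

# Positional indices (as zipWithIndex would assign) also change with the order.
index_first = {p: i for i, p in enumerate(first)}
index_moved = {p: i for i, p in enumerate(compute_partition(points, fetch_offset=1))}

print(stable, unstable, index_first[1], index_moved[1])
```

In this sketch the zip stays aligned only when both evaluations yield the same order; once the order shifts, every pair is mismatched and the element that had index 0 ends up with index 5.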



