GitHub user mengxr commented on the issue:

https://github.com/apache/spark/pull/22112

Then it doesn't meet the requirements for those operations used by MLlib:

* sampling
* zipWithIndex, zipWithUniqueId
* we also use zip, assuming the ordering from the source RDD is preserved, e.g., https://github.com/apache/spark/blob/e50192494d1ae1bdaf845ddd388189998c1a2403/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L403
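To make the ordering requirement concrete, here is a minimal plain-Scala sketch with no Spark dependency. Partitions are modeled as `Seq[Seq[A]]`. The helpers `zipWithIndexSim` and `zipSim` are hypothetical: they only mimic the documented semantics of `RDD.zipWithIndex` (indices follow partition index, then element order within each partition) and `RDD.zip` (position-by-position pairing within matching partitions). The sketch does not reproduce the linked KMeans code. It shows that if a recomputation returns the same elements in a different order, indices shift and zipped pairs become misaligned.

```scala
// Plain-Scala simulation (assumption: partitions modeled as Seq[Seq[A]]);
// these helpers are illustrative, not Spark's API.
object OrderingSketch {

  // Mimics RDD.zipWithIndex semantics: indices are assigned by partition
  // index first, then by element order within each partition.
  def zipWithIndexSim[A](partitions: Seq[Seq[A]]): Seq[Seq[(A, Long)]] = {
    // Starting offset of each partition = total size of earlier partitions.
    val offsets = partitions.map(_.size.toLong).scanLeft(0L)(_ + _)
    partitions.zip(offsets).map { case (part, start) =>
      part.zipWithIndex.map { case (a, i) => (a, start + i) }
    }
  }

  // Mimics RDD.zip: pairs elements position-by-position within each
  // partition, assuming equal partition and element counts on both sides.
  def zipSim[A, B](left: Seq[Seq[A]], right: Seq[Seq[B]]): Seq[(A, B)] =
    left.zip(right).flatMap { case (l, r) => l.zip(r) }

  def main(args: Array[String]): Unit = {
    // First evaluation of a partitioned dataset.
    val run1 = Seq(Seq("a", "bb"), Seq("ccc"))
    // Hypothetical recomputation (e.g., a lost partition rebuilt) with the
    // same elements per partition but in a different order.
    val run2 = Seq(Seq("bb", "a"), Seq("ccc"))

    // zipWithIndex: "a" receives a different index on each evaluation.
    val idx1 = zipWithIndexSim(run1).flatten.toMap
    val idx2 = zipWithIndexSim(run2).flatten.toMap
    println(s"a -> ${idx1("a")} vs ${idx2("a")}") // prints "a -> 0 vs 1"

    // zip: pairing run1 with values derived from run2 (string lengths)
    // misaligns the pairs, so "a" is paired with 2 instead of 1.
    val lengths = run2.map(_.map(_.length))
    println(zipSim(run1, lengths)) // prints "List((a,2), (bb,1), (ccc,3))"
  }
}
```

The same reasoning applies to sampling: if a sample is drawn by position or by a per-element random stream, a reordered recomputation can select different elements.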