Github user mengxr commented on the issue:
https://github.com/apache/spark/pull/22112
Then it doesn't meet the requirements of these operations used by MLlib:
* sampling
* zipWithIndex, zipWithUniqueId
* zip, which we also use assuming the ordering from the source RDD is preserved, e.g., https://github.com/apache/spark/blob/e50192494d1ae1bdaf845ddd388189998c1a2403/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L403
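
To illustrate why these operations depend on a stable ordering, here is a minimal sketch in plain Python, not Spark code. `compute_partition` is a hypothetical stand-in for evaluating one partition, and rotating the records by the attempt number is an assumption made only to model a recompute that returns the same records in a different order.

```python
def compute_partition(records, attempt):
    # Hypothetical model of evaluating a partition: each attempt returns the
    # same records, rotated by `attempt` positions to mimic an unstable order.
    k = attempt % len(records)
    return records[k:] + records[:k]


def zip_with_index(partition):
    # Simplified analogue of zipWithIndex: pair each record with its position.
    return [(rec, i) for i, rec in enumerate(partition)]


points = ["p0", "p1", "p2", "p3"]

first = compute_partition(points, attempt=0)  # original evaluation
retry = compute_partition(points, attempt=1)  # recompute, e.g. after a task retry

# Indices assigned to the same records differ between the two evaluations.
index_first = dict(zip_with_index(first))
index_retry = dict(zip_with_index(retry))
changed = [p for p in points if index_first[p] != index_retry[p]]

# zip assumes both sides line up: values derived from the first evaluation
# are paired with the wrong records once the source is recomputed.
derived = [p.upper() for p in first]
pairs = list(zip(retry, derived))
mismatched = [(p, d) for p, d in pairs if p.upper() != d]

print("records with changed index:", changed)
print("mismatched zip pairs:", mismatched)
```

Seeded sampling is affected in the same way, since which records are selected depends on the order in which they are visited.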