Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21561#discussion_r209860657
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -317,7 +317,14 @@ class KMeans private (
}.reduceByKey { case ((sum1, count1), (sum2, count2)) =>
axpy(1.0, sum2, sum1)
(sum1, count1 + count2)
- }.collectAsMap().mapValues { case (sum, count) =>
+ }.collectAsMap()
+
+ if (iteration == 0) {
+ val numSamples = collected.values.map(_._2).sum
--- End diff --
What about moving this into the `foreach`, so that it is computed only if needed?
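A minimal sketch of the idea, not the PR's actual code: it assumes the `foreach` in question iterates over an optional logging handle, so the sum is evaluated only when that handle is present. `Instr`, `logNumExamples`, and `logOnFirstIteration` are hypothetical names used for illustration; only `collected`, `iteration`, and the sum expression come from the diff.

```scala
// Hypothetical stand-in for a logging/instrumentation handle (assumption, not Spark's class).
final class Instr {
  def logNumExamples(n: Long): Unit = println(s"numExamples=$n")
}

object LazySumSketch {
  // Returns true when the sum was actually computed, which makes the behaviour easy to check.
  def logOnFirstIteration(
      iteration: Int,
      collected: Map[Int, (Array[Double], Long)],
      instr: Option[Instr]): Boolean = {
    var computed = false
    if (iteration == 0) {
      // The sum lives inside foreach, so it is skipped entirely when instr is None.
      instr.foreach { i =>
        computed = true
        i.logNumExamples(collected.values.map(_._2).sum)
      }
    }
    computed
  }
}
```

With this shape, no separate `numSamples` value is needed outside the `foreach`, and nothing is summed when there is no handle to log to.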
---