Hello everyone,
I am trying to understand Kmeans in Flink, Scala.
I can see that the attached Kmeans-snippet (taken from Flink examples)
updates centroids.
in (1) map function assigns points to centroids,
in (3) centroids are grouped by their ids.
in (4) the x and y coordinates are being added
But, I cannot understand what happens at (2) and then (5) ?
I will really appreciate, if any one can elaborate how this works ?
Thanks
Hajira
-------------------------------------
K means code snippet
--------------------------------------
val newCentroids = points
1) .map(new
SelectNearestCenter()).withBroadcastSet(currentCentroids, "centroids")
2) .map { x => (x._1, x._2, 1L) }
3) .groupBy(0) // by centroid ID
4) .reduce { (p1, p2) => (p1._1, p1._2.add(p2._2), p1._3 + p2._3) }
5) .map { x => new Centroid(x._1, x._2.div(x._3)) }
newCentroids
--------------------------------------