Re: extremely slow k-means version

2014-04-19 Thread ticup
Thanks a lot for the explanation, Matei. As a matter of fact, I was just reading the paper on narrow and wide dependencies and saw that groupByKey is indeed a wide dependency, which, as you explained, is the problem. Maybe it wouldn't be a bad thing to have a section in the docs on the wi
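A rough illustration of why the wide dependency hurts, written with plain Scala collections rather than Spark so it runs anywhere. It counts how many records would cross the shuffle when raw points are grouped by cluster id (the groupByKey pattern) versus when each partition first folds its points into one (sum, count) pair per cluster (the map-side combine that reduceByKey performs). The `Point` type, the partition layout and the sample data are hypothetical; this is a simulation of the idea, not Spark's actual shuffle code.

```scala
object ShuffleSketch {
  // Hypothetical 2-D point standing in for the vectors in the k-means job.
  final case class Point(x: Double, y: Double) {
    def +(o: Point): Point = Point(x + o.x, y + o.y)
  }

  // Each inner Seq plays the role of one partition of (clusterId, point) pairs.
  type Partition = Seq[(Int, Point)]

  // groupByKey-style: every (clusterId, point) record is sent across the shuffle.
  def recordsShuffledByGroup(parts: Seq[Partition]): Int =
    parts.map(_.size).sum

  // reduceByKey-style: a partition folds its points into one (sum, count) per key.
  def combinePartition(p: Partition): Map[Int, (Point, Int)] =
    p.groupBy(_._1).map { case (k, kvs) =>
      k -> ((kvs.map(_._2).reduce(_ + _), kvs.size))
    }

  // Only the per-partition partial aggregates would be shuffled.
  def recordsShuffledByReduce(parts: Seq[Partition]): Int =
    parts.map(p => combinePartition(p).size).sum

  // Reduce side: merge the partial (sum, count) pairs for each cluster.
  def mergedSums(parts: Seq[Partition]): Map[Int, (Point, Int)] =
    parts.map(combinePartition).flatten.groupBy(_._1).map { case (k, vs) =>
      k -> vs.map(_._2).reduce((a, b) => (a._1 + b._1, a._2 + b._2))
    }

  def main(args: Array[String]): Unit = {
    val parts: Seq[Partition] = Seq(
      Seq(0 -> Point(1, 1), 0 -> Point(2, 2), 1 -> Point(9, 9)),
      Seq(0 -> Point(3, 3), 1 -> Point(8, 8), 1 -> Point(10, 10))
    )
    println(recordsShuffledByGroup(parts))  // 6
    println(recordsShuffledByReduce(parts)) // 4
    println(mergedSums(parts))
  }
}
```

With many points per cluster the gap grows: the grouped version ships every point, while the combined version ships at most one record per cluster per partition.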

extremely slow k-means version

2014-04-19 Thread ticup
1 / pair._2._2)}.collectAsMap()

Afterwards, the change in the centroids is calculated in order to know when to stop iterating:

tempDist = 0.0
for (i <- 0 until K) {
  tempDist += kPoints(i).squaredDist(newPoints(i))
}

*my algorithm* (https://github.com/ticup/k-means-spark/blob/master
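A self-contained sketch of that update-and-stop step in plain Scala, without Spark: each new centroid is the per-cluster sum scaled by `1.0 / count` (mirroring the `1 / pair._2._2` factor in the excerpt), and `tempDist` accumulates how far each centroid moved. The `Point` class, its `squaredDist` method, the `convergeDist` threshold and the choice to keep the old centroid for an empty cluster are assumptions for illustration, not taken from the linked repository.

```scala
object ConvergenceSketch {
  // Hypothetical point type; squaredDist mirrors the method name in the excerpt.
  final case class Point(coords: Vector[Double]) {
    def +(o: Point): Point = Point(coords.zip(o.coords).map { case (a, b) => a + b })
    def *(s: Double): Point = Point(coords.map(_ * s))
    def squaredDist(o: Point): Double =
      coords.zip(o.coords).map { case (a, b) => (a - b) * (a - b) }.sum
  }

  // New centroid = sum * (1.0 / count). With an Int count, writing 1 / count
  // would be integer division and yield 0 for any count > 1.
  def newCentroids(stats: Map[Int, (Point, Int)]): Map[Int, Point] =
    stats.map { case (k, (sum, count)) => k -> sum * (1.0 / count) }

  // Total squared movement of the centroids; a cluster that received no
  // points keeps its old centroid (an assumption), so it contributes 0.
  def movement(kPoints: IndexedSeq[Point], newPoints: Map[Int, Point]): Double = {
    var tempDist = 0.0
    for (i <- kPoints.indices) {
      tempDist += kPoints(i).squaredDist(newPoints.getOrElse(i, kPoints(i)))
    }
    tempDist
  }

  def main(args: Array[String]): Unit = {
    val kPoints = IndexedSeq(Point(Vector(0.0, 0.0)), Point(Vector(10.0, 10.0)))
    val stats = Map(
      0 -> ((Point(Vector(2.0, 4.0)), 2)),  // two points summing to (2, 4)
      1 -> ((Point(Vector(30.0, 30.0)), 3)) // three points summing to (30, 30)
    )
    val tempDist = movement(kPoints, newCentroids(stats))
    val convergeDist = 0.001 // hypothetical stopping threshold
    println(s"tempDist = $tempDist, converged = ${tempDist < convergeDist}")
  }
}
```

In the Spark job the loop would repeat the assignment and update steps until `tempDist` drops below the threshold; only the small `collectAsMap()` result reaches the driver, so this check itself is cheap.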