GitHub user Kevin-Ferret commented on the issue:

    https://github.com/apache/spark/pull/19340
  
     @mgaido91 I actually needed something like this recently and stumbled 
upon your PR (and the JIRA, which unfortunately I cannot update). 
    Your approach looks good to me, but I was wondering: are you keeping the 
existing code for finding the new centers, in the sense of "newCenter = the 
arithmetic mean of all points in the cluster"? (I couldn't find any diff 
touching this piece: 
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L291-L298)
    If so, this will lead to misleading results (your centroids won't minimize 
the within-cluster cosine distance anymore) unless you make sure to normalize 
all your input vectors AND the newly computed centroid back to unit length 
(i.e. spherical k-means). 
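    To make the concern concrete, here is a minimal sketch of the spherical 
k-means update I have in mind: take the arithmetic mean of the (already 
unit-normalized) points, then re-normalize the result back onto the unit 
sphere. This is plain Scala, not the actual MLlib code, and the names 
(`normalize`, `newCenter`) are mine:

```scala
object SphericalKMeansSketch {
  // L2-normalize a vector; returns the input unchanged if its norm is zero.
  def normalize(v: Array[Double]): Array[Double] = {
    val norm = math.sqrt(v.map(x => x * x).sum)
    if (norm == 0.0) v else v.map(_ / norm)
  }

  // Spherical k-means centroid update: arithmetic mean of the cluster's
  // points, followed by re-normalization onto the unit sphere. Assumes
  // `points` is non-empty and each point is already unit-normalized.
  def newCenter(points: Seq[Array[Double]]): Array[Double] = {
    val sum = Array.fill(points.head.length)(0.0)
    points.foreach(p => sum.indices.foreach(i => sum(i) += p(i)))
    normalize(sum.map(_ / points.length)) // mean, then back onto the sphere
  }

  def main(args: Array[String]): Unit = {
    val cluster = Seq(Array(1.0, 0.0), Array(0.0, 1.0)).map(normalize)
    // ~ (0.707, 0.707); the plain mean would be (0.5, 0.5), with norm < 1
    println(newCenter(cluster).mkString(", "))
  }
}
```

    Without that final `normalize` call, the plain arithmetic mean sits 
strictly inside the unit sphere, which is exactly the mismatch with cosine 
distance I'm worried about. 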
    I might be wrong or have misread the code; forgive me if that's the case. 
If you want to discuss this further, what would be the best place (assuming 
you don't want this PR derailed by my blabbering)?

