GitHub user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/19340
  
    @Kevin-Ferret, you pointed out that both the input and the centers should be
normalized to unit Euclidean length. Quoting you:
    
    > the solution is also the arithmetic mean only if all vectors are of unit 
length.
    
    Therefore, guaranteeing convergence requires the input dataset to contain
only unit-length vectors, but this normalization should be done by the user. I
think we can either add a note to the documentation or add a check that logs a
WARN, although the check would have a performance impact.
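
    To make the two options concrete, here is a rough plain-Python sketch (not
Spark code, and not what this PR implements). All function names are
hypothetical and chosen only for illustration. It shows the normalization a
user would apply up front, the kind of per-vector check a WARN would need, and
a center computed as the re-normalized mean of unit-length inputs.

```python
import math


def norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Scale v to unit Euclidean length (hypothetical helper)."""
    n = norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def count_non_unit(vectors, tol=1e-9):
    """Count vectors that are not unit length.

    This is the kind of full pass over the data that a check-and-WARN
    would require, which is where the performance cost comes from.
    """
    return sum(1 for v in vectors if abs(norm(v) - 1.0) > tol)


def cosine_distance(a, b):
    """1 - cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm(a) * norm(b))


def mean(vectors):
    """Component-wise arithmetic mean."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def total_cost(vectors, center):
    """Sum of cosine distances from each vector to the center."""
    return sum(cosine_distance(v, center) for v in vectors)


# Illustrative data with very different lengths (made up for this sketch).
points = [[3.0, 4.0], [0.0, 2.0], [10.0, 0.0]]

# Option "done by the user": normalize the input before clustering.
unit_points = [normalize(p) for p in points]

# Center from the mean of unit vectors, re-normalized to unit length.
center = normalize(mean(unit_points))

# Center from the mean of the raw vectors, for comparison.
raw_center = normalize(mean(points))

if count_non_unit(points) > 0:
    print("WARN: input contains vectors that are not unit length")
```

    On this toy data the center built from the normalized input gives a lower
total cosine distance than the one built from the raw input, which is the
effect the quoted remark describes.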

