GitHub user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19340
@Kevin-Ferret pointed out that both the input and the centers should be
normalized to unit Euclidean length. Citing you,
> the solution is also the arithmetic mean only if all vectors are of unit
length.
Therefore, to ensure convergence the input dataset should contain only
unit-length vectors, but normalizing it should be left to the user. I think we
can either add a comment in the documentation or add a check that logs a WARN,
but the check has a performance impact.
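
To make the two options concrete, here is a minimal standalone sketch in plain Python. It is not Spark code and not what the PR implements; the names `l2_norm`, `normalize`, `check_unit_length` and `cosine_center` are hypothetical, chosen only for illustration. It shows the user-side normalization, a check that emits a warning, and a center computed as the re-normalized arithmetic mean of unit-length inputs.

```python
import math
import warnings


def l2_norm(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Scale v to unit Euclidean length (what the user would do up front)."""
    n = l2_norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def check_unit_length(vectors, tol=1e-9):
    """Hypothetical check: warn if any vector is not of unit length.

    This is a full pass over the data, which is the performance cost
    mentioned above.
    """
    bad = sum(1 for v in vectors if abs(l2_norm(v) - 1.0) > tol)
    if bad:
        warnings.warn(f"{bad} input vector(s) are not of unit length")
    return bad == 0


def cosine_center(vectors):
    """Arithmetic mean of the inputs, scaled back to unit length.

    Assumes the inputs are already unit length; raises ValueError if the
    mean is the zero vector (e.g. two opposite vectors).
    """
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return normalize(mean)


data = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
unit_data = [normalize(v) for v in data]
center = cosine_center(unit_data)
```

Calling `check_unit_length(data)` on the raw input emits the warning, while `check_unit_length(unit_data)` passes silently.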