GitHub user srowen commented on the issue:
https://github.com/apache/spark/pull/19340
I think some of the 'objections' in that link won't matter here. For
example, some point out that k-means inherently implies Euclidean distance.
Fine, we should really call this an instance of Lloyd's algorithm, but it
doesn't matter much. Cosine distance isn't a true distance metric either (it
violates the triangle inequality), and it's not obvious that Lloyd's converges
when you pretend it is. I am not actually sure, though I have the impression
it satisfies enough properties that it does in practice.
That link also mentions that MATLAB's kmeans allows cosine distance:
http://www.mathworks.com/help/stats/kmeans.html?s_tid=gn_loc_drop
This aspect doesn't worry me so much.
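
To make the point about cosine distance concrete, here is a small sketch in plain Python. It is only an illustration, not anything from the PR: the helper names (cosine_distance, normalize, squared_euclidean) and the three sample vectors are made up for this example. It checks two things: that cosine distance (taken here as 1 minus cosine similarity) can break the triangle inequality, and that on unit-length vectors it equals half the squared Euclidean distance, which may be part of why a Lloyd's-style iteration tends to behave in practice.

```python
import math

def cosine_distance(u, v):
    """Cosine distance, defined here as 1 - cosine similarity (an assumption)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)

def normalize(u):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in u))
    return [x / n for x in u]

def squared_euclidean(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

# Hypothetical sample points: b sits at 45 degrees between a and c.
a, b, c = [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]

# Triangle inequality would require d(a, c) <= d(a, b) + d(b, c).
direct = cosine_distance(a, c)
via_b = cosine_distance(a, b) + cosine_distance(b, c)
triangle_violated = direct > via_b

# On unit vectors, ||u - v||^2 = 2 * (1 - cos(u, v)).
ua, ub = normalize(a), normalize(b)
identity_gap = abs(squared_euclidean(ua, ub) - 2.0 * cosine_distance(a, b))

print("triangle inequality violated:", triangle_violated)
print("gap between ||u - v||^2 and 2 * cosine distance:", identity_gap)
```

The second check only says that assigning normalized points to the nearest center by cosine distance gives the same ordering as squared Euclidean distance. It does not by itself settle convergence, since the centroid update also has to be consistent with the measure.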