Re: [scikit-learn] Issues with kmeans: Difference in centroid values

Andreas Mueller Mon, 16 Apr 2018 14:38:58 -0700


On 04/16/2018 04:07 PM, Sidak Pal Singh wrote:

Hi everyone,
I was using the scikit-learn KMeans algorithm to cluster pretrained word-vectors. There are a few things which I found to be surprising and wanted to get some feedback on.
- Based upon the 'labels_' assigned to each word-vector (i.e. cluster memberships), I compute every cluster centroid as the average of the word-vectors (corresponding to that cluster). Surprisingly, this seems to be pretty different from the 'cluster_centers_'. Is there anything that I am missing here?

If the algorithm did not fully converge, you just did one more step (recomputing the centroids from the final labels), so the results are expected to be different.
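One way to see this is to recompute the centroids from labels_ yourself and compare them with cluster_centers_, once for a run that is cut off early and once for a run allowed to converge. This is only a sketch: the data come from make_blobs as a stand-in for the word vectors, the helpers centroids_from_labels and centroid_gap are hypothetical, and it assumes no cluster ends up empty.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def centroids_from_labels(X, labels, n_clusters):
    """Hypothetical helper: mean of the points assigned to each cluster.

    Assumes every cluster has at least one point (an empty one gives NaN).
    """
    return np.vstack([X[labels == k].mean(axis=0) for k in range(n_clusters)])


def centroid_gap(X, n_clusters, max_iter):
    """Hypothetical helper: fit KMeans and return the model together with the
    largest absolute difference between recomputed means and cluster_centers_."""
    km = KMeans(n_clusters=n_clusters, n_init=1, max_iter=max_iter,
                random_state=0).fit(X)
    recomputed = centroids_from_labels(X, km.labels_, n_clusters)
    return km, np.abs(recomputed - km.cluster_centers_).max()


# Synthetic stand-in for the word vectors (an assumption, not the original data).
X, _ = make_blobs(n_samples=1000, n_features=50, centers=10, cluster_std=5.0,
                  random_state=0)

# max_iter=1 deliberately stops early; max_iter=300 lets the run converge.
for max_iter in (1, 300):
    km, gap = centroid_gap(X, n_clusters=10, max_iter=max_iter)
    print(f"max_iter={max_iter}: n_iter_={km.n_iter_}, max gap={gap:.3e}")
```

For the truncated run the recomputed means will generally differ from cluster_centers_, because the labels were reassigned after the last centroid update. For a run that stops with stable labels, the gap should be much smaller, typically close to floating-point noise. Comparing n_iter_ against max_iter is also a quick hint as to whether a run stopped early.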

- I was later using the verbose option to see whether the clustering had converged. I saw log messages on the console such as "center shift 7.994126e-04 within tolerance 1.243425e-06". This seems to correspond to some code in *_k_means_elkan.pyx* (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cluster/_k_means_elkan.pyx).
- Lastly, another thing that seems strange is that I hadn't set the tolerance value, so the default of 1e-4 should have been used. But if you look again at the above log, it says "within tolerance 1.243425e-06" instead of 1e-4.

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cluster/k_means_.py#L159

The tolerance is scaled by the variance of the data so that it is independent of the scale.
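A minimal sketch of that scaling, assuming (from my reading of the linked code, for dense input) that the effective tolerance is the mean of the per-feature variances multiplied by tol. The name scaled_tolerance is hypothetical and this is not scikit-learn's own function.

```python
import numpy as np


def scaled_tolerance(X, tol=1e-4):
    """Hypothetical helper: scale a relative tol by the data's average
    per-feature variance (assumed to mirror the dense-data case in k_means_.py)."""
    X = np.asarray(X, dtype=float)
    return np.mean(np.var(X, axis=0)) * tol


X = np.array([[0.0, 0.0],
              [2.0, 4.0]])
# Per-feature variances are [1, 4]; their mean 2.5 times 1e-4 is about 2.5e-4.
print(scaled_tolerance(X))
# Multiplying the data by 10 multiplies every variance, and so the tolerance, by 100.
print(scaled_tolerance(10 * X))
```

Under this assumption, the 1.243425e-06 in the log would correspond to data whose average per-feature variance is about 0.0124 (1.243425e-06 / 1e-4), which is plausible for small-valued embedding coordinates.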

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
