Hello,

A while ago I wrote a numeric version of the Elbow method for estimating the optimal number of clusters for K-means and presented it in the scikit-learn digest as a function. As a result, my GitHub repo got a lot of clones, and I was encouraged to keep working on it. I also got feedback from people with whom I discussed it.

It is now in the form of a scikit-learn Estimator and uses bootstrapping to check that the chosen number is reasonably reliable and not too random. It returns a suggested optimal number and a dictionary of all calculated suggestions with their corresponding frequencies.

Here is the method:
https://github.com/Mathemilda/Numeric_ElbowMethod_For_K-means/blob/master/EstimatedClusterNumberWithWCSS.py

Here is an example of its application in a Jupyter notebook:
https://github.com/Mathemilda/Numeric_ElbowMethod_For_K-means/blob/master/A%20scikit-learn%20compatible%20method%20with%20WCSS%20metric.ipynb

I also received a number of other suggestions, such as incorporating other metrics and methods. I have seen a discussion about this, with some pull requests, on the scikit-learn GitHub, but it does not appear to be finished. As I understand it, a lot of people would like to have something now, so I offer my work.

Please do not hesitate to send questions or suggestions,
Mya
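
P.S. For readers who want the general idea before opening the repository, here is a minimal sketch. It is not the code from the repository: the function names (wcss_curve, elbow_k, bootstrap_elbow), the parameter defaults, and the elbow criterion itself (the largest second difference of the WCSS curve) are assumptions chosen for illustration. It only shows the overall shape: compute WCSS for a range of k, pick an elbow numerically, repeat on bootstrap resamples, and return the most frequent suggestion together with a dictionary of frequencies.

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def wcss_curve(X, k_values, seed=0):
    """Within-cluster sum of squares (KMeans.inertia_) for each k."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    ]


def elbow_k(wcss, k_values):
    """Pick the k where the WCSS curve bends most sharply.

    Assumed heuristic (not taken from the repository): the largest
    second difference of the curve.
    """
    second_diff = np.diff(wcss, n=2)
    # second_diff[i] is centred on k_values[i + 1]
    return k_values[int(np.argmax(second_diff)) + 1]


def bootstrap_elbow(X, k_max=8, n_bootstrap=20, seed=0):
    """Return (most frequent elbow, {elbow: count}) over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    k_values = list(range(1, k_max + 1))
    votes = Counter()
    for b in range(n_bootstrap):
        # Resample the rows of X with replacement
        idx = rng.integers(0, len(X), size=len(X))
        votes[elbow_k(wcss_curve(X[idx], k_values, seed=b), k_values)] += 1
    suggested = votes.most_common(1)[0][0]
    return suggested, dict(votes)


if __name__ == "__main__":
    # Synthetic data, for illustration only
    X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)
    suggested, frequencies = bootstrap_elbow(X)
    print("suggested k:", suggested)
    print("frequencies:", frequencies)
```

If the frequencies are spread over several values of k rather than concentrated on one, the suggestion should be treated with caution. The second-difference rule is only one simple way to locate an elbow and can disagree with a visual reading of the curve.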
--
Maiia Bakhova
Mathematician in Data Science
http://myabakhova.blogspot.com
https://www.linkedin.com/in/myabakhova
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn