Hello,

As I wrote before, I am trying to improve the quality of the BIRCH output. The idea is to use the BIRCH subclusters to estimate the number of clusters K for K-means, and then run K-means as the global step.
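A minimal sketch of that pipeline, assuming scikit-learn's `Birch` and `KMeans` classes. The toy data, the `threshold` value, and the fixed `k` are illustrative assumptions only; `k` stands in for whatever the K-selection step would return.

```python
import numpy as np
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs

# Toy data, purely for illustration.
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.5, random_state=0)

# n_clusters=None skips Birch's built-in global step, so only the
# subclusters (CF-tree leaves) are produced.
birch = Birch(threshold=0.5, n_clusters=None)
birch.fit(X)
centers = birch.subcluster_centers_

# Placeholder: in the real pipeline K would come from the selection function.
k = min(4, len(centers))

# Global step: K-means over the subcluster centers.
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(centers)

# With n_clusters=None, predict() returns the index of the nearest
# subcluster, which is then mapped to its K-means cluster.
sub_idx = birch.predict(X)
labels = kmeans.labels_[sub_idx]
```

For the online case, `birch.partial_fit(batch)` could replace `fit`, with the global step rerun whenever fresh labels are needed.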
So far I have implemented K-means as the global step for BIRCH, with K selected by the method from this paper [1], which is nicely explained in this blog post [2]. I have also implemented the gap statistic for selecting K (a similar approach to [1], but slower).

However, in my approach I was feeding the 'selection of K' function with the centers obtained from BIRCH. This means that K-means with a given K would calculate inertia using the BIRCH centers as the new samples. With this approach the selection of K cannot give good results, because there are only a few sample points (the centers).

My question is: is it possible to feed the actual subclusters, instead of only their centers, to the 'selection of K' function, and does it make sense? I want to use BIRCH in an online fashion, so in my opinion it could make sense to feed both the 'selection of K' function and K-means with the subclusters as the dataset.

[1] http://www.ee.columbia.edu/~dpwe/papers/PhamDN05-kmeans.pdf
[2] https://datasciencelab.wordpress.com/2014/01/21/selection-of-k-in-k-means-clustering-reloaded/

Thank you.

Best,
Dzenan
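P.S. To make the question concrete, here is a rough sketch of one way the subclusters, rather than bare centers, could be fed to K-means: each center is weighted by the number of samples in its subcluster through the `sample_weight` argument of `KMeans.fit`. The counts are approximated here by assigning every sample to its nearest subcluster, which is an assumption (in an online setting they would have to be tracked per batch), and the range of K values is arbitrary. The resulting weighted inertias are what a 'selection of K' function would consume.

```python
import numpy as np
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs

# Toy data, purely for illustration.
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.5, random_state=0)

birch = Birch(threshold=0.5, n_clusters=None).fit(X)
centers = birch.subcluster_centers_

# Approximate subcluster sizes: samples whose nearest subcluster is each center.
counts = np.bincount(birch.predict(X), minlength=len(centers)).astype(float)

# Drop subclusters that received no samples under this approximation.
mask = counts > 0
centers, counts = centers[mask], counts[mask]

# Weighted inertia for each candidate K; a center representing n samples
# contributes n times as much as a center representing a single sample.
inertias = {}
for k in range(1, min(8, len(centers)) + 1):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(centers, sample_weight=counts)
    inertias[k] = km.inertia_
```

The weighted inertia still ignores the spread inside each subcluster, so it underestimates the inertia that would be obtained on the raw samples.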