Hello,

As I wrote before, I am trying to improve the quality of BIRCH output. The
idea is to use the BIRCH subclusters to estimate the number of clusters K for
K-means, and then run K-means as the global step.

So far I have implemented K-means as the global step for BIRCH, with K
selected using the method from this paper [1], which is nicely explained in
this blog post [2]. I have also implemented the gap statistic for selecting K
(a similar approach to [1], but slower).

However, in my approach I was feeding the 'selection of K' function with the
centers obtained from BIRCH, meaning that K-means with a given K would
calculate inertia using the BIRCH centers as new samples. With this approach
the selection of K cannot give good results, because there are only a few
sample points (the centers). My question is: is it possible to feed the actual
subclusters instead of the centers to the 'selection of K' function, and does
it make sense?

I want to use BIRCH in an online fashion, so in my opinion it could make
sense to feed both the 'selection of K' function and K-means with the
subclusters as the dataset.
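
To make the question concrete, here is a minimal sketch of one way "feeding
subclusters" could look: each subcluster center is weighted by the number of
points it represents, so that the inertia is not computed from a handful of
equally weighted centers. Everything here is an assumption for illustration:
the toy data, the threshold, the helper name weighted_inertia, and the use of
labels_ after a single fit() as a stand-in for per-subcluster counts (in a
truly online setting the counts would have to come from the CF tree itself).
The weighted inertia still ignores the scatter inside each subcluster.

```python
import numpy as np
from sklearn.cluster import Birch, KMeans
from sklearn.datasets import make_blobs

# Toy data (assumption: three well-separated blobs, purely for illustration).
X, _ = make_blobs(n_samples=600, centers=3, cluster_std=0.5, random_state=0)

# n_clusters=None skips the global step, so labels_ index the subclusters.
birch = Birch(threshold=0.5, n_clusters=None)
birch.fit(X)
centers = birch.subcluster_centers_

# Stand-in for per-subcluster sample counts: how many points of X were
# assigned to each subcluster center.
counts = np.bincount(birch.labels_, minlength=len(centers))


def weighted_inertia(k):
    """Hypothetical helper: inertia of K-means fit on weighted centers."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(centers, sample_weight=counts)
    return km.inertia_


# Candidate values of K, capped by the number of available subclusters.
max_k = min(6, len(centers))
inertias = {k: weighted_inertia(k) for k in range(1, max_k + 1)}

for k, value in inertias.items():
    print(k, round(value, 2))
```

The resulting inertia values could then be passed to whichever 'selection of
K' criterion is used ([1] or the gap statistic) in place of the unweighted
ones.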

[1] http://www.ee.columbia.edu/~dpwe/papers/PhamDN05-kmeans.pdf
[2] https://datasciencelab.wordpress.com/2014/01/21/selection-of-k-in-k-means-clustering-reloaded/

Thank you.

Best,
Dzenan