Hi,

Thank you for your answer. That is actually what I was thinking of doing. But it seems that it can perform well even with the centers (I had a bug in my code). Using the same example as the comparison of BIRCH with MiniBatch K-means, where BIRCH produces 158 clusters, I was able to find the exact number of clusters, which is 100. I still have a lot of testing to do, but it looks like this approach can work.
Best,
Dzenan

On Sat, Oct 17, 2015 at 2:29 AM, Manoj Kumar <manojkumarsivaraj...@gmail.com> wrote:
> Hi,
>
> Birch does not remember which samples are fed to it; each subcluster
> keeps track of only the linear sum, squared sum and number of samples
> (I think), so you cannot do this directly.
>
> You can either
> 1. Reduce the threshold enough to get a large number of subcluster
> centers. (Note these are reduced instances of the original data, not the
> original data itself.) or
> 2. Using the distances from the original data to the subcluster
> centers, compute the subclusters yourself. This is done at predict time in
> Birch.
>
> HTH
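
For reference, the second option in the quoted message (assigning the original samples to their nearest subcluster centers) might look roughly like the sketch below. The data, the threshold value and the variable names are assumptions chosen only for illustration; this is not the code from the BIRCH vs. MiniBatch K-means example.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances_argmin

# Synthetic data, purely illustrative (an assumption, not the original dataset).
X, _ = make_blobs(n_samples=2000, centers=10, cluster_std=0.5, random_state=0)

# n_clusters=None skips the final global clustering step, so only the
# subclusters built by the CF tree are kept.
birch = Birch(threshold=0.7, n_clusters=None)
birch.fit(X)

centers = birch.subcluster_centers_

# Assign each original sample to its nearest subcluster center.
nearest = pairwise_distances_argmin(X, centers)

# Group the indices of the original samples by subcluster.
members = {k: np.flatnonzero(nearest == k) for k in range(len(centers))}

print(len(centers), "subclusters")
```

Here `members[k]` holds the row indices of `X` that fall closest to subcluster `k`, which recovers the sample-to-subcluster mapping that Birch itself does not store.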
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general