Hi,

Thank you for your answer. That is actually what I was thinking of doing. But
it seems that it can perform well even with the centers (I had a bug in my
code). Using the same example as the BIRCH comparison with MiniBatch K-means,
where BIRCH produces 158 clusters, I was able to find the exact number of
clusters, which is 100. I still have a lot of testing to do, but it looks like
this approach can work.
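
For reference, here is a minimal sketch of the second suggestion in the quoted
message below: recovering which samples belong to each subcluster by assigning
every original sample to its nearest subcluster center. It assumes
scikit-learn's Birch with n_clusters=None and pairwise_distances_argmin. The
data, threshold and variable names are illustrative only; this is not the
dataset from the comparison example, and not necessarily the code I ran.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances_argmin

# Illustrative data only (assumption): 500 points around 4 blobs.
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.5, random_state=0)

# n_clusters=None skips the final global clustering step,
# so the fitted model exposes the raw subclusters.
birch = Birch(threshold=0.5, n_clusters=None)
birch.fit(X)

centers = birch.subcluster_centers_

# Birch does not store the samples themselves, so membership is
# recovered by assigning each sample to its nearest subcluster center.
nearest = pairwise_distances_argmin(X, centers)

# Map each subcluster index to the indices of the samples assigned to it.
members = {i: np.flatnonzero(nearest == i) for i in range(len(centers))}
```

Lowering the threshold yields more, smaller subclusters; the assignment step
stays the same.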

Best,
Dzenan

On Sat, Oct 17, 2015 at 2:29 AM, Manoj Kumar <manojkumarsivaraj...@gmail.com
> wrote:

> Hi,
>
> Birch does not remember which samples are fed to it; each subcluster
> keeps track of only the linear sum, squared sum and number of samples (I
> think), so you cannot do this directly.
>
> You can either
> 1. Reduce the threshold enough to get a large number of subcluster
> centers. (Note these are reduced instances of the original data, not the
> original data itself.) Or
> 2. Using the distances from the original data to the subcluster
> centers, compute the subclusters yourself. This is done at predict time in
> Birch.
>
>
> HTH
>
>
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>