Could you use nearest neighbors with a KD tree to build a sparse distance matrix, one in which the distances from a point to everything except its nearest neighbors are treated as (near-)infinite? Yes, this introduces another parameter (the neighborhood size), just as BIRCH has its threshold. I suspect you will not be able to avoid having some such approximation parameter.

You do not need to set n_clusters to a fixed value for BIRCH. You can instead provide another clusterer for the global step. That clusterer has its own parameters, but it does let you experiment with different "global clusterers".
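In case it helps, here is a rough sketch of both ideas. It is an illustration under assumptions, not a recipe: the data come from make_blobs as a stand-in for your dataset, and n_neighbors=10, threshold=0.3 and n_clusters=4 are arbitrary values I picked. Note also that it does not build a literal sparse distance matrix. It uses the k-nearest-neighbor graph as the connectivity constraint of AgglomerativeClustering, which restricts merges to neighboring points and has a similar effect.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for the real data (assumption).
X, _ = make_blobs(n_samples=1000, centers=4, cluster_std=0.5, random_state=0)

# Option 1: sparse k-NN graph built with a KD tree.
# Only n_neighbors entries per row are stored, never the full n x n matrix.
nn = NearestNeighbors(n_neighbors=10, algorithm="kd_tree").fit(X)
connectivity = nn.kneighbors_graph(mode="connectivity")  # CSR sparse matrix

# Ward linkage that may only merge clusters connected in the k-NN graph.
# scikit-learn warns (and patches the graph) if it has several components.
agg = AgglomerativeClustering(
    n_clusters=4, linkage="ward", connectivity=connectivity
)
agg_labels = agg.fit_predict(X)

# Option 2: BIRCH with another clusterer as the global step.
# The threshold controls how many subclusters the CF tree keeps.
birch = Birch(
    threshold=0.3,
    n_clusters=AgglomerativeClustering(n_clusters=4),
)
birch_labels = birch.fit_predict(X)

print(connectivity.shape, connectivity.nnz)
print(len(birch.subcluster_centers_), "BIRCH subclusters")
```

Either way there is still one approximation knob to tune: the neighborhood size in the first case, the threshold in the second.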
On 4 January 2018 at 11:04, Shiheng Duan <shid...@ucdavis.edu> wrote:
> Yes, it is an efficient method, still, we need to specify the number of
> clusters or the threshold. Is there another way to run hierarchy clustering
> on the big dataset? The main problem is the distance matrix.
> Thanks.
>
> On Tue, Jan 2, 2018 at 6:02 AM, Olivier Grisel <olivier.gri...@ensta.org>
> wrote:
>
>> Have you had a look at BIRCH?
>>
>> http://scikit-learn.org/stable/modules/clustering.html#birch
>>
>> --
>> Olivier
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn