Could you use nearest neighbors with a KD tree to build a sparse distance matrix, one in which the distance from a point to everything except its nearest neighbors is treated as (near-)infinite? Yes, this again introduces an additional parameter (the neighborhood size), just as BIRCH has its threshold. I suspect you will not be able to avoid having some such parameter that controls the approximation.

You do not need to set n_clusters to a fixed value for BIRCH. You only need to provide another clusterer for the global clustering step. That clusterer has its own parameters, but you should be able to experiment with different "global clusterers".
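To make the two suggestions concrete, here is a minimal sketch, not a recommendation of specific settings. The toy data from make_blobs, n_neighbors=10, threshold=0.5 and n_clusters=3 are all assumptions chosen for illustration. Agglomerative clustering is used as the consumer of the sparse neighbor graph and as the BIRCH global clusterer only as an example; other clusterers could be swapped in.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Toy data (assumption: stands in for the real "big dataset").
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=0.5, random_state=0)

# 1) Sparse k-nearest-neighbor graph built with a KD tree.
#    Only n_neighbors entries per row are stored; every other pair is
#    implicitly "not connected" instead of holding an explicit distance.
nn = NearestNeighbors(n_neighbors=10, algorithm="kd_tree").fit(X)
knn_dist = nn.kneighbors_graph(mode="distance")      # sparse distances
knn_conn = nn.kneighbors_graph(mode="connectivity")  # sparse 0/1 graph

# Hierarchical clustering restricted to merges along the sparse graph.
agg = AgglomerativeClustering(n_clusters=3, connectivity=knn_conn, linkage="ward")
agg_labels = agg.fit_predict(X)

# 2a) BIRCH without a fixed number of clusters: the labels are the
#     subclusters found by the CF tree, controlled only by the threshold.
birch_sub = Birch(threshold=0.5, n_clusters=None).fit(X)
sub_labels = birch_sub.labels_

# 2b) BIRCH with another clusterer as the global step, applied to the
#     subcluster centroids rather than to all samples.
global_clusterer = AgglomerativeClustering(n_clusters=3)
birch_global = Birch(threshold=0.5, n_clusters=global_clusterer).fit(X)
global_labels = birch_global.labels_

print(knn_dist.shape, knn_dist.nnz)
print(len(birch_sub.subcluster_centers_), "BIRCH subclusters")
```

Note that the full n-by-n distance matrix is never materialised in either approach: the neighbor graph stores at most n_neighbors entries per sample, and the global clusterer in 2b only ever sees the subcluster centroids.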
On 4 January 2018 at 11:04, Shiheng Duan <[email protected]> wrote:

> Yes, it is an efficient method, still, we need to specify the number of
> clusters or the threshold. Is there another way to run hierarchy clustering
> on the big dataset? The main problem is the distance matrix.
> Thanks.
>
> On Tue, Jan 2, 2018 at 6:02 AM, Olivier Grisel <[email protected]>
> wrote:
>
>> Have you had a look at BIRCH?
>>
>> http://scikit-learn.org/stable/modules/clustering.html#birch
>>
>> --
>> Olivier
>>
>> _______________________________________________
>> scikit-learn mailing list
>> [email protected]
>> https://mail.python.org/mailman/listinfo/scikit-learn
