Can you use nearest neighbors with a KD tree to build a distance matrix
that is sparse, in that distances to all but the nearest neighbors of a
point are treated as (near-)infinite? Yes, this again introduces an
additional parameter (the neighborhood size), just as BIRCH has its
threshold. I suspect you will not be able to avoid having some such
approximation parameter. You do not need to set n_clusters to a fixed
value for BIRCH, though. You only need to provide another clusterer,
which has its own parameters, and you should be able to experiment with
different "global clusterers".
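
A rough sketch of both suggestions, in case it helps. It assumes that
scikit-learn's connectivity constraint on AgglomerativeClustering is an
acceptable stand-in for a sparse distance matrix with "infinite" missing
entries. The toy data, n_neighbors=10, threshold=0.5 and n_clusters=3
are illustrative choices only, not recommendations.

```python
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Toy data standing in for the large dataset (assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# Neighborhood size: the extra approximation parameter mentioned above.
n_neighbors = 10

# Sparse k-NN graph built with a KD tree. Only n_samples * n_neighbors
# entries are stored; every other pair is simply absent from the matrix.
nn = NearestNeighbors(n_neighbors=n_neighbors, algorithm="kd_tree").fit(X)
knn_graph = nn.kneighbors_graph(mode="connectivity")

# Hierarchical clustering restricted to merges between linked clusters,
# so no dense n x n distance matrix has to be passed in.
agg = AgglomerativeClustering(
    n_clusters=3, linkage="ward", connectivity=knn_graph
)
agg_labels = agg.fit_predict(X)

# BIRCH with another clusterer as the global step, instead of an integer
# n_clusters. The global clusterer runs on the BIRCH subclusters only.
global_clusterer = AgglomerativeClustering(n_clusters=3)
birch = Birch(threshold=0.5, n_clusters=global_clusterer)
birch_labels = birch.fit_predict(X)
```

If the k-NN graph has several connected components, scikit-learn warns
and links the components itself before building the tree. In this sketch
the global clusterer still takes n_clusters=3; any other scikit-learn
clusterer with its own parameters could be swapped in there.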

On 4 January 2018 at 11:04, Shiheng Duan <shid...@ucdavis.edu> wrote:

> Yes, it is an efficient method, still, we need to specify the number of
> clusters or the threshold. Is there another way to run hierarchy clustering
> on the big dataset? The main problem is the distance matrix.
> Thanks.
>
> On Tue, Jan 2, 2018 at 6:02 AM, Olivier Grisel <olivier.gri...@ensta.org>
> wrote:
>
>> Have you had a look at BIRCH?
>>
>> http://scikit-learn.org/stable/modules/clustering.html#birch
>>
>> --
>> Olivier
>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn@python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>