Re: [scikit-learn] clustering on big dataset

Joel Nothman Thu, 04 Jan 2018 03:57:47 -0800

Can you use nearest neighbors with a KD tree to build a sparse distance
matrix, in which the distances to all but the nearest neighbors of a
point are treated as (near-)infinite? Yes, this again introduces an
additional parameter (the neighborhood size), just as BIRCH has its
threshold. I suspect you will not be able to avoid having some such
approximating parameter. Note that you do not need to set n_clusters to
a fixed value for BIRCH: you can instead provide another clusterer. That
clusterer has its own parameters, but it lets you experiment with
different "global clusterers".
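
A rough sketch of both ideas, in case it helps. The data (make_blobs), the
neighborhood size of 10, the BIRCH threshold of 0.5 and the choice of
AgglomerativeClustering are all assumptions made for illustration, not
recommendations. Also note that the first part does not pass a sparse
distance matrix to the clusterer directly: it swaps in scikit-learn's
connectivity constraint, which restricts merges to k-nearest-neighbor
pairs and so avoids the dense n x n matrix in a similar spirit.

```python
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Toy data standing in for the "big dataset" (assumption for illustration).
X, _ = make_blobs(n_samples=3000, centers=3, cluster_std=0.6, random_state=0)

# 1. Sparse k-nearest-neighbor graph built with a KD tree.
#    Only n_samples * n_neighbors distances are stored; every other pair
#    is simply absent, i.e. treated as "not connected".
n_neighbors = 10  # the neighborhood-size parameter (arbitrary choice)
nn = NearestNeighbors(n_neighbors=n_neighbors, algorithm="kd_tree").fit(X)
knn_graph = nn.kneighbors_graph(mode="distance")  # scipy sparse matrix

# Hierarchical clustering where merges are only allowed along graph edges.
# scikit-learn may warn and add edges if the graph is not connected.
agglo = AgglomerativeClustering(
    n_clusters=3, connectivity=knn_graph, linkage="ward"
).fit(X)

# 2. BIRCH with another clusterer as the "global clusterer" instead of a
#    fixed integer n_clusters. The global step runs on the subcluster
#    centroids, which are far fewer than the original samples.
birch = Birch(
    threshold=0.5,  # arbitrary choice
    n_clusters=AgglomerativeClustering(n_clusters=3),
).fit(X)

print("stored distances:", knn_graph.nnz, "of", X.shape[0] ** 2)
print("BIRCH subclusters:", len(birch.subcluster_centers_))
```

Any other scikit-learn clusterer can be passed as n_clusters in the same
way, and n_clusters=None skips the global step and returns the
subclusters as they are.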


On 4 January 2018 at 11:04, Shiheng Duan <[email protected]> wrote:

> Yes, it is an efficient method, but we still need to specify the number of
> clusters or the threshold. Is there another way to run hierarchical clustering
> on a big dataset? The main problem is the distance matrix.
> Thanks.
>
> On Tue, Jan 2, 2018 at 6:02 AM, Olivier Grisel <[email protected]>
> wrote:
>
>> Have you had a look at BIRCH?
>>
>> http://scikit-learn.org/stable/modules/clustering.html#birch
>>
>> --
>> Olivier
>> 
>>
>> _______________________________________________
>> scikit-learn mailing list
>> [email protected]
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>

