Yes, use an approximate nearest neighbors approach. None is included in
scikit-learn, but there are numerous implementations with Python interfaces.
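For instance, something along these lines (a rough sketch only: Annoy is used
here purely as an illustration of an ANN library with Python bindings, and the
number of trees and neighbors are arbitrary placeholders):

import numpy as np
from annoy import AnnoyIndex  # https://github.com/spotify/annoy

X = np.random.rand(100000, 12)        # stand-in for your (samples, features) data

index = AnnoyIndex(X.shape[1], 'euclidean')
for i, row in enumerate(X):
    index.add_item(i, row.tolist())
index.build(10)                       # 10 trees; more trees = better accuracy, slower build

# Approximate nearest neighbor of point 0 (k=2 because the closest item is the point itself)
ids, dists = index.get_nns_by_item(0, 2, include_distances=True)

An index like this is built once and then answers k-NN queries approximately
in sub-linear time, at the cost of an extra accuracy parameter.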

On 5 January 2018 at 12:51, Shiheng Duan <shid...@ucdavis.edu> wrote:

> Thanks, Joel,
> I am working with a KD-tree to find nearest neighbors. Basically, I find
> the nearest neighbor of each point and then merge a pair of points if they
> are each other's nearest neighbor. The problem is that each iteration
> produces a new set of points, since the merged pairs become new clusters,
> so the tree needs to be updated. I didn't find any way to update the tree
> dynamically, so I just rebuild it after each iteration, which costs a lot
> of time. Any ideas?
> Building the tree takes around 16 minutes in the first iteration, which I
> don't think is slow, but the program as a whole is still slow: with a
> dataset of 12 x 872505 (features x samples), it takes several days to run.
> Is there any way to speed up the NN queries? I suspect the queries may be
> the bottleneck.
> Thanks for your time.
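> In code, one iteration looks roughly like this (simplified sketch):
>
> import numpy as np
> from sklearn.neighbors import KDTree
>
> def mutual_nn_pairs(X):
>     """One iteration: rebuild the tree, query each point's nearest
>     neighbor, and keep the mutually-nearest pairs for merging."""
>     tree = KDTree(X)                # rebuilt from scratch every iteration
>     dist, ind = tree.query(X, k=2)  # k=2: the closest point is the point itself
>     nn = ind[:, 1]
>     # keep (i, j) only if j is i's nearest neighbor and vice versa
>     return [(i, j) for i, j in enumerate(nn) if nn[j] == i and i < j]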
>
> On Thu, Jan 4, 2018 at 3:55 AM, Joel Nothman <joel.noth...@gmail.com>
> wrote:
>
>> Can you use nearest neighbors with a KD tree to build a distance matrix
>> that is sparse, in that distances to all but the nearest neighbors of a
>> point are (near-)infinite? Yes, this again has an additional parameter
>> (neighborhood size), just as BIRCH has its threshold. I suspect you will
>> not be able to avoid having another, approximating, parameter. You do
>> not need to set n_clusters to a fixed value for BIRCH: you only need to
>> provide another clusterer (which has its own parameters), and you should
>> be able to experiment with different "global clusterers".
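>>
>> Roughly like this (an untested sketch; the neighborhood size, threshold and
>> DBSCAN parameters below are arbitrary placeholders):
>>
>> import numpy as np
>> from sklearn.neighbors import kneighbors_graph
>> from sklearn.cluster import AgglomerativeClustering, Birch, DBSCAN
>>
>> X = np.random.rand(10000, 12)   # stand-in for your (samples, features) data
>>
>> # Keep only each point's 30 nearest neighbors; all other pairs are treated
>> # as unconnected, i.e. effectively at infinite distance.
>> connectivity = kneighbors_graph(X, n_neighbors=30, include_self=False)
>> ward = AgglomerativeClustering(n_clusters=50, linkage='ward',
>>                                connectivity=connectivity)
>> labels = ward.fit_predict(X)
>>
>> # BIRCH without a fixed n_clusters: pass another clusterer (here DBSCAN,
>> # which has its own parameters) as the global clustering step; it is run
>> # on the BIRCH subcluster centroids.
>> birch = Birch(threshold=0.5, n_clusters=DBSCAN(eps=0.5, min_samples=5))
>> labels_birch = birch.fit_predict(X)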
>>
>> On 4 January 2018 at 11:04, Shiheng Duan <shid...@ucdavis.edu> wrote:
>>
>>> Yes, it is an efficient method, but we still need to specify the number
>>> of clusters or the threshold. Is there another way to run hierarchical
>>> clustering on a big dataset? The main problem is the distance matrix.
>>> Thanks.
>>>
>>> On Tue, Jan 2, 2018 at 6:02 AM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
>>>
>>>> Have you had a look at BIRCH?
>>>>
>>>> http://scikit-learn.org/stable/modules/clustering.html#birch
>>>>
>>>> --
>>>> Olivier