Yes, use an approximate nearest neighbors approach. None is included in scikit-learn, but there are numerous implementations with Python interfaces.
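Whichever neighbor search is used, the merge step described in the quoted message only needs, for each point, the index of its nearest neighbor. Below is a minimal sketch of one such iteration, not the actual program from the thread. The function names (`nearest_neighbor_indices`, `mutual_pairs`, `merge_step`) are made up for illustration, the brute-force search is only a placeholder for an exact or approximate index, and replacing a merged pair by its midpoint is an assumption, since the thread does not say how merged points are represented.

```python
import math


def nearest_neighbor_indices(points):
    """Brute-force placeholder for an exact or approximate NN search.

    Returns nn where nn[i] is the index of the point closest to points[i],
    excluding i itself. O(n^2), so only suitable for tiny inputs.
    """
    nn = []
    for i, p in enumerate(points):
        best_j, best_d = -1, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue
            d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_j, best_d = j, d
        nn.append(best_j)
    return nn


def mutual_pairs(nn):
    """Return pairs (i, j) with i < j that are each other's nearest neighbor."""
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i]


def merge_step(points):
    """One iteration: replace every mutual-NN pair by a single point.

    Assumption: the merged point is the midpoint of the pair.
    """
    pairs = mutual_pairs(nearest_neighbor_indices(points))
    merged = {i for pair in pairs for i in pair}
    survivors = [p for i, p in enumerate(points) if i not in merged]
    midpoints = [
        tuple((a + b) / 2 for a, b in zip(points[i], points[j]))
        for i, j in pairs
    ]
    return survivors + midpoints
```

Because `mutual_pairs` only looks at the index array, swapping the brute-force search for a KD tree or an approximate index leaves the rest unchanged. With an approximate search, some true mutual pairs may be missed in a given iteration.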
On 5 January 2018 at 12:51, Shiheng Duan <shid...@ucdavis.edu> wrote:
> Thanks, Joel,
> I am working on KD-tree to find the nearest neighbors. Basically, I find
> the nearest neighbors for each point and then merge a couple of points if
> they are both NN for each other. The problem is that after each iteration,
> we will have a new bunch of points, where new clusters are added. So the
> tree needs to be updated. Since I didn't find any dynamic way to update the
> tree, I just rebuild it after each iteration, costing lots of time. Any
> idea about it?
> Actually, it takes around 16 mins to build the tree in the first
> iteration, which is not slow I think. But it still runs slowly. I have a
> dataset of 12*872505 (features, samples). It takes several days to run the
> program. Is there any way to speed up the query process of NN? I doubt
> query may be too slow.
> Thanks for your time.
>
> On Thu, Jan 4, 2018 at 3:55 AM, Joel Nothman <joel.noth...@gmail.com>
> wrote:
>
>> Can you use nearest neighbors with a KD tree to build a distance matrix
>> that is sparse, in that distances to all but the nearest neighbors of a
>> point are (near-)infinite? Yes, this again has an additional parameter
>> (neighborhood size), just as BIRCH has its threshold. I suspect you will
>> not be able to improve on having another, approximating, parameter. You do
>> not need to set n_clusters to a fixed value for BIRCH. You only need to
>> provide another clusterer, which has its own parameters, although you
>> should be able to experiment with different "global clusterers".
>>
>> On 4 January 2018 at 11:04, Shiheng Duan <shid...@ucdavis.edu> wrote:
>>
>>> Yes, it is an efficient method, still, we need to specify the number of
>>> clusters or the threshold. Is there another way to run hierarchy clustering
>>> on the big dataset? The main problem is the distance matrix.
>>> Thanks.
>>>
>>> On Tue, Jan 2, 2018 at 6:02 AM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
>>>
>>>> Have you had a look at BIRCH?
>>>>
>>>> http://scikit-learn.org/stable/modules/clustering.html#birch
>>>>
>>>> --
>>>> Olivier
>>>>
>>>> _______________________________________________
>>>> scikit-learn mailing list
>>>> scikit-learn@python.org
>>>> https://mail.python.org/mailman/listinfo/scikit-learn