Dear all,

I'm trying to find an efficient way to build a k-NN graph for a large
dataset. Precisely, I have a large set of high-dimensional vectors (say d
>> 10000) and I want to build a graph where those high-dimensional points
are the vertices and each vertex is linked to its k nearest neighbors based
on some kind of similarity defined on the vertex space.
My problem is implementing an efficient algorithm to compute the weight
matrix of the graph. I need to compute N*N similarities, and the only way
I know is to use a "cartesian" operation followed by a "map" operation on the
RDD. But this is very slow when N is large. Is there a cleverer way to
do this for an arbitrary similarity function?
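
For reference, here is roughly what I am doing now. It is only a minimal
sketch, assuming the vectors are stored as an RDD keyed by a Long id and
using cosine similarity as a stand-in for the arbitrary similarity function:

import org.apache.spark.rdd.RDD

// Placeholder similarity; any (Array[Double], Array[Double]) => Double works here.
def cosine(a: Array[Double], b: Array[Double]): Double = {
  val dot  = a.zip(b).map { case (x, y) => x * y }.sum
  val norm = (v: Array[Double]) => math.sqrt(v.map(x => x * x).sum)
  dot / (norm(a) * norm(b))
}

// For each point id, keep the k most similar other points as (neighborId, similarity).
def knnGraph(points: RDD[(Long, Array[Double])], k: Int): RDD[(Long, Seq[(Long, Double)])] = {
  points.cartesian(points)                          // materializes all N*N pairs -- the bottleneck
    .filter { case ((i, _), (j, _)) => i != j }     // drop self-pairs
    .map { case ((i, vi), (j, vj)) => (i, (j, cosine(vi, vj))) }
    .groupByKey()                                   // gather the N-1 candidates per vertex
    .mapValues(_.toSeq.sortBy(-_._2).take(k))       // keep only the k nearest
}

// Usage (vectors is my RDD of id -> vector):
//   val graph = knnGraph(vectors, k = 10)

The cartesian step is where all the time goes, since it shuffles and scores
every one of the N*N pairs.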

Cheers,

Jao
