Dear all,

I'm trying to find an efficient way to build a k-NN graph for a large
dataset. Precisely, I have a large set of high-dimensional vectors (say d
>> 10000) and I want to build a graph where those high-dimensional points
are the vertices and each vertex is linked to its k nearest neighbors based
on some kind of similarity defined on the vertex space.
My problem is implementing an efficient algorithm to compute the weight
matrix of the graph. I need to compute N*N similarities, and the only way
I know is to use a "cartesian" operation followed by a "map" operation on the
RDD. But this is very slow when N is large. Is there a cleverer way to
do this for an arbitrary similarity function?
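
For reference, here is roughly what I am doing now. It is only a minimal
sketch, assuming the vectors are stored as an RDD keyed by a Long id and
using cosine similarity as a stand-in for the arbitrary similarity function:

import org.apache.spark.rdd.RDD

// Placeholder similarity; any (Array[Double], Array[Double]) => Double works here.
def cosine(a: Array[Double], b: Array[Double]): Double = {
  val dot  = a.zip(b).map { case (x, y) => x * y }.sum
  val norm = (v: Array[Double]) => math.sqrt(v.map(x => x * x).sum)
  dot / (norm(a) * norm(b))
}

// For each point id, keep the k most similar other points as (neighborId, similarity).
def knnGraph(points: RDD[(Long, Array[Double])], k: Int): RDD[(Long, Seq[(Long, Double)])] = {
  points.cartesian(points)                          // materializes all N*N pairs -- the bottleneck
    .filter { case ((i, _), (j, _)) => i != j }     // drop self-pairs
    .map { case ((i, vi), (j, vj)) => (i, (j, cosine(vi, vj))) }
    .groupByKey()                                   // gather the N-1 candidates per vertex
    .mapValues(_.toSeq.sortBy(-_._2).take(k))       // keep only the k nearest
}

// Usage (vectors is my RDD of id -> vector):
//   val graph = knnGraph(vectors, k = 10)

The cartesian step is where all the time goes, since it shuffles and scores
every one of the N*N pairs.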

Cheers,

Jao
