You could try dimensionality reduction (PCA or SVD) first. Even if you could 
compute all N*N similarities in the high-dimensional space, you would probably 
run into the curse of dimensionality: at d >> 10000, pairwise distances tend to 
concentrate, so nearest neighbours become hard to tell apart anyway.
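As a minimal sketch of that idea (illustrative only, not from this thread; it uses scikit-learn locally rather than Spark, and the dataset, component count, and k are made-up parameters): project the vectors down with PCA, then run the k-NN search in the reduced space.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))  # N=200 points in d=1000 dimensions (toy data)

# 1. Project to a much lower-dimensional space first.
X_low = PCA(n_components=20).fit_transform(X)

# 2. k-NN search in the reduced space; ask for k+1 neighbours because
#    each point is returned as its own nearest neighbour.
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X_low)
dist, idx = nn.kneighbors(X_low)

# 3. Adjacency list of the k-NN graph: drop the self-neighbour in column 0.
graph = {i: list(idx[i, 1:]) for i in range(X.shape[0])}
```

On Spark the same shape of solution would be RowMatrix.computePrincipalComponents (or computeSVD) to reduce the vectors, followed by the neighbour search on the reduced data; the reduction also shrinks the cost of whatever pairwise step comes after it.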
> On 26 Aug 2015, at 12:35, Jaonary Rabarisoa <[email protected]> wrote:
> 
> Dear all,
> 
> I'm trying to find an efficient way to build a k-NN graph for a large 
> dataset. Precisely, I have a large set of high-dimensional vectors (say d >>> 
> 10000) and I want to build a graph where those high-dimensional points are 
> the vertices and each one is linked to its k nearest neighbours based on some 
> kind of similarity defined on the vertex space. 
> My problem is implementing an efficient algorithm to compute the weight 
> matrix of the graph. I need to compute N*N similarities, and the only way I 
> know is a "cartesian" operation followed by a "map" operation on RDDs. But 
> this is very slow when N is large. Is there a cleverer way to do this 
> for an arbitrary similarity function? 
> 
> Cheers,
> 
> Jao

