The batch version of this is part of rowSimilarities (JIRA 4823). If your
query points fit in memory, there is a broadcast version that we are
experimenting with internally. We are using brute-force KNN right now in
the PR. Based on the FLANN paper, LSH did not work well, but before you go to
approx
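
To make the brute-force idea above concrete, here is a minimal sketch in plain Python, not Spark and not the code from the PR. It assumes the data is split into partitions of (id, vector) pairs, computes a local top-k in each partition for one query point, and then merges the partial results. The function names (`knn_in_partition`, `knn_brute_force`) are hypothetical and chosen only for illustration.

```python
import heapq
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_in_partition(partition, query, k):
    """Local top-k for one partition: a list of (squared_distance, id)."""
    scored = ((squared_distance(vec, query), pid) for pid, vec in partition)
    return heapq.nsmallest(k, scored)


def knn_brute_force(partitions, query, k):
    """Merge per-partition top-k lists into a global top-k.

    Returns (id, distance) pairs, closest first. In a distributed setting
    the query would be shipped to every partition; here we simply loop.
    """
    partials = [knn_in_partition(part, query, k) for part in partitions]
    merged = heapq.nsmallest(k, (item for part in partials for item in part))
    return [(pid, math.sqrt(d)) for d, pid in merged]


if __name__ == "__main__":
    partitions = [
        [("a", [0.0, 0.0]), ("b", [3.0, 4.0])],
        [("c", [1.0, 0.0]), ("d", [10.0, 10.0])],
    ]
    print(knn_brute_force(partitions, [0.0, 0.0], 2))
```

Each partition only has to return k candidates, so the merge step stays small no matter how many points the partitions hold. The cost is still one distance computation per stored point per query.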
Spark SQL doesn't provide spatial features. Large-scale KNN is usually
combined with locality-sensitive hashing (LSH). This Spark package may
be helpful: http://spark-packages.org/package/mrsqueeze/spark-hash.
-Xiangrui
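
For readers unfamiliar with LSH, here is a minimal sketch of one common variant, random-hyperplane hashing, in plain Python. It is not the API of the spark-hash package, and it approximates cosine similarity rather than Euclidean distance. All names (`make_hyperplanes`, `signature`, `build_index`, `candidates`) and the seed parameter are assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(points, hyperplanes):
    """Group point ids by signature; points is an iterable of (id, vector)."""
    buckets = defaultdict(list)
    for pid, vec in points:
        buckets[signature(vec, hyperplanes)].append(pid)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Ids sharing the query's bucket; rank these by exact distance afterwards."""
    return buckets.get(signature(query, hyperplanes), [])


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=8, dim=3)
    index = build_index([("p1", [1.0, 2.0, 3.0]), ("p2", [-1.0, 0.5, -2.0])], planes)
    print(candidates([2.0, 4.0, 6.0], index, planes))
```

Vectors pointing in similar directions tend to land in the same bucket, so only that bucket needs an exact distance check. In practice several independent hash tables are usually combined to reduce missed neighbors.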
On Sat, May 9, 2015 at 9:25 PM, Dong Li wrote:
Hello experts,
I’m new to Spark, and I want to find the K nearest neighbors in a very
large, high-dimensional point dataset in a very short time.
The scenario is: the dataset contains more than 10 million points, each
with 200 dimensions. I’m building a web service that receives one new point per
request