Re: Find KNN in Spark SQL

2015-05-19 Thread Debasish Das
The batch version of this is part of the rowSimilarities JIRA 4823. If your query points can fit in memory, there is a broadcast version which we are experimenting with internally. We are using brute-force KNN right now in the PR. Based on the FLANN paper, LSH did not work well, but before you go to approx
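
As a rough illustration of the brute-force approach mentioned above, here is a minimal plain-Python sketch (not the code from the PR). It assumes the data is split into partitions, each partition computes its own top k against an in-memory query point, and the partial results are merged. The names `euclidean`, `local_top_k`, and `brute_force_knn` are hypothetical, and Euclidean distance is an assumption.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_top_k(partition, query, k):
    """Top-k (distance, point_id) pairs within one partition.

    partition is a list of (point_id, vector) pairs. In a distributed
    setting the query would be broadcast to every partition (assumption).
    """
    return heapq.nsmallest(
        k, ((euclidean(vec, query), pid) for pid, vec in partition)
    )


def brute_force_knn(partitions, query, k):
    """Merge the per-partition top-k lists into a global top-k."""
    candidates = []
    for part in partitions:
        candidates.extend(local_top_k(part, query, k))
    return heapq.nsmallest(k, candidates)


# Tiny made-up example: two partitions of 2-d points.
partitions = [
    [(0, (0.0, 0.0)), (1, (1.0, 1.0))],
    [(2, (5.0, 5.0)), (3, (0.5, 0.0))],
]
query = (0.0, 0.1)
neighbors = brute_force_knn(partitions, query, k=2)
```

Each partition only ever returns k candidates, so the merge step handles at most k times the number of partitions, however large the dataset is. The scan itself is still linear in the number of points.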

Re: Find KNN in Spark SQL

2015-05-19 Thread Xiangrui Meng
Spark SQL doesn't provide spatial features. Large-scale KNN is usually combined with locality-sensitive hashing (LSH). This Spark package may be helpful: http://spark-packages.org/package/mrsqueeze/spark-hash.

-Xiangrui

On Sat, May 9, 2015 at 9:25 PM, Dong Li wrote:
> Hello experts,
>
> I’m new t
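
To make the LSH suggestion in this reply concrete, here is a minimal plain-Python sketch of one common LSH family, random-hyperplane hashing for cosine similarity. This is an assumption chosen for illustration, not necessarily what the linked package implements. All function names and parameters (`make_hyperplanes`, `signature`, `build_index`, `candidate_ids`, `num_bits`) are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random hyperplanes (Gaussian normal vectors)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_index(points, planes):
    """Group point ids by signature. points maps point_id -> vector."""
    buckets = defaultdict(list)
    for pid, vec in points.items():
        buckets[signature(vec, planes)].append(pid)
    return buckets


def candidate_ids(query, planes, buckets):
    """Ids sharing the query's bucket; rank these by exact distance afterwards."""
    return buckets.get(signature(query, planes), [])


# Tiny made-up example with 4-d points.
planes = make_hyperplanes(dim=4, num_bits=8, seed=42)
points = {
    "a": [1.0, 0.2, 0.0, 0.1],
    "b": [0.9, 0.25, 0.05, 0.1],
    "c": [-1.0, 0.3, -0.7, 0.4],
}
buckets = build_index(points, planes)
candidates = candidate_ids(points["a"], planes, buckets)
```

Vectors separated by a small angle are likely, but not guaranteed, to share a bucket. Practical setups usually build several such tables and take the union of candidates before an exact re-ranking step.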

Find KNN in Spark SQL

2015-05-09 Thread Dong Li
Hello experts, I’m new to Spark and want to find the K nearest neighbors in a huge, high-dimensional point dataset in a very short time. The scenario is: the dataset contains more than 10 million points, each with 200 dimensions. I’m building a web service to receive one new point at each request