We've just open-sourced an LSH implementation on Spark. We're using it
internally to find the top-K neighbors after a matrix factorization.

We hope it will be of use to others:

https://github.com/soundcloud/cosine-lsh-join-spark

For those wondering: LSH (locality-sensitive hashing) is a technique for
quickly finding the (approximately) most similar neighbors in a
high-dimensional space. This problem arises whenever objects are
represented as vectors in a high-dimensional space, e.g. words, items,
users...
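
To give a rough idea of how this works for cosine similarity, here is a
minimal Python sketch of the random-hyperplane variant of LSH. It is not
taken from the linked repository; the function names (make_hyperplanes,
signature, estimated_cosine) and the parameters (num_bits, seed) are made
up for illustration. Each vector is reduced to a short bit signature, and
the fraction of differing bits between two signatures estimates the angle
between the original vectors.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def hamming(sig_a, sig_b):
    """Number of positions where two signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def estimated_cosine(sig_a, sig_b):
    """Estimate cosine similarity from the fraction of differing bits.

    For random hyperplanes, P(bit differs) = angle / pi, so the angle is
    approximated by pi * hamming / num_bits.
    """
    theta = math.pi * hamming(sig_a, sig_b) / len(sig_a)
    return math.cos(theta)


def exact_cosine(a, b):
    """Exact cosine similarity, for comparison with the estimate."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=256, dim=4, seed=42)
    u = [1.0, 2.0, 0.5, -1.0]
    v = [0.9, 2.1, 0.4, -0.8]
    print("exact:    ", exact_cosine(u, v))
    print("estimated:", estimated_cosine(signature(u, planes), signature(v, planes)))
```

Vectors whose signatures agree on most bits are likely to be close in
angle, so candidate neighbors can be found by comparing or grouping
signatures instead of comparing every pair of full vectors. More bits
give a better estimate at the cost of more work per vector.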

cheers

özgür demir


