We've just open-sourced an LSH (locality-sensitive hashing) implementation on Spark. We use it internally to find the top-K nearest neighbors after a matrix factorization.
We hope it might be useful to others: https://github.com/soundcloud/cosine-lsh-join-spark

For those wondering: LSH is a technique for quickly finding the (approximately) most similar neighbors in a high-dimensional space. This problem comes up whenever objects are represented as vectors in such a space, e.g. words, items, users...

Cheers,
Özgür Demir
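P.S. For a rough feel of how LSH for cosine similarity can work, here is a minimal single-machine sketch of the random-hyperplane variant. It is not the code from the repository above, and it makes no claim about how that project is implemented; every name in it (`make_hyperplanes`, `signature`, `top_k_neighbors`, and the parameters `num_bits`, `num_tables`, `seed`) is made up for illustration. The idea: each random hyperplane contributes one bit saying which side of the plane a vector lies on, vectors with a small angle between them tend to share the same bit pattern, and exact cosine similarity is computed only for items that land in the same bucket as the query.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Exact cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def make_hyperplanes(num_bits, dim, rng):
    """Draw num_bits random Gaussian hyperplane normals (hypothetical helper)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(h_i * x_i for h_i, x_i in zip(h, vec)) >= 0 else 0
        for h in hyperplanes
    )


def top_k_neighbors(vectors, query_id, k, num_bits=8, num_tables=10, seed=0):
    """Approximate top-k by cosine similarity.

    vectors maps an item id to its vector. Only items that share a bucket
    with the query in at least one hash table are scored exactly, so true
    neighbors can be missed; more tables or fewer bits reduce that risk.
    """
    rng = random.Random(seed)
    query = vectors[query_id]
    candidates = set()
    for _ in range(num_tables):
        planes = make_hyperplanes(num_bits, len(query), rng)
        buckets = defaultdict(list)
        for item_id, vec in vectors.items():
            buckets[signature(vec, planes)].append(item_id)
        candidates.update(buckets[signature(query, planes)])
    candidates.discard(query_id)
    scored = sorted(((cosine(query, vectors[c]), c) for c in candidates), reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]
```

Calling `top_k_neighbors(vectors, "a", k=2)` on a small dict of id-to-vector entries returns up to two `(id, similarity)` pairs. A vector pointing in the same direction as the query always shares its bucket, since positive scaling does not change any sign bit. On Spark the bucketing step would presumably become a group-by or join on the signature, but that part is an assumption and is left out here.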