GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/17092
[SPARK-18450][ML] Scala API Change for LSH AND-amplification

## What changes were proposed in this pull request?

Implemented a new Param, numHashFunctions, as the dimension of AND-amplification for Locality Sensitive Hashing. The hash of each feature in LSH is now an array of size numHashTables, where each element of the array is a vector of size numHashFunctions. Two features are in the same hash bucket iff ANY pair of corresponding vectors is equal (OR-amplification). Two vectors are equal iff ALL pairs of corresponding entries are equal (AND-amplification).

Will create follow-up PRs for the Python API and Docs/Examples.

## How was this patch tested?

By running the unit tests MinHashLSHSuite and BucketedRandomProjectionLSHSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Yunni/spark SPARK-18450

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17092.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17092

----
commit e6f9f9541f0b00c14b7c5a201b22aeb400eb9f19
Author: Yun Ni <y...@uber.com>
Date:   2017-02-16T20:54:22Z

    Scala API Change for AND-amplification

commit 010acb2caf69ca0822db6aeb866cce21cdfcce4b
Author: Yunni <euler57...@gmail.com>
Date:   2017-02-27T03:43:21Z

    Merge branch 'SPARK-18450' of https://github.com/Yunni/spark into SPARK-18450

commit 83a155699df4b15f1ab1fc427730613b63f7d1d6
Author: Yunni <euler57...@gmail.com>
Date:   2017-02-27T04:04:37Z

    Fix typos in unit tests

commit 9dd87ba21a025939df7020ff1491a2c6c29f2d93
Author: Yunni <euler57...@gmail.com>
Date:   2017-02-28T02:04:10Z

    Merge branch 'master' of https://github.com/apache/spark into SPARK-18450
----
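The bucketing rule in the PR description (OR across the numHashTables entries, AND across the numHashFunctions values inside each entry) can be sketched in plain Scala. This is a minimal, hypothetical model, not Spark's API: the object name `AmplifiedLsh`, the `Signature` alias, and the use of Scala's `Vector[Double]` in place of Spark's `ml.linalg.Vector` are all assumptions made for illustration.

```scala
// Hypothetical model of the hash layout described in this PR (not Spark's API).
// A signature is an Array with numHashTables entries; each entry is a Vector
// holding numHashFunctions hash values.
object AmplifiedLsh {
  type Signature = Array[Vector[Double]]

  // AND-amplification: two hash vectors are equal only if ALL
  // corresponding entries are equal.
  def vectorsMatch(a: Vector[Double], b: Vector[Double]): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) => x == y }

  // OR-amplification: two features share a bucket if the vectors of
  // ANY hash table match.
  def sameBucket(s1: Signature, s2: Signature): Boolean = {
    require(s1.length == s2.length, "signatures must have the same numHashTables")
    s1.zip(s2).exists { case (v1, v2) => vectorsMatch(v1, v2) }
  }
}
```

For example, the signatures `Array(Vector(1.0, 2.0), Vector(3.0, 4.0))` and `Array(Vector(1.0, 9.0), Vector(3.0, 4.0))` share a bucket, because the second table's vectors agree on every entry even though the first table's do not. Under the usual assumption that the hash functions are independent and each collides with probability p for a given pair of features, this AND-then-OR construction gives a bucket-sharing probability of 1 - (1 - p^numHashFunctions)^numHashTables.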