GitHub user Yunni opened a pull request:
https://github.com/apache/spark/pull/17092
[SPARK-18450][ML] Scala API Change for LSH AND-amplification
## What changes were proposed in this pull request?
Adds a new Param, numHashFunctions, which sets the dimension of
AND-amplification for Locality Sensitive Hashing (LSH). The hash of each
feature is now an array of size numHashTables, and each element of that array
is a vector of size numHashFunctions.
Two features are in the same hash bucket iff at least one pair of their hash
vectors is equal (OR-amplification). Two hash vectors are equal iff all pairs
of their entries are equal (AND-amplification).
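The bucket rule above can be sketched in plain Scala. This is an illustration only, not the code from this pull request: the object name `LshAmplificationSketch`, the helpers `vectorsEqual` and `sameBucket`, and the use of `Seq[Double]` in place of Spark's vector type are all hypothetical. It also assumes that "pair" means vectors at the same hash-table index and entries at the same position.

```scala
// Illustrative sketch only; names and types are assumptions, not Spark's API.
object LshAmplificationSketch {
  // One hash vector per hash table; each vector holds numHashFunctions values.
  type HashVector = Seq[Double]

  // AND-amplification: two hash vectors match only if every entry matches.
  def vectorsEqual(x: HashVector, y: HashVector): Boolean =
    x.length == y.length && x.zip(y).forall { case (a, b) => a == b }

  // OR-amplification: two features share a bucket if the vectors of
  // at least one hash table match (compared index by index).
  def sameBucket(x: Seq[HashVector], y: Seq[HashVector]): Boolean =
    x.zip(y).exists { case (u, v) => vectorsEqual(u, v) }

  def main(args: Array[String]): Unit = {
    // numHashTables = 3, numHashFunctions = 2
    val a = Seq(Seq(1.0, 2.0), Seq(3.0, 4.0), Seq(5.0, 6.0))
    val b = Seq(Seq(1.0, 9.0), Seq(3.0, 4.0), Seq(7.0, 6.0))
    val c = Seq(Seq(1.0, 9.0), Seq(3.0, 8.0), Seq(7.0, 6.0))
    println(sameBucket(a, b)) // true: the second table's vectors are identical
    println(sameBucket(a, c)) // false: no table has fully matching vectors
  }
}
```

Under this reading, raising numHashFunctions makes a match within a single table stricter, while raising numHashTables gives two features more chances to match.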
Will create follow-up PRs for Python API and Doc/Examples.
## How was this patch tested?
By running unit tests MinHashLSHSuite and BucketedRandomProjectionLSHSuite.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Yunni/spark SPARK-18450
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17092.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17092
commit e6f9f9541f0b00c14b7c5a201b22aeb400eb9f19
Author: Yun Ni
Date: 2017-02-16T20:54:22Z
Scala API Change for AND-amplification
commit 010acb2caf69ca0822db6aeb866cce21cdfcce4b
Author: Yunni
Date: 2017-02-27T03:43:21Z
Merge branch 'SPARK-18450' of https://github.com/Yunni/spark into
SPARK-18450
commit 83a155699df4b15f1ab1fc427730613b63f7d1d6
Author: Yunni
Date: 2017-02-27T04:04:37Z
Fix typos in unit tests
commit 9dd87ba21a025939df7020ff1491a2c6c29f2d93
Author: Yunni
Date: 2017-02-28T02:04:10Z
Merge branch 'master' of https://github.com/apache/spark into SPARK-18450