GitHub user Yunni opened a pull request:
https://github.com/apache/spark/pull/16965
[Spark-18450][ML] Scala API Change for LSH AND-amplification
## What changes were proposed in this pull request?
Implements a new Param, `numHashFunctions`, which sets the dimension of
AND-amplification for Locality Sensitive Hashing (LSH). The hash of each feature
is now an array of size `numHashTables`, where each element of the array is a
vector of size `numHashFunctions`.
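To make the new hash shape concrete, here is a minimal, self-contained sketch of a MinHash-style hash that produces `numHashTables` vectors of `numHashFunctions` entries each. It is not the Spark implementation. `AndOrHashShape`, `randomCoefficients` and `hash` are hypothetical names, Spark's `Vector` is replaced by `Array[Long]`, and the `((1 + i) * a + b) mod p` hash family and the prime are assumptions chosen for illustration.

```scala
import scala.util.Random

// Hypothetical sketch: Array[Array[Long]] stands in for Spark's Array[Vector].
object AndOrHashShape {
  // Prime modulus for the universal hash family (an assumption for this sketch).
  val Prime: Long = 2038074743L

  // coeffs(t)(f) is the (a, b) pair of the f-th hash function in the t-th hash table.
  def randomCoefficients(numHashTables: Int, numHashFunctions: Int, seed: Long): Array[Array[(Long, Long)]] = {
    val rng = new Random(seed)
    Array.fill(numHashTables, numHashFunctions) {
      (1L + rng.nextInt(Int.MaxValue - 1), rng.nextInt(Int.MaxValue).toLong)
    }
  }

  // MinHash of a non-empty set of active feature indices:
  // result(t)(f) = min over i of ((1 + i) * a + b) mod Prime.
  def hash(indices: Seq[Int], coeffs: Array[Array[(Long, Long)]]): Array[Array[Long]] = {
    require(indices.nonEmpty, "MinHash is undefined for an empty set")
    coeffs.map { table =>
      table.map { case (a, b) => indices.map(i => ((1L + i) * a + b) % Prime).min }
    }
  }

  def main(args: Array[String]): Unit = {
    val coeffs = randomCoefficients(numHashTables = 3, numHashFunctions = 2, seed = 42L)
    val h = hash(Seq(0, 5, 9), coeffs)
    // Prints 3 vectors (one per hash table), each holding 2 hash values.
    println(h.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]"))
  }
}
```

Because each entry is a minimum over the set, the result depends only on which indices are active, not on their order.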
Two features are in the same hash bucket iff ANY pair of corresponding vectors
is equal (OR-amplification). Two vectors are equal iff ALL pairs of corresponding
entries are equal (AND-amplification).
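The bucket rule above can be sketched as an OR over hash tables of an AND over hash functions. This is an illustrative sketch, not code from the patch: `AndOrAmplification`, `vectorsEqual` and `sameBucket` are hypothetical names, and `Array[Long]` again stands in for Spark's `Vector`. It assumes "corresponding" vectors means vectors at the same hash-table index.

```scala
// Hypothetical sketch of the OR-of-ANDs bucket rule; not Spark's API.
object AndOrAmplification {
  // AND-amplification: two hash vectors match only if ALL corresponding entries are equal.
  def vectorsEqual(x: Array[Long], y: Array[Long]): Boolean =
    x.length == y.length && x.indices.forall(i => x(i) == y(i))

  // OR-amplification: two features share a bucket if ANY pair of corresponding
  // vectors (same hash-table index) matches.
  def sameBucket(h1: Array[Array[Long]], h2: Array[Array[Long]]): Boolean =
    h1.length == h2.length && h1.indices.exists(t => vectorsEqual(h1(t), h2(t)))

  def main(args: Array[String]): Unit = {
    val a = Array(Array(1L, 2L), Array(3L, 4L))
    val b = Array(Array(1L, 9L), Array(3L, 4L)) // table 0 differs in one entry, table 1 matches fully
    val c = Array(Array(1L, 9L), Array(3L, 8L)) // every table differs in at least one entry
    println(sameBucket(a, b)) // true
    println(sameBucket(a, c)) // false
  }
}
```

Assuming independent hash functions that each collide with probability p, a pair lands in the same bucket with probability 1 - (1 - p^k)^L for k = `numHashFunctions` and L = `numHashTables`. Raising k makes buckets stricter, and raising L recovers recall.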
## How was this patch tested?
By running the unit tests `MinHashLSHSuite` and `BucketedRandomProjectionLSHSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Yunni/spark SPARK-18450
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16965.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16965
commit e6f9f9541f0b00c14b7c5a201b22aeb400eb9f19
Author: Yun Ni
Date: 2017-02-16T20:54:22Z
Scala API Change for AND-amplification