GitHub user Yunni opened a pull request:

    https://github.com/apache/spark/pull/15874

    Spark 18408 yunn api improvements

    ## What changes were proposed in this pull request?
    
    (1) Change the output schema to `Array of Vector` instead of `Vectors`
    (2) Use `numHashTables` as the dimension of the Array and `numHashFunctions`
        as the dimension of each Vector
    (3) Rename `RandomProjection` to `BucketedRandomProjectionLSH` and `MinHash`
        to `MinHashLSH`
    (4) Make `randUnitVectors`/`randCoefficients` private
    (5) Make Multi-Probe NN Search and `hashDistance` private, pending future
        discussion
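
As an illustration of the output shape described in (1) and (2), here is a minimal standalone sketch in plain Python. It is not the code from this pull request and does not use Spark. The helper names `make_hash_functions` and `hash_point`, the `bucket_length` parameter, and the sample values are assumptions chosen for the example; the hashing rule (floor of a random projection divided by a bucket length) is one common bucketed random projection scheme and is assumed here only to produce concrete numbers.

```python
import math
import random


def make_hash_functions(dim, num_hash_tables, num_hash_functions, seed=0):
    """Draw random unit vectors, grouped as num_hash_tables tables
    of num_hash_functions vectors each (hypothetical helper)."""
    rng = random.Random(seed)
    tables = []
    for _ in range(num_hash_tables):
        table = []
        for _ in range(num_hash_functions):
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            norm = math.sqrt(sum(x * x for x in v))
            table.append([x / norm for x in v])
        tables.append(table)
    return tables


def hash_point(point, tables, bucket_length):
    """Return an array of vectors: one vector per hash table,
    one entry per hash function in that table (hypothetical helper)."""
    return [
        [
            float(math.floor(sum(p * u for p, u in zip(point, unit)) / bucket_length))
            for unit in table
        ]
        for table in tables
    ]


tables = make_hash_functions(dim=4, num_hash_tables=3, num_hash_functions=2, seed=42)
hashes = hash_point([1.0, 0.5, -2.0, 3.0], tables, bucket_length=2.0)

# Outer length is numHashTables, inner length is numHashFunctions.
print(len(hashes), [len(v) for v in hashes])  # prints: 3 [2, 2, 2]
```

In this sketch the outer list plays the role of the Array (one element per hash table) and each inner list plays the role of a Vector (one entry per hash function).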
    
    ## How was this patch tested?
    Related unit tests are modified to ensure that the performance of LSH is
    preserved and that the outputs of the APIs meet expectations.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Yunni/spark SPARK-18408-yunn-api-improvements

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15874.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15874
    
----
commit 559c09904538012b70bcb3493b8bc287dd855b2d
Author: Yun Ni <[email protected]>
Date:   2016-11-07T21:30:32Z

    [SPARK-18334] MinHash should use binary hash distance

commit 517a97bd16f3771d9abbcdf54957a011f5f87adc
Author: Yunni <[email protected]>
Date:   2016-11-08T06:15:24Z

    Remove misleading documentation as requested

commit b546dbd207a04e73bde097f25cae8c927322c2ae
Author: Yun Ni <[email protected]>
Date:   2016-11-08T18:54:09Z

    Add warning for multi-probe in MinHash

commit a3cd9281d1fb8d969cb8bdd32ae8c5b9c373ad3b
Author: Yun Ni <[email protected]>
Date:   2016-11-08T18:55:49Z

    Merge branch 'SPARK-18334-yunn-minhash-bug' of https://github.com/Yunni/spark into SPARK-18334-yunn-minhash-bug

commit c8243c7def8c270072edd5889cea7fd02677b44f
Author: Yun Ni <[email protected]>
Date:   2016-11-09T23:11:20Z

    (1) Fix documentation as CR suggested (2) Fix typo in unit test

commit 6aac8b343c5ea3a91b8517a2d3f47ed055ece9ad
Author: Yun Ni <[email protected]>
Date:   2016-11-09T23:22:27Z

    Fix typo in unit test

commit 98707436ea8a90599fd8615a47afff3bf29a3ae6
Author: Yun Ni <[email protected]>
Date:   2016-11-14T04:25:17Z

    [SPARK-18408] API Improvements for LSH

commit 0e9250be0142691e9e085ed1260f83f8ed40f5e4
Author: Yun Ni <[email protected]>
Date:   2016-11-14T04:38:44Z

    (1) Fix description for numHashFunctions (2) Make numEntries in MinHash private

commit adbbefe1777db8fb85a0af59c11e5840d3bc91ee
Author: Yun Ni <[email protected]>
Date:   2016-11-14T04:43:30Z

    Add assertion for hashFunction in BucketedRandomProjectionLSHSuite

----


