[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66659/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-07 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148 Related to the docs, some more comments defining terminology would be useful for non-experts: * OR-amplification * probing buckets * false positives/negatives (w.r.t. finding nearest
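For readers unfamiliar with the first term in that list, OR-amplification can be illustrated with a small sketch. It assumes each point has already been hashed by several independent hash functions, and that two points count as a candidate pair when at least one corresponding hash value matches. The helper name `is_candidate_pair` and the sample hash values are hypothetical and are not taken from the PR.

```python
from typing import Sequence


def is_candidate_pair(hashes_a: Sequence[int], hashes_b: Sequence[int]) -> bool:
    """OR-amplification: True if any corresponding pair of hash values matches.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if len(hashes_a) != len(hashes_b):
        raise ValueError("both points must be hashed by the same functions")
    return any(a == b for a, b in zip(hashes_a, hashes_b))


# Assumed hash values from three hash functions, made up for the example.
point_p = [1, 4, 7]
point_q = [2, 4, 9]  # agrees with point_p on the second hash function
point_r = [2, 5, 9]  # agrees with point_p on none

p_q_candidate = is_candidate_pair(point_p, point_q)
p_r_candidate = is_candidate_pair(point_p, point_r)
```

Here `p_q_candidate` is true and `p_r_candidate` is false. A candidate pair whose points turn out to be far apart would be a false positive, and a truly close pair that matches on no hash function would be a false negative.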

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66395/ Test PASSed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66395/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 @jkbradley Take your time for the code review. :) I will be working on the open dataset testing at the same time.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66395/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66323/ Test PASSed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66323 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66323/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66323 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66323/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66322/ Test FAILed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66322/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66322 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66322/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66306/ Test PASSed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66306/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66305/ Test PASSed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66305/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66306/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66305/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66068/ Test PASSed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66068/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66068/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 @jkbradley I see. Thanks Joseph!

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148 > Our use case is mainly using similarity join to find fraud trips. I think I can change the NN-search to only single-probing NN search of dataframe if you think it's fine. What do you think?

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test FAILed.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66065/ Test FAILed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66065 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66065/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66065/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test FAILed.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66061/ Test FAILed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66061/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66061/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test FAILed.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66055/ Test FAILed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66055/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test FAILed.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66057/ Test FAILed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66057/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66057/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66055/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66054 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66054/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66054/ Test FAILed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test FAILed.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66054 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66054/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Hi @MLnick @jkbradley Thanks for the code review. I made some changes based on your comments. - I agree it's better to align the input types to vector in internal implementation.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66051/ Test FAILed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66051/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test FAILed.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-28 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #66051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66051/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-27 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15148 Yes ideally it's nice to be able to support multiple input types. Though I lean towards Vector as the most appropriate "unified" interface. Somewhere there is a TODO about supporting e.g.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-26 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15148 * Do we want to use the subpackage ```spark.ml.feature.lsh``` or just put the classes under ```spark.ml.feature```? This would be the first division of ```feature```. I'd prefer not using

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65924/ Test PASSed. ---

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Merged build finished. Test PASSed.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #65924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65924/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15148 **[Test build #65924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65924/consoleFull)** for PR 15148 at commit

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15148 ok to test

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-26 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15148 At a high level I like the idea here and the work that's gone into a unified interface. A few comments: Data types I'm not that keen on mixing up the input data types between

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-26 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Thanks, @karlhigley. All of your comments are very helpful. I made some changes to make it work. :)

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-23 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Hi @sethah - My understanding is that h(x) = floor((g1 dot x) / w) is one hash function, as in the wiki. - In bullet point 6 of "Approach found on Wikipedia and here and here", we have a
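The hash function quoted in that comment, h(x) = floor((g1 dot x) / w), can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from the PR: the function name `bucket_hash`, the fixed projection vector `g1`, the bucket width `w = 2.0`, and the sample points are all made up for the example.

```python
import math
from typing import Sequence


def bucket_hash(g: Sequence[float], x: Sequence[float], w: float) -> int:
    """Return floor((g . x) / w) for one projection vector g and bucket width w.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if len(g) != len(x):
        raise ValueError("g and x must have the same dimension")
    if w <= 0:
        raise ValueError("bucket width w must be positive")
    # Project x onto g, then cut the real line into buckets of width w.
    dot = sum(gi * xi for gi, xi in zip(g, x))
    return math.floor(dot / w)


# Assumed example values: a fixed projection vector and bucket width.
g1 = [1.0, 0.0]
w = 2.0

near_a = bucket_hash(g1, [3.0, 5.0], w)   # floor(3.0 / 2.0) = 1
near_b = bucket_hash(g1, [3.5, -1.0], w)  # floor(3.5 / 2.0) = 1
far = bucket_hash(g1, [9.0, 5.0], w)      # floor(9.0 / 2.0) = 4
```

In an LSH scheme the projection vector would typically be drawn at random; it is fixed here so the results are reproducible. Points whose projections onto `g1` are close tend to share a bucket, while distant projections land in different ones.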

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-23 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148 Thanks for your clarifications. I still don't see where the algorithm used in this patch comes from. Here is my summary of how the approach here differs from the approach found on Wikipedia and

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-20 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Hi @sethah, Thanks for the comments. - I agree. I have moved `lsh` package to be under `feature` - In "Similarity search in high dimensions via hashing", there is an algorithm in the

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148 A few high-level comments/questions: * Should this go into the `feature` package as a feature estimator/transformer? That is where other dimensionality reduction techniques have gone and

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15148 @Yunni Thanks for working on this.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Hi @sethah, I have updated the reference in the PR and scaladoc for LSH.

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148 @Yunni Could you provide the specific reference paper this patch is based on? Also, it might be nice to put the reference in the code somewhere, e.g. the scaladoc for LSH/Random Projections. Thanks!

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 Thanks very much for reviewing, @viirya. I made some changes based on your comments. PTAL.
