[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15874 Well, I'm having trouble merging because of bad wifi during travel. Ping @yanboliang @MLnick @mengxr would one of you mind merging this with master and branch-2.1? @sethah and I have both given

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-28 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15874 LGTM Thanks everyone! Merging with master and branch-2.1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69215/ Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #69215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69215/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #69215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69215/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-27 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 @jkbradley If you don't have more comments, can we merge this because I need to change the examples in #15795 ?

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 Thanks @sethah ! Your comment was very helpful and detailed :-)

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15874 LGTM. I think we've made JIRAs for all of the follow-up items. Thanks!

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 @sethah PTAL

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69031/ Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #69031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69031/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #69031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69031/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69020/ Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #69020 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69020/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69012/ Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #69012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69012/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #69020 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69020/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-22 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #69012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69012/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68880/ Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68880/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68880/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-18 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 Hi @sethah, grouping into a number of buckets does not really affect the independence since p is a much larger prime. For example, in http://people.csail.mit.edu/mip/papers/kwise-lb/kwise-lb.pdf, they
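The bucketing point in the comment above can be checked empirically with a small sketch. This is only an illustration under stated assumptions: the linear form `(a*x + b) mod p`, the prime `P`, and the helper names `bucket_hash` and `collision_rate` are hypothetical and are not taken from the PR. The idea is that when p is much larger than the number of buckets, two distinct inputs should still land in the same bucket with probability close to 1 / num_buckets.

```python
import random

# Assumed for illustration: a Mersenne prime far larger than the bucket count.
P = 2**31 - 1


def bucket_hash(x, a, b, num_buckets):
    """Linear hash modulo the prime P, then grouped into num_buckets buckets."""
    return ((a * x + b) % P) % num_buckets


def collision_rate(x, y, num_buckets, trials=20000, seed=0):
    """Fraction of randomly drawn (a, b) pairs that put x and y in the same bucket."""
    rng = random.Random(seed)
    collisions = 0
    for _ in range(trials):
        a = rng.randrange(1, P)  # a must be non-zero modulo P
        b = rng.randrange(0, P)
        if bucket_hash(x, a, b, num_buckets) == bucket_hash(y, a, b, num_buckets):
            collisions += 1
    return collisions / trials


if __name__ == "__main__":
    # With 10 buckets the observed rate should be near 1/10.
    print(collision_rate(12345, 67890, num_buckets=10))
```

The observed rate only approximates 1 / num_buckets; the sketch does not by itself establish the independence property discussed in the thread.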

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-18 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15874 @jkbradley Thanks for checking that, that is the conclusion I drew as well.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-18 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 @jkbradley Awesome, thanks so much! :) Now that the API is finalized, I will work on the User Doc

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-18 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15874 I will take a look.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-18 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15874 @Yunni Thanks for the updates! I don't think we should include AND-amplification for 2.1 since we're already in QA. But it'd be nice to get it in 2.2. Also, 2.2 will give us plenty of time to

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68825/ Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68825 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68825/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68823/ Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68823/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68825/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68823/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/15874 @Yunni I think if we are using this 2-independent hash family we should provide that reference you mention in the Scaladoc, and also mention that it approximates min-wise independence.
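For readers unfamiliar with the idea, here is a minimal sketch of MinHash built on a linear hash family of the form `(a*x + b) mod p`. It is an illustration only: the prime `PRIME`, the seed, and the function names `make_hash_family`, `minhash_signature`, and `estimate_jaccard` are assumptions made for this example, and the exact family used in the PR is not shown in this excerpt. Such a family only approximates a min-wise independent permutation family, so the similarity estimate is approximate.

```python
import random

# Assumed for illustration: a Mersenne prime larger than any element id used below.
PRIME = 2**31 - 1


def make_hash_family(num_hashes, seed=42):
    """Draw (a, b) coefficients for num_hashes linear hash functions modulo PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(elements, coeffs):
    """For each hash function, keep the minimum hash value over the set's elements."""
    return [min((a * x + b) % PRIME for x in elements) for a, b in coeffs]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


if __name__ == "__main__":
    coeffs = make_hash_family(128)
    s1 = minhash_signature({1, 2, 3, 4, 5, 6}, coeffs)
    s2 = minhash_signature({4, 5, 6, 7, 8, 9}, coeffs)
    # The true Jaccard similarity of these two sets is 3/9; the printed value
    # is only an estimate and depends on the coefficients drawn.
    print(estimate_jaccard(s1, s2))
```

Because `a` is non-zero modulo a prime, each hash function is a bijection on the integers below `PRIME`, so two disjoint sets never share a minimum and two identical sets always do.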

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68803/ Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68803/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68802/ Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68802/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread Yunni
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15874 Hi @jkbradley, **MinHash** Yes, I agree that I shouldn't have said it's perfect hashing. Theoretically, it should be Min-wise Independent Permutation Family. What we used here is

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68803/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68802 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68802/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-17 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15874 Other comments: **MinHash** Looking yet again at this, I think it's using a technically incorrect hash function. It is *not* a perfect hash function. It can hash 2 input

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-16 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15874 I'll take a look

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68689/ Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68689/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68689/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test FAILed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68683/ Test FAILed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68683/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68683/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68678/ Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68678/consoleFull)** for PR 15874 at commit

[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH

2016-11-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #68678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68678/consoleFull)** for PR 15874 at commit