Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15874
Well, I'm having trouble merging b/c of bad wifi while traveling. Ping
@yanboliang @MLnick @mengxr: would one of you mind merging this into master and
branch-2.1? @sethah and I have both given
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15874
LGTM
Thanks everyone!
Merging with master and branch-2.1
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69215/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #69215 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69215/consoleFull)**
for PR 15874 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #69215 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69215/consoleFull)**
for PR 15874 at commit
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15874
@jkbradley If you don't have any more comments, can we merge this? I
need to change the examples in #15795.
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15874
Thanks @sethah! Your comment was very helpful and detailed :-)
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15874
LGTM. I think we've made JIRAs for all of the follow-up items. Thanks!
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15874
@sethah PTAL
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69031/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #69031 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69031/consoleFull)**
for PR 15874 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #69031 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69031/consoleFull)**
for PR 15874 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69020/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #69020 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69020/consoleFull)**
for PR 15874 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69012/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #69012 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69012/consoleFull)**
for PR 15874 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #69020 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69020/consoleFull)**
for PR 15874 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #69012 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69012/consoleFull)**
for PR 15874 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68880/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68880 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68880/consoleFull)**
for PR 15874 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68880 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68880/consoleFull)**
for PR 15874 at commit
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15874
Hi @sethah, grouping into a number of buckets does not really affect the
independence, since p is a much larger prime. For example, in
http://people.csail.mit.edu/mip/papers/kwise-lb/kwise-lb.pdf, they
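A minimal Python sketch of the bucketing argument (not the PR's code): draw h(x) = ((a*x + b) mod p) mod m from the classic universal hashing family and estimate how often two fixed inputs collide over random (a, b). With p much larger than the bucket count m, the empirical rate should sit close to 1/m. The prime `P` (2^31 - 1), the bucket count, the seed, and the helper names are illustrative assumptions.

```python
import random

# Illustrative prime (2**31 - 1, a Mersenne prime); not necessarily the one used in the PR.
P = 2_147_483_647


def bucket_hash(x: int, a: int, b: int, num_buckets: int, p: int = P) -> int:
    """h(x) = ((a*x + b) mod p) mod num_buckets."""
    return ((a * x + b) % p) % num_buckets


def collision_rate(x: int, y: int, num_buckets: int,
                   trials: int = 20_000, seed: int = 0) -> float:
    """Estimate Pr[h(x) == h(y)] over random (a, b) with 1 <= a < p, 0 <= b < p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.randrange(1, P)
        b = rng.randrange(0, P)
        if bucket_hash(x, a, b, num_buckets) == bucket_hash(y, a, b, num_buckets):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    m = 10
    rate = collision_rate(12, 345, num_buckets=m)
    print(f"empirical collision rate: {rate:.4f}, 1/m = {1 / m:.4f}")
```

Because a != 0 and x != y (mod p), the pair ((a*x + b) mod p, (a*y + b) mod p) is uniform over pairs of distinct values, so the final mod m adds a collision probability of roughly 1/m when p is much larger than m.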
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15874
@jkbradley Thanks for checking that, that is the conclusion I drew as well.
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15874
@jkbradley Awesome, thanks so much! :) Now that the API is finalized, I
will work on the user doc.
Github user sethah commented on the issue:
https://github.com/apache/spark/pull/15874
I will take a look.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15874
@Yunni Thanks for the updates! I don't think we should include
AND-amplification for 2.1 since we're already in QA. But it'd be nice to get
it in 2.2. Also, 2.2 will give us plenty of time to
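For readers new to the term: AND-amplification concatenates k independent hash functions so that two items share a bucket only if all k agree, turning a per-function collision probability q into q^k. It is commonly paired with OR-amplification over several independent tables, where a pair becomes a candidate if it collides in any table. A small Python sketch of the resulting probabilities; the function names and parameter values are hypothetical, not Spark API.

```python
def and_amplified(q: float, k: int) -> float:
    """Pr[all k independent hash functions collide], given per-function collision prob q."""
    return q ** k


def and_or_amplified(q: float, k: int, num_tables: int) -> float:
    """Pr[collision in at least one of num_tables tables], where each table
    uses an AND-combination of k independent hash functions."""
    return 1.0 - (1.0 - q ** k) ** num_tables


if __name__ == "__main__":
    # Hypothetical per-function collision probabilities for a near pair and a far pair.
    for q in (0.8, 0.3):
        print(q, and_amplified(q, 4), and_or_amplified(q, k=4, num_tables=10))
```

With k = 4 and 10 tables, a pair with q = 0.8 still collides with probability above 0.99, while a pair with q = 0.3 drops below 0.08: amplification widens the gap between near and far pairs.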
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68825/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68825 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68825/consoleFull)**
for PR 15874 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68823/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68823 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68823/consoleFull)**
for PR 15874 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68825 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68825/consoleFull)**
for PR 15874 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68823 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68823/consoleFull)**
for PR 15874 at commit
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/15874
@Yunni I think if we are using this 2-independent hash family, we should
provide the reference you mentioned in the Scaladoc, and also mention that it
approximates a min-wise independent family.
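To make the Scaladoc point concrete, a minimal Python MinHash sketch that uses the 2-independent family h(x) = (a*x + b) mod p as a stand-in for truly min-wise independent permutations. It estimates Jaccard similarity as the fraction of hash functions on which two sets share the same minimum. The prime, seeds, set contents, and function names are assumptions for illustration; this is not the PR's implementation.

```python
import random

P = 2_147_483_647  # illustrative prime, 2**31 - 1


def make_hash_params(num_hashes: int, seed: int = 42) -> list[tuple[int, int]]:
    """Draw (a, b) pairs for h(x) = (a*x + b) mod P, a 2-independent family."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(num_hashes)]


def minhash_signature(elements: set[int], params: list[tuple[int, int]]) -> list[int]:
    """For each hash function, keep the minimum hash value over the set."""
    return [min((a * x + b) % P for x in elements) for a, b in params]


def estimate_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of signature positions where the two minima agree."""
    matches = sum(1 for u, v in zip(sig1, sig2) if u == v)
    return matches / len(sig1)


def jaccard(s1: set[int], s2: set[int]) -> float:
    """Exact Jaccard similarity, for comparison."""
    return len(s1 & s2) / len(s1 | s2)


if __name__ == "__main__":
    rng = random.Random(1)
    pool = rng.sample(range(1_000_000), 150)  # 150 distinct hypothetical element ids
    set_a, set_b = set(pool[:100]), set(pool[50:])  # overlap of 50, union of 150
    params = make_hash_params(num_hashes=1000)
    est = estimate_jaccard(minhash_signature(set_a, params), minhash_signature(set_b, params))
    print(f"true Jaccard = {jaccard(set_a, set_b):.3f}, MinHash estimate = {est:.3f}")
```

Since a != 0 and p is prime, each h is a bijection on [0, p), so disjoint sets (with elements below p) never share a minimum; using a 2-independent family instead of a min-wise independent one only affects how unbiased the estimate is for overlapping sets.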
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68803/
Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68803 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68803/consoleFull)**
for PR 15874 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68802/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68802 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68802/consoleFull)**
for PR 15874 at commit
Github user Yunni commented on the issue:
https://github.com/apache/spark/pull/15874
Hi @jkbradley,
**MinHash**
Yes, I agree that I shouldn't have called it perfect hashing.
Theoretically, it should be a min-wise independent permutation family. What we
used here is
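For reference, the standard definition of a min-wise independent permutation family and the property MinHash relies on (the notation is ours, not the PR's). A family H of permutations of [n] is min-wise independent if every element of a set is equally likely to map to the minimum; applying this to A ∪ B, the two minima agree exactly when the overall minimum falls in A ∩ B, which gives the Jaccard identity.

```latex
\Pr_{\pi \in \mathcal{H}}\left[\min \pi(X) = \pi(x)\right] = \frac{1}{|X|}
  \quad \text{for all } X \subseteq [n],\ x \in X

\Pr_{\pi \in \mathcal{H}}\left[\min \pi(A) = \min \pi(B)\right] = \frac{|A \cap B|}{|A \cup B|}
  \quad \text{for all } A, B \subseteq [n]
```

A 2-independent linear family satisfies the first equation only approximately, so the second holds only approximately as well.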
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68803 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68803/consoleFull)**
for PR 15874 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68802 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68802/consoleFull)**
for PR 15874 at commit
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15874
Other comments:
**MinHash**
Looking yet again at this, I think it's using a technically incorrect hash
function. It is *not* a perfect hash function. It can hash 2 input
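A perfect hash function for a set S maps the elements of S to distinct values. A small Python check on a linear hash h(x) = ((a*x + b) mod p), optionally folded into m buckets; the prime, coefficients, bucket count, and helper names are hypothetical. Without the final mod m, the map is a bijection on [0, p) whenever a != 0, but once outputs are folded into m buckets, any input set larger than m must contain a collision (pigeonhole principle).

```python
from typing import Callable, Iterable, Optional, Tuple


def linear_hash(x: int, a: int, b: int, p: int, m: Optional[int] = None) -> int:
    """(a*x + b) mod p, optionally folded into m buckets."""
    v = (a * x + b) % p
    return v if m is None else v % m


def find_collision(inputs: Iterable[int],
                   h: Callable[[int], int]) -> Optional[Tuple[int, int]]:
    """Return the first pair of distinct inputs with equal hash values, or None."""
    seen = {}
    for x in inputs:
        hx = h(x)
        if hx in seen:
            return seen[hx], x
        seen[hx] = x
    return None


if __name__ == "__main__":
    p, a, b = 101, 37, 5  # small illustrative prime and coefficients
    inputs = range(20)
    # a != 0 (mod p), so x -> (a*x + b) mod p is injective on [0, p): no collision.
    print(find_collision(inputs, lambda x: linear_hash(x, a, b, p)))
    # 20 inputs into m = 10 buckets: a collision must exist.
    print(find_collision(inputs, lambda x: linear_hash(x, a, b, p, m=10)))
```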
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/15874
I'll take a look
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68689/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68689 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68689/consoleFull)**
for PR 15874 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68689 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68689/consoleFull)**
for PR 15874 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68683/
Test FAILed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68683 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68683/consoleFull)**
for PR 15874 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68683 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68683/consoleFull)**
for PR 15874 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68678/
Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/15874
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68678 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68678/consoleFull)**
for PR 15874 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/15874
**[Test build #68678 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68678/consoleFull)**
for PR 15874 at commit