[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-14 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 9:

Thanks for merging! Adar~


--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 9
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Sat, 15 Sep 2018 02:54:04 +
Gerrit-HasComments: No


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-14 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Reviewed-on: http://gerrit.cloudera.org:8080/11333
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo 
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
2 files changed, 588 insertions(+), 0 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 9
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-14 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 8
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Fri, 14 Sep 2018 18:02:47 +
Gerrit-HasComments: No


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-13 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 8:

Thanks Adar~
I updated, please take another look :)


--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 8
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Fri, 14 Sep 2018 01:50:24 +
Gerrit-HasComments: No


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-13 Thread Anonymous Coward (Code Review)
Hello Dan Burkert, Kudu Jenkins, Adar Dembo, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11333

to look at the new patch set (#8).

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
2 files changed, 588 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/11333/8
--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 8
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-13 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 7:

(4 comments)

Almost there, just a few more stylistic comments.

http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@73
PS6, Line 73: this.nBits = bitSet.size();
> it's:
Okay, that's constant time, so can you drop nBits and call bitSet.size() as 
needed?


http://gerrit.cloudera.org:8080/#/c/11333/7/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/7/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@71
PS7, Line 71: bitSet.size() > 8
Should be >=, right?


http://gerrit.cloudera.org:8080/#/c/11333/7/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@84
PS7, Line 84:   public static BloomFilter BySize(int nBytes) {
In our Java code we use lowerCamelCase to name functions and variables. So 
"BySize" should actually be "bySize", and "BySizeAndFPRate" should be 
"bySizeAndFPRate". Etc.


http://gerrit.cloudera.org:8080/#/c/11333/7/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@92
PS7, Line 92:*  ever been {@code put} into the {@code 
BloomFilter}.
For @params that span multiple lines, please align the continuation lines with 
the first letter of the explanation on the first line:

  @param fpRate the probability ...
ever been put into the ...

That's the Javadoc style we follow.

Below too.



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 7
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Thu, 13 Sep 2018 17:57:43 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-12 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@73
PS6, Line 73: this.nBits = bitSet.size();
> Could you look into whether bitSet.size() is a constant time operation? If
it's:
```
public int size() {
return words.length * BITS_PER_WORD;
}
```


http://gerrit.cloudera.org:8080/#/c/11333/7/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/7/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@17
PS7, Line 17:
Thanks a lot for quick response ~Adar
I updated according to your comments.
Please take a look :)



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 7
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Thu, 13 Sep 2018 04:09:53 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-12 Thread Anonymous Coward (Code Review)
Hello Dan Burkert, Kudu Jenkins, Adar Dembo, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11333

to look at the new patch set (#7).

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
2 files changed, 590 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/11333/7
--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 7
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-12 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 6:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@225
PS5, Line 225: return checkIfContains(byteBuffer);
> I'm not sure here.
Actually, the length of 'bytes' doesn't matter because it's hashed into a 
single long regardless.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@29
PS6, Line 29:  * An space-efficient filter and offers an approximate 
containment check.
Nit: "A space-efficient filter which offers an approximate containment check"?


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@37
PS6, Line 37: shrink the amount
Nit: "constrain the number"


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@39
PS6, Line 39: you are expecting TServer to return records have
:  * the same value in a scan.
Nit: "you expect the TServer to return records with the same value in a scan".


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@52
PS6, Line 52:  *   // TODO: implemnt the interface for serializaing and sending
"implement" and "serializing"


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@69
PS6, Line 69: if (bitSet.size() < 8) {
:   throw new IllegalArgumentException("Number of bits in 
bitset should be at least 8, but found "
: + bitSet.length());
: }
You can use Guava's Preconditions to simplify these assertions:

  Preconditions.checkArgument(bitset.size() < 8, "...");


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@73
PS6, Line 73: this.nBits = bitSet.size();
Could you look into whether bitSet.size() is a constant time operation? If it 
is, there's no need to store nBits separately; we could just query 
bitSet.size() whenever we need the number of bits.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@82
PS6, Line 82:* @param nBytes size of Bloom filter
Nit: "size of bloom filter in bytes"


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@83
PS6, Line 83:* @param fpRate false positive rate
Could you elaborate on what this means? How should a user know what value to 
provide here?

Below as well.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@89
PS6, Line 89:   public static BloomFilter BySizeAndFPRate(int nBytes, double 
fpRate, HashFunction hashFunction) {
Please doc this one too.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@97
PS6, Line 97: Bloom
Nit: lower case "bloom" (to be consistent with how you've written "bloom 
filter" in other Javadoc.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@112
PS6, Line 112:   public void put(byte[] data) {
Please Javadoc the various put() methods.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@163
PS6, Line 163:
 :   public byte[] getBitSet() {
 : return bitSet.toByteArray();
 :   }
 :
 :   public int getNHashes() {
 : return nHashes;
 :   }
 :
 :   public String getHashFunctionName() {
 : return hashFunction.toString();
 :   }
Are these testing only? If so, annotate them as such. If not, provide Javadoc.


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@200
PS6, Line 200: assert (byteBuffer.length >= length);
Nit: use Guava's Preconditions here.


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java:


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-12 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@225
PS5, Line 225: return checkIfContains(byteBuffer);
> Likewise, precondition on bytes.length vs. bitmap size here?
I'm not sure here.
`bytes` is generated from the records.
We cannot guarantee bytes.length vs. bitmap.size I think


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@27
PS6, Line 27:
Thanks a lot for careful review ~ Adar

I think I resolved your comments, please take another look when you have time


http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java@34
PS6, Line 34:   public void testNumberOfHashes() {
I think it's necessary to check correctness of the `nHashes` calculated, but 
didn't find a better way.



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 6
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Wed, 12 Sep 2018 17:13:04 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-12 Thread Anonymous Coward (Code Review)
Hello Dan Burkert, Kudu Jenkins, Adar Dembo, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11333

to look at the new patch set (#6).

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
2 files changed, 514 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/11333/6
--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 6
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 5:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@30
PS5, Line 30: public class BloomFilter {
Could you add Javadocs to the class as well as to all public 
non-visiblefortesting methods, explaining how they work and how to use them?


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@33
PS5, Line 33:   private byte[] bitmap;
Curious why you chose to implement your own bitmap instead of reusing 
java.util.BitSet?


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@36
PS5, Line 36:   private HashFunction hashFunction;
Some of these fields can be marked final because they're not modified after the 
constructor call, right?


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@39
PS5, Line 39: this.nBits = nBits;
Should we precondition that this is at least 8?


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@40
PS5, Line 40: this.bitmap = bitmap;
Likewise, should we precondition that this has space for at least 8 bits?


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@65
PS5, Line 65: nBytes
Shouldn't this be nBits? I'm surprised this worked as-is; could you add a unit 
test that would fail or malfunction when this is nBytes?


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@74
PS5, Line 74: if (data) {
:   byteBuffer[0] = 1;
: } else {
:   byteBuffer[0] = 0;
: }
Nit: byteBuffer[0] = data ? 1 : 0;


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@143
PS5, Line 143:   private void updateBitmap(byte[] byteBuffer, int length) {
Should we precondition that bitmap.size >= length?


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@151
PS5, Line 151:   tmp = tmp + h2;
Nit: tmp += h2;


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@155
PS5, Line 155:   // This is for test.
Methods whose visibility has been increased for testing should be annotated 
with:

  @InterfaceAudience.LimitedPrivate("Test”)

Then you don't need the "This is for test" comments either.

See 
https://github.com/apache/kudu/commit/abbde75e12f2275e1a286df60d788e9c5b411bcb 
for more details.


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@225
PS5, Line 225:   private boolean checkIfContains(byte[] bytes) {
Likewise, precondition on bytes.length vs. bitmap size here?


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@233
PS5, Line 233: bitpos
Nit: bitPos (camel-case)


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@237
PS5, Line 237:   tmp = tmp + h2;
Nit: tmp += h2;


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@243
PS5, Line 243:   private static double kNaturalLog2= 0.69314;
Nit: reformat as:

  private static double kNaturalLog2 = 0.69314;

(separate the assignment operator '=' from the variable name 'kNaturalLog2' 
with a space)


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@257
PS5, Line 257: n_bits,
Nit: camel-case in Java, so 'nBits'


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@271
PS5, Line 271:   private static long pickBit(long hash, int nBits) {
Nit: maybe rename to GetBitIndex or some such, to emphasize that what's 
returned is an index into a bitmap?


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@276
PS5, Line 276:   public String toString() {
Nit: mind implementing this via MoreObjects.toStringHelper(), or even just a 
StringBuilder? Easier to follow, I think.


http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
File 

[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-10 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java@112
PS5, Line 112:   public void testLong() {
Sure, I fixed this typo.
Thanks a lot for quick response, Dan~



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 5
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Tue, 11 Sep 2018 01:59:30 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-10 Thread Anonymous Coward (Code Review)
Hello Dan Burkert, Kudu Jenkins, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11333

to look at the new patch set (#5).

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
2 files changed, 450 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/11333/5
--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 5
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-10 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 4:

(1 comment)

Looks good, just a small typo to fix.  Thanks!

http://gerrit.cloudera.org:8080/#/c/11333/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java@127
PS4, Line 127:   public void testFLoat() {
'Float'



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 4
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Tue, 11 Sep 2018 00:11:31 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-10 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11333/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java:

PS4:
Dan~ I added test here.
Please leave some comments when you have time ~



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 4
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Mon, 10 Sep 2018 15:52:13 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-10 Thread Anonymous Coward (Code Review)
Hello Dan Burkert, Kudu Jenkins, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11333

to look at the new patch set (#4).

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
2 files changed, 450 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/11333/4
--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 4
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-10 Thread Anonymous Coward (Code Review)
Hello Dan Burkert, Kudu Jenkins, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11333

to look at the new patch set (#3).

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
2 files changed, 453 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/11333/3
--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 3
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-09-06 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 2:

(1 comment)

This is looking good, if you add back the tests I think it will be ready to go.

http://gerrit.cloudera.org:8080/#/c/11333/2/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/2/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@156
PS2, Line 156:   public boolean mayContains(byte[] data) {
I think 'mayContain' (singular) is more correct here and below.



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 2
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Thu, 06 Sep 2018 23:42:55 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-31 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11333/2/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/2/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@17
PS2, Line 17:
Thanks advice on How-To-Use-Gerrit. I'm finding Gerrit is easier to use now :)


http://gerrit.cloudera.org:8080/#/c/11333/2/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@33
PS2, Line 33:   private byte[] bitmap;
should we provided a clear method like below:

public void clear() {
  for (int i = 0; i < bitmap.length(); i++) {
bitmap[i] =0
  }
}


http://gerrit.cloudera.org:8080/#/c/11333/2/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@134
PS2, Line 134: // We can consider to provide some other options like 
Murmur3, CityHash in the future.
Hmm, only Murmur2 is provided. Note that Spark is using Murmur3.



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 2
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Fri, 31 Aug 2018 09:02:43 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-31 Thread Anonymous Coward (Code Review)
Hello Dan Burkert, Kudu Jenkins, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11333

to look at the new patch set (#2).

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
1 file changed, 280 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/11333/2
--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 2
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-29 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@149
PS1, Line 149:   public static interface HashFunction {
> Exposing this abstraction as an enum might be the better way to go, since i
To be clear the implementation probably requires such an interface, but the 
public API is probably better exposed as an enum.



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 1
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Wed, 29 Aug 2018 22:50:12 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-29 Thread Dan Burkert (Code Review)
Dan Burkert has abandoned this change. ( http://gerrit.cloudera.org:8080/11077 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Abandoned

Closing in favor of https://gerrit.cloudera.org/c/11333/
--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 4
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-29 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11077 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 4:

> Patch Set 4:
>
> (1 comment)

Just left a review, thanks!  Will close this one out in favor of the new 
patchset.


--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 4
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Wed, 29 Aug 2018 22:36:19 +
Gerrit-HasComments: No


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-29 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@31
PS1, Line 31:
> Shall we start the review at this Change-ID ?
Typically you can just rebase and keep pushing to the same change-id / gerrit 
patchset, but in this case we can just continue the review here.


http://gerrit.cloudera.org:8080/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@109
PS1, Line 109:   public void put(Long data) {
should this be 'long'?


http://gerrit.cloudera.org:8080/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@149
PS1, Line 149:   public static interface HashFunction {
Exposing this abstraction as an enum might be the better way to go, since it's 
a 'closed' set of options.  I think a user could technically implement this 
interface for their custom hash function, which would work until the bloom 
filter was sent to the server, at which point the server would have no idea 
what to do with it.  An enum better models the 'one of a few options, but not 
open to extension' idea.


http://gerrit.cloudera.org:8080/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@288
PS1, Line 288:   private static class BloomKeyProbe {
Is the key probe useful?  As I understand it it's used in the server as an 
optimization when looking up whether a key is in multiple bloom filters, but 
since we don't need such optimized lookups in this Java implementation it could 
potentially be removed?  Then you could expose mayContains for specific types 
instead of having to do type dispatch, and the number of methods will be about 
the same.



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 1
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Wed, 29 Aug 2018 22:35:33 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-26 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@31
PS1, Line 31:
Shall we start the review at this Change-ID ?
Sorry, i'm not quite familiar with Gerrit,  and thought Change-ID is something 
like Commit-ID by mistake.



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 1
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Sun, 26 Aug 2018 16:03:29 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-26 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11333 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@30
PS1, Line 30: public class BloomFilter {
I start this new Change here.
In current change:
1. User has flexibility to define hash function and Murmur2 by default;
2. I removed some unrelated public API and did some renaming (some of the 
interface used for test is still kept);
3. I removed the test `TestBloomFilter`. After structure/interface of 
`BloomFilter` is approved, I will add unit test later.



--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 1
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Sun, 26 Aug 2018 16:00:42 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-26 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11077 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@139
PS1, Line 139: obe.mixH
> > I think it's necessary to make the hashing consistent with the blooms on
I think my previous understanding is not correct. Yes, Dan~ there is no 
necessity to make the PK bloom filters use the same hash function as this 
predicate bloom filter.

I re-write the BloomFIlter at: 
https://gerrit.cloudera.org/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java

In current change:
1. User has flexibility to define hash function and Murmur2 by default;
2. I removed some unrelated public API and did some renaming (some of the 
interface used for test is still kept);

Note that Spark internal is using Murmur3 by default, which is different with 
Kudu internal.

I removed the test at 
https://gerrit.cloudera.org/#/c/11333/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java,
 please take a look  at the structure and whether the interface is appropriate. 
I will re-write the test later.



--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 4
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Sun, 26 Aug 2018 15:52:45 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-26 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11333


Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
1 file changed, 343 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/11333/1
--
To view, visit http://gerrit.cloudera.org:8080/11333
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e
Gerrit-Change-Number: 11333
Gerrit-PatchSet: 1
Gerrit-Owner: jinxing6...@126.com


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-14 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11077 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@139
PS1, Line 139: obe.mixH
> I think it's necessary to make the hashing consistent with the blooms on PK 
> lookups in TServer.

Could you expand on why it's necessary to make the PK bloom filters use the 
same hash function as this predicate bloom filter?  Off the top of my head I'm 
not sure what the necessity or even upside of doing that would be (besides code 
re-use, but as pointed out we could get code-reuse in the client if we went 
with murmur).  We currently have the ability (in theory) of being flexible 
about what hash function we use, which means we can switch out city hash in the 
future if we want.  I'd be reluctant to lose that flexibility because we add a 
new mostly unrelated public API.

Related note - what do you think about making this API be 'hash agile' from the 
start?  E.g. build in a hash algorithm selector from the start, using feature 
flags or something similar.  It would be fine to have just a single 
implementation (murmur or city) at first.


http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@29
PS4, Line 29: public class BloomFilter {
The public API surface are of this class is much bigger than necessary. In 
particular the membership methods and the BloomKeyProbe class aren't necessary. 
 I'm thinking something along these lines (modeled more closely on the Guava 
BloomFilter class):

public class BloomFilter {

  public BloomFilter(int expectedInsertions, double fpp) { .. }

  public BloomFilter(int size, double fpp) { .. }

  public put(int value) { .. }

  public put(long value) { .. }

  ...

}


http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@57
PS4, Line 57:   bytes = new byte[]{1};
Lots of shortlived allocations happening here.  Given that the new hash API 
appears to take offset and len parameters, maybe the BloomFilter could allocate 
a length 8 byte[] upfront and just use that?


http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java
File java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java:

http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java@18
PS4, Line 18: package org.apache.kudu.util;
Was this implementation copied from somewhere?  Please include a comment if so, 
and retain the original header.


http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java@38
PS4, Line 38: ,n
missing space here and below



--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 4
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Tue, 14 Aug 2018 23:58:44 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-13 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11077 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@22
PS1, Line 22: public class BloomFilter {
> Is this meant to be part of the public API or an internal-facing class? I'm
True, I updated and added some annotation.


http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@40
PS1, Line 40: genBloomKeyProbe
> this pattern seems like it will create a new object for each lookup in the
True,  I updated in current change.


http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@86
PS1, Line 86: CharsetUtil
> I think we can use StandardCharsets from java 7 here
Updated


http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@139
PS1, Line 139: CityHash
> I'm curious why the decision to use CityHash here? We are using murmur hash
Yes, in Kudu kernel, we are using Murmur2 for partitioning and CityHash for 
blooms on PK lookups. The hashing added in this change is only used for bloom 
filter. So I think it's necessary to make the hashing consistent with the 
blooms on PK lookups in TServer.
I was also looking forward to use the bloom filters to eliminate entire 
partitions. But it was hard. In scenario of Join, the small table used to 
generate BloomFilter is not always small enough, thus hard to eliminate the 
entire partition.


http://gerrit.cloudera.org:8080/#/c/11077/3/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/3/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@28
PS3, Line 28:   public BloomFilter(BloomFilterSizing sizing) {
Added some annotations


http://gerrit.cloudera.org:8080/#/c/11077/3/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@169
PS3, Line 169:
Yes, in Kudu kernel, we are using Murmur2 for partitioning and CityHash for 
blooms on PK lookups. The hashing added in this change is only used for bloom 
filter. So I think it's necessary to make the hashing consistent with the 
blooms on PK lookups in TServer.
I was also looking forward to use the bloom filters to eliminate entire 
partitions. But it was hard. In scenario of Join, the small table used to 
generate BloomFilter is not always small enough, thus hard to eliminate the 
entire partition.



--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 2
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: jinxing6...@126.com
Gerrit-Comment-Date: Mon, 13 Aug 2018 09:59:24 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-06 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11077 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 4:

Did you forget to "publish" responses to the comments? I'm curious to discuss 
more the choice of hash functions, etc, per my comment on patch #1


--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 4
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 06 Aug 2018 18:45:09 +
Gerrit-HasComments: No


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-06 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11077 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 2:

I think the changes in https://gerrit.cloudera.org/#/c/11126/ should likely be 
squashed in to this change.


--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 2
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 06 Aug 2018 17:40:56 +
Gerrit-HasComments: No


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-05 Thread Anonymous Coward (Code Review)
Hello Dan Burkert, Kudu Jenkins, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11077

to look at the new patch set (#4).

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
A java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
3 files changed, 697 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/11077/4
--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 4
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-05 Thread Anonymous Coward (Code Review)
Hello Dan Burkert, Kudu Jenkins, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11077

to look at the new patch set (#3).

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
A java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
3 files changed, 696 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/11077/3
--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 3
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-08-05 Thread Anonymous Coward (Code Review)
Hello Dan Burkert, Kudu Jenkins, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11077

to look at the new patch set (#2).

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
A java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
3 files changed, 673 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/11077/2
--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 2
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-07-30 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11077 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@22
PS1, Line 22: public class BloomFilter {
Is this meant to be part of the public API or an internal-facing class? I'm 
guessing that consumers of the kudu client need to actually construct the bloom 
filter in some way, so this is public, right?

Either way, we should add an InterfaceAudience and InterfaceStability 
annotation.


http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@40
PS1, Line 40: genBloomKeyProbe
this pattern seems like it will create a new object for each lookup in the 
bloom filter. Although these are short-lived objects, I'm not sure if we can 
count on them being removed by escape analysis. Maybe instead we can make 
BloomKeyProbe a reusable mutable object?


http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@86
PS1, Line 86: CharsetUtil
I think we can use StandardCharsets from java 7 here


http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@139
PS1, Line 139: CityHash
I'm curious why the decision to use CityHash here? We are using murmur hashes 
for partitioning, and internal to the tserver using CityHash for blooms on PK 
lookups.

Is there an advantage to this bloom filter matching the hash used by the blooms 
on the server side? I can't quite see how it would be used. If the choice of 
hash is arbitrary, maybe better to just stick with Murmur since we already 
included the library? Or maybe we can benchmark some alternatives for this use 
case and see which is best? I think performance on small strings is most 
important, even if the hash function is lower "quality". See 
https://bigdata.uni-saarland.de/publications/p249-richter.pdf for some 
interesting discussion

Is there any way we can use the bloom filters to eliminate entire partitions 
while scanning if the bloom is calculated on the partitioning key? I'm not sure 
if the bloom could be used directly, but I wonder if this would be useful for 
joins in some cases (eg by just calculating a simple bitmap of matching 
partitions based on knowledge of the hash partitioning on the build side of the 
join)



--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 1
Gerrit-Owner: jinxing6...@126.com
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 30 Jul 2018 16:05:08 +
Gerrit-HasComments: Yes


[kudu-CR] [KUDU-2521] Java Implementation for BloomFilter

2018-07-30 Thread Anonymous Coward (Code Review)
jinxing6...@126.com has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11077


Change subject: [KUDU-2521] Java Implementation for BloomFilter
..

[KUDU-2521] Java Implementation for BloomFilter

Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
---
A java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
A java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
3 files changed, 673 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/11077/1
--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 1
Gerrit-Owner: jinxing6...@126.com