Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11077 )

Change subject: [KUDU-2521] Java Implementation for BloomFilter
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@139
PS1, Line 139: obe.mixH
> I think it's necessary to make the hashing consistent with the blooms on PK 
> lookups in TServer.

Could you expand on why it's necessary to make the PK bloom filters use the 
same hash function as this predicate bloom filter?  Off the top of my head I'm 
not sure what the necessity or even upside of doing that would be (besides code 
re-use, but as pointed out we could get code-reuse in the client if we went 
with murmur).  We currently have the ability (in theory) of being flexible 
about what hash function we use, which means we can switch out city hash in the 
future if we want.  I'd be reluctant to lose that flexibility because we add a 
new mostly unrelated public API.

Related note - what do you think about making this API be 'hash agile' from the 
start?  E.g. build in a hash algorithm selector from the start, using feature 
flags or something similar.  It would be fine to have just a single 
implementation (murmur or city) at first.


http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java
File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@29
PS4, Line 29: public class BloomFilter {
The public API surface are of this class is much bigger than necessary. In 
particular the membership methods and the BloomKeyProbe class aren't necessary. 
 I'm thinking something along these lines (modeled more closely on the Guava 
BloomFilter class):

public class BloomFilter {

  public BloomFilter(int expectedInsertions, double fpp) { .. }

  public BloomFilter(int size, double fpp) { .. }

  public put(int value) { .. }

  public put(long value) { .. }

  ...

}


http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@57
PS4, Line 57:       bytes = new byte[]{1};
Lots of shortlived allocations happening here.  Given that the new hash API 
appears to take offset and len parameters, maybe the BloomFilter could allocate 
a length 8 byte[] upfront and just use that?


http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java
File java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java:

http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java@18
PS4, Line 18: package org.apache.kudu.util;
Was this implementation copied from somewhere?  Please include a comment if so, 
and retain the original header.


http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java:

http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java@38
PS4, Line 38: ,n
missing space here and below



--
To view, visit http://gerrit.cloudera.org:8080/11077
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8
Gerrit-Change-Number: 11077
Gerrit-PatchSet: 4
Gerrit-Owner: [email protected]
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: [email protected]
Gerrit-Comment-Date: Tue, 14 Aug 2018 23:58:44 +0000
Gerrit-HasComments: Yes

Reply via email to