Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/11077 )
Change subject: [KUDU-2521] Java Implementation for BloomFilter ...................................................................... Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java: http://gerrit.cloudera.org:8080/#/c/11077/1/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@139 PS1, Line 139: obe.mixH > I think it's necessary to make the hashing consistent with the blooms on PK > lookups in TServer. Could you expand on why it's necessary to make the PK bloom filters use the same hash function as this predicate bloom filter? Off the top of my head I'm not sure what the necessity or even upside of doing that would be (besides code re-use, but as pointed out we could get code-reuse in the client if we went with murmur). We currently have the ability (in theory) of being flexible about what hash function we use, which means we can switch out city hash in the future if we want. I'd be reluctant to lose that flexibility because we add a new mostly unrelated public API. Related note - what do you think about making this API be 'hash agile' from the start? E.g. build in a hash algorithm selector from the start, using feature flags or something similar. It would be fine to have just a single implementation (murmur or city) at first. http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java: http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@29 PS4, Line 29: public class BloomFilter { The public API surface are of this class is much bigger than necessary. In particular the membership methods and the BloomKeyProbe class aren't necessary. I'm thinking something along these lines (modeled more closely on the Guava BloomFilter class): public class BloomFilter { public BloomFilter(int expectedInsertions, double fpp) { .. } public BloomFilter(int size, double fpp) { .. } public put(int value) { .. } public put(long value) { .. } ... } http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@57 PS4, Line 57: bytes = new byte[]{1}; Lots of shortlived allocations happening here. Given that the new hash API appears to take offset and len parameters, maybe the BloomFilter could allocate a length 8 byte[] upfront and just use that? http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java File java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java: http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/main/java/org/apache/kudu/util/CityHash.java@18 PS4, Line 18: package org.apache.kudu.util; Was this implementation copied from somewhere? Please include a comment if so, and retain the original header. http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java: http://gerrit.cloudera.org:8080/#/c/11077/4/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java@38 PS4, Line 38: ,n missing space here and below -- To view, visit http://gerrit.cloudera.org:8080/11077 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50c5dce4d24707c068c5ab94e5d311b12b3251b8 Gerrit-Change-Number: 11077 Gerrit-PatchSet: 4 Gerrit-Owner: [email protected] Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: [email protected] Gerrit-Comment-Date: Tue, 14 Aug 2018 23:58:44 +0000 Gerrit-HasComments: Yes
