Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11333 )
Change subject: [KUDU-2521] Java Implementation for BloomFilter ...................................................................... Patch Set 5: (18 comments) http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java File java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java: http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@30 PS5, Line 30: public class BloomFilter { Could you add Javadocs to the class as well as to all public non-visiblefortesting methods, explaining how they work and how to use them? http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@33 PS5, Line 33: private byte[] bitmap; Curious why you chose to implement your own bitmap instead of reusing java.util.BitSet? http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@36 PS5, Line 36: private HashFunction hashFunction; Some of these fields can be marked final because they're not modified after the constructor call, right? http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@39 PS5, Line 39: this.nBits = nBits; Should we precondition that this is at least 8? http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@40 PS5, Line 40: this.bitmap = bitmap; Likewise, should we precondition that this has space for at least 8 bits? http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@65 PS5, Line 65: nBytes Shouldn't this be nBits? I'm surprised this worked as-is; could you add a unit test that would fail or malfunction when this is nBytes? http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@74 PS5, Line 74: if (data) { : byteBuffer[0] = 1; : } else { : byteBuffer[0] = 0; : } Nit: byteBuffer[0] = data ? 1 : 0; http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@143 PS5, Line 143: private void updateBitmap(byte[] byteBuffer, int length) { Should we precondition that bitmap.size >= length? http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@151 PS5, Line 151: tmp = tmp + h2; Nit: tmp += h2; http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@155 PS5, Line 155: // This is for test. Methods whose visibility has been increased for testing should be annotated with: @InterfaceAudience.LimitedPrivate("Test”) Then you don't need the "This is for test" comments either. See https://github.com/apache/kudu/commit/abbde75e12f2275e1a286df60d788e9c5b411bcb for more details. http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@225 PS5, Line 225: private boolean checkIfContains(byte[] bytes) { Likewise, precondition on bytes.length vs. bitmap size here? http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@233 PS5, Line 233: bitpos Nit: bitPos (camel-case) http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@237 PS5, Line 237: tmp = tmp + h2; Nit: tmp += h2; http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@243 PS5, Line 243: private static double kNaturalLog2= 0.69314; Nit: reformat as: private static double kNaturalLog2 = 0.69314; (separate the assignment operator '=' from the variable name 'kNaturalLog2' with a space) http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@257 PS5, Line 257: n_bits, Nit: camel-case in Java, so 'nBits' http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@271 PS5, Line 271: private static long pickBit(long hash, int nBits) { Nit: maybe rename to GetBitIndex or some such, to emphasize that what's returned is an index into a bitmap? http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/main/java/org/apache/kudu/util/BloomFilter.java@276 PS5, Line 276: public String toString() { Nit: mind implementing this via MoreObjects.toStringHelper(), or even just a StringBuilder? Easier to follow, I think. http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java: http://gerrit.cloudera.org:8080/#/c/11333/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestBloomFilter.java@29 PS5, Line 29: private int kRandomSeed = 0xdeadbeef; By reusing the same seed, you'll get the exact same PRNG sequencing with every test run, so the test coverage will remain static. Why not seed using the system time or something like that? -- To view, visit http://gerrit.cloudera.org:8080/11333 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32673c008f9958088d281c2c198c543bfea2eb8e Gerrit-Change-Number: 11333 Gerrit-PatchSet: 5 Gerrit-Owner: jinxing6...@126.com Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Grant Henke <granthe...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: jinxing6...@126.com Gerrit-Comment-Date: Tue, 11 Sep 2018 22:06:12 +0000 Gerrit-HasComments: Yes