Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12588 )
Change subject: KUDU-2038-p1: Introduce CRoaring ...................................................................... Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/12588/2//COMMIT_MSG Commit Message: PS2: Given that this patch contains some tests to benchmark CRoaring, I think it's worth adding some numbers to the commit message. http://gerrit.cloudera.org:8080/#/c/12588/2//COMMIT_MSG@7 PS2, Line 7: KUDU-2038-p1: Introduce CRoaring In your original patch, you had two abstractions: "data" (i.e. bitmaps and indexing), and "readers/writers" (i.e. iterators and writers) that used the bitmap data. What do you think about doing the separation as: - bitmaps: a class that provides a clean API to the indexes/readers/writers - indexes, iterators, and writers: the logical components that use bitmaps to evaluate predicates, write blocks, etc. I think this would be cleaner because it: 1) hides some of the CRoaring-specific bits and encapsulates them in one place, 2) would therefore allow us to more easily move to a different bitmapping library in the future if we wanted to, and 3) would allow us reuse these "Kudu" bitmaps in other places if we wanted to without digging through CRoaring APIs. Defining a good API, though, requires that we have a solid understanding of how we will use it, that is, the "indexes, iterators, and writers". Since you have that context, do you agree with this separation? Or do you prefer the "data" and "reader/writers" approach that you had? FWIW I also think it would make it easier to check in this change; adding CRoaring alone with nothing else seems a bit bare. Ideally, each patch has a clear trajectory towards usefulness. http://gerrit.cloudera.org:8080/#/c/12588/2//COMMIT_MSG@9 PS2, Line 9: Introduce CRoaring and do a simple memory and disk consumption tests. nit: It may be obvious with the context we have, but it's probably worth noting how this fits into a larger feature. E.g.: This library will be used as the underlying bitmap for the upcoming bitmap indexing feature. http://gerrit.cloudera.org:8080/#/c/12588/2/cmake_modules/FindCRoaring.cmake File cmake_modules/FindCRoaring.cmake: http://gerrit.cloudera.org:8080/#/c/12588/2/cmake_modules/FindCRoaring.cmake@18 PS2, Line 18: # - Find CRoaring (roaring/roaring.hh, libroaring.a, and libroaring.so) Are all of these necessary? E.g. could we get by with just the static lib? http://gerrit.cloudera.org:8080/#/c/12588/2/src/kudu/tablet/index/croaring-test.cc File src/kudu/tablet/index/croaring-test.cc: PS2: nit: I think this makes more sense put into the /util directory, given there isn't anything tablet-specific about CRoaring, unless you pull more scope into this patch. http://gerrit.cloudera.org:8080/#/c/12588/2/src/kudu/tablet/index/croaring-test.cc@37 PS2, Line 37: TEST_TYPE_SINGLE, : TEST_TYPE_PAIR, : TEST_TYPE_SKIP nit: add comments documenting what these are. http://gerrit.cloudera.org:8080/#/c/12588/2/src/kudu/tablet/index/croaring-test.cc@66 PS2, Line 66: t.n_bytes_array_containers + : t.n_bytes_run_containers + : t.n_bytes_bitset_containers; How is this different from getSizeInBytes? http://gerrit.cloudera.org:8080/#/c/12588/2/src/kudu/tablet/index/croaring-test.cc@119 PS2, Line 119: printf(" Single Pair Skip\n"); nit: I think there's a DataTable class you can use that helps print things out nicely. Same above. http://gerrit.cloudera.org:8080/#/c/12588/2/src/kudu/tablet/index/croaring-test.cc@120 PS2, Line 120: *6*/ Remove this. -- To view, visit http://gerrit.cloudera.org:8080/12588 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I878da34e388fb402c52c51364a68e738f785673d Gerrit-Change-Number: 12588 Gerrit-PatchSet: 2 Gerrit-Owner: helifu <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 05 Mar 2019 00:27:27 +0000 Gerrit-HasComments: Yes
