Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )
Change subject: Import Impala's blocked based BloomFilter ...................................................................... Patch Set 5: (25 comments) http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG Commit Message: PS5: > A few nits on writing commit messages as per guidelines mentioned at https: Done http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@7 PS5, Line 7: blocked based > block-based Done http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@9 PS5, Line 9: blocked based > block-based Done http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@10 PS5, Line 10: predicate > predicates Done http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@12 PS5, Line 12: part > a part Done http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@13 PS5, Line 13: kudu > Kudu Done http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@15 PS5, Line 15: generated > auto-generated Done http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@22 PS5, Line 22: This BF uses AVX2 operations which may not be available on target CPU when run. : So the AVX2 specific methods have been moved to a separate file and compiled with -mavx2 : flag and a runtime check is made to invoke the method only if the CPU supports AVX2. : : Compiler may not support AVX2, hence the file block_bloom_filter_avx2.cc is conditionally : compiled. Done. > In general do you know which segment(s) of our supported distro matrix fall > into state #1? Support for AVX2 instructions was added in GCC 4.7.0 and we require GCC 4.8+ for C+11 support. So support for AVX2 will always be available with GCC. https://gcc.gnu.org/gcc-4.7/changes.html Couldn't figure about Clang. http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt File src/kudu/util/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@248 PS5, Line 248: execute_process ( > Nit: execute_process( Done http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@256 PS5, Line 256: if (${AVX2_SUPPORT}) > I think this might work too: Done http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@258 PS5, Line 258: set_source_files_properties(block_bloom_filter_avx2.cc PROPERTIES COMPILE_FLAGS "-mavx2") > Seems like COMPILE_OPTIONS is preferred now: https://cmake.org/cmake/help/l Using the preferred method of COMPILE_OPTIONS here. http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc File src/kudu/util/block_bloom_filter-test.cc: http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc@88 PS5, Line 88: vector<shared_ptr<BlockBloomFilter>> bloom_filters_; > Based on usage, doesn't seem like shared ownership is warranted here. How a shared_ptr is needed so that Close() can be invoked on the BBFs created by the test cases using CreateBloomFilter(). If the BBF API used regular constructor/destructor to allocate and de-allocate memory then there won't be need to keep track of BFFs. However the API uses Init/Close to allow integration with Impala. http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc@124 PS5, Line 124: SeedRandom(); > You can SeedRandom() once in the test fixture constructor (or SetUp) instea Done http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h File src/kudu/util/block_bloom_filter.h: http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@37 PS5, Line 37: // A BloomFilter stores sets of items and offers a query operation indicating whether or > May want to add something here about how this is different from bloom_filte Done http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@63 PS5, Line 63: virtual Status AllocateBuffer(size_t bytes, void **ptr) = 0; : virtual void FreeBuffer(void *ptr) = 0; > Nit: void** and void* (space after the asterisk). Done http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@134 PS5, Line 134: // The BloomFilter is divided up into Buckets > Nit: comments should terminate with a period, here and elsewhere. Done http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@188 PS5, Line 188: : // Detect at run-time whether CPU supports AVX2 : bool has_avx2() const { : return !FLAGS_disable_blockbloomfilter_avx2 && cpu_.has_avx2(); : } > These checks are on the hot path; they add a few branches to each BF operat Done http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc File src/kudu/util/block_bloom_filter.cc: http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@33 PS5, Line 33: DEFINE_bool(disable_blockbloomfilter_avx2, false, > should this be marked hidden or experimental? Done http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@48 PS5, Line 48: cpu_(base::CPU()) {} > Depending on how often BBFs are constructed/destroyed, this may get expensi Made the cpu_ member static so it's initialized only once. http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@52 PS5, Line 52: "Close() should have been called before the object is destroyed."; > Yeah if the convention in Kudu is to release resources in the destructor, m Ack. Using Init/Close v/s the default constructor/destructor approach is not intuitive. However this comes from Impala and will allow for easier integration. http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@64 PS5, Line 64: DCHECK(log_num_buckets_ <= 32) << "Bloom filter too large. log_space_bytes: " > Yeah, this invariant I think came from the Impala planner not generating fi Done http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@138 PS5, Line 138: ATTRIBUTE_NO_SANITIZE_INTEGER > why do we not want to sanitize this and the above? it seems like overflow w Done http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@178 PS5, Line 178: if (has_avx2()) { > do all of our supported compiler toolchains support function multiversionin For GCC, we require 4.8+ so function multi-versioning is supported there. Addressed run-time branching issue using function pointers that are evaluated in constructor. Few points in favor of this approach: - We need to support the case where compiler doesn't support AVX2 so not sure how function multi-versioning will work if the compiler doesn't support the target architecture. - Using function multi-versioning, we won't be able to test both non-AVX2 and AVX2 variants on the same machine as currently using a simple toggle flag (at least I couldn't figure) http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@187 PS5, Line 187: int ret_code = posix_memalign(ptr, 64, bytes); > I think we have a constant somewhere (maybe port.h) for CACHELINE_SIZE or s Done http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@192 PS5, Line 192: DCHECK(ptr != nullptr); > or free(DCHECK_NOTNULL(ptr)) Done -- To view, visit http://gerrit.cloudera.org:8080/14745 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906 Gerrit-Change-Number: 14745 Gerrit-PatchSet: 5 Gerrit-Owner: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Thu, 05 Dec 2019 18:27:19 +0000 Gerrit-HasComments: Yes
