Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 5:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG
Commit Message:

PS5:
> A few nits on writing commit messages as per guidelines mentioned at https:
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@7
PS5, Line 7: blocked based
> block-based
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@9
PS5, Line 9: blocked based
> block-based
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@10
PS5, Line 10: predicate
> predicates
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@12
PS5, Line 12: part
> a part
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@13
PS5, Line 13: kudu
> Kudu
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@15
PS5, Line 15: generated
> auto-generated
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@22
PS5, Line 22: This BF uses AVX2 operations which may not be available on target 
CPU when run.
            : So the AVX2 specific methods have been moved to a separate file 
and compiled with -mavx2
            : flag and a runtime check is made to invoke the method only if the 
CPU supports AVX2.
            :
            : Compiler may not support AVX2, hence the file 
block_bloom_filter_avx2.cc is conditionally
            : compiled.
Done.

> In general do you know which segment(s) of our supported distro matrix fall 
> into state #1?

Support for AVX2 instructions was added in GCC 4.7.0 and we require GCC 4.8+ 
for C+11 support. So support for AVX2 will always be available with GCC. 
https://gcc.gnu.org/gcc-4.7/changes.html

Couldn't figure about Clang.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@248
PS5, Line 248: execute_process (
> Nit: execute_process(
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@256
PS5, Line 256: if (${AVX2_SUPPORT})
> I think this might work too:
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@258
PS5, Line 258:   set_source_files_properties(block_bloom_filter_avx2.cc 
PROPERTIES COMPILE_FLAGS "-mavx2")
> Seems like COMPILE_OPTIONS is preferred now: https://cmake.org/cmake/help/l
Using the preferred method of COMPILE_OPTIONS here.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc
File src/kudu/util/block_bloom_filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc@88
PS5, Line 88:   vector<shared_ptr<BlockBloomFilter>> bloom_filters_;
> Based on usage, doesn't seem like shared ownership is warranted here. How a
shared_ptr is needed so that Close() can be invoked on the BBFs created by the 
test cases using CreateBloomFilter().

If the BBF API used regular constructor/destructor to allocate and de-allocate 
memory then there won't be need to keep track of BFFs. However the API uses 
Init/Close to allow integration with Impala.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc@124
PS5, Line 124:   SeedRandom();
> You can SeedRandom() once in the test fixture constructor (or SetUp) instea
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h
File src/kudu/util/block_bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@37
PS5, Line 37: // A BloomFilter stores sets of items and offers a query 
operation indicating whether or
> May want to add something here about how this is different from bloom_filte
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@63
PS5, Line 63:   virtual Status AllocateBuffer(size_t bytes, void **ptr) = 0;
            :   virtual void FreeBuffer(void *ptr) = 0;
> Nit: void** and void* (space after the asterisk).
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@134
PS5, Line 134:   // The BloomFilter is divided up into Buckets
> Nit: comments should terminate with a period, here and elsewhere.
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@188
PS5, Line 188:
             :   // Detect at run-time whether CPU supports AVX2
             :   bool has_avx2() const {
             :     return !FLAGS_disable_blockbloomfilter_avx2 && 
cpu_.has_avx2();
             :   }
> These checks are on the hot path; they add a few branches to each BF operat
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@33
PS5, Line 33: DEFINE_bool(disable_blockbloomfilter_avx2, false,
> should this be marked hidden or experimental?
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@48
PS5, Line 48:   cpu_(base::CPU()) {}
> Depending on how often BBFs are constructed/destroyed, this may get expensi
Made the cpu_ member static so it's initialized only once.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@52
PS5, Line 52:     "Close() should have been called before the object is 
destroyed.";
> Yeah if the convention in Kudu is to release resources in the destructor, m
Ack. Using Init/Close v/s the default constructor/destructor approach is not 
intuitive. However this comes from Impala and will allow for easier integration.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@64
PS5, Line 64:   DCHECK(log_num_buckets_ <= 32) << "Bloom filter too large. 
log_space_bytes: "
> Yeah, this invariant I think came from the Impala planner not generating fi
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@138
PS5, Line 138: ATTRIBUTE_NO_SANITIZE_INTEGER
> why do we not want to sanitize this and the above? it seems like overflow w
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@178
PS5, Line 178:   if (has_avx2()) {
> do all of our supported compiler toolchains support function multiversionin
For GCC, we require 4.8+ so function multi-versioning is supported there.

Addressed run-time branching issue using function pointers that are evaluated 
in constructor.

Few points in favor of this approach:
- We need to support the case where compiler doesn't support AVX2 so not sure 
how function multi-versioning will work if the compiler doesn't support the 
target architecture.
- Using function multi-versioning, we won't be able to test both non-AVX2 and 
AVX2 variants on the same machine as currently using a simple toggle flag (at 
least I couldn't figure)


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@187
PS5, Line 187:   int ret_code = posix_memalign(ptr, 64, bytes);
> I think we have a constant somewhere (maybe port.h) for CACHELINE_SIZE or s
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@192
PS5, Line 192:   DCHECK(ptr != nullptr);
> or free(DCHECK_NOTNULL(ptr))
Done



--
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Thu, 05 Dec 2019 18:27:19 +0000
Gerrit-HasComments: Yes

Reply via email to