Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17592 )
Change subject: IMPALA-7635: Reducing HashTable size by packing it's buckets efficiently. ...................................................................... Patch Set 9: (33 comments) Looks pretty good. I have some additional comments, mostly nits. http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@9 PS9, Line 9: HashTable implementation comprises of contiguous nit. in Impala http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@10 PS9, Line 10: would either : have Data or will nit. "contains either data or a pointer to linked" http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@15 PS9, Line 15: struct DuplicateNode { : bool matched; : DuplicateNode* next; : HtData htdata; : }; : : struct Bucket { : bool filled; : bool matched; : bool hasDuplicates; : uint32_t hash; : union { : HtData htdata; : DuplicateNode* duplicates; : } bucketData; : }; nit. The commit message is nice with many details. I wonder if it can be simplified a little bit by omitting some details. For example we could just provide the definition on Bucket here. http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@38 PS9, Line 38: intel nit. Intel http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@58 PS9, Line 58: As a part of patch, TaggedPointer is introduced which is template nit: a template class http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@59 PS9, Line 59: tag together in 64 bit integer nit. "tags together in 64-bit integers". http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@71 PS9, Line 71: Building Hash Table and Probing the table nit. use lower cases. http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@81 PS9, Line 81: 17% and 21% Nice. http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@83 PS9, Line 83: 0-1.5% Nice. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc File be/src/benchmarks/hash-table-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@47 PS9, Line 47: Hash Table Build: Function iters/ms 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile nit. Formatting. The space under "HashT able Build" is wasted. Suggest to break the entire line into two: Hash Table Build: Function iters/ms 10%ile 50%ile 90%ile 10%ile 50%ile 90%ile http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@50 PS9, Line 50: 65536_100 nit. Add a comment on the meaning of these two numbers. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@254 PS9, Line 254: void I think we should return bool here to indicate whether there are any troubles with hashing or insertion. DCHECK() is an no-op in non-debug build. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@261 PS9, Line 261: Status status; nit. This can be defined outside the for loop. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@267 PS9, Line 267: amespace build nit. can we use name space htbenchmark here? http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@279 PS9, Line 279: Build nit. We should check the return status from Build(). http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h File be/src/exec/hash-table.h: http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@694 PS9, Line 694: Tag bit 1 nit. this does not match the code at line 704. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@713 PS9, Line 713: TAGGED I wonder if why this argument is needed. Is it true that all hash tables have been converted to tagged version? It brings complexity. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/benchmark.cc File be/src/util/benchmark.cc: http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/benchmark.cc@143 PS9, Line 143: if (fn != NULL) { fn(benchmarks_[0].args); } nit. May run Clang-format. {} is not needed if the entire IF statement is in one line. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/benchmark.cc@174 PS9, Line 174: if (fn != NULL) { fn(benchmarks_[i].args); } nit. same as above. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc File be/src/util/tagged-ptr-test.cc: http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@27 PS9, Line 27: std::string nit const std::string& http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@42 PS9, Line 42: std::string nit. const std::string& http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@59 PS9, Line 59: false nit. I wonder if we use OWN=true version in BE. If so, you may test it well. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@92 PS9, Line 92: ptr.SetTagBit0(); : ptr.SetTagBit1(); nit. May include setting tag bit 3, 4,5 and 6. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@97 PS9, Line 97: 96 nit. Please use 0x60 here to indicate which bits are set better. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@102 PS9, Line 102: 32 nit same as above. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@144 PS9, Line 144: nit remove the space. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h File be/src/util/tagged-ptr.h: http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@54 PS9, Line 54: UnSet nit. Unset http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@46 PS9, Line 46: // Boiler Plate Code for Setting/Unsetting/Checking Tags : #define SET_UNSET_IS_TAG(bit) \ : static const uintptr_t DATA_MASK_##bit = (1ULL << (63 - bit)); \ : static const uintptr_t DATA_MASK_INVERSE_##bit = ~(1ULL << (63 - bit)); \ : ALWAYS_INLINE void SetTagBit##bit() { \ : data_ = (data_ | DATA_MASK_##bit); \ : } \ : \ : ALWAYS_INLINE void UnSetTagBit##bit() { \ : data_ = (data_ & DATA_MASK_INVERSE_##bit); \ : } \ : \ : ALWAYS_INLINE bool IsTagBitSet##bit() const { \ : return (data_ & DATA_MASK_##bit); \ : } nit. May run ClangFormat on the entire macro to align '\' nicely. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@99 PS9, Line 99: // Member functions for Setting/Unsetting/Checking tags for 0-6 nit. tag bits 0-6. http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@100 PS9, Line 100: // E.g. for tag bit 0, : // ALWAYS_INLINE void SetTagBit0() { : // data_ = (data_ | DATA_MASK_0); : // } nit. This can be moved to where the macro is defined (e.g. line 46). http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@104 PS9, Line 104: // ALWAYS_INLINE void UnSetTagBit##bit() { : // data_ = (data_ & DATA_MASK_INVERSE_##bit); : // } : // ALWAYS_INLINE bool IsTagBitSet##bit() const { : // return (data_ & DATA_MASK_INVERSE_##bit); : // } nit. remove? http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@137 PS9, Line 137: ~MASK_0_56_BITS This can be made a constant to avoid compute ~ every time SetPtr() is called. Not sure if C++ constant folds it. http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h File be/src/util/tagged-ptr.h: http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@101 PS7, Line 101: // ALWAYS_INLINE void SetTagBit0() { : // data_ = (data_ | DATA_MASK_0); : // } > These are comparison of the tagged pointers which includes tag. For compari Done -- To view, visit http://gerrit.cloudera.org:8080/17592 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72912ae9353b0d567a976ca712d2d193e035df9b Gerrit-Change-Number: 17592 Gerrit-PatchSet: 9 Gerrit-Owner: Amogh Margoor <[email protected]> Gerrit-Reviewer: Amogh Margoor <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 05 Aug 2021 15:33:41 +0000 Gerrit-HasComments: Yes
