Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17592 )

Change subject: IMPALA-7635: Reducing HashTable size by packing it's buckets 
efficiently.
......................................................................


Patch Set 9:

(33 comments)

Looks pretty good. I have some additional comments, mostly nits.

http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@9
PS9, Line 9: HashTable implementation comprises of contiguous
nit. in Impala


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@10
PS9, Line 10: would either
            : have Data or will
nit. "contains either data or a pointer to linked"


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@15
PS9, Line 15:   struct DuplicateNode {
            :     bool matched;
            :     DuplicateNode* next;
            :     HtData htdata;
            :   };
            :
            :   struct Bucket {
            :     bool filled;
            :     bool matched;
            :     bool hasDuplicates;
            :     uint32_t hash;
            :     union {
            :       HtData htdata;
            :       DuplicateNode* duplicates;
            :     } bucketData;
            :   };
nit. The commit message is nice with many details. I wonder if it can be 
simplified a little bit by omitting some details. For example we could just 
provide the definition on Bucket here.


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@38
PS9, Line 38: intel
nit. Intel


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@58
PS9, Line 58: As a part of patch, TaggedPointer is introduced which is template
nit: a template class


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@59
PS9, Line 59: tag together in 64 bit integer
nit. "tags together in 64-bit integers".


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@71
PS9, Line 71: Building Hash Table and Probing the table
nit. use lower cases.


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@81
PS9, Line 81: 17% and 21%
Nice.


http://gerrit.cloudera.org:8080/#/c/17592/9//COMMIT_MSG@83
PS9, Line 83: 0-1.5%
Nice.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc
File be/src/benchmarks/hash-table-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@47
PS9, Line 47: Hash Table Build:          Function  iters/ms   10%ile   50%ile   
90%ile     10%ile     50%ile     90%ile
nit. Formatting. The space under "HashT able Build" is wasted. Suggest to break 
the entire line into two:

Hash Table Build:
Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@50
PS9, Line 50: 65536_100
nit. Add a comment on the meaning of these two numbers.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@254
PS9, Line 254: void
I think we should return bool here to indicate whether there are any troubles 
with hashing or insertion.

DCHECK() is an no-op in non-debug build.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@261
PS9, Line 261: Status status;
nit. This can be defined outside the for loop.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@267
PS9, Line 267: amespace build
nit. can we use name space htbenchmark here?


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/benchmarks/hash-table-benchmark.cc@279
PS9, Line 279: Build
nit. We should check the return status from Build().


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@694
PS9, Line 694: Tag bit 1
nit. this does not match the code at line 704.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/exec/hash-table.h@713
PS9, Line 713: TAGGED
I wonder if why this argument is needed. Is it true that all hash tables have 
been converted to tagged version?

It brings complexity.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/benchmark.cc
File be/src/util/benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/benchmark.cc@143
PS9, Line 143: if (fn != NULL) { fn(benchmarks_[0].args); }
nit. May run Clang-format. {} is not needed if the entire IF statement is in 
one line.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/benchmark.cc@174
PS9, Line 174:  if (fn != NULL) { fn(benchmarks_[i].args); }
nit. same as above.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc
File be/src/util/tagged-ptr-test.cc:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@27
PS9, Line 27: std::string
nit const std::string&


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@42
PS9, Line 42: std::string
nit. const std::string&


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@59
PS9, Line 59: false
nit. I wonder if we use OWN=true version in BE. If so, you may test it well.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@92
PS9, Line 92: ptr.SetTagBit0();
            :   ptr.SetTagBit1();
nit. May include setting tag bit 3, 4,5 and 6.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@97
PS9, Line 97: 96
nit. Please use 0x60 here to indicate which bits are set better.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@102
PS9, Line 102: 32
nit same as above.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr-test.cc@144
PS9, Line 144:
nit remove the space.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h
File be/src/util/tagged-ptr.h:

http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@54
PS9, Line 54: UnSet
nit. Unset


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@46
PS9, Line 46: // Boiler Plate Code for Setting/Unsetting/Checking Tags
            : #define SET_UNSET_IS_TAG(bit)                                     
          \
            :   static const uintptr_t DATA_MASK_##bit = (1ULL << (63 - bit));  
        \
            :   static const uintptr_t DATA_MASK_INVERSE_##bit = ~(1ULL << (63 
- bit)); \
            :   ALWAYS_INLINE void SetTagBit##bit() {                           
          \
            :     data_ = (data_ | DATA_MASK_##bit);                            
          \
            :   }                                                               
          \
            :                                                                   
          \
            :   ALWAYS_INLINE void UnSetTagBit##bit() {                         
          \
            :     data_ = (data_ & DATA_MASK_INVERSE_##bit);                    
          \
            :   }                                                               
          \
            :                                                                   
          \
            :   ALWAYS_INLINE bool IsTagBitSet##bit() const {                   
          \
            :     return (data_ & DATA_MASK_##bit);                             
  \
            :   }
nit. May run ClangFormat on the entire macro to align '\' nicely.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@99
PS9, Line 99:   // Member functions for Setting/Unsetting/Checking tags for 0-6
nit. tag bits 0-6.


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@100
PS9, Line 100: // E.g. for tag bit 0,
             :   //   ALWAYS_INLINE void SetTagBit0() {
             :   //     data_ = (data_ | DATA_MASK_0);
             :   //   }
nit. This can be moved to where the macro is defined (e.g. line 46).


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@104
PS9, Line 104: //   ALWAYS_INLINE void UnSetTagBit##bit() {
             :   //     data_ = (data_ & DATA_MASK_INVERSE_##bit);
             :   //   }
             :   //   ALWAYS_INLINE bool IsTagBitSet##bit() const {
             :   //     return (data_ & DATA_MASK_INVERSE_##bit);
             :   //   }
nit. remove?


http://gerrit.cloudera.org:8080/#/c/17592/9/be/src/util/tagged-ptr.h@137
PS9, Line 137: ~MASK_0_56_BITS
This can be made a constant to avoid compute ~ every time SetPtr() is called. 
Not sure if C++ constant folds it.


http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h
File be/src/util/tagged-ptr.h:

http://gerrit.cloudera.org:8080/#/c/17592/7/be/src/util/tagged-ptr.h@101
PS7, Line 101: //   ALWAYS_INLINE void SetTagBit0() {
             :   //     data_ = (data_ | DATA_MASK_0);
             :   //   }
> These are comparison of the tagged pointers which includes tag. For compari
Done



--
To view, visit http://gerrit.cloudera.org:8080/17592
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72912ae9353b0d567a976ca712d2d193e035df9b
Gerrit-Change-Number: 17592
Gerrit-PatchSet: 9
Gerrit-Owner: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 05 Aug 2021 15:33:41 +0000
Gerrit-HasComments: Yes

Reply via email to