Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17592 )

Change subject: IMPALA-7635: Reducing HashTable size by packing it's buckets 
efficiently.
......................................................................


Patch Set 5:

(12 comments)

Did a first round. The code looks really great, I mostly had comments about 
naming conventions.

http://gerrit.cloudera.org:8080/#/c/17592/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17592/5//COMMIT_MSG@9
PS5, Line 9: contiguos
contiguous?


http://gerrit.cloudera.org:8080/#/c/17592/5//COMMIT_MSG@61
PS5, Line 61: the the
duplicated 'the'


http://gerrit.cloudera.org:8080/#/c/17592/5//COMMIT_MSG@69
PS5, Line 69: ofthis
of this


http://gerrit.cloudera.org:8080/#/c/17592/5//COMMIT_MSG@73
PS5, Line 73: This would help measuring the impact of changes to the HashTable's
            : data structure and algorithm.
It would be nice to some results in the commit message


http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table-test.cc
File be/src/exec/hash-table-test.cc:

http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table-test.cc@744
PS5, Line 744: // Test to ensure the bucket size doesn't change accidentally.
Nice! It could be also a static_assert at the class scope, e.g.:

  static_assert(HashTable::BUCKET_SIZE == 8, "size should be power of two");


http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h@653
PS5, Line 653: is_matched
The namings are inconsistent a bit in this file. Some functions use underscore, 
some use CamelCase.

We usually only use underscores if we return a reference to a data member.

So it'd be probably better to use UpperCamelCase for all the new member 
functions.


http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h@674
PS5, Line 674: dn
nit: probably name it 'tdn' to be more clear?


http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h@694
PS5, Line 694: uint8
Why it isn't BucketData?


http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h@711
PS5, Line 711: getBd
nit: we usually don't abbreviate, i.e. it should be 'getBucketData()'


http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h@712
PS5, Line 712:       uint8* ptr = getPtr();
             :       BucketData * bdPtr = reinterpret_cast<BucketData *>(&ptr);
             :       return *bdPtr;
(&ptr) is the address of the local variable 'ptr'.

Why not just simply

 return *reinterpret_cast<BucketData*>(getPtr());


http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/util/tagged-ptr.h
File be/src/util/tagged-ptr.h:

http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/util/tagged-ptr.h@66
PS5, Line 66: s
nit: 'S', i.e. we are using UpperCamelCase for functions.


http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/util/tagged-ptr.h@103
PS5, Line 103: data
nit: data_, i.e. we postfix member variables with '_'



--
To view, visit http://gerrit.cloudera.org:8080/17592
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72912ae9353b0d567a976ca712d2d193e035df9b
Gerrit-Change-Number: 17592
Gerrit-PatchSet: 5
Gerrit-Owner: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 07 Jul 2021 16:49:38 +0000
Gerrit-HasComments: Yes

Reply via email to