Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17592 )
Change subject: IMPALA-7635: Reducing HashTable size by packing it's buckets efficiently. ...................................................................... Patch Set 5: (12 comments) Did a first round. The code looks really great, I mostly had comments about naming conventions. http://gerrit.cloudera.org:8080/#/c/17592/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17592/5//COMMIT_MSG@9 PS5, Line 9: contiguos contiguous? http://gerrit.cloudera.org:8080/#/c/17592/5//COMMIT_MSG@61 PS5, Line 61: the the duplicated 'the' http://gerrit.cloudera.org:8080/#/c/17592/5//COMMIT_MSG@69 PS5, Line 69: ofthis of this http://gerrit.cloudera.org:8080/#/c/17592/5//COMMIT_MSG@73 PS5, Line 73: This would help measuring the impact of changes to the HashTable's : data structure and algorithm. It would be nice to some results in the commit message http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table-test.cc File be/src/exec/hash-table-test.cc: http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table-test.cc@744 PS5, Line 744: // Test to ensure the bucket size doesn't change accidentally. Nice! It could be also a static_assert at the class scope, e.g.: static_assert(HashTable::BUCKET_SIZE == 8, "size should be power of two"); http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h File be/src/exec/hash-table.h: http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h@653 PS5, Line 653: is_matched The namings are inconsistent a bit in this file. Some functions use underscore, some use CamelCase. We usually only use underscores if we return a reference to a data member. So it'd be probably better to use UpperCamelCase for all the new member functions. http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h@674 PS5, Line 674: dn nit: probably name it 'tdn' to be more clear? http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h@694 PS5, Line 694: uint8 Why it isn't BucketData? http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h@711 PS5, Line 711: getBd nit: we usually don't abbreviate, i.e. it should be 'getBucketData()' http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/exec/hash-table.h@712 PS5, Line 712: uint8* ptr = getPtr(); : BucketData * bdPtr = reinterpret_cast<BucketData *>(&ptr); : return *bdPtr; (&ptr) is the address of the local variable 'ptr'. Why not just simply return *reinterpret_cast<BucketData*>(getPtr()); http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/util/tagged-ptr.h File be/src/util/tagged-ptr.h: http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/util/tagged-ptr.h@66 PS5, Line 66: s nit: 'S', i.e. we are using UpperCamelCase for functions. http://gerrit.cloudera.org:8080/#/c/17592/5/be/src/util/tagged-ptr.h@103 PS5, Line 103: data nit: data_, i.e. we postfix member variables with '_' -- To view, visit http://gerrit.cloudera.org:8080/17592 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I72912ae9353b0d567a976ca712d2d193e035df9b Gerrit-Change-Number: 17592 Gerrit-PatchSet: 5 Gerrit-Owner: Amogh Margoor <[email protected]> Gerrit-Reviewer: Amogh Margoor <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 07 Jul 2021 16:49:38 +0000 Gerrit-HasComments: Yes
