Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15511 )
Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table. ...................................................................... Patch Set 10: (5 comments) Patch Set 10 adds several things: - Fix bug where we forgot to swap distance (PSL) variable as we swap bucket. - Create a separate ProbeRobinHood function for robin-hood algorithm. - Add some new profile counters to measure the cost of rebalancing. Below, I also add some of my own comment indicating several of DCHECK hacks that I'm not fully understand, but works. Patch set 10 pass core tests with this hacks, but we should definitely revisit them later to fully verify their correctness. http://gerrit.cloudera.org:8080/#/c/15511/9/be/src/exec/hash-table.inline.h File be/src/exec/hash-table.inline.h: http://gerrit.cloudera.org:8080/#/c/15511/9/be/src/exec/hash-table.inline.h@92 PS9, Line 92: (LIKELY(step < nu > In case of FindBuildRowBucket, it is quite tricky, because the caller of Fi I went ahead by creating separate ProbeRobinHood function to solve this. http://gerrit.cloudera.org:8080/#/c/15511/9/be/src/exec/hash-table.inline.h@132 PS9, Line 132: if we got > You are correct, thanks for pointing this bug! Will fix this in my next sub Done http://gerrit.cloudera.org:8080/#/c/15511/10/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/15511/10/be/src/exec/partitioned-hash-join-builder.cc@1316 PS10, Line 1316: DCHECK_EQ(replaced_constants.robin_hood, 0); I'm not sure if the replacement count here should be 0. But looking at priors DCHECKs, it is probably right. http://gerrit.cloudera.org:8080/#/c/15511/10/be/src/exec/partitioned-hash-join-builder.cc@1393 PS10, Line 1393: DCHECK_REPLACE_COUNT(replaced, 2); In Patch Set 10, I create a second Probe function special for Robin-Hood algorithm. This additions seems to add a second call-site to function "Equals". Changing this DCHECK and other related checks to 2 pass me from errors. http://gerrit.cloudera.org:8080/#/c/15511/10/be/src/exec/partitioned-hash-join-node.cc File be/src/exec/partitioned-hash-join-node.cc: http://gerrit.cloudera.org:8080/#/c/15511/10/be/src/exec/partitioned-hash-join-node.cc@1543 PS10, Line 1543: DCHECK(replaced == 1 || replaced == 2 || replaced == 3 || replaced == 4) << replaced; Similarly, since I'm adding a new Probe function, I think I should change this DCHECK, but I'm not sure what is the right check. -- To view, visit http://gerrit.cloudera.org:8080/15511 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986 Gerrit-Change-Number: 15511 Gerrit-PatchSet: 10 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 02 Apr 2020 15:17:18 +0000 Gerrit-HasComments: Yes
