Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15511 )

Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table.
......................................................................


Patch Set 10:

(5 comments)

Patch Set 10 adds several things:
- Fix bug where we forgot to swap distance (PSL) variable as we swap bucket.
- Create a separate ProbeRobinHood function for robin-hood algorithm.
- Add some new profile counters to measure the cost of rebalancing.

Below, I also add some of my own comment indicating several of DCHECK hacks 
that I'm not fully understand, but works. Patch set 10 pass core tests with 
this hacks, but we should definitely revisit them later to fully verify their 
correctness.

http://gerrit.cloudera.org:8080/#/c/15511/9/be/src/exec/hash-table.inline.h
File be/src/exec/hash-table.inline.h:

http://gerrit.cloudera.org:8080/#/c/15511/9/be/src/exec/hash-table.inline.h@92
PS9, Line 92: (LIKELY(step < nu
> In case of FindBuildRowBucket, it is quite tricky, because the caller of Fi
I went ahead by creating separate ProbeRobinHood function to solve this.


http://gerrit.cloudera.org:8080/#/c/15511/9/be/src/exec/hash-table.inline.h@132
PS9, Line 132:  if we got
> You are correct, thanks for pointing this bug! Will fix this in my next sub
Done


http://gerrit.cloudera.org:8080/#/c/15511/10/be/src/exec/partitioned-hash-join-builder.cc
File be/src/exec/partitioned-hash-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/15511/10/be/src/exec/partitioned-hash-join-builder.cc@1316
PS10, Line 1316:   DCHECK_EQ(replaced_constants.robin_hood, 0);
I'm not sure if the replacement count here should be 0. But looking at priors 
DCHECKs, it is probably right.


http://gerrit.cloudera.org:8080/#/c/15511/10/be/src/exec/partitioned-hash-join-builder.cc@1393
PS10, Line 1393:   DCHECK_REPLACE_COUNT(replaced, 2);
In Patch Set 10, I create a second Probe function special for Robin-Hood 
algorithm. This additions seems to add a second call-site to function "Equals". 
Changing this DCHECK and other related checks to 2 pass me from errors.


http://gerrit.cloudera.org:8080/#/c/15511/10/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/15511/10/be/src/exec/partitioned-hash-join-node.cc@1543
PS10, Line 1543:   DCHECK(replaced == 1 || replaced == 2 || replaced == 3 || 
replaced == 4) << replaced;
Similarly, since I'm adding a new Probe function, I think I should change this 
DCHECK, but I'm not sure what is the right check.



--
To view, visit http://gerrit.cloudera.org:8080/15511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986
Gerrit-Change-Number: 15511
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 02 Apr 2020 15:17:18 +0000
Gerrit-HasComments: Yes

Reply via email to