Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15511 )

Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table.
......................................................................


Patch Set 4:

(6 comments)

Patch set 4 submitted.

This patch address some of comments from patch set 3, and add initial 
microbenchmarking code.
After some exploration, I come to conclusion that quadratic probing does not 
fit well with robin hood hashing. We might be better off by dropping quadratic 
probing entirely and commit on robin hood linear probing only.

http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG@12
PS3, Line 12: bucket rebalancing after every insert.
> You told me before about short-circuiting the lookup but I didn't quite get
I've been looking at this. It looks like short-circuiting lookup only possible 
in linear probing mode because it exploit that linear probing behavior. 
Therefore, we need to drop quadratic probing mode first before we can fully 
implement all improvement that is possible in Robin Hood, linear probing.


http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG@14
PS3, Line 14: Instead of proactively swapping elements during insertion, the
> Also I think when probing for lookup we might consider trying the "start in
I'm still thinking how to do single pass insert. From what I understand, we 
should tweak the insertion pattern that is using Iterator. For example, 
FindProbeRow(), FindBuildRowBucket() should commit early whether they indeed 
plan to do insert or just testing existence.


http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h
File be/src/exec/hash-table.inline.h:

http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@115
PS3, Line 115: new_distance <= target_d
> Ah, I see what you mean. Somehow I was under impression that element that i
Done


http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@138
PS3, Line 138:         // swapped back into temporary location at index 
curr_idx.
> Sorry, this example is bad, because the final balanced bucked should be [-1
This check, unfortunately, need to stay because it is possible to swap with 
empty bucket in the case of quadratic probing. We should iron this out if we 
decide to drop quadratic probing.


http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@162
PS3, Line 162:   } else {
> Wonder if we should consider storing the bucket distance (calculated during
Patch Set 4 address O(1) distance measurement for linear probing.

For quadratic probing, because the probe steps follow Triangular number 
pattern, we supposedly can do ~O(1)  by doing its inverse function
https://en.wikipedia.org/wiki/Triangular_number#Triangular_roots_and_tests_for_triangular_numbers

However, I'm leaning towards dropping quadratic probing entirely, so I keep it 
as loop for quadratic probing.


http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@374
PS3, Line 374:     DCHECK(node_ != NULL);
> > And hash-table suppose to be thread-safe for read access.
Add rebalancing at the end of Iterator::SetTuple and it pass core tests.



--
To view, visit http://gerrit.cloudera.org:8080/15511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986
Gerrit-Change-Number: 15511
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Mar 2020 01:25:58 +0000
Gerrit-HasComments: Yes

Reply via email to