Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15511 )

Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table.
......................................................................


Patch Set 3:

(6 comments)

Hi Tim, thanks for your valuable feedbacks!

http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG@12
PS3, Line 12: robin hood algorithm to keep probe function of hash-table intact.
> It's worth noting that this isn't the full robin-hood hashing approach, bec
You told me before about short-circuiting the lookup but I didn't quite get it 
then. Now that you put it that way, I understand better what you mean.

Yes, we can do the short-circuit lookup if we maintain the distance invariant.  
Right now, the invariant is not maintained because insert using iterator does 
not trigger rebalance. I'll check the code again and see if I can do something 
about it.


http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG@14
PS3, Line 14: Instead of proactively swapping elements during insertion, the
> I guess we can revisit this to see what the impact is, it seems like it pro
Having separate Probe() function that is special for insert might remove the 
redundant second pass. I'll see if this is possible.


http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h
File be/src/exec/hash-table.inline.h:

http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@115
PS3, Line 115: target_distance == 0 ||
> Why do we need this condition? My understanding was that you always want to
Ah, I see what you mean. Somehow I was under impression that element that is 
already perfectly placed should not be evicted. Will remove this condition.


http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@138
PS3, Line 138:       if (curr_bucket->filled) {
> Is it possible for this to be false? Why would there be an empty bucket *be
Good point, I must have miss this while I refine my code. Will change this with 
DCHECK.


http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@162
PS3, Line 162: inline int64_t HashTable::BucketDistance(
> I guess calculating this for linear probing is relatively straightforward,
I'll see if I can put O(1) distance measurement, at least for the linear 
probing. I did tried that before, and it failed PlannerTests. But that probably 
my implementation that still incorrect.


http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@374
PS3, Line 374:   table_->PrepareBucketForInsert(bucket_idx_, hash);
> You could also rebalance there, I think, to benefit the agg. Or combine the
If we rebalance here, it will reorder elements in array buckets_, which in turn 
breaking the order of iterator.
And hash-table suppose to be thread-safe for read access.
I'm worried if there are two iterator going where one do read only and the 
other do write through this SetTuple() function, the later iterator will break 
the former as well.
Please correct me if my assumption is wrong, or if there is some guarantee that 
such case is impossible. I think I haven't fully understand the use case of 
this function.



--
To view, visit http://gerrit.cloudera.org:8080/15511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986
Gerrit-Change-Number: 15511
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: David Rorke <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 24 Mar 2020 19:03:33 +0000
Gerrit-HasComments: Yes

Reply via email to