Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15511 )
Change subject: WIP IMPALA-9434: Implement Robin Hood Hash Table. ...................................................................... Patch Set 3: (6 comments) Hi Tim, thanks for your valuable feedbacks! http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG@12 PS3, Line 12: robin hood algorithm to keep probe function of hash-table intact. > It's worth noting that this isn't the full robin-hood hashing approach, bec You told me before about short-circuiting the lookup but I didn't quite get it then. Now that you put it that way, I understand better what you mean. Yes, we can do the short-circuit lookup if we maintain the distance invariant. Right now, the invariant is not maintained because insert using iterator does not trigger rebalance. I'll check the code again and see if I can do something about it. http://gerrit.cloudera.org:8080/#/c/15511/3//COMMIT_MSG@14 PS3, Line 14: Instead of proactively swapping elements during insertion, the > I guess we can revisit this to see what the impact is, it seems like it pro Having separate Probe() function that is special for insert might remove the redundant second pass. I'll see if this is possible. http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h File be/src/exec/hash-table.inline.h: http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@115 PS3, Line 115: target_distance == 0 || > Why do we need this condition? My understanding was that you always want to Ah, I see what you mean. Somehow I was under impression that element that is already perfectly placed should not be evicted. Will remove this condition. http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@138 PS3, Line 138: if (curr_bucket->filled) { > Is it possible for this to be false? Why would there be an empty bucket *be Good point, I must have miss this while I refine my code. Will change this with DCHECK. http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@162 PS3, Line 162: inline int64_t HashTable::BucketDistance( > I guess calculating this for linear probing is relatively straightforward, I'll see if I can put O(1) distance measurement, at least for the linear probing. I did tried that before, and it failed PlannerTests. But that probably my implementation that still incorrect. http://gerrit.cloudera.org:8080/#/c/15511/3/be/src/exec/hash-table.inline.h@374 PS3, Line 374: table_->PrepareBucketForInsert(bucket_idx_, hash); > You could also rebalance there, I think, to benefit the agg. Or combine the If we rebalance here, it will reorder elements in array buckets_, which in turn breaking the order of iterator. And hash-table suppose to be thread-safe for read access. I'm worried if there are two iterator going where one do read only and the other do write through this SetTuple() function, the later iterator will break the former as well. Please correct me if my assumption is wrong, or if there is some guarantee that such case is impossible. I think I haven't fully understand the use case of this function. -- To view, visit http://gerrit.cloudera.org:8080/15511 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I28eeccd7f9ccae39e31972391f971901bcbfe986 Gerrit-Change-Number: 15511 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: David Rorke <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Tue, 24 Mar 2020 19:03:33 +0000 Gerrit-HasComments: Yes
