Hi,
On 2016-10-03 13:26:09 +0200, Arthur Silva wrote: > On Sat, Oct 1, 2016 at 2:44 AM, Andres Freund <and...@anarazel.de> wrote: > A couple of comments. > * 80% occupancy is a bit conservative for RH hashing, it works well up to > 90% if you use the early stops due to distance. So that TODO is worth > pursuing. I found 90% a tiny bit slower during modifications, due to the increased moving of cells around. I actually implemented that TODO at some point, it's not actually a very clear win for narrow elements and mid-sized tables - the additional instructions for distance computations cost. I've played with a different version of robin hood hashing, where the buckets are ordered by hash-value. I.e. the hashvalue is shifted right, instead of being masked, to get the hash bucket. That allows to have a hashtable which is ordered by the hash-value, thus distance checks simply become >=. The problem with that is that it only works if you have an "overflow" area at the end of the table, which is hard to size correctly. > * An optimized memory layout for RH hashmap is along the lines of > HHHKVKVKV, using one bit of the hash as the present/absent flag. That plays > specially well with large occupancies (~90%). Due to RH the probings on the > H array are very short and within a single cache line. Even with a 31bit > hash it's reallyyy rare to have to probe more than 1 entry in the KV array. > Essentially guaranteeing that the 99% percentile of lookups will only hit a > couple of cache lines (if you ignore crossing cache lines and other > details). That seems likely to end up being noticeably more complicated - I'm not sure the cpu overhead of the more complex lookups weighs of the benefits. I'd like to get the basics in, we can optimize further later on. Based on my instrumentation the majority of time is currently spent in the cache-miss for the initial bucket, everything else inside the hash table code is quite small. After that it's hash value computations in the execGrouping case. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers