Consider the "cities" table I've played around with throughout the development of this patch:
postgres=# select tablename, attname, n_distinct, correlation
           from pg_stats
           where attname in ('country', 'province', 'city');
 tablename | attname  | n_distinct | correlation
-----------+----------+------------+-------------
 cities    | city     |  -0.666628 |  -0.0169657
 cities    | province |       1958 |   0.0776529
 cities    | country  |        206 |           1
(3 rows)

Consider this query:

select * from (select * from cities order by country, province, city offset 1000000) i;

With master, this consistently takes about 6.4 seconds on my laptop. With the patch I posted most recently, it takes about 3.5 seconds. But if I change this code:

+   if (ssup->position == sortKeyTiebreak && len1 == len2)
+   {

to this:

+   if (len1 == len2)
+   {

then my patch takes about 1.3 seconds to execute the query, since now we're always optimistic about the prospect of getting away with a cheap memcmp() when the lengths match, avoiding copying and strcoll() overhead entirely. "province" has a relatively low number of distinct values - about 1,958 in a table of 317,102 cities - so that optimism is clearly justified in this instance.

This suggests there is something to be said for being optimistic about memcmp() yielding equality in general, and not just when performing the abbreviated key tie-breaker on the leading attribute. Maybe the executor could pass a per-attribute n_distinct hint to tuplesort, which would pass it on to our sortsupport routine. Of course, given a low cardinality attribute whose text strings were all of equal length, and/or no correlation between "country" and "province", a lot of those opportunistic memcmp()s could go to waste. But that might be just fine, especially if we only did this for moderately short strings (say, less than 64 bytes).
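To make the idea concrete, here is a minimal sketch of an "optimistic memcmp()" comparator. It is not the patch's actual varstr sortsupport code - the function name and the fixed-size buffers are hypothetical - but it shows the key fact being exploited: bytewise-identical strings compare as equal under any collation, so a memcmp() that returns 0 lets us skip the copying and strcoll() entirely, while a nonzero memcmp() tells us nothing about collation order and forces the fallback:

```c
/*
 * Hypothetical sketch, not PostgreSQL's varstr_cmp(): try a cheap
 * memcmp() when lengths match, and fall back to strcoll() only when
 * the bytes differ.  Equal bytes imply equality in every collation.
 */
#include <assert.h>
#include <string.h>

static int
optimistic_varstr_cmp(const char *a, int len1, const char *b, int len2)
{
    char    buf1[1024];
    char    buf2[1024];         /* sketch assumes short-ish strings */

    if (len1 == len2 && memcmp(a, b, len1) == 0)
        return 0;               /* bytewise equal: no copying, no strcoll() */

    /* Mismatch (or unequal lengths): do it the expensive way. */
    assert(len1 < (int) sizeof(buf1) && len2 < (int) sizeof(buf2));
    memcpy(buf1, a, len1);
    buf1[len1] = '\0';
    memcpy(buf2, b, len2);
    buf2[len2] = '\0';
    return strcoll(buf1, buf2);
}
```

The asymmetry is what makes the gamble attractive: the win (skipping a copy plus a strcoll()) is large, while the loss on a mismatch is a partial memcmp() over memory we were about to touch anyway.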
I don't have a good sense of the costs yet, but the hashing that HyperLogLog requires in my patch appears to be almost free, since it occurs at a time when we're already bottlenecked on memory bandwidth, and is performed on memory that needed to be manipulated at that juncture anyway. It wouldn't surprise me if this general optimism about a simple memcmp() working out had a very acceptable worst case. It might even be practically free to be totally wrong about equality being likely, and if we have nothing to lose and everything to gain, clearly we should proceed.

I'm aware of cases where it makes probabilistic sense to spend compute bandwidth to save memory bandwidth; it's something I've seen crop up a number of times in various papers.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
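P.S. For anyone curious why the HyperLogLog accounting mentioned above can be nearly free: the per-value work is one hash, a shift, a compare, and at most a one-byte store. The following toy sketch is not the patch's implementation - FNV-1a and the 256-register layout are stand-ins chosen for brevity - but it shows the shape of that per-value cost:

```c
/*
 * Toy HyperLogLog cardinality estimator.  NOT the patch's code; the
 * hash function and register count are illustrative assumptions.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HLL_P 8
#define HLL_M (1 << HLL_P)      /* 256 registers */

static uint8_t hll_reg[HLL_M];

static uint64_t
fnv1a(const char *s, size_t len)
{
    uint64_t    h = 1469598103934665603ULL;

    for (size_t i = 0; i < len; i++)
    {
        h ^= (unsigned char) s[i];
        h *= 1099511628211ULL;
    }
    return h;
}

static void
hll_add(const char *s, size_t len)
{
    uint64_t    h = fnv1a(s, len);
    uint32_t    idx = h & (HLL_M - 1);  /* low bits pick a register */
    uint64_t    rest = h >> HLL_P;
    uint8_t     rank = 1;               /* 1 + trailing zeroes of rest */

    while ((rest & 1) == 0 && rank < 64 - HLL_P)
    {
        rank++;
        rest >>= 1;
    }
    if (rank > hll_reg[idx])
        hll_reg[idx] = rank;            /* the only possible write */
}

static double
hll_estimate(void)
{
    double      sum = 0.0;

    for (int i = 0; i < HLL_M; i++)
        sum += 1.0 / (double) (1ULL << hll_reg[i]);
    /* raw estimate, omitting the small/large-range corrections */
    return (0.7213 / (1.0 + 1.079 / HLL_M)) * HLL_M * HLL_M / sum;
}
```

Since the update touches a single byte in a 256-byte array that stays cache-resident, feeding it values we're already reading for the sort adds almost nothing on top of the existing memory traffic.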