So, let me sum up the current status as I understand it.
1) overall, the patch seems to be a clear performance improvement
There are far more "green" cells than "red" ones in the spreadsheets, and
the patch often shaves off 30-75% of the sort duration. Improvements are
pretty much all over the board, for all data sets (low/high/unique
cardinality, initial ordering) and data types.
2) it's unlikely we can (or need to) fix the remaining regressions
The regressions are limited to low work_mem settings, which we believe
are not representative (or at least not as much as the higher work_mem
values), for two main reasons.
Firstly, if you need to sort a lot of data (e.g. 10M, as benchmarked),
it's quite reasonable to use larger work_mem values. It'd be a bit
backwards to reject a patch that gets you 2-4x speedup with enough
memory, on the grounds that it may have negative impact with
unreasonably small work_mem values.
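For example (a hypothetical session - table name and the 256MB value are
made up, the point is simply that work_mem can be raised just for the
expensive sort rather than globally):

```sql
-- Give the big sort enough memory for this session only, instead of
-- judging the patch by artificially tiny global work_mem settings.
SET work_mem = '256MB';
SELECT * FROM events ORDER BY created_at;  -- e.g. the 10M-row sort
RESET work_mem;
```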
Secondly, master is faster only if there's enough CPU cache for the
replacement sort (for the memtuples heap), but the benchmark is not
realistic in this respect, as it only ran one query at a time, so that
query had the whole cache to itself (6MB on the i5, 12MB on the Xeon).
In reality there will be multiple processes running at the same time
(e.g. backends when running parallel queries), significantly reducing the
amount of cache per process, making the replacement sort inefficient and
thus eliminating the regressions (by making master slower).
3) replacement_sort_mem GUC
I'm not quite sure what the plan with this GUC is. It was useful for
development, but it seems pretty difficult to tune in practice
(especially if you don't know the internals, which users generally don't).
The current patch includes the new GUC right next to work_mem, which
seems rather unfortunate - I do expect users to simply mess with it,
assuming "more is better", which seems to be a rather poor idea.
So I think we should either remove the GUC entirely, or move it to the
developer section next to trace_sort (and remove it from the conf).
I'm also wondering whether the 16MB default isn't a bit too high,
actually. As explained above, that's not the amount of cache we should
expect per process, so maybe ~2-4MB would be a better default value?
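To illustrate the suggestion (a hypothetical session; trace_sort is the
existing developer GUC mentioned above, and the 4MB value is just the
ballpark proposed here, not a tested recommendation):

```sql
-- Treat replacement_sort_mem as a developer knob, set per session
-- only when experimenting - much like the other developer options.
SET replacement_sort_mem = '4MB';  -- below a realistic per-process cache share
SET trace_sort = on;               -- the developer GUC it would sit next to
```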
Also, now that I'm re-reading the docs for the GUC, I realize it also
depends on how correlated the input data is - that seems like a rather
useless criterion for tuning, though, because it varies per sort node,
so using it for a GUC value set in postgresql.conf does not seem very
wise. Actually, even on a per-query basis it's rather dubious, as it
depends on how the sort node receives the data (some nodes preserve
ordering, some don't).
BTW couldn't we tune the value automatically for each sort, using
pg_stats.correlation for the sort keys, when available (increasing
replacement_sort_mem when the correlation is close to 1.0)? Wouldn't
that fix at least some of the regressions?
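Roughly what such a heuristic (or a curious DBA) would look at - the
correlation column of pg_stats does exist, but the table/column names
below are made up for illustration:

```sql
-- Physical vs. logical ordering of a prospective sort key.
-- Values close to 1.0 (or -1.0) mean the input is nearly ordered,
-- which is the case where a larger replacement_sort_mem might help.
SELECT tablename, attname, correlation
  FROM pg_stats
 WHERE tablename = 'events'       -- hypothetical table
   AND attname   = 'created_at';  -- hypothetical sort key
```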
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Sent via pgsql-hackers mailing list (email@example.com)