So, let me sum this up, the way I understand the current status.

1) overall, the patch seems to be a clear performance improvement

There's far more "green" cells than "red" ones in the spreadsheets, and the patch often shaves off 30-75% of the sort duration. Improvements are pretty much all over the board, for all data sets (low/high/unique cardinality, initial ordering) and data types.

2) it's unlikely we can improve the performance further

The regressions are limited to low work_mem settings, which we believe are not representative (or at least not as much as the higher work_mem values), for two main reasons.

Firstly, if you need to sort a lot of data (e.g. 10M, as benchmarked), it's quite reasonable to use larger work_mem values. It'd be a bit backwards to reject a patch that gets you 2-4x speedup with enough memory, on the grounds that it may have negative impact with unreasonably small work_mem values.

Secondly, master is faster only if there's enough on-CPU cache for the replacement sort (for the memtuples heap), but the benchmark is not realistic in this respect as it only ran 1 query at a time, so it used the whole cache (6MB for i5, 12MB for Xeon).

In reality there will be multiple processes running at the same time (e.g backends when running parallel query), significantly reducing the amount of cache per process, making the replacement sort inefficient and thus eliminating the regressions (by making the master slower).

3) replacement_sort_mem GUC

I'm not quite sure what's the plan with this GUC. It was useful for development, but it seems to me it's pretty difficult to tune it in practice (especially if you don't know the internals, which users generally don't).

The current patch includes the new GUC right next to work_mem, which seems rather unfortunate - I do expect users to simply mess with assuming "more is better" which seems to be rather poor idea.

So I think we should either remove the GUC entirely, or move it to the developer section next to trace_sort (and removing it from the conf).

I'm wondering whether 16MB default is not a bit too much, actually. As explained before, that's not the amount of cache we should expect per process, so maybe ~2-4MB would be a better default value?

Also, not what I'm re-reading the docs for the GUC, I realize it also depends on how the input data is correlated - that seems like a rather useless criteria for tuning, though, because that varies per sort node, so using that for a GUC value set in postgresql.conf does not seem very wise. Actually even on per-query basis that's rather dubious, as it depends on how the sort node gets data (some nodes preserve ordering, some don't).

BTW couldn't we tune the value automatically for each sort, using the pg_stats.correlation for the sort keys, when available (increasing the replacement_sort_mem when correlation is close to 1.0)? Wouldn't that improve at least some of the regressions?


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to