On Thu, Apr 7, 2016 at 6:55 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> In reality there will be multiple processes running at the same time (e.g
>> backends when running parallel query), significantly reducing the amount of
>> cache per process, making the replacement sort inefficient and thus
>> eliminating the regressions (by making the master slower).
> Interesting point.

The effective use of CPU cache is *absolutely* critical here. I think
that this patch is valuable primarily because it makes sorting
predictable, and only secondarily because it makes it much faster.
Having discrete costs that can be modeled fairly accurately has
significant practical benefits for DBAs, and for query optimization,
especially when parallel worker sorts must be costed. Inefficient use
of CPU cache implies a big overall cost for the server, not just one
client; my sorting patches are usually tested on single client cases,
but the multi-client cases can be a lot more sympathetic (we saw this
with abbreviated keys at one point).

I wonder how many DBAs are put off by higher work_mem settings due to
issues with replacement selection....they are effectively denied the
ability to set work_mem appropriately across the board, because of
this one weak spot. It really is perverse that there is, in effect, a
"Blackjack" cost function for sorts, which runs counter to the general
intuition that more memory is better.

> I certainly agree that GUCs that aren't easy to tune are bad.  I'm
> wondering whether the fact that this one is hard to tune is something
> that can be fixed.  The comments about "padding" - a term I don't
> like, because it to me implies a deliberate attempt to game the
> benchmark when in reality wanting to sort a wide row is entirely
> reasonable - make me wonder if this should be based on a number of
> tuples rather than an amount of memory.  If considering the row width
> makes us get the wrong answer, then let's not do that.

That's a good point. While I don't think it will make it easy to tune
the GUC, it will make it easier. Although, I think that it should
probably still be GUC_UNIT_KB. That should just be something that my
useselection() function compares to the overall size of memtuples
alone when we must initially spill, not the value of
work_mem/maintenance_work_mem. The degree of padding isn't entirely
irrelevant, because not all comparisons will be resolved at the
stup.datum1 level, but it's still clearly an improvement to not have
wide tuples mess with things.

Would you like me to revise the patch along those lines? Or, do you
prefer units of tuples? Tuples are basically equivalent, but make it
way less obvious what the relationship with CPU cache might be. If I
revise the patch along these lines, I should also reduce the default
replacement_sort_mem to produce roughly equivalent behavior for
non-padded cases.

Peter Geoghegan

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to