On Thu, Apr 7, 2016 at 6:55 AM, Robert Haas <robertmh...@gmail.com> wrote: >> In reality there will be multiple processes running at the same time (e.g >> backends when running parallel query), significantly reducing the amount of >> cache per process, making the replacement sort inefficient and thus >> eliminating the regressions (by making the master slower). > > Interesting point.
The effective use of CPU cache is *absolutely* critical here. I think that this patch is valuable primarily because it makes sorting predictable, and only secondarily because it makes it much faster. Having discrete costs that can be modeled fairly accurately has significant practical benefits for DBAs, and for query optimization, especially when parallel worker sorts must be costed. Inefficient use of CPU cache implies a big overall cost for the server, not just one client; my sorting patches are usually tested on single client cases, but the multi-client cases can be a lot more sympathetic (we saw this with abbreviated keys at one point). I wonder how many DBAs are put off by higher work_mem settings due to issues with replacement selection....they are effectively denied the ability to set work_mem appropriately across the board, because of this one weak spot. It really is perverse that there is, in effect, a "Blackjack" cost function for sorts, which runs counter to the general intuition that more memory is better. > I certainly agree that GUCs that aren't easy to tune are bad. I'm > wondering whether the fact that this one is hard to tune is something > that can be fixed. The comments about "padding" - a term I don't > like, because it to me implies a deliberate attempt to game the > benchmark when in reality wanting to sort a wide row is entirely > reasonable - make me wonder if this should be based on a number of > tuples rather than an amount of memory. If considering the row width > makes us get the wrong answer, then let's not do that. That's a good point. While I don't think it will make it easy to tune the GUC, it will make it easier. Although, I think that it should probably still be GUC_UNIT_KB. That should just be something that my useselection() function compares to the overall size of memtuples alone when we must initially spill, not the value of work_mem/maintenance_work_mem. The degree of padding isn't entirely irrelevant, because not all comparisons will be resolved at the stup.datum1 level, but it's still clearly an improvement to not have wide tuples mess with things. Would you like me to revise the patch along those lines? Or, do you prefer units of tuples? Tuples are basically equivalent, but make it way less obvious what the relationship with CPU cache might be. If I revise the patch along these lines, I should also reduce the default replacement_sort_mem to produce roughly equivalent behavior for non-padded cases. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers