On Thu, Mar 10, 2016 at 9:54 PM, Peter Geoghegan <p...@heroku.com> wrote:
> 1. Creates a separate memory context for tuplesort's copies of
> caller's tuples, which can be reset at key points, avoiding
> fragmentation. Every SortTuple.tuple is allocated there (with trivial
> exception); *everything else*, including the memtuples array, is
> allocated in the existing tuplesort context, which becomes the parent
> of this new "caller's tuples" context. Roughly speaking, that means
> that about half of total memory for the sort is managed by each
> context in common cases. Even with a high work_mem memory budget,
> memory fragmentation could previously get so bad that tuplesort would
> in effect claim a share of memory from the OS that is *significantly*
> higher than the work_mem budget allotted to its sort. And with low
> work_mem settings, fragmentation previously made palloc() thrash the
> sort, especially during non-final merging. In this latest revision,
> tuplesort now almost gets to use 100% of the memory that was requested
> from the OS by palloc() in cases tested.
I spent some time looking at this part of the patch yesterday and today. This is not a full review yet, but here are some things I noticed:

- I think that batchmemtuples() is somewhat weird. Normally, grow_memtuples() doubles the size of the array each time it's called. So if you somehow called this function when you still had lots of memory available, it would just double the size of the array. However, I think the expectation is that it's only going to be called when availMem is less than half of allowedMem, in which case we're going to get the special "last increment of memtupsize" behavior, where we expand the memtuples array by some multiple between 1.0 and 2.0 based on allowedMem/memNowUsed. And after staring at this for a long time ... yeah, I think this does the right thing. But it certainly is hairy.

- It's not exactly clear what you mean when you say that the tuplesort context contains "everything else". I don't understand why that ends up containing only half the memory: what, other than the memtuples array, ends up there?

- If I understand correctly, the point of the MemoryContextReset call is that there wouldn't be any tuples in memory at that point anyway, but the OS-allocated chunks might be divided up into a bunch of small chunks that then got stuck on freelists, and those chunks might not be the right size for the next merge pass. Resetting the context avoids that problem by blowing up the freelists. Right? Clever.

- I haven't yet figured out why we use batching only for the final on-the-fly merge pass, instead of doing it for all merges. I expect you have a reason. I just don't know what it is.

- I have also not yet figured out why you chose to replace state->datumTypByVal with state->tuples and reverse the sense. I bet there's a reason for this, too. I don't know what it is, either.

That's as far as I've gotten thus far.
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers