On Thu, Jul 30, 2015 at 3:47 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> This is a good optimization for the common case where tuples are mostly
> already in order. We could increase the usefulness of this by making UPDATE
> pick blocks that are close to a tuple's original block, rather than putting
> them near the end of a relation.

Not sure what you mean here.

>> So here's a shorter/different explanation of this optimization: When it's
>> time to perform the sort, instead of draining the in-memory heap one tuple
>> at a time to the last tape, you sort the heap with quicksort, and pretend
>> that the sorted heap belongs to the last tape, after all the other tuples in
>> the tape.
>> Some questions/thoughts on that:
>> Isn't that optimization applicable even when you have multiple runs?
>> Quicksorting the heap and keeping it as an array in memory is surely always
>> faster than heapsorting and pushing it to the tape.
> It's about use of memory. If you have multiple runs on tape, then they will
> need to be merged and you need memory to do that efficiently. If there are
> tuples in the last batch still in memory then it can work, but it depends
> upon how full memory is from the last batch and how many batches there are.

I agree that this optimization has a lot to do with exploiting the
fact that you don't need to free the memtuples array for future runs
because you've already received all tuples (or keep space free for
previous runs). I think that we should still use quicksort on runs
where this optimization doesn't work out, but I also still think that
that's a different idea. Doing what I've proposed here when there are
multiple runs seems less valuable, if only because you're not going to
avoid that much writing.

Peter Geoghegan

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to