Re: [HACKERS] Tuplesort merge pre-reading

Heikki Linnakangas Wed, 28 Sep 2016 11:13:20 -0700

On 09/28/2016 07:11 PM, Peter Geoghegan wrote:

On Wed, Sep 28, 2016 at 5:04 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:

Not sure that I understand. I agree that each merge pass tends to use
roughly the same number of tapes, but the distribution of real runs on
tapes is quite unbalanced in earlier merge passes (due to dummy runs).
It looks like you're always using batch memory, even for non-final
merges. Won't that fail to be in balance much of the time because of
the lopsided distribution of runs? Tapes have an uneven amount of real
data in earlier merge passes.



How does the distribution of the runs on the tapes matter?


The exact details are not really relevant to this discussion (I think
it's confusing that we simply say "Target Fibonacci run counts",
FWIW), but the simple fact that it can be quite uneven is.

Well, I claim that the fact that the distribution of runs is uneven
does not matter. Can you explain why you think it does?
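
For reference, the "Target Fibonacci run counts" mentioned above can be sketched as follows. This is only an illustration of the textbook polyphase merge recurrence (Knuth's Algorithm D), not the code in tuplesort.c; the names next_level() and targets_at_level() are made up for this example.

```c
#include <assert.h>

/*
 * Advance the per-tape target run counts by one polyphase level.
 * target[] holds ninputs + 1 entries; target[ninputs] is a sentinel
 * that must stay 0.  (Hypothetical helper, for illustration only.)
 */
void
next_level(int *target, int ninputs)
{
    int a = target[0];      /* old count on the first tape */

    /* target[j + 1] is still the old value when it is read here */
    for (int j = 0; j < ninputs; j++)
        target[j] = a + target[j + 1];
}

/*
 * Fill target[] with the target run counts for the given level
 * (level 1 means one run per input tape) and return their sum.
 */
int
targets_at_level(int *target, int ninputs, int level)
{
    int total = 0;

    assert(ninputs >= 1 && level >= 1);

    for (int j = 0; j < ninputs; j++)
        target[j] = 1;
    target[ninputs] = 0;

    for (int l = 1; l < level; l++)
        next_level(target, ninputs);

    for (int j = 0; j < ninputs; j++)
        total += target[j];
    return total;
}
```

With three input tapes, level 4 gives targets of 7, 6 and 4 runs, 17 in total. If the sort only produced 10 real runs, the other 7 slots are dummy runs, which is where the uneven amount of real data per tape comes from.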

This is why I never pursued batch memory for non-final merges. Isn't
that what you're doing here? You're pretty much always setting
"state->batchUsed = true".

Yep. As the patch stands, we wouldn't really need batchUsed, as we know
that it's always true when merging, and false otherwise. But I kept it,
as it seems like that might not always be true - we might use batch
memory when building the initial runs, for example - and because it
seems nice to have an explicit flag for it, for readability and
debugging purposes.

I'm basically repeating myself here, but: I think it's incorrect that
LogicalTapeAssignReadBufferSize() is called so indiscriminately (more
generally, it is questionable that it is called in such a high level
routine, rather than the start of a specific merge pass -- I said so a
couple of times already).



You can't release the tape buffer at the end of a pass, because the buffer
of a tape will already be filled with data from the next run on the same
tape.


Okay, but can't you just not use batch memory for non-final merges,
per my initial approach? That seems far cleaner.

Why? I don't see why the final merge should behave differently from the
non-final ones.


- Heikki



