Re: [HACKERS] Tuplesort merge pre-reading

Heikki Linnakangas Wed, 28 Sep 2016 09:06:43 -0700

On 09/28/2016 06:05 PM, Peter Geoghegan wrote:

On Thu, Sep 15, 2016 at 9:51 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:

I don't think it makes much difference in practice, because most merge
passes use all, or almost all, of the available tapes. BTW, I think the
polyphase algorithm prefers to do all the merges that don't use all tapes
upfront, so that the last final merge always uses all the tapes. I'm not
100% sure about that, but that's my understanding of the algorithm, and
that's what I've seen in my testing.


Not sure that I understand. I agree that each merge pass tends to use
roughly the same number of tapes, but the distribution of real runs on
tapes is quite unbalanced in earlier merge passes (due to dummy runs).
It looks like you're always using batch memory, even for non-final
merges. Won't that fail to be in balance much of the time because of
the lopsided distribution of runs? Tapes have an uneven amount of real
data in earlier merge passes.


How does the distribution of the runs on the tapes matter?

+   usedBlocks = 0;
+   for (tapenum = 0; tapenum < state->maxTapes; tapenum++)
+   {
+       int64       numBlocks = blocksPerTape + (tapenum < remainder ? 1 : 0);
+
+       if (numBlocks > MaxAllocSize / BLCKSZ)
+           numBlocks = MaxAllocSize / BLCKSZ;
+       LogicalTapeAssignReadBufferSize(state->tapeset, tapenum,
+                                       numBlocks * BLCKSZ);
+       usedBlocks += numBlocks;
+   }
+   USEMEM(state, usedBlocks * BLCKSZ);


I'm basically repeating myself here, but: I think it's incorrect that
LogicalTapeAssignReadBufferSize() is called so indiscriminately (more
generally, it is questionable that it is called in such a high level
routine, rather than the start of a specific merge pass -- I said so a
couple of times already).

You can't release the tape buffer at the end of a pass, because thebuffer of a tape will already be filled with data from the next run onthe same tape.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Tuplesort merge pre-reading

Reply via email to