Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

Heikki Linnakangas Tue, 06 Sep 2016 22:52:08 -0700

On 09/07/2016 12:46 AM, Peter Geoghegan wrote:

On Tue, Sep 6, 2016 at 12:34 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:

Why do we reserve the buffer space for all the tapes right at the beginning?
Instead of the single USEMEM(maxTapes * TAPE_BUFFER_OVERHEAD) call in
inittapes(), couldn't we call USEMEM(TAPE_BUFFER_OVERHEAD) every time we
start a new run, until we reach maxTapes?


No, because then you have no way to clamp back memory, which is now
almost all used (we hold off from making LACKMEM() continually true,
if at all possible, which is almost always the case). You can't really
continually shrink memtuples to make space for new tapes, which is
what it would take.

I still don't get it. When building the initial runs, we don't need
buffer space for maxTapes yet, because we're only writing to a single
tape at a time. An unused tape shouldn't take much memory. In
inittapes(), when we have built all the runs, we know how many tapes we
actually needed, and we can allocate the buffer memory accordingly.

[thinks a bit, looks at logtape.c]. Hmm, I guess that's wrong, because
of the way this all is implemented. When we're building the initial
runs, we're only writing to one tape at a time, but logtape.c
nevertheless holds onto a BLCKSZ'd currentBuffer, plus one buffer for
each indirect level, for every tape that has been used so far. What if
we changed LogicalTapeRewind to free those buffers? Flush out the
indirect buffers to disk, remembering just the physical block number of
the topmost indirect block in memory, and free currentBuffer. That way,
a tape that has been used, but isn't being read or written to at the
moment, would take very little memory, and we wouldn't need to reserve
space for them in the build-runs phase.
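
To make the idea concrete, here is a rough standalone sketch. It is not
logtape.c code: SketchTape, the sketch_tape_* functions, flush_block()
and the in-memory fake_disk array are all made up for illustration, and
it assumes a single indirect level and BLCKSZ = 8192. It only shows the
"flush, remember one block number, free the buffers" step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ      8192    /* assumed; PostgreSQL's default block size */
#define DISK_BLOCKS 8       /* size of the pretend temp file */

/* Hypothetical stand-in for the on-disk temp file. */
static char fake_disk[DISK_BLOCKS][BLCKSZ];
static long next_free_block = 0;

/* Hypothetical, much-simplified tape: one data buffer, one indirect level. */
typedef struct SketchTape
{
    char   *currentBuffer;      /* NULL while the tape is parked */
    char   *indirectBuffer;     /* NULL while the tape is parked */
    long    topIndirectBlock;   /* flushed indirect block number, or -1 */
} SketchTape;

/* Append one block to the pretend file and return its block number. */
static long
flush_block(const char *buf)
{
    assert(next_free_block < DISK_BLOCKS);
    memcpy(fake_disk[next_free_block], buf, BLCKSZ);
    return next_free_block++;
}

/* Allocate one zeroed block-sized buffer. */
static char *
alloc_block(void)
{
    char   *buf = calloc(1, BLCKSZ);

    assert(buf != NULL);
    return buf;
}

/* Start using a tape: it holds both buffers, as an active tape would. */
static void
sketch_tape_open(SketchTape *tape)
{
    tape->currentBuffer = alloc_block();
    tape->indirectBuffer = alloc_block();
    tape->topIndirectBlock = -1;
}

/* Bytes of buffer memory the tape holds right now. */
static size_t
sketch_tape_memory(const SketchTape *tape)
{
    size_t  total = 0;

    if (tape->currentBuffer != NULL)
        total += BLCKSZ;
    if (tape->indirectBuffer != NULL)
        total += BLCKSZ;
    return total;
}

/* Park: flush both buffers, keep only the top indirect block number. */
static void
sketch_tape_park(SketchTape *tape)
{
    long    datablk;

    assert(tape->currentBuffer != NULL && tape->indirectBuffer != NULL);
    datablk = flush_block(tape->currentBuffer);

    /* The indirect block records where the data block went. */
    memcpy(tape->indirectBuffer, &datablk, sizeof(datablk));
    tape->topIndirectBlock = flush_block(tape->indirectBuffer);

    free(tape->currentBuffer);
    free(tape->indirectBuffer);
    tape->currentBuffer = NULL;
    tape->indirectBuffer = NULL;
}

/* Resume: reallocate the buffers and reload them via the remembered block. */
static void
sketch_tape_resume(SketchTape *tape)
{
    long    datablk;

    assert(tape->topIndirectBlock >= 0);
    tape->indirectBuffer = alloc_block();
    memcpy(tape->indirectBuffer, fake_disk[tape->topIndirectBlock], BLCKSZ);
    memcpy(&datablk, tape->indirectBuffer, sizeof(datablk));

    tape->currentBuffer = alloc_block();
    memcpy(tape->currentBuffer, fake_disk[datablk], BLCKSZ);
}
```

In this sketch a parked tape costs only the SketchTape struct itself,
so buffer space would have to be accounted for only for the tape that
is currently being written. Whether the real LogicalTapeRewind can be
changed this cheaply depends on details of logtape.c that the sketch
ignores, such as multiple indirect levels and frozen tapes.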


- Heikki



