On Thu, Sep 29, 2016 at 4:10 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> Bah, I fumbled the initSlabAllocator() function, attached is a fixed
> version.

This looks much better. It's definitely getting close. Thanks for
being considerate of my more marginal concerns. More feedback:

* Should say "fixed number of...":

> +    * we start merging.  Merging only needs to keep a small, fixed number 
> tuples

* Minor concern about these new macros:

> +#define IS_SLAB_SLOT(state, tuple) \
> +   ((char *) tuple >= state->slabMemoryBegin && \
> +    (char *) tuple < state->slabMemoryEnd)
> +
> +/*
> + * Return the given tuple to the slab memory free list, or free it
> + * if it was palloc'd.
> + */
> +#define RELEASE_SLAB_SLOT(state, tuple) \
> +   do { \
> +       SlabSlot *buf = (SlabSlot *) tuple; \
> +       \
> +       if (IS_SLAB_SLOT(state, tuple)) \
> +       { \
> +           buf->nextfree = state->slabFreeHead; \
> +           state->slabFreeHead = buf; \
> +       } else \
> +           pfree(tuple); \
> +   } while(0)

I suggest duplicating the paranoia seen elsewhere around what the "state"
and "tuple" macro arguments could expand to. You know, by surrounding
them with parentheses each time they are used. That's what we see with
existing, similar macros.

* Should cast to int64 here (for the benefit of win64):

> +       elog(LOG, "using " INT64_FORMAT " KB of memory for read buffers among 
> %d input tapes",
> +            (long) (availBlocks * BLCKSZ) / 1024, numInputTapes);

* FWIW, I still don't love this bit:

> +    * numTapes and numInputTapes reflect the actual number of tapes we will
> +    * use.  Note that the output tape's tape number is maxTapes - 1, so the
> +    * tape numbers of the used tapes are not consecutive, so you cannot
> +    * just loop from 0 to numTapes to visit all used tapes!
> +    */
> +   if (state->Level == 1)
> +   {
> +       numInputTapes = state->currentRun;
> +       numTapes = numInputTapes + 1;
> +       FREEMEM(state, (state->maxTapes - numTapes) * TAPE_BUFFER_OVERHEAD);
> +   }

But I can see how the verbosity of almost-duplicating the activeTapes
stuff seems unappealing. That said, I think that you should point out
in comments that you're calculating the number of
maybe-active-in-some-merge tapes. They're maybe-active in that they
have some number of real tapes. Not going to insist on that, but
something to think about.

* Shouldn't this use state->tapeRange?:

> +   else
> +   {
> +       numInputTapes = state->maxTapes - 1;
> +       numTapes = state->maxTapes;
> +   }

* Doesn't it also set numTapes without that variable ever being used
afterwards? Maybe numTapes can be declared within the "if (state->Level
== 1)" block.
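
To be concrete, I had something like this in mind (untested fragment,
relying on tapeRange being maxTapes - 1, just to show the shape):

```c
	if (state->Level == 1)
	{
		int			numTapes;	/* only needed in this branch */

		numInputTapes = state->currentRun;
		numTapes = numInputTapes + 1;
		FREEMEM(state, (state->maxTapes - numTapes) * TAPE_BUFFER_OVERHEAD);
	}
	else
		numInputTapes = state->tapeRange;
```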

* Minor issues with initSlabAllocator():

You call the new function initSlabAllocator() as follows:

> +   if (state->tuples)
> +       initSlabAllocator(state, numInputTapes + 1);
> +   else
> +       initSlabAllocator(state, 0);

Isn't the number of slots (the second argument to initSlabAllocator())
actually just numInputTapes when we're "state->tuples"? And so,
shouldn't the "+ 1" bit happen within initSlabAllocator() itself? It
can just inspect "state->tuples" itself. In short, maybe push a bit
more into initSlabAllocator(). Making the arguments match those passed
to initTapeBuffers() a bit later would be nice, perhaps.
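
That is, one way to read my suggestion (untested fragment, with the slot
arithmetic pushed into the callee):

```c
	initSlabAllocator(state, numInputTapes);

static void
initSlabAllocator(Tuplesortstate *state, int numInputTapes)
{
	/* one slot per input tape, plus one, but only for the tuples case */
	int			numSlots = state->tuples ? numInputTapes + 1 : 0;

	...
}
```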

* This could be simpler, I think:

> +   /* Release the read buffers on all the other tapes, by rewinding them. */
> +   for (tapenum = 0; tapenum < state->maxTapes; tapenum++)
> +   {
> +       if (tapenum == state->result_tape)
> +           continue;
> +       LogicalTapeRewind(state->tapeset, tapenum, true);
> +   }

Can't you just use state->tapeRange, and remove the "continue"? I
recommend referring to "now-exhausted input tapes" here, too.
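
That is, something like this (untested fragment, assuming the result
tape is always tape number state->tapeRange at this point):

```c
	/* Release the read buffers of the now-exhausted input tapes */
	for (tapenum = 0; tapenum < state->tapeRange; tapenum++)
		LogicalTapeRewind(state->tapeset, tapenum, true);
```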

* I'm not completely prepared to give up on using
MemoryContextAllocHuge() within logtape.c just yet. Maybe you're right
that it couldn't possibly matter that we impose a MaxAllocSize cap
within logtape.c (per tape), but I have slight reservations that I
need to address. Maybe a better way of putting it would be that I have
some reservations about possible regressions at the very high end,
with very large workMem. Any thoughts on that?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
