You seem to have erased the subject line from this email somehow.

On Tue, Sep 27, 2016 at 10:18 AM, Peter Geoghegan <> wrote:
> On Tue, Sep 27, 2016 at 3:08 PM, Robert Haas <> wrote:
>> I don't know what any of that means.  You said we need something like
>> an LWLock, but I think we don't.  The workers just write the results
>> of their own final merges into shm_mqs.  The leader can read from any
>> given shm_mq until no more tuples can be read without blocking, just
>> like nodeGather.c does, or at least it can do that unless its own
>> queue fills up first.  No mutual exclusion mechanism is required for
>> any of that, as far as I can see - not an LWLock, and not anything
>> similar.
> I'm confused about the distinction you're making. Isn't the shm_mqs
> queue itself a mutual exclusion mechanism? The leader cannot really do
> anything without some input to process, because of the dependency that
> naturally exists. ISTM that everything else we've discussed is
> semantics, and/or about particular mechanisms in Postgres.

No, a shm_mq is not a mutual exclusion mechanism.  It's a queue!

A data dependency and a lock aren't the same thing.  If your point in
saying "we need something like an LWLock" was actually "the leader
will have to wait if it's merged all tuples from a worker and no more
are immediately available" then I think that's a pretty odd way to
make that point.  To me, the first statement appears to be false,
while the second is obviously true.

> At a high level, based on my intuition about performance
> characteristics, I have reservations about eager merging in the
> leader. That's all I mean. There is a trade-off to be made between
> eager and lazy merging. As I pointed out, the author of the Volcano
> paper (Goetz Graefe) went on at length about this particular trade-off
> for parallel sort almost 30 years ago. So, it's not clear that that
> would be an easy win, or even a win.
> That's all I meant.

OK, well I'm not taking any position on whether what Heikki is
proposing will turn out to be good from a performance point of view.
My intuitions about sorting performance haven't turned out to be
particularly good in the past.  I'm only saying that if you do decide
to queue the tuples passing from worker to leader in a shm_mq, you
shouldn't read from the shm_mq objects in a round robin, but rather
read multiple tuples at a time from the same worker whenever that is
possible without blocking.  If you read tuples from workers one at a
time, you get a lot of context-switch thrashing, because the worker
wakes up, writes one tuple (or even part of a tuple) and immediately
goes back to sleep.  Then it shortly thereafter does it again.
Whereas if you drain the worker's whole queue each time you read from
that queue, then the worker can wake up, refill it completely, and go
back to sleep again.  So you incur fewer context switches for the same
amount of work.  Or at least, that's how it worked out for Gather, and
what I am saying is that it will probably work out the same way for
sorting, if somebody chooses to try to implement this.

I am also saying that using the shm_mq code here does not require the
introduction of an LWLock.

Robert Haas
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (
To make changes to your subscription:

Reply via email to