Some of my EnterpriseDB colleagues and I have been working on various
parallel query projects, all of which have been previously disclosed.


One issue we've encountered is that it's not very easy for one process
in a group of cooperating parallel processes to wait for another
process in that same group.  One idea is to have one process grab an
LWLock and other processes try to acquire it, but that actually
doesn't work very well.  A pretty obvious problem is that it holds off
interrupts for the entire time that you are holding the lock, which is
pretty undesirable.  A more subtle problem is that it's easy to
conceive of situations where the LWLock paradigm is just a very poor
fit for what you actually want to do.  For example, suppose you have a
computation which proceeds in two phases: each backend that finishes
phase 1 must wait until all backends finish phase 1, and once all have
finished, all can begin phase 2.  You could handle this case by having
an LWLock which everyone holds during phase 1 in shared mode, and then
everyone must briefly acquire it in exclusive mode before starting
phase 2, but that's an awful hack.  It also has race conditions: what
if someone finishes phase 1 before everyone has started phase 1?  And
what if there are 10 phases instead of 2?

Another approach to the problem is to use a latch wait loop.  That
almost works.  Interrupts can be serviced, and you can recheck shared
memory to see whether the condition for proceeding is satisfied after
each iteration of the loop.  There's only one problem: when you do
something that might cause the condition to be satisfied for other
waiting backends, you need to set their latch - but you don't have an
easy way to know exactly which processes are waiting, so how do you
call SetLatch?  I originally thought of adding a function like
SetAllLatches(ParallelContext *) and maybe that can work, but then I
had what I think is a better idea, which is to introduce a notion of
condition variables.  Condition variables, of course, are a standard
synchronization primitive.


Basically, a condition variable has three operations: you can wait for
the condition variable; you can signal the condition variable to wake
up one waiter; or you can broadcast on the condition variable to wake
up all waiters.  Atomically with entering the wait, you must be able
to check whether the condition is satisfied.  So, in my
implementation, a condition variable wait loop looks like this:

for (;;) {
    if (condition for which we are waiting is satisfied)
        break;      /* otherwise sleep on cv until woken, then recheck */
}

To wake up one waiter, another backend can call
ConditionVariableSignal(cv); to wake up all waiters, it can call
ConditionVariableBroadcast(cv).
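
To make the wait/broadcast pattern concrete, here is a minimal sketch of
the multi-phase case described earlier, where nobody may start the next
phase until everyone has finished the current one.  It is not the API from
the patch: it uses POSIX threads (pthread_cond_wait and
pthread_cond_broadcast) as a stand-in, and the names PhaseBarrier,
phase_barrier_wait, and run_phases are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define NWORKERS 4
#define NPHASES  3

/* Hypothetical barrier built from a mutex plus a condition variable. */
typedef struct PhaseBarrier
{
    pthread_mutex_t mutex;
    pthread_cond_t  cv;
    int             arrived;    /* workers that have finished this phase */
    int             phase;      /* number of phases completed by everyone */
} PhaseBarrier;

static PhaseBarrier barrier;
static int  work_done[NPHASES]; /* protected by barrier.mutex */
static int  violations;         /* protected by barrier.mutex */

/* Block until all NWORKERS workers have called this for the current phase. */
static void
phase_barrier_wait(PhaseBarrier *b)
{
    int         my_phase;

    pthread_mutex_lock(&b->mutex);
    my_phase = b->phase;
    if (++b->arrived == NWORKERS)
    {
        /* Last arrival: advance the phase and wake all waiters. */
        b->arrived = 0;
        b->phase++;
        pthread_cond_broadcast(&b->cv);
    }
    else
    {
        /* Recheck the condition after every wakeup. */
        while (b->phase == my_phase)
            pthread_cond_wait(&b->cv, &b->mutex);
    }
    pthread_mutex_unlock(&b->mutex);
}

static void *
worker(void *arg)
{
    (void) arg;
    for (int p = 0; p < NPHASES; p++)
    {
        pthread_mutex_lock(&barrier.mutex);
        /* Starting phase p requires that everyone finished phase p - 1. */
        if (p > 0 && work_done[p - 1] != NWORKERS)
            violations++;
        work_done[p]++;
        pthread_mutex_unlock(&barrier.mutex);

        phase_barrier_wait(&barrier);
    }
    return NULL;
}

/* Run all phases; returns the number of ordering violations observed. */
int
run_phases(void)
{
    pthread_t   threads[NWORKERS];

    pthread_mutex_init(&barrier.mutex, NULL);
    pthread_cond_init(&barrier.cv, NULL);
    barrier.arrived = 0;
    barrier.phase = 0;
    violations = 0;
    for (int p = 0; p < NPHASES; p++)
        work_done[p] = 0;

    for (int i = 0; i < NWORKERS; i++)
        assert(pthread_create(&threads[i], NULL, worker, NULL) == 0);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);

    pthread_cond_destroy(&barrier.cv);
    pthread_mutex_destroy(&barrier.mutex);
    return violations;
}
```

The same loop handles 10 phases as easily as 2, since each waiter only
compares the shared phase counter against the value it saw on arrival.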

I am cautiously optimistic that this design will serve a wide variety
of needs for parallel query development - basically anything that
needs to wait for another process to reach a certain point in the
computation that can be detected through changes in shared memory
state.  The attached patch condition-variable-v1.patch implements this
API.  I originally open-coded the wait queue for this, but I've just
finished rebasing it on top of Thomas Munro's proclist stuff, so
before applying this patch you need the one from here:


At some point while hacking on this I realized that we could actually
replace the io_in_progress locks with condition variables; the
attached patch buffer-io-cv-v1.patch does this (it must be applied on
top of the proclist patch from the above email and also on top of
condition-variable-v1.patch).  Using condition variables here seems to
have a couple of advantages.  First, it means that a backend waiting
for buffer I/O to complete is interruptible.  Second, it fixes a
long-running bit of nastiness in AbortBufferIO: right now, if a
backend that is doing buffer I/O aborts, the abort causes it to
release all of its LWLocks, including the buffer I/O lock. Everyone
waiting for that buffer busy-loops until the aborting process gets
around to reacquiring the lock and updating the buffer state in
AbortBufferIO.  But if we replace the io_in_progress locks with
condition variables, then that doesn't happen any more.  Nobody is
"holding" the condition variable, so it doesn't get "released" when
the process doing I/O aborts.  Instead, the waiters just keep sleeping until
the aborting process reaches AbortBufferIO, and then it broadcasts on
the condition variable and wakes everybody up, which seems a good deal
nicer.
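
As a rough illustration of why the abort case gets simpler, here is a
sketch in which waiters sleep until an "I/O in progress" flag is cleared,
and the process doing the I/O clears it and broadcasts whether the I/O
succeeded or was aborted.  This is not the code from
buffer-io-cv-v1.patch: it uses POSIX threads as a stand-in, and
BufferSketch, buffer_start_io, buffer_finish_io, buffer_wait_io, and
demo_aborted_io are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_WAITERS 8

/* Hypothetical stand-in for a buffer header's I/O state. */
typedef struct BufferSketch
{
    pthread_mutex_t mutex;
    pthread_cond_t  io_done_cv;
    bool            io_in_progress;
    bool            io_failed;
} BufferSketch;

typedef struct WaiterArg
{
    BufferSketch *buf;
    bool        saw_failure;
} WaiterArg;

static void
buffer_start_io(BufferSketch *buf)
{
    pthread_mutex_lock(&buf->mutex);
    buf->io_in_progress = true;
    buf->io_failed = false;
    pthread_mutex_unlock(&buf->mutex);
}

/* Called on both normal completion and abort; wakes every waiter. */
static void
buffer_finish_io(BufferSketch *buf, bool failed)
{
    pthread_mutex_lock(&buf->mutex);
    buf->io_in_progress = false;
    buf->io_failed = failed;
    pthread_cond_broadcast(&buf->io_done_cv);
    pthread_mutex_unlock(&buf->mutex);
}

/* Sleep until no I/O is in progress; returns true if the I/O succeeded. */
static bool
buffer_wait_io(BufferSketch *buf)
{
    bool        ok;

    pthread_mutex_lock(&buf->mutex);
    while (buf->io_in_progress)
        pthread_cond_wait(&buf->io_done_cv, &buf->mutex);
    ok = !buf->io_failed;
    pthread_mutex_unlock(&buf->mutex);
    return ok;
}

static void *
waiter(void *arg)
{
    WaiterArg  *wa = arg;

    wa->saw_failure = !buffer_wait_io(wa->buf);
    return NULL;
}

/* Start I/O, launch waiters, then "abort"; returns how many saw the failure. */
int
demo_aborted_io(int nwaiters)
{
    BufferSketch buf;
    pthread_t   threads[MAX_WAITERS];
    WaiterArg   args[MAX_WAITERS];
    int         saw_failure = 0;

    assert(nwaiters <= MAX_WAITERS);
    pthread_mutex_init(&buf.mutex, NULL);
    pthread_cond_init(&buf.io_done_cv, NULL);
    buf.io_in_progress = false;
    buf.io_failed = false;

    buffer_start_io(&buf);
    for (int i = 0; i < nwaiters; i++)
    {
        args[i].buf = &buf;
        args[i].saw_failure = false;
        assert(pthread_create(&threads[i], NULL, waiter, &args[i]) == 0);
    }

    /* The abort path: nothing was "held", so waiters slept until now. */
    buffer_finish_io(&buf, true);

    for (int i = 0; i < nwaiters; i++)
    {
        pthread_join(threads[i], NULL);
        if (args[i].saw_failure)
            saw_failure++;
    }
    pthread_cond_destroy(&buf.io_done_cv);
    pthread_mutex_destroy(&buf.mutex);
    return saw_failure;
}
```

In this sketch no waiter ever spins: each one is either asleep on the
condition variable or sees the final state after the broadcast.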

I'm very curious to know whether other people like this abstraction
and whether they think it will be useful for things they want to do
with parallel query (or otherwise).  Comments welcome.  Review
appreciated.  Other suggestions for how to handle this are cool, too.

Credit: These patches were written by me; an earlier version of the
condition-variable-v1.patch was reviewed and tested by Rahila Syed.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: condition-variable-v1.patch
Description: application/download

Attachment: buffer-io-cv-v1.patch
Description: application/download
