On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas <robertmh...@gmail.com> wrote:
> https://en.wikipedia.org/wiki/Monitor_(synchronization)#Condition_variables_2
> Basically, a condition variable has three operations: you can wait for
> the condition variable; you can signal the condition variable to wake
> up one waiter; or you can broadcast on the condition variable to wake
> up all waiters.  Atomically with entering the wait, you must be able
> to check whether the condition is satisfied.  So, in my
> implementation, a condition variable wait loop looks like this:
> for (;;)
> {
>     ConditionVariablePrepareToSleep(cv);
>     if (condition for which we are waiting is satisfied)
>         break;
>     ConditionVariableSleep();
> }
> ConditionVariableCancelSleep();
> To wake up one waiter, another backend can call
> ConditionVariableSignal(cv); to wake up all waiters,
> ConditionVariableBroadcast(cv).

It is interesting to compare this interface with Wikipedia's
description, POSIX's pthread_cond_t and C++'s std::condition_variable.

In those interfaces, the wait operation takes a mutex which must
already be held by the caller.  It unlocks the mutex and begins
waiting atomically.  Then when it returns, the mutex is automatically
reacquired.  This approach avoids race conditions as long as the
shared state change you are awaiting is protected by that mutex.  If
you check that state before waiting and while still holding the lock,
you can be sure not to miss any change signals, and then when it
returns you can check the state again and be sure that no one can be
concurrently changing it.

In contrast, this proposal leaves it up to client code to get that
right, similarly to the way you need to do things in a certain order
when waiting for state changes with latches.  You could say that it's
more error-prone: I think there have been a few cases of incorrectly
coded latch/state-change wait loops in the past.  On the other hand,
it places no requirements on the synchronisation mechanism the client
code uses for the related shared state.  pthread_cond_wait requires
you to pass in a pointer to the related pthread_mutex_t, whereas with
this proposal client code is free to use atomic ops, lwlocks,
spinlocks or any other mutual exclusion mechanism to coordinate state
changes and deal with cache coherency.

Then there is the question of what happens when the backend that is
supposed to be doing the signalling dies or aborts, which Tom Lane
referred to in his reply.  In those other libraries there is no such
concern: it's understood that these are low level thread
synchronisation primitives and if you're waiting for something that
never happens, you'll be waiting forever.  I don't know what the
answer is in general for Postgres condition variables, but...

The thing that I personally am working on currently that is very
closely related and could use this has a more specific set of
circumstances:  I want "join points" AKA barriers.  Something like
pthread_barrier_t.  (I'm saying "join point" rather than "barrier" to
avoid confusion with compiler and memory barriers, barrier.h etc.)
Join points let you wait for all workers in a known set to reach a
given point, possibly with a phase number or at least a sense (a
one-bit phase counter) to detect synchronisation bugs.  They also select one
worker arbitrarily to receive a different return value when releasing
workers from a join point, for cases where a particular phase of
parallel work needs to be done by exactly one worker while the others
sit on the bench: for example initialisation, cleanup or merging (cf.
PTHREAD_BARRIER_SERIAL_THREAD).  Clearly a join point could be not
much more than a condition variable and some state tracking arrivals
and departures, but I think that this higher level synchronisation
primitive might have an advantage over raw condition variables in the
abort case: it can know the total set of workers it's waiting for,
if they are somehow registered with it first, and registration can
include arranging for cleanup hooks to do the right thing.  It's
already a requirement for a join point to know which workers exist (or
at least how many).  The deal would then be that when you call
joinpoint_join(&some_joinpoint, phase), it will return only when all
peers have joined or detached, where the latter happens automatically
if they abort or die.  Not at all sure of the details yet...  but I
suspect join points are useful for a bunch of things like parallel
sort, parallel hash join (my project), and anything else involving
phases or some form of "fork/join" parallelism.

Or perhaps that type of thinking about error handling should be pushed
down to the condition variable.  How would that look: all potential
signallers would have to register to deliver a goodbye signal in their
abort and shmem exit paths?  Then what happens if you die before
registering?  I think even if you find a way to do that I'd still need
to do similar extra work on top for my join points concept, because
although I do need waiters to be poked when a worker aborts or
dies, one goodbye prod isn't enough: I'd also need to adjust the join
point's set of workers, or put it into error state.

Thomas Munro

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)