On 2014-02-13 15:34:09 +0100, Florian Pflug wrote:
> On Feb10, 2014, at 17:38 , Andres Freund <and...@2ndquadrant.com> wrote:
> > On 2014-02-10 11:11:28 -0500, Tom Lane wrote:
> >> Andres Freund <and...@2ndquadrant.com> writes:
> >>> So what we need to do is to acquire a write barrier between the
> >>> assignments to lwWaitLink and lwWaiting, i.e.
> >>> proc->lwWaitLink = NULL;
> >>> pg_write_barrier();
> >>> proc->lwWaiting = false;
> >> You didn't really explain why you think that ordering is necessary?
> >> Each proc being awoken will surely see both fields updated, and other
> >> procs won't be examining these fields at all, since we already delinked
> >> all these procs from the LWLock's queue.
> > The problem is that one the released backends could wake up concurrently
> > because of a unrelated, or previous PGSemaphoreUnlock(). It could see
> > lwWaiting = false, and thus wakeup and acquire the lock, even if the
> > store for lwWaitLink hasn't arrived (or performed, there's no guaranteed
> > ordering here) yet.
> > Now, it may well be that there's no practical consequence of that, but I
> > am not prepared to bet on it.
> AFAICS there is a potential problem if three backends are involved, since
> by the time the waiting backend's lwWaitLink is set to NULL after the
> original lock holder released the lock, the waiting backend might already
> have acquired the lock (due to a spurious wakeup) *and* a third backend
> might have already enqueued behind it.
> The exact sequence for backends A,B,C that corrupts the wait queue is:
> A: Release lock, set B's lwWaiting to false
> B: Wakes up spuriously, takes the lock
> C: Enqueues behind B
> A: Sets B' lwWaitLink back to NULL, thereby truncating the queue and
> causing C and anyone behind it to block indefinitely.
I don't think that can actually happen because the head of the wait list
isn't the lock holder's lwWaitLink, but LWLock->head. I thought the same
for a while...
So, right now I don't see problems without either the compiler reordering
stores or heavily out of order machines with speculative execution.
> I wonder whether LWLockRelease really needs to update lwWaitLink at all.
> We take the backends we awake off the queue by updating the queue's head and
> tail, so the contents of lwWaitLink should only matter once the backend is
> re-inserted into some wait queue. But when doing that, we reset lwWaitLink
> back to NULL anway.
I don't like that, this stuff is hard to debug already, having stale
pointers around doesn't help. I think we should just add the barrier in
the releases with barrier.h and locally use a volatile var in the
branches before that.
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription: