Re: [HACKERS] LWLock deadlock and gdb advice

Andres Freund Wed, 29 Jul 2015 23:37:16 -0700

On 2015-07-29 20:23:24 +0300, Heikki Linnakangas wrote:
> Backend A has called LWLockWaitForVar(X) on a lock, and is now waiting on
> it. The lock holder releases the lock, and wakes up A. But before A wakes up
> and sees that the lock is free, another backend acquires the lock again. It
> runs LWLockAcquireWithVar to the point just before setting the variable's
> value. Now A wakes up, sees that the lock is still (or again) held, and that
> the variable's value still matches the old one, and goes back to sleep. The
> new lock holder won't wake it up until it updates the value again, or to
> releases the lock.


I'm not sure whether this observation is about my patch or the general
lwlock variable mechanism. In my opinion that behaviour exists today
both in 9.4 and 9.5.

But I think that's fine because that "race" seems pretty
fundamental. After all, you could have called LWLockWaitForVar() just
after the second locker had set the variable to the same value.


> You didn't like the new LW_FLAG_VAR_SET flag and the API changes I
> proposed?

I do slightly dislike the additional bit of math in AttemptLock. But
what I really disliked about the patch is the reliance on holding the
spinlock over longer times and conditionally acquiring spinlocks.  I
didn't see a need for the flags change to fix the issue at hand, that's
why I didn't incorporate it. I'm not fundamentally against it.

> Either way, there is a race condition that if the new lock holder sets the
> variable to the *same* value as before, the old waiter won't necessarily
> wake up even though the lock was free or had a different value in between.
> But that's easier to explain and understand than the fact that the value set
> by LWLockAcquireWithVar() might not be seen by a waiter, until you release
> the lock or update it again.

I can't really se a use for the API that'd allow that and care about the
waits. Because quite fundamentally you could just started waiting after
the variable was set to the value at the same time.

> Another idea is to have LWLockAcquire check the wait queue after setting the
> variable, and wake up anyone who's already queued up.

Ugh, not a fan of that.

> The changes in LWLockWaitForVar() and LWLockConflictsWithVar() seem OK in
> principle. You'll want to change LWLockConflictsWithVar() so that the
> spinlock is held only over the "value = *valptr" line, though; the other
> stuff just modifies local variables and don't need to be protected by the
> spinlock.

good point.,

> Also, when you enter LWLockWaitForVar(), you're checking if the
> lock is held twice in a row; first at the top of the function, and again
> inside LWLockConflictsWithVar. You could just remove the "quick test" at the
> top.

Yea, I was thinking about removing it. The first check was there
previously, so I left it in place. We do execute a bit more code once
we've disabled interrupts (extraWaits, re-enabling interrupts). I don't
think it'll matter much, but it seemed like an independent thing.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] LWLock deadlock and gdb advice

Reply via email to