On 2015-02-14 14:10:53 -0500, Tom Lane wrote: > Andres Freund <and...@2ndquadrant.com> writes: > > I don't think it's actually 675333 at fault here. I think it's a > > long standing bug in LockBufferForCleanup() that can just much easier be > > hit with the new interrupt code. > > > Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal() > > returns spuriously - something it's documented to possibly do (and which > > got more likely with the new patches). In the normal case UnpinBuffer() > > will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll > > still be set and LockBufferForCleanup() will see it still set. > > Yeah, you're right: LockBufferForCleanup has never coped with the > possibility that ProcWaitForSignal returns prematurely. I'm not sure > if that was possible when this code was written, but we've got it > documented as being possible at least back to 8.2. So this needs to > be fixed in all branches.
Agreed. > I think it would be smarter to duplicate all the logic > that's currently in UnlockBuffers(), just to make real sure we don't > drop somebody else's waiter flag. ISTM that in LockBufferForCleanup() such a state shouldn't be accepted - it'd be a sign of something going rather bad. I think asserting that it's "our" flag is a good idea, but silently ignoring the fact sounds like a bad plan. As LockBufferForCleanup() really is only safe when holding a SUE lock or heavier (otherwise one wait_backend_pid field obviously would not be sufficient), there should never ever be another waiter. > > If you just gdb into the VACUUM process with 6647248e370884 checked out, > > and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think > > we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside > > LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf = > > NULL. Afaics, that should do the trick. > > If we're moving the responsibility for clearing that flag from the waker > to the wakee, I'm not sure if that's the best plan. Some buffers are pinned at an incredible rate, sending a signal everytime might actually delay the pincount waiter from actually getting through the loop. Unless we block further buffer pins by any backend while the flag is set, which I think would likely not be a good idea, there seem to be little benefit in moving the responsibility. The least invasive fix would be to to weaken the error check to not trigger if it's not the first iteration through the loop... But that's not particularly pretty. I think just adding something like ... /* * Make sure waiter flag is reset - it might not be if * ProcWaitForSignal() returned for another reason than UnpinBuffer() * signalling us. */ LockBufHdr(bufHdr); buf->flags &= ~BM_PIN_COUNT_WAITER; Assert(bufHdr->wait_backend_pid == MyProcPid); UnlockBufHdr(bufHdr); PinCountWaitBuf = NULL; /* Loop back and try again */ } to the bottom of the loop would suffice. I can't see a extra buffer spinlock cycle matter in comparison to all the the other cost (like ping/pong ing around between processes). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers