Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-23 Thread Andres Freund
On 2015-02-17 13:14:00 -0500, Tom Lane wrote: Hm, good point. On the other hand, should we worry about the possibility of a lost signal? Moving the flag-clearing would guard against that, which the current code does not. But we've not seen field reports of such issues AFAIR, so this might

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-22 Thread Stefan Kaltenbrunner
On 02/13/2015 06:27 AM, Tom Lane wrote: Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly the same failure pattern on HEAD: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-17 Thread Andres Freund
On 2015-02-14 14:10:53 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I don't think it's actually 675333 at fault here. I think it's a long standing bug in LockBufferForCleanup() that can just much easier be hit with the new interrupt code. Imagine what happens in

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: On 2015-02-14 14:10:53 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: If you just gdb into the VACUUM process with 6647248e370884 checked out, and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think we should

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-14 Thread Andres Freund
On 2015-02-14 17:25:00 +0000, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal() returns spuriously - something it's documented to possibly do (and which got more likely with the new patches). In the
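
The scenario in the snippet above (a waiter that wakes spuriously and then finds its own waiter flag still set) can be illustrated with a small toy model. This is only a sketch in Python, not PostgreSQL's C code: `BufferHeader`, `lock_for_cleanup`, `unpin`, `spurious`, and the `clear_flag_after_wait` switch are hypothetical names invented for illustration, and the flag value is arbitrary. The "clear the flag after waking" branch is one reading of the flag-clearing idea discussed in this thread, not necessarily the change that was committed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

BM_PIN_COUNT_WAITER = 0x1  # illustrative bit only, not PostgreSQL's real value


@dataclass
class BufferHeader:
    """Hypothetical stand-in for a shared buffer header."""
    refcount: int = 1                      # pins held, including the waiter's own
    flags: int = 0
    wait_backend_pid: Optional[int] = None


class MultipleWaitersError(RuntimeError):
    """Models the error named in the thread subject."""


def unpin(buf: BufferHeader) -> None:
    """A genuine wakeup: another backend drops its pin and clears the flag."""
    buf.refcount -= 1
    if (buf.flags & BM_PIN_COUNT_WAITER) and buf.refcount == 1:
        buf.flags &= ~BM_PIN_COUNT_WAITER


def spurious(buf: BufferHeader) -> None:
    """A spurious wakeup: the wait returns but nothing has changed."""


def lock_for_cleanup(
    buf: BufferHeader,
    my_pid: int,
    wakeups: Iterable[Callable[[BufferHeader], None]],
    clear_flag_after_wait: bool,
) -> int:
    """Loop until we hold the only pin; return how many times we waited.

    Each item in `wakeups` simulates one return from the wait primitive.
    """
    events = iter(wakeups)
    waits = 0
    while True:
        if buf.refcount == 1:
            return waits                   # we hold the sole pin
        if buf.flags & BM_PIN_COUNT_WAITER:
            raise MultipleWaitersError(
                "multiple backends attempting to wait for pincount 1")
        buf.flags |= BM_PIN_COUNT_WAITER   # register ourselves as the waiter
        buf.wait_backend_pid = my_pid
        next(events)(buf)                  # "sleep" until some wakeup arrives
        waits += 1
        # Assumed mitigation: if the flag is still ours after waking,
        # clear it before looping so we do not trip over our own flag.
        if (clear_flag_after_wait
                and (buf.flags & BM_PIN_COUNT_WAITER)
                and buf.wait_backend_pid == my_pid):
            buf.flags &= ~BM_PIN_COUNT_WAITER
```

With one spurious wakeup followed by a real unpin, the model without the mitigation raises the error on its second pass through the loop, even though only one backend was ever waiting. With the mitigation it waits twice and then returns.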

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-14 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote: On 2015-02-14 17:25:00 +0000, Kevin Grittner wrote: I think we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer) I think you meant inside UnpinBuffer? No, LockBufferHdr. What I meant was that the pincount can only be

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-14 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote: I don't think it's actually 675333 at fault here. I think it's a long standing bug in LockBufferForCleanup() that can just much easier be hit with the new interrupt code. The patches I'll be posting soon make it even easier to hit, which is why I

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-14 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes: I don't think it's actually 675333 at fault here. I think it's a long standing bug in LockBufferForCleanup() that can just much easier be hit with the new interrupt code. Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal()

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-13 Thread Andres Freund
On 2015-02-13 00:27:04 -0500, Tom Lane wrote: Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly the same failure pattern on HEAD: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-13 Thread Andres Freund
On 2015-02-13 22:33:35 +0000, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: On 2015-02-13 00:27:04 -0500, Tom Lane wrote: I'd say we have a problem. I'd even go so far as to say that somebody has completely broken locking, because this looks like autovacuum and

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-13 Thread Andres Freund
On 2015-02-13 23:05:16 +0000, Kevin Grittner wrote: Andres Freund and...@2ndquadrant.com wrote: How did you get to that recipe? I have been working on some patches to allow vacuum to function in the face of long-held snapshots. (I'm struggling to get them into presentable shape for the

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-13 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote: On 2015-02-13 00:27:04 -0500, Tom Lane wrote: I'd say we have a problem. I'd even go so far as to say that somebody has completely broken locking, because this looks like autovacuum and manual vacuuming are hitting the same table at the same time.

Re: [HACKERS] multiple backends attempting to wait for pincount 1

2015-02-13 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote: How did you get to that recipe? I have been working on some patches to allow vacuum to function in the face of long-held snapshots. (I'm struggling to get them into presentable shape for the upcoming CF.) I was devising the most diabolical cases I