Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-17 13:14:00 -0500, Tom Lane wrote:
> Hm, good point. On the other hand, should we worry about the possibility
> of a lost signal? Moving the flag-clearing would guard against that,
> which the current code does not. But we've not seen field reports of such
> issues AFAIR, so this might not be an important consideration.

I think if there were lost signals there'd be much bigger problems, given that the same (or, in master, similar) mechanics are used for a lot of other things, including the heavyweight and lightweight lock wait queues.

> > ...
> >         /*
> >          * Make sure waiter flag is reset - it might not be if
> >          * ProcWaitForSignal() returned for another reason than UnpinBuffer()
> >          * signalling us.
> >          */
> >         LockBufHdr(bufHdr);
> >         buf->flags &= ~BM_PIN_COUNT_WAITER;
> >         Assert(bufHdr->wait_backend_pid == MyProcPid);
> >         UnlockBufHdr(bufHdr);
> >
> >         PinCountWaitBuf = NULL;
> >         /* Loop back and try again */
> >     }
> >
> > to the bottom of the loop would suffice.
>
> No, I disagree. If we maintain the rule that the signaler clears
> BM_PIN_COUNT_WAITER, then once that happens there is nothing to stop a
> third party from trying to LockBufferForCleanup on the same buffer (except
> for table-level locking conventions, which IMO this mechanism shouldn't be
> dependent on). So this coding would potentially clear the
> BM_PIN_COUNT_WAITER flag belonging to that third party, and then fail the
> Assert --- but only in debug builds, not in production, where it would
> just silently lock up the third-party waiter. So I think having a test to
> verify that it's still "our" BM_PIN_COUNT_WAITER flag is essential.

Pushed with a test guarding against that. I still think it might be slightly better to error out if somebody else waits, but I guess it's unlikely that we'd mistakenly add code doing that.
Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 02/13/2015 06:27 AM, Tom Lane wrote:
> Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly
> the same failure pattern on HEAD:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tick&dt=2015-02-12%2010%3A22%3A57
>
> I'd say we have a problem. I'd even go so far as to say that somebody has
> completely broken locking, because this looks like autovacuum and manual
> vacuuming are hitting the same table at the same time.

fwiw - it looks like spoonbill (not doing CLOBBER_CACHE_ALWAYS) managed to trigger that one as well:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2015-02-23%2000%3A00%3A06

There are also some failures from the BETWEEN changes in that regression.diffs, but those might be fallout from the above problem.

Stefan
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund writes:
> On 2015-02-14 14:10:53 -0500, Tom Lane wrote:
>> Andres Freund writes:
>>> If you just gdb into the VACUUM process with 6647248e370884 checked out,
>>> and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
>>> we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
>>> LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
>>> NULL. Afaics, that should do the trick.

>> If we're moving the responsibility for clearing that flag from the waker
>> to the wakee,

> I'm not sure if that's the best plan. Some buffers are pinned at an
> incredible rate, sending a signal every time might actually delay the
> pincount waiter from actually getting through the loop.

Hm, good point. On the other hand, should we worry about the possibility of a lost signal? Moving the flag-clearing would guard against that, which the current code does not. But we've not seen field reports of such issues AFAIR, so this might not be an important consideration.

> Unless we block
> further buffer pins by any backend while the flag is set, which I think
> would likely not be a good idea, there seems to be little benefit in
> moving the responsibility.

I concur that we don't want the flag to block other backends from acquiring pins. The whole point here is for VACUUM to lurk in the background until it can proceed with deletion; we don't want it to take priority over foreground queries.

> I think just adding something like
> ...
>         /*
>          * Make sure waiter flag is reset - it might not be if
>          * ProcWaitForSignal() returned for another reason than UnpinBuffer()
>          * signalling us.
>          */
>         LockBufHdr(bufHdr);
>         buf->flags &= ~BM_PIN_COUNT_WAITER;
>         Assert(bufHdr->wait_backend_pid == MyProcPid);
>         UnlockBufHdr(bufHdr);
>
>         PinCountWaitBuf = NULL;
>         /* Loop back and try again */
>     }
>
> to the bottom of the loop would suffice.

No, I disagree.
If we maintain the rule that the signaler clears BM_PIN_COUNT_WAITER, then once that happens there is nothing to stop a third party from trying to LockBufferForCleanup on the same buffer (except for table-level locking conventions, which IMO this mechanism shouldn't be dependent on). So this coding would potentially clear the BM_PIN_COUNT_WAITER flag belonging to that third party, and then fail the Assert --- but only in debug builds, not in production, where it would just silently lock up the third-party waiter. So I think having a test to verify that it's still "our" BM_PIN_COUNT_WAITER flag is essential.

            regards, tom lane
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-14 14:10:53 -0500, Tom Lane wrote:
> Andres Freund writes:
> > I don't think it's actually 675333 at fault here. I think it's a
> > long standing bug in LockBufferForCleanup() that can just much easier be
> > hit with the new interrupt code.
>
> > Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal()
> > returns spuriously - something it's documented to possibly do (and which
> > got more likely with the new patches). In the normal case UnpinBuffer()
> > will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll
> > still be set and LockBufferForCleanup() will see it still set.
>
> Yeah, you're right: LockBufferForCleanup has never coped with the
> possibility that ProcWaitForSignal returns prematurely. I'm not sure
> if that was possible when this code was written, but we've got it
> documented as being possible at least back to 8.2. So this needs to
> be fixed in all branches.

Agreed.

> I think it would be smarter to duplicate all the logic
> that's currently in UnlockBuffers(), just to make real sure we don't
> drop somebody else's waiter flag.

ISTM that in LockBufferForCleanup() such a state shouldn't be accepted - it'd be a sign of something going rather badly wrong. I think asserting that it's "our" flag is a good idea, but silently ignoring the fact sounds like a bad plan. As LockBufferForCleanup() really is only safe when holding a SUE lock or heavier (otherwise one wait_backend_pid field obviously would not be sufficient), there should never ever be another waiter.

> > If you just gdb into the VACUUM process with 6647248e370884 checked out,
> > and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
> > we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
> > LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
> > NULL. Afaics, that should do the trick.
> If we're moving the responsibility for clearing that flag from the waker
> to the wakee,

I'm not sure if that's the best plan. Some buffers are pinned at an incredible rate; sending a signal every time might actually delay the pincount waiter from actually getting through the loop. Unless we block further buffer pins by any backend while the flag is set, which I think would likely not be a good idea, there seems to be little benefit in moving the responsibility.

The least invasive fix would be to weaken the error check to not trigger if it's not the first iteration through the loop... But that's not particularly pretty.

I think just adding something like

    ...
        /*
         * Make sure waiter flag is reset - it might not be if
         * ProcWaitForSignal() returned for another reason than UnpinBuffer()
         * signalling us.
         */
        LockBufHdr(bufHdr);
        buf->flags &= ~BM_PIN_COUNT_WAITER;
        Assert(bufHdr->wait_backend_pid == MyProcPid);
        UnlockBufHdr(bufHdr);

        PinCountWaitBuf = NULL;
        /* Loop back and try again */
    }

to the bottom of the loop would suffice. I can't see an extra buffer spinlock cycle mattering in comparison to all the other costs (like ping-ponging around between processes).

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund wrote:
> On 2015-02-14 17:25:00 +0000, Kevin Grittner wrote:
>>> I think we should simply move the
>>> buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer)
>>
>> I think you meant inside UnpinBuffer?
>
> No, LockBufferHdr. What I meant was that the pincount can only be
> manipulated while the buffer header spinlock is held.

Oh, I see what you were saying -- I had read that a different way entirely. Got it.

>> Even though it appears to be a long-standing bug, there don't
>> appear to have been any field reports, so it doesn't seem like
>> something to back-patch.
>
> I was wondering about that as well. But I don't think I agree.
> The most likely scenario for this to fail is in full table
> vacuums that have to freeze rows - those are primarily triggered
> by autovacuum. I don't think it's likely that such an error
> message would be discovered in the logs unless it happens very
> regularly.

I guess we have some time before the next minor release to find any problems with this; perhaps the benefit would outweigh the risk. Anyone else want to weigh in on that?

> You can't manipulate flags without holding the spinlock.
> Otherwise you (or the other writer) can easily cancel the other
> side's effects.

So is the attached more like what you had in mind? If not, feel free to post a patch.
:-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e1e6240..6640172 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1548,7 +1548,6 @@ UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
             /* we just released the last pin other than the waiter's */
             int         wait_backend_pid = buf->wait_backend_pid;
 
-            buf->flags &= ~BM_PIN_COUNT_WAITER;
             UnlockBufHdr(buf);
             ProcSendSignal(wait_backend_pid);
         }
@@ -3273,6 +3272,13 @@ LockBufferForCleanup(Buffer buffer)
         else
             ProcWaitForSignal();
 
+        /*
+         * Clear the flag unconditionally here, because otherwise a spurious
+         * signal (which is allowed) could make it look like an error.
+         */
+        LockBufHdr(bufHdr);
+        bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
+        UnlockBufHdr(bufHdr);
         PinCountWaitBuf = NULL;
         /* Loop back and try again */
     }
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund writes:
> I don't think it's actually 675333 at fault here. I think it's a
> long standing bug in LockBufferForCleanup() that can just much easier be
> hit with the new interrupt code.
>
> Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal()
> returns spuriously - something it's documented to possibly do (and which
> got more likely with the new patches). In the normal case UnpinBuffer()
> will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll
> still be set and LockBufferForCleanup() will see it still set.

Yeah, you're right: LockBufferForCleanup has never coped with the possibility that ProcWaitForSignal returns prematurely. I'm not sure if that was possible when this code was written, but we've got it documented as being possible at least back to 8.2. So this needs to be fixed in all branches.

> If you just gdb into the VACUUM process with 6647248e370884 checked out,
> and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
> we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
> LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
> NULL. Afaics, that should do the trick.

If we're moving the responsibility for clearing that flag from the waker to the wakee, I think it would be smarter to duplicate all the logic that's currently in UnlockBuffers(), just to make real sure we don't drop somebody else's waiter flag. So the bottom of the loop would look more like this:

        LockBufHdr(bufHdr);
        if ((bufHdr->flags & BM_PIN_COUNT_WAITER) != 0 &&
            bufHdr->wait_backend_pid == MyProcPid)
        {
            /* Release hold on the BM_PIN_COUNT_WAITER bit */
            bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
            PinCountWaitBuf = NULL;
            /* optionally, we could check for pin count 1 here ... */
        }
        UnlockBufHdr(bufHdr);
        /* Loop back and try again */

Also we should rethink at least the comment in UnlockBuffers().
I'm not sure what the failure conditions are with this reassignment of responsibility, but the described case couldn't occur anymore.

            regards, tom lane
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-14 17:25:00 +0000, Kevin Grittner wrote:
> Andres Freund wrote:
> > Imagine what happens in LockBufferForCleanup() when
> > ProcWaitForSignal() returns spuriously - something it's
> > documented to possibly do (and which got more likely with the new
> > patches). In the normal case UnpinBuffer() will have unset
> > BM_PIN_COUNT_WAITER - but in a spurious return it'll still be set
> > and LockBufferForCleanup() will see it still set.
>
> That analysis makes sense to me.
>
> > I think we should simply move the
> > buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer)
>
> I think you meant inside UnpinBuffer?

No, LockBufferHdr. What I meant was that the pincount can only be manipulated while the buffer header spinlock is held.

> > to LockBufferForCleanup, besides the PinCountWaitBuf = NULL.
> > Afaics, that should do the trick.
>
> I tried that on the master branch (33e879c) (attached) and it
> passes `make check-world` with no problems. I'm reviewing the
> places that BM_PIN_COUNT_WAITER appears, to see if I can spot any
> flaw in this. Does anyone else see a problem with it? Even though
> it appears to be a long-standing bug, there don't appear to have
> been any field reports, so it doesn't seem like something to
> back-patch.

I was wondering about that as well. But I don't think I agree. The most likely scenario for this to fail is in full-table vacuums that have to freeze rows - those are primarily triggered by autovacuum. I don't think it's likely that such an error message would be discovered in the logs unless it happens very regularly.
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index e1e6240..40b2194 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -1548,7 +1548,6 @@ UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
>              /* we just released the last pin other than the waiter's */
>              int         wait_backend_pid = buf->wait_backend_pid;
>  
> -            buf->flags &= ~BM_PIN_COUNT_WAITER;
>              UnlockBufHdr(buf);
>              ProcSendSignal(wait_backend_pid);
>          }
> @@ -3273,6 +3272,7 @@ LockBufferForCleanup(Buffer buffer)
>          else
>              ProcWaitForSignal();
>  
> +        bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
>          PinCountWaitBuf = NULL;
>          /* Loop back and try again */
>      }

You can't manipulate flags without holding the spinlock. Otherwise you (or the other writer) can easily cancel the other side's effects.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund wrote:
> I don't think it's actually 675333 at fault here. I think it's a
> long standing bug in LockBufferForCleanup() that can just much
> easier be hit with the new interrupt code.

The patches I'll be posting soon make it even easier to hit, which is why I was trying to sort this out when Tom noticed the buildfarm issues.

> Imagine what happens in LockBufferForCleanup() when
> ProcWaitForSignal() returns spuriously - something it's
> documented to possibly do (and which got more likely with the new
> patches). In the normal case UnpinBuffer() will have unset
> BM_PIN_COUNT_WAITER - but in a spurious return it'll still be set
> and LockBufferForCleanup() will see it still set.

That analysis makes sense to me.

> I think we should simply move the
> buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer)

I think you meant inside UnpinBuffer?

> to LockBufferForCleanup, besides the PinCountWaitBuf = NULL.
> Afaics, that should do the trick.

I tried that on the master branch (33e879c) (attached) and it passes `make check-world` with no problems. I'm reviewing the places that BM_PIN_COUNT_WAITER appears, to see if I can spot any flaw in this. Does anyone else see a problem with it? Even though it appears to be a long-standing bug, there don't appear to have been any field reports, so it doesn't seem like something to back-patch.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e1e6240..40b2194 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1548,7 +1548,6 @@ UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
             /* we just released the last pin other than the waiter's */
             int         wait_backend_pid = buf->wait_backend_pid;
 
-            buf->flags &= ~BM_PIN_COUNT_WAITER;
             UnlockBufHdr(buf);
             ProcSendSignal(wait_backend_pid);
         }
@@ -3273,6 +3272,7 @@ LockBufferForCleanup(Buffer buffer)
         else
             ProcWaitForSignal();
 
+        bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
         PinCountWaitBuf = NULL;
         /* Loop back and try again */
     }
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-13 23:05:16 +0000, Kevin Grittner wrote:
> Andres Freund wrote:
> > How did you get to that recipe?
>
> I have been working on some patches to allow vacuum to function in
> the face of long-held snapshots. (I'm struggling to get them into
> presentable shape for the upcoming CF.) I was devising the most
> diabolical cases I could to try to break my patched code and
> started seeing this error. I was panicked that I had introduced
> the bug, but on comparing to the master branch I found I was able
> to cause it there, too. So I saw this a couple days before the
> report on list, and had some cases that *sometimes* caused the
> error. I tweaked until it seemed to be pretty reliable, and then
> used that for the bisect.
>
> I still consider you to be the uncontested champion of diabolical
> test cases, but I'm happy to have hit upon one that was useful
> here. ;-)

Hah. Not sure if that's something to be proud of :P

I don't think it's actually 675333 at fault here. I think it's a long standing bug in LockBufferForCleanup() that can just much easier be hit with the new interrupt code.

Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal() returns spuriously - something it's documented to possibly do (and which got more likely with the new patches). In the normal case UnpinBuffer() will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll still be set and LockBufferForCleanup() will see it still set.

If you just gdb into the VACUUM process with 6647248e370884 checked out, and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf = NULL. Afaics, that should do the trick.
Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund wrote:
> How did you get to that recipe?

I have been working on some patches to allow vacuum to function in the face of long-held snapshots. (I'm struggling to get them into presentable shape for the upcoming CF.) I was devising the most diabolical cases I could to try to break my patched code and started seeing this error. I was panicked that I had introduced the bug, but on comparing to the master branch I found I was able to cause it there, too. So I saw this a couple days before the report on list, and had some cases that *sometimes* caused the error. I tweaked until it seemed to be pretty reliable, and then used that for the bisect.

I still consider you to be the uncontested champion of diabolical test cases, but I'm happy to have hit upon one that was useful here. ;-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-13 22:33:35 +0000, Kevin Grittner wrote:
> Andres Freund wrote:
> > On 2015-02-13 00:27:04 -0500, Tom Lane wrote:
> > > I'd say we have a problem. I'd even go so far as to say that
> > > somebody has completely broken locking, because this looks like
> > > autovacuum and manual vacuuming are hitting the same table at
> > > the same time.
> >
> > One avenue to look at is my changes around both buffer pinning and
> > interrupt handling...
>
> I found a way to cause this reliably on my machine and did a
> bisect. That pointed to commit 675f55e1d9bcb9da4323556b456583624a07
>
> For the record, I would build and start the cluster, start two psql
> sessions, and paste this into the first session:
>
> drop table if exists m;
> create table m (id int primary key);
> insert into m select generate_series(1, 100) x;
> checkpoint;
> vacuum analyze;
> checkpoint;
> delete from m where id between 50 and 100;
> begin;
> declare c cursor for select * from m;
> fetch c;
> fetch c;
> fetch c;
>
> As soon as I saw the fetches execute I hit Enter on this in the
> other psql session:
>
> vacuum freeze m;
>
> It would block, and then within a minute (i.e., autovacuum_naptime)
> I would get the error.

Great! Thanks for that piece of detective work. I've been travelling until an hour ago and have not looked yet.

How did you get to that recipe?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund wrote:
> On 2015-02-13 00:27:04 -0500, Tom Lane wrote:
>> I'd say we have a problem. I'd even go so far as to say that
>> somebody has completely broken locking, because this looks like
>> autovacuum and manual vacuuming are hitting the same table at
>> the same time.
>
> One avenue to look at is my changes around both buffer pinning and
> interrupt handling...

I found a way to cause this reliably on my machine and did a bisect. That pointed to commit 675f55e1d9bcb9da4323556b456583624a07

For the record, I would build and start the cluster, start two psql sessions, and paste this into the first session:

drop table if exists m;
create table m (id int primary key);
insert into m select generate_series(1, 100) x;
checkpoint;
vacuum analyze;
checkpoint;
delete from m where id between 50 and 100;
begin;
declare c cursor for select * from m;
fetch c;
fetch c;
fetch c;

As soon as I saw the fetches execute I hit Enter on this in the other psql session:

vacuum freeze m;

It would block, and then within a minute (i.e., autovacuum_naptime) I would get the error.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-13 00:27:04 -0500, Tom Lane wrote:
> Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly
> the same failure pattern on HEAD:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tick&dt=2015-02-12%2010%3A22%3A57

Those are rather strange, yeah. Unfortunately both report a relatively large number of changes since the last run...

> I'd say we have a problem. I'd even go so far as to say that somebody has
> completely broken locking, because this looks like autovacuum and manual
> vacuuming are hitting the same table at the same time.

Hm. It seems likely that that would show up more widely. Oddly enough, other CLOBBER_CACHE animals that run more frequently, like http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2015-02-12%2013%3A03%3A00 , have not reported a problem. Neither has leech, which IIRC runs on the same system...

One avenue to look at is my changes around both buffer pinning and interrupt handling...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] "multiple backends attempting to wait for pincount 1"
Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly the same failure pattern on HEAD:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tick&dt=2015-02-12%2010%3A22%3A57

I'd say we have a problem. I'd even go so far as to say that somebody has completely broken locking, because this looks like autovacuum and manual vacuuming are hitting the same table at the same time.

            regards, tom lane