Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-17 13:14:00 -0500, Tom Lane wrote:
> Hm, good point. On the other hand, should we worry about the possibility
> of a lost signal? Moving the flag-clearing would guard against that,
> which the current code does not. But we've not seen field reports of such
> issues AFAIR, so this might not be an important consideration.

I think if there were lost signals there'd be much bigger problems, given that the same (or, in master, similar) mechanics are used for a lot of other things, including the heavyweight and lightweight lock wait queues.

> > ...
> >         /*
> >          * Make sure waiter flag is reset - it might not be if
> >          * ProcWaitForSignal() returned for another reason than UnpinBuffer()
> >          * signalling us.
> >          */
> >         LockBufHdr(bufHdr);
> >         buf->flags &= ~BM_PIN_COUNT_WAITER;
> >         Assert(bufHdr->wait_backend_pid == MyProcPid);
> >         UnlockBufHdr(bufHdr);
> >
> >         PinCountWaitBuf = NULL;
> >         /* Loop back and try again */
> >     }
> >
> > to the bottom of the loop would suffice.
>
> No, I disagree. If we maintain the rule that the signaler clears
> BM_PIN_COUNT_WAITER, then once that happens there is nothing to stop a
> third party from trying to LockBufferForCleanup on the same buffer (except
> for table-level locking conventions, which IMO this mechanism shouldn't be
> dependent on). So this coding would potentially clear the
> BM_PIN_COUNT_WAITER flag belonging to that third party, and then fail the
> Assert --- but only in debug builds, not in production, where it would
> just silently lock up the third-party waiter. So I think having a test to
> verify that it's still "our" BM_PIN_COUNT_WAITER flag is essential.

Pushed with a test guarding against that. I still think it might be slightly better to error out if somebody else waits, but I guess it's unlikely that we'd mistakenly add code doing that.
Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 02/13/2015 06:27 AM, Tom Lane wrote:
> Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly
> the same failure pattern on HEAD:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tick&dt=2015-02-12%2010%3A22%3A57
>
> I'd say we have a problem. I'd even go so far as to say that somebody has
> completely broken locking, because this looks like autovacuum and manual
> vacuuming are hitting the same table at the same time.

fwiw - it looks like spoonbill (not doing CLOBBER_CACHE_ALWAYS) managed to trigger that one as well:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2015-02-23%2000%3A00%3A06

There are also some failures from the BETWEEN changes in that regression.diffs, but those might be fallout from the above problem.

Stefan
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund writes:
> On 2015-02-14 14:10:53 -0500, Tom Lane wrote:
>> Andres Freund writes:
>>> If you just gdb into the VACUUM process with 6647248e370884 checked out,
>>> and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
>>> we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
>>> LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
>>> NULL. Afaics, that should do the trick.

>> If we're moving the responsibility for clearing that flag from the waker
>> to the wakee,

> I'm not sure if that's the best plan. Some buffers are pinned at an
> incredible rate, sending a signal every time might actually delay the
> pincount waiter from actually getting through the loop.

Hm, good point. On the other hand, should we worry about the possibility of a lost signal? Moving the flag-clearing would guard against that, which the current code does not. But we've not seen field reports of such issues AFAIR, so this might not be an important consideration.

> Unless we block
> further buffer pins by any backend while the flag is set, which I think
> would likely not be a good idea, there seems to be little benefit in
> moving the responsibility.

I concur that we don't want the flag to block other backends from acquiring pins. The whole point here is for VACUUM to lurk in the background until it can proceed with deletion; we don't want it to take priority over foreground queries.

> I think just adding something like
> ...
>         /*
>          * Make sure waiter flag is reset - it might not be if
>          * ProcWaitForSignal() returned for another reason than UnpinBuffer()
>          * signalling us.
>          */
>         LockBufHdr(bufHdr);
>         buf->flags &= ~BM_PIN_COUNT_WAITER;
>         Assert(bufHdr->wait_backend_pid == MyProcPid);
>         UnlockBufHdr(bufHdr);
>
>         PinCountWaitBuf = NULL;
>         /* Loop back and try again */
>     }
>
> to the bottom of the loop would suffice.

No, I disagree.
If we maintain the rule that the signaler clears BM_PIN_COUNT_WAITER, then once that happens there is nothing to stop a third party from trying to LockBufferForCleanup on the same buffer (except for table-level locking conventions, which IMO this mechanism shouldn't be dependent on). So this coding would potentially clear the BM_PIN_COUNT_WAITER flag belonging to that third party, and then fail the Assert --- but only in debug builds, not in production, where it would just silently lock up the third-party waiter. So I think having a test to verify that it's still "our" BM_PIN_COUNT_WAITER flag is essential.

            regards, tom lane
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-14 14:10:53 -0500, Tom Lane wrote:
> Andres Freund writes:
> > I don't think it's actually 675333 at fault here. I think it's a
> > long standing bug in LockBufferForCleanup() that can just much easier be
> > hit with the new interrupt code.
>
> > Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal()
> > returns spuriously - something it's documented to possibly do (and which
> > got more likely with the new patches). In the normal case UnpinBuffer()
> > will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll
> > still be set and LockBufferForCleanup() will see it still set.
>
> Yeah, you're right: LockBufferForCleanup has never coped with the
> possibility that ProcWaitForSignal returns prematurely. I'm not sure
> if that was possible when this code was written, but we've got it
> documented as being possible at least back to 8.2. So this needs to
> be fixed in all branches.

Agreed.

> I think it would be smarter to duplicate all the logic
> that's currently in UnlockBuffers(), just to make real sure we don't
> drop somebody else's waiter flag.

ISTM that in LockBufferForCleanup() such a state shouldn't be accepted - it'd be a sign of something going rather badly wrong. I think asserting that it's "our" flag is a good idea, but silently ignoring the fact sounds like a bad plan. As LockBufferForCleanup() really is only safe when holding a SUE lock or heavier (otherwise one wait_backend_pid field obviously would not be sufficient), there should never ever be another waiter.

> > If you just gdb into the VACUUM process with 6647248e370884 checked out,
> > and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
> > we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
> > LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
> > NULL. Afaics, that should do the trick.
> If we're moving the responsibility for clearing that flag from the waker
> to the wakee,

I'm not sure if that's the best plan. Some buffers are pinned at an incredible rate; sending a signal every time might actually delay the pincount waiter from actually getting through the loop. Unless we block further buffer pins by any backend while the flag is set, which I think would likely not be a good idea, there seems to be little benefit in moving the responsibility.

The least invasive fix would be to weaken the error check to not trigger if it's not the first iteration through the loop... But that's not particularly pretty.

I think just adding something like

    ...
        /*
         * Make sure waiter flag is reset - it might not be if
         * ProcWaitForSignal() returned for another reason than UnpinBuffer()
         * signalling us.
         */
        LockBufHdr(bufHdr);
        buf->flags &= ~BM_PIN_COUNT_WAITER;
        Assert(bufHdr->wait_backend_pid == MyProcPid);
        UnlockBufHdr(bufHdr);

        PinCountWaitBuf = NULL;
        /* Loop back and try again */
    }

to the bottom of the loop would suffice. I can't see an extra buffer spinlock cycle mattering in comparison to all the other costs (like ping-ponging around between processes).

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund wrote:
> On 2015-02-14 17:25:00 +0000, Kevin Grittner wrote:
>>> I think we should simply move the
>>> buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer)
>>
>> I think you meant inside UnpinBuffer?
>
> No, LockBufferHdr. What I meant was that the pincount can only be
> manipulated while the buffer header spinlock is held.

Oh, I see what you were saying -- I had read that a different way entirely. Got it.

>> Even though it appears to be a long-standing bug, there don't
>> appear to have been any field reports, so it doesn't seem like
>> something to back-patch.
>
> I was wondering about that as well. But I don't think I agree.
> The most likely scenario for this to fail is in full table
> vacuums that have to freeze rows - those are primarily triggered
> by autovacuum. I don't think it's likely that such an error
> message would be discovered in the logs unless it happens very
> regularly.

I guess we have some time before the next minor release to find any problems with this; perhaps the benefit would outweigh the risk. Anyone else want to weigh in on that?

> You can't manipulate flags without holding the spinlock.
> Otherwise you (or the other writer) can easily cancel the other
> side's effects.

So is the attached more like what you had in mind? If not, feel free to post a patch.
:-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e1e6240..6640172 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1548,7 +1548,6 @@ UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
             /* we just released the last pin other than the waiter's */
             int         wait_backend_pid = buf->wait_backend_pid;
 
-            buf->flags &= ~BM_PIN_COUNT_WAITER;
             UnlockBufHdr(buf);
             ProcSendSignal(wait_backend_pid);
         }
@@ -3273,6 +3272,13 @@ LockBufferForCleanup(Buffer buffer)
         else
             ProcWaitForSignal();
 
+        /*
+         * Clear the flag unconditionally here, because otherwise a spurious
+         * signal (which is allowed) could make it look like an error.
+         */
+        LockBufHdr(bufHdr);
+        bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
+        UnlockBufHdr(bufHdr);
         PinCountWaitBuf = NULL;
         /* Loop back and try again */
     }
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund writes:
> I don't think it's actually 675333 at fault here. I think it's a
> long standing bug in LockBufferForCleanup() that can just much easier be
> hit with the new interrupt code.
>
> Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal()
> returns spuriously - something it's documented to possibly do (and which
> got more likely with the new patches). In the normal case UnpinBuffer()
> will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll
> still be set and LockBufferForCleanup() will see it still set.

Yeah, you're right: LockBufferForCleanup has never coped with the possibility that ProcWaitForSignal returns prematurely. I'm not sure if that was possible when this code was written, but we've got it documented as being possible at least back to 8.2. So this needs to be fixed in all branches.

> If you just gdb into the VACUUM process with 6647248e370884 checked out,
> and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think
> we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside
> LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf =
> NULL. Afaics, that should do the trick.

If we're moving the responsibility for clearing that flag from the waker to the wakee, I think it would be smarter to duplicate all the logic that's currently in UnlockBuffers(), just to make real sure we don't drop somebody else's waiter flag. So the bottom of the loop would look more like this:

        LockBufHdr(bufHdr);
        if ((bufHdr->flags & BM_PIN_COUNT_WAITER) != 0 &&
            bufHdr->wait_backend_pid == MyProcPid)
        {
            /* Release hold on the BM_PIN_COUNT_WAITER bit */
            bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
            PinCountWaitBuf = NULL;
            /* optionally, we could check for pin count 1 here ... */
        }
        UnlockBufHdr(bufHdr);
        /* Loop back and try again */

Also we should rethink at least the comment in UnlockBuffers().
I'm not sure what the failure conditions are with this reassignment of responsibility, but the described case couldn't occur anymore.

            regards, tom lane
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-14 17:25:00 +0000, Kevin Grittner wrote:
> Andres Freund wrote:
> > Imagine what happens in LockBufferForCleanup() when
> > ProcWaitForSignal() returns spuriously - something it's
> > documented to possibly do (and which got more likely with the new
> > patches). In the normal case UnpinBuffer() will have unset
> > BM_PIN_COUNT_WAITER - but in a spurious return it'll still be set
> > and LockBufferForCleanup() will see it still set.
>
> That analysis makes sense to me.
>
> > I think we should simply move the
> > buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer)
>
> I think you meant inside UnpinBuffer?

No, LockBufferHdr. What I meant was that the pincount can only be manipulated while the buffer header spinlock is held.

> > to LockBufferForCleanup, besides the PinCountWaitBuf = NULL.
> > Afaics, that should do the trick.
>
> I tried that on the master branch (33e879c) (attached) and it
> passes `make check-world` with no problems. I'm reviewing the
> places that BM_PIN_COUNT_WAITER appears, to see if I can spot any
> flaw in this. Does anyone else see a problem with it? Even though
> it appears to be a long-standing bug, there don't appear to have
> been any field reports, so it doesn't seem like something to
> back-patch.

I was wondering about that as well. But I don't think I agree. The most likely scenario for this to fail is in full-table vacuums that have to freeze rows - those are primarily triggered by autovacuum. I don't think it's likely that such an error message would be discovered in the logs unless it happens very regularly.
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index e1e6240..40b2194 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -1548,7 +1548,6 @@ UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
>              /* we just released the last pin other than the waiter's */
>              int         wait_backend_pid = buf->wait_backend_pid;
>  
> -            buf->flags &= ~BM_PIN_COUNT_WAITER;
>              UnlockBufHdr(buf);
>              ProcSendSignal(wait_backend_pid);
>          }
> @@ -3273,6 +3272,7 @@ LockBufferForCleanup(Buffer buffer)
>          else
>              ProcWaitForSignal();
>  
> +        bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
>          PinCountWaitBuf = NULL;
>          /* Loop back and try again */
>      }

You can't manipulate flags without holding the spinlock. Otherwise you (or the other writer) can easily cancel the other side's effects.

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund wrote:
> I don't think it's actually 675333 at fault here. I think it's a
> long standing bug in LockBufferForCleanup() that can just much
> easier be hit with the new interrupt code.

The patches I'll be posting soon make it even easier to hit, which is why I was trying to sort this out when Tom noticed the buildfarm issues.

> Imagine what happens in LockBufferForCleanup() when
> ProcWaitForSignal() returns spuriously - something it's
> documented to possibly do (and which got more likely with the new
> patches). In the normal case UnpinBuffer() will have unset
> BM_PIN_COUNT_WAITER - but in a spurious return it'll still be set
> and LockBufferForCleanup() will see it still set.

That analysis makes sense to me.

> I think we should simply move the
> buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer)

I think you meant inside UnpinBuffer?

> to LockBufferForCleanup, besides the PinCountWaitBuf = NULL.
> Afaics, that should do the trick.

I tried that on the master branch (33e879c) (attached) and it passes `make check-world` with no problems. I'm reviewing the places that BM_PIN_COUNT_WAITER appears, to see if I can spot any flaw in this. Does anyone else see a problem with it? Even though it appears to be a long-standing bug, there don't appear to have been any field reports, so it doesn't seem like something to back-patch.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e1e6240..40b2194 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1548,7 +1548,6 @@ UnpinBuffer(volatile BufferDesc *buf, bool fixOwner)
             /* we just released the last pin other than the waiter's */
             int         wait_backend_pid = buf->wait_backend_pid;
 
-            buf->flags &= ~BM_PIN_COUNT_WAITER;
             UnlockBufHdr(buf);
             ProcSendSignal(wait_backend_pid);
         }
@@ -3273,6 +3272,7 @@ LockBufferForCleanup(Buffer buffer)
         else
             ProcWaitForSignal();
 
+        bufHdr->flags &= ~BM_PIN_COUNT_WAITER;
         PinCountWaitBuf = NULL;
         /* Loop back and try again */
     }
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-13 23:05:16 +0000, Kevin Grittner wrote:
> Andres Freund wrote:
> > How did you get to that recipe?
>
> I have been working on some patches to allow vacuum to function in
> the face of long-held snapshots. (I'm struggling to get them into
> presentable shape for the upcoming CF.) I was devising the most
> diabolical cases I could to try to break my patched code and
> started seeing this error. I was panicked that I had introduced
> the bug, but on comparing to the master branch I found I was able
> to cause it there, too. So I saw this a couple days before the
> report on list, and had some cases that *sometimes* caused the
> error. I tweaked until it seemed to be pretty reliable, and then
> used that for the bisect.
>
> I still consider you to be the uncontested champion of diabolical
> test cases, but I'm happy to have hit upon one that was useful
> here. ;-)

Hah. Not sure if that's something to be proud of :P

I don't think it's actually 675333 at fault here. I think it's a long standing bug in LockBufferForCleanup() that can just much easier be hit with the new interrupt code.

Imagine what happens in LockBufferForCleanup() when ProcWaitForSignal() returns spuriously - something it's documented to possibly do (and which got more likely with the new patches). In the normal case UnpinBuffer() will have unset BM_PIN_COUNT_WAITER - but in a spurious return it'll still be set and LockBufferForCleanup() will see it still set.

If you just gdb into the VACUUM process with 6647248e370884 checked out, and do a PGSemaphoreUnlock(&MyProc->sem) you'll hit it as well. I think we should simply move the buf->flags &= ~BM_PIN_COUNT_WAITER (Inside LockBuffer) to LockBufferForCleanup, besides the PinCountWaitBuf = NULL. Afaics, that should do the trick.
Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund wrote:
> How did you get to that recipe?

I have been working on some patches to allow vacuum to function in the face of long-held snapshots. (I'm struggling to get them into presentable shape for the upcoming CF.) I was devising the most diabolical cases I could to try to break my patched code and started seeing this error. I was panicked that I had introduced the bug, but on comparing to the master branch I found I was able to cause it there, too. So I saw this a couple days before the report on list, and had some cases that *sometimes* caused the error. I tweaked until it seemed to be pretty reliable, and then used that for the bisect.

I still consider you to be the uncontested champion of diabolical test cases, but I'm happy to have hit upon one that was useful here. ;-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-13 22:33:35 +0000, Kevin Grittner wrote:
> Andres Freund wrote:
> > On 2015-02-13 00:27:04 -0500, Tom Lane wrote:
> > > I'd say we have a problem. I'd even go so far as to say that
> > > somebody has completely broken locking, because this looks like
> > > autovacuum and manual vacuuming are hitting the same table at
> > > the same time.
> >
> > One avenue to look at is my changes around both buffer pinning and
> > interrupt handling...
>
> I found a way to cause this reliably on my machine and did a
> bisect. That pointed to commit 675f55e1d9bcb9da4323556b456583624a07
>
> For the record, I would build and start the cluster, start two psql
> sessions, and paste this into the first session:
>
> drop table if exists m;
> create table m (id int primary key);
> insert into m select generate_series(1, 100) x;
> checkpoint;
> vacuum analyze;
> checkpoint;
> delete from m where id between 50 and 100;
> begin;
> declare c cursor for select * from m;
> fetch c;
> fetch c;
> fetch c;
>
> As soon as I saw the fetches execute I hit Enter on this in the
> other psql session:
>
> vacuum freeze m;
>
> It would block, and then within a minute (i.e., autovacuum_naptime)
> I would get the error.

Great! Thanks for that piece of detective work. I've been travelling until an hour ago and have not looked yet.

How did you get to that recipe?

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
Andres Freund wrote:
> On 2015-02-13 00:27:04 -0500, Tom Lane wrote:
>> I'd say we have a problem. I'd even go so far as to say that
>> somebody has completely broken locking, because this looks like
>> autovacuum and manual vacuuming are hitting the same table at
>> the same time.
>
> One avenue to look at is my changes around both buffer pinning and
> interrupt handling...

I found a way to cause this reliably on my machine and did a bisect. That pointed to commit 675f55e1d9bcb9da4323556b456583624a07

For the record, I would build and start the cluster, start two psql sessions, and paste this into the first session:

drop table if exists m;
create table m (id int primary key);
insert into m select generate_series(1, 100) x;
checkpoint;
vacuum analyze;
checkpoint;
delete from m where id between 50 and 100;
begin;
declare c cursor for select * from m;
fetch c;
fetch c;
fetch c;

As soon as I saw the fetches execute I hit Enter on this in the other psql session:

vacuum freeze m;

It would block, and then within a minute (i.e., autovacuum_naptime) I would get the error.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] "multiple backends attempting to wait for pincount 1"
On 2015-02-13 00:27:04 -0500, Tom Lane wrote:
> Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly
> the same failure pattern on HEAD:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tick&dt=2015-02-12%2010%3A22%3A57

Those are rather strange, yeah. Unfortunately both report a relatively large number of changes since the last run...

> I'd say we have a problem. I'd even go so far as to say that somebody has
> completely broken locking, because this looks like autovacuum and manual
> vacuuming are hitting the same table at the same time.

Hm. It seems likely that that would show up more widely. Oddly enough, other CLOBBER_CACHE animals that run more frequently, like http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=jaguarundi&dt=2015-02-12%2013%3A03%3A00 , have not reported a problem. Neither has leech, which IIRC runs on the same system...

One avenue to look at is my changes around both buffer pinning and interrupt handling...

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] "multiple backends attempting to wait for pincount 1"
Two different CLOBBER_CACHE_ALWAYS critters recently reported exactly the same failure pattern on HEAD:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2015-02-06%2011%3A59%3A59
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tick&dt=2015-02-12%2010%3A22%3A57

I'd say we have a problem. I'd even go so far as to say that somebody has completely broken locking, because this looks like autovacuum and manual vacuuming are hitting the same table at the same time.

            regards, tom lane