Re: [BUGS] Re: BUG #7969: Postgres Recovery Fatal With: "incorrect local pin count:2"

Heikki Linnakangas Wed, 27 Mar 2013 12:48:50 -0700

On 27.03.2013 21:04, Heikki Linnakangas wrote:

On 27.03.2013 20:27, Josh Berkus wrote:

Folks,


So I'm a bit surprised that this bug report hasn't gotten a follow-up.
Does this sound like the known 9.2.2 corruption issue, or is it
potentially something else?


It seems like a new issue. At a quick glance, I think there's a bug in
heap_xlog_update, i.e. the redo routine of a heap update. If the new
tuple is put on a different page, and at redo, the new page doesn't
exist (that's normal if it was later vacuumed away), heap_xlog_update
leaks a pin on the old page. Here:

    {
        nbuffer = XLogReadBuffer(xlrec->target.node,
                                 ItemPointerGetBlockNumber(&(xlrec->newtid)),
                                 false);
        if (!BufferIsValid(nbuffer))
            return;
        page = (Page) BufferGetPage(nbuffer);

        if (XLByteLE(lsn, PageGetLSN(page)))    /* changes are applied */
        {
            UnlockReleaseBuffer(nbuffer);
            if (BufferIsValid(obuffer))
                UnlockReleaseBuffer(obuffer);
            return;
        }
    }


Notice how in the first 'return' above, obuffer is not released.
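
To make the leak concrete, here is a toy model in C. It is not PostgreSQL code: pins are plain counters, and every name in it (ToyBuffer, toy_pin, toy_unpin, toy_redo_update) is hypothetical. It mirrors only the control flow of the snippet above, assuming the old page has already been pinned when the lookup of the new page fails.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a buffer; only the pin count is modelled. */
typedef struct ToyBuffer
{
    int pin_count;
} ToyBuffer;

static void
toy_pin(ToyBuffer *buf)
{
    buf->pin_count++;
}

static void
toy_unpin(ToyBuffer *buf)
{
    assert(buf->pin_count > 0);     /* never release an unpinned buffer */
    buf->pin_count--;
}

/*
 * Mimics the early-return structure of the redo routine quoted above.
 * 'nbuf' is NULL when the new page no longer exists at replay time.
 */
static void
toy_redo_update(ToyBuffer *obuf, ToyBuffer *nbuf)
{
    toy_pin(obuf);                  /* old page is pinned first */

    if (nbuf == NULL)
        return;                     /* BUG: obuf is still pinned here */

    toy_pin(nbuf);
    /* ... the change would be applied to both pages here ... */
    toy_unpin(nbuf);
    toy_unpin(obuf);
}
```

Calling toy_redo_update(&old, NULL) on a buffer whose pin_count starts at 0 leaves it at 1, which is the kind of leftover pin that a later consistency check would complain about.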

I'll try to create a reproducible test case for this, and fix it.


Ok, here's how to reproduce it:

create table foo (i int4 primary key);
insert into foo select generate_series(1,1000);
checkpoint;
-- update a tuple from the first page, new tuple goes to last page
update foo set i = 10000 where i = 1;
-- delete everything on pages > 1
delete from foo where i > 10;
-- truncate the table, including the page the updated tuple went to
vacuum verbose foo;

pg_ctl stop -m immediate

This bug was introduced by commit 8805ff6580621d0daee350826de5211d6bb36ec3, in 9.2.2 (and 9.1.7 and 9.0.11), which fixed multiple WAL replay issues with Hot Standby. Before that commit, replaying a heap update didn't try to keep both buffers locked at the same time, which is necessary for the correctness of hot standby. The patch fixed that, but missed releasing the old buffer in this corner case. I was not able to come up with a scenario with full_page_writes=on where this would fail, but I'm also not 100% sure it can't happen.

I scanned through the commit, and couldn't see any other instances of this kind of a bug. heap_xlog_update is more complicated than other redo functions, with all the return statements inside it. It could use some refactoring, but for now, I'll commit the attached small fix.
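
As an illustration only of what such a refactoring might look like (this is neither the committed fix nor PostgreSQL code), the sketch below routes every exit through a single cleanup label, so that no early return can skip a release. The names DemoBuffer, demo_pin, demo_unpin and demo_redo_update are hypothetical, and pins are modelled as plain counters.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a buffer; only the pin count is modelled. */
typedef struct DemoBuffer
{
    int pin_count;
} DemoBuffer;

static void
demo_pin(DemoBuffer *buf)
{
    buf->pin_count++;
}

static void
demo_unpin(DemoBuffer *buf)
{
    assert(buf->pin_count > 0);
    buf->pin_count--;
}

/*
 * Single-exit variant of a two-buffer redo step.
 * 'nbuf' is NULL when the new page no longer exists;
 * 'already_applied' models the "page LSN is newer" case.
 * Returns 1 if the change was applied, 0 if it was skipped.
 */
static int
demo_redo_update(DemoBuffer *obuf, DemoBuffer *nbuf, int already_applied)
{
    int applied = 0;
    int npinned = 0;

    demo_pin(obuf);

    if (nbuf == NULL)
        goto cleanup;               /* new page is gone: nothing to do */

    demo_pin(nbuf);
    npinned = 1;

    if (already_applied)
        goto cleanup;               /* changes are already on the page */

    /* ... the change would be applied to both pages here ... */
    applied = 1;

cleanup:
    /* Every path ends here, so each pin taken above is released once. */
    if (npinned)
        demo_unpin(nbuf);
    demo_unpin(obuf);
    return applied;
}
```

With this shape, adding another early exit only requires a goto to the label rather than repeating the release calls at each return.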


- Heikki

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 595dead..860fd20 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5367,7 +5367,11 @@ newt:;
 								 ItemPointerGetBlockNumber(&(xlrec->newtid)),
 								 false);
 		if (!BufferIsValid(nbuffer))
+		{
+			if (BufferIsValid(obuffer))
+				UnlockReleaseBuffer(obuffer);
 			return;
+		}
 		page = (Page) BufferGetPage(nbuffer);
 
 		if (XLByteLE(lsn, PageGetLSN(page)))	/* changes are applied */

