Hi,

On 2012-12-23 02:36:42 -0500, Greg Smith wrote:
> I'm testing a checkout from a few days ago and trying to complete a day long
> pgbench stress test, with assertions and debugging on.  I want to make sure
> the base code works as expected before moving on to testing checksums.  It's
> crashing before finishing though.  Here's a sample:
>
> 2012-12-20 20:36:11 EST [26613]: LOG:  checkpoint complete: wrote 32 buffers
> (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=3.108
> s, sync=0.978 s, total=4.187 s; sync files=7, longest=0.832 s, average=0.139
> s
> TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line:
> 1703)
> 2012-12-20 20:36:44 EST [26611]: LOG:  server process (PID 4834) was
> terminated by signal 6: Aborted
> 2012-12-20 20:36:44 EST [26611]: DETAIL:  Failed process was running: END;
>
> Running the same test set again gave the same crash, so this looks
> repeatable.  It's not trivial to trigger given the average runtime to crash
> I'm seeing.  Second sample:
>
> 2012-12-22 21:27:17 EST [6006]: LOG:  checkpoint complete: wrote 34 buffers
> (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=3.310
> s, sync=2.906 s, total=6.424 s; sync files=7, longest=2.648 s, average=0.415
> s
> TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line:
> 1703)
> 2012-12-22 21:27:23 EST [26611]: LOG:  server process (PID 12143) was
> terminated by signal 6: Aborted
> 2012-12-22 21:27:23 EST [26611]: DETAIL:  Failed process was running: END;
>
> The important part:
>
> TRAP: FailedAssertion("!(PrivateRefCount[i] == 0)", File: "bufmgr.c", Line:
> 1703)
>
> That's the check done by "AtEOXact_Buffers - clean up at end of
> transaction".  It fired while executing the "END;"" statement at the end of
> the standard pgbench test.  The test looks for loose things that weren't
> pinned/unpinned correctly by the transaction.  I'm not sure if getting a
> stack trace at that point will be useful.  The ref count mistake could
> easily have been made by an earlier statement in the transaction block,
> right?
>
> This is on a server where shared_buffers=2GB, checkpoint_segments=64. Test
> runs were automated with pgbench-tools, using the following configuration:
>
> MAX_WORKERS="4"
> SCRIPT="tpc-b.sql"
> SCALES="500 1000"
> SETCLIENTS="4 8 16 32 64 96"
> SETTIMES=3
> RUNTIME=600
>
> All of the crashes so far have been at scale=500, and I haven't seen those
> finish yet.  It failed 7 hours in the first time, then 4 hours in.
>
> For code reference, the last commit in the source code tree I built against
> was:
>
> af275a12dfeecd621ed9899a9382f26a68310263
> Thu Dec 20 14:23:31 2012 +0200
>
> Looking at recent activity, the only buffer pin related changes I saw was
> this one:
>
> commit 62656617dbe49cca140f3da588a5e311b3fc35ea
> Date:   Mon Dec 3 18:53:31 2012 +0000
>     Avoid holding vmbuffer pin after VACUUM.

I don't immediately see anything bad here.

> This is my first test like this against 9.3 development though, so the cause
> could be an earlier commit.  I'm just starting with the most recent work as
> the first suspect.  Next I think I'll try autovacuum=off and see if the
> crash goes away.  Other ideas are welcome.

Something like the (untested) debug message below might be helpful:

diff --git a/src/backend/storage/buffer/bufmgr.c 
b/src/backend/storage/buffer/bufmgr.c
index 7be767b..20f61a1 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1700,7 +1700,13 @@ AtEOXact_Buffers(bool isCommit)
 
                for (i = 0; i < NBuffers; i++)
                {
-                       Assert(PrivateRefCount[i] == 0);
+                       if (PrivateRefCount[i] != 0)
+                       {
+                               BufferDesc *bufHdr = &BufferDescriptors[i];
+                               elog(PANIC, "refcount of %s is %u should be 0, 
globally: %u",
+                                    relpathbackend(bufHdr->tag.rnode, 
InvalidBackendId, bufHdr->tag.forkNum),
+                                    PrivateRefCount[i], bufHdr->refcount);
+                       }
                }
        }
 #endif

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to