On 2016-05-05 11:52:46 -0700, Andres Freund wrote:
> Hi Jeff,
>
> On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:
> > I don't see the problem with a cassert-enabled build, probably because
> > it is just too slow to ever reach the point where the problem occurs.
>
> Running the test with cassert enabled I actually get assertion failures,
> due to the FATAL you added.
>
> #1  0x0000000000958dde in ExceptionalCondition (conditionName=0xb36c2a "!(RefCountErrors == 0)", errorType=0xb361af "FailedAssertion", fileName=0xb36170 "/home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c", lineNumber=2506) at /home/admin/src/postgresql/src/backend/utils/error/assert.c:54
> #2  0x00000000007c9fc9 in CheckForBufferLeaks () at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2506
> #3  0x00000000007c9f09 in AtProcExit_Buffers (code=1, arg=0) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2459
> #4  0x00000000007d927f in shmem_exit (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:261
> #5  0x00000000007d90dd in proc_exit_prepare (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:185
> #6  0x00000000007d904b in proc_exit (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:102
> #7  0x000000000095958d in errfinish (dummy=0) at /home/admin/src/postgresql/src/backend/utils/error/elog.c:543
> #8  0x000000000080214b in mdwrite (reln=0x2e8b4a8, forknum=MAIN_FORKNUM, blocknum=154, buffer=0x2e8e5a8 "", skipFsync=0 '\000') at /home/admin/src/postgresql/src/backend/storage/smgr/md.c:832
> #9  0x0000000000804633 in smgrwrite (reln=0x2e8b4a8, forknum=MAIN_FORKNUM, blocknum=154, buffer=0x2e8e5a8 "", skipFsync=0 '\000') at /home/admin/src/postgresql/src/backend/storage/smgr/smgr.c:650
> #10 0x00000000007ca548 in FlushBuffer (buf=0x7f0285955330, reln=0x2e8b4a8) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2734
> #11 0x00000000007c9d5a in SyncOneBuffer (buf_id=2503, skip_recently_used=0 '\000', wb_context=0x7ffe7305d290) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2377
> #12 0x00000000007c964e in BufferSync (flags=64) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:1967
> #13 0x00000000007ca185 in CheckPointBuffers (flags=64) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2561
> #14 0x000000000052d497 in CheckPointGuts (checkPointRedo=382762776, flags=64) at /home/admin/src/postgresql/src/backend/access/transam/xlog.c:8644
> #15 0x000000000052cede in CreateCheckPoint (flags=64) at /home/admin/src/postgresql/src/backend/access/transam/xlog.c:8430
> #16 0x00000000007706ac in CheckpointerMain () at /home/admin/src/postgresql/src/backend/postmaster/checkpointer.c:488
> #17 0x000000000053e0d5 in AuxiliaryProcessMain (argc=2, argv=0x7ffe7305ea40) at /home/admin/src/postgresql/src/backend/bootstrap/bootstrap.c:429
> #18 0x000000000078099f in StartChildProcess (type=CheckpointerProcess) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:5227
> #19 0x000000000077dcc3 in reaper (postgres_signal_arg=17) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:2781
> #20 <signal handler called>
> #21 0x00007f028ebbdac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
> #22 0x000000000077c049 in ServerLoop () at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:1654
> #23 0x000000000077b7a9 in PostmasterMain (argc=4, argv=0x2e49f20) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:1298
> #24 0x00000000006c5849 in main (argc=4, argv=0x2e49f20) at /home/admin/src/postgresql/src/backend/main/main.c:228
>
> You didn't see those?
>
> The trigger here appears to be that the checkpointer doesn't have an
> on-exit callback similar to a normal backend's ShutdownPostgres() et al,
> and thus doesn't trigger a resource owner release.
> The normal ERROR path has
>
> 		/* buffer pins are released here: */
> 		ResourceOwnerRelease(CurrentResourceOwner,
> 							 RESOURCE_RELEASE_BEFORE_LOCKS,
> 							 false, true);
> 		/* we needn't bother with the other ResourceOwnerRelease phases */
>
> That clearly is a bug. But I'm not immediately seeing how this could
> trigger the corruption issue you observed.
The same issue exists in bgwriter afaics.

ISTM that we need to provide a before_shmem_exit (or on_shmem_exit?)
handler for both, which essentially does

	/*
	 * These operations are really just a minimal subset of
	 * AbortTransaction().  We don't have very many resources to worry
	 * about in bgwriter, but we do have LWLocks, buffers, and temp files.
	 */
	LWLockReleaseAll();
	AbortBufferIO();
	UnlockBuffers();
	/* buffer pins are released here: */
	ResourceOwnerRelease(CurrentResourceOwner,
						 RESOURCE_RELEASE_BEFORE_LOCKS,
						 false, true);

It looks to me like that should be backpatched?

There's some question about how to make the ordering vs. AtProcExit_Buffers
robust, which is why above I'm explicitly doing
LWLockReleaseAll/AbortBufferIO/UnlockBuffers. Any better ideas?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
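[Editor's note: the proposal above could be sketched as the following exit
callback. This is an illustration only, not a committed patch; the callback
name ShutdownAuxProcess and the suggested registration points in
CheckpointerMain()/BackgroundWriterMain() are assumptions, while
before_shmem_exit() and the release functions are the existing APIs the
mail refers to.]

```c
/*
 * Hypothetical sketch of the proposed handler.  The signature matches
 * pg_on_exit_callback as declared in storage/ipc.h.
 */
#include "postgres.h"

#include "storage/bufmgr.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "utils/resowner.h"

static void
ShutdownAuxProcess(int code, Datum arg)
{
	/*
	 * A minimal subset of AbortTransaction(): release LWLocks and
	 * buffer I/O state explicitly, rather than relying on the ordering
	 * relative to AtProcExit_Buffers().
	 */
	LWLockReleaseAll();
	AbortBufferIO();
	UnlockBuffers();

	/* buffer pins are released here: */
	ResourceOwnerRelease(CurrentResourceOwner,
						 RESOURCE_RELEASE_BEFORE_LOCKS,
						 false, true);
}

/*
 * In CheckpointerMain() / BackgroundWriterMain(), after the process's
 * ResourceOwner has been created:
 *
 *		before_shmem_exit(ShutdownAuxProcess, 0);
 */
```

Because shmem_exit() runs the before_shmem_exit() callbacks ahead of the
on_shmem_exit() ones (which include AtProcExit_Buffers), the buffer pins
would be dropped before the leak check in CheckForBufferLeaks() fires.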