On Thu, Sep 6, 2012 at 3:04 AM, Noah Misch <n...@leadboat.com> wrote:
> On Tue, Sep 04, 2012 at 09:46:58AM -0700, Daniel Farina wrote:
>> I might try to find the segments leading up to the overflow point and
>> try xlogdumping them to see what we can see.
>
> That would be helpful to see.
>
> Just to grasp at yet-flimsier straws, could you post (URL preferred, else
> private mail) the output of "objdump -dS" on your "postgres" executable?

https://dl.dropbox.com/s/444ktxbrimaguxu/txid-wrap-objdump-dS-postgres.txt.gz

Sure, it's a 9.0.6 with pg_cancel_backend by-same-role backported along
with the standard debian changes, so nothing all that interesting
should be going on that isn't going on normally with compilers on this
platform.  I am also starting to grovel through this assembly, although
I don't have a ton of experience finding problems this way.

To save you a tiny bit of time aligning the assembly with the C, this line

   c797f:       e8 7c c9 17 00          callq  244300 <LWLockAcquire>

Seems to be the beginning of:

    LWLockAcquire(XidGenLock, LW_SHARED);
    checkPoint.nextXid = ShmemVariableCache->nextXid;
    checkPoint.oldestXid = ShmemVariableCache->oldestXid;
    checkPoint.oldestXidDB = ShmemVariableCache->oldestXidDB;
    LWLockRelease(XidGenLock);

>> If there's anything to note about the workload, I'd say that it does
>> tend to make fairly pervasive use of long running transactions which
>> can span probably more than one checkpoint, and the txid reporting
>> functions, and a concurrency level of about 300 or so backends ... but
>> per my reading of the mechanism so far, it doesn't seem like any of
>> this should matter.
>
> Thanks for the details; I agree none of that sounds suspicious.
>
> After some further pondering and testing, this remains a mystery to me.  These
> symptoms imply a proper update of ControlFile->checkPointCopy.nextXid without
> having properly updated ControlFile->checkPointCopy.nextXidEpoch.  After
> recovery, only CreateCheckPoint() updates ControlFile->checkPointCopy at all.
> Its logic for doing so looks simple and correct.

Yeah.  I'm pretty flabbergasted that so much seems to be going right
while this goes wrong.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
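
For anyone following the thread, here is a minimal sketch of the epoch
bookkeeping being discussed.  It is a simplified model, not the
PostgreSQL source: the names XidCheckpoint, advance_checkpoint and
full_xid are hypothetical, and it assumes the rule as I understand it,
namely that a checkpoint carries the previous epoch forward and bumps
it only when the 32-bit nextXid is lower than at the previous
checkpoint.

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical stand-in for the XID fields of a checkpoint record. */
typedef struct
{
    uint32_t      nextXidEpoch;   /* number of 32-bit wraparounds so far */
    TransactionId nextXid;        /* next 32-bit XID to be assigned */
} XidCheckpoint;

/*
 * Build the new checkpoint's XID fields from the previous checkpoint
 * and the current 32-bit counter.  The epoch is carried forward and
 * incremented only if the counter is lower than it was last time,
 * which is taken as evidence of a wraparound.
 */
static XidCheckpoint
advance_checkpoint(XidCheckpoint prev, TransactionId cur_next_xid)
{
    XidCheckpoint cp;

    cp.nextXid = cur_next_xid;
    cp.nextXidEpoch = prev.nextXidEpoch;
    if (cur_next_xid < prev.nextXid)
        cp.nextXidEpoch++;
    return cp;
}

/* Combine epoch and 32-bit XID into one 64-bit value. */
static uint64_t
full_xid(XidCheckpoint cp)
{
    return ((uint64_t) cp.nextXidEpoch << 32) | cp.nextXid;
}
```

In this model, a record whose nextXid has wrapped but whose epoch was
left at the old value yields a 64-bit value smaller than the previous
checkpoint's, which matches the symptom described above: nextXid
updated properly, nextXidEpoch not.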