On Tue, Sep 04, 2012 at 09:46:58AM -0700, Daniel Farina wrote:
> I might try to find the segments leading up to the overflow point and
> try xlogdumping them to see what we can see.

That would be helpful to see.  Just to grasp at yet-flimsier straws, could you
post (URL preferred, else private mail) the output of "objdump -dS" on your
"postgres" executable?

> If there's anything to note about the workload, I'd say that it does
> tend to make fairly pervasive use of long running transactions which
> can span probably more than one checkpoint, and the txid reporting
> functions, and a concurrency level of about 300 or so backends ... but
> per my reading of the mechanism so far, it doesn't seem like any of
> this should matter.

Thanks for the details; I agree none of that sounds suspicious.

After some further pondering and testing, this remains a mystery to me.  These
symptoms imply a proper update of ControlFile->checkPointCopy.nextXid without
having properly updated ControlFile->checkPointCopy.nextXidEpoch.  After
recovery, only CreateCheckPoint() updates ControlFile->checkPointCopy at all.
Its logic for doing so looks simple and correct.
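
For readers following along, the symptom can be modelled with a small
sketch.  This is a toy Python model, not PostgreSQL source: the class and
function names are hypothetical, and it assumes only that a checkpoint bumps
the epoch when the 32-bit nextXid is seen to have gone backwards since the
previous checkpoint, and that the 64-bit txid is formed as (epoch << 32) | xid.
It shows how a stored nextXid that advanced past wraparound without a matching
nextXidEpoch bump makes the reported 64-bit value jump backwards.

```python
from dataclasses import dataclass


@dataclass
class CheckPointCopy:
    """Toy stand-in for the two control-file fields discussed above."""
    next_xid: int        # 32-bit transaction ID counter
    next_xid_epoch: int  # number of times next_xid has wrapped


def create_checkpoint(prev: CheckPointCopy, current_next_xid: int) -> CheckPointCopy:
    """Model of the epoch rule: bump the epoch if the xid counter wrapped."""
    epoch = prev.next_xid_epoch
    if current_next_xid < prev.next_xid:
        # The 32-bit counter is lower than at the last checkpoint,
        # so it must have wrapped around in between.
        epoch += 1
    return CheckPointCopy(current_next_xid, epoch)


def full_txid(ckpt: CheckPointCopy) -> int:
    """Combine epoch and 32-bit xid into a 64-bit value."""
    return (ckpt.next_xid_epoch << 32) | ckpt.next_xid


# State recorded by the last checkpoint before wraparound.
before = CheckPointCopy(next_xid=4_000_000_000, next_xid_epoch=1)

# Expected behaviour: the counter wrapped to 1000 and the epoch is bumped.
good = create_checkpoint(before, 1000)

# The state the symptoms imply: nextXid updated, nextXidEpoch left stale.
stale = CheckPointCopy(next_xid=1000, next_xid_epoch=before.next_xid_epoch)

assert full_txid(good) > full_txid(before)   # 64-bit value keeps increasing
assert full_txid(stale) < full_txid(before)  # 64-bit value goes backwards
```

In this model the only way to reach the "stale" state is to bypass
create_checkpoint(), which is why the observed behaviour is surprising if
CreateCheckPoint() is the sole writer after recovery.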