On Oct 26, 2011, at 17:36, Chris Redekop wrote:
> > And I think they also reported that if they didn't run hot standby,
> > but just normal recovery into a new master, it didn't have the problem
> > either, i.e. without hotstandby, recovery ran, properly extended the
> > clog, and then ran as a new master fine.
>
> Yes this is correct...attempting to start as hotstandby will produce the
> pg_clog error repeatedly and then without changing anything else, just
> turning hot standby off it will start up successfully.

Yup, because with hot standby disabled (on the client side), StartupCLOG()
happens after recovery has completed. That, at the very least, makes the
problem very unlikely to occur in the non-hot-standby case. I'm not sure
it's completely impossible, though.

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG() failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how:

1) We start a checkpoint, and get as far as LogStandbySnapshot().

2) A backend does AssignTransactionId, and gets as far as
   GetNewTransactionId(). The assigned XID requires CLOG extension.

3) The checkpoint continues, and LogStandbySnapshot() advances the
   checkpoint's nextXid to the XID assigned in (2).

4) We crash after writing the checkpoint record, but before the CLOG
   extension makes it to the disk, and before any trace of the XID
   assigned in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.

> > This fits the OP's observation of the
> > problem vanishing when pg_start_backup() does an immediate checkpoint.
>
> Note that this is *not* the behaviour I'm seeing....it's possible it happens
> more frequently without the immediate checkpoint, but I am seeing it happen
> even with the immediate checkpoint.

Yeah, I should have said "of the problem's likelihood decreasing" instead
of "vanishing". The point is, the longer the checkpoint takes, the higher
the chance that nextXid is advanced far enough to require a CLOG extension.
That alone isn't enough to trigger the error - the CLOG extension must also
*not* make it to the disk before the checkpoint completes - but it's a
required precondition for the error to occur.

best regards,
Florian Pflug
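
The four-step crash scenario described in this message can be sketched as a
toy model. This is a minimal Python simulation, not PostgreSQL code: the
function names (clog_page, startup_clog_ok, simulate_crash) and the starting
XID are hypothetical, and the startup check is simplified to "the CLOG page
holding the checkpoint's nextXid must exist on disk". The page size constant
assumes the default 8 kB block size with four transaction statuses per byte.

```python
CLOG_XACTS_PER_PAGE = 32768  # assumes 8 kB pages, 4 transaction statuses per byte


def clog_page(xid: int) -> int:
    """Return the number of the CLOG page that stores the status of xid."""
    return xid // CLOG_XACTS_PER_PAGE


def startup_clog_ok(next_xid: int, pages_on_disk: set) -> bool:
    """Simplified stand-in for the startup check: nextXid's page must exist."""
    return clog_page(next_xid) in pages_on_disk


def simulate_crash(clog_flushed_before_crash: bool) -> bool:
    """Replay steps 1-4 and report whether the startup check would pass."""
    pages_on_disk = {0}
    pages_in_memory = {0}
    next_xid = CLOG_XACTS_PER_PAGE - 8  # hypothetical: a few XIDs left on page 0

    # (1) The checkpoint starts and notes the current nextXid.
    checkpoint_next_xid = next_xid

    # (2) Backends assign XIDs until one needs a new CLOG page. The page is
    #     created in memory only; nothing has reached the disk yet.
    while True:
        xid = next_xid
        next_xid += 1
        if clog_page(xid) not in pages_in_memory:
            pages_in_memory.add(clog_page(xid))
            break

    # (3) The checkpoint continues and its nextXid is advanced to that XID.
    checkpoint_next_xid = xid

    # (4) Crash. Only what was flushed beforehand survives.
    if clog_flushed_before_crash:
        pages_on_disk |= pages_in_memory

    return startup_clog_ok(checkpoint_next_xid, pages_on_disk)


if __name__ == "__main__":
    print("CLOG flushed before crash:", simulate_crash(True))
    print("CLOG not flushed before crash:", simulate_crash(False))
```

In this model the check fails only when both conditions hold: nextXid has
crossed onto a new CLOG page during the checkpoint, and that page was not
written out before the crash.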