I wrote:
> Actually, it looks like there is an extremely simple way to handle this,
> which is to move the call of LogStandbySnapshot (which generates the WAL
> record in question) to before the checkpoint's REDO pointer is set, but
> after we have decided that we need a checkpoint.

On further contemplation, there is a downside to that idea, which
probably explains why the code was written as it was: if we place the
XLOG_RUNNING_XACTS WAL record emitted during a checkpoint before rather
than after the checkpoint's REDO point, then a hot standby slave starting
up from that checkpoint won't process the XLOG_RUNNING_XACTS record.
That means its KnownAssignedXids machinery won't be fully operational
until the master starts another checkpoint, which might be a while.
So this could result in undesirable delay in hot standby mode becoming
active. I am not sure how significant this really is though. Comments?

If we don't like that, I can think of a couple of other ways to get
there, but they have their own downsides:

* Instead of trying to detect after-the-fact whether any concurrent
  WAL activity happened during the last checkpoint, we could detect it
  during the checkpoint and then keep the info in a static variable in
  the checkpointer process until next time. However, I don't see any
  bulletproof way to do this without adding at least one or two lines
  of code within XLogInsert, which I'm sure Robert will complain about.

* We could expand checkpoint records to contain two different REDO
  pointers, one to be used by hot standby slaves and one for normal
  crash recovery. (The LogStandbySnapshot records would appear between
  these two points; we'd still be moving them up to the start of the
  checkpoint sequence.) This is a relatively clean solution but would
  force pg_upgrade between beta2 and beta3, so that's not so nice.

* Combining the two ideas, we could take the nominal REDO pointer,
  run LogStandbySnapshot, make a fresh note of where the insert point
  is (real REDO point, which is what we publish in shared memory for
  the bufmgr to compare LSNs to), complete the checkpoint, and write
  the checkpoint record using the nominal REDO pointer so that that's
  where any crash or HS slave starts from.
  But save the real REDO pointer in checkpointer static state, and in
  the next checkpoint use that rather than the nominal pointer to decide
  if anything's happened that would force a new checkpoint. I think this
  dodges both of the above complaints, but it feels pretty baroque.

Thoughts, other ideas?

			regards, tom lane
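
To make the third alternative (nominal vs. real REDO pointer) a bit more concrete, here is a rough toy model in Python. It is not PostgreSQL code: the class `ToyCheckpointer`, its fields, the record sizes, and the exact form of the "did anything happen?" test are all assumptions made for illustration. WAL positions are plain integers, and `concurrent_wal` stands in for whatever other backends insert while the checkpoint is flushing buffers.

```python
from dataclasses import dataclass
from typing import List, Optional

# Made-up record sizes; they exist only so that WAL positions advance.
RUNNING_XACTS_LEN = 32
CHECKPOINT_LEN = 64


@dataclass
class Checkpoint:
    nominal_redo: int   # stored in the checkpoint record; replay starts here
    real_redo: int      # insert point just after the standby snapshot record
    record_start: int   # where the checkpoint record itself begins
    record_end: int     # insert point just after the checkpoint record


class ToyCheckpointer:
    """Hypothetical model of the 'nominal vs. real REDO pointer' idea."""

    def __init__(self) -> None:
        self.insert_pos = 0                       # toy WAL insert position
        self.standby_snapshots: List[int] = []    # toy XLOG_RUNNING_XACTS positions
        self.prev: Optional[Checkpoint] = None    # checkpointer-local ("static") state

    def wal_insert(self, length: int) -> int:
        """Append a record of the given length; return its start position."""
        start = self.insert_pos
        self.insert_pos += length
        return start

    def checkpoint(self, concurrent_wal: int = 0) -> Optional[Checkpoint]:
        """Run one checkpoint, or return None if it can be skipped."""
        p = self.prev
        # Skip only if nothing was inserted after the previous checkpoint
        # record AND nothing was inserted between its *real* REDO point and
        # the record itself.  Using the nominal pointer here would always
        # see the standby snapshot record and never allow a skip.
        if (p is not None
                and self.insert_pos == p.record_end
                and p.real_redo == p.record_start):
            return None

        nominal_redo = self.insert_pos
        # Stand-in for LogStandbySnapshot(): lands after the nominal REDO
        # point, so a standby replaying from there still processes it.
        self.standby_snapshots.append(self.wal_insert(RUNNING_XACTS_LEN))
        real_redo = self.insert_pos

        # Other backends may write WAL while buffers are being flushed.
        self.wal_insert(concurrent_wal)

        record_start = self.wal_insert(CHECKPOINT_LEN)
        self.prev = Checkpoint(nominal_redo, real_redo,
                               record_start, self.insert_pos)
        return self.prev
```

In this model an idle system performs one checkpoint and skips the following ones, while a checkpoint that saw concurrent WAL activity forces exactly one more. The standby snapshot record always sits at or after the nominal REDO point, which is the property the second and third alternatives are trying to preserve.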