On Oct26, 2011, at 18:08 , Simon Riggs wrote:
> On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug <f...@phlo.org> wrote:
>> On Oct26, 2011, at 15:57 , Florian Pflug wrote:
>> Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(), 
>> but
>> before LogStandbySnapshot(), then we end up with a nextXid in the checkpoint
>> whose CLOG page hasn't necessarily made it to the disk yet. The longer 
>> CheckPointGuts()
>> takes to finish it's work the more likely it becomes (assuming that CLOG 
>> writing
>> and syncing doesn't happen at the very end). This fits the OP's observation 
>> ob the
>> problem vanishing when pg_start_backup() does an immediate checkpoint.
> As it turns out the correct fix is actually just to skip StartupClog()
> until the end of recovery because it does nothing useful when executed
> at that time. When I wrote the original code I remember thinking that
> StartupClog() is superfluous at that point.

Hm, that fixes the problem in the hot standby case, but as I said in my
reply to Chris Redekop, normal crash recovery is also at risk (although
the probability of hitting the bug is much smaller there). Here's my
reasoning from that other mail:

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how

1) We start a checkpoint, and get as far as LogStandbySnapshot()
2) A backend does AssignTransactionId, and gets as far as GetTransactionoId().
  The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot () advances the
  checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
  extension makes it to the disk, and before any trace of the XID assigned
  in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.

Quite aside from that concern, I think it's probably not a good idea
for the nextXid value of a checkpoint to depend on whether wal_level
was set to hot_standby or not. Our recovery code is already quite complex
and hard to test, and this just adds one more combination that has to
be thought about while coding and that needs to be tested.

My suggestion is to fix the CLOG problem in that same way that you fixed
the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before

Here's what I image CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() do modify
the nextXid, since we fill out nextXid after LogStandbySnapshot() and
will thus derive a higher value than LogStandbySnapshot() would have.

We could then also fold GetOldestActiveTransactionId() back into
your proposed LogStandbySnapshot() and thus don't need two ProcArray

best regards,
Florian Pflug

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to