On Tue, May 10, 2016 at 4:02 PM, Andres Freund <and...@anarazel.de> wrote:
> On 2016-05-10 15:53:38 -0700, Jeff Janes wrote:
>> But isn't CreateCheckPoint called at the end of the checkpoint, not
>> the start of it?
> No, CreateCheckPoint() does it all.
> CreateCheckPoint(int flags)
> /* 1) determine redo pointer */
> curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos);
> prevPtr = XLogBytePosToRecPtr(Insert->PrevBytePos);
> /* 2) determine oid */
> LWLockAcquire(OidGenLock, LW_SHARED);
> checkPoint.nextOid = ShmemVariableCache->nextOid;
> if (!shutdown)
> checkPoint.nextOid += ShmemVariableCache->oidCount;
> /* 3) actually checkpoint shared_buffers et al. */
> CheckPointGuts(checkPoint.redo, flags);
> /* 4) log the checkpoint */
> recptr = XLogInsert(RM_XLOG_ID,
> shutdown ? XLOG_CHECKPOINT_SHUTDOWN :
OK, got it. I don' t know how I missed the bigger picture of that
function in the first place.
But, another perhaps stupid question, why do we care what the value of
nextOid was at the start of the last successfully completed
checkpoint? Wouldn't it make more sense to populate that part of the
record at the end, not the start, of the checkpoint?
>> I don't understand how it could be out-of-date at that point. But
>> obviously it is.
> A checkpoint logically "starts" at 1) in the above abbreviated
> CreateCheckPoint(), that's where recovery starts when starting up from
> that checkpoint. But inbetween 1) and 4) all other backends can continue
> to insert WAL, and it'll be replayed *before* the checkpoint's record
> itself. That means that if some backend generates a NEXTOID record
> between 2) and 4) (with largers checkpoints we're looking at minutes to
> an hour of time), it's effects will temporarily take effect (as in
> ShmemVariableCache->nextOid is updated), but XLOG_CHECKPOINT_ONLINE's
> replay will overwrite it unconditionally:
> xlog_redo(XLogReaderState *record)
> else if (info == XLOG_CHECKPOINT_ONLINE)
> /* ... but still treat OID counter as exact */
> LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
> ShmemVariableCache->nextOid = checkPoint.nextOid;
> ShmemVariableCache->oidCount = 0;
> Makes sense?
So it seems like we should unconditionally **not** do that when
replaying XLOG_CHECKPOINT_ONLINE, right?
If this is the checkpoint record at which we started the system then
we would have done that "replay" into ShmemVariableCache->nextOid
during StartupXLOG(), there is no reason to do it again when redo
passes through that record the 2nd time.
But, to round this out, all of this is formally only a hint on where
to start OIDs so as to avoid performance problems as you busy-loop
looking for an unused OID. The real correctness bug is in the
hint-bit problem you discuss elsewhere that turns collisions into
corruption, and this bug just makes that other one reachable?
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: