At Mon, 07 Feb 2022 10:16:34 +0900 (JST), Kyotaro Horiguchi
<[email protected]> wrote in
> At Fri, 4 Feb 2022 10:59:04 +0900, Fujii Masao <[email protected]>
> wrote in
> > On 2022/02/03 15:50, Kyotaro Horiguchi wrote:
> > > By the way, restart point should start only while recoverying, and at
> > > the timeof the start both checkpoint.redo and checkpoint LSN are
> > > already past. We shouldn't update minRecovery point after promotion,
> > > but is there any reason for not updating the checkPoint and
> > > checkPointCopy? If we update them after promotion, the
> > > which-LSN-to-show problem would be gone.
> >
> > I tried to find the reason by reading the past discussion, but have
> > not found that yet.
> >
> > If we update checkpoint and REDO LSN at pg_control in that case, we
> > also need to update min recovery point at pg_control? Otherwise the
> > min recovery point at pg_control still indicates the old LSN that
> > previous restart point set.
>
> I had an assuption that the reason I think it shouldn't update
> minRecoveryPoint is that it has been or is going to be reset to
> invalid LSN by promotion and the checkpoint should refrain from
> touching it.
Hmm.. It doesn't seem to be the case. If a server crashes just after
promotion and before requesting post-promtion checkpoint,
minRecoveryPoint stays at a valid LSN.
(Promoted at 0/7000028)
Database cluster state: in production
Latest checkpoint location: 0/6000060
Latest checkpoint's REDO location: 0/6000028
Latest checkpoint's REDO WAL file: 000000010000000000000006
Minimum recovery ending location: 0/7000090
Min recovery ending loc's timeline: 2
minRecoveryPoint/TLI are ignored in any case where a server in
in-production state is started. In other words, the values are
useless. There's no clear or written reason for unrecording the last
ongoing restartpoint after promotion.
Before fast-promotion was introduced, we shouldn't get there after
end-of-recovery checkpoint (but somehow reached sometimes?) but it is
quite normal nowadays. Or to the contrary, we're expecting it to
happen and it is regarded as a normal checkponit. So we should do
there nowadays are as the follows.
- If any later checkpoint/restartpoint has been established, just skip
remaining task then return false. (!chkpt_was_latest)
(I'm not sure this can happen, though.)
- we update control file only when archive recovery is still ongoing.
- Otherwise reset minRecoveryPoint then continue.
Do you have any thoughts or opinions?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 958220c495..ab8a4d9a1b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9658,6 +9658,7 @@ CreateRestartPoint(int flags)
XLogRecPtr endptr;
XLogSegNo _logSegNo;
TimestampTz xtime;
+ bool chkpt_was_latest = false;
/* Get a local copy of the last safe checkpoint record. */
SpinLockAcquire(&XLogCtl->info_lck);
@@ -9752,44 +9753,69 @@ CreateRestartPoint(int flags)
PriorRedoPtr = ControlFile->checkPointCopy.redo;
/*
- * Update pg_control, using current time. Check that it still shows
- * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do
nothing;
- * this is a quick hack to make sure nothing really bad happens if
somehow
- * we get here after the end-of-recovery checkpoint.
+ * Update pg_control, using current time if no later checkpoints have
been
+ * performed.
*/
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
- if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
- ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
+ if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
{
+ chkpt_was_latest = true;
ControlFile->checkPoint = lastCheckPointRecPtr;
ControlFile->checkPointCopy = lastCheckPoint;
/*
- * Ensure minRecoveryPoint is past the checkpoint record.
Normally,
- * this will have happened already while writing out dirty
buffers,
- * but not necessarily - e.g. because no buffers were dirtied.
We do
- * this because a non-exclusive base backup uses
minRecoveryPoint to
- * determine which WAL files must be included in the backup,
and the
- * file (or files) containing the checkpoint record must be
included,
- * at a minimum. Note that for an ordinary restart of recovery
there's
- * no value in having the minimum recovery point any earlier
than this
+ * Ensure minRecoveryPoint is past the checkpoint record while
archive
+ * recovery is still ongoing. Normally, this will have happened
+ * already while writing out dirty buffers, but not necessarily
-
+ * e.g. because no buffers were dirtied. We do this because a
+ * non-exclusive base backup uses minRecoveryPoint to determine
which
+ * WAL files must be included in the backup, and the file (or
files)
+ * containing the checkpoint record must be included, at a
+ * minimum. Note that for an ordinary restart of recovery
there's no
+ * value in having the minimum recovery point any earlier than
this
* anyway, because redo will begin just after the checkpoint
record.
*/
- if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
+ if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
{
- ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
- ControlFile->minRecoveryPointTLI =
lastCheckPoint.ThisTimeLineID;
+ if (ControlFile->minRecoveryPoint <
lastCheckPointEndPtr)
+ {
+ ControlFile->minRecoveryPoint =
lastCheckPointEndPtr;
+ ControlFile->minRecoveryPointTLI =
lastCheckPoint.ThisTimeLineID;
- /* update local copy */
- minRecoveryPoint = ControlFile->minRecoveryPoint;
- minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+ /* update local copy */
+ minRecoveryPoint =
ControlFile->minRecoveryPoint;
+ minRecoveryPointTLI =
ControlFile->minRecoveryPointTLI;
+ }
+ if (flags & CHECKPOINT_IS_SHUTDOWN)
+ ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
+ }
+ else
+ {
+ /*
+ * We have exited from archive-recovery mode after
starting. Crash
+ * recovery ever after should always recover to the end
of WAL
+ */
+ ControlFile->minRecoveryPoint = InvalidXLogRecPtr;
+ ControlFile->minRecoveryPointTLI = 0;
}
- if (flags & CHECKPOINT_IS_SHUTDOWN)
- ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
UpdateControlFile();
}
LWLockRelease(ControlFileLock);
+ /*
+ * Skip all post-checkpoint work if others have done that with later
+ * checkpoints.
+ */
+ if (!chkpt_was_latest)
+ {
+ ereport((log_checkpoints ? LOG : DEBUG2),
+ (errmsg("post-restartpoint cleanup is skpped at
%X/%X, because later restartpoints have been already performed",
+
LSN_FORMAT_ARGS(lastCheckPoint.redo))));
+
+ /* this checkpoint has not bee established */
+ return false;
+ }
+
/*
* Update the average distance between checkpoints/restartpoints if the
* prior checkpoint exists.