On Fri, Sep 24, 2021 at 5:07 PM Amul Sul <sula...@gmail.com> wrote:
>
> On Thu, Sep 23, 2021 at 11:56 PM Robert Haas <robertmh...@gmail.com> wrote:
> >
> > On Mon, Sep 20, 2021 at 11:20 AM Amul Sul <sula...@gmail.com> wrote:
> > > Ok, understood, I have separated my changes into 0001 and 0002 patch,
> > > and the refactoring patches start from 0003.
> >
> > I think it would be better in the other order, with the refactoring
> > patches at the beginning of the series.
> >
>
> Ok, will do that. I did this other way to minimize the diff e.g.
> deletion diff of RecoveryXlogAction enum and
> DetermineRecoveryXlogAction(), etc.
>

I have reversed the patch order. Now refactoring patches will be
first, and the patch that removes the dependencies on global & local
variables will be the last. I did the necessary modification in the
refactoring patches too e.g. removed DetermineRecoveryXlogAction() and
RecoveryXlogAction enum which is no longer needed (thanks to commit #
1d919de5eb3fffa7cc9479ed6d2915fb89794459 to make code simple).

To find the value of InRecovery after we clear it, patch still uses
ControlFile's DBState, but now the check condition changed to a more
specific one which is less confusing.

In casual off-list discussion, the point was made to check
SharedRecoveryState to find out the InRecovery value afterward, and
check that using RecoveryInProgress().  But we can't depend on
SharedRecoveryState because at the start it gets initialized to
RECOVERY_STATE_CRASH irrespective of InRecovery that happens later.
Therefore, we can't use RecoveryInProgress() which always returns
true if SharedRecoveryState != RECOVERY_STATE_DONE.

I am posting only refactoring patches for now.

Regards,
Amul
From 730e8331fefc882b4cab7112adf0f4d8da1ea831 Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Thu, 30 Sep 2021 06:29:06 -0400
Subject: [PATCH v36 4/4] Remove dependencies on startup-process specifical
 variables.

To make XLogAcceptWrites(), need to dependency on few global and local
variable spcific to startup process.

Global variables are ArchiveRecoveryRequested and
LocalPromoteIsTriggered, whereas LocalPromoteIsTriggered can be
accessed in any other process using existing PromoteIsTriggered().
ArchiveRecoveryRequested is made accessible by copying into shared
memory.

XLogAcceptWrites() accepts two argument as EndOfLogTLI and EndOfLog
which are local to StartupXLOG(). Instead of passing as an argument
XLogCtl->replayEndTLI and XLogCtl->lastSegSwitchLSN from the shared
memory can be used as an replacement to EndOfLogTLI and EndOfLog
respectively.  XLogCtl->lastSegSwitchLSN is not going to change until
we use it. That changes only when the current WAL segment gets full
which never going to happen because of two reasons, first WAL writes
are disabled for other processes until XLogAcceptWrites() finishes and
other reasons before use of lastSegSwitchLSN, XLogAcceptWrites() is
writes fix size wal records as full-page write and record for either
recovery end or checkpoint which not going to fill up the 16MB wal
segment.

EndOfLogTLI in the StartupXLOG() is the timeline ID of the last record
that xlogreader reads, but this xlogreader was simply re-fetching the
last record which we have replied in redo loop if it was in recovery,
if not in recovery, we don't need to worry since this value is needed
only in case of ArchiveRecoveryRequested = true, which implicitly
forces redo and sets XLogCtl->replayEndTLI value.
---
 src/backend/access/transam/xlog.c | 36 ++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 91cdd7d9ff2..5b4e5ac379f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -659,6 +659,13 @@ typedef struct XLogCtlData
 	 */
 	bool		SharedPromoteIsTriggered;
 
+	/*
+	 * SharedArchiveRecoveryRequested exports the value of the
+	 * ArchiveRecoveryRequested flag to be share which is otherwise valid only
+	 * in the startup process.
+	 */
+	bool		SharedArchiveRecoveryRequested;
+
 	/*
 	 * WalWriterSleeping indicates whether the WAL writer is currently in
 	 * low-power mode (and hence should be nudged if an async commit occurs).
@@ -880,8 +887,7 @@ static MemoryContext walDebugCxt = NULL;
 static void readRecoverySignalFile(void);
 static void validateRecoveryParameters(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
-static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
-										XLogRecPtr EndOfLog);
+static void CleanupAfterArchiveRecovery(void);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static char *getRecoveryStopReason(void);
@@ -927,7 +933,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 							  int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
-static bool XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog);
+static bool XLogAcceptWrites(void);
 static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
@@ -5230,6 +5236,7 @@ XLOGShmemInit(void)
 	XLogCtl->SharedHotStandbyActive = false;
 	XLogCtl->InstallXLogFileSegmentActive = false;
 	XLogCtl->SharedPromoteIsTriggered = false;
+	XLogCtl->SharedArchiveRecoveryRequested = false;
 	XLogCtl->WalWriterSleeping = false;
 
 	SpinLockInit(&XLogCtl->Insert.insertpos_lck);
@@ -5511,6 +5518,11 @@ readRecoverySignalFile(void)
 		ereport(FATAL,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("standby mode is not supported by single-user servers")));
+
+	/*
+	 * Remember archive recovery request in shared memory state.
+	 */
+	XLogCtl->SharedArchiveRecoveryRequested = ArchiveRecoveryRequested;
 }
 
 static void
@@ -5702,8 +5714,10 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
  * Perform cleanup actions at the conclusion of archive recovery.
  */
 static void
-CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
+CleanupAfterArchiveRecovery(void)
 {
+	XLogRecPtr	EndOfLog;
+
 	/*
 	 * Execute the recovery_end_command, if any.
 	 */
@@ -5720,6 +5734,7 @@ CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
 	 * files containing garbage. In any case, they are not part of the new
 	 * timeline's history so we don't need them.
 	 */
+	(void) GetLastSegSwitchData(&EndOfLog);
 	RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
 
 	/*
@@ -5754,6 +5769,7 @@ CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
 	{
 		char		origfname[MAXFNAMELEN];
 		XLogSegNo	endLogSegNo;
+		TimeLineID	EndOfLogTLI = XLogCtl->replayEndTLI;
 
 		XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
 		XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
@@ -8023,7 +8039,7 @@ StartupXLOG(void)
  	Insert->fullPageWrites = lastFullPageWrites;
 
 	/* Prepare to accept WAL writes. */
-	promoted = XLogAcceptWrites(EndOfLogTLI, EndOfLog);
+	promoted = XLogAcceptWrites();
 
 	/*
 	 * If there were cascading standby servers connected to us, nudge any wal
@@ -8045,7 +8061,7 @@ StartupXLOG(void)
  * Prepare to accept WAL writes.
  */
 static bool
-XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
+XLogAcceptWrites(void)
 {
 	bool		promoted = false;
 
@@ -8063,8 +8079,8 @@ XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
 		promoted = PerformRecoveryXLogAction();
 
 	/* If this is archive recovery, perform post-recovery cleanup actions. */
-	if (ArchiveRecoveryRequested)
-		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
+	if (XLogCtl->SharedArchiveRecoveryRequested)
+		CleanupAfterArchiveRecovery();
 
 	/*
 	 * If any of the critical GUCs have changed, log them before we allow
@@ -8232,8 +8248,8 @@ PerformRecoveryXLogAction(void)
 	 * a full checkpoint. A checkpoint is requested later, after we're fully out
 	 * of recovery mode and already accepting queries.
 	 */
-	if (ArchiveRecoveryRequested && IsUnderPostmaster &&
-		LocalPromoteIsTriggered)
+	if (XLogCtl->SharedArchiveRecoveryRequested && IsUnderPostmaster &&
+		PromoteIsTriggered())
 	{
 		promoted = true;
 
-- 
2.18.0

From 76587e09ce6b7811ff940e2e65051cb49e7c16e6 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 23 Jul 2021 15:37:53 -0400
Subject: [PATCH v36 3/4] Create XLogAcceptWrites() function with code from
 StartupXLOG().

This is just code movement. A future patch will want to defer the
call to XLogAcceptWrites() until a later time, rather than doing
it as soon as we finish applying WAL, but here we're just grouping
related code together into a new function.

Robert Haas, with modifications by Amul Sul.
---
 src/backend/access/transam/xlog.c | 52 ++++++++++++++++++++++---------
 1 file changed, 37 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 93849d8f29a..91cdd7d9ff2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -927,6 +927,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 							  int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
+static bool XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog);
 static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
@@ -8014,7 +8015,41 @@ StartupXLOG(void)
 	 * record before resource manager writes cleanup WAL records or checkpoint
 	 * record is written.
 	 */
-	Insert->fullPageWrites = lastFullPageWrites;
+ 	/*
+	 * Update full_page_writes in shared memory, and later whenever wal write
+	 * permitted, write an XLOG_FPW_CHANGE record before resource manager
+	 * writes cleanup WAL records or checkpoint record is written.
+ 	 */
+ 	Insert->fullPageWrites = lastFullPageWrites;
+
+	/* Prepare to accept WAL writes. */
+	promoted = XLogAcceptWrites(EndOfLogTLI, EndOfLog);
+
+	/*
+	 * If there were cascading standby servers connected to us, nudge any wal
+	 * sender processes to notice that we've been promoted.
+	 */
+	WalSndWakeup();
+
+	/*
+	 * If this was a promotion, request an (online) checkpoint now. This isn't
+	 * required for consistency, but the last restartpoint might be far back,
+	 * and in case of a crash, recovering from it might take a longer than is
+	 * appropriate now that we're not in standby mode anymore.
+	 */
+	if (promoted)
+		RequestCheckpoint(CHECKPOINT_FORCE);
+}
+
+/*
+ * Prepare to accept WAL writes.
+ */
+static bool
+XLogAcceptWrites(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
+{
+	bool		promoted = false;
+
+	/* Write an XLOG_FPW_CHANGE record */
 	LocalSetXLogInsertAllowed();
 	UpdateFullPageWrites();
 	LocalXLogInsertAllowed = -1;
@@ -8070,20 +8105,7 @@ StartupXLOG(void)
 	UpdateControlFile();
 	LWLockRelease(ControlFileLock);
 
-	/*
-	 * If there were cascading standby servers connected to us, nudge any wal
-	 * sender processes to notice that we've been promoted.
-	 */
-	WalSndWakeup();
-
-	/*
-	 * If this was a promotion, request an (online) checkpoint now. This isn't
-	 * required for consistency, but the last restartpoint might be far back,
-	 * and in case of a crash, recovering from it might take a longer than is
-	 * appropriate now that we're not in standby mode anymore.
-	 */
-	if (promoted)
-		RequestCheckpoint(CHECKPOINT_FORCE);
+	return promoted;
 }
 
 /*
-- 
2.18.0

From 1a14516bfca72febbc3e70f7d25398c0f074c3d8 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 23 Jul 2021 13:07:56 -0400
Subject: [PATCH v36 1/4] Refactor some end-of-recovery code out of
 StartupXLOG().

Moved the code that performs whether to write a checkpoint or an
end-of-recovery record into PerformRecoveryXlogAction().

Also create a new function CleanupAfterArchiveRecovery() to
perform a few tasks that we want to do after we've actually exited
archive recovery but before we start accepting new WAL writes.
This is straightforward code movement to make StartupXLOG() a
little bit shorter and a little bit easier to understand.

Robert Haas, with modifications by Amul Sul.
---
 src/backend/access/transam/xlog.c | 261 ++++++++++++++++--------------
 1 file changed, 143 insertions(+), 118 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a749da..397f7d486a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -880,6 +880,8 @@ static MemoryContext walDebugCxt = NULL;
 static void readRecoverySignalFile(void);
 static void validateRecoveryParameters(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
+static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,
+										XLogRecPtr EndOfLog);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static char *getRecoveryStopReason(void);
@@ -925,6 +927,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 							  int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
+static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
 static bool rescanLatestTimeLine(void);
@@ -5694,6 +5697,88 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 			(errmsg("archive recovery complete")));
 }
 
+/*
+ * Perform cleanup actions at the conclusion of archive recovery.
+ */
+static void
+CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI, XLogRecPtr EndOfLog)
+{
+	/*
+	 * Execute the recovery_end_command, if any.
+	 */
+	if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
+		ExecuteRecoveryCommand(recoveryEndCommand,
+							   "recovery_end_command",
+							   true);
+
+	/*
+	 * We switched to a new timeline. Clean up segments on the old timeline.
+	 *
+	 * If there are any higher-numbered segments on the old timeline, remove
+	 * them. They might contain valid WAL, but they might also be pre-allocated
+	 * files containing garbage. In any case, they are not part of the new
+	 * timeline's history so we don't need them.
+	 */
+	RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
+
+	/*
+	 * If the switch happened in the middle of a segment, what to do with the
+	 * last, partial segment on the old timeline? If we don't archive it, and
+	 * the server that created the WAL never archives it either (e.g. because it
+	 * was hit by a meteor), it will never make it to the archive. That's OK
+	 * from our point of view, because the new segment that we created with the
+	 * new TLI contains all the WAL from the old timeline up to the switch
+	 * point. But if you later try to do PITR to the "missing" WAL on the old
+	 * timeline, recovery won't find it in the archive. It's physically present
+	 * in the new file with new TLI, but recovery won't look there when it's
+	 * recovering to the older timeline. On the other hand, if we archive the
+	 * partial segment, and the original server on that timeline is still
+	 * running and archives the completed version of the same segment later, it
+	 * will fail. (We used to do that in 9.4 and below, and it caused such
+	 * problems).
+	 *
+	 * As a compromise, we rename the last segment with the .partial suffix, and
+	 * archive it. Archive recovery will never try to read .partial segments, so
+	 * they will normally go unused. But in the odd PITR case, the administrator
+	 * can copy them manually to the pg_wal directory (removing the suffix).
+	 * They can be useful in debugging, too.
+	 *
+	 * If a .done or .ready file already exists for the old timeline, however,
+	 * we had already determined that the segment is complete, so we can let it
+	 * be archived normally. (In particular, if it was restored from the archive
+	 * to begin with, it's expected to have a .done file).
+	 */
+	if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
+		XLogArchivingActive())
+	{
+		char		origfname[MAXFNAMELEN];
+		XLogSegNo	endLogSegNo;
+
+		XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
+		XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
+
+		if (!XLogArchiveIsReadyOrDone(origfname))
+		{
+			char		origpath[MAXPGPATH];
+			char		partialfname[MAXFNAMELEN];
+			char		partialpath[MAXPGPATH];
+
+			XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
+			snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
+			snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
+
+			/*
+			 * Make sure there's no .done or .ready file for the .partial
+			 * file.
+			 */
+			XLogArchiveCleanup(partialfname);
+
+			durable_rename(origpath, partialpath, ERROR);
+			XLogArchiveNotify(partialfname);
+		}
+	}
+}
+
 /*
  * Extract timestamp from WAL record.
  *
@@ -7883,127 +7968,13 @@ StartupXLOG(void)
 	UpdateFullPageWrites();
 	LocalXLogInsertAllowed = -1;
 
+	/* Emit checkpoint or end-of-recovery record in XLOG, if required. */
 	if (InRecovery)
-	{
-		/*
-		 * Perform a checkpoint to update all our recovery activity to disk.
-		 *
-		 * Note that we write a shutdown checkpoint rather than an on-line
-		 * one. This is not particularly critical, but since we may be
-		 * assigning a new TLI, using a shutdown checkpoint allows us to have
-		 * the rule that TLI only changes in shutdown checkpoints, which
-		 * allows some extra error checking in xlog_redo.
-		 *
-		 * In promotion, only create a lightweight end-of-recovery record
-		 * instead of a full checkpoint. A checkpoint is requested later,
-		 * after we're fully out of recovery mode and already accepting
-		 * queries.
-		 */
-		if (ArchiveRecoveryRequested && IsUnderPostmaster &&
-			LocalPromoteIsTriggered)
-		{
-			promoted = true;
-
-			/*
-			 * Insert a special WAL record to mark the end of recovery, since
-			 * we aren't doing a checkpoint. That means that the checkpointer
-			 * process may likely be in the middle of a time-smoothed
-			 * restartpoint and could continue to be for minutes after this.
-			 * That sounds strange, but the effect is roughly the same and it
-			 * would be stranger to try to come out of the restartpoint and
-			 * then checkpoint. We request a checkpoint later anyway, just for
-			 * safety.
-			 */
-			CreateEndOfRecoveryRecord();
-		}
-		else
-		{
-			RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
-							  CHECKPOINT_IMMEDIATE |
-							  CHECKPOINT_WAIT);
-		}
-	}
+		promoted = PerformRecoveryXLogAction();
 
+	/* If this is archive recovery, perform post-recovery cleanup actions. */
 	if (ArchiveRecoveryRequested)
-	{
-		/*
-		 * And finally, execute the recovery_end_command, if any.
-		 */
-		if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
-			ExecuteRecoveryCommand(recoveryEndCommand,
-								   "recovery_end_command",
-								   true);
-
-		/*
-		 * We switched to a new timeline. Clean up segments on the old
-		 * timeline.
-		 *
-		 * If there are any higher-numbered segments on the old timeline,
-		 * remove them. They might contain valid WAL, but they might also be
-		 * pre-allocated files containing garbage. In any case, they are not
-		 * part of the new timeline's history so we don't need them.
-		 */
-		RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
-
-		/*
-		 * If the switch happened in the middle of a segment, what to do with
-		 * the last, partial segment on the old timeline? If we don't archive
-		 * it, and the server that created the WAL never archives it either
-		 * (e.g. because it was hit by a meteor), it will never make it to the
-		 * archive. That's OK from our point of view, because the new segment
-		 * that we created with the new TLI contains all the WAL from the old
-		 * timeline up to the switch point. But if you later try to do PITR to
-		 * the "missing" WAL on the old timeline, recovery won't find it in
-		 * the archive. It's physically present in the new file with new TLI,
-		 * but recovery won't look there when it's recovering to the older
-		 * timeline. On the other hand, if we archive the partial segment, and
-		 * the original server on that timeline is still running and archives
-		 * the completed version of the same segment later, it will fail. (We
-		 * used to do that in 9.4 and below, and it caused such problems).
-		 *
-		 * As a compromise, we rename the last segment with the .partial
-		 * suffix, and archive it. Archive recovery will never try to read
-		 * .partial segments, so they will normally go unused. But in the odd
-		 * PITR case, the administrator can copy them manually to the pg_wal
-		 * directory (removing the suffix). They can be useful in debugging,
-		 * too.
-		 *
-		 * If a .done or .ready file already exists for the old timeline,
-		 * however, we had already determined that the segment is complete, so
-		 * we can let it be archived normally. (In particular, if it was
-		 * restored from the archive to begin with, it's expected to have a
-		 * .done file).
-		 */
-		if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
-			XLogArchivingActive())
-		{
-			char		origfname[MAXFNAMELEN];
-			XLogSegNo	endLogSegNo;
-
-			XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
-			XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
-
-			if (!XLogArchiveIsReadyOrDone(origfname))
-			{
-				char		origpath[MAXPGPATH];
-				char		partialfname[MAXFNAMELEN];
-				char		partialpath[MAXPGPATH];
-
-				XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
-				snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
-				snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
-
-				/*
-				 * Make sure there's no .done or .ready file for the .partial
-				 * file.
-				 */
-				XLogArchiveCleanup(partialfname);
-
-				durable_rename(origpath, partialpath, ERROR);
-				XLogArchiveNotify(partialfname);
-			}
-		}
-	}
+		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
 
 	/*
 	 * Preallocate additional log files, if wanted.
@@ -8207,6 +8178,60 @@ CheckRecoveryConsistency(void)
 	}
 }
 
+/*
+ * Perform whatever XLOG actions are necessary at end of REDO.
+ *
+ * The goal here is to make sure that we'll be able to recover properly if
+ * we crash again. If we choose to write a checkpoint, we'll write a shutdown
+ * checkpoint rather than an on-line one. This is not particularly critical,
+ * but since we may be assigning a new TLI, using a shutdown checkpoint allows
+ * us to have the rule that TLI only changes in shutdown checkpoints, which
+ * allows some extra error checking in xlog_redo.
+ */
+static bool
+PerformRecoveryXLogAction(void)
+{
+	bool		promoted = false;
+
+	/*
+	 * Perform a checkpoint to update all our recovery activity to disk.
+	 *
+	 * Note that we write a shutdown checkpoint rather than an on-line one. This
+	 * is not particularly critical, but since we may be assigning a new TLI,
+	 * using a shutdown checkpoint allows us to have the rule that TLI only
+	 * changes in shutdown checkpoints, which allows some extra error checking
+	 * in xlog_redo.
+	 *
+	 * In promotion, only create a lightweight end-of-recovery record instead of
+	 * a full checkpoint. A checkpoint is requested later, after we're fully out
+	 * of recovery mode and already accepting queries.
+	 */
+	if (ArchiveRecoveryRequested && IsUnderPostmaster &&
+		LocalPromoteIsTriggered)
+	{
+		promoted = true;
+
+		/*
+		 * Insert a special WAL record to mark the end of recovery, since we
+		 * aren't doing a checkpoint. That means that the checkpointer process
+		 * may likely be in the middle of a time-smoothed restartpoint and could
+		 * continue to be for minutes after this.  That sounds strange, but the
+		 * effect is roughly the same and it would be stranger to try to come
+		 * out of the restartpoint and then checkpoint. We request a checkpoint
+		 * later anyway, just for safety.
+		 */
+		CreateEndOfRecoveryRecord();
+	}
+	else
+	{
+		RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
+						  CHECKPOINT_IMMEDIATE |
+						  CHECKPOINT_WAIT);
+	}
+
+	return promoted;
+}
+
 /*
  * Is the system still in recovery?
  *
-- 
2.18.0

From bad3d0db320f68b083311578b1b17ff8cd1714c6 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 23 Jul 2021 14:27:51 -0400
Subject: [PATCH v36 2/4] Postpone some end-of-recovery operations relating to
 allowing WAL.

Previously, moved the code that performs whether to write a checkpoint
or an end-of-recovery record into PerformRecoveryXlogAction(), and
code performs post-archive-recovery into CleanupAfterArchiveRecovery(),
but called both the functions from the same place. Now postpone that
stuff until after we clear InRecovery and shut down the XLogReader.

We can find out of InRecovery value afterward by looking ControlFile's
DBState is needed to decide PerformRecoveryXlogAction().

This is preparatory work for a future patch that wants to allow
recovery to end at one time and only later start to allow WAL writes.
The steps that themselves write WAL clearly shouldn't happen before
we're ready to accept WAL writes, and it seems best for now to keep
the steps performed by CleanupAfterArchiveRecovery() at the same point
relative to the surrounding steps. We assume (hopefully correctly)
that the user doesn't want recovery_end_command to run until we're
committed to writing WAL on the new timeline. Until then, the
machine is still usable as a standby on the old timeline.

Aside from the value of this patch as preparatory work, this order of
operations actually seems more logical, since it means we don't
actually write any WAL until after exiting recovery.

Robert Haas, with modifications by Amul Sul.
---
 src/backend/access/transam/xlog.c | 40 +++++++++++++++++--------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 397f7d486a6..93849d8f29a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7958,24 +7958,6 @@ StartupXLOG(void)
 	XLogCtl->LogwrtRqst.Write = EndOfLog;
 	XLogCtl->LogwrtRqst.Flush = EndOfLog;
 
-	/*
-	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
-	 * record before resource manager writes cleanup WAL records or checkpoint
-	 * record is written.
-	 */
-	Insert->fullPageWrites = lastFullPageWrites;
-	LocalSetXLogInsertAllowed();
-	UpdateFullPageWrites();
-	LocalXLogInsertAllowed = -1;
-
-	/* Emit checkpoint or end-of-recovery record in XLOG, if required. */
-	if (InRecovery)
-		promoted = PerformRecoveryXLogAction();
-
-	/* If this is archive recovery, perform post-recovery cleanup actions. */
-	if (ArchiveRecoveryRequested)
-		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
-
 	/*
 	 * Preallocate additional log files, if wanted.
 	 */
@@ -8027,6 +8009,28 @@ StartupXLOG(void)
 	}
 	XLogReaderFree(xlogreader);
 
+	/*
+	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
+	 * record before resource manager writes cleanup WAL records or checkpoint
+	 * record is written.
+	 */
+	Insert->fullPageWrites = lastFullPageWrites;
+	LocalSetXLogInsertAllowed();
+	UpdateFullPageWrites();
+	LocalXLogInsertAllowed = -1;
+
+	/*
+	 * Emit checkpoint or end-of-recovery record in XLOG, if the server has been
+	 * through the archive or the crash recovery.
+	 */
+	if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY ||
+		ControlFile->state == DB_IN_CRASH_RECOVERY)
+		promoted = PerformRecoveryXLogAction();
+
+	/* If this is archive recovery, perform post-recovery cleanup actions. */
+	if (ArchiveRecoveryRequested)
+		CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog);
+
 	/*
 	 * If any of the critical GUCs have changed, log them before we allow
 	 * backends to write WAL.
-- 
2.18.0

Reply via email to