Re: [Patch] ALTER SYSTEM READ ONLY

Amul Sul Wed, 15 Sep 2021 03:50:17 -0700

, On Sat, Jul 24, 2021 at 1:33 AM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Thu, Jun 17, 2021 at 1:23 AM Amul Sul <sula...@gmail.com> wrote:
> > Attached is rebase for the latest master head.  Also, I added one more
> > refactoring code that deduplicates the code setting database state in the
> > control file. The same code set the database state is also needed for this
> > feature.
>
> I started studying 0001 today and found that it rearranged the order
> of operations in StartupXLOG() more than I was expecting. It does, as
> per previous discussions, move a bunch of things to the place where we
> now call XLogParamters(). But, unsatisfyingly, InRecovery = false and
> XLogReaderFree() then have to move down even further. Since the goal
> here is to get to a situation where we sometimes XLogAcceptWrites()
> after InRecovery = false, it didn't seem nice for this refactoring
> patch to still end up with a situation where this stuff happens while
> InRecovery = true. In fact, with the patch, the amount of code that
> runs with InRecovery = true actually *increases*, which is not what I
> think should be happening here. That's why the patch ends up having to
> adjust SetMultiXactIdLimit to not Assert(!InRecovery).
>
> And then I started to wonder how this was ever going to work as part
> of the larger patch set, because as you have it here,
> XLogAcceptWrites() takes arguments XLogReaderState *xlogreader,
> XLogRecPtr EndOfLog, and TimeLineID EndOfLogTLI and if the
> checkpointer is calling that at a later time after the user issues
> pg_prohibit_wal(false), it's going to have none of those things. So I
> had a quick look at that part of the code and found this in
> checkpointer.c:
>
> XLogAcceptWrites(true, NULL, InvalidXLogRecPtr, 0);
>
> For those following along from home, the additional "true" is a bool
> needChkpt argument added to XLogAcceptWrites() by 0003. Well, none of
> this is very satisfying. The whole purpose of passing the xlogreader
> is so we can figure out whether we need a checkpoint (never mind the
> question of whether the existing algorithm for determining that is
> really sensible) but now we need a second argument that basically
> serves the same purpose since one of the two callers to this function
> won't have an xlogreader. And then we're passing the EndOfLog and
> EndOfLogTLI as dummy values which seems like it's probably just
> totally wrong, but if for some reason it works correctly there sure
> don't seem to be any comments explaining why.
>
> So I started doing a bit of hacking myself and ended up with the
> attached, which I think is not completely the right thing yet but I
> think it's better than your version. I split this into three parts.
> 0001 splits up the logic that currently decides whether to write an
> end-of-recovery record or a checkpoint record and if the latter how
> the checkpoint ought to be performed into two functions.
> DetermineRecoveryXlogAction() figures out what we want to do, and
> PerformRecoveryXlogAction() does it. It also moves the code to run
> recovery_end_command and related stuff into a new function
> CleanupAfterArchiveRecovery(). 0002 then builds on this by postponing
> UpdateFullPageWrites(), PerformRecoveryXLogAction(), and
> CleanupAfterArchiveRecovery() to just before we
> XLogReportParameters(). Because of the refactoring done by 0001, this
> is only a small amount of code movement. Because of the separation
> between DetermineRecoveryXlogAction() and PerformRecoveryXlogAction(),
> the latter doesn't need the xlogreader. So we can do
> DetermineRecoveryXlogAction() at the same time as now, while the
> xlogreader is available, and then we don't need it later when we
> PerformRecoveryXlogAction(), because we already know what we need to
> know. I think this is all fine as far as it goes.
>
> My 0003 is where I see some lingering problems. It creates
> XLogAcceptWrites(), moves the appropriate stuff there, and doesn't
> need the xlogreader. But it doesn't really solve the problem of how
> checkpointer.c would be able to call this function with proper
> arguments. It is at least better in not needing two arguments to
> decide what to do, but how is checkpointer.c supposed to know what to
> pass for xlogaction? Worse yet, how is checkpointer.c supposed to know
> what to pass for EndOfLogTLI and EndOfLog? Actually, EndOfLog doesn't
> seem too problematic, because that value has been stored in four (!)
> places inside XLogCtl by this code:
>
>     LogwrtResult.Write = LogwrtResult.Flush = EndOfLog;
>
>     XLogCtl->LogwrtResult = LogwrtResult;
>
>     XLogCtl->LogwrtRqst.Write = EndOfLog;
>     XLogCtl->LogwrtRqst.Flush = EndOfLog;
>
> Presumably we could relatively easily change things around so that we
> finish one of those values ... probably one of the "write" values ..
> back out of XLogCtl instead of passing it as a parameter. That would
> work just as well from the checkpointer as from the startup process,
> and there seems to be no way for the value to change until after
> XLogAcceptWrites() has been called, so it seems fine. But that doesn't
> help for the other arguments. What I'm thinking is that we should just
> arrange to store EndOfLogTLI and xlogaction into XLogCtl also, and
> then XLogAcceptWrites() can fish those values out of there as well,
> which should be enough to make it work and do the same thing
> regardless of which process is calling it. But I have run out of time
> for today so have not explored coding that up.
>


I have spent some time thinking about making XLogAcceptWrites()
independent, and for that, we need to get rid of its arguments which
are available only in the startup process. The first argument
xlogaction deduced by the DetermineRecoveryXlogAction(). If we are able to
make this function logic independent and can deduce that xlogaction in
any process, we can skip xlogaction argument passing.

DetermineRecoveryXlogAction() function depends on a few global
variables, valid only in the startup process are InRecovery,
ArchiveRecoveryRequested & LocalPromoteIsTriggered.  Out of
three LocalPromoteIsTriggered's value is already available in shared
memory and that can be fetched by calling LocalPromoteIsTriggered().
InRecovery's value can be guessed by as long as DBState in the control
file doesn't get changed unexpectedly until XLogAcceptWrites()
executes.  If the DBState was not a clean shutdown, then surely the
server has gone through recovery. If we could rely on DBState in the
control file then we are good to go. For the last one,
ArchiveRecoveryRequested, I don't see any existing and appropriate
shared memory or control file information, so that can be identified
if the archive recovery was requested or not. Initially, I thought to
use SharedRecoveryState which is always set to RECOVERY_STATE_ARCHIVE,
if  the archive recovery requested. But there is another case where
SharedRecoveryState could be RECOVERY_STATE_ARCHIVE irrespective of
ArchiveRecoveryRequested value, that is the presence of a backup label
file.  If we want to use SharedRecoveryState, we need one more state
which could differentiate between ArchiveRecoveryRequested and the
backup label file presence case.  To move ahead, I have copied
ArchiveRecoveryRequested into shared memory and it will be cleared
once archive cleanup is finished. With all these changes, we could get
rid of xlogaction argument and DetermineRecoveryXlogAction() function.
Could move its logic to PerformRecoveryXLogAction() directly.

Now, the remaining two arguments of XLogAcceptWrites() are required
for the CleanupAfterArchiveRecovery() function. Along with these two
arguments, this function requires ArchiveRecoveryRequested and
ThisTimeLineID which are again global variables.  With the previous
changes, we have got ArchiveRecoveryRequested into shared memory.
And for ThisTimeLineID, I don't think we need to do anything since this
value is available with all the backend as per the following comment:

"
/*
 * ThisTimeLineID will be same in all backends --- it identifies current
 * WAL timeline for the database system.
 */
TimeLineID  ThisTimeLineID = 0;
"

In addition to the four places that Robert has pointed for EndOfLog,
XLogCtl->lastSegSwitchLSN also holds EndOfLog value and that doesn't
seem to change until WAL write is enabled. For EndOfLogTLI, I think we
can safely use XLogCtl->replayEndTLI. Currently, The EndOfLogTLI is
the timeline ID of the last record that xlogreader reads, but this
xlogreader was simply re-fetching the last record which we have
replied in redo loop if it was in recovery, if not in recovery, we
don't need to worry since this value is needed only in case of
ArchiveRecoveryRequested = true, which implicitly forces redo and sets
replayEndTLI value.

With all the above changes XLogAcceptWrites() can be called from other
processes but I haven't tested that. This finding is still not
complete and not too clean, perhaps, posting the patches with
aforesaid changes just to confirm the direction and forward the
discussion, thanks.

Regards,
Amul

From 15329d85e26967602e5aedb14e10f31a8631e33c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 23 Jul 2021 13:07:56 -0400
Subject: [PATCH v34 1/3] Refactor some end-of-recovery code out of
 StartupXLOG().

Moved the code that performs whether to write a checkpoint or an
end-of-recovery record into PerformRecoveryXlogAction().

Also create a new function CleanupAfterArchiveRecovery() to
perform a few tasks that we want to do after we've actually exited
archive recovery but before we start accepting new WAL writes.
This is straightforward code movement to make StartupXLOG() a
little bit shorter and a little bit easier to understand.
---
 src/backend/access/transam/xlog.c | 277 +++++++++++++++++-------------
 1 file changed, 159 insertions(+), 118 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e51a7a749da..cd1d87c14b3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -637,6 +637,12 @@ typedef struct XLogCtlData
 	 */
 	RecoveryState SharedRecoveryState;
 
+	/*
+	 * SharedArchiveRecoveryRequested indicates whether an archive recovery is
+	 * requested. Protected by info_lck.
+	 */
+	bool		SharedArchiveRecoveryRequested;
+
 	/*
 	 * SharedHotStandbyActive indicates if we allow hot standby queries to be
 	 * run.  Protected by info_lck.
@@ -880,6 +886,7 @@ static MemoryContext walDebugCxt = NULL;
 static void readRecoverySignalFile(void);
 static void validateRecoveryParameters(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog);
+static void CleanupAfterArchiveRecovery(void);
 static bool recoveryStopsBefore(XLogReaderState *record);
 static bool recoveryStopsAfter(XLogReaderState *record);
 static char *getRecoveryStopReason(void);
@@ -925,6 +932,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 							  int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
+static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
 static bool rescanLatestTimeLine(void);
@@ -5223,6 +5231,7 @@ XLOGShmemInit(void)
 	 */
 	XLogCtl->XLogCacheBlck = XLOGbuffers - 1;
 	XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
+	XLogCtl->SharedArchiveRecoveryRequested = false;
 	XLogCtl->SharedHotStandbyActive = false;
 	XLogCtl->InstallXLogFileSegmentActive = false;
 	XLogCtl->SharedPromoteIsTriggered = false;
@@ -5507,6 +5516,12 @@ readRecoverySignalFile(void)
 		ereport(FATAL,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("standby mode is not supported by single-user servers")));
+
+	/*
+	 * Remember archive recovery request in shared memory state.  A lock is not
+	 * needed since we are the only ones who updating this.
+	 */
+	XLogCtl->SharedArchiveRecoveryRequested = ArchiveRecoveryRequested;
 }
 
 static void
@@ -5694,6 +5709,95 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
 			(errmsg("archive recovery complete")));
 }
 
+/*
+ * Perform cleanup actions at the conclusion of archive recovery.
+ */
+static void
+CleanupAfterArchiveRecovery(void)
+{
+	XLogRecPtr EndOfLog;
+
+	/*
+	 * Execute the recovery_end_command, if any.
+	 */
+	if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
+		ExecuteRecoveryCommand(recoveryEndCommand,
+							   "recovery_end_command",
+							   true);
+
+	/*
+	 * We switched to a new timeline. Clean up segments on the old
+	 * timeline.
+	 *
+	 * If there are any higher-numbered segments on the old timeline,
+	 * remove them. They might contain valid WAL, but they might also be
+	 * pre-allocated files containing garbage. In any case, they are not
+	 * part of the new timeline's history so we don't need them.
+	 */
+	(void) GetLastSegSwitchData(&EndOfLog);
+	RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
+
+	/*
+	 * If the switch happened in the middle of a segment, what to do with
+	 * the last, partial segment on the old timeline? If we don't archive
+	 * it, and the server that created the WAL never archives it either
+	 * (e.g. because it was hit by a meteor), it will never make it to the
+	 * archive. That's OK from our point of view, because the new segment
+	 * that we created with the new TLI contains all the WAL from the old
+	 * timeline up to the switch point. But if you later try to do PITR to
+	 * the "missing" WAL on the old timeline, recovery won't find it in
+	 * the archive. It's physically present in the new file with new TLI,
+	 * but recovery won't look there when it's recovering to the older
+	 * timeline. On the other hand, if we archive the partial segment, and
+	 * the original server on that timeline is still running and archives
+	 * the completed version of the same segment later, it will fail. (We
+	 * used to do that in 9.4 and below, and it caused such problems).
+	 *
+	 * As a compromise, we rename the last segment with the .partial
+	 * suffix, and archive it. Archive recovery will never try to read
+	 * .partial segments, so they will normally go unused. But in the odd
+	 * PITR case, the administrator can copy them manually to the pg_wal
+	 * directory (removing the suffix). They can be useful in debugging,
+	 * too.
+	 *
+	 * If a .done or .ready file already exists for the old timeline,
+	 * however, we had already determined that the segment is complete, so
+	 * we can let it be archived normally. (In particular, if it was
+	 * restored from the archive to begin with, it's expected to have a
+	 * .done file).
+	 */
+	if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
+		XLogArchivingActive())
+	{
+		char		origfname[MAXFNAMELEN];
+		XLogSegNo	endLogSegNo;
+		TimeLineID EndOfLogTLI = XLogCtl->replayEndTLI;
+
+		XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
+		XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
+
+		if (!XLogArchiveIsReadyOrDone(origfname))
+		{
+			char		origpath[MAXPGPATH];
+			char		partialfname[MAXFNAMELEN];
+			char		partialpath[MAXPGPATH];
+
+			XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
+			snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
+			snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
+
+			/*
+			 * Make sure there's no .done or .ready file for the .partial
+			 * file.
+			 */
+			XLogArchiveCleanup(partialfname);
+
+			durable_rename(origpath, partialpath, ERROR);
+			XLogArchiveNotify(partialfname);
+		}
+	}
+}
+
 /*
  * Extract timestamp from WAL record.
  *
@@ -7883,127 +7987,13 @@ StartupXLOG(void)
 	UpdateFullPageWrites();
 	LocalXLogInsertAllowed = -1;
 
+	/* Emit checkpoint or end-of-recovery record in XLOG, if required. */
 	if (InRecovery)
-	{
-		/*
-		 * Perform a checkpoint to update all our recovery activity to disk.
-		 *
-		 * Note that we write a shutdown checkpoint rather than an on-line
-		 * one. This is not particularly critical, but since we may be
-		 * assigning a new TLI, using a shutdown checkpoint allows us to have
-		 * the rule that TLI only changes in shutdown checkpoints, which
-		 * allows some extra error checking in xlog_redo.
-		 *
-		 * In promotion, only create a lightweight end-of-recovery record
-		 * instead of a full checkpoint. A checkpoint is requested later,
-		 * after we're fully out of recovery mode and already accepting
-		 * queries.
-		 */
-		if (ArchiveRecoveryRequested && IsUnderPostmaster &&
-			LocalPromoteIsTriggered)
-		{
-			promoted = true;
-
-			/*
-			 * Insert a special WAL record to mark the end of recovery, since
-			 * we aren't doing a checkpoint. That means that the checkpointer
-			 * process may likely be in the middle of a time-smoothed
-			 * restartpoint and could continue to be for minutes after this.
-			 * That sounds strange, but the effect is roughly the same and it
-			 * would be stranger to try to come out of the restartpoint and
-			 * then checkpoint. We request a checkpoint later anyway, just for
-			 * safety.
-			 */
-			CreateEndOfRecoveryRecord();
-		}
-		else
-		{
-			RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
-							  CHECKPOINT_IMMEDIATE |
-							  CHECKPOINT_WAIT);
-		}
-	}
+		promoted = PerformRecoveryXLogAction();
 
+	/* If this is archive recovery, perform post-recovery cleanup actions. */
 	if (ArchiveRecoveryRequested)
-	{
-		/*
-		 * And finally, execute the recovery_end_command, if any.
-		 */
-		if (recoveryEndCommand && strcmp(recoveryEndCommand, "") != 0)
-			ExecuteRecoveryCommand(recoveryEndCommand,
-								   "recovery_end_command",
-								   true);
-
-		/*
-		 * We switched to a new timeline. Clean up segments on the old
-		 * timeline.
-		 *
-		 * If there are any higher-numbered segments on the old timeline,
-		 * remove them. They might contain valid WAL, but they might also be
-		 * pre-allocated files containing garbage. In any case, they are not
-		 * part of the new timeline's history so we don't need them.
-		 */
-		RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
-
-		/*
-		 * If the switch happened in the middle of a segment, what to do with
-		 * the last, partial segment on the old timeline? If we don't archive
-		 * it, and the server that created the WAL never archives it either
-		 * (e.g. because it was hit by a meteor), it will never make it to the
-		 * archive. That's OK from our point of view, because the new segment
-		 * that we created with the new TLI contains all the WAL from the old
-		 * timeline up to the switch point. But if you later try to do PITR to
-		 * the "missing" WAL on the old timeline, recovery won't find it in
-		 * the archive. It's physically present in the new file with new TLI,
-		 * but recovery won't look there when it's recovering to the older
-		 * timeline. On the other hand, if we archive the partial segment, and
-		 * the original server on that timeline is still running and archives
-		 * the completed version of the same segment later, it will fail. (We
-		 * used to do that in 9.4 and below, and it caused such problems).
-		 *
-		 * As a compromise, we rename the last segment with the .partial
-		 * suffix, and archive it. Archive recovery will never try to read
-		 * .partial segments, so they will normally go unused. But in the odd
-		 * PITR case, the administrator can copy them manually to the pg_wal
-		 * directory (removing the suffix). They can be useful in debugging,
-		 * too.
-		 *
-		 * If a .done or .ready file already exists for the old timeline,
-		 * however, we had already determined that the segment is complete, so
-		 * we can let it be archived normally. (In particular, if it was
-		 * restored from the archive to begin with, it's expected to have a
-		 * .done file).
-		 */
-		if (XLogSegmentOffset(EndOfLog, wal_segment_size) != 0 &&
-			XLogArchivingActive())
-		{
-			char		origfname[MAXFNAMELEN];
-			XLogSegNo	endLogSegNo;
-
-			XLByteToPrevSeg(EndOfLog, endLogSegNo, wal_segment_size);
-			XLogFileName(origfname, EndOfLogTLI, endLogSegNo, wal_segment_size);
-
-			if (!XLogArchiveIsReadyOrDone(origfname))
-			{
-				char		origpath[MAXPGPATH];
-				char		partialfname[MAXFNAMELEN];
-				char		partialpath[MAXPGPATH];
-
-				XLogFilePath(origpath, EndOfLogTLI, endLogSegNo, wal_segment_size);
-				snprintf(partialfname, MAXFNAMELEN, "%s.partial", origfname);
-				snprintf(partialpath, MAXPGPATH, "%s.partial", origpath);
-
-				/*
-				 * Make sure there's no .done or .ready file for the .partial
-				 * file.
-				 */
-				XLogArchiveCleanup(partialfname);
-
-				durable_rename(origpath, partialpath, ERROR);
-				XLogArchiveNotify(partialfname);
-			}
-		}
-	}
+		CleanupAfterArchiveRecovery();
 
 	/*
 	 * Preallocate additional log files, if wanted.
@@ -8207,6 +8197,57 @@ CheckRecoveryConsistency(void)
 	}
 }
 
+/*
+ * Perform whatever XLOG actions are necessary at end of REDO.
+ *
+ * The goal here is to make sure that we'll be able to recover properly if
+ * we crash again. If we choose to write a checkpoint, we'll write a shutdown
+ * checkpoint rather than an on-line one. This is not particularly critical,
+ * but since we may be assigning a new TLI, using a shutdown checkpoint allows
+ * us to have the rule that TLI only changes in shutdown checkpoints, which
+ * allows some extra error checking in xlog_redo.
+ */
+static bool
+PerformRecoveryXLogAction(void)
+{
+	bool		promoted = false;
+
+	/*
+	 * In promotion, only create a lightweight end-of-recovery record
+	 * instead of a full checkpoint. A checkpoint is requested later,
+	 * after we're fully out of recovery mode and already accepting
+	 * queries.
+	 *
+	 * NB: Check does not rely on the global variables are valid only in the
+	 * startup process only.
+	 */
+	if (((volatile XLogCtlData *) XLogCtl)->SharedArchiveRecoveryRequested &&
+		IsUnderPostmaster && PromoteIsTriggered())
+	{
+		promoted = true;
+
+		/*
+		 * Insert a special WAL record to mark the end of recovery, since
+		 * we aren't doing a checkpoint. That means that the checkpointer
+		 * process may likely be in the middle of a time-smoothed
+		 * restartpoint and could continue to be for minutes after this.
+		 * That sounds strange, but the effect is roughly the same and it
+		 * would be stranger to try to come out of the restartpoint and
+		 * then checkpoint. We request a checkpoint later anyway, just for
+		 * safety.
+		 */
+		CreateEndOfRecoveryRecord();
+	}
+	else
+	{
+		RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
+						  CHECKPOINT_IMMEDIATE |
+						  CHECKPOINT_WAIT);
+	}
+
+	return promoted;
+}
+
 /*
  * Is the system still in recovery?
  *
-- 
2.18.0

From c75ab1ae31afc00c2c4e22902fe421d344d64add Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Sep 2021 01:09:42 -0400
Subject: [PATCH v34 3/3] Create XLogAcceptWrites() function with code from
 StartupXLOG().

This is just code movement. A future patch will want to defer the
call to XLogAcceptWrites() until a later time, rather than doing
it as soon as we finish applying WAL, but here we're just grouping
related code together into a new function.
---
 src/backend/access/transam/xlog.c | 84 ++++++++++++++++++++-----------
 1 file changed, 54 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ffec084ee98..30600db19f4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -932,6 +932,7 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 							  int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
+static bool XLogAcceptWrites(void);
 static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
@@ -6616,7 +6617,7 @@ StartupXLOG(void)
 	DBState		dbstate_at_startup;
 	XLogReaderState *xlogreader;
 	XLogPageReadPrivate private;
-	bool		promoted = false;
+	bool		promoted;
 	struct stat st;
 
 	/*
@@ -8029,38 +8030,14 @@ StartupXLOG(void)
 	XLogReaderFree(xlogreader);
 
 	/*
-	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
-	 * record before resource manager writes cleanup WAL records or checkpoint
-	 * record is written.
+	 * Update full_page_writes in shared memory, and later whenever wal write
+	 * permitted, write an XLOG_FPW_CHANGE record before resource manager
+	 * writes cleanup WAL records or checkpoint record is written.
 	 */
 	Insert->fullPageWrites = lastFullPageWrites;
-	LocalSetXLogInsertAllowed();
-	UpdateFullPageWrites();
-	LocalXLogInsertAllowed = -1;
-
-	/*
-	 * Emit checkpoint or end-of-recovery record in XLOG, if the server wasn't
-	 * shut down cleanly, which been through recovery.
-	 */
-	if (ControlFile->state != DB_SHUTDOWNED)
-		promoted = PerformRecoveryXLogAction();
-
-	/* If this is archive recovery, perform post-recovery cleanup actions. */
-	if (ArchiveRecoveryRequested)
-		CleanupAfterArchiveRecovery();
-
-	/*
-	 * If any of the critical GUCs have changed, log them before we allow
-	 * backends to write WAL.
-	 */
-	LocalSetXLogInsertAllowed();
-	XLogReportParameters();
 
-	/*
-	 * Local WAL inserts enabled, so it's time to finish initialization of
-	 * commit timestamp.
-	 */
-	CompleteCommitTsInitialization();
+	/* Prepare to accept WAL writes. */
+	promoted = XLogAcceptWrites();
 
 	/*
 	 * All done with end-of-recovery actions.
@@ -8104,6 +8081,53 @@ StartupXLOG(void)
 		RequestCheckpoint(CHECKPOINT_FORCE);
 }
 
+/*
+ * Prepare to accept WAL writes.
+ */
+static bool
+XLogAcceptWrites(void)
+{
+	bool		promoted = false;
+
+	/* Write an XLOG_FPW_CHANGE record */
+	LocalSetXLogInsertAllowed();
+	UpdateFullPageWrites();
+	LocalXLogInsertAllowed = -1;
+
+	/*
+	 * Emit checkpoint or end-of-recovery record in XLOG, if the server wasn't
+	 * shut down cleanly, which been through recovery.
+	 */
+	if (ControlFile->state != DB_SHUTDOWNED)
+		promoted = PerformRecoveryXLogAction();
+
+	/* If this is archive recovery, perform post-recovery cleanup actions. */
+	if (((volatile XLogCtlData *) XLogCtl)->SharedArchiveRecoveryRequested)
+	{
+		CleanupAfterArchiveRecovery();
+
+		/* Done with archive recovery cleanup, clear the share memory state. */
+		SpinLockAcquire(&XLogCtl->info_lck);
+		XLogCtl->SharedArchiveRecoveryRequested = false;
+		SpinLockRelease(&XLogCtl->info_lck);
+	}
+
+	/*
+	 * If any of the critical GUCs have changed, log them before we allow
+	 * backends to write WAL.
+	 */
+	LocalSetXLogInsertAllowed();
+	XLogReportParameters();
+
+	/*
+	 * Local WAL inserts enabled, so it's time to finish initialization of
+	 * commit timestamp.
+	 */
+	CompleteCommitTsInitialization();
+
+	return promoted;
+}
+
 /*
  * Checks if recovery has reached a consistent state. When consistency is
  * reached and we have a valid starting standby snapshot, tell postmaster
-- 
2.18.0

From f43812b8984d4d07f2142019451d8ba14fbece58 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Sep 2021 00:39:03 -0400
Subject: [PATCH v34 2/3] Postpone some end-of-recovery operations relating to
 allowing WAL.

Previously, we issued XLOG_FPW_CHANGE and either
XLOG_CHECKPOINT_SHUTDOWN or XLOG_END_OF_RECOVERY while still
technically in recovery, and also performed post-archive-recovery
cleanup steps at that point. Postpone that stuff until after we clear
InRecovery and shut down the XLogReader.

This is preparatory work for a future patch that wants to allow
recovery to end at one time and only later start to allow WAL writes.
The steps that themselves write WAL clearly shouldn't happen before
we're ready to accept WAL writes, and it seems best for now to keep
the steps performed by CleanupAfterArchiveRecovery() at the same point
relative to the surrounding steps. We assume (hopefully correctly)
that the user doesn't want recovery_end_command to run until we're
committed to writing WAL on the new timeline. Until then, the
machine is still usable as a standby on the old timeline.

Aside from the value of this patch as preparatory work, this order of
operations actually seems more logical, since it means we don't
actually write any WAL until after exiting recovery.
---
 src/backend/access/transam/xlog.c | 39 +++++++++++++++++--------------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index cd1d87c14b3..ffec084ee98 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7977,24 +7977,6 @@ StartupXLOG(void)
 	XLogCtl->LogwrtRqst.Write = EndOfLog;
 	XLogCtl->LogwrtRqst.Flush = EndOfLog;
 
-	/*
-	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
-	 * record before resource manager writes cleanup WAL records or checkpoint
-	 * record is written.
-	 */
-	Insert->fullPageWrites = lastFullPageWrites;
-	LocalSetXLogInsertAllowed();
-	UpdateFullPageWrites();
-	LocalXLogInsertAllowed = -1;
-
-	/* Emit checkpoint or end-of-recovery record in XLOG, if required. */
-	if (InRecovery)
-		promoted = PerformRecoveryXLogAction();
-
-	/* If this is archive recovery, perform post-recovery cleanup actions. */
-	if (ArchiveRecoveryRequested)
-		CleanupAfterArchiveRecovery();
-
 	/*
 	 * Preallocate additional log files, if wanted.
 	 */
@@ -8046,6 +8028,27 @@ StartupXLOG(void)
 	}
 	XLogReaderFree(xlogreader);
 
+	/*
+	 * Update full_page_writes in shared memory and write an XLOG_FPW_CHANGE
+	 * record before resource manager writes cleanup WAL records or checkpoint
+	 * record is written.
+	 */
+	Insert->fullPageWrites = lastFullPageWrites;
+	LocalSetXLogInsertAllowed();
+	UpdateFullPageWrites();
+	LocalXLogInsertAllowed = -1;
+
+	/*
+	 * Emit checkpoint or end-of-recovery record in XLOG, if the server wasn't
+	 * shut down cleanly, which been through recovery.
+	 */
+	if (ControlFile->state != DB_SHUTDOWNED)
+		promoted = PerformRecoveryXLogAction();
+
+	/* If this is archive recovery, perform post-recovery cleanup actions. */
+	if (ArchiveRecoveryRequested)
+		CleanupAfterArchiveRecovery();
+
 	/*
 	 * If any of the critical GUCs have changed, log them before we allow
 	 * backends to write WAL.
-- 
2.18.0

Re: [Patch] ALTER SYSTEM READ ONLY

Reply via email to