On Wed, Jul 19, 2023 at 11:21:17AM -0700, David Zhang wrote:
> 1) simply start server from a base backup
> 
> FATAL:  could not find recovery.signal or standby.signal when recovering
> with backup_label
> 
> HINT:  If you are restoring from a backup, touch
> "/media/david/disk1/pg_backup1/recovery.signal" or
> "/media/david/disk1/pg_backup1/standby.signal" and add required recovery
> options.

Note the difference when --write-recovery-conf is specified, where a
standby.conf is created with a primary_conninfo in
postgresql.auto.conf.  So, yes, that's expected by default with the
patch.

> 2) touch a recovery.signal file and then try to start the server, the
> following error was encountered:
> 
> FATAL:  must specify restore_command when standby mode is not enabled

Yes, that's also something expected in the scope of the v1 posted.
The idea behind this restriction is that specifying recovery.signal is
equivalent to asking for archive recovery, but not specifying
restore_command is equivalent to not provide any options to be able to
recover.  See validateRecoveryParameters() and note that this
restriction exists since the beginning of times, introduced in commit
66ec2db.  I tend to agree that there is something to be said about
self-contained backups taken from pg_basebackup, though, as these
would fail if no restore_command is specified, and this restriction is
in place before Postgres has introduced replication and easier ways to
have base backups.  As a whole, I think that there is a good argument
in favor of removing this restriction for the case where archive
recovery is requested if users have all their WAL in pg_wal/ to be
able to recover up to a consistent point, keeping these GUC
restrictions if requesting a standby (not recovery.signal, only
standby.signal).

> 3) touch a standby.signal file, then the server successfully started,
> however, it operates in standby mode, whereas the intended behavior was for
> it to function as a primary server.

standby.signal implies that the server will start in standby mode.  If
one wants to deploy a new primary, that would imply a timeline jump at
the end of recovery, you would need to specify recovery.signal
instead.

We need more discussions and more opinions, but the discussion has
stalled for a few months now.  In case, I am adding Thomas Munro in CC
who has mentioned to me at PGcon that he was interested in this patch
(this thread's problem is not directly related to the fact that the
checkpointer now runs in crash recovery, though).

For now, I am attaching a rebased v2.
--
Michael
From 9945a86161b1cf23f7002a8f1f4bce20061693d6 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@paquier.xyz>
Date: Thu, 20 Jul 2023 07:59:56 +0900
Subject: [PATCH v2] Strengthen use of ArchiveRecoveryRequested and
 InArchiveRecovery

---
 src/backend/access/transam/xlogrecovery.c     | 112 +++++++++++-------
 src/bin/pg_basebackup/t/010_pg_basebackup.pl  |   3 +-
 src/bin/pg_rewind/t/008_min_recovery_point.pl |   1 +
 3 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index becc2bda62..86899452b4 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -125,15 +125,19 @@ static TimeLineID curFileTLI;
 
 /*
  * When ArchiveRecoveryRequested is set, archive recovery was requested,
- * ie. signal files were present.  When InArchiveRecovery is set, we are
- * currently recovering using offline XLOG archives.  These variables are only
- * valid in the startup process.
+ * ie. signal files or backup_label were present.  When InArchiveRecovery is
+ * set, we are currently recovering using offline XLOG archives.  These
+ + variables are only valid in the startup process.  Note that the presence of
+ * a backup_label file forces archive recovery even if there is no signal
+ * file.
  *
  * When ArchiveRecoveryRequested is true, but InArchiveRecovery is false, we're
  * currently performing crash recovery using only XLOG files in pg_wal, but
  * will switch to using offline XLOG archives as soon as we reach the end of
  * WAL in pg_wal.
-*/
+ *
+ * InArchiveRecovery should never be set without ArchiveRecoveryRequested.
+ */
 bool		ArchiveRecoveryRequested = false;
 bool		InArchiveRecovery = false;
 
@@ -540,42 +544,7 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
 	readRecoverySignalFile();
 	validateRecoveryParameters();
 
-	if (ArchiveRecoveryRequested)
-	{
-		if (StandbyModeRequested)
-			ereport(LOG,
-					(errmsg("entering standby mode")));
-		else if (recoveryTarget == RECOVERY_TARGET_XID)
-			ereport(LOG,
-					(errmsg("starting point-in-time recovery to XID %u",
-							recoveryTargetXid)));
-		else if (recoveryTarget == RECOVERY_TARGET_TIME)
-			ereport(LOG,
-					(errmsg("starting point-in-time recovery to %s",
-							timestamptz_to_str(recoveryTargetTime))));
-		else if (recoveryTarget == RECOVERY_TARGET_NAME)
-			ereport(LOG,
-					(errmsg("starting point-in-time recovery to \"%s\"",
-							recoveryTargetName)));
-		else if (recoveryTarget == RECOVERY_TARGET_LSN)
-			ereport(LOG,
-					(errmsg("starting point-in-time recovery to WAL location (LSN) \"%X/%X\"",
-							LSN_FORMAT_ARGS(recoveryTargetLSN))));
-		else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
-			ereport(LOG,
-					(errmsg("starting point-in-time recovery to earliest consistent point")));
-		else
-			ereport(LOG,
-					(errmsg("starting archive recovery")));
-	}
-
-	/*
-	 * Take ownership of the wakeup latch if we're going to sleep during
-	 * recovery.
-	 */
-	if (ArchiveRecoveryRequested)
-		OwnLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
-
+	/* Set the WAL reading processor, as backup_label may need it */
 	private = palloc0(sizeof(XLogPageReadPrivate));
 	xlogreader =
 		XLogReaderAllocate(wal_segment_size, NULL,
@@ -609,11 +578,30 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
 	replay_image_masked = (char *) palloc(BLCKSZ);
 	primary_image_masked = (char *) palloc(BLCKSZ);
 
+	/*
+	 * Read the backup_label file.  We want to run this part of the recovery
+	 * process after checking for signal files and perform validation of the
+	 * recovery parameters, as it may be possible that a backup needs to be
+	 * run, but no signal files have been set.
+	 */
 	if (read_backup_label(&CheckPointLoc, &CheckPointTLI, &backupEndRequired,
 						  &backupFromStandby))
 	{
 		List	   *tablespaces = NIL;
 
+		if (!ArchiveRecoveryRequested)
+			ereport(FATAL,
+					(errmsg("could not find recovery.signal or standby.signal when recovering with backup_label"),
+					 errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" or \"%s/standby.signal\" and add required recovery options.",
+							 DataDir, DataDir)));
+
+		/*
+		 * Take ownership of the wakeup latch if we're going to sleep during
+		 * recovery if needed.
+		 */
+		if (ArchiveRecoveryRequested)
+			OwnLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
+
 		/*
 		 * Archive recovery was requested, and thanks to the backup label
 		 * file, we know how far we need to replay to reach consistency. Enter
@@ -706,6 +694,15 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
 	}
 	else
 	{
+		/* No backup_label file has been found if we are here. */
+
+		/*
+		 * Take ownership of the wakeup latch if we're going to sleep during
+		 * recovery if needed.
+		 */
+		if (ArchiveRecoveryRequested)
+			OwnLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
+
 		/*
 		 * If tablespace_map file is present without backup_label file, there
 		 * is no use of such file.  There is no harm in retaining it, but it
@@ -789,6 +786,35 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr,
 		wasShutdown = ((record->xl_info & ~XLR_INFO_MASK) == XLOG_CHECKPOINT_SHUTDOWN);
 	}
 
+	if (ArchiveRecoveryRequested)
+	{
+		if (StandbyModeRequested)
+			ereport(LOG,
+					(errmsg("entering standby mode")));
+		else if (recoveryTarget == RECOVERY_TARGET_XID)
+			ereport(LOG,
+					(errmsg("starting point-in-time recovery to XID %u",
+							recoveryTargetXid)));
+		else if (recoveryTarget == RECOVERY_TARGET_TIME)
+			ereport(LOG,
+					(errmsg("starting point-in-time recovery to %s",
+							timestamptz_to_str(recoveryTargetTime))));
+		else if (recoveryTarget == RECOVERY_TARGET_NAME)
+			ereport(LOG,
+					(errmsg("starting point-in-time recovery to \"%s\"",
+							recoveryTargetName)));
+		else if (recoveryTarget == RECOVERY_TARGET_LSN)
+			ereport(LOG,
+					(errmsg("starting point-in-time recovery to WAL location (LSN) \"%X/%X\"",
+							LSN_FORMAT_ARGS(recoveryTargetLSN))));
+		else if (recoveryTarget == RECOVERY_TARGET_IMMEDIATE)
+			ereport(LOG,
+					(errmsg("starting point-in-time recovery to earliest consistent point")));
+		else
+			ereport(LOG,
+					(errmsg("starting archive recovery")));
+	}
+
 	/*
 	 * If the location of the checkpoint record is not on the expected
 	 * timeline in the history of the requested timeline, we cannot proceed:
@@ -1574,6 +1600,12 @@ ShutdownWalRecovery(void)
 	 */
 	if (ArchiveRecoveryRequested)
 		DisownLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
+
+	/*
+	 * InArchiveRecovery should never have been set without
+	 * ArchiveRecoveryRequested.
+	 */
+	Assert(ArchiveRecoveryRequested || !InArchiveRecovery);
 }
 
 /*
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index b9f5e1266b..b9e54f4562 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -392,7 +392,8 @@ SKIP:
 	my $node2 = PostgreSQL::Test::Cluster->new('replica');
 
 	# Recover main data directory
-	$node2->init_from_backup($node, 'tarbackup2', tar_program => $tar);
+	$node2->init_from_backup($node, 'tarbackup2', tar_program => $tar,
+	    has_restoring => 1);
 
 	# Recover tablespace into a new directory (not where it was!)
 	my $repTsDir = "$tempdir/tblspc1replica";
diff --git a/src/bin/pg_rewind/t/008_min_recovery_point.pl b/src/bin/pg_rewind/t/008_min_recovery_point.pl
index d4c89451e6..1cff5b7019 100644
--- a/src/bin/pg_rewind/t/008_min_recovery_point.pl
+++ b/src/bin/pg_rewind/t/008_min_recovery_point.pl
@@ -152,6 +152,7 @@ move(
 	"$tmp_folder/node_2-postgresql.conf.tmp",
 	"$node_2_pgdata/postgresql.conf");
 
+$node_2->append_conf('standby.signal', '');
 $node_2->start;
 
 # Check contents of the test tables after rewind. The rows inserted in node 3
-- 
2.40.1

Attachment: signature.asc
Description: PGP signature

Reply via email to