Hi everyone,

Michael, thank you for outlining your alternative approach.
After rethinking the current patch state with a clearer vision, I realized
that simply truncating the postgresql.auto.conf file is sufficient. All
modifications made by pg_createsubscriber in this file are append-only, so
truncation reliably restores it to its original state without adding extra
logic. This keeps the patch small and clean.

For older versions using recovery.conf, the situation differs — since that
file is fully rewritten during recovery setup, we instead restore the
previously saved original file using a durable rename.

Regarding debugging: the contents are not entirely lost. pg_createsubscriber
already prints the new recovery configuration as debug output, so the full
parameter set remains visible in the logs for inspection when needed. My
point is that adding include directives isn't needed, as we already have
debug output, and, moreover, they aren't applied to recovery.conf.

Robert, this scenario actually occurred in production at one of our
customer environments. Even though this workflow may be uncommon,
PostgreSQL should still handle it gracefully. The fact that the server can
end up in a state where it cannot start because it fails to reach a
recovery target point far in the past suggests a potential area for
improvement in the recovery process. It would be helpful if the system
could detect such a case — where the recovery target precedes the current
consistent point — and either skip recovery or emit a clear diagnostic
message rather than failing to start.

Best regards,
Vinter Alena
From e156a12d97589d53682bbe3dbff7b1608789cf54 Mon Sep 17 00:00:00 2001
From: Alena Vinter <[email protected]>
Date: Mon, 27 Oct 2025 13:02:02 +0700
Subject: [PATCH 1/2] Reseting recovery target parameters in
 `pg_createsubscriber`

The utility sets recovery target params for correct recovery before
conversion a physical replica to a logical one but does not reset them
afterward. It may cause recovery failures in certain scenarios.
For example, if recovery begins from a checkpoint where no WAL records
need to be applied, the system may incorrectly determine that the
recovery target was never reached because these parameters remain
active.

This change ensures all recovery parameters are properly reset after
conversion to prevent such edge cases.
---
 src/bin/pg_basebackup/pg_createsubscriber.c   | 100 +++++++++++++++---
 .../t/040_pg_createsubscriber.pl              |  31 ++++++
 2 files changed, 115 insertions(+), 16 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index 9fa5caaf91d..c5a8997e5db 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -12,6 +12,7 @@
  */
 
 #include "postgres_fe.h"
+#include "utils/elog.h"
 
 #include <sys/stat.h>
 #include <sys/time.h>
@@ -28,9 +29,11 @@
 #include "fe_utils/string_utils.h"
 #include "fe_utils/version.h"
 #include "getopt_long.h"
+#include "storage/fd.h"
 
 #define	DEFAULT_SUB_PORT	"50432"
 #define	OBJECTTYPE_PUBLICATIONS  0x0001
+#define RECOVERY_CONF_PREFIX ".original"
 
 /* Command-line options */
 struct CreateSubscriberOptions
@@ -155,6 +158,11 @@ static char *subscriber_dir = NULL;
 
 static bool recovery_ended = false;
 static bool standby_running = false;
+static bool recovery_params_set = false;
+
+static const char *recovery_conf_filename = NULL;	/* cached recovery conf
+													 * filename */
+static off_t recovery_conf_size = 0;	/* size of original recovery conf file */
 
 enum WaitPMResult
 {
@@ -164,19 +172,59 @@ enum WaitPMResult
 
 
 /*
- * Cleanup objects that were created by pg_createsubscriber if there is an
- * error.
+ * Clean up objects created by pg_createsubscriber.
  *
- * Publications and replication slots are created on primary. Depending on the
- * step it failed, it should remove the already created objects if it is
- * possible (sometimes it won't work due to a connection issue).
- * There is no cleanup on the target server. The steps on the target server are
- * executed *after* promotion, hence, at this point, a failure means recreate
- * the physical replica and start again.
+ * Publications and replication slots are created on the primary. Depending on
+ * the step where it failed, already created objects should be removed if
+ * possible (sometimes this won't work due to a connection issue).
+ * There is no cleanup on the target server *after* its promotion, because any
+ * failure at this point means recreating the physical replica and starting
+ * again.
  */
 static void
 cleanup_objects_atexit(void)
 {
+	/*
+	 * Restore the recovery config to its original contents, ensuring that the
+	 * subscriber doesn't use outdated recovery parameters.
+	 */
+	if (recovery_params_set)
+	{
+		int			err = 0;
+		char		filename[MAXPGPATH];
+
+		snprintf(filename, MAXPGPATH, "%s/%s", subscriber_dir, recovery_conf_filename);
+
+		/*
+		 * 'recovery.conf' are fully rewritten during the 'recovery_setup'
+		 * function, so we need to restore saved original contents
+		 */
+		if (strcmp(recovery_conf_filename, "recovery.conf") == 0)
+		{
+			char		filename_with_original_contents[MAXPGPATH];
+
+			snprintf(filename_with_original_contents, MAXPGPATH,
+					 "%s" RECOVERY_CONF_PREFIX, filename);
+
+			err = durable_rename(filename_with_original_contents, filename, ERROR);
+		}
+		else					/* postgresql.auto.conf */
+		{
+			if ((err = truncate(filename, recovery_conf_size)) != 0)
+			{
+				pg_log_error("could not truncate file \"%s\" to %u: %m",
+							 filename, (unsigned int) recovery_conf_size);
+			}
+		}
+
+		if (err != 0)
+		{
+			pg_log_warning("recovery parameters were set but not properly reset on the target");
+			pg_log_warning_hint("manual removal of recovery parameters is required from '%s'",
+								recovery_conf_filename);
+		}
+	}
+
 	if (success)
 		return;
 
@@ -1025,6 +1073,9 @@ check_subscriber(const struct LogicalRepInfo *dbinfo)
 	pg_log_info("checking settings on subscriber");
 
 	conn = connect_database(dbinfo[0].subconninfo, true);
+	recovery_conf_filename =
+		PQserverVersion(conn) < MINIMUM_VERSION_FOR_RECOVERY_GUC ?
+		"recovery.conf" : "postgresql.auto.conf";
 
 	/* The target server must be a standby */
 	if (!server_is_in_recovery(conn))
@@ -1234,6 +1285,8 @@ setup_recovery(const struct LogicalRepInfo *dbinfo, const char *datadir, const c
 {
 	PGconn	   *conn;
 	PQExpBuffer recoveryconfcontents;
+	char		filename[MAXPGPATH];
+	struct stat st;
 
 	/*
 	 * Despite of the recovery parameters will be written to the subscriber,
@@ -1242,6 +1295,16 @@ setup_recovery(const struct LogicalRepInfo *dbinfo, const char *datadir, const c
 	 */
 	conn = connect_database(dbinfo[0].pubconninfo, true);
 
+	/*
+	 * Before setting up the recovery parameters save the original size for
+	 * restoring later.
+	 */
+	snprintf(filename, MAXPGPATH, "%s/%s", datadir, recovery_conf_filename);
+	if (stat(filename, &st) < 0)
+		pg_fatal("could not stat file \"%s\": %m", filename);
+
+	recovery_conf_size = st.st_size;
+
 	/*
 	 * Write recovery parameters.
 	 *
@@ -1273,17 +1336,22 @@ setup_recovery(const struct LogicalRepInfo *dbinfo, const char *datadir, const c
 	appendPQExpBufferStr(recoveryconfcontents, "recovery_target_time = ''\n");
 	appendPQExpBufferStr(recoveryconfcontents, "recovery_target_xid = ''\n");
 
-	if (dry_run)
-	{
-		appendPQExpBufferStr(recoveryconfcontents, "# dry run mode");
-		appendPQExpBuffer(recoveryconfcontents,
-						  "recovery_target_lsn = '%X/%08X'\n",
-						  LSN_FORMAT_ARGS((XLogRecPtr) InvalidXLogRecPtr));
-	}
-	else
+	if (!dry_run)
 	{
+		if (strcmp(recovery_conf_filename, "recovery.conf") == 0)
+		{
+			char		filename_with_original_contents[MAXPGPATH];
+
+			snprintf(filename_with_original_contents, MAXPGPATH,
+					 "%s" RECOVERY_CONF_PREFIX, filename);
+
+			durable_rename(filename, filename_with_original_contents, FATAL);
+		}
+
 		appendPQExpBuffer(recoveryconfcontents, "recovery_target_lsn = '%s'\n",
 						  lsn);
+
+		recovery_params_set = true;
 		WriteRecoveryConfig(conn, datadir, recoveryconfcontents);
 	}
 	disconnect_database(conn, false);
diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
index 1e887a0657a..c108747a9db 100644
--- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
+++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl
@@ -41,6 +41,30 @@ sub generate_db
 	return $dbname;
 }
 
+# Check if a parameter is absent from the server's configuration file.
+# Uses 'recovery.conf' for PostgreSQL < 12, 'postgresql.auto.conf' for newer
+# versions.
+sub test_param_absent
+{
+	my ($node, $param) = @_;
+	my $version = $node->pg_version->major;
+	my $conf;
+
+	if ($version < 12)
+	{
+		$conf = $node->data_dir . '/recovery.conf';
+	}
+	else
+	{
+		$conf = $node->data_dir . '/postgresql.auto.conf';
+	}
+
+	return 1 unless -e $conf;
+
+	my $content = slurp_file($conf);
+	return $content !~ /^\s*$param\s*=/m;
+}
+
 #
 # Test mandatory options
 command_fails(['pg_createsubscriber'],
@@ -467,6 +491,13 @@ command_ok(
 	],
 	'run pg_createsubscriber on node S');
 
+# Verify that recovery parameters have been reset after pg_createsubscriber
+# We check recovery_target_lsn as a representative parameter - since all
+# recovery parameters are managed as a group, the absence of one indicates
+# that the entire set has been properly cleared from the configuration.
+ok( test_param_absent($node_s, 'recovery_target_lsn'),
+	'recovery_target_lsn parameter was removed');
+
 # Confirm the physical replication slot has been removed
 $result = $node_p->safe_psql($db1,
 	"SELECT count(*) FROM pg_replication_slots WHERE slot_name = '$slotname'"
-- 
2.51.0

From c755b53aeb8061f85308b1f57f028579ea481cf7 Mon Sep 17 00:00:00 2001
From: Alena Vinter <[email protected]>
Date: Mon, 27 Oct 2025 13:02:51 +0700
Subject: [PATCH 2/2] doc: Add warning about leftover recovery parameters in
 'pg_createsubscriber'

---
 doc/src/sgml/ref/pg_createsubscriber.sgml | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml
index bb9cc72576c..fa14aeeef13 100644
--- a/doc/src/sgml/ref/pg_createsubscriber.sgml
+++ b/doc/src/sgml/ref/pg_createsubscriber.sgml
@@ -520,11 +520,19 @@ PostgreSQL documentation
       are added to avoid unexpected behavior during the recovery process such
       as end of the recovery as soon as a consistent state is reached (WAL
       should be applied until the replication start location) and multiple
-      recovery targets that can cause a failure.  This step finishes once the
-      server ends standby mode and is accepting read-write transactions.  If
-      <option>--recovery-timeout</option> option is set,
-      <application>pg_createsubscriber</application> terminates if recovery
-      does not end until the given number of seconds.
+      recovery targets that can cause a failure.
+     </para>
+     <para>
+      If an error occurs during this step, recovery parameters might remain in
+      the target server's configuration.  You may need to manually remove these
+      leftover parameters before attempting to run.
+      <application>pg_createsubscriber</application> again.
+     </para>
+     <para>
+      This step finishes once the server ends standby mode and is accepting
+      read-write transactions.  If <option>--recovery-timeout</option> option
+      is set, <application>pg_createsubscriber</application> terminates if
+      recovery does not end until the given number of seconds.
      </para>
     </step>
 
-- 
2.51.0

Reply via email to