Hi everyone, Michael, thank you for outlining your alternative approach. After rethinking the current patch state with a clearer vision, I realized that simply truncating the postgresql.auto.conf file is sufficient. All modifications made by pg_createsubscriber in this file are append-only, so truncation reliably restores it to its original state without adding extra logic. This keeps the patch small and clean.
For older versions using recovery.conf, the situation differs — since that file is fully rewritten during recovery setup, we instead restore the previously saved original file using a durable rename. Regarding debugging: the contents are not entirely lost. pg_createsubscriber already prints the new recovery configuration as debug output, so the full parameter set remains visible in the logs for inspection when needed. My point is that adding include directives isn't needed, as we already have debug output, and, moreover, they aren't applied to recovery.conf. Robert, this scenario actually occurred in production at one of our customer environments. Even though this workflow may be uncommon, PostgreSQL should still handle it gracefully. The fact that the server can end up in a state where it cannot start because it fails to reach a recovery target point far in the past suggests a potential area for improvement in the recovery process. It would be helpful if the system could detect such a case — where the recovery target precedes the current consistent point — and either skip recovery or emit a clear diagnostic message rather than failing to start. Best regards, Vinter Alena
From e156a12d97589d53682bbe3dbff7b1608789cf54 Mon Sep 17 00:00:00 2001 From: Alena Vinter <[email protected]> Date: Mon, 27 Oct 2025 13:02:02 +0700 Subject: [PATCH 1/2] Reseting recovery target parameters in `pg_createsubscriber` The utility sets recovery target params for correct recovery before conversion a physical replica to a logical one but does not reset them afterward. It may cause recovery failures in certain scenarios. For example, if recovery begins from a checkpoint where no WAL records need to be applied, the system may incorrectly determine that the recovery target was never reached because these parameters remain active. This change ensures all recovery parameters are properly reset after conversion to prevent such edge cases. --- src/bin/pg_basebackup/pg_createsubscriber.c | 100 +++++++++++++++--- .../t/040_pg_createsubscriber.pl | 31 ++++++ 2 files changed, 115 insertions(+), 16 deletions(-) diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c index 9fa5caaf91d..c5a8997e5db 100644 --- a/src/bin/pg_basebackup/pg_createsubscriber.c +++ b/src/bin/pg_basebackup/pg_createsubscriber.c @@ -12,6 +12,7 @@ */ #include "postgres_fe.h" +#include "utils/elog.h" #include <sys/stat.h> #include <sys/time.h> @@ -28,9 +29,11 @@ #include "fe_utils/string_utils.h" #include "fe_utils/version.h" #include "getopt_long.h" +#include "storage/fd.h" #define DEFAULT_SUB_PORT "50432" #define OBJECTTYPE_PUBLICATIONS 0x0001 +#define RECOVERY_CONF_PREFIX ".original" /* Command-line options */ struct CreateSubscriberOptions @@ -155,6 +158,11 @@ static char *subscriber_dir = NULL; static bool recovery_ended = false; static bool standby_running = false; +static bool recovery_params_set = false; + +static const char *recovery_conf_filename = NULL; /* cached recovery conf + * filename */ +static off_t recovery_conf_size = 0; /* size of original recovery conf file */ enum WaitPMResult { @@ -164,19 +172,59 @@ enum WaitPMResult /* - * Cleanup objects that were created by pg_createsubscriber if there is an - * error. + * Clean up objects created by pg_createsubscriber. * - * Publications and replication slots are created on primary. Depending on the - * step it failed, it should remove the already created objects if it is - * possible (sometimes it won't work due to a connection issue). - * There is no cleanup on the target server. The steps on the target server are - * executed *after* promotion, hence, at this point, a failure means recreate - * the physical replica and start again. + * Publications and replication slots are created on the primary. Depending on + * the step where it failed, already created objects should be removed if + * possible (sometimes this won't work due to a connection issue). + * There is no cleanup on the target server *after* its promotion, because any + * failure at this point means recreating the physical replica and starting + * again. */ static void cleanup_objects_atexit(void) { + /* + * Restore the recovery config to its original contents, ensuring that the + * subscriber doesn't use outdated recovery parameters. + */ + if (recovery_params_set) + { + int err = 0; + char filename[MAXPGPATH]; + + snprintf(filename, MAXPGPATH, "%s/%s", subscriber_dir, recovery_conf_filename); + + /* + * 'recovery.conf' are fully rewritten during the 'recovery_setup' + * function, so we need to restore saved original contents + */ + if (strcmp(recovery_conf_filename, "recovery.conf") == 0) + { + char filename_with_original_contents[MAXPGPATH]; + + snprintf(filename_with_original_contents, MAXPGPATH, + "%s" RECOVERY_CONF_PREFIX, filename); + + err = durable_rename(filename_with_original_contents, filename, ERROR); + } + else /* postgresql.auto.conf */ + { + if ((err = truncate(filename, recovery_conf_size)) != 0) + { + pg_log_error("could not truncate file \"%s\" to %u: %m", + filename, (unsigned int) recovery_conf_size); + } + } + + if (err != 0) + { + pg_log_warning("recovery parameters were set but not properly reset on the target"); + pg_log_warning_hint("manual removal of recovery parameters is required from '%s'", + recovery_conf_filename); + } + } + if (success) return; @@ -1025,6 +1073,9 @@ check_subscriber(const struct LogicalRepInfo *dbinfo) pg_log_info("checking settings on subscriber"); conn = connect_database(dbinfo[0].subconninfo, true); + recovery_conf_filename = + PQserverVersion(conn) < MINIMUM_VERSION_FOR_RECOVERY_GUC ? + "recovery.conf" : "postgresql.auto.conf"; /* The target server must be a standby */ if (!server_is_in_recovery(conn)) @@ -1234,6 +1285,8 @@ setup_recovery(const struct LogicalRepInfo *dbinfo, const char *datadir, const c { PGconn *conn; PQExpBuffer recoveryconfcontents; + char filename[MAXPGPATH]; + struct stat st; /* * Despite of the recovery parameters will be written to the subscriber, @@ -1242,6 +1295,16 @@ setup_recovery(const struct LogicalRepInfo *dbinfo, const char *datadir, const c */ conn = connect_database(dbinfo[0].pubconninfo, true); + /* + * Before setting up the recovery parameters save the original size for + * restoring later. + */ + snprintf(filename, MAXPGPATH, "%s/%s", datadir, recovery_conf_filename); + if (stat(filename, &st) < 0) + pg_fatal("could not stat file \"%s\": %m", filename); + + recovery_conf_size = st.st_size; + /* * Write recovery parameters. * @@ -1273,17 +1336,22 @@ setup_recovery(const struct LogicalRepInfo *dbinfo, const char *datadir, const c appendPQExpBufferStr(recoveryconfcontents, "recovery_target_time = ''\n"); appendPQExpBufferStr(recoveryconfcontents, "recovery_target_xid = ''\n"); - if (dry_run) - { - appendPQExpBufferStr(recoveryconfcontents, "# dry run mode"); - appendPQExpBuffer(recoveryconfcontents, - "recovery_target_lsn = '%X/%08X'\n", - LSN_FORMAT_ARGS((XLogRecPtr) InvalidXLogRecPtr)); - } - else + if (!dry_run) { + if (strcmp(recovery_conf_filename, "recovery.conf") == 0) + { + char filename_with_original_contents[MAXPGPATH]; + + snprintf(filename_with_original_contents, MAXPGPATH, + "%s" RECOVERY_CONF_PREFIX, filename); + + durable_rename(filename, filename_with_original_contents, FATAL); + } + appendPQExpBuffer(recoveryconfcontents, "recovery_target_lsn = '%s'\n", lsn); + + recovery_params_set = true; WriteRecoveryConfig(conn, datadir, recoveryconfcontents); } disconnect_database(conn, false); diff --git a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl index 1e887a0657a..c108747a9db 100644 --- a/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl +++ b/src/bin/pg_basebackup/t/040_pg_createsubscriber.pl @@ -41,6 +41,30 @@ sub generate_db return $dbname; } +# Check if a parameter is absent from the server's configuration file. +# Uses 'recovery.conf' for PostgreSQL < 12, 'postgresql.auto.conf' for newer +# versions. +sub test_param_absent +{ + my ($node, $param) = @_; + my $version = $node->pg_version->major; + my $conf; + + if ($version < 12) + { + $conf = $node->data_dir . '/recovery.conf'; + } + else + { + $conf = $node->data_dir . '/postgresql.auto.conf'; + } + + return 1 unless -e $conf; + + my $content = slurp_file($conf); + return $content !~ /^\s*$param\s*=/m; +} + # # Test mandatory options command_fails(['pg_createsubscriber'], @@ -467,6 +491,13 @@ command_ok( ], 'run pg_createsubscriber on node S'); +# Verify that recovery parameters have been reset after pg_createsubscriber +# We check recovery_target_lsn as a representative parameter - since all +# recovery parameters are managed as a group, the absence of one indicates +# that the entire set has been properly cleared from the configuration. +ok( test_param_absent($node_s, 'recovery_target_lsn'), + 'recovery_target_lsn parameter was removed'); + # Confirm the physical replication slot has been removed $result = $node_p->safe_psql($db1, "SELECT count(*) FROM pg_replication_slots WHERE slot_name = '$slotname'" -- 2.51.0
From c755b53aeb8061f85308b1f57f028579ea481cf7 Mon Sep 17 00:00:00 2001 From: Alena Vinter <[email protected]> Date: Mon, 27 Oct 2025 13:02:51 +0700 Subject: [PATCH 2/2] doc: Add warning about leftover recovery parameters in 'pg_createsubscriber' --- doc/src/sgml/ref/pg_createsubscriber.sgml | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/doc/src/sgml/ref/pg_createsubscriber.sgml b/doc/src/sgml/ref/pg_createsubscriber.sgml index bb9cc72576c..fa14aeeef13 100644 --- a/doc/src/sgml/ref/pg_createsubscriber.sgml +++ b/doc/src/sgml/ref/pg_createsubscriber.sgml @@ -520,11 +520,19 @@ PostgreSQL documentation are added to avoid unexpected behavior during the recovery process such as end of the recovery as soon as a consistent state is reached (WAL should be applied until the replication start location) and multiple - recovery targets that can cause a failure. This step finishes once the - server ends standby mode and is accepting read-write transactions. If - <option>--recovery-timeout</option> option is set, - <application>pg_createsubscriber</application> terminates if recovery - does not end until the given number of seconds. + recovery targets that can cause a failure. + </para> + <para> + If an error occurs during this step, recovery parameters might remain in + the target server's configuration. You may need to manually remove these + leftover parameters before attempting to run. + <application>pg_createsubscriber</application> again. + </para> + <para> + This step finishes once the server ends standby mode and is accepting + read-write transactions. If <option>--recovery-timeout</option> option + is set, <application>pg_createsubscriber</application> terminates if + recovery does not end until the given number of seconds. </para> </step> -- 2.51.0
