On Mon, Jun 17, 2024, at 8:04 AM, Peter Eisentraut wrote:
> On 07.06.24 05:49, Euler Taveira wrote:
> > Here it is a patch series to fix the issues reported in recent 
> > discussions. The
> > patches 0001 and 0003 aim to fix the buildfarm issues. The patch 0002 
> > removes
> > synchronized failover slots on subscriber since it has no use. I also 
> > included
> > an optional patch 0004 that improves the usability by checking both 
> > servers if
> > it already failed in any subscriber check.
> 
> I have committed 0001, 0002, and 0003.  Let's keep an eye on the 
> buildfarm to see if that stabilizes things.  So far it looks good.

Thanks.

> For 0004, I suggest inverting the result values from check_publisher() 
> and create_subscriber() so that it returns true if the check is ok.
> 

Yeah, the order doesn't matter after the commit b963913826. I thought about
changing the order but I didn't; I provided a minimal patch. Since you think
it is an improvement, I'm attaching another patch with this change.


--
Euler Taveira
EDB   https://www.enterprisedb.com/
From ec96586fa35be3f855fb8844a197f374ac82a622 Mon Sep 17 00:00:00 2001
From: Euler Taveira <euler.tave...@enterprisedb.com>
Date: Fri, 24 May 2024 14:29:06 -0300
Subject: [PATCH] Check both servers before exiting

The current code provides multiple errors for each server. If any
subscriber condition is not met, it terminates without checking the
publisher conditions.
This change allows it to check both servers before terminating if any
condition is not met. It saves at least one dry run execution.

The order it calls the check functions don't matter after commit
b96391382626306c301b67cbd2d28313d96741f3 (because there is no
primary_slot_name check anymore) so check the publisher first.
---
 src/bin/pg_basebackup/pg_createsubscriber.c | 28 ++++++++++++---------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_createsubscriber.c b/src/bin/pg_basebackup/pg_createsubscriber.c
index 5499e6d96a..26b4087ff7 100644
--- a/src/bin/pg_basebackup/pg_createsubscriber.c
+++ b/src/bin/pg_basebackup/pg_createsubscriber.c
@@ -76,9 +76,9 @@ static uint64 get_standby_sysid(const char *datadir);
 static void modify_subscriber_sysid(const struct CreateSubscriberOptions *opt);
 static bool server_is_in_recovery(PGconn *conn);
 static char *generate_object_name(PGconn *conn);
-static void check_publisher(const struct LogicalRepInfo *dbinfo);
+static bool check_publisher(const struct LogicalRepInfo *dbinfo);
 static char *setup_publisher(struct LogicalRepInfo *dbinfo);
-static void check_subscriber(const struct LogicalRepInfo *dbinfo);
+static bool check_subscriber(const struct LogicalRepInfo *dbinfo);
 static void setup_subscriber(struct LogicalRepInfo *dbinfo,
 							 const char *consistent_lsn);
 static void setup_recovery(const struct LogicalRepInfo *dbinfo, const char *datadir,
@@ -803,7 +803,7 @@ server_is_in_recovery(PGconn *conn)
  *
  * XXX Does it not allow a synchronous replica?
  */
-static void
+static bool
 check_publisher(const struct LogicalRepInfo *dbinfo)
 {
 	PGconn	   *conn;
@@ -908,8 +908,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
 
 	pg_free(wal_level);
 
-	if (failed)
-		exit(1);
+	return failed;
 }
 
 /*
@@ -923,7 +922,7 @@ check_publisher(const struct LogicalRepInfo *dbinfo)
  * there is not a reliable way to provide a suitable error saying the server C
  * will be broken at the end of this process (due to pg_resetwal).
  */
-static void
+static bool
 check_subscriber(const struct LogicalRepInfo *dbinfo)
 {
 	PGconn	   *conn;
@@ -1014,8 +1013,7 @@ check_subscriber(const struct LogicalRepInfo *dbinfo)
 		failed = true;
 	}
 
-	if (failed)
-		exit(1);
+	return failed;
 }
 
 /*
@@ -1771,6 +1769,9 @@ main(int argc, char **argv)
 
 	char		pidfile[MAXPGPATH];
 
+	bool		pub_failed = false;
+	bool		sub_failed = false;
+
 	pg_logging_init(argv[0]);
 	pg_logging_set_level(PG_LOG_WARNING);
 	progname = get_progname(argv[0]);
@@ -2061,11 +2062,14 @@ main(int argc, char **argv)
 	pg_log_info("starting the standby with command-line options");
 	start_standby_server(&opt, true);
 
-	/* Check if the standby server is ready for logical replication */
-	check_subscriber(dbinfo);
-
 	/* Check if the primary server is ready for logical replication */
-	check_publisher(dbinfo);
+	pub_failed = check_publisher(dbinfo);
+
+	/* Check if the standby server is ready for logical replication */
+	sub_failed = check_subscriber(dbinfo);
+
+	if (pub_failed || sub_failed)
+		exit(1);
 
 	/*
 	 * Stop the target server. The recovery process requires that the server
-- 
2.30.2

Reply via email to