Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

Alexey Kondratov Fri, 04 Oct 2019 07:22:18 -0700

On 04.10.2019 11:37, Michael Paquier wrote:

On Thu, Oct 03, 2019 at 12:43:37PM +0300, Alexey Kondratov wrote:

On 03.10.2019 6:07, Michael Paquier wrote:

I have reworked your first patch as per the attached.  What do you
think about it?  The part with the control file needs to go down to
v12, and I would likely split that into two commits on HEAD: one for
the control file and a second for the recovery.conf portion with the
fix for --no-ensure-shutdown to keep a cleaner history.

It looks fine to me except for the progress reporting part. It now adds
PG_CONTROL_FILE_SIZE to fetch_done. However, I cannot find where the control
file is either included in the filemap and fetch_size or counted during
calculate_totals(). Maybe I've missed something, but it now looks like we
report something that wasn't planned for in progress reporting, doesn't
it?

Right.  The pre-12 code actually handles that incorrectly, as it assumed
that any files written through file_ops.c should be part of the
progress.  So I went with the simplest solution, and backpatched this
part with 6f3823b.  I have also committed the set of fixes for the new
options so that we have a better base to work from than what is currently
on HEAD.
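The progress-reporting concern can be illustrated with a minimal sketch. It uses hypothetical counters named after, but not taken from, pg_rewind's fetch_size and fetch_done. The invariant is that only bytes counted during planning should be added to the "done" counter. Otherwise a file written outside the filemap, such as the control file, can push the reported percentage past 100%.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical progress counters; names mimic, but are not, pg_rewind's. */
typedef struct
{
    uint64_t fetch_size;   /* bytes planned, summed before copying starts */
    uint64_t fetch_done;   /* bytes copied so far */
} Progress;

/* Planning phase: every file that will be copied is counted here. */
void
plan_file(Progress *p, uint64_t size)
{
    p->fetch_size += size;
}

/* Copy phase: only bytes that were planned are reported as done. */
void
report_copied(Progress *p, uint64_t size, int was_planned)
{
    if (!was_planned)
        return;            /* e.g. a file written outside the filemap */
    p->fetch_done += size;
    assert(p->fetch_done <= p->fetch_size);
}

/* Percentage shown to the user; stays within 0..100 by construction. */
int
percent_done(const Progress *p)
{
    if (p->fetch_size == 0)
        return 100;
    return (int) ((p->fetch_done * 100) / p->fetch_size);
}
```

Dropping the `was_planned` check would trip the assertion as soon as an unplanned write is reported.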


Great, thanks.


Regarding the tests, adding a --dry-run command is a good idea.
However, I think that there is more value in exercising single-user
mode automatically in the tests, as that is more critical from the
point of view of a rewind run, and stopping the cluster in immediate
mode causes, as expected, the next --dry-run command to fail.

Another thing is that I think that we should use -F with --single.
This makes recovery faster, and the target data folder is synced
at the end of pg_rewind anyway.
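The following minimal sketch shows the command line this produces, mirroring the snprintf() call in the pg_rewind.c hunk of the attached patch. The helper name build_single_user_cmd() and the MAXCMDLEN value are assumptions for illustration. Passing -F to postgres disables fsync calls for the single-user session.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXCMDLEN 1024   /* assumed buffer size for this sketch */

/*
 * Hypothetical helper: build the single-user postgres command line.
 * -F disables fsync for this session; the data folder is synced later.
 * Returns 0 on success, -1 if the command did not fit in the buffer.
 */
int
build_single_user_cmd(char *cmd, size_t len, const char *exec_path,
                      const char *datadir, const char *devnull)
{
    int n;

    assert(cmd != NULL && len > 0);
    n = snprintf(cmd, len, "\"%s\" --single -F -D \"%s\" template1 < \"%s\"",
                 exec_path, datadir, devnull);
    /* snprintf returns the length it wanted; treat truncation as failure */
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

Checking the snprintf() return value guards against a silently truncated command being passed to system().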

Using the long option names makes the tests easier to follow in this
case, so I have switched -R to --write-recovery-conf.

Some comments and the docs have been using some confusing wording, so
I have reworked what I found (like several "it"s in a single sentence
referring to different things).

I agree with all the points. Shutting down the target server using 'immediate' mode is a good way to test ensureCleanShutdown automatically.
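A minimal sketch of why an immediate stop exercises that path follows. An immediate shutdown writes no shutdown checkpoint, so the control file still says "in production" and the target is not considered clean. The enum below is a simplified subset of PostgreSQL's DBState, not the real header. needs_crash_recovery() is hypothetical. Treating a cleanly stopped standby (DB_SHUTDOWNED_IN_RECOVERY) as clean is an assumption of this sketch, since that part of the condition is cut off in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified subset of PostgreSQL's DBState; values are not the real ones. */
typedef enum
{
    DB_STARTUP,
    DB_SHUTDOWNED,
    DB_SHUTDOWNED_IN_RECOVERY,
    DB_IN_CRASH_RECOVERY,
    DB_IN_PRODUCTION
} DBState;

/*
 * Hypothetical check: return true when the target would need a
 * single-user start/stop before rewinding.  A cluster stopped in
 * immediate mode is still DB_IN_PRODUCTION, so this returns true.
 */
bool
needs_crash_recovery(DBState state, bool no_ensure_shutdown)
{
    if (no_ensure_shutdown)
        return false;   /* the user must make sure the state is clean */
    return state != DB_SHUTDOWNED &&
           state != DB_SHUTDOWNED_IN_RECOVERY;
}
```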

Regarding all the set of incompatible options, we have much more of
that after the initial option parsing so I think that we should group
all the cheap ones together.  Let's tackle that as a separate patch.
We can also just check --no-ensure-shutdown directly in
RewindTest.pm, as I have switched the cluster to no longer be cleanly
shut down, in order to stress the automatic recovery path, and trigger
that before running pg_rewind in both local and remote modes.

Attached is an updated patch with all I found.  What do you think?

I've checked your patch, but it seems that it cannot be applied as is, since it, e.g., adds a comment to 005_same_timeline.pl without actually changing the test. So I've slightly modified your patch and tried to fit both dry-run and ensureCleanShutdown testing together. It works just fine and fails immediately if any of the recent fixes is reverted. I still think that dry-run testing is worth adding, since it helped to catch this v12 refactoring issue, but feel free to throw it away if it isn't committable right now, of course.
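The dry-run tests rely on one property: in --dry-run mode, pg_rewind goes through the motions but modifies nothing. That is why the remote test checks that standby.signal is absent afterwards. A minimal sketch of that guard pattern follows, similar to the early `if (dry_run) return;` in ensureCleanShutdown(). The helper write_signal_file() is hypothetical and is not pg_rewind's recovery-config code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: create an empty signal file (e.g. standby.signal)
 * in dir unless dry_run is set.  Returns 0 on success or when skipped,
 * -1 on error.
 */
int
write_signal_file(const char *dir, const char *name, bool dry_run)
{
    char        path[1024];
    FILE       *fp;
    int         n;

    assert(dir != NULL && name != NULL);
    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;              /* path too long or encoding error */

    if (dry_run)
        return 0;               /* report success but touch nothing on disk */

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    return fclose(fp) == 0 ? 0 : -1;
}
```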

As for testing incompatible options and sanity checks, yes, I agree that it is a matter for a different patch. I attached it as a separate WIP patch just for history. Maybe I will try to gather more cases there later.
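The idea of grouping the cheap option checks can be sketched as below. All such combination checks run together right after option parsing and before any cluster is touched. The struct, function name and messages are hypothetical and are not pg_rewind's actual code or error text. The last rule only mirrors what the WIP 006_actions.pl test expects: --write-recovery-conf combined with --source-pgdata should fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parsed options; field names are illustrative only. */
typedef struct
{
    const char *source_pgdata;
    const char *source_server;
    const char *target_pgdata;
    bool        write_recovery_conf;
} RewindOptions;

/*
 * Cheap sanity checks on option combinations, done right after parsing
 * and before connecting to or reading from any cluster.
 * Returns NULL if the combination is fine, else an error message.
 */
const char *
check_option_combinations(const RewindOptions *o)
{
    assert(o != NULL);
    if (o->source_pgdata == NULL && o->source_server == NULL)
        return "no source specified";
    if (o->source_pgdata != NULL && o->source_server != NULL)
        return "only one source may be specified";
    if (o->target_pgdata == NULL)
        return "no target data directory specified";
    if (o->write_recovery_conf && o->source_server == NULL)
        return "--write-recovery-conf requires --source-server";
    return NULL;
}
```

Because these checks need no I/O, running them first lets pg_rewind fail fast before the more expensive sanity checks on the clusters themselves.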


--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

From 6e5667edcad6b037004288635a7ae0eda40d4262 Mon Sep 17 00:00:00 2001
From: Alexey Kondratov <kondratov.alek...@gmail.com>
Date: Fri, 4 Oct 2019 17:14:12 +0300
Subject: [PATCH v3 1/2] Improve functionality, docs and tests of -R,
 --no-ensure-shutdown and --dry-run options

Branch: pg-rewind-fixes
---
 doc/src/sgml/ref/pg_rewind.sgml            | 10 +--
 src/bin/pg_rewind/pg_rewind.c              | 19 +++---
 src/bin/pg_rewind/t/001_basic.pl           |  2 +-
 src/bin/pg_rewind/t/002_databases.pl       |  2 +-
 src/bin/pg_rewind/t/003_extrafiles.pl      |  2 +-
 src/bin/pg_rewind/t/004_pg_xlog_symlink.pl |  2 +-
 src/bin/pg_rewind/t/005_same_timeline.pl   | 32 +++++++---
 src/bin/pg_rewind/t/RewindTest.pm          | 71 +++++++++++++++++-----
 8 files changed, 103 insertions(+), 37 deletions(-)

diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index fbf454803b..42d29edd4e 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -169,12 +169,14 @@ PostgreSQL documentation
       <term><option>--no-ensure-shutdown</option></term>
       <listitem>
        <para>
-        <application>pg_rewind</application> verifies that the target server
-        is cleanly shutdown before rewinding; by default, if it isn't, it
-        starts the server in single-user mode to complete crash recovery.
+        <application>pg_rewind</application> requires that the target server
+        is cleanly shut down before rewinding. By default, if the target server
+        is not shut down cleanly, <application>pg_rewind</application> starts
+        the target server in single-user mode to complete crash recovery first,
+        and stops it.
         By passing this option, <application>pg_rewind</application> skips
         this and errors out immediately if the server is not cleanly shut
-        down.  Users are expected to handle the situation themselves in that
+        down. Users are expected to handle the situation themselves in that
         case.
        </para>
       </listitem>
diff --git a/src/bin/pg_rewind/pg_rewind.c b/src/bin/pg_rewind/pg_rewind.c
index fe1468b771..875a43b219 100644
--- a/src/bin/pg_rewind/pg_rewind.c
+++ b/src/bin/pg_rewind/pg_rewind.c
@@ -270,11 +270,12 @@ main(int argc, char **argv)
 	pg_free(buffer);
 
 	/*
-	 * If the target instance was not cleanly shut down, run a single-user
-	 * postgres session really quickly and reload the control file to get the
-	 * new state. Note if no_ensure_shutdown is specified, pg_rewind won't do
-	 * that automatically. That means users need to do themselves in advance,
-	 * else pg_rewind will soon quit, see sanityChecks().
+	 * If the target instance was not cleanly shut down, start and stop the
+	 * target cluster once in single-user mode to enforce recovery to finish,
+	 * ensuring that the cluster can be used by pg_rewind.  Note that if
+	 * no_ensure_shutdown is specified, pg_rewind ignores this step, and users
+	 * need to make sure by themselves that the target cluster is in a clean
+	 * state.
 	 */
 	if (!no_ensure_shutdown &&
 		ControlFile_target.state != DB_SHUTDOWNED &&
@@ -847,8 +848,12 @@ ensureCleanShutdown(const char *argv0)
 	if (dry_run)
 		return;
 
-	/* finally run postgres in single-user mode */
-	snprintf(cmd, MAXCMDLEN, "\"%s\" --single -D \"%s\" template1 < \"%s\"",
+	/*
+	 * Finally run postgres in single-user mode.  There is no need to use
+	 * fsync here.  This makes the recovery faster, and the target data folder
+	 * is synced at the end anyway.
+	 */
+	snprintf(cmd, MAXCMDLEN, "\"%s\" --single -F -D \"%s\" template1 < \"%s\"",
 			 exec_path, datadir_target, DEVNULL);
 
 	if (system(cmd) != 0)
diff --git a/src/bin/pg_rewind/t/001_basic.pl b/src/bin/pg_rewind/t/001_basic.pl
index c3293e93df..a1659460ec 100644
--- a/src/bin/pg_rewind/t/001_basic.pl
+++ b/src/bin/pg_rewind/t/001_basic.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 11;
+use Test::More tests => 14;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/002_databases.pl b/src/bin/pg_rewind/t/002_databases.pl
index 1db534c0dc..921c4434f5 100644
--- a/src/bin/pg_rewind/t/002_databases.pl
+++ b/src/bin/pg_rewind/t/002_databases.pl
@@ -1,7 +1,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 7;
+use Test::More tests => 10;
 
 use FindBin;
 use lib $FindBin::RealBin;
diff --git a/src/bin/pg_rewind/t/003_extrafiles.pl b/src/bin/pg_rewind/t/003_extrafiles.pl
index f4710440fc..bce5b47148 100644
--- a/src/bin/pg_rewind/t/003_extrafiles.pl
+++ b/src/bin/pg_rewind/t/003_extrafiles.pl
@@ -3,7 +3,7 @@
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 5;
+use Test::More tests => 8;
 
 use File::Find;
 
diff --git a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
index 639eeb9c91..a501be8f78 100644
--- a/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
+++ b/src/bin/pg_rewind/t/004_pg_xlog_symlink.pl
@@ -14,7 +14,7 @@ if ($windows_os)
 }
 else
 {
-	plan tests => 5;
+	plan tests => 8;
 }
 
 use FindBin;
diff --git a/src/bin/pg_rewind/t/005_same_timeline.pl b/src/bin/pg_rewind/t/005_same_timeline.pl
index 40dbc44caa..1d96740a55 100644
--- a/src/bin/pg_rewind/t/005_same_timeline.pl
+++ b/src/bin/pg_rewind/t/005_same_timeline.pl
@@ -1,19 +1,35 @@
+#
+# Test that running pg_rewind with the source and target clusters
+# on the same timeline runs successfully.
+#
 use strict;
 use warnings;
 use TestLib;
-use Test::More tests => 1;
+use Test::More tests => 6;
 
 use FindBin;
 use lib $FindBin::RealBin;
 
 use RewindTest;
 
-# Test that running pg_rewind if the two clusters are on the same
-# timeline runs successfully.
+sub run_test
+{
+	my $test_mode = shift;
+
+	RewindTest::setup_cluster($test_mode, ['-g']);
+	RewindTest::start_master();
+	RewindTest::create_standby($test_mode);
+
+	RewindTest::promote_standby();
+
+	RewindTest::run_pg_rewind($test_mode);
+
+	RewindTest::clean_rewind_test();
+	return;
+}
+
+# Run the test in both modes.
+run_test('local');
+run_test('remote');
 
-RewindTest::setup_cluster();
-RewindTest::start_master();
-RewindTest::create_standby();
-RewindTest::run_pg_rewind('local');
-RewindTest::clean_rewind_test();
 exit(0);
diff --git a/src/bin/pg_rewind/t/RewindTest.pm b/src/bin/pg_rewind/t/RewindTest.pm
index c540722420..9292092b1a 100644
--- a/src/bin/pg_rewind/t/RewindTest.pm
+++ b/src/bin/pg_rewind/t/RewindTest.pm
@@ -227,9 +227,57 @@ sub run_pg_rewind
 	# Append the rewind-specific role to the connection string.
 	$standby_connstr = "$standby_connstr user=rewind_user";
 
-	# Stop the master and be ready to perform the rewind
+	# Stop the master cleanly to check dry-run mode.
 	$node_master->stop;
 
+	# First run pg_rewind in dry-run mode
+	if ($test_mode eq "local")
+	{
+
+		# Stop the new master and be ready to perform the rewind
+		# Do rewind in dry-run mode using a local pgdata as source
+		$node_standby->stop;
+		command_ok(
+			[
+				'pg_rewind',
+				"--debug", "--no-sync", "--dry-run",
+				"--source-pgdata=$standby_pgdata",
+				"--target-pgdata=$master_pgdata",
+				"--no-ensure-shutdown"
+			],
+			'pg_rewind local with --dry-run');
+	}
+	elsif ($test_mode eq "remote")
+	{
+
+		# Do rewind in dry-run mode using a remote connection as source
+		command_ok(
+			[
+				'pg_rewind',
+				"--debug", "--no-sync", "--dry-run",
+				"--source-server", $standby_connstr,
+				"--target-pgdata=$master_pgdata",
+				"--write-recovery-conf", "--progress"
+			],
+			'pg_rewind remote with --dry-run');
+
+		# Check that standby.signal hasn't been created.
+		ok(! -e "$master_pgdata/standby.signal",
+			'standby.signal is absent');
+	}
+	else
+	{
+
+		# Cannot come here normally
+		croak("Incorrect test mode specified");
+	}
+
+	# Start and stop the master and be ready to perform the rewind.
+	# The cluster needs recovery to finish once, and pg_rewind makes
+	# sure that it happens automatically.
+	$node_master->start;
+	$node_master->stop('immediate');
+
 	# At this point, the rewind processing is ready to run.
 	# We now have a very simple scenario with a few diverged WAL record.
 	# The real testing begins really now with a bifurcation of the possible
@@ -246,8 +294,6 @@ sub run_pg_rewind
 	{
 
 		# Do rewind using a local pgdata as source
-		# Stop the master and be ready to perform the rewind
-		$node_standby->stop;
 		command_ok(
 			[
 				'pg_rewind',
@@ -261,18 +307,21 @@ sub run_pg_rewind
 	elsif ($test_mode eq "remote")
 	{
 
-		# Do rewind using a remote connection as source
+		# Do rewind using a remote connection as source, generating
+		# recovery configuration automatically.
 		command_ok(
 			[
 				'pg_rewind',                      "--debug",
 				"--source-server",                $standby_connstr,
-				"--target-pgdata=$master_pgdata", "-R",
-				"--no-sync"
+				"--target-pgdata=$master_pgdata", "--no-sync",
+				"--write-recovery-conf"
 			],
 			'pg_rewind remote');
 
-		# Check that standby.signal has been created.
-		ok(-e "$master_pgdata/standby.signal");
+		# Check that standby.signal is here as recovery configuration
+		# was requested.
+		ok( -e "$master_pgdata/standby.signal",
+			'standby.signal created after pg_rewind');
 
 		# Now, when pg_rewind apparently succeeded with minimal permissions,
 		# add REPLICATION privilege.  So we could test that new standby
@@ -280,12 +329,6 @@ sub run_pg_rewind
 		$node_standby->safe_psql('postgres',
 			"ALTER ROLE rewind_user WITH REPLICATION;");
 	}
-	else
-	{
-
-		# Cannot come here normally
-		croak("Incorrect test mode specified");
-	}
 
 	# Now move back postgresql.conf with old settings
 	move(

base-commit: 6837632b758e0470a2692d9f8303e8aebd6fbd8f
-- 
2.17.1

From 50c4ac745354c38ece3a805a4882eb1b2895a3fe Mon Sep 17 00:00:00 2001
From: Alexey Kondratov <kondratov.alek...@gmail.com>
Date: Fri, 4 Oct 2019 17:14:53 +0300
Subject: [PATCH v3 2/2] WIP: increase pg_rewind test coverage

Branch: pg-rewind-fixes
---
 src/bin/pg_rewind/t/006_actions.pl | 62 ++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 src/bin/pg_rewind/t/006_actions.pl

diff --git a/src/bin/pg_rewind/t/006_actions.pl b/src/bin/pg_rewind/t/006_actions.pl
new file mode 100644
index 0000000000..439074ee72
--- /dev/null
+++ b/src/bin/pg_rewind/t/006_actions.pl
@@ -0,0 +1,62 @@
+#
+# Test incompatible options and pg_rewind internal sanity checks.
+#
+use strict;
+use warnings;
+use TestLib;
+use Test::More tests => 3;
+
+use FindBin;
+use lib $FindBin::RealBin;
+
+use RewindTest;
+
+RewindTest::setup_cluster('local', ['-g']);
+RewindTest::start_master();
+RewindTest::create_standby('local');
+
+my $master_pgdata   = $node_master->data_dir;
+my $standby_pgdata  = $node_standby->data_dir;
+
+RewindTest::promote_standby();
+
+# Check that pg_rewind errors out if target server
+# wasn't shutdown.
+command_fails(
+    [
+        'pg_rewind', "--debug",
+        "--source-pgdata=$standby_pgdata",
+        "--target-pgdata=$master_pgdata",
+        "--no-ensure-shutdown"
+    ],
+    'pg_rewind local without target shutdown');
+
+$node_master->stop;
+
+# Check that pg_rewind errors out if source server
+# wasn't shutdown.
+command_fails(
+    [
+        'pg_rewind', "--debug",
+        "--source-pgdata=$standby_pgdata",
+        "--target-pgdata=$master_pgdata",
+        "--no-ensure-shutdown"
+    ],
+    'pg_rewind local without source shutdown');
+
+$node_standby->stop;
+
+# Check that incompatible options error out.
+command_fails(
+    [
+        'pg_rewind', "--debug",
+        "--source-pgdata=$standby_pgdata",
+        "--target-pgdata=$master_pgdata",
+        "--write-recovery-conf",
+        "--no-ensure-shutdown"
+    ],
+    'pg_rewind local with --write-recovery-conf');
+
+RewindTest::clean_rewind_test();
+
+exit(0);
-- 
2.17.1
