Hi,

we had a customer who ran pg_checksums on their v12 cluster and got a
confusing error:

|pg_checksums: error: invalid segment number 0 in file name "/PG-
|Data/foo_12_data/pg_tblspc/16402/PG_10_201707211/16390/pg_internal.init
|.10028"

Turns out the customer ran a pg_ugprade in copy mode before and started
up the old cluster again which pg_checksums decided to checked as well -
note the PG_10_201707211 in the error message. The attached patch is a
stab at teaching pg_checksums to only check its own
TABLESPACE_VERSION_DIRECTORY directory. I guess this implies that it
would ignore tablespace directories of outdated catversion instances
during development, which I think should be ok, but others might not
agree?

The other question is whether it is possible to end up with a
pg_internal.init.$PID file in a running cluster. E.g. if an instance
crashes and gets started up again - are those cleaned up during crash
recovery, or should pg_checksums ignore them? Right now pg_checksums
only checks against a list of filenames and only skips on exact matches
not prefixes so that might take a bit of work.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz
From 5a0a1e7ff543749815bd4d87476550b01009dbba Mon Sep 17 00:00:00 2001
From: Michael Banck <michael.ba...@credativ.de>
Date: Thu, 30 Jan 2020 17:48:37 +0100
Subject: [PATCH] Make pg_checksums skip foreign tablespace directories.

Until now, pg_checksums blindly descended into directories and the symlinks in
pg_tblspc. However, if pg_upgrade was run in copy mode on a cluster with
tablespaces, the external tablespace directory will contain version-specific
directories for both the old and the upgraded version and pg_checksums will
check both. If the old cluster is started up again, pg_checksums would check
the online cluster and possibly fail on files it does not expect in regular
offline mode. To fix, check if the directory we are descending into starts with
'PG_' and only check it if it is equal to our TABLESPACE_VERSION_DIRECTORY
string.
---
 src/bin/pg_checksums/pg_checksums.c   | 17 +++++++++++++++++
 src/bin/pg_checksums/t/002_actions.pl | 24 ++++++++++++++++++++----
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 46ee1f1dc3..45caf473cb 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -58,6 +58,8 @@ typedef enum
 #define PG_TEMP_FILES_DIR "pgsql_tmp"
 #define PG_TEMP_FILE_PREFIX "pgsql_tmp"
 
+#define TABLESPACE_PREFIX "PG_"
+
 static PgChecksumMode mode = PG_MODE_CHECK;
 
 static const char *progname;
@@ -368,7 +370,22 @@ scan_directory(const char *basedir, const char *subdir, bool sizeonly)
 #else
 		else if (S_ISDIR(st.st_mode) || pgwin32_is_junction(fn))
 #endif
+		{
+			/*
+			 * Check if the directory starts with 'PG_', indicating a versioned
+			 * tablspace directory. If this is the case, compare it against the
+			 * versioned directory we expect. If it does not match, we are
+			 * looking at a versioned tablespace directory from a different
+			 * version, possibly from a pg_upgrade, so skip it.
+			 */
+			if ((strncmp(TABLESPACE_PREFIX, de->d_name, strlen(TABLESPACE_PREFIX)) == 0) &&
+				(strncmp(TABLESPACE_VERSION_DIRECTORY, de->d_name, strlen(de->d_name)) != 0))
+			{
+				pg_log_debug("skipping foreign tablespace directory %s", de->d_name);
+				continue;
+			}
 			dirsize += scan_directory(path, de->d_name, sizeonly);
+		}
 	}
 	closedir(dir);
 	return dirsize;
diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl
index 59228b916c..ac9f1c9fe4 100644
--- a/src/bin/pg_checksums/t/002_actions.pl
+++ b/src/bin/pg_checksums/t/002_actions.pl
@@ -5,7 +5,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 62;
+use Test::More tests => 63;
 
 
 # Utility routine to create and check a table with corrupted checksums
@@ -181,12 +181,28 @@ check_relation_corruption($node, 'corrupt1', 'pg_default');
 
 # Create tablespace to check corruptions in a non-default tablespace.
 my $basedir        = $node->basedir;
-my $tablespace_dir = "$basedir/ts_corrupt_dir";
+my $corrupt_tablespace_dir = "$basedir/ts_corrupt_dir";
+mkdir($corrupt_tablespace_dir);
+$corrupt_tablespace_dir = TestLib::perl2host($corrupt_tablespace_dir);
+$node->safe_psql('postgres',
+	"CREATE TABLESPACE ts_corrupt LOCATION '$corrupt_tablespace_dir';");
+check_relation_corruption($node, 'corrupt2', 'ts_corrupt');
+
+# Create another tablespace and create a bogus versioned tablespace directory
+# with a file usually only found on online clusters in it, this one should get
+# skipped.
+my $tablespace_dir = "$basedir/ts_dir";
 mkdir($tablespace_dir);
 $tablespace_dir = TestLib::perl2host($tablespace_dir);
 $node->safe_psql('postgres',
-	"CREATE TABLESPACE ts_corrupt LOCATION '$tablespace_dir';");
-check_relation_corruption($node, 'corrupt2', 'ts_corrupt');
+	"CREATE TABLESPACE ts_good LOCATION '$tablespace_dir';");
+$node->stop;
+my $foreign_tablespace_version_dir = "$tablespace_dir/PG_99_9999999999";
+mkdir("$foreign_tablespace_version_dir");
+append_to_file "$foreign_tablespace_version_dir/pg_internal.init.123",   "";
+command_ok(
+        [ 'pg_checksums', '--check', '-D', $pgdata ],
+        "correctly skips foreign tablespace directory");
 
 # Utility routine to check that pg_checksums is able to detect
 # correctly-named relation files filled with some corrupted data.
-- 
2.20.1

Reply via email to