Hi, we had a customer who ran pg_checksums on their v12 cluster and got a confusing error:
|pg_checksums: error: invalid segment number 0 in file name "/PG- |Data/foo_12_data/pg_tblspc/16402/PG_10_201707211/16390/pg_internal.init |.10028" Turns out the customer ran a pg_ugprade in copy mode before and started up the old cluster again which pg_checksums decided to checked as well - note the PG_10_201707211 in the error message. The attached patch is a stab at teaching pg_checksums to only check its own TABLESPACE_VERSION_DIRECTORY directory. I guess this implies that it would ignore tablespace directories of outdated catversion instances during development, which I think should be ok, but others might not agree? The other question is whether it is possible to end up with a pg_internal.init.$PID file in a running cluster. E.g. if an instance crashes and gets started up again - are those cleaned up during crash recovery, or should pg_checksums ignore them? Right now pg_checksums only checks against a list of filenames and only skips on exact matches not prefixes so that might take a bit of work. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
From 5a0a1e7ff543749815bd4d87476550b01009dbba Mon Sep 17 00:00:00 2001 From: Michael Banck <michael.ba...@credativ.de> Date: Thu, 30 Jan 2020 17:48:37 +0100 Subject: [PATCH] Make pg_checksums skip foreign tablespace directories. Until now, pg_checksums blindly descended into directories and the symlinks in pg_tblspc. However, if pg_upgrade was run in copy mode on a cluster with tablespaces, the external tablespace directory will contain version-specific directories for both the old and the upgraded version and pg_checksums will check both. If the old cluster is started up again, pg_checksums would check the online cluster and possibly fail on files it does not expect in regular offline mode. To fix, check if the directory we are descending into starts with 'PG_' and only check it if it is equal to our TABLESPACE_VERSION_DIRECTORY string. --- src/bin/pg_checksums/pg_checksums.c | 17 +++++++++++++++++ src/bin/pg_checksums/t/002_actions.pl | 24 ++++++++++++++++++++---- 2 files changed, 37 insertions(+), 4 deletions(-) diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c index 46ee1f1dc3..45caf473cb 100644 --- a/src/bin/pg_checksums/pg_checksums.c +++ b/src/bin/pg_checksums/pg_checksums.c @@ -58,6 +58,8 @@ typedef enum #define PG_TEMP_FILES_DIR "pgsql_tmp" #define PG_TEMP_FILE_PREFIX "pgsql_tmp" +#define TABLESPACE_PREFIX "PG_" + static PgChecksumMode mode = PG_MODE_CHECK; static const char *progname; @@ -368,7 +370,22 @@ scan_directory(const char *basedir, const char *subdir, bool sizeonly) #else else if (S_ISDIR(st.st_mode) || pgwin32_is_junction(fn)) #endif + { + /* + * Check if the directory starts with 'PG_', indicating a versioned + * tablspace directory. If this is the case, compare it against the + * versioned directory we expect. If it does not match, we are + * looking at a versioned tablespace directory from a different + * version, possibly from a pg_upgrade, so skip it. + */ + if ((strncmp(TABLESPACE_PREFIX, de->d_name, strlen(TABLESPACE_PREFIX)) == 0) && + (strncmp(TABLESPACE_VERSION_DIRECTORY, de->d_name, strlen(de->d_name)) != 0)) + { + pg_log_debug("skipping foreign tablespace directory %s", de->d_name); + continue; + } dirsize += scan_directory(path, de->d_name, sizeonly); + } } closedir(dir); return dirsize; diff --git a/src/bin/pg_checksums/t/002_actions.pl b/src/bin/pg_checksums/t/002_actions.pl index 59228b916c..ac9f1c9fe4 100644 --- a/src/bin/pg_checksums/t/002_actions.pl +++ b/src/bin/pg_checksums/t/002_actions.pl @@ -5,7 +5,7 @@ use strict; use warnings; use PostgresNode; use TestLib; -use Test::More tests => 62; +use Test::More tests => 63; # Utility routine to create and check a table with corrupted checksums @@ -181,12 +181,28 @@ check_relation_corruption($node, 'corrupt1', 'pg_default'); # Create tablespace to check corruptions in a non-default tablespace. my $basedir = $node->basedir; -my $tablespace_dir = "$basedir/ts_corrupt_dir"; +my $corrupt_tablespace_dir = "$basedir/ts_corrupt_dir"; +mkdir($corrupt_tablespace_dir); +$corrupt_tablespace_dir = TestLib::perl2host($corrupt_tablespace_dir); +$node->safe_psql('postgres', + "CREATE TABLESPACE ts_corrupt LOCATION '$corrupt_tablespace_dir';"); +check_relation_corruption($node, 'corrupt2', 'ts_corrupt'); + +# Create another tablespace and create a bogus versioned tablespace directory +# with a file usually only found on online clusters in it, this one should get +# skipped. +my $tablespace_dir = "$basedir/ts_dir"; mkdir($tablespace_dir); $tablespace_dir = TestLib::perl2host($tablespace_dir); $node->safe_psql('postgres', - "CREATE TABLESPACE ts_corrupt LOCATION '$tablespace_dir';"); -check_relation_corruption($node, 'corrupt2', 'ts_corrupt'); + "CREATE TABLESPACE ts_good LOCATION '$tablespace_dir';"); +$node->stop; +my $foreign_tablespace_version_dir = "$tablespace_dir/PG_99_9999999999"; +mkdir("$foreign_tablespace_version_dir"); +append_to_file "$foreign_tablespace_version_dir/pg_internal.init.123", ""; +command_ok( + [ 'pg_checksums', '--check', '-D', $pgdata ], + "correctly skips foreign tablespace directory"); # Utility routine to check that pg_checksums is able to detect # correctly-named relation files filled with some corrupted data. -- 2.20.1