On Mon, Mar 30, 2026 at 11:23 AM Tom Lane <[email protected]> wrote:
> Thomas Munro <[email protected]> writes:
> > Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
> > unlikely to hit this in the wild, and the symptom is a confusing error
> > in a maintenance tool, not corruption, so I don't think this is a big
> > deal.  I might still try teaching the astreamer code to understand PAX
> > 1.0 when it sees it in the next cycle though, for the benefit of
> > FreeBSD users.
>
> I agree that this isn't too critical if the effects are confined to
> pg_waldump.  I believe that pg_basebackup and pg_verifybackup also use
> astreamer_tar.c, but it's not clear to me if they'd ever be asked to
> parse files made by tar(1) and not by our own sparseness-ignorant
> tar-writing code.  If they can be, that'd be a higher-priority reason
> to fill in this gap.

I pushed the workaround for the test.

Yeah, I can't see any reason why pg_verifybackup --wal-path=foo.tar
won't suffer the same problem in the wild.  Again, it's not the end of
the world because it'll just fail and you'll probably eventually figure
out why.  So perhaps we should just improve our detection of archives
that we can't handle?

Straw man algorithm: if you can't find $NAME in the archive, then check
whether PaxHeaders/$NAME exists, and if so, fail with 'unsupported TAR
format for WAL file "%s" in archive "%s"' instead.  That'd probably
work well enough in practice, because astreamer_tar.c treats PAX
extended header pseudo-files as regular files (they're not, they have
type 'x'), and both GNU and BSD tar happen to use that naming.  POSIX
doesn't require that naming, so it would in theory be more correct to
teach astreamer_tar.c to recognise PAX extended headers, fish out
enough information and link it to the following archive member, but a
simple test to improve error messaging seems like the right level of
effort here.

Here's a test patch that shows the problem on any system with GNU tar
or BSD tar and a file system that supports sparse files.  The test
succeeds because it looks for "error: could not find WAL", but the idea
would be to change it to look for a new error message like the one
above.

My motivation was to make this reproducible on any system, in case
that's helpful for Amul and Andrew if they're interested in trying to
improve this edge case in time for the release.  Otherwise I'll come
back to it, but probably not in time...

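To make the straw-man check above a bit more concrete, here is a rough
sketch in C.  It is not code from astreamer_tar.c: the enum, the
function names, and the idea of having all member names available as an
array of strings are assumptions made purely for illustration.  It also
hard-codes the "PaxHeaders/" naming from the straw man, which POSIX does
not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, invented for this sketch. */
typedef enum
{
	WAL_MEMBER_FOUND,				/* $NAME is present as a member */
	WAL_MEMBER_UNSUPPORTED_FORMAT,	/* only PaxHeaders/$NAME was seen */
	WAL_MEMBER_MISSING				/* neither was seen */
} WalMemberStatus;

/*
 * Assumption: a PAX extended header shows up as a member whose name has
 * a "PaxHeaders/" path component immediately followed by the name of
 * the file it describes.
 */
static bool
is_pax_header_for(const char *member, const char *wanted)
{
	static const char marker[] = "PaxHeaders/";
	const char *p = strstr(member, marker);

	while (p != NULL)
	{
		/* Only accept the marker at the start of a path component. */
		bool		at_component_start = (p == member || p[-1] == '/');

		if (at_component_start && strcmp(p + strlen(marker), wanted) == 0)
			return true;
		p = strstr(p + 1, marker);
	}
	return false;
}

/*
 * Look for "wanted" among the member names.  If it is absent but a
 * matching PaxHeaders entry exists, report an unsupported format so the
 * caller can print the more helpful error message.
 */
WalMemberStatus
classify_wal_member(const char *const *members, size_t nmembers,
					const char *wanted)
{
	bool		saw_pax_header = false;

	assert(wanted != NULL);

	for (size_t i = 0; i < nmembers; i++)
	{
		if (strcmp(members[i], wanted) == 0)
			return WAL_MEMBER_FOUND;
		if (is_pax_header_for(members[i], wanted))
			saw_pax_header = true;
	}
	return saw_pax_header ? WAL_MEMBER_UNSUPPORTED_FORMAT : WAL_MEMBER_MISSING;
}
```

A caller would keep reporting "could not find WAL" for
WAL_MEMBER_MISSING and switch to the 'unsupported TAR format' message
for WAL_MEMBER_UNSUPPORTED_FORMAT.  Checking the header's type flag for
'x' instead of matching on names would be the more correct variant, but
that is beyond this sketch.
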
From 084d71f81143f0462caf03569722b5f0b2a147e6 Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Mon, 30 Mar 2026 18:20:09 +1300
Subject: [PATCH] Add a pg_waldump test with GNU tar PAX format.

XXX Update this to test for a new improved error message!

XXX Should this test run for all the scenarios?  Doesn't seem like
compression is relevant to this problem so I just added it as a
standalone test...

XXX No doubt the perl isn't the greatest...
---
 src/bin/pg_waldump/t/001_basic.pl | 73 +++++++++++++++++++++++++++++--
 1 file changed, 69 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl
index ce1f6aa30c0..7f8a319c85d 100644
--- a/src/bin/pg_waldump/t/001_basic.pl
+++ b/src/bin/pg_waldump/t/001_basic.pl
@@ -6,6 +6,7 @@ use warnings FATAL => 'all';
 use Cwd;
 use File::Copy;
 use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::RecursiveCopy;
 use PostgreSQL::Test::Utils;
 use Test::More;
 use List::Util qw(shuffle);
@@ -212,9 +213,13 @@ $node->safe_psql('postgres',
 	qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))}
 );
 
-my ($end_lsn, $end_walfile) = split /\|/,
+my ($end_lsn, $end_walfile, $wal_segsize) = split /\|/,
   $node->safe_psql('postgres',
-	q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())}
+	q{SELECT pg_current_wal_insert_lsn(),
+	         pg_walfile_name(pg_current_wal_insert_lsn()),
+	         setting
+	    FROM pg_settings
+	   WHERE name = 'wal_segment_size'}
   );
 
 my $default_ts_oid = $node->safe_psql('postgres',
@@ -339,7 +344,7 @@ sub test_pg_waldump
 # Create a tar archive, shuffle the file order
 sub generate_archive
 {
-	my ($archive, $directory, $compression_flags) = @_;
+	my ($archive, $directory, $compression_flags, @extra_flags) = @_;
 
 	my @files;
 	opendir my $dh, $directory or die "opendir: $!";
@@ -350,12 +355,17 @@ sub generate_archive
 	}
 	closedir $dh;
 
+	if (!@extra_flags)
+	{
+		@extra_flags = @tar_c_flags;
+	}
+
 	@files = shuffle @files;
 
 	# move into the WAL directory before archiving files
 	my $cwd = getcwd;
 	chdir($directory) || die "chdir: $!";
-	command_ok([$tar, @tar_c_flags, $compression_flags, $archive, @files]);
+	command_ok([$tar, @extra_flags, $compression_flags, $archive, @files]);
 	chdir($cwd) || die "chdir: $!";
 }
 
@@ -477,4 +487,59 @@ for my $scenario (@scenarios)
 	}
 }
 
+SKIP:
+	skip "tar command is not available", 1
+	  if !defined $tar;
+
+	my @sparse_flags;
+
+	# Tell $TAR to use GNU tar's PAX sparse file archive format, so we can test
+	# our handling of that.
+
+	# GNU tar
+	@sparse_flags = ("--sparse", "--format=pax")
+	  if system("$tar --sparse --format=pax -c " .
+				$node->data_dir . "/pg_wal/* /dev/null > /dev/null") == 0;
+	# BSD tar (this is the default, but we still need to detect BSD tar)
+	@sparse_flags = ("--read-sparse", "--format=pax")
+	  if system("$tar --read-sparse --format=pax -c " .
+				$node->data_dir . "/pg_wal/* /dev/null > /dev/null") == 0;
+
+	skip "tar command doesn't support GNU PAX format for sparse files", 1
+	  if !@sparse_flags;
+
+	PostgreSQL::Test::RecursiveCopy::copypath($node->data_dir . '/pg_wal',
+											  $tmp_dir . '/pg_wal_sparse');
+
+	# truncate the unused part of final WAL file
+	my $end_byte = $end_lsn;
+	$end_byte =~ s/\///;
+	$end_byte = hex($end_byte);
+	$end_byte %= $wal_segsize;
+	truncate $tmp_dir . '/pg_wal_sparse/' . $end_walfile, $end_byte;
+
+	# now re-extend it to create a hole
+	truncate $tmp_dir . '/pg_wal_sparse/' . $end_walfile, $wal_segsize;
+
+	# XXX maybe we should detect sparse files with stat (size > blocks * block
+	# size?), and skip the test if truncate failed to make one... that
+	# might happen on eg windows I think?  otherwise we'd have to tolerate
+	# the pg_waldump command succeeding OR failing with a certain message
+
+	generate_archive($tmp_dir . '/pg_wal_sparse.tar',
+					 $tmp_dir . '/pg_wal_sparse',
+					 '-cf',
+					 @sparse_flags);
+
+	# XXX change this to check for new improved error message
+	command_fails_like(
+		[
+			'pg_waldump',
+			'--path' => $tmp_dir . '/pg_wal_sparse.tar',
+			'--start' => $start_lsn,
+			'--end' => $end_lsn,
+		],
+		qr/error: could not find WAL/,
+		'fails with GNU tar PAX-format sparse files');
+
 done_testing();
-- 
2.53.0
