On Sat, Mar 21, 2026 at 9:05 PM Amul Sul <[email protected]> wrote:
>
> On Sat, Mar 21, 2026 at 5:51 PM Andrew Dunstan <[email protected]> wrote:
> >
> > On 2026-03-21 Sa 2:34 AM, Tom Lane wrote:
> > > Michael Paquier <[email protected]> writes:
> > > > On Fri, Mar 20, 2026 at 11:49:02PM -0400, Tom Lane wrote:
> > > > > Buildfarm members batta and hachi don't like this very much.
> > > >
> > > > I did not look at what's happening on the host, but it seems like a
> > > > safe bet to assume that we are not seeing many failures in the
> > > > buildfarm because we don't have many animals that have the idea to add
> > > > --with-zstd to their build configuration, like these two ones.
> > >
> > > That may be part of the story, but only part. I spent a good deal of
> > > time trying to reproduce batta & hachi's configurations locally, on
> > > several different platforms, but still couldn't duplicate what they
> > > are showing.
> >
> > Yeah, I haven't been able to reproduce it either. But while investigating I
> > found a couple of issues. We neglected to add one of the tests to
> > meson.build, and we neglected to close some files, causing errors on
> > windows.
>
> While the proposed fix of closing the file pointer before returning is
> correct, we also need to ensure the file is reopened in the next call
> to spill any remaining buffered data. I've made a small update to
> Andrew's 0001 patch to handle this. Also, changes to meson.build don't
> seem to be needed as we haven't committed that file yet (unless I am
> missing something).
>
> I've also reattached the other patches so they don't get lost: v2-0002
> is Andrew's patch for the archive streamer, and v2-0003 is the patch I
> posted previously [1].
>
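To make the close-and-reopen pattern discussed above concrete, here is a small standalone sketch (file name and helper names are invented for illustration; this is not the pg_waldump code). Each spill reopens the file in append mode and closes it again before returning, so no handle is left open across calls, which matters on Windows where an open handle can block later renames and removals:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Append one chunk of buffered data to a spill file.  The handle is
 * reopened for every call ("ab" preserves earlier spills) and closed
 * before returning so the data reaches disk and no descriptor stays
 * open between calls.  Returns 0 on success, -1 on error.
 */
static int
spill_chunk(const char *path, const char *buf, size_t len)
{
	FILE	   *fp = fopen(path, "ab");	/* reopen for each spill */

	if (fp == NULL)
		return -1;
	if (fwrite(buf, 1, len, fp) != len)
	{
		fclose(fp);
		return -1;
	}
	return fclose(fp);			/* always close so data is flushed */
}

/* Helper for checking the result: total bytes currently on disk. */
static long
file_size(const char *path)
{
	FILE	   *fp = fopen(path, "rb");
	long		sz;

	if (fp == NULL)
		return -1;
	fseek(fp, 0, SEEK_END);
	sz = ftell(fp);
	fclose(fp);
	return sz;
}
```

Two successive calls then accumulate their chunks in the same file, which is the "reopen in the next call to spill any remaining buffered data" behavior.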
On further thought, I don't think v2-0001 is the right patch. Consider
the case where we write a temporary file partially: if the next segment
required for decoding is that same segment, TarWALDumpReadPage() will
find the physical file present and continue decoding, potentially
triggering an error later because the file is shorter than expected.

I have attached the v3-0001 patch, which ensures that once we start
writing a temporary file, it is finished before performing the lookup.
This way we never leave a partial file on disk.

Updated patches are attached; 0002 and 0003 remain the same as before.

Regards,
Amul
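To illustrate why a partial temporary file is dangerous, here is a standalone sketch (names and sizes are invented; the real check in TarWALDumpReadPage() works differently): a reader that only tests for the file's existence would happily open the partial segment, whereas comparing the on-disk size against the expected segment length catches it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical segment size for the demo; real WAL segments default
 * to 16MB. */
#define DEMO_SEG_SIZE 8

/*
 * A mere fopen() succeeds on a partial file, so completeness must be
 * judged by size, not by existence.
 */
static int
segment_is_complete(const char *path, long expected_size)
{
	FILE	   *fp = fopen(path, "rb");
	long		sz;

	if (fp == NULL)
		return 0;				/* not present at all */
	fseek(fp, 0, SEEK_END);
	sz = ftell(fp);
	fclose(fp);
	return sz == expected_size;
}

/* Simulate an interrupted extraction: only half the segment lands. */
static void
write_partial_segment(const char *path)
{
	FILE	   *fp = fopen(path, "wb");

	fwrite("1234", 1, 4, fp);	/* 4 of DEMO_SEG_SIZE bytes */
	fclose(fp);
}
```

The v3-0001 approach sidesteps the problem entirely by never leaving such a file on disk in the first place.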
From b3bfdac9e425f4cb9fd7d7b6c698dd1607b737ee Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 22:27:22 +0530
Subject: [PATCH v3 1/3] archive_waldump: skip hash lookup and tighten
 write_fp invariant

In get_archive_wal_entry(), when the streamer is still mid-segment
(entry == cur_file), jump directly to read_more instead of looping back
to the top and performing a hash table lookup that is guaranteed to
fail.
---
 src/bin/pg_waldump/archive_waldump.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index b078c2d6960..ee292b6dc8d 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -484,6 +484,7 @@ get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
 	 */
 	entry = privateInfo->cur_file;
 
+read_more:
 	/*
 	 * Fetch more data either when no current file is being tracked or
 	 * when its buffer has been fully flushed to the temporary file.
@@ -525,11 +526,20 @@
 		 *
 		 * file handle so data is flushed to disk before the next segment
 		 * starts writing to a different handle.
 		 */
-		if (entry != privateInfo->cur_file && write_fp != NULL)
+		if (entry != privateInfo->cur_file)
 		{
+			Assert(write_fp);
 			fclose(write_fp);
 			write_fp = NULL;
 		}
+		else
+			/*
+			 * The file being written hasn't been completed. We must finish
+			 * extracting it before performing the hash lookup; otherwise, the
+			 * lookup might return without flushing the current segment buffer,
+			 * leaving the file open and incomplete on disk.
+			 */
+			goto read_more;
 	}
 
 	/* Requested WAL segment not found */
-- 
2.47.1
From 3a52d70947f7c7bc3a0decbd473e95891ad3b6eb Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 19:48:48 +0530
Subject: [PATCH v3 2/3] Fix-astreamer-decompressor-finalize-to-send-correct

---
 src/fe_utils/astreamer_gzip.c | 9 +++++----
 src/fe_utils/astreamer_lz4.c  | 9 +++++----
 src/fe_utils/astreamer_zstd.c | 2 +-
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/src/fe_utils/astreamer_gzip.c b/src/fe_utils/astreamer_gzip.c
index 2e080c37a58..df392f67cab 100644
--- a/src/fe_utils/astreamer_gzip.c
+++ b/src/fe_utils/astreamer_gzip.c
@@ -347,10 +347,11 @@ astreamer_gzip_decompressor_finalize(astreamer *streamer)
 	 * End of the stream, if there is some pending data in output buffers then
 	 * we must forward it to next streamer.
 	 */
-	astreamer_content(mystreamer->base.bbs_next, NULL,
-					  mystreamer->base.bbs_buffer.data,
-					  mystreamer->base.bbs_buffer.maxlen,
-					  ASTREAMER_UNKNOWN);
+	if (mystreamer->bytes_written > 0)
+		astreamer_content(mystreamer->base.bbs_next, NULL,
+						  mystreamer->base.bbs_buffer.data,
+						  mystreamer->bytes_written,
+						  ASTREAMER_UNKNOWN);
 
 	astreamer_finalize(mystreamer->base.bbs_next);
 }
diff --git a/src/fe_utils/astreamer_lz4.c b/src/fe_utils/astreamer_lz4.c
index 2bc32b42879..605c188007b 100644
--- a/src/fe_utils/astreamer_lz4.c
+++ b/src/fe_utils/astreamer_lz4.c
@@ -397,10 +397,11 @@ astreamer_lz4_decompressor_finalize(astreamer *streamer)
 	 * End of the stream, if there is some pending data in output buffers then
 	 * we must forward it to next streamer.
 	 */
-	astreamer_content(mystreamer->base.bbs_next, NULL,
-					  mystreamer->base.bbs_buffer.data,
-					  mystreamer->base.bbs_buffer.maxlen,
-					  ASTREAMER_UNKNOWN);
+	if (mystreamer->bytes_written > 0)
+		astreamer_content(mystreamer->base.bbs_next, NULL,
+						  mystreamer->base.bbs_buffer.data,
+						  mystreamer->bytes_written,
+						  ASTREAMER_UNKNOWN);
 
 	astreamer_finalize(mystreamer->base.bbs_next);
 }
diff --git a/src/fe_utils/astreamer_zstd.c b/src/fe_utils/astreamer_zstd.c
index f26abcfd0fa..4b43ab795e3 100644
--- a/src/fe_utils/astreamer_zstd.c
+++ b/src/fe_utils/astreamer_zstd.c
@@ -347,7 +347,7 @@ astreamer_zstd_decompressor_finalize(astreamer *streamer)
 	if (mystreamer->zstd_outBuf.pos > 0)
 		astreamer_content(mystreamer->base.bbs_next, NULL,
 						  mystreamer->base.bbs_buffer.data,
-						  mystreamer->base.bbs_buffer.maxlen,
+						  mystreamer->zstd_outBuf.pos,
 						  ASTREAMER_UNKNOWN);
 
 	astreamer_finalize(mystreamer->base.bbs_next);
-- 
2.47.1
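For reference, the bug these hunks fix is generic to any buffered decompressor: the output buffer is maxlen bytes, but only the bytes actually written are valid, so forwarding maxlen pushes maxlen minus bytes_written bytes of stale garbage downstream. A minimal standalone sketch of the corrected behavior (names invented; this is not the astreamer API):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical output-buffer capacity for the sketch. */
#define DEMO_MAXLEN 16

/*
 * Forward only the valid prefix of the output buffer, i.e.
 * bytes_written bytes, never the full DEMO_MAXLEN capacity.
 * Returns the number of bytes actually forwarded.
 */
static size_t
flush_pending(const char *buf, size_t bytes_written,
			  char *dest, size_t destlen)
{
	size_t		n = bytes_written;	/* NOT DEMO_MAXLEN */

	if (n > destlen)
		n = destlen;
	memcpy(dest, buf, n);
	return n;
}
```

With the pre-patch behavior (passing the capacity instead of bytes_written), the consumer would see the valid prefix followed by whatever the tail of the buffer happened to contain, which is exactly the kind of corruption that only shows up with certain compression methods and inputs.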
From 242b8904682cec326d059e3d11355ca2315c869c Mon Sep 17 00:00:00 2001
From: Amul Sul <[email protected]>
Date: Sat, 21 Mar 2026 20:57:23 +0530
Subject: [PATCH v3 3/3] pg_waldump: Handle archive exhaustion in
 init_archive_reader().

When read_archive_file() returns 0, the archive may have already
buffered a complete WAL file into the hash table before exhausting the
input. Instead of immediately reporting an error, search the hash table
for an entry containing at least sizeof(XLogLongPageHeader) bytes.
Report a specific error if a WAL entry exists but is too short
(truncated/corrupt), or a generic error if no WAL was found at all.

Also tighten the loop condition to check for sizeof(XLogLongPageHeader)
rather than XLOG_BLCKSZ, since only the long page header is needed at
this stage.
---
 src/bin/pg_waldump/archive_waldump.c | 55 ++++++++++++++++++++++++++--
 1 file changed, 51 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c
index ee292b6dc8d..943c843e05b 100644
--- a/src/bin/pg_waldump/archive_waldump.c
+++ b/src/bin/pg_waldump/archive_waldump.c
@@ -176,13 +176,60 @@ init_archive_reader(XLogDumpPrivate *privateInfo,
 	 * the first WAL segment in the archive so we can extract the WAL segment
 	 * size from the long page header.
 	 */
-	while (entry == NULL || entry->buf->len < XLOG_BLCKSZ)
+	while (entry == NULL || entry->read_len < sizeof(XLogLongPageHeader))
 	{
 		if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0)
-			pg_fatal("could not find WAL in archive \"%s\"",
-					 privateInfo->archive_name);
+		{
+			ArchivedWAL_iterator iter;
+			ArchivedWALFile *e = NULL;
+			ArchivedWALFile *short_entry = NULL;
 
-		entry = privateInfo->cur_file;
+			entry = NULL;
+
+			/*
+			 * read_archive_file() returned 0, meaning the archive is
+			 * exhausted. However, a sufficiently compressed archive may have
+			 * already read a complete WAL file and inserted it into the hash
+			 * table before returning. Search the hash table for any entry
+			 * that already has enough buffered data to contain the long page
+			 * header; if none is found, the archive contains no usable WAL.
+			 */
+			ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+			while ((e = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+											&iter)) != NULL)
+			{
+				if (e->read_len >= sizeof(XLogLongPageHeader))
+				{
+					entry = e;
+					break;
+				}
+
+				/* Remember a short entry in case we need to report it */
+				short_entry = e;
+			}
+
+			if (entry == NULL)
+			{
+				/*
+				 * A WAL file was found in the hash table but it does not
+				 * contain enough data to read the long page header,
+				 * indicating a truncated or corrupt WAL segment.
+				 */
+				if (short_entry != NULL)
+					pg_fatal("could not read file \"%s\" from \"%s\" archive: read %d of %d",
+							 short_entry->fname, privateInfo->archive_name,
+							 short_entry->read_len,
+							 (int) sizeof(XLogLongPageHeader));
+
+				/*
+				 * The hash table contains no WAL entries at all, meaning the
+				 * archive holds no WAL data.
+				 */
+				pg_fatal("could not find WAL in archive \"%s\"",
+						 privateInfo->archive_name);
+			}
+		}
+		else
+			entry = privateInfo->cur_file;
 	}
 
 	/* Extract the WAL segment size from the long page header */
-- 
2.47.1
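The fallback scan in 0003 is easy to model outside simplehash. Here is a standalone sketch (a plain array stands in for the real hash table, and all names are invented) showing the pick-first-usable / remember-short behavior that drives the choice between the specific and the generic error:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an archived-WAL hash entry. */
struct demo_entry
{
	const char *fname;
	int			read_len;		/* bytes buffered for this file */
};

/*
 * Return the first entry with at least hdr_len buffered bytes.  If
 * none qualifies, *short_entry is left pointing at the last too-short
 * entry seen (or NULL if there were no entries at all), so the caller
 * can issue a specific "truncated" error instead of a generic
 * "no WAL found" one.
 */
static struct demo_entry *
find_usable_entry(struct demo_entry *entries, int n, int hdr_len,
				  struct demo_entry **short_entry)
{
	int			i;

	*short_entry = NULL;
	for (i = 0; i < n; i++)
	{
		if (entries[i].read_len >= hdr_len)
			return &entries[i];
		*short_entry = &entries[i];	/* candidate for the error message */
	}
	return NULL;
}
```

When the scan fails, `*short_entry != NULL` distinguishes "a WAL file exists but is truncated" from "the archive holds no WAL at all", mirroring the two pg_fatal() branches in the patch.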
