On Thu, Mar 26, 2026 at 6:28 AM Andres Freund <[email protected]> wrote:
> On 2026-03-24 12:11:44 +0900, Michael Paquier wrote:
> > On Sun, Mar 22, 2026 at 11:02:20PM -0400, Tom Lane wrote:
> > > Proposed patch attached. There might be an argument for using some
> > > other size than 256K for the other two decompressors, but my
> > > inclination is to try to make all three use roughly the same block
> > > size. (See also 66ec01dc4.)
> >
> > The buildfarm has switched mostly to green, except on this one:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoatzin&dt=2026-03-23%2006%3A00%3A42
>
> I think there are a few more failures. Fairywren regularly fails, including in a
> run from today.

This fails 100% of the time on my machine, even after e9d72348 and ff84efe4, e.g.:
# Running: pg_waldump --path /tmp/D8WG1Sv2HE/pg_wal.tar --start 0/017A2610 --end 0/02093848
[09:43:29.288](0.148s) not ok 104 - runs with path option and start and end locations: exit code 0
[09:43:29.289](0.001s) # Failed test 'runs with path option and start and end locations: exit code 0'
# at /home/tmunro/projects/postgresql/src/bin/pg_waldump/t/001_basic.pl line 402.
[09:43:29.290](0.001s) not ok 105 - runs with path option and start and end locations: no stderr
[09:43:29.291](0.001s) # Failed test 'runs with path option and start and end locations: no stderr'
# at /home/tmunro/projects/postgresql/src/bin/pg_waldump/t/001_basic.pl line 402.
[09:43:29.291](0.000s) # got: 'pg_waldump: error: could not find WAL "000000010000000000000002" in archive "pg_wal.tar"
# '
I can see that it is wrong about the contents of the tar file:
$ pg_waldump --path _tmp_H_1gv81G1L_pg_wal.tar --start 0/017A2610 --end 0/020934F8 2>&1 | tail -3
rmgr: Hash len (rec/tot): 72/ 72, tx: 720, lsn: 0/01FFC1B8, prev 0/01FFC178, desc: INSERT off 40, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Transaction len (rec/tot): 46/ 46, tx: 720, lsn: 0/01FFC200, prev 0/01FFC1B8, desc: COMMIT 2026-03-29 10:15:24.112967 NZDT
pg_waldump: error: could not find WAL "000000010000000000000002" in archive "_tmp_H_1gv81G1L_pg_wal.tar"
$ tar tvf _tmp_H_1gv81G1L_pg_wal.tar
drwx------ 0 tmunro tmunro 0 Mar 29 10:15 archive_status/
-rw------- 0 tmunro tmunro 0 Mar 29 10:15 archive_status/000000010000000000000002.ready
-rw------- 0 tmunro tmunro 0 Mar 29 10:15 archive_status/000000010000000000000001.ready
drwx------ 0 tmunro tmunro 0 Mar 29 10:08 summaries/
-rw------- 0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000002
-rw------- 0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000001
-rw------- 0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000003
It seems like the place we'd be looking for the file is in
astreamer_tar_header(), so I added in some caveman debugging:
    /*
     * Parse key fields out of the header.
     */
    fprintf(stderr, "XXXX [%s] XXXX\n", &buffer[TAR_OFFSET_NAME]);
    strlcpy(member->pathname, &buffer[TAR_OFFSET_NAME], MAXPGPATH);
    if (member->pathname[0] == '\0')
        pg_fatal("tar member has empty name");
Now I see:
XXXX [archive_status/] XXXX
XXXX [archive_status/000000010000000000000002.ready] XXXX
XXXX [archive_status/000000010000000000000001.ready] XXXX
XXXX [summaries/] XXXX
XXXX [PaxHeader/000000010000000000000002] XXXX
XXXX [GNUSparseFile.0/000000010000000000000002] XXXX
XXXX [000000010000000000000001] XXXX
rmgr: XLOG len (rec/tot): 30/ 30, tx: 0, lsn: 0/017A2610, prev 0/017A25F0, desc: NEXTOID 24576
rmgr: Standby len (rec/tot): 42/ 42, tx: 692, lsn: 0/017A2630, prev 0/017A2610, desc: LOCK xid 692 db 5 rel 16384
rmgr: Storage len (rec/tot): 42/ 42, tx: 692, lsn: 0/017A2660, prev 0/017A2630, desc: CREATE base/5/16384
... lots more normal output ...
rmgr: Hash len (rec/tot): 72/ 72, tx: 720, lsn: 0/01FFBED8, prev 0/01FFBE98, desc: INSERT off 97, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Heap len (rec/tot): 575/ 575, tx: 720, lsn: 0/01FFBF20, prev 0/01FFBED8, desc: INSERT off: 12, flags: 0x08, blkref #0: rel 1663/5/16393 blk 52
rmgr: Btree len (rec/tot): 64/ 64, tx: 720, lsn:XXXX [PaxHeader/000000010000000000000003] XXXX
XXXX [GNUSparseFile.0/000000010000000000000003] XXXX
0/01FFC178, prev 0/01FFBF20, desc: INSERT_LEAF off: 344, blkref #0: rel 1663/5/16396 blk 2
rmgr: Hash len (rec/tot): 72/ 72, tx: 720, lsn: 0/01FFC1B8, prev 0/01FFC178, desc: INSERT off 40, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Transaction len (rec/tot): 46/ 46, tx: 720, lsn: 0/01FFC200, prev 0/01FFC1B8, desc: COMMIT 2026-03-29 10:15:24.112967 NZDT
pg_waldump: error: could not find WAL "000000010000000000000002" in archive "_tmp_H_1gv81G1L_pg_wal.tar"
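So a header named exactly 000000010000000000000002 never shows up: the
only headers that mention that segment are PaxHeader/... and
GNUSparseFile.0/..., and both go by before 000000010000000000000001 is
read. Seems like it already stepped over 000000010000000000000002
earlier? Could it be a table-of-contents order dependency bug or
something like that?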
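
A slightly less caveman version of that debugging might also print each
header's type flag and size, to tell whether the PaxHeader/ and
GNUSparseFile.0/ entries are ordinary members or extended headers. The
sketch below is standalone and assumes only the standard POSIX ustar
layout (name at offset 0, size at 124, typeflag at 156). The macro and
function names are made up for illustration and are not astreamer code.
My guess, not confirmed by anything above, is that the GNUSparseFile.0/
prefix means the segment was stored as a sparse file with its real name
in the preceding pax extended header.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Field offsets in a 512-byte POSIX ustar header block. */
#define TAR_BLOCK_SIZE		512
#define HDR_OFF_NAME		0	/* 100 bytes, NUL-padded */
#define HDR_OFF_SIZE		124	/* 12 bytes, octal text */
#define HDR_OFF_TYPEFLAG	156	/* 1 byte */

/* Hypothetical helper: describe a ustar/pax typeflag byte. */
const char *
typeflag_desc(char typeflag)
{
	switch (typeflag)
	{
		case '\0':
		case '0':
			return "regular file";
		case '5':
			return "directory";
		case 'x':
			return "pax extended header";
		case 'g':
			return "pax global header";
		default:
			return "other";
	}
}

/*
 * Hypothetical helper: parse the octal size field.  This ignores the
 * base-256 encoding that some tar writers use for very large members.
 */
unsigned long long
header_size(const char *hdr)
{
	char		buf[13];

	memcpy(buf, hdr + HDR_OFF_SIZE, 12);
	buf[12] = '\0';
	return strtoull(buf, NULL, 8);
}

/* Print one debugging line per header: name, type and size. */
void
debug_header(const char *hdr, FILE *out)
{
	char		name[101];

	assert(hdr != NULL);
	memcpy(name, hdr + HDR_OFF_NAME, 100);
	name[100] = '\0';
	fprintf(out, "XXXX [%s] type=%s size=%llu XXXX\n",
			name, typeflag_desc(hdr[HDR_OFF_TYPEFLAG]), header_size(hdr));
}
```

Calling debug_header(buffer, stderr) at the point of the fprintf above
would show whether the PaxHeader/ entries carry typeflag 'x'. If they
do, the real member name would have to come from the extended header
data rather than from the 100-byte name field.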