On Thu, Mar 26, 2026 at 6:28 AM Andres Freund wrote:
> On 2026-03-24 12:11:44 +0900, Michael Paquier wrote:
> > On Sun, Mar 22, 2026 at 11:02:20PM -0400, Tom Lane wrote:
> > > Proposed patch attached.  There might be an argument for using some
> > > other size than 256K for the other two decompressors, but my
> > > inclination is to try to make all three use roughly the same block
> > > size.  (See also 66ec01dc4.)
> >
> > The buildfarm has switched mostly to green, except on this one:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoatzin&dt=2026-03-23%2006%3A00%3A42
>
> I think there's a few more failures. Fairywren regularly fails, including in a
> run from today.

This fails 100% of the time on my machine, even after e9d72348 and ff84efe4, e.g.:

# Running: pg_waldump --path /tmp/D8WG1Sv2HE/pg_wal.tar --start 0/017A2610 --end 0/02093848
[09:43:29.288](0.148s) not ok 104 - runs with path option and start and end locations: exit code 0
[09:43:29.289](0.001s) #   Failed test 'runs with path option and start and end locations: exit code 0'
#   at /home/tmunro/projects/postgresql/src/bin/pg_waldump/t/001_basic.pl line 402.
[09:43:29.290](0.001s) not ok 105 - runs with path option and start and end locations: no stderr
[09:43:29.291](0.001s) #   Failed test 'runs with path option and start and end locations: no stderr'
#   at /home/tmunro/projects/postgresql/src/bin/pg_waldump/t/001_basic.pl line 402.
[09:43:29.291](0.000s) #          got: 'pg_waldump: error: could not find WAL "000000010000000000000002" in archive "pg_wal.tar"
# '

I can see that pg_waldump is wrong about the contents of the tar file:

$ pg_waldump --path _tmp_H_1gv81G1L_pg_wal.tar --start 0/017A2610 --end 0/020934F8 2>&1 | tail -3
rmgr: Hash        len (rec/tot):     72/    72, tx:        720, lsn: 0/01FFC1B8, prev 0/01FFC178, desc: INSERT off 40, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Transaction len (rec/tot):     46/    46, tx:        720, lsn: 0/01FFC200, prev 0/01FFC1B8, desc: COMMIT 2026-03-29 10:15:24.112967 NZDT
pg_waldump: error: could not find WAL "000000010000000000000002" in archive "_tmp_H_1gv81G1L_pg_wal.tar"

$ tar tvf _tmp_H_1gv81G1L_pg_wal.tar
drwx------  0 tmunro tmunro      0 Mar 29 10:15 archive_status/
-rw-------  0 tmunro tmunro      0 Mar 29 10:15 archive_status/000000010000000000000002.ready
-rw-------  0 tmunro tmunro      0 Mar 29 10:15 archive_status/000000010000000000000001.ready
drwx------  0 tmunro tmunro      0 Mar 29 10:08 summaries/
-rw-------  0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000002
-rw-------  0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000001
-rw-------  0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000003

It seems like the place we'd be looking for the file is in
astreamer_tar_header(), so I added in some caveman debugging:

    /*
     * Parse key fields out of the header.
     */
fprintf(stderr, "XXXX [%s] XXXX\n", &buffer[TAR_OFFSET_NAME]);
    strlcpy(member->pathname, &buffer[TAR_OFFSET_NAME], MAXPGPATH);
    if (member->pathname[0] == '\0')
        pg_fatal("tar member has empty name");
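
astreamer_tar_header() takes the member name from the 100-byte name field at the start of each 512-byte tar header. A small standalone dump could be used to check independently what the archive's headers contain. It would also show the typeflag byte at offset 156, which is 'x' for a PAX extended header. This is only a hypothetical diagnostic sketch, not PostgreSQL code. dump_tar_members() and parse_octal() are made-up names. The sketch assumes the whole archive has already been read into memory, and it ignores the ustar prefix field and GNU old-style sparse extension headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK_SIZE   512
/* Field offsets/lengths from the POSIX ustar header layout. */
#define HDR_NAME_OFF     0
#define HDR_NAME_LEN     100
#define HDR_SIZE_OFF     124
#define HDR_SIZE_LEN     12
#define HDR_TYPEFLAG_OFF 156

/* Parse an octal field, skipping leading spaces; stops at any non-octal byte. */
static size_t
parse_octal(const unsigned char *p, size_t len)
{
    size_t      i = 0;
    size_t      v = 0;

    while (i < len && p[i] == ' ')
        i++;
    for (; i < len && p[i] >= '0' && p[i] <= '7'; i++)
        v = v * 8 + (size_t) (p[i] - '0');
    return v;
}

/*
 * Hypothetical helper: print typeflag, name and size of each member of an
 * in-memory tar image, returning the number of headers visited.
 */
static int
dump_tar_members(const unsigned char *tar, size_t tarlen, FILE *out)
{
    size_t      off = 0;
    int         count = 0;

    assert(out != NULL);
    while (off + TAR_BLOCK_SIZE <= tarlen)
    {
        const unsigned char *hdr = tar + off;
        char        name[HDR_NAME_LEN + 1];
        size_t      size;
        char        type;

        if (hdr[HDR_NAME_OFF] == '\0')
            break;              /* zero block: end of archive */

        memcpy(name, hdr + HDR_NAME_OFF, HDR_NAME_LEN);
        name[HDR_NAME_LEN] = '\0';
        size = parse_octal(hdr + HDR_SIZE_OFF, HDR_SIZE_LEN);
        /* '0' or NUL = regular file, '5' = directory, 'x' = PAX extended header */
        type = hdr[HDR_TYPEFLAG_OFF] ? (char) hdr[HDR_TYPEFLAG_OFF] : '0';

        fprintf(out, "[%c] %s (%zu bytes)\n", type, name, size);
        count++;

        /* Skip the header block plus the member data, padded to whole blocks. */
        off += TAR_BLOCK_SIZE +
            (size + TAR_BLOCK_SIZE - 1) / TAR_BLOCK_SIZE * TAR_BLOCK_SIZE;
    }
    return count;
}
```

If the failing archive is read into a buffer and passed to dump_tar_members(buf, len, stdout), the output should show whether the PaxHeader/ members really carry typeflag x.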

Now I see:

XXXX [archive_status/] XXXX
XXXX [archive_status/000000010000000000000002.ready] XXXX
XXXX [archive_status/000000010000000000000001.ready] XXXX
XXXX [summaries/] XXXX
XXXX [PaxHeader/000000010000000000000002] XXXX
XXXX [GNUSparseFile.0/000000010000000000000002] XXXX
XXXX [000000010000000000000001] XXXX
rmgr: XLOG        len (rec/tot):     30/    30, tx:          0, lsn: 0/017A2610, prev 0/017A25F0, desc: NEXTOID 24576
rmgr: Standby     len (rec/tot):     42/    42, tx:        692, lsn: 0/017A2630, prev 0/017A2610, desc: LOCK xid 692 db 5 rel 16384
rmgr: Storage     len (rec/tot):     42/    42, tx:        692, lsn: 0/017A2660, prev 0/017A2630, desc: CREATE base/5/16384
... lots more normal output ...
rmgr: Hash        len (rec/tot):     72/    72, tx:        720, lsn: 0/01FFBED8, prev 0/01FFBE98, desc: INSERT off 97, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Heap        len (rec/tot):    575/   575, tx:        720, lsn: 0/01FFBF20, prev 0/01FFBED8, desc: INSERT off: 12, flags: 0x08, blkref #0: rel 1663/5/16393 blk 52
rmgr: Btree       len (rec/tot):     64/    64, tx:        720, lsn:XXXX [PaxHeader/000000010000000000000003] XXXX
XXXX [GNUSparseFile.0/000000010000000000000003] XXXX
 0/01FFC178, prev 0/01FFBF20, desc: INSERT_LEAF off: 344, blkref #0: rel 1663/5/16396 blk 2
rmgr: Hash        len (rec/tot):     72/    72, tx:        720, lsn: 0/01FFC1B8, prev 0/01FFC178, desc: INSERT off 40, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Transaction len (rec/tot):     46/    46, tx:        720, lsn: 0/01FFC200, prev 0/01FFC1B8, desc: COMMIT 2026-03-29 10:15:24.112967 NZDT
pg_waldump: error: could not find WAL "000000010000000000000002" in archive "_tmp_H_1gv81G1L_pg_wal.tar"
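
In the debug output, segment 1 appears under its plain name. For segments 2 and 3, however, the reader only ever sees PaxHeader/... and GNUSparseFile.0/... names. These look like a PAX extended header followed by a sparse-file entry in PAX format. If the lookup compares member names against a bare WAL file name, neither of those would match. The sketch below assumes that is the case. It reimplements the check done by PostgreSQL's IsXLogFileName() as a standalone function. is_wal_segment_name() and last_component() are hypothetical names, not existing PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Timeline, log and segment numbers: 3 x 8 upper-case hex digits. */
#define WAL_FNAME_LEN 24

/* Same test as PostgreSQL's IsXLogFileName(), reimplemented standalone. */
static bool
is_wal_segment_name(const char *name)
{
    assert(name != NULL);
    return strlen(name) == WAL_FNAME_LEN &&
        strspn(name, "0123456789ABCDEF") == WAL_FNAME_LEN;
}

/* Hypothetical helper: last '/'-separated component of a tar member name. */
static const char *
last_component(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}
```

Given the names from the debug output, is_wal_segment_name() accepts 000000010000000000000001 but rejects GNUSparseFile.0/000000010000000000000002 and PaxHeader/000000010000000000000002. Stripping the directory part with last_component() would accept both of the latter. That includes the PAX header member, which carries no WAL data. The GNUSparseFile.0/ member's data would presumably be in sparse form rather than a plain 16MB segment. For these reasons, prefix stripping alone is unlikely to be the right fix.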

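It looks like it had already stepped over 000000010000000000000002
earlier, since the PaxHeader/ and GNUSparseFile.0/ entries for it were
printed before 000000010000000000000001. Could it be a
table-of-contents order dependency bug, or something like that?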
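
An ordering dependency of the kind suggested could look like the following toy model. tar tvf lists segment 2 before segment 1. A reader that only moves forward through the archive, and does not remember the members it skips, would pass segment 2 while looking for segment 1 and would then be unable to find it. This is a hypothetical illustration of that question, not a description of how pg_waldump's archive reader actually works. ForwardReader and forward_find() are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model of a forward-only archive reader: members can only be visited
 * in archive order, and anything skipped is not remembered.
 */
typedef struct
{
    const char *const *members; /* member names in archive order */
    size_t      nmembers;
    size_t      pos;            /* index of the next member to read */
} ForwardReader;

/* Advance until 'want' is found; returns false if the archive runs out. */
static bool
forward_find(ForwardReader *r, const char *want)
{
    assert(r != NULL && want != NULL);
    while (r->pos < r->nmembers)
    {
        const char *name = r->members[r->pos++];

        if (strcmp(name, want) == 0)
            return true;
        /* The skipped member is simply discarded, not cached. */
    }
    return false;
}
```

With the member order from the tar listing, forward_find() locates segment 1 and then fails for segment 2. Whether this model applies depends on whether the real reader buffers or spills out-of-order members. It also does not by itself explain why the name printed for segment 2 in the debug output is GNUSparseFile.0/000000010000000000000002.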

