On Tue, Apr 26, 2022 at 6:11 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > I believe that the WAL prefetch patch probably accounts for the > intermittent errors that buildfarm member topminnow has shown > since it went in, eg [1]: > > diff -U3 > /home/nm/ext4/HEAD/pgsql/contrib/pg_walinspect/expected/pg_walinspect.out > /home/nm/ext4/HEAD/pgsql.build/contrib/pg_walinspect/results/pg_walinspect.out
Hmm, maybe but I suspect not. I think I might see what's happening here. > +ERROR: could not read WAL at 0/1903E40 > I've reproduced this manually on that machine, and confirmed that the > proximate cause is that XLogNextRecord() is returning NULL because > state->decode_queue_head == NULL, without bothering to provide an errormsg > (which doesn't seem very well thought out in itself). I obtained the Thanks for doing that. After several hours of trying I also managed to reproduce it on that gcc23 system (not at all sure why it doesn't show up elsewhere; MIPS 32 bit layout may be a factor), and added some trace to get some more clues. Still looking into it, but here is the current hypothesis I'm testing: 1. The reason there's a messageless ERROR in this case is because there is new read_page callback logic introduced for pg_walinspect, called via read_local_xlog_page_no_wait(), which is like the old read_local_xlog_page() except that it returns -1 if you try to read past the current "flushed" LSN, and we have no queued message. An error is then reported by XLogReadRecord(), and appears to the user. 2. The reason pg_walinspect tries to read WAL data past the flushed LSN is because its GetWALRecordsInfo() function keeps calling XLogReadRecord() until EndRecPtr >= end_lsn, where end_lsn is taken from a snapshot of the flushed LSN, but I don't see where it takes into account that the flushed LSN might momentarily fall in the middle of a record. In that case, xlogreader.c will try to read the next page, which fails because it's past the flushed LSN (see point 1). I will poke some more tomorrow to try to confirm this and try to come up with a fix.