Hi Fujii and Michael,

Thanks for your comments.

> On Sep 3, 2025, at 10:47, Michael Paquier <[email protected]> wrote:
> 
> On Wed, Sep 03, 2025 at 09:11:15AM +0900, Fujii Masao wrote:
>> Can pg_waldump really distinguish between the end of WAL and corruption?
> 
> I don't think you can really do that reliably, as some of the messages
> marking the end of WAL could also be bumped into upon a corruption, as
> far as I recall.  We need the CRC record check to make the
> distinction, which we cannot do at this stage because we don't have
> the full record yet for the check.
> 
> Perhaps what's been posted on your thread [1] could be revisited for
> the xlogreader because we are able to read the record headers more
> reliably thanks to Thomas' work around bae868caf222, backtracking on
> my previous take posted here, posted prior to this commit:
> https://www.postgresql.org/message-id/[email protected]
> 
> 

My theory is as follows:

A WAL file has no specific “end of WAL record” marker. The reader relies purely 
on “xl_tot_len” to find where the current WAL record ends and the next one starts.

Based on the code comment in xlogreader.c:

    /*
     * Read the record length.
     *
     * NB: Even though we use an XLogRecord pointer here, the whole record
     * header might not fit on this page. xl_tot_len is the first field of the
     * struct, so it must be on this page (the records are MAXALIGNed), but we
     * cannot access any other fields until we've verified that we got the
     * whole header.
     */
    record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
    total_len = record->xl_tot_len;

Because “xl_tot_len” can always be read from the current page, the reader can 
rely on it being available. So if “xl_tot_len” is 0, that can be treated as an 
“end marker” of WAL.
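
To make the idea concrete, here is a minimal standalone sketch of a reader that 
walks a buffer using only a leading length field and stops at a zero length. It 
is not the xlogreader code: the names (demo_scan, demo_check_length, 
DemoScanResult) and the constants DEMO_HDR_SIZE and DEMO_ALIGN are made up for 
illustration, and it ignores page headers, continuation records and the CRC check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for this demo, not PostgreSQL definitions. */
#define DEMO_HDR_SIZE 24u   /* assumed minimum valid record length */
#define DEMO_ALIGN    8u    /* assumed record alignment */

typedef enum
{
    DEMO_RECORD_OK,         /* length looks plausible */
    DEMO_APPARENT_END,      /* length is 0, or no room left for a length */
    DEMO_INVALID_LENGTH     /* non-zero length that cannot be valid */
} DemoScanResult;

/* Round n up to the next multiple of DEMO_ALIGN. */
static size_t
demo_align(size_t n)
{
    return (n + DEMO_ALIGN - 1) & ~(size_t) (DEMO_ALIGN - 1);
}

/* Read the length field at buf + off and classify it. */
static DemoScanResult
demo_check_length(const unsigned char *buf, size_t buflen, size_t off,
                  uint32_t *len_out)
{
    uint32_t    tot_len;

    assert(off % DEMO_ALIGN == 0);

    if (off + sizeof(tot_len) > buflen)
        return DEMO_APPARENT_END;

    memcpy(&tot_len, buf + off, sizeof(tot_len));
    *len_out = tot_len;

    /* A zero length is all we know; it may be the end or a corruption. */
    if (tot_len == 0)
        return DEMO_APPARENT_END;
    if (tot_len < DEMO_HDR_SIZE || tot_len > buflen - off)
        return DEMO_INVALID_LENGTH;
    return DEMO_RECORD_OK;
}

/* Count records until the first length that is not plausible. */
static DemoScanResult
demo_scan(const unsigned char *buf, size_t buflen, size_t *nrecords)
{
    size_t      off = 0;

    *nrecords = 0;
    for (;;)
    {
        uint32_t        len = 0;
        DemoScanResult  r = demo_check_length(buf, buflen, off, &len);

        if (r != DEMO_RECORD_OK)
            return r;
        (*nrecords)++;
        off += demo_align(len);
    }
}
```

In this sketch, a buffer whose second length field has been zeroed returns the 
same DEMO_APPARENT_END as a buffer that really ends there, which is exactly the 
ambiguity discussed in this thread.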

If the WAL happens to be corrupted and xl_tot_len is overwritten with 0, then 
the WAL chain is broken, but the probability should be very low: WAL corruption 
is rare to begin with, and even when it happens, xl_tot_len is likely to be 
overwritten with a random value, so the chance of it being exactly 0 is even lower.

But yes, we still cannot be 100% sure whether it is the “end of WAL” or a 
corruption. So maybe we can simply take Tom’s suggestion and change the log 
message to “reached apparent end of WAL stream”, which keeps the error hint 
while making the message less scary. That would be a small enhancement.

One thing I am not sure about is whether changing the error message would break 
callers. pg_waldump just prints the error message. For xlogrecovery.c, I did a 
quick test, and it looks like it just swallows the error message:

```
2025-09-03 10:46:48.492 CST [52426] LOG:  starting archive recovery
2025-09-03 10:46:48.495 CST [52426] LOG:  consistent recovery state reached at 0/017AAA90
2025-09-03 10:46:48.495 CST [52420] LOG:  database system is ready to accept read-only connections
2025-09-03 10:46:48.495 CST [52426] LOG:  redo starts at 0/017AAA90
2025-09-03 10:46:48.496 CST [52426] LOG:  redo done at 0/017AF398 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-09-03 10:46:48.496 CST [52426] LOG:  last completed transaction was at log time 2025-09-03 10:42:02.901807+08
```

So, I guess xlogreader may return a different log message when xl_tot_len is 0. 
Please correct me if my understanding is wrong.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/



