Hi,

For many years I have been running a backup check system for my databases
that continuously
- upgrades to the latest available PG minor version (Debian PGDG)
- restores a DB from a basebackup on S3
- replays all available WAL
- performs a ton of consistency checks
- repeats the same with the next DB
- starts over from the beginning once all DBs are done

All the DBs are PG14. After 14.21 was released last week, I saw some of our
bigger DBs failing after replaying a few thousand WAL files.

The error message reads like so:

2026-02-14 01:53:59.595 UTC [2441074] LOG: restored log file "0000000500017F8D0000004E" from archive
2026-02-14 01:53:59.605 UTC [2441074] FATAL: could not access status of transaction 2030956544
2026-02-14 01:53:59.605 UTC [2441074] DETAIL: Could not read from file "pg_multixact/offsets/790D" at offset 245760: read too few bytes.
2026-02-14 01:53:59.605 UTC [2441074] CONTEXT: WAL redo at 17F8D/4E1E03E8 for MultiXact/CREATE_ID: 2030956543 offset 1335629905 nmembers 2: 691151655 (keysh) 691151658 (keysh)

It does not happen every time. A freshly taken backup succeeded in
restoring ~3000 WAL files. In the next round it failed at ~5000 WAL files.
If it fails, it is reproducible. It will fail at the same multixact offset
again.

The multixact offset file where it fails does not exist in the base backup.
It is built during replay. In all cases I saw, the offset mentioned in the
error message is the length of the file. So, PG apparently wants to read
beyond the end of the file.
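
The numbers in the error message can be cross-checked with a bit of
arithmetic. The following is only a sketch, assuming the default 8 kB block
size, 4-byte multixact offset entries, and 32 pages per SLRU segment file;
these constants and the helper name `locate` are assumptions for
illustration, not something taken from the logs.

```python
# Assumed layout of pg_multixact/offsets for a default PG14 build.
BLCKSZ = 8192            # bytes per page (assumption: default build)
OFFSET_SIZE = 4          # bytes per multixact offset entry (assumption)
PAGES_PER_SEGMENT = 32   # SLRU pages per segment file (assumption)


def locate(mxid):
    """Map a multixact ID to (segment file name, byte offset of its page, entry index in page)."""
    entries_per_page = BLCKSZ // OFFSET_SIZE
    page = mxid // entries_per_page
    segment = page // PAGES_PER_SEGMENT
    page_in_segment = page % PAGES_PER_SEGMENT
    entry = mxid % entries_per_page
    return f"{segment:04X}", page_in_segment * BLCKSZ, entry


# Multixact ID from the FATAL line.
print(locate(2030956544))
# Multixact ID from the CREATE_ID record in the CONTEXT line.
print(locate(2030956543))
```

Under those assumptions, 2030956544 is the first entry of the page starting
at byte 245760 in segment 790D, while 2030956543 (the multixact from the
CREATE_ID record) is the last entry of the preceding page. So the failing
read appears to target the page right after the one holding the multixact
being replayed, a page that has not been written to the file yet.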

After rolling back to PG 14.20, everything started working again.

The release notes mention a few multixact changes from 14.20 to 14.21. I
can't claim to understand the change fully. But
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=81416e101
looks like the most likely culprit to me.

All the best,
Torsten
