On Fri, Mar 28, 2025 at 11:35:23PM -0400, Andres Freund wrote:
> The number of combinations is annoyingly large. It's e.g. plausible to use
> ignore_checksum_failure=on and zero_damaged_pages=on at the same time for
> recovery.
That's intricate indeed.
> But I finally got to a point where the code ends up readable, without undue
> duplication. It would, leaving some nasty hack aside, require a
> errhint_internal() - but I can't imagine a reason against introducing that,
> given we have it for the errmsg and errhint.
Introducing that is fine.
> Here's the relevant code:
>
> /*
> * Treat a read that had both zeroed buffers *and* ignored checksums as
> a
> * special case, it's too irregular to be emitted the same way as the
> other
> * cases.
> */
> if (zeroed_any && ignored_any)
> {
> Assert(zeroed_any && ignored_any);
> Assert(nblocks > 1); /* same block can't be both zeroed and
> ignored */
> Assert(result.status != PGAIO_RS_ERROR);
> affected_count = zeroed_or_error_count;
>
> ereport(elevel,
> errcode(ERRCODE_DATA_CORRUPTED),
> errmsg("zeroing %u pages and ignoring %u
> checksum failures among blocks %u..%u of relation %s",
> affected_count, checkfail_count,
> first, last, rpath.str),
Translation stumbles on this one, because each of the first two %u are
plural-sensitive. I'd do one of:
- Call ereport() twice, once for zeroed pages and once for ignored checksums.
Since elevel <= ERROR here, that doesn't lose the second call.
- s/pages/page(s)/ like msgid "There are %d other session(s) and %d prepared
transaction(s) using the database."
- Something more like the style of VACUUM VERBOSE, e.g. "INTRO_TEXT: %u
zeroed, %u checksums ignored". I've not written INTRO_TEXT, and this
doesn't really resolve pluralization. Probably don't use this option.
> affected_count > 1 ?
> errdetail("Block %u held first zeroed page.",
> first + first_off) : 0,
> errhint("See server log for details about the
> other %u invalid blocks.",
> affected_count +
> checkfail_count - 1));
> return;
> }
>
> /*
> * The other messages are highly repetitive. To avoid duplicating a long
> * and complicated ereport(), gather the translated format strings
> * separately and then do one common ereport.
> */
> if (result.status == PGAIO_RS_ERROR)
> {
> Assert(!zeroed_any); /* can't have invalid pages when
> zeroing them */
> affected_count = zeroed_or_error_count;
> msg_one = _("invalid page in block %u of relation %s");
> msg_mult = _("%u invalid pages among blocks %u..%u of relation
> %s");
> det_mult = _("Block %u held first invalid page.");
> hint_mult = _("See server log for the other %u invalid
> blocks.");
For each hint_mult, we would usually use ngettext() instead of _(). (Would be
errhint_plural() if not separated from its ereport().) Alternatively,
s/blocks/block(s)/ is fine.
> }
> else if (zeroed_any && !ignored_any)
> {
> affected_count = zeroed_or_error_count;
> msg_one = _("invalid page in block %u of relation %s; zeroing
> out page");
> msg_mult = _("zeroing out %u invalid pages among blocks %u..%u
> of relation %s");
> det_mult = _("Block %u held first zeroed page.");
> hint_mult = _("See server log for the other %u zeroed blocks.");
> }
> else if (!zeroed_any && ignored_any)
> {
> affected_count = checkfail_count;
> msg_one = _("ignoring checksum failure in block %u of relation
> %s");
> msg_mult = _("ignoring %u checksum failures among blocks %u..%u
> of relation %s");
> det_mult = _("Block %u held first ignored page.");
> hint_mult = _("See server log for the other %u ignored
> blocks.");
> }
> else
> pg_unreachable();
>
> ereport(elevel,
> errcode(ERRCODE_DATA_CORRUPTED),
> affected_count == 1 ?
> errmsg_internal(msg_one, first + first_off, rpath.str) :
> errmsg_internal(msg_mult, affected_count, first, last,
> rpath.str),
> affected_count > 1 ? errdetail_internal(det_mult, first
> + first_off) : 0,
> affected_count > 1 ? errhint_internal(hint_mult,
> affected_count - 1) : 0);
>
> Does that approach make sense?
Yes.
> What do you think about using
> "zeroing invalid page in block %u of relation %s"
> instead of
> "invalid page in block %u of relation %s; zeroing out page"
I like the replacement. It moves the important part to the front, and it's
shorter.
> I thought about instead translating "ignoring", "ignored", "zeroing",
> "zeroed", etc separately, but I have doubts about how well that would actually
> translate.
Agreed, I wouldn't have high hopes for that. An approach like that would
probably need messages that separate the independently-translated part
grammatically, e.g.:
/* last %s is translation of "ignore" or "zero-fill" */
"invalid page in block %u of relation %s; resolved by method \"%s\""
(Again, I'm not recommending that.)