Re: [patch] Fix checksum verification in base backups for zero page headers

Michael Paquier Wed, 07 Oct 2020 01:19:38 -0700

On Fri, Sep 25, 2020 at 08:53:27AM +0200, Michael Banck wrote:
> Oh right, I've fixed up the commit message in the attached V4.


Not much a fan of what's proposed here, for a couple of reasons:
- If the page is not new, we should check if the header is sane or
not.
- It may be better to actually count checksum failures if these are
repeatable in a given block, and report them.
- The code would be more simple with a "continue" added for a block
detected as new or with a LSN newer than the backup start.
- The new error messages are confusing, and I think that these are
hard to translate in a meaningful way.

So I think that we should try to use PageIsVerified() directly in the
code path of basebackup.c, and this requires a couple of changes to
make the routine more extensible:
- Addition of a dboid argument for pgstat_report_checksum_failure(),
whose call needs to be changed to use the *_in_db() flavor.
- Addition of an option to decide if a log should be generated or
not.
- Addition of an option to control if a checksum failure should be
reported to pgstat or not, even if this is actually linked to the
previous point, I have seen cases where being able to control both
separately would be helpful, particularly the log part.

For the last two ones, I would use an extensible argument based on
bits32 with a set of flags that the caller can set at will.  Something
else I would recommend here is to use an error context for the case
where we want to log a retry number in the base backup loop that
accepts false positives, with the blkno and the physical file name
when we find failures after retries.
--
Michael

signature.asc
Description: PGP signature

Re: [patch] Fix checksum verification in base backups for zero page headers

Reply via email to