On Fri, Sep 25, 2020 at 08:53:27AM +0200, Michael Banck wrote: > Oh right, I've fixed up the commit message in the attached V4.
Not much a fan of what's proposed here, for a couple of reasons: - If the page is not new, we should check if the header is sane or not. - It may be better to actually count checksum failures if these are repeatable in a given block, and report them. - The code would be more simple with a "continue" added for a block detected as new or with a LSN newer than the backup start. - The new error messages are confusing, and I think that these are hard to translate in a meaningful way. So I think that we should try to use PageIsVerified() directly in the code path of basebackup.c, and this requires a couple of changes to make the routine more extensible: - Addition of a dboid argument for pgstat_report_checksum_failure(), whose call needs to be changed to use the *_in_db() flavor. - Addition of an option to decide if a log should be generated or not. - Addition of an option to control if a checksum failure should be reported to pgstat or not, even if this is actually linked to the previous point, I have seen cases where being able to control both separately would be helpful, particularly the log part. For the last two ones, I would use an extensible argument based on bits32 with a set of flags that the caller can set at will. Something else I would recommend here is to use an error context for the case where we want to log a retry number in the base backup loop that accepts false positives, with the blkno and the physical file name when we find failures after retries. -- Michael
signature.asc
Description: PGP signature