On Fri, Jul 21, 2023 at 8:52 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> Idea for future research: Perhaps pg_backup_stop()'s label-file
> output should include the control file image (suitably encoded)? Then
> the recovery-from-label code could completely ignore the existing
> control file, and overwrite it using that copy. It's already
> partially ignoring it, by using the label file's checkpoint LSN
> instead of the control file's. Perhaps the captured copy could
> include the correct LSN already, simplifying that code, and the low
> level backup procedure would not need any additional steps or caveats.
> No more atomicity problem for low-level-backups... but probably not
> something we would back-patch, for such a rare failure mode.

I don't really know what the solution is, but this is a general
problem with the low-level backup API, and I think it sucks pretty
hard. Here, we're talking about the control file, but the same problem
exists with the data files. We try to work around that but it's all
hacks. Unless your backup tool has special magic powers of some kind,
you can't take a backup using either pg_basebackup or the low-level
API and then check that individual blocks have valid checksums, or
that they have sensible, interpretable contents, because they might
not. (Yeah, I know we have code to verify checksums during a base
backup, but as discussed elsewhere, it doesn't work.) It's also why we
have to force full-page write on during a backup. But the whole thing
is nasty because you can't really verify anything about the backup you
just took. It may be full of gibberish blocks but don't worry because,
if all goes well, recovery will fix it. But you won't really know
whether recovery actually does fix it. You just kind of have to cross
your fingers and hope.

It's unclear to me how we could do better, especially when using the
low-level API. BASE_BACKUP could read via shared_buffers instead of
the FS, and I think that might be a good idea if we can defend
adequately against cache poisoning, but with the low-level API someone
may just be calling a FS-level snapshot primitive. Unless we're
prepared to pause all writes while that happens, I don't know how to
do better.

--
Robert Haas
EDB: http://www.enterprisedb.com