On Tue, Jul 25, 2023 at 8:18 AM Robert Haas <robertmh...@gmail.com> wrote:
> (Yeah, I know we have code to verify checksums during a base
> backup, but as discussed elsewhere, it doesn't work.)
BTW, the code you are referring to there seems to think 4KB page-halves are atomic. I'm not sure if that's imagining page-level locking in ancient Linux (?), or imagining the default setvbuf() buffer size observed with some specific implementation of fread(), or confusing power-failure-sector-based atomicity with concurrent-access atomicity, or something else. For the record, what we actually see in this scenario on ext4 is the old and new page contents mashed together on much smaller boundaries (maybe cache lines), caused by duelling concurrent memcpy() to/from, independent of any buffer/page-level implementation details we might have been thinking of with that code. Makes me wonder if it's even technically sound to examine the LSN.

> It's also why we
> have to force full-page write on during a backup. But the whole thing
> is nasty because you can't really verify anything about the backup you
> just took. It may be full of gibberish blocks but don't worry because,
> if all goes well, recovery will fix it. But you won't really know
> whether recovery actually does fix it. You just kind of have to cross
> your fingers and hope.

Well, not without also scanning the WAL for FPIs, anyway... And conceptually, that's why I think we probably want an 'FPI' of the control file somewhere.
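To make the LSN concern concrete, here's a toy simulation. It is not PostgreSQL code: BLCKSZ matches the default block size, but TEAR_GRANULE (64 bytes, roughly a cache line) is an assumption about how the racing copies interleave, and the page layout and helper names are made up for illustration. A read that alternates old and new granules starting with the old one leaves the header LSN from the old image, so a heuristic that trusts the LSN to decide whether the page was modified during the backup wouldn't fire, even though half of the body bytes come from the new image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants for illustration only. */
#define BLCKSZ 8192       /* default PostgreSQL block size */
#define TEAR_GRANULE 64   /* assumed interleaving granule (~cache line) */

/* Toy page: an 8-byte LSN at offset 0 (like pd_lsn), then filler bytes. */
static void make_page(unsigned char *page, uint64_t lsn, unsigned char fill)
{
    memset(page, fill, BLCKSZ);
    memcpy(page, &lsn, sizeof(lsn));
}

/* Read back the LSN stored at the start of the toy page. */
static uint64_t page_lsn(const unsigned char *page)
{
    uint64_t lsn;
    memcpy(&lsn, page, sizeof(lsn));
    return lsn;
}

/*
 * Simulate a reader racing a writer: even-numbered granules come from the
 * old image, odd-numbered ones from the new image.
 */
static void torn_read(unsigned char *dst, const unsigned char *old_page,
                      const unsigned char *new_page)
{
    assert(BLCKSZ % TEAR_GRANULE == 0);
    for (size_t off = 0; off < BLCKSZ; off += TEAR_GRANULE)
    {
        const unsigned char *src =
            ((off / TEAR_GRANULE) % 2 == 0) ? old_page : new_page;
        memcpy(dst + off, src + off, TEAR_GRANULE);
    }
}

/* Count body bytes (everything after the LSN) equal to the given filler. */
static size_t count_fill(const unsigned char *page, unsigned char fill)
{
    size_t n = 0;
    for (size_t i = sizeof(uint64_t); i < BLCKSZ; i++)
        if (page[i] == fill)
            n++;
    return n;
}
```

With an old image (LSN 100, filler 0xAA) and a new image (LSN 200, filler 0xBB), the torn copy reports LSN 100 while 4096 of its body bytes are 0xBB, i.e. the header looks untouched but the contents are a mix of both versions.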