On 1/25/17 10:38 PM, Stephen Frost wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> On Wed, Jan 25, 2017 at 7:37 PM, Andres Freund <and...@anarazel.de> wrote:
>>> On 2017-01-25 19:30:08 -0500, Stephen Frost wrote:
>>>> * Peter Geoghegan (p...@heroku.com) wrote:
>>>>> On Wed, Jan 25, 2017 at 3:30 PM, Stephen Frost <sfr...@snowman.net> wrote:
>>>>>> As it is, there are backup solutions which *do* check the checksum when
>>>>>> backing up PG. This is no longer, thankfully, some hypothetical thing,
>>>>>> but something which really exists and will hopefully keep users from
>>>>>> losing data.
>>>>>
>>>>> Wouldn't that have issues with torn pages?
>>>>
>>>> No, why would it? The page has either been written out by PG to the OS,
>>>> in which case the backup s/w will see the new page, or it hasn't been.
>>>
>>> Uh. Writes aren't atomic on that granularity. That means you very well
>>> *can* see a torn page (in linux you can e.g. on 4KB os page boundaries
>>> of a 8KB postgres page). Just read a page while it's being written out.
>>
>> Yeah. This is also why backups force full page writes on even if
>> they're turned off in general.
>
> I've got a question into David about this, I know we chatted about the
> risk at one point, I just don't recall what we ended up doing (I can
> imagine a few different possible things- re-read the page, which isn't a
> guarantee but reduces the chances a fair bit, or check the LSN, or
> perhaps the plan was to just check if it's in the WAL, as I mentioned)
> or if we ended up concluding it wasn't a risk for some, perhaps
> incorrect, reason and need to revisit it.

The solution was to simply ignore the checksums of any pages with an
LSN >= the LSN returned by pg_start_backup(). This means that hot
blocks may never be checked during backup, but if they are active then
any problems should be caught directly by PostgreSQL.

This technique assumes that blocks can be consistently read in the
order they were written. If the second 4k (or 512 byte, etc.) block of
the fwrite is visible before the first 4k block then there would be a
false positive. I have a hard time imagining any sane buffering system
working this way, but I can't discount it.

It's definitely possible for pages on disk to have this characteristic
(i.e., the first block is not written first) but that should be fixed
during recovery before it is possible to take a backup.

Note that reports of page checksum errors are informational only and do
not have any effect on the backup process. Even so, we would definitely
prefer to avoid false positives.

If anybody can poke a hole in this solution then I would like to hear it.

-- 
-David
da...@pgmasters.net
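
For illustration only, the LSN rule described above could be sketched
roughly as follows. This is not the backup tool's actual code. It
assumes the page LSN sits in the first 8 bytes of the page header as two
32-bit words (high, then low) on a little-endian host, and that the
start LSN arrives in the usual hex "hi/lo" text form. The names
parse_lsn, page_lsn, check_page and the checksum_ok callback are
hypothetical; the real checksum algorithm is deliberately left to the
caller.

```python
import struct

PAGE_SIZE = 8192  # assumed default PostgreSQL block size


def parse_lsn(text):
    """Convert an LSN in hex 'hi/lo' text form to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def page_lsn(page):
    """Read the page LSN from the first 8 bytes of the page header.

    Assumption: two little-endian uint32 values, high word first.
    """
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff


def check_page(page, backup_start_lsn, checksum_ok):
    """Return 'skipped', 'ok', or 'mismatch' for one page.

    Pages with LSN >= the backup start LSN are skipped, since they may
    be torn by a concurrent write. checksum_ok is a caller-supplied
    (hypothetical) function that validates the page checksum.
    """
    if page_lsn(page) >= backup_start_lsn:
        return "skipped"
    return "ok" if checksum_ok(page) else "mismatch"
```

A caller would read the file one page at a time, pass each page through
check_page(), and only log a 'mismatch' result, in keeping with the
reports being informational. Because the LSN is in the first bytes of
the page, the sketch depends on the same ordering assumption stated
above: the first block of a write must be visible no later than the
rest.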