On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <mag...@hagander.net>
> wrote:
> > Also if that wasn't clear -- we only do the full page write if there
> > isn't already a checksum on the page and that checksum is correct.
>
> Hmm.
>
> Suppose that on the master there is a checksum on the page and that
> checksum is correct, but on the standby the page contents differ in
> some way that we don't always WAL-log, like as to hint bits, and there
> the checksum is incorrect. Then you'll enable checksums when the
> standby still has some pages without valid checksums, and disaster
> will ensue.
>
> I think this could be hard to prevent if checksums are turned on and
> off multiple times.

Do we ever make hint bit changes on the standby, for example? If so, it
would definitely cause problems. I didn't realize we did, actually...

I guess we could get there even if we don't, by:

* All checksums are correct
* Checksums are disabled (which replicates)
* Non-WAL-logged change on the master, which updates the checksum but
  does *not* replicate
* Checksums are re-enabled
* Worker sees the checksum as correct, and thus does not force a full
  page write
* Worker completes and flips checksums on, which replicates. At this
  point, if the replica reads the page, boom.

I guess we have to remove that optimisation. It's definitely a bummer,
but I don't think it's an absolute dealbreaker.

We could say that we keep the optimisation if wal_level=minimal, for
example, because then we know there is no replica. But I doubt that's
worth it?

-- 
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
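
To make the failure mode concrete, here is a toy Python model of the case Robert describes: the master's page already carries a correct checksum, while the standby's copy differs in its hint bits and so holds a checksum that no longer matches. Everything in it is hypothetical and only for illustration: the Page class, the run_worker function and the use of CRC32 are stand-ins, not the patch's code or PostgreSQL's real checksum algorithm.

```python
import zlib
from dataclasses import dataclass


def checksum(data: bytes, hints: int) -> int:
    # Stand-in for the real page checksum; CRC32 is used only for illustration.
    return zlib.crc32(data + bytes([hints]))


@dataclass
class Page:
    data: bytes
    hints: int   # toy representation of hint bits
    stored: int  # checksum stored in the page header

    def is_valid(self) -> bool:
        return self.stored == checksum(self.data, self.hints)


def make_pages():
    data = b"tuple data"
    primary = Page(data, hints=1, stored=checksum(data, 1))
    # Assumed divergence: the standby has different hint bits, so the
    # checksum stored on its copy does not match its contents.
    standby = Page(data, hints=0, stored=checksum(data, 1))
    return primary, standby


def run_worker(primary: Page, standby: Page, skip_if_valid: bool) -> bool:
    """Process one page; return True if the standby copy verifies afterwards."""
    if skip_if_valid and primary.is_valid():
        pass  # optimisation: no full page write, so the standby is untouched
    else:
        primary.stored = checksum(primary.data, primary.hints)
        # Full page write: the standby replays the complete page image.
        standby.data = primary.data
        standby.hints = primary.hints
        standby.stored = primary.stored
    return standby.is_valid()


if __name__ == "__main__":
    p, s = make_pages()
    print("with optimisation, standby page valid:", run_worker(p, s, True))
    p, s = make_pages()
    print("without optimisation, standby page valid:", run_worker(p, s, False))
```

In this model, the skip leaves the standby with a page that fails verification once checksums are switched on, whereas always forcing the full page write overwrites the divergent copy.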