On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <mag...@hagander.net> wrote:
> > Also, if that wasn't clear -- we only do the full page write if there
> > is already a checksum on the page and that checksum is correct.
> Suppose that on the master there is a checksum on the page and that
> checksum is correct, but on the standby the page contents differ in
> some way that we don't always WAL-log, such as hint bits, and there
> the checksum is incorrect. Then you'll enable checksums when the
> standby still has some pages without valid checksums, and disaster
> will ensue.
> I think this could be hard to prevent if checksums are turned on and
> off multiple times.
Do we ever make hint bit changes on the standby, for example? If so, it would
definitely cause problems. I didn't realize we did, actually...
I guess we could get there even if we don't, by:
* All checksums are correct
* Checksums are disabled (which replicates)
* Non-WAL-logged change on the master, which updates the checksum but does
not replicate
* Checksums re-enabled
* Worker sees the checksum as correct, and thus does not force a full page
write
* Worker completes and flips checksums on, which replicates. At this point,
if the replica reads the page, boom.
I guess we have to remove that optimisation. It's definitely a bummer, but
I don't think it's an absolute dealbreaker.
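As a minimal sketch of why the optimisation is unsafe, here is a toy model in Python. It is not PostgreSQL code: `Page`, `worker_visit` and `standby_ok_after_enable` are hypothetical names, CRC32 stands in for the real page checksum, and a full page write is modelled as simply copying the master page to the standby. It starts from the state both scenarios above end up in: the master page has a valid checksum, the standby copy does not.

```python
import zlib
from dataclasses import dataclass


def page_checksum(data: bytes) -> int:
    # Stand-in for the real page checksum; CRC32 is used only for illustration.
    return zlib.crc32(data)


@dataclass
class Page:
    data: bytes
    checksum: int

    def checksum_ok(self) -> bool:
        return self.checksum == page_checksum(self.data)


def worker_visit(master: Page, standby: Page, skip_if_valid: bool) -> bool:
    """Toy checksum worker. Returns True if it emitted a full page write.

    A full page write is modelled as copying the master page to the standby.
    """
    if skip_if_valid and master.checksum_ok():
        return False  # the optimisation: nothing is written or replicated
    master.checksum = page_checksum(master.data)
    standby.data, standby.checksum = master.data, master.checksum
    return True


def standby_ok_after_enable(skip_if_valid: bool) -> bool:
    # Assumed starting state: the master page carries a valid checksum, while
    # the standby copy differs in non-WAL-logged bits and its checksum is stale.
    master_data = b"tuple with hint bits set"
    standby_data = b"tuple without hint bits"
    master = Page(master_data, page_checksum(master_data))
    standby = Page(standby_data, page_checksum(standby_data) ^ 1)
    assert master.checksum_ok() and not standby.checksum_ok()

    worker_visit(master, standby, skip_if_valid)
    # Checksums are now flipped on everywhere; the standby verifies on read.
    return standby.checksum_ok()


print("with optimisation:   ", standby_ok_after_enable(skip_if_valid=True))
print("without optimisation:", standby_ok_after_enable(skip_if_valid=False))
```

With the optimisation the worker never touches the page, so the standby is left with its stale checksum and fails verification once checksums are on. Without it, the unconditional full page write overwrites the standby copy and verification passes.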
We could say that we keep the optimisation if wal_level=minimal for
example, because then we know there is no replica. But I doubt that's worth
it.
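The wal_level idea would amount to a gate like the one below. This is a sketch only: `may_skip_full_page_write` is a hypothetical helper, not an existing function, and it assumes the setting is passed in as a plain string.

```python
def may_skip_full_page_write(wal_level: str, checksum_valid: bool) -> bool:
    # Keep the optimisation only when no replica can exist (wal_level=minimal)
    # and the page already carries a correct checksum.
    return wal_level == "minimal" and checksum_valid


for level in ("minimal", "replica", "logical"):
    print(level, may_skip_full_page_write(level, checksum_valid=True))
```

Under any other wal_level the worker would always force the full page write, regardless of what the existing checksum looks like.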
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/