On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander <mag...@hagander.net>
> wrote:
> > Also if that wasn't clear -- we only do the full page write if there isn't
> > already a checksum on the page and that checksum is correct.
>
> Hmm.
>
> Suppose that on the master there is a checksum on the page and that
> checksum is correct, but on the standby the page contents differ in
> some way that we don't always WAL-log, like as to hint bits, and there
> the checksum is incorrect.  Then you'll enable checksums when the
> standby still has some pages without valid checksums, and disaster
> will ensue.
>
> I think this could be hard to prevent if checksums are turned on and
> off multiple times.
>
>
Do we ever make hint bit changes on the standby, for example? If so, it would
definitely cause problems. I didn't realize we did, actually...

I guess we could get there even if we don't, through a sequence like this:
* All checksums are correct
* Checksums are disabled (which replicates)
* Non-WAL-logged change on the master, which updates the checksum but does
*not* replicate
* Checksums re-enabled
* Worker sees the checksum as correct, and thus does not force a full page
write.
* Worker completes and flips checksums on which replicates. At this point,
if the replica reads the page, boom.
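
For what it's worth, here is a toy model of why the optimisation is unsafe with a replica: the worker judges the page by the primary's copy, but the standby later verifies its own copy. It is a minimal sketch under stated assumptions, not how the worker is actually implemented: each node keeps its own copy of the page, the only thing the worker sends to the standby is a full page image, and CRC32 stands in for the real page checksum. `PageCopy`, `page_checksum`, `enable_checksums` and `diverged_pair` are all hypothetical names, and the diverged state is constructed directly rather than reproduced step by step.

```python
import zlib
from dataclasses import dataclass
from typing import Tuple


def page_checksum(data: bytes) -> int:
    # Toy stand-in for the real page checksum; CRC32 is used only for illustration.
    return zlib.crc32(data)


@dataclass
class PageCopy:
    """One node's copy of a page: its bytes plus the checksum stored with it."""
    data: bytes
    stored_checksum: int

    def checksum_ok(self) -> bool:
        return self.stored_checksum == page_checksum(self.data)


def enable_checksums(primary: PageCopy, standby: PageCopy, skip_if_valid: bool) -> None:
    """Hypothetical worker pass over a single page.

    With skip_if_valid=True the worker trusts the primary's copy and skips the
    full page write when it already verifies (the optimisation in question).
    Otherwise it always WAL-logs a full page image, and replay copies that
    image over the standby's copy.
    """
    if skip_if_valid and primary.checksum_ok():
        return  # nothing is WAL-logged, so the standby's copy is never touched
    primary.stored_checksum = page_checksum(primary.data)
    standby.data = primary.data
    standby.stored_checksum = primary.stored_checksum


def diverged_pair() -> Tuple[PageCopy, PageCopy]:
    """Build the diverged state directly, without modelling how it arose."""
    primary_data = b"\x01tuple-data"  # first byte stands in for the hint bits
    standby_data = b"\x00tuple-data"  # same tuple, hint bit not set on the standby
    primary = PageCopy(primary_data, page_checksum(primary_data))  # verifies
    standby = PageCopy(standby_data, page_checksum(primary_data))  # stale checksum
    return primary, standby


primary, standby = diverged_pair()
enable_checksums(primary, standby, skip_if_valid=True)
print(standby.checksum_ok())  # False: the standby's copy was never rewritten

primary, standby = diverged_pair()
enable_checksums(primary, standby, skip_if_valid=False)
print(standby.checksum_ok())  # True: the full page image replaced it
```

Dropping the `skip_if_valid` shortcut is what makes the standby's copy converge, at the cost of a full page image for every page.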

I guess we have to remove that optimisation. It's definitely a bummer, but
I don't think it's an absolute dealbreaker.

We could keep the optimisation when wal_level=minimal, for example, because
then we know there is no replica. But I doubt that's worth it?
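
The wal_level=minimal variant could look roughly like the following helper. The name and signature are made up for illustration; the only fact it relies on is that streaming replication needs wal_level of at least `replica`, so `minimal` rules out a standby holding its own copy of the page.

```python
VALID_WAL_LEVELS = ("minimal", "replica", "logical")


def needs_full_page_write(wal_level: str, page_checksum_valid: bool) -> bool:
    """Hypothetical rule for the checksum-enabling worker.

    Skipping the full page write is only safe when no standby can exist,
    because the worker can only check the local copy of the page.
    """
    if wal_level not in VALID_WAL_LEVELS:
        raise ValueError(f"unknown wal_level: {wal_level!r}")
    if wal_level == "minimal":
        # No replica possible: the local checksum is the only one that matters.
        return not page_checksum_valid
    # A standby may exist and may hold a different copy, so always write.
    return True
```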

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
