On 03/02/2018 02:35 PM, Magnus Hagander wrote:
> On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmh...@gmail.com
> <mailto:robertmh...@gmail.com>> wrote:
>     On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
>     <mag...@hagander.net <mailto:mag...@hagander.net>> wrote:
>     > Also if that wasn't clear -- we only do the full page write if there 
> isn't
>     > already a checksum on the page and that checksum is correct.
>     Hmm.
>     Suppose that on the master there is a checksum on the page and that
>     checksum is correct, but on the standby the page contents differ in
>     some way that we don't always WAL-log, like as to hint bits, and there
>     the checksum is incorrect.  Then you'll enable checksums when the
>     standby still has some pages without valid checksums, and disaster
>     will ensue.
>     I think this could be hard to prevent if checksums are turned on and
>     off multiple times.
> Do we ever make hintbit changes on the standby for example? If so, it
> would definitely cause problems. I didn't realize we did, actually...

I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
AFAIK that can't be true on a standby.

> I guess we could get there even if we don't by:
> * All checksums are correct
> * Checkums are disabled (which replicates)
> * Non-WAL logged change on the master, which updates checksum but does
> *not* replicate
> * Checksums re-enabled
> * Worker sees the checksum as correct, and thus does not force a full
> page write.
> * Worker completes and flips checksums on which replicates. At this
> point, if the replica reads the page, boom.


My understanding of Robert's example is that you can start with an
instance that has wal_log_hints=off, and so pages on master/standby may
not be 100% identical. Then we do the online checksum thing, and the
standby may get pages with incorrect checksums.

> I guess we have to remove that optimisation. It's definitely a
> bummer, but I don't think it's an absolute dealbreaker.

I agree it's not a deal-breaker. Or at least I don't see why it should
be - any other maintenance activity on the database (freezing etc.) will
also generate full-page writes.

The good thing is the throttling also limits the amount of WAL, so it's
possible to prevent generating too many checkpoints etc.

I suggest we simply:

1) set the checksums to in-progress
2) wait for a checkpoint
3) use the regular logic for full-pages (i.e. first change after
checkpoint does a FPW)

BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does


I'm rather unhappy about that - immediate checkpoints have massive
impact on production systems, so we try not doing them (That's one of
the reasons why CREATE DATABASE is somewhat painful). It usually
requires a bit of thinking about when to do such commands. But in this
case it's unpredictable when exactly the checksumming completes, so it
may easily be in the middle of peak activity.

Why not to simply wait for regular spread checkpoint, the way
pg_basebackup does it?

> We could say that we keep the optimisation if wal_level=minimal for 
> example, because then we know there is no replica. But I doubt
> that's worth it?

If it doesn't require a lot of code, why not? But I don't really see
much point in doing that.


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply via email to