On 03/02/2018 11:01 PM, Magnus Hagander wrote:
> On Fri, Mar 2, 2018 at 5:50 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com <mailto:tomas.von...@2ndquadrant.com>> wrote:
> 
> 
> 
>     On 03/02/2018 02:35 PM, Magnus Hagander wrote:
>     >
>     >
>     > On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmh...@gmail.com 
> <mailto:robertmh...@gmail.com>
>     > <mailto:robertmh...@gmail.com <mailto:robertmh...@gmail.com>>> wrote:
>     >
>     >     On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
>     >     <mag...@hagander.net <mailto:mag...@hagander.net>
>     <mailto:mag...@hagander.net <mailto:mag...@hagander.net>>> wrote:
>     >     > Also if that wasn't clear -- we only do the full page write if 
> there isn't
>     >     > already a checksum on the page and that checksum is correct.
>     >
>     >     Hmm.
>     >
>     >     Suppose that on the master there is a checksum on the page and that
>     >     checksum is correct, but on the standby the page contents differ in
>     >     some way that we don't always WAL-log, like as to hint bits, and 
> there
>     >     the checksum is incorrect.  Then you'll enable checksums when the
>     >     standby still has some pages without valid checksums, and disaster
>     >     will ensue.
>     >
>     >     I think this could be hard to prevent if checksums are turned on and
>     >     off multiple times.
>     >
>     >
>     > Do we ever make hintbit changes on the standby for example? If so, it
>     > would definitely cause problems. I didn't realize we did, actually...
>     >
> 
>     I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
>     AFAIK that can't be true on a standby.
> 
>     > I guess we could get there even if we don't by:
>     > * All checksums are correct
>     > * Checkums are disabled (which replicates)
>     > * Non-WAL logged change on the master, which updates checksum but does
>     > *not* replicate
>     > * Checksums re-enabled
>     > * Worker sees the checksum as correct, and thus does not force a full
>     > page write.
>     > * Worker completes and flips checksums on which replicates. At this
>     > point, if the replica reads the page, boom.
>     >
> 
>     Maybe.
> 
>     My understanding of Robert's example is that you can start with an
>     instance that has wal_log_hints=off, and so pages on master/standby may
>     not be 100% identical. Then we do the online checksum thing, and the
>     standby may get pages with incorrect checksums.
> 
> 
> No, in that case the master will issue full page writes for *all* pages,
> since they dind't hvae a checksum. The current patch only avoids doing
> that if the checksum on the master is correct, which it isn't when you
> start from checksums=off.  So this particular problem only shows up if
> you iterate between off/on/off multiple times.
> 

Hmmm, OK. So we need to have a valid checksum on a page, disable
checksums, set some hint bits on the page (which won't be WAL-logged),
enable checksums again and still get a valid checksum even with the new
hint bits? That's possible, albeit unlikely.

> 
>     > I guess we have to remove that optimisation. It's definitely a
>     > bummer, but I don't think it's an absolute dealbreaker.
>     >
> 
>     I agree it's not a deal-breaker. Or at least I don't see why it should
>     be - any other maintenance activity on the database (freezing etc.) will
>     also generate full-page writes.
> 
> 
> Yes.
> 
>  
> 
>     The good thing is the throttling also limits the amount of WAL, so it's
>     possible to prevent generating too many checkpoints etc.
> 
>     I suggest we simply:
> 
>     1) set the checksums to in-progress
>     2) wait for a checkpoint
>     3) use the regular logic for full-pages (i.e. first change after
>     checkpoint does a FPW)
> 
> 
> This is very close to what it does now, except it does not wait for a
> checkpoint in #2. Why does it need that?
> 

To guarantee that the page has a FPW with all the hint bits, before we
start messing with the checksums (or that setting the checksum itself
triggers a FPW).

> 
>     BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does
> 
>         RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
>                           CHECKPOINT_IMMEDIATE);
> 
>     I'm rather unhappy about that - immediate checkpoints have massive
>     impact on production systems, so we try not doing them (That's one of
>     the reasons why CREATE DATABASE is somewhat painful). It usually
>     requires a bit of thinking about when to do such commands. But in this
>     case it's unpredictable when exactly the checksumming completes, so it
>     may easily be in the middle of peak activity.
> 
>     Why not to simply wait for regular spread checkpoint, the way
>     pg_basebackup does it?
> 
> 
> Actually, that was my original idea. I changed it for testing, and shuld
> go change it back.
> 

OK

> 
> 
>     > We could say that we keep the optimisation if wal_level=minimal for
>     > example, because then we know there is no replica. But I doubt
>     > that's worth it?
>     >
> 
>     If it doesn't require a lot of code, why not? But I don't really see
>     much point in doing that.
> 
> 
> Yeah, I doubt there are a lot of people using "minimal" these days, not
> since we changed the default.
> 

Yeah. Although as I said, it depends on how much code would be needed to
enable that optimization (I guess not much). If someone is running with
wal_level=minimal intentionally, why not to help them.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply via email to