On Fri, Mar 2, 2018 at 5:50 PM, Tomas Vondra <tomas.von...@2ndquadrant.com>
> On 03/02/2018 02:35 PM, Magnus Hagander wrote:
> > On Wed, Feb 28, 2018 at 6:06 PM, Robert Haas <robertmh...@gmail.com
> > <mailto:robertmh...@gmail.com>> wrote:
> > On Sun, Feb 25, 2018 at 9:54 AM, Magnus Hagander
> > <mag...@hagander.net <mailto:mag...@hagander.net>> wrote:
> > > Also if that wasn't clear -- we only do the full page write if
> there isn't
> > > already a checksum on the page and that checksum is correct.
> > Hmm.
> > Suppose that on the master there is a checksum on the page and that
> > checksum is correct, but on the standby the page contents differ in
> > some way that we don't always WAL-log, like as to hint bits, and
> > the checksum is incorrect. Then you'll enable checksums when the
> > standby still has some pages without valid checksums, and disaster
> > will ensue.
> > I think this could be hard to prevent if checksums are turned on and
> > off multiple times.
> > Do we ever make hintbit changes on the standby for example? If so, it
> > would definitely cause problems. I didn't realize we did, actually...
> I don't think we do. SetHintBits does TransactionIdIsValid(xid) and
> AFAIK that can't be true on a standby.
> > I guess we could get there even if we don't by:
> > * All checksums are correct
> > * Checkums are disabled (which replicates)
> > * Non-WAL logged change on the master, which updates checksum but does
> > *not* replicate
> > * Checksums re-enabled
> > * Worker sees the checksum as correct, and thus does not force a full
> > page write.
> > * Worker completes and flips checksums on which replicates. At this
> > point, if the replica reads the page, boom.
> My understanding of Robert's example is that you can start with an
> instance that has wal_log_hints=off, and so pages on master/standby may
> not be 100% identical. Then we do the online checksum thing, and the
> standby may get pages with incorrect checksums.
No, in that case the master will issue full page writes for *all* pages,
since they dind't hvae a checksum. The current patch only avoids doing that
if the checksum on the master is correct, which it isn't when you start
from checksums=off. So this particular problem only shows up if you
iterate between off/on/off multiple times.
> > I guess we have to remove that optimisation. It's definitely a
> > bummer, but I don't think it's an absolute dealbreaker.
> I agree it's not a deal-breaker. Or at least I don't see why it should
> be - any other maintenance activity on the database (freezing etc.) will
> also generate full-page writes.
> The good thing is the throttling also limits the amount of WAL, so it's
> possible to prevent generating too many checkpoints etc.
> I suggest we simply:
> 1) set the checksums to in-progress
> 2) wait for a checkpoint
> 3) use the regular logic for full-pages (i.e. first change after
> checkpoint does a FPW)
This is very close to what it does now, except it does not wait for a
checkpoint in #2. Why does it need that?
BTW speaking of checkpoints, I see ChecksumHelperLauncherMain does
> RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | \
> I'm rather unhappy about that - immediate checkpoints have massive
> impact on production systems, so we try not doing them (That's one of
> the reasons why CREATE DATABASE is somewhat painful). It usually
> requires a bit of thinking about when to do such commands. But in this
> case it's unpredictable when exactly the checksumming completes, so it
> may easily be in the middle of peak activity.
> Why not to simply wait for regular spread checkpoint, the way
> pg_basebackup does it?
Actually, that was my original idea. I changed it for testing, and shuld go
change it back.
> We could say that we keep the optimisation if wal_level=minimal for
> > example, because then we know there is no replica. But I doubt
> > that's worth it?
> If it doesn't require a lot of code, why not? But I don't really see
> much point in doing that.
Yeah, I doubt there are a lot of people using "minimal" these days, not
since we changed the default.