On 03/03/2018 01:38 PM, Robert Haas wrote: > On Sat, Mar 3, 2018 at 7:32 AM, Robert Haas <robertmh...@gmail.com> wrote: >> On Fri, Mar 2, 2018 at 6:26 PM, Tomas Vondra >> <tomas.von...@2ndquadrant.com> wrote: >>> Hmmm, OK. So we need to have a valid checksum on a page, disable >>> checksums, set some hint bits on the page (which won't be >>> WAL-logged), enable checksums again and still get a valid >>> checksum even with the new hint bits? That's possible, albeit >>> unlikely. >> >> No, the problem is if - as is much more likely - the checksum is >> not still valid. > > Hmm, on second thought ... maybe I didn't think this through > carefully enough. If the checksum matches on the master by chance, > and the page is the same on the standby, then we're fine, right? It's > a weird accident, but nothing is actually broken. The failure > scenario is where the standby has a version of the page with a bad > checksum, but the master has a good checksum. So for example: > checksums disabled, master modifies the page (which is replicated), > master sets some hint bits (coincidentally making the checksum > match), now we try to turn checksums on and don't re-replicate the > page because the checksum already looks correct. >
Yeah. Doesn't that pretty much mean we can't skip any pages that have correct checksum, because we can't rely on standby having the same page data? That is, this block in ProcessSingleRelationFork: /* * If checksum was not set or was invalid, mark the buffer as dirty * and force a full page write. If the checksum was already valid, we * can leave it since we know that any other process writing the * buffer will update the checksum. */ if (checksum != pagehdr->pd_checksum) { START_CRIT_SECTION(); MarkBufferDirty(buf); log_newpage_buffer(buf, false); END_CRIT_SECTION(); } That would mean this optimization - only doing the write when the checksum does not match - is broken. If that's the case, it probably makes restarts/resume more expensive, because this optimization was why after restart the already processed data was only read (and the checksums verified) but not written. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services