(I may have said this before, but) My overall high-level impression of this patch is that it's really cmmplex for a feature that you use maybe once in the lifetime of a cluster. I'm happy to review but I'm not planning to commit this myself. I don't object if some other committer picks this up (Magnus?).

Now to the latest patch version:

On 03/02/2021 18:15, Daniel Gustafsson wrote:
The previous v35 had a tiny conflict in pg_class.h, the attached v36 (which is
a squash of the 2 commits in v35) fixes that.  No other changes are introduced
in this version.


        /*
         * Check to see if my copy of RedoRecPtr is out of date. If so, may have
         * to go back and have the caller recompute everything. This can only
         * happen just after a checkpoint, so it's better to be slow in this 
case
         * and fast otherwise.
         *
         * Also check to see if fullPageWrites or forcePageWrites was just 
turned
         * on, or if we are in the process of enabling checksums in the cluster;
         * if we weren't already doing full-page writes then go back and 
recompute.
         *
         * If we aren't doing full-page writes then RedoRecPtr doesn't actually
         * affect the contents of the XLOG record, so we'll update our local 
copy
         * but not force a recomputation.  (If doPageWrites was just turned off,
         * we could recompute the record without full pages, but we choose not 
to
         * bother.)
         */
        if (RedoRecPtr != Insert->RedoRecPtr)
        {
                Assert(RedoRecPtr < Insert->RedoRecPtr);
                RedoRecPtr = Insert->RedoRecPtr;
        }
        doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || 
DataChecksumsOnInProgress());

Why does this use DataChecksumsOnInProgress() instead of DataChecksumsNeedWrite()? If checksums are enabled, you always need full-page writes, don't you? If not, then why is it needed in the inprogress-on state?

We also set doPageWrites in InitXLOGAccess(). That should match the condition above (although it doesn't matter for correctness).

/*
 * DataChecksumsNeedVerify
 *              Returns whether data checksums must be verified or not
 *
 * Data checksums are only verified if they are fully enabled in the cluster.
 * During the "inprogress-on" and "inprogress-off" states they are only
 * updated, not verified (see datachecksumsworker.c for a longer discussion).
 *
 * This function is intended for callsites which have read data and are about
 * to perform checksum validation based on the result of this. To avoid the
 * the risk of the checksum state changing between reading and performing the
 * validation (or not), interrupts must be held off. This implies that calling
 * this function must be performed as close to the validation call as possible
 * to keep the critical section short. This is in order to protect against
 * time of check/time of use situations around data checksum validation.
 */
bool
DataChecksumsNeedVerify(void)
{
        Assert(InterruptHoldoffCount > 0);
        return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION);
}

What exactly is the intended call pattern here? Something like this?

smgrread() a data page
HOLD_INTERRUPTS();
if (DataChecksumsNeedVerify())
{
  if (pg_checksum_page((char *) page, blkno) != expected)
    elog(ERROR, "bad checksum");
}
RESUME_INTERRUPTS();

That seems to be what the code currently does. What good does holding interrupts do here? If checksums were not fully enabled at the smgrread() call, the page might have incorrect checksums, and if the state transitions from inprogress-on to on between the smggread() call and the DataChecksumsNeedVerify() call, you'll get an error. I think you need to hold the interrupts *before* the smgrread() call.

/*
 * Set checksum for a page in private memory.
 *
 * This must only be used when we know that no other process can be modifying
 * the page buffer.
 */
void
PageSetChecksumInplace(Page page, BlockNumber blkno)
{
        HOLD_INTERRUPTS();
        /* If we don't need a checksum, just return */
        if (PageIsNew(page) || !DataChecksumsNeedWrite())
        {
                RESUME_INTERRUPTS();
                return;
        }

        ((PageHeader) page)->pd_checksum = pg_checksum_page((char *) page, 
blkno);
        RESUME_INTERRUPTS();
}

The checksums state might change just after this call, before the caller has actually performed the smgrwrite() or smgrextend() call. The caller needs to hold interrupts across this function and the smgrwrite/smgrextend() call. It is a bad idea to HOLD_INTERRUPTS() here, because that just masks bugs where the caller isn't holding the interrupts. Same in PageSetChecksumCopy().

- Heikki


Reply via email to