Re: Online checksums patch - once again

Heikki Linnakangas Fri, 22 Jan 2021 03:56:19 -0800

I read through the latest patch,v31-0001-Support-checksum-enable-disable-in-a-running-clu.patch. Somecomments below:


On 19/01/2021 14:32, Daniel Gustafsson wrote:

+       /*
+        * Hold interrupts for the duration of xlogging to avoid the state of 
data
+        * checksums changing during the processing which would later the 
premise
+        * for xlogging hint bits.
+        */


Sentence sense does not make.

@@ -904,6 +916,7 @@ static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
 static void XLogReportParameters(void);
+static void XlogChecksums(ChecksumType new_type);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineID newTLI,
                                                                TimeLineID 
prevTLI);
 static void LocalSetXLogInsertAllowed(void);


Spelling: make it "XLogChecksums" for consistency.

/*
 * DataChecksumsNeedWrite
 *              Returns whether data checksums must be written or not
 *
 * Returns true iff data checksums are enabled or are in the process of being
 * enabled. In case data checksums are currently being enabled we must write
 * the checksum even though it's not verified during this stage. Interrupts
 * need to be held off by the caller to ensure that the returned state is
 * valid for the duration of the intended processing.
 */
bool
DataChecksumsNeedWrite(void)
{
        Assert(InterruptHoldoffCount > 0);
        return (LocalDataChecksumVersion == PG_DATA_CHECKSUM_VERSION ||
                        LocalDataChecksumVersion == 
PG_DATA_CHECKSUM_INPROGRESS_ON_VERSION ||
                        LocalDataChecksumVersion == 
PG_DATA_CHECKSUM_INPROGRESS_OFF_VERSION);
}

Can you be more precise on the "duration of the intended processing"? Itmeans, until you have actually written out the page, or something likethat, right? The similar explanation in DataChecksumsNeedVerify() iseasier to understand. The two functions are similar, it would be good tophrase the comments similarly, so that you can quickly see thedifference between the two.

/*
 * SetDataChecksumsOnInProgress
 *              Sets the data checksum state to "inprogress-on" to enable 
checksums
 *
 * In order to start the process of enabling data checksums in a running
 * cluster the data_checksum_version state must be changed to "inprogress-on".
 * This state requires data checksums to be written but not verified. The state
 * transition is performed in a critical section in order to provide crash
 * safety, and checkpoints are held off. When the emitted procsignalbarrier
 * has been absorbed by all backends we know that the cluster has started to
 * enable data checksums.
 */

The two "in order" are superfluous, it's more concise to say just "Tostart the process ..." (I was made aware of that by Jürgen Purtz'srecent patches that removed a couple of "in order"s from the docs asunnecessary).

It's a bit confusing to talk about the critical section andprocsignalbarrier here. Does the caller need to wait for theprocsignalbarrier? No, that's just explaining what the function doesinternally. Maybe move that explanation inside the function, and sayhere something like "This function blocks until all processes haveacknowledged the state change" or something like that.

/*
 * SetDataChecksumsOn
 *              Enables data checksums cluster-wide
 *
 * Enabling data checksums is performed using two barriers, the first one
 * sets the checksums state to "inprogress-on" (which is performed by
 * SetDataChecksumsOnInProgress()) and the second one to "on" (performed here).
 * During "inprogress-on", checksums are written but not verified. When all
 * existing pages are guaranteed to have checksums, and all new pages will be
 * initiated with checksums, the state can be changed to "on".
 */

Perhaps the explanation for how these SetDataChecksumsOn() andSetDataChecksumsOnInProgress() functions work together should be movedto one place. For example, explain both functions here atSetDataChecksumsOn(), and just have a "see SetDataChecksumsOn()" in theother function, or no comment at all if they're kept next to each other.As it stands, you have to read both comments and piece together the thebig picture in your head. Maybe add a "see datachecksumsworker.c" here,since there's a longer explanation of the overall mechanism there.

+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+       if (!superuser())
+               ereport(ERROR,
+                               (errmsg("must be superuser")));
+
+       StartDatachecksumsWorkerLauncher(false, 0, 0);
+
+       PG_RETURN_BOOL(true);
+}

The documentation says "Returns <literal>false</literal> in case datachecksums are disabled already", but the function always returns true.The enable_data_checksum() function also returns a constant 'true'; whynot make it void?

The "has immediate effect" comment seems wrong, given that it actuallylaunches a worker process.

+       /*
+        * If the launcher is already started, the only operation we can perform
+        * is to cancel it iff the user requested for checksums to be disabled.
+        * That doesn't however mean that all other cases yield an error, as 
some
+        * might be perfectly benevolent.
+        */


This comment is a bit hard to understand. Maybe something like

"If the launcher is already started, we cannot launch a new one. But ifthe user requested for checksums to be disabled, we can cancel it."

+       if (DatachecksumsWorkerShmem->launcher_started)
+       {
+               if (DatachecksumsWorkerShmem->abort)
+               {
+                       ereport(NOTICE,
+                                       (errmsg("data checksum processing is 
concurrently being aborted, please retry")));
+
+                       LWLockRelease(DatachecksumsWorkerLock);
+                       return;
+               }

If it's being aborted, and the user requested to disable checksum, canwe leave out the NOTICE? I guess not, because the worker process won'tcheck the 'abort' flag, if it's already finished processing all data pages.

+               /*
+                * If the launcher is started data checksums cannot be on or 
off, but
+                * it may be in an inprogress state. Since the state transition 
may
+                * not have happened yet (in case of rapidly initiated checksum 
enable
+                * calls for example) we inspect the target state of the 
currently
+                * running launcher.
+                */

This comment contradicts itself. If the state transition has nothappened yet, then contrary to the first sentence, data checksums *are*currently on or off.

+               if (enable_checksums)
+               {
+                       /*
+                        * If we are asked to enable checksums when they are 
already being
+                        * enabled, there is nothing to do so exit.
+                        */
+                       if (DatachecksumsWorkerShmem->enable_checksums)
+                       {
+                               LWLockRelease(DatachecksumsWorkerLock);
+                               return;
+                       }
+
+                       /*
+                        * Disabling checksums is likely to be a very quick 
operation in
+                        * many cases so trying to abort it to save the 
checksums would
+                        * run the risk of race conditions.
+                        */
+                       else
+                       {
+                               ereport(NOTICE,
+                                               (errmsg("data checksums are 
concurrently being disabled, please retry")));
+
+                               LWLockRelease(DatachecksumsWorkerLock);
+                               return;
+                       }
+
+                       /* This should be unreachable */
+                       Assert(false);
+               }
+               else if (!enable_checksums)
+               {
+                       /*
+                        * Data checksums are already being disabled, exit 
silently.
+                        */
+                       if (DataChecksumsOffInProgress())
+                       {
+                               LWLockRelease(DatachecksumsWorkerLock);
+                               return;
+                       }
+
+                       DatachecksumsWorkerShmem->abort = true;
+                       LWLockRelease(DatachecksumsWorkerLock);
+                       return;
+               }

The Assert seems unnecessary. The "if (!enable_checkums)" also seemsunnecessary, could be just "else".

+       /*
+        * The launcher is currently not running, so we need to query the system
+        * data checksum state to determine how to proceed based on the 
requested
+        * target state.
+        */

"query the system" makes me think "checking the catalogs" or similar,but this just looks at a few variables in shared memory.

+/*
+ * ShutdownDatachecksumsWorkerIfRunning
+ *             Request shutdown of the datachecksumsworker
+ *
+ * This does not turn off processing immediately, it signals the checksum
+ * process to end when done with the current block.
+ */
+void
+ShutdownDatachecksumsWorkerIfRunning(void)
+{
+       LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+
+       /* If the launcher isn't started, there is nothing to shut down */
+       if (DatachecksumsWorkerShmem->launcher_started)
+               DatachecksumsWorkerShmem->abort = true;
+
+       LWLockRelease(DatachecksumsWorkerLock);
+}


This function is unused.

+/*
+ * launcher_cancel_handler
+ *
+ * Internal routine for reacting to SIGINT and flagging the worker to abort.
+ * The worker won't be interrupted immediately but will check for abort flag
+ * between each block in a relation.
+ */
+static void
+launcher_cancel_handler(SIGNAL_ARGS)
+{
+       LWLockAcquire(DatachecksumsWorkerLock, LW_EXCLUSIVE);
+       DatachecksumsWorkerShmem->abort = true;
+       LWLockRelease(DatachecksumsWorkerLock);
+}


Acquiring an lwlock in signal handler is not safe.

Since this is a signal handler, it might still get called after theprocess has finished processing, and has already setDatachecksumsWorkerShmem->launcher_started = false. That seems like anunexpected state.

+/*
+ * WaitForAllTransactionsToFinish
+ *             Blocks awaiting all current transactions to finish
+ *
+ * Returns when all transactions which are active at the call of the function
+ * have ended, or if the postmaster dies while waiting. If the postmaster dies
+ * the abort flag will be set to indicate that the caller of this shouldn't
+ * proceed.
+ */

The caller of this function doesn't check the 'abort' flag AFAICS. Ithink you could just use use WL_EXIT_ON_PM_DEATH to error out onpostmaster death.

+       while (true)
+       {
+               int                     processed_databases = 0;
+
+               /*
+                * Get a list of all databases to process. This may include 
databases
+                * that were created during our runtime.
+                *
+                * Since a database can be created as a copy of any other 
database
+                * (which may not have existed in our last run), we have to 
repeat
+                * this loop until no new databases show up in the list. Since 
we wait
+                * for all pre-existing transactions finish, this way we can be
+                * certain that there are no databases left without checksums.
+                */
+               DatabaseList = BuildDatabaseList();

I think it's unnecessary to wait out the transactions on the firstiteration of this loop. There must be at least one database, so therewill always be at least two iterations.

I think it would be more clear to move theWaitForAllTransactionsToFinish() from BuildDatabaseList() to the callers.

@@ -3567,6 +3571,27 @@ RelationBuildLocalRelation(const char *relname,
                relkind == RELKIND_MATVIEW)
                RelationInitTableAccessMethod(rel);

+ /*

+        * Set the data checksum state. Since the data checksum state can 
change at
+        * any time, the fetched value might be out of date by the time the
+        * relation is built.  DataChecksumsNeedWrite returns true when data
+        * checksums are: enabled; are in the process of being enabled (state:
+        * "inprogress-on"); are in the process of being disabled (state:
+        * "inprogress-off"). Since relhaschecksums is only used to track 
progress
+        * when data checksums are being enabled, and going from disabled to
+        * enabled will clear relhaschecksums before starting, it is safe to use
+        * this value for a concurrent state transition to off.
+        *
+        * If DataChecksumsNeedWrite returns false, and is concurrently changed 
to
+        * true then that implies that checksums are being enabled. Worst case,
+        * this will lead to the relation being processed for checksums even 
though
+        * each page written will have them already.  Performing this last 
shortens
+        * the window, but doesn't avoid it.
+        */
+       HOLD_INTERRUPTS();
+       rel->rd_rel->relhaschecksums = DataChecksumsNeedWrite();
+       RESUME_INTERRUPTS();
+
        /*
         * Okay to insert into the relcache hash table.
         *

I grepped for relhashcheckums, and concluded that the value in therelcache isn't actually used for anything. Not so! Inheap_create_with_catalog(), the actual pg_class row is constructed fromthe relcache entry, so the value set in RelationBuildLocalRelation()finds its way to pg_class. Perhaps it would be more clear to passrelhachecksums directly as an argument to AddNewRelationTuple(). Thatway, the value in the relcache would be truly never used.


- Heikki

Re: Online checksums patch - once again

Reply via email to