On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <[email protected]> wrote:
>
> On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
> <[email protected]> wrote:
>
> On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> > <[email protected]> wrote:
> >
> > ...
> >
> > I do think just waiting for all running transactions to
> complete is
> > fine, and it's not the first place where we use it - CREATE
> SUBSCRIPTION
> > does pretty much exactly the same thing (and CREATE INDEX
> CONCURRENTLY
> > too, to some extent). So we have a precedent / working code we
> can copy.
> >
> >
> > Thinking again, I don't think it should be done as part of
> > BuildRelationList(). We should just do it once in the launcher
> before
> > starting, that'll be both easier and cleaner. Anything started after
> > that will have checksums on it, so we should be fine.
> >
> > PFA one that does this.
> >
>
> Seems fine to me. I'd however log waitforxid, not the oldest one. If
> you're a DBA and you want the checksumming to proceed,
> knowing
> the oldest running XID is useless for that. If we log
> waitforxid, it can
> be used to query pg_stat_activity and interrupt the sessions
> somehow.
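To illustrate (a rough sketch only, the pid below is just a placeholder for
whatever the query returns): with waitforxid in the log, the DBA can list the
sessions that still hold an older xid or snapshot and terminate them if needed:

    SELECT pid, datname, usename, state, backend_xid, backend_xmin
      FROM pg_stat_activity
     WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL;

    -- compare backend_xid / backend_xmin against the logged waitforxid,
    -- then, for a particular offending session:
    SELECT pg_terminate_backend(12345);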
>
>
> Yeah, makes sense. Updated.
>
>
>
> > > And if you try this with a temporary table (not
> hidden in transaction,
> > > so the bgworker can see it), the worker will fail
> with this:
> > >
> > > ERROR: cannot access temporary tables of other
> sessions
> > >
> > > But of course, this is just another way how to crash
> without updating
> > > the result for the launcher, so checksums may end up
> being enabled
> > > anyway.
> > >
> > >
> > > Yeah, there will be plenty of side-effect issues from that
> > > crash-with-wrong-status case. Fixing that will at least
> make things
> > > safer -- in that checksums won't be enabled when not put
> on all pages.
> > >
> >
> > Sure, the outcome with checksums enabled incorrectly is a
> consequence of
> > bogus status, and fixing that will prevent that. But that
> wasn't my main
> > point here - not articulated very clearly, though.
> >
> > The bigger question is how to handle temporary tables
> gracefully, so
> > that it does not terminate the bgworker like this at all.
> This might be
> > an even bigger issue than dropped relations, considering that
> temporary
> > tables are a pretty common part of applications (and it also
> includes
> > CREATE/DROP).
> >
> > For some clusters it might mean the online checksum
> enabling would
> > crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
> >
> > Unfortunately, try_relation_open() won't fix this, as the
> error comes
> > from ReadBufferExtended. And it's not a matter of simply
> creating a
> > ReadBuffer variant without that error check, because
> temporary tables
> > use local buffers.
> >
> > I wonder if we could just go and set the checksums anyway,
> ignoring the
> > local buffers. If the other session does some changes,
> it'll overwrite
> > our changes, this time with the correct checksums. But it
> seems pretty
> > dangerous (I mean, what if they're writing stuff while
> we're updating
> > the checksums? Considering the various short-cuts for
> temporary tables,
> > I suspect that would be a boon for race conditions.)
> >
> > Another option would be to do something similar to running
> transactions,
> > i.e. wait until all temporary tables (that we've seen at
> the beginning)
> > disappear. But we're starting to wait on more and more stuff.
> >
> > If we do this, we should clearly log which backends we're
> waiting for,
> > so that the admins can go and interrupt them manually.
> >
> >
> >
> > Yeah, waiting for all transactions at the beginning is pretty
> simple.
> >
> > Making the worker simply ignore temporary tables would also be
> easy.
> >
> > One of the bigger issues here is temporary tables are
> *session* scope
> > and not transaction, so we'd actually need the other session
> to finish,
> > not just the transaction.
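A quick illustration of that session-vs-transaction scope (sketch only, the
table name is made up):

    BEGIN;
    CREATE TEMP TABLE tmp_example (x int);
    COMMIT;
    -- tmp_example still exists after the commit; it only goes away when the
    -- session disconnects (or drops it), unless it was created ON COMMIT DROP.

So waiting for the transaction to end is not enough, the session itself has to
go away or drop the table.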
> >
> > I guess what we could do is something like this:
> >
> > 1. Don't process temporary tables in the checksumworker, period.
> > Instead, build a list of any temporary tables that existed
> when the
> > worker started in this particular database (basically anything
> that we
> > got in our scan). Once we have processed the complete
> database, keep
> > re-scanning pg_class until those particular tables are gone
> (search by oid).
> >
> > That means that any temporary tables that are created *while*
> we are
> > processing a database are ignored, but they should already be
> receiving
> > checksums.
> >
> > It definitely leads to a potential issue with long running
> temp tables.
> > But as long as we look at the *actual tables* (by oid), we
> should be
> > able to handle long-running sessions once they have dropped
> their temp
> > tables.
> >
> > Does that sound workable to you?
> >
>
> Yes, that's pretty much what I meant by 'wait until all
> temporary tables
> disappear'. Again, we need to make it easy to determine which
> OIDs we are
> waiting for, i.e. which sessions may need the DBA's attention.
>
> I don't think it makes sense to log OIDs of the temporary
> tables. There
> can be many of them, and in most cases the connection/session is
> managed
> by the application, so the only thing you can do is kill the
> connection.
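For the manual investigation, listing the remaining temp tables in the current
database is easy enough (illustrative only):

    SELECT c.oid, n.nspname, c.relname
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE c.relpersistence = 't';

The pg_temp_N schema name at least hints at which backend the table belongs
to, so the DBA can map it to a session and kill the connection.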
>
>
> Yeah, agreed. I think it makes sense to show the *number* of temp
> tables. That's also a predictable amount of information -- logging
> all temp tables may as you say lead to an insane amount of data.
>
> PFA a patch that does this. I've also added some docs for it.
>
> And I also noticed pg_verify_checksums wasn't installed, so fixed
> that too.
>
>
> PFA a rebase on top of the just committed verify-checksums patch.
>
This seems OK in terms of handling errors in the worker and passing them
to the launcher. I haven't managed to do any crash testing today, but
code-wise it seems sane.
It however still fails to initialize the attempts field after allocating
the db entry in BuildDatabaseList, so if you try running with
-DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:
WARNING: attempts = -1684366952
WARNING: attempts = 1010514489
WARNING: attempts = -1145390664
WARNING: attempts = 1162101570
I guess those are not the droids we're looking for?
Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:
test=# select pg_enable_data_checksums(10, 10);
ERROR: database "template0" does not allow connections
HINT: Allow connections using ALTER DATABASE and try again.
test=# alter database template0 allow_connections = true;
ALTER DATABASE
test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: already running
test=# select pg_disable_data_checksums();
pg_disable_data_checksums
---------------------------
(1 row)
test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: has been cancelled
At which point the only thing you can do is to restart the cluster,
which seems somewhat unnecessary. But perhaps it's intentional?
Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 123638b..b0d082a 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -257,7 +257,7 @@
</para>
<para>
- When attempting to recover from corrupt data it may be necessarily to bypass the checksum
+ When attempting to recover from corrupt data it may be necessary to bypass the checksum
protection in order to recover data. To do this, temporarily set the configuration parameter
<xref linkend="guc-ignore-checksum-failure" />.
</para>
@@ -287,15 +287,17 @@
be visible to the process enabling checksums. It will also, for each database,
wait for all pre-existing temporary tables to get removed before it finishes.
If long-lived temporary tables are used in the application it may be necessary
- to terminate these application connections to allow the checksummer process
- to complete.
+ to terminate these application connections to allow the process to complete.
+ Information about open transactions and connections with temporary tables is
+ written to the server log.
</para>
<para>
If the cluster is stopped while in <literal>inprogress</literal> mode, for
any reason, then this process must be restarted manually. To do this,
re-execute the function <function>pg_enable_data_checksums()</function>
- once the cluster has been restarted.
+ once the cluster has been restarted. It is not possible to resume the work;
+ the process has to start from scratch.
</para>
<note>
@@ -317,6 +319,7 @@
<para>
<literal>template0</literal> is by default not accepting connections, to
enable checksums you'll need to temporarily make it accept connections.
+ See <xref linkend="sql-alterdatabase" /> for details.
</para>
</note>
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 67b9cd8..b76b268 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -700,6 +700,11 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(xtime);
}
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
Datum
disable_data_checksums(PG_FUNCTION_ARGS)
{
@@ -718,6 +723,12 @@ disable_data_checksums(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
Datum
enable_data_checksums(PG_FUNCTION_ARGS)
{
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
index 84a8cc8..e1c34e8 100644
--- a/src/backend/postmaster/checksumhelper.c
+++ b/src/backend/postmaster/checksumhelper.c
@@ -653,6 +653,8 @@ BuildDatabaseList(void)
db->dboid = HeapTupleGetOid(tup);
db->dbname = pstrdup(NameStr(pgdb->datname));
+ elog(WARNING, "attempts = %d", db->attempts);
+ db->attempts = 0;
DatabaseList = lappend(DatabaseList, db);