On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <[email protected]> wrote:
>
> On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
> <[email protected]> wrote:
>
> On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> > <[email protected]> wrote:
> >
> > ...
> >
> > I do think just waiting for all running transactions to
> complete is
> > fine, and it's not the first place where we use it - CREATE
> SUBSCRIPTION
> > does pretty much exactly the same thing (and CREATE INDEX
> CONCURRENTLY
> > too, to some extent). So we have a precedent / working code we
> can copy.
> >
> >
> > Thinking again, I don't think it should be done as part of
> > BuildRelationList(). We should just do it once in the launcher
> before
> > starting, that'll be both easier and cleaner. Anything started after
> > that will have checksums on it, so we should be fine.
> >
> > PFA one that does this.
> >
>
> Seems fine to me. I'd however log waitforxid, not the oldest one. If
> you're a DBA and you want the checksumming to proceed,
> knowing
> the oldest running XID is useless for that. If we log
> waitforxid, it can
> be used to query pg_stat_activity and interrupt the sessions
> somehow.
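To illustrate (a rough sketch only, the pid below is just a placeholder for
whatever the query returns): with waitforxid in the log, the DBA can list the
sessions that still hold an older xid or snapshot and terminate them if needed:

    SELECT pid, datname, usename, state, backend_xid, backend_xmin
      FROM pg_stat_activity
     WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL;

    -- compare backend_xid / backend_xmin against the logged waitforxid,
    -- then, for a particular offending session:
    SELECT pg_terminate_backend(12345);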
>
>
> Yeah, makes sense. Updated.
>
>
>
> > > And if you try this with a temporary table (not
> hidden in transaction,
> > > so the bgworker can see it), the worker will fail
> with this:
> > >
> > > ERROR: cannot access temporary tables of other
> sessions
> > >
> > > But of course, this is just another way how to crash
> without updating
> > > the result for the launcher, so checksums may end up
> being enabled
> > > anyway.
> > >
> > >
> > > Yeah, there will be plenty of side-effect issues from that
> > > crash-with-wrong-status case. Fixing that will at least
> make things
> > > safer -- in that checksums won't be enabled when not put
> on all pages.
> > >
> >
> > Sure, the outcome with checksums enabled incorrectly is a
> consequence of
> > bogus status, and fixing that will prevent that. But that
> wasn't my main
> > point here - not articulated very clearly, though.
> >
> > The bigger question is how to handle temporary tables
> gracefully, so
> > that it does not terminate the bgworker like this at all.
> This might be
> > an even bigger issue than dropped relations, considering that
> temporary
> > tables are a pretty common part of applications (and it also
> includes
> > CREATE/DROP).
> >
> > For some clusters it might mean the online checksum
> enabling would
> > crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
> >
> > Unfortunately, try_relation_open() won't fix this, as the
> error comes
> > from ReadBufferExtended. And it's not a matter of simply
> creating a
> > ReadBuffer variant without that error check, because
> temporary tables
> > use local buffers.
> >
> > I wonder if we could just go and set the checksums anyway,
> ignoring the
> > local buffers. If the other session does some changes,
> it'll overwrite
> > our changes, this time with the correct checksums. But it
> seems pretty
> > dangerous (I mean, what if they're writing stuff while
> we're updating
> > the checksums? Considering the various short-cuts for
> temporary tables,
> > I suspect that would be a boon for race conditions.)
> >
> > Another option would be to do something similar to running
> transactions,
> > i.e. wait until all temporary tables (that we've seen at
> the beginning)
> > disappear. But we're starting to wait on more and more stuff.
> >
> > If we do this, we should clearly log which backends we're
> waiting for,
> > so that the admins can go and interrupt them manually.
> >
> >
> >
> > Yeah, waiting for all transactions at the beginning is pretty
> simple.
> >
> > Making the worker simply ignore temporary tables would also be
> easy.
> >
> > One of the bigger issues here is temporary tables are
> *session* scope
> > and not transaction, so we'd actually need the other session
> to finish,
> > not just the transaction.
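A quick illustration of that session-vs-transaction scope (sketch only, the
table name is made up):

    BEGIN;
    CREATE TEMP TABLE tmp_example (x int);
    COMMIT;
    -- tmp_example still exists after the commit; it only goes away when the
    -- session disconnects (or drops it), unless it was created ON COMMIT DROP.

So waiting for the transaction to end is not enough, the session itself has to
go away or drop the table.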
> >
> > I guess what we could do is something like this:
> >
> > 1. Don't process temporary tables in the checksumworker, period.
> > Instead, build a list of any temporary tables that existed
> when the
> > worker started in this particular database (basically anything
> that we
> > got in our scan). Once we have processed the complete
> database, keep
> > re-scanning pg_class until those particular tables are gone
> (search by oid).
> >
> > That means that any temporary tables that are created *while*
> we are
> > processing a database are ignored, but they should already be
> receiving
> > checksums.
> >
> > It definitely leads to a potential issue with long running
> temp tables.
> > But as long as we look at the *actual tables* (by oid), we
> should be
> > able to handle long-running sessions once they have dropped
> their temp
> > tables.
> >
> > Does that sound workable to you?
> >
>
> Yes, that's pretty much what I meant by 'wait until all
> temporary tables
> disappear'. Again, we need to make it easy to determine which
> OIDs we are
> waiting for, i.e. which sessions may need the DBA's attention.
>
> I don't think it makes sense to log OIDs of the temporary
> tables. There
> can be many of them, and in most cases the connection/session is
> managed
> by the application, so the only thing you can do is kill the
> connection.
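For the manual investigation, listing the remaining temp tables in the current
database is easy enough (illustrative only):

    SELECT c.oid, n.nspname, c.relname
      FROM pg_class c
      JOIN pg_namespace n ON n.oid = c.relnamespace
     WHERE c.relpersistence = 't';

The pg_temp_N schema name at least hints at which backend the table belongs
to, so the DBA can map it to a session and kill the connection.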
>
>
> Yeah, agreed. I think it makes sense to show the *number* of temp
> tables. That's also a predictable amount of information -- logging
> all temp tables may as you say lead to an insane amount of data.
>
> PFA a patch that does this. I've also added some docs for it.
>
> And I also noticed pg_verify_checksums wasn't installed, so fixed
> that too.
>
>
> PFA a rebase on top of the just committed verify-checksums patch.
>
This seems OK in terms of handling errors in the worker and passing them
to the launcher. I haven't managed to do any crash testing today, but
code-wise it seems sane.
It however still fails to initialize the attempts field after allocating
the db entry in BuildDatabaseList, so if you try running with
-DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:
WARNING: attempts = -1684366952
WARNING: attempts = 1010514489
WARNING: attempts = -1145390664
WARNING: attempts = 1162101570
I guess those are not the droids we're looking for?
Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:
test=# select pg_enable_data_checksums(10, 10);
ERROR: database "template0" does not allow connections
HINT: Allow connections using ALTER DATABASE and try again.
test=# alter database template0 allow_connections = true;
ALTER DATABASE
test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: already running
test=# select pg_disable_data_checksums();
pg_disable_data_checksums
---------------------------
(1 row)
test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: has been cancelled
At which point the only thing you can do is to restart the cluster,
which seems somewhat unnecessary. But perhaps it's intentional?
Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 123638b..b0d082a 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -257,7 +257,7 @@
</para>
<para>
- When attempting to recover from corrupt data it may be necessarily to bypass the checksum
+ When attempting to recover from corrupt data it may be necessary to bypass the checksum
protection in order to recover data. To do this, temporarily set the configuration parameter
<xref linkend="guc-ignore-checksum-failure" />.
</para>
@@ -287,15 +287,17 @@
be visible to the process enabling checksums. It will also, for each database,
wait for all pre-existing temporary tables to get removed before it finishes.
If long-lived temporary tables are used in the application it may be necessary
- to terminate these application connections to allow the checksummer process
- to complete.
+ to terminate these application connections to allow the process to complete.
+ Information about open transactions and connections with temporary tables is
+ written to the server log.
</para>
<para>
If the cluster is stopped while in <literal>inprogress</literal> mode, for
any reason, then this process must be restarted manually. To do this,
re-execute the function <function>pg_enable_data_checksums()</function>
- once the cluster has been restarted.
+ once the cluster has been restarted. It is not possible to resume the work;
+ the process has to start from scratch.
</para>
<note>
@@ -317,6 +319,7 @@
<para>
<literal>template0</literal> is by default not accepting connections, to
enable checksums you'll need to temporarily make it accept connections.
+ See <xref linkend="sql-alterdatabase" /> for details.
</para>
</note>
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 67b9cd8..b76b268 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -700,6 +700,11 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
PG_RETURN_DATUM(xtime);
}
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
Datum
disable_data_checksums(PG_FUNCTION_ARGS)
{
@@ -718,6 +723,12 @@ disable_data_checksums(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
Datum
enable_data_checksums(PG_FUNCTION_ARGS)
{
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
index 84a8cc8..e1c34e8 100644
--- a/src/backend/postmaster/checksumhelper.c
+++ b/src/backend/postmaster/checksumhelper.c
@@ -653,6 +653,8 @@ BuildDatabaseList(void)
db->dboid = HeapTupleGetOid(tup);
db->dbname = pstrdup(NameStr(pgdb->datname));
+ elog(WARNING, "attempts = %d", db->attempts);
+ db->attempts = 0;
DatabaseList = lappend(DatabaseList, db);