On Tue, Oct 17, 2017 at 12:06 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> In general, it may be acceptable to rely on the elog() checks - which is
> pretty much what the FETCH+INSERT+SHARE example I shared in the first
> message of this thread does. I agree it's not particularly convenient,
> and perhaps we should replace it with checks while planning the queries.

Those checks are very much not comprehensive, though.  For example,
consider commit e9baa5e9fa147e00a2466ab2c40eb99c8a700824, which
allowed heap_insert() in parallel mode.  Here's the comment:

+     * Parallel operations are required to be strictly read-only in a parallel
+     * worker.  Parallel inserts are not safe even in the leader in the
+     * general case, because group locking means that heavyweight locks for
+     * relation extension or GIN page locks will not conflict between members
+     * of a lock group, but we don't prohibit that case here because there are
+     * useful special cases that we can safely allow, such as CREATE TABLE AS.
+     */
+    if (IsParallelWorker())
+                 errmsg("cannot insert tuples in a parallel worker")));

Now, as things stand, there's no way for this code to be reached
except in the case where the inserts are targeting a new table, or at
least I hope there isn't, because that would be a bug.  But if we
applied your patch then it could be easily done: just start a parallel
cursor and then run an INSERT command.  It might take up a little work
to gin up a test case that shows this really crashing and burning, but
I'm very confident that it's possible.  I wouldn't have gone to the
trouble of installing checks to prevent inserts from running in
parallel mode if inserts were safe in parallel mode.

>> Also, I doubt this guarantees that we won't try to call
>> parallel-unsafe functions will parallel mode is active, so things
>> will just blow up in whatever way they do, maybe crashing the server
>> or rendering the database inconsistent or whatever.
> Probably. What happens right now is that declaring the cursor switches
> the whole transaction into parallel mode (EnterParallelMode), so if you
> do FETCH + INSERT + FETCH that will fail with elog(ERROR).

But, again, only in the limited cases that the elog() checks catch.  A
C function can be parallel-unsafe without doing anything that hits the
elog() checks; there is no way for the system to protect itself
against arbitrary C code.  The elog() checks are intended to catch the
common or likely ways of breaking the world, but arbitrarily C code
can work around those checks -- if nothing else, duplicate one of the
functions that has an elog() in it, remove the elog(), and then call
it.  You just did something parallel-safe, unchecked!  That's an
artificial example, of course, which is not likely to occur in
practice, but I'm pretty confident that there are non-artificial
examples also.

I think this is a more or less classic kind of programming problem -
you're proposing to take a set of checks that were intended to ensure
safety under a limited set of circumstances and generalize them to a
much broader context than the one for which they were designed.  They
will not be adequate to those circumstances, and a considerable amount
of analysis will be needed to figure out where the gaps are.  If
somebody wants to do that analysis, I'm willing to review it and try
to spot any holes, but I don't really want to spend the time to go do
the whole analysis myself.

The main points I want to make clearly understood is the current
design relies on (1) functions being labeled correctly and (2) other
dangerous code paths being unreachable because there's nothing that
runs between EnterParallelMode and ExitParallelMode which could invoke
them, except by calling a mislabeled function.  Your patch expands the
vulnerability surface from "executor code that can be reached without
calling a mislabeled function" to "any code that can be reached by
typing an SQL command".  Just rejecting any queries that are
parallel-unsafe probably closes a good chunk of the holes, but that
still leaves a lot of code that's never been run in parallel mode
before potentially now running in parallel mode - e.g. any DDL command
you happen to type, transaction control commands, code that only runs
when the server is idle like idle_in_transaction_timeout, cursor
operations.  A lot of that stuff is probably fine, but it's got to be
thought through.  Error handling might be a problem, too: what happens
if a parallel worker is killed while the query is suspended?  I
suspect that doesn't work very nicely at all.

One way to get some ideas about where the problem lies would be to
write a test patch that keeps parallel mode active at all times except
when running a query that contains something parallel-unsafe.  Then
run the regression tests and see if anything blows up.  That's not
going to find everything that needs to be fixed but it would be a good
experiment to try.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to