> On Aug 10, 2017, at 11:20 AM, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Wed, Jul 5, 2017 at 12:14 AM, Mark Dilger <hornschnor...@gmail.com> wrote:
>> I can understand this, but wonder if I could use something like
>>
>> FOR I TOTALLY PROMISE TO USE ALL ROWS rec IN EXECUTE sql LOOP
>> ...
>> END LOOP;
>
> Actually, what you'd need is:
>
> FOR I TOTALLY PROMISE TO USE ALL ROWS AND IT IS OK TO BUFFER THEM ALL
> IN MEMORY INSTEAD OF FETCHING THEM ONE AT A TIME FROM THE QUERY rec IN
> EXECUTE sql LOOP
>
> Similarly, RETURN QUERY could be made to work with parallelism if we
> had RETURN QUERY AND IT IS OK TO BUFFER ALL THE ROWS IN MEMORY TWICE
> INSTEAD OF ONCE.
>
> I've thought a bit about trying to make parallel query support partial
> execution, but it seems wicked hard. The problem is that you can't
> let the main process do anything parallel-unsafe (e.g., currently,
> write any data) while there are workers in existence, or things
> may blow up badly. You could think about fixing that problem by
> having all of the workers exit cleanly when the query is suspended,
> and then firing up new ones when the query is resumed. However, that
> presents two further problems: (1) having the workers exit cleanly
> when the query is suspended would cause wrong answers unless any
> tuples that a worker has implicitly claimed, e.g. by taking a page
> from a parallel scan and returning only some of the tuples on it, were
> somehow accounted for, and (2) starting and stopping workers over and
> over would be bad for performance. The second problem could be solved
> by having a persistent pool of workers that attach and detach instead
> of firing up new ones all the time, but that has a host of problems
> all of its own. The first one would be a desirable change for a bunch
> of reasons but is not easy, for reasons that are a little longer than I
> feel like explaining right now.
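To make the partial-execution case above concrete, here is a minimal PL/pgSQL sketch (the function name is invented for illustration): the loop may return before consuming all of the query's rows, which is exactly the situation the quoted text says parallel query cannot currently support.

```sql
-- Hypothetical example: a plain FOR ... IN EXECUTE loop that stops short.
-- PL/pgSQL fetches rows from the query incrementally, so if the body
-- returns early, the remaining rows are never fetched -- the
-- partial-execution case discussed above.
CREATE OR REPLACE FUNCTION first_row(sql text) RETURNS record AS $$
DECLARE
    rec record;
BEGIN
    FOR rec IN EXECUTE sql LOOP
        RETURN rec;  -- exits after the first row; the rest stay unfetched
    END LOOP;
    RETURN NULL;     -- query produced no rows
END;
$$ LANGUAGE plpgsql;
```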
That misses the point I was making. I was suggesting a syntax where the caller promises to use all rows without stopping short, and the database performance problems of having a bunch of parallel workers suspended in mid-query are simply the caller's problem if the caller does not honor the contract.

Maybe the ability to execute such queries would be limited to users who are granted a privilege for doing so, and the DBA can decide not to go around granting that privilege to anybody. Certainly if this is being used from within a stored procedure, the DBA can make certain to use it only in cases where no execution path exits the loop before completion, either because everything is wrapped up with try/catch syntax or because the underlying query does not call anything that might throw exceptions.

I'm not advocating for that now; you responded to a somewhat old email, so really I'm just making clear what I intended at the time.

mark

--
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers