On Mon, Nov 30, 2015 at 7:47 AM, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> "Asynchronous execution" is a feature to start substantial work
> of nodes before doing Exec*. This can reduce total startup time
> by folding startup time of multiple execution nodes. It is especially
> effective for joins or appends whose multiple children need a long
> time to start up.
> This patch does that by inserting another phase "Start*" between
> ExecInit* and Exec* to launch parallel processing including
> pgworker and FDWs before requesting the very first tuple of the
> result.

I have thought about this, too, but I'm not very convinced that this
is the right model.  In a typical case involving parallelism, you hope
to have the Gather node as close to the top of the plan tree as
possible.  Therefore, the start phase will not happen much before the
first execution of the node, and you don't get much benefit.
Moreover, I think that prefetching can be useful not only at the start
of the query - which is the only thing that your model supports - but
also in mid-query.  For example, consider an Append of two ForeignScan
nodes.  Ideally we'd like to return the results in the order that they
become available, rather than serially.  This model might help with
that for the first batch of rows you fetch, but not after that.

There are a couple of other problems here that are specific to this
example.  You get a benefit here because you've got two Gather nodes
that both get kicked off before we try to read tuples from either, but
that's generally something to avoid - you can only use 3 processes and
typically at most 2 of those will actually be running (as opposed to
sleeping) at the same time: the workers will run to completion, and
then the leader will wake up and do its thing.  I'm not saying our
current implementation of parallel query scales well to a large number
of workers (it doesn't), but I think that's more about improving the
implementation than any theoretical problem, so this seems a little
worse.  Also, currently, both merge and hash joins have an
optimization wherein if the outer side of the join turns out to be
empty, we avoid paying the startup cost for the inner side of the
join; kicking off the work on the inner side of the merge join
asynchronously before we've gotten any tuples from the outer side
loses the benefit of that optimization.
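
To make the empty-outer optimization concrete, here is a toy model in
Python rather than executor code.  The names hash_join, inner_factory,
and key are made up for illustration and do not correspond to anything
in the PostgreSQL source; the sketch only assumes that the inner side
has a startup cost we would like to skip when the outer side is empty.

```python
from itertools import chain


def hash_join(outer, inner_factory, key):
    """Toy hash join that defers inner-side startup (illustrative only).

    outer         -- iterable of outer rows
    inner_factory -- zero-argument callable that starts the inner scan
                     and returns an iterator of inner rows
    key           -- function extracting the join key from a row
    """
    outer_iter = iter(outer)
    try:
        # Ask the outer side for its first row before touching the inner.
        first = next(outer_iter)
    except StopIteration:
        # Outer side is empty: the inner side is never started.
        return

    # Only now pay the inner side's startup cost: build the hash table.
    table = {}
    for row in inner_factory():
        table.setdefault(key(row), []).append(row)

    # Probe with the first outer row and then the rest of the outer side.
    for outer_row in chain([first], outer_iter):
        for inner_row in table.get(key(outer_row), []):
            yield (outer_row, inner_row)


# Hypothetical usage: count how often the inner side gets started.
inner_starts = []


def start_inner():
    inner_starts.append(1)
    return iter([(1, "a"), (2, "b")])


empty_result = list(hash_join([], start_inner, key=lambda r: r[0]))
starts_after_empty = len(inner_starts)
match_result = list(hash_join([(1, "x")], start_inner, key=lambda r: r[0]))
```

If start_inner were instead called eagerly, before the first outer row
had been requested, the empty-outer case would pay for an inner scan
whose result is never used, which is the loss described above.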

I suspect there is no single paradigm that will help with all of the
cases where asynchronous execution is useful.  We're going to need a
series of changes that are targeted at specific problems.  For
example, here it would be useful to have one side of the join confirm
at the earliest possible stage that it will definitely return at least
one tuple eventually, but then return control to the caller so that we
can kick off the other side of the join.  The sort node never
eliminates anything, so as soon as the sequential scan underneath it
coughs up a tuple, we're definitely getting a return value eventually.
At that point it's safe to kick off the other Gather node.  I don't
quite know how to design a signalling system for that, but it could be

But is it important enough to be worthwhile?  Maybe, maybe not.  I
think we should be working toward a world where the Gather is at the
top of the plan tree as often as possible, in which case
asynchronously kicking off a Gather node won't be that exciting any
more - see notes on the "parallelism + sorting" thread where I talk
about primitives that would allow massively parallel merge joins,
rather than 2- or 3-way parallel.  From my point of view, the case
where we really need some kind of asynchronous execution solution is a
ForeignScan, and in particular a ForeignScan which is the child of an
Append.  In that case it's obviously really useful to be able to kick
off all the foreign scans and then return a tuple from whichever one
coughs it up first.  Is that the ONLY case where asynchronous
execution is useful?  Probably not, but I bet it's the big one.
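
As a rough illustration of the Append-over-ForeignScan case, here is a
minimal sketch in Python, not a proposal for the executor.  It assumes
each foreign scan can be modeled as a callable returning an iterator of
rows; async_append, slow_scan, and fast_scan are hypothetical names, and
threads stand in for whatever mechanism would really drive the remote
fetches.

```python
import queue
import threading
import time

_DONE = object()  # sentinel: one child scan has finished


def async_append(children):
    """Yield rows from all child scans in the order they arrive.

    children -- list of zero-argument callables, each returning an
                iterator of rows (a stand-in for a ForeignScan)
    """
    out = queue.Queue()

    def run(child):
        try:
            for row in child():
                out.put(row)
        finally:
            # Always report completion, even if the child raised.
            out.put(_DONE)

    # Kick off every child before asking any of them for a row.
    for child in children:
        threading.Thread(target=run, args=(child,), daemon=True).start()

    remaining = len(children)
    while remaining:
        item = out.get()  # blocks until some child produces something
        if item is _DONE:
            remaining -= 1
        else:
            yield item


def slow_scan():
    for i in range(3):
        time.sleep(0.05)  # pretend the remote server is slow
        yield ("slow", i)


def fast_scan():
    for i in range(3):
        yield ("fast", i)


rows = list(async_append([slow_scan, fast_scan]))
```

The consumer never waits on slow_scan while fast_scan has rows ready,
and this holds for every row rather than only for the first batch.
Rows from any one child still arrive in that child's own order; only
the interleaving between children depends on timing.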

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)