On Tue, Feb 10, 2015 at 2:48 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> Note that I'm not saying that Amit's patch is right - I haven't read it
> - but that I don't think a 'scan this range of pages' heapscan API
> would be a bad idea. Not even just for parallelism, but for a bunch of
> usecases.

We do have that, already: heap_setscanlimits(). I'm just not convinced
that it's the right way to split up a parallel scan. There's too much
risk of ending up with a very uneven distribution of work.
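For reference, a minimal sketch of how that API gets used, assuming the
relation is already open and locked and a snapshot is in hand;
scan_block_range() is a made-up helper here, not anything in the tree:

    #include "postgres.h"

    #include "access/heapam.h"

    /*
     * Scan only blocks [startBlk, startBlk + numBlks) of "rel".
     * heap_setscanlimits() must be called before the first
     * heap_getnext(), and synchronized scans must be disabled
     * (hence allow_sync = false below), since a sync scan can
     * start at an arbitrary block in the relation.
     */
    static void
    scan_block_range(Relation rel, Snapshot snapshot,
                     BlockNumber startBlk, BlockNumber numBlks)
    {
        HeapScanDesc scan;
        HeapTuple    tuple;

        /* allow_strat = true, allow_sync = false */
        scan = heap_beginscan_strat(rel, snapshot, 0, NULL, true, false);
        heap_setscanlimits(scan, startBlk, numBlks);

        while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
        {
            /* process one tuple from the requested block range */
        }

        heap_endscan(scan);
    }

Handing each worker a different fixed block range through something like
that is exactly the static partitioning that worries me: the worker that
draws the expensive pages finishes long after the others.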
>> Regarding tuple flow between backends, I've thought about that before,
>> I agree that we need it, and I don't think I know how to do it. I can
>> see how to have a group of processes executing a single node in
>> parallel, or a single process executing a group of nodes we break off
>> from the query tree and push down to it, but what you're talking about
>> here is a group of processes executing a group of nodes jointly.
>
> I don't think it really is that. I think you'd do it essentially by
> introducing a couple more nodes. Something like
>
>                SomeUpperLayerNode
>                        |
>                        |
>                 AggCombinerNode
>                /               \
>               /                 \
>              /                   \
>   PartialHashAggNode     PartialHashAggNode .... PartialHashAggNode ...
>            |                      |
>            |                      |
>            |                      |
>            |                      |
>     PartialSeqScan         PartialSeqScan
>
> The only thing that might need to end up working jointly would be the
> block selection of the individual PartialSeqScans, to avoid having to
> wait too long for stragglers. E.g. each might just ask for a range of
> 16 megabytes or so that it scans sequentially.
>
> In such a plan - a pretty sensible and not that uncommon thing for
> parallelized aggregates - you'd need to be able to tell the heap scans
> which blocks to scan. Right?

For this case, what I would imagine is that there is one parallel heap
scan, and each PartialSeqScan attaches to it. The executor says "give me
a tuple" and heapam.c provides one. Details like the chunk size are
managed down inside heapam.c, and the executor does not know about them.
It just knows that it can establish a parallel scan and then pull tuples
from it.

>> Maybe we designate nodes as can-generate-multiple-tuple-streams (seq
>> scan, mostly, I would think) and can-absorb-parallel-tuple-streams
>> (sort, hash, materialize), or something like that, but I'm really
>> fuzzy on the details.
>
> I don't think we really should have individual nodes that produce
> multiple streams - that seems like it'd end up being really
> complicated. I'd more say that we have distinct nodes (like the
> PartialSeqScan ones above) that do a teensy bit of coordination about
> which work to perform.

I think we're in violent agreement here, except for some terminological
confusion. Are there N PartialSeqScan nodes, one running in each worker,
or is there one ParallelSeqScan node, which is copied and run jointly
across N workers? You can talk about it either way and have it make
sense, but we haven't had enough conversations about this on this list
yet to have settled on a consistent vocabulary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company