On Tue, Feb 10, 2015 at 2:48 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> Note that I'm not saying that Amit's patch is right - I haven't read it
> - but that I don't think a 'scan this range of pages' heapscan API
> would be a bad idea. Not even just for parallelism, but for a bunch of
> usecases.

We do have that, already: heap_setscanlimits(). I'm just not convinced
that it's the right way to split up a parallel scan. There's too much
risk of ending up with a very uneven distribution of work.
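For reference, a minimal sketch of how that API gets used, assuming the
relation is already open and locked and a snapshot is in hand;
scan_block_range() is a made-up helper here, not anything in the tree:

    #include "postgres.h"

    #include "access/heapam.h"

    /*
     * Scan only blocks [startBlk, startBlk + numBlks) of "rel".
     * heap_setscanlimits() must be called before the first
     * heap_getnext(), and synchronized scans must be disabled
     * (hence allow_sync = false below), since a sync scan can
     * start at an arbitrary block in the relation.
     */
    static void
    scan_block_range(Relation rel, Snapshot snapshot,
                     BlockNumber startBlk, BlockNumber numBlks)
    {
        HeapScanDesc scan;
        HeapTuple    tuple;

        /* allow_strat = true, allow_sync = false */
        scan = heap_beginscan_strat(rel, snapshot, 0, NULL, true, false);
        heap_setscanlimits(scan, startBlk, numBlks);

        while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
        {
            /* process one tuple from the requested block range */
        }

        heap_endscan(scan);
    }

Handing each worker a different fixed block range through something like
that is exactly the static partitioning that worries me: the worker that
draws the expensive pages finishes long after the others.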
>> Regarding tuple flow between backends, I've thought about that before,
>> I agree that we need it, and I don't think I know how to do it. I can
>> see how to have a group of processes executing a single node in
>> parallel, or a single process executing a group of nodes we break off
>> from the query tree and push down to it, but what you're talking about
>> here is a group of processes executing a group of nodes jointly.
>
> I don't think it really is that. I think you'd do it essentially by
> introducing a couple more nodes. Something like
>
>                SomeUpperLayerNode
>                        |
>                        |
>                 AggCombinerNode
>                /               \
>               /                 \
>              /                   \
>   PartialHashAggNode     PartialHashAggNode .... PartialHashAggNode ...
>            |                      |
>            |                      |
>            |                      |
>            |                      |
>     PartialSeqScan         PartialSeqScan
>
> The only thing that might need to end up working jointly would be the
> block selection of the individual PartialSeqScans, to avoid having to
> wait too long for stragglers. E.g. each might just ask for a range of
> 16 megabytes or so that it scans sequentially.
>
> In such a plan - a pretty sensible and not that uncommon thing for
> parallelized aggregates - you'd need to be able to tell the heap scans
> which blocks to scan. Right?

For this case, what I would imagine is that there is one parallel heap
scan, and each PartialSeqScan attaches to it. The executor says "give me
a tuple" and heapam.c provides one. Details like the chunk size are
managed down inside heapam.c, and the executor does not know about them.
It just knows that it can establish a parallel scan and then pull tuples
from it.

>> Maybe we designate nodes as can-generate-multiple-tuple-streams (seq
>> scan, mostly, I would think) and can-absorb-parallel-tuple-streams
>> (sort, hash, materialize), or something like that, but I'm really
>> fuzzy on the details.
>
> I don't think we really should have individual nodes that produce
> multiple streams - that seems like it'd end up being really
> complicated. I'd more say that we have distinct nodes (like the
> PartialSeqScan ones above) that do a teensy bit of coordination about
> which work to perform.

I think we're in violent agreement here, except for some terminological
confusion. Are there N PartialSeqScan nodes, one running in each worker,
or is there one ParallelSeqScan node, which is copied and run jointly
across N workers? You can talk about it either way and have it make
sense, but we haven't had enough conversations about this on this list
yet to have settled on a consistent vocabulary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company