On Wed, Jun 29, 2016 at 11:00 AM, Amit Khandekar <amitdkhan...@gmail.com> wrote:
> We may also want to consider handling abstract events such as
> "tuples-are-available-at-plan-node-X".
> One benefit is : we can combine this with batch processing. For e.g. in case
> of an Append node containing foreign scans, its parent node may not want to
> process the Append node result until Append is ready with at least 1000
> rows. In that case, Append node needs to raise an event
> "n-tuples-are-ready"; we cannot just rely on fd-ready events. fd-ready event
> will wake up the foreign scan, but it may not eventually cause its parent
> Append node to in turn wake up its parent.

Right, I agree.  I think this case only arises in parallel query.  In
serial execution, there's not really any way for a plan node to just
become ready other than an FD or latch event or the parent becoming
ready.  But in parallel query it can happen, of course, because some
other backend can do work that makes that node ready to produce
tuples.

It's not necessarily the case that we have to deal with this in the
initial patches for this feature, because the most obvious win for
this sort of thing is when we have an Append of ForeignScan plans.
Sure, parallel query has interesting cases, too, but a prototype that
just handles Append over a bunch of postgres_fdw ForeignScans would be
pretty cool.  I suggest that we make that the initial goal here.

> How we can do this event abstraction is the other question. We can have one
> latch for each of the event, and each node would raise its own event by
> setting the corresponding latch. But I am not sure about latches within a
> single process as against one process waking up another process. Or else,
> some internal event structures needs to be present (in estate ?), which then
> ExecProcNode would use when it does the event looping to wake up (i.e.
> execute) required nodes.

I think adding more latches would be a bad idea.  What I think we
should do instead is add two additional data structures to dynamic
shared memory:

1. An array of PGPROC * pointers for all of the workers.  Processes
add their PGPROC * to this array as they start up.  Then, parallel.h
can expose a new API, ParallelWorkerSetLatchesForGroup(void).  In the
leader, this sets the latch for every worker process in every
parallel context with which the leader is associated; in a worker, it
sets the latch for the other processes in the parallel group, and for
the leader as well.

2. An array of executor nodes where one process might do something
that allows other processes to make progress on that node.  This would
be set up by execParallel.c, which would somehow need to figure out
which plan nodes want to be included in the list.  When an
executor node does something that might unblock other workers, it
calls ParallelWorkerSetLatchesForGroup() and the async stuff then
tries calling all of the nodes in this array again to see if any of
them now think that they've got tuples to return (or just to let them
do additional work without returning tuples).

> Also, the WaitEvent.user_data field can have some more info besides the plan
> state. It can have its parent PlanState stored, so that we don't have to
> have parent field in plan state. It also can have some more data such as
> "n-tuples-available".

I don't think that works, because execution may need to flow
arbitrarily far up the tree.  Just knowing the immediate parent isn't
good enough.  If it generates a tuple, then you have to in turn call
its parent, and if that one then produces a tuple, you have to continue
even further up the tree.  I think it's going to be very awkward to
make this work without those parent pointers.

BTW, we also need to benchmark those changes to add the parent
pointers and change the return conventions and see if they have any
measurable impact on performance.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)