On Wed, Jun 29, 2016 at 11:00 AM, Amit Khandekar <amitdkhan...@gmail.com> wrote: > We may also want to consider handling abstract events such as > "tuples-are-available-at-plan-node-X". > > One benefit is : we can combine this with batch processing. For e.g. in case > of an Append node containing foreign scans, its parent node may not want to > process the Append node result until Append is ready with at least 1000 > rows. In that case, Append node needs to raise an event > "n-tuples-are-ready"; we cannot just rely on fd-ready events. fd-ready event > will wake up the foreign scan, but it may not eventually cause its parent > Append node to in turn wake up it's parent.
Right, I agree. I think this case only arises in parallel query. In serial execution, there's not really any way for a plan node to just become ready other than an FD or latch event or the parent becoming ready. But in parallel query it can happen, of course, because some other backend can do work that makes that node ready to produce tuples. It's not necessarily the case that we have to deal with this in the initial patches for this feature, because the most obvious win for this sort of thing is when we have an Append of ForeignScan plans. Sure, parallel query has interesting cases, too, but a prototype that just handles Append over a bunch of postgres_fdw ForeignScans would be pretty cool. I suggest that we make that the initial goal here. > How we can do this event abstraction is the other question. We can have one > latch for each of the event, and each node would raise its own event by > setting the corresponding latch. But I am not sure about latches within a > single process as against one process waking up another process. Or else, > some internal event structures needs to be present (in estate ?), which then > ExecProcNode would use when it does the event looping to wake up (i.e. > execute) required nodes. I think adding more latches would be a bad idea. What I think we should do instead is add two additional data structures to dynamic shared memory: 1. An array of PGPROC * pointers for all of the workers. Processes add their PGPROC * to this array as they start up. Then, parallel.h can expose new API ParallelWorkerSetLatchesForGroup(void). In the leader, this sets the latch for every worker process for every parallel context with which the leader is associated; in a worker, it sets the latch for other processes in the parallel group, and the leader also. 2. An array of executor nodes where one process might do something that allows other processes to make progress on that node. This would be set up somehow by execParallel.c, which would need to somehow figure out which plan nodes want to be included in the list. When an executor node does something that might unblock other workers, it calls ParallelWorkerSetLatchesForGroup() and the async stuff then tries calling all of the nodes in this array again to see if any of them now think that they've got tuples to return (or just to let them do additional work without returning tuples). > Also, the WaitEvent.user_data field can have some more info besides the plan > state. It can have its parent PlanState stored, so that we don't have to > have parent field in plan state. It also can have some more data such as > "n-tuples-available". I don't think that works, because execution may need to flow arbitrarily far up the tree. Just knowing the immediate parent isn't good enough. If it generates a tuple, then you have to in turn call it's parent, and that one then produces a tuple, you have to continue on even further up the tree. I think it's going to be very awkward to make this work without those parent pointers. BTW, we also need to benchmark those changes to add the parent pointers and change the return conventions and see if they have any measurable impact on performance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers