Thank you for the thought.
At Fri, 23 Sep 2016 21:09:03 -0400, Robert Haas <robertmh...@gmail.com> wrote in <ca+tgmoaxqet4tz03ftqhnzedemzbck+lrni0uwhvvgotna6...@mail.gmail.com> > [ Adjusting subject line to reflect the actual topic of discussion better. ] > > On Fri, Sep 23, 2016 at 9:29 AM, Robert Haas <robertmh...@gmail.com> wrote: > > On Fri, Sep 23, 2016 at 8:45 AM, Amit Khandekar <amitdkhan...@gmail.com> > > wrote: > >> For e.g., in the above plan which you specified, suppose : > >> 1. Hash Join has called ExecProcNode() for the child foreign scan b, and so > >> is > >> waiting in ExecAsyncWaitForNode(foreign_scan_on_b). > >> 2. The event wait list already has foreign scan on a that is on a different > >> subtree. > >> 3. This foreign scan a happens to be ready, so in > >> ExecAsyncWaitForNode (), ExecDispatchNode(foreign_scan_a) is called, > >> which returns with result_ready. > >> 4. Since it returns result_ready, it's parent node is now inserted in the > >> callbacks array, and so it's parent (Append) is executed. > >> 5. But, this Append planstate is already in the middle of executing Hash > >> join, and is waiting for HashJoin. > > > > Ah, yeah, something like that could happen. I've spent much of this > > week working on a new design for this feature which I think will avoid > > this problem. It doesn't work yet - in fact I can't even really test > > it yet. But I'll post what I've got by the end of the day today so > > that anyone who is interested can look at it and critique. > > Well, I promised to post this, so here it is. It's not really working > all that well at this point, and it's definitely not doing anything > that interesting, but you can see the outline of what I have in mind. > Since Kyotaro Horiguchi found that my previous design had a > system-wide performance impact due to the ExecProcNode changes, I > decided to take a different approach here: I created an async > infrastructure where both the requestor and the requestee have to be > specifically modified to support parallelism, and then modified Append > and ForeignScan to cooperate using the new interface. Hopefully that > means that anything other than those two nodes will suffer no > performance impact. Of course, it might have other problems.... The previous framework didn't need to distinguish async-capable and uncapable nodes from the parent node's view. The things in ExecProcNode was required for the reason. Instead, this new one removes the ExecProcNode stuff by distinguishing the two kinds of node in async-aware parents, that is, Append. This no longer involves async-unaware nodes into the tuple bubbling-up mechanism so the reentrant problem doesn't seem to occur. On the other hand, for example, the following plan, regrardless of its practicality, (there should be more good example..) (Async-unaware node) - NestLoop - Append - n * ForegnScan - Append - n * ForegnScan If the NestLoop, Append are async-aware, all of the ForeignScans can run asynchronously with the previous framework. The topmost NestLoop will be awakened after that firing of any ForenScans makes a tuple bubbles up to the NestLoop. This is because the not-need-to-distinguish-aware-or-not nature provided by the ExecProcNode stuff. On the other hand, with the new one, in order to do the same thing, ExecAppend have in turn to behave differently whether the parent is async or not. To do this will be bothersome but not with confidence. I examine this further intensively, especially for performance degeneration and obstacles to complete this. > Some notes: > > - EvalPlanQual rechecks are broken. > - EXPLAIN ANALYZE instrumentation is broken. > - ExecReScanAppend is broken, because the async stuff needs some way > of canceling an async request and I didn't invent anything like that > yet. > - The postgres_fdw changes pretend to be async but aren't actually. > It's just a demo of (part of) the interface at this point. > - The postgres_fdw changes also report all pg-fdw paths as > async-capable, but actually the direct-modify ones aren't, so the > regression tests fail. > - Errors in the executor can leak the WaitEventSet. Probably we need > to modify ResourceOwners to be able to own WaitEventSets. > - There are probably other bugs, too. > > Whee! > > Note that I've tried to solve the re-entrancy problems by (1) putting > all of the event loop's state inside the EState rather than in local > variables and (2) having the function that is called to report arrival > of a result be thoroughly different than the function that is used to > return a tuple to a synchronous caller. > > Comments welcome, if you're feeling brave enough to look at anything > this half-baked. -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers