On Mon, Dec 14, 2015 at 3:34 AM, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> Yes, the most significant and obvious (but hard to estimate the
> benefit) target of async execution is (Merge)Append-ForeignScan,
> which is narrow but frequently used. And this patch has started
> from it.
>
> It is because of the startup-heavy nature of FDW. So I involved
> sort as a target later, then redesigned to give the ability on all
> nodes. If it is obviously over-done for the (currently) expected
> benefit and if it is preferable to shrink this patch so as to
> touch only the portion where async-exec has a benefit, I'll do
> so.
Suppose we equip each EState with the ability to fire "callbacks". Callbacks have the signature:

    typedef bool (*ExecCallback)(PlanState *planstate,
                                 TupleTableSlot *slot,
                                 void *context);

Executor nodes can register immediate callbacks to be run at the earliest possible opportunity using a function like ExecRegisterCallback(estate, callback, planstate, slot, context). They can register deferred callbacks that will be called when a file descriptor becomes ready for I/O, or when the process latch is set, using a call like ExecRegisterFileCallback(estate, fd, event, callback, planstate, slot, context) or ExecRegisterLatchCallback(estate, callback, planstate, slot, context).

To execute callbacks, an executor node can call ExecFireCallbacks(), which will fire immediate callbacks in order of registration and, when no immediate callbacks remain but deferred callbacks do, wait on the file descriptors for which callbacks have been registered and on the process latch. It will return when (1) there are no remaining immediate or deferred callbacks or (2) one of the callbacks returns "true".

Then, suppose we add a function:

    bool ExecStartAsync(PlanState *target, ExecCallback callback,
                        PlanState *cb_planstate, void *cb_context);

For non-async-aware plan nodes, this just returns false. Async-aware plan nodes should initiate some work, register some callbacks, and return. The callbacks that get registered should arrange in turn to register the callback passed as an argument when a tuple becomes available, passing the planstate and context provided by ExecStartAsync's caller, plus the TupleTableSlot containing the tuple.

So, in response to ExecStartAsync, if there's no tuple currently available, postgres_fdw can send a query to the remote server and request a callback when the fd becomes read-ready. It must save the callback passed to ExecStartAsync inside the PlanState someplace so that when a tuple becomes available it can register that callback.
ExecAppend can call ExecStartAsync on each of its subplans. For any subplan where ExecStartAsync returns false, ExecAppend will just execute it normally, by calling ExecProcNode repeatedly until no more tuples are returned. But for async-capable subplans, it can call ExecStartAsync on all of them and then call ExecFireCallbacks. The tuple-ready callback it passes to its child plans will take the tuple provided by the child plan and store it into the Append node's slot. It will then return true if, and only if, ExecFireCallbacks is being invoked from ExecAppend (which it can figure out via some kind of signalling, either through its own PlanState or centralized signalling through the EState). That way, if ExecAppend were itself invoked asynchronously, its tuple-ready callback could simply populate a slot appropriately and register its invoker's tuple-ready callback. Whether called synchronously or asynchronously, each invocation of an asynchronous append after the first would just need to call ExecStartAsync again on the child that last returned a tuple.

It seems pretty straightforward to fit Gather into this infrastructure.

It is unclear to me how useful this is beyond ForeignScan, Gather, and Append. MergeAppend's ordering constraint makes it less useful; we can asynchronously kick off the request for the next tuple before returning the previous one, but we're going to need to have that tuple before we can return the next one. But it could be done. It could potentially even be applied to seq scans or index scans using some set of asynchronous I/O interfaces, but I don't see how it could be applied to joins or aggregates, which typically can't really proceed until they get the next tuple. They could be plugged into this interface easily enough, but it would only help to the extent that it enabled asynchrony elsewhere in the plan tree to be pulled up towards the root.

Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers