On Mon, Dec 14, 2015 at 3:34 AM, Kyotaro HORIGUCHI
<horiguchi.kyot...@lab.ntt.co.jp> wrote:
> Yes, the most significant and obvious (but hard to estimate the
> benefit) target of async execution is (Merge)Append-ForeignScan,
> which is narrow but frequently used.  And this patch has started
> from it.
> It is because of the startup-heavy nature of FDW. So I involved
> sort as a target later, then redesigned to give the capability to all
> nodes.  If it is obviously over-done for the (currently) expected
> benefit and if it is preferable to shrink this patch so as to
> touch only the portion where async-exec has a benefit, I'll do
> so.

Suppose we equip each EState with the ability to fire "callbacks".
Callbacks have the signature:

/* Return true to make ExecFireCallbacks stop and return to its caller. */
typedef bool (*ExecCallback)(PlanState *planstate, TupleTableSlot *slot,
                             void *context);

Executor nodes can register immediate callbacks to be run at the
earliest possible opportunity using a function like
ExecRegisterCallback(estate, callback, planstate, slot, context).
They can register deferred callbacks that will be called when a file
descriptor becomes ready for I/O, or when the process latch is set,
using a call like ExecRegisterFileCallback(estate, fd, event,
callback, planstate, slot, context) or
ExecRegisterLatchCallback(estate, callback, planstate, slot, context).
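
To make that concrete, here's a rough sketch of what the registration
structures and entry points might look like.  Everything below is
illustrative of the proposal, not existing code; the ExecCallbackItem
layout in particular is just one way it could be done:

typedef struct ExecCallbackItem
{
    ExecCallback callback;      /* function to invoke */
    PlanState  *planstate;      /* node that registered the callback */
    TupleTableSlot *slot;       /* slot to pass through, may be NULL */
    void       *context;        /* registrant-private state */
    int         fd;             /* for file callbacks, else -1 */
    int         events;         /* e.g. WL_SOCKET_READABLE */
    bool        immediate;      /* run at earliest opportunity? */
} ExecCallbackItem;

extern void ExecRegisterCallback(EState *estate, ExecCallback callback,
                                 PlanState *planstate, TupleTableSlot *slot,
                                 void *context);
extern void ExecRegisterFileCallback(EState *estate, int fd, int events,
                                     ExecCallback callback,
                                     PlanState *planstate,
                                     TupleTableSlot *slot, void *context);
extern void ExecRegisterLatchCallback(EState *estate, ExecCallback callback,
                                      PlanState *planstate,
                                      TupleTableSlot *slot, void *context);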

To execute callbacks, an executor node can call ExecFireCallbacks(),
which will fire immediate callbacks in order of registration; when no
immediate callbacks remain but deferred callbacks do, it will wait on
the file descriptors for which callbacks have been registered and on
the process latch.  It will return when (1) no immediate or deferred
callbacks remain or (2) one of the callbacks returns "true".
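
In sketch form, that loop could be shaped like this.  It assumes the
EState carries two hypothetical lists of ExecCallbackItems, and to stay
short it waits on only the first deferred fd, where a real
implementation would multiplex across all of them:

bool
ExecFireCallbacks(EState *estate)
{
    for (;;)
    {
        /* Fire immediate callbacks in order of registration. */
        while (estate->es_immediate_callbacks != NIL)
        {
            ExecCallbackItem *item = linitial(estate->es_immediate_callbacks);

            estate->es_immediate_callbacks =
                list_delete_first(estate->es_immediate_callbacks);
            if (item->callback(item->planstate, item->slot, item->context))
                return true;        /* a callback asked us to stop */
        }

        /* Nothing immediate left; done if nothing is deferred either. */
        if (estate->es_deferred_callbacks == NIL)
            return false;

        /* Sleep until the fd is ready or the latch is set, then move
         * the affected deferred callbacks to the immediate list. */
        WaitLatchOrSocket(MyLatch, WL_LATCH_SET | WL_SOCKET_READABLE,
                          ((ExecCallbackItem *)
                           linitial(estate->es_deferred_callbacks))->fd, -1L);
        ResetLatch(MyLatch);
        promote_ready_callbacks(estate);    /* hypothetical helper */
    }
}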

Then, suppose we add a function bool ExecStartAsync(PlanState *target,
ExecCallback callback, PlanState *cb_planstate, void *cb_context).
For non-async-aware plan nodes, this just returns false.  Async-aware
plan nodes should initiate some work, register some callbacks, and
return.  The callbacks that get registered should in turn arrange to
register the callback passed as an argument when a tuple becomes
available, passing the planstate and context provided by
ExecStartAsync's caller, plus the TupleTableSlot containing the tuple.
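
The entry point itself could just dispatch on node type, refusing for
anything that hasn't been taught about asynchrony (the per-node
functions named here are hypothetical):

bool
ExecStartAsync(PlanState *target, ExecCallback callback,
               PlanState *cb_planstate, void *cb_context)
{
    switch (nodeTag(target))
    {
        case T_ForeignScanState:
            return ExecForeignScanStartAsync((ForeignScanState *) target,
                                             callback, cb_planstate,
                                             cb_context);
        case T_AppendState:
            return ExecAppendStartAsync((AppendState *) target,
                                        callback, cb_planstate, cb_context);
        default:
            return false;   /* not async-aware; caller runs it synchronously */
    }
}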

So, in response to ExecStartAsync, if there's no tuple currently
available, postgres_fdw can send a query to the remote server and
request a callback when the fd becomes read-ready.  It must save the
callback passed to ExecStartAsync inside the PlanState someplace so
that when a tuple becomes available it can register that callback.
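
A sketch of how postgres_fdw might do that follows.  The fields added
to PgFdwScanState, the fetch_sql string, and the postgres_fdw_fd_ready
callback are all hypothetical; PQsendQuery and PQsocket are the
existing libpq calls:

static bool
postgresStartAsync(ForeignScanState *node, ExecCallback callback,
                   PlanState *cb_planstate, void *cb_context)
{
    PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
    EState     *estate = node->ss.ps.state;

    /* Save the caller's tuple-ready callback for when data arrives. */
    fsstate->tuple_ready_cb = callback;
    fsstate->tuple_ready_ps = cb_planstate;
    fsstate->tuple_ready_ctx = cb_context;

    /* Kick off the remote fetch without waiting for the result ... */
    if (!PQsendQuery(fsstate->conn, fsstate->fetch_sql))
        elog(ERROR, "could not dispatch remote query");

    /* ... and ask to be called back when the socket becomes read-ready. */
    ExecRegisterFileCallback(estate, PQsocket(fsstate->conn),
                             WL_SOCKET_READABLE, postgres_fdw_fd_ready,
                             (PlanState *) node, NULL, NULL);
    return true;
}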

ExecAppend can call ExecStartAsync on each of its subplans.  Any
subplan for which ExecStartAsync returns false, ExecAppend just
executes normally, calling ExecProcNode repeatedly until no more
tuples are returned.  Once the async-capable subplans have all been
started, it can then call ExecFireCallbacks.  The
tuple-ready callback it passes to its child plans will take the tuple
provided by the child plan and store it into the Append node's slot.
It will then return true if, and only if, ExecFireCallbacks is being
invoked from ExecAppend (which it can figure out via some kind of
signalling either through its own PlanState or centralized signalling
through the EState).  That way, if ExecAppend were itself invoked
asynchronously, its tuple-ready callback could simply populate a slot
appropriately and register its invoker's tuple-ready callback.
Whether called synchronously or asynchronously, each invocation of an
asynchronous Append after the first would just need to call
ExecStartAsync again on the child that last returned a tuple.
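
Pulling that together, the tuple-ready callback Append passes down
might look something like this (the as_* fields are hypothetical
additions to AppendState):

static bool
append_tuple_ready(PlanState *child, TupleTableSlot *slot, void *context)
{
    AppendState *node = (AppendState *) context;

    /* Remember which child produced the tuple, so the next cycle can
     * restart just that child, and stash its tuple in our slot. */
    node->as_last_async_child = child;
    ExecCopySlot(node->ps.ps_ResultTupleSlot, slot);

    /* Synchronous case: tell ExecFireCallbacks to return to ExecAppend. */
    if (node->as_inside_fire_callbacks)
        return true;

    /* Asynchronous case: pass the tuple upward by registering our own
     * invoker's tuple-ready callback. */
    ExecRegisterCallback(node->ps.state, node->as_parent_cb,
                         node->as_parent_ps, node->ps.ps_ResultTupleSlot,
                         node->as_parent_ctx);
    return false;
}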

It seems pretty straightforward to fit Gather into this infrastructure.

It is unclear to me how useful this is beyond ForeignScan, Gather, and
Append.  MergeAppend's ordering constraint makes it less useful; we
can asynchronously kick off the request for the next tuple before
returning the previous one, but we're going to need to have that tuple
before we can return the next one.  But it could be done.  It could
potentially even be applied to seq scans or index scans using some set
of asynchronous I/O interfaces, but I don't see how it could be
applied to joins or aggregates, which typically can't really proceed
until they get the next tuple.  They could be plugged into this
interface easily enough but it would only help to the extent that it
enabled asynchrony elsewhere in the plan tree to be pulled up towards
the root.


Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
