Hanada-san,

Thanks for your dedicated reviewing.

It's a very long message. So, let me summarize the things
I shall do in the next patch.

* fix bug: custom-plan class comparison
* fix up naming convention and syntax
  CREATE CUSTOM PLAN PROVIDER, rather than
  CREATE CUSTOM PLAN. Prefix shall be "cpp_".
* fix up: definition of get_custom_plan_oid()
* fix up: unexpected white spaces, to be tabs.
* fix up: remove unnecessary forward declarations.
* fix up: revert replace_nestloop_params() to static
* make SetCustomPlanRef an optional callback
* fix up: typos in various points
* add documentation to explain custom-plan interface.

Also, I want committer's opinion about the issues below
* whether set_cheapest() is called for all relkind?
* how argument of add_path handler shall be derivered?

Individual comments are put below:

> Kaigai-san,
> 
> Sorry for lagged response.
> 
> Here are my  random thoughts about the patch.  I couldn't understand the
> patch fully, because some of APIs are not used by ctidscan.  If
> 
> Custom Scan patch v2 review
> 
> * Custom plan class comparison
> In backend/optimizer/util/pathnode.c, custclass is compared by bit-and
> with 's'.  Do you plan to use custclass as bit field?  If so, values for
> custom plan class should not be a character.  Otherwise, custclass should
> be compared by == operator.
> 
Sorry, it is a bug that come from the previous design.
I had an idea that allows a custom plan provider to support multiple kind
of exec nodes, however, I concluded it does not make sense so much. (we
can define multiple CPP for each)

> * Purpose of GetSpecialCustomVar()
> The reason why FinalizeCustomPlan callback is necessary is not clear to
> me.
> Could you show a case that the API would be useful?
> 
It is needed feature to replace a built-in join by custom scan, however,
it might be unclear on the scan workloads.

Let me explain why join replacement needed. A join node has two input
slot (inner and outer), its expression node including Var node reference
either of slot according to its varno (INNER_VAR or OUTER_VAR).
In case when a CPP replaced a join, it has to generate an equivalent result
but it may not be a best choice to use two input streams.
(Please remind when we construct remote join on postgres_fdw, all the
materialization was done on remote side, thus we had one input stream to
generate local join equivalent view.)
On the other hands, EXPLAIN command has to understand what column is the
source of varnodes in targetlist of custom-node even if it is rewritten
to use just one slot. For example, which label shall be shown in case when
3rd item of targetlist is originally come from 2nd item of inner slot but
all the materialized result is stored into outer slot.
Only CPP can track its relationship between the original and the newer one.
This interface provides a way to solve a varnode that actually references.

> * Purpose of FinalizeCustomPlan()
> The reason why FinalizeCustomPlan callback is necessary is not clear to
> me, because ctidscan just calls finalize_primnode() and
> bms_add_members() with given information.  Could you show a case that the
> API would be useful?
> 
The main purpose of this callback gives an extension chance to apply
finalize_primenode() if custom-node hold expression tree on its private
fields.
In case when CPP picked up a part of clauses to run its own way, it shall
be attached on neither plan->targetlist nor plan->qual, only CPP knows
where does it attached. So, these orphan expression nodes have to be
treated by CPP.

> * Is it ok to call set_cheapest() for all relkind?
> Now set_cheapest() is called not for only relation and foreign table but
> also custom plan, and other relations such as subquery, function, and value.
> Calling call_custom_scan_provider() and set_cheapest() in the case of
> RTE_RELATION seems similar to the old construct, how do you think about
> this?
> 
I don't think we may be actually able to have some useful custom scan logic
on these special relation forms, however, I also didn't have a special reason
why custom-plan does not need to support these special relations.
I'd like to see committer's opinion here.


> * Is it hard to get rid of CopyCustomPlan()?
> Copying ForeignScan node doesn't need per-FDW copy function by limiting
> fdw_private to have only copy-able objects.  Can't we use the same way for
> CustomPlan?  Letting authors call NodeSetTag or
> copyObject() sounds uncomfortable to me.
> 
> This would be able to apply to TextOutCustomPlan() and TextOutCustomPath()
> too.
> 
FDW-like design was the original one, but the latest design was suggestion
by Tom on the v9.4 development cycle, because some data types are not
complianced to copyObject; like Bitmapset.

> * MultiExec support is appropriate for the first version?
> The cases need MultiExec seems little complex for the first version of custom
> scan.  What kind of plan do you image for this feature?
> 
It is definitely necessary to exchange multiple rows with custom-format with
upper level if both of nodes are managed by same CPP.
I plan to use this interface for bulk-loading that makes much faster data
loading to GPUs.

> * Does SupportBackwardScan() have enough information?
> Other scans check target list with TargetListSupportsBackwardScan().
> Isn't it necessary to check it for CustomPlan too in
> ExecSupportsBackwardScan()?
> 
It derivers CustomPlan node itself that includes Plan node.
If CPP thought it is necessary, it can run equivalent checks here.

> * Place to call custom plan provider
> Is it necessary to call provider when relkind != RELKIND_RELATION?  If yes,
> isn't it necessary to call for append relation?
> 
> I know that we concentrate to replacing scan in the initial version, so
> it would not be a serious problem, but it would be good to consider extensible
> design.
> 
Regarding of the child relation scan, set_append_rel_pathlist() calls
set_rel_pathlist() that is entry point of custom-scan paths.
If you mention about alternative-path of Append node, yes, it is not
a feature being supported in the first commit.

> * Custom Plan Provider is "addpath"?
> Passing addpath handler as only one attribute of CUSTOM PLAN PROVIDER seems
> little odd.
> Using handler like FDW makes the design too complex and/or messy?
> 
This design allows to pass a set of information needed according to the
workload; like join not only scan. If we need to extend customXXXXArg in
the future, all we need to extend is data structure definition, not
function prototype itself.
Anyway, I'd like to make a decision for this on committer review stage.

> * superclass of CustomPlanState
> CustomPlanState derives ScanState, instead of deriving PlanState directly.
> I worry the case of non-heap-scan custom plan, but it might be ok to postpone
> consideration about that at the first cut.
> 
We have some useful routines to implement custom-scan logic, but they takes
ScanState argument, like ExecScan().
Even though we can copy it and paste to extension code, it is not a good manner.
It takes three pointer variables in addition to PlanState. If CPP does not
take care about regular heap scan, keep them unused. It is quite helpful if
CPP implements some original logic on top of existing heap scan.

> * Naming and granularity of objects related to custom plan I'm not sure
> the current naming is appropriate, especially difference between "custom
> plan" and "provider" and "handler".  In the context of CREATE CUSTOM PLAN
> statement, what the term "custom plan" means?  My impression is that "custom
> plan" is an alternative plan type, e.g.
> ctidscan or pg_strom_scan.  Then what the term "provider" means?  My
> impression for that is extension, such as ctidscan and pg_strom.  The
> grammar allows users to pass function via PROVIDER clause of CREATE CUSTOM
> SCAN, so the function would be the provider of the custom plan created by
> the statement.
> 
Hmm... What you want to say is, CREATE X statement is expected to create X.
On the other hand, "custom-plan" is actually created by custom-plan provider,
not this DDL statement. The DDL statement defined custom-plan "provider".
I also think the suggestion is reasonable.

How about the statement below instead?

  CREATE CUSTOM PLAN PROVIDER cpp_name FOR cpp_kind HANDLER cpp_function;
  cpp_kind := SCAN (other types shall be supported later)

> * enable_customscan
> GUC parameter enable_customscan would be useful for users who want to
> disable custom plan feature temporarily.  In the case of pg_strom, using
> GPU for limited sessions for analytic or batch applications seems handy.
> 
It should be done by extension side individually.
Please imagine a user who install custom-GPU-scan, custom-matview-redirect
and custom-cache-only-scan. Purpose of each CPP are quite individually,
so I don't think enable_customscan makes sense.

> * Adding pg_custom_plan catalog
> Using "cust" as prefix for pg_custom_plan causes ambiguousness which makes
> it difficult to choose catalog prefix for a feature named "Custom Foo" in
> future.  How about using "cusp" (CUStom Plan)?
> 
> Or is it better to use pg_custom_plan_provider as catalog relation name,
> as the document says that "CREATE CUSTOM PLAN defines custom plan provider".
> Then prefix could be "cpp" (Custom Plan Provider).
> This seems to match the wording used for pg_foreign_data_wrapper.
> 
My preference "cpp" as a shorten of custom plan provider.


> * CREATE CUSTOM PLAN statement
> This is just a question:  We need to emit CREATE CUSTOM PLAN if we want
> to use I wonder how it is extended when supporting join as custom class.
> 
In case of join, I'll extend the syntax as follows:

  CREATE CUSTOM PLAN cppname
    FOR [HASH JOIN|MERGE JOIN|NEST LOOP]
    PROVIDER provider_func;

Like customScanArg, we will define an argument type for each join methods
then provider_func shall be called with this argument.
I think it is well flexible and extendable approach.

> * New operators about TID comparison
> IMO this portion should be a separated patch, because it adds OID definition
> of existing operators such as tidgt and tidle.  Is there any (explicit or
> implicit) rule about defining macro for oid of an operator?
> 
I don't know the general rules to define static OID definition.
Probably, these are added on demand.

> * Prototype of get_custom_plan_oid()
> custname (or cppname if use the rule I proposed above) seems appropriate
> as the parameter name of get_custom_plan_oid() because similar funcitons
> use catalog column names in such case.
> 
I'll rename it as follows:

  extern Oid get_custom_plan_provider_oid(const char *cpp_name, bool 
missing_ok);


> * Coding conventions
> Some lines are indented with white space.  Future pgindent run will fix
> this issue?
> 
It's my oversight, to be fixed.

> * Unnecessary struct forward declaration Forward declarations of
> CustomPathMethods, Plan, and CustomPlan in includes/nodes/relation.h seem
> unncecessary.  Other headers might have same issue.
> 
I'll check it. I had try & error during the development. It might leave
a dead code here.

> * Unnecessary externing of replace_nestloop_params()
> replace_nestloop_params() is extern-ed but it's never called outside
> createplan.c.
> 
Indeed, it's not needed until we support custom join logic.

> * Externing fix_scan_expr()
> If it's necessary for all custom plan providers to call fix_scan_expr (via
> fix_scan_list macro), isn't it able to do it in set_plan_refs() before
> calling SetCustomPlanRef()?
> 
One alternative idea is:
  if scanrelid of custom-plan is valid (scanrelid > 0) and custom-node
  has no private expression tree to be fixed up, CPP can have no
  SetCustomPlanRef callback. In this case, core backend applies
  fix_scan_list on the targetlist and qual, then adjust scanrelid.

It was what I did in the previous revision, that was concerned by Tom
because it assumes too much things to the custom-node. (It is useful
to only custom "scan" node)

> * What does T_CustomPlanMarkPos  mean?
> It's not clear to me when CustomPlanMarkPos works.  Is it for a custom plan
> provider which supports marking position and rewind to the position, and
> ctidscan just lacks capability to do that, so it is not used anywhere?
> 
Its previous design had a flag whether it allows backward scan, in the body
of CustomPlan structure. However, it makes a problem on 
ExecSupportsMarkRestore()
that takes only node-tag to determine whether the supplied node support
backward scan or not.
Once I tried to change ExecSupportsMarkRestore() to accept node body, then
Tom suggested to use a separated node tag instead.


> * Unnecessary changes in allpaths.c
> some comment about Subquery and CTE are changed (perhaps) accidentally.
> 
No, it is intentional because set_cheapest() was consolidated.

> * Typos
>   * planenr -> planner, implements -> implement in create_custom_plan.sgml
>   * CustomScan in nodeCustom.h should be CustomPlan?
>   * delivered -> derived, in src/backend/optimizer/util/pathnode.c
> 
OK, I'll fix them.

> * Document "Writing Custom Plan Provider" is not provided Custom Plan
> Provider author would (and I DO!) hope documents about writing a custom
> plan provider.
> 
A documentation like fdwhandler.sgml, isn't it?
OK, I'll make it up.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>


> 2014-06-17 23:12 GMT+09:00 Kohei KaiGai <kai...@kaigai.gr.jp>:
> > Hanada-san,
> >
> > Thanks for your checks. I oversight the points when I submit the patch,
> sorry.
> > The attached one is revised one on documentation stuff and
> contrib/Makefile.
> >
> > Thanks,
> >
> > 2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.han...@gmail.com>:
> >> Kaigai-san,
> >>
> >> I've just applied v1 patch, and tried build and install, but I found
> two issues:
> >>
> >> 1) The contrib/ctidscan is not automatically built/installed because
> >> it's not described in contrib/Makefile.  Is this expected behavior?
> >> 2) I got an error message below when building document.
> >>
> >> $ cd doc/src/sgml
> >> $ make
> >> openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
> >> -d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
> >> openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
> >> "SQL-CREATECUSTOMPLAN"
> >> make: *** [HTML.index] Error 1
> >> make: *** Deleting file `HTML.index'
> >>
> >> I'll review another part of the patch, including the design.
> >>
> >>
> >> 2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kai...@kaigai.gr.jp>:
> >>> According to the discussion upthread, I revised the custom-plan
> >>> patch to focus on regular relation scan but no join support right
> >>> now, and to support DDL command to define custom-plan providers.
> >>>
> >>> Planner integration with custom logic to scan a particular relation
> >>> is enough simple, unlike various join cases. It's almost similar to
> >>> what built-in logic are doing now - custom-plan provider adds a path
> >>> node with its cost estimation if it can offer alternative way to
> >>> scan referenced relation. (in case of no idea, it does not need to
> >>> add any paths)
> >>>
> >>> A new DDL syntax I'd like to propose is below:
> >>>
> >>>   CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
> >>>
> >>> <name> is as literal, put a unique identifier.
> >>> <class> is workload type to be offered by this custom-plan provider.
> >>> "scan" is the only option right now, that means base relation scan.
> >>> <function_name> is also as literal; it shall perform custom-plan
> provider.
> >>>
> >>> A custom-plan provider function is assumed to take an argument of
> >>> "internal" type to deliver a set of planner information that is
> >>> needed to construct custom-plan pathnode.
> >>> In case of "scan" class, pointer towards an customScanArg object
> >>> shall be delivered on invocation of custom-plan provider.
> >>>
> >>>     typedef struct {
> >>>         uint32            custom_class;
> >>>         PlannerInfo    *root;
> >>>         RelOptInfo     *baserel;
> >>>         RangeTblEntry  *rte;
> >>>     } customScanArg;
> >>>
> >>> In case when the custom-plan provider function being invoked thought
> >>> it can offer an alternative scan path on the relation of "baserel",
> >>> things to do is (1) construct a CustomPath (or its inherited data
> >>> type) object with a table of callback function pointers (2) put its
> >>> own cost estimation, and (3) call add_path() to register this path as
> an alternative one.
> >>>
> >>> Once the custom-path was chosen by query planner, its
> >>> CreateCustomPlan callback is called to populate CustomPlan node based
> on the pathnode.
> >>> It also has a table of callback function pointers to handle various
> >>> planner's job in setrefs.c and so on.
> >>>
> >>> Similarly, its CreateCustomPlanState callback is called to populate
> >>> CustomPlanState node based on the plannode. It also has a table of
> >>> callback function pointers to handle various executor's job during
> >>> quey execution.
> >>>
> >>> Most of callback designs are not changed from the prior proposition
> >>> in
> >>> v9.4 development cycle, however, here is a few changes.
> >>>
> >>> * CustomPlan became to inherit Scan, and CustomPlanState became to
> >>>   inherit ScanState. Because some useful routines to implement scan-
> >>>   logic, like ExecScan, expects state-node has ScanState as its base
> >>>   type, it's more kindness for extension side. (I'd like to avoid each
> >>>   extension reinvent ExecScan by copy & paste!)
> >>>   I'm not sure whether it should be a union of Join in the future,
> however,
> >>>   it is a reasonable choice to have compatible layout with
> Scan/ScanState
> >>>   to implement alternative "scan" logic.
> >>>
> >>> * Exporting static functions - I still don't have a graceful answer
> here.
> >>>   However, it is quite natural that extensions to follow up interface
> updates
> >>>   on the future version up of PostgreSQL.
> >>>   Probably, it shall become clear what class of functions shall be
> >>>   exported and what class of functions shall be re-implemented within
> >>>   extension side in the later discussion.
> >>>   Right now, I exported minimum ones that are needed to implement
> >>>   alternative scan method - contrib/ctidscan module.
> >>>
> >>> Items to be discussed later:
> >>> * planner integration for relations join - probably, we may define new
> >>>   custom-plan classes as alternative of hash-join, merge-join and
> >>>   nest-loop. If core can know this custom-plan is alternative of hash-
> >>>   join, we can utilize core code to check legality of join.
> >>> * generic key-value style options in custom-plan definition - Hanada
> >>>   san proposed me off-list - like foreign data wrapper. It may enable
> >>>   to configure multiple behavior on a binary.
> >>> * ownership and access control of custom-plan. right now, only
> >>>   superuser can create/drop custom-plan provider definition, thus,
> >>>   it has no explicit ownership and access control. It seems to me
> >>>   a reasonable assumption, however, we may have a usecase that
> >>>   needs custom-plan by unprivileged users.
> >>>
> >>> Thanks,
> >>>
> >>> 2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kai...@ak.jp.nec.com>:
> >>>>> On 8 May 2014 22:55, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >>>>>
> >>>>> >> We're past the prototyping stage and into productionising what
> >>>>> >> we know works, AFAIK. If that point is not clear, then we need
> >>>>> >> to discuss that first.
> >>>>> >
> >>>>> > OK, I'll bite: what here do we know works?  Not a damn thing
> >>>>> > AFAICS; it's all speculation that certain hooks might be useful,
> >>>>> > and speculation that's not supported by a lot of evidence.  If
> >>>>> > you think this isn't prototyping, I wonder what you think *is*
> prototyping.
> >>>>>
> >>>>> My research contacts advise me of this recent work
> >>>>>   http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
> >>>>> and also that they expect a prototype to be ready by October,
> >>>>> which I have been told will be open source.
> >>>>>
> >>>>> So there are at least two groups looking at this as a serious
> >>>>> option for Postgres (not including the above paper's authors).
> >>>>>
> >>>>> That isn't *now*, but it is at least a time scale that fits with
> >>>>> acting on this in the next release, if we can separate out the
> >>>>> various ideas and agree we wish to proceed.
> >>>>>
> >>>>> I'll submerge again...
> >>>>>
> >>>> Through the discussion last week, our minimum consensus are:
> >>>> 1. Deregulated enhancement of FDW is not a way to go 2. Custom-path
> >>>> that can replace built-in scan makes sense as a first step
> >>>>    towards the future enhancement. Its planner integration is enough
> simple
> >>>>    to do.
> >>>> 3. Custom-path that can replace built-in join takes investigation how
> to
> >>>>    integrate existing planner structure, to avoid (3a) reinvention
> of
> >>>>    whole of join handling in extension side, and (3b) unnecessary
> extension
> >>>>    calls towards the case obviously unsupported.
> >>>>
> >>>> So, I'd like to start the (2) portion towards the upcoming 1st
> >>>> commit-fest on the v9.5 development cycle. Also, we will be able to
> >>>> have discussion for the (3) portion concurrently, probably, towards
> 2nd commit-fest.
> >>>>
> >>>> Unfortunately, I cannot participate PGcon/Ottawa this year. Please
> >>>> share us the face-to-face discussion here.
> >>>>
> >>>> Thanks,
> >>>> --
> >>>> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> >>>> <kai...@ak.jp.nec.com>
> >>>>
> >>> --
> >>> KaiGai Kohei <kai...@kaigai.gr.jp>
> >>
> >>
> >>
> >> --
> >> Shigeru HANADA
> >
> >
> >
> > --
> > KaiGai Kohei <kai...@kaigai.gr.jp>
> 
> 
> 
> --
> Shigeru HANADA

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to