[HACKERS] WIP: Upper planner pathification

Tom Lane Sun, 28 Feb 2016 12:04:52 -0800

Those with long memories will recall that I've been waving my arms about
$SUBJECT for more than five years.  I started to work seriously on a patch
last summer, and here is a version that I feel comfortable exposing to
public scrutiny (which is not to call it "done"; more below).


The basic point of this patch is to apply the generate-and-compare-Paths
paradigm to the planning steps after query_planner(), which only covers
scan and join processing (the FROM and WHERE parts of a query).  These
later steps deal with grouping/aggregation, window functions, SELECT
DISTINCT, ORDER BY, LockRows (SELECT FOR UPDATE), LIMIT/OFFSET, and
ModifyTable.  Also UNION/INTERSECT/EXCEPT.  Back in the bad old days we
had only one way to do any of that stuff, so there was no real problem
with the approach of converting query_planner's answer into a Plan and
then stacking more Plan nodes atop that.  Over time we grew other ways
to do those steps, and chose between those ways with ad-hoc code in
grouping_planner().  That was messy enough in itself, but it had other
disadvantages too: subquery_planner() had to choose and return a single
Plan, without regard to what the outer query might need.  (Well, we did
pass down a tuple_fraction parameter, but that is a pretty limited bit of
information.)
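
To make the single-Plan limitation concrete, here is a minimal C sketch. It is not PostgreSQL code: ToyPath, toy_cheapest(), toy_cost_for_outer() and toy_best_for_outer() are hypothetical names, and the costs are invented. It only shows why a subquery that commits to its locally cheapest alternative can cost the outer query more than one that hands back all its candidates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a planner Path; NOT PostgreSQL's actual struct. */
typedef struct ToyPath
{
    const char *label;       /* human-readable description */
    double      total_cost;  /* invented cost units */
    bool        sorted;      /* output already in the order the outer query wants? */
} ToyPath;

/* Old style: the subquery commits to its locally cheapest candidate. */
const ToyPath *
toy_cheapest(const ToyPath *paths, size_t npaths)
{
    const ToyPath *best = &paths[0];

    assert(npaths > 0);
    for (size_t i = 1; i < npaths; i++)
        if (paths[i].total_cost < best->total_cost)
            best = &paths[i];
    return best;
}

/* What the outer query really pays: add a sort if the order is missing. */
double
toy_cost_for_outer(const ToyPath *path, bool need_sorted, double sort_cost)
{
    if (need_sorted && !path->sorted)
        return path->total_cost + sort_cost;
    return path->total_cost;
}

/* New style: the outer query sees every candidate and compares final costs. */
const ToyPath *
toy_best_for_outer(const ToyPath *paths, size_t npaths,
                   bool need_sorted, double sort_cost)
{
    const ToyPath *best = &paths[0];

    assert(npaths > 0);
    for (size_t i = 1; i < npaths; i++)
        if (toy_cost_for_outer(&paths[i], need_sorted, sort_cost) <
            toy_cost_for_outer(best, need_sorted, sort_cost))
            best = &paths[i];
    return best;
}
```

With a hashed candidate costing 100 (unsorted) and a sorted candidate costing 120, and an outer sort costing 50, the locally cheapest choice ends up costing 150 once the outer sort is added, whereas comparing at the outer level picks the sorted candidate at 120.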

An even larger problem is that we had no way to handle addition of new
alternative plan types for these upper-planning steps without fundamental
hacking on grouping_planner().  An example is the code I added in commit
addc42c339208d6a and later (planagg.c and other places) for optimization
of MIN/MAX aggregates: that code had a positively incestuous relationship
with grouping_planner(), and was darn ugly in multiple other ways besides.
Of late, the main way this issue has surfaced is that we have no practical
way to plan pushdown of aggregates or updates on remote tables to the
responsible FDWs, because the FDWs cannot create Paths representing such
operations.

The present patch addresses this problem by inventing Path nodes to
represent every post-scan/join step, and changing the API of
grouping_planner() and subquery_planner() so that they return sets of
Paths rather than single Plans.  Creation of a Plan tree happens only
after control returns to the top level of standard_planner().  The Path
nodes for these post-scan/join steps are attached to "upper relation"
RelOptInfos that didn't exist before.  There are provisions for FDWs to
inject candidate Paths for these upper-level steps.  As proof of concept
for that, planagg.c has been revised to work by injecting a new Path
into the grouping/aggregation upper rel, rather than predetermining what
the answer will be.  This vastly decreases its coupling with both
grouping_planner and some other parts of the system such as equivclass.c
(though, the Law of Conservation of Cruft being what it is, I did have to
push some knowledge about planagg.c's work into setrefs.c).
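
The arrangement described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's API: DemoPath, DemoUpperRel, demo_upper_paths_hook and all the demo_* functions are hypothetical names, the costs are invented, and the sketch keeps every candidate rather than pruning dominated ones. The point is only that the core planner and an extension both add candidates to the same upper rel, and the choice is made after all of them are in.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_PATHS 8

/* Toy Path: a label plus an invented total cost. */
typedef struct DemoPath
{
    const char *label;
    double      total_cost;
} DemoPath;

/* Toy "upper relation": just a bag of candidate Paths for one step. */
typedef struct DemoUpperRel
{
    DemoPath paths[DEMO_MAX_PATHS];
    size_t   npaths;
} DemoUpperRel;

/* Hypothetical hook type through which an extension adds candidates. */
typedef void (*demo_upper_paths_hook) (DemoUpperRel *rel);

/* Append one candidate to the upper rel. */
void
demo_add_path(DemoUpperRel *rel, const char *label, double total_cost)
{
    assert(rel->npaths < DEMO_MAX_PATHS);
    rel->paths[rel->npaths].label = label;
    rel->paths[rel->npaths].total_cost = total_cost;
    rel->npaths++;
}

/* Core planner adds its own alternatives, then lets the hook add more. */
void
demo_plan_aggregation(DemoUpperRel *rel, demo_upper_paths_hook hook)
{
    rel->npaths = 0;
    demo_add_path(rel, "hashed aggregation", 100.0);
    demo_add_path(rel, "sort + grouped aggregation", 140.0);
    if (hook != NULL)
        hook(rel);
}

/* Hypothetical extension, loosely in the spirit of a MIN/MAX shortcut. */
void
demo_minmax_hook(DemoUpperRel *rel)
{
    demo_add_path(rel, "MIN/MAX via index endpoint", 5.0);
}

/* Only after all candidates exist is the cheapest one chosen. */
const DemoPath *
demo_choose(const DemoUpperRel *rel)
{
    const DemoPath *best = &rel->paths[0];

    assert(rel->npaths > 0);
    for (size_t i = 1; i < rel->npaths; i++)
        if (rel->paths[i].total_cost < best->total_cost)
            best = &rel->paths[i];
    return best;
}
```

Calling demo_plan_aggregation() with a NULL hook leaves the core planner's own candidates to compete; passing demo_minmax_hook adds a third candidate that wins on cost, without demo_plan_aggregation() knowing anything about how it was produced.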

I'm pretty pleased with the way this turned out.  grouping_planner() is
about half the length it was before, and much more straightforward IMO.
planagg.c no longer seems like a complete hack; it's a reasonable
prototype for injecting nontraditional implementation paths into
aggregation or other late planner stages, and grouping_planner() doesn't
need to know about it.

The patch does add a lot of net new lines (and it's not done), but
most of the new code is very straightforward boilerplate.

The main thing that makes this WIP and not committable is that I've not
yet bothered to implement outfuncs.c code and some other debug support for
all the new path struct types.  A lot of the new function header comments
remain to be fleshed out too, and some more documentation needs to be
written.  But I think it's reviewable as-is; the other stuff would just
make it even longer but not more interesting.

There's a lot of future work to be done within this skeleton.  Notably,
I did not fix the UNION/INTERSECT/EXCEPT planning code to consider
multiple paths; it still only generates a single Path tree.  That code
needs to be rewritten from scratch, probably, and it seems like doing so
is a separate project.  I'd also like to do some more refactoring in
createplan.c: some code paths are still doing redundant cost estimation,
and I'm growing increasingly dissatisfied with the "use_physical_tlist"
hack.  But that seems like a separable issue as well.

So, where to go from here?  I'm acutely aware that we're hard up against
the final 9.6 commitfest, and that we discourage major patches arriving
so late in a devel cycle.  But I simply couldn't get this done any faster.
I don't really want to hold it over for the 9.7 devel cycle.  It's been
enough trouble maintaining this patch in the face of conflicting commits
over the last year or so (it's probably still got bugs related to parallel
query...), and there definitely are conflicting patches in the upcoming
'fest.  And the lack of this infrastructure is blocking progress on FDWs
and some other things.

So I'd really like to get this into 9.6.  I'm happy to put it into the
March commitfest if someone will volunteer to review it.

Comments?

                        regards, tom lane

Attachment: upper-planner-pathification-1.patch.gz
