Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-03-25 Thread David Rowley
On Tue, 26 Mar 2019 at 09:00, Tom Lane  wrote:
>
> David Rowley  writes:
> > [ drop-useless-merge-appends-15.patch ]
>
> Pushed with some minor adjustments.

Thank for all your work on this, and thanks for the final push.

FWIW, you should probably have credited yourself as the main author
since I don't think there was much of my patch left.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-03-25 Thread Tom Lane
David Rowley  writes:
> [ drop-useless-merge-appends-15.patch ]

Pushed with some minor adjustments.

> I can get a plan that does end up with Result nodes above a Scan node.

> create table rangep (a int, b int) partition by range (a);
> create table rangep1 partition of rangep for values from(0) to (100);
> explain select r1::text from rangep r1 inner join rangep r2 on
> r1::text = r2::Text order by r1::text;

> However, if we modify is_projection_capable_plan() similar to how
> ExecSupportsMarkRestore() has been modified then we'd need to ensure
> that any changes made to the Append/MergeAppend's tlist also are made
> to the underlying node's tlist.  I'm unsure if that's worth the
> complexity.  What do you think?

I'm inclined to think not, at least not without more compelling
examples than that one ;-).  Note that we had Result-above-the-Append
before, so we're not regressing for such cases, we're just leaving
something on the table.

>> One other remark is that the division of labor between
>> create_[merge]append_path and their costsize.c subroutines
>> seems pretty unprincipled.  I'd be inclined to push all the
>> relevant logic into costsize.c, but have not done so here.
>> Moving the existing cost calculations in create_mergeappend_path
>> into costsize.c would better be done as a separate refactoring patch,
>> perhaps.

> I've not touched that, but happy to do it as a separate patch.

After looking at this, I'm less excited than I was before.
It seems it'd be only marginally less crufty than the way it is now.
In particular, costsize.c would have to be aware of the rule about
single-child MergeAppend being removable, which seems a little outside
its purview: typically we only ask costsize.c to estimate the cost
of a plan node as-presented, not to make decisions about what the
shape of the plan tree will be.  So I left it alone.

regards, tom lane



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-03-09 Thread David Rowley
On Mon, 4 Mar 2019 at 16:26, Tom Lane  wrote:
> I was a bit surprised to find that I didn't need to fool around
> with lying about whether [Merge]Append can project.  I've not dug
> into the exact reason why, but I suspect it's that previous changes
> made in support of parallelization have resulted in ensuring that
> we push the upper tlist down to the children anyway, at some earlier
> stage.

I can get a plan that does end up with Result nodes above a Scan node.

create table rangep (a int, b int) partition by range (a);
create table rangep1 partition of rangep for values from(0) to (100);
explain select r1::text from rangep r1 inner join rangep r2 on
r1::text = r2::Text order by r1::text;

However, if we modify is_projection_capable_plan() similar to how
ExecSupportsMarkRestore() has been modified then we'd need to ensure
that any changes made to the Append/MergeAppend's tlist also are made
to the underlying node's tlist.  I'm unsure if that's worth the
complexity.  What do you think?

> I haven't looked into whether this does the right things for parallel
> planning --- possibly create_[merge]append_path need to propagate up
> parallel-related path fields from the single child?

I looked at this. For Append, with the current code in
add_paths_to_append_rel() we won't increase the parallel_workers
higher than the parallel workers from the single child rel.  The
current code does:

parallel_workers = Max(parallel_workers, fls(list_length(live_childrels)));

so fls(1) == 1.  However, I wonder if it's better to set that
explicitly in create_append_path() for the
list_length(pathnode->subpaths) == 1 case.

The other issue I found when looking into the parallel code was that
partial paths with pathkeys will get made, but will never be used.
The reason is that Append can't normally do anything with these as it
can't maintain their sort order and MergeAppend is not a parallel
aware node.  I ended up adding some new code at the bottom of
generate_append_paths() to add these paths as single subpaths to
Append nodes.  This also required changing create_append_path() so
that it records the child pathkeys when dealing with a single subpath
Append.

> Also, I wonder why you didn't teach ExecSupportsMarkRestore that a
> single-child MergeAppend can support mark/restore.  I didn't add such
> code here either, but I suspect that's an oversight.

Added code for that.

> One other remark is that the division of labor between
> create_[merge]append_path and their costsize.c subroutines
> seems pretty unprincipled.  I'd be inclined to push all the
> relevant logic into costsize.c, but have not done so here.
> Moving the existing cost calculations in create_mergeappend_path
> into costsize.c would better be done as a separate refactoring patch,
> perhaps.

I've not touched that, but happy to do it as a separate patch.

I've attached my changes to your v14 patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


drop-useless-merge-appends-15.patch
Description: Binary data


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-03-07 Thread Tom Lane
David Rowley  writes:
> On Mon, 4 Mar 2019 at 16:26, Tom Lane  wrote:
>> One other remark is that the division of labor between
>> create_[merge]append_path and their costsize.c subroutines
>> seems pretty unprincipled.  I'd be inclined to push all the
>> relevant logic into costsize.c, but have not done so here.
>> Moving the existing cost calculations in create_mergeappend_path
>> into costsize.c would better be done as a separate refactoring patch,
>> perhaps.

> The part I don't like about that is the fact that we don't generate
> sort paths in the MergeAppend subpaths, instead, we rely on checking
> pathkeys_contained_in() again in create_merge_append_plan() and just
> creating a sort node, if required.  There is some repeat pathkeys
> checking there but I wonder if we'll see much a performance regression
> if we go to the trouble of making sort paths. Is this what you meant?

No, I'm just suggesting taking the stanza "Add up the sizes and costs of
the input paths" out of create_merge_append_path and putting it into
cost_merge_append.  It seems weird to me that in the plain Append case,
cost_append does the adding-up of the child costs, but in the
MergeAppend case it's not done similarly.

regards, tom lane



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-03-07 Thread David Rowley
On Mon, 4 Mar 2019 at 16:26, Tom Lane  wrote:
>
> David Rowley  writes:
> > [ v13-0001-Forgo-generating-single-subpath-Append-and-Merge.patch ]
>
> I continue to think that this is the wrong way to go about it,
> and as proof of concept present the attached, which reproduces
> all of the regression-test plan changes appearing in v13 ---
> with a whole lot less mechanism and next to no added planning
> cycles (which certainly cannot be said of v13).

Nice.  I see you solved the problem I had been complaining about by
pulling the surplus Append out of the final plan after setrefs, in
which case the varnos are all set to the special varnos, meaning you
don't suffer from the same problem with nodes higher in the plan
having the varnos of the parent instead of the append child.  This
method is certainly much much better than what I had. Thanks for
taking the time to show me how this should be done.

> Also, I wonder why you didn't teach ExecSupportsMarkRestore that a
> single-child MergeAppend can support mark/restore.  I didn't add such
> code here either, but I suspect that's an oversight.

The reason for that was that I didn't ever create any single-subpath
MergeAppends. I instead created Append paths having isproxy as true.
This is slightly confusing as I was just borrowing Append to act as
this proxy path and it was only these proxy Appends I was dealing with
in ExecSupportsMarkRestore.

I'll need to modify your version of the patch as you're keeping
single-subpath MergeAppends paths, so ExecSupportsMarkRestore needs to
know about those.

> One other remark is that the division of labor between
> create_[merge]append_path and their costsize.c subroutines
> seems pretty unprincipled.  I'd be inclined to push all the
> relevant logic into costsize.c, but have not done so here.
> Moving the existing cost calculations in create_mergeappend_path
> into costsize.c would better be done as a separate refactoring patch,
> perhaps.

The part I don't like about that is the fact that we don't generate
sort paths in the MergeAppend subpaths, instead, we rely on checking
pathkeys_contained_in() again in create_merge_append_plan() and just
creating a sort node, if required.  There is some repeat pathkeys
checking there but I wonder if we'll see much a performance regression
if we go to the trouble of making sort paths. Is this what you meant?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-03-03 Thread Tom Lane
David Rowley  writes:
> [ v13-0001-Forgo-generating-single-subpath-Append-and-Merge.patch ]

I continue to think that this is the wrong way to go about it,
and as proof of concept present the attached, which reproduces
all of the regression-test plan changes appearing in v13 ---
with a whole lot less mechanism and next to no added planning
cycles (which certainly cannot be said of v13).

I was a bit surprised to find that I didn't need to fool around
with lying about whether [Merge]Append can project.  I've not dug
into the exact reason why, but I suspect it's that previous changes
made in support of parallelization have resulted in ensuring that
we push the upper tlist down to the children anyway, at some earlier
stage.

I haven't looked into whether this does the right things for parallel
planning --- possibly create_[merge]append_path need to propagate up
parallel-related path fields from the single child?

Also, I wonder why you didn't teach ExecSupportsMarkRestore that a
single-child MergeAppend can support mark/restore.  I didn't add such
code here either, but I suspect that's an oversight.

One other remark is that the division of labor between
create_[merge]append_path and their costsize.c subroutines
seems pretty unprincipled.  I'd be inclined to push all the
relevant logic into costsize.c, but have not done so here.
Moving the existing cost calculations in create_mergeappend_path
into costsize.c would better be done as a separate refactoring patch,
perhaps.

regards, tom lane

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 42108bd..2be67bc 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8427,17 +8427,16 @@ SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER J
  400 | 400 | 0008
 (4 rows)
 
--- left outer join + nullable clasue
-EXPLAIN (COSTS OFF)
+-- left outer join + nullable clause
+EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
-QUERY PLAN 

- Sort
-   Sort Key: t1.a, ftprt2_p1.b, ftprt2_p1.c
-   ->  Append
- ->  Foreign Scan
-   Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
-(5 rows)
+ QUERY PLAN 
+
+ Foreign Scan
+   Output: t1.a, ftprt2_p1.b, ftprt2_p1.c
+   Relations: (public.ftprt1_p1 t1) LEFT JOIN (public.ftprt2_p1 fprt2)
+   Remote SQL: SELECT r6.a, r9.b, r9.c FROM (public.fprt1_p1 r6 LEFT JOIN public.fprt2_p1 r9 ON (((r6.a = r9.b)) AND ((r6.b = r9.a)) AND ((r9.a < 10 WHERE ((r6.a < 10)) ORDER BY r6.a ASC NULLS LAST, r9.b ASC NULLS LAST, r9.c ASC NULLS LAST
+(4 rows)
 
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
  a | b |  c   
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index eb9d1ad..4728511 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2309,8 +2309,8 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
 SELECT t1.a,t2.b,t3.c FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) INNER JOIN fprt1 t3 ON (t2.b = t3.a) WHERE t1.a % 25 =0 ORDER BY 1,2,3;
 
--- left outer join + nullable clasue
-EXPLAIN (COSTS OFF)
+-- left outer join + nullable clause
+EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
 SELECT t1.a,t2.b,t2.c FROM fprt1 t1 LEFT JOIN (SELECT * FROM fprt2 WHERE a < 10) t2 ON (t1.a = t2.b and t1.b = t2.a) WHERE t1.a < 10 ORDER BY 1,2,3;
 
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 187f892..14ad738 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -447,6 +447,21 @@ ExecSupportsMarkRestore(Path *pathnode)
 return false;	/* childless Result */
 			}
 
+		case T_Append:
+			{
+AppendPath *appendPath = castNode(AppendPath, pathnode);
+
+/*
+ * If there's 

Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-02-04 Thread David Rowley
On Thu, 31 Jan 2019 at 17:22, David Rowley  wrote:
> I've also attached a rebased patch.

Rebased again.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v12-0001-Forgo-generating-single-subpath-Append-and-Merge.patch
Description: Binary data


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-01-30 Thread David Rowley
On Sat, 26 Jan 2019 at 13:51, Tom Lane  wrote:
>
> David Rowley  writes:
> > On Thu, 3 Jan 2019 at 08:01, Tomas Vondra  
> > wrote:
> >> AFAICS the patch essentially does two things: (a) identifies Append
> >> paths with a single member and (b) ensures the Vars are properly mapped
> >> when the Append node is skipped when creating the plan.
> >> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But
> >> perhaps we could do at least (b) to setrefs.c?
>
> > The problem I found with doing var translations in setrefs.c was that
> > the plan tree is traversed there breadth-first and in createplan.c we
> > build the plan depth-first.  The problem I didn't manage to overcome
> > with that is that when we replace a "ProxyPath" (actually an Append
> > path marked as is_proxy=true) it only alter targetlists, etc of nodes
> > above that in the plan tree. The nodes below it and their targetlists
> > don't care, as these don't fetch Vars from nodes above them.  If we do
> > the Var translation in setrefs then we've not yet looked at the nodes
> > that don't care and we've already done the nodes that do care. So the
> > tree traversal is backwards.
>
> I don't quite buy this argument, because you haven't given a reason
> why it doesn't apply with equal force to setrefs.c's elimination of
> no-op SubqueryScan nodes.  Yet that does work.

I assume you're talking about the code that's in
set_subqueryscan_references() inside the trivial_subqueryscan()
condition?

If so, then that seems pretty different from what I'm doing here
because trivial_subqueryscan() only ever returns true when the
Vars/Exprs from the subquery's target list match the scan's targetlist
exactly, in the same order.  It's not quite that simple with the
single-subpath Append/MergeAppend case since the Vars/Exprs won't be
the same since they're from different relations and possibly in some
different order.

> Stepping back for a minute: even if we get this to work, how often
> is it going to do anything useful?  It seems like most of the other
> development that's going on is trying to postpone pruning till later,
> so I wonder how often we'll really know at the beginning of planning
> that only one child will survive.

It'll do something useful everytime the planner would otherwise
produce an Append or MergeAppend with a single subpath. Pruning down
to just 1 relation is pretty common, so seems worth optimising for.
There's no other development that postpones pruning until later. If
you're talking about run-time pruning then nothing is "postponed". We
still attempt to prune during planning, it's just that we might not be
able to prune in some cases until during execution, so we may do both.
  You make it sound like we're trying to do away with plan-time
pruning, but that's not happening, in fact, we're working pretty hard
to speed up the planner when plan-time pruning *can* be done. So far
there are no patches to speed up planner for partitioned tables when
the planner can't prune any partitions, although it is something that
I've been giving some thought to.

I've also attached a rebased patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v11-0001-Forgo-generating-single-subpath-Append-and-Merge.patch
Description: Binary data


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-01-25 Thread Tom Lane
David Rowley  writes:
> On Thu, 3 Jan 2019 at 08:01, Tomas Vondra  
> wrote:
>> AFAICS the patch essentially does two things: (a) identifies Append
>> paths with a single member and (b) ensures the Vars are properly mapped
>> when the Append node is skipped when creating the plan.
>> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But
>> perhaps we could do at least (b) to setrefs.c?

> The problem I found with doing var translations in setrefs.c was that
> the plan tree is traversed there breadth-first and in createplan.c we
> build the plan depth-first.  The problem I didn't manage to overcome
> with that is that when we replace a "ProxyPath" (actually an Append
> path marked as is_proxy=true) it only alter targetlists, etc of nodes
> above that in the plan tree. The nodes below it and their targetlists
> don't care, as these don't fetch Vars from nodes above them.  If we do
> the Var translation in setrefs then we've not yet looked at the nodes
> that don't care and we've already done the nodes that do care. So the
> tree traversal is backwards.

I don't quite buy this argument, because you haven't given a reason
why it doesn't apply with equal force to setrefs.c's elimination of
no-op SubqueryScan nodes.  Yet that does work.

Stepping back for a minute: even if we get this to work, how often
is it going to do anything useful?  It seems like most of the other
development that's going on is trying to postpone pruning till later,
so I wonder how often we'll really know at the beginning of planning
that only one child will survive.

regards, tom lane



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-01-18 Thread David Rowley
I've attached an updated patch. The last one no longer applied due to
the changes made in d723f5687

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v10-0001-Forgo-generating-single-subpath-Append-and-Merge.patch
Description: Binary data


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-01-12 Thread David Rowley
On Thu, 3 Jan 2019 at 08:01, Tomas Vondra  wrote:
> AFAICS the patch essentially does two things: (a) identifies Append
> paths with a single member and (b) ensures the Vars are properly mapped
> when the Append node is skipped when creating the plan.

Yes, but traditional Append and MergeAppend paths with a single member
are never actually generated.  I say "traditional" as we do happen to
use a single-subpath Append as the "proxy" path to represent a path
that needs to not have a plan node generated during createplan.
Originally I had a new Path type named "ProxyPath" to do this, but per
recommendation from Tom, I overloaded Append to mean that. (Append
already has its meaning overloaded in regards to dummy paths.)

> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But
> perhaps we could do at least (b) to setrefs.c?

The problem I found with doing var translations in setrefs.c was that
the plan tree is traversed there breadth-first and in createplan.c we
build the plan depth-first.  The problem I didn't manage to overcome
with that is that when we replace a "ProxyPath" (actually an Append
path marked as is_proxy=true) it only alter targetlists, etc of nodes
above that in the plan tree. The nodes below it and their targetlists
don't care, as these don't fetch Vars from nodes above them.  If we do
the Var translation in setrefs then we've not yet looked at the nodes
that don't care and we've already done the nodes that do care. So the
tree traversal is backwards.  I sort of borrowed the traversal in
createplan.c since it was in the correct depth-first order.   Equally,
I could have invented an entirely new stage that traverses the path
tree depth-first, but I imagined that would have added more overhead
and raised even more questions. Piggy-backing on createplan seemed
like the best option at the time.

> I'd say mapping the Vars is the most fragile and error-prone piece of
> this patch. It's a bit annoying that it's inventing a new infrastructure
> to do that, translates expressions in various places, establishes new
> rules for tlists (having to call finalize_plan_tlist when calling
> build_path_tlist before create_plan_recurse), etc.

I entirely agree. It's by far the worst part of the patch.  Getting
the code to a working state to do this was hard. I kept finding places
that I'd missed the translation.  I'd rather it worked some other way.
I just don't yet know what that way is.  I changed the core regression
table "tenk" to become a partitioned table with a single partition in
the hope to catch all of the places that need the translations to be
performed.  I'm not entirely confident that I've caught them all by
doing this.

I've attached an updated patch that's really just a rebased version of
v8.  The old version conflicted with changes made in 1db5667b. The
only real change was to the commit message. I'd previously managed to
miss out the part about not generating needless Append/MergeAppend
paths can allow plan shapes that were previously not possible.  I'd
only mentioned the executor not having to pull tuples through these
nodes, which increases performance. Not having this perhaps caused
some of the confusion earlier in this thread.

One drawback to this patch, which I'm unsure if I previously mentioned
is that run-time pruning only works for Append and MergeAppend.  If we
remove the Append/MergeAppend node then the single Append/MergeAppend
sub-plan has no hope of being pruned.  Going from 1 to 0 sub-plans may
not be that common, but no longer supporting that is something that
needs to be considered.   I had imagined that gained flexibility of
plan shapes plus less executor overhead was better than that, but
there likely is a small use-case for keeping that ability.  It seems
impossible to have both advantages of this patch without that
disadvantage.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v9-0001-Forgo-generating-single-subpath-Append-and-MergeA.patch
Description: Binary data


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-01-02 Thread Tomas Vondra
On 4/1/18 9:26 AM, David Rowley wrote:
> On 16 March 2018 at 04:01, Tom Lane  wrote:
>> I hadn't been paying much attention to this thread, but I've now taken
>> a quick look at the 2018-02-19 patch, and I've got to say I do not like
>> it much.  The changes in createplan.c in particular seem like hack-and-
>> slash rather than anything principled or maintainable.
> 
> Thanks for looking at this.  I didn't manage to discover any other
> working solutions to when the Vars can be replaced. If we don't do
> this in createplan.c then it's going to cause problems in places such
> as apply_pathtarget_labeling_to_tlist, which is well before setrefs.c
> gets hold of the plan.
> 
>> The core issue here is that Paths involving the appendrel and higher
>> will contain Vars referencing the appendrel's varno, whereas the child
>> is set up to emit Vars containing its own varno, and somewhere we've got
>> to match those up.  I don't think though that this problem is exactly
>> specific to single-member Appends, and so I would rather we not invent a
>> solution that's specific to that.  A nearly identical issue is getting
>> rid of no-op SubqueryScan nodes.  I've long wished we could simply not
>> emit those in the first place, but it's really hard to do because of
>> the fact that Vars inside the subquery have different varnos from those
>> outside.  (I've toyed with the idea of globally flattening the rangetable
>> before we start planning, not at the end, but haven't made it happen yet;
>> and anyway that would be only one step towards such a goal.)
> 
> I'm not quite sure why you think the solution I came up with is
> specific to single-member Appends. The solution merely uses
> single-member Append paths as a proxy path for the solution which I've
> tried to make generic to any node type. For example, the patch also
> resolves the issue for MergeAppend, so certainly nothing in there is
> specific to single-member Appends.  I could have made the proxy any
> other path type, it's just that you had suggested Append would be
> better than inventing ProxyPath, which is what I originally proposed.
> 

I think a natural question is how the approach devised in this thread
(based on single-member Append paths) could be used to deal with no-op
Subquery nodes. I don't see an obvious way to do that, and I guess
that's what Tom meant by "specific to single-member Appends".

>> It might be worth looking at whether we couldn't fix the single-member-
>> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c
>> get rid of them.  That's not the most beautiful solution perhaps, but
>> it'd be very localized and low-risk.
> 
> It might be possible, but wouldn't that only solve 1 out of 2
> problems. The problem of the planner not generating the most optimal
> plan is ignored with this. For example, it does not make much sense to
> bolt a Materialize node on top of an IndexScan node in order to
> provide the IndexScan with mark/restore capabilities... They already
> allow that.
> 

Yeah, I think we can't do all the magic in setrefs.c if we want the
decisions to affect plan shape in any significant way - the decision
whether an Append node has only a single node needs to happen while
constructing the path or shortly after it (before we build other paths
on top of it). And before costing the path.

So setrefs.c seems a too late for that :-(

I suppose this limitation was acceptable for no-op Subqueries, but I'm
not sure the same thing applies to single-member Appends.

>> In general setrefs.c is the right place to deal with variable-matching
>> issues.  So even if you don't like that specific idea, it'd probably be
>> worth thinking about handling this by recording instructions telling
>> setrefs what to do, instead of actually doing it at earlier stages.
> 
> From what I can see, setrefs.c is too late. ERRORs are generated
> before setrefs.c gets hold of the plan if we don't replace Vars.
> 
> I'm not opposed to finding a better way to do this.
> 

AFAICS the patch essentially does two things: (a) identifies Append
paths with a single member and (b) ensures the Vars are properly mapped
when the Append node is skipped when creating the plan.

I agree doing (a) in setrefs.c is too late, as mentioned earlier. But
perhaps we could do at least (b) to setrefs.c?

I'd say mapping the Vars is the most fragile and error-prone piece of
this patch. It's a bit annoying that it's inventing a new infrastructure
to do that, translates expressions in various places, establishes new
rules for tlists (having to call finalize_plan_tlist when calling
build_path_tlist before create_plan_recurse), etc.

But we already do such mappings (not exactly the same, but similar) for
other purposes - from setrefs.c. So why can't why do it the same way here?

FWIW I haven't tried to do any of that, and I haven't looked on this
patch for quite a long time, so perhaps I'm missing an obvious obstacle
preventing us from doing that ...


regards

-- 
Tomas 

Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-11-23 Thread David Rowley
On Mon, 2 Jul 2018 at 09:37, David Rowley  wrote:
> Now that the September 'fest is open for new patches I'm going to move
> this patch over there.  This patch has become slightly less important
> than some other stuff, but I'd still like to come back to it.

This had bit-rotted quite a bit, so I've rebased it on top of today's master.

There's not really any other changes to speak of. I've not reevaluated
the patch to see if there is any more simple way to take care of this
problem yet.  However, I do recall getting off to quite a few false
starts with this patch and the way I have it now was the only way I
saw to make it possible. Although, perhaps some more experience will
show me something different.

Anyway, I've attached the rebased version.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v8-0001-Remove-single-subpath-Append-and-MergeAppend-node.patch
Description: Binary data


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-10-01 Thread Michael Paquier
On Mon, Jul 02, 2018 at 09:37:31AM +1200, David Rowley wrote:
> Now that the September 'fest is open for new patches I'm going to move
> this patch over there.  This patch has become slightly less important
> than some other stuff, but I'd still like to come back to it.

Please note that the latest patch does not apply anymore, so I moved it
to CF 2018-11 (already this time of the year!), waiting for your input.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-07-01 Thread David Rowley
Now that the September 'fest is open for new patches I'm going to move
this patch over there.  This patch has become slightly less important
than some other stuff, but I'd still like to come back to it.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-04-01 Thread David Rowley
On 16 March 2018 at 04:01, Tom Lane  wrote:
> I hadn't been paying much attention to this thread, but I've now taken
> a quick look at the 2018-02-19 patch, and I've got to say I do not like
> it much.  The changes in createplan.c in particular seem like hack-and-
> slash rather than anything principled or maintainable.

Thanks for looking at this.  I didn't manage to discover any other
working solutions to when the Vars can be replaced. If we don't do
this in createplan.c then it's going to cause problems in places such
as apply_pathtarget_labeling_to_tlist, which is well before setrefs.c
gets hold of the plan.

> The core issue here is that Paths involving the appendrel and higher
> will contain Vars referencing the appendrel's varno, whereas the child
> is set up to emit Vars containing its own varno, and somewhere we've got
> to match those up.  I don't think though that this problem is exactly
> specific to single-member Appends, and so I would rather we not invent a
> solution that's specific to that.  A nearly identical issue is getting
> rid of no-op SubqueryScan nodes.  I've long wished we could simply not
> emit those in the first place, but it's really hard to do because of
> the fact that Vars inside the subquery have different varnos from those
> outside.  (I've toyed with the idea of globally flattening the rangetable
> before we start planning, not at the end, but haven't made it happen yet;
> and anyway that would be only one step towards such a goal.)

I'm not quite sure why you think the solution I came up with is
specific to single-member Appends. The solution merely uses
single-member Append paths as a proxy path for the solution which I've
tried to make generic to any node type. For example, the patch also
resolves the issue for MergeAppend, so certainly nothing in there is
specific to single-member Appends.  I could have made the proxy any
other path type, it's just that you had suggested Append would be
better than inventing ProxyPath, which is what I originally proposed.

> It might be worth looking at whether we couldn't fix the single-member-
> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c
> get rid of them.  That's not the most beautiful solution perhaps, but
> it'd be very localized and low-risk.

It might be possible, but wouldn't that only solve 1 out of 2
problems. The problem of the planner not generating the most optimal
plan is ignored with this. For example, it does not make much sense to
bolt a Materialize node on top of an IndexScan node in order to
provide the IndexScan with mark/restore capabilities... They already
allow that.

> In general setrefs.c is the right place to deal with variable-matching
> issues.  So even if you don't like that specific idea, it'd probably be
> worth thinking about handling this by recording instructions telling
> setrefs what to do, instead of actually doing it at earlier stages.

>From what I can see, setrefs.c is too late. ERRORs are generated
before setrefs.c gets hold of the plan if we don't replace Vars.

I'm not opposed to finding a better way to do this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-04-01 Thread David Rowley
On 16 March 2018 at 02:46, Robert Haas  wrote:
> If we stick with your idea of using AppendPath, do we actually need
> generate_proxy_paths()?  What paths get lost if we don't have a
> special case for them here?

Actually, we do a surprisingly good job of allowing plan shapes to
stay the same when planning for an only-child scan for an Append or
MergeAppend.  The main difference I did find was that and Append and
MergeAppend don't support Mark and Restore, so if you just generated
the same paths and simply skipped over Appends and MergeAppends you'd
still be left with Materialize nodes which might not actually be
required at all.

This might not be huge, but seeing this made me worried that there
might be some valid reason, if not today, then sometime in the future
why it might not be safe to simply pluck the singleton
Append/MergeAppend nodes out the plan tree.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-03-15 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 15, 2018 at 11:01 AM, Tom Lane  wrote:
>> It might be worth looking at whether we couldn't fix the single-member-
>> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c
>> get rid of them.  That's not the most beautiful solution perhaps, but
>> it'd be very localized and low-risk.

> That's definitely a thought; it's a probably the simplest way of
> saving the run-time cost of the Append node.  However, I don't think
> it's a great solution overall because it doesn't get us the other
> advantages that David mentions in his original post.  I think that to
> gain those advantages we'll need to know at path-creation time that
> there won't ultimately be an Append node in the finished plan.

Meh.  We could certainly know that by inspection ("only one child?
it'll be history").  I remain of the opinion that this is a big patch
with a small patch struggling to get out.

regards, tom lane



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-03-15 Thread Robert Haas
On Thu, Mar 15, 2018 at 11:01 AM, Tom Lane  wrote:
> It might be worth looking at whether we couldn't fix the single-member-
> Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c
> get rid of them.  That's not the most beautiful solution perhaps, but
> it'd be very localized and low-risk.

That's definitely a thought; it's a probably the simplest way of
saving the run-time cost of the Append node.  However, I don't think
it's a great solution overall because it doesn't get us the other
advantages that David mentions in his original post.  I think that to
gain those advantages we'll need to know at path-creation time that
there won't ultimately be an Append node in the finished plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-03-15 Thread Tom Lane
Robert Haas  writes:
> That said, I gather that one problem is the path might contain
> references to child varnos where we need to reference parent varnos.
> That does seem like something we need to handle, but I'm not sure
> whether this is really the right method.  I wonder if we couldn't
> deduce the necessary translations from the parent pointers of the
> paths.  That is, if we've got a proxy path associated with the parent
> rel and the path it's proxying is associated with the child rel, then
> we need to translate from realpath->parent->relids to
> proxypath->parent_relids.

I hadn't been paying much attention to this thread, but I've now taken
a quick look at the 2018-02-19 patch, and I've got to say I do not like
it much.  The changes in createplan.c in particular seem like hack-and-
slash rather than anything principled or maintainable.

The core issue here is that Paths involving the appendrel and higher
will contain Vars referencing the appendrel's varno, whereas the child
is set up to emit Vars containing its own varno, and somewhere we've got
to match those up.  I don't think though that this problem is exactly
specific to single-member Appends, and so I would rather we not invent a
solution that's specific to that.  A nearly identical issue is getting
rid of no-op SubqueryScan nodes.  I've long wished we could simply not
emit those in the first place, but it's really hard to do because of
the fact that Vars inside the subquery have different varnos from those
outside.  (I've toyed with the idea of globally flattening the rangetable
before we start planning, not at the end, but haven't made it happen yet;
and anyway that would be only one step towards such a goal.)

It might be worth looking at whether we couldn't fix the single-member-
Append issue the same way we fix no-op SubqueryScans, ie let setrefs.c
get rid of them.  That's not the most beautiful solution perhaps, but
it'd be very localized and low-risk.

In general setrefs.c is the right place to deal with variable-matching
issues.  So even if you don't like that specific idea, it'd probably be
worth thinking about handling this by recording instructions telling
setrefs what to do, instead of actually doing it at earlier stages.

regards, tom lane



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-03-15 Thread Robert Haas
On Thu, Mar 15, 2018 at 9:22 AM, Robert Haas  wrote:
>> Wouldn't a ProjectionPath just need the same additional translation
>> fields that I've bolted onto AppendPath to make it work properly?
>
> Well, I guess I'm not sure.

Sorry, hit send too soon there.  I'm not sure I entirely understand
the purpose of those changes; I think the comments could use some
work.   For example:

+   /*
+* First we must record the translation expressions in the PlannerInfo.
+* These need to be found when the expression translation is being done
+* when the final plan is being assembled.
+*/

+   /*
+* Record this child as having been promoted.  Some places treat child
+* relations in a special way, and this will give them a VIP ticket to
+* adulthood, where required.
+*/

These comments just recapitulate what the code does, without really
explaining what problem we're trying to solve ("need" for unspecified
reasons, unspecified "special" handling).

That said, I gather that one problem is the path might contain
references to child varnos where we need to reference parent varnos.
That does seem like something we need to handle, but I'm not sure
whether this is really the right method.  I wonder if we couldn't
deduce the necessary translations from the parent pointers of the
paths.  That is, if we've got a proxy path associated with the parent
rel and the path it's proxying is associated with the child rel, then
we need to translate from realpath->parent->relids to
proxypath->parent_relids.

Not that this is an argument of overwhelming import, but note that
ExecSupportsMarkRestore wouldn't need to change if we used
ProjectionPath; that's already got the required handling.

If we stick with your idea of using AppendPath, do we actually need
generate_proxy_paths()?  What paths get lost if we don't have a
special case for them here?

I find promote_child_rel_equivalences() kind of scary.  It seems like
a change that might have unintended and hard-to-predict consequences.
Perhaps that's only my own lack of knowledge.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-03-15 Thread Robert Haas
On Tue, Mar 13, 2018 at 8:20 PM, David Rowley
 wrote:
> On 14 March 2018 at 09:25, Robert Haas  wrote:
>> What do you think about the idea of using a projection path as a proxy
>> path instead of inventing a new method?  It seems simple enough to do:
>>
>> new_path = (Path *) create_projection_path(root, new_rel, old_path,
>> old_path->pathtarget);
>>
>> ...when we need a proxy path.
>
> I'm very open to finding a better way to do this, but does that not
> just handle the targetlist issue? The proxy path also carries
> information which allows the translation of Vars in other places in
> the plan from the old rel into the vars of the new rel. Join
> conditions, sort clauses etc.
>
> Wouldn't a ProjectionPath just need the same additional translation
> fields that I've bolted onto AppendPath to make it work properly?

Well, I guess I'm not sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-03-13 Thread David Rowley
On 14 March 2018 at 09:25, Robert Haas  wrote:
> What do you think about the idea of using a projection path as a proxy
> path instead of inventing a new method?  It seems simple enough to do:
>
> new_path = (Path *) create_projection_path(root, new_rel, old_path,
> old_path->pathtarget);
>
> ...when we need a proxy path.

I'm very open to finding a better way to do this, but does that not
just handle the targetlist issue? The proxy path also carries
information which allows the translation of Vars in other places in
the plan from the old rel into the vars of the new rel. Join
conditions, sort clauses etc.

Wouldn't a ProjectionPath just need the same additional translation
fields that I've bolted onto AppendPath to make it work properly?

I've looked at the projection path code but got a bit confused with
the dummypp field. I see where it gets set, but not where it gets
checked. A comment in create_projection_plan() seems to explain that
dummypp is pretty useless since targetlists are modified sometimes, so
I'm a bit unsure what the point of it is. Maybe that just goes to show
that my understanding of projection paths is not that great.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-03-13 Thread Robert Haas
On Mon, Feb 19, 2018 at 4:02 AM, David Rowley
 wrote:
> On 19 February 2018 at 18:01, David Rowley  
> wrote:
>> On 19 February 2018 at 15:11, Tomas Vondra  
>> wrote:
>>> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual
>>> naming for boolean variables.
>>
>> You're right. I'll change that and post an updated patch. I'll also
>> reprocess your email again and try to improve some comments to make
>> the intent of the code more clear.
>
> I've made a few changes to the patch. "isproxy" is now "is_proxy".
> I've made the comments a bit more verbose at the top of the definition
> of struct AppendPath. Also shuffled some comments around in allpaths.c
>
> I've attached both a delta patch with just these changes, and also a
> completely new patch based on a recent master.

While I was working on
http://postgr.es/m/ca+tgmoakt5gmahbpwgqrr2nadfomaonoxyowhrdvfgws34t...@mail.gmail.com
I noticed that a projection path is very close to being usable as a
proxy path, because we've already got code to strip out unnecessary
proxy paths at plan generation time.  I noticed that there were a few
problems with that which I wrote patches to fix (see 0001, 0002
attached to that email) but they seem to be only minor issues.

What do you think about the idea of using a projection path as a proxy
path instead of inventing a new method?  It seems simple enough to do:

new_path = (Path *) create_projection_path(root, new_rel, old_path,
old_path->pathtarget);

...when we need a proxy path.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-02-19 Thread David Rowley
On 19 February 2018 at 18:01, David Rowley  wrote:
> On 19 February 2018 at 15:11, Tomas Vondra  
> wrote:
>> and perhaps we should do s/isproxy/is_proxy/ which seems like the usual
>> naming for boolean variables.
>
> You're right. I'll change that and post an updated patch. I'll also
> reprocess your email again and try to improve some comments to make
> the intent of the code more clear.

I've made a few changes to the patch. "isproxy" is now "is_proxy".
I've made the comments a bit more verbose at the top of the definition
of struct AppendPath. Also shuffled some comments around in allpaths.c

I've attached both a delta patch with just these changes, and also a
completely new patch based on a recent master.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_singleton_appends_2018-02-19_delta.patch
Description: Binary data


remove_singleton_appends_2018-02-19.patch
Description: Binary data


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-02-18 Thread David Rowley
On 19 February 2018 at 15:11, Tomas Vondra  wrote:
> 1) I can confirm that it indeed eliminates the Append overhead, using
> the example from [1], modified to use table with a single partition. Of
> course, this is a pretty formal check, because the patch simply removes
> the Append node and so it'd be utterly broken if the overhead did not
> disappear too.
>
> [1]
> https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com

Thanks for testing that.

> 2) If I had to guess where the bugs will be, my guess would be a missing
> finalize_plan_tlist call - not necessarily in existing code, but perhaps
> in future improvements.

I agree with that. I'd say though that additionally for the future
that we might end up with some missing translation calls to
replace_translatable_exprs().

To try to ensure I didn't miss anything, I did previously modify the
regression test tables "tenk" and "tenk1" to become partitioned tables
with a single partition which allowed all possible values to try to
ensure I'd not missed anywhere. I just failed to find a reasonable way
to make this a more permanent test without enforcing that change on
the tables as a permanent change.

> I wonder if we could automate this call somehow, say by detecting when
> create_plan_recurse is called after build_path_tlist (for a given path),
> and if needed calling finalize_plan_tlist from create_plan_recurse.

Sounds nice, but I'm unsure how we could do that.

> Although, now that I look at it, promote_child_relation may be doing
> something like that by adding stuff to the translate_* variables.

Do you mean this?

/*
* Record this child as having been promoted.  Some places treat child
* relations in a special way, and this will give them a VIP ticket to
* adulthood, where required.
*/
root->translated_childrelids =
bms_add_members(root->translated_childrelids, child->relids);

That's to get around a problem in use_physical_tlist() where there's
different behaviour for RELOPT_OTHER_MEMBER_REL than there is for
RELOPT_BASEREL. Another option might be to instead change the
RELOPT_OTHER_MEMBER_REL  into  RELOPT_BASEREL, although I was a little
too scared that might result in many other areas requiring being
changed. I may be wrong about that though. We do, for example,
occasionally change a RELOPT_BASEREL into a RELOPT_DEADREL, so
RelOptInfos changing their RelOptKind is not a new concept, so perhaps
it's fine.

> I'm
> not quite sure why we need to append to the lists, though?

Multiple Append rels may be replaced by their only-child relation, so
we'd need to be able to translate the targetlists for both. For
example, joins above the Appends may contain Vars which belong to
either of the Appends and those need to be translated into the
promoted child relation's Vars.

> 3) I'm not quite sure what replace_translatable_exprs does, or more
> precisely why do we actually need it. At this point the comment pretty
> much says "We need to do translation. This function does translation."
> Perhaps it should mention why removing/skipping the AppendPath implies
> the need for translation or something?

It does mention "Expr translation is required to translate the
targetlists of nodes above the Append", but perhaps I can make that a
bit more clear.

Let's say we have a query such as:

SELECT * FROM fact f INNER JOIN dim1 d ON f.dim1_id = d.id WHERE
f.date BETWEEN '2018-01-01' AND '2018-01-31';

Which results in a hash join and a single Jan 2018 partition being
scanned for "fact". A plan without the Append node having been removed
would have the Append targetlist containing the Vars from the "fact"
table, as would the Hash Join node... The problem this function is
trying to solve is translating the Vars in the Hash Join node (or any
node above the Append) to change the "fact" Vars into "fact_jan2018"
Vars. In create_plan.c we skip over any isproxy Append paths and
instead return the recursive call to the single proxy-path. Without
the translation we'd run into problems when trying to find Vars in the
targetlists later.

> 4) The thread talks about "[Merge]Append" but the patch apparently only
> tweaks create_append_path and does absolutely nothing to "merge" case.
> While looking into why, I notices that generate_mergeappend_paths always
> gets all_child_pathkeys=NULL in this case (with single subpath), and so
> we never create the MergePath at all. I suppose there's some condition
> somewhere responsible for doing that, but I haven't found it ...

Yeah, only Append paths are used as proxy paths. The confusion here is
that the path creation logic has also been changed in allpaths.c (see
generate_proxy_paths()), so that we no longer build MergeAppend paths
when there's a single subpath. It's probably just the fact that Append
is being hi-jacked to act as ProxyPath that's causing the confusion
there. If you look over generate_proxy_paths and where it gets called
from, you'll see 

Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-02-18 Thread Tomas Vondra
Hi,

On 01/25/2018 01:55 PM, David Rowley wrote:
> I've attached an updated patch which takes care of the build failure 
> caused by the new call to create_append_path added in bb94ce4.
> 

I've been looking at the patch over the past few days. I haven't found
any obvious issues with it, but I do have a couple of questions.

1) I can confirm that it indeed eliminates the Append overhead, using
the example from [1], modified to use table with a single partition. Of
course, this is a pretty formal check, because the patch simply removes
the Append node and so it'd be utterly broken if the overhead did not
disappear too.

[1]
https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com


2) If I had to guess where the bugs will be, my guess would be a missing
finalize_plan_tlist call - not necessarily in existing code, but perhaps
in future improvements.

I wonder if we could automate this call somehow, say by detecting when
create_plan_recurse is called after build_path_tlist (for a given path),
and if needed calling finalize_plan_tlist from create_plan_recurse.

Although, now that I look at it, promote_child_relation may be doing
something like that by adding stuff to the translate_* variables. I'm
not quite sure why we need to append to the lists, though?


3) I'm not quite sure what replace_translatable_exprs does, or more
precisely why do we actually need it. At this point the comment pretty
much says "We need to do translation. This function does translation."
Perhaps it should mention why removing/skipping the AppendPath implies
the need for translation or something?


4) The thread talks about "[Merge]Append" but the patch apparently only
tweaks create_append_path and does absolutely nothing to "merge" case.
While looking into why, I notices that generate_mergeappend_paths always
gets all_child_pathkeys=NULL in this case (with single subpath), and so
we never create the MergePath at all. I suppose there's some condition
somewhere responsible for doing that, but I haven't found it ...


5) The comment before AppendPath typedef says this:

 * An AppendPath with a single subpath can be set up to become a "proxy"
 * path. This allows a Path which belongs to one relation to be added to
 * the pathlist of some other relation.  This is intended as generic
 * infrastructure, but its primary use case is to allow Appends with
 * only a single subpath to be removed from the final plan.

I'm not quite sure how adding a 'isproxy' flag into AppendPath node
makes this a generic infrastructure? Wouldn't an explicit ProxyPath be a
better way to achieve that?

What other proxy paths do you envision, and why should they be
represented as AppendPath? Although, there seems to be a precedent in
using AppendPath to represent dummy paths ...

FWIW one advantage of a separate ProxyPath would be that it would be a
much clearer breakage for third-party code (say, extensions tweaking
paths using hooks), either at compile or execution time. With just a new
flag in existing node, that may easily slip through. On the other hand
no one promises the structures to be stable API ...


6) I suggest we add this assert to create_append_path:

Assert(!isproxy || (isproxy && (list_length(subpaths)==1)));

and perhaps we should do s/isproxy/is_proxy/ which seems like the usual
naming for boolean variables.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2018-01-25 Thread David Rowley
I've attached an updated patch which takes care of the build failure
caused by the new call to create_append_path added in bb94ce4.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_singleton_appends_2018-01-26.patch
Description: Binary data


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-12-27 Thread David Rowley
I've attached a rebased patch.

The previous patch was conflicting with parallel Hash Join (180428404)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_singleton_appends_2017-12-28.patch
Description: Binary data


Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-12-11 Thread Ashutosh Bapat
On Mon, Dec 11, 2017 at 3:16 PM, David Rowley
 wrote:
>
> Sometime in the future, I'd like to see some sort of UniqueKeys List
> in RelOptInfo which is initially populated by using looking at the
> unique indexes during build_simple_rel(). The idea is that these
> UniqueKeys would get propagated into RELOPT_JOINRELs when the join
> condition matches a set of unique keys on the other side of the join.
> e.g. If the outer side of the join had a UniqueKey matching the join
> condition then we could take all of the UniqueKeys from the RelOptInfo
> for the inner side of the join (and vice versa). This infrastructure
> would allow us to no-op a DISTINCT when the RelOptInfo has targetlist
> items which completely cover at least one UniqueKey set, or not bother
> performing a sort for a GROUP BY when the GROUP BY clause is covered
> by a UniqueKey.
>

That looks neat.

> To fix the case we're discussing, we can also propagate those
> UniqueKeys to an AppendPath with a single subpath similar to how I'm
> propagating the PathKeys for the same case in this patch, that way the
> unique join code would find the UniqueKeys instead of what it does
> today and not find any unique indexes on the partitioned table.
>

Another possibility is to support partition-wise join between an
unpartitioned table and a partitioned table by joining the
unpartitioned table with each partition separately and appending the
results. That's a lot of work and quite some new infrastructure. Each
of those join will pick unique key if appropriate index exists on the
partitions.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-12-11 Thread David Rowley
On 11 December 2017 at 22:23, Ashutosh Bapat
 wrote:
> I think this didn't work with inheritance before partition-wise join
> as well for the same reason.

Yeah, I was just demonstrating a reason that the plans are not
identical from querying a partitioned table when partition pruning
eliminates all but one partition vs directly querying that single
partition. I've been attaching some regression test changes which
demonstrate some other subtle differences along with each version of
the patch. I just wanted to be open with these subtle differences that
I've encountered while working on this.

> Probably, we could do it if unique paths (sorted paths which can be
> unique-ified) bubble up the Append hierarchy. We already have code in
> add_paths_to_append_rel() to create sorted paths when even a single
> child has an index on it.

Sometime in the future, I'd like to see some sort of UniqueKeys List
in RelOptInfo which is initially populated by using looking at the
unique indexes during build_simple_rel(). The idea is that these
UniqueKeys would get propagated into RELOPT_JOINRELs when the join
condition matches a set of unique keys on the other side of the join.
e.g. If the outer side of the join had a UniqueKey matching the join
condition then we could take all of the UniqueKeys from the RelOptInfo
for the inner side of the join (and vice versa). This infrastructure
would allow us to no-op a DISTINCT when the RelOptInfo has targetlist
items which completely cover at least one UniqueKey set, or not bother
performing a sort for a GROUP BY when the GROUP BY clause is covered
by a UniqueKey.

To fix the case we're discussing, we can also propagate those
UniqueKeys to an AppendPath with a single subpath similar to how I'm
propagating the PathKeys for the same case in this patch, that way the
unique join code would find the UniqueKeys instead of what it does
today and not find any unique indexes on the partitioned table.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-12-11 Thread Ashutosh Bapat
On Mon, Dec 11, 2017 at 2:42 PM, David Rowley
 wrote:
> On 11 December 2017 at 21:18, Ashutosh Bapat
>  wrote:
>> On Thu, Dec 7, 2017 at 5:11 AM, David Rowley
>>  wrote:
>>> While rebasing this today I also noticed that we won't properly detect
>>> unique joins in add_paths_to_joinrel() as we're still testing for
>>> uniqueness against the partitioned parent rather than the only child.
>>> This is likely not a huge problem since we'll always get a false
>>> negative and never a false positive, but it is a missing optimisation.
>>> I've not thought of how to solve it yet, it's perhaps not worth going
>>> to too much trouble over.
>>>
>>
> [...]
>
>> Do you have a testcase, which shows the problem?
>
> I do indeed:
>
> create table p (a int not null) partition by range (a);
> create table p1 partition of p for values from (minvalue) to (maxvalue);
> create unique index on p1 (a);
> create table t (a int not null);
>
> insert into p values(1),(2);
> insert into t values(1);
>
> analyze p;
> analyze t;
>
> explain (verbose, costs off) select * from p inner join t on p.a = t.a;
>  QUERY PLAN
> -
>  Nested Loop
>Output: p1.a, t.a
>Join Filter: (p1.a = t.a)
>->  Seq Scan on public.t
>  Output: t.a
>->  Seq Scan on public.p1
>  Output: p1.a
> (7 rows)
>
> explain (verbose, costs off) select * from p1 inner join t on p1.a = t.a;
>  QUERY PLAN
> -
>  Nested Loop
>Output: p1.a, t.a
>Inner Unique: true
>Join Filter: (p1.a = t.a)
>->  Seq Scan on public.t
>  Output: t.a
>->  Seq Scan on public.p1
>  Output: p1.a
> (8 rows)
>
> Notice that when we join to the partitioned table directly and the
> Append is removed, we don't get the "Inner Unique: true"

Ok. I thought we don't use unique joins in partition-wise joins. Sorry
for the misunderstanding.

I think this didn't work with inheritance before partition-wise join
as well for the same reason.

Probably, we could do it if unique paths (sorted paths which can be
unique-ified) bubble up the Append hierarchy. We already have code in
add_paths_to_append_rel() to create sorted paths when even a single
child has an index on it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-12-11 Thread David Rowley
On 11 December 2017 at 21:18, Ashutosh Bapat
 wrote:
> On Thu, Dec 7, 2017 at 5:11 AM, David Rowley
>  wrote:
>> While rebasing this today I also noticed that we won't properly detect
>> unique joins in add_paths_to_joinrel() as we're still testing for
>> uniqueness against the partitioned parent rather than the only child.
>> This is likely not a huge problem since we'll always get a false
>> negative and never a false positive, but it is a missing optimisation.
>> I've not thought of how to solve it yet, it's perhaps not worth going
>> to too much trouble over.
>>
>
[...]

> Do you have a testcase, which shows the problem?

I do indeed:

create table p (a int not null) partition by range (a);
create table p1 partition of p for values from (minvalue) to (maxvalue);
create unique index on p1 (a);
create table t (a int not null);

insert into p values(1),(2);
insert into t values(1);

analyze p;
analyze t;

explain (verbose, costs off) select * from p inner join t on p.a = t.a;
 QUERY PLAN
-
 Nested Loop
   Output: p1.a, t.a
   Join Filter: (p1.a = t.a)
   ->  Seq Scan on public.t
 Output: t.a
   ->  Seq Scan on public.p1
 Output: p1.a
(7 rows)

explain (verbose, costs off) select * from p1 inner join t on p1.a = t.a;
 QUERY PLAN
-
 Nested Loop
   Output: p1.a, t.a
   Inner Unique: true
   Join Filter: (p1.a = t.a)
   ->  Seq Scan on public.t
 Output: t.a
   ->  Seq Scan on public.p1
 Output: p1.a
(8 rows)

Notice that when we join to the partitioned table directly and the
Append is removed, we don't get the "Inner Unique: true"

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-12-11 Thread Ashutosh Bapat
On Thu, Dec 7, 2017 at 5:11 AM, David Rowley
 wrote:
>
> While rebasing this today I also noticed that we won't properly detect
> unique joins in add_paths_to_joinrel() as we're still testing for
> uniqueness against the partitioned parent rather than the only child.
> This is likely not a huge problem since we'll always get a false
> negative and never a false positive, but it is a missing optimisation.
> I've not thought of how to solve it yet, it's perhaps not worth going
> to too much trouble over.
>

It's only the jointrelids that are parent's relids, but all the
following tests for unique join use RelOptInfo::relids which are child
relids.

case JOIN_UNIQUE_INNER:
extra.inner_unique = bms_is_subset(sjinfo->min_lefthand,
   outerrel->relids);
break;
case JOIN_UNIQUE_OUTER:
extra.inner_unique = innerrel_is_unique(root,
outerrel->relids,
innerrel,
JOIN_INNER,
restrictlist,
false);
break;
default:
extra.inner_unique = innerrel_is_unique(root,
outerrel->relids,
innerrel,
jointype,
restrictlist,
false);
break;

Do you have a testcase, which shows the problem?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-11-29 Thread David Rowley
On 30 November 2017 at 15:34, Michael Paquier 
wrote:

> On Wed, Nov 15, 2017 at 3:17 PM, David Rowley
>  wrote:
> > The remove_singleton_appends_examples_of_differences_2017-11-15.patch
> > which I've attached applies changes to the regression tests to make
> > many of the major tables partitioned tables with a single partition.
> > It also changes the expected results of the small insignificant plan
> > changes and leaves only the actual true differences. This file is not
> > intended for commit, but more just a demo of it working for a larger
> > variety of plan shapes. There is actually no additional tests with the
> > patch. It relies on the places in the existing tests which have
> > changed.  I didn't add any extra as I can't think of any particular
> > area to target given that it's such a generic thing that can apply to
> > so many different cases.
>
> The last patch set does not apply and did not get any reviews. So I am
> moving this item to the next with waiting on author as status.
>

(returning from 2 weeks leave)

Thanks for managing the commitfest.

I've attached a patch which fixes the conflict with the regression test
expected output in inherits.out. I've also made changes to the expected
output in the new partition_prune test's expected output.

I'll set this back to waiting on review now.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


remove_append_rel_2017-11-30.patch
Description: Binary data