Re: UNION ALL

2019-08-16 Thread Tom Lane
Mark Pasterkamp  writes:
> I am comparing two queries, q1 and q2 respectively.
> Query q1 is the original query and q2 is an attempt to reduce the cost of
> execution via leveraging the materialized view ci_t_15.
> ...
> Running explain analyze on both queries I get the following execution plans.

Huh ... I wonder why the planner decided to try to parallelize Q2 (and not
Q1)?  That seems like a pretty stupid choice, because if I'm reading the
plan correctly (I might not be) each worker process has to read all of
the "title" table and build its own copy of the hash table.  That seems
likely to swamp whatever performance gain might come from parallelizing
the scan of cast_info --- which is likely to be not much, anyway, on a
laptop with probably-not-great disk I/O bandwidth.

In any case, whether that decision was good or bad, making it differently
renders the performance of Q1 and Q2 not very comparable.  It'd be worth
disabling parallelism (SET max_parallel_workers_per_gather = 0) and
retrying Q2 to get a more apples-to-apples comparison.

Another bit of advice is to increase work_mem, so the hashes don't
have to be split into quite so many batches.

I'm noting also that your queries aren't giving the same results ---
Q2 reports returning fewer rows overall.  Do you have rows where
title.production_year is null, perhaps?

regards, tom lane




Re: UNION ALL

2019-08-16 Thread Mark Pasterkamp
First of all, thank you for the replies.

I am using a base installation of postgres 10.10, with no modifications to
any of the system defaults.

I am trying to speedup a join between two tables: the title table and the
cast_info table.

The title table is a table containing information about different movies.
it contains 4626969 records.
the table also has a foreign key index on the cast_info table, enabling the
planner to use a hash-join.

The cast_info table is a table containing the information of which actor
was casted in which movie and contains 62039343 records.

The database also contains a materialized view ci_t_15, defined as:
select * from cast_info join title on cast_info.movie_id = title.id
where title.production_year < 2015

I am comparing two queries, q1 and q2 respectively.
Query q1 is the original query and q2 is an attempt to reduce the cost of
execution via leveraging the materialized view ci_t_15.

Query q1 is defined as:
select * from cast_info join title on cast_info.movie_id = title.id

Query q2 is defined as
select * from cast_info join title on cast_info.movie_id = title.id
where title.production_year >= 2015
UNION ALL
select * from ci_t_15

Both queries are executed on a Dell xps laptop with an I7-8750H processor
and 16 (2*8) gb ram on an SSD running on ubuntu 18.04.2 LTS.

Running explain analyze on both queries I get the following execution plans.
q1:
"Hash Join  (cost=199773.80..2561662.10 rows=62155656 width=103) (actual
time=855.063..25786.264 rows=62039343 loops=1)"
"  Hash Cond: (cast_info.ci_movie_id = title.t_id)"
"  ->  Seq Scan on cast_info  (cost=0.00..1056445.56 rows=62155656
width=42) (actual time=0.027..3837.722 rows=62039343 loops=1)"
"  ->  Hash  (cost=92232.69..92232.69 rows=4626969 width=61) (actual
time=854.548..854.548 rows=4626969 loops=1)"
"Buckets: 65536  Batches: 128  Memory Usage: 3431kB"
"->  Seq Scan on title  (cost=0.00..92232.69 rows=4626969 width=61)
(actual time=0.005..327.588 rows=4626969 loops=1)"
"Planning time: 5.097 ms"
"Execution time: 27236.088 ms"

q2:
"Append  (cost=123209.65..3713445.65 rows=61473488 width=105) (actual
time=442.207..29713.621 rows=60918189 loops=1)"
"  ->  Gather  (cost=123209.65..2412792.77 rows=10639784 width=103) (actual
time=442.206..14634.427 rows=10046633 loops=1)"
"Workers Planned: 2"
"Workers Launched: 2"
"->  Hash Join  (cost=122209.65..1347814.37 rows=4433243 width=103)
(actual time=471.969..12527.840 rows=3348878 loops=3)"
"  Hash Cond: (cast_info.ci_movie_id = title.t_id)"
"  ->  Parallel Seq Scan on cast_info  (cost=0.00..693870.90
rows=25898190 width=42) (actual time=0.006..7302.679 rows=20679781 loops=3)"
"  ->  Hash  (cost=103800.11..103800.11 rows=792043 width=61)
(actual time=471.351..471.351 rows=775098 loops=3)"
"Buckets: 65536  Batches: 32  Memory Usage: 2515kB"
"->  Seq Scan on title  (cost=0.00..103800.11
rows=792043 width=61) (actual time=0.009..376.127 rows=775098 loops=3)"
"  Filter: (t_production_year >= 2015)"
"  Rows Removed by Filter: 3851871"
"  ->  Seq Scan on ci_t_15  (cost=0.00..1194255.04 rows=50833704 width=105)
(actual time=1.143..11967.391 rows=50871556 loops=1)"
"Planning time: 0.268 ms"
"Execution time: 31379.854 ms"

Due to using the materialized view I can reduce the amount of records going
into the hash join, lowering the time from 25786.264 msec to 12527.840 msec.
However, this is where my question comes in, this reduction is completely
negated by the cost of appending both results in the UNION ALL command.
I was wondering if this is normal behaviour.
In my mind, I wouldn't expect appending 2 resultsets to have such a
relative huge cost associated with it.
This is also why I asked what exactly a UNION ALL does to achieve its
functionality, to perhaps gain some insight in its cost.


With kind regards,

Mark

On Thu, 15 Aug 2019 at 21:22, Ibrar Ahmed  wrote:

>
>
> On Fri, Aug 16, 2019 at 12:16 AM <066ce...@free.fr> wrote:
>
>> Generally speaking, when executing UNION ; a DISTINCT is run afterward on
>> the resultset.
>>
>> So, if you're sure that each part of UNION cannot return a line returned
>> by another one, you may use UNION ALL, you'll cut the cost of the final
>> implicit DISTINCT.
>>
>>
>> - Mail original -
>> De: "Mark Pasterkamp" 
>> À: pgsql-hackers@lists.postgresql.org
>> Envoyé: Jeudi 15 Août 2019 20:37:06
>> Objet: UNION ALL
>>
>>
>> Dear all,
>>
>>
>> I was wondering if someone could hel

Re: UNION ALL

2019-08-15 Thread Ibrar Ahmed
On Fri, Aug 16, 2019 at 12:16 AM <066ce...@free.fr> wrote:

> Generally speaking, when executing UNION ; a DISTINCT is run afterward on
> the resultset.
>
> So, if you're sure that each part of UNION cannot return a line returned
> by another one, you may use UNION ALL, you'll cut the cost of the final
> implicit DISTINCT.
>
>
> - Mail original -
> De: "Mark Pasterkamp" 
> À: pgsql-hackers@lists.postgresql.org
> Envoyé: Jeudi 15 Août 2019 20:37:06
> Objet: UNION ALL
>
>
> Dear all,
>
>
> I was wondering if someone could help me understands what a union all
> actually does.
>
>
> For my thesis I am using Apache Calcite to rewrite queries into using
> materialized views which I then give to a Postgres database.
> For some queries, this means that they will be rewritten in a UNION ALL
> style query between an expression and a table scan of a materialized view.
> However, contrary to what I expected, the UNION ALL query is actually a
> lot slower.
>
>
> As an example, say I have 2 tables: actor and movie. Furthermore, there is
> also a foreign key index on movie to actor.
> I also have a materialized view with the join of these 2 tables for all
> movies <= 2015 called A.
> Now, if I want to query all entries in the join between actor and movie, I
> would assume that a UNION ALL between the join of actor and movie for
> movies >2015 and A is faster than executing the original query..
> If I look at the explain analyze part, I can certainly see a reduction in
> cost up until the UNION ALL part, which carries a respective cost more than
> negating the cost reduction up to a point where I might as well not use the
> existing materialized view.
>
>
> I have some trouble understanding this phenomenon.
> One thought which came to my mind was that perhaps UNION ALL might create
> a temporary table containing both result sets, and then do a table scan and
> return that result.
>
> this would greatly increase IO cost which could attribute to the problem.
> However, I am really not sure what UNION ALL actually does to append both
> result sets so I was wondering if someone would be able to help me out with
> this.
>
>
>
>
> Mark
>
>
> 066ce...@free.fr:  Please, avoid top-posting. It makes harder to follow
the
discussion.

-- 
Ibrar Ahmed


Re: UNION ALL

2019-08-15 Thread 066ce286
Generally speaking, when executing UNION ; a DISTINCT is run afterward on the 
resultset.

So, if you're sure that each part of UNION cannot return a line returned by 
another one, you may use UNION ALL, you'll cut the cost of the final implicit 
DISTINCT.


- Mail original -
De: "Mark Pasterkamp" 
À: pgsql-hackers@lists.postgresql.org
Envoyé: Jeudi 15 Août 2019 20:37:06
Objet: UNION ALL


Dear all, 


I was wondering if someone could help me understands what a union all actually 
does. 


For my thesis I am using Apache Calcite to rewrite queries into using 
materialized views which I then give to a Postgres database. 
For some queries, this means that they will be rewritten in a UNION ALL style 
query between an expression and a table scan of a materialized view. 
However, contrary to what I expected, the UNION ALL query is actually a lot 
slower. 


As an example, say I have 2 tables: actor and movie. Furthermore, there is also 
a foreign key index on movie to actor. 
I also have a materialized view with the join of these 2 tables for all movies 
<= 2015 called A. 
Now, if I want to query all entries in the join between actor and movie, I 
would assume that a UNION ALL between the join of actor and movie for movies 
>2015 and A is faster than executing the original query.. 
If I look at the explain analyze part, I can certainly see a reduction in cost 
up until the UNION ALL part, which carries a respective cost more than negating 
the cost reduction up to a point where I might as well not use the existing 
materialized view. 


I have some trouble understanding this phenomenon. 
One thought which came to my mind was that perhaps UNION ALL might create a 
temporary table containing both result sets, and then do a table scan and 
return that result. 

this would greatly increase IO cost which could attribute to the problem. 
However, I am really not sure what UNION ALL actually does to append both 
result sets so I was wondering if someone would be able to help me out with 
this. 




Mark




Re: UNION ALL

2019-08-15 Thread Tom Lane
Mark Pasterkamp  writes:
> I was wondering if someone could help me understands what a union all
> actually does.

Generally speaking, it runs the first query and then the second query.
You'd really need to provide a lot more detail for anyone to say more
than that.

https://wiki.postgresql.org/wiki/Slow_Query_Questions

regards, tom lane




UNION ALL

2019-08-15 Thread Mark Pasterkamp
Dear all,

I was wondering if someone could help me understands what a union all
actually does.

For my thesis I am using Apache Calcite to rewrite queries into using
materialized views which I then give to a Postgres database.
For some queries, this means that they will be rewritten in a UNION ALL
style query between an expression and a table scan of a materialized view.
However, contrary to what I expected, the UNION ALL query is actually a lot
slower.

As an example, say I have 2 tables: actor and movie. Furthermore, there is
also a foreign key index on movie to actor.
I also have a materialized view with the join of these 2 tables for all
movies <= 2015 called A.
Now, if I want to query all entries in the join between actor and movie, I
would assume that a UNION ALL between the join of actor and movie for
movies >2015 and A is faster than executing the original query..
If I look at the explain analyze part, I can certainly see a reduction in
cost up until the UNION ALL part, which carries a respective cost more than
negating the cost reduction up to a point where I might as well not use the
existing materialized view.

I have some trouble understanding this phenomenon.
One thought which came to my mind was that perhaps UNION ALL might create a
temporary table containing both result sets, and then do a table scan and
return that result.
this would greatly increase IO cost which could attribute to the problem.
However, I am really not sure what UNION ALL actually does to append both
result sets so I was wondering if someone would be able to help me out with
this.


Mark


Re: Postgres 10 problem with UNION ALL of null value in "subselect"

2018-04-18 Thread Pavel Stehule
2018-04-19 5:01 GMT+02:00 Kyotaro HORIGUCHI :

> At Mon, 16 Apr 2018 18:39:24 +0530, Ashutosh Bapat <
> ashutosh.ba...@enterprisedb.com> wrote in 

Re: Postgres 10 problem with UNION ALL of null value in "subselect"

2018-04-18 Thread Kyotaro HORIGUCHI
At Mon, 16 Apr 2018 18:39:24 +0530, Ashutosh Bapat 
 wrote in 

Re: Postgres 10 problem with UNION ALL of null value in "subselect"

2018-04-16 Thread Ashutosh Bapat
On Mon, Apr 16, 2018 at 4:10 PM, Martin Swiech <martin.swi...@gmail.com> wrote:
> Hi folks,
>
> I got some complex query which works on PostgreSQL 9.6 , but fails on
> PostgreSQL 10.
>
> Version of PostgreSQL:
> PostgreSQL 10.3 on x86_64-apple-darwin14.5.0, compiled by Apple LLVM version
> 7.0.0 (clang-700.1.76), 64-bit
>
> Simplified core of the problematic query looks like this:
> ```
> select * from (
>select 1::integer as a
> ) t1
> union all
> select * from (
>select null as a
> ) t2;
> ```
>
> It fails with this error message:
> ```
> ERROR:  UNION types integer and text cannot be matched
> LINE 5: select * from (
>^
> SQL state: 42804
> Character: 66
> ```
>

The error disappears if we go one commit before
1e7c4bb0049732ece651d993d03bb6772e5d281a, the error disappears. But
that's I think expected with that commit.

We can work around this problem by casting null to integer like null::integer.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Postgres 10 problem with UNION ALL of null value in "subselect"

2018-04-16 Thread Martin Swiech
Hi folks,

I got some complex query which works on PostgreSQL 9.6 , but fails on
PostgreSQL 10.

Version of PostgreSQL:
PostgreSQL 10.3 on x86_64-apple-darwin14.5.0, compiled by Apple LLVM
version 7.0.0 (clang-700.1.76), 64-bit

Simplified core of the problematic query looks like this:
```
select * from (
   select 1::integer as a
) t1
union all
select * from (
   select null as a
) t2;
```

It fails with this error message:
```
ERROR:  UNION types integer and text cannot be matched
LINE 5: select * from (
   ^
SQL state: 42804
Character: 66
```


It worked on PostgreSQL 9.6.


Query without wrapping subselects (t1 and t2) works on both versions of
PostgreSQL (9.6 and 10) well:
```
select 1::integer as a
union all
select null as a;
```


Is there some new optimization of query processing in PostgreSQL 10, which
needs some "early type determination", but named subselects (t1 and t2)
shades the type from first query?

Or could it be some regression bug?

Thanks for answer.

Martin Swiech


Re: parallel append vs. simple UNION ALL

2018-03-22 Thread Robert Haas
On Mon, Mar 19, 2018 at 11:57 AM, Robert Haas  wrote:
> On Fri, Mar 16, 2018 at 7:35 AM, Rajkumar Raghuwanshi
>  wrote:
>> On Fri, Mar 9, 2018 at 1:04 AM, Robert Haas  wrote:
>>> Great.  Committed 0001.  Are you planning any further testing of this
>>> patch series?
>>
>> Sorry I missed the mail.
>> Yes, I have further tested patches and find no more issues.
>
> OK, thanks to both you and Ashutosh Bapat.  Committed 0002 and 0003.

And now committed 0004.  This area could use significantly more work,
but I think it's better to have this much than not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: parallel append vs. simple UNION ALL

2018-03-19 Thread Robert Haas
On Fri, Mar 16, 2018 at 7:35 AM, Rajkumar Raghuwanshi
 wrote:
> On Fri, Mar 9, 2018 at 1:04 AM, Robert Haas  wrote:
>> Great.  Committed 0001.  Are you planning any further testing of this
>> patch series?
>
> Sorry I missed the mail.
> Yes, I have further tested patches and find no more issues.

OK, thanks to both you and Ashutosh Bapat.  Committed 0002 and 0003.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: parallel append vs. simple UNION ALL

2018-03-16 Thread Rajkumar Raghuwanshi
On Fri, Mar 9, 2018 at 1:04 AM, Robert Haas  wrote:

> Great.  Committed 0001.  Are you planning any further testing of this
> patch series?


Sorry I missed the mail.
Yes, I have further tested patches and find no more issues.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: parallel append vs. simple UNION ALL

2018-03-16 Thread Ashutosh Bapat
On Wed, Mar 14, 2018 at 2:09 AM, Robert Haas  wrote:
> On Tue, Mar 13, 2018 at 12:35 AM, Ashutosh Bapat
>  wrote:
>> It looks like it was not changed in all the places. make falied. I
>> have fixed all the instances of these two functions in the attached
>> patchset (only 0003 changes). Please check.
>
> Oops.  Thanks.
>
> I'm going to go ahead and commit 0001 here.  Any more thoughts on the rest?

Nope. I am good with the patchset.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: parallel append vs. simple UNION ALL

2018-03-13 Thread Robert Haas
On Tue, Mar 13, 2018 at 12:35 AM, Ashutosh Bapat
 wrote:
> It looks like it was not changed in all the places. make falied. I
> have fixed all the instances of these two functions in the attached
> patchset (only 0003 changes). Please check.

Oops.  Thanks.

I'm going to go ahead and commit 0001 here.  Any more thoughts on the rest?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: parallel append vs. simple UNION ALL

2018-03-12 Thread Ashutosh Bapat
On Fri, Mar 9, 2018 at 1:28 AM, Robert Haas <robertmh...@gmail.com> wrote:
>
>> 0003
>> Probably we want to rename generate_union_path() as generate_union_rel() or
>> generate_union_paths() since the function doesn't return a path anymore.
>> Similarly for generate_nonunion_path().
>
> Good point.  Changed.

It looks like it was not changed in all the places. make falied. I
have fixed all the instances of these two functions in the attached
patchset (only 0003 changes). Please check.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
From 7e654c0ae30d867edea5a1a2ca8f7a17b05ed7c5 Mon Sep 17 00:00:00 2001
From: Robert Haas <rh...@postgresql.org>
Date: Fri, 23 Feb 2018 11:53:07 -0500
Subject: [PATCH 1/4] Let Parallel Append over simple UNION ALL have partial
 subpaths.

A simple UNION ALL gets flattened into an appendrel of subquery
RTEs, but up until now it's been impossible for the appendrel to use
the partial paths for the subqueries, so we can implement the
appendrel as a Parallel Append but only one with non-partial paths
as children.

There are three separate obstacles to removing that limitation.
First, when planning a subquery, propagate any partial paths to the
final_rel so that they are potentially visible to outer query levels
(but not if they have initPlans attached, because that wouldn't be
safe).  Second, after planning a subquery, propagate any partial paths
for the final_rel to the subquery RTE in the outer query level in the
same way we do for non-partial paths.  Third, teach finalize_plan() to
account for the possibility that the fake parameter we use for rescan
signalling when the plan contains a Gather (Merge) node may be
propagated from an outer query level.

Patch by me, reviewed and tested by Amit Khandekar and Rajkumar
Raghuwanshi.  Test cases based on examples by Rajkumar Raghuwanshi.

Discussion: http://postgr.es/m/ca+tgmoa6l9a1nnck3atdvzlz4kkhdn1+tm7mfyfvp+uqps7...@mail.gmail.com
---
 src/backend/optimizer/path/allpaths.c |   22 +
 src/backend/optimizer/plan/planner.c  |   16 ++
 src/backend/optimizer/plan/subselect.c|   17 ++-
 src/test/regress/expected/select_parallel.out |   65 +
 src/test/regress/sql/select_parallel.sql  |   25 ++
 5 files changed, 143 insertions(+), 2 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1c792a0..ea4e683 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2179,6 +2179,28 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
  create_subqueryscan_path(root, rel, subpath,
 		  pathkeys, required_outer));
 	}
+
+	/* If consider_parallel is false, there should be no partial paths. */
+	Assert(sub_final_rel->consider_parallel ||
+		   sub_final_rel->partial_pathlist == NIL);
+
+	/* Same for partial paths. */
+	foreach(lc, sub_final_rel->partial_pathlist)
+	{
+		Path	   *subpath = (Path *) lfirst(lc);
+		List	   *pathkeys;
+
+		/* Convert subpath's pathkeys to outer representation */
+		pathkeys = convert_subquery_pathkeys(root,
+			 rel,
+			 subpath->pathkeys,
+			 make_tlist_from_pathtarget(subpath->pathtarget));
+
+		/* Generate outer path using this subpath */
+		add_partial_path(rel, (Path *)
+		 create_subqueryscan_path(root, rel, subpath,
+  pathkeys, required_outer));
+	}
 }
 
 /*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 24e6c46..66e7e7b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2195,6 +2195,22 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 	}
 
 	/*
+	 * Generate partial paths for final_rel, too, if outer query levels might
+	 * be able to make use of them.
+	 */
+	if (final_rel->consider_parallel && root->query_level > 1 &&
+		!limit_needed(parse))
+	{
+		Assert(!parse->rowMarks && parse->commandType == CMD_SELECT);
+		foreach(lc, current_rel->partial_pathlist)
+		{
+			Path	   *partial_path = (Path *) lfirst(lc);
+
+			add_partial_path(final_rel, partial_path);
+		}
+	}
+
+	/*
 	 * If there is an FDW that's responsible for all baserels of the query,
 	 * let it consider adding ForeignPaths.
 	 */
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index dc86dd5..83008d7 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2202,6 +2202,13 @@ SS_charge_for_initplans(PlannerInfo *root, RelOptInfo *final_rel)
 		path->parallel_safe = false;
 	}
 
+	/*
+	 * Forget about any partial paths and clear consider_parallel, too;
+	 * they're not usable if we attached an initPlan.
+	 */
+	final_rel->partial_pathlist = NIL;
+	final_rel->consider_parallel = f

Re: parallel append vs. simple UNION ALL

2018-03-08 Thread Robert Haas
On Thu, Mar 1, 2018 at 8:30 AM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
> The patches look clean. I particularly looked at 0003.
>
> patch 0001
> +/*
> + * Generate partial paths for final_rel, too, if outer query levels might
> + * be able to make use of them.
> + */
> I am not able to understand the construct esp. the if clause. Did you want to
> say "... if there are outer query levels. Those might ..." or something like
> that?

Well, that's what I meant, but I didn't think it was necessary to
spell it out in quite that much detail.

> 0002
> (op->all == top_union->all || op->all) &&
> This isn't really your change.  Checking
> op->all is cheaper than checking equality, so may be we should check that 
> first
> and take advantage of short-circuit condition evaluation. If we do that above
> condition reduces to (op->all || !top_union->all) which is two boolean
> conditions, even cheaper. But may be the second optimization is not worth the
> loss of readability.

I doubt this makes any difference.  The compiler should be smart
enough to do this the same way regardless of exactly how we write it,
and if it's not, it can't make more than a few instructions worth of
difference.  This code is not nearly performance-critical enough for
that to matter.  Also, it's not the job of this patch to whack this
around.

> "identically-propertied UNIONs" may be "UNIONs with identical properties".

Likewise, I didn't write those words, so I don't plan to change them
just because I might have written them differently.

> 0003
> Probably we want to rename generate_union_path() as generate_union_rel() or
> generate_union_paths() since the function doesn't return a path anymore.
> Similarly for generate_nonunion_path().

Good point.  Changed.

> In recurse_set_operations()
> -return NULL;/* keep compiler quiet */
> This line is deleted and instead rel is initialized to NULL. That way we loose
> any chance to detect a future bug because of a block leaving rel uninitialized
> through compiler warning. May be we should replace "return NULL" with "rel =
> NULL", which will not be executed because of the error.

I don't think this is really a problem.  If some code path fails to
initialize rel, it's going to crash when postprocess_setop_rel() calls
set_cheapest().  Any warning the compiler gives is more likely to be a
false-positive than an actual problem.

> +/* Build path list and relid set. */
> +foreach(lc, rellist)
> +{
> With the changes in this patch, we could actually use 
> add_paths_to_append_rel()
> to create an append path. That function builds paths with different pathkeys,
> parameterization (doesn't matter here) and also handles parallel append. So we
> can avoid code duplication and also leverage more optimizations like using
> MergeAppend instead of overall sort etc. But that function doesn't have 
> ability
> to add a final node like make_union_unique(). A similar requirement has arisen
> in partition-wise join where we need to add a final node for finalising
> aggregate on top of paths created by add_paths_to_append_rel().  May be we can
> change that function to return a list of paths, which are then finalized by 
> the
> caller and added to "append" rel. But I don't think doing all that is in the
> scope of this patch set.

Yeah, I thought about all of that and came to similar conclusions.

> 0004
> +if (!op->all)
> +ppath = make_union_unique(op, ppath, tlist, root);
> We could probably push the grouping/sorting down to the parallel workers. But
> again not part of this patchset.

Yep.  There's a lot more work that could be done to improve setop
planning, but I think getting even this much done for v11 would be a
significant step forward.

Updated patches attached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL CompanyOn Thu, Mar 1, 2018 at 8:30 AM, Ashutosh Bapat
mailto:ashutosh.ba...@enterprisedb.com;
target="_blank">ashutosh.ba...@enterprisedb.com
wrote:On
Sat, Feb 24, 2018 at 2:55 AM, Robert Haas mailto:robertmh...@gmail.com;>robertmh...@gmail.com
wrote:


 Here's an extended series of patches that
now handles both the simple
 UNION ALL case (where we flatten it) and the unflattened case:


The patches look clean. I particularly looked at 0003.

patch 0001
+  /*
+  * Generate partial paths for final_rel, too, if
outer query levels might
+  * be able to make use of them.
+  */
I am not able to understand the construct esp. the if clause. Did you
want to
say "... if there are outer query levels. Those might ..." or something like
that?

0002
(op-all ==
top_union-all || op-all) 
This 

Re: parallel append vs. simple UNION ALL

2018-03-08 Thread Robert Haas
On Thu, Mar 8, 2018 at 2:46 AM, Rajkumar Raghuwanshi
 wrote:
> On Thu, Mar 8, 2018 at 12:27 AM, Robert Haas  wrote:
>>
>> New patches attached, fixing all 3 of the issues you reported:
>
> Thanks. new patches applied cleanly on head and fixing all reported issue.

Great.  Committed 0001.  Are you planning any further testing of this
patch series?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: parallel append vs. simple UNION ALL

2018-03-07 Thread Rajkumar Raghuwanshi
On Thu, Mar 8, 2018 at 12:27 AM, Robert Haas  wrote:

> New patches attached, fixing all 3 of the issues you reported:
>

Thanks. new patches applied cleanly on head and fixing all reported issue.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: parallel append vs. simple UNION ALL

2018-03-07 Thread Robert Haas
On Wed, Mar 7, 2018 at 5:36 AM, Rajkumar Raghuwanshi
<rajkumar.raghuwan...@enterprisedb.com> wrote:
> With 0001 applied on PG-head, I got reference leak warning and later a
> server crash.
> this crash is reproducible with enable_parallel_append=off also.
> below is the test case to reproduce this.

New patches attached, fixing all 3 of the issues you reported:

0001 is a new patch to fix the incorrect parallel safety marks on
upper relations.  I don't know of a visible effect of this patch by
itself, but there might be one.

0002 is the same as the old 0001, but I made a fix in
SS_charge_for_initplans() which fixed your most recent crash report.
Either this or the previous change also fixed the crash you saw when
using tab-completion.  Also, I added some test cases based on your
failing examples.

0003-0005 are the same as the old 0002-0004.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0005-Consider-Parallel-Append-as-a-way-to-implement-a-uni.patch
Description: Binary data


0004-Generate-a-separate-upper-relation-for-each-stage-of.patch
Description: Binary data


0003-Rewrite-recurse_union_children-to-iterate-rather-tha.patch
Description: Binary data


0002-Let-Parallel-Append-over-simple-UNION-ALL-have-parti.patch
Description: Binary data


0001-Correctly-assess-parallel-safety-of-tlists-when-SRFs.patch
Description: Binary data


Re: parallel append vs. simple UNION ALL

2018-03-05 Thread Robert Haas
On Tue, Feb 27, 2018 at 6:21 AM, Rajkumar Raghuwanshi
 wrote:
> I have applied 0001 on pg-head, and while playing with regression tests, it
> crashed with below test case.
>
> postgres=# SET min_parallel_table_scan_size=0;
> SET
> postgres=# SELECT * FROM information_schema.foreign_data_wrapper_options
> ORDER BY 1, 2, 3;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

Hmm, nice catch.  I think part of the problem here is that commit
69f4b9c85f168ae006929eec44fc44d569e846b9, wherein Andres introduced
the ProjectSet node, didn't really do the right thing in terms of
testing parallel-safety.  Before that commit, is_parallel_safe(root,
(Node *) scanjoin_target->exprs)) was really testing whether the tlist
produced during the scan/join phase was parallel-safe.  However, after
that commit, scanjoin_target->exprs wasn't really the final target for
the scan/join phase any more; instead, it was the first of possibly
several targets computed by split_pathtarget_at_srfs().  Really, the
right thing to do is to test the *last* element in that list for
parallel-safety, but as the code stands we end up testing the *first*
element.  So, if there's a parallel-restricted item in the target list
(this query ends up putting a CoerceToDomain in the target list, which
we currently consider parallel-restricted), it looks we can
nevertheless end up trying to project it in what is supposed to be a
partial path.

There are a large number of cases where this doesn't end up mattering
because the next upper_rel created will not get marked
consider_parallel because its target list will also contain the same
parallel-restricted construct, and therefore the partial paths
generated for the scan/join rel will never get used -- except to
generate Gather/Gather Merge paths, which has already happened; but
that step didn't know about the rest of the scan/join targets either,
so it won't have used them.  However, both create_distinct_paths() and
the code in grouping_planner that populates final_rel assume that they
don't need to retest the target for parallel-safety because no
projection is done at those levels; they just inherit the
parallel-safety marking of the input rel, so in those cases if the
input rel's marking is wrong the result is populated upward.

There's another way final_rel->consider_parallel can be wrong, too: if
the FROM-list is empty, then we create a join rel and set its
consider_parallel flag without regard to the parallel-safety of the
target list.  There are comments in query_planner() says that this
will be dealt with "later", but this seems not to be true. (Before
Tom's commit da1c91631e3577ea5818f855ebb5bd206d559006, the comments
simply ignored the question of whether a check was needed here, but
Tom seems to have inserted an incorrect justification for the
already-wrong code.)

I'm not sure to what degree, if at all, any of these problems are
visible given that we don't use final_rel->consider_parallel for much
of anything.  Certainly, it gets much easier to trigger a problem with
0001 applied, as the test case shows.  But I'm not entirely convinced
that there's no problem even without that.  It seems like every upper
rel that is setting its consider_parallel flag based on the first
element of some list of targets rather than the last is potentially
vulnerable to ending up with the wrong answer, and I'm afraid that
might have some adverse consequence that I haven't quite pinned down
yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: parallel append vs. simple UNION ALL

2018-03-01 Thread Ashutosh Bapat
On Sat, Feb 24, 2018 at 2:55 AM, Robert Haas <robertmh...@gmail.com> wrote:

>
> Here's an extended series of patches that now handles both the simple
> UNION ALL case (where we flatten it) and the unflattened case:
>

The patches look clean. I particularly looked at 0003.

patch 0001
+/*
+ * Generate partial paths for final_rel, too, if outer query levels might
+ * be able to make use of them.
+ */
I am not able to understand the construct esp. the if clause. Did you want to
say "... if there are outer query levels. Those might ..." or something like
that?

0002
(op->all == top_union->all || op->all) &&
This isn't really your change.  Checking
op->all is cheaper than checking equality, so may be we should check that first
and take advantage of short-circuit condition evaluation. If we do that above
condition reduces to (op->all || !top_union->all) which is two boolean
conditions, even cheaper. But may be the second optimization is not worth the
loss of readability.

"identically-propertied UNIONs" may be "UNIONs with identical properties".

0003
Probably we want to rename generate_union_path() as generate_union_rel() or
generate_union_paths() since the function doesn't return a path anymore.
Similarly for generate_nonunion_path().

In recurse_set_operations()
-return NULL;/* keep compiler quiet */
This line is deleted and instead rel is initialized to NULL. That way we loose
any chance to detect a future bug because of a block leaving rel uninitialized
through compiler warning. May be we should replace "return NULL" with "rel =
NULL", which will not be executed because of the error.

+/* Build path list and relid set. */
+foreach(lc, rellist)
+{
With the changes in this patch, we could actually use add_paths_to_append_rel()
to create an append path. That function builds paths with different pathkeys,
parameterization (doesn't matter here) and also handles parallel append. So we
can avoid code duplication and also leverage more optimizations like using
MergeAppend instead of overall sort etc. But that function doesn't have ability
to add a final node like make_union_unique(). A similar requirement has arisen
in partition-wise join where we need to add a final node for finalising
aggregate on top of paths created by add_paths_to_append_rel().  May be we can
change that function to return a list of paths, which are then finalized by the
caller and added to "append" rel. But I don't think doing all that is in the
scope of this patch set.

0004
+if (!op->all)
+ppath = make_union_unique(op, ppath, tlist, root);
We could probably push the grouping/sorting down to the parallel workers. But
again not part of this patchset.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: parallel append vs. simple UNION ALL

2018-03-01 Thread Rajkumar Raghuwanshi
On Sat, Feb 24, 2018 at 2:55 AM, Robert Haas <robertmh...@gmail.com> wrote:

> 0004 causes generate_union_path() to consider both the traditional
> method and also Gather -> Parallel Append -> [partial path for each
> subquery].  This is still a bit rough around the edges and there's a
> lot more that could be done here, but I'm posting what I have for now
> in the (perhaps vain) hope of getting some feedback.  With this, you
> can use Parallel Append for the UNION ALL step of a query like SELECT
> .. UNION ALL .. SELECT ... EXCEPT SELECT ...
>

Hi,

With all 0001,0002,0003 and 0004 patch applied on head, I am getting a
strange crash, while trying to change table name
in a query by using "TAB" key.

Same test case working fine with only 0001 applied and also on PG-head.

below are steps to reproduce.

--run below sqls

SET parallel_setup_cost=0;
SET parallel_tuple_cost=0;
SET min_parallel_table_scan_size=0;
CREATE TABLE tbl_union_t1 (c1 INTEGER PRIMARY KEY,c2 INTEGER,c3 CHAR(10));
INSERT INTO tbl_union_t1 SELECT i, i % 125, to_char(i % 4, 'FM') FROM
generate_series(0, 499,2) i;
CREATE TABLE tbl_union_t2 (c1 INTEGER PRIMARY KEY,c2 INTEGER,c3 CHAR(10));
INSERT INTO tbl_union_t2 SELECT i, i % 125, to_char(i % 4, 'FM') FROM
generate_series(0, 499,3) i;
ANALYSE tbl_union_t1;
ANALYSE tbl_union_t2;

EXPLAIN SELECT AVG(c1),SUM(c2) FROM (SELECT c1,c2 FROM tbl_union_t1 EXCEPT
SELECT c1,c2 FROM tbl_union_t2 WHERE c1 % 25 =0 )UA;


--now try modifying tbl_union_t1 in the above query
--remove "_union_t1" and press TAB key, It crashed for me.

EXPLAIN SELECT AVG(c1),SUM(c2) FROM (SELECT c1,c2 FROM tbl**EXCEPT SELECT c1,c2 FROM tbl_union_t2 WHERE c1 % 25 =0 )UA;

postgres=# EXPLAIN SELECT AVG(c1),SUM(c2) FROM (SELECT c1,c2 FROM
tblWARNING:  terminating connection because of crash of another server
process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.

--logfile says something like this
2018-03-01 18:37:36.456 IST [50071] LOG:  database system is ready to
accept connections
2018-03-01 18:38:38.668 IST [50071] LOG:  background worker "parallel
worker" (PID 51703) was terminated by signal 11: Segmentation fault
2018-03-01 18:38:38.668 IST [50071] DETAIL:  Failed process was running:
SELECT pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c WHERE
c.relkind IN ('r', 'S', 'v', 'm', 'f', 'p') AND
substring(pg_catalog.quote_ident(c.relname),1,3)='tbl' AND
pg_catalog.pg_table_is_visible(c.oid) AND c.relnamespace <> (SELECT oid
FROM pg_catalog.pg_namespace WHERE nspname = 'pg_catalog')
UNION
SELECT pg_catalog.quote_ident(n.nspname) || '.' FROM
pg_catalog.pg_namespace n WHERE substring(pg_catalog.quote_ident(n.nspname)
|| '.',1,3)='tbl' AND (SELECT pg_catalog.count(*) FROM
pg_catalog.pg_namespace WHERE substring(pg_catalog.quote_ident(nspname) ||
'.',1,3) =
substring('tbl',1,pg_catalog.length(pg_catalog.quote_ident(nspname))+1)) > 1
UNION
SELECT pg_catalog.quote_ident(n.nspname) || '.' ||
pg_catalog.quote_ident(c.relname) FROM pg_catalog.pg_class c,
pg_catalog.pg_namespace n WHERE c.relnamespace = n.oid AND c.relkind IN
('r', 'S', 'v', 'm', 'f', 'p') AND
substring(pg_catalog.quote_ident(n.nspname) || '.' ||
pg_catalog.quote_ident(c.relname),1,3)='tbl' AND
substring(pg_catalog.quote_id
2018-03-01 18:38:38.668 IST [50071] LOG:  terminating any other active
server processes
2018-03-01 18:38:38.668 IST [50082] WARNING:  terminating connection
because of crash of another server process
2018-03-01 18:38:38.668 IST [50082] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2018-03-01 18:38:38.668 IST [50082] HINT:  In a moment you should be able
to reconnect to the database and repeat your command.
2018-03-01 18:38:38.670 IST [50076] WARNING:  terminating connection
because of crash of another server process
2018-03-01 18:38:38.670 IST [50076] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2018-03-01 18:38:38.670 IST [50076] HINT:  In a moment you should be able
to reconnect to the database and repeat your command.
2018-03-01 18:38:38.675 IST [50071] LOG:  all server processes terminated;
reinitializing
2018-03-01 18:38:38.702 IST [51712] LOG:  database system was interrupted;
last known up at 2018-03-01 18:37:36 IST
2018-03-01 18:38:38.723 IST [51712] LOG:  database system was not properly
shut down; automatic recovery in progress
2018-03-01 18:38:38.724 IST [51712] LOG:  redo starts at 0/1639510
2

Re: parallel append vs. simple UNION ALL

2018-02-28 Thread Amit Khandekar
> On Sat, Feb 24, 2018 at 2:55 AM, Robert Haas  wrote:
>>
>> 0001 is pretty much the same as the subquery-smarts.patch file I
>> attached to the previous email.  I don't see much reason not to go
>> ahead and commit this, although it could use a test case.  It makes
>> the simple/flattened case work.  After some study I think that the
>> gather-parameter handling is correct, although if somebody felt like
>> reviewing that portion especially I wouldn't say no.

I had a look at 0001 patch. Other than the issue raised by Rajkumar,
it looks good functionally.

Regarding the finalize_plan() changes, I see that in the patch, the
Gather rescan param is now included in the valid_params while calling
finalize_plan() for the SubqueryScan, which looks correct. But I was
thinking that instead of doing that just before the recursive
finalize_plan(), it looks better if we do that at the initial section
of finalize_plan(). We already add initSetParam to valid_params. There
itself we can also add gather_params. Something like this :

@@ -2314,6 +2314,10 @@ finalize_plan(PlannerInfo *root, Plan *plan,
if (initSetParam)
valid_params = bms_union(valid_params, initSetParam);

+   /* Same applies for Gather rescan param */
+   if (gather_param >= 0)
+   valid_params = bms_add_member(valid_params, gather_param);


On 27 February 2018 at 16:51, Rajkumar Raghuwanshi
 wrote:
>
>
> I have applied 0001 on pg-head, and while playing with regression tests, it
> crashed with below test case.
>
> postgres=# SET min_parallel_table_scan_size=0;
> SET
> postgres=# SELECT * FROM information_schema.foreign_data_wrapper_options
> ORDER BY 1, 2, 3;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

This is happening because there is a ProjectionPath as one of the
subpaths, and that is not parallel safe.

 Sort
   Sort Key: ((current_database())::information_schema.sql_identifier),
((w.fdwname)::information_schema.sql_identifier),
pg_options_to_table(w.fdwoptions))).option_name)::information_schema.sql_identifier)
   ->  Result
 ->  ProjectSet
   ->  Hash Join
 Hash Cond: (w.fdwowner = u.oid)
 ->  Seq Scan on pg_foreign_data_wrapper w
   Filter: (pg_has_role(fdwowner,
'USAGE'::text) OR has_foreign_data_wrapper_privilege(oid,
'USAGE'::text))
 ->  Hash
   ->  Seq Scan on pg_authid u

In grouping_planner() where partial paths are generated for final_rel,
we can skip non-parallel-safe paths.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: parallel append vs. simple UNION ALL

2018-02-27 Thread Rajkumar Raghuwanshi
On Sat, Feb 24, 2018 at 2:55 AM, Robert Haas  wrote:

> 0001 is pretty much the same as the subquery-smarts.patch file I
> attached to the previous email.  I don't see much reason not to go
> ahead and commit this, although it could use a test case.  It makes
> the simple/flattened case work.  After some study I think that the
> gather-parameter handling is correct, although if somebody felt like
> reviewing that portion especially I wouldn't say no.
>

I have applied 0001 on pg-head, and while playing with regression tests, it
crashed with below test case.

postgres=# SET min_parallel_table_scan_size=0;
SET
postgres=# SELECT * FROM information_schema.foreign_data_wrapper_options
ORDER BY 1, 2, 3;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

--logfile
2018-02-26 22:06:07.331 IST [43508] LOG:  database system is ready to
accept connections
TRAP: FailedAssertion("!(subpath->parallel_safe)", File: "pathnode.c",
Line: 1813)
2018-02-26 22:06:42.345 IST [43508] LOG:  server process (PID 43519) was
terminated by signal 6: Aborted
2018-02-26 22:06:42.345 IST [43508] DETAIL:  Failed process was running:
SELECT * FROM information_schema.foreign_data_wrapper_options ORDER BY 1,
2, 3;

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: parallel append vs. simple UNION ALL

2018-02-23 Thread Robert Haas
On Sat, Dec 23, 2017 at 4:53 PM, Robert Haas <robertmh...@gmail.com> wrote:
> As I mentioned in the commit message for the Parallel Append commit
> (ab72716778128fb63d54ac256adf7fe6820a1185), it's kind of sad that this
> doesn't work with UNION ALL queries, which are an obvious candidate
> for such parallelization.  It turns out that it actually does work to
> a limited degree: assuming that the UNION ALL query can be converted
> to a simple appendrel, it can consider a parallel append of
> non-partial paths only.  The attached patch lets it consider a
> parallel append of partial paths ...

Here's an extended series of patches that now handles both the simple
UNION ALL case (where we flatten it) and the unflattened case:

0001 is pretty much the same as the subquery-smarts.patch file I
attached to the previous email.  I don't see much reason not to go
ahead and commit this, although it could use a test case.  It makes
the simple/flattened case work.  After some study I think that the
gather-parameter handling is correct, although if somebody felt like
reviewing that portion especially I wouldn't say no.

0002 rewrites recurse_union_children to work iteratively rather than
recursively and renames it to plan_union_children.  This probably
isn't 100% necessary, but it seems to me that the resulting code is
easier to understand, and it reduces the risk of blowing out the
stack.  There should be no user-visible behavior change.

0003 rewrites the setop planner to create a separate upper rel for
each stage of setop planning and uses them to return paths instead of
returning paths directly.  This is necessary preparatory work for
anything that wants to consider multiple possible paths for queries
that go through the full setop planner, but it shouldn't have any
visible impact all by itself.

0004 causes generate_union_path() to consider both the traditional
method and also Gather -> Parallel Append -> [partial path for each
subquery].  This is still a bit rough around the edges and there's a
lot more that could be done here, but I'm posting what I have for now
in the (perhaps vain) hope of getting some feedback.  With this, you
can use Parallel Append for the UNION ALL step of a query like SELECT
.. UNION ALL .. SELECT ... EXCEPT SELECT ...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0004-Consider-Parallel-Append-as-a-way-to-implement-a-uni.patch
Description: Binary data


0003-Generate-a-separate-upper-relation-for-each-stage-of.patch
Description: Binary data


0002-Rewrite-recurse_union_children-to-iterate-rather-tha.patch
Description: Binary data


0001-Let-Parallel-Append-over-simple-UNION-ALL-have-parti.patch
Description: Binary data


parallel append vs. simple UNION ALL

2017-12-23 Thread Robert Haas
As I mentioned in the commit message for the Parallel Append commit
(ab72716778128fb63d54ac256adf7fe6820a1185), it's kind of sad that this
doesn't work with UNION ALL queries, which are an obvious candidate
for such parallelization.  It turns out that it actually does work to
a limited degree: assuming that the UNION ALL query can be converted
to a simple appendrel, it can consider a parallel append of
non-partial paths only.  The attached patch lets it consider a
parallel append of partial paths by doing the following things:

1. Teaching set_subquery_pathlist to create *partial* SubqueryScan
paths as well as non-partial ones.
2. Teaching grouping_planner to create partial paths for the final rel
if not at the outermost query level.
3. Modifying finalize_plan to allow the gather_param to be passed
across subquery boundaries.

#3 is the only part I'm really unsure about; the other stuff looks
pretty cut and dried.

I have a draft patch that handles the case where the union can't be
converted to a simple appendrel, too, but that's not quite baked
enough to post yet.

For those for whom the above may be too technical to follow, here's an example:

pgbench -i 40
explain (costs off) select a.bid from pgbench_accounts a,
pgbench_branches b where a.bid = b.bid and aid % 1000 = 0 union all
select a.bid from pgbench_accounts a where aid % 1000 = 0;

Unpatched:

 Append
   ->  Gather
 Workers Planned: 2
 ->  Hash Join
   Hash Cond: (a.bid = b.bid)
   ->  Parallel Seq Scan on pgbench_accounts a
 Filter: ((aid % 1000) = 0)
   ->  Hash
 ->  Seq Scan on pgbench_branches b
   ->  Gather
 Workers Planned: 2
 ->  Parallel Seq Scan on pgbench_accounts a_1
   Filter: ((aid % 1000) = 0)

Patched:

 Gather
   Workers Planned: 2
   ->  Parallel Append
 ->  Hash Join
   Hash Cond: (a.bid = b.bid)
   ->  Parallel Seq Scan on pgbench_accounts a
 Filter: ((aid % 1000) = 0)
   ->  Hash
 ->  Seq Scan on pgbench_branches b
 ->  Parallel Seq Scan on pgbench_accounts a_1
   Filter: ((aid % 1000) = 0)

In this particular case the change doesn't buy very much, but the
second plan is better because avoid shutting down one set of workers
and starting a new set.  That's more efficient, plus it allows the two
branches to be worked in parallel rather than serially.  On a small
enough scale factor, even without the patch, you get this...

 Gather
   Workers Planned: 2
   ->  Parallel Append
 ->  Nested Loop
   Join Filter: (a.bid = b.bid)
   ->  Seq Scan on pgbench_branches b
   ->  Seq Scan on pgbench_accounts a
 Filter: ((aid % 1000) = 0)
 ->  Seq Scan on pgbench_accounts a_1
   Filter: ((aid % 1000) = 0)

...but that's not good because now we have regular sequential scans
instead of partial sequential scans.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


subquery-smarts.patch
Description: Binary data