Re: refactoring allpaths.c (was Re: [HACKERS] Suppressing unused subquery output columns)

2014-07-25 Thread Etsuro Fujita
(2014/06/13 1:37), Tom Lane wrote:
> I wrote:
>> We have a couple votes for this patch and no one has spoken against it,
>> so I'll go ahead and push it into HEAD.
> 
> BTW, I forgot to mention that while working on this patch I was thinking
> it's past time to separate out the subquery support in allpaths.c into
> its own file.  With this patch, allpaths.c is 2363 lines, about 690 of
> which are set_subquery_pathlist and subsidiary functions.  While that
> might not be quite tail-wagging-dog level, I think it's enough to justify
> splitting that code out into a new file, say optimizer/path/subquerypath.c.

+1

Sorry for the late reply.

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


refactoring allpaths.c (was Re: [HACKERS] Suppressing unused subquery output columns)

2014-06-12 Thread Tom Lane
I wrote:
> We have a couple votes for this patch and no one has spoken against it,
> so I'll go ahead and push it into HEAD.

BTW, I forgot to mention that while working on this patch I was thinking
it's past time to separate out the subquery support in allpaths.c into
its own file.  With this patch, allpaths.c is 2363 lines, about 690 of
which are set_subquery_pathlist and subsidiary functions.  While that
might not be quite tail-wagging-dog level, I think it's enough to justify
splitting that code out into a new file, say optimizer/path/subquerypath.c.

There are also about 630 lines devoted to appendrel path generation,
which might be thought enough to deserve a separate file of its own.
I'm a bit less excited about that though, mainly because the appendrel
logic has some not-quite-arms-length interactions with set_rel_size();
but there's certainly some case to be made for doing it.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suppressing unused subquery output columns

2014-06-12 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane  wrote:
>> The attached draft patch fixes this by deleting unused output expressions
>> from unflattened subqueries, so that we get:
>> ...
>> I'm not entirely convinced that it's worth the extra planning cycles,
>> though.  Given the small number of complaints to date, it might not
>> be worth doing this.  Thoughts?

> I've encountered this issue before and think it would be great to fix
> it.  I'm not sure precisely how many cycles it's worth paying, but
> definitely more than zero.

We have a couple votes for this patch and no one has spoken against it,
so I'll go ahead and push it into HEAD.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suppressing unused subquery output columns

2014-06-12 Thread Robert Haas
On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane  wrote:
> The attached draft patch fixes this by deleting unused output expressions
> from unflattened subqueries, so that we get:
>
> regression=# explain select f1 from (select * from t1 left join t2 on f1=f2 
> limit 1) ss;
> QUERY PLAN
> --
>  Subquery Scan on ss  (cost=0.00..0.02 rows=1 width=4)
>->  Limit  (cost=0.00..0.01 rows=1 width=4)
>  ->  Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
>  Planning time: 0.193 ms
> (4 rows)
>
> I'm not entirely convinced that it's worth the extra planning cycles,
> though.  Given the small number of complaints to date, it might not
> be worth doing this.  Thoughts?

I've encountered this issue before and think it would be great to fix
it.  I'm not sure precisely how many cycles it's worth paying, but
definitely more than zero.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suppressing unused subquery output columns

2014-06-07 Thread David Rowley
On Fri, Jun 6, 2014 at 2:27 PM, Tom Lane  wrote:

> I'm not entirely convinced that it's worth the extra planning cycles,
> though.  Given the small number of complaints to date, it might not
> be worth doing this.  Thoughts?
>
>
That's a difficult question for sure. Obviously it's going to depend on the
query that we're planning. If it's a very simple query and we don't reduce
the target list of the subquery any, then we'll get a small performance
impact... But if we do manage to prune down the targetlist and later allow
a left join on a 1 billion row table to be removed, then the cost of the
extra few cycles performed by this patch would be totally unmeasurable and
completely insignificant.

At work we use , and on a daily basis I come across poorly written queries by
other developers.

The things I can think of off hand are:

1. All primary key fields of a table are in a DISTINCT clause
2. Someone had written a query like: SELECT some,1 as type FROM table UNION
SELECT thing,2 as type from othertable

The database in question manage to remove the DISTINCT on 1 because it's
just not needed as each group could only ever have 1 row. On 2 it managed
to see that because column 2 of the UNION query was a constant and it was a
different constant on each side of the UNION, then the query would not
produce duplicates and it changed this to UNION ALL.

At home I checked how PostgreSQL handled both of these cases, and saw that
it failed to see through the non-sense in the poorly written queries on
both accounts. This is fine, as anyone who ever posted on the performance
list saying, "why is this slow?" We'd tell them to write their query
another way. But what about all the companies who consider porting their
application over to PostgreSQL and get to the stage of importing all the
data over onto a test server and trying a few of their (poorly written)
queries. And they see a 10 times slowdown and immediately think PostgreSQL
is just not for them. It's a shame that they'd turn us down so early, but
if we went ahead and added code to detect both of these situations then
we'd end up increasing planning time for everyone else who writes their
queries properly.

I don't really have an answer for this, but I do think it needs more
discussion. The only things I've thought of so far as a planner_strength
GNU that can try to optimise these things when it gets to a certain level,
but that just introduces surprise factor and a whole bunch of threads on
performance saying... Why is this query on pg10.4 slower than pg10.2? and
our reply goes, is planner_strength set to the same on both? It does not
sound pretty, or another option to replan a query when the cost is over a
certain threshold with the strength level turned to maximum, but that seems
like it could go along the same lines as far as surprise factor is
concerned.

It's pretty hard to get the best of both worlds here. I know my post does
not have many answers, but I posted it anyway just in case someone else
comes up with a good idea that perhaps we could all work towards.

Regards

David Rowley


Re: [HACKERS] Suppressing unused subquery output columns

2014-06-06 Thread Jim Nasby

On 6/5/14, 9:54 PM, Tom Lane wrote:

Rod Taylor  writes:

On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane  wrote:

I'm not entirely convinced that it's worth the extra planning cycles,
though.  Given the small number of complaints to date, it might not
be worth doing this.  Thoughts?



Would this avoid execution of expensive functions in views when their
output is discarded?


Yes, as long as they're not marked volatile and don't return sets.


That would certainly make it useful for us.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suppressing unused subquery output columns

2014-06-05 Thread Tom Lane
Rod Taylor  writes:
> On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane  wrote:
>> I'm not entirely convinced that it's worth the extra planning cycles,
>> though.  Given the small number of complaints to date, it might not
>> be worth doing this.  Thoughts?

> Would this avoid execution of expensive functions in views when their
> output is discarded?

Yes, as long as they're not marked volatile and don't return sets.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Suppressing unused subquery output columns

2014-06-05 Thread Rod Taylor
On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane  wrote:

> I'm not entirely convinced that it's worth the extra planning cycles,
> though.  Given the small number of complaints to date, it might not
> be worth doing this.  Thoughts?
>


Would this avoid execution of expensive functions in views when their
output is discarded?

-- On 9.3
CREATE TABLE data (col1 serial primary key);
INSERT INTO data DEFAULT VALUES;
INSERT INTO data DEFAULT VALUES;

CREATE OR REPLACE VIEW v AS select *, (pg_sleep(1))::text FROM data;

t=# explain analyze select col1 from v;
  QUERY
PLAN
--
 Subquery Scan on v  (cost=0.00..76.00 rows=2400 width=4) (actual
time=1001.086..2002.217 rows=2 loops=1)
   ->  Seq Scan on data  (cost=0.00..52.00 rows=2400 width=4) (actual
time=1001.083..2002.210 rows=2 loops=1)
 Total runtime: 2002.268 ms
(3 rows)


regards,

Rod


[HACKERS] Suppressing unused subquery output columns

2014-06-05 Thread Tom Lane
A question in pgsql-general made me reflect about how the planner isn't
smart about unreferenced output columns of subqueries that it's not able
to flatten into the parent query.  Here's an example:

regression=# create table t1 (f1 int);
CREATE TABLE
regression=# create table t2 (f2 int primary key, f3 int);
CREATE TABLE
regression=# explain select f1 from (select * from t1 left join t2 on f1=f2) ss;
  QUERY PLAN  
--
 Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
 Planning time: 0.691 ms
(2 rows)

So far so good; we were able to apply join removal to t2, because the
query doesn't require outputting f2 or f3.  But if the subquery can't be
flattened, it stops working:

regression=# explain select f1 from (select * from t1 left join t2 on f1=f2 
limit 1) ss;
 QUERY PLAN

 Subquery Scan on ss  (cost=0.15..0.38 rows=1 width=4)
   ->  Limit  (cost=0.15..0.37 rows=1 width=12)
 ->  Nested Loop Left Join  (cost=0.15..516.00 rows=2400 width=12)
   ->  Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
   ->  Index Scan using t2_pkey on t2  (cost=0.15..0.19 rows=1 
width=8)
 Index Cond: (t1.f1 = f2)
(6 rows)

This is because while planning the separate subquery, the planner sees
"select *" and doesn't realize that f2 and f3 aren't really needed.

The attached draft patch fixes this by deleting unused output expressions
from unflattened subqueries, so that we get:

regression=# explain select f1 from (select * from t1 left join t2 on f1=f2 
limit 1) ss;
QUERY PLAN
--
 Subquery Scan on ss  (cost=0.00..0.02 rows=1 width=4)
   ->  Limit  (cost=0.00..0.01 rows=1 width=4)
 ->  Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
 Planning time: 0.193 ms
(4 rows)

I'm not entirely convinced that it's worth the extra planning cycles,
though.  Given the small number of complaints to date, it might not
be worth doing this.  Thoughts?

regards, tom lane

PS: to be clear, I'm not thinking of applying this till 9.5 opens,
in any case.

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 41eaa26..ec0bf3e 100644
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
***
*** 17,25 
--- 17,28 
  
  #include 
  
+ #include "access/sysattr.h"
  #include "catalog/pg_class.h"
  #include "catalog/pg_operator.h"
+ #include "catalog/pg_type.h"
  #include "foreign/fdwapi.h"
+ #include "nodes/makefuncs.h"
  #include "nodes/nodeFuncs.h"
  #ifdef OPTIMIZER_DEBUG
  #include "nodes/print.h"
*** static void subquery_push_qual(Query *su
*** 98,103 
--- 101,107 
     RangeTblEntry *rte, Index rti, Node *qual);
  static void recurse_push_qual(Node *setOp, Query *topquery,
    RangeTblEntry *rte, Index rti, Node *qual);
+ static void remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel);
  
  
  /*
*** set_subquery_pathlist(PlannerInfo *root,
*** 1124,1130 
  	/*
  	 * Must copy the Query so that planning doesn't mess up the RTE contents
  	 * (really really need to fix the planner to not scribble on its input,
! 	 * someday).
  	 */
  	subquery = copyObject(subquery);
  
--- 1128,1134 
  	/*
  	 * Must copy the Query so that planning doesn't mess up the RTE contents
  	 * (really really need to fix the planner to not scribble on its input,
! 	 * someday ... but see remove_unused_subquery_outputs to start with).
  	 */
  	subquery = copyObject(subquery);
  
*** set_subquery_pathlist(PlannerInfo *root,
*** 1199,1204 
--- 1203,1214 
  	pfree(unsafeColumns);
  
  	/*
+ 	 * The upper query might not use all the subquery's output columns; if
+ 	 * not, we can simplify.
+ 	 */
+ 	remove_unused_subquery_outputs(subquery, rel);
+ 
+ 	/*
  	 * We can safely pass the outer tuple_fraction down to the subquery if the
  	 * outer level has no joining, aggregation, or sorting to do. Otherwise
  	 * we'd better tell the subquery to plan for full retrieval. (XXX This
*** recurse_push_qual(Node *setOp, Query *to
*** 2033,2038 
--- 2043,2171 
  }
  
  /*
+  *			SIMPLIFYING SUBQUERY TARGETLISTS
+  */
+ 
+ /*
+  * remove_unused_subquery_outputs
+  *		Remove subquery targetlist items we don't need
+  *
+  * It's possible, even likely, that the upper query does not read all the
+  * output columns of the subquery.  We can remove any such outputs that are
+  * not needed by the subquery itself (e.g., as sort/grou