Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-06 Thread Merlin Moncure
On Fri, Jul 1, 2016 at 11:45 AM, Alvaro Herrera
 wrote:
> Merlin Moncure wrote:
>
>> It's pretty easy to craft a query where you're on the winning side,
>> but what's the worst case of doing two pass...is constant folding a
>> non trivial fraction of planning time?
>
> One thing that has been suggested is to re-examine the plan after
> planning is done, and if execution time is estimated to be large (FSVO),
> then run a second planning pass with more expensive optimizations
> enabled to try and find better plans.  The guiding principle would be to
> continue to very quickly find good enough plans for
> frequent/small/simple queries, but spend more planning effort on more
> complex ones where execution is likely to take much longer than planning
> time.
>
> So doing constant-folding twice would be enabled for the second planning
> pass.

I like this idea.  Maybe a GUC controlling the cost based cutoff (with
0 meaning, "assume the worst and plan the hard way first").

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Alvaro Herrera
Merlin Moncure wrote:

> It's pretty easy to craft a query where you're on the winning side,
> but what's the worst case of doing two pass...is constant folding a
> non trivial fraction of planning time?

One thing that has been suggested is to re-examine the plan after
planning is done, and if execution time is estimated to be large (FSVO),
then run a second planning pass with more expensive optimizations
enabled to try and find better plans.  The guiding principle would be to
continue to very quickly find good enough plans for
frequent/small/simple queries, but spend more planning effort on more
complex ones where execution is likely to take much longer than planning
time.

So doing constant-folding twice would be enabled for the second planning
pass.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Robert Haas
On Fri, Jul 1, 2016 at 12:00 PM, Merlin Moncure  wrote:
> Sure (I didn't put you on that position, just thinking out loud).  The
> problem with UNION ALL is that it's only safe to do so when you know
> for sure the both sides of the partition are non-overlapping.  The
> author of the query often knows this going in but for the planner it's
> not so simple to figure out in many cases.  If there's a subset of
> cases.   UNION sans ALL is probably a dead end on performance grounds.

I'm not sure about that.  It's certainly true that things are much
more likely to work out when you can prove that UNION ALL is
sufficient, because now you avoid de-duplication.  But if the number
of output rows is really small, it might work out anyway.  I mean,
consider this:

SELECT * FROM enormous WHERE rarely_one = 1 OR EXISTS (SELECT 1 FROM
tiny WHERE tiny.x = enormous.x)

As written, you're not going to be able to answer this query without
scanning a full scan of the enormous table.  If you rewrite it to use
UNION, then the first half can be implemented with an index scan or a
bitmap index scan, and the second half can be implemented with a
nested loop over the tiny table with an inner index scan on the
enormous table.  The fact that you have to deduplicate the results may
be a small price to pay for avoiding an enormous scan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Stephen Frost
Tom, all,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > On Tue, Jun 21, 2016 at 4:18 PM, Merlin Moncure  wrote:
> >> explain analyze select * from foo where false or exists (select 1 from
> >> bar where good and foo.id = bar.id);  -- A
> >> explain analyze select * from foo where exists (select 1 from bar
> >> where good and foo.id = bar.id);  -- B
> >> 
> >> These queries are trivially verified as identical but give very different 
> >> plans.
> 
> > Right.  I suspect wouldn't be very hard to notice the special case of
> > FALSE OR (SOMETHING THAT MIGHT NOT BE FALSE) but I'm not sure that's
> > worth optimizing by itself.
> 
> Constant-folding will get rid of the OR FALSE (as well as actually-useful
> variants of this example).  The problem is that that doesn't happen till
> after we identify semijoins.  So the second one gives you a semijoin plan
> and the first doesn't.  This isn't especially easy to improve.  Much of
> the value of doing constant-folding would disappear if we ran it before
> subquery pullup + join simplification, because in non-stupidly-written
> queries those are what expose the expression simplification opportunities.
> We could run it twice but that seems certain to be a dead loser most of
> the time.

While it might be a loser most of the time to run it twice, I have to
agree that it's pretty unfortunate that we don't handle this case in a
more sane way.  I looked a bit into pull_up_sublinks() and it doens't
look like there's an easy way to realize this case there without going
through the full effort of constant-folding.

One approach that I'm wondering about is to do constant folding first
and then track if we introduce a case where additional constant folding
might help and only perform it again in those cases.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Merlin Moncure
On Fri, Jul 1, 2016 at 10:27 AM, Robert Haas  wrote:
> On Fri, Jul 1, 2016 at 10:20 AM, Merlin Moncure  wrote:
>> Yeah.  Also, even if you could parse out those cases, it's major
>> optimization fence.  Consider if you have an ORDER BY clause here:
>>
>> SELECT FROM foo WHERE a OR b ORDER BY c;
>>
>> ... by pushing inside a union, you're going to be in trouble in real
>> world cases.  That's just a mess and it would add a lot of runtime
>> analysis of the alternative paths.  It's hard for me to believe
>> rewriting is easier and simpler than rewriting 'false OR x' to 'x'.  I
>> also thing that constant folding strategies are going to render much
>> more sensible output to EXPLAIN.
>
> I don't think that it's easier and simpler and didn't intend to say
> otherwise.  I do think that I've run across LOTS of queries over the
> years where rewriting OR using UNION ALL was a lot faster, and I think
> that case is more likely to occur in practice than FALSE OR WHATEVER.
> But, I'm just throwing out opinions to see what sticks here; I'm not
> deeply invested in this.

Sure (I didn't put you on that position, just thinking out loud).  The
problem with UNION ALL is that it's only safe to do so when you know
for sure the both sides of the partition are non-overlapping.  The
author of the query often knows this going in but for the planner it's
not so simple to figure out in many cases.  If there's a subset of
cases.   UNION sans ALL is probably a dead end on performance grounds.

This hinges on Tom's earlier statements, "Much of
the value of doing constant-folding would disappear if we ran it before
subquery pullup + join simplification, because in non-stupidly-written
queries those are what expose the expression simplification opportunities."

and, especially, "We could run it twice but that seems certain to be a
dead loser most of
the time."

It's pretty easy to craft a query where you're on the winning side,
but what's the worst case of doing two pass...is constant folding a
non trivial fraction of planning time?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Robert Haas
On Fri, Jul 1, 2016 at 10:11 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Jul 1, 2016 at 9:52 AM, Tom Lane  wrote:
>>> Maybe, but neither UNION nor UNION ALL would duplicate the semantics
>>> of OR, so there's some handwaving here that I missed.
>
>> SELECT * FROM foo WHERE a = 5 OR a = 4
>> isn't equivalent to
>> SELECT * FROM foo WHERE a = 5
>> UNION
>> SELECT * FROM foo WHERE a = 4
>> ?
>
> It probably is, but you're assuming that "a" appears in the list of
> columns being unioned.  If you make that just "SELECT b FROM ..."
> then the latter form gets rid of duplicate b values where the first
> doesn't.  On the other hand, UNION ALL might introduce duplicates
> not present in the OR query's result.

Right, so, significant query transformations are non-trivial.  But the
point is that with the upper planification stuff, I think it is
possible, at least in some cases, that we could consider reordering
set operations with scan/join planning, just as we've previously
talked about reordering grouping stages relative to scan/join
planning.

The details are undeniably hard to get right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Robert Haas
On Fri, Jul 1, 2016 at 10:20 AM, Merlin Moncure  wrote:
> Yeah.  Also, even if you could parse out those cases, it's major
> optimization fence.  Consider if you have an ORDER BY clause here:
>
> SELECT FROM foo WHERE a OR b ORDER BY c;
>
> ... by pushing inside a union, you're going to be in trouble in real
> world cases.  That's just a mess and it would add a lot of runtime
> analysis of the alternative paths.  It's hard for me to believe
> rewriting is easier and simpler than rewriting 'false OR x' to 'x'.  I
> also thing that constant folding strategies are going to render much
> more sensible output to EXPLAIN.

I don't think that it's easier and simpler and didn't intend to say
otherwise.  I do think that I've run across LOTS of queries over the
years where rewriting OR using UNION ALL was a lot faster, and I think
that case is more likely to occur in practice than FALSE OR WHATEVER.
But, I'm just throwing out opinions to see what sticks here; I'm not
deeply invested in this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Merlin Moncure
On Fri, Jul 1, 2016 at 9:11 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Jul 1, 2016 at 9:52 AM, Tom Lane  wrote:
>>> Maybe, but neither UNION nor UNION ALL would duplicate the semantics
>>> of OR, so there's some handwaving here that I missed.
>
>> SELECT * FROM foo WHERE a = 5 OR a = 4
>> isn't equivalent to
>> SELECT * FROM foo WHERE a = 5
>> UNION
>> SELECT * FROM foo WHERE a = 4
>> ?
>
> It probably is, but you're assuming that "a" appears in the list of
> columns being unioned.  If you make that just "SELECT b FROM ..."
> then the latter form gets rid of duplicate b values where the first
> doesn't.  On the other hand, UNION ALL might introduce duplicates
> not present in the OR query's result.

Yeah.  Also, even if you could parse out those cases, it's major
optimization fence.  Consider if you have an ORDER BY clause here:

SELECT FROM foo WHERE a OR b ORDER BY c;

... by pushing inside a union, you're going to be in trouble in real
world cases.  That's just a mess and it would add a lot of runtime
analysis of the alternative paths.  It's hard for me to believe
rewriting is easier and simpler than rewriting 'false OR x' to 'x'.  I
also thing that constant folding strategies are going to render much
more sensible output to EXPLAIN.

FYI, The query is something along the lines of
SELECT * FROM foo
WHERE
  ('a' = 'a' AND EXISTS ...)
  OR ('a' = 'b' AND EXISTS ...)
  OR ('a' = 'c' AND EXISTS ...)

...where the left side of the equality is a parameterized 'filter
mode' flag.  That way the query can introduce filtering behaviors
without doing dynamic acrobatics.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jul 1, 2016 at 9:52 AM, Tom Lane  wrote:
>> Maybe, but neither UNION nor UNION ALL would duplicate the semantics
>> of OR, so there's some handwaving here that I missed.

> SELECT * FROM foo WHERE a = 5 OR a = 4
> isn't equivalent to
> SELECT * FROM foo WHERE a = 5
> UNION
> SELECT * FROM foo WHERE a = 4
> ?

It probably is, but you're assuming that "a" appears in the list of
columns being unioned.  If you make that just "SELECT b FROM ..."
then the latter form gets rid of duplicate b values where the first
doesn't.  On the other hand, UNION ALL might introduce duplicates
not present in the OR query's result.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Robert Haas
On Fri, Jul 1, 2016 at 9:52 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Jun 21, 2016 at 4:18 PM, Merlin Moncure  wrote:
>>> explain analyze select * from foo where false or exists (select 1 from
>>> bar where good and foo.id = bar.id);  -- A
>>> explain analyze select * from foo where exists (select 1 from bar
>>> where good and foo.id = bar.id);  -- B
>>>
>>> These queries are trivially verified as identical but give very different 
>>> plans.
>
>> Right.  I suspect wouldn't be very hard to notice the special case of
>> FALSE OR (SOMETHING THAT MIGHT NOT BE FALSE) but I'm not sure that's
>> worth optimizing by itself.
>
> Constant-folding will get rid of the OR FALSE (as well as actually-useful
> variants of this example).  The problem is that that doesn't happen till
> after we identify semijoins.  So the second one gives you a semijoin plan
> and the first doesn't.  This isn't especially easy to improve.  Much of
> the value of doing constant-folding would disappear if we ran it before
> subquery pullup + join simplification, because in non-stupidly-written
> queries those are what expose the expression simplification opportunities.
> We could run it twice but that seems certain to be a dead loser most of
> the time.
>
>> A more promising line of attack as it
>> seems to me is to let the planner transform back and forth between
>> this form for the query and the UNION form.
>
> Maybe, but neither UNION nor UNION ALL would duplicate the semantics
> of OR, so there's some handwaving here that I missed.

SELECT * FROM foo WHERE a = 5 OR a = 4

isn't equivalent to

SELECT * FROM foo WHERE a = 5
UNION
SELECT * FROM foo WHERE a = 4

?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Tom Lane
Robert Haas  writes:
> On Tue, Jun 21, 2016 at 4:18 PM, Merlin Moncure  wrote:
>> explain analyze select * from foo where false or exists (select 1 from
>> bar where good and foo.id = bar.id);  -- A
>> explain analyze select * from foo where exists (select 1 from bar
>> where good and foo.id = bar.id);  -- B
>> 
>> These queries are trivially verified as identical but give very different 
>> plans.

> Right.  I suspect wouldn't be very hard to notice the special case of
> FALSE OR (SOMETHING THAT MIGHT NOT BE FALSE) but I'm not sure that's
> worth optimizing by itself.

Constant-folding will get rid of the OR FALSE (as well as actually-useful
variants of this example).  The problem is that that doesn't happen till
after we identify semijoins.  So the second one gives you a semijoin plan
and the first doesn't.  This isn't especially easy to improve.  Much of
the value of doing constant-folding would disappear if we ran it before
subquery pullup + join simplification, because in non-stupidly-written
queries those are what expose the expression simplification opportunities.
We could run it twice but that seems certain to be a dead loser most of
the time.

> A more promising line of attack as it
> seems to me is to let the planner transform back and forth between
> this form for the query and the UNION form.

Maybe, but neither UNION nor UNION ALL would duplicate the semantics
of OR, so there's some handwaving here that I missed.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions

2016-07-01 Thread Robert Haas
On Tue, Jun 21, 2016 at 4:18 PM, Merlin Moncure  wrote:
> Observe the following test case (apologies if this is a well
> understood problem):
>
> create temp table foo as select generate_series(1,100) id;
> create index on foo(id);
>
> create temp table bar as select id, id % 10 = 0 as good from
> generate_series(1,100) id;
> create index on bar(good);
>
> analyze foo;
> analyze bar;
>
> explain analyze select * from foo where false or exists (select 1 from
> bar where good and foo.id = bar.id);  -- A
> explain analyze select * from foo where exists (select 1 from bar
> where good and foo.id = bar.id);  -- B
>
> These queries are trivially verified as identical but give very different 
> plans.

Right.  I suspect wouldn't be very hard to notice the special case of
FALSE OR (SOMETHING THAT MIGHT NOT BE FALSE) but I'm not sure that's
worth optimizing by itself.  A more promising line of attack as it
seems to me is to let the planner transform back and forth between
this form for the query and the UNION form.  Obviously, in this case,
the WHERE false branch could then be pruned altogether, but there are
lots of other cases where both branches survived.  Tom's upper planner
pathification stuff makes it much easier to think about how such an
optimization might work, but I think it's still not particularly
simple to get right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers