Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
On Fri, Jul 1, 2016 at 11:45 AM, Alvaro Herrerawrote: > Merlin Moncure wrote: > >> It's pretty easy to craft a query where you're on the winning side, >> but what's the worst case of doing two pass...is constant folding a >> non trivial fraction of planning time? > > One thing that has been suggested is to re-examine the plan after > planning is done, and if execution time is estimated to be large (FSVO), > then run a second planning pass with more expensive optimizations > enabled to try and find better plans. The guiding principle would be to > continue to very quickly find good enough plans for > frequent/small/simple queries, but spend more planning effort on more > complex ones where execution is likely to take much longer than planning > time. > > So doing constant-folding twice would be enabled for the second planning > pass. I like this idea. Maybe a GUC controlling the cost based cutoff (with 0 meaning, "assume the worst and plan the hard way first"). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
Merlin Moncure wrote: > It's pretty easy to craft a query where you're on the winning side, > but what's the worst case of doing two pass...is constant folding a > non trivial fraction of planning time? One thing that has been suggested is to re-examine the plan after planning is done, and if execution time is estimated to be large (FSVO), then run a second planning pass with more expensive optimizations enabled to try and find better plans. The guiding principle would be to continue to very quickly find good enough plans for frequent/small/simple queries, but spend more planning effort on more complex ones where execution is likely to take much longer than planning time. So doing constant-folding twice would be enabled for the second planning pass. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
On Fri, Jul 1, 2016 at 12:00 PM, Merlin Moncurewrote: > Sure (I didn't put you on that position, just thinking out loud). The > problem with UNION ALL is that it's only safe to do so when you know > for sure the both sides of the partition are non-overlapping. The > author of the query often knows this going in but for the planner it's > not so simple to figure out in many cases. If there's a subset of > cases. UNION sans ALL is probably a dead end on performance grounds. I'm not sure about that. It's certainly true that things are much more likely to work out when you can prove that UNION ALL is sufficient, because now you avoid de-duplication. But if the number of output rows is really small, it might work out anyway. I mean, consider this: SELECT * FROM enormous WHERE rarely_one = 1 OR EXISTS (SELECT 1 FROM tiny WHERE tiny.x = enormous.x) As written, you're not going to be able to answer this query without scanning a full scan of the enormous table. If you rewrite it to use UNION, then the first half can be implemented with an index scan or a bitmap index scan, and the second half can be implemented with a nested loop over the tiny table with an inner index scan on the enormous table. The fact that you have to deduplicate the results may be a small price to pay for avoiding an enormous scan. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
Tom, all, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Robert Haaswrites: > > On Tue, Jun 21, 2016 at 4:18 PM, Merlin Moncure wrote: > >> explain analyze select * from foo where false or exists (select 1 from > >> bar where good and foo.id = bar.id); -- A > >> explain analyze select * from foo where exists (select 1 from bar > >> where good and foo.id = bar.id); -- B > >> > >> These queries are trivially verified as identical but give very different > >> plans. > > > Right. I suspect wouldn't be very hard to notice the special case of > > FALSE OR (SOMETHING THAT MIGHT NOT BE FALSE) but I'm not sure that's > > worth optimizing by itself. > > Constant-folding will get rid of the OR FALSE (as well as actually-useful > variants of this example). The problem is that that doesn't happen till > after we identify semijoins. So the second one gives you a semijoin plan > and the first doesn't. This isn't especially easy to improve. Much of > the value of doing constant-folding would disappear if we ran it before > subquery pullup + join simplification, because in non-stupidly-written > queries those are what expose the expression simplification opportunities. > We could run it twice but that seems certain to be a dead loser most of > the time. While it might be a loser most of the time to run it twice, I have to agree that it's pretty unfortunate that we don't handle this case in a more sane way. I looked a bit into pull_up_sublinks() and it doens't look like there's an easy way to realize this case there without going through the full effort of constant-folding. One approach that I'm wondering about is to do constant folding first and then track if we introduce a case where additional constant folding might help and only perform it again in those cases. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
On Fri, Jul 1, 2016 at 10:27 AM, Robert Haaswrote: > On Fri, Jul 1, 2016 at 10:20 AM, Merlin Moncure wrote: >> Yeah. Also, even if you could parse out those cases, it's major >> optimization fence. Consider if you have an ORDER BY clause here: >> >> SELECT FROM foo WHERE a OR b ORDER BY c; >> >> ... by pushing inside a union, you're going to be in trouble in real >> world cases. That's just a mess and it would add a lot of runtime >> analysis of the alternative paths. It's hard for me to believe >> rewriting is easier and simpler than rewriting 'false OR x' to 'x'. I >> also thing that constant folding strategies are going to render much >> more sensible output to EXPLAIN. > > I don't think that it's easier and simpler and didn't intend to say > otherwise. I do think that I've run across LOTS of queries over the > years where rewriting OR using UNION ALL was a lot faster, and I think > that case is more likely to occur in practice than FALSE OR WHATEVER. > But, I'm just throwing out opinions to see what sticks here; I'm not > deeply invested in this. Sure (I didn't put you on that position, just thinking out loud). The problem with UNION ALL is that it's only safe to do so when you know for sure the both sides of the partition are non-overlapping. The author of the query often knows this going in but for the planner it's not so simple to figure out in many cases. If there's a subset of cases. UNION sans ALL is probably a dead end on performance grounds. This hinges on Tom's earlier statements, "Much of the value of doing constant-folding would disappear if we ran it before subquery pullup + join simplification, because in non-stupidly-written queries those are what expose the expression simplification opportunities." and, especially, "We could run it twice but that seems certain to be a dead loser most of the time." It's pretty easy to craft a query where you're on the winning side, but what's the worst case of doing two pass...is constant folding a non trivial fraction of planning time? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
On Fri, Jul 1, 2016 at 10:11 AM, Tom Lanewrote: > Robert Haas writes: >> On Fri, Jul 1, 2016 at 9:52 AM, Tom Lane wrote: >>> Maybe, but neither UNION nor UNION ALL would duplicate the semantics >>> of OR, so there's some handwaving here that I missed. > >> SELECT * FROM foo WHERE a = 5 OR a = 4 >> isn't equivalent to >> SELECT * FROM foo WHERE a = 5 >> UNION >> SELECT * FROM foo WHERE a = 4 >> ? > > It probably is, but you're assuming that "a" appears in the list of > columns being unioned. If you make that just "SELECT b FROM ..." > then the latter form gets rid of duplicate b values where the first > doesn't. On the other hand, UNION ALL might introduce duplicates > not present in the OR query's result. Right, so, significant query transformations are non-trivial. But the point is that with the upper planification stuff, I think it is possible, at least in some cases, that we could consider reordering set operations with scan/join planning, just as we've previously talked about reordering grouping stages relative to scan/join planning. The details are undeniably hard to get right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
On Fri, Jul 1, 2016 at 10:20 AM, Merlin Moncurewrote: > Yeah. Also, even if you could parse out those cases, it's major > optimization fence. Consider if you have an ORDER BY clause here: > > SELECT FROM foo WHERE a OR b ORDER BY c; > > ... by pushing inside a union, you're going to be in trouble in real > world cases. That's just a mess and it would add a lot of runtime > analysis of the alternative paths. It's hard for me to believe > rewriting is easier and simpler than rewriting 'false OR x' to 'x'. I > also thing that constant folding strategies are going to render much > more sensible output to EXPLAIN. I don't think that it's easier and simpler and didn't intend to say otherwise. I do think that I've run across LOTS of queries over the years where rewriting OR using UNION ALL was a lot faster, and I think that case is more likely to occur in practice than FALSE OR WHATEVER. But, I'm just throwing out opinions to see what sticks here; I'm not deeply invested in this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
On Fri, Jul 1, 2016 at 9:11 AM, Tom Lanewrote: > Robert Haas writes: >> On Fri, Jul 1, 2016 at 9:52 AM, Tom Lane wrote: >>> Maybe, but neither UNION nor UNION ALL would duplicate the semantics >>> of OR, so there's some handwaving here that I missed. > >> SELECT * FROM foo WHERE a = 5 OR a = 4 >> isn't equivalent to >> SELECT * FROM foo WHERE a = 5 >> UNION >> SELECT * FROM foo WHERE a = 4 >> ? > > It probably is, but you're assuming that "a" appears in the list of > columns being unioned. If you make that just "SELECT b FROM ..." > then the latter form gets rid of duplicate b values where the first > doesn't. On the other hand, UNION ALL might introduce duplicates > not present in the OR query's result. Yeah. Also, even if you could parse out those cases, it's major optimization fence. Consider if you have an ORDER BY clause here: SELECT FROM foo WHERE a OR b ORDER BY c; ... by pushing inside a union, you're going to be in trouble in real world cases. That's just a mess and it would add a lot of runtime analysis of the alternative paths. It's hard for me to believe rewriting is easier and simpler than rewriting 'false OR x' to 'x'. I also thing that constant folding strategies are going to render much more sensible output to EXPLAIN. FYI, The query is something along the lines of SELECT * FROM foo WHERE ('a' = 'a' AND EXISTS ...) OR ('a' = 'b' AND EXISTS ...) OR ('a' = 'c' AND EXISTS ...) ...where the left side of the equality is a parameterized 'filter mode' flag. That way the query can introduce filtering behaviors without doing dynamic acrobatics. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
Robert Haaswrites: > On Fri, Jul 1, 2016 at 9:52 AM, Tom Lane wrote: >> Maybe, but neither UNION nor UNION ALL would duplicate the semantics >> of OR, so there's some handwaving here that I missed. > SELECT * FROM foo WHERE a = 5 OR a = 4 > isn't equivalent to > SELECT * FROM foo WHERE a = 5 > UNION > SELECT * FROM foo WHERE a = 4 > ? It probably is, but you're assuming that "a" appears in the list of columns being unioned. If you make that just "SELECT b FROM ..." then the latter form gets rid of duplicate b values where the first doesn't. On the other hand, UNION ALL might introduce duplicates not present in the OR query's result. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
On Fri, Jul 1, 2016 at 9:52 AM, Tom Lanewrote: > Robert Haas writes: >> On Tue, Jun 21, 2016 at 4:18 PM, Merlin Moncure wrote: >>> explain analyze select * from foo where false or exists (select 1 from >>> bar where good and foo.id = bar.id); -- A >>> explain analyze select * from foo where exists (select 1 from bar >>> where good and foo.id = bar.id); -- B >>> >>> These queries are trivially verified as identical but give very different >>> plans. > >> Right. I suspect wouldn't be very hard to notice the special case of >> FALSE OR (SOMETHING THAT MIGHT NOT BE FALSE) but I'm not sure that's >> worth optimizing by itself. > > Constant-folding will get rid of the OR FALSE (as well as actually-useful > variants of this example). The problem is that that doesn't happen till > after we identify semijoins. So the second one gives you a semijoin plan > and the first doesn't. This isn't especially easy to improve. Much of > the value of doing constant-folding would disappear if we ran it before > subquery pullup + join simplification, because in non-stupidly-written > queries those are what expose the expression simplification opportunities. > We could run it twice but that seems certain to be a dead loser most of > the time. > >> A more promising line of attack as it >> seems to me is to let the planner transform back and forth between >> this form for the query and the UNION form. > > Maybe, but neither UNION nor UNION ALL would duplicate the semantics > of OR, so there's some handwaving here that I missed. SELECT * FROM foo WHERE a = 5 OR a = 4 isn't equivalent to SELECT * FROM foo WHERE a = 5 UNION SELECT * FROM foo WHERE a = 4 ? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
Robert Haaswrites: > On Tue, Jun 21, 2016 at 4:18 PM, Merlin Moncure wrote: >> explain analyze select * from foo where false or exists (select 1 from >> bar where good and foo.id = bar.id); -- A >> explain analyze select * from foo where exists (select 1 from bar >> where good and foo.id = bar.id); -- B >> >> These queries are trivially verified as identical but give very different >> plans. > Right. I suspect wouldn't be very hard to notice the special case of > FALSE OR (SOMETHING THAT MIGHT NOT BE FALSE) but I'm not sure that's > worth optimizing by itself. Constant-folding will get rid of the OR FALSE (as well as actually-useful variants of this example). The problem is that that doesn't happen till after we identify semijoins. So the second one gives you a semijoin plan and the first doesn't. This isn't especially easy to improve. Much of the value of doing constant-folding would disappear if we ran it before subquery pullup + join simplification, because in non-stupidly-written queries those are what expose the expression simplification opportunities. We could run it twice but that seems certain to be a dead loser most of the time. > A more promising line of attack as it > seems to me is to let the planner transform back and forth between > this form for the query and the UNION form. Maybe, but neither UNION nor UNION ALL would duplicate the semantics of OR, so there's some handwaving here that I missed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] EXISTS clauses not being optimized in the face of 'one time pass' optimizable expressions
On Tue, Jun 21, 2016 at 4:18 PM, Merlin Moncurewrote: > Observe the following test case (apologies if this is a well > understood problem): > > create temp table foo as select generate_series(1,100) id; > create index on foo(id); > > create temp table bar as select id, id % 10 = 0 as good from > generate_series(1,100) id; > create index on bar(good); > > analyze foo; > analyze bar; > > explain analyze select * from foo where false or exists (select 1 from > bar where good and foo.id = bar.id); -- A > explain analyze select * from foo where exists (select 1 from bar > where good and foo.id = bar.id); -- B > > These queries are trivially verified as identical but give very different > plans. Right. I suspect wouldn't be very hard to notice the special case of FALSE OR (SOMETHING THAT MIGHT NOT BE FALSE) but I'm not sure that's worth optimizing by itself. A more promising line of attack as it seems to me is to let the planner transform back and forth between this form for the query and the UNION form. Obviously, in this case, the WHERE false branch could then be pruned altogether, but there are lots of other cases where both branches survived. Tom's upper planner pathification stuff makes it much easier to think about how such an optimization might work, but I think it's still not particularly simple to get right. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers