Re: [HACKERS] query execution time faster with geqo on than off: bug?
On Sun, Jun 14, 2015 at 6:43 PM, David Kamholz lautges...@gmail.com wrote:
> I've encountered a query with 11 joins whose execution time (i.e., the
> time not taken up by planning) is significantly faster with geqo on
> rather than off. This is surprising to me and seems like it might be a
> bug in the planner, so I am posting it here rather than to -performance.
> The query is below, along with EXPLAIN ANALYZE results with geqo on and
> off. The server version is 9.4.4. The various geqo options are all set
> to the default. join_collapse_limit is set to 12 (the query is much
> slower with it set to the default of 8). Let me know what other
> information might be helpful in debugging this further. Thanks!

Well, for starters, you're looking at an estimation miss. The exhaustive
search found the 'cheaper' plan than what geqo came up with, but the lower
cost estimate did not correlate with lower execution time. This is a common
and frustrating problem. The usual way to avoid it is to keep constructs
the database has difficulty estimating out of your tables and queries, or
to crank up statistics in specific cases.

Anyway, in the non-geqo plan I see lines like this:

->  Nested Loop  (cost=0.76..107.61 rows=27 width=20) (actual time=0.116..797.027 rows=1047967 loops=1)

...which suggest that any good result is a matter of luck, more or less; a
five-order-of-magnitude miss feeding a nestloop is fodder for unpleasant
results, because that error is carried up into the estimates above it.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
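For reference, "crank up statistics in specific cases" means raising the
per-column statistics target rather than the global default. A minimal
sketch, assuming the join columns from the query above are the ones being
misestimated (the column choices here are illustrative only):

```sql
-- Illustrative only: raise the planner's sample size for specific columns
-- instead of changing default_statistics_target globally.
ALTER TABLE dn ALTER COLUMN ex SET STATISTICS 1000;
ALTER TABLE dn ALTER COLUMN mn SET STATISTICS 1000;

-- The new per-column targets take effect only after the table is re-analyzed.
ANALYZE dn;
```

This keeps planning and ANALYZE costs down elsewhere, since only the
troublesome columns get the larger sample.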
Re: [HACKERS] query execution time faster with geqo on than off: bug?
> Well, for starters, you're looking at an estimation miss. The exhaustive
> search found the 'cheaper' plan than what geqo came up with, but the
> lower cost estimate did not correlate with lower execution time. This is
> a common and frustrating problem. The usual way to avoid it is to keep
> constructs the database has difficulty estimating out of your tables and
> queries, or to crank up statistics in specific cases.

RhodiumToad on #postgresql thinks it may be a different issue -- namely,
that there are various estimates of rows=1 where the actual row count is
much higher. Increasing default_statistics_target to 1000 and 10000 seems
to confirm this -- if anything, the query runs slower, and the plans look
to be about the same. So I'm not yet convinced that it's not a bug. The
better performance with geqo on is pretty consistent. I recognize that the
query is complex, and I can try to simplify it, but it would be nice if
its performance were not a matter of luck.

I've attached a file containing the original query and the EXPLAIN ANALYZE
results for geqo on with default_statistics_target = 100, geqo off with
default_statistics_target = 100, geqo on with default_statistics_target =
10000, and geqo off with default_statistics_target = 10000, showing that
the increased statistics target doesn't help. (I figured it would be
easier to read as an attachment because my email client automatically
wraps long lines.)
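For anyone trying to reproduce the attached plans, the session setup is
roughly the following -- a sketch based on the settings mentioned in the
thread, with the statistics target being the one knob varied between runs:

```sql
-- Sketch of the session setup behind the attached EXPLAIN ANALYZE runs.
SET join_collapse_limit = 12;        -- per the original report
SET geqo = on;                       -- 'off' forces the exhaustive search
SET default_statistics_target = 100; -- varied between runs
ANALYZE;                             -- a changed target needs a fresh ANALYZE
EXPLAIN ANALYZE SELECT ...;          -- the query from the original post
```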
Dave

SELECT ex.ex, ex.lv, ex.tt, ex.td, dnsrc.ex AS trex,
       ((uiuq_score(array_agg(ap.ui), array_agg(ap.uq)) +
         uiuq_score(array_agg(apsrc.ui), array_agg(apsrc.uq))) / 2) AS trq
FROM ex
INNER JOIN lv ON (lv.lv = ex.lv)
INNER JOIN dn ON (dn.ex = ex.ex)
INNER JOIN mn ON (mn.mn = dn.mn)
INNER JOIN ap ON (ap.ap = mn.ap)
INNER JOIN dn AS dn2 ON (dn2.mn = dn.mn)
INNER JOIN dn AS dn3 ON (dn3.ex = dn2.ex)
INNER JOIN dn AS dnsrc ON (dnsrc.mn = dn3.mn)
INNER JOIN mn AS mnsrc ON (mnsrc.mn = dnsrc.mn)
INNER JOIN ap AS apsrc ON (apsrc.ap = mnsrc.ap)
INNER JOIN ex AS exsrc ON (exsrc.ex = dnsrc.ex)
INNER JOIN lv AS lvsrc ON (lvsrc.lv = exsrc.lv)
WHERE dn.ex != dn2.ex
  AND dn3.ex != dnsrc.ex
  AND mn.ap != mnsrc.ap
  AND dn.ex != dnsrc.ex
  AND lcvc(lv.lc, lv.vc) IN ('zul-000')
  AND lcvc(lvsrc.lc, lvsrc.vc) IN ('gle-000')
  AND exsrc.tt IN ('doras')
GROUP BY ex.ex, dnsrc.ex
ORDER BY trq DESC
LIMIT 2000;

EXPLAIN ANALYZE with geqo on and default_statistics_target = 100:

                                  QUERY PLAN
------------------------------------------------------------------------------
Limit  (cost=3603.82..3603.82 rows=2 width=57) (actual time=674.046..674.050 rows=31 loops=1)
  ->  Sort  (cost=3603.82..3603.82 rows=2 width=57) (actual time=674.044..674.046 rows=31 loops=1)
        Sort Key: (((uiuq_score(array_agg(ap.ui), array_agg(ap.uq)) + uiuq_score(array_agg(apsrc.ui), array_agg(apsrc.uq))) / 2))
        Sort Method: quicksort  Memory: 27kB
        ->  HashAggregate  (cost=3603.60..3603.82 rows=2 width=57) (actual time=672.799..674.008 rows=31 loops=1)
              Group Key: ex.ex, dnsrc.ex
              ->  Nested Loop  (cost=3075.23..3603.59 rows=2 width=57) (actual time=8.048..671.384 rows=766 loops=1)
                    ->  Nested Loop  (cost=3075.17..3603.43 rows=2 width=55) (actual time=8.036..669.963 rows=766 loops=1)
                          Join Filter: (mn.ap <> mnsrc.ap)
                          Rows Removed by Join Filter: 3793
                          ->  Hash Join  (cost=3075.08..3603.21 rows=2 width=63) (actual time=8.012..660.229 rows=4559 loops=1)
                                Hash Cond: (dn.ex = ex.ex)
                                ->  Nested Loop  (cost=3.04..510.58 rows=20564 width=26) (actual time=0.939..553.677 rows=1047985 loops=1)
                                      Join Filter: (dn.ex <> dnsrc.ex)
                                      Rows Removed by Join Filter: 5396
                                      ->  Nested Loop  (cost=2.92..277.03 rows=118 width=22) (actual time=0.934..120.245 rows=40543 loops=1)
                                            ->  Nested Loop  (cost=2.87..267.76 rows=118 width=16) (actual time=0.921..62.140 rows=40543 loops=1)
                                                  ->  Hash Join  (cost=2.76..266.08 rows=1 width=12) (actual time=0.899..6.318 rows=1254 loops=1)
                                                        Hash Cond: (exsrc.lv = lvsrc.lv)
                                                        ->  Nested Loop  (cost=0.43..263.08 rows=871 width=16) (actual time=0.173..5.788 rows=1516 loops=1)
                                                              ->  Nested Loop  (cost=0.34..164.79 rows=871 width=20) (actual time=0.142..1.239 rows=1516 loops=1)
[HACKERS] query execution time faster with geqo on than off: bug?
I've encountered a query with 11 joins whose execution time (i.e., the time
not taken up by planning) is significantly faster with geqo on rather than
off. This is surprising to me and seems like it might be a bug in the
planner, so I am posting it here rather than to -performance. The query is
below, along with EXPLAIN ANALYZE results with geqo on and off. The server
version is 9.4.4. The various geqo options are all set to the default.
join_collapse_limit is set to 12 (the query is much slower with it set to
the default of 8). Let me know what other information might be helpful in
debugging this further. Thanks!

QUERY:

SELECT ex.ex, ex.lv, ex.tt, ex.td, dnsrc.ex AS trex,
       ((uiuq_score(array_agg(ap.ui), array_agg(ap.uq)) +
         uiuq_score(array_agg(apsrc.ui), array_agg(apsrc.uq))) / 2) AS trq
FROM ex
INNER JOIN lv ON (lv.lv = ex.lv)
INNER JOIN dn ON (dn.ex = ex.ex)
INNER JOIN mn ON (mn.mn = dn.mn)
INNER JOIN ap ON (ap.ap = mn.ap)
INNER JOIN dn AS dn2 ON (dn2.mn = dn.mn)
INNER JOIN dn AS dn3 ON (dn3.ex = dn2.ex)
INNER JOIN dn AS dnsrc ON (dnsrc.mn = dn3.mn)
INNER JOIN mn AS mnsrc ON (mnsrc.mn = dnsrc.mn)
INNER JOIN ap AS apsrc ON (apsrc.ap = mnsrc.ap)
INNER JOIN ex AS exsrc ON (exsrc.ex = dnsrc.ex)
INNER JOIN lv AS lvsrc ON (lvsrc.lv = exsrc.lv)
WHERE dn.ex != dn2.ex
  AND dn3.ex != dnsrc.ex
  AND mn.ap != mnsrc.ap
  AND dn.ex != dnsrc.ex
  AND lcvc(lv.lc, lv.vc) IN ('zul-000')
  AND lcvc(lvsrc.lc, lvsrc.vc) IN ('gle-000')
  AND exsrc.tt IN ('doras')
GROUP BY ex.ex, dnsrc.ex
ORDER BY trq DESC
LIMIT 2000;

EXPLAIN ANALYZE with geqo on:

                                  QUERY PLAN
------------------------------------------------------------------------------
Limit  (cost=3603.82..3603.82 rows=2 width=57) (actual time=623.322..623.326 rows=31 loops=1)
  ->  Sort  (cost=3603.82..3603.82 rows=2 width=57) (actual time=623.321..623.321 rows=31 loops=1)
        Sort Key: (((uiuq_score(array_agg(ap.ui), array_agg(ap.uq)) + uiuq_score(array_agg(apsrc.ui), array_agg(apsrc.uq))) / 2))
        Sort Method: quicksort  Memory: 27kB
        ->  HashAggregate  (cost=3603.60..3603.82 rows=2 width=57) (actual time=622.074..623.283 rows=31 loops=1)
              Group Key: ex.ex, dnsrc.ex
              ->  Nested Loop  (cost=3075.23..3603.59 rows=2 width=57) (actual time=8.023..620.761 rows=766 loops=1)
                    ->  Nested Loop  (cost=3075.17..3603.43 rows=2 width=55) (actual time=8.009..619.417 rows=766 loops=1)
                          Join Filter: (mn.ap <> mnsrc.ap)
                          Rows Removed by Join Filter: 3793
                          ->  Hash Join  (cost=3075.08..3603.21 rows=2 width=63) (actual time=7.984..610.206 rows=4559 loops=1)
                                Hash Cond: (dn.ex = ex.ex)
                                ->  Nested Loop  (cost=3.04..510.58 rows=20564 width=26) (actual time=0.918..509.574 rows=1047967 loops=1)
                                      Join Filter: (dn.ex <> dnsrc.ex)
                                      Rows Removed by Join Filter: 5396
                                      ->  Nested Loop  (cost=2.92..277.03 rows=118 width=22) (actual time=0.913..113.830 rows=40525 loops=1)
                                            ->  Nested Loop  (cost=2.87..267.76 rows=118 width=16) (actual time=0.899..60.170 rows=40525 loops=1)
                                                  ->  Hash Join  (cost=2.76..266.08 rows=1 width=12) (actual time=0.878..5.057 rows=1254 loops=1)
                                                        Hash Cond: (exsrc.lv = lvsrc.lv)
                                                        ->  Nested Loop  (cost=0.43..263.08 rows=871 width=16) (actual time=0.169..4.626 rows=1516 loops=1)
                                                              ->  Nested Loop  (cost=0.34..164.79 rows=871 width=20) (actual time=0.138..1.180 rows=1516 loops=1)
                                                                    ->  Nested Loop  (cost=0.23..155.09 rows=5 width=12) (actual time=0.097..0.306 rows=43 loops=1)
                                                                          ->  Index Scan using ex_tt_idx on ex exsrc  (cost=0.11..2.57 rows=2 width=8) (actual time=0.056..0.061 rows=5 loops=1)
                                                                                Index Cond: (tt = 'doras'::text)
                                                                          ->  Index Scan using dn_ex_idx on dn dnsrc  (cost=0.11..75.55 rows=71 width=8) (actual time=0.027..0.046 rows=9 loops=5)
                                                                                Index Cond: (ex = exsrc.ex)
                                                                    ->  Index Only Scan using dn_mn_ex_key on dn dn3  (cost=0.11..1.13 rows=81 width=8) (actual time=0.008..0.015 rows=35 loops=43)
                                                                          Index Cond: (mn = dnsrc.mn)
                                                                          Filter: (ex <> dnsrc.ex)
                                                                          Rows Removed by Filter: 1
                                                                          Heap Fetches: 0
                                                              ->  Index Scan using mn_pkey on mn mnsrc  (cost=0.09..0.10 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1516)
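One cheap diagnostic for a nested loop driven by a bad row estimate (not
mentioned in the thread, and not a production fix) is to disable nested
loops for a single run and compare runtimes:

```sql
-- Diagnostic only: forbid nested-loop joins for this transaction so the
-- planner has to pick hash or merge joins, then compare the runtimes.
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN ANALYZE SELECT ...;  -- the query above
ROLLBACK;
```

If the nestloop-free plan is consistently faster, that points at the row
estimates feeding the nested loops rather than at geqo itself.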