Re: [PERFORM] Different query plans for the same query
Hi Tom, it would be really hard for us to change the underlying tables and the executed query. Is there any other way for us to avoid the really bad query (e.g. a hint for the planner)? Regards, Robert Hell -Ursprüngliche Nachricht- Von: Tom Lane [mailto:t...@sss.pgh.pa.us] Gesendet: Freitag, 18. September 2009 17:43 An: Hell, Robert Cc: t...@fuzzy.cz; pgsql-performance@postgresql.org Betreff: Re: [PERFORM] Different query plans for the same query Hell, Robert robert.h...@fabasoft.com writes: bad plan (sometimes with statistcs target 100, seconds after the good plan was chosen) - about 2 minutes: http://explain.depesz.com/s/gcr good plan (most of the time with statistcs target 100) - about one second: http://explain.depesz.com/s/HX very good plan (with statistics target 10) - about 15 ms: http://explain.depesz.com/s/qMc What's the reason for that? Garbage in, garbage out :-(. When you've got rowcount estimates that are off by a couple orders of magnitude, it's unsurprising that you get bad plan choices. In this case it appears that the bad and good plans have just about the same estimated cost. I'm guessing that the underlying statistics change a bit due to autovacuum activity, causing the plan choice to flip unexpectedly. The real fix would be to get the rowcount estimates more in line with reality. I think the main problem is that in cases like - Index Scan using ind_atobjval on atobjval t6 (cost=0.00..12.04 rows=1 width=12) (actual time=0.032..0.953 rows=775 loops=1) Index Cond: ((attrid = 285774255985991::bigint) AND (objval = 285774255985589::bigint)) the planner is supposing that the two conditions are independent when they are not. Is there any way you can refactor the data representation to remove the hidden redundancy? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Different query plans for the same query
Hi all, on our PostgreSQL 8.3.1 (CentOS 5.3 64-bit) two different query plans for one of our (weird) queries are generated. One of the query plans seems to be good (and is used most of the time). The other one is bad - the query takes about 2 minutes and the database process, which is executing the query, is cpu bound during this time. After several tries I was able to reproduce the problem when executing the query with EXPLAIN ANALYZE. The bad query plan was generated only seconds after the good one was used when executing the query. What's the reasond for the different query plans? Statistics are up to date. Good: EXPLAIN ANALYZE SELECT DISTINCT t6.objid FROM ataggval q1_1, atobjval t6, atobjval t7, atobjval t8, atobjval t9, cooobject t10 WHERE q1_1.objid = t6.objid AND q1_1.attrid = 285774255985993 AND q1_1.aggrid = 0 AND t6.aggrid = q1_1.aggval AND t7.aggrid = 0 AND t7.objid = t6.objid AND t8.aggrid = 0 AND t8.objid = t6.objid AND t9.aggrid = 0 AND t9.objid = t6.objid AND t10.objid = t6.objid AND t6.objval = 285774255985589 AND t6.attrid=285774255985991 AND t7.objval = 625445988202446985 AND t7.attrid=285774255985855 AND t8.objval = 625445988286355913 AND t8.attrid=285774255985935 AND t9.objval = 625445988269570350 AND t9.attrid=285774255985938 AND t10.objclassid = 285774255985894 ORDER BY t6.objid; QUERY PLAN Unique (cost=66.58..66.59 rows=1 width=8) (actual time=1548.207..1548.208 rows=1 loops=1) - Sort (cost=66.58..66.58 rows=1 width=8) (actual time=1548.206..1548.207 rows=1 loops=1) Sort Key: t6.objid Sort Method: quicksort Memory: 25kB - Nested Loop (cost=0.00..66.57 rows=1 width=8) (actual time=1044.759..1548.190 rows=1 loops=1) Join Filter: (t6.objid = t7.objid) - Nested Loop (cost=0.00..54.52 rows=1 width=40) (actual time=21.938..1541.633 rows=350 loops=1) Join Filter: (t6.objid = t8.objid) - Nested Loop (cost=0.00..42.47 rows=1 width=32) (actual time=21.907..1422.710 rows=364 loops=1) Join Filter: (t6.objid = t9.objid) - Nested Loop (cost=0.00..30.42 rows=1 width=24) (actual time=0.151..920.873 rows=775 loops=1) - Nested Loop (cost=0.00..21.97 rows=1 width=16) (actual time=0.065..915.387 rows=775 loops=1) Join Filter: (q1_1.objid = t6.objid) - Index Scan using ind_atobjval on atobjval t6 (cost=0.00..12.04 rows=1 width=12) (actual time=0.031..0.863 rows=775 loops=1) Index Cond: ((attrid = 285774255985991::bigint) AND (objval = 285774255985589::bigint)) - Index Scan using ind_ataggval on ataggval q1_1 (cost=0.00..9.92 rows=1 width=12) (actual time=0.006..0.897 rows=1243 loops=775) Index Cond: ((q1_1.attrid = 285774255985993::bigint) AND (q1_1.aggval = t6.aggrid)) Filter: (q1_1.aggrid = 0) - Index Scan using cooobjectix on cooobject t10 (cost=0.00..8.44 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=775) Index Cond: (t10.objid = t6.objid) Filter: (t10.objclassid = 285774255985894::bigint) - Index Scan using ind_atobjval on atobjval t9 (cost=0.00..12.04 rows=1 width=8) (actual time=0.007..0.490 rows=694 loops=775) Index Cond: ((t9.attrid = 285774255985938::bigint) AND (t9.objval = 625445988269570350::bigint)) Filter: (t9.aggrid = 0) - Index Scan using ind_atobjval on atobjval t8 (cost=0.00..12.04 rows=1 width=8) (actual time=0.007..0.248 rows=350 loops=364) Index Cond: ((t8.attrid = 285774255985935::bigint) AND (t8.objval = 625445988286355913::bigint)) Filter: (t8.aggrid = 0) - Index Scan using ind_atobjval on atobjval t7 (cost=0.00..12.04 rows=1 width=8) (actual time=0.005..0.015 rows=13 loops=350) Index Cond: ((t7.attrid = 285774255985855::bigint) AND (t7.objval = 625445988202446985::bigint)) Filter: (t7.aggrid = 0) Total runtime: 1548.339 ms (31 rows) Bad: EXPLAIN ANALYZE SELECT DISTINCT t6.objid FROM ataggval q1_1, atobjval t6, atobjval t7, atobjval t8, atobjval t9, cooobject t10 WHERE q1_1.objid = t6.objid
Re: [PERFORM] Different query plans for the same query
Hi all, on our PostgreSQL 8.3.1 (CentOS 5.3 64-bit) two different query plans for one of our (weird) queries are generated. One of the query plans seems to be good (and is used most of the time). The other one is bad - the query takes about 2 minutes and the database process, which is executing the query, is cpu bound during this time. After several tries I was able to reproduce the problem when executing the query with EXPLAIN ANALYZE. The bad query plan was generated only seconds after the good one was used when executing the query. What's the reasond for the different query plans? Statistics are up to date. ... Hi, please, when posting an explain plan, either save it into a file and provide a URL (attachments are not allowed here), or use explain.depesz.com or something like that. This wrapping makes the plan unreadable so it's much more difficult to help you. I've used the explain.depesz.com (this time): - good plan: http://explain.depesz.com/s/HX - bad plan: http://explain.depesz.com/s/gcr It seems the whole problem is caused by the 'Index Scan using ind_atobjval on atobjval t9' - in the first case it's executed only 775x, but in the second case it's moved to the nested loop (one level deeper) and suddenly it's executed 271250x. And that causes the huge increase in cost. Why is this happening? I'm not sure, but I'm not quite sure the statistics are up to data and precise enough - some of the steps state 'rows=1' estimate, but 'rows=775' in the actual results. Have you tried to increase target on the tables? That might provide more accurate stats, thus better estimates. regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Different query plans for the same query
Hi, sorry about that. We use 100 as default_statistics_target for this database. The default should be 10 here - statistics are up to date I executed analyze manually this morning. As mentioned before, the bad plan only happens once or twice a day - so the reproduction of that plan is very difficult. I now played a little bit with statistics target for those three tables (alter table ...set statistics). It seems that there is a better query plan than the good one when using 10 as statistics target. bad plan (sometimes with statistcs target 100, seconds after the good plan was chosen) - about 2 minutes: http://explain.depesz.com/s/gcr good plan (most of the time with statistcs target 100) - about one second: http://explain.depesz.com/s/HX very good plan (with statistics target 10) - about 15 ms: http://explain.depesz.com/s/qMc What's the reason for that? I always thought increasing default statistics target should make statistics (and query plans) better. Regards, Robert -Ursprüngliche Nachricht- Von: t...@fuzzy.cz [mailto:t...@fuzzy.cz] Gesendet: Freitag, 18. September 2009 11:20 An: Hell, Robert Cc: pgsql-performance@postgresql.org Betreff: Re: [PERFORM] Different query plans for the same query Hi all, on our PostgreSQL 8.3.1 (CentOS 5.3 64-bit) two different query plans for one of our (weird) queries are generated. One of the query plans seems to be good (and is used most of the time). The other one is bad - the query takes about 2 minutes and the database process, which is executing the query, is cpu bound during this time. After several tries I was able to reproduce the problem when executing the query with EXPLAIN ANALYZE. The bad query plan was generated only seconds after the good one was used when executing the query. What's the reasond for the different query plans? Statistics are up to date. ... Hi, please, when posting an explain plan, either save it into a file and provide a URL (attachments are not allowed here), or use explain.depesz.com or something like that. This wrapping makes the plan unreadable so it's much more difficult to help you. I've used the explain.depesz.com (this time): - good plan: http://explain.depesz.com/s/HX - bad plan: http://explain.depesz.com/s/gcr It seems the whole problem is caused by the 'Index Scan using ind_atobjval on atobjval t9' - in the first case it's executed only 775x, but in the second case it's moved to the nested loop (one level deeper) and suddenly it's executed 271250x. And that causes the huge increase in cost. Why is this happening? I'm not sure, but I'm not quite sure the statistics are up to data and precise enough - some of the steps state 'rows=1' estimate, but 'rows=775' in the actual results. Have you tried to increase target on the tables? That might provide more accurate stats, thus better estimates. regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Different query plans for the same query
Hell, Robert robert.h...@fabasoft.com writes: bad plan (sometimes with statistcs target 100, seconds after the good plan was chosen) - about 2 minutes: http://explain.depesz.com/s/gcr good plan (most of the time with statistcs target 100) - about one second: http://explain.depesz.com/s/HX very good plan (with statistics target 10) - about 15 ms: http://explain.depesz.com/s/qMc What's the reason for that? Garbage in, garbage out :-(. When you've got rowcount estimates that are off by a couple orders of magnitude, it's unsurprising that you get bad plan choices. In this case it appears that the bad and good plans have just about the same estimated cost. I'm guessing that the underlying statistics change a bit due to autovacuum activity, causing the plan choice to flip unexpectedly. The real fix would be to get the rowcount estimates more in line with reality. I think the main problem is that in cases like - Index Scan using ind_atobjval on atobjval t6 (cost=0.00..12.04 rows=1 width=12) (actual time=0.032..0.953 rows=775 loops=1) Index Cond: ((attrid = 285774255985991::bigint) AND (objval = 285774255985589::bigint)) the planner is supposing that the two conditions are independent when they are not. Is there any way you can refactor the data representation to remove the hidden redundancy? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Different query plans for the same query
2009/9/18 t...@fuzzy.cz: Hi all, on our PostgreSQL 8.3.1 (CentOS 5.3 64-bit) two different query plans for one of our (weird) queries are generated. One of the query plans seems to be good (and is used most of the time). The other one is bad - the query takes about 2 minutes and the database process, which is executing the query, is cpu bound during this time. After several tries I was able to reproduce the problem when executing the query with EXPLAIN ANALYZE. The bad query plan was generated only seconds after the good one was used when executing the query. What's the reasond for the different query plans? Statistics are up to date. ... Hi, please, when posting an explain plan, either save it into a file and provide a URL (attachments are not allowed here), or use explain.depesz.com or something like that. This wrapping makes the plan unreadable so it's much more difficult to help you. Uh, since when are attachments not allowed here? I completely agree that line-wrapping is BAD, but I don't agree that pastebin is good. I would much rather have the relevant material in the email, or in an attachment, than in some other web site that may or may not format it readably, may or may not be easy to cut and paste, and will definitely not become part of our archives. ...Robert -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance