Re: [PERFORM] Postgres 8.3, four times slower queries?
On Wed, Mar 4, 2009 at 6:20 PM, Aaron Guyon battlem...@gmail.com wrote:
> On Tue, Mar 3, 2009 at 5:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Comparisons on numerics aren't terribly fast though (in either release).
>> I wonder whether you could change the key columns to int or bigint.
>
> I changed the affected columns from numeric to integer, but I was unable
> to get any performance gain:
>
> 8.3.3: 1195 ms
> 8.2.12: 611 ms
>
> I've attached the new query plans.
>
>> Are you doing something to force the join order, like running with a
>> small join_collapse_limit setting? If so maybe you shouldn't.
>
> No, we left join_collapse_limit at the default of 8. We tried a higher
> value, but there was no difference in performance.
>
> I'll post the query and the table descriptions in separate messages to
> the list, to avoid my mail being rejected for exceeding the size limit :)

Well, it looks like the problem is that 8.3 is not using the index
idx_bundle_content_bundle_id. But I don't know why that should be
happening, unless there's a problem with that index.

...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
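Tom's numeric-to-integer suggestion can be applied in place with ALTER TABLE; a minimal sketch, using the payment table's invoice_id column from the plans in this thread purely as an illustration (the real schema isn't shown here):

```sql
-- Rewrite a numeric key column as integer; the USING clause supplies the cast.
ALTER TABLE payment
    ALTER COLUMN invoice_id TYPE integer USING invoice_id::integer;

-- Refresh planner statistics after the rewrite.
ANALYZE payment;
```

Note that this rewrites the whole table, so on a large table it should be done in a maintenance window.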
Re: [PERFORM] Postgres 8.3, four times slower queries?
On Thu, Mar 5, 2009 at 10:20 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
> Robert Haas robertmh...@gmail.com wrote:
>> Well, it looks like the problem is that 8.3 is not using the index
>> idx_bundle_content_bundle_id. But I don't know why that should be
>> happening, unless there's a problem with that index.
>
> I didn't see that index defined. In fact, in the query shown, t8 is the
> payment_amount table, but in the plan I don't see any references to that
> table; there, t8 is a table called bundle_content, which is not included.

Good point. Now that you mention it, I notice that many of the tables and
columns seem to have been renamed. It's pretty hard to make anything
intelligible out of a schema that doesn't resemble the plan.

...Robert
[PERFORM] Index scan plan estimates way off.
I have a relatively simple query with a single index on (contract_id, time):

vjtrade=> EXPLAIN SELECT * FROM ticks WHERE contract_id=1 ORDER BY time;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
   Sort Key: time
   ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
         Recheck Cond: (contract_id = 1)
         ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
               Index Cond: (contract_id = 1)
(6 rows)

This plan doesn't complete in a reasonable amount of time. I end up having
to kill the query after it's been running for over an hour.

If I do a:

SET enable_sort=FALSE;
SET enable_bitmapscan=FALSE;

then it gives me this plan:

 Index Scan using contract_id_time_idx on ticks  (cost=0.00..117276552.51 rows=30897044 width=40) (actual time=34.025..738583.609 rows=27858174 loops=1)
   Index Cond: (contract_id = 1)
 Total runtime: 742323.102 ms

Notice how different the estimated cost is from the actual time. The row
estimate is pretty good, however.

This is on PostgreSQL 8.3.5 with:

shared_buffers = 512MB
temp_buffers = 256MB
work_mem = 256MB
max_fsm_pages = 153600
effective_cache_size = 1500MB

Is there any way to give PostgreSQL a better estimate of the index scan
time? I tried setting random_page_cost=1, but it still gave me the bitmap
plan.

Thanks,
Jonathan Hseu
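Planner-costing experiments like the random_page_cost=1 test above can be scoped to a single transaction so they don't leak into later queries in the session; a sketch (the values here are illustrations, not recommendations):

```sql
BEGIN;
-- SET LOCAL settings last only until COMMIT/ROLLBACK.
SET LOCAL random_page_cost = 1.0;
SET LOCAL effective_cache_size = '3GB';
EXPLAIN SELECT * FROM ticks WHERE contract_id = 1 ORDER BY time;
ROLLBACK;  -- the cost settings vanish with the transaction
```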
Re: [PERFORM] Postgres 8.3, four times slower queries?
Matching query plans with the numerics changed to integers. I sent the
wrong query plans earlier.

8.3.3: 1195 ms
8.2.12: 611 ms

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Unique  (cost=6986.01..6986.02 rows=1 width=20) (actual time=611.061..611.061 rows=0 loops=1)
   ->  Sort  (cost=6986.01..6986.02 rows=1 width=20) (actual time=611.053..611.053 rows=0 loops=1)
         Sort Key: t8.id
         ->  Nested Loop  (cost=955.90..6986.00 rows=1 width=20) (actual time=610.942..610.942 rows=0 loops=1)
               Join Filter: ((t2.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t3.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t6.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t7.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t8.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t9.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t10.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t11.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)))
               ->  Nested Loop  (cost=955.90..4989.27 rows=523 width=76) (actual time=28.274..548.664 rows=1104 loops=1)
                     ->  Hash Join  (cost=955.90..3195.56 rows=526 width=52) (actual time=28.123..520.620 rows=1104 loops=1)
                           Hash Cond: (t6.invoice_id = t7.id)
                           ->  Seq Scan on payment t6  (cost=0.00..2002.69 rows=61789 width=16) (actual time=0.026..253.360 rows=60920 loops=1)
                                 Filter: (active <> 0)
                           ->  Hash  (cost=954.15..954.15 rows=140 width=44) (actual time=25.795..25.795 rows=332 loops=1)
                                 ->  Nested Loop  (cost=0.00..954.15 rows=140 width=44) (actual time=0.546..24.364 rows=332 loops=1)
                                       ->  Nested Loop  (cost=0.00..562.62 rows=100 width=40) (actual time=0.471..15.759 rows=336 loops=1)
                                             ->  Nested Loop  (cost=0.00..74.40 rows=156 width=28) (actual time=0.233..6.404 rows=376 loops=1)
                                                   ->  Nested Loop  (cost=0.00..45.04 rows=4 width=20) (actual time=0.157..0.324 rows=4 loops=1)
                                                         ->  Index Scan using idx_department_du on department t10  (cost=0.00..7.30 rows=6 width=12) (actual time=0.076..0.108 rows=7 loops=1)
                                                               Index Cond: (company_id = 250893)
                                                               Filter: (active <> 0)
                                                         ->  Index Scan using idx_project_department_id on project t2  (cost=0.00..6.28 rows=1 width=16) (actual time=0.016..0.018 rows=1 loops=7)
                                                               Index Cond: (t2.department_id = t10.id)
                                                               Filter: (active <> 0)
                                                   ->  Index Scan
Re: [PERFORM] Index scan plan estimates way off.
Jonathan Hseu vom...@vomjom.net writes:
>  Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
>    Sort Key: time
>    ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
>          Recheck Cond: (contract_id = 1)
>          ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
>                Index Cond: (contract_id = 1)
> (6 rows)
>
> This plan doesn't complete in a reasonable amount of time. I end up
> having to kill the query after it's been running for over an hour.

The bitmap scan should be at least as efficient as the plain indexscan, so
I suppose the problem is that the sort is slow. What's the datatype of
time? Can this machine actually support 256MB+ work_mem, or is that likely
to be driving it into swapping?

You might learn more from enabling trace_sort and watching the postmaster
log entries it generates.

On the whole I think the planner isn't making a stupid choice here:
sorting a large number of rows usually *is* preferable to making an
indexscan over them, unless the table is remarkably close to being in
physical order for the index. So it would be worth trying to figure out
what the problem with the sort is.

			regards, tom lane
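The trace_sort suggestion can be tried per session; a sketch (trace_sort is a debugging GUC that exists when the server was built with TRACE_SORT defined, which is the default in stock builds):

```sql
SET trace_sort = on;
SET client_min_messages = log;  -- optionally echo the LOG lines back to the client
EXPLAIN ANALYZE SELECT * FROM ticks WHERE contract_id = 1 ORDER BY time;
SET trace_sort = off;
```

The interesting log lines report workMem, when the sort spills to an external (on-disk tape) sort, and the CPU vs. elapsed time of each phase.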
Re: [PERFORM] Postgres 8.3, four times slower queries?
Aaron Guyon battlem...@gmail.com wrote:
> 8.3.3: 1195 ms
> 8.2.12: 611 ms

Could you send the non-commented lines from the postgresql.conf files from
both installations?

If feasible, update to the latest bug-fix version of 8.3.

Also, if you haven't already done so, try setting effective_cache_size =
'3GB' and random_page_cost = 2 for the 8.3 database and restart to see
what kind of plan you get.

-Kevin
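The non-default settings can also be pulled from a live server via the pg_settings view, which records where each value came from; a sketch of one way to answer Kevin's request without hand-editing the file:

```sql
-- Settings whose current value was read from postgresql.conf.
SELECT name, setting, unit
FROM pg_settings
WHERE source = 'configuration file'
ORDER BY name;
```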
Re: [PERFORM] Postgres 8.3, four times slower queries?
On Thu, Mar 5, 2009 at 1:57 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
> Aaron Guyon battlem...@gmail.com wrote:
>> 8.3.3: 1195 ms
>> 8.2.12: 611 ms
>
> Could you send the non-commented lines from the postgresql.conf files
> from both installations? If feasible, update to the latest bug-fix
> version of 8.3. Also, if you haven't already done so, try setting
> effective_cache_size = '3GB' and random_page_cost = 2 for the 8.3
> database and restart to see what kind of plan you get.

I still think there's a problem with that index; now it's called
idx_payment_amount_payment_id. What do you get from this?

select * from pg_index where indexrelid = 'idx_payment_amount_payment_id'::regclass;

Have you tried:

reindex table payment_amount;

...Robert
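One specific thing that query can reveal is an index left invalid (for example by a failed CREATE INDEX CONCURRENTLY), which the planner silently ignores; a narrower sketch of the same check:

```sql
-- indisvalid = false means the planner won't consider the index at all.
SELECT c.relname, i.indisvalid, i.indisready
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indexrelid = 'idx_payment_amount_payment_id'::regclass;

-- A rebuild clears the invalid state:
REINDEX INDEX idx_payment_amount_payment_id;
```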
Re: [PERFORM] Postgres 8.3, four times slower queries?
On Thu, Mar 5, 2009 at 12:58 PM, Joshua D. Drake j...@commandprompt.com wrote:
> What happens if you do this:
>
> SET cpu_tuple_cost TO '0.5';
> SET cpu_index_tuple_cost TO '0.5';
> EXPLAIN ANALYZE <8.3 query>

Right now, I'm getting very good results with the above. I'm still running
additional tests, but I'll keep you guys updated. I've attached the new
explain analyze.

SET
SET
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Unique  (cost=17397.17..17397.18 rows=2 width=20) (actual time=621.865..621.865 rows=0 loops=1)
   ->  Sort  (cost=17397.17..17397.18 rows=2 width=20) (actual time=621.857..621.857 rows=0 loops=1)
         Sort Key: t8.id
         Sort Method:  quicksort  Memory: 17kB
         ->  Nested Loop  (cost=0.00..17397.16 rows=2 width=20) (actual time=621.825..621.825 rows=0 loops=1)
               Join Filter: ((t2.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t3.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t6.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t7.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t8.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t9.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t10.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t11.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)))
               ->  Nested Loop  (cost=0.00..14615.82 rows=537 width=76) (actual time=51.092..556.405 rows=1104 loops=1)
                     ->  Nested Loop  (cost=0.00..12007.84 rows=541 width=52) (actual time=46.530..523.869 rows=1104 loops=1)
                           ->  Nested Loop  (cost=0.00..2229.06 rows=147 width=44) (actual time=42.642..501.168 rows=332 loops=1)
                                 ->  Nested Loop  (cost=0.00..1530.56 rows=114 width=40) (actual time=39.485..489.224 rows=336 loops=1)
                                       ->  Nested Loop  (cost=0.00..757.31 rows=177 width=28) (actual time=10.877..473.774 rows=376 loops=1)
                                             ->  Nested Loop  (cost=0.00..69.07 rows=4 width=20) (actual time=6.463..6.640 rows=4 loops=1)
                                                   ->  Index Scan using idx_department_du on department t10  (cost=0.00..14.72 rows=7 width=12) (actual time=6.141..6.173 rows=7 loops=1)
                                                         Index Cond: (company_id = 250893)
                                                         Filter: (active <> 0)
                                                   ->  Index Scan using idx_project_department_id on project t2  (cost=0.00..7.26 rows=1 width=16) (actual time=0.051..0.053 rows=1 loops=7)
                                                         Index Cond: (t2.department_id = t10.id)
                                                         Filter: (t2.active <> 0)
                                             ->  Index Scan using idx_project_type_project_id on project_type t11  (cost=0.00..133.37 rows=77 width=16) (actual time=1.112..116.094 rows=94
Re: [PERFORM] Index scan plan estimates way off.
Oops, forgot to CC my reply to the list. Sorry if this gets messed up.

On Thu, Mar 5, 2009 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Jonathan Hseu vom...@vomjom.net writes:
>>  Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
>>    Sort Key: time
>>    ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
>>          Recheck Cond: (contract_id = 1)
>>          ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
>>                Index Cond: (contract_id = 1)
>> (6 rows)
>>
>> This plan doesn't complete in a reasonable amount of time. I end up
>> having to kill the query after it's been running for over an hour.
>
> The bitmap scan should be at least as efficient as the plain indexscan,
> so I suppose the problem is that the sort is slow. What's the datatype
> of time?

It's a timestamp with time zone and not null.

> Can this machine actually support 256MB+ work_mem, or is that likely to
> be driving it into swapping?

Yeah, the machine has 4 GB of RAM and isn't even close to swapping at all.

> You might learn more from enabling trace_sort and watching the
> postmaster log entries it generates.

I got this (I'm not sure how to interpret it, as there doesn't seem to be
any documentation about it on the web):

2009-03-05 15:28:27 CST STATEMENT: select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 15:28:30 CST LOG: begin tuple sort: nkeys = 1, workMem = 262144, randomAccess = f
2009-03-05 15:28:30 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 16:50:31 CST LOG: switching to external sort with 937 tapes: CPU 26.57s/4835.39u sec elapsed 4921.38 sec
2009-03-05 16:50:31 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 17:00:46 CST LOG: performsort starting: CPU 92.51s/4955.58u sec elapsed 5536.57 sec
2009-03-05 17:00:46 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 17:00:50 CST LOG: finished writing run 1 to tape 0: CPU 92.86s/4958.30u sec elapsed 5539.78 sec
2009-03-05 17:00:50 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 17:00:50 CST LOG: finished writing final run 2 to tape 1: CPU 92.88s/4958.40u sec elapsed 5539.90 sec
2009-03-05 17:00:50 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 17:00:51 CST LOG: performsort done (except 2-way final merge): CPU 92.96s/4958.55u sec elapsed 5541.10 sec
2009-03-05 17:00:51 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 17:00:58 CST LOG: external sort ended, 204674 disk blocks used: CPU 93.36s/4960.04u sec elapsed 5548.33 sec
2009-03-05 17:00:58 CST STATEMENT: explain analyze select * from ticks where contract_id=1 order by time limit 280;

> On the whole I think the planner isn't making a stupid choice here:
> sorting a large number of rows usually *is* preferable to making an
> indexscan over them, unless the table is remarkably close to being in
> physical order for the index. So it would be worth trying to figure out
> what the problem with the sort is.

I don't really understand this. It seems to me that fetching and sorting
30 million rows wouldn't be preferable to just fetching them in the
correct order in the first place, even if the fetch is in a random order.
I tried another query with a much smaller result set, and the index scan
takes 76 seconds, but the bitmap scan + sort takes 1.5 hours. That's
quite a difference.

I'm pretty sure the physical order of the index is very different from
the physical order of the table. The rows of the table are inserted in
strictly time order, and presumably that's how they end up on disk,
whereas the index, as far as I understand it, is sorted by the first of
its columns, contract_id, and then by time.

Here's both of the EXPLAIN ANALYZEs for the same query:

=> explain analyze select * from ticks where contract_id=1 order by time limit 280;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Limit  (cost=10487812.41..10494812.41 rows=280 width=40) (actual time=5541109.704..5545345.598 rows=280 loops=1)
   ->  Sort  (cost=10487812.41..10565267.29 rows=30981949 width=40) (actual time=5541109.702..5544883.149 rows=280 loops=1)
         Sort Key: time
         Sort Method:  external merge  Disk: 1637392kB
         ->  Bitmap Heap Scan on ticks  (cost=718724.01..7015201.37 rows=30981949 width=40) (actual time=4874084.105..5465131.997 rows=27917481 loops=1)
               Recheck Cond: (contract_id = 1)
               ->  Bitmap Index Scan on contract_id_time_idx
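If the heap really is in time order rather than (contract_id, time) order, one way to test Tom's physical-order point directly is to rewrite the table in index order with CLUSTER; a sketch only, since CLUSTER takes an exclusive lock and rewrites the whole table, which is expensive at 30M rows:

```sql
-- 8.3 syntax: rewrite ticks in contract_id_time_idx order.
CLUSTER ticks USING contract_id_time_idx;

-- Refresh statistics (including the correlation the planner costs with).
ANALYZE ticks;
```

After this, an index scan on (contract_id, time) should read the heap nearly sequentially, which is the case where the planner prefers it over sort plans.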
Re: [PERFORM] Index scan plan estimates way off.
On Thu, Mar 5, 2009 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Jonathan Hseu vom...@vomjom.net writes:
>>  Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
>>    Sort Key: time
>>    ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
>>          Recheck Cond: (contract_id = 1)
>>          ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
>>                Index Cond: (contract_id = 1)
>> (6 rows)
>>
>> This plan doesn't complete in a reasonable amount of time. I end up
>> having to kill the query after it's been running for over an hour.
>
> The bitmap scan should be at least as efficient as the plain indexscan,
> so I suppose the problem is that the sort is slow. What's the datatype
> of time? Can this machine actually support 256MB+ work_mem, or is that
> likely to be driving it into swapping?
>
> You might learn more from enabling trace_sort and watching the
> postmaster log entries it generates.
>
> On the whole I think the planner isn't making a stupid choice here:
> sorting a large number of rows usually *is* preferable to making an
> indexscan over them, unless the table is remarkably close to being in
> physical order for the index.

It seems like this is only likely to be true if most of the data needs to
be read from a magnetic disk, so that many seeks are involved. That might
not be the case here, since the machine has an awful lot of RAM.

...Robert
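How random those index-order heap fetches would be is something the planner already estimates from the per-column correlation statistic, which can be inspected directly; a sketch (assumes ANALYZE has run on ticks):

```sql
-- correlation near +1 or -1: heap order tracks this column, so an
-- index-order scan reads the table nearly sequentially.
-- correlation near 0: an index-order scan seeks all over the heap.
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'ticks'
  AND attname IN ('contract_id', 'time');
```

In this thread's scenario one would expect time to show high correlation and the (contract_id, time) index order to be poorly correlated with the heap, which is exactly the case where random_page_cost dominates the indexscan estimate.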