Re: [PERFORM] Postgres 8.3, four times slower queries?

2009-03-05 Thread Robert Haas
On Wed, Mar 4, 2009 at 6:20 PM, Aaron Guyon battlem...@gmail.com wrote:
 On Tue, Mar 3, 2009 at 5:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Comparisons on
 numerics aren't terribly fast though (in either release).  I wonder
 whether you could change the key columns to int or bigint.

 I changed the affected columns from numeric to integers and I was unable to
 get any performance gain:
 8.3.3: 1195 ms
 8.2.12: 611 ms

 I've attached the new query plans.

 Are you doing
 something to force the join order, like running with a small
 join_collapse_limit setting?  If so maybe you shouldn't.

 No, we left the join_collapse_limit to the default 8.  We tried a higher
 value, but there was no difference in performance.
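 For reference, the experiment looked like this (session-level only; these
 are the standard planner GUCs, shown as a sketch rather than our exact
 transcript):

```sql
-- Current values (the default is 8 for both in 8.2 and 8.3).
SHOW join_collapse_limit;
SHOW from_collapse_limit;

-- Raise them for this session so the planner is free to consider
-- reordering all of the joins instead of following the textual order.
SET join_collapse_limit = 16;
SET from_collapse_limit = 16;
-- ...then re-run EXPLAIN ANALYZE on the problem query.
```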

 I'll post the query and the table descriptions in separate messages to the
 list, to avoid my mail being rejected for exceeding the size limit :)

Well, it looks like the problem is that 8.3 is not using the index
idx_bundle_content_bundle_id.  But I don't know why that should be
happening, unless there's a problem with that index.
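One quick sanity check (a sketch, using the index name from the plan) would
be to confirm the index definition and its validity flag; an invalid index
(e.g. left over from a failed CREATE INDEX CONCURRENTLY) is silently ignored
by the planner:

```sql
-- Does the index exist, and what is its definition?
SELECT indexdef
  FROM pg_indexes
 WHERE indexname = 'idx_bundle_content_bundle_id';

-- Is it marked valid?  indisvalid = false would explain the planner skipping it.
SELECT indisvalid
  FROM pg_index
 WHERE indexrelid = 'idx_bundle_content_bundle_id'::regclass;
```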

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 8.3, four times slower queries?

2009-03-05 Thread Robert Haas
On Thu, Mar 5, 2009 at 10:20 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:
 Well, it looks like the problem is that 8.3 is not using the index
 idx_bundle_content_bundle_id.  But I don't know why that should be
 happening, unless there's a problem with that index.

 I didn't see that index defined.  In fact, in the query shown, t8 is
 the payment_amount table, but in the plan I don't see any references to
 that table; there, t8 is a table called bundle_content, which is not
 included.

Good point.  Now that you mention it, I notice that many of the tables
and columns seem to have been renamed.  It's pretty hard to make
anything intelligible out of a schema that doesn't resemble the plan.

...Robert



[PERFORM] Index scan plan estimates way off.

2009-03-05 Thread Jonathan Hseu
I have a relatively simple query with a single index on (contract_id, time):

vjtrade=> EXPLAIN SELECT * FROM ticks WHERE contract_id=1 ORDER BY time;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
   Sort Key: time
   ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
         Recheck Cond: (contract_id = 1)
         ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
               Index Cond: (contract_id = 1)
(6 rows)

This plan doesn't complete in a reasonable amount of time.  I end up having
to kill the query after it's been running for over an hour.

If I do a:
SET enable_sort=FALSE;
SET enable_bitmapscan=FALSE;

Then it gives me this plan:

Index Scan using contract_id_time_idx on ticks  (cost=0.00..117276552.51 rows=30897044 width=40) (actual time=34.025..738583.609 rows=27858174 loops=1)
  Index Cond: (contract_id = 1)
Total runtime: 742323.102 ms

Notice how the estimated cost is so much different from the actual time.
The row estimate is pretty good, however.

This is on postgresql 8.3.5 with:
shared_buffers = 512MB
temp_buffers = 256MB
work_mem = 256MB
max_fsm_pages = 153600
effective_cache_size = 1500MB

Is there any way to give postgresql a better estimate of the index scan
time?  I tried setting random_page_cost=1, but it still gave me the bitmap
plan.
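
For the record, the kind of per-session experiment involved looks like the
following sketch (not a recommendation; random_page_cost = 1 alone did not
change the plan for me):

```sql
-- Cheapen random I/O so the ordered index scan can beat
-- bitmap-scan-plus-sort; session-level only.
SET random_page_cost = 1.0;
EXPLAIN SELECT * FROM ticks WHERE contract_id = 1 ORDER BY time;
RESET random_page_cost;
```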

Thanks,
Jonathan Hseu


Re: [PERFORM] Postgres 8.3, four times slower queries?

2009-03-05 Thread Aaron Guyon
Matching query plans with the numerics changed to integers.

I sent the wrong query plans earlier.

8.3.3: 1195 ms
8.2.12: 611 ms

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Unique  (cost=6986.01..6986.02 rows=1 width=20) (actual time=611.061..611.061 rows=0 loops=1)
   ->  Sort  (cost=6986.01..6986.02 rows=1 width=20) (actual time=611.053..611.053 rows=0 loops=1)
         Sort Key: t8.id
         ->  Nested Loop  (cost=955.90..6986.00 rows=1 width=20) (actual time=610.942..610.942 rows=0 loops=1)
               Join Filter: ((t2.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t3.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t6.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t7.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t8.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t9.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t10.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t11.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)))
               ->  Nested Loop  (cost=955.90..4989.27 rows=523 width=76) (actual time=28.274..548.664 rows=1104 loops=1)
                     ->  Hash Join  (cost=955.90..3195.56 rows=526 width=52) (actual time=28.123..520.620 rows=1104 loops=1)
                           Hash Cond: (t6.invoice_id = t7.id)
                           ->  Seq Scan on payment t6  (cost=0.00..2002.69 rows=61789 width=16) (actual time=0.026..253.360 rows=60920 loops=1)
                                 Filter: (active <> 0)
                           ->  Hash  (cost=954.15..954.15 rows=140 width=44) (actual time=25.795..25.795 rows=332 loops=1)
                                 ->  Nested Loop  (cost=0.00..954.15 rows=140 width=44) (actual time=0.546..24.364 rows=332 loops=1)
                                       ->  Nested Loop  (cost=0.00..562.62 rows=100 width=40) (actual time=0.471..15.759 rows=336 loops=1)
                                             ->  Nested Loop  (cost=0.00..74.40 rows=156 width=28) (actual time=0.233..6.404 rows=376 loops=1)
                                                   ->  Nested Loop  (cost=0.00..45.04 rows=4 width=20) (actual time=0.157..0.324 rows=4 loops=1)
                                                         ->  Index Scan using idx_department_du on department t10  (cost=0.00..7.30 rows=6 width=12) (actual time=0.076..0.108 rows=7 loops=1)
                                                               Index Cond: (company_id = 250893)
                                                               Filter: (active <> 0)
                                                         ->  Index Scan using idx_project_department_id on project t2  (cost=0.00..6.28 rows=1 width=16) (actual time=0.016..0.018 rows=1 loops=7)
                                                               Index Cond: (t2.department_id = t10.id)
                                                               Filter: (active <> 0)
                                                   ->  Index Scan

Re: [PERFORM] Index scan plan estimates way off.

2009-03-05 Thread Tom Lane
Jonathan Hseu vom...@vomjom.net writes:
  Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
    Sort Key: time
    ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
          Recheck Cond: (contract_id = 1)
          ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
                Index Cond: (contract_id = 1)
 (6 rows)

 This plan doesn't complete in a reasonable amount of time.  I end up having
 to kill the query after it's been running for over an hour.

The bitmap scan should be at least as efficient as the plain indexscan,
so I suppose the problem is that the sort is slow.  What's the datatype
of time?  Can this machine actually support 256MB+ work_mem, or is that
likely to be driving it into swapping?
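
(Enabling it for a single session might look like this sketch, assuming the
server was built with TRACE_SORT, which is the default; the traces are
emitted at LOG level:)

```sql
SET trace_sort = on;
SET client_min_messages = log;  -- also surface the LOG lines in this session
EXPLAIN ANALYZE SELECT * FROM ticks WHERE contract_id = 1 ORDER BY time;
```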

You might learn more from enabling trace_sort and watching the
postmaster log entries it generates.  On the whole I think the planner
isn't making a stupid choice here: sorting a large number of rows
usually *is* preferable to making an indexscan over them, unless the
table is remarkably close to being in physical order for the index.
So it would be worth trying to figure out what the problem with the
sort is.

regards, tom lane



Re: [PERFORM] Postgres 8.3, four times slower queries?

2009-03-05 Thread Kevin Grittner
 Aaron Guyon battlem...@gmail.com wrote: 
 8.3.3: 1195 ms
 8.2.12: 611 ms
 
Could you send the non-commented lines from the postgresql.conf files
from both installations?
 
If feasible, update to the latest bug-fix version of 8.3.
 
Also, if you haven't already done so, try setting effective_cache_size
= '3GB' and random_page_cost = 2 for the 8.3 database and restart to
see what kind of plan you get.
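 
In postgresql.conf terms (reload or restart afterwards), that amounts to
roughly:

```
# Planner hints only -- no memory is actually allocated for these.
effective_cache_size = '3GB'   # approx. RAM available for caching data
random_page_cost = 2           # default is 4.0; lower favors index scans
```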
 
-Kevin



Re: [PERFORM] Postgres 8.3, four times slower queries?

2009-03-05 Thread Robert Haas
On Thu, Mar 5, 2009 at 1:57 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Aaron Guyon battlem...@gmail.com wrote:
 8.3.3: 1195 ms
 8.2.12: 611 ms

 Could you send the non-commented lines from the postgresql.conf files
 from both installations?

 If feasible, update to the latest bug-fix version of 8.3.

 Also, if you haven't already done so, try setting effective_cache_size
 = '3GB' and random_page_cost = 2 for the 8.3 database and restart to
 see what kind of plan you get.

I still think there's a problem with that index; now it's called
idx_payment_amount_payment_id.  What do you get from this?

select * from pg_index where indexrelid =
'idx_payment_amount_payment_id'::regclass;

Have you tried:

reindex table payment_amount

...Robert



Re: [PERFORM] Postgres 8.3, four times slower queries?

2009-03-05 Thread Aaron Guyon
On Thu, Mar 5, 2009 at 12:58 PM, Joshua D. Drake j...@commandprompt.com wrote:

 What happens if you do this:

 SET cpu_tuple_cost TO '0.5';
 SET cpu_index_tuple_cost TO '0.5';
 EXPLAIN ANALYZE <8.3 query>


Right now, I'm getting very good results with the above.  I'm still running
additional tests but I'll keep you guys updated. I've attached the new
explain analyze.
SET
SET

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Unique  (cost=17397.17..17397.18 rows=2 width=20) (actual time=621.865..621.865 rows=0 loops=1)
   ->  Sort  (cost=17397.17..17397.18 rows=2 width=20) (actual time=621.857..621.857 rows=0 loops=1)
         Sort Key: t8.id
         Sort Method:  quicksort  Memory: 17kB
         ->  Nested Loop  (cost=0.00..17397.16 rows=2 width=20) (actual time=621.825..621.825 rows=0 loops=1)
               Join Filter: ((t2.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t3.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t6.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t7.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t8.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t9.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t10.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR (t11.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)))
               ->  Nested Loop  (cost=0.00..14615.82 rows=537 width=76) (actual time=51.092..556.405 rows=1104 loops=1)
                     ->  Nested Loop  (cost=0.00..12007.84 rows=541 width=52) (actual time=46.530..523.869 rows=1104 loops=1)
                           ->  Nested Loop  (cost=0.00..2229.06 rows=147 width=44) (actual time=42.642..501.168 rows=332 loops=1)
                                 ->  Nested Loop  (cost=0.00..1530.56 rows=114 width=40) (actual time=39.485..489.224 rows=336 loops=1)
                                       ->  Nested Loop  (cost=0.00..757.31 rows=177 width=28) (actual time=10.877..473.774 rows=376 loops=1)
                                             ->  Nested Loop  (cost=0.00..69.07 rows=4 width=20) (actual time=6.463..6.640 rows=4 loops=1)
                                                   ->  Index Scan using idx_department_du on department t10  (cost=0.00..14.72 rows=7 width=12) (actual time=6.141..6.173 rows=7 loops=1)
                                                         Index Cond: (company_id = 250893)
                                                         Filter: (active <> 0)
                                                   ->  Index Scan using idx_project_department_id on project t2  (cost=0.00..7.26 rows=1 width=16) (actual time=0.051..0.053 rows=1 loops=7)
                                                         Index Cond: (t2.department_id = t10.id)
                                                         Filter: (t2.active <> 0)
                                             ->  Index Scan using idx_project_type_project_id on project_type t11  (cost=0.00..133.37 rows=77 width=16) (actual time=1.112..116.094 rows=94

Re: [PERFORM] Index scan plan estimates way off.

2009-03-05 Thread Jonathan Hseu
Oops, forgot to CC my reply to the list.  Sorry if this gets messed up.

On Thu, Mar 5, 2009 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Jonathan Hseu vom...@vomjom.net writes:
   Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
     Sort Key: time
     ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
           Recheck Cond: (contract_id = 1)
           ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
                 Index Cond: (contract_id = 1)
  (6 rows)

  This plan doesn't complete in a reasonable amount of time.  I end up having
  to kill the query after it's been running for over an hour.

 The bitmap scan should be at least as efficient as the plain indexscan,
 so I suppose the problem is that the sort is slow.  What's the datatype
 of time?


It's a timestamp with time zone and not null.


 Can this machine actually support 256MB+ work_mem, or is that
 likely to be driving it into swapping?


Yeah, the machine has 4 GB of RAM and isn't even close to swapping at all.


 You might learn more from enabling trace_sort and watching the
 postmaster log entries it generates.


I got this (I'm not sure how to interpret it, as there doesn't seem to be
any documentation about it on the web):

2009-03-05 15:28:27 CST STATEMENT:  select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 15:28:30 CST LOG:  begin tuple sort: nkeys = 1, workMem = 262144, randomAccess = f
2009-03-05 15:28:30 CST STATEMENT:  explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 16:50:31 CST LOG:  switching to external sort with 937 tapes: CPU 26.57s/4835.39u sec elapsed 4921.38 sec
2009-03-05 16:50:31 CST STATEMENT:  explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 17:00:46 CST LOG:  performsort starting: CPU 92.51s/4955.58u sec elapsed 5536.57 sec
2009-03-05 17:00:46 CST STATEMENT:  explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 17:00:50 CST LOG:  finished writing run 1 to tape 0: CPU 92.86s/4958.30u sec elapsed 5539.78 sec
2009-03-05 17:00:50 CST STATEMENT:  explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 17:00:50 CST LOG:  finished writing final run 2 to tape 1: CPU 92.88s/4958.40u sec elapsed 5539.90 sec
2009-03-05 17:00:50 CST STATEMENT:  explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 17:00:51 CST LOG:  performsort done (except 2-way final merge): CPU 92.96s/4958.55u sec elapsed 5541.10 sec
2009-03-05 17:00:51 CST STATEMENT:  explain analyze select * from ticks where contract_id=1 order by time limit 280;
2009-03-05 17:00:58 CST LOG:  external sort ended, 204674 disk blocks used: CPU 93.36s/4960.04u sec elapsed 5548.33 sec
2009-03-05 17:00:58 CST STATEMENT:  explain analyze select * from ticks where contract_id=1 order by time limit 280;



 On the whole I think the planner
 isn't making a stupid choice here: sorting a large number of rows
 usually *is* preferable to making an indexscan over them, unless the
 table is remarkably close to being in physical order for the index.
 So it would be worth trying to figure out what the problem with the
 sort is.


I don't really understand this.  It seems to me that fetching and sorting 30
million rows wouldn't be preferable to just fetching them in the correct
order in the first place, even if those fetches happen in random disk order.

I tried another query with a much smaller result set, and the index scan
takes 76 seconds, but the bitmap scan + sort takes 1.5 hours.  That's quite
a difference.  I'm pretty sure the physical order of the index is very
different from the physical order of the table.  The rows are inserted in
strictly time order, and presumably that's how they end up on disk, whereas
the index, as far as I understand it, is sorted by the first of the multiple
columns, contract_id, and then by time.
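
One way to check (and, if confirmed, address) that theory is sketched below,
assuming the table and index names above; note that CLUSTER takes an
exclusive lock and rewrites the whole table:

```sql
-- correlation near +1/-1: heap order tracks the column; near 0: random.
SELECT attname, correlation
  FROM pg_stats
 WHERE tablename = 'ticks'
   AND attname IN ('contract_id', 'time');

-- Physically reorder the table to match the index (8.3 syntax).
-- Exclusive lock plus a full table rewrite, so plan downtime accordingly.
CLUSTER ticks USING contract_id_time_idx;
ANALYZE ticks;
```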

Here's both of the EXPLAIN ANALYZEs for the same query:

=> explain analyze select * from ticks where contract_id=1 order by time limit 280;

                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10487812.41..10494812.41 rows=280 width=40) (actual time=5541109.704..5545345.598 rows=280 loops=1)
   ->  Sort  (cost=10487812.41..10565267.29 rows=30981949 width=40) (actual time=5541109.702..5544883.149 rows=280 loops=1)
         Sort Key: time
         Sort Method:  external merge  Disk: 1637392kB
         ->  Bitmap Heap Scan on ticks  (cost=718724.01..7015201.37 rows=30981949 width=40) (actual time=4874084.105..5465131.997 rows=27917481 loops=1)
               Recheck Cond: (contract_id = 1)
               ->  Bitmap Index Scan on contract_id_time_idx
Re: [PERFORM] Index scan plan estimates way off.

2009-03-05 Thread Robert Haas
On Thu, Mar 5, 2009 at 1:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jonathan Hseu vom...@vomjom.net writes:
   Sort  (cost=11684028.44..11761274.94 rows=30898601 width=40)
     Sort Key: time
     ->  Bitmap Heap Scan on ticks  (cost=715657.57..6995196.08 rows=30898601 width=40)
           Recheck Cond: (contract_id = 1)
           ->  Bitmap Index Scan on contract_id_time_idx  (cost=0.00..707932.92 rows=30898601 width=0)
                 Index Cond: (contract_id = 1)
  (6 rows)

 This plan doesn't complete in a reasonable amount of time.  I end up having
 to kill the query after it's been running for over an hour.

 The bitmap scan should be at least as efficient as the plain indexscan,
 so I suppose the problem is that the sort is slow.  What's the datatype
 of time?  Can this machine actually support 256MB+ work_mem, or is that
 likely to be driving it into swapping?

 You might learn more from enabling trace_sort and watching the
 postmaster log entries it generates.  On the whole I think the planner
 isn't making a stupid choice here: sorting a large number of rows
 usually *is* preferable to making an indexscan over them, unless the
 table is remarkably close to being in physical order for the index.

It seems like this is only likely to be true if most of the data needs
to be read from a magnetic disk, so that many seeks are involved.
That might not be the case here, since the machine has an awful lot of
RAM.

...Robert
