Oops, forgot to CC my reply to the list. Sorry if this gets messed up.
On Thu, Mar 5, 2009 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Jonathan Hseu vom...@vomjom.net writes:
Sort (cost=11684028.44..11761274.94 rows=30898601 width=40)
Sort Key: time
- Bitmap Heap Scan on ticks (cost=715657.57..6995196.08
rows=30898601
width=40)
Recheck Cond: (contract_id = 1)
- Bitmap Index Scan on contract_id_time_idx
(cost=0.00..707932.92 rows=30898601 width=0)
Index Cond: (contract_id = 1)
(6 rows)
This plan doesn't complete in a reasonable amount of time. I end up
having
to kill the query after it's been running for over an hour.
The bitmap scan should be at least as efficient as the plain indexscan,
so I suppose the problem is that the sort is slow. What's the datatype
of time?
It's a timestamp with time zone and not null.
Can this machine actually support 256MB+ work_mem, or is that
likely to be driving it into swapping?
Yeah, the machine has 4 GB of RAM and isn't even close to swapping at all.
You might learn more from enabling trace_sort and watching the
postmaster log entries it generates.
I got this (I'm not sure how to interpret it, as there doesn't seem to be
any documentation about it on the web):
2009-03-05 15:28:27 CST STATEMENT: select * from ticks where contract_id=1
order by time limit 280;
2009-03-05 15:28:30 CST LOG: begin tuple sort: nkeys = 1, workMem = 262144,
randomAccess = f
2009-03-05 15:28:30 CST STATEMENT: explain analyze select * from ticks
where contract_id=1 order by time limit 280;
2009-03-05 16:50:31 CST LOG: switching to external sort with 937 tapes: CPU
26.57s/4835.39u sec elapsed 4921.38 sec
2009-03-05 16:50:31 CST STATEMENT: explain analyze select * from ticks
where contract_id=1 order by time limit 280;
2009-03-05 17:00:46 CST LOG: performsort starting: CPU 92.51s/4955.58u sec
elapsed 5536.57 sec
2009-03-05 17:00:46 CST STATEMENT: explain analyze select * from ticks
where contract_id=1 order by time limit 280;
2009-03-05 17:00:50 CST LOG: finished writing run 1 to tape 0: CPU
92.86s/4958.30u sec elapsed 5539.78 sec
2009-03-05 17:00:50 CST STATEMENT: explain analyze select * from ticks
where contract_id=1 order by time limit 280;
2009-03-05 17:00:50 CST LOG: finished writing final run 2 to tape 1: CPU
92.88s/4958.40u sec elapsed 5539.90 sec
2009-03-05 17:00:50 CST STATEMENT: explain analyze select * from ticks
where contract_id=1 order by time limit 280;
2009-03-05 17:00:51 CST LOG: performsort done (except 2-way final merge):
CPU 92.96s/4958.55u sec elapsed 5541.10 sec
2009-03-05 17:00:51 CST STATEMENT: explain analyze select * from ticks
where contract_id=1 order by time limit 280;
2009-03-05 17:00:58 CST LOG: external sort ended, 204674 disk blocks used:
CPU 93.36s/4960.04u sec elapsed 5548.33 sec
2009-03-05 17:00:58 CST STATEMENT: explain analyze select * from ticks
where contract_id=1 order by time limit 280;
On the whole I think the planner
isn't making a stupid choice here: sorting a large number of rows
usually *is* preferable to making an indexscan over them, unless the
table is remarkably close to being in physical order for the index.
So it would be worth trying to figure out what the problem with the
sort is.
I don't really understand this. It seems to me that fetching and sorting 30
million rows wouldn't be preferable to just fetching them in the correct
order in the first place, even if it's in a random order.
I tried another query with a much smaller result set, and the index scan
takes 76 seconds, but the bitmap scan sort takes 1.5 hours. That's quite
a difference. I'm pretty sure the physical order of the index is very
different from the physical order of the table. The elements of the table
are inserted in strictly time order, if that's how it ends up being on disk,
whereas the index, as far as I understand it, would be sorted by the first
of the multiple columns, the contract_id, then the time.
Here's both of the EXPLAIN ANALYZEs for the same query:
= explain analyze select * from ticks where contract_id=1 order by time
limit 280;
QUERY
PLAN
Limit (cost=10487812.41..10494812.41 rows=280 width=40) (actual
time=5541109.704..5545345.598 rows=280 loops=1)
- Sort (cost=10487812.41..10565267.29 rows=30981949 width=40) (actual
time=5541109.702..5544883.149 rows=280 loops=1)
Sort Key: time
Sort Method: external merge Disk: 1637392kB
- Bitmap Heap Scan on ticks (cost=718724.01..7015201.37
rows=30981949 width=40) (actual time=4874084.105..5465131.997 rows=27917481
loops=1)
Recheck Cond: (contract_id = 1)
- Bitmap Index Scan on contract_id_time_idx
(cost=0.00