On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
> Hi Ken,
> 
> Thanks for your tips! Yes, that is the case, and I ran another query sorting on 
> the second column, whose values are random.
> 
> postgres=# explain analyze select * from big_wf order by id;
>                                                       QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=25681.875..36458.824 rows=4100000 loops=1)
>    Sort Key: id
>    Sort Method:  external merge  Disk: 72048kB
>    ->  Seq Scan on big_wf  (cost=0.00..59142.00 rows=4100000 width=8) (actual time=8.595..5569.500 rows=4100000 loops=1)
> 
> Now the sort takes about 20 seconds, so it seems reasonable compared to 30 
> seconds, right? But one thing I'm confused about: why is the additional 
> comparison really so expensive? Does it incur additional I/O? From the cost 
> model it seems not; all the costs are the same (575775.45).
> 
> Thanks,
> Li Jie

In the first query, the cost is basically the I/O cost of reading the
table from disk; the actual sort does almost no work since the sort
values are all the same. In the second query, the sort has to swap
things in memory/on disk to get them into the correct order for the
result. That takes CPU and possibly additional I/O, which is why it is
slower. The planner's cost model charges the same per-comparison cost
regardless of the data distribution, which is why the estimated costs
are identical even though the actual times differ. Also, when sorting
by just the "id" column the sorted values are smaller, so fewer batches
are needed to complete the sort, since the sort is bigger than
work_mem.
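
If you want to confirm the batching effect, you can raise work_mem for
the session and rerun the query; the sort should then happen entirely
in memory. The exact setting needed is a guess based on the 72MB spill
shown in the plan above:

    -- Rough sketch, untested here: 128MB is an assumption based on the
    -- ~72MB of temp-file usage reported in the plan above.
    SET work_mem = '128MB';
    EXPLAIN ANALYZE SELECT * FROM big_wf ORDER BY id;
    -- The plan should now show "Sort Method: quicksort  Memory: ...kB"
    -- instead of "external merge  Disk: 72048kB".
    RESET work_mem;

If it still spills, bump the setting higher: the in-memory
representation carries per-tuple overhead, so the memory needed is
larger than the on-disk spill size.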

Cheers,
Ken
