[HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Jie Li
Hi,

Here is the test table,

postgres=# \d big_wf
Table public.big_wf
 Column |  Type   | Modifiers
+-+---
 age| integer |
 id | integer |

postgres=# \dt+ big_wf
 List of relations
 Schema |  Name  | Type  |  Owner   |  Size  | Description
++---+--++-
 public | big_wf | table | workshop | 142 MB |


The first query sorting on one column:
postgres=# explain analyze select * from big_wf order by age;
   QUERY
PLAN
-
 Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual
time=11228.155..16427.149 rows=410 loops=1)
   Sort Key: age
   Sort Method:  external sort  Disk: 72112kB
   -  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
(actual time=6.196..4797.620 rows=410 loops=1)
 Total runtime: 19530.452 ms
(5 rows)

The second query sorting on two columns:
postgres=# explain analyze select * from big_wf order by age,id;
   QUERY
PLAN
-
 Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual
time=37544.779..48206.702 rows=410 loops=1)
   Sort Key: age, id
   Sort Method:  external merge  Disk: 72048kB
   -  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
(actual time=6.796..5518.663 rows=410 loops=1)
 Total runtime: 51258.000 ms
(5 rows)

The verision is 9.0.1 and the work_mem is 20MB. One special thing is, the
first column(age) of all the tuples are of the same value, so the second
column(id) is always needed for comparison.  While the first sorting takes
about only 6 seconds, the second one takes over 30 seconds,  Is this too
much than expected? Is there any possible optimization ?

Thanks,
Li Jie


Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Kenneth Marshall
On Thu, Dec 23, 2010 at 02:33:12AM -0500, Jie Li wrote:
 Hi,
 
 Here is the test table,
 
 postgres=# \d big_wf
 Table public.big_wf
  Column |  Type   | Modifiers
 +-+---
  age| integer |
  id | integer |
 
 postgres=# \dt+ big_wf
  List of relations
  Schema |  Name  | Type  |  Owner   |  Size  | Description
 ++---+--++-
  public | big_wf | table | workshop | 142 MB |
 
 
 The first query sorting on one column:
 postgres=# explain analyze select * from big_wf order by age;
QUERY
 PLAN
 -
  Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual
 time=11228.155..16427.149 rows=410 loops=1)
Sort Key: age
Sort Method:  external sort  Disk: 72112kB
-  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
 (actual time=6.196..4797.620 rows=410 loops=1)
  Total runtime: 19530.452 ms
 (5 rows)
 
 The second query sorting on two columns:
 postgres=# explain analyze select * from big_wf order by age,id;
QUERY
 PLAN
 -
  Sort  (cost=565525.45..575775.45 rows=410 width=8) (actual
 time=37544.779..48206.702 rows=410 loops=1)
Sort Key: age, id
Sort Method:  external merge  Disk: 72048kB
-  Seq Scan on big_wf  (cost=0.00..59142.00 rows=410 width=8)
 (actual time=6.796..5518.663 rows=410 loops=1)
  Total runtime: 51258.000 ms
 (5 rows)
 
 The verision is 9.0.1 and the work_mem is 20MB. One special thing is, the
 first column(age) of all the tuples are of the same value, so the second
 column(id) is always needed for comparison.  While the first sorting takes
 about only 6 seconds, the second one takes over 30 seconds,  Is this too
 much than expected? Is there any possible optimization ?
 
 Thanks,
 Li Jie

Hi Li,

If I understand your description, in the first query the sort does
not actually have to do anything because the column values for age
are all degenerate. In the second query, you actually need to sort
the values which is why it takes longer. If the first column values
are the same, then simply sorting by id alone would be faster.
You could also bump up work_mem for the query to perform the sort
in memory.

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Marti Raudsepp
On Thu, Dec 23, 2010 at 09:33, Jie Li jay23j...@gmail.com wrote:
 While the first sorting takes
 about only 6 seconds, the second one takes over 30 seconds,  Is this too
 much than expected? Is there any possible optimization ?

If you're doing these queries often, you should:
CREATE INDEX ix_big_wf_age_id ON big_wf (age, id)

If that's still not fast enough, you can physically sort rows in the
table using the newly created index:
CLUSTER big_wf USING ix_big_wf_age_id;

Please post back your results. :)

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why is sorting on two columns so slower than sorting on one column?

2010-12-23 Thread Li Jie
Hi Marti,

Thanks for your help! I guess I understand what you mean, a clustered index 
will make sorting as cheap as a seq scan, right?

But what I meant is, is there any potential optimization for the backend 
implementation? Intuitively, if sorting on one column or two columns will incur 
the same I/O costs, why should there be so much difference?

Thanks,
Li Jie

- Original Message - 
From: Marti Raudsepp ma...@juffo.org
To: Jie Li jay23j...@gmail.com
Cc: pgsql-hackers pgsql-hackers@postgresql.org
Sent: Thursday, December 23, 2010 10:17 PM
Subject: Re: [HACKERS] Why is sorting on two columns so slower than sorting on 
one column?


On Thu, Dec 23, 2010 at 09:33, Jie Li jay23j...@gmail.com wrote:
 While the first sorting takes
 about only 6 seconds, the second one takes over 30 seconds, Is this too
 much than expected? Is there any possible optimization ?

If you're doing these queries often, you should:
CREATE INDEX ix_big_wf_age_id ON big_wf (age, id)

If that's still not fast enough, you can physically sort rows in the
table using the newly created index:
CLUSTER big_wf USING ix_big_wf_age_id;

Please post back your results. :)

Regards,
Marti
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers