Re: [PERFORM] Sort and index

Jim C. Nasby Fri, 22 Apr 2005 19:36:07 -0700

I've run some performance tests. The actual test case is at
http://stats.distributed.net/~decibel/timing.sql, and the results are at
http://stats.distributed.net/~decibel/timing.log. In a nutshell, doing
an index scan appears to be about 2x faster than a sequential scan and a
sort.


Something else of interest is that going from 50M of sort memory to 3G
sped the sort up by 900 seconds. If someone wants to record data about
the effect of sort_mem on on-disk sorts somewhere (maybe in the docs?) I
can run some more tests for that case.

In any case, it's clear that the planner is making the wrong choice
here. BTW, changing random_page_cost to 3 or 4 doesn't change the plan.

On Tue, Apr 19, 2005 at 10:40:41PM -0500, Jim C. Nasby wrote:
> On Tue, Apr 19, 2005 at 11:01:26PM -0400, Tom Lane wrote:
> > "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > > Actually, the planner (at least in 7.4) isn't smart enough to consider
> > > if the sort would fit in memory or not.
> > 
> > Really?  Have you read cost_sort()?
> > 
> > It's certainly possible that the calculation is all wet, but to claim
> > that the issue is not considered is just wrong.
> 
> To be fair, no, I haven't looked at the code. This is based strictly on
> anecdotal evidence on a 120M row table. I'm currently running a test to
> see how an index scan compares to a seqscan. I also got the same results
> when I added a where clause that would restrict it to about 7% of the
> table.
> 
> Actually, after running some tests (below), the plan cost does change
> when I change sort_mem (it was originally 50000).
> 
> stats=# \d email_contrib
>    Table "public.email_contrib"
>    Column   |  Type   | Modifiers 
> ------------+---------+-----------
>  project_id | integer | not null
>  id         | integer | not null
>  date       | date    | not null
>  team_id    | integer | 
>  work_units | bigint  | not null
> Indexes:
>     "email_contrib_pkey" primary key, btree (project_id, id, date)
>     "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
>     "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
>     "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
>     "email_contrib__project_date" btree (project_id, date)
> Foreign-key constraints:
>     "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) 
> ON UPDATE CASCADE
>     "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES 
> stats_team(team) ON UPDATE CASCADE
> 
> stats=# explain select * from email_contrib where project_id=8 order by 
> project_id, id, date;
>                                    QUERY PLAN                                 
>   
> --------------------------------------------------------------------------------
>  Sort  (cost=3613476.05..3635631.71 rows=8862263 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2471377.50 rows=8862263 
> width=24)
>          Filter: (project_id = 8)
> (4 rows)
> 
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                     QUERY PLAN                                
>     
> ----------------------------------------------------------------------------------
>  Sort  (cost=25046060.83..25373484.33 rows=130969400 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 
> width=24)
> (3 rows)
> 
> stats=# select 8862263::float/130969400;
>       ?column?      
> --------------------
>  0.0676666687027657
> (1 row)
> 
> stats=# explain select * from email_contrib where project_id=8 order by 
> project_id, id, date;
>                                              QUERY PLAN                       
>                        
> -----------------------------------------------------------------------------------------------------
>  Index Scan using email_contrib_pkey on email_contrib  (cost=0.00..6832005.57 
> rows=8862263 width=24)
>    Index Cond: (project_id = 8)
> (2 rows)
> 
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                                QUERY PLAN                     
>                            
> ---------------------------------------------------------------------------------------------------------
>  Index Scan using email_contrib_pkey on email_contrib  
> (cost=0.00..100055905.62 rows=130969400 width=24)
> (1 row)
> 
> stats=# set enable_seqscan=on;
> SET
> stats=# set sort_mem=1000;
> SET
> stats=# explain select * from email_contrib order by project_id, id, date;
>                                     QUERY PLAN                                
>     
> ----------------------------------------------------------------------------------
>  Sort  (cost=28542316.63..28869740.13 rows=130969400 width=24)
>    Sort Key: project_id, id, date
>    ->  Seq Scan on email_contrib  (cost=0.00..2143954.00 rows=130969400 
> width=24)
> (3 rows)
> 
> stats=# 
> 
> -- 
> Jim C. Nasby, Database Consultant               [EMAIL PROTECTED] 
> Give your computer some brain candy! www.distributed.net Team #1828
> 
> Windows: "Where do you want to go today?"
> Linux: "Where do you want to go tomorrow?"
> FreeBSD: "Are you guys coming, or what?"
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
> 

-- 
Jim C. Nasby, Database Consultant               [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Re: [PERFORM] Sort and index

Reply via email to