[PERFORM] Subquery WHERE IN or WHERE EXISTS faster?

Ulrich Sat, 28 Jun 2008 08:23:04 -0700

Hi,

I am new to SQL and have two tables..., "processor" and"users_processors". The first table contains Processors:


CREATE TABLE processor (
id SERIAL,
speed varchar(50) NOT NULL,
type int2 NOT NULL,
PRIMARY KEY (id)
);
CREATE UNIQUE INDEX processor_speed_index ON processors(lower(speed));

Example:
1 "100MHz" 0
2 "36GHz" 7
...


The second Table defines which processor one user has got:

CREATE TABLE users_processors (
userid int REFERENCES users ON UPDATE CASCADE ON DELETE CASCADE,
processorid int REFERENCES processors ON UPDATE CASCADE ON DELETE CASCADE,
PRIMARY KEY(userid, processorid)
);

CREATE INDEX users_processors_processorid_index ONusers_processors(processorid);CREATE INDEX users_processors_processorid_index ONusers_processors(processorid);


Example:
1 2
1 3
1 4
...
2 1
2 2
...
(The user "1" own processors 2,3,4 and the user 2 owns processors 1,2)


__________________________________________________________

Now, I would like to list all processors user "1" has got. The followingquery does that:SELECT speed FROM processors WHERE id IN (SELECT processorid FROMusers_processors WHERE userid=1) ORDER BY speed ASC LIMIT 10 OFFSET 2;

This would return 10 processors beginning with number 3. I have read,that this query is slow and can be faster. I analyzed it:Limit (cost=22.90..22.90 rows=1 width=118) (actual time=0.344..0.349rows=9 loops=1)-> Sort (cost=22.90..22.90 rows=2 width=118) (actualtime=0.341..0.341 rows=11 loops=1)

        Sort Key: processors.speed
        Sort Method:  quicksort  Memory: 17kB

-> Nested Loop (cost=15.03..22.89 rows=2 width=118) (actualtime=0.225..0.289 rows=11 loops=1)-> HashAggregate (cost=15.03..15.05 rows=2 width=4)(actual time=0.207..0.214 rows=11 loops=1)-> Bitmap Heap Scan on users_processors(cost=4.34..15.01 rows=11 width=4) (actual time=0.175..0.179 rows=11loops=1)

                          Recheck Cond: (userid = 1)

-> Bitmap Index Scan onusers_processors_userid_index (cost=0.00..4.33 rows=11 width=0) (actualtime=0.159..0.159 rows=12 loops=1)

                                Index Cond: (userid = 1)

-> Index Scan using processors_pkey on processors(cost=0.00..3.90 rows=1 width=122) (actual time=0.004..0.004 rows=1loops=11)Index Cond: (processors.id =users_processors.processorid)

Total runtime: 0.478 ms
(13 rows)


__________________________________________________________


People say that this query is faster:

SELECT speed FROM processors WHERE EXISTS (SELECT 1 FROMusers_processors WHERE userid=1 AND processorid=processors.id) ORDER BYspeed ASC LIMIT 10 OFFSET 2;


Analyze returns:

Limit (cost=4404.52..4404.55 rows=10 width=118) (actualtime=0.179..0.184 rows=9 loops=1)-> Sort (cost=4404.52..4405.18 rows=265 width=118) (actualtime=0.176..0.177 rows=11 loops=1)

        Sort Key: processors.speed
        Sort Method:  quicksort  Memory: 17kB

-> Seq Scan on processors (cost=0.00..4398.44 rows=265width=118) (actual time=0.056..0.118 rows=11 loops=1)

              Filter: (subplan)
              SubPlan

-> Index Scan using users_processors_pkey onusers_processors (cost=0.00..8.27 rows=1 width=0) (actualtime=0.006..0.006 rows=1 loops=11)

                      Index Cond: ((userid = 1) AND (processorid = $0))
Total runtime: 0.267 ms
(10 rows)

The second query is faster, but I have only used a very small table withless than 20 items. In real-world I will have tables with thousands ofentries. I wonder if the second query is also faster in cases where Ihave big tables, because it does a "Seq Scan", for me this looks like acomplete table scan. This seams reasonable if we look at the query I donot expect that it is possible to use an INDEX for the second query. So,is it slower?


Which query would you use, the first or the second one?

I would also like to know the total number of processors one user hasgot. I would use one of those queries and replace the "SELECT speed"with "SELECT count(*)" and remove the LIMIT and OFFSET. Is this good? Ihave read that count(*) is slow.


Kind regards
Ulrich

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Subquery WHERE IN or WHERE EXISTS faster?

Reply via email to