Re: [PERFORM] Talking about optimizer, my long dream

Віталій Тимчишин Sun, 27 Feb 2011 14:21:23 -0800

27 лютого 2011 р. 19:59 Robert Haas <[email protected]> написав:


> 2011/2/4 Віталій Тимчишин <[email protected]>:
> > Hi, all.
> > All this optimizer vs hint thread reminded me about crazy idea that got
> to
> > my head some time ago.
> > I currently has two problems with postgresql optimizer
> > 1) Dictionary tables. Very usual thing is something like "select * from
> > big_table where distionary_id = (select id from dictionary where
> > name=value)". This works awful if dictionary_id distribution is not
> uniform.
>
> Does it work better if you write it as a join?
>

> SELECT b.* FROM big_table b, dictionary d WHERE b.dictionary_id = d.id
> AND d.name = 'value'
>
> I would like to see a concrete example of this not working well,
> because I've been writing queries like this (with MANY tables) for
> years and it's usually worked very well for me.
>
> Here you are:
 PostgreSQL 8.4.7 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real
(Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
create table a(dict int4, val int4);
create table b(dict int4, name text);
create index c on a(dict);
insert into b values (1, 'small'), (2, 'large');
insert into a values (1,1);
insert into a select 2,generate_series(1,10000);
analyze a;
analyze b;
test=# explain analyze select * from a where dict=1;
                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------
 Index Scan using c on a  (cost=0.00..8.27 rows=1 width=8) (actual
time=0.014..0.016 rows=1 loops=1)
   Index Cond: (dict = 1)
 Total runtime: 0.041 ms
(3 rows)
test=# explain analyze select * from a where dict=2;
                                             QUERY PLAN

-----------------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..170.01 rows=10000 width=8) (actual
time=0.014..6.876 rows=10000 loops=1)
   Filter: (dict = 2)
 Total runtime: 13.419 ms
(3 rows)
test=# explain analyze select * from a,b where a.dict=b.dict and b.name
='small';
                                                QUERY PLAN

-----------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.04..233.55 rows=5000 width=18) (actual
time=0.047..13.159 rows=1 loops=1)
   Hash Cond: (a.dict = b.dict)
   ->  Seq Scan on a  (cost=0.00..145.01 rows=10001 width=8) (actual
time=0.009..6.633 rows=10001 loops=1)
   ->  Hash  (cost=1.02..1.02 rows=1 width=10) (actual time=0.011..0.011
rows=1 loops=1)
         ->  Seq Scan on b  (cost=0.00..1.02 rows=1 width=10) (actual
time=0.006..0.008 rows=1 loops=1)
               Filter: (name = 'small'::text)
 Total runtime: 13.197 ms
(7 rows)
test=# explain analyze select * from a,b where a.dict=b.dict and b.name
='large';
                                                QUERY PLAN

-----------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.04..233.55 rows=5000 width=18) (actual
time=0.074..21.476 rows=10000 loops=1)
   Hash Cond: (a.dict = b.dict)
   ->  Seq Scan on a  (cost=0.00..145.01 rows=10001 width=8) (actual
time=0.012..7.085 rows=10001 loops=1)
   ->  Hash  (cost=1.02..1.02 rows=1 width=10) (actual time=0.021..0.021
rows=1 loops=1)
         ->  Seq Scan on b  (cost=0.00..1.02 rows=1 width=10) (actual
time=0.015..0.016 rows=1 loops=1)
               Filter: (name = 'large'::text)
 Total runtime: 28.293 ms
(7 rows)

It simply don't know that small=1 and large=2, so it never uses nested loop
+ iindex scan:
test=# set enable_hashjoin=false;
SET
test=# explain analyze select * from a,b where a.dict=b.dict and b.name
='small';
                                                   QUERY PLAN

----------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..253.28 rows=5000 width=18) (actual
time=0.041..0.047 rows=1 loops=1)
   ->  Seq Scan on b  (cost=0.00..1.02 rows=1 width=10) (actual
time=0.010..0.012 rows=1 loops=1)
         Filter: (name = 'small'::text)
   ->  Index Scan using c on a  (cost=0.00..189.75 rows=5000 width=8)
(actual time=0.021..0.023 rows=1 loops=1)
         Index Cond: (a.dict = b.dict)
 Total runtime: 0.089 ms
(6 rows)

-- 
Best regards,
 Vitalii Tymchyshyn

Re: [PERFORM] Talking about optimizer, my long dream

Reply via email to