Re: [HACKERS] Cost of sort/order by not estimated by the query planner

2009-12-03 Thread Laurent Laborde
'morning !

And here is the query plan for :
---
explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield  getbit(0))
ORDER BY _article.id ASC
LIMIT 5;

 Limit  (cost=0.00..2238.33 rows=5 width=1099) (actual
time=17548636.326..17548837.082 rows=5 loops=1)
   -  Index Scan using _article_pkey on _article
(cost=0.00..7762964.53 rows=17341 width=1099) (actual
time=17548636.324..17548837.075 rows=5 loops=1)
 Filter: (bitfield  B'1'::bit varying)
 Total runtime: 17548837.154 ms


Versus the limit 500 query plan :
---
explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield  getbit(0))
ORDER BY _article.id ASC
LIMIT 500;

 Limit  (cost=66229.90..66231.15 rows=500 width=1099) (actual
time=1491.905..1492.146 rows=500 loops=1)
   -  Sort  (cost=66229.90..66273.25 rows=17341 width=1099) (actual
time=1491.904..1492.008 rows=500 loops=1)
 Sort Key: id
 Sort Method:  top-N heapsort  Memory: 603kB
 -  Bitmap Heap Scan on _article  (cost=138.67..65365.82
rows=17341 width=1099) (actual time=777.489..1487.120 rows=6729
loops=1)
   Recheck Cond: (bitfield  B'1'::bit varying)
   -  Bitmap Index Scan on idx_article_bitfield
(cost=0.00..134.33 rows=17341 width=0) (actual time=769.799..769.799
rows=6729 loops=1)
 Index Cond: (bitfield  B'1'::bit varying)
 Total runtime: 1630.690 ms


I will read (and try to understand) all you said yesterday and reply
as soon as i can :)
Thank you !

-- 
Laurent ker2x Laborde
Sysadmin  DBA at http://www.over-blog.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost of sort/order by not estimated by the query planner

2009-12-03 Thread Laurent Laborde
The table is clustered by by blog_id.
So, for testing purpose, i tried an ORDER BY blog_id.

limit 500 :
-
explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield  getbit(0))
ORDER BY _article.blog_id ASC
LIMIT 500;

 Limit  (cost=66229.90..66231.15 rows=500 width=1099) (actual
time=9.368..9.580 rows=500 loops=1)
   -  Sort  (cost=66229.90..66273.25 rows=17341 width=1099) (actual
time=9.367..9.443 rows=500 loops=1)
 Sort Key: blog_id
 Sort Method:  top-N heapsort  Memory: 660kB
 -  Bitmap Heap Scan on _article  (cost=138.67..65365.82
rows=17341 width=1099) (actual time=0.905..4.042 rows=6729 loops=1)
   Recheck Cond: (bitfield  B'1'::bit varying)
   -  Bitmap Index Scan on idx_article_bitfield
(cost=0.00..134.33 rows=17341 width=0) (actual time=0.772..0.772
rows=6729 loops=1)
 Index Cond: (bitfield  B'1'::bit varying)
 Total runtime: 9.824 ms

Limit 5 :
--
explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield  getbit(0))
ORDER BY _article.blog_id ASC
LIMIT 5;

 Limit  (cost=0.00..1419.22 rows=5 width=1099) (actual
time=125076.420..280419.143 rows=5 loops=1)
   -  Index Scan using idx_article_blog_id on _article
(cost=0.00..4922126.37 rows=17341 width=1099) (actual
time=125076.419..280419.137 rows=5 loops=1)
 Filter: (bitfield  B'1'::bit varying)
 Total runtime: 280419.241 ms


-- 
Laurent ker2x Laborde
Sysadmin  DBA at http://www.over-blog.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost of sort/order by not estimated by the query planner

2009-12-02 Thread Laurent Laborde
hummm Adding pgsql-perf :)

On Mon, Nov 30, 2009 at 5:54 PM, Laurent Laborde kerdez...@gmail.com wrote:
 Friendly greetings !
 I use postgresql 8.3.6.

 here is a few info about the table i'm querying :
 -
 - select count(*) from _article : 17301610
 - select count(*) from _article WHERE (_article.bitfield  getbit(0)) : 6729


 Here are both request with problems :
 --

 QUERY 1 : Very fast !
 -

 explain SELECT *
 FROM   _article
 WHERE (_article.bitfield  getbit(0))
 ORDER BY _article.id ASC
 LIMIT 500;
                                             QUERY PLAN
 -
  Limit  (cost=66114.13..66115.38 rows=500 width=1114)
   -  Sort  (cost=66114.13..66157.37 rows=17296 width=1114)
         Sort Key: id
         -  Bitmap Heap Scan on _article  (cost=138.32..65252.29
 rows=17296 width=1114)
               Recheck Cond: (bitfield  B'1'::bit varying)
               -  Bitmap Index Scan on idx_article_bitfield
 (cost=0.00..134.00 rows=17296 width=0)
                     Index Cond: (bitfield  B'1'::bit varying)




 QUERY 2 : Endless ... (more than 30mn... i stopped the query)
 -

 explain SELECT *
 FROM   _article
 WHERE (_article.bitfield  getbit(0))
 ORDER BY _article.id ASC
 LIMIT 5;
                                           QUERY PLAN
 -
  Limit  (cost=0.00..2042.87 rows=5 width=1114)
   -  Index Scan using _article_pkey on _article
 (cost=0.00..7066684.46 rows=17296 width=1114)
         Filter: (bitfield  B'1'::bit varying)
 (3 rows)


 With LIMIT 5 and LIMIT 500, the query plan are differents.
 Postgresql estimate that it can do a a simple index scan to find only 5 row.
 With more than LIMIT ~400 it estimate that it's faster to do a more
 complex plan.
 and it make sense !

 The problem is in the order by, of course.
 If i remove the order by the LIMIT 5 is faster (0.044 ms) and do an
 index scan.
 At limit 500 (without order) it still use an index scan and it is
 slightly slower.
 At limit 5000 (without order) it switch to a Bitmap Index Scan +
 Bitmap Heap Scan and it's slower but acceptable (5.275 ms)

 Why, with the QUERY 2, postgresql doesn't estimate the cost of the
 Sort/ORDER BY ?
 Of course, by ignoring the order, both query plan are right and the
 choice for thoses differents plans totally make sense.

 But... if the planner would be kind enough to considerate the cost of
 the order by, it would certainly choose the Bitmap Index + Bitmap Heap
 scan for the limit 5.
 And not an index_scan pkey !

 I have set the statistics to 1000 for _article.bitfield, just in case
 (and ran a vacuum analyze), it doesn't change anything.

 Is that a bug ? any Idea ?

 Thank you :)

 --
 Laurent ker2x Laborde
 Sysadmin  DBA at http://www.over-blog.com/




-- 
Laurent ker2x Laborde
Sysadmin  DBA at http://www.over-blog.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cost of sort/order by not estimated by the query planner

2009-12-02 Thread Hitoshi Harada
2009/12/1 Laurent Laborde kerdez...@gmail.com:
 The problem is in the order by, of course.
 If i remove the order by the LIMIT 5 is faster (0.044 ms) and do an
 index scan.
 At limit 500 (without order) it still use an index scan and it is
 slightly slower.
 At limit 5000 (without order) it switch to a Bitmap Index Scan +
 Bitmap Heap Scan and it's slower but acceptable (5.275 ms)

 Why, with the QUERY 2, postgresql doesn't estimate the cost of the
 Sort/ORDER BY ?
 Of course, by ignoring the order, both query plan are right and the
 choice for thoses differents plans totally make sense.

It's because the result of IndexScan is already sorted by demanded
key, whereas the one of BitmapIndexScan isn't. But I'm not sure why
the query lasts more than 30 minutes...


Regards,

-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cost of sort/order by not estimated by the query planner

2009-11-30 Thread Laurent Laborde
Friendly greetings !
I use postgresql 8.3.6.

here is a few info about the table i'm querying :
-
- select count(*) from _article : 17301610
- select count(*) from _article WHERE (_article.bitfield  getbit(0)) : 6729


Here are both request with problems :
--

QUERY 1 : Very fast !
-

explain SELECT *
FROM   _article
WHERE (_article.bitfield  getbit(0))
ORDER BY _article.id ASC
LIMIT 500;
 QUERY PLAN
-
 Limit  (cost=66114.13..66115.38 rows=500 width=1114)
   -  Sort  (cost=66114.13..66157.37 rows=17296 width=1114)
 Sort Key: id
 -  Bitmap Heap Scan on _article  (cost=138.32..65252.29
rows=17296 width=1114)
   Recheck Cond: (bitfield  B'1'::bit varying)
   -  Bitmap Index Scan on idx_article_bitfield
(cost=0.00..134.00 rows=17296 width=0)
 Index Cond: (bitfield  B'1'::bit varying)




QUERY 2 : Endless ... (more than 30mn... i stopped the query)
-

explain SELECT *
FROM   _article
WHERE (_article.bitfield  getbit(0))
ORDER BY _article.id ASC
LIMIT 5;
   QUERY PLAN
-
 Limit  (cost=0.00..2042.87 rows=5 width=1114)
   -  Index Scan using _article_pkey on _article
(cost=0.00..7066684.46 rows=17296 width=1114)
 Filter: (bitfield  B'1'::bit varying)
(3 rows)


With LIMIT 5 and LIMIT 500, the query plan are differents.
Postgresql estimate that it can do a a simple index scan to find only 5 row.
With more than LIMIT ~400 it estimate that it's faster to do a more
complex plan.
and it make sense !

The problem is in the order by, of course.
If i remove the order by the LIMIT 5 is faster (0.044 ms) and do an
index scan.
At limit 500 (without order) it still use an index scan and it is
slightly slower.
At limit 5000 (without order) it switch to a Bitmap Index Scan +
Bitmap Heap Scan and it's slower but acceptable (5.275 ms)

Why, with the QUERY 2, postgresql doesn't estimate the cost of the
Sort/ORDER BY ?
Of course, by ignoring the order, both query plan are right and the
choice for thoses differents plans totally make sense.

But... if the planner would be kind enough to considerate the cost of
the order by, it would certainly choose the Bitmap Index + Bitmap Heap
scan for the limit 5.
And not an index_scan pkey !

I have set the statistics to 1000 for _article.bitfield, just in case
(and ran a vacuum analyze), it doesn't change anything.

Is that a bug ? any Idea ?

Thank you :)

-- 
Laurent ker2x Laborde
Sysadmin  DBA at http://www.over-blog.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers