Re: [PERFORM] limit clause breaks query planner?

Matt Smiley Wed, 03 Sep 2008 23:11:31 -0700

"Tom Lane" <[EMAIL PROTECTED]> writes:
>  "Matt Smiley" <[EMAIL PROTECTED]> writes:
>  >  So an Index Scan is always going to have a higher cost estimate than
>  >  an equivalent Seq Scan returning the same result rows (unless
>  >  random_page_cost is < 1).  That's why I think the planner is always
>  >  preferring the plan that uses a Seq Scan.
>  
>  If that were the case, we'd never choose an indexscan at all...


You're right, that was a silly guess.

>  It's true that a plain indexscan is not preferred for queries that will
>  return a large fraction of the table.  However, it should be willing to
>  use a bitmap scan for this query, given default cost settings (the
>  default cost settings will cause it to prefer bitmap scan for retrieving
>  up to about a third of the table, in my experience).  I too am confused
>  about why it doesn't prefer that choice in the OP's example.

It looks like the bitmap scan has a higher cost estimate because the entire 
bitmap index must be built before beginning the heap scan and returning rows up 
the pipeline.  The row-count limit can't be pushed lower than the 
bitmap-heap-scan like it can for the basic index-scan.

test_8_3_3=# set enable_seqscan = false ;
SET

test_8_3_3=# set enable_indexscan = false ;
SET

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 
limit 15 ;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=17070.22..17071.02 rows=15 width=8) (actual time=606.902..607.086 
rows=15 loops=1)
   ->  Bitmap Heap Scan on my_table  (cost=17070.22..69478.96 rows=988217 
width=8) (actual time=606.892..606.983 rows=15 loops=1)
         Recheck Cond: (b = 3)
         Filter: (a IS NULL)
         ->  Bitmap Index Scan on idx_b  (cost=0.00..16823.17 rows=1033339 
width=0) (actual time=592.657..592.657 rows=1000000 loops=1)
               Index Cond: (b = 3)
 Total runtime: 607.340 ms
(7 rows)


>  It would be interesting to alter the random_page_cost setting and see if he 
> gets
>  different results.

Using an unmodified postgresql.conf, the cost estimate for an index-scan were 
so much higher than for a seqscan that random_page_cost had to be set below 0.2 
before the index-scan was preferred.  However, it looks like this was mainly 
because effective_cache_size was too small.  The planner thought the cache was 
only 128 MB, and the size of the complete table+index was 39492 + 21946 pages * 
8 KB/block = 330 MB.  It makes sense for the cost estimate to be so much higher 
if blocks are expected to be repeatedly re-fetched from disk.  I wonder if 
David's effective_cache_size is too small.

test_8_3_3=# reset all ;
RESET

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 
limit 15 ;
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.50 rows=15 width=8) (actual time=0.036..0.239 rows=15 
loops=1)
   ->  Seq Scan on my_table  (cost=0.00..164492.74 rows=988217 width=8) (actual 
time=0.028..0.138 rows=15 loops=1)
         Filter: ((a IS NULL) AND (b = 3))
 Total runtime: 0.338 ms
(4 rows)

test_8_3_3=# set enable_seqscan = false ;
SET

test_8_3_3=# show random_page_cost ;
 random_page_cost
------------------
 4
(1 row)

test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 
limit 15 ;
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..45.99 rows=15 width=8) (actual time=0.051..0.200 rows=15 
loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..3029924.36 rows=988217 
width=8) (actual time=0.043..0.100 rows=15 loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.308 ms
(5 rows)

test_8_3_3=# set random_page_cost = 0.19 ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 
limit 15 ;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.45 rows=15 width=8) (actual time=0.050..0.201 rows=15 
loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..161190.65 rows=988217 
width=8) (actual time=0.042..0.097 rows=15 loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.307 ms
(5 rows)


Now fix effective_cache_size and try again.

test_8_3_3=# reset all ;
RESET

test_8_3_3=# set effective_cache_size = '500MB' ;
SET
test_8_3_3=# set enable_seqscan = false ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 
limit 15 ;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.78 rows=15 width=8) (actual time=0.051..0.204 rows=15 
loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..183361.21 rows=988217 
width=8) (actual time=0.043..0.103 rows=15 loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.311 ms
(5 rows)

That's better, but still not quite low enough cost estimate to beat the 
seqscan.  Try adjusting random_page_cost again.

test_8_3_3=# set random_page_cost = 3 ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3 
limit 15 ;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.16 rows=15 width=8) (actual time=0.052..0.202 rows=15 
loops=1)
   ->  Index Scan using idx_b on my_table  (cost=0.00..142053.51 rows=988217 
width=8) (actual time=0.043..0.100 rows=15 loops=1)
         Index Cond: (b = 3)
         Filter: (a IS NULL)
 Total runtime: 0.311 ms
(5 rows)

That's enough: index-scan's 142053.51 beats seqscan's 164492.74.  We no longer 
need to set enable_seqscan=false.



-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] limit clause breaks query planner?

Reply via email to