"Tom Lane" <[EMAIL PROTECTED]> writes:
> "Matt Smiley" <[EMAIL PROTECTED]> writes:
> > So an Index Scan is always going to have a higher cost estimate than
> > an equivalent Seq Scan returning the same result rows (unless
> > random_page_cost is < 1). That's why I think the planner is always
> > preferring the plan that uses a Seq Scan.
>
> If that were the case, we'd never choose an indexscan at all...
You're right, that was a silly guess.
> It's true that a plain indexscan is not preferred for queries that will
> return a large fraction of the table. However, it should be willing to
> use a bitmap scan for this query, given default cost settings (the
> default cost settings will cause it to prefer bitmap scan for retrieving
> up to about a third of the table, in my experience). I too am confused
> about why it doesn't prefer that choice in the OP's example.
It looks like the bitmap scan has a higher cost estimate because the entire
bitmap index must be built before beginning the heap scan and returning rows up
the pipeline. The row-count limit can't be pushed lower than the
bitmap-heap-scan like it can for the basic index-scan.
test_8_3_3=# set enable_seqscan = false ;
SET
test_8_3_3=# set enable_indexscan = false ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3
limit 15 ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=17070.22..17071.02 rows=15 width=8) (actual time=606.902..607.086
rows=15 loops=1)
-> Bitmap Heap Scan on my_table (cost=17070.22..69478.96 rows=988217
width=8) (actual time=606.892..606.983 rows=15 loops=1)
Recheck Cond: (b = 3)
Filter: (a IS NULL)
-> Bitmap Index Scan on idx_b (cost=0.00..16823.17 rows=1033339
width=0) (actual time=592.657..592.657 rows=1000000 loops=1)
Index Cond: (b = 3)
Total runtime: 607.340 ms
(7 rows)
> It would be interesting to alter the random_page_cost setting and see if he
> gets
> different results.
Using an unmodified postgresql.conf, the cost estimate for an index-scan were
so much higher than for a seqscan that random_page_cost had to be set below 0.2
before the index-scan was preferred. However, it looks like this was mainly
because effective_cache_size was too small. The planner thought the cache was
only 128 MB, and the size of the complete table+index was 39492 + 21946 pages *
8 KB/block = 330 MB. It makes sense for the cost estimate to be so much higher
if blocks are expected to be repeatedly re-fetched from disk. I wonder if
David's effective_cache_size is too small.
test_8_3_3=# reset all ;
RESET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3
limit 15 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.50 rows=15 width=8) (actual time=0.036..0.239 rows=15
loops=1)
-> Seq Scan on my_table (cost=0.00..164492.74 rows=988217 width=8) (actual
time=0.028..0.138 rows=15 loops=1)
Filter: ((a IS NULL) AND (b = 3))
Total runtime: 0.338 ms
(4 rows)
test_8_3_3=# set enable_seqscan = false ;
SET
test_8_3_3=# show random_page_cost ;
random_page_cost
------------------
4
(1 row)
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3
limit 15 ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..45.99 rows=15 width=8) (actual time=0.051..0.200 rows=15
loops=1)
-> Index Scan using idx_b on my_table (cost=0.00..3029924.36 rows=988217
width=8) (actual time=0.043..0.100 rows=15 loops=1)
Index Cond: (b = 3)
Filter: (a IS NULL)
Total runtime: 0.308 ms
(5 rows)
test_8_3_3=# set random_page_cost = 0.19 ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3
limit 15 ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.45 rows=15 width=8) (actual time=0.050..0.201 rows=15
loops=1)
-> Index Scan using idx_b on my_table (cost=0.00..161190.65 rows=988217
width=8) (actual time=0.042..0.097 rows=15 loops=1)
Index Cond: (b = 3)
Filter: (a IS NULL)
Total runtime: 0.307 ms
(5 rows)
Now fix effective_cache_size and try again.
test_8_3_3=# reset all ;
RESET
test_8_3_3=# set effective_cache_size = '500MB' ;
SET
test_8_3_3=# set enable_seqscan = false ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3
limit 15 ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.78 rows=15 width=8) (actual time=0.051..0.204 rows=15
loops=1)
-> Index Scan using idx_b on my_table (cost=0.00..183361.21 rows=988217
width=8) (actual time=0.043..0.103 rows=15 loops=1)
Index Cond: (b = 3)
Filter: (a IS NULL)
Total runtime: 0.311 ms
(5 rows)
That's better, but still not quite low enough cost estimate to beat the
seqscan. Try adjusting random_page_cost again.
test_8_3_3=# set random_page_cost = 3 ;
SET
test_8_3_3=# explain analyze select * from my_table where a is null and b = 3
limit 15 ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.16 rows=15 width=8) (actual time=0.052..0.202 rows=15
loops=1)
-> Index Scan using idx_b on my_table (cost=0.00..142053.51 rows=988217
width=8) (actual time=0.043..0.100 rows=15 loops=1)
Index Cond: (b = 3)
Filter: (a IS NULL)
Total runtime: 0.311 ms
(5 rows)
That's enough: index-scan's 142053.51 beats seqscan's 164492.74. We no longer
need to set enable_seqscan=false.
--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance