I've noticed that row count estimates for expression indexes appear
to rely on default_statistics_target rather than on a column's
actual statistics target.  That is, if I use ALTER TABLE SET
STATISTICS to increase a column's statistics target and then run
ANALYZE, then estimates for non-expression-index queries improve
as expected.  However, queries that use an expression index remain
accurate for only around the N most common values, where N is the
default_statistics_target that was in effect when ANALYZE ran.  I'm
still rummaging through the archives looking for past discussion;
is this behavior a known limitation or just an oversight?

CREATE TABLE foo (x integer);

CREATE INDEX foo_x_idx ON foo (x);
CREATE INDEX foo_abs_x_idx ON foo (abs(x));

INSERT INTO foo (x)
  SELECT r1 % r2
  FROM generate_series(1, 100) AS g1(r1),
       generate_series(1, 100) AS g2(r2);

SET default_statistics_target TO 15;
ALTER TABLE foo ALTER COLUMN x SET STATISTICS 20;
ANALYZE foo;

SELECT most_common_vals FROM pg_stats WHERE attname = 'x';
                  most_common_vals                   
-----------------------------------------------------
 {0,1,2,3,4,5,6,7,8,10,9,11,12,14,13,15,16,19,17,18}
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 13;
                                                      QUERY PLAN                
                                      
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=2.72..50.28 rows=205 width=4) (actual 
time=0.370..1.766 rows=220 loops=1)
   Recheck Cond: (x = 13)
   ->  Bitmap Index Scan on foo_x_idx  (cost=0.00..2.72 rows=205 width=0) 
(actual time=0.314..0.314 rows=220 loops=1)
         Index Cond: (x = 13)
 Total runtime: 2.905 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 13;
                                                        QUERY PLAN              
                                          
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=2.72..50.80 rows=205 width=4) (actual 
time=0.358..1.720 rows=220 loops=1)
   Recheck Cond: (abs(x) = 13)
   ->  Bitmap Index Scan on foo_abs_x_idx  (cost=0.00..2.72 rows=205 width=0) 
(actual time=0.305..0.305 rows=220 loops=1)
         Index Cond: (abs(x) = 13)
 Total runtime: 2.875 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 18;
                                                      QUERY PLAN                
                                      
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=2.60..49.75 rows=172 width=4) (actual 
time=0.312..1.442 rows=180 loops=1)
   Recheck Cond: (x = 18)
   ->  Bitmap Index Scan on foo_x_idx  (cost=0.00..2.60 rows=172 width=0) 
(actual time=0.262..0.262 rows=180 loops=1)
         Index Cond: (x = 18)
 Total runtime: 2.393 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 18;
                                                       QUERY PLAN               
                                         
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=2.22..43.65 rows=63 width=4) (actual 
time=0.313..1.436 rows=180 loops=1)
   Recheck Cond: (abs(x) = 18)
   ->  Bitmap Index Scan on foo_abs_x_idx  (cost=0.00..2.22 rows=63 width=0) 
(actual time=0.263..0.263 rows=180 loops=1)
         Index Cond: (abs(x) = 18)
 Total runtime: 2.418 ms
(5 rows)

-- 
Michael Fuhr

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply via email to