Re: [PERFORM] Thinking About Correlated Columns (again)

Thomas Kellerer Wed, 15 May 2013 13:36:55 -0700

Shaun Thomas wrote on 15.05.2013 17:31:

Hi!


This has been a pain point for quite a while. While we've had several
discussions in the area, it always seems to just kinda trail off and
eventually vanish every time it comes up.

A really basic example of how bad the planner is here:

CREATE TABLE foo AS
SELECT a.id, a.id % 1000 AS col_a, a.id % 1000 AS col_b
   FROM generate_series(1, 1000000) a(id);

CREATE INDEX idx_foo_ab ON foo (col_a, col_b);

Index Scan using idx_foo_ab on foo  (cost=0.00..6.35 rows=1 width=12)
            (actual time=0.030..3.643 rows=1000 loops=1)
   Index Cond: ((col_a = 50) AND (col_b = 50))

Hey, look! The row estimate is off by a factor of 1000. This
particular case doesn't suffer terribly from the mis-estimation, but
others do. Boy, do they ever.


What happens if you create one index for each column? (instead of one combined 
index)

For your example it does not seem to improve the situation, but maybe things get better 
with the "bad" queries?






--
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Thinking About Correlated Columns (again)

Reply via email to