Hello,

I have a situation with a highly volatile table where the planner seems to
make the incorrect choice.  The problem is definitely statistics related,
but I don't think statistics ought to be related to the determination of
which index to consider.

Setting up a reproduction case is a bit challenging, so the main question is
whether my position is logically correct.

Consider a table:

CREATE TABLE foo(a bigint, t1 TIMESTAMPTZ, t2 TIMESTAMPTZ);
...and two partial indexes

CREATE INDEX foo_a_idx ON foo(a) WHERE t2 IS NOT NULL;
CREATE INDEX foo_t1_idx ON foo(t1) WHERE t2 IS NOT NULL;

...and the following query
SELECT * FROM foo f WHERE a = ? AND t2 IS NOT NULL;

...the database seems to want to pick foo_t1_idx when the estimated row
counts are similar, even though ISTM there is no way it could give better
results for any distribution of data; only the same, or worse (is this
true?).  So my position here is that, when selecting indexes, quals seem
not to be given enough weight.  This can lead to dramatic performance
degradations if the estimated row counts are off, which can happen when
the t2 'not null' condition is volatile and designed to isolate the
'needle' from the 'haystack'.
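
For reference, the rough shape of a reproduction I have in mind is something
like the following (untested sketch; the row counts, the modulo trick, and
the literal 42000 are just stand-ins for 'big haystack, tiny needle'):

-- load a haystack where t2 is entirely NULL, then gather stats
INSERT INTO foo SELECT g, now(), NULL FROM generate_series(1, 1000000) g;
ANALYZE foo;

-- the 'needle' rows appear only after the stats were taken
UPDATE foo SET t2 = now() WHERE a % 1000 = 0;

-- stats still say t2 IS NOT NULL matches ~no rows, so both partial index
-- paths look equally cheap even though the only selective qual is on a
EXPLAIN SELECT * FROM foo f WHERE a = 42000 AND t2 IS NOT NULL;

The essential ingredient is that ANALYZE runs before the needle rows exist,
which is roughly what the volatile t2 column does to me in practice.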

This example is a bit simplified from my actual case but this seems to be a
general issue; over the years I've noticed that I've had to aggressively
analyze or do other tricks to force the database to pick optimal index a vs
suboptimal index b.  I'm interested if there is awareness and/or interest
in this; if so, and if needed, I can start working through some
reproduction cases.  Note, I haven't tested on 18+ yet, so there is a
chance this is already addressed, assuming it is in fact a known and
solvable issue.

Thanks in advance,
merlin
