Re: a wrong index choose when statistics is out of date

Andy Fan Mon, 04 Mar 2024 03:37:59 -0800


David Rowley <dgrowle...@gmail.com> writes:

> On Sun, 3 Mar 2024 at 20:08, Andy Fan <zhihuifan1...@163.com> wrote:
>> The issue can be reproduced with the following steps:
>>
>> create table x_events (.., created_at timestamp, a int, b int);
>>
>> create index idx_1 on t(created_at, a);
>> create index idx_2 on t(created_at, b);
>>
>> query:
>> select * from t where create_at = current_timestamp and b = 1;
>>
>> index (created_at, a) rather than (created_at, b) may be chosen for the
>> above query if the statistics think "create_at = current_timestamp" has
>> no rows, then both index are OK, actually it is true just because
>> statistics is out of date.
>
> I don't think there's really anything too special about the fact that
> the created_at column is always increasing. We commonly get 1-row
> estimates after multiplying the selectivities from individual stats.
> Your example just seems like yet another reason that this could
> happen.

You are right about there are more cases which lead this happen. However
this is the only case where the created_at = $1 trick can works, which
was the problem I wanted to resove when I was writing. 

> I've been periodically talking about introducing "risk" as a factor
> that the planner should consider.  I did provide some detail in [1]
> about the design that was in my head at that time.  I'd not previously
> thought that it could also solve this problem, but after reading your
> email, I think it can.

Haha, I remeber you were against "risk factor" before at [1], and at
that time we are talking about the exact same topic as here, and I
proposaled another risk factor. Without an agreement, I did it in my
own internal version and get hurted then, something like I didn't pay
enough attention to Bitmap Index Scan and Index scan. Then I forget the
"risk factor".

>
> I don't think it would be right to fudge the costs in any way, but I
> think the risk factor for IndexPaths could take into account the
> number of unmatched index clauses and increment the risk factor, or
> "certainty_factor" as it is currently in my brain-based design. That
> way add_path() would be more likely to prefer the index that matches
> the most conditions.

This is somehow similar with my proposal at [1]?  What do you think
about the treat 'col op const' as 'col op $1' for the marked column?
This could just resolve a subset of questions in your mind, but the
method looks have a solid reason.

Currently I treat the risk factor as what you did before, but this maybe
another time for me to switch my mind again.

[1] 
https://www.postgresql.org/message-id/CAApHDvovVWCbeR4v%2BA4Dkwb%3DYS_GuJG9OyCm8jZu%2B%2BcP2xsY_A%40mail.gmail.com
-- 
Best Regards
Andy Fan

Re: a wrong index choose when statistics is out of date

Reply via email to