Re: A wrong index choose issue because of inaccurate statistics

Andy Fan Thu, 04 Jun 2020 20:34:13 -0700

>
>
>
> Why will the (a, c) be choose?  If planner think a = x has only 1 row ..
>


I just did more research and found above statement is not accurate,
the root cause of this situation is because IndexSelectivity = 0. Even
through I
don't think we can fix anything here since IndexSelectivity is calculated
from
statistics and we don't know it is 1% wrong or 10% wrong or more just like
What
Tomas said.

The way of fixing it is just add a "small" extra cost for pqquals for index
scan. that should be small enough to not have impacts on others part. I will
discuss how small is small later with details, we can say it just a guc
variable
for now.

+++ b/src/backend/optimizer/path/costsize.c
@@ -730,6 +730,13 @@ cost_index(IndexPath *path, PlannerInfo *root, double
loop_count,

        cpu_run_cost += cpu_per_tuple * tuples_fetched;

+       /*
+        * To make the planner more robust to handle some inaccurate
statistics
+        * issue, we will add a extra cost to qpquals so that the less
qpquals
+        * the lower cost it has.
+        */
+       cpu_run_cost += stat_stale_cost * list_length(qpquals);
+

If we want to reduce the impact of this change further, we can only add
this if
the IndexSelecivity == 0.

How to set the value of stat_stale_cost? Since the minimum cost for a query
should be a cpu_tuple_cost which is 0.01 default. Adding an 0.01 cost for
each
pqqual in index scan should not make a big difference.  However sometimes
we may
set it to 0.13 if we consider index->tree_height was estimated wrongly for
1 (cost is
50 * 0.0025 = 0.125). I don't know how it happened, but looks it do happen
in prod
environment. At the same time it is unlikely index->tree_height is estimated
wrongly for 2 or more. so basically we can set this value to 0(totally
disable
this feature), 0.01 (should be ok for most case), 0.13 (A bit aggressive).

The wrong estimation of IndexSelectitity = 0 might be common case and if
people just have 2 related index like (A, B) and (A, C). we have 50%
chances to
have a wrong decision, so I would say this case worth the troubles. My
current
implementation looks not cool, so any suggestion to research further is
pretty
welcome.

-- 
Best Regards
Andy Fan

Re: A wrong index choose issue because of inaccurate statistics

Reply via email to