On Sat, Dec 2, 2017 at 3:44 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Jeff Janes <jeff.ja...@gmail.com> writes:
> > On Fri, Dec 1, 2017 at 11:08 PM, Vitaliy Garnashevich <
> > vgarnashev...@gmail.com> wrote:
> >> # x4 tuple/operator costs - bitmap scan still a bit cheaper
> >> set seq_page_cost = 1.0;
> >> set random_page_cost = 1.0;
> >> set cpu_tuple_cost = 0.04;
> >> set cpu_index_tuple_cost = 0.02;
> >> set cpu_operator_cost = 0.01;
> > If you really want to target the plan with the BitmapAnd, you should
> > increase cpu_index_tuple_cost and/or cpu_operator_cost but not increase
> > cpu_tuple_cost. That is because the unselective bitmap index scan does
> > not incur any cpu_tuple_cost, but does incur index_tuple and operator
> > costs. Unfortunately all other index scans in the system will also be
> > skewed by such a change if you make the change system-wide.
> I think it'd be a serious error to screw around with your cost settings
> on the basis of a single case in which the rowcount estimates are so
> far off. It's really those estimates that are the problem AFAICS.
> The core issue in this example is that, the way the test data is set up,
> the "flag = true" condition actually adds no selectivity at all, because
> every row with "num = 1" is certain to have "flag = true". If the planner
> realized that, it would certainly not bother with BitmapAnd'ing the flag
> index onto the results of the num index. But it doesn't know that those
> columns are correlated, so it supposes that adding the extra index will
> give a 10x reduction in the number of heap rows that have to be visited
> (since it knows that only 1/10th of the rows have "flag = true").
> *That* is what causes the overly optimistic cost estimate for the
> two-index bitmapscan, and no amount of fiddling with the cost parameters
> will make that better.
But he also tested with num=2 and num=39, which reverses the situation so
the bitmap is 100% selective rather than the 90% the planner thinks it will
But it is still slower for him (I am having trouble replicating that exact
behavior), so building the bitmap to rule out 100% of the rows is
empirically not worth it, I don't see how building it to rule out 90%, as
the planner things, would be any better.
> I tried creating multiple-column statistics using the v10 facility for
> regression=# create statistics s1 on num, flag from aaa;
> CREATE STATISTICS
> regression=# analyze aaa;
> but that changed the estimate not at all, which surprised me because
> dependency statistics are supposed to fix exactly this type of problem.
> I suspect there may be something in the extended-stats code that causes it
> not to work right for boolean columns --- this wouldn't be excessively
> surprising because of the way the planner messes around with converting
> "flag = true" to just "flag" and sometimes back again. But I've not
> looked closer yet.
I think the non-extended stats code also has trouble with booleans.
pg_stats gives me a correlation of 0.8 or higher for the flag column.
Due to that, when I disable bitmapscans and seqscans, I start getting slow
index scans on the wrong index, i2 rather than i1. I don't know why he
doesn't see that in his example.