On Tue, May 26, 2020 at 9:59 PM Ashutosh Bapat <ashutosh.bapat....@gmail.com>
wrote:

> On Tue, May 26, 2020 at 1:52 PM Andy Fan <zhihui.fan1...@gmail.com> wrote:
> >
> >
> > Consider the below example:
> >
> > create table j1(i int, im5 int,  im100 int, im1000 int);
> > insert into j1 select i, i%5, i%100, i%1000 from generate_series(1,
> 10000000)i;
> > create index j1_i_im5 on j1(i, im5);
> > create index j1_i_im100 on j1(i, im100);
> > analyze j1;
> > explain select * from j1 where i = 100 and im5 = 5;
> >
> > We may get the plan like this:
> >
> > demo=# explain select  * from  j1 where i = 100 and im5 = 1;
> >                               QUERY PLAN
> > ----------------------------------------------------------------------
> >  Index Scan using j1_i_im100 on j1  (cost=0.43..8.46 rows=1 width=16)
> >    Index Cond: (i = 100)
> >    Filter: (im5 = 1)
> > (3 rows)
> >
> > At this case, optimizer can estimate there are only 1 row to return, so
> both
> > indexes have same cost, which one will be choose is un-controlable. This
> is
> > fine for above query based on the estimation is accurate. However
> estimation
> > can't be always accurate in real life. Some inaccurate estimation can
> cause an
> > wrong index choose. As an experience, j1_i_im5 index should always be
> choose
> > for above query.
>
> I think we need a better example where choosing an index makes a
> difference.
>
> An index can be chosen just because it's path was created before some
> other more appropriate index but the cost difference was within fuzzy
> limit. Purely based on the order in which index paths are created.
>

Here is an further example with the above case:

demo=# insert into j1 select 1, 1, 1, 1 from generate_series(1, 100000)i;
INSERT 0 100000

With the current implementation, it is

demo=# explain analyze select * from j1 where i = 1 and im5 = 2;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using j1_i_im100 on j1  (cost=0.43..8.44 rows=1 width=16)
(actual time=63.431..63.431 rows=0 loops=1)
   Index Cond: (i = 1)
   Filter: (im5 = 2)
   Rows Removed by Filter: 100001
 Planning Time: 0.183 ms
 Execution Time: 63.484 ms
(6 rows)

With the patch above, it can always choose a correct index even the
statistics is inaccurate:

demo=# explain analyze select * from j1 where i = 1 and im5 = 2;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using j1_i_im5 on j1  (cost=0.43..8.46 rows=1 width=16) (actual
time=0.030..0.030 rows=0 loops=1)
   Index Cond: ((i = 1) AND (im5 = 2))
 Planning Time: 1.087 ms
 Execution Time: 0.077 ms
(4 rows)

-- 
Best Regards
Andy Fan

Reply via email to