On Tue, May 26, 2020 at 9:59 PM Ashutosh Bapat <ashutosh.bapat....@gmail.com> wrote:
> On Tue, May 26, 2020 at 1:52 PM Andy Fan <zhihui.fan1...@gmail.com> wrote: > > > > > > Consider the below example: > > > > create table j1(i int, im5 int, im100 int, im1000 int); > > insert into j1 select i, i%5, i%100, i%1000 from generate_series(1, > 10000000)i; > > create index j1_i_im5 on j1(i, im5); > > create index j1_i_im100 on j1(i, im100); > > analyze j1; > > explain select * from j1 where i = 100 and im5 = 5; > > > > We may get the plan like this: > > > > demo=# explain select * from j1 where i = 100 and im5 = 1; > > QUERY PLAN > > ---------------------------------------------------------------------- > > Index Scan using j1_i_im100 on j1 (cost=0.43..8.46 rows=1 width=16) > > Index Cond: (i = 100) > > Filter: (im5 = 1) > > (3 rows) > > > > At this case, optimizer can estimate there are only 1 row to return, so > both > > indexes have same cost, which one will be choose is un-controlable. This > is > > fine for above query based on the estimation is accurate. However > estimation > > can't be always accurate in real life. Some inaccurate estimation can > cause an > > wrong index choose. As an experience, j1_i_im5 index should always be > choose > > for above query. > > I think we need a better example where choosing an index makes a > difference. > > An index can be chosen just because it's path was created before some > other more appropriate index but the cost difference was within fuzzy > limit. Purely based on the order in which index paths are created. > Here is an further example with the above case: demo=# insert into j1 select 1, 1, 1, 1 from generate_series(1, 100000)i; INSERT 0 100000 With the current implementation, it is demo=# explain analyze select * from j1 where i = 1 and im5 = 2; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Index Scan using j1_i_im100 on j1 (cost=0.43..8.44 rows=1 width=16) (actual time=63.431..63.431 rows=0 loops=1) Index Cond: (i = 1) Filter: (im5 = 2) Rows Removed by Filter: 100001 Planning Time: 0.183 ms Execution Time: 63.484 ms (6 rows) With the patch above, it can always choose a correct index even the statistics is inaccurate: demo=# explain analyze select * from j1 where i = 1 and im5 = 2; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Index Scan using j1_i_im5 on j1 (cost=0.43..8.46 rows=1 width=16) (actual time=0.030..0.030 rows=0 loops=1) Index Cond: ((i = 1) AND (im5 = 2)) Planning Time: 1.087 ms Execution Time: 0.077 ms (4 rows) -- Best Regards Andy Fan