Renzo Bertuzzi wrote: > I have a strange problem with a query where the planner only uses and index > if I use a constant value, but if I use a subquery it will prefer a seq scan. > > I have table "sample_table" with columns id serial primary key, and int_flag, > with an index on int_flag. > I inserted 240387 values with int_flag=1 and 1 value with int_flag=2 > > so the table has 240388 total rows, the last row of the table has int_flag=2 > > If I execute this query, the planner chooses the index: > > explain (analyze ,verbose,buffers) > SELECT id > FROM sample_table > WHERE > int_flag = any((array[2])::int[]) > > QUERY PLAN > > > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Index Scan using index_sample_table_int_flag_ix on public.sample_table > (cost=0.42..39.86 rows=1 width=8) (actual time=0.023..0.023 rows=1 loops=1) > Output: id > > > Index Cond: (sample_table.int_flag = ANY ('{2}'::integer[])) > > > Buffers: shared hit=28 > > > Planning time: 0.087 ms > > > Execution time: 0.046 ms > > > but if I slightly change the query to: > > explain (analyze ,verbose,buffers) > SELECT id > FROM sample_table > WHERE > int_flag = any((select array[2])::int[]) > > now postgres will do a seq scan. > I have run vacuum and analyze but the result is the same. > > QUERY PLAN > > --------------------------------------------------------------------------------------------------------------------------------------- > > Seq Scan on public.sample_table (cost=0.01..8843.74 rows=240388 width=8) > (actual time=44.993..44.995 rows=1 loops=1) > Output: id > > Filter: (sample_table.int_flag = ANY ($0)) > > Rows Removed by Filter: 240387 > > Buffers: shared hit=3435 > > InitPlan 1 (returns $0) > > -> Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.002..0.003 > rows=1 loops=1) > Output: '{2}'::integer[] > > Planning time: 0.092 ms > > Execution time: 45.017 ms > > I suppose postgres prefers a seq scan because it treats the subquery as a > non-deterministic > value while in the first case the planner has all the values before hand??? > > I'm using PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian > 4.9.2-10) 4.9.2, 64-bit > > PS: This is a simplified query, the actual query will use another small table > to build the array > with less than 100 values and sample_table can have up to 5 millions entries. > I have tried using a CTE with the array, but it still will do a seq scan.
In the second case, the optimizer does not think hard enough to figure out that it actually could know that the InitPlan has a result of 2, and with your real query it probably couldn't know for sure even if it tried hard. So it has to come up with a plan without knowing what the search values will be, and it chooses a sequential scan as the lesser evil, since it guesses that it will have to retrieve most of the tuples anyway. Maybe you can write your query as a join instead. Yours, Laurenz Albe