Hi hackers,

Postgres optimizer is not able to build efficient execution plan for the following query:

explain select * from  people_raw where not ("ID"<2068113880 AND "INN" is not null) and "ID"<=2068629726 AND "INN" is not null;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Bitmap Heap Scan on people_raw  (cost=74937803.72..210449640.49 rows=121521030 width=336)
   Recheck Cond: ("ID" <= 2068629726)
   Filter: (("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN" IS NULL)))    ->  Bitmap Index Scan on "People_pkey" (cost=0.00..74907423.47 rows=2077021718 width=0)
         Index Cond: ("ID" <= 2068629726)
(5 rows)


Here the table is very large, but query effects only relatively small number of rows located in the range: [2068113880,2068629726]
But unfortunately optimizer doesn't take it into the account.
Moreover, using "is not null" and "not null" is both operands of AND is not smart:
     (("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN" IS NULL)))

If I remove "is not null" condition, then plan is perfect:

explain select * from  people_raw where not ("ID"<2068113880) and "ID"<=2068629726;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Index Scan using "People_pkey" on people_raw  (cost=0.58..196745.57 rows=586160 width=336)
   Index Cond: (("ID" >= 2068113880) AND ("ID" <= 2068629726))
(2 rows)

Before starting  investigation of the problem, I will like to know opinion and may be some advise of people familiar with optimizer:
how difficult will be to handle this case and where to look.

Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Reply via email to