Hi hackers,
Postgres optimizer is not able to build efficient execution plan for the
following query:
explain select * from people_raw where not ("ID"<2068113880 AND "INN"
is not null) and "ID"<=2068629726 AND "INN" is not null;
QUERY PLAN
--------------------------------------------------------------------------------------------
Bitmap Heap Scan on people_raw (cost=74937803.72..210449640.49
rows=121521030 width=336)
Recheck Cond: ("ID" <= 2068629726)
Filter: (("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN" IS
NULL)))
-> Bitmap Index Scan on "People_pkey" (cost=0.00..74907423.47
rows=2077021718 width=0)
Index Cond: ("ID" <= 2068629726)
(5 rows)
Here the table is very large, but query effects only relatively small
number of rows located in the range: [2068113880,2068629726]
But unfortunately optimizer doesn't take it into the account.
Moreover, using "is not null" and "not null" is both operands of AND is
not smart:
(("INN" IS NOT NULL) AND (("ID" >= 2068113880) OR ("INN" IS NULL)))
If I remove "is not null" condition, then plan is perfect:
explain select * from people_raw where not ("ID"<2068113880) and
"ID"<=2068629726;
QUERY PLAN
--------------------------------------------------------------------------------------------
Index Scan using "People_pkey" on people_raw (cost=0.58..196745.57
rows=586160 width=336)
Index Cond: (("ID" >= 2068113880) AND ("ID" <= 2068629726))
(2 rows)
Before starting investigation of the problem, I will like to know
opinion and may be some advise of people familiar with optimizer:
how difficult will be to handle this case and where to look.
Thanks in advance,
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company