Hi, Jian! Thank you for your work on this topic!
On 28.10.2024 10:19, jian he wrote:
* NOTE: returns NULL if clause is an OR or AND clause; it is the
* responsibility of higher-level routines to cope with those.
*/
static IndexClause *
match_clause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
IndexOptInfo *index)
the above comments need a slight change.
EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE (thousand = 1
OR thousand = 3);
QUERY PLAN
-----------------------------------------------------------
Bitmap Heap Scan on tenk2
Recheck Cond: ((thousand = 1) OR (thousand = 3))
-> Bitmap Index Scan on tenk2_thous_tenthous
Index Cond: (thousand = ANY ('{1,3}'::integer[]))
EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE (thousand in (1,3));
QUERY PLAN
-----------------------------------------------------------
Bitmap Heap Scan on tenk2
Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
-> Bitmap Index Scan on tenk2_thous_tenthous
Index Cond: (thousand = ANY ('{1,3}'::integer[]))
tenk2 index:
Indexes:
"tenk2_thous_tenthous" btree (thousand, tenthous)
Looking at the above cases, I found out the "Recheck Cond" is
different from "Index Cond".
I wonder why there is a difference, or if they should be the same.
then i come to:
match_orclause_to_indexcol
/*
* Finally, build an IndexClause based on the SAOP node. Use
* make_simple_restrictinfo() to get RestrictInfo with clean selectivity
* estimations because it may differ from the estimation made for an OR
* clause. Although it is not a lossy expression, keep the old version of
* rinfo in iclause->rinfo to detect duplicates and recheck the original
* clause.
*/
iclause = makeNode(IndexClause);
iclause->rinfo = rinfo;
iclause->indexquals = list_make1(make_simple_restrictinfo(root,
&saopexpr->xpr));
iclause->lossy = false;
iclause->indexcol = indexcol;
iclause->indexcols = NIL;
looking at create_bitmap_scan_plan.
I think "iclause->rinfo" itself won't be able to detect duplicates.
since the upper code would mostly use "iclause->indexquals" for comparison?
typedef struct IndexClause comments says:
"
* indexquals is a list of RestrictInfos for the directly-usable index
* conditions associated with this IndexClause. In the simplest case
* it's a one-element list whose member is iclause->rinfo. Otherwise,
* it contains one or more directly-usable indexqual conditions extracted
* from the given clause. The 'lossy' flag indicates whether the
* indexquals are semantically equivalent to the original clause, or
* represent a weaker condition.
"
should lossy be iclause->lossy be true at the end of match_orclause_to_indexcol?
since it meets the comment condition: "semantically equivalent to the
original clause"
or is the above comment slightly wrong?
in match_orclause_to_indexcol
i changed from
iclause->rinfo = rinfo;
to
iclause->rinfo = make_simple_restrictinfo(root,
&saopexpr->xpr);
as expected. now the "Recheck Cond" is same as "Index Cond"
Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
-> Bitmap Index Scan on tenk2_thous_tenthous
Index Cond: (thousand = ANY ('{1,3}'::integer[]))
I am not sure of the implication of this change.
I may be wrong, but the original idea was to double-check the result
with the original expression.
But I'm willing to agree with you. I think we should add transformed
rinfo variable through add_predicate_to_index_quals function. I attached
the diff file to the letter.
diff --git a/src/backend/optimizer/path/indxpath.c
b/src/backend/optimizer/path/indxpath.c
index 3da7ea8ed57..c68ac7008e6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3463,10 +3463,11 @@ match_orclause_to_indexcol(PlannerInfo *root,
* rinfo in iclause->rinfo to detect duplicates and recheck the
original
* clause.
*/
+ RestrictInfo *rinfo_new = make_simple_restrictinfo(root,
+ &saopexpr->xpr);
iclause = makeNode(IndexClause);
- iclause->rinfo = rinfo;
- iclause->indexquals = list_make1(make_simple_restrictinfo(root,
- &saopexpr->xpr));
+ iclause->rinfo = rinfo_new;
+ iclause->indexquals = add_predicate_to_index_quals(index,
list_make1(rinfo_new));
iclause->lossy = false;
iclause->indexcol = indexcol;
iclause->indexcols = NIL;
I figured out comments that you mentioned and found some addition
explanation.
As I understand it, this processing is related to ensuring that the
selectivity of the index is assessed correctly and that there is no
underestimation, which can lead to the selection of a partial index in
the plan. See comment for the add_predicate_to_index_quals function:
* ANDing the index predicate with the explicitly given indexquals produces
* a more accurate idea of the index's selectivity. *However, we need to be
* careful not to insert redundant clauses, because
clauselist_selectivity()
* is easily fooled into computing a too-low selectivity estimate*. Our
* approach is to add only the predicate clause(s) that cannot be proven to
* be implied by the given indexquals. This successfully handles cases
such
* as a qual "x = 42" used with a partial index "WHERE x >= 40 AND x < 50".
* There are many other cases where we won't detect redundancy, leading
to a
* too-low selectivity estimate, which will bias the system in favor of
using
* partial indexes where possible. That is not necessarily bad though.
*
* *Note that indexQuals contains RestrictInfo nodes while the indpred
* does not, so the output list will be mixed. This is OK for both
* predicate_implied_by() and clauselist_selectivity()*, but might be
* problematic if the result were passed to other things.
*/
In those comments that you mentioned, it was written that this problem
of expression redundancy is checked using the predicate_implied_by
function, note that it is called there.
* In some situations (particularly with OR'd index conditions) we may *
have scan_clauses that are not equal to, but are logically implied by, *
the index quals; so we also try a predicate_implied_by() check to see *
if we can discard quals that way. (predicate_implied_by assumes its *
first input contains only immutable functions, so we have to check * that.)
I also figured out more information about loosy variable. First of all,
I tried changing the value of the variable and did not notice any
difference in regression tests. As I understood, our transformation is
completely equivalent, so loosy should be true. But I don't think they
are needed since our expressions are equivalent. I thought for a long
time about an example where this could be a mistake and didn’t come up
with any of them.
--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3da7ea8ed57..e72a4fb15dd 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3242,6 +3242,7 @@ match_orclause_to_indexcol(PlannerInfo *root,
Oid inputcollid = InvalidOid;
bool firstTime = true;
bool have_param = false;
+ RestrictInfo *rinfo_new;
Assert(IsA(orclause, BoolExpr));
Assert(orclause->boolop == OR_EXPR);
@@ -3463,11 +3464,12 @@ match_orclause_to_indexcol(PlannerInfo *root,
* rinfo in iclause->rinfo to detect duplicates and recheck the original
* clause.
*/
+ rinfo_new = make_simple_restrictinfo(root,
+ &saopexpr->xpr);
iclause = makeNode(IndexClause);
- iclause->rinfo = rinfo;
- iclause->indexquals = list_make1(make_simple_restrictinfo(root,
- &saopexpr->xpr));
- iclause->lossy = false;
+ iclause->rinfo = rinfo_new;
+ iclause->indexquals = add_predicate_to_index_quals(index, list_make1(rinfo_new));
+ iclause->lossy = true;
iclause->indexcol = indexcol;
iclause->indexcols = NIL;
return iclause;
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index b003492c5c8..3c46484d642 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1878,10 +1878,10 @@ SELECT * FROM tenk1
EXPLAIN (COSTS OFF)
SELECT * FROM tenk1
WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42 OR tenthous IS NULL);
- QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tenk1
- Recheck Cond: (((thousand = 42) AND (tenthous IS NULL)) OR ((thousand = 42) AND ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))))
+ Recheck Cond: (((thousand = 42) AND (tenthous IS NULL)) OR ((thousand = 42) AND (tenthous = ANY ('{1,3,42}'::integer[]))))
Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42) OR (tenthous IS NULL))
-> BitmapOr
-> Bitmap Index Scan on tenk1_thous_tenthous
@@ -1917,10 +1917,10 @@ SELECT * FROM tenk1
EXPLAIN (COSTS OFF)
SELECT * FROM tenk1
WHERE thousand = 42 AND (tenthous = 1::int2 OR tenthous = 3::int8 OR tenthous = 42::int8);
- QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tenk1
- Recheck Cond: (((thousand = 42) AND ((tenthous = '3'::bigint) OR (tenthous = '42'::bigint))) OR ((thousand = 42) AND (tenthous = '1'::smallint)))
+ Recheck Cond: (((thousand = 42) AND (tenthous = ANY ('{3,42}'::bigint[]))) OR ((thousand = 42) AND (tenthous = '1'::smallint)))
Filter: ((tenthous = '1'::smallint) OR (tenthous = '3'::bigint) OR (tenthous = '42'::bigint))
-> BitmapOr
-> Bitmap Index Scan on tenk1_thous_tenthous
@@ -1932,11 +1932,11 @@ SELECT * FROM tenk1
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE hundred = 42 AND (thousand = 42 OR thousand = 99);
- QUERY PLAN
----------------------------------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------------------------------------
Aggregate
-> Bitmap Heap Scan on tenk1
- Recheck Cond: ((hundred = 42) AND ((thousand = 42) OR (thousand = 99)))
+ Recheck Cond: ((hundred = 42) AND (thousand = ANY ('{42,99}'::integer[])))
-> BitmapAnd
-> Bitmap Index Scan on tenk1_hundred
Index Cond: (hundred = 42)
@@ -1991,11 +1991,11 @@ SELECT * FROM tenk1
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE hundred = 42 AND (thousand = 42 OR thousand = 99);
- QUERY PLAN
----------------------------------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------------------------------------
Aggregate
-> Bitmap Heap Scan on tenk1
- Recheck Cond: ((hundred = 42) AND ((thousand = 42) OR (thousand = 99)))
+ Recheck Cond: ((hundred = 42) AND (thousand = ANY ('{42,99}'::integer[])))
-> BitmapAnd
-> Bitmap Index Scan on tenk1_hundred
Index Cond: (hundred = 42)
@@ -2013,11 +2013,11 @@ SELECT count(*) FROM tenk1
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE hundred = 42 AND (thousand < 42 OR thousand < 99 OR 43 > thousand OR 42 > thousand);
- QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------------------------------------------
Aggregate
-> Bitmap Heap Scan on tenk1
- Recheck Cond: ((hundred = 42) AND ((thousand < 42) OR (thousand < 99) OR (43 > thousand) OR (42 > thousand)))
+ Recheck Cond: ((hundred = 42) AND (thousand < ANY ('{42,99,43,42}'::integer[])))
-> BitmapAnd
-> Bitmap Index Scan on tenk1_hundred
Index Cond: (hundred = 42)
@@ -2035,11 +2035,11 @@ SELECT count(*) FROM tenk1
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3) OR thousand = 41;
- QUERY PLAN
------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------
Aggregate
-> Bitmap Heap Scan on tenk1
- Recheck Cond: (((thousand = 42) AND ((tenthous = 1) OR (tenthous = 3))) OR (thousand = 41))
+ Recheck Cond: (((thousand = 42) AND (tenthous = ANY ('{1,3}'::integer[]))) OR (thousand = 41))
-> BitmapOr
-> Bitmap Index Scan on tenk1_thous_tenthous
Index Cond: ((thousand = 42) AND (tenthous = ANY ('{1,3}'::integer[])))
@@ -2057,11 +2057,11 @@ SELECT count(*) FROM tenk1
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE hundred = 42 AND (thousand = 42 OR thousand = 99 OR tenthous < 2) OR thousand = 41;
- QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------
Aggregate
-> Bitmap Heap Scan on tenk1
- Recheck Cond: (((hundred = 42) AND (((thousand = 42) OR (thousand = 99)) OR (tenthous < 2))) OR (thousand = 41))
+ Recheck Cond: (((hundred = 42) AND ((thousand = ANY ('{42,99}'::integer[])) OR (tenthous < 2))) OR (thousand = 41))
Filter: (((hundred = 42) AND ((thousand = 42) OR (thousand = 99) OR (tenthous < 2))) OR (thousand = 41))
-> BitmapOr
-> BitmapAnd
@@ -2086,11 +2086,11 @@ SELECT count(*) FROM tenk1
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE hundred = 42 AND (thousand = 42 OR thousand = 41 OR thousand = 99 AND tenthous = 2);
- QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------
Aggregate
-> Bitmap Heap Scan on tenk1
- Recheck Cond: ((hundred = 42) AND (((thousand = 99) AND (tenthous = 2)) OR ((thousand = 42) OR (thousand = 41))))
+ Recheck Cond: ((hundred = 42) AND (((thousand = 99) AND (tenthous = 2)) OR (thousand = ANY ('{42,41}'::integer[]))))
Filter: ((thousand = 42) OR (thousand = 41) OR ((thousand = 99) AND (tenthous = 2)))
-> BitmapAnd
-> Bitmap Index Scan on tenk1_hundred
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index ebf2e3f851a..ea83578ccf1 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4348,7 +4348,7 @@ select * from tenk1 a join tenk1 b on
Index Cond: (unique1 = 2)
-> Materialize
-> Bitmap Heap Scan on tenk1 a
- Recheck Cond: (((unique2 = 3) OR (unique2 = 7)) OR (unique1 = 1))
+ Recheck Cond: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 1))
Filter: ((unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
-> BitmapOr
-> Bitmap Index Scan on tenk1_unique2
@@ -4374,7 +4374,7 @@ select * from tenk1 a join tenk1 b on
Index Cond: (unique1 = 2)
-> Materialize
-> Bitmap Heap Scan on tenk1 a
- Recheck Cond: (((unique2 = 3) OR (unique2 = 7)) OR (unique1 = 1))
+ Recheck Cond: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 1))
Filter: ((unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
-> BitmapOr
-> Bitmap Index Scan on tenk1_unique2
@@ -4394,7 +4394,7 @@ select * from tenk1 a join tenk1 b on
-> Seq Scan on tenk1 b
-> Materialize
-> Bitmap Heap Scan on tenk1 a
- Recheck Cond: (((unique2 = 3) OR (unique2 = 7)) OR ((unique1 = 3) OR (unique1 = 1)) OR (unique1 < 20))
+ Recheck Cond: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = ANY ('{3,1}'::integer[])) OR (unique1 < 20))
Filter: ((unique1 < 20) OR (unique1 = 3) OR (unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
-> BitmapOr
-> Bitmap Index Scan on tenk1_unique2