Hi, Jian! Thank you for your work on this topic!

On 28.10.2024 10:19, jian he wrote:
  * NOTE:  returns NULL if clause is an OR or AND clause; it is the
  * responsibility of higher-level routines to cope with those.
  */
static IndexClause *
match_clause_to_indexcol(PlannerInfo *root,
                          RestrictInfo *rinfo,
                          int indexcol,
                          IndexOptInfo *index)

the above comments need a slight change.


EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand = 1
OR thousand = 3);
                         QUERY PLAN
-----------------------------------------------------------
  Bitmap Heap Scan on tenk2
    Recheck Cond: ((thousand = 1) OR (thousand = 3))
    ->  Bitmap Index Scan on tenk2_thous_tenthous
          Index Cond: (thousand = ANY ('{1,3}'::integer[]))

EXPLAIN (COSTS OFF, settings) SELECT * FROM tenk2 WHERE  (thousand in (1,3));
                         QUERY PLAN
-----------------------------------------------------------
  Bitmap Heap Scan on tenk2
    Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
    ->  Bitmap Index Scan on tenk2_thous_tenthous
          Index Cond: (thousand = ANY ('{1,3}'::integer[]))

tenk2 index:
Indexes:
     "tenk2_thous_tenthous" btree (thousand, tenthous)

Looking at the above cases, I found out the "Recheck Cond" is
different from "Index Cond".
I wonder why there is a difference, or if they should be the same.
then i come to:
match_orclause_to_indexcol

     /*
      * Finally, build an IndexClause based on the SAOP node. Use
      * make_simple_restrictinfo() to get RestrictInfo with clean selectivity
      * estimations because it may differ from the estimation made for an OR
      * clause. Although it is not a lossy expression, keep the old version of
      * rinfo in iclause->rinfo to detect duplicates and recheck the original
      * clause.
      */
     iclause = makeNode(IndexClause);
     iclause->rinfo = rinfo;
     iclause->indexquals = list_make1(make_simple_restrictinfo(root,
                                                               &saopexpr->xpr));
     iclause->lossy = false;
     iclause->indexcol = indexcol;
     iclause->indexcols = NIL;

looking at create_bitmap_scan_plan.
I think "iclause->rinfo" itself won't be able to detect duplicates.
since the upper code would mostly use "iclause->indexquals" for comparison?


typedef struct IndexClause comments says:
"
  * indexquals is a list of RestrictInfos for the directly-usable index
  * conditions associated with this IndexClause.  In the simplest case
  * it's a one-element list whose member is iclause->rinfo.  Otherwise,
  * it contains one or more directly-usable indexqual conditions extracted
  * from the given clause.  The 'lossy' flag indicates whether the
  * indexquals are semantically equivalent to the original clause, or
  * represent a weaker condition.
"
should lossy be iclause->lossy be true at the end of match_orclause_to_indexcol?
since it meets the comment condition: "semantically equivalent to the
original clause"
or is the above comment slightly wrong?

in match_orclause_to_indexcol
i changed from
iclause->rinfo = rinfo;
to
  iclause->rinfo = make_simple_restrictinfo(root,
                                                 &saopexpr->xpr);

as expected. now the "Recheck Cond" is same as "Index Cond"
    Recheck Cond: (thousand = ANY ('{1,3}'::integer[]))
    ->  Bitmap Index Scan on tenk2_thous_tenthous
          Index Cond: (thousand = ANY ('{1,3}'::integer[]))

I am not sure of the implication of this change.

I may be wrong, but the original idea was to double-check the result with the original expression.

But I'm willing to agree with you. I think we should add transformed rinfo variable through add_predicate_to_index_quals function. I attached the diff file to the letter.

diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3da7ea8ed57..c68ac7008e6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3463,10 +3463,11 @@ match_orclause_to_indexcol(PlannerInfo *root,
      * rinfo in iclause->rinfo to detect duplicates and recheck the original
      * clause.
      */
+    RestrictInfo *rinfo_new = make_simple_restrictinfo(root,
+ &saopexpr->xpr);
     iclause = makeNode(IndexClause);
-    iclause->rinfo = rinfo;
-    iclause->indexquals = list_make1(make_simple_restrictinfo(root,
- &saopexpr->xpr));
+    iclause->rinfo = rinfo_new;
+    iclause->indexquals = add_predicate_to_index_quals(index, list_make1(rinfo_new));
     iclause->lossy = false;
     iclause->indexcol = indexcol;
     iclause->indexcols = NIL;


I figured out comments that you mentioned and found some addition explanation.

As I understand it, this processing is related to ensuring that the selectivity of the index is assessed correctly and that there is no underestimation, which can lead to the selection of a partial index in the plan. See comment for the add_predicate_to_index_quals function:

* ANDing the index predicate with the explicitly given indexquals produces
 * a more accurate idea of the index's selectivity. *However, we need to be
 * careful not to insert redundant clauses, because clauselist_selectivity()
 * is easily fooled into computing a too-low selectivity estimate*.  Our
 * approach is to add only the predicate clause(s) that cannot be proven to
 * be implied by the given indexquals.  This successfully handles cases such
 * as a qual "x = 42" used with a partial index "WHERE x >= 40 AND x < 50".
 * There are many other cases where we won't detect redundancy, leading to a  * too-low selectivity estimate, which will bias the system in favor of using
 * partial indexes where possible.  That is not necessarily bad though.
 *
 * *Note that indexQuals contains RestrictInfo nodes while the indpred
 * does not, so the output list will be mixed.  This is OK for both
 * predicate_implied_by() and clauselist_selectivity()*, but might be
 * problematic if the result were passed to other things.
 */

In those comments that you mentioned, it was written that this problem of expression redundancy is checked using the predicate_implied_by function, note that it is called there.

* In some situations (particularly with OR'd index conditions) we may * have scan_clauses that are not equal to, but are logically implied by, * the index quals; so we also try a predicate_implied_by() check to see * if we can discard quals that way. (predicate_implied_by assumes its * first input contains only immutable functions, so we have to check * that.)


I also figured out more information about loosy variable. First of all, I tried changing the value of the variable and did not notice any difference in regression tests. As I understood, our transformation is completely equivalent, so loosy should be true. But I don't think they are needed since our expressions are equivalent. I thought for a long time about an example where this could be a mistake and didn’t come up with any of them.

--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3da7ea8ed57..e72a4fb15dd 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3242,6 +3242,7 @@ match_orclause_to_indexcol(PlannerInfo *root,
 	Oid			inputcollid = InvalidOid;
 	bool		firstTime = true;
 	bool		have_param = false;
+	RestrictInfo *rinfo_new;
 
 	Assert(IsA(orclause, BoolExpr));
 	Assert(orclause->boolop == OR_EXPR);
@@ -3463,11 +3464,12 @@ match_orclause_to_indexcol(PlannerInfo *root,
 	 * rinfo in iclause->rinfo to detect duplicates and recheck the original
 	 * clause.
 	 */
+	rinfo_new = make_simple_restrictinfo(root,
+                                                &saopexpr->xpr);
 	iclause = makeNode(IndexClause);
-	iclause->rinfo = rinfo;
-	iclause->indexquals = list_make1(make_simple_restrictinfo(root,
-															  &saopexpr->xpr));
-	iclause->lossy = false;
+	iclause->rinfo = rinfo_new;
+	iclause->indexquals = add_predicate_to_index_quals(index, list_make1(rinfo_new));
+	iclause->lossy = true;
 	iclause->indexcol = indexcol;
 	iclause->indexcols = NIL;
 	return iclause;
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index b003492c5c8..3c46484d642 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1878,10 +1878,10 @@ SELECT * FROM tenk1
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3 OR tenthous = 42 OR tenthous IS NULL);
-                                                                QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
+                                                          QUERY PLAN                                                          
+------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on tenk1
-   Recheck Cond: (((thousand = 42) AND (tenthous IS NULL)) OR ((thousand = 42) AND ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42))))
+   Recheck Cond: (((thousand = 42) AND (tenthous IS NULL)) OR ((thousand = 42) AND (tenthous = ANY ('{1,3,42}'::integer[]))))
    Filter: ((tenthous = 1) OR (tenthous = 3) OR (tenthous = 42) OR (tenthous IS NULL))
    ->  BitmapOr
          ->  Bitmap Index Scan on tenk1_thous_tenthous
@@ -1917,10 +1917,10 @@ SELECT * FROM tenk1
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1::int2 OR tenthous = 3::int8 OR tenthous = 42::int8);
-                                                                     QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                            QUERY PLAN                                                             
+-----------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on tenk1
-   Recheck Cond: (((thousand = 42) AND ((tenthous = '3'::bigint) OR (tenthous = '42'::bigint))) OR ((thousand = 42) AND (tenthous = '1'::smallint)))
+   Recheck Cond: (((thousand = 42) AND (tenthous = ANY ('{3,42}'::bigint[]))) OR ((thousand = 42) AND (tenthous = '1'::smallint)))
    Filter: ((tenthous = '1'::smallint) OR (tenthous = '3'::bigint) OR (tenthous = '42'::bigint))
    ->  BitmapOr
          ->  Bitmap Index Scan on tenk1_thous_tenthous
@@ -1932,11 +1932,11 @@ SELECT * FROM tenk1
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand = 42 OR thousand = 99);
-                                   QUERY PLAN                                    
----------------------------------------------------------------------------------
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
  Aggregate
    ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: ((hundred = 42) AND ((thousand = 42) OR (thousand = 99)))
+         Recheck Cond: ((hundred = 42) AND (thousand = ANY ('{42,99}'::integer[])))
          ->  BitmapAnd
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = 42)
@@ -1991,11 +1991,11 @@ SELECT * FROM tenk1
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand = 42 OR thousand = 99);
-                                   QUERY PLAN                                    
----------------------------------------------------------------------------------
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
  Aggregate
    ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: ((hundred = 42) AND ((thousand = 42) OR (thousand = 99)))
+         Recheck Cond: ((hundred = 42) AND (thousand = ANY ('{42,99}'::integer[])))
          ->  BitmapAnd
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = 42)
@@ -2013,11 +2013,11 @@ SELECT count(*) FROM tenk1
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand < 42 OR thousand < 99 OR 43 > thousand OR 42 > thousand);
-                                                      QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
+                                        QUERY PLAN                                        
+------------------------------------------------------------------------------------------
  Aggregate
    ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: ((hundred = 42) AND ((thousand < 42) OR (thousand < 99) OR (43 > thousand) OR (42 > thousand)))
+         Recheck Cond: ((hundred = 42) AND (thousand < ANY ('{42,99,43,42}'::integer[])))
          ->  BitmapAnd
                ->  Bitmap Index Scan on tenk1_hundred
                      Index Cond: (hundred = 42)
@@ -2035,11 +2035,11 @@ SELECT count(*) FROM tenk1
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3) OR thousand = 41;
-                                             QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
+                                               QUERY PLAN                                               
+--------------------------------------------------------------------------------------------------------
  Aggregate
    ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: (((thousand = 42) AND ((tenthous = 1) OR (tenthous = 3))) OR (thousand = 41))
+         Recheck Cond: (((thousand = 42) AND (tenthous = ANY ('{1,3}'::integer[]))) OR (thousand = 41))
          ->  BitmapOr
                ->  Bitmap Index Scan on tenk1_thous_tenthous
                      Index Cond: ((thousand = 42) AND (tenthous = ANY ('{1,3}'::integer[])))
@@ -2057,11 +2057,11 @@ SELECT count(*) FROM tenk1
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand = 42 OR thousand = 99 OR tenthous < 2) OR thousand = 41;
-                                                        QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------
+                                                         QUERY PLAN                                                          
+-----------------------------------------------------------------------------------------------------------------------------
  Aggregate
    ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: (((hundred = 42) AND (((thousand = 42) OR (thousand = 99)) OR (tenthous < 2))) OR (thousand = 41))
+         Recheck Cond: (((hundred = 42) AND ((thousand = ANY ('{42,99}'::integer[])) OR (tenthous < 2))) OR (thousand = 41))
          Filter: (((hundred = 42) AND ((thousand = 42) OR (thousand = 99) OR (tenthous < 2))) OR (thousand = 41))
          ->  BitmapOr
                ->  BitmapAnd
@@ -2086,11 +2086,11 @@ SELECT count(*) FROM tenk1
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE hundred = 42 AND (thousand = 42 OR thousand = 41 OR thousand = 99 AND tenthous = 2);
-                                                        QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
+                                                          QUERY PLAN                                                          
+------------------------------------------------------------------------------------------------------------------------------
  Aggregate
    ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: ((hundred = 42) AND (((thousand = 99) AND (tenthous = 2)) OR ((thousand = 42) OR (thousand = 41))))
+         Recheck Cond: ((hundred = 42) AND (((thousand = 99) AND (tenthous = 2)) OR (thousand = ANY ('{42,41}'::integer[]))))
          Filter: ((thousand = 42) OR (thousand = 41) OR ((thousand = 99) AND (tenthous = 2)))
          ->  BitmapAnd
                ->  Bitmap Index Scan on tenk1_hundred
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index ebf2e3f851a..ea83578ccf1 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -4348,7 +4348,7 @@ select * from tenk1 a join tenk1 b on
                      Index Cond: (unique1 = 2)
    ->  Materialize
          ->  Bitmap Heap Scan on tenk1 a
-               Recheck Cond: (((unique2 = 3) OR (unique2 = 7)) OR (unique1 = 1))
+               Recheck Cond: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 1))
                Filter: ((unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique2
@@ -4374,7 +4374,7 @@ select * from tenk1 a join tenk1 b on
                      Index Cond: (unique1 = 2)
    ->  Materialize
          ->  Bitmap Heap Scan on tenk1 a
-               Recheck Cond: (((unique2 = 3) OR (unique2 = 7)) OR (unique1 = 1))
+               Recheck Cond: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 1))
                Filter: ((unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique2
@@ -4394,7 +4394,7 @@ select * from tenk1 a join tenk1 b on
    ->  Seq Scan on tenk1 b
    ->  Materialize
          ->  Bitmap Heap Scan on tenk1 a
-               Recheck Cond: (((unique2 = 3) OR (unique2 = 7)) OR ((unique1 = 3) OR (unique1 = 1)) OR (unique1 < 20))
+               Recheck Cond: ((unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = ANY ('{3,1}'::integer[])) OR (unique1 < 20))
                Filter: ((unique1 < 20) OR (unique1 = 3) OR (unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
                ->  BitmapOr
                      ->  Bitmap Index Scan on tenk1_unique2

Reply via email to