Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Alena Rybakina Wed, 02 Apr 2025 07:34:10 -0700

On 01.04.2025 17:23, Alena Rybakina wrote:

Hi, Alexander!

On 01.04.2025 15:07, Alexander Korotkov wrote:
Hi, Alena!
On Tue, Apr 1, 2025 at 2:11 AM Alena Rybakina<a.rybak...@postgrespro.ru> wrote:
    4.1) explain analyze SELECT ten

    FROM onek t WHERE unique1 IN ( VALUES (0), ((2 IN ( SELECT
    unique2 FROM onek c WHERE c.unique2 in
    ((values(0),(2))))::integer)) );

    QUERY PLAN
    
-------------------------------------------------------------------------------------------------------------
    Seq Scan on onek t (cost=180.11..410.25 rows=2 width=6) (actual
    time=5.014..13.256 rows=3.00 loops=1) Filter: (unique1 = ANY
    (ARRAY[0, ((ANY (2 = (hashed SubPlan 1).col1)))::integer])) Rows
    Removed by Filter: 10005 *Buffers: shared hit=110* SubPlan 1 ->
    Seq Scan on onek c (cost=0.00..180.10 rows=3 width=4) (actual
    time=0.022..4.951 rows=2.00 loops=1) Filter: (unique2 = ANY
    ('{0,2}'::integer[])) Rows Removed by Filter: 10006 *Buffers:
    shared hit=55* Planning: Buffers: shared hit=6 dirtied=1 Planning
    Time: 0.502 ms Execution Time: 13.348 ms (13 rows)

    The query plan without our patch:

    
--------------------------------------------------------------------------------------------------------------------------------------------
    Hash Semi Join (cost=0.05..181.42 rows=2 width=6) (actual
    time=5.072..9.076 rows=3.00 loops=1) Hash Cond: (t.unique1 =
    "*VALUES*".column1) *Buffers: shared hit=55 read=55* -> Seq Scan
    on onek t (cost=0.00..155.08 rows=10008 width=10) (actual
    time=0.145..1.802 rows=10008.00 loops=1) *Buffers: shared hit=52
    read=3* -> Hash (cost=0.03..0.03 rows=2 width=4) (actual
    time=4.908..4.912 rows=2.00 loops=1) Buckets: 1024 Batches: 1
    Memory Usage: 9kB *Buffers: shared hit=3 read=52* -> Values Scan
    on "*VALUES*" (cost=0.00..0.03 rows=2 width=4) (actual
    time=0.003..4.901 rows=2.00 loops=1) *Buffers: shared hit=3
    read=52* SubPlan 1 -> Hash Semi Join (cost=0.05..181.42 rows=2
    width=4) (actual time=0.036..4.861 rows=2.00 loops=1) Hash Cond:
    (c.unique2 = "*VALUES*_1".column1) *Buffers: shared hit=3
    read=52* -> Seq Scan on onek c (cost=0.00..155.08 rows=10008
    width=4) (actual time=0.009..2.120 rows=10008.00 loops=1)
    *Buffers: shared hit=3 read=52* -> Hash (cost=0.03..0.03 rows=2
    width=4) (actual time=0.006..0.008 rows=2.00 loops=1) Buckets:
    1024 Batches: 1 Memory Usage: 9kB -> Values Scan on "*VALUES*_1"
    (cost=0.00..0.03 rows=2 width=4) (actual time=0.001..0.002
    rows=2.00 loops=1) Planning: Buffers: shared hit=102 read=22
    Planning Time: 1.853 ms Execution Time: 9.281 ms (23 rows)


I think I managed to understand what is going on.
When we run a query with SOAP over a constant arraythen convert_saop_to_hashed_saop_walker() provides acceleration withhashing.
# explain analyze select * from test where val IN (5000, 4000, 9000,2000, 1000, 140050);
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..21925.00 rows=6 width=4) (actualtime=2.015..223.984 rows=6.00 loops=1)
   Filter: (val = ANY ('{5000,4000,9000,2000,1000,140050}'::integer[]))
   Rows Removed by Filter: 999994
   Buffers: shared hit=2228 read=2197
 Planning Time: 0.246 ms
 Execution Time: 224.036 ms
(6 rows)
But when there is expression or subselect, then hashing doesn't workand query becomes slower.
# explain analyze select * from test where val IN (5000, 4000, 9000,2000, 1000, (select 140050));
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.01..21925.01 rows=6 width=4) (actualtime=0.904..396.495 rows=6.00 loops=1) Filter: (val = ANY (ARRAY[5000, 4000, 9000, 2000, 1000, (InitPlan1).col1]))
   Rows Removed by Filter: 999994
   Buffers: shared hit=2292 read=2133
   InitPlan 1
-> Result (cost=0.00..0.01 rows=1 width=4) (actualtime=0.002..0.002 rows=1.00 loops=1)
 Planning Time: 0.160 ms
 Execution Time: 396.538 ms
(8 rows)

In contrast, hashing is always available with VALUES.
# explain analyze select * from test where val in (VALUES (5000),(4000), (9000), (2000), (1000), ((select 140050)));
 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=0.16..17050.23 rows=6 width=4) (actualtime=1.589..225.061 rows=6.00 loops=1)
   Hash Cond: (test.val = "*VALUES*".column1)
   Buffers: shared hit=2356 read=2069
   InitPlan 1
-> Result (cost=0.00..0.01 rows=1 width=4) (actualtime=0.003..0.003 rows=1.00 loops=1) -> Seq Scan on test (cost=0.00..14425.00 rows=1000000 width=4)(actual time=0.460..91.912 rows=1000000.00 loops=1)
         Buffers: shared hit=2356 read=2069
-> Hash (cost=0.08..0.08 rows=6 width=4) (actualtime=0.049..0.050 rows=6.00 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
-> Values Scan on "*VALUES*" (cost=0.00..0.08 rows=6width=4) (actual time=0.009..0.032 rows=6.00 loops=1)
 Planning Time: 0.627 ms
 Execution Time: 225.155 ms
(12 rows)
I think we should allow our transformation only when the array isconstant (attached patchset).
Yes, I agree with your conclusions; however, I noticed that you didn’ttake Param-type variables into account.These still get executed during the VALUES -> ANY transformation (seeregression tests).
+PREPARE test2 (int,numeric, text) AS
+  SELECT ten FROM onek
+  WHERE sin(two)*four/($3::real) IN (VALUES (2), ($2), ($2), ($1));
+-- VTA forbidden because of unresolved casting of numeric parameterto common type
+EXPLAIN (COSTS OFF) EXECUTE test2(2, 2, '2');
+                                                         QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on onek
+ Filter: (((sin((two)::double precision) * (four)::doubleprecision) / '2'::real) = ANY ('{2,2,2,2}'::double precision[]))
+(2 rows)
+
+PREPARE test3 (int,int, text) AS
+  SELECT ten FROM onek
+  WHERE sin(two)*four/($3::real) IN (VALUES (2), ($2), ($2), ($1));
+EXPLAIN (COSTS OFF) EXECUTE test3(2, 2, '2');
+                                                         QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------
+ Seq Scan on onek
+ Filter: (((sin((two)::double precision) * (four)::doubleprecision) / '2'::real) = ANY ('{2,2,2,2}'::double precision[]))
+(2 rows)
In my opinion, we can apply the VALUES ->ANY transformation to them aswell. What do you think? I ran some queries and didn’t notice anysignificant performance degradation.
create table test (x int);
insert into test select id from generate_series(1,1000) id;
PREPARE test4 (int,int, int) AS select * from test where x IN ($1, $2,$3);
PREPARE test3 (int,int, int) AS select * from test where x IN ($1, $2,
 (select $3));
EXPLAIN ANALYZE EXECUTE test4(2, 2, 2);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..18.75 rows=3 width=4) (actualtime=0.016..0.353 rows=1.00 loops=1)
   Filter: (x = ANY (ARRAY[$1, $2, $3]))
   Rows Removed by Filter: 999
   Buffers: shared hit=5
 Planning:
   Buffers: shared hit=20
 Planning Time: 0.266 ms
 Execution Time: 0.367 ms
(8 rows)

alena@postgres=# EXPLAIN ANALYZE EXECUTE test3(2, 2, 2);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.01..18.76 rows=3 width=4) (actualtime=0.072..1.379 rows=1.00 loops=1)
   Filter: (x = ANY (ARRAY[2, 2, (InitPlan 1).col1]))
   Rows Removed by Filter: 999
   Buffers: shared hit=5
   InitPlan 1
-> Result (cost=0.00..0.01 rows=1 width=4) (actualtime=0.003..0.003 rows=1.00 loops=1)
 Planning Time: 0.350 ms
 Execution Time: 1.431 ms
(8 rows)
alena@postgres=# PREPARE test6 (int,int, int) AS select * from testwhere x IN (values($1), ($2), ($3));
PREPARE
alena@postgres=# EXPLAIN ANALYZE EXECUTE test6(2, 2, 2);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..18.75 rows=3 width=4) (actualtime=0.055..0.683 rows=1.00 loops=1)
   Filter: (x = ANY ('{2,2,2}'::integer[]))
   Rows Removed by Filter: 999
   Buffers: shared hit=5
 Planning Time: 0.230 ms
 Execution Time: 0.724 ms
(6 rows)
We can’t use hashing for them, but without this transformation, westill have to perform a join.
----------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=0.08..17.73 rows=3 width=4) (actualtime=0.124..0.943 rows=1.00 loops=1)
   Hash Cond: (test.x = "*VALUES*".column1)
   Buffers: shared hit=5
-> Seq Scan on test (cost=0.00..15.00 rows=1000 width=4) (actualtime=0.051..0.389 rows=1000.00 loops=1)
         Buffers: shared hit=5
-> Hash (cost=0.04..0.04 rows=3 width=4) (actualtime=0.028..0.030 rows=3.00 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
-> Values Scan on "*VALUES*" (cost=0.00..0.04 rows=3width=4) (actual time=0.004..0.010 rows=3.00 loops=1)
 Planning:
   Buffers: shared hit=105 read=1
 Planning Time: 2.176 ms
 Execution Time: 1.077 ms
(12 rows)
So, I think we can bring it back and construct the Array node based onthe have_param flag.
foreach (lc, rte->values_lists)
+    {
+        List *elem = lfirst(lc);
+        Node *value = linitial(elem);
+
+        value = eval_const_expressions(NULL, value);
+
+        if (!IsA(value, Const))
+            have_param = true;
+
+        consts = lappend(consts, value);
+
+    }
Regarding the check for the presence of Var elements before thetransformation, I think we should, for now, restore the walkerfunction (values_simplicity_check_walker) thattraverses the query to identify Var nodes. This function was includedin the initial version of the patch:
+/*
+ * The function traverses the tree looking for elements of type var.
+ * If it finds it, it returns true.
+ */
+static bool
+values_simplicity_check_walker(Node *node, void *ctx)
+{
+    if (node == NULL)
+    {
+        return false;
+    }
+    else if(IsA(node, Var))
+        return true;
+    else if(IsA(node, Query))
+        return query_tree_walker((Query *) node,
+  values_simplicity_check_walker,
+                                 (void*) ctx,
+                                 QTW_EXAMINE_RTES_BEFORE);
+
+    return expression_tree_walker(node, values_simplicity_check_walker,
+                                  (void *) ctx);
+}
In future we may implement dynamic SAOP hashing, and then allow ourtransformation in more cases.
I agree with your suggestion) Thank you for your interest to thissubject and contribution!

I prepared a patch according to my suggestions, it just checks that thetransformation is not carried out if there is a var element, there arechanges only in one test, but I think it is correct.

diff -U3/home/alena/postgrespro_or3/src/test/regress/expected/subselect.out/home/alena/postgrespro_or3/src/test/regress/results/subselect.out--- /home/alena/postgrespro_or3/src/test/regress/expected/subselect.out2025-04-02 02:50:07.018329864 +0300+++ /home/alena/postgrespro_or3/src/test/regress/results/subselect.out2025-04-02 17:27:09.845104001 +0300

@@ -3027,18 +3027,15 @@
 SELECT ten FROM onek t WHERE unique1 IN (VALUES (0), ((2 IN
   (SELECT (3)))::integer)
 );
-                     QUERY PLAN
-----------------------------------------------------
- Nested Loop
-   ->  Unique
-         ->  Sort
-               Sort Key: "*VALUES*".column1
-               ->  Values Scan on "*VALUES*"
-                     SubPlan 1
-                       ->  Result
-   ->  Index Scan using onek_unique1 on onek t
-         Index Cond: (unique1 = "*VALUES*".column1)
-(9 rows)
+                                           QUERY PLAN
+------------------------------------------------------------------------------------------------
+ Bitmap Heap Scan on onek t

+ Recheck Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed SubPlan1).col1)))::integer]))

+   ->  Bitmap Index Scan on onek_unique1

+ Index Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashedSubPlan 1).col1)))::integer]))

+   SubPlan 1
+     ->  Result
+(6 rows)

 -- Alow to transformation and hold conversion between types of colemns and
 -- declared type of column pointed in RTE

--
Regards,
Alena Rybakina
Postgres Professional

diff --git a/src/backend/optimizer/plan/subselect.c 
b/src/backend/optimizer/plan/subselect.c
index e0d1899c29b..2bf883a818f 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1214,6 +1214,29 @@ inline_cte_walker(Node *node, inline_cte_walker_context 
*context)
        return expression_tree_walker(node, inline_cte_walker, context);
 }
 
+/*
+ * The function traverses the tree looking for elements of type var.
+ * If it finds it, it returns true.
+ */
+static bool
+values_simplicity_check_walker(Node *node, void *ctx)
+{
+       if (node == NULL)
+       {
+               return false;
+       }
+       else if(IsA(node, Var))
+               return true;
+       else if(IsA(node, Query))
+               return query_tree_walker((Query *) node,
+                                                                
values_simplicity_check_walker,
+                                                                (void*) ctx,
+                                                                
QTW_EXAMINE_RTES_BEFORE);
+
+       return expression_tree_walker(node, values_simplicity_check_walker,
+                                                                 (void *) ctx);
+}
+
 /*
  * Attempt to transform 'testexpr' over the VALUES subquery into
  * a ScalarArrayOpExpr.
@@ -1258,7 +1281,10 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, 
Query *values)
         */
        if (rte->rtekind != RTE_VALUES ||
                list_length(rte->values_lists) < 2 ||
-               contain_volatile_functions((Node *) rte->values_lists))
+               contain_volatile_functions((Node *) rte->values_lists) ||
+               expression_tree_walker((Node *) (rte->values_lists),
+                                                                
values_simplicity_check_walker,
+                                                                NULL))
                return NULL;
 
        foreach(lc, rte->values_lists)
@@ -1279,7 +1305,9 @@ convert_VALUES_to_ANY(PlannerInfo *root, Node *testexpr, 
Query *values)
                value = eval_const_expressions(root, value);
 
                if (!IsA(value, Const))
-                       return NULL;
+               {
+                       haveNonConst = true;
+               }
 
                exprs = lappend(exprs, value);
        }
diff --git a/src/test/regress/expected/subselect.out 
b/src/test/regress/expected/subselect.out
index c6867720a69..bff1582d388 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -3027,18 +3027,15 @@ EXPLAIN (COSTS OFF)
 SELECT ten FROM onek t WHERE unique1 IN (VALUES (0), ((2 IN
   (SELECT (3)))::integer)
 );
-                     QUERY PLAN                     
-----------------------------------------------------
- Nested Loop
-   ->  Unique
-         ->  Sort
-               Sort Key: "*VALUES*".column1
-               ->  Values Scan on "*VALUES*"
-                     SubPlan 1
-                       ->  Result
-   ->  Index Scan using onek_unique1 on onek t
-         Index Cond: (unique1 = "*VALUES*".column1)
-(9 rows)
+                                           QUERY PLAN                          
                 
+------------------------------------------------------------------------------------------------
+ Bitmap Heap Scan on onek t
+   Recheck Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed SubPlan 
1).col1)))::integer]))
+   ->  Bitmap Index Scan on onek_unique1
+         Index Cond: (unique1 = ANY (ARRAY[0, ((ANY (2 = (hashed SubPlan 
1).col1)))::integer]))
+   SubPlan 1
+     ->  Result
+(6 rows)
 
 -- Alow to transformation and hold conversion between types of colemns and
 -- declared type of column pointed in RTE

Re: Replace IN VALUES with ANY in WHERE clauses during optimization

Reply via email to