Replace IN VALUES with ANY in WHERE clauses during optimization

Ivan Kush Thu, 03 Oct 2024 12:53:08 -0700

Hello, hackers! I with my friends propose the patch to replace IN VALUES to ANY in WHERE clauses.


# Intro

The `VALUES` in the `IN VALUES` construct is replaced with with an array of values when `VALUES` contains 1 column. In the end it will be replaced with ANY by the existing function makeA_Expr (src/backend/nodes/makefuncs.c)


This improves performance, especially if the values are small.

# Patch

v1-in_values_to_array.patch

# How realized

`VALUES` statement corresponds to `values_clause` nonterminal symbol in gram.y, where it's parsed to `SelectStmt` node.

`IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1 column, parser extracts data from `SelectStmt` and passes it


to function call `makeSimpleA_Expr` where simple `A_Expr` is created.

Later during optimizations of parser tree this `A_Expr` will be transformed to `ArrayExpr` (already realized in Postgres)



# Authors.
Author: Ivan Kush <ivan.k...@tantorlabs.com>
Author: Vadim Yacenko <vadim.yace...@tantorlabs.com>
Author: Alexander Simonov <alexander.simo...@tantorlabs.com>

# Tests

Implementation contains many regression tests of varying complexity, which check supported features.


# Platform

This patch was checkouted from tag REL_17_STABLE. Code is developed in Linux, doesn't contain platfrom-specific code, only Postgres internal data structures and functions.


# Documentation
Regression tests contain many examples

# Performance
It increases performance

# Example
Let's compare result. With path the execution time is significantly lower.

We have a table table1 with 10000 rows.

postgres=# \d table1;
                         Table "public.table1"
 Column |            Type             | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+---------
 fld1   | timestamp without time zone |           | not null |
 fld2   | bytea                       |           | not null |
Indexes:
    "table1index" btree (fld2)

Let's execute several commands
see commands.sql

Plan no patch
see plan_no_patch.txt


Plan with patch
see plan_with_patch.txt


--
Best wishes,
Ivan Kush
Tantor Labs LLC

commands.sql
Description: application/sql

                                                                              
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=4229.78..4251.54 rows=3 width=65) (actual 
time=37.091..37.095 rows=0 loops=1)
   Hash Cond: (t7.fldtmp2 = t1.fld2)
   ->  Seq Scan on tt4 t7  (cost=0.00..18.00 rows=1000 width=25) (never 
executed)
   ->  Hash  (cost=4229.74..4229.74 rows=3 width=57) (actual 
time=37.076..37.080 rows=0 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 8kB
         ->  Hash Right Join  (cost=4207.98..4229.74 rows=3 width=57) (actual 
time=37.076..37.080 rows=0 loops=1)
               Hash Cond: (t6.fldtmp2 = t1.fld2)
               ->  Seq Scan on tt6 t6  (cost=0.00..18.00 rows=1000 width=25) 
(never executed)
               ->  Hash  (cost=4207.94..4207.94 rows=3 width=49) (actual 
time=37.066..37.070 rows=0 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 8kB
                     ->  Hash Semi Join  (cost=122.05..4207.94 rows=3 width=49) 
(actual time=37.066..37.069 rows=0 loops=1)
                           Hash Cond: (t1.fld2 = "*VALUES*".column1)
                           ->  Hash Left Join  (cost=122.00..3945.36 
rows=100000 width=49) (actual time=0.575..33.126 rows=101020 loops=1)
                                 Hash Cond: (t1.fld2 = t5.fldtmp2)
                                 ->  Hash Left Join  (cost=91.50..3401.02 
rows=100000 width=41) (actual time=0.430..26.265 rows=100476 loops=1)
                                       Hash Cond: (t1.fld2 = t4.fldtmp2)
                                       ->  Hash Left Join  (cost=61.00..2856.68 
rows=100000 width=33) (actual time=0.278..19.463 rows=100204 loops=1)
                                             Hash Cond: (t1.fld2 = t3.fldtmp2)
                                             ->  Hash Left Join  
(cost=30.50..2312.34 rows=100000 width=25) (actual time=0.148..12.776 
rows=100068 loops=1)
                                                   Hash Cond: (t1.fld2 = 
t2.fldtmp2)
                                                   ->  Seq Scan on table1 t1  
(cost=0.00..1768.00 rows=100000 width=17) (actual time=0.005..4.784 rows=100000 
loops=1)
                                                   ->  Hash  (cost=18.00..18.00 
rows=1000 width=25) (actual time=0.131..0.132 rows=1000 loops=1)
                                                         Buckets: 1024  
Batches: 1  Memory Usage: 64kB
                                                         ->  Seq Scan on tt1 t2 
 (cost=0.00..18.00 rows=1000 width=25) (actual time=0.004..0.046 rows=1000 
loops=1)
                                             ->  Hash  (cost=18.00..18.00 
rows=1000 width=25) (actual time=0.122..0.122 rows=1000 loops=1)
                                                   Buckets: 1024  Batches: 1  
Memory Usage: 64kB
                                                   ->  Seq Scan on tt2 t3  
(cost=0.00..18.00 rows=1000 width=25) (actual time=0.003..0.045 rows=1000 
loops=1)
                                       ->  Hash  (cost=18.00..18.00 rows=1000 
width=25) (actual time=0.141..0.141 rows=1000 loops=1)
                                             Buckets: 1024  Batches: 1  Memory 
Usage: 64kB
                                             ->  Seq Scan on tt3 t4  
(cost=0.00..18.00 rows=1000 width=25) (actual time=0.002..0.047 rows=1000 
loops=1)
                                 ->  Hash  (cost=18.00..18.00 rows=1000 
width=25) (actual time=0.129..0.129 rows=1000 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 
64kB
                                       ->  Seq Scan on tt5 t5  
(cost=0.00..18.00 rows=1000 width=25) (actual time=0.006..0.050 rows=1000 
loops=1)
                           ->  Hash  (cost=0.03..0.03 rows=2 width=32) (actual 
time=0.005..0.005 rows=2 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                 ->  Values Scan on "*VALUES*"  
(cost=0.00..0.03 rows=2 width=32) (actual time=0.001..0.002 rows=2 loops=1)
 Planning Time: 0.588 ms
 Execution Time: 37.161 ms
(38 rows)

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=134.41..158.66 rows=3 width=65) (actual 
time=0.070..0.072 rows=0 loops=1)
   Hash Cond: (t7.fldtmp2 = t1.fld2)
   ->  Seq Scan on tt4 t7  (cost=0.00..18.00 rows=1000 width=25) (never 
executed)
   ->  Hash  (cost=134.37..134.37 rows=3 width=57) (actual time=0.061..0.062 
rows=0 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 8kB
         ->  Hash Right Join  (cost=110.11..134.37 rows=3 width=57) (actual 
time=0.061..0.062 rows=0 loops=1)
               Hash Cond: (t6.fldtmp2 = t1.fld2)
               ->  Seq Scan on tt6 t6  (cost=0.00..18.00 rows=1000 width=25) 
(never executed)
               ->  Hash  (cost=110.07..110.07 rows=3 width=49) (actual 
time=0.052..0.053 rows=0 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 8kB
                     ->  Hash Right Join  (cost=85.81..110.07 rows=3 width=49) 
(actual time=0.052..0.053 rows=0 loops=1)
                           Hash Cond: (t5.fldtmp2 = t1.fld2)
                           ->  Seq Scan on tt5 t5  (cost=0.00..18.00 rows=1000 
width=25) (never executed)
                           ->  Hash  (cost=85.77..85.77 rows=3 width=41) 
(actual time=0.044..0.045 rows=0 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                 ->  Hash Right Join  (cost=61.51..85.77 rows=3 
width=41) (actual time=0.043..0.045 rows=0 loops=1)
                                       Hash Cond: (t4.fldtmp2 = t1.fld2)
                                       ->  Seq Scan on tt3 t4  
(cost=0.00..18.00 rows=1000 width=25) (never executed)
                                       ->  Hash  (cost=61.48..61.48 rows=3 
width=33) (actual time=0.033..0.034 rows=0 loops=1)
                                             Buckets: 1024  Batches: 1  Memory 
Usage: 8kB
                                             ->  Hash Right Join  
(cost=37.22..61.48 rows=3 width=33) (actual time=0.033..0.034 rows=0 loops=1)
                                                   Hash Cond: (t3.fldtmp2 = 
t1.fld2)
                                                   ->  Seq Scan on tt2 t3  
(cost=0.00..18.00 rows=1000 width=25) (never executed)
                                                   ->  Hash  (cost=37.18..37.18 
rows=3 width=25) (actual time=0.025..0.026 rows=0 loops=1)
                                                         Buckets: 1024  
Batches: 1  Memory Usage: 8kB
                                                         ->  Hash Right Join  
(cost=12.92..37.18 rows=3 width=25) (actual time=0.025..0.026 rows=0 loops=1)
                                                               Hash Cond: 
(t2.fldtmp2 = t1.fld2)
                                                               ->  Seq Scan on 
tt1 t2  (cost=0.00..18.00 rows=1000 width=25) (never executed)
                                                               ->  Hash  
(cost=12.88..12.88 rows=3 width=17) (actual time=0.016..0.016 rows=0 loops=1)
                                                                     Buckets: 
1024  Batches: 1  Memory Usage: 8kB
                                                                     ->  Index 
Only Scan using table1index on table1 t1  (cost=0.42..12.88 rows=3 width=17) 
(actual time=0.
016..0.016 rows=0 loops=1)
                                                                           
Index Cond: (fld2 = ANY 
('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c333
5365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[]))
                                                                           Heap 
Fetches: 0
 Planning Time: 0.564 ms
 Execution Time: 0.133 ms
(35 rows)

From 805498afb64f092aad29e7b3734eecf266f98b16 Mon Sep 17 00:00:00 2001
From: Ivan Kush <ivan.k...@tantorlabs.com>
Date: Thu, 3 Oct 2024 17:30:00 +0000
Subject: [PATCH] In VALUES to ANY

The `VALUES` in the `IN VALUES` construct is replaced with with an array of values when `VALUES` contains 1 column. In the end it will be replaced with ANY by the existing function makeA_Expr (src/backend/nodes/makefuncs.c)
This improves performance, especially if the values are small.


How realized

`VALUES` statement corresponds to `values_clause` nonterminal symbol in gram.y, where it's parsed to `SelectStmt` node.

`IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1 column, parser extracts data from `SelectStmt` and passes it to function call `makeSimpleA_Expr` where simple `A_Expr` is created.

Later during optimizations of parser tree this `A_Expr` will be transformed to `ArrayExpr` (already realized in Postgres)


Author: Ivan Kush <ivan.k...@tantorlabs.com>
Author: Vadim Yacenko <vadim.yace...@tantorlabs.com>
Author: Alexander Simonov <alexander.simo...@tantorlabs.com>

---
 src/backend/nodes/Makefile                    |   1 +
 src/backend/nodes/makefuncs_2.c               |  47 +++
 src/backend/parser/gram.y                     |  22 +-
 src/backend/utils/misc/guc_tables.c           |  11 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/makefuncs_2.h               |  28 ++
 src/test/regress/expected/create_view.out     |  12 +-
 src/test/regress/expected/in_values.out       | 344 ++++++++++++++++++
 src/test/regress/expected/sysviews.out        |  51 +--
 src/test/regress/parallel_schedule            |   5 +
 src/test/regress/sql/in_values.sql            | 252 +++++++++++++
 11 files changed, 736 insertions(+), 38 deletions(-)
 create mode 100644 src/backend/nodes/makefuncs_2.c
 create mode 100644 src/include/nodes/makefuncs_2.h
 create mode 100644 src/test/regress/expected/in_values.out
 create mode 100644 src/test/regress/sql/in_values.sql

diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 66bbad8..74ea3c2 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -21,6 +21,7 @@ OBJS = \
 	extensible.o \
 	list.o \
 	makefuncs.o \
+	makefuncs_2.o \
 	multibitmapset.o \
 	nodeFuncs.o \
 	outfuncs.o \
diff --git a/src/backend/nodes/makefuncs_2.c b/src/backend/nodes/makefuncs_2.c
new file mode 100644
index 0000000..f16c673
--- /dev/null
+++ b/src/backend/nodes/makefuncs_2.c
@@ -0,0 +1,47 @@
+#include "postgres.h"
+
+#include "catalog/pg_class.h"
+#include "catalog/pg_type.h"
+#include "nodes/makefuncs_2.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/nodes.h"
+#include "nodes/pg_list.h"
+#include "utils/errcodes.h"
+#include "utils/lsyscache.h"
+
+
+bool enable_convert_in_values_to_any = true;
+
+Node *
+convert_in_values_clause_to_array(Node *lexpr, SelectStmt *stmt, int location)
+{
+	ListCell   *lc;
+	List *vals = NIL;
+	List *sublist = NIL;
+
+	if(!enable_convert_in_values_to_any) {
+		return NULL;
+	}
+
+	if(!stmt->valuesLists)
+	{
+		return NULL;
+	}
+	sublist = (List *) linitial(stmt->valuesLists);
+	if(list_length(sublist) != 1)
+	{
+		return NULL;
+	}
+
+	foreach(lc, stmt->valuesLists)
+	{
+		void *val;
+		sublist = (List *) lfirst(lc);
+		Assert(list_length(sublist) > 0);
+		val = linitial(sublist);
+		vals = (!vals) ? list_make1(val) : lappend(vals, val);
+	}
+
+	// same func as in gram.y for IN with expr_list
+	return (Node *) makeSimpleA_Expr(AEXPR_IN, "=", lexpr, (Node *) vals, location);
+}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index bca627c..ae5f406 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -57,6 +57,7 @@
 #include "commands/trigger.h"
 #include "gramparse.h"
 #include "nodes/makefuncs.h"
+#include "nodes/makefuncs_2.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parser.h"
 #include "storage/lmgr.h"
@@ -15105,13 +15106,20 @@ a_expr:		c_expr									{ $$ = $1; }
 					{
 						/* generate foo = ANY (subquery) */
 						SubLink	   *n = (SubLink *) $3;
-
-						n->subLinkType = ANY_SUBLINK;
-						n->subLinkId = 0;
-						n->testexpr = $1;
-						n->operName = NIL;		/* show it's IN not = ANY */
-						n->location = @2;
-						$$ = (Node *) n;
+						$$ = NULL;
+						if(IsA(n->subselect, SelectStmt))
+						{
+							$$ = convert_in_values_clause_to_array($1, (SelectStmt *) n->subselect, @2);
+						}
+						if(!$$)
+						{
+							n->subLinkType = ANY_SUBLINK;
+							n->subLinkId = 0;
+							n->testexpr = $1;
+							n->operName = NIL;		/* show it's IN not = ANY */
+							n->location = @2;
+							$$ = (Node *) n;
+						}
 					}
 					else
 					{
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index dc222d9..a808f6d 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -49,6 +49,7 @@
 #include "libpq/auth.h"
 #include "libpq/libpq.h"
 #include "libpq/scram.h"
+#include "nodes/makefuncs_2.h"
 #include "nodes/queryjumble.h"
 #include "optimizer/cost.h"
 #include "optimizer/geqo.h"
@@ -790,6 +791,16 @@ struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_convert_in_values_to_any", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables the planner's optimization of converting IN VALUES clauses to ANY clauses."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_convert_in_values_to_any,
+		true,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_indexonlyscan", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of index-only-scan plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 667e0dc..7b09000 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -396,6 +396,7 @@
 #enable_gathermerge = on
 #enable_hashagg = on
 #enable_hashjoin = on
+#enable_convert_in_values_to_any = on
 #enable_incremental_sort = on
 #enable_indexscan = on
 #enable_indexonlyscan = on
diff --git a/src/include/nodes/makefuncs_2.h b/src/include/nodes/makefuncs_2.h
new file mode 100644
index 0000000..f710cda
--- /dev/null
+++ b/src/include/nodes/makefuncs_2.h
@@ -0,0 +1,28 @@
+/*-------------------------------------------------------------------------
+ *
+ * makefuncs_2.h
+ *	  prototypes for the creator functions of various nodes
+ *
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/nodes/makefuncs_2.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef MAKEFUNC_TANTOR_H
+#define MAKEFUNC_TANTOR_H
+
+#include "nodes/execnodes.h"
+#include "nodes/parsenodes.h"
+
+extern PGDLLIMPORT bool enable_convert_in_values_to_any;
+
+extern A_Expr *makeSimpleA_Expr(A_Expr_Kind kind, char *name,
+								Node *lexpr, Node *rexpr, int location);
+
+extern Node *
+convert_in_values_clause_to_array(Node *lexpr, SelectStmt *stmt, int location);
+
+#endif							/* MAKEFUNC_2_H */
diff --git a/src/test/regress/expected/create_view.out b/src/test/regress/expected/create_view.out
index f551624..51f3e1f 100644
--- a/src/test/regress/expected/create_view.out
+++ b/src/test/regress/expected/create_view.out
@@ -1852,12 +1852,12 @@ select * from tt17v;
 (5 rows)
 
 select pg_get_viewdef('tt17v', true);
-               pg_get_viewdef                
----------------------------------------------
-  SELECT q1,                                +
-     q2                                     +
-    FROM int8_tbl i                         +
-   WHERE (i.* IN ( VALUES (i.*::int8_tbl)));
+   pg_get_viewdef   
+--------------------
+  SELECT q1,       +
+     q2            +
+    FROM int8_tbl i+
+   WHERE i.* = i.*;
 (1 row)
 
 select * from int8_tbl i where i.* in (values(i.*::int8_tbl));
diff --git a/src/test/regress/expected/in_values.out b/src/test/regress/expected/in_values.out
new file mode 100644
index 0000000..9760423
--- /dev/null
+++ b/src/test/regress/expected/in_values.out
@@ -0,0 +1,344 @@
+-- The `IN` operator in SQL is often used in conjunction with a `VALUES` expression
+-- to specify a set of values for filtering or matching. Below are various cases
+-- and examples where the `VALUES` expression is followed by the `IN` operator:
+--------------------------------------------------------------
+----- Basic Filtering with `IN` and `VALUES` -----------------
+--------------------------------------------------------------
+----- VALUES (int), (int), etc.
+--------------------------------------------------------------
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (2);
+SELECT * FROM tbl
+WHERE a IN VALUES (1), (2), (3);
+ERROR:  syntax error at or near "VALUES"
+LINE 2: WHERE a IN VALUES (1), (2), (3);
+                   ^
+SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+ a 
+---
+ 2
+(1 row)
+
+SELECT * FROM tbl
+WHERE a IN (((((VALUES (1), (2), (3))))));
+ a 
+---
+ 2
+(1 row)
+
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Seq Scan on tbl  (cost=0.00..45.06 rows=38 width=4)
+   Filter: (a = ANY ('{1,2,3}'::integer[]))
+(2 rows)
+
+-- reference plan, without VALUES
+EXPLAIN SELECT * FROM tbl
+WHERE a IN ((1), (2), (3));
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Seq Scan on tbl  (cost=0.00..45.06 rows=38 width=4)
+   Filter: (a = ANY ('{1,2,3}'::integer[]))
+(2 rows)
+
+DROP TABLE tbl;
+--------------------------------------------------------------
+----- Basic Filtering with `IN` and `VALUES` -----------------
+--------------------------------------------------------------
+----- Basic VALUES (int, int, etc.)
+--------------------------------------------------------------
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (2);
+SELECT * FROM tbl
+WHERE a IN VALUES (1, 2, 3);
+ERROR:  syntax error at or near "VALUES"
+LINE 2: WHERE a IN VALUES (1, 2, 3);
+                   ^
+SELECT * FROM tbl
+WHERE a IN (VALUES (1, 2, 3));
+ERROR:  subquery has too many columns
+LINE 2: WHERE a IN (VALUES (1, 2, 3));
+                ^
+SELECT * FROM tbl
+WHERE a IN (((((VALUES (1, 2, 3))))));
+ERROR:  subquery has too many columns
+LINE 2: WHERE a IN (((((VALUES (1, 2, 3))))));
+                ^
+DROP TABLE tbl;
+--------------------------------------------------------------
+----- Enable/disable patch -----------------------------------
+--------------------------------------------------------------
+----- enable_convert_in_values_to_any. By default it's true.
+--------------------------------------------------------------
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (2);
+-- by default is ON
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Seq Scan on tbl  (cost=0.00..45.06 rows=38 width=4)
+   Filter: (a = ANY ('{1,2,3}'::integer[]))
+(2 rows)
+
+SET enable_convert_in_values_to_any TO OFF;
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
+ Hash Semi Join  (cost=0.08..42.69 rows=38 width=4)
+   Hash Cond: (tbl.a = "*VALUES*".column1)
+   ->  Seq Scan on tbl  (cost=0.00..35.50 rows=2550 width=4)
+   ->  Hash  (cost=0.04..0.04 rows=3 width=4)
+         ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=4)
+(5 rows)
+
+SET enable_convert_in_values_to_any TO ON;
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Seq Scan on tbl  (cost=0.00..45.06 rows=38 width=4)
+   Filter: (a = ANY ('{1,2,3}'::integer[]))
+(2 rows)
+
+DROP TABLE tbl;
+--------------------------------------------------------------
+----- Using VARCHAR type -------------------------------------
+--------------------------------------------------------------
+CREATE TABLE tbl (a VARCHAR);
+INSERT INTO tbl VALUES ('A');
+SELECT a FROM tbl
+WHERE a IN (VALUES ('A'), ('B'), ('C'));
+ a 
+---
+ A
+(1 row)
+
+SELECT a FROM tbl
+WHERE a IN ((((VALUES ('A'), ('B'), ('C')))));
+ a 
+---
+ A
+(1 row)
+
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES ('A'), ('B'), ('C'));
+                      QUERY PLAN                      
+------------------------------------------------------
+ Seq Scan on tbl  (cost=0.00..28.70 rows=20 width=32)
+   Filter: ((a)::text = ANY ('{A,B,C}'::text[]))
+(2 rows)
+
+DROP TABLE tbl;
+--------------------------------------------------------------
+----- Using custom TYPE
+--------------------------------------------------------------
+CREATE TYPE MY_TUPLE AS (
+    first INT,
+    second INT
+);
+CREATE TABLE tbl (a MY_TUPLE);
+INSERT INTO tbl (a) VALUES
+    (ROW(1, 10)),
+    (ROW(2, 20)),
+    (ROW(3, 30));
+SELECT * FROM tbl;
+   a    
+--------
+ (1,10)
+ (2,20)
+ (3,30)
+(3 rows)
+
+EXPLAIN
+SELECT * FROM tbl
+WHERE a IN (ROW(1, 10), ROW(2, 20));
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Seq Scan on tbl  (cost=0.00..30.40 rows=14 width=32)
+   Filter: ((a = '(1,10)'::record) OR (a = '(2,20)'::record))
+(2 rows)
+
+EXPLAIN
+SELECT * FROM tbl
+WHERE a IN (VALUES (ROW(1, 10)), (ROW(2, 20)));
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Seq Scan on tbl  (cost=0.00..30.40 rows=14 width=32)
+   Filter: ((a = '(1,10)'::record) OR (a = '(2,20)'::record))
+(2 rows)
+
+SELECT * FROM tbl
+WHERE a IN (VALUES (ROW(1, 10)), (ROW(2, 20)));
+   a    
+--------
+ (1,10)
+ (2,20)
+(2 rows)
+
+DROP TABLE tbl;
+DROP TYPE MY_TUPLE;
+--------------------------------------------------------------
+----- Using different types ----------------------------------
+--------------------------------------------------------------
+----- Must be errors
+--------------------------------------------------------------
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (1);
+SELECT a FROM tbl
+WHERE a IN ((1), ('B'), ('\x0f8d'::bytea));
+ERROR:  invalid input syntax for type integer: "B"
+LINE 2: WHERE a IN ((1), ('B'), ('\x0f8d'::bytea));
+                          ^
+SELECT a FROM tbl
+WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea));
+ERROR:  invalid input syntax for type integer: "B"
+LINE 2: WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea));
+                                 ^
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea));
+ERROR:  invalid input syntax for type integer: "B"
+LINE 2: WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea));
+                                 ^
+DROP TABLE tbl;
+--------------------------------------------------------------
+----- Filtering with Subquery and `VALUES` -------------------
+--------------------------------------------------------------
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (2);
+SELECT * FROM tbl
+WHERE a IN (SELECT * FROM (VALUES (1), (2), (3)) AS foo);
+ a 
+---
+ 2
+(1 row)
+
+DROP TABLE tbl;
+--------------------------------------------------------------
+----- Combination with `EXISTS` ------------------------------
+--------------------------------------------------------------
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (2);
+SELECT * FROM tbl
+WHERE EXISTS (
+    SELECT 1 FROM (VALUES (1), (2), (3)) AS allowed_depts(dept_id)
+    WHERE a = allowed_depts.dept_id
+);
+ a 
+---
+ 2
+(1 row)
+
+DROP TABLE tbl;
+--------------------------------------------------------------
+----- Using `IN` with `VALUES` for multi-column matching. ----
+--------------------------------------------------------------
+CREATE TABLE tbl (a INT, b INT);
+INSERT INTO tbl VALUES (2, 20);
+SELECT * FROM tbl
+WHERE (a, b) IN (VALUES (1, 10), (1, 20), (2, 20), (3, 30));
+ a | b  
+---+----
+ 2 | 20
+(1 row)
+
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1, 10), (1, 20), (2, 20), (3, 30));
+ERROR:  subquery has too many columns
+LINE 2: WHERE a IN (VALUES (1, 10), (1, 20), (2, 20), (3, 30));
+                ^
+DROP TABLE tbl;
+--------------------------------------------------------------
+----- `VALUES` in CTE and `IN` Clause ------------------------
+--------------------------------------------------------------
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (1), (2), (3), (4);
+WITH allowed_products AS (
+    VALUES (1), (2), (3)
+)
+SELECT * FROM tbl
+WHERE a IN (SELECT * FROM allowed_products);
+ a 
+---
+ 1
+ 2
+ 3
+(3 rows)
+
+EXPLAIN WITH allowed_products AS (
+    VALUES (1), (2), (3)
+)
+SELECT * FROM tbl
+WHERE a IN (SELECT * FROM allowed_products);
+                               QUERY PLAN                                
+-------------------------------------------------------------------------
+ Hash Semi Join  (cost=0.08..42.69 rows=38 width=4)
+   Hash Cond: (tbl.a = "*VALUES*".column1)
+   ->  Seq Scan on tbl  (cost=0.00..35.50 rows=2550 width=4)
+   ->  Hash  (cost=0.04..0.04 rows=3 width=4)
+         ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=4)
+(5 rows)
+
+DROP TABLE tbl;
+--------------------------------------------------------------
+----- Dynamic Lists with `VALUES` and `IN` -------------------
+--------------------------------------------------------------
+CREATE TABLE tbl (a INT, b INT);
+INSERT INTO tbl VALUES (2, 20);
+SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+ a | b  
+---+----
+ 2 | 20
+(1 row)
+
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Seq Scan on tbl  (cost=0.00..41.08 rows=34 width=8)
+   Filter: (a = ANY ('{1,2,3}'::integer[]))
+(2 rows)
+
+DROP TABLE tbl;
+--------------------------------------------------------------
+----- `VALUES` with Nested `IN` Clauses ----------------------
+--------------------------------------------------------------
+CREATE TABLE tbl (a INT, b INT);
+INSERT INTO tbl VALUES (1, 10), (1, 20), (2, 20), (3, 30);
+CREATE TABLE tbl2 (c INT, d INT);
+INSERT INTO tbl2 VALUES (1, 30), (1, 40), (3, 50);
+SELECT * FROM tbl
+WHERE a IN (
+    SELECT c FROM tbl2
+    WHERE d IN (VALUES (40), (50))
+);
+ a | b  
+---+----
+ 1 | 10
+ 1 | 20
+ 3 | 30
+(3 rows)
+
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (
+    SELECT c FROM tbl2
+    WHERE d IN (VALUES (40), (50))
+);
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Hash Join  (cost=38.80..80.23 rows=260 width=8)
+   Hash Cond: (tbl.a = tbl2.c)
+   ->  Seq Scan on tbl  (cost=0.00..32.60 rows=2260 width=8)
+   ->  Hash  (cost=38.53..38.53 rows=22 width=4)
+         ->  HashAggregate  (cost=38.31..38.53 rows=22 width=4)
+               Group Key: tbl2.c
+               ->  Seq Scan on tbl2  (cost=0.00..38.25 rows=23 width=4)
+                     Filter: (d = ANY ('{40,50}'::integer[]))
+(8 rows)
+
+DROP TABLE tbl;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 2176a54..7f2edf8 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -132,31 +132,32 @@ select count(*) = 0 as ok from pg_stat_wal_receiver;
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
-              name              | setting 
---------------------------------+---------
- enable_async_append            | on
- enable_bitmapscan              | on
- enable_gathermerge             | on
- enable_group_by_reordering     | on
- enable_hashagg                 | on
- enable_hashjoin                | on
- enable_incremental_sort        | on
- enable_indexonlyscan           | on
- enable_indexscan               | on
- enable_material                | on
- enable_memoize                 | on
- enable_mergejoin               | on
- enable_nestloop                | on
- enable_parallel_append         | on
- enable_parallel_hash           | on
- enable_partition_pruning       | on
- enable_partitionwise_aggregate | off
- enable_partitionwise_join      | off
- enable_presorted_aggregate     | on
- enable_seqscan                 | on
- enable_sort                    | on
- enable_tidscan                 | on
-(22 rows)
+              name               | setting 
+---------------------------------+---------
+ enable_async_append             | on
+ enable_bitmapscan               | on
+ enable_convert_in_values_to_any | on
+ enable_gathermerge              | on
+ enable_group_by_reordering      | on
+ enable_hashagg                  | on
+ enable_hashjoin                 | on
+ enable_incremental_sort         | on
+ enable_indexonlyscan            | on
+ enable_indexscan                | on
+ enable_material                 | on
+ enable_memoize                  | on
+ enable_mergejoin                | on
+ enable_nestloop                 | on
+ enable_parallel_append          | on
+ enable_parallel_hash            | on
+ enable_partition_pruning        | on
+ enable_partitionwise_aggregate  | off
+ enable_partitionwise_join       | off
+ enable_presorted_aggregate      | on
+ enable_seqscan                  | on
+ enable_sort                     | on
+ enable_tidscan                  | on
+(23 rows)
 
 -- There are always wait event descriptions for various types.  InjectionPoint
 -- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index f53a526..99d6078 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -29,6 +29,11 @@ test: strings md5 numerology point lseg line box path polygon circle date time t
 # horology depends on date, time, timetz, timestamp, timestamptz, interval
 # ----------
 test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc database
+ 
+# ----------
+# Optimization tests
+# ----------
+test: in_values
 
 # ----------
 # Load huge amounts of data
diff --git a/src/test/regress/sql/in_values.sql b/src/test/regress/sql/in_values.sql
new file mode 100644
index 0000000..e9d2a52
--- /dev/null
+++ b/src/test/regress/sql/in_values.sql
@@ -0,0 +1,252 @@
+-- The `IN` operator in SQL is often used in conjunction with a `VALUES` expression
+-- to specify a set of values for filtering or matching. Below are various cases
+-- and examples where the `VALUES` expression is followed by the `IN` operator:
+
+
+--------------------------------------------------------------
+----- Basic Filtering with `IN` and `VALUES` -----------------
+--------------------------------------------------------------
+----- VALUES (int), (int), etc.
+--------------------------------------------------------------
+
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (2);
+
+SELECT * FROM tbl
+WHERE a IN VALUES (1), (2), (3);
+
+SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+
+SELECT * FROM tbl
+WHERE a IN (((((VALUES (1), (2), (3))))));
+
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+
+-- reference plan, without VALUES
+EXPLAIN SELECT * FROM tbl
+WHERE a IN ((1), (2), (3));
+
+DROP TABLE tbl;
+
+
+--------------------------------------------------------------
+----- Basic Filtering with `IN` and `VALUES` -----------------
+--------------------------------------------------------------
+----- Basic VALUES (int, int, etc.)
+--------------------------------------------------------------
+
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (2);
+
+SELECT * FROM tbl
+WHERE a IN VALUES (1, 2, 3);
+
+SELECT * FROM tbl
+WHERE a IN (VALUES (1, 2, 3));
+
+SELECT * FROM tbl
+WHERE a IN (((((VALUES (1, 2, 3))))));
+
+DROP TABLE tbl;
+
+
+--------------------------------------------------------------
+----- Enable/disable patch -----------------------------------
+--------------------------------------------------------------
+----- enable_convert_in_values_to_any. By default it's true.
+--------------------------------------------------------------
+
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (2);
+
+-- by default is ON
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+
+SET enable_convert_in_values_to_any TO OFF;
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+
+SET enable_convert_in_values_to_any TO ON;
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+
+DROP TABLE tbl;
+
+
+--------------------------------------------------------------
+----- Using VARCHAR type -------------------------------------
+--------------------------------------------------------------
+
+CREATE TABLE tbl (a VARCHAR);
+INSERT INTO tbl VALUES ('A');
+
+SELECT a FROM tbl
+WHERE a IN (VALUES ('A'), ('B'), ('C'));
+
+SELECT a FROM tbl
+WHERE a IN ((((VALUES ('A'), ('B'), ('C')))));
+
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES ('A'), ('B'), ('C'));
+
+DROP TABLE tbl;
+
+
+--------------------------------------------------------------
+----- Using custom TYPE
+--------------------------------------------------------------
+CREATE TYPE MY_TUPLE AS (
+    first INT,
+    second INT
+);
+
+CREATE TABLE tbl (a MY_TUPLE);
+INSERT INTO tbl (a) VALUES
+    (ROW(1, 10)),
+    (ROW(2, 20)),
+    (ROW(3, 30));
+SELECT * FROM tbl;
+
+EXPLAIN
+SELECT * FROM tbl
+WHERE a IN (ROW(1, 10), ROW(2, 20));
+
+EXPLAIN
+SELECT * FROM tbl
+WHERE a IN (VALUES (ROW(1, 10)), (ROW(2, 20)));
+
+SELECT * FROM tbl
+WHERE a IN (VALUES (ROW(1, 10)), (ROW(2, 20)));
+
+DROP TABLE tbl;
+DROP TYPE MY_TUPLE;
+
+--------------------------------------------------------------
+----- Using different types ----------------------------------
+--------------------------------------------------------------
+----- Must be errors
+--------------------------------------------------------------
+
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (1);
+
+SELECT a FROM tbl
+WHERE a IN ((1), ('B'), ('\x0f8d'::bytea));
+
+SELECT a FROM tbl
+WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea));
+
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea));
+
+DROP TABLE tbl;
+
+--------------------------------------------------------------
+----- Filtering with Subquery and `VALUES` -------------------
+--------------------------------------------------------------
+
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (2);
+
+SELECT * FROM tbl
+WHERE a IN (SELECT * FROM (VALUES (1), (2), (3)) AS foo);
+
+DROP TABLE tbl;
+
+
+--------------------------------------------------------------
+----- Combination with `EXISTS` ------------------------------
+--------------------------------------------------------------
+
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (2);
+
+SELECT * FROM tbl
+WHERE EXISTS (
+    SELECT 1 FROM (VALUES (1), (2), (3)) AS allowed_depts(dept_id)
+    WHERE a = allowed_depts.dept_id
+);
+
+DROP TABLE tbl;
+
+
+--------------------------------------------------------------
+----- Using `IN` with `VALUES` for multi-column matching. ----
+--------------------------------------------------------------
+
+CREATE TABLE tbl (a INT, b INT);
+INSERT INTO tbl VALUES (2, 20);
+
+SELECT * FROM tbl
+WHERE (a, b) IN (VALUES (1, 10), (1, 20), (2, 20), (3, 30));
+
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1, 10), (1, 20), (2, 20), (3, 30));
+
+DROP TABLE tbl;
+
+
+--------------------------------------------------------------
+----- `VALUES` in CTE and `IN` Clause ------------------------
+--------------------------------------------------------------
+
+CREATE TABLE tbl (a INT);
+INSERT INTO tbl VALUES (1), (2), (3), (4);
+
+WITH allowed_products AS (
+    VALUES (1), (2), (3)
+)
+SELECT * FROM tbl
+WHERE a IN (SELECT * FROM allowed_products);
+
+EXPLAIN WITH allowed_products AS (
+    VALUES (1), (2), (3)
+)
+SELECT * FROM tbl
+WHERE a IN (SELECT * FROM allowed_products);
+
+DROP TABLE tbl;
+
+
+--------------------------------------------------------------
+----- Dynamic Lists with `VALUES` and `IN` -------------------
+--------------------------------------------------------------
+
+CREATE TABLE tbl (a INT, b INT);
+INSERT INTO tbl VALUES (2, 20);
+
+SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (VALUES (1), (2), (3));
+
+DROP TABLE tbl;
+
+
+--------------------------------------------------------------
+----- `VALUES` with Nested `IN` Clauses ----------------------
+--------------------------------------------------------------
+
+CREATE TABLE tbl (a INT, b INT);
+INSERT INTO tbl VALUES (1, 10), (1, 20), (2, 20), (3, 30);
+
+CREATE TABLE tbl2 (c INT, d INT);
+INSERT INTO tbl2 VALUES (1, 30), (1, 40), (3, 50);
+
+SELECT * FROM tbl
+WHERE a IN (
+    SELECT c FROM tbl2
+    WHERE d IN (VALUES (40), (50))
+);
+
+EXPLAIN SELECT * FROM tbl
+WHERE a IN (
+    SELECT c FROM tbl2
+    WHERE d IN (VALUES (40), (50))
+);
+
+DROP TABLE tbl;
-- 
2.30.2

Replace IN VALUES with ANY in WHERE clauses during optimization

Reply via email to