Hello, hackers! I with my friends propose the patch to replace IN VALUES to ANY in WHERE clauses.
# IntroThe `VALUES` in the `IN VALUES` construct is replaced with with an array of values when `VALUES` contains 1 column. In the end it will be replaced with ANY by the existing function makeA_Expr (src/backend/nodes/makefuncs.c)
This improves performance, especially if the values are small. # Patch v1-in_values_to_array.patch # How realized`VALUES` statement corresponds to `values_clause` nonterminal symbol in gram.y, where it's parsed to `SelectStmt` node.
`IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1 column, parser extracts data from `SelectStmt` and passes it
to function call `makeSimpleA_Expr` where simple `A_Expr` is created.Later during optimizations of parser tree this `A_Expr` will be transformed to `ArrayExpr` (already realized in Postgres)
# Authors. Author: Ivan Kush <ivan.k...@tantorlabs.com> Author: Vadim Yacenko <vadim.yace...@tantorlabs.com> Author: Alexander Simonov <alexander.simo...@tantorlabs.com> # TestsImplementation contains many regression tests of varying complexity, which check supported features.
# PlatformThis patch was checkouted from tag REL_17_STABLE. Code is developed in Linux, doesn't contain platfrom-specific code, only Postgres internal data structures and functions.
# Documentation Regression tests contain many examples # Performance It increases performance # Example Let's compare result. With path the execution time is significantly lower. We have a table table1 with 10000 rows. postgres=# \d table1; Table "public.table1" Column | Type | Collation | Nullable | Default --------+-----------------------------+-----------+----------+--------- fld1 | timestamp without time zone | | not null | fld2 | bytea | | not null | Indexes: "table1index" btree (fld2) Let's execute several commands see commands.sql Plan no patch see plan_no_patch.txt Plan with patch see plan_with_patch.txt -- Best wishes, Ivan Kush Tantor Labs LLC
commands.sql
Description: application/sql
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Right Join (cost=4229.78..4251.54 rows=3 width=65) (actual time=37.091..37.095 rows=0 loops=1) Hash Cond: (t7.fldtmp2 = t1.fld2) -> Seq Scan on tt4 t7 (cost=0.00..18.00 rows=1000 width=25) (never executed) -> Hash (cost=4229.74..4229.74 rows=3 width=57) (actual time=37.076..37.080 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Hash Right Join (cost=4207.98..4229.74 rows=3 width=57) (actual time=37.076..37.080 rows=0 loops=1) Hash Cond: (t6.fldtmp2 = t1.fld2) -> Seq Scan on tt6 t6 (cost=0.00..18.00 rows=1000 width=25) (never executed) -> Hash (cost=4207.94..4207.94 rows=3 width=49) (actual time=37.066..37.070 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Hash Semi Join (cost=122.05..4207.94 rows=3 width=49) (actual time=37.066..37.069 rows=0 loops=1) Hash Cond: (t1.fld2 = "*VALUES*".column1) -> Hash Left Join (cost=122.00..3945.36 rows=100000 width=49) (actual time=0.575..33.126 rows=101020 loops=1) Hash Cond: (t1.fld2 = t5.fldtmp2) -> Hash Left Join (cost=91.50..3401.02 rows=100000 width=41) (actual time=0.430..26.265 rows=100476 loops=1) Hash Cond: (t1.fld2 = t4.fldtmp2) -> Hash Left Join (cost=61.00..2856.68 rows=100000 width=33) (actual time=0.278..19.463 rows=100204 loops=1) Hash Cond: (t1.fld2 = t3.fldtmp2) -> Hash Left Join (cost=30.50..2312.34 rows=100000 width=25) (actual time=0.148..12.776 rows=100068 loops=1) Hash Cond: (t1.fld2 = t2.fldtmp2) -> Seq Scan on table1 t1 (cost=0.00..1768.00 rows=100000 width=17) (actual time=0.005..4.784 rows=100000 loops=1) -> Hash (cost=18.00..18.00 rows=1000 width=25) (actual time=0.131..0.132 rows=1000 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 64kB -> Seq Scan on tt1 t2 (cost=0.00..18.00 rows=1000 width=25) (actual time=0.004..0.046 rows=1000 loops=1) -> Hash (cost=18.00..18.00 rows=1000 width=25) (actual time=0.122..0.122 rows=1000 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 64kB -> Seq Scan on tt2 t3 (cost=0.00..18.00 rows=1000 width=25) (actual time=0.003..0.045 rows=1000 loops=1) -> Hash (cost=18.00..18.00 rows=1000 width=25) (actual time=0.141..0.141 rows=1000 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 64kB -> Seq Scan on tt3 t4 (cost=0.00..18.00 rows=1000 width=25) (actual time=0.002..0.047 rows=1000 loops=1) -> Hash (cost=18.00..18.00 rows=1000 width=25) (actual time=0.129..0.129 rows=1000 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 64kB -> Seq Scan on tt5 t5 (cost=0.00..18.00 rows=1000 width=25) (actual time=0.006..0.050 rows=1000 loops=1) -> Hash (cost=0.03..0.03 rows=2 width=32) (actual time=0.005..0.005 rows=2 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=32) (actual time=0.001..0.002 rows=2 loops=1) Planning Time: 0.588 ms Execution Time: 37.161 ms (38 rows)
QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------- Hash Right Join (cost=134.41..158.66 rows=3 width=65) (actual time=0.070..0.072 rows=0 loops=1) Hash Cond: (t7.fldtmp2 = t1.fld2) -> Seq Scan on tt4 t7 (cost=0.00..18.00 rows=1000 width=25) (never executed) -> Hash (cost=134.37..134.37 rows=3 width=57) (actual time=0.061..0.062 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Hash Right Join (cost=110.11..134.37 rows=3 width=57) (actual time=0.061..0.062 rows=0 loops=1) Hash Cond: (t6.fldtmp2 = t1.fld2) -> Seq Scan on tt6 t6 (cost=0.00..18.00 rows=1000 width=25) (never executed) -> Hash (cost=110.07..110.07 rows=3 width=49) (actual time=0.052..0.053 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Hash Right Join (cost=85.81..110.07 rows=3 width=49) (actual time=0.052..0.053 rows=0 loops=1) Hash Cond: (t5.fldtmp2 = t1.fld2) -> Seq Scan on tt5 t5 (cost=0.00..18.00 rows=1000 width=25) (never executed) -> Hash (cost=85.77..85.77 rows=3 width=41) (actual time=0.044..0.045 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Hash Right Join (cost=61.51..85.77 rows=3 width=41) (actual time=0.043..0.045 rows=0 loops=1) Hash Cond: (t4.fldtmp2 = t1.fld2) -> Seq Scan on tt3 t4 (cost=0.00..18.00 rows=1000 width=25) (never executed) -> Hash (cost=61.48..61.48 rows=3 width=33) (actual time=0.033..0.034 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Hash Right Join (cost=37.22..61.48 rows=3 width=33) (actual time=0.033..0.034 rows=0 loops=1) Hash Cond: (t3.fldtmp2 = t1.fld2) -> Seq Scan on tt2 t3 (cost=0.00..18.00 rows=1000 width=25) (never executed) -> Hash (cost=37.18..37.18 rows=3 width=25) (actual time=0.025..0.026 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Hash Right Join (cost=12.92..37.18 rows=3 width=25) (actual time=0.025..0.026 rows=0 loops=1) Hash Cond: (t2.fldtmp2 = t1.fld2) -> Seq Scan on tt1 t2 (cost=0.00..18.00 rows=1000 width=25) (never executed) -> Hash (cost=12.88..12.88 rows=3 width=17) (actual time=0.016..0.016 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Index Only Scan using table1index on table1 t1 (cost=0.42..12.88 rows=3 width=17) (actual time=0. 016..0.016 rows=0 loops=1) Index Cond: (fld2 = ANY ('{"\\x5c3233305c3231315c303330665c3333325c323631525c3333335c3032315c333 5365c3333377a355c3333365c3033325c333732","\\x5c3233355c323034205c3030345c3031375c3335335c3330315c3230305c3032315c3335356126647d5c3234355c333132"}'::bytea[])) Heap Fetches: 0 Planning Time: 0.564 ms Execution Time: 0.133 ms (35 rows)
From 805498afb64f092aad29e7b3734eecf266f98b16 Mon Sep 17 00:00:00 2001 From: Ivan Kush <ivan.k...@tantorlabs.com> Date: Thu, 3 Oct 2024 17:30:00 +0000 Subject: [PATCH] In VALUES to ANY The `VALUES` in the `IN VALUES` construct is replaced with with an array of values when `VALUES` contains 1 column. In the end it will be replaced with ANY by the existing function makeA_Expr (src/backend/nodes/makefuncs.c) This improves performance, especially if the values are small. How realized `VALUES` statement corresponds to `values_clause` nonterminal symbol in gram.y, where it's parsed to `SelectStmt` node. `IN` is parsed in `a_expr` symbol. When it contains `VALUES` with 1 column, parser extracts data from `SelectStmt` and passes it to function call `makeSimpleA_Expr` where simple `A_Expr` is created. Later during optimizations of parser tree this `A_Expr` will be transformed to `ArrayExpr` (already realized in Postgres) Author: Ivan Kush <ivan.k...@tantorlabs.com> Author: Vadim Yacenko <vadim.yace...@tantorlabs.com> Author: Alexander Simonov <alexander.simo...@tantorlabs.com> --- src/backend/nodes/Makefile | 1 + src/backend/nodes/makefuncs_2.c | 47 +++ src/backend/parser/gram.y | 22 +- src/backend/utils/misc/guc_tables.c | 11 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/nodes/makefuncs_2.h | 28 ++ src/test/regress/expected/create_view.out | 12 +- src/test/regress/expected/in_values.out | 344 ++++++++++++++++++ src/test/regress/expected/sysviews.out | 51 +-- src/test/regress/parallel_schedule | 5 + src/test/regress/sql/in_values.sql | 252 +++++++++++++ 11 files changed, 736 insertions(+), 38 deletions(-) create mode 100644 src/backend/nodes/makefuncs_2.c create mode 100644 src/include/nodes/makefuncs_2.h create mode 100644 src/test/regress/expected/in_values.out create mode 100644 src/test/regress/sql/in_values.sql diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile index 66bbad8..74ea3c2 100644 --- a/src/backend/nodes/Makefile +++ b/src/backend/nodes/Makefile @@ -21,6 +21,7 @@ OBJS = \ extensible.o \ list.o \ makefuncs.o \ + makefuncs_2.o \ multibitmapset.o \ nodeFuncs.o \ outfuncs.o \ diff --git a/src/backend/nodes/makefuncs_2.c b/src/backend/nodes/makefuncs_2.c new file mode 100644 index 0000000..f16c673 --- /dev/null +++ b/src/backend/nodes/makefuncs_2.c @@ -0,0 +1,47 @@ +#include "postgres.h" + +#include "catalog/pg_class.h" +#include "catalog/pg_type.h" +#include "nodes/makefuncs_2.h" +#include "nodes/nodeFuncs.h" +#include "nodes/nodes.h" +#include "nodes/pg_list.h" +#include "utils/errcodes.h" +#include "utils/lsyscache.h" + + +bool enable_convert_in_values_to_any = true; + +Node * +convert_in_values_clause_to_array(Node *lexpr, SelectStmt *stmt, int location) +{ + ListCell *lc; + List *vals = NIL; + List *sublist = NIL; + + if(!enable_convert_in_values_to_any) { + return NULL; + } + + if(!stmt->valuesLists) + { + return NULL; + } + sublist = (List *) linitial(stmt->valuesLists); + if(list_length(sublist) != 1) + { + return NULL; + } + + foreach(lc, stmt->valuesLists) + { + void *val; + sublist = (List *) lfirst(lc); + Assert(list_length(sublist) > 0); + val = linitial(sublist); + vals = (!vals) ? list_make1(val) : lappend(vals, val); + } + + // same func as in gram.y for IN with expr_list + return (Node *) makeSimpleA_Expr(AEXPR_IN, "=", lexpr, (Node *) vals, location); +} diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index bca627c..ae5f406 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -57,6 +57,7 @@ #include "commands/trigger.h" #include "gramparse.h" #include "nodes/makefuncs.h" +#include "nodes/makefuncs_2.h" #include "nodes/nodeFuncs.h" #include "parser/parser.h" #include "storage/lmgr.h" @@ -15105,13 +15106,20 @@ a_expr: c_expr { $$ = $1; } { /* generate foo = ANY (subquery) */ SubLink *n = (SubLink *) $3; - - n->subLinkType = ANY_SUBLINK; - n->subLinkId = 0; - n->testexpr = $1; - n->operName = NIL; /* show it's IN not = ANY */ - n->location = @2; - $$ = (Node *) n; + $$ = NULL; + if(IsA(n->subselect, SelectStmt)) + { + $$ = convert_in_values_clause_to_array($1, (SelectStmt *) n->subselect, @2); + } + if(!$$) + { + n->subLinkType = ANY_SUBLINK; + n->subLinkId = 0; + n->testexpr = $1; + n->operName = NIL; /* show it's IN not = ANY */ + n->location = @2; + $$ = (Node *) n; + } } else { diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index dc222d9..a808f6d 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -49,6 +49,7 @@ #include "libpq/auth.h" #include "libpq/libpq.h" #include "libpq/scram.h" +#include "nodes/makefuncs_2.h" #include "nodes/queryjumble.h" #include "optimizer/cost.h" #include "optimizer/geqo.h" @@ -790,6 +791,16 @@ struct config_bool ConfigureNamesBool[] = true, NULL, NULL, NULL }, + { + {"enable_convert_in_values_to_any", PGC_USERSET, QUERY_TUNING_METHOD, + gettext_noop("Enables the planner's optimization of converting IN VALUES clauses to ANY clauses."), + NULL, + GUC_EXPLAIN + }, + &enable_convert_in_values_to_any, + true, + NULL, NULL, NULL + }, { {"enable_indexonlyscan", PGC_USERSET, QUERY_TUNING_METHOD, gettext_noop("Enables the planner's use of index-only-scan plans."), diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 667e0dc..7b09000 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -396,6 +396,7 @@ #enable_gathermerge = on #enable_hashagg = on #enable_hashjoin = on +#enable_convert_in_values_to_any = on #enable_incremental_sort = on #enable_indexscan = on #enable_indexonlyscan = on diff --git a/src/include/nodes/makefuncs_2.h b/src/include/nodes/makefuncs_2.h new file mode 100644 index 0000000..f710cda --- /dev/null +++ b/src/include/nodes/makefuncs_2.h @@ -0,0 +1,28 @@ +/*------------------------------------------------------------------------- + * + * makefuncs_2.h + * prototypes for the creator functions of various nodes + * + * + * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/nodes/makefuncs_2.h + * + *------------------------------------------------------------------------- + */ +#ifndef MAKEFUNC_TANTOR_H +#define MAKEFUNC_TANTOR_H + +#include "nodes/execnodes.h" +#include "nodes/parsenodes.h" + +extern PGDLLIMPORT bool enable_convert_in_values_to_any; + +extern A_Expr *makeSimpleA_Expr(A_Expr_Kind kind, char *name, + Node *lexpr, Node *rexpr, int location); + +extern Node * +convert_in_values_clause_to_array(Node *lexpr, SelectStmt *stmt, int location); + +#endif /* MAKEFUNC_2_H */ diff --git a/src/test/regress/expected/create_view.out b/src/test/regress/expected/create_view.out index f551624..51f3e1f 100644 --- a/src/test/regress/expected/create_view.out +++ b/src/test/regress/expected/create_view.out @@ -1852,12 +1852,12 @@ select * from tt17v; (5 rows) select pg_get_viewdef('tt17v', true); - pg_get_viewdef ---------------------------------------------- - SELECT q1, + - q2 + - FROM int8_tbl i + - WHERE (i.* IN ( VALUES (i.*::int8_tbl))); + pg_get_viewdef +-------------------- + SELECT q1, + + q2 + + FROM int8_tbl i+ + WHERE i.* = i.*; (1 row) select * from int8_tbl i where i.* in (values(i.*::int8_tbl)); diff --git a/src/test/regress/expected/in_values.out b/src/test/regress/expected/in_values.out new file mode 100644 index 0000000..9760423 --- /dev/null +++ b/src/test/regress/expected/in_values.out @@ -0,0 +1,344 @@ +-- The `IN` operator in SQL is often used in conjunction with a `VALUES` expression +-- to specify a set of values for filtering or matching. Below are various cases +-- and examples where the `VALUES` expression is followed by the `IN` operator: +-------------------------------------------------------------- +----- Basic Filtering with `IN` and `VALUES` ----------------- +-------------------------------------------------------------- +----- VALUES (int), (int), etc. +-------------------------------------------------------------- +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (2); +SELECT * FROM tbl +WHERE a IN VALUES (1), (2), (3); +ERROR: syntax error at or near "VALUES" +LINE 2: WHERE a IN VALUES (1), (2), (3); + ^ +SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + a +--- + 2 +(1 row) + +SELECT * FROM tbl +WHERE a IN (((((VALUES (1), (2), (3)))))); + a +--- + 2 +(1 row) + +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + QUERY PLAN +----------------------------------------------------- + Seq Scan on tbl (cost=0.00..45.06 rows=38 width=4) + Filter: (a = ANY ('{1,2,3}'::integer[])) +(2 rows) + +-- reference plan, without VALUES +EXPLAIN SELECT * FROM tbl +WHERE a IN ((1), (2), (3)); + QUERY PLAN +----------------------------------------------------- + Seq Scan on tbl (cost=0.00..45.06 rows=38 width=4) + Filter: (a = ANY ('{1,2,3}'::integer[])) +(2 rows) + +DROP TABLE tbl; +-------------------------------------------------------------- +----- Basic Filtering with `IN` and `VALUES` ----------------- +-------------------------------------------------------------- +----- Basic VALUES (int, int, etc.) +-------------------------------------------------------------- +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (2); +SELECT * FROM tbl +WHERE a IN VALUES (1, 2, 3); +ERROR: syntax error at or near "VALUES" +LINE 2: WHERE a IN VALUES (1, 2, 3); + ^ +SELECT * FROM tbl +WHERE a IN (VALUES (1, 2, 3)); +ERROR: subquery has too many columns +LINE 2: WHERE a IN (VALUES (1, 2, 3)); + ^ +SELECT * FROM tbl +WHERE a IN (((((VALUES (1, 2, 3)))))); +ERROR: subquery has too many columns +LINE 2: WHERE a IN (((((VALUES (1, 2, 3)))))); + ^ +DROP TABLE tbl; +-------------------------------------------------------------- +----- Enable/disable patch ----------------------------------- +-------------------------------------------------------------- +----- enable_convert_in_values_to_any. By default it's true. +-------------------------------------------------------------- +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (2); +-- by default is ON +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + QUERY PLAN +----------------------------------------------------- + Seq Scan on tbl (cost=0.00..45.06 rows=38 width=4) + Filter: (a = ANY ('{1,2,3}'::integer[])) +(2 rows) + +SET enable_convert_in_values_to_any TO OFF; +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + QUERY PLAN +------------------------------------------------------------------------- + Hash Semi Join (cost=0.08..42.69 rows=38 width=4) + Hash Cond: (tbl.a = "*VALUES*".column1) + -> Seq Scan on tbl (cost=0.00..35.50 rows=2550 width=4) + -> Hash (cost=0.04..0.04 rows=3 width=4) + -> Values Scan on "*VALUES*" (cost=0.00..0.04 rows=3 width=4) +(5 rows) + +SET enable_convert_in_values_to_any TO ON; +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + QUERY PLAN +----------------------------------------------------- + Seq Scan on tbl (cost=0.00..45.06 rows=38 width=4) + Filter: (a = ANY ('{1,2,3}'::integer[])) +(2 rows) + +DROP TABLE tbl; +-------------------------------------------------------------- +----- Using VARCHAR type ------------------------------------- +-------------------------------------------------------------- +CREATE TABLE tbl (a VARCHAR); +INSERT INTO tbl VALUES ('A'); +SELECT a FROM tbl +WHERE a IN (VALUES ('A'), ('B'), ('C')); + a +--- + A +(1 row) + +SELECT a FROM tbl +WHERE a IN ((((VALUES ('A'), ('B'), ('C'))))); + a +--- + A +(1 row) + +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES ('A'), ('B'), ('C')); + QUERY PLAN +------------------------------------------------------ + Seq Scan on tbl (cost=0.00..28.70 rows=20 width=32) + Filter: ((a)::text = ANY ('{A,B,C}'::text[])) +(2 rows) + +DROP TABLE tbl; +-------------------------------------------------------------- +----- Using custom TYPE +-------------------------------------------------------------- +CREATE TYPE MY_TUPLE AS ( + first INT, + second INT +); +CREATE TABLE tbl (a MY_TUPLE); +INSERT INTO tbl (a) VALUES + (ROW(1, 10)), + (ROW(2, 20)), + (ROW(3, 30)); +SELECT * FROM tbl; + a +-------- + (1,10) + (2,20) + (3,30) +(3 rows) + +EXPLAIN +SELECT * FROM tbl +WHERE a IN (ROW(1, 10), ROW(2, 20)); + QUERY PLAN +-------------------------------------------------------------- + Seq Scan on tbl (cost=0.00..30.40 rows=14 width=32) + Filter: ((a = '(1,10)'::record) OR (a = '(2,20)'::record)) +(2 rows) + +EXPLAIN +SELECT * FROM tbl +WHERE a IN (VALUES (ROW(1, 10)), (ROW(2, 20))); + QUERY PLAN +-------------------------------------------------------------- + Seq Scan on tbl (cost=0.00..30.40 rows=14 width=32) + Filter: ((a = '(1,10)'::record) OR (a = '(2,20)'::record)) +(2 rows) + +SELECT * FROM tbl +WHERE a IN (VALUES (ROW(1, 10)), (ROW(2, 20))); + a +-------- + (1,10) + (2,20) +(2 rows) + +DROP TABLE tbl; +DROP TYPE MY_TUPLE; +-------------------------------------------------------------- +----- Using different types ---------------------------------- +-------------------------------------------------------------- +----- Must be errors +-------------------------------------------------------------- +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (1); +SELECT a FROM tbl +WHERE a IN ((1), ('B'), ('\x0f8d'::bytea)); +ERROR: invalid input syntax for type integer: "B" +LINE 2: WHERE a IN ((1), ('B'), ('\x0f8d'::bytea)); + ^ +SELECT a FROM tbl +WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea)); +ERROR: invalid input syntax for type integer: "B" +LINE 2: WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea)); + ^ +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea)); +ERROR: invalid input syntax for type integer: "B" +LINE 2: WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea)); + ^ +DROP TABLE tbl; +-------------------------------------------------------------- +----- Filtering with Subquery and `VALUES` ------------------- +-------------------------------------------------------------- +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (2); +SELECT * FROM tbl +WHERE a IN (SELECT * FROM (VALUES (1), (2), (3)) AS foo); + a +--- + 2 +(1 row) + +DROP TABLE tbl; +-------------------------------------------------------------- +----- Combination with `EXISTS` ------------------------------ +-------------------------------------------------------------- +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (2); +SELECT * FROM tbl +WHERE EXISTS ( + SELECT 1 FROM (VALUES (1), (2), (3)) AS allowed_depts(dept_id) + WHERE a = allowed_depts.dept_id +); + a +--- + 2 +(1 row) + +DROP TABLE tbl; +-------------------------------------------------------------- +----- Using `IN` with `VALUES` for multi-column matching. ---- +-------------------------------------------------------------- +CREATE TABLE tbl (a INT, b INT); +INSERT INTO tbl VALUES (2, 20); +SELECT * FROM tbl +WHERE (a, b) IN (VALUES (1, 10), (1, 20), (2, 20), (3, 30)); + a | b +---+---- + 2 | 20 +(1 row) + +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1, 10), (1, 20), (2, 20), (3, 30)); +ERROR: subquery has too many columns +LINE 2: WHERE a IN (VALUES (1, 10), (1, 20), (2, 20), (3, 30)); + ^ +DROP TABLE tbl; +-------------------------------------------------------------- +----- `VALUES` in CTE and `IN` Clause ------------------------ +-------------------------------------------------------------- +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (1), (2), (3), (4); +WITH allowed_products AS ( + VALUES (1), (2), (3) +) +SELECT * FROM tbl +WHERE a IN (SELECT * FROM allowed_products); + a +--- + 1 + 2 + 3 +(3 rows) + +EXPLAIN WITH allowed_products AS ( + VALUES (1), (2), (3) +) +SELECT * FROM tbl +WHERE a IN (SELECT * FROM allowed_products); + QUERY PLAN +------------------------------------------------------------------------- + Hash Semi Join (cost=0.08..42.69 rows=38 width=4) + Hash Cond: (tbl.a = "*VALUES*".column1) + -> Seq Scan on tbl (cost=0.00..35.50 rows=2550 width=4) + -> Hash (cost=0.04..0.04 rows=3 width=4) + -> Values Scan on "*VALUES*" (cost=0.00..0.04 rows=3 width=4) +(5 rows) + +DROP TABLE tbl; +-------------------------------------------------------------- +----- Dynamic Lists with `VALUES` and `IN` ------------------- +-------------------------------------------------------------- +CREATE TABLE tbl (a INT, b INT); +INSERT INTO tbl VALUES (2, 20); +SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + a | b +---+---- + 2 | 20 +(1 row) + +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + QUERY PLAN +----------------------------------------------------- + Seq Scan on tbl (cost=0.00..41.08 rows=34 width=8) + Filter: (a = ANY ('{1,2,3}'::integer[])) +(2 rows) + +DROP TABLE tbl; +-------------------------------------------------------------- +----- `VALUES` with Nested `IN` Clauses ---------------------- +-------------------------------------------------------------- +CREATE TABLE tbl (a INT, b INT); +INSERT INTO tbl VALUES (1, 10), (1, 20), (2, 20), (3, 30); +CREATE TABLE tbl2 (c INT, d INT); +INSERT INTO tbl2 VALUES (1, 30), (1, 40), (3, 50); +SELECT * FROM tbl +WHERE a IN ( + SELECT c FROM tbl2 + WHERE d IN (VALUES (40), (50)) +); + a | b +---+---- + 1 | 10 + 1 | 20 + 3 | 30 +(3 rows) + +EXPLAIN SELECT * FROM tbl +WHERE a IN ( + SELECT c FROM tbl2 + WHERE d IN (VALUES (40), (50)) +); + QUERY PLAN +------------------------------------------------------------------------ + Hash Join (cost=38.80..80.23 rows=260 width=8) + Hash Cond: (tbl.a = tbl2.c) + -> Seq Scan on tbl (cost=0.00..32.60 rows=2260 width=8) + -> Hash (cost=38.53..38.53 rows=22 width=4) + -> HashAggregate (cost=38.31..38.53 rows=22 width=4) + Group Key: tbl2.c + -> Seq Scan on tbl2 (cost=0.00..38.25 rows=23 width=4) + Filter: (d = ANY ('{40,50}'::integer[])) +(8 rows) + +DROP TABLE tbl; diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out index 2176a54..7f2edf8 100644 --- a/src/test/regress/expected/sysviews.out +++ b/src/test/regress/expected/sysviews.out @@ -132,31 +132,32 @@ select count(*) = 0 as ok from pg_stat_wal_receiver; -- This is to record the prevailing planner enable_foo settings during -- a regression test run. select name, setting from pg_settings where name like 'enable%'; - name | setting ---------------------------------+--------- - enable_async_append | on - enable_bitmapscan | on - enable_gathermerge | on - enable_group_by_reordering | on - enable_hashagg | on - enable_hashjoin | on - enable_incremental_sort | on - enable_indexonlyscan | on - enable_indexscan | on - enable_material | on - enable_memoize | on - enable_mergejoin | on - enable_nestloop | on - enable_parallel_append | on - enable_parallel_hash | on - enable_partition_pruning | on - enable_partitionwise_aggregate | off - enable_partitionwise_join | off - enable_presorted_aggregate | on - enable_seqscan | on - enable_sort | on - enable_tidscan | on -(22 rows) + name | setting +---------------------------------+--------- + enable_async_append | on + enable_bitmapscan | on + enable_convert_in_values_to_any | on + enable_gathermerge | on + enable_group_by_reordering | on + enable_hashagg | on + enable_hashjoin | on + enable_incremental_sort | on + enable_indexonlyscan | on + enable_indexscan | on + enable_material | on + enable_memoize | on + enable_mergejoin | on + enable_nestloop | on + enable_parallel_append | on + enable_parallel_hash | on + enable_partition_pruning | on + enable_partitionwise_aggregate | off + enable_partitionwise_join | off + enable_presorted_aggregate | on + enable_seqscan | on + enable_sort | on + enable_tidscan | on +(23 rows) -- There are always wait event descriptions for various types. InjectionPoint -- may be present or absent, depending on history since last postmaster start. diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index f53a526..99d6078 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -29,6 +29,11 @@ test: strings md5 numerology point lseg line box path polygon circle date time t # horology depends on date, time, timetz, timestamp, timestamptz, interval # ---------- test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc database + +# ---------- +# Optimization tests +# ---------- +test: in_values # ---------- # Load huge amounts of data diff --git a/src/test/regress/sql/in_values.sql b/src/test/regress/sql/in_values.sql new file mode 100644 index 0000000..e9d2a52 --- /dev/null +++ b/src/test/regress/sql/in_values.sql @@ -0,0 +1,252 @@ +-- The `IN` operator in SQL is often used in conjunction with a `VALUES` expression +-- to specify a set of values for filtering or matching. Below are various cases +-- and examples where the `VALUES` expression is followed by the `IN` operator: + + +-------------------------------------------------------------- +----- Basic Filtering with `IN` and `VALUES` ----------------- +-------------------------------------------------------------- +----- VALUES (int), (int), etc. +-------------------------------------------------------------- + +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (2); + +SELECT * FROM tbl +WHERE a IN VALUES (1), (2), (3); + +SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + +SELECT * FROM tbl +WHERE a IN (((((VALUES (1), (2), (3)))))); + +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + +-- reference plan, without VALUES +EXPLAIN SELECT * FROM tbl +WHERE a IN ((1), (2), (3)); + +DROP TABLE tbl; + + +-------------------------------------------------------------- +----- Basic Filtering with `IN` and `VALUES` ----------------- +-------------------------------------------------------------- +----- Basic VALUES (int, int, etc.) +-------------------------------------------------------------- + +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (2); + +SELECT * FROM tbl +WHERE a IN VALUES (1, 2, 3); + +SELECT * FROM tbl +WHERE a IN (VALUES (1, 2, 3)); + +SELECT * FROM tbl +WHERE a IN (((((VALUES (1, 2, 3)))))); + +DROP TABLE tbl; + + +-------------------------------------------------------------- +----- Enable/disable patch ----------------------------------- +-------------------------------------------------------------- +----- enable_convert_in_values_to_any. By default it's true. +-------------------------------------------------------------- + +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (2); + +-- by default is ON +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + +SET enable_convert_in_values_to_any TO OFF; +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + +SET enable_convert_in_values_to_any TO ON; +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + +DROP TABLE tbl; + + +-------------------------------------------------------------- +----- Using VARCHAR type ------------------------------------- +-------------------------------------------------------------- + +CREATE TABLE tbl (a VARCHAR); +INSERT INTO tbl VALUES ('A'); + +SELECT a FROM tbl +WHERE a IN (VALUES ('A'), ('B'), ('C')); + +SELECT a FROM tbl +WHERE a IN ((((VALUES ('A'), ('B'), ('C'))))); + +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES ('A'), ('B'), ('C')); + +DROP TABLE tbl; + + +-------------------------------------------------------------- +----- Using custom TYPE +-------------------------------------------------------------- +CREATE TYPE MY_TUPLE AS ( + first INT, + second INT +); + +CREATE TABLE tbl (a MY_TUPLE); +INSERT INTO tbl (a) VALUES + (ROW(1, 10)), + (ROW(2, 20)), + (ROW(3, 30)); +SELECT * FROM tbl; + +EXPLAIN +SELECT * FROM tbl +WHERE a IN (ROW(1, 10), ROW(2, 20)); + +EXPLAIN +SELECT * FROM tbl +WHERE a IN (VALUES (ROW(1, 10)), (ROW(2, 20))); + +SELECT * FROM tbl +WHERE a IN (VALUES (ROW(1, 10)), (ROW(2, 20))); + +DROP TABLE tbl; +DROP TYPE MY_TUPLE; + +-------------------------------------------------------------- +----- Using different types ---------------------------------- +-------------------------------------------------------------- +----- Must be errors +-------------------------------------------------------------- + +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (1); + +SELECT a FROM tbl +WHERE a IN ((1), ('B'), ('\x0f8d'::bytea)); + +SELECT a FROM tbl +WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea)); + +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), ('B'), ('\x0f8d'::bytea)); + +DROP TABLE tbl; + +-------------------------------------------------------------- +----- Filtering with Subquery and `VALUES` ------------------- +-------------------------------------------------------------- + +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (2); + +SELECT * FROM tbl +WHERE a IN (SELECT * FROM (VALUES (1), (2), (3)) AS foo); + +DROP TABLE tbl; + + +-------------------------------------------------------------- +----- Combination with `EXISTS` ------------------------------ +-------------------------------------------------------------- + +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (2); + +SELECT * FROM tbl +WHERE EXISTS ( + SELECT 1 FROM (VALUES (1), (2), (3)) AS allowed_depts(dept_id) + WHERE a = allowed_depts.dept_id +); + +DROP TABLE tbl; + + +-------------------------------------------------------------- +----- Using `IN` with `VALUES` for multi-column matching. ---- +-------------------------------------------------------------- + +CREATE TABLE tbl (a INT, b INT); +INSERT INTO tbl VALUES (2, 20); + +SELECT * FROM tbl +WHERE (a, b) IN (VALUES (1, 10), (1, 20), (2, 20), (3, 30)); + +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1, 10), (1, 20), (2, 20), (3, 30)); + +DROP TABLE tbl; + + +-------------------------------------------------------------- +----- `VALUES` in CTE and `IN` Clause ------------------------ +-------------------------------------------------------------- + +CREATE TABLE tbl (a INT); +INSERT INTO tbl VALUES (1), (2), (3), (4); + +WITH allowed_products AS ( + VALUES (1), (2), (3) +) +SELECT * FROM tbl +WHERE a IN (SELECT * FROM allowed_products); + +EXPLAIN WITH allowed_products AS ( + VALUES (1), (2), (3) +) +SELECT * FROM tbl +WHERE a IN (SELECT * FROM allowed_products); + +DROP TABLE tbl; + + +-------------------------------------------------------------- +----- Dynamic Lists with `VALUES` and `IN` ------------------- +-------------------------------------------------------------- + +CREATE TABLE tbl (a INT, b INT); +INSERT INTO tbl VALUES (2, 20); + +SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + +EXPLAIN SELECT * FROM tbl +WHERE a IN (VALUES (1), (2), (3)); + +DROP TABLE tbl; + + +-------------------------------------------------------------- +----- `VALUES` with Nested `IN` Clauses ---------------------- +-------------------------------------------------------------- + +CREATE TABLE tbl (a INT, b INT); +INSERT INTO tbl VALUES (1, 10), (1, 20), (2, 20), (3, 30); + +CREATE TABLE tbl2 (c INT, d INT); +INSERT INTO tbl2 VALUES (1, 30), (1, 40), (3, 50); + +SELECT * FROM tbl +WHERE a IN ( + SELECT c FROM tbl2 + WHERE d IN (VALUES (40), (50)) +); + +EXPLAIN SELECT * FROM tbl +WHERE a IN ( + SELECT c FROM tbl2 + WHERE d IN (VALUES (40), (50)) +); + +DROP TABLE tbl; -- 2.30.2