Re: Reduce planning time for large NOT IN lists containing NULL

Ilia Evdokimov Tue, 03 Mar 2026 13:56:16 -0800

On 3/3/26 04:08, David Rowley wrote:

I had a look at this and wondered if we guarantee that no rows will
match, then why can't we perform constant folding on the
ScalarArrayOpExpr when !useOr and the array contains a NULL element
and the operator is strict. Seemingly, one of the reasons for that is
down to the expression returning NULL vs false. Take the following two
tests from expressions.out:


select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 2, null);
  ?column?
----------

(1 row)

select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1, null);
  ?column?
----------
  f
(1 row)

Here we see that we return false when we find the left operand in the
array, but NULL when we don't find it and the array contains NULL. So,
unless the left operand is a const, we wouldn't know how to simplify
the ScalarArrayOpExpr during planning as the false or NULL would only
be known during evaluation of the expression.

However, when the expression being simplified is an EXPRKIND_QUAL, it
shouldn't matter if the result is false or NULL as both mean the same
and there shouldn't be any code that cares about the difference.
Currently, we don't pass the "kind" down into
eval_const_expressions(), but I don't really see why we couldn't. It
would be a fair bit of work figuring out with confidence what the
extra arg should be passed as in all the existing call sites of that
function. We'd have to document in the header comment for
eval_const_expressions() that constant-folding on EXPRKIND_QUAL
expressions can enable additional optimisations which disregard the
difference between NULL and false.

For the patch, I imagine it's still a useful optimisation as the
ScalarArrayOpExpr might not be in an EXPRKIND_QUAL.

I agree that this could be a useful optimization, at leastfor EXPRKIND_QUAL, where NULL and false are semantically equivalent.However, passing EXPRKIND through eval_const_expressions() would requircareful auditing of all call sites. If I explore this further and havesomething concrete, I'll start a separate thread on that topic.

There are a couple of things I don't like:

1) The new test is in expressions.sql. The comment at the top of that
file reads: "expression evaluation tests that don't fit into a more
specific file". The new test isn't anything to do with expression
evaluation. It's about planner estimation. I see that
misc_function.sql has the explain_mask_costs() function. I'm not sure
that's the right place either, as the usages of that function are for
testing SupportRequestRows prosupport functions. I wonder if we need a
dedicated row_estimate.sql or selectivity_est.sql file. The
explain_mask_costs() wouldn't be out of place if they were moved into
a new test like that. It was me that started putting those in
misc_function.sql, and I don't object to them being moved to a new
test. I'd be as a separate commit, however.

I've moved explain_mask_costs() and all related tests into a newregression test file selectivity_est.sql. This is done as separate patchv6-0001 containing only the test refactoring, with no behavioral changes.

2) The new test creates a new table and inserts 1000 rows. There does
not seem to be anything special about the new table. Why don't you use
one of the ones from test_setup.sql?

I've switched the test to use tenk1 from test_setup.sql instead ofcreating a new table.


3) Looking at var_eq_const(), it seems like it's coded to assume the
operator is always strict, per "If the constant is NULL, assume
operator is strict and return zero". If that's good enough for
var_eq_const(), then it should be good enough for the new code. I
think it would be good if you wrote that or something similar in the
new code so that the reader knows taking the short-circuit with
non-strict functions is on purpose.

The updated comments are included in the v6-0002, and the test is now inselectivity_est.sql


--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC,
https://tantorlabs.com/

From 6c287794c65276dbdce69771fde86e9b93267469 Mon Sep 17 00:00:00 2001
From: Evdokimov Ilia <[email protected]>
Date: Wed, 4 Mar 2026 00:35:56 +0300
Subject: [PATCH v6 2/2] Reduce planning time for large NOT IN lists containing
  NULL

For x <> ALL (...), the presence of a NULL makes the selectivity 0.0.

The planner currently still iterates over all elements and computes
per-element selectivity, even though the final result is known.

Add an early NULL check for constant arrays and immediately return
0.0 under ALL semantics.

This reduces planning time for large <> ALL lists without
changing semantics.
---
 src/backend/utils/adt/selfuncs.c              | 17 +++++++++++
 src/test/regress/expected/selectivity_est.out | 30 +++++++++++++++++++
 src/test/regress/sql/selectivity_est.sql      | 22 ++++++++++++++
 3 files changed, 69 insertions(+)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index dd7e11c0ca5..d93ce6e6ea9 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2018,6 +2018,15 @@ scalararraysel(PlannerInfo *root,
 		if (arrayisnull)		/* qual can't succeed if null array */
 			return (Selectivity) 0.0;
 		arrayval = DatumGetArrayTypeP(arraydatum);
+
+		/*
+		 * For ALL semantics, if the array contains NULL, assume operator is
+		 * strict. The ScalarArrayOpExpr cannot evaluate to TRUE, so return
+		 * zero.
+		 */
+		if (!useOr && array_contains_nulls(arrayval))
+			return (Selectivity) 0.0;
+
 		get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
 							 &elmlen, &elmbyval, &elmalign);
 		deconstruct_array(arrayval,
@@ -2115,6 +2124,14 @@ scalararraysel(PlannerInfo *root,
 			List	   *args;
 			Selectivity s2;
 
+			/*
+			 * For ALL semantics, if the array contains NULL, assume operator
+			 * is strict. The ScalarArrayOpExpr cannot evaluate to TRUE, so
+			 * return zero.
+			 */
+			if (!useOr && IsA(elem, Const) && ((Const *) elem)->constisnull)
+				return (Selectivity) 0.0;
+
 			/*
 			 * Theoretically, if elem isn't of nominal_element_type we should
 			 * insert a RelabelType, but it seems unlikely that any operator
diff --git a/src/test/regress/expected/selectivity_est.out b/src/test/regress/expected/selectivity_est.out
index 8fc5c9c9e07..d482f2ae7a0 100644
--- a/src/test/regress/expected/selectivity_est.out
+++ b/src/test/regress/expected/selectivity_est.out
@@ -175,4 +175,34 @@ false, true, false, true);
  Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
 (1 row)
 
+--
+-- Test <> ALL when array initially contained NULL but no longer does
+--
+CREATE FUNCTION replace_elem(arr int[], idx int, val int)
+RETURNS int[] AS $$
+BEGIN
+        arr[idx] := val;
+        RETURN arr;
+end;
+$$ language plpgsql IMMUTABLE;
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(ARRAY[1,99,3])',
+false, true, false, true );
+                 explain_mask_costs                 
+----------------------------------------------------
+ Seq Scan on tenk1  (cost=N..N rows=9997 width=N)
+   Filter: (unique1 <> ALL ('{1,99,3}'::integer[]))
+(2 rows)
+
+-- same array, constructed from an array with a NULL
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(replace_elem(ARRAY[1,NULL,3], 2, 99))',
+false, true, false, true );
+                 explain_mask_costs                 
+----------------------------------------------------
+ Seq Scan on tenk1  (cost=N..N rows=9997 width=N)
+   Filter: (unique1 <> ALL ('{1,99,3}'::integer[]))
+(2 rows)
+
+DROP FUNCTION replace_elem;
 DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
diff --git a/src/test/regress/sql/selectivity_est.sql b/src/test/regress/sql/selectivity_est.sql
index 416d5ea1f75..3ffec43907f 100644
--- a/src/test/regress/sql/selectivity_est.sql
+++ b/src/test/regress/sql/selectivity_est.sql
@@ -122,5 +122,27 @@ SELECT explain_mask_costs($$
 SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
 false, true, false, true);
 
+--
+-- Test <> ALL when array initially contained NULL but no longer does
+--
+CREATE FUNCTION replace_elem(arr int[], idx int, val int)
+RETURNS int[] AS $$
+BEGIN
+        arr[idx] := val;
+        RETURN arr;
+end;
+$$ language plpgsql IMMUTABLE;
+
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(ARRAY[1,99,3])',
+false, true, false, true );
+
+-- same array, constructed from an array with a NULL
+SELECT explain_mask_costs(
+    'SELECT * FROM tenk1 WHERE unique1 <> ALL(replace_elem(ARRAY[1,NULL,3], 2, 99))',
+false, true, false, true );
+
+DROP FUNCTION replace_elem;
+
 
 DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
-- 
2.34.1

From 055fc1b196ec8a0d2092c38f62fc983183429386 Mon Sep 17 00:00:00 2001
From: Evdokimov Ilia <[email protected]>
Date: Wed, 4 Mar 2026 00:23:19 +0300
Subject: [PATCH v6 1/2] Move planner row-estimation tests to selectivity.sql

Move explain_mask_costs() and the associated planner row-estimation
tests from misc_functions.sql to a new regression test file,
selectivity.sql.

The tests exercise SupportRequestRows support functions and other
planner selectivity estimation behavior, so they do not logically
belong in misc_functions.sql.
---
 src/test/regress/expected/misc_functions.out  | 178 ------------------
 src/test/regress/expected/selectivity_est.out | 178 ++++++++++++++++++
 src/test/regress/parallel_schedule            |   2 +-
 src/test/regress/sql/misc_functions.sql       | 125 ------------
 src/test/regress/sql/selectivity_est.sql      | 126 +++++++++++++
 5 files changed, 305 insertions(+), 304 deletions(-)
 create mode 100644 src/test/regress/expected/selectivity_est.out
 create mode 100644 src/test/regress/sql/selectivity_est.sql

diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out
index 6c03b1a79d7..cf55cdf3688 100644
--- a/src/test/regress/expected/misc_functions.out
+++ b/src/test/regress/expected/misc_functions.out
@@ -2,46 +2,6 @@
 \getenv libdir PG_LIBDIR
 \getenv dlsuffix PG_DLSUFFIX
 \set regresslib :libdir '/regress' :dlsuffix
--- Function to assist with verifying EXPLAIN which includes costs.  A series
--- of bool flags allows control over which portions are masked out
-CREATE FUNCTION explain_mask_costs(query text, do_analyze bool,
-    hide_costs bool, hide_row_est bool, hide_width bool) RETURNS setof text
-LANGUAGE plpgsql AS
-$$
-DECLARE
-    ln text;
-    analyze_str text;
-BEGIN
-    IF do_analyze = true THEN
-        analyze_str := 'on';
-    ELSE
-        analyze_str := 'off';
-    END IF;
-
-    -- avoid jit related output by disabling it
-    SET LOCAL jit = 0;
-
-    FOR ln IN
-        EXECUTE format('explain (analyze %s, costs on, summary off, timing off, buffers off) %s',
-            analyze_str, query)
-    LOOP
-        IF hide_costs = true THEN
-            ln := regexp_replace(ln, 'cost=\d+\.\d\d\.\.\d+\.\d\d', 'cost=N..N');
-        END IF;
-
-        IF hide_row_est = true THEN
-            -- don't use 'g' so that we leave the actual rows intact
-            ln := regexp_replace(ln, 'rows=\d+', 'rows=N');
-        END IF;
-
-        IF hide_width = true THEN
-            ln := regexp_replace(ln, 'width=\d+', 'width=N');
-        END IF;
-
-        RETURN NEXT ln;
-    END LOOP;
-END;
-$$;
 --
 -- num_nulls()
 --
@@ -671,143 +631,6 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
          Index Cond: (unique1 = g.g)
 (4 rows)
 
---
--- Test the SupportRequestRows support function for generate_series_timestamp()
---
--- Ensure the row estimate matches the actual rows
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
-(1 row)
-
--- As above but with generate_series_timestamp
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMP '2024-02-01', TIMESTAMP '2024-03-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
-(1 row)
-
--- As above but with generate_series_timestamptz_at_zone()
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day', 'UTC') g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
-(1 row)
-
--- Ensure the estimated and actual row counts match when the range isn't
--- evenly divisible by the step
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '7 day') g(s);$$,
-true, true, false, true);
-                                    explain_mask_costs                                     
--------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=5 width=N) (actual rows=5.00 loops=1)
-(1 row)
-
--- Ensure the estimates match when step is decreasing
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '-1 day') g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
-(1 row)
-
--- Ensure an empty range estimates 1 row
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-                                    explain_mask_costs                                     
--------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1 width=N) (actual rows=0.00 loops=1)
-(1 row)
-
--- Ensure we get the default row estimate for infinity values
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '-infinity', TIMESTAMPTZ 'infinity', INTERVAL '1 day') g(s);$$,
-false, true, false, true);
-                        explain_mask_costs                         
--------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
-(1 row)
-
--- Ensure the row estimate behaves correctly when step size is zero.
--- We expect generate_series_timestamp() to throw the error rather than in
--- the support function.
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
-ERROR:  step size cannot equal zero
---
--- Test the SupportRequestRows support function for generate_series_numeric()
---
--- Ensure the row estimate matches the actual rows
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25.00 loops=1)
-(1 row)
-
--- As above but with non-default step
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=13 width=N) (actual rows=13.00 loops=1)
-(1 row)
-
--- Ensure the estimates match when step is decreasing
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
-true, true, false, true);
-                                     explain_mask_costs                                      
----------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25.00 loops=1)
-(1 row)
-
--- Ensure an empty range estimates 1 row
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
-true, true, false, true);
-                                    explain_mask_costs                                     
--------------------------------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1 width=N) (actual rows=0.00 loops=1)
-(1 row)
-
--- Ensure we get the default row estimate for error cases (infinity/NaN values
--- and zero step size)
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
-false, true, false, true);
-                        explain_mask_costs                         
--------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
-(1 row)
-
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0, 'NaN'::NUMERIC) g(s);$$,
-false, true, false, true);
-                        explain_mask_costs                         
--------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
-(1 row)
-
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
-false, true, false, true);
-                        explain_mask_costs                         
--------------------------------------------------------------------
- Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
-(1 row)
-
 --
 -- Test SupportRequestInlineInFrom request
 --
@@ -970,7 +793,6 @@ SELECT pg_column_toast_chunk_id(a) IS NULL,
 (1 row)
 
 DROP TABLE test_chunk_id;
-DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
 -- test stratnum translation support functions
 SELECT gist_translate_cmptype_common(7);
  gist_translate_cmptype_common 
diff --git a/src/test/regress/expected/selectivity_est.out b/src/test/regress/expected/selectivity_est.out
new file mode 100644
index 00000000000..8fc5c9c9e07
--- /dev/null
+++ b/src/test/regress/expected/selectivity_est.out
@@ -0,0 +1,178 @@
+-- Function to assist with verifying EXPLAIN which includes costs.  A series
+-- of bool flags allows control over which portions are masked out
+CREATE FUNCTION explain_mask_costs(query text, do_analyze bool,
+    hide_costs bool, hide_row_est bool, hide_width bool) RETURNS setof text
+LANGUAGE plpgsql AS
+$$
+DECLARE
+    ln text;
+    analyze_str text;
+BEGIN
+    IF do_analyze = true THEN
+        analyze_str := 'on';
+    ELSE
+        analyze_str := 'off';
+    END IF;
+
+    -- avoid jit related output by disabling it
+    SET LOCAL jit = 0;
+
+    FOR ln IN
+        EXECUTE format('explain (analyze %s, costs on, summary off, timing off, buffers off) %s',
+            analyze_str, query)
+    LOOP
+        IF hide_costs = true THEN
+            ln := regexp_replace(ln, 'cost=\d+\.\d\d\.\.\d+\.\d\d', 'cost=N..N');
+        END IF;
+
+        IF hide_row_est = true THEN
+            -- don't use 'g' so that we leave the actual rows intact
+            ln := regexp_replace(ln, 'rows=\d+', 'rows=N');
+        END IF;
+
+        IF hide_width = true THEN
+            ln := regexp_replace(ln, 'width=\d+', 'width=N');
+        END IF;
+
+        RETURN NEXT ln;
+    END LOOP;
+END;
+$$;
+--
+-- Test the SupportRequestRows support function for generate_series_timestamp()
+--
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
+(1 row)
+
+-- As above but with generate_series_timestamp
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMP '2024-02-01', TIMESTAMP '2024-03-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
+(1 row)
+
+-- As above but with generate_series_timestamptz_at_zone()
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day', 'UTC') g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
+(1 row)
+
+-- Ensure the estimated and actual row counts match when the range isn't
+-- evenly divisible by the step
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '7 day') g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                     
+-------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=5 width=N) (actual rows=5.00 loops=1)
+(1 row)
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '-1 day') g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=30 width=N) (actual rows=30.00 loops=1)
+(1 row)
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                     
+-------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1 width=N) (actual rows=0.00 loops=1)
+(1 row)
+
+-- Ensure we get the default row estimate for infinity values
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '-infinity', TIMESTAMPTZ 'infinity', INTERVAL '1 day') g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+-- Ensure the row estimate behaves correctly when step size is zero.
+-- We expect generate_series_timestamp() to throw the error rather than in
+-- the support function.
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
+ERROR:  step size cannot equal zero
+--
+-- Test the SupportRequestRows support function for generate_series_numeric()
+--
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25.00 loops=1)
+(1 row)
+
+-- As above but with non-default step
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=13 width=N) (actual rows=13.00 loops=1)
+(1 row)
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
+true, true, false, true);
+                                     explain_mask_costs                                      
+---------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=25 width=N) (actual rows=25.00 loops=1)
+(1 row)
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
+true, true, false, true);
+                                    explain_mask_costs                                     
+-------------------------------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1 width=N) (actual rows=0.00 loops=1)
+(1 row)
+
+-- Ensure we get the default row estimate for error cases (infinity/NaN values
+-- and zero step size)
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 'NaN'::NUMERIC) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
+false, true, false, true);
+                        explain_mask_costs                         
+-------------------------------------------------------------------
+ Function Scan on generate_series g  (cost=N..N rows=1000 width=N)
+(1 row)
+
+DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 549e9b2d7be..524e600eb54 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # The stats test resets stats, so nothing else needing stats access can be in
 # this group.
 # ----------
-test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
+test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate selectivity_est
 
 # event_trigger depends on create_am and cannot run concurrently with
 # any test that runs DDL
diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql
index 35b7983996c..c8226652f2c 100644
--- a/src/test/regress/sql/misc_functions.sql
+++ b/src/test/regress/sql/misc_functions.sql
@@ -4,47 +4,6 @@
 
 \set regresslib :libdir '/regress' :dlsuffix
 
--- Function to assist with verifying EXPLAIN which includes costs.  A series
--- of bool flags allows control over which portions are masked out
-CREATE FUNCTION explain_mask_costs(query text, do_analyze bool,
-    hide_costs bool, hide_row_est bool, hide_width bool) RETURNS setof text
-LANGUAGE plpgsql AS
-$$
-DECLARE
-    ln text;
-    analyze_str text;
-BEGIN
-    IF do_analyze = true THEN
-        analyze_str := 'on';
-    ELSE
-        analyze_str := 'off';
-    END IF;
-
-    -- avoid jit related output by disabling it
-    SET LOCAL jit = 0;
-
-    FOR ln IN
-        EXECUTE format('explain (analyze %s, costs on, summary off, timing off, buffers off) %s',
-            analyze_str, query)
-    LOOP
-        IF hide_costs = true THEN
-            ln := regexp_replace(ln, 'cost=\d+\.\d\d\.\.\d+\.\d\d', 'cost=N..N');
-        END IF;
-
-        IF hide_row_est = true THEN
-            -- don't use 'g' so that we leave the actual rows intact
-            ln := regexp_replace(ln, 'rows=\d+', 'rows=N');
-        END IF;
-
-        IF hide_width = true THEN
-            ln := regexp_replace(ln, 'width=\d+', 'width=N');
-        END IF;
-
-        RETURN NEXT ln;
-    END LOOP;
-END;
-$$;
-
 --
 -- num_nulls()
 --
@@ -277,89 +236,6 @@ SELECT * FROM tenk1 a JOIN my_gen_series(1,1000) g ON a.unique1 = g;
 EXPLAIN (COSTS OFF)
 SELECT * FROM tenk1 a JOIN my_gen_series(1,10) g ON a.unique1 = g;
 
---
--- Test the SupportRequestRows support function for generate_series_timestamp()
---
-
--- Ensure the row estimate matches the actual rows
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-
--- As above but with generate_series_timestamp
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMP '2024-02-01', TIMESTAMP '2024-03-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-
--- As above but with generate_series_timestamptz_at_zone()
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day', 'UTC') g(s);$$,
-true, true, false, true);
-
--- Ensure the estimated and actual row counts match when the range isn't
--- evenly divisible by the step
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '7 day') g(s);$$,
-true, true, false, true);
-
--- Ensure the estimates match when step is decreasing
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '-1 day') g(s);$$,
-true, true, false, true);
-
--- Ensure an empty range estimates 1 row
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '1 day') g(s);$$,
-true, true, false, true);
-
--- Ensure we get the default row estimate for infinity values
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(TIMESTAMPTZ '-infinity', TIMESTAMPTZ 'infinity', INTERVAL '1 day') g(s);$$,
-false, true, false, true);
-
--- Ensure the row estimate behaves correctly when step size is zero.
--- We expect generate_series_timestamp() to throw the error rather than in
--- the support function.
-SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
-
---
--- Test the SupportRequestRows support function for generate_series_numeric()
---
-
--- Ensure the row estimate matches the actual rows
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
-true, true, false, true);
-
--- As above but with non-default step
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
-true, true, false, true);
-
--- Ensure the estimates match when step is decreasing
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
-true, true, false, true);
-
--- Ensure an empty range estimates 1 row
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
-true, true, false, true);
-
--- Ensure we get the default row estimate for error cases (infinity/NaN values
--- and zero step size)
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
-false, true, false, true);
-
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(1.0, 25.0, 'NaN'::NUMERIC) g(s);$$,
-false, true, false, true);
-
-SELECT explain_mask_costs($$
-SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
-false, true, false, true);
-
 --
 -- Test SupportRequestInlineInFrom request
 --
@@ -443,7 +319,6 @@ SELECT pg_column_toast_chunk_id(a) IS NULL,
   pg_column_toast_chunk_id(b) IN (SELECT chunk_id FROM pg_toast.:toastrel)
   FROM test_chunk_id;
 DROP TABLE test_chunk_id;
-DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
 
 -- test stratnum translation support functions
 SELECT gist_translate_cmptype_common(7);
diff --git a/src/test/regress/sql/selectivity_est.sql b/src/test/regress/sql/selectivity_est.sql
new file mode 100644
index 00000000000..416d5ea1f75
--- /dev/null
+++ b/src/test/regress/sql/selectivity_est.sql
@@ -0,0 +1,126 @@
+-- Function to assist with verifying EXPLAIN which includes costs.  A series
+-- of bool flags allows control over which portions are masked out
+CREATE FUNCTION explain_mask_costs(query text, do_analyze bool,
+    hide_costs bool, hide_row_est bool, hide_width bool) RETURNS setof text
+LANGUAGE plpgsql AS
+$$
+DECLARE
+    ln text;
+    analyze_str text;
+BEGIN
+    IF do_analyze = true THEN
+        analyze_str := 'on';
+    ELSE
+        analyze_str := 'off';
+    END IF;
+
+    -- avoid jit related output by disabling it
+    SET LOCAL jit = 0;
+
+    FOR ln IN
+        EXECUTE format('explain (analyze %s, costs on, summary off, timing off, buffers off) %s',
+            analyze_str, query)
+    LOOP
+        IF hide_costs = true THEN
+            ln := regexp_replace(ln, 'cost=\d+\.\d\d\.\.\d+\.\d\d', 'cost=N..N');
+        END IF;
+
+        IF hide_row_est = true THEN
+            -- don't use 'g' so that we leave the actual rows intact
+            ln := regexp_replace(ln, 'rows=\d+', 'rows=N');
+        END IF;
+
+        IF hide_width = true THEN
+            ln := regexp_replace(ln, 'width=\d+', 'width=N');
+        END IF;
+
+        RETURN NEXT ln;
+    END LOOP;
+END;
+$$;
+
+--
+-- Test the SupportRequestRows support function for generate_series_timestamp()
+--
+
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+
+-- As above but with generate_series_timestamp
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMP '2024-02-01', TIMESTAMP '2024-03-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+
+-- As above but with generate_series_timestamptz_at_zone()
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '1 day', 'UTC') g(s);$$,
+true, true, false, true);
+
+-- Ensure the estimated and actual row counts match when the range isn't
+-- evenly divisible by the step
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '7 day') g(s);$$,
+true, true, false, true);
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '-1 day') g(s);$$,
+true, true, false, true);
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-03-01', TIMESTAMPTZ '2024-02-01', INTERVAL '1 day') g(s);$$,
+true, true, false, true);
+
+-- Ensure we get the default row estimate for infinity values
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(TIMESTAMPTZ '-infinity', TIMESTAMPTZ 'infinity', INTERVAL '1 day') g(s);$$,
+false, true, false, true);
+
+-- Ensure the row estimate behaves correctly when step size is zero.
+-- We expect generate_series_timestamp() to throw the error rather than in
+-- the support function.
+SELECT * FROM generate_series(TIMESTAMPTZ '2024-02-01', TIMESTAMPTZ '2024-03-01', INTERVAL '0 day') g(s);
+
+--
+-- Test the SupportRequestRows support function for generate_series_numeric()
+--
+
+-- Ensure the row estimate matches the actual rows
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0) g(s);$$,
+true, true, false, true);
+
+-- As above but with non-default step
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 2.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure the estimates match when step is decreasing
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, -1.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure an empty range estimates 1 row
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 1.0, 1.0) g(s);$$,
+true, true, false, true);
+
+-- Ensure we get the default row estimate for error cases (infinity/NaN values
+-- and zero step size)
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series('-infinity'::NUMERIC, 'infinity'::NUMERIC, 1.0) g(s);$$,
+false, true, false, true);
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(1.0, 25.0, 'NaN'::NUMERIC) g(s);$$,
+false, true, false, true);
+
+SELECT explain_mask_costs($$
+SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
+false, true, false, true);
+
+
+DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
-- 
2.34.1

Re: Reduce planning time for large NOT IN lists containing NULL

Reply via email to