Re: Reduce planning time for large NOT IN lists containing NULL

David Rowley Mon, 02 Mar 2026 17:08:50 -0800

On Tue, 3 Mar 2026 at 04:04, Ilia Evdokimov
<[email protected]> wrote:
> I've fixed this in v5-patch.


I had a look at this and wondered if we guarantee that no rows will
match, then why can't we perform constant folding on the
ScalarArrayOpExpr when !useOr and the array contains a NULL element
and the operator is strict. Seemingly, one of the reasons for that is
down to the expression returning NULL vs false. Take the following two
tests from expressions.out:

select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 2, null);
 ?column?
----------

(1 row)

select return_int_input(1) not in (10, 9, 2, 8, 3, 7, 4, 6, 5, 1, null);
 ?column?
----------
 f
(1 row)

Here we see that we return false when we find the left operand in the
array, but NULL when we don't find it and the array contains NULL. So,
unless the left operand is a const, we wouldn't know how to simplify
the ScalarArrayOpExpr during planning as the false or NULL would only
be known during evaluation of the expression.

However, when the expression being simplified is an EXPRKIND_QUAL, it
shouldn't matter if the result is false or NULL as both mean the same
and there shouldn't be any code that cares about the difference.
Currently, we don't pass the "kind" down into
eval_const_expressions(), but I don't really see why we couldn't. It
would be a fair bit of work figuring out with confidence what the
extra arg should be passed as in all the existing call sites of that
function. We'd have to document in the header comment for
eval_const_expressions() that constant-folding on EXPRKIND_QUAL
expressions can enable additional optimisations which disregard the
difference between NULL and false.

For the patch, I imagine it's still a useful optimisation as the
ScalarArrayOpExpr might not be in an EXPRKIND_QUAL.

There are a couple of things I don't like:

1) The new test is in expressions.sql. The comment at the top of that
file reads: "expression evaluation tests that don't fit into a more
specific file". The new test isn't anything to do with expression
evaluation. It's about planner estimation. I see that
misc_function.sql has the explain_mask_costs() function. I'm not sure
that's the right place either, as the usages of that function are for
testing SupportRequestRows prosupport functions. I wonder if we need a
dedicated row_estimate.sql or selectivity_est.sql file. The
explain_mask_costs() wouldn't be out of place if they were moved into
a new test like that. It was me that started putting those in
misc_function.sql, and I don't object to them being moved to a new
test. I'd be as a separate commit, however.

2) The new test creates a new table and inserts 1000 rows. There does
not seem to be anything special about the new table. Why don't you use
one of the ones from test_setup.sql?

3) Looking at var_eq_const(), it seems like it's coded to assume the
operator is always strict, per "If the constant is NULL, assume
operator is strict and return zero". If that's good enough for
var_eq_const(), then it should be good enough for the new code. I
think it would be good if you wrote that or something similar in the
new code so that the reader knows taking the short-circuit with
non-strict functions is on purpose.

David

Re: Reduce planning time for large NOT IN lists containing NULL

Reply via email to