Convert NOT IN sublinks to anti-joins when safe

The planner has historically been unable to convert "x NOT IN (SELECT
y ...)" sublinks into anti-joins.  This is because standard SQL
semantics for NOT IN require that if no comparison "x = y" returns
true but at least one returns NULL, the "NOT IN" expression evaluates
to NULL (treated as false in a WHERE clause), causing the row to be
discarded.  In contrast, an anti-join preserves the row if no match
is found.  Due to this semantic mismatch regarding NULL handling, the
conversion was previously considered unsafe.
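
The mismatch can be reproduced with a small script.  This is only an
illustrative sketch: it uses Python's bundled SQLite rather than
PostgreSQL (SQLite follows the same NULL rules for NOT IN), the table
names outer_t and inner_t are made up for the example, and NOT EXISTS
stands in for the anti-join form.

```python
import sqlite3

# In-memory database; outer_t and inner_t are hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_t (x INTEGER);
    CREATE TABLE inner_t (y INTEGER);
    INSERT INTO outer_t VALUES (1), (2), (3);
    INSERT INTO inner_t VALUES (2), (NULL);
""")

# NOT IN: the NULL in inner_t makes "x = y" unknown for every
# non-matching x, so the predicate is NULL and the row is dropped.
not_in_rows = [r[0] for r in conn.execute(
    "SELECT x FROM outer_t "
    "WHERE x NOT IN (SELECT y FROM inner_t) ORDER BY x")]

# Anti-join semantics (written as NOT EXISTS): a row is kept whenever
# no inner row produces a true match.
anti_join_rows = [r[0] for r in conn.execute(
    "SELECT x FROM outer_t o "
    "WHERE NOT EXISTS (SELECT 1 FROM inner_t i WHERE i.y = o.x) "
    "ORDER BY x")]

print(not_in_rows)     # []
print(anti_join_rows)  # [1, 3]
```

A single NULL in the subquery output is enough to make the NOT IN
form return no rows at all, while the anti-join form still returns
the unmatched rows.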

However, if we can prove that neither side of the comparison can yield
NULL values, and further that the operator itself cannot return NULL
for non-null inputs, the behavior of NOT IN and anti-join becomes
identical.  Enabling this conversion allows the planner to treat the
sublink as a first-class relation rather than an opaque SubPlan
filter.  This unlocks global join ordering optimization and permits
the selection of the most efficient join algorithm based on cost,
often yielding significant performance improvements for large
datasets.
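
The equivalence argument can be checked with a toy model of SQL's
three-valued logic.  The sketch below is not planner code: None
stands in for NULL, and strict_eq, odd_eq, not_in, filter_keeps and
anti_join_keeps are hypothetical helpers written for this example.
odd_eq is a deliberately ill-behaved operator that returns NULL for
some non-null inputs.

```python
from typing import Callable, Optional, Sequence

Tri = Optional[bool]  # True, False, or None standing in for SQL NULL


def strict_eq(a, b) -> Tri:
    """Equality that yields NULL only when an input is NULL."""
    if a is None or b is None:
        return None
    return a == b


def odd_eq(a, b) -> Tri:
    """Hypothetical operator returning NULL for some non-null inputs."""
    if a is None or b is None or abs(a - b) == 1:
        return None
    return a == b


def not_in(x, ys: Sequence, op: Callable = strict_eq) -> Tri:
    """Three-valued result of x NOT IN (ys)."""
    saw_null = False
    for y in ys:
        result = op(x, y)
        if result is True:
            return False
        if result is None:
            saw_null = True
    return None if saw_null else True


def filter_keeps(x, ys, op=strict_eq) -> bool:
    """A WHERE clause keeps the row only when NOT IN is true."""
    return not_in(x, ys, op) is True


def anti_join_keeps(x, ys, op=strict_eq) -> bool:
    """An anti-join keeps the row when no inner row matches."""
    return not any(op(x, y) is True for y in ys)


# Non-null operands and a well-behaved operator: always the same answer.
inner = [2, 4]
agree = all(filter_keeps(x, inner) == anti_join_keeps(x, inner)
            for x in range(6))

# A NULL on either side makes the two diverge.
null_inner = (filter_keeps(1, [2, None]), anti_join_keeps(1, [2, None]))
null_outer = (filter_keeps(None, [2]), anti_join_keeps(None, [2]))

# So does an operator that returns NULL for non-null inputs.
odd_op = (filter_keeps(1, [2], odd_eq), anti_join_keeps(1, [2], odd_eq))

print(agree, null_inner, null_outer, odd_op)
# True (False, True) (False, True) (False, True)
```

In this model the two forms disagree only in the three cases the
conversion has to rule out: a NULL from the subquery, a NULL on the
outer side, and an operator that yields NULL for non-null inputs.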

This patch verifies that neither side of the comparison can be NULL
and that the operator is safe regarding NULL results before performing
the conversion.

To verify operator safety, we require that the operator be a member of
a B-tree or Hash operator family.  This serves as a proxy for standard
boolean behavior, ensuring the operator does not return NULL on valid
non-null inputs, as doing so would break index integrity.

For operand non-nullability, this patch makes use of several existing
mechanisms.  It leverages the outer-join-aware-Var infrastructure to
verify that a Var does not come from the nullable side of an outer
join, and consults the NOT-NULL-attnums hash table to efficiently
verify schema-level NOT NULL constraints.  Additionally, it employs
find_nonnullable_vars to identify Vars forced non-nullable by qual
clauses, and expr_is_nonnullable to deduce non-nullability for other
expression types.
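
As a rough illustration of how such evidence might be combined, here
is a toy model.  It does not mirror the patch's actual data
structures or function signatures: the Var class,
var_is_provably_nonnull, and both lookup sets are hypothetical, and
the model assumes that qual_nonnull_vars lists only Vars forced
non-null by quals evaluated above any outer join that could null
them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Hypothetical stand-in for a column reference."""
    rel: str
    col: str
    # True if the Var comes from the nullable side of an outer join.
    nullable_by_outer_join: bool = False


def var_is_provably_nonnull(var, not_null_cols, qual_nonnull_vars) -> bool:
    """Return True only when the toy model can prove var is never NULL."""
    key = (var.rel, var.col)
    # A qual that rejects NULLs (e.g. "col IS NOT NULL") is sufficient
    # under the assumption stated above.
    if key in qual_nonnull_vars:
        return True
    # An outer join can null the Var regardless of column constraints.
    if var.nullable_by_outer_join:
        return False
    # Otherwise rely on a schema-level NOT NULL constraint.
    return key in not_null_cols


# Made-up catalog and qual information for the example.
not_null_cols = {("t1", "id"), ("t2", "id")}
qual_nonnull_vars = {("t2", "note")}

plain_constrained = var_is_provably_nonnull(
    Var("t1", "id"), not_null_cols, qual_nonnull_vars)
nulled_by_join = var_is_provably_nonnull(
    Var("t2", "id", nullable_by_outer_join=True),
    not_null_cols, qual_nonnull_vars)
forced_by_qual = var_is_provably_nonnull(
    Var("t2", "note"), not_null_cols, qual_nonnull_vars)
unconstrained = var_is_provably_nonnull(
    Var("t1", "note"), not_null_cols, qual_nonnull_vars)

print(plain_constrained, nulled_by_join, forced_by_qual, unconstrained)
# True False True False
```

The model is conservative: when nothing proves a Var non-null, the
answer is False, which corresponds to leaving the NOT IN sublink
unconverted.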

The logic for verifying the non-nullability of the subquery outputs
was adapted from prior work by David Rowley and Tom Lane.

Author: Richard Guo
Reviewed-by: wenhui qiu
Reviewed-by: Zhang Mingli
Reviewed-by: Japin Li
Discussion: https://postgr.es/m/CAMbWs495eF=-fsa5cwjs6b-baei3arp0unb4lt3ekgugzjw...@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/383eb21ebffe9ddd97dca03e529fa363580e7ccf

Modified Files
--------------
src/backend/optimizer/path/allpaths.c     |   5 +
src/backend/optimizer/plan/initsplan.c    |   4 +-
src/backend/optimizer/plan/subselect.c    | 159 ++++++++++-
src/backend/optimizer/prep/prepjointree.c |  72 ++++-
src/backend/optimizer/util/clauses.c      | 385 ++++++++++++++++++++++----
src/backend/optimizer/util/var.c          |  15 +-
src/backend/utils/adt/int8.c              |   2 +-
src/backend/utils/adt/ruleutils.c         |   3 +
src/backend/utils/cache/lsyscache.c       |  68 +++++
src/include/optimizer/clauses.h           |   1 +
src/include/optimizer/optimizer.h         |  13 +-
src/include/optimizer/subselect.h         |   1 +
src/include/utils/lsyscache.h             |   2 +
src/test/regress/expected/subselect.out   | 439 ++++++++++++++++++++++++++++++
src/test/regress/sql/subselect.sql        | 185 +++++++++++++
src/tools/pgindent/typedefs.list          |   1 +
16 files changed, 1280 insertions(+), 75 deletions(-)
