Convert NOT IN sublinks to anti-joins when safe

The planner has historically been unable to convert "x NOT IN (SELECT y ...)" sublinks into anti-joins. This is because standard SQL semantics for NOT IN require that if the comparison "x = y" returns NULL, the "NOT IN" expression evaluates to NULL (effectively false), causing the row to be discarded. In contrast, an anti-join preserves the row if no match is found. Due to this semantic mismatch regarding NULL handling, the conversion was previously considered unsafe.

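The mismatch described above can be reproduced with a small model. This is only an illustrative sketch, not planner code: Python's None stands in for SQL NULL, and the helper names (sql_eq, not_in, filter_not_in, anti_join) are hypothetical.

```python
def sql_eq(x, y):
    """Model of SQL '=': yields None (NULL) if either input is NULL."""
    if x is None or y is None:
        return None
    return x == y


def not_in(x, ys):
    """Three-valued 'x NOT IN (ys)': returns True, False, or None (NULL)."""
    saw_null = False
    for y in ys:
        result = sql_eq(x, y)
        if result is True:
            return False          # a definite match makes NOT IN false
        if result is None:
            saw_null = True       # an unknown comparison poisons the result
    return None if saw_null else True


def filter_not_in(outer, inner):
    """WHERE x NOT IN (...): a row survives only if the predicate is TRUE."""
    return [x for x in outer if not_in(x, inner) is True]


def anti_join(outer, inner):
    """Anti-join: a row survives unless some inner row compares TRUE."""
    return [x for x in outer if not any(sql_eq(x, y) is True for y in inner)]


# With a NULL on the inner side the two forms disagree.
print(filter_not_in([1, 2, None], [2, None]))  # []
print(anti_join([1, 2, None], [2, None]))      # [1, None]

# Without NULLs they return the same rows.
print(filter_not_in([1, 2], [2, 3]))           # [1]
print(anti_join([1, 2], [2, 3]))               # [1]
```

In the first pair of calls, the row with x = 1 is discarded by NOT IN because "1 = NULL" is NULL, while the anti-join keeps it because no inner row matched.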
However, if we can prove that neither side of the comparison can yield NULL values, and further that the operator itself cannot return NULL for non-null inputs, the behavior of NOT IN and anti-join becomes identical. Enabling this conversion allows the planner to treat the sublink as a first-class relation rather than an opaque SubPlan filter. This unlocks global join ordering optimization and permits the selection of the most efficient join algorithm based on cost, often yielding significant performance improvements for large datasets.

This patch verifies that neither side of the comparison can be NULL and that the operator is safe regarding NULL results before performing the conversion.

To verify operator safety, we require that the operator be a member of a B-tree or Hash operator family. This serves as a proxy for standard boolean behavior, ensuring the operator does not return NULL on valid non-null inputs, as doing so would break index integrity.

For operand non-nullability, this patch makes use of several existing mechanisms. It leverages the outer-join-aware-Var infrastructure to verify that a Var does not come from the nullable side of an outer join, and consults the NOT-NULL-attnums hash table to efficiently verify schema-level NOT NULL constraints. Additionally, it employs find_nonnullable_vars to identify Vars forced non-nullable by qual clauses, and expr_is_nonnullable to deduce non-nullability for other expression types.

The logic for verifying the non-nullability of the subquery outputs was adapted from prior work by David Rowley and Tom Lane.

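The equivalence claimed above (no NULL on either side, and an operator that never returns NULL for non-null inputs) can be checked by brute force over small inputs. The following is a hedged sketch under simplifying assumptions, not the patch's logic: None models SQL NULL, a single-column comparison is used, and all function names are hypothetical.

```python
from itertools import product


def eq(x, y):
    """Model of a strict '=' that returns None (NULL) only for NULL input."""
    return None if x is None or y is None else x == y


def not_in_rows(outer, inner):
    """Rows kept by 'x NOT IN (inner)': every comparison must be FALSE."""
    return [x for x in outer if all(eq(x, y) is False for y in inner)]


def anti_join_rows(outer, inner):
    """Rows kept by an anti-join: no comparison may be TRUE."""
    return [x for x in outer if not any(eq(x, y) is True for y in inner)]


def agree_on_all_inputs(domain, max_len=3):
    """Compare both forms on every outer/inner tuple up to max_len values."""
    tuples = [t for n in range(max_len + 1) for t in product(domain, repeat=n)]
    return all(
        not_in_rows(o, i) == anti_join_rows(o, i)
        for o in tuples
        for i in tuples
    )


print(agree_on_all_inputs((1, 2, 3)))     # True: no NULLs, results coincide
print(agree_on_all_inputs((1, 2, None)))  # False: a NULL breaks equivalence
```

When every comparison is either TRUE or FALSE, "all comparisons are FALSE" and "no comparison is TRUE" are the same condition, which is why the two forms agree once NULL is ruled out.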
Author: Richard Guo <[email protected]>
Reviewed-by: wenhui qiu <[email protected]>
Reviewed-by: Zhang Mingli <[email protected]>
Reviewed-by: Japin Li <[email protected]>
Discussion: https://postgr.es/m/CAMbWs495eF=-fsa5cwjs6b-baei3arp0unb4lt3ekgugzjw...@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/383eb21ebffe9ddd97dca03e529fa363580e7ccf

Modified Files
--------------
src/backend/optimizer/path/allpaths.c     |   5 +
src/backend/optimizer/plan/initsplan.c    |   4 +-
src/backend/optimizer/plan/subselect.c    | 159 ++++++++++-
src/backend/optimizer/prep/prepjointree.c |  72 ++++-
src/backend/optimizer/util/clauses.c      | 385 ++++++++++++++++++++++----
src/backend/optimizer/util/var.c          |  15 +-
src/backend/utils/adt/int8.c              |   2 +-
src/backend/utils/adt/ruleutils.c         |   3 +
src/backend/utils/cache/lsyscache.c       |  68 +++++
src/include/optimizer/clauses.h           |   1 +
src/include/optimizer/optimizer.h         |  13 +-
src/include/optimizer/subselect.h         |   1 +
src/include/utils/lsyscache.h             |   2 +
src/test/regress/expected/subselect.out   | 439 ++++++++++++++++++++++++++++++
src/test/regress/sql/subselect.sql        | 185 +++++++++++++
src/tools/pgindent/typedefs.list          |   1 +
16 files changed, 1280 insertions(+), 75 deletions(-)
