Hi, On Feb 3, 2026 at 15:13 +0800, Richard Guo <[email protected]>, wrote: > > This topic has been discussed several times in the past. Due to the > semantic mismatch regarding NULL handling, NOT IN is not ordinarily > equivalent to an anti-join. However, if we can prove that neither the > outer expressions nor the subquery outputs can yield NULL values, it > should be safe to convert NOT IN to an anti-join.
Thanks for working on this! I've reviewed the patch and it looks good overall. I noticed several minor issues in the test case comments: 1. The comment doesn't match the SQL: +-- ANTI JOIN: outer side is defined NOT NULL, inner side is forced nonnullable +-- by qual clause +SELECT * FROM not_null_tab +WHERE id NOT IN ( + SELECT t1.id + FROM null_tab t1 + INNER JOIN null_tab t2 ON t1.id = t2.id + LEFT JOIN null_tab t3 ON TRUE +); The comment says "forced nonnullable by qual clause", but there's no explicit IS NOT NULL qual here. 2. Another test case that could use a more precise comment: +-- ANTI JOIN: outer side is defined NOT NULL and is not nulled by outer join, +-- inner side is defined NOT NULL +SELECT * FROM not_null_tab t1 +LEFT JOIN not_null_tab t2 +ON t2.id NOT IN (SELECT id FROM not_null_tab); Correct me if I’m wrong. This is a subtle case - the key point is that the ON clause is evaluated on actual t2 rows *before* LEFT JOIN's null-padding. The current comment is technically correct but might be clearer as: -- ANTI JOIN: outer side(t2) is defined NOT NULL. — ON clause is evaluated on actual t2 rows before LEFT JOIN's -- null-padding, so t2.id is NOT NULL; inner side is also defined NOT NULL 3.Also, one suggestion for additional test coverage - the case where the subquery output comes from the nullable side of an outer join but is forced non-nullable by qual: -- ANTI JOIN: inner side comes from nullable side of outer join -- but is forced non-nullable by WHERE clause EXPLAIN (COSTS OFF) SELECT * FROM not_null_tab WHERE id NOT IN ( SELECT t2.id FROM not_null_tab t1 LEFT JOIN not_null_tab t2 ON t1.id = t2.id WHERE t2.id IS NOT NULL ); The existing tests cover t1.id (non-nullable side) with IS NOT NULL, but not t2.id (nullable side). If I read the code correctly, this should work via find_subquery_safe_quals + find_nonnullable_vars, but explicit coverage would be good. And I test it: QUERY PLAN ---------------------------------------------------- Hash Anti Join Hash Cond: (not_null_tab.id = t2.id) -> Seq Scan on not_null_tab -> Hash -> Merge Join Merge Cond: (t2.id = t1.id) -> Sort Sort Key: t2.id -> Seq Scan on null_tab t2 Filter: (id IS NOT NULL) -> Sort Sort Key: t1.id -> Seq Scan on null_tab t1 (13 rows) -- Zhang Mingli HashData
