On Tue, Feb 3, 2026 at 4:12 PM Richard Guo <[email protected]> wrote: > This topic has been discussed several times in the past. Due to the > semantic mismatch regarding NULL handling, NOT IN is not ordinarily > equivalent to an anti-join. However, if we can prove that neither the > outer expressions nor the subquery outputs can yield NULL values, it > should be safe to convert NOT IN to an anti-join.
I've noticed a loose end in the v1 patch. The semantic gap between NOT IN and anti-join actually exists whenever the operator returns NULL. For NOT IN, if (A op B) returns NULL, then NOT (NULL) evaluates to NULL (effectively false), and the row is discarded. In contrast, for an anti-join, if (A op B) returns NULL, it implies no match was found, and the anti-join logic dictates that the row should be kept. To guarantee that (A op B) never returns NULL, the current patch verifies that both A and B are non-nullable. However, this is not sufficient. The "op" might be an operator that returns NULL on non-null inputs. On the other hand, if "op" does not return NULL on NULL inputs, like IS DISTINCT FROM, we technically would not even need to require that A and B are non-nullable. Is there a convenient way to verify that an operator never returns NULL on non-null inputs? Would it be sufficient to insist that the operator belongs to btree opclass (assuming that the strict ordering requirements of btree imply this safety)? And, is it worth checking if an operator never returns NULL even on NULL inputs? If we can identify such operators, we should be able to remove the requirement that both sides of NOT IN must be non-nullable. Is there a convenient way to check for such operators? - Richard
