I was reading a paper on null-aware anti-joins (section 6 in [1]) and
noticed that they use <> ALL to represent NOT IN semantics.  We
currently execute ALL SubLinks using a nested-loop SubPlan.  This
made me realize that it could be beneficial to convert ALL SubLinks to
ANY SubLinks.

According to De Morgan's laws:

  foo op ALL (sub-SELECT) => NOT (foo negator_op ANY (sub-SELECT))
  NOT (foo op ALL (sub-SELECT)) => foo negator_op ANY (sub-SELECT)

This conversion makes it possible for the executor to evaluate the
sublink using a hashed SubPlan, which replaces a nested loop with a
more efficient hash probe.  Even better, because our planner already
knows how to pull up ANY SubLinks (and pending the other patch, NOT
ANY SubLinks), these transformed clauses have the chance to be
optimized directly into semijoins or anti-joins.

In the worst-case scenario where the transformed ANY sublink cannot be
pulled up and cannot be hashed, execution falls back to a nested-loop
SubPlan.  Performance in this scenario is effectively identical to the
legacy ALL SubPlan, so there should be no regressions.  Because the
operator is negated, the ANY subplan will short-circuit on the exact
same inner tuple that the ALL subplan would have short-circuited on.
The only added overhead is a single boolean NOT inversion per outer
tuple, which is, IMO, computationally negligible compared to the cost
of the nested-loop execution itself.

Attached is a draft patch illustrating the idea.  Any thoughts or
interest in this transformation?

[1] http://www.vldb.org/pvldb/vol2/vldb09-423.pdf

- Richard

Attachment: v1-0001-Convert-ALL-SubLinks-to-ANY-SubLinks.patch
Description: Binary data

Reply via email to