On Sat, 21 Mar 2015 19:01:16 +0100
"Mario M. Westphal" <mw at mwlabs.de> wrote:
> For now I have increased the threshold for IN clauses (instead of
> TEMP tables) and use WHERE IN (SELECT ? from TEMP) instead of a JOIN.
Because the two are conceptually different, it's not surprising they
run differently. IN is an existence test: does the value appear in the
set? JOIN is a product: produce every matching combination.
For some queries, they reduce to the same thing. For example,
FROM A JOIN B ON A.b = B.b
is the same as IN if B.b
1. is unique, and
2. no other columns from B are used.
The query optimizer has to be sophisticated enough to recognize
those conditions, which is unlikely in the case of a temporary table.
There is also the question of linear vs binary searches. When you
supply a list of constants to IN, most if not all DBMSs search the list
sequentially. When IN (or EXISTS) is supplied from an indexed column,
the search is often binary. For a small number of elements, there's no
distinction. For 1000 elements, it's 2 orders of magnitude: 1000 hops
versus 10.
--jkl