It depends on the data shape.
In the first case, using a correlated subquery, the outer table query is
processed applying all applicable joins and where conditions, and if and only
if the row is still a candidate is the correlated subquery performed. If it
succeeds the result row is output. For maximum efficiency you need indexes
covering (at least) the correlated columns.
In the second case, the subselect to generate the list for in is executed first
and the results put in a temporary b-tree. Then either (a) this list is
traversed to find all the initial candidates in the outer query if the
appropriate indexes exist, else the outer query is performed checking whether
the outer candidate row is in the result set from the inner subquery.
Assuming that proper indexes exist the first form (correlated subquery) will
almost always be quicker since it will probabilistically requires less
operations in all cases.
In the case where tableb's size closely matches the set of candidates from the
outer query, the two will execute at the same speed. As the size of tableb
increases (becomes less selective) the speed of the second form will decrease.
Take the case where tablea has a billion rows, and tableb has a billion rows,
and the intersection is five rows. Which WHERE condition would you expect to
be quicker? Now you may think that a given application will only ever have 5
rows in tableb (or only has five rows during testing) and therefore the choice
Sometimes this is an incorrect assumption and this is what leads to the
"billion dollars spent on a new system that does not work and must be thrown in
the dust bin" stories in the news.
> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Dave Blake
> Sent: Sunday, 16 October, 2016 00:58
> To: SQLite mailing list
> Subject: [sqlite] IN verses EXISTS Query Speed
> Some simple testing is showing using an EXISTS statement is generally
> quicker then using an IN
> SELECT * FROM tablea
> WHERE EXISTS (SELECT 1 FROM tableb WHERE tablea.id = tableb.id AND ...)
> is quicker than
> SELECT * FROM tablea
> WHERE tablea.id IN (SELECT tableb.id FROM tableb WHERE ...)
> Is there any reason for this to be always true in SQLite, or is it query
> sqlite-users mailing list
sqlite-users mailing list