Re: [sqlite] slow join, fast subselect

R Smith Wed, 17 Apr 2019 02:37:11 -0700

On 2019/04/17 10:55 AM, Poor Yorick wrote:

I've used the following two test queries in a version of sqlite built against a
recent checkout of trunk, and also another recent version of sqlite.  a.ref is
indexed.  The subselect query is faster than the join query -- two orders of
magnitude faster on a larger dataset.  Is sqlite missing some easy optimisation
opportunity here?



select a.rowid
from a join b on a.rowid = b.rowid
where a.ref = $x


select a.rowid
from a,b
where a.ref = $x and a.rowid in (select rowid from b)

These queries have vastly different meanings and as such must havevastly different execution plans. I won't go into that because it'sobvious, so I will just mention the assumptions.

For the query engine to optimize this according to your suggestion (i.e.for it to really do the second thing when you've asked for the firstthing) it has to assume the following:

- 1. You do not need any of the fields from b in any part of the query(which IS the case here and can be deduced, but is an extremely smalllikelihood for JOINs in general queries),- 2. there are no special Collations on either of the columns (again,this can be deduced here), and- 3. it must assume that every b.rowid appears ONLY once in table b (Iknow this can be assumed for rowid's, but is something that cannot beeasily known for any other field, where it could be true with or withouta unique constraint).

And, these are only the obvious outsider-viewpoint assumptions, I haveno insight in what other checks might be needed internally from the QP.This means that this optimization carries a cost deficit in the generalcase. i.e. Wasting CPU cycles "looking" for this optimizationopportunity in the general case, considering the high unlikelihood offinding it viable, will slow down many more queries than it will help.

That said, this is the very kind of thing where we as engineeringprogrammers or database queryers could optimize by asking our questionsright, mind you - I am not advocating thinking for the Query Planner,but do state your question optimally. If you really want to know howmany rabbits have been eaten by foxes, do not ask for a list of allrabbit names alphabetically and exclude every one that was NOT eaten bya fox, simply ask for the count of rabbits where eaten by fox = true.

In your above example you really wish to know all the a's which have anentry in b. The first query asks to join and list all b's found forevery a (which works mathematically in this case by virtue of rowiduniqueness, but isn't the real question and forces a lot of "join"algorithm checking on the QP), the correct question is the second query:Show every a which can also be found in b. It releases the QP of a lotof responsibility and let's it follow a plan that is much faster.



Hope that makes sense :)

Ryan


_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] slow join, fast subselect

Reply via email to