As you said because of your LEFT JOIN SQLite (or any other DBMS in its
place) is forced to use t2 as a base table. You have no conditions on
t2, so SQLite will make full scan on it and for each row it will need
to pick up a corresponding rows from t1 which it does using primary
index.

And FYI, by using condition "where t1.uid = 'x'" you're killing all
purpose of LEFT JOIN - you're forcing for the row in t1 to exist and
thus you can with the same success use INNER JOIN for this query. But
if you do use it I believe SQLite will be smart enough to select rows
from t1 using t1_idx and then for each row pick up corresponding row
from t2 using t2_idx.


Pavel

On Fri, Dec 18, 2009 at 3:52 AM, Jens Miltner <j...@mac.com> wrote:
> Hi everybody,
>
> consider the following database schema:
>
> CREATE TABLE t1 (
>        id INTEGER PRIMARY KEY,
>        uid,
>        x
> );
> CREATE INDEX t1_idx ON t1(uid);
> CREATE TABLE t2 (
>        id INTEGER PRIMARY KEY,
>         t1_id INTEGER,
>        y
> );
> CREATE INDEX t2_idx ON t2(t1_id);
>
>
> Shouldn't the following query attempt to use both indexes, instead of
> not using an index on table t2:
>
> sqlite> EXPLAIN QUERY PLAN SELECT * FROM t2 LEFT JOIN t1 ON
> t1.id=t2.t1_id WHERE t1.uid='x';
> 0|0|TABLE t2
> 1|1|TABLE t1 USING PRIMARY KEY
>
> even when explicitely forcing to use t1_idx, no index on t2 will be
> used:
>
> sqlite> EXPLAIN QUERY PLAN SELECT * FROM t2 LEFT JOIN t1 ON
> +t1.id=t2.t1_id WHERE t1.uid='x';
> 0|0|TABLE t2
> 1|1|TABLE t1 WITH INDEX t1_idx
>
>
> I would have expected the query optimizer to use t1_idx and t2_idx?
> I understand that using the LEFT JOIN in the way I did above will have
> to use t2 as the base table, but shouldn't this also be possible when
> filtering on t1 first (by uid) and then applying the JOIN to records
> from t2 satisfying the JOIN condition?
>
> It definitely would make sense in the case I have in mind where t2
> contains much, much more records than t1 and only very few of them
> actually satisfy the JOIN condition AND the uid condition...
> I found that I can rewrite the query as
>
> sqlite> EXPLAIN QUERY PLAN SELECT * FROM t2 LEFT JOIN t1 ON
> t1.id=t2.t1_id WHERE t2.t1_id=(SELECT id FROM t1 WHERE uid='x');
> 0|0|TABLE t2 WITH INDEX t2_idx
> 1|1|TABLE t1 USING PRIMARY KEY
> 0|0|TABLE t1 WITH INDEX t1_idx
>
> but this is somewhat awkward and hard to do in our case, since the
> queries are generated dynamically...
> Wouldn't/Shouldn't the query optimizer be able to do this kind of
> optimization without me explicitely rewriting the query (especially
> when I have run ANALYZE, so the optimizer knows the data distribution)?
>
> Don't get me wrong - I'm not trying to complain, I'd just like to
> understand whether there is potential for the query optimizer to
> transparently optimize queries as the above or whether I'm completely
> off with my idea :)
>
>
> Thanks,
> </jum>
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to