As you said because of your LEFT JOIN SQLite (or any other DBMS in its place) is forced to use t2 as a base table. You have no conditions on t2, so SQLite will make full scan on it and for each row it will need to pick up a corresponding rows from t1 which it does using primary index.
And FYI, by using condition "where t1.uid = 'x'" you're killing all purpose of LEFT JOIN - you're forcing for the row in t1 to exist and thus you can with the same success use INNER JOIN for this query. But if you do use it I believe SQLite will be smart enough to select rows from t1 using t1_idx and then for each row pick up corresponding row from t2 using t2_idx. Pavel On Fri, Dec 18, 2009 at 3:52 AM, Jens Miltner <j...@mac.com> wrote: > Hi everybody, > > consider the following database schema: > > CREATE TABLE t1 ( > id INTEGER PRIMARY KEY, > uid, > x > ); > CREATE INDEX t1_idx ON t1(uid); > CREATE TABLE t2 ( > id INTEGER PRIMARY KEY, > t1_id INTEGER, > y > ); > CREATE INDEX t2_idx ON t2(t1_id); > > > Shouldn't the following query attempt to use both indexes, instead of > not using an index on table t2: > > sqlite> EXPLAIN QUERY PLAN SELECT * FROM t2 LEFT JOIN t1 ON > t1.id=t2.t1_id WHERE t1.uid='x'; > 0|0|TABLE t2 > 1|1|TABLE t1 USING PRIMARY KEY > > even when explicitely forcing to use t1_idx, no index on t2 will be > used: > > sqlite> EXPLAIN QUERY PLAN SELECT * FROM t2 LEFT JOIN t1 ON > +t1.id=t2.t1_id WHERE t1.uid='x'; > 0|0|TABLE t2 > 1|1|TABLE t1 WITH INDEX t1_idx > > > I would have expected the query optimizer to use t1_idx and t2_idx? > I understand that using the LEFT JOIN in the way I did above will have > to use t2 as the base table, but shouldn't this also be possible when > filtering on t1 first (by uid) and then applying the JOIN to records > from t2 satisfying the JOIN condition? > > It definitely would make sense in the case I have in mind where t2 > contains much, much more records than t1 and only very few of them > actually satisfy the JOIN condition AND the uid condition... > I found that I can rewrite the query as > > sqlite> EXPLAIN QUERY PLAN SELECT * FROM t2 LEFT JOIN t1 ON > t1.id=t2.t1_id WHERE t2.t1_id=(SELECT id FROM t1 WHERE uid='x'); > 0|0|TABLE t2 WITH INDEX t2_idx > 1|1|TABLE t1 USING PRIMARY KEY > 0|0|TABLE t1 WITH INDEX t1_idx > > but this is somewhat awkward and hard to do in our case, since the > queries are generated dynamically... > Wouldn't/Shouldn't the query optimizer be able to do this kind of > optimization without me explicitely rewriting the query (especially > when I have run ANALYZE, so the optimizer knows the data distribution)? > > Don't get me wrong - I'm not trying to complain, I'd just like to > understand whether there is potential for the query optimizer to > transparently optimize queries as the above or whether I'm completely > off with my idea :) > > > Thanks, > </jum> > > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users