> My question: how can I optimize this kind of query so that it utilizes > both indexes, to grab the first [b] rows (ordered by time) which also > match [a]? Or am I just going to have to guess at which way will be > faster, and use "INDEXED BY" to force it? (The documentation says I > shouldn't have to do this)
There's no way to optimize your query to be fast in both situations. LIMIT clause is pretty hard to optimize. Maybe just to have a closer look at the application structure - maybe it's not so necessary to do ORDER BY or maybe LIMIT can be moved to inner query... But for this particular case I think it's pretty reasonable to use INDEXED BY clause despite what documentation says (it discourages usage for common cases). Pavel On Fri, Sep 18, 2009 at 5:55 PM, Matthew L. Creech <mlcre...@gmail.com> wrote: > Hi, > > I'm trying to optimize a query for 2 different scenarios, and I'm > having trouble getting something that works good in general. I want > to be sure I'm not missing something. Here are the tables and indexes > used in my database: > > sqlite> CREATE TABLE path_table (idx INTEGER PRIMARY KEY, path TEXT UNIQUE); > sqlite> CREATE TABLE val_table (idx INTEGER, val REAL, time INTEGER); > > sqlite> CREATE INDEX time_idx ON val_table (time ASC); > sqlite> CREATE INDEX path_idx ON val_table (idx ASC); > > 'path_table' contains unique string path names, while 'val_table' > records any number of values associated with each path, and the time > at which the value occurred. > > My query looks something like: > > sqlite> SELECT val FROM val_table WHERE idx IN (SELECT idx FROM > path_table WHERE path GLOB '[a]') ORDER BY time ASC LIMIT [b]; > > where [a] and [b] are provided by my code's caller. My problem occurs > when the idx value(s) selected from 'path_table' match a large number > of records in the database - say, 100,000 out of 1,000,000 records. > In that case, the query takes several minutes to complete even when > [b] is small. Presumably it's first looking up all 100,000 rows where > 'idx' matches, then applying the ORDER BY clause to those results > without indexing. EXPLAIN QUERY PLAN confirms: > > 0|0|TABLE val_table WITH INDEX path_idx > > I tried adding "INDEXED BY time_idx", which greatly improved this > particular case, because statistically 1/10 rows will match 'idx' and > therefore we find [b] of them very quickly when [b] is small. But > this hurts performance in other cases, since if there are only a few > rows with a matching 'idx', the query ends up manually walking through > most of the table. > > My question: how can I optimize this kind of query so that it utilizes > both indexes, to grab the first [b] rows (ordered by time) which also > match [a]? Or am I just going to have to guess at which way will be > faster, and use "INDEXED BY" to force it? (The documentation says I > shouldn't have to do this) > > Thanks for the help! > > -- > Matthew L. Creech > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users