> My question: how can I optimize this kind of query so that it utilizes
> both indexes, to grab the first [b] rows (ordered by time) which also
> match [a]?  Or am I just going to have to guess at which way will be
> faster, and use "INDEXED BY" to force it?  (The documentation says I
> shouldn't have to do this)

There's no way to optimize your query to be fast in both situations.
LIMIT clause is pretty hard to optimize. Maybe just to have a closer
look at the application structure - maybe it's not so necessary to do
ORDER BY or maybe LIMIT can be moved to inner query...
But for this particular case I think it's pretty reasonable to use
INDEXED BY clause despite what documentation says (it discourages
usage for common cases).

Pavel

On Fri, Sep 18, 2009 at 5:55 PM, Matthew L. Creech <mlcre...@gmail.com> wrote:
> Hi,
>
> I'm trying to optimize a query for 2 different scenarios, and I'm
> having trouble getting something that works good in general.  I want
> to be sure I'm not missing something.  Here are the tables and indexes
> used in my database:
>
> sqlite> CREATE TABLE path_table (idx INTEGER PRIMARY KEY, path TEXT UNIQUE);
> sqlite> CREATE TABLE val_table (idx INTEGER, val REAL, time INTEGER);
>
> sqlite> CREATE INDEX time_idx ON val_table (time ASC);
> sqlite> CREATE INDEX path_idx ON val_table (idx ASC);
>
> 'path_table' contains unique string path names, while 'val_table'
> records any number of values associated with each path, and the time
> at which the value occurred.
>
> My query looks something like:
>
> sqlite> SELECT val FROM val_table WHERE idx IN (SELECT idx FROM
> path_table WHERE path GLOB '[a]') ORDER BY time ASC LIMIT [b];
>
> where [a] and [b] are provided by my code's caller.  My problem occurs
> when the idx value(s) selected from 'path_table' match a large number
> of records in the database - say, 100,000 out of 1,000,000 records.
> In that case, the query takes several minutes to complete even when
> [b] is small.  Presumably it's first looking up all 100,000 rows where
> 'idx' matches, then applying the ORDER BY clause to those results
> without indexing.  EXPLAIN QUERY PLAN confirms:
>
> 0|0|TABLE val_table WITH INDEX path_idx
>
> I tried adding "INDEXED BY time_idx", which greatly improved this
> particular case, because statistically 1/10 rows will match 'idx' and
> therefore we find [b] of them very quickly when [b] is small.  But
> this hurts performance in other cases, since if there are only a few
> rows with a matching 'idx', the query ends up manually walking through
> most of the table.
>
> My question: how can I optimize this kind of query so that it utilizes
> both indexes, to grab the first [b] rows (ordered by time) which also
> match [a]?  Or am I just going to have to guess at which way will be
> faster, and use "INDEXED BY" to force it?  (The documentation says I
> shouldn't have to do this)
>
> Thanks for the help!
>
> --
> Matthew L. Creech
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to