Re: [sqlite] Slow Queries, FROM order and indexes

Dennis Cote Thu, 04 Aug 2005 08:10:34 -0700

Aaron Schneider wrote:

Hello All,


I've using SQLite 2.8.16 with a music management application, and I'm
trying out why certain queries take a long time and to figure out how
SQLite uses my indexes.  I've got a master "media" table with a couple
of auxiliary tables like "artists", "albums",  and "genres".  Each
table's primary key is ____id (mediaid, artistid, albumid, and genreid),
and the media table contains columns to each of the side tables' primary
key (and an index for each one, too).

I found that a particular query  takes forever when run on a particular
database:
SELECT DISTINCT albums.albumid, albums.name_lower FROM artists, albums,
media WHERE albums.media_count > 0 AND media.albumid=albums.albumid AND
media.artistid=artists.artistid ORDER BY artists.name_key,
albums.name_key;

In this query sqlite does a table scan through the first table, artists,to select the records. This table is joined to the media table by thethird condition. I suspect that you do not have an index on the artistidcolumn in the media table, so sqlite must do a full table scan of themedia table for each record in the artists table (this is what isslowing it down). This generates a set of virtual records that is thesize of the artists table multiplied by the size of the media table.Each of these virtual records are then joined to the albums table usingthe second condition. In this case I believe that the albumid field isthe key for the albums table, so sqlite can go directly to each matchingrecord in the albums table. Finally it checks the first condition foreach of these extended records to see if it should be selected. All thematching records are collected and then sorted before being returned toyou.

The lack of an index on the media table artistid field forces many scansof your largest table.

Well, the largest table (by far) in these queries is always the media
table, and I found that by moving the media table to the beginning of
the FROM list, it runs almost instantly:
SELECT DISTINCT albums.albumid, albums.name_lower FROM media, artists,
albums WHERE albums.media_count > 0 AND media.albumid=albums.albumid AND
media.artistid=artists.artistid ORDER BY artists.name_key,
albums.name_key

Thus, when "media" is first, SQLite scans each record in the media table
and tests the smaller tables with the NotExists command which uses the
primary key to locate the record immediately.

Yes, I believe this is correct. This requires only one table scan of themedia table thought.


I've been reading up on the virtual machine, op-codes, and performance
tips in the wiki for the past few days, and I've got a few questions
about what was happening:

1)  Shouldn't it be faster to iterate through one of the smaller tables
and then use an index on the media table to join with the other small

table?

Yes, see above. You just need the correct index.


2)  It's unclear to me what order my FROM tables should be in.  In the
first query, did I accidently choose the absolute worst order for the

FROM list?

Yes, I think you did. If you had used the albums table first, it wouldhave been able to apply the first condition to each record as it scannedthat table. This would have eliminated some records immediately. Itwould have then done a slow join by scanning the media table to findmatching albumids (assuming there is no index on the albumid field inthe media table). This would have been just as bad as the first case.The relative time would depend on the size of the artists table vs thenumber of records in the albums table that meet the first criterion.Again this can be corrected by creating the required index.


So the tables in the FROM clause should appear in the order that tables
are introduced, with special consideration to the first table in the
list be the primary value we are SELECTing?  In the query above, that
would mean "FROM albums, media, artists".  (Both "FROM media, albums,
artists" and "FROM albums, media, artists" run too fast for eye-ball
speed compare.)

3)  What role, if any, do the ORDER BY columns play in the FROM order?
(I would assume none since by the time you've selected a row, you have
all of the data for the sorting of that row.)

I believe sqlite will optimize the order by clause if it can use anindex to scan the first table in the the order that is required. If notit just does a table scan through the first table and then sorts thecollected records. In your case, you are ordering by columns from twotables, so you can't create an index, and it can't be optimized.


4)  Is there a query optimizer for sqlite?  A program that automagically
discovered which queries were not in optimal form that could propose a
different order or alternate values for certain terms in FROM and WHERE
clauses?  I'm thinking that the input would be a query and an already
existing database (with indexes), and the output would be a (better)
optimized query.  This would be especially useful for SQLite since
everything must be hand-tuned.

I believe that Richard is currently working on improving the queryoptimization of sqlite, but your best bet right now is using the explaincommand to examine how sqlite will execute various queries.


Thanks!
Aaron



HTH
Dennis Cote

Re: [sqlite] Slow Queries, FROM order and indexes

Reply via email to