Re: [sqlite] help needed to optimize a query

Jan Hudec Tue, 28 Jun 2011 10:18:36 -0700

On Tue, Jun 28, 2011 at 17:33:23 +0300, Lazarus 101 wrote:
> Hi guys, i'm working on an Android app and using sqlite to store some data
> and i need some help with a query.
> 
> I have the following table structure:
> 
> FILES
> file_id INTEGER NOT NULL,
          ^^^^^^^^^^^^^^^^ Do file_ids repeat? If not, it should be "integer
          primary key". Than sqlite will alias the column to the "rowid"
          column the table is internally ordered by for improved performance
          looking up by file_id and saving some space (one fewer columns).
> name TEXT
> 
> DATA
> file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE,
> data_type TEXT,


If nothing else, you want to define integer identifiers for the data types
and use integer here. That will save you some space (reading from flash is
still performance bottleneck, especially if the flash is SD card) and some
unnecessarily costly string comparisons. In code, you'd obviously use
symbolic constants.

> value TEXT

Obviously you need index on DATA(file_id, data_type, value)

Yes, all three columns. The first two will appear in your query, so have to
be first two having the result also included in the index saves time, because
now everything can be obtained from the index and the table itself does not
have to be fetched.

> data_type can be one of:
> (title, description, date_taken, latitude,longitude, duration,
> album,track_nr,genre, artist, year)

Why this "hypernormalized" structure. When the set is fixed and there can be
only one of each for each file, a big table with one column for each
attribute will do you much better service. Just create FILES with

    file_id integer primary key,
    name text,
    title text,
    description text,
    date_taken text, /* or integer if you decide to store timestamps instead */
    latitude number,
    longitude number,
    duration number,
    album text,
    track_nr integer,
    genre text,
    artist text,
    year integer

Or you can split up to generic attributes, photograph attributes (date_taken,
latitude, longitude) and song attributes (duration, album, track_nr, genre,
artist, year), but I don't think you would save space (the extra row id and
foreign key will take more than the nulls) and you definitely wouldn't save
time.

You certainly want index on the name column as I suppose that will be your
initial input.

You can combine it with separate table for data with multiple values per
file. E.g. tags:

    file_id integer references files(file_id) on delete cascade,
    tag text,
    value text

If it's user-defined tags, you want text, but if it's just additional
multi-valued entries defined by the application, use integer identifiers to
save space and unnecessarily costly string comparisons.

> Now, I want to select all music details in the following format
> file_id | name | duration | genre | artist | description | album | track_nr
> | year

Of course with the less normalized structure, that's result of simple select
from single table.

> it's possible that a music from files does not have any data associated with
> it, in this case i want the name to be in the result set
> 
> This has to run as fast as possible. A left join between these tables is too
> slow, for 10.000 entries it takes around 15 seconds just to navigate through
> the cursor, if I add a where clause selecting only one kind of data then it
> reduces to less than 5 seconds which is acceptable.

Last but not least, "explain query plan" is your friend. If you prefix your
query with "explain query plan" and run it against the database (you can use
the command-line shell or some management tool), sqlite will tell you what
tables it would read, in which order, using which indices and how big it
expects the result set to be.

So create various indices and experiment with tweaking the query and look
what explain query plan tells you. Reading by primary key is fastest,
followed by covering index, noncovering index and temporary index or linear
search are worst (except join by temporary b-tree is near-optimal when you
are not filtering out anything).

Sqlite before 3.7.4 didn't support temporary indices and temporary b-tree
joins, so it is much more critical to have good indices in older versions,
because they did happily regress to quadratic or worse complexity and that
would take ages to complete.

Oh, and remember to remove the indices you end up not using to save space
and time needed to keep them up to date.

-- 
                                                 Jan 'Bulb' Hudec <b...@ucw.cz>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] help needed to optimize a query

Reply via email to