On Tue, Jun 28, 2011 at 17:33:23 +0300, Lazarus 101 wrote: > Hi guys, i'm working on an Android app and using sqlite to store some data > and i need some help with a query. > > I have the following table structure: > > FILES > file_id INTEGER NOT NULL, ^^^^^^^^^^^^^^^^ Do file_ids repeat? If not, it should be "integer primary key". Than sqlite will alias the column to the "rowid" column the table is internally ordered by for improved performance looking up by file_id and saving some space (one fewer columns). > name TEXT > > DATA > file_id INTEGER REFERENCES files(resource_id) ON DELETE CASCADE, > data_type TEXT,
If nothing else, you want to define integer identifiers for the data types and use integer here. That will save you some space (reading from flash is still performance bottleneck, especially if the flash is SD card) and some unnecessarily costly string comparisons. In code, you'd obviously use symbolic constants. > value TEXT Obviously you need index on DATA(file_id, data_type, value) Yes, all three columns. The first two will appear in your query, so have to be first two having the result also included in the index saves time, because now everything can be obtained from the index and the table itself does not have to be fetched. > data_type can be one of: > (title, description, date_taken, latitude,longitude, duration, > album,track_nr,genre, artist, year) Why this "hypernormalized" structure. When the set is fixed and there can be only one of each for each file, a big table with one column for each attribute will do you much better service. Just create FILES with file_id integer primary key, name text, title text, description text, date_taken text, /* or integer if you decide to store timestamps instead */ latitude number, longitude number, duration number, album text, track_nr integer, genre text, artist text, year integer Or you can split up to generic attributes, photograph attributes (date_taken, latitude, longitude) and song attributes (duration, album, track_nr, genre, artist, year), but I don't think you would save space (the extra row id and foreign key will take more than the nulls) and you definitely wouldn't save time. You certainly want index on the name column as I suppose that will be your initial input. You can combine it with separate table for data with multiple values per file. E.g. tags: file_id integer references files(file_id) on delete cascade, tag text, value text If it's user-defined tags, you want text, but if it's just additional multi-valued entries defined by the application, use integer identifiers to save space and unnecessarily costly string comparisons. > Now, I want to select all music details in the following format > file_id | name | duration | genre | artist | description | album | track_nr > | year Of course with the less normalized structure, that's result of simple select from single table. > it's possible that a music from files does not have any data associated with > it, in this case i want the name to be in the result set > > This has to run as fast as possible. A left join between these tables is too > slow, for 10.000 entries it takes around 15 seconds just to navigate through > the cursor, if I add a where clause selecting only one kind of data then it > reduces to less than 5 seconds which is acceptable. Last but not least, "explain query plan" is your friend. If you prefix your query with "explain query plan" and run it against the database (you can use the command-line shell or some management tool), sqlite will tell you what tables it would read, in which order, using which indices and how big it expects the result set to be. So create various indices and experiment with tweaking the query and look what explain query plan tells you. Reading by primary key is fastest, followed by covering index, noncovering index and temporary index or linear search are worst (except join by temporary b-tree is near-optimal when you are not filtering out anything). Sqlite before 3.7.4 didn't support temporary indices and temporary b-tree joins, so it is much more critical to have good indices in older versions, because they did happily regress to quadratic or worse complexity and that would take ages to complete. Oh, and remember to remove the indices you end up not using to save space and time needed to keep them up to date. -- Jan 'Bulb' Hudec <b...@ucw.cz> _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users