On Fri, Jan 29, 2010 at 6:02 AM, Doyel5 <doyel...@gmail.com> wrote: > I deleted all indexes, kept only the indexes on the temporary tables, > namely > - tempSimArgs and tempPartArgs, and seemingly my original query's running > time decreased to around 1 sec (when I ran the query the second time). The > execution time of the first time I run my query varies wildly, I dunno why. > Any ideas? >
You mentioned 800 MB as the size of your base. I think that although indexes effectively used, the actual data can be spread across the size of your base so first query involves many hard disk seek operations and disk cache doing unnecessary job (for example sqlite needs only 1024 bytes at a particular offset but the OS reads much more than that). To be sure that there's no full or almost-full table scan I use VFS monitoring in such cases. If you install your functions as your own file system but forward all calls to original VFS, you can calculate real data flow for your query. As for Access and MS db engines like Jet, we only can guess how it works, maybe when you first open your db, the engine starts some caching immediately, we don't know, it's closed software. Also If you want your first join or select operations for a large base to be faster as for general rule I'd suggest to minimize data record size for any table that involved in join or select operations or make the page size lower. I did some tests to explain why I think it would help. My explanation can be questionable so please correct me if I'm wrong. Sqlite bases works with pages that for example can have size 1024 bytes, all data packed in pages and operate by pages. As long as I know the only operation involving fewer sizes is header operations. Also B-tree cells always come with the data so they're not separated inside the db file. The only complex thing is overflow pages when one single record can not fit inside one page. The consequence of such format is that if the data record is fewer than 1024 bytes, visiting a record is always full record read. Just made a quick test with VFS monitoring: CREATE TABLE [TestTable] ([Id] INTEGER PRIMARY KEY AUTOINCREMENT,[Text] TEXT) 100,000 records INSERT INTO TestTable (Text) VALUES ("1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ") db size: 4,671,488 SELECT * FROM TestTable VFS read flow: 4,669,456 SELECT rowid FROM TestTable VFS read flow: 4,669,456 But if insert records with large texts (I chose intentionally 2 kb for overflow pages to be present), SELECT rowid made twice as low page reading comparing to SELECT *. By the way, just discovered a strange trick that may help in similar or other cases. Creating Index for this table on [Id] looks like terrible idea wasting the space (since rowid already ordered), but actually after creating the index CREATE INDEX [idx_ID] ON [TestTable] ([ID] ) changing SELECT rowid query to SELECT rowid FROM TestTable ORDER BY Id led to only 1,312,000 data flow (comparing to 4,669,456) and it really makes sence because: - indexes in sqlite contains the data and rowid so it doesn't need the main table to return the results, - comparing to the rowid B-tree that contains the full datarecords, this index B-tree more effectively packed so need to visit fewer pages. I don't know what one should do to apply this trick in complex queries, but I hope it is possible. Max _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users