On Fri, Jan 29, 2010 at 6:02 AM, Doyel5 <doyel...@gmail.com> wrote:

> I deleted all indexes, kept only the indexes on the temporary tables,
> namely
> - tempSimArgs and tempPartArgs, and seemingly my original query's running
> time decreased to around 1 sec (when I ran the query the second time). The
> execution time of the first time I run my query varies wildly, I dunno why.
> Any ideas?
>


You mentioned 800 MB as the size of your database. I think that although the
indexes are used effectively, the actual data can be spread across the whole
file, so the first query involves many hard-disk seeks and the disk cache does
unnecessary work (for example, SQLite needs only 1024 bytes at a particular
offset, but the OS reads much more than that). To be sure there is no full or
almost-full table scan, I use VFS monitoring in such cases: if you install
your own functions as a file system but forward all calls to the original VFS,
you can measure the real data flow for your query.
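SQLite's real VFS layer is a C API (`sqlite3_vfs`, registered with
`sqlite3_vfs_register()`), so an actual shim has to be written in C. The toy
Python sketch below only illustrates the forwarding-and-counting pattern with
a plain file wrapper standing in for the VFS; all names here are my own, not
part of any SQLite API:

```python
import io

# Toy illustration of the VFS-shim idea: wrap a lower-level "file",
# forward every call unchanged, and keep a running total of bytes read.
# A real SQLite shim would do the same inside the xRead method of a
# registered sqlite3_vfs, forwarding to the default VFS.

class CountingFile:
    def __init__(self, f):
        self._f = f          # the "original VFS" we forward to
        self.bytes_read = 0  # total read flow observed so far

    def read(self, n):
        data = self._f.read(n)      # forward the call unchanged
        self.bytes_read += len(data)
        return data

    def seek(self, offset, whence=0):
        return self._f.seek(offset, whence)

raw = io.BytesIO(b"x" * 4096)       # pretend this is the database file
f = CountingFile(raw)
f.seek(1024)                        # SQLite-style page read at an offset
f.read(1024)
f.read(1024)
print(f.bytes_read)                 # -> 2048
```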

As for Access and MS database engines like Jet, we can only guess how they
work; maybe the engine starts some caching as soon as you first open your db.
We don't know, it's closed software.

Also, if you want your first join or select operations on a large database to
be faster, as a general rule I'd suggest minimizing the record size of any
table involved in those operations, or lowering the page size.
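For instance, the page size can be set with `PRAGMA page_size` before the
first table is created (once the database is initialized it only takes effect
after a `VACUUM`). A minimal sketch using Python's built-in `sqlite3` module
(the file name is mine):

```python
import os
import sqlite3
import tempfile

# A fresh database file: page_size must be set before the first write.
path = os.path.join(tempfile.mkdtemp(), "small_pages.db")
con = sqlite3.connect(path)
con.execute("PRAGMA page_size = 512")   # 512 is the smallest legal page size
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, txt TEXT)")
con.commit()

print(con.execute("PRAGMA page_size").fetchone()[0])  # -> 512
```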

I did some tests to explain why I think this would help. My explanation may be
questionable, so please correct me if I'm wrong. SQLite databases work with
pages that can be, for example, 1024 bytes each; all data is packed into pages
and operated on page by page. As far as I know, the only operations involving
smaller units are header operations. Also, B-tree cells always come with their
data, so the two are not separated inside the database file. The only
complication is overflow pages, used when a single record cannot fit into one
page. The consequence of this format is that if a data record is smaller than
1024 bytes, visiting the record always means reading the full record. I just
made a quick test with VFS monitoring:

CREATE TABLE [TestTable] ([Id] INTEGER PRIMARY KEY AUTOINCREMENT, [Text] TEXT)

100,000 records:
INSERT INTO TestTable (Text) VALUES ('1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ')

db size: 4,671,488

SELECT * FROM TestTable
VFS read flow: 4,669,456
SELECT rowid FROM TestTable
VFS read flow: 4,669,456

But if I insert records with large texts (I intentionally chose 2 KB so that
overflow pages would be present), SELECT rowid causes half as much page
reading as SELECT *.
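The built-in Python `sqlite3` module can't measure the VFS read flow, but the
overflow effect itself is easy to reproduce: with 1024-byte pages and roughly
2 KB records, every row must spill into overflow pages, so the page count far
exceeds the row count. A rough sketch (the row count and text size here are my
own, smaller than in the original test):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "overflow.db")
con = sqlite3.connect(path)
con.execute("PRAGMA page_size = 1024")  # before the first table is created
con.execute(
    "CREATE TABLE TestTable (Id INTEGER PRIMARY KEY AUTOINCREMENT, Text TEXT)"
)
big = "x" * 2048                        # ~2 KB per record: forces overflow
con.executemany("INSERT INTO TestTable (Text) VALUES (?)", [(big,)] * 1000)
con.commit()

pages = con.execute("PRAGMA page_count").fetchone()[0]
# Each record needs at least two overflow pages on top of its leaf cell,
# so for 1000 records the file holds well over 1000 pages.
print(pages)
```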

By the way, I just discovered a strange trick that may help in similar or
other cases.
Creating an index on [Id] for this table looks like a terrible idea that
wastes space (since the rowid is already ordered), but after creating the index

CREATE INDEX [idx_ID] ON [TestTable] ([ID] )

changing SELECT rowid query to

SELECT rowid FROM TestTable ORDER BY Id

led to a read flow of only 1,312,000 bytes (compared to 4,669,456), and it
really makes sense because:
- indexes in SQLite contain the indexed data and the rowid, so SQLite doesn't
need to visit the main table to return the results;
- compared to the rowid B-tree, which contains the full data records, this
index B-tree is packed more efficiently, so fewer pages need to be visited.
I don't know what one should do to apply this trick in complex queries, but I
hope it is possible.
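As a quick check with Python's built-in `sqlite3` module, `EXPLAIN QUERY PLAN`
shows which B-tree a query walks; the exact plan text varies between SQLite
versions, so treat the printed plan as informational. Either way the results
are identical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE TestTable (Id INTEGER PRIMARY KEY AUTOINCREMENT, Text TEXT)"
)
con.executemany(
    "INSERT INTO TestTable (Text) VALUES (?)",
    [("1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ",)] * 100,
)
con.execute("CREATE INDEX idx_ID ON TestTable (Id)")

# Shows whether the table B-tree or the idx_ID B-tree is scanned.
for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT rowid FROM TestTable ORDER BY Id"
):
    print(row)

rows = [r[0] for r in con.execute("SELECT rowid FROM TestTable ORDER BY Id")]
assert rows == list(range(1, 101))   # same answer with or without the index
```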

Max
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
