Folks.
Looking for some advice from hardened SQliters...
- Our application uses an SQLite 3.4.1 database with 8 tables. 1 of
the tables may contain tens or maybe hundreds of thousands of records
with about 30 fields.
- The tables form a linked tree type hierarchy where one table is the
trunk, another is the boughs and then branches, twigs and finally
very many leaves. UIDs in the tables allow us to find the children
of a record in the next table.
In one operation we have to construct a summary of a portion of the
tree. This basically means picking the stem, then recursively
finding its branches, their twigs and so on.
For each twig we have to find all the leaves. The Leaves table has
maybe 15000 records and we have a query where we search the Leaves
table once for each twig we find. In our part of the tree there
might be 200 twigs, and so we end up searching the leaves DB 200
times, to build up the list of leaves attached to the twigs
Unlike a real tree, our leaves may be attached to more than one twig,
so we need to keep the tables linked but abstracted, hence the
separate tables.
The problem we have is performance......
forgetting the main part of the tree, let's jump to leaves:
- We have queried all the way down to the twigs quite quickly, and we
have found 200 twigs. Each twig is connected to a few leaves.
in pseudocode:
for (z=0;z<numTwigs;z++) //there are 200
{
sqlite3_get_table("select * from LEAVES where leafID =
theTwig[z].childLeafID")
// we end up performing this query 200 times, once for each of the
parent twigs.
}
It may take 100mSec to perform each select query on our 15000 entry
LEAVES table (which have about 30 fields each, most of which we
want), this ends up taking 20 seconds which is not acceptable in our
application.
I am certain that we are missing some important SQL language or
SQLite optimisations, and I wanted to ask for advice. We are
already using BEGIN and COMMIT transactions around this code
(although see later for my concerns about that, since maybe its not
working)
Profiling the code, it spends alot of the time reading into the SQLite pager.
Other information:
- the app is a highly multithreaded server with query connections
spawing new threads per request. Each connection needs to query the
database, and our approach thus far (in order to improve performance)
is to open the database ONCE (on the first thread that needs it) call
BEGIN and leave it open. We then have a lock which allows later
threads to request the database lock (we pass the global DB handle
back to the new thread). When its finished it releases it and another
thread can get the handle. Once every minute we COMMIT to flush the
database, back it up, then BEGIN another transaction.
I am slightly concerned that our BEGIN, COMMIT and usage of the
handle are made on different threads, and this may be somehow
defeating the transaction or not taking full advantage of the page
cache. Perhaps we should close the DB down properly and do the
entire open process and close transaction locally in the new thread ?
This is all a READ ONLY transaction, by the way.
We have considered copying the entire database into a memory resident
SQLite database at the start of this transaction (finding all the
records ONCE takes less than a second, so we may be able to afford
this drastic step) but I have read that memory resident databases may
be not much faster.
We have even considered finding all the records ONCE into our own
cache them manually performing the searches using our own code
(hoping it would be faster), but I can't help feeling that we should
be able to do this faster just using SQLite.
We have considered preparing the transaction first binding the
parameter, and I wondered if that might help ? Would preparing:
"select * from LEAVES where leafID = boundParam1"
really make much difference than just running it ?
We are typically using a Quad 3.0GHz processor Macintosh so we have
plenty of horsepower. I can't really believe that we should accept
that this process will take as long as 20 seconds. I am convinced it
can be massively optimised.
There are so many options, each of which would require quote alot of
effort to prototype and test, I wondered what wisdom any hardened
SQlite folk could offer. What is the best approach to attack this
problem ?
Thanks for any experience you can share..
Mark
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users