Folks.

Looking for some advice from hardened SQLiters...

- Our application uses an SQLite 3.4.1 database with 8 tables.  One of 
the tables may contain tens or even hundreds of thousands of records, 
each with about 30 fields.

- The tables form a linked, tree-like hierarchy: one table is the 
trunk, another the boughs, then branches, twigs and finally very many 
leaves.  UIDs in each table let us find the children of a record in 
the next table down.

In one operation we have to construct a summary of a portion of the 
tree.  This basically means picking the stem, then recursively 
finding its branches, their twigs and so on.

For each twig we have to find all the leaves.  The Leaves table has 
maybe 15000 records, and we search it once for each twig we find.  In 
our part of the tree there might be 200 twigs, so we end up searching 
the Leaves table 200 times to build up the list of leaves attached to 
the twigs.

Unlike a real tree, our leaves may be attached to more than one twig, 
so we need to keep the tables linked but abstracted, hence the 
separate tables.
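For concreteness, a much-simplified version of the twig/leaf linkage 
looks something like this (Python for brevity; the table and column 
names here are simplified stand-ins, not our real 30-field schema):

```python
import sqlite3

# A leaf can hang off more than one twig, so a separate link table
# joins the two -- this is the "linked but abstracted" arrangement.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TWIGS       (twigID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE LEAVES      (leafID INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE TWIG_LEAVES (twigID INTEGER, childLeafID INTEGER);
""")
con.execute("INSERT INTO TWIGS VALUES (1, 'twig-1')")
con.execute("INSERT INTO TWIGS VALUES (2, 'twig-2')")
con.execute("INSERT INTO LEAVES VALUES (10, 'shared leaf')")
# the same leaf attached to two different twigs:
con.executemany("INSERT INTO TWIG_LEAVES VALUES (?, ?)",
                [(1, 10), (2, 10)])

n = con.execute("SELECT count(*) FROM TWIG_LEAVES "
                "WHERE childLeafID = 10").fetchone()[0]
print(n)  # 2 -- one leaf, two parent twigs
```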

The problem we have is performance...

Forgetting the main part of the tree for a moment, let's jump 
straight to the leaves:

- We have queried all the way down to the twigs quite quickly, and we 
have found 200 twigs.  Each twig is connected to a few leaves.

in pseudocode:

for (z = 0; z < numTwigs; z++)   // there are 200 twigs
{
    sqlite3_get_table("select * from LEAVES where leafID = theTwig[z].childLeafID");
    // we end up performing this query 200 times, once for each parent twig
}

Each select query on our 15000-entry LEAVES table (whose records have 
about 30 fields each, most of which we want) may take 100 ms, so the 
whole loop takes 20 seconds, which is not acceptable in our 
application.
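To make the pattern concrete, here is a stripped-down, runnable 
version of what we do (Python against a throwaway in-memory database; 
the real code is C, and the table contents here are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LEAVES (leafID INTEGER, payload TEXT)")
con.executemany("INSERT INTO LEAVES VALUES (?, ?)",
                [(i, "leaf %d" % i) for i in range(15000)])

# our 200 twigs, each pointing at one child leaf (made-up ids)
twig_child_ids = list(range(0, 2000, 10))   # 200 ids

# one SELECT per twig -- 200 separate trips through the query engine,
# and with no index on leafID each one scans the whole table
rows = []
for child_id in twig_child_ids:
    rows += con.execute(
        "SELECT * FROM LEAVES WHERE leafID = ?", (child_id,)).fetchall()
print(len(rows))  # 200 leaves collected
```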

I am certain that we are missing some important SQL language feature 
or SQLite optimisation, and I wanted to ask for advice.  We are 
already wrapping this code in BEGIN and COMMIT transactions 
(although see below for my concern that this may not be working as 
intended).

Profiling the code, it spends a lot of its time reading into the SQLite pager.

Other information:
- The app is a highly multithreaded server which spawns a new thread 
per query request.  Each connection needs to query the database, and 
our approach so far (in order to improve performance) is to open the 
database ONCE (on the first thread that needs it), call BEGIN, and 
leave it open.  A lock then allows later threads to request the 
database (we pass the global DB handle to the new thread); when a 
thread is finished it releases the lock and another thread can take 
the handle.  Once every minute we COMMIT to flush the database, back 
it up, then BEGIN another transaction.

I am slightly concerned that our BEGIN, COMMIT and use of the handle 
happen on different threads, and that this may somehow be defeating 
the transaction or preventing full use of the page cache.  Perhaps we 
should close the DB down properly and do the entire open/transaction/ 
close sequence locally in each new thread?  This is all a READ ONLY 
transaction, by the way.

We have considered copying the entire database into a memory-resident 
SQLite database at the start of this transaction (finding all the 
records ONCE takes less than a second, so we may be able to afford 
this drastic step), but I have read that memory-resident databases 
may not be much faster.
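As a sketch of what I mean (Python here because it is quick to show; 
the file name, schema and sizes are made up, and I believe the 
equivalent C-level online backup API is newer than 3.4.1, so in C we 
might have to ATTACH the file and INSERT...SELECT each table instead):

```python
import os
import sqlite3
import tempfile

# Build a small on-disk stand-in for our real database file.
path = os.path.join(tempfile.mkdtemp(), "tree.db")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE LEAVES (leafID INTEGER, payload TEXT)")
disk.executemany("INSERT INTO LEAVES VALUES (?, ?)",
                 [(i, "x") for i in range(1000)])
disk.commit()

# Copy the whole database into RAM in one pass, then query the copy.
mem = sqlite3.connect(":memory:")
disk.backup(mem)
disk.close()

n = mem.execute("SELECT count(*) FROM LEAVES").fetchone()[0]
print(n)  # 1000 -- all rows now memory-resident
```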

We have even considered reading all the records ONCE into our own 
cache and then performing the searches with our own code (hoping it 
would be faster), but I can't help feeling that we should be able to 
do this faster just using SQLite.

We have considered preparing the statement first and binding the 
parameter, and I wondered if that might help?  Would preparing:
"select * from LEAVES where leafID = boundParam1"
really make much difference compared to just running it each time?
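In the C API I understand this would mean calling sqlite3_prepare_v2 
once, then sqlite3_bind_int / sqlite3_step / sqlite3_reset per twig, 
so the SQL is compiled once rather than 200 times.  A quick sketch of 
the idea (Python again, with made-up data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE LEAVES (leafID INTEGER, payload TEXT)")
con.executemany("INSERT INTO LEAVES VALUES (?, ?)",
                [(i, "x") for i in range(100)])

# prepare-once / bind-many: the SQL text never changes, only the
# bound value, so the statement can be compiled once and re-executed
cur = con.cursor()
total = 0
for child_id in (3, 7, 42):
    cur.execute("SELECT * FROM LEAVES WHERE leafID = ?", (child_id,))
    total += len(cur.fetchall())
print(total)  # 3 -- one matching leaf per bound id
```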

We are typically using a Quad 3.0GHz processor Macintosh so we have 
plenty of horsepower.  I can't really believe that we should accept 
that this process will take as long as 20 seconds.  I am convinced it 
can be massively optimised.

There are so many options, each of which would require quite a lot 
of effort to prototype and test, that I wondered what wisdom any 
hardened SQLite folk could offer.  What is the best approach to 
attack this problem?

Thanks for any experience you can share..

Mark

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users