Hi

We are using SQLite for a fairly big (but simple) calculation, and have some problems when creating an index on a database with 600.000.000 rows. Specifically it has not ended even after 5 days of running. We have done it successfully on 25% of the full data base, and are now wondering what we can do.

Details:
We have a very simple database, just two columns VAR1 and VAR2 which are both integers. Every combination of (VAR1, VAR2) is unique. We want to add an index by using the following command
 CREATE UNIQUE INDEX on DATABASE (VAR1, VAR2);
After creating the index we have to do appr. 100.000.000 lookups on (VAR1, VAR2). We have benchmarked the results on a small(er) database with 145.000.000 rows where the indexing takes around 1 hour and the speed of the lookup is _very_ acceptable. However when running the indexing command on the full database it does not terminate even after 5-6 days (unfortunately our server got rebooted while it was running). We are wondering on the rather drastic increase in run time when going from 145.000.000 rows to 600.000.000 rows.

?: does the unique keyword matter, ie. does it make lookups faster and does it slow down the index creation. ?: does the order of the data matter a lot - right now it is partially sorted in that all combinations involving a single value of VAR2 (not VAR1) are next to each other. ?: is it possible to enable some kind of logging/debugging mode so we can monitor progression and see how it is doing.

We run the program on an 2600mhz Opteron with 16GB of ram and so far it seems we are not getting killed by I/O or RAM: using top frequently shows around 99% cpu usage and less than 40% memory usage. We are using the following program (including all pragmas):

sqlite3 blasthits.db "pragma default_cache_size = 12000000;"
sqlite3 blasthits.db "pragma synchronous = off;
  pragma temp_store = 2;
  create unique index probeseqIDX on blasthits (probeidx,prokMSA);"
(here blashits.db is the database name and probeidx, prokMSA are VAR1, VAR2).

?: Is there anything we can do to speed it up further?

Thanks a lot for any help, even if it is "it does not seem to be possible to improve the speed of this operation".

Jim and Kasper

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Reply via email to