If you have only one index, then pre-sorting your large dataset before
inserting, even with the default sqlite cache, yields roughly twice the
insert performance of inserting unsorted data with a huge cache.
That stands to reason, since you are essentially appending pages to the end
of the database file. (I'm assuming it really is an append, because the
timing of the pre-sorted inserts stays constant regardless of the number of rows.)
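To make the pre-sorting idea concrete, here is a minimal Python sketch
(table and column names are invented for illustration; this is not the
code used for the measurements above):

import random
import sqlite3

# Generate test rows, then pre-sort on the indexed column so each index
# insertion lands at the end of the B-tree instead of at a random page.
rows = [(random.randint(0, 10**9), "payload") for _ in range(100_000)]
rows.sort(key=lambda r: r[0])

con = sqlite3.connect("presorted.db")
con.execute("CREATE TABLE t (key INTEGER, data TEXT)")
con.execute("CREATE INDEX t_key ON t (key)")
with con:  # single transaction; commits on success
    con.executemany("INSERT INTO t (key, data) VALUES (?, ?)", rows)
con.close()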

But if you want to maintain more than one index and still get good insert
performance on huge datasets, there isn't much you can do beyond combining
the techniques already discussed (see the sketch after this paragraph), or
switching to another database or some other form of data storage altogether.
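One common combination, sketched below in Python with hypothetical table
names (a rough illustration of the general approach, not a script from this
thread): bulk-load into a bare table with an enlarged cache, and build the
secondary indexes only after the load, so each index is constructed in one
pass rather than row by row.

import random
import sqlite3

def generate_rows(n=100_000):
    # Hypothetical test data: two independently indexed columns.
    for _ in range(n):
        yield (random.randint(0, 10**6), random.randint(0, 10**6), "payload")

con = sqlite3.connect("bulk.db")
con.execute("PRAGMA cache_size = 100000")  # enlarge the page cache (value is a page count)
con.execute("CREATE TABLE t (a INTEGER, b INTEGER, data TEXT)")
with con:
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", generate_rows())
with con:
    # Secondary indexes are created only after the bulk load.
    con.execute("CREATE INDEX t_a ON t (a)")
    con.execute("CREATE INDEX t_b ON t (b)")
con.close()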

SQLite's virtual table mechanism might be useful in your case.
You can write custom code and a custom file format optimized for your
dataset and indexes, while still keeping the benefit of an SQL interface:

 http://www.sqlite.org/cvstrac/wiki?p=VirtualTables

This paper may give you some ideas as well:

 http://labs.google.com/papers/sawzall.html

----- Original Message ----
From: Peter De Rijk <[EMAIL PROTECTED]>
To: sqlite-users@sqlite.org
Sent: Monday, October 30, 2006 1:00:25 PM
Subject: Re: [sqlite] serious performance problems with indexes

Presorting is unfortunately not always possible, e.g. when 
there are several indexes (with different orders). But increasing 
the cache seems a good solution for this anyway (and is less 
cumbersome than presorting or staging). Your ideas set me thinking 
about why, even with the larger cache, we still do worse than mysql on large 
datasets, so I did some further experimenting (code at the end of the mail; 
all tests use the larger cache size).
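(For reference, the "larger cache" is set per connection with a PRAGMA. A
minimal sketch follows; the actual value used in these tests is not shown
in this excerpt, so the number below is only an example. In SQLite versions
of this era cache_size is interpreted as a page count.)

import sqlite3

con = sqlite3.connect("test.db")
con.execute("PRAGMA cache_size = 500000")  # example value: 500k pages of cache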

I guess that it is not only the number of entries that matters; there are 
at least two different factors affecting performance: the number of 
distinct values actually present in the index, and the number of rows 
that share each value.
The number of distinct values has an important influence. If there is 
only a limited number of possible values, performance stays nearly 
constant regardless of the dataset size (and thus remains faster than mysql).

For larger numbers of possible values the behaviour becomes a bit 
unexpected (to me at least): the time needed for random inserts rises 
roughly linearly with the dataset size, until it reaches a certain point, 
after which the insert time stays the same regardless of dataset 
size. The height of this plateau appears to be linearly related to the 
number of possible values.
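(The test code referred to above is not included in this excerpt. As a
rough illustration only, not the author's actual script, an experiment of
this shape could look like the following Python sketch; table names,
row counts and value counts are invented.)

import random
import sqlite3
import time

def time_inserts(n_rows, n_values):
    # Insert n_rows random values drawn from n_values possibilities into an
    # indexed column, and return the elapsed wall-clock time.
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA cache_size = 100000")
    con.execute("CREATE TABLE t (v INTEGER)")
    con.execute("CREATE INDEX t_v ON t (v)")
    rows = [(random.randrange(n_values),) for _ in range(n_rows)]
    start = time.perf_counter()
    with con:
        con.executemany("INSERT INTO t (v) VALUES (?)", rows)
    con.close()
    return time.perf_counter() - start

for n_rows in (10_000, 100_000, 500_000):
    for n_values in (100, 10_000, 1_000_000):
        print(n_rows, n_values, round(time_inserts(n_rows, n_values), 2))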







