If you have only one index, then pre-sorting your large dataset before inserting it with the default SQLite cache will give roughly twice the insert performance of inserting unsorted data with a huge cache. This stands to reason, since you're essentially appending pages to the end of the database file. (I'm assuming it's just an append, since the timing of the pre-sorted inserts is constant regardless of the number of rows.)
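As a rough illustration (a minimal sketch, not code from this thread; the table name, column, key range, and row count are all made up), you can see the effect by loading the same random keys unsorted and then pre-sorted into a table with a single index:

# Minimal sketch: compare inserting random keys unsorted vs. pre-sorted
# into a table with one index. Names and sizes are invented for illustration.
import random, sqlite3, time

def load(rows, presort):
    con = sqlite3.connect(":memory:")   # use a file path to see the on-disk behaviour
    # con.execute("PRAGMA cache_size = 200000")  # uncomment to test the "huge cache" case
    con.execute("CREATE TABLE t (k INTEGER)")
    con.execute("CREATE INDEX t_k ON t (k)")
    keys = [random.randrange(10**9) for _ in range(rows)]
    if presort:
        keys.sort()                      # pre-sort on the indexed column
    t0 = time.time()
    with con:                            # single transaction, committed on exit
        con.executemany("INSERT INTO t (k) VALUES (?)", ((k,) for k in keys))
    return time.time() - t0

for presort in (False, True):
    print("presorted" if presort else "unsorted", load(200000, presort))

With a file-backed database the pre-sorted run should show the append-like behaviour described above, since each new key lands at the right-hand edge of the index b-tree.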
But if you wish to maintain more than one index and still have good insert performance with huge datasets, then there's not much else you can do other than using some combination of the techniques already discussed, or using another database or some other form of data storage altogether.

SQLite's virtual table mechanism might be useful in your case. You can write custom code and a custom file format that are optimal for your dataset and indexes and still have the benefit of an SQL interface:

http://www.sqlite.org/cvstrac/wiki?p=VirtualTables

This paper may give you some ideas as well:

http://labs.google.com/papers/sawzall.html

----- Original Message ----
From: Peter De Rijk <[EMAIL PROTECTED]>
To: sqlite-users@sqlite.org
Sent: Monday, October 30, 2006 1:00:25 PM
Subject: Re: [sqlite] serious performance problems with indexes

Presorting is unfortunately not always possible, e.g. when there are several indexes (with different sort orders). But increasing the cache seems a good solution for this anyway (and is less cumbersome than presorting or staging).

Your ideas set me thinking on why, even with the larger cache, we still do worse than MySQL on large datasets, and I did some further experimenting (code at the end of the mail; all tests with the larger cache size). I guess that it is not only the number of entries that is important, but that there are at least two different factors affecting performance: the number of distinct values actually present in the index, and the number of rows per value.

The number of values has an important influence. If there is only a limited number of possible values, performance stays nearly identical regardless of the data size (and thus faster than MySQL). For larger numbers of possible values the behaviour gets a bit unexpected (to me at least): the time needed for random inserts rises roughly linearly with the data size until it reaches a certain point, after which the insert time remains the same regardless of dataset size. The height of this plateau seems to be related to the number of possible values in a linear fashion.