Hi,

This post is a question directed to D. Richard Hipp:

I have been using SQLite for three years in a record linkage software 
package I have developed. My organization recently had to use the 
package to perform daily linkage of large administrative government 
registers (up to 20 million records each). During the linkage process, 
auxiliary tables containing record "fingerprints" must be created, and 
two of their columns must be indexed.
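
For concreteness, the step in question boils down to something like 
this (schematic SQL; the real table and column names differ):

    CREATE TABLE fingerprints (
        rec_id INTEGER PRIMARY KEY,  -- id of the source record
        fp1    TEXT,                 -- first fingerprint column
        fp2    TEXT                  -- second fingerprint column
    );
    -- ... bulk inserts of up to 20 million rows happen here ...
    CREATE INDEX idx_fp1 ON fingerprints(fp1);
    CREATE INDEX idx_fp2 ON fingerprints(fp2);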

SQLite provides decent indexing times for such tables with up to 1M 
rows, but beyond that size the (already well-discussed) critical 
slowdown of indexing performance due to poor disk locality kicks in. 
The only workaround I could imagine to ease the problem would be to 
duplicate the auxiliary table and load pre-sorted rows into it, with 
the sort key being the column I intend to index. This is unfortunately 
too costly in terms of disk space.
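
In SQL terms, that workaround would look roughly like this (again with 
schematic names). Because the rows arrive already sorted on fp1, the 
subsequent index build touches mostly contiguous pages, but the 
table's disk footprint is doubled:

    CREATE TABLE fingerprints_sorted AS
        SELECT * FROM fingerprints ORDER BY fp1;
    CREATE INDEX idx_sorted_fp1 ON fingerprints_sorted(fp1);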

I therefore had to develop an alternative data-source type (based on 
flat files) so that my system could handle big files efficiently. This 
is a pity, since SQLite provides great features I would still like to 
be able to rely on when dealing with large files.

Now my question: in the "To do" section of SQLite's wiki, you mention 
"Develop a new sort implementation that does much less disk seeking. 
Use to improve indexing performance on large tables." I have been 
watching this entry for three years, but nothing concrete seems to 
have happened on this front in the meantime. Do you have any idea 
whether (and when) you will work on this? Can I reasonably hope that 
the situation will improve within the next two years? This has become 
a critical factor in deciding my future development strategy for this 
product.

Thanks in advance for any useful information.

Jerome

