Hi, This post is a question directed to D. Richard Hipp :
I have been using SQLite for 3 years in a records linkage software package I have developed. My organization recently had to use the package to perform daily linkage of large administrative governmental registers (up to 20 million records each). During the linkage process, auxiliary tables containing records "fingerprints" must be created, and two columns be indexed in them. SQLite provides decent indexing times for such tables with up to 1M rows, but beyond this size the (already well-discussed) critical slowing down of indexing performance due to disk nonlocality kicks in. The only workaround I could imagine to ease the problem would be to duplicate the auxiliary table and load pre-sorted rows in it, with sort key being the column I intend to index on. This is unfortunately too costly in terms of disk space used. I therefore had to develop an alternate datasource type (based on flat files) in order for my system to be able to efficiently handle big files. Which is a pity since SQLite provides great features I still would like to be able to rely upon when dealing with large files. Now my question: in the "To do" section of SQLite's wiki, you mention "Develop a new sort implementation that does much less disk seeking. Use to improve indexing performance on large tables.". I have been seeing this entry for 3 years but nothing concrete seems to have happened on this front in the meantime. Do you have any idea about if (and when) you will work on this in the future ? Can I nourish reasonable hopes that the situation will improve on this aspect within the next 2 years ? This really has become a critical factor for me to decide on my future development strategy with this product. Thanks in advance for any useful information. Jerome _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users