On Sat, Jul 18, 2009 at 12:22:08AM -0700, Brian Dantes scratched on the wall:

> I have a largish DB around 1GB in size. There is a table with 5 million
> rows in it that has a 3-key index on it. This database file is
> fragmented -- to what degree I'm not sure.
>
> Using sqlite 3.6.14, dropping and recreating the index under WinXP Pro
> (on a local disk) with no other activity takes about 90 seconds. With
> the same database on Red Hat Linux 64-bit with no other activity and a
> local disk, the index recreation takes almost 30 *minutes*. The
> activity is completely I/O bound. If I vacuum the database, the Windows
> time drops to 70 seconds, and the Linux time drops to 7 minutes.
Assuming your data import is semi-random (in terms of index order), and assuming you haven't raised the default cache size, chances are you're thrashing the page cache. On nearly any system, bumping the page cache size up anywhere from 2x to 100x will almost always have a dramatic effect on index build speed.

If you're using a more modern version of SQLite, the Windows build defaults the page size to the allocation block size of the file system, which on most Windows systems is 4K. The UNIX system is very likely using the default 1K page size, unless you've changed it. Since the page cache is sized in pages, not bytes, the page cache on the Windows box is effectively 4x bigger. An index also makes fairly efficient use of page space (assuming none of your indexed fields are large strings), so the Windows box is doing roughly a quarter of the I/O. Matching pages to file-system blocks may also help raw I/O performance.

There is also the question of how much memory the two systems have and how much the OS can use for its file cache, but I'm guessing that's not as critical as the difference in page size.

The other issue is disk speed. Because SQLite waits for the drive to report that data has been physically written to the platters, there is a direct link between disk rotation speed and I/O performance. If one system has a 5400 RPM disk and the other a 7200 RPM disk, you're likely to notice a difference.

> Conversely, with this same table and index starting from empty, if I
> start inserting rows with the index in place, on Windows the insertion
> speed drops dramatically after about 100-200K rows and takes about 6-7
> *hours* to complete. On Linux 64-bit, the same experiment takes less
> than an hour to complete and the insertion speed seems fairly constant.

It sounds like you're not using transactions to batch the inserts, so each row is paying for its own implicit transaction and the disk syncs that go with it.
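As a rough illustration of the page-size and cache-size tuning above (shown via Python's sqlite3 module; the file name is hypothetical), note that page_size only takes effect before the database has content, or after a VACUUM rebuilds the file, while cache_size can be raised at any time:

```python
import sqlite3

# Hypothetical database file. PRAGMA page_size only applies to a new
# database, or takes effect when VACUUM rewrites an existing one.
conn = sqlite3.connect("bulk.db")
conn.execute("PRAGMA page_size = 4096")    # match the file system block size
conn.execute("PRAGMA cache_size = 20000")  # counted in pages, not bytes
conn.execute("VACUUM")                     # rebuilds the file at the new page size

print(conn.execute("PRAGMA page_size").fetchone()[0])   # 4096
print(conn.execute("PRAGMA cache_size").fetchone()[0])  # 20000
conn.close()
```

With a 4K page size, cache_size = 20000 is roughly 80MB of cache, which is the kind of 10x-40x bump over the default that tends to pay off on a 5-million-row index build.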
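The difference batching makes can be sketched like this (Python's sqlite3 module; the table and data are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.isolation_level = None  # autocommit; we issue BEGIN/COMMIT ourselves

rows = [(i, i * 2, i * 3) for i in range(100000)]

# Slow pattern: each INSERT outside a transaction gets its own implicit
# transaction and, on a real disk, its own sync:
#   for row in rows:
#       conn.execute("INSERT INTO t VALUES (?, ?, ?)", row)

# Fast pattern: batch a few thousand rows per transaction.
BATCH = 5000
for start in range(0, len(rows), BATCH):
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                     rows[start:start + BATCH])
    conn.execute("COMMIT")

print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 100000
```

On an in-memory database the two patterns are close; on a rotating disk, where every commit waits for the platters, the batched version is the difference between hours and minutes.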
For bulk inserts, it is much more efficient to batch anywhere from 100 to 10,000+ inserts into a single transaction (BEGIN; INSERT...; INSERT...; ... COMMIT;) to reduce the I/O load. That should reduce your import times on both platforms, and I'd expect the two times to become much more similar once you use transactions.

I'm a lot more surprised by this set of results, but I'm guessing the root of it is still the difference in page size. There might also be some issue on the Windows platform, uncovered by the different access pattern, that further limits performance. Try bumping the page size on the Linux DB up to 4K and see what happens. Also try using transactions for the inserts, and verify the disk speeds are similar (although using transactions should make that matter much less).

   -j

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs. We have
 a protractor." "I'll go home and see if I can scrounge up a ruler and
 a piece of string." --from Anathem by Neal Stephenson

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users