If you pre-sort the data prior to inserting it into sqlite, you will see much better timings for both the default cache size and the larger cache size.
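The effect can be sketched with Python's sqlite3 module (a minimal illustration, not the Tcl harness used in this thread; the table and index names are borrowed from the benchmark script): inserting rows pre-sorted on the indexed column b fills the index pages roughly in order, which keeps page access local. Whether the pre-sorted run actually comes out faster on a given machine depends on how large the dataset is relative to the page cache.

```python
import random
import sqlite3
import time

def timed_insert(rows):
    # Fresh in-memory database with the same schema as the benchmark script.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t1(a INTEGER, b INTEGER)")
    db.execute("CREATE INDEX tq_b ON t1(b)")
    start = time.perf_counter()
    with db:  # one transaction, like the BEGIN/COMMIT batches in the script
        db.executemany("INSERT INTO t1 VALUES(?, ?)", rows)
    elapsed = time.perf_counter() - start
    db.close()
    return elapsed

random.seed(1)
rows = [(i, random.randrange(500000)) for i in range(200000)]

t_random = timed_insert(rows)
# Pre-sort on the indexed column so index pages are written in key order.
t_sorted = timed_insert(sorted(rows, key=lambda r: r[1]))
print(f"random order: {t_random:.3f}s, pre-sorted: {t_sorted:.3f}s")
```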
When I run your script (unmodified, except for the removal of MySQL) up to 400000:

tablesize  sqlite   sqlite-largecache
0          1.956    1.956
20000      2.064    2.079
40000      2.095    2.111
60000      2.173    2.173
80000      2.189    2.283
100000     2.345    2.298
120000     2.345    2.360
140000     2.407    2.361
160000     2.470    2.423
180000     2.548    2.501
200000     2.564    2.439
220000     2.830    2.439
240000     2.876    2.907
260000     2.829    2.564
280000     3.423    3.533
300000     4.939    3.564
320000     7.236    3.736
340000     7.283    3.751
360000     10.611   3.767
380000     11.142   3.845
400000     13.736   3.798

When I make the following change to your script to simulate an ordered data set (okay, it's cheating slightly - no pre-sort need be performed here):

    # set r [expr {int(rand()*500000)}]
    set r [expr {($num+$i)*5}]

tablesize  sqlite   sqlite-largecache
0          1.878    1.894
20000      1.925    1.923
40000      1.923    1.923
60000      1.923    1.954
80000      1.970    1.923
100000     1.923    1.908
120000     1.923    1.970
140000     1.940    2.383
160000     2.048    1.908
180000     1.923    2.002
200000     1.923    1.923
220000     1.939    1.923
240000     1.923    1.923
260000     1.939    1.923
280000     1.939    1.939
300000     1.954    1.939
320000     1.939    1.923
340000     1.970    1.954
360000     1.939    1.954
380000     1.923    1.939
400000     1.970    1.939

I wonder if other databases pre-sort their batch inserts by index order to achieve more consistent insert speed with datasets that exceed the cache size.

----- Original Message ----
From: Peter De Rijk <[EMAIL PROTECTED]>
To: sqlite-users@sqlite.org
Sent: Saturday, October 28, 2006 10:40:03 AM
Subject: Re: [sqlite] serious performance problems with indexes

I have checked at a more fine-grained level (results and code at the end of the mail), and it seems indeed that the extreme growth of time needed for inserts only starts at around 160000 rows (on my system). Before that, it also rises, but much more slowly. As this indeed looked like a cache effect, I tried setting the default_cache_size to something big (20000).
This was enough to remove the bend (at least up to 2000000 rows), so that the increase in time kept going at the same slower pace as for the smaller data sizes. Curiously, mysql, which I included as a reference, manages to keep nearly the same insert time even for the largest data sets. Using the large cache size, sqlite is faster than mysql until about 700000 rows. At 2000000 rows mysql is about 3 times faster than sqlite with the large cache (vs more than 20 times faster than sqlite with the default cache size).

------- results -------

tablesize  sqlite  sqlite-largecache  mysql
0          1.12    1.04    2.86
20000      1.09    1.17    2.89
40000      1.14    1.15    3.27
60000      1.23    1.2     3.18
80000      1.47    1.25    3.01
100000     1.3     1.32    2.94
120000     1.4     1.34    2.94
140000     2.06    1.38    3
160000     2.41    1.46    2.99
180000     3.09    1.54    3.03
200000     3.31    1.62    3.04
220000     3.84    1.72    3.06
240000     4.22    1.7     3.03
260000     4.23    1.76    3.06
280000     4.83    1.94    2.98
300000     6.27    2.08    3.07
320000     9.02    2.06    3.11
340000     9.48    1.94    2.88
360000     10      2.13    3.04
380000     10.6    2.21    3.13
400000     10.72   2.35    3.05
420000     11.35   2.24    3.11
440000     11.55   2.49    3.05
460000     11.75   2.5     3.07
480000     16.89   2.39    3.1
500000     17.81   2.59    3.09
520000     19.22   2.4     3.03
540000     19.35   2.77    3.05
560000     19.44   2.59    3.1
580000     19.87   2.67    3.08
600000     21.47   2.88    3.09
620000     20.82   2.74    3.08
640000     21.55   3       3.11
660000     25.17   3.09    3.12
680000     29.89   2.84    3.16
700000     28.48   3.25    3.09
720000     28.62   3.31    3.2
740000     28.69   3.23    3.07
760000     30.34   3.59    3.1
780000     30.27   3.64    3.13
800000     30.57   3.37    3.14
820000     31.41   3.57    3.06
840000     31.89   3.89    3.09
860000     33.77   3.57    3.05
880000     36.01   3.87    3.1
900000     37.58   4.04    3.07
920000     40.59   3.68    3.04
940000     40.9    3.72    3.09
960000     41.45   4.24    3.1
980000     42.05   4.28    3.13
1000000    41.11   4.09    3.03
1020000    42.29   4.63    3.11
1040000    42.09   4.81    3.17
1060000    42.8    4.34    3.14
1080000    43.82   4.43    3.02
1100000    43.01   4.5     3.12
1120000    44.08   4.84    3.1
1140000    45.4    5.08    3.16
1160000    47.51   4.66    3.13
1180000    47.46   5.02    3.17
1200000    48.21   5.42    3.17
1220000    49.09   5.33    3.21
1240000    49.65   5.12    3.1
1260000    50.55   5.3     3.11
1280000    50.56   5.8     3.17
1300000    52.35   5.7     3.17
1320000    53.67   5.43    3.14
1340000    52.38   5.79    3.26
1360000    51.44   6.03    3.18
1380000    52.69   6.16    3.08
1400000    53.88   5.67    3.04
1420000    52.61   5.86    3.2
1440000    53.61   5.92    3.01
1460000    52.98   5.98    3.08
1480000    54.74   6.27    3.18
1500000    55.4    6.81    3.18
1520000    58.45   6.54    3.12
1540000    57.14   6.65    3.12
1560000    58.92   6.97    3.17
1580000    58.17   6.86    3.22
1600000    58.29   6.83    3.15
1620000    58.19   6.6     3.18
1640000    58.77   7.3     3.16
1660000    60.59   7.44    3.22
1680000    61.12   7.14    3.18
1700000    61.25   7.41    3.33
1720000    62.2    7.18    3.21
1740000    61.48   7.95    3.02
1760000    62.63   7.91    3.17
1780000    62.58   7.52    3.19
1800000    62.28   7.56    3.14
1820000    64.2    7.62    3.33
1840000    64.18   8.6     3.17
1860000    64.34   8.71    3.33
1880000    63.93   8.44    3.22
1900000    62.98   8.13    3.3
1920000    63.9    8.45    3.29
1940000    64.85   8.36    3.21
1960000    68.26   10.75   4.81
1980000    67.81   9.46    3.17

---------- code ----------

proc runtest {sqlfile} {
  set result {}
  set delay 1000
  exec sync; after $delay
  set t [time "exec sqlite3 def.db <$sqlfile" 1]
  set t [expr {[lindex $t 0]/1000000.0}]
  lappend result [format %.3f $t]
  exec sync; after $delay
  set t [time "exec sqlite3 lc.db <$sqlfile" 1]
  set t [expr {[lindex $t 0]/1000000.0}]
  lappend result [format %.3f $t]
  exec sync; after $delay
  set t [time "exec mysql -f drh <$sqlfile" 1]
  set t [expr {[lindex $t 0]/1000000.0}]
  lappend result [format %.3f $t]
  return $result
}

# Initialize the environment
#
expr srand(1)
catch {exec /bin/sh -c {rm -f s*.db}}
set fd [open clear.sql w]
puts $fd {
  drop table t1;
  drop table t2;
}
close $fd
catch {exec mysql drh <clear.sql}
set fd [open 2kinit.sql w]
puts $fd {
  PRAGMA default_cache_size=2000;
}
close $fd
exec sqlite3 def.db <2kinit.sql
set fd [open nosync-init.sql w]
puts $fd {
  PRAGMA default_cache_size=20000;
}
close $fd
exec sqlite3 lc.db <nosync-init.sql

# initialize
set fd [open test.sql w]
puts $fd "BEGIN;"
puts $fd "CREATE TABLE t1(a INTEGER, b INTEGER);"
puts $fd "CREATE INDEX tq_b on t1(b);"
puts $fd "COMMIT;"
close $fd
runtest test.sql

set o [open results.csv w]
puts $o "tablesize\tsqlite\tsqlite-largecache\tmysql"
set step 20000
for {set num 0} {$num < 2000000} {incr num $step} {
  puts $num
  set fd [open test.sql w]
  puts $fd "BEGIN;"
  for {set i 1} {$i<=$step} {incr i} {
    set r [expr {int(rand()*500000)}]
    puts $fd "INSERT INTO t1 VALUES([expr {$num+$i}],$r);"
  }
  puts $fd "COMMIT;"
  close $fd
  puts $o $num\t[join [runtest test.sql] \t]
  flush $o
}
close $o

--
Dr Peter De Rijk                        E-mail: [EMAIL PROTECTED]
Bioinformatics Unit                     Tel. +32 3 265 10 08
Department of Molecular Genetics VIB8   Fax. +32 3 265 10 12
University of Antwerp                   http://www.molgen.ua.ac.be/
Universiteitsplein 1
B-2610 Antwerpen, Belgium

The glass is not half full, nor half empty. The glass is just too big.

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------