On Sunday 29 October 2006 18:47, Joe Wilson wrote:
> The last test that simulated pre-sorting 2 million rows
> in index order prior to insert may show optimal insert speed
> since it's only appending pages to the table and the index,
> but it may not be realistic, since you may not have sufficient
> memory to pre-sort all data in memory prior to insert.
> (Although if you have the memory to spare, this is the way to go.)
Presorting is unfortunately not always possible, e.g. when there are several
indexes with different orderings. But increasing the cache seems a good
solution for this anyway (and is less cumbersome than presorting or staging).

Your ideas set me thinking about why, even with the larger cache, we still do
worse than MySQL on large datasets, so I did some further experimenting (code
at the end of the mail; all tests were run with the larger cache size). I
guess it is not only the number of entries that is important: at least two
different factors affect the performance, namely the number of distinct
values actually present in the index, and the number of rows sharing each
value.

The number of distinct values has an important influence. If there is only a
limited number of possible values, performance stays nearly identical
regardless of the data size (and thus faster than MySQL). For larger numbers
of possible values the behaviour gets a bit unexpected (to me at least): the
time needed for random inserts rises roughly linearly with the data size
until it reaches a certain point, after which the insert time remains the
same regardless of dataset size. The height of this plateau seems to be
related to the number of possible values in a linear fashion. Any ideas?
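For anyone who wants to reproduce the pre-sorting trick from the quoted mail
without the full Tcl harness, here is a minimal sketch using Python's sqlite3
module (the table name t, index name t_b, 10000-row size, and 100 distinct
values are placeholders chosen for this illustration, not the numbers from my
tests):

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA cache_size=20000")  # larger page cache, as in my tests
con.execute("CREATE TABLE t(a INTEGER, b INTEGER)")
con.execute("CREATE INDEX t_b ON t(b)")

# Generate rows with a limited number of distinct indexed values.
rows = [(i, random.randrange(100)) for i in range(10000)]

# Pre-sort on the indexed column b, so the index only appends pages
# instead of inserting into random positions.
rows.sort(key=lambda r: r[1])
with con:  # single transaction, as in the BEGIN/COMMIT batches below
    con.executemany("INSERT INTO t VALUES(?,?)", rows)

print(con.execute("SELECT count(*) FROM t").fetchone()[0])  # 10000
```

Timing this variant against an unsorted insert of the same rows shows the
difference Joe described, at least once the index no longer fits in the
cache.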
---------- code ----------
proc runtest {sqlfile} {
  set result {}
  set delay 1000
  exec sync; after $delay
  set t [time "exec sqlite3 lc.db <$sqlfile" 1]
  set t [expr {[lindex $t 0]/1000000.0}]
  lappend result [format %.3f $t]
  return $result
}

set numvlist {2 10 100 200 300 400 500 600 700 800 1000 2000 3000 5000 7000 10000}

# Initialize the environment
#
expr srand(1)
catch {exec /bin/sh -c {rm -f *.db}}

set fd [open clear.sql w]
foreach numv $numvlist {
  puts $fd "drop table t$numv;"
}
close $fd
catch {exec mysql drh <clear.sql}

set fd [open 2kinit.sql w]
puts $fd {
  PRAGMA default_cache_size=2000;
}
close $fd
exec sqlite3 def.db <2kinit.sql

set fd [open nosync-init.sql w]
puts $fd {
  PRAGMA default_cache_size=20000;
}
close $fd
exec sqlite3 lc.db <nosync-init.sql

# initialize
set fd [open test.sql w]
foreach numv $numvlist {
  puts $fd "BEGIN;"
  puts $fd "CREATE TABLE t${numv}(a INTEGER, b INTEGER);"
  puts $fd "CREATE INDEX tq_b${numv} on t${numv}(b);"
  puts $fd "COMMIT;"
}
close $fd
runtest test.sql

set o [open results.csv w]
set temp "tablesize"
foreach numv $numvlist {
  # append temp \tsqlite$numv\tsqlite-lc$numv\tmysql$numv
  append temp \t$numv
}
puts $o $temp

# First batch: sequential values for column b
set step 20000
foreach numv $numvlist {
  set num 0
  puts "tablesize: $num numv: $numv"
  set fd [open test.sql w]
  puts $fd "BEGIN;"
  set r 0
  for {set i 1} {$i<=$step} {incr i} {
    puts $fd "INSERT INTO t$numv VALUES([expr {$num+$i}],$r);"
    incr r
    if {$r == $numv} {set r 0}
  }
  puts $fd "COMMIT;"
  close $fd
  eval lappend resultline [runtest test.sql]
}
puts $o $num\t[join $resultline \t]
flush $o

# Subsequent batches: random values for column b
puts start
for {} {$num < 2000000} {incr num $step} {
  set resultline {}
  foreach numv $numvlist {
    puts "tablesize: $num numv: $numv"
    set fd [open test.sql w]
    puts $fd "BEGIN;"
    for {set i 1} {$i<=$step} {incr i} {
      set r [expr {int(rand()*$numv)}]
      puts $fd "INSERT INTO t$numv VALUES([expr {$num+$i}],$r);"
    }
    puts $fd "COMMIT;"
    close $fd
    eval lappend resultline [runtest test.sql]
  }
  puts $o $num\t[join $resultline \t]
  flush $o
}
close $o
--
Dr Peter De Rijk                       E-mail: [EMAIL PROTECTED]
Bioinformatics Unit                    Tel. +32 3 265 10 08
Department of Molecular Genetics VIB8  Fax. +32 3 265 10 12
University of Antwerp                  http://www.molgen.ua.ac.be/
Universiteitsplein 1
B-2610 Antwerpen, Belgium

The glass is not half full, nor half empty. The glass is just too big.