Avner Levy wrote:
Hi,
I've written the following program to test SQLite performance for a specific scenario I have.
I've used most of the tricks I've found, but the performance still becomes unacceptable once the database gets big.
The performance problem happens, of course, only when the indexes are defined.
Since I need the indexes, is there any other option that could make it faster?
When the database gets big, the insert rate drops to about 50 rows per second.
Thanks in advance.


* It seems that the larger the DB gets, the more pages the journal copies out of the database into the journal file, which kills performance.


Each transaction in your test involves 30000 INSERTs into a single table defined (roughly) as follows:

    CREATE TABLE test(a,b);
    CREATE INDEX idx1 ON test(a);
    CREATE INDEX idx2 ON test(b);

In these INSERTs, the values for test.a are non-decreasing.  That means
that new entries into the "test" table and into the "idx1" index always
go into the same place in their B*Trees - specifically at the end.
But the values for test.b are random, which means that entries into
idx2 are distributed throughout its B*Tree.  Because changes to
test and idx1 are localized, only a handful of disk blocks are changed
(many others are added, but few existing ones are changed) but the
changes to idx2 are not localized, which means that nearly every disk
block associated with idx2 must be changed.  Modifying the thousands
of disk blocks associated with idx2 is what is taking so long.  I do
not know anything I can do inside SQLite to make it go any faster.  I
do not know of any other way of creating an index that could work
around this issue.
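The effect is easy to reproduce outside of Tcl as well. Here is a minimal Python sketch (my illustration, not part of the original test; it uses only the standard sqlite3 module, and borrows the idx2 name from the schema above) that times the same number of inserts into an indexed column, once with non-decreasing values and once with random ones:

```python
# Sketch: compare localized vs. non-localized index maintenance.
# Assumes only the Python standard library; the table and index
# names are illustrative.
import os
import random
import sqlite3
import tempfile
import time

def timed_insert(values):
    """Insert the given values into an indexed column of a fresh
    on-disk database; return the elapsed time in seconds."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE test(b INTEGER)")
    con.execute("CREATE INDEX idx2 ON test(b)")
    t0 = time.perf_counter()
    with con:  # one transaction for the whole batch
        con.executemany("INSERT INTO test VALUES(?)", ((v,) for v in values))
    elapsed = time.perf_counter() - t0
    con.close()
    os.remove(path)
    return elapsed

n = 100_000
ordered = range(n)  # non-decreasing: new entries always land at the end of the B*Tree
scattered = [random.randrange(1_000_000) for _ in range(n)]  # spread over the whole B*Tree

t_ordered = timed_insert(ordered)
t_scattered = timed_insert(scattered)
print(f"ordered:   {t_ordered:.2f}s")
print(f"scattered: {t_scattered:.2f}s")
```

On a disk-backed database the scattered run is typically noticeably slower, for exactly the reason described above: the random keys dirty pages throughout the index's B*Tree instead of only the last one.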

If you disable idx2, you will see that the inserts go much faster.  I
also observe that the journal file is about 1/3rd the size of the database,
which is consistent with the observation that every disk block associated
with idx2 needs to be changed.  My TCL test script for this situation
is appended to this message.

If you really are doing millions of INSERTs prior to doing any SELECTs,
you might consider deferring the creation of idx2 until after all the
data has been put into the database.  Generally speaking, it is a little
faster to create an index for existing data than it is to create the
index at the same time that the data is being inserted.  But the
difference is not that great.  And, of course, this won't help if in
reality you need to do some SELECTs along the way...
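A sketch of that deferred-index ordering (again my illustration in Python with the standard sqlite3 module; the table and index names follow the schema above, and the row count is arbitrary):

```python
# Sketch: bulk-load first, then build the non-localized index in
# one pass, instead of maintaining it row by row during the inserts.
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test(a INTEGER, b INTEGER)")
con.execute("CREATE INDEX idx1 ON test(a)")  # localized: cheap to maintain during the load

rows = [(i, random.randrange(1_000_000)) for i in range(100_000)]
with con:  # one transaction for the whole load
    con.executemany("INSERT INTO test VALUES(?,?)", rows)

# Only now create the index on the random column.
con.execute("CREATE INDEX idx2 ON test(b)")

count = con.execute("SELECT COUNT(*) FROM test WHERE b < 1000000").fetchone()[0]
print(count)  # → 100000
con.close()
```

Of course this only pays off if, as noted above, no SELECTs on test.b are needed until after the load is complete.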

I'm curious to know how other database engines deal with this problem.
Have you tried a similar experiment on MySQL, or PostgreSQL, or even
Oracle?  How did they perform?

Can any readers suggest ways that I have not thought of for making
large numbers of non-localized INSERTs go faster?
--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565



# Remove any previous database and journal so the test starts fresh.
file delete -force test.db
file delete -force test.db-journal
sqlite db test.db
db eval "CREATE TABLE test(parent_id INTEGER, age INTEGER)"
db eval "CREATE INDEX test_parent_id ON test(parent_id)"
db eval "CREATE INDEX test_age ON test(age)"
db eval "PRAGMA synchronous=OFF"
set start [clock clicks -milli]
db eval "BEGIN"
set fmt "%5d: %8.2f %6dKB %7dKB"
for {set i 0} {$i<10000} {incr i} {
  # Insert 300 rows: parent_id is non-decreasing, age is random.
  for {set j 0} {$j<300} {incr j} {
    db eval "INSERT INTO test VALUES($i,[expr {int(rand()*1000000)}])"
  }
  # Every 100 outer iterations (30000 rows), commit and report the
  # insert rate in rows/sec plus the journal and database file sizes.
  if {$i%100==99} {
    set jsize [expr {[file size test.db-journal]/1024}]
    db eval "COMMIT"
    set dsize [expr {[file size test.db]/1024}]
    set end [clock clicks -milli]
    puts [format $fmt [expr {$i+1}] [expr {30000000.0/($end-$start)}] \
          $jsize $dsize]
    set start $end
    db eval "BEGIN"
  }
}
db eval "COMMIT"

