Hi,

we're seeing terrible performance when fetching data from one of our tables:
the table contains roughly 1.2 million rows, and a plain "SELECT COUNT(*) FROM t"
query takes 8 minutes to finish.
The table has 10 fields, and the records average about 100 bytes of data each.
The largest records hold ~1 kB of data, but only a few hundred records contain
more than 256 bytes.
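For reference, the timing comes from something like this in the sqlite3 shell
(the table name "t" is just a stand-in for our real table):

    .timer on
    SELECT COUNT(*) FROM t;   -- takes ~8 minutes on this table
    PRAGMA page_size;
    PRAGMA page_count;        -- page_size * page_count is ~3 GB for this file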

Another table in the same database contains over 5 million records (probably
averaging a few hundred bytes per record). Counting the rows in that table
takes only about 40 seconds.

(All times are from the first run and are reproducible on several machines.)


Looking at a sample taken while SQLite is executing the COUNT(*) query,
almost all of the time is spent inside read() calls.
The average I/O rate is only about 500-800 kB/s of read throughput, whereas
reading from other databases / other tables easily reaches 20-30 MB/s overall.
So I can only conclude that, for some reason, the I/O behavior for the data
in this specific table is terrible.

Running fs_usage shows that almost no consecutive pages are being read
(i.e. the file offsets are not consecutive), so I suspect that this table is
fragmented all across the 3 GB database file, which would explain the slow
overall read performance.
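For what it's worth, this is what I'd check to see how much of the file ends
up on the free list after all that churn (no real numbers pasted here):

    PRAGMA freelist_count;   -- pages currently sitting on the free list
    PRAGMA page_count;       -- total pages in the 3 GB file

I believe sqlite3_analyzer can also break down page usage (and non-sequential
pages) per table, which might confirm that it's really just this one table
that's scattered.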

Now, two things happened to this table that didn't happen to the other big
table:

(1) we've updated the schema by appending a couple of columns;
(2) when data is updated, it's updated by first deleting a bunch of rows and
then re-inserting the new data. This happens once or twice a day for almost
all entries, so nearly the whole table gets rewritten regularly (a rough
sketch of the pattern is below).

Would either (1) or (2) lead to heavy fragmentation of this kind?
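To make (2) concrete, the update pattern looks roughly like this (the "key"
and "payload" column names are made up for illustration, not our real schema),
and I'm wondering whether rewriting rows in place instead would keep the pages
from scattering:

    -- current pattern: delete a range of rows, then re-insert fresh ones
    BEGIN;
    DELETE FROM t WHERE key >= 1000 AND key < 2000;
    INSERT INTO t (key, payload) VALUES (1000, 'new data');
    INSERT INTO t (key, payload) VALUES (1001, 'new data');
    COMMIT;

    -- possible alternative: update existing rows so their pages are reused
    BEGIN;
    UPDATE t SET payload = 'new data' WHERE key = 1000;
    INSERT OR REPLACE INTO t (key, payload) VALUES (1001, 'new data');
    COMMIT;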

Vacuuming the database does resolve the performance issue, but vacuuming this
particular database took about 75 minutes (Mac Pro @ 3 GHz), so it's not
something we can do frequently (especially since the database is used in a
server product that has 24/7 uptime) :(

Is there anything we could do to avoid the table fragmentation in the first
place (write the queries differently, avoid the deletions/re-insertions, etc.)?
(Note that I'm not talking about reclaiming free space here - that's not
really an issue for us. We need to avoid the performance hit, which IMHO is
caused by the pages of a single table being scattered all over the database
file, forcing the paging code to jump around a lot in the file.)
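One idea I've been toying with (not sure it's sound): rebuilding just this one
table instead of vacuuming the whole file, something like the sketch below.
My worry is that the copy would still draw its pages from the free list, so it
might not end up contiguous either.

    BEGIN;
    CREATE TABLE t_new AS SELECT * FROM t;  -- copy all rows in one pass
    DROP TABLE t;
    ALTER TABLE t_new RENAME TO t;
    COMMIT;
    -- (any indexes/triggers on t would have to be recreated afterwards,
    --  and CREATE TABLE ... AS SELECT doesn't carry over constraints)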


Thanks,
-jens


BTW: this is SQLite 3.6.3, but the COUNT queries are not any faster with
3.5.x either. Tests were run on Mac OS X.


