Given how clever and compelling SQLite is, I am testing how it scales to
tables in the 100GB / 200-million-row range. This is for a strictly
read-only application, but first the tables must be populated in a
one-time process. As is often the case with big data, the data is a
little dirty, so the process involves importing, selecting, counting,
inspecting, updating some rows, deleting some rows, then selecting,
counting, and inspecting again, etc., until clean.
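
Roughly, each cleansing pass looks like the sketch below (the table,
file, and column names are made-up placeholders, not my real schema):

    -- one illustrative load/cleanse cycle in the sqlite3 shell;
    -- all names below are placeholders
    .mode csv
    .import raw_data.csv big_table

    -- count and inspect suspect rows
    SELECT COUNT(*) FROM big_table WHERE some_column IS NULL;
    SELECT * FROM big_table WHERE some_column IS NULL LIMIT 20;

    -- fix or remove the dirty rows
    UPDATE big_table SET some_column = 'unknown' WHERE some_column = '';
    DELETE FROM big_table WHERE other_column = '';

    -- re-count and re-inspect, repeating until clean
    SELECT COUNT(*) FROM big_table;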

With the SQLite database on a traditional C: drive, IO was too slow: at
15 MB/sec, reading a 50GB table would take an hour. So I moved it to an
external RAID array, where I ran across an interesting find. IO wasn't
much faster until I vacuumed the database, which increased IO 10X to 150
MB/sec, with the same CPU utilization.
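
The kind of before/after comparison I mean is roughly this (the table
name is a placeholder; the MB/sec figures come from watching disk
throughput while the scan runs):

    .timer on
    SELECT COUNT(*) FROM big_table;   -- full scan reads at ~15 MB/sec
    VACUUM;                           -- rewrite/defragment the database file
    SELECT COUNT(*) FROM big_table;   -- same scan now reads at ~150 MB/sec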

This is good news for the final implementation of this read-only
database, but it is still a dilemma in the data-load phase. After a
".vacuum", issuing a single DML statement against a table, even a DELETE
that deletes no rows at all, causes IO on the table I'm selecting from
and modifying to drop back down to 15 MB/sec, which makes the data
loading / cleansing phase very long.
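
For example, something as harmless-looking as this (placeholder names
again; the real predicate simply happens to match nothing) is enough to
put the table back into the slow state:

    -- a DELETE whose WHERE clause matches no rows at all
    DELETE FROM big_table WHERE some_column = 'value-that-never-occurs';
    SELECT COUNT(*) FROM big_table;   -- scan speed drops back to ~15 MB/sec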

So I have two questions:

(1) Why would simple DML cause such an extreme slowdown compared with
post-vacuum speeds?

(2) Are there any knobs to turn to try to maintain the higher speeds
after DML, without resorting to ".vacuum"?

Thanks,

Udi