Re: [sqlite] Commit frequency and performance
On Wed, Feb 04, 2009 at 10:10:15PM -0700, Gerry Snyder scratched on the wall: > pri...@gmail.com wrote: > > Yes, there are 3 indexes being created > > > > I'll post again after figuring out which of the changes improved the > > performance. Thanks for the clues! > > Since you are able to do some experimenting, try entering the data > without the indices, and then create them. Generally, this won't make a huge difference. It is more or less the same as doing all the inserts in one commit. Dealing with sorting that much data all at once can actually be slower. What will make the biggest difference is how much memory you have available. As you noticed, part of that is page size, but a big part is also the size of the page cache. The default settings are 1K pages w/ 2000 page cache. Each page in the cache has ~0.5K overhead, so that's a 3MB cache. If you're doing this on a beefy desktop with the default 1K pages, crank the cache up to 100,000 pages or more. Adjust accordingly for larger pages. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Our opponent is an alien starship packed with atomic bombs. We have a protractor." "I'll go home and see if I can scrounge up a ruler and a piece of string." --from Anathem by Neal Stephenson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Commit frequency and performance
pri...@gmail.com wrote: > Yes, there are 3 indexes being created > > I'll post again after figuring out which of the changes improved the > performance. Thanks for the clues! > Since you are able to do some experimenting, try entering the data without the indices, and then create them. Gerry ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Commit frequency and performance
Yes, there are 3 indexes being created. For 2 indexes, the data is in order. For the 3rd index, the data is not in order. I can understand how that will cause extra seeking. It's a good point. I just ran another test after downloading the latest SQLite version, changed the page size to 4K on the main database (the 230MB file), kept pragma synchronous=off, and changed the commit interval back to 5 seconds (even though it shouldn't be doing anything because of synchronous=off). This improved performance from 190 minutes to 106 minutes, and disk I/O and CPU overlap was fairly steady throughout, with the CPU waiting for I/O about 30% of the time. That's higher than the sequential file case because of the extra seeking needed to build the trees/indexes, and it's less than double the best-case sequential time, so I'm okay with that. With the slower times I reported before, there would be long periods of time, like 10-15 seconds, where the CPU was completely idle while data was being written at relatively slow speeds, like < 1000 I/O's per second. I'm assuming these were mostly random I/O, hence the slow rate. With the lastest changes, it doesn't do that. I'll post again after figuring out which of the changes improved the performance. Thanks for the clues! Jim On 2/4/09, Gerry Snyderwrote: > Jim Wilcoxson wrote: >> I am creating an SQLite database via Python, and trying to understand >> some performance issues. > > Wild guess--are you creating an index (or indices) on data being entered > out of order? > > HTH, > > > Gerry ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Commit frequency and performance
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 pri...@gmail.com wrote: > I still can't understand how 1 long transaction (no syncs) can be > slower than periodic 5-second commits. Any ideas? I'd suggest posting a link to your code/representative data so that others can try to reproduce it. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkmKPIEACgkQmOOfHg372QSCHQCg4t3FcD+7kPjP+2i0rRcBAfbj jGsAoIEKNTKbrrEyX4v17ioeUARPCC1v =rbcc -END PGP SIGNATURE- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Commit frequency and performance
Thanks Roger. There is no other activity on this machine. I have run vmstat 5 during all of the different cases. For plain sequential files, Linux is able to almost completely overlap I/O with CPU, so the real time is close to user+sys time. With SQLite, there are often periods where the CPU is stalled waiting on I/O. I still can't understand how 1 long transaction (no syncs) can be slower than periodic 5-second commits. Any ideas? Jim On 2/4/09, Roger Binnswrote: > > Jim Wilcoxson wrote: >> Can anyone shed light on why building a database inside a single >> transaction would be slower than periodically commiting? > > If you look at the user and sys times then each approach is almost the > same. The real times differing so much indicates other activity on the > machine as well as I/O wait. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Commit frequency and performance
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Jim Wilcoxson wrote: > Can anyone shed light on why building a database inside a single > transaction would be slower than periodically commiting? If you look at the user and sys times then each approach is almost the same. The real times differing so much indicates other activity on the machine as well as I/O wait. Assuming you are using Linux, one gotcha with ext3 filesystems is that the fsync/fdatasync turns into a sync - ie they cause all outstanding data for the entire filesystem to be written not just the file in the fsync request. Consequently *any* other filesystem activity will slow your SQLite activity and give the kind of varying real times you see. I like to run "vmstat 1" to get a better picture of what is going on with disk activity. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkmKLFUACgkQmOOfHg372QSlqACfWr+iKHni+VD2BTx4Do8MU83j eN8AnRUTysAky05K6sj6f9DuR7S15fqH =3fnx -END PGP SIGNATURE- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Commit frequency and performance
I am creating an SQLite database via Python, and trying to understand some performance issues. This application does 3.8M inserts. Most inserts are to a main database that ends up being around 293MB. Around 740K of the inserts with larger data records are to 25 related databases of around 600MB each. The main database and 1 subdatabase are active together, then the subdatabase is closed when it gets to around 600MB and a new one is opened. This is an archiving application. As a performance baseline, I changed the application to just write out plain sequential text files, and it takes around 62 minutes to complete. This includes all the Python overhead and the raw hard drive overhead to write out the same amount of data in a "best case" scenario: Time: 3754.16 seconds Files: 708120 Bytes: 31565490710 real62m34.335s user53m55.492s sys 2m58.305s If I use SQLite and commit every 5 seconds, I get this performance: Time: 11383.95 seconds Files: 708120 Bytes: 31565490710 real189m45.061s user55m58.638s sys 4m46.528s If I commit every 30 seconds, I get this: Time: 13021.34 seconds Files: 708120 Bytes: 31565490710 real217m2.078s user56m9.647s sys 4m59.850s I believe fsync/fdatasync are significant performance issues, so I thought that it should improve performance if I start a transaction, do ALL of the inserts, then commit. Since I'm starting with an empty database, the journal should stay mostly empty, the database will be built without any syncs, and when I commit, the small journal will be deleted. I also added pragma synchronous=off. (All of these test are run with pragma lockmode exclusive.) But when I tried this, the performance was slowest of all: Time: 15356.42 seconds Files: 708120 Bytes: 31565490710 real255m57.523s user55m51.215s sys 4m22.173s Can anyone shed light on why building a database inside a single transaction would be slower than periodically commiting? Thanks, Jim ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users