[sqlite] VACUUM requires 6.7 times space?
Simon - instead of using VACUUM, it's much faster to create a new database from the old one, then rename it. It's easy to do this in Python using iterdump(). Alternatively, connect to the new (empty) database, run your CREATE TABLE statements, attach the old database as olddb, then do:

insert into table1 select * from olddb.table1;
insert into table2 select * from olddb.table2;

This also lets you run the CREATE TABLE statements without indexes and add the indexes after the inserts. Not sure whether that is faster for your data. Good luck!

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
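The iterdump() approach can be sketched in a few lines of Python; the in-memory databases and the table below are stand-ins for real files:

```python
import sqlite3

# In-memory databases stand in for the old and new database files.
old = sqlite3.connect(":memory:")
old.execute("create table table1 (a, b)")
old.execute("insert into table1 values (1, 'x'), (2, 'y')")
old.commit()

new = sqlite3.connect(":memory:")
# iterdump() yields the full schema and contents as SQL statements.
new.executescript("\n".join(old.iterdump()))

rows = new.execute("select a, b from table1 order by a").fetchall()
print(rows)  # [(1, 'x'), (2, 'y')]
```

With real files, the last step would be renaming the new database over the old one once the copy succeeds.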
Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file
If you have a table where rows are inserted but never deleted, and it has an INTEGER PRIMARY KEY AUTOINCREMENT column, you can use this:

select seq from sqlite_sequence where name = 'tablename'

This returns instantly, without scanning any rows or indexes, and is much faster than max(rowid) for huge tables. If no rows have ever been inserted, the query returns no row; otherwise you get back the last rowid inserted.

Jim

Simon wrote:
> If this is a table for which rows are inserted but never deleted, then
> you will find that SELECT max(rowid) FROM hp_table1 returns the same
> value almost immediately. Perhaps value-1, but whatever it is it'll be
> consistent.
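As a minimal sketch (assuming a table declared with AUTOINCREMENT, since that is what populates sqlite_sequence):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# sqlite_sequence is only maintained for AUTOINCREMENT tables.
con.execute("create table t (id integer primary key autoincrement, v)")
con.executemany("insert into t (v) values (?)", [("a",), ("b",), ("c",)])
con.commit()

# Last rowid handed out, without touching the table or its indexes.
last = con.execute(
    "select seq from sqlite_sequence where name = 't'").fetchone()[0]
print(last)  # 3
```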
[sqlite] HashBackup sqlite 3.6.18 -> 3.8.8 performance
To add to the real-world performance info, I recently upgraded the backup program I'm developing, HashBackup, from SQLite 3.6.18 to 3.8.8, mostly to see if it fixed some occasional "database is locked" errors (multithreaded). For these tests, nothing was changed except the SQLite library.

Hardware:
--
The tests were run on a quad-core i7 Mac Mini with 16GB of memory, though HB only uses maybe 100-150MB for these tests. In most cases HB can make good use of up to 3 processors before other limiting factors kick in. The Mini has two 1TB 5400rpm drives that max out at about 112MB/s on the outer tracks with dd on the raw device:

sh-3.2# dd if=/dev/rdisk0 of=/dev/null bs=128k
^C5661+0 records in
5661+0 records out
741998592 bytes transferred in 6.578497 secs (112791508 bytes/sec)

An application can read from the file system at about 82MB/s. This disk has full-disk encryption enabled, so maybe that is a factor:

sh-3.2# dd if=test-hb-home.tar of=/dev/null bs=128k
^C5361+0 records in
5361+0 records out
702676992 bytes transferred in 8.552060 secs (82164648 bytes/sec)

Environment:
--
These tests back up data from drive 0 and write the backup data and SQLite database to drive 1. Before testing each SQLite version, drive 1 (the output drive) is emptied. Drive 0, the system drive, is mounted and of course changes slightly between tests, but nothing is running except the usual system processes.

Test 1: full system backup followed by incremental backup
--
For a full backup, HB is mainly doing inserts of file system and HB metadata using prepared insert statements. The dedup table and arc files (compressed user data) do not use SQLite. Around 700K files are backed up on the system drive, totalling 49GB. There are only 3000 files 1MB or larger, so this test is seek-bound on the filesystem. The largest file is 8GB.
SQLite 3.6.18 full backup:

Time: 2745.6s, 45m 45s
Checked: 698470 paths, 49500931464 bytes, 49 GB
Saved: 698418 paths, 49500931464 bytes, 49 GB
Excluded: 45
Dupbytes: 15300578096, 15 GB, 30%
Compression: 51%, 2.1:1
Space: 23 GB, 23 GB total
No errors
real 45m52.149s
user 26m43.971s
sys  4m4.460s

SQLite 3.6.18 incremental backup:

Time: 425.8s, 7m 5s
Checked: 698471 paths, 49501306705 bytes, 49 GB
Saved: 108 paths, 1235685931 bytes, 1.2 GB
Excluded: 45
Dupbytes: 1230321290, 1.2 GB, 99%
Compression: 99%, 1493.3:1
Space: 827 KB, 23 GB total
No errors
real 7m17.200s
user 1m51.300s
sys  0m43.691s

SQLite 3.8.8 full backup:

Time: 3068.1s, 51m 8s
Checked: 698472 paths, 49501873829 bytes, 49 GB
Saved: 698420 paths, 49501873829 bytes, 49 GB
Excluded: 45
Dupbytes: 15296606115, 15 GB, 30%
Compression: 51%, 2.1:1
Space: 23 GB, 23 GB total
No errors
real 51m21.242s
user 26m42.716s
sys  4m12.255s

SQLite 3.8.8 incremental backup:

NOTE: a large Postgres log file was backed up with 3.6.18 that was not backed up this time, so this backup saved 18MB instead of 1.2GB.

Time: 362.6s, 6m 2s
Checked: 698472 paths, 49501964307 bytes, 49 GB
Saved: 64 paths, 18663074 bytes, 18 MB
Excluded: 45
Dupbytes: 17388470, 17 MB, 93%
Compression: 99%, 116.6:1
Space: 160 KB, 23 GB total
No errors
real 6m9.153s
user 1m29.346s
sys  0m42.033s

My conclusion, based on the full system backup test, is that SQLite 3.8.8 is 11-12% slower than 3.6.18 for this insert workload:

(3068 - 2745) / 2745 = 0.118

I'm not particularly happy about that, but if it fixes my "database is locked" problem I can live with it, and I might be able to use a partial index on one index to speed this up.

Test 2: back up 2 versions of a large SQL dump
--
In this test there are 2 versions of an SQL database dump (plain SQL commands). The 2 files are around 1.8GB each. This test eliminates a lot of the filesystem-seeking aspect of a system backup.
The purge command was used before each backup to force actual disk I/O on the data file and SQLite database.

3.6.18 initial backup:

Around 33K SQL inserts are performed as the file is split into blocks.

Time: 23.1s
Checked: 4 paths, 1812149567 bytes, 1.8 GB
Saved: 4 paths, 1812149567 bytes, 1.8 GB
Excluded: 0
Dupbytes: 0
Compression: 78%, 4.7:1
Space: 382 MB, 382 MB total
No errors
real 0m23.223s
user 0m53.623s
sys  0m1.709s

3.6.18 incremental backup:

Around 9K inserts are performed for new blocks.
Around 24K queries are performed to verify block hashes match.
Around 24K updates are performed to bump reference counts.

Time: 21.5s
Checked: 4 paths, 1771063478 bytes, 1.8 GB
Saved: 4 paths, 1771063478 bytes, 1.8 GB
Excluded: 0
Dupbytes: 1306412633, 1.3 GB, 73%
Compression: 93%, 14.6:1
Space: 121 MB, 503 MB total
No errors
real 0m24.181s
user 0m31.974s
sys  0m1.266s

3.8.8 initial backup:

Time: 22.4s
Checked: 4 paths, 1812149567 bytes, 1.8 GB
Saved: 4 paths, 1812149567 bytes, 1.8 GB
Excluded: 0
Dupbytes:
Re: [sqlite] Optimizing list retrieval with a FTS3 table
On Wed, Mar 16, 2011 at 3:15 PM, Travis Orr wrote:
> I am currently working on a project that requires retrieving a list of
> all the rows from a FTS3 table. The ordering of the results varies by
> search criteria. Since this is for an embedded project the list results
> are passed in chunks to another module to give the appearance of faster
> operations.

Somewhere you have some state information so that you know what offset to use. Store the SQLite cursor with that state information, and use it to fetch the next 2000 rows on each call, i.e., only do the query once.

Jim
Re: [sqlite] Optimizing list retrieval with a FTS3 table
On Wed, Mar 16, 2011 at 4:12 PM, Travis Orr <t...@ivl.com> wrote:
> Jim, just did a test with your recommendation, however that ended up
> using too much memory, since it is selecting the entire list in one go
> and keeping the memory for a significant amount of time; for the
> embedded system it is running on, other tasks started having problems.
>
> Travis

I think it should only use cache_size * page_size bytes of memory, so you should be able to control memory usage with pragma cache_size.

Jim
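The cache_size knob mentioned above can be sketched like this; a positive value is a page count, so the cache's resident memory is roughly cache_size * page_size bytes:

```python
import sqlite3

con = sqlite3.connect(":memory:")
page_size = con.execute("pragma page_size").fetchone()[0]

# Cap the page cache at 100 pages; memory use for the cache is then
# roughly 100 * page_size bytes.
con.execute("pragma cache_size = 100")
cache_pages = con.execute("pragma cache_size").fetchone()[0]
print(cache_pages, cache_pages * page_size)
```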
Re: [sqlite] unexpected large journal file
Eric Smith writes:
> Jay A. Kreibich wrote:
> > Try getting rid of the PK definition and see how much that buys you.
> > It might be worth it, or it might not.
>
> and Simon Slavin wrote:
> > We know you are doing a huge amount of writing to this database. Are
> > you also reading it frequently? If not, then it might be worth making
> > an index on that primary key only when you're about to need it.
...
> I tried removing the PK definition as you both suggested, and the
> journal stays fixed at less than 20Kb, even against a db size of (at
> the moment) 37Gb. My insert batch run times are improved by a factor
> of ~2.5 and seem to be O(1).
>
> So, bingo. :)

Here's a trick that might help you: since your primary key is two integers, you can combine them into one integer and use it as the primary key, without requiring a separate index:

create table x (pkey integer primary key, val text);

a = (value 0-2M)
b = (32-bit integer)
pkey = a<<32 | b

insert into x values (pkey, data)

This shift assumes that b is a 32-bit integer. Since a is limited to 2M, which requires only 21 bits, b could even be up to 42 bits (keeping the combined key within SQLite's signed 64-bit rowid), or 4398046511103.

Also, I think you mentioned that you are inserting records in this order:

a=0 b=0
a=1 b=0
...
a=2M b=0

then

a=0 b=1
a=1 b=1
...
a=2M b=1

To insert records in key order, you should insert them as:

a=0 b=0
a=0 b=1
a=1 b=0
a=1 b=1

Jim
---
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup
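The packing trick above can be sketched in Python; the table is the one from the message, and the small ranges are just for illustration:

```python
import sqlite3

def pack(a, b):
    # a fits in 21 bits (0-2M), so a 32-bit b leaves plenty of headroom.
    return (a << 32) | b

con = sqlite3.connect(":memory:")
con.execute("create table x (pkey integer primary key, val text)")
for a in range(3):
    for b in range(2):
        con.execute("insert into x values (?, ?)", (pack(a, b), "data"))
con.commit()

# Scanning by pkey visits rows in (a, b) order; unpack both halves.
rows = [(pk >> 32, pk & 0xFFFFFFFF)
        for (pk,) in con.execute("select pkey from x order by pkey")]
print(rows)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```

Because the packed key sorts by a first and b second, inserting in (a, b) order appends to the end of the table's b-tree, which is the cheap case.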
Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)
On 6/22/10, Eric Smith wrote:
> I have confirmed that INSERT times are roughly logarithmic in
> the number of existing records after creating my specific user
> indices.
>
> But INSERT times appeared to be *linear* in the number of existing
> records before I had created any user indices (and with no primary
> keys or unique indices defined).
>
> Can anyone explain this?
>
> Eric

Insert times should be constant for the 2nd case (no primary key, no indexes); i.e., it shouldn't matter how many records are already in the database. I confirmed this with SQLite 3.6.18.

Did you see my earlier note about combining your two integers into the primary key? This will also give you constant insert times, if you insert items in the order:

a=0, b=0
a=0, b=1
a=1, b=0
a=1, b=1
etc.

Jim
Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)
On 6/22/10, Eric Smith <eas@gmail.com> wrote:
> Jim Wilcoxson wrote: ...
> > Did you see my earlier note about combining your two integers into the
> > primary key? This will also give you constant insert times, if you
> > insert items in the order: ...
>
> Thanks also for the tip on insertion order. Does that also hold for
> multi-column indices (and not single-column indices transformed from two
> integers)? I assume it's because we get more cache hits and fewer tree
> rebalances when we insert in key-order?

Yes, I'm pretty sure this applies to multi-column indices too. So if you do your inserts as:

a=0 b=0
a=1 b=0
...
a=0 b=1
a=1 b=1

then the first set of rows with b=0 will be added in "more or less" constant time. When you start doing the second set of inserts, with b=1, that will cause pain, because you will be modifying every index record you created earlier and squeezing a new entry in between every existing entry. This will require a lot of journalling. I think it would run faster (less journalling) to insert in order:

a=0 b=0
a=0 b=1
a=1 b=0
a=1 b=1
etc.

Even if you load the data without indexes and add the index later, my guess is that SQLite will still traverse the data in rowid order to create the index, so you are still better off inserting in the 2nd order rather than the first. The added advantage is that your index pages will be clustered together.

Jim
Re: [sqlite] Corrupted sqlite journal
On Fri, Jul 9, 2010 at 10:38 AM, Ke Tao wrote:
> HI All,
>
> I have a sqlite db named "wdb", and a "wdb-journal" file was created by
> a power failure or something. When I do any db operation sqlite always
> prompts "disk I/O error", but when I delete the "wdb-journal" there are
> no errors prompted. I think maybe the wdb-journal file was corrupted;
> does anyone have any idea on this?
> I use sqlite on a Linux system.
>
> Best Regards,
> Ke Tao

Permission problems can cause this. The "disk I/O error" actually means "I can't do a rollback", maybe because root created the journal and owns the database, and then another user is trying to do things with it. Just a guess. This "disk I/O error" message is very confusing. Been there!

Jim
Re: [sqlite] Corrupted sqlite journal
On Fri, Jul 9, 2010 at 3:21 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
> Permission problems can cause this. The "disk I/O error" actually means
> "I can't do a rollback", maybe because root created the journal and owns
> the database, then another user is trying to do things with it. Just a
> guess. This "disk I/O error" message is very confusing. Been there!

Also, by deleting the journal, you are likely to corrupt your database. You can't delete the journal!

Jim
Re: [sqlite] Database corruption on Linux ext3
On Tue, Jul 13, 2010 at 8:06 PM, Roger Binns wrote:
> On 07/13/2010 04:57 PM, Simon Slavin wrote:
> > One on each page and one for the entire file that checksums the page
> > checksums?
>
> One for each page plus one of the header would make the most sense, but the

I don't think this would work, because the problem described is that the writes aren't making it to disk. If pages don't make it to disk, the old pages will be present, with the old, and valid, checksums.

The only way I can see checksums helping with this problem is if there is a checksum over the entire file (or checksums of the checksums of each page). Then if you do any writes, but not all writes, the overall checksum will be invalid. SQLite would have to verify the checksum on every page when the database is opened and a hot journal exists, which could be quite a lot of overhead for a large database. Plus, SQLite would have to keep a list of the checksums for every page, and at commit time recompute the overall hash/checksum. This could be a lot of memory for a large database: a 1GB database with 1KB pages, for example, would require 1M 20-byte SHA1 hashes, or 20MB. If a bit for every page in the database was a scalability problem in earlier versions of SQLite, I'm guessing that 20 bytes for every page would be unworkable.

I think a checksum on every page, and maybe even an error-correction code, is a great idea as an optional feature. But it would only detect hardware problems and bit rot. This problem of not doing writes, or doing them in the wrong order, is a different animal IMO.

Jim
Re: [sqlite] Database corruption on Linux ext3
On Wed, Jul 14, 2010 at 1:35 AM, Roger Binns <rog...@rogerbinns.com> wrote:
> On 07/13/2010 05:30 PM, Jim Wilcoxson wrote:
> > I don't think this would work, because the problem described is that
> > the writes aren't making it to disk. If pages don't make it to disk,
> > the old pages will be present, with the old, and valid, checksums.
>
> You are assuming the checksums are stored in the page they checksum.
> That would only detect corruption of that page. You could have pages
> that store the checksums of numerous other pages, so both the checksum
> page and the data page would have to fail to make it to disk. Yes,
> there are scenarios where you could still get old apparently valid
> pages, but those are less likely to happen.

It seems there are several levels of checking possible:

- a checksum on the page itself lets you detect some errors, with no extra I/O
- checksum pages for a group of pages let you detect missing writes within the group, with some extra I/O
- a checksum of all checksum pages lets you detect missing writes for an entire commit, with even more extra I/O

How much extra I/O depends on the size of the db, the page size, and how much memory is available for caching checksum pages.

Scott mentioned that a detection system without the ability to correct might not be useful, but I think it is useful. Not as good as correction, of course, but useful because:

- it might prevent the application program from issuing a bogus error message like "the row you asked for isn't in the database"; lots of time could be spent in the weeds chasing down a misleading error
- some applications might have backup copies of the database; they could display an error message and revert to a backup

Jim
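The layered scheme above can be sketched outside SQLite: a per-page checksum catches corruption of that page but not a lost write (the stale page still carries its own valid checksum), while a hash over all the page checksums flags any partially applied commit. This is a toy sketch with an assumed page size, not anything SQLite actually does:

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for this sketch

def checksums(data):
    """Per-page SHA1s plus one overall hash over the checksum list."""
    pages = [data[i:i + PAGE_SIZE] for i in range(0, len(data), PAGE_SIZE)]
    sums = [hashlib.sha1(p).digest() for p in pages]
    overall = hashlib.sha1(b"".join(sums)).hexdigest()
    return sums, overall

db = bytes(PAGE_SIZE * 3)                  # stand-in for a 3-page db image
sums1, overall1 = checksums(db)

# Change one byte in page 1: that page's checksum changes, page 0's
# stays valid, and the overall hash flags the write as incomplete.
mod = db[:PAGE_SIZE] + b"\x01" + db[PAGE_SIZE + 1:]
sums2, overall2 = checksums(mod)
print(sums1[0] == sums2[0], sums1[1] != sums2[1], overall1 != overall2)
```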
Re: [sqlite] How much a table takes (will this query always work)
On Fri, Jul 16, 2010 at 6:01 AM, Max Vlasov wrote:
> Hi,
>
> I've always wanted a way to calculate how much space a table occupies.
> As far as I can see from the archive, there's no out-of-the-box solution
> (CMIIW).

You mean, like this?

pragma page_count
pragma page_size
size = page_count * page_size

Jim
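The calculation above sizes the whole database file (not a single table, as the follow-up message concedes); a quick sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table t (x)")
con.execute("insert into t values (zeroblob(100000))")
con.commit()

page_count = con.execute("pragma page_count").fetchone()[0]
page_size = con.execute("pragma page_size").fetchone()[0]
size = page_count * page_size
# The file must be at least as large as the 100KB blob it holds.
print(size >= 100000)  # True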
Re: [sqlite] How much a table takes (will this query always work)
On Fri, Jul 16, 2010 at 9:13 AM, Jim Wilcoxson <pri...@gmail.com> wrote:
> You mean, like this?
>
> pragma page_count
> pragma page_size
> size = page_count * page_size

Doh. Never mind. I see you said table, not database. :)

Jim
Re: [sqlite] [PATCH] cache preloading
On 8/10/10, Paweł Hajdan, Jr. wrote:
> So this is another chromium patch I'd like to submit:
> http://src.chromium.org/viewvc/chrome/trunk/src/third_party/sqlite/preload-cache.patch?revision=26596=markup
>
> I'm not the author of that one, but the main idea seems to be that with
> preloading we get better performance in many scenarios where chromium
> uses sqlite (like for omnibox local suggestions). What do you think
> about the patch and the idea? Is there a better way to do the same
> thing? Would you like to see some benchmarks?

The benefit of preloading is to replace random I/O with sequential I/O. I have definitely seen this be effective in some cases.

For those who want something like this, running a cat or dd command, or doing OS reads in a loop, will probably give the same benefit: the database won't be loaded into SQLite's cache, but it will be loaded into the OS buffer cache. An advantage of preloading outside SQLite is that the dd/cat/read could happen in a separate thread, so it wouldn't impact app startup time. And it could be that one use of a database might benefit from preloading whereas another use might not, so it seems that preloading inside SQLite would have to be something requested with a pragma rather than always happening by default.

What I would like to see more than SQLite preloading is better page allocation strategies and incremental defragmentation, to cut down on the amount of random I/O and keep related pages in sequential order.

Jim
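The dd/cat trick described above is just a sequential read of the database file to pull it into the OS page cache before SQLite opens it; a sketch (the scratch file stands in for a real database):

```python
import os, tempfile

def preload(path, bufsize=128 * 1024):
    """Sequentially read a file, like `dd`/`cat`, to warm the OS cache."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                return total
            total += len(chunk)

# Demo against a scratch file standing in for a database.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 300000)
os.close(fd)
n = preload(path)
os.unlink(path)
print(n)  # 300000
```

Run in a background thread, this warms the cache without delaying application startup.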
Re: [sqlite] Forthcoming release of SQLite 3.7.1
SQLite is a great product; I use SEE, the encryption extension. For those of us who are not using the C interface, I'd like to request that when possible, new C features like hinting about file sizes and extents also be added as pragmas. Pragmas make it very easy for non-C interfaces to make use of new features. The async writer thread in earlier versions is one feature I'd really like to use, or at least try out, but I'm using the pysqlite bindings. If there were a pragma to activate it, it'd be a snap.

Thanks for providing a great library!

Jim

On 8/18/10, Richard Hipp wrote:
> We are striving to release SQLite version 3.7.1 within the next few days.
> If you have any feedback, objections, comments, or concerns about this
> release, please respond as soon as possible to this mailing list, or via
> private email to me.
>
> A summary of the changes in 3.7.1 can be found here:
> http://www.sqlite.org/draft/releaselog/3_7_1.html
>
> A release candidate amalgamation can be downloaded from here:
> http://www.sqlite.org/draft/download.html
>
> Code changes since the prior release can be seen here (warning - approx
> 10,000 lines of diff output):
> http://www.sqlite.org/src/vdiff?from=release=trunk=1
>
> --
> D. Richard Hipp
> d...@sqlite.org
Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNK_SIZE?
On 8/19/10, Simon Slavin wrote:
> On 19 Aug 2010, at 9:27pm, Taras Glek wrote:
>
> > I really appreciate that sqlite got this feature to reduce
> > fragmentation, but why not expose this as a pragma?
>
> Do you have figures which suggest that reducing fragmentation leads to
> any improvement in performance?

Whether SQLITE_FCNTL_CHUNK_SIZE is useful is a different discussion. I myself would love to see features exposed via pragmas whenever possible, for the simple reason that I don't use the C API and can't make use of the features otherwise. I would assume that since the SQLite developers added the feature to the C API, there must be a use for it or they wouldn't have bothered.

Jim
Re: [sqlite] partial index?
On 8/20/10, Cory Nelson wrote:
> +1 for this feature request. They've got a very specific and fairly
> rare use case, but when opportunity strikes, partial indexes are much
> more convenient, straightforward, and efficient than the alternative.
>
> - If a table has 100,000,000 rows and each index page for it holds
> 100 rows, a full index will need 1,000,000 pages of storage and 4 page
> reads to search.
> - With a partial index covering only 10,000 rows of the same table,
> it will only need 100 pages of storage and 2 page reads to search.
>
> Big improvement while keeping query syntax and results exactly the same!
>
> > Are there any known hacks to implement something similar?
>
> There's not really any good solution. You can create a separate table
> with the subset in it; that's pretty much it. But that has some
> overhead too, and can complicate your queries.

A while back I needed something like a partial index. The backup program I'm working on does block dedup, so I made an index on each block's SHA1. But then I wanted to dedup only some blocks, not all. I needed to say "insert this block row and also index the SHA in the SHA index", or "insert this block row but don't index the SHA in the SHA index". Then I wanted to be able to ask "is X in the SHA index?", i.e., look only in the index. SQL doesn't really have a way to do that. It's a simple concept, but it doesn't seem to fit within the SQL framework.

The best I could come up with is a separate table. The problem is, indexing the SHA1 normally means there is a copy in the row and a copy in the index. Using a separate table, which still has to be indexed, means there is a copy in the row of the main table, a copy in the separate table, and a copy in the separate table's index.
I guess one way to do this using the SQL partial index feature being discussed would be to have an extra column in the row called "isindexed", set it to 1 or 0 depending on whether the SHA should be included in the index, and use:

create index ... where isindexed=1

Then on queries, say "select blockid from blocks where isindexed=1 and sha=X". And any query that didn't have isindexed=1 wouldn't be able to use the index at all. Is that how it would work?

Jim
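SQLite did later gain partial indexes (in version 3.8.0), and they work essentially as described here; a minimal sketch using the table and column names from this thread:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # partial indexes need SQLite >= 3.8.0
con.execute("""create table blocks
               (blockid integer primary key, sha blob, isindexed int)""")
# Only rows with isindexed=1 get an entry in the index.
con.execute("create index sha_idx on blocks (sha) where isindexed = 1")
con.execute("insert into blocks values (1, x'aa', 1), (2, x'bb', 0)")
con.commit()

# Queries must repeat the WHERE condition for the planner to use sha_idx.
row = con.execute(
    "select blockid from blocks where isindexed = 1 and sha = x'aa'"
).fetchone()
print(row)  # (1,)
```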
Re: [sqlite] partial index?
On 8/20/10, Cory Nelson <phro...@gmail.com> wrote:
> On Fri, Aug 20, 2010 at 7:39 AM, Jim Wilcoxson <pri...@gmail.com> wrote:
> > ...
> > The best I could come up with is a separate table. The problem is,
> > indexing the SHA1 normally means there is a copy in the row and a copy
> > in the index. Using a separate table, which still has to be indexed,
> > means there is a copy in the row of the main table, a copy in the
> > separate table, and a copy in the separate table's index.
>
> You might want to index the "sha" column on the primary table, then
> the secondary table can just be indexedblocks(blockid INTEGER PRIMARY
> KEY). No wasted space that way, but it will require more I/Os to JOIN
> the tables.

In my case, the point of a partial index would be to save space. If the sha column were indexed completely in the primary table, no other tables or indexes would be necessary, but then no space is saved. If I only want to index 80% of the rows, I'd save 20 * 0.2n bytes with a partial index, where n is the number of rows (a SHA1 is 20 bytes).

Using a separate table, I'd need roughly 40 bytes for each sha indexed: 20 bytes for the separate table, 20 for its index; or 40m for m entries. This saves space if half or fewer of all blocks are indexed. If more than half are indexed, indexing all rows in the main table uses less space.

Jim
Re: [sqlite] SQLite version 3.7.2
Using the SQLite Encryption Extension rev 3.7.0.1, the latest released version, on OS X Tiger (10.4) results in an undefined reference to gethostuuid. Is OS X 10.4 no longer supported?

Thanks,
Jim

On Mon, Aug 23, 2010 at 9:01 PM, Richard Hipp wrote:
> SQLite version 3.7.2 is now available on the SQLite website:
> http://www.sqlite.org/
>
> SQLite version 3.7.2 fixes a single bug that was discovered just hours
> after the release of 3.7.1. The bug can result in corruption of the
> database free-list after an incremental vacuum. The bug had nothing
> whatsoever to do with SQLite version 3.7.1 or any other recent release.
> The problem had been in the code for over a year, since version 3.6.16.
> The discovery of the problem so soon after the release of version 3.7.1
> was purely coincidental.
>
> The bug fixed in 3.7.2 can result in database corruption. However, the
> corruption caused by this bug can almost always be fixed simply by
> running VACUUM on the database. And the corruption will only occur in an
> incrementally vacuumed database which at some point in time contains
> hundreds of unused pages which are slowly released back to the operating
> system by multiple calls to the incremental_vacuum PRAGMA. Even then,
> one must be particularly unlucky to hit the right combination of freed
> pages in order to trigger the bug. Hence the problem is quite obscure
> and was not noticed for over a year.
>
> Hundreds of lines of code were changed for version 3.7.2, but most of
> those changes were to test procedures. As is the custom with SQLite, not
> only was the specific bug fixed, but new tests were put in place to
> detect and prevent similar kinds of bugs elsewhere in the code. We
> believe that one should not just fix the bug, but also fix the process
> that generated the bug.
> Only 4 working lines of code were changed for version 3.7.2:
>
> http://www.sqlite.org/src/fdiff?v1=2dff4076d3c994dc=5047fb303cdf6806
>
> Special thanks to Filip Navara for finding and reporting the problem
> with incremental vacuum.
>
> Please report any other problems to the sqlite-users@sqlite.org mailing
> list, or directly to me. Thanks.
>
> --
> D. Richard Hipp
> d...@sqlite.org
Re: [sqlite] Feature request: copying vacuum
I'd also be interested in a VACUUM TO feature, more for the performance aspect than the fragmentation, although that's a plus too. The backup program I'm working on packs many files into archives, which are SQLite databases.

I ran some vacuum tests here; the filesystem cache was purged before each test:

cp 1GB archive: 44 seconds (for baseline comparison)

Vacuum 1GB archive w/sqlite3:
real 2m15.421s
user 0m8.776s
sys  0m34.205s

Dump and reload 1GB archive:
$ time sqlite3 arc.0.0.rm ".dump" | sqlite3 arc.0.0.new
real 0m52.174s
user 0m23.750s
sys  0m9.086s

Creating a new archive is more than twice as fast as vacuuming an existing archive, and nearly as fast as a straight cp. While an extra minute and a half for a SQLite vacuum may not seem like a big deal, a backup retention operation can affect many archives, so 30 archives would require an extra 45 minutes to vacuum. I've had to add code to the backup program to avoid doing vacuums whenever possible because they're slow.

I would suggest that the VACUUM TO feature take a read lock on the database and create a new, vacuumed database, but leave it up to the application whether to replace the original or not. If the application decides to rename over the original database, then yes, this could goof up other connections, but that could happen anyway if an app decided to delete a database: the other connections would keep on using the database even though it is unlinked from the filesystem. For my single-user application, VACUUM TO would be very useful. In multi-connection cases, the app would have to provide some kind of mechanism outside SQLite to coordinate things, or just stick to the regular vacuum operation.

Jim

On Thu, Sep 9, 2010 at 7:19 PM, Taras Glek wrote:
> On 09/01/2010 11:41 AM, Taras Glek wrote:
> > Hi,
> > Currently VACUUM takes care of sqlite-level fragmentation.
> > Unfortunately it does little for fs-level fragmentation since the same > > file is being reused. It would be really beneficial for Mozilla > > performance if we could get a vacuum/hotcopy mode. > > > > As I understand it, currently vacuum works by copying out the data to > > a new file, then copying it back in and truncating the original db file. > > > > It would be really nice to instead do something like: > > > > copy the data to a new file > > > > swap the underlying filehandles to point at new file > > > > remove old file > > > > rename the new file to old name. > > > > This yields two benefits: > > > > A less fragmented db > > > > ~50% vacuum speedup since the data is only copied once > > > > Currently we can copy the data to a new file, but it is a pretty > > invasive change to swap all of the current sqlite connections to the > > new file. Things like prepared statements, etc need to be updated for > > every single db consumer. Thus it would make sense to have this > > feature on the sqlite side. > > > > Is this reasonable request? > > > > Thanks, > > Taras > I sent this last week, wanted to address some issues that were raised > about my proposal. > > Sorry for the weird(and late!) reply, I'm subscribed via a digest(would > be so nice to have a gmane mirror), so I can't reply directly. > > Richard Hipp: > > If other connections have the database file open while it is being > vacuumed, > > then on unix the other connections will still be left open on the old > > unlinked version of the database file and will never see the new content. > > And on windows, the file swapping and renaming simply is not allowed > while > > other connections have the database files open. > > The work around is to modify SQLite so that it is constantly closing and > > reopening the database files. But that adds rather large overheads that > > seem likely to be much greater than any savings seen through a reduction > in > > disk FS fragmentation. 
> > > > > I agree with both points. A copying VACUUM should specify that it does > not support the multi-connection use case. It should fail/abort if it detects > another db connection (or have this mentioned in documentation if this > detection isn't possible). > The wins from avoiding disk fragmentation + copying less data are > significant. Punting seems like a reasonable alternative to forcing > sqlite to constantly close/open the db. > > > Jay A. Kreibich: > > > You're also breaking transactional integrity. You need the option of > > backing out of the operation right up until the moment it works, and > > this procedure can't do that. For example, if you lose power right > > after "remove old file", your database is no longer there. > > > You are right, my original sequence of events was flawed; it should be: > > copy the data to a new file > > swap the underlying filehandles to point at new file > > rename the new file to old name (this also removes old file). >
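For a single-connection application like the ones described in this thread, a copying vacuum can be approximated from application code today. A minimal Python sketch, assuming no other connections are open; the helper name and the rename-over-original step are mine, not a SQLite feature:

```python
import os
import sqlite3

def copy_vacuum(path):
    """Hypothetical helper: rebuild the database into a new file, then
    rename it over the original. Assumes no other open connections."""
    newpath = path + ".vacuum"
    src = sqlite3.connect(path)
    dst = sqlite3.connect(newpath)
    # iterdump() emits the full schema and data as SQL, so the new file is
    # written sequentially -- the same effect as the dump-and-reload
    # timing above, just without shelling out to the sqlite3 CLI.
    dst.executescript("\n".join(src.iterdump()))
    src.close()
    dst.close()
    os.rename(newpath, path)  # atomic replace on POSIX filesystems
```

As with the rename discussed above, any connection still open on the old file keeps reading the unlinked inode, so this only suits single-connection use.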
[sqlite] Detach command in 3.7.1 gives "database is locked" error
This may be a bug in 3.7.1, or I may have been taking advantage of a flaw in 3.6.18: HashBackup uses SQLite to store backup data in a meta data db and file data archive db. First I open the main database, then attach an archive database. When the archive gets full, I detach it, create a new archive db, and attach that. This all worked fine with 3.6.18. With 3.7.1, I get an error on the detach: OperationalError: database arc is locked The database is not opened in Exclusive mode, and a commit gets executed before the detach. What may be causing this - just a guess - is that there are selects open on the main database at the time of the detach, but the archive database is not referenced. Is this a bug in 3.7.1, or a correction to a flaw in 3.6.18? Thanks, Jim -- HashBackup: easy onsite and offsite Unix backup http://sites.google.com/site/hashbackup ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Detach says database is locked
This type of thing works with SQLite 3.6.18. I have a similar issue with 3.7. I believe the difference is that in 3.6.18, if you do a select on a main database that doesn't involve an attached database, the attached database isn't locked. In 3.7, if you do a select on the main database that doesn't involve an attached database, but there IS an attached database, the attached database is locked anyway. Then it's not possible to detach it with the open select, even though that select has nothing to do with the attached db. I don't know if this new behavior is a bug in 3.7, or is considered a fix to 3.6. Jim -- HashBackup: easy onsite and offsite Unix backup http://sites.google.com/site/hashbackup On Thu, Oct 21, 2010 at 12:21 PM, Alan Chandler wrote: > On 21/10/10 00:35, Igor Tandetnik wrote: > > Alan Chandler wrote: > >> Further to my other post related to attaching to databases with PHP PDO, > >> I have now managed to ATTACH OK > >> > >> However, when I come to DETACH, I am getting a Database is locked error > >> when I try and execute it. > >> > >> The only thing happening to that database in between ATTACH and DETACH > >> is a single row SELECT > > > > Make sure you reset or finalize the statement (not sure how it's done in > PHP). > I believe closeCursor(); does the job. If not, I unset the variable. > > However, I have figured out the problem - which is really sad since I > can't do what I hoped - which is to loop round a set of rows from a higher > level select statement, ATTACHing and DETACHing to a database in turn. > Because the top level select loop is its own transaction, you can't > detach from the database which you attached in the inner part of the > loop since at that moment you are in a transaction. > > I think my way out of the problem is to pull out all the rows into a > single array, then close the transaction and iterate over the array > members. Fortunately in this case it's just a menu - so there probably > won't be too many items.
> > > > -- > Alan Chandler > http://www.chandlerfamily.org.uk > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
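The workaround that falls out of this thread -- finish every open statement and any pending transaction before DETACH -- can be sketched with Python's sqlite3 (file names are hypothetical):

```python
import sqlite3
import tempfile, os

# Sketch of the attach/detach rotation discussed above. The key point:
# drain or close every open cursor and commit before DETACH, or 3.7.x
# can report "database ... is locked".
tmp = tempfile.mkdtemp()
con = sqlite3.connect(os.path.join(tmp, "main.db"))
con.execute("CREATE TABLE files (name TEXT)")
con.execute("ATTACH DATABASE ? AS arc", (os.path.join(tmp, "arc.0.db"),))
con.execute("CREATE TABLE arc.blocks (data BLOB)")

cur = con.execute("SELECT name FROM files")
rows = cur.fetchall()   # drain the SELECT so its statement is reset
cur.close()             # finalize the cursor
con.commit()            # end any open transaction
con.execute("DETACH DATABASE arc")   # safe now; attach the next archive here
```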
Re: [sqlite] programmatic way of determining fragmentation?
On Thu, Oct 21, 2010 at 1:27 PM, Dustin Sallings wrote: > > On Oct 21, 2010, at 10:05, Pavel Ivanov wrote: > > > I think it's not related to fragmentation, but to fill percentage of > > b-tree pages. I guess your reconstructed table is much less in total > > size than your initial one. Also does changing cache_size change the > > above numbers? > > Interesting. We have been using sqlite3_analyze on some tests and > finding that we can get our dbs very fragmented. It doesn't report fill > size as far as I can tell. > > It does report fill percentage, as "Bytes of user payload stored" > We'll play around with cache_size to see if that does anything > useful for us in the meantime. > > > What size do these tables have? > > About 2.2GB with about 4 million rows. > > > What bottleneck appears to be in 3-hour query execution? Is it disk > thrashing? > > Yes. > > I've tried different strategies in the past. Vacuum and the rebuild > both seem to help quite a bit. I don't understand the file layout all that > well right now, so I don't completely understand how the index is traversed. > How much memory do you have in the test system? Because if you rebuild, you are loading the whole db into the filesystem cache, if it'll fit. Then you are no longer accessing the file randomly. You might get the same performance increase by just catting the db (sequentially reading it into the FS cache) and then running your test. It's a good idea to make sure you purge your filesystem cache before running any performance tests related to fragmentation. On OSX, it's purge. On Linux, it's echo 3 > /proc/sys/vm/drop_caches. Otherwise, it's easy to fool yourself. But I agree with you that fragmentation can be a big issue, depending on how the data is loaded and how it is subsequently accessed. Jim -- HashBackup, LLC http://sites.google.com/site/hashbackup ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow SELECT Statements in Large Database file
Jonathan - 500 queries per second is 2ms per query. You'll have a hard time getting that kind of speed for random queries with any rotating media. Your database needs to be in memory - all of it, not just indexes - or on a flash drive. If your queries are not random but are somehow related, eg, you are doing thousands of queries within a small area, and the db records were also inserted by area, you may have better luck. Jim -- HashBackup: easy onsite and offsite Unix backup http://sites.google.com/site/hashbackup On Fri, Oct 29, 2010 at 12:07 PM, Jonathan Haws wrote: > All, > > I am having some problems with a new database that I am trying to set up. > > This database is a large file (about 8.7 GB without indexing). The problem > I am having is that SELECT statements are extremely slow. The goal is to > get the database file up and running for an embedded application (we have > tons of storage space so the size is not a problem). > > Here is the schema layout: > > CREATE table Dted (dtedkey INTEGER PRIMARY KEY, dted_lat > INTEGER, dted_lon INTEGER, dted_alt FLOAT); > > We look up dted_alt based on dted_lat and dted_lon. Here is our SELECT > statement: > > SELECT dted_alt FROM Dted WHERE (dted_lat=%d AND dted_lon=%d); > > The numbers fed to dted_lat and dted_lon are typically on the order of > 37 and -111. > > What can we do to speed up our SELECT statements? Minutes is > unacceptable for our application. We were hoping we could run somewhere > on the order of 500 queries per second and get valid results back. > > I am not an SQL expert, but I was reading about indexes and that it is > best to have a specific index per SELECT. Since we only have one, > this is the index I am creating now (it has been creating this index on my > machine for the past 10 minutes now): > > CREATE INDEX idx_ll ON Dted(dted_lat,dted_lon); > > Is that a good index for my SELECT? Will it speed up the accesses? > > Any thoughts? > > > Thanks!
> -- > Jonathan > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
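For reference, the schema, the suggested composite index, and a parameterized version of the SELECT from this thread can be exercised like so in Python (the sample row is invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Dted (dtedkey INTEGER PRIMARY KEY, "
            "dted_lat INTEGER, dted_lon INTEGER, dted_alt FLOAT)")
# Composite index matching the WHERE clause, as suggested in the thread:
# with equality tests on both leading columns, the lookup becomes a
# b-tree probe instead of a full table scan.
con.execute("CREATE INDEX idx_ll ON Dted(dted_lat, dted_lon)")
con.execute("INSERT INTO Dted (dted_lat, dted_lon, dted_alt) "
            "VALUES (37, -111, 1523.0)")  # invented sample elevation

# Bound parameters instead of %d string formatting.
row = con.execute("SELECT dted_alt FROM Dted WHERE dted_lat=? AND dted_lon=?",
                  (37, -111)).fetchone()
```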
Re: [sqlite] sqlite3_open on non-DB files / corrupt DBs
On Thu, Nov 25, 2010 at 11:06 AM, Tito Ciurowrote: > On 25 Nov 2010, at 12:51, Igor Tandetnik wrote: > > > Run "PRAGMA integrity_check" right after opening. > > That could be a potentially slow operation if the database is valid and > contains lots of records. > > Wouldn't be better to issue a SELECT statement and let SQLite come back > with a SQLITE_NOTADB error right away? > > I ran into this problem too. What I did was to execute pragma synchronous=full (or whatever mode you are using) after opening the database. This will fail if the file is not really an SQLite db or you have the wrong key for an encrypted db. It doesn't handle the problem of a corrupted db, but as has been mentioned, there's not a fast way to detect that. Jim -- HashBackup: easy onsite and offsite Unix backup http://sites.google.com/site/hashbackup ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
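A sketch of that header-check idea in Python; the helper name is mine. Here I use PRAGMA schema_version, which forces SQLite to read page 1 and fail with SQLITE_NOTADB on a non-database file; the pragma synchronous variant from the post should behave similarly in builds where setting it touches the file:

```python
import sqlite3

def looks_like_sqlite(path):
    """Hypothetical helper: open the file and run a cheap statement so
    SQLite reads the header; a non-database file raises SQLITE_NOTADB
    (surfaced in Python as sqlite3.DatabaseError)."""
    con = sqlite3.connect(path)
    try:
        con.execute("PRAGMA schema_version")
        return True
    except sqlite3.DatabaseError:
        return False
    finally:
        con.close()
```

As the thread notes, this only rejects files that aren't SQLite databases at all; it does not detect corruption, for which integrity_check remains the (slow) option.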
Re: [sqlite] database size (again)
A lot of the SQLite overhead is going to be in the stuff surrounding your actual data; I'd be surprised if you saved much space by using fixed-size ints vs the varints used by SQLite. You didn't mention indexes; if you have any, they will take a lot of space because your row size is so small. Maybe write your own VFS for SQLite? Definitely run sqlite3_analyzer before deciding anything. Jim -- HashBackup: easy onsite and offsite Unix backup http://sites.google.com/site/hashbackup On Tue, Dec 7, 2010 at 10:57 AM, Laszlo Nemeth wrote: > Hi, > > I have a database (6 in fact) of high-frequency data > > create table eurusd (tick integer not null, bid float not null, ask > float not null); > > with 80M records currently and growing, freshly inserted, no deletions > will ever take place, the schema will never change, and neither of the > fields can be null. The size is already 3.6G (I put an index on it > after bulk insert), which is a bit too much for me considering that > the existing infrastructure (ie binary file 12bytes per record) is > 800M (which fits into memory and I don't even need to index). > > Having checked older posts on sqlite-users, I noticed that this issue > comes up frequently and normally the problem is either wrong choice of > datatypes (ie text instead of integer), or unnecessary indices. None > of which applies here. > > Question: is it possible to recompile sqlite to force the > representation of integers to be 4 bytes, and that of floats to be > also 4 bytes. I would like to have no observable change in the > behaviour of sqlite.
> > I > (1) am quite comfortable with the hacking, > (2) understand that the database will no longer be platform > independent, nor compatible with anything else, > (3) tried to run the analyser to see if there is something fishy with > half full pages, but it wants tcl8.6 and haven't gotten around to > install it (will do), > (4) also checked the file format document, but that didn't give me any > immediate hint how to achieve what I'd like, though I only skimmed it > through. > > The point of doing this is that I get a smaller db, and I still get > all the beautiful machinery built for sqlite. > > Any suggestions, a complete solution, or a "no that's not possible, > because..." will be much appreciated. > Thanks, Z > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
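A back-of-envelope check of the figures in the post shows the size of the overhead being discussed:

```python
# Figures from the post: 80M rows, a 3.6 GB SQLite file (with one index),
# versus a flat binary file at 12 bytes per record (~800 MB, unindexed).
rows = 80_000_000
db_bytes = 3.6 * 2**30
flat_bytes = 12 * rows

per_row = db_bytes / rows
print(round(per_row, 1))                # roughly 48 bytes/row in SQLite
print(round(db_bytes / flat_bytes, 2))  # roughly 4x the flat-file size
```

Most of that extra ~36 bytes/row is record and b-tree cell overhead plus the index, which is why shrinking the integer/float encoding alone is unlikely to close the gap.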
Re: [sqlite] Order of UNION query results
On Fri, Jan 21, 2011 at 6:53 PM, Josh Gibbs wrote: > Could someone please clarify for me if the resulting order of a UNION > query will come back with the left data first, then the right data in > the case > that no ordering has been defined for the query. > > My need is to have a parameter stored in a database, with an optional > overriding parameter which should take precedence, such as: > > select value from param_overrides where key='setting' UNION > select value from params where key='setting' > > I'd like the resulting recordset to always contain the override parameter > first if it exists so I can simply use that value. > > Thanks, Josh > If you add a prio field (priority) to your param database, 0=highest priority (override), 1=normal priority, you only need 1 table. Then you can say: select value from params where key='setting' order by prio limit 1 Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
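The single-table approach sketched out with Python's sqlite3 (table contents invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# prio 0 = override (highest priority), prio 1 = normal default.
con.execute("CREATE TABLE params (key TEXT, prio INTEGER, value TEXT)")
con.executemany("INSERT INTO params VALUES (?, ?, ?)",
                [("setting", 1, "default-value"),
                 ("setting", 0, "override-value")])

# ORDER BY prio guarantees the override wins when present; if the
# prio-0 row is absent, the same query falls through to the default.
row = con.execute("SELECT value FROM params WHERE key='setting' "
                  "ORDER BY prio LIMIT 1").fetchone()
```

This avoids relying on UNION's (unspecified) result order entirely.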
Re: [sqlite] stored procedures implementation for SQLite
It looks interesting. Should your except stmt reference apsw? -Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com On Wed, Jan 26, 2011 at 11:38 AM, Chris Wolfwrote: > > > I know this is an old thread, but shortly after I read it, I attempted to > implement > stored procedures in SQLite! I only did it to see if I could, not because > I > necessarily think it's a good idea... It's very experimental and not fully > implemented, > but if anyone is interested, I checked in my work on GitHub, including > pre-compiled > binaries for MacOS and Linux. > > > http://chriswolf.heroku.com/articles/2011/01/26/adding-stored-procedures-to-sqlite/#extended > > Regards, > > Chris Wolf > > BareFeetWare wrote: > > On 13/11/2010, at 11:14 AM, Scott Hess wrote: > > > >> On Fri, Nov 12, 2010 at 5:50 AM, BareFeetWare < > list@barefeetware.com> wrote: > >>> IMO, if you're implementing database logic (ie constraints and > triggers) in application code, then you're reinventing the wheel, making > your package unnecessarily complex and grossly inefficient. If you're just > using SQLite to store your data but doing all the logic outside of SQLite, > then you might as well just be saving your data to a CSV file or XML. See my > previous post for reasoning behind this. > >> From http://www.sqlite.org/about.html : > >>> Think of SQLite not as a replacement for Oracle but as a replacement > for fopen() > > > > The full paragraph from that page reads: > > > >>> SQLite is an embedded SQL database engine. Unlike most other SQL > databases, SQLite does not have a separate server process. SQLite reads and > writes directly to ordinary disk files. A complete SQL database with > multiple tables, indices, triggers, and views, is contained in a single disk > file. The database file format is cross-platform - you can freely copy a > database between 32-bit and 64-bit systems or between big-endian and > little-endian architectures. 
These features make SQLite a popular choice as > an Application File Format. Think of SQLite not as a replacement for Oracle > but as a replacement for fopen() > > > > So I think it's referring to how SQLite stores its data in a local file, > rather than on a remote server with which it communicates indirectly. ie > "SQLite does not have a separate server process". In that way, SQLite is > like fopen rather than Oracle. The same paragraphs mentions SQLite > "triggers, and views", freely copying a [self contained] SQLite database > between architectures, which allude to my point about putting the logic in > the database itself so you can move the whole database between > architectures. > > > >> So, yes, you might as well just be saving your data to a CSV or XML > file. And I'm sure if you had a package to do that, someone would be > arguing about whether your XML should allow for embedded transforms. > > > > What do you gain by implementing database logic in the application layer, > when it could be done far more efficiently and reliably in the SQL schema? > The only thing I can think of is avoiding the (shallow) learning curve. Why > re-invent and roll your own integrity checking etc when it's already > available and in a way much closer to the data than your application code > can get? > > > > See my previous post for the contrary argument: > > > http://sqlite.org:8080/cgi-bin/mailman/private/sqlite-users/2010-October/025096.html > > > > Basically, database logic in the database itself is much faster, causes > less error, optimises queries, makes the database portable between > application environments or GUI apps. What's not to love? > > > > Thanks, > > Tom > > BareFeetWare > > > > -- > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Trigger for incrementing a column is slow
On Thu, Feb 3, 2011 at 5:07 PM, Kevin Wojniakwrote: > > On Feb 3, 2011, at 11:41 AM, Petite Abeille wrote: > > > On Feb 3, 2011, at 6:53 PM, Kevin Wojniak wrote: > > > >> The trigger is ran once via sqlite3_exec(); > > > > Hmm... you mean the trigger is run every single time you perform an > insert, no? > > Yes. I should say the trigger is created once via sqlite3_exec(). > > >> Any insight as to why the trigger is significantly slower? > Perhaps SQLite is having to recompile the trigger SQL on every use, whereas your update stmt is prepared. I tried triggers once, and they were slow for me too. Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Bi-directional unique
If you don't care about the order and the integers are smallish, like 31 bits or less, I'd do this:

create table t(k int primary key);

i = whatever
j = whatever
if i < j:
    k = i<<32 | j
else:
    k = j<<32 | i
insert k into table

To see if a pair is in the table, do the same steps and look up k. If you do care about the order, you can add k as primary key to the table layout you mentioned with i and j. Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com On Wed, Feb 9, 2011 at 1:12 PM, Black, Michael (IS) wrote: > I have a need to create a unique bi-directional relationship. > > You can think of it as pairings of people who eat dinner together. > > > > create table t(i int, j int); > > insert into t values(1,2); > > insert into t values(2,1); << should give an error because the pairing of 1-2 > already exists. > > insert into t values(3,2); << OK > > insert into t values(3,1); << OK > > insert into t values(1,3); << should be error > > > > You can't guarantee that one column is less than the other so there's no > win there. > > > > Speed is of the utmost concern here so fast is really important (how many > ways can I say that???). > > > > Is there anything clever here that can be done with indexes or such? > > > > > > Michael D. Black > > Senior Scientist > > NG Information Systems > > Advanced Analytics Directorate > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Bi-directional unique
On Wed, Feb 9, 2011 at 5:25 PM, Simon Slavinwrote: > > On 9 Feb 2011, at 7:32pm, Black, Michael (IS) wrote: > > > Didn't we just determine a couple of days ago that triggers were > performance killers? > > > > That's one reason I want to avoid those. > > Okay, then since your program knows the logic of how that table will be > used, it can do it for itself. Just do two INSERTS. > > Alternatively, if your language is capable of it, simply well-order the two > personIDs, so that the one in one column is always smaller than the one in > the other column. Then change your SELECT logic so that it returns the > UNION of SELECTing on both columns. > > One system speeds up the INSERTs, the other speeds up the SELECTs. It > depends which is the more important to you. > I assumed you could generate k in a procedural language outside of SQL, but if you want to do this purely in SQL, I think you can just say: create table t(k int primary key) insert into t values (min(?,?)<<32 | max(?,?)) and bind i,j,i,j to the parameters. For the select, same thing: select * from t where k=min(?,?)<<32 | max(?,?) and bind i,j,i,j I don't see the need to do 2 selects or 2 inserts, but maybe I'm not understanding something. Jim --- http://www.hashbackup.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
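The pure-SQL variant is easy to verify; a sketch using SQLite's numbered parameters (?1, ?2) so i and j are each bound once, relying on << binding tighter than |:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (k INTEGER PRIMARY KEY)")
# Order-independent key: smaller value in the high 32 bits.
ins = "INSERT INTO t VALUES (MIN(?1, ?2) << 32 | MAX(?1, ?2))"

con.execute(ins, (1, 2))       # pair (1,2) stored once
con.execute(ins, (3, 2))       # different pair, OK
try:
    con.execute(ins, (2, 1))   # same pair reversed: PRIMARY KEY rejects it
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```

The same expression with k=... in the WHERE clause does the membership test, so one insert and one select suffice; no triggers, no double inserts.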
Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected
On Fri, Feb 11, 2011 at 10:50 PM, Thomas Fjellstrom < tfjellst...@strangesoft.net> wrote: > I've seen numbers on the internet stating that sqlite is super fast, should > be > able to handle tens of thousands of inserts or updates in a second or two, > even in sync mode. So I'm a bit stumped as to why its performance is so low > in > my case. One time, I saw it take 5 seconds to flush almost 3k rows back to > the > db. > > No hosts are actually ever removed from the database, which has about 120k > hosts in it by now, totaling up to a file size of around 7-8MB total. So > its > not a lot of data, and the queries are pretty simple. So I really am > stumped. > You mentioned your db is 8mb. If you are using the default page size of 1k, that means you have 8k pages in your db. The default cache size is 2000 pages, so your db doesn't fit into SQLite's default cache. I'd suggest using pragma page_size=4096. This will have the effect of increasing your cache size from 2mb to 8mb and prevent cache overflow. You might want to use pragma cache_size=3000 as a safety margin. This would be a 12MB cache with 4KB pages. Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
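The suggested pragmas look like this from Python (database path hypothetical; note page_size only takes effect while the database file is still empty):

```python
import sqlite3, tempfile, os

path = os.path.join(tempfile.mkdtemp(), "hosts.db")  # hypothetical db
con = sqlite3.connect(path)
con.execute("PRAGMA page_size=4096")   # must run before the first write
con.execute("PRAGMA cache_size=3000")  # 3000 pages x 4 KB = ~12 MB cache
con.execute("CREATE TABLE hosts (addr TEXT)")
con.commit()
page = con.execute("PRAGMA page_size").fetchone()[0]
```

cache_size is per-connection and not persistent, so it needs to be set each time the database is opened.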
Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected
On Sat, Feb 12, 2011 at 9:10 AM, Black, Michael (IS)wrote: > Here's a little benchmark program I wrote to test a super-simple > one-integer insert to test sql speed and commit interval behavior. > > Running it on my machine (Windows XP64 8-core 3Ghz gcc (tdm64-1) 4.5.1) I > go from 320M inserts per second to 361K inserts per second when no > begin/commit occurs. With WAL mode turned on it only drops to 5.9M inserts > per second. > > D:\SQLite>batch 5000 5000 > 32000.0 inserts per sec > D:\SQLite>batch 5000 2500 > 32000.0 inserts per sec > D:\SQLite>batch 5000 1200 > 16000.0 inserts per sec > D:\SQLite>batch 5000 600 > 16000.0 inserts per sec > D:\SQLite>batch 5000 300 > 10666.7 inserts per sec > D:\SQLite>batch 5000 150 > 5333.3 inserts per sec > D:\SQLite>batch 5000 75 > 3200.0 inserts per sec > D:\SQLite>batch 5000 40 > 1777.8 inserts per sec > D:\SQLite>batch 5000 20 > 1000.0 inserts per sec > D:\SQLite>batch 5000 10 > 333.3 inserts per sec > D:\SQLite>batch 5000 5 > 2406015.0 inserts per sec > D:\SQLite>batch 5000 2 > 526315.8 inserts per sec > D:\SQLite>batch 5000 1 > 360766.6 inserts per sec > Unless I'm missing something, SQLite has to update the first page of the database on every commit, to update the change counter. Assuming you are using rotating media, that record can only be updated 120 times per second, maximum, on a 7200RPM drive. I don't understand how you can do 360K commits per second if your system is actually doing "to the platter" writes on every commit. Can someone clue me in? Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
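The ceiling being questioned here is just rotational arithmetic:

```python
# If every commit must rewrite the same database page in place, the drive
# can service at most one such write per platter revolution.
rates = {rpm: rpm / 60 for rpm in (5400, 7200)}
print(rates)  # max in-place rewrites of one sector per second
```

Anything much above these figures implies the "sync" is being absorbed by a cache somewhere rather than reaching the platter.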
Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected
On Sun, Feb 13, 2011 at 1:15 AM, Max Vlasov <max.vla...@gmail.com> wrote: > On Sat, Feb 12, 2011 at 7:11 PM, Jim Wilcoxson <pri...@gmail.com> wrote: > > > > > > > Unless I'm missing something, SQLite has to update the first page of the > > database on every commit, to update the change counter. Assuming you are > > using rotating media, that record can only be updated 120 times per > second, > > maximum, on a 7200RPM drive. > > > > > > Hmm, I think there's something wrong with this theory of a single flush per > round > > Yesterday I posted here results from tests that ended with something > like 50 commits per second in the case of a single flush (journal_mode=off). I > decided to run a simple program that writes a byte to a file and does > windows FlushFileBuffers. It reported 53 writes per second. These were the > expected results if this theory is right. But when I connected an external > drive (WD1200E1MS, bus-powered, it seems it is even 5400 rpm), this number > jumped to 253. I thought "Ok, something, maybe os maybe drive tried to fool > me, let's do this other way". I did the following: > > - Checked the removable drive "Quick removal" is on so no write cache on > os > side > - opened the same test db on this drive, journal_mode=off for best > scenario > - Executed 10,000 updates to a single record made as separate > transactions > ( UPDATE TestTable SET Id=Id+1 ) > - When the program reported ok, I quickly unplugged the usb cord. The > device > is bus-powered so had no power to complete any postponed operation. The > total time was about 50 seconds and to do the real quick unplug I prepared > my hands and the delay was no more than 200 ms (subjectively). > - Replugged the drive and opened the db in the tool to see whether there was any > corruption or other loss of data. > > So the final results: > - the db was always ok and contained the correct value (id=10001 for initial > 1).
> - the speed was about 227 commits per second so very close to my system > flush test (253) > > So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even if > it's 7200 (manufacturers sometimes upgrade drives inside portable hd > without > prior notice), it's still twice as much as 7200/60=120. > If we go with the theory that a single record/sector can only be written once per revolution, that means writes can occur 90 times per second, or about every 11ms on your 5400 rpm drive. So for this test to show corruption, you'd have to unplug the drive within 11ms of the last commit, which I think is practically impossible. My hypothesis to explain the behavior you are seeing is that it takes 4ms (1/253) for your OS to flush its buffers to the hard drive cache, issue the sync to the drive, and the hard drive to acknowledge the sync. When it is convenient, the drive will write this data to the disk. The sync command may make this more urgent than usual, which would explain why I/O slows down if sync is used. Or, the drive could simply have a sync timer: the first sync command starts the timer, and when 4ms have passed, the drive dumps its cache to the platters. Jim ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected
On Sun, Feb 13, 2011 at 7:07 AM, Jean-Christophe Deschampswrote: > > >So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even if > >it's 7200 (manufacturers sometimes upgrade drives inside portable hd > >without > >prior notice), it's still twice as much as 7200/60=120. > > 5400/60, 7200/60 ... those values rely on the assumption that > successive LBAs are mapped to successive physical sectors (512 or 4K, > whatever) on the same face of the same plater. Is it obvious that all > today's typical stock drives actually implement only that simple old > scheme and not an untold mix of various interleaving techniques? > This is true for the case of writing multiple records or multiple sectors. For example, if you have a drive with 5000 sectors per track and you write sector 1 of the track with a hard sync, you may have time to write sector 2500 with a hard sync in the same revolution. Or maybe you can write every 500 sectors with a hard sync in the same revolution, giving you a commit rate of 10 per revolution or 900 commits per second on a 5400 rpm drive. But what I postulate is that you can't physically write *the same* record over and over more than 90 times per second on a 5400 rpm drive, unless the drive, OS, or filesystem implements something like wear-leveling, where the physical location of sectors is constantly changing. Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected
On Sun, Feb 13, 2011 at 11:55 AM, Max Vlasov <max.vla...@gmail.com> wrote: > On Sun, Feb 13, 2011 at 5:31 PM, Jim Wilcoxson <pri...@gmail.com> wrote: > > On Sun, Feb 13, 2011 at 1:15 AM, Max Vlasov <max.vla...@gmail.com> > wrote: > > > So the final results: > > > - the db was always ok and contains the correct value (id=10001 for > > initial > > > 1). > > > - the speed was about 227 commits per second so very close to my system > > > flush test (253) > > > > > > So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even > if > > > it's 7200 (manufacturers sometimes upgrade drives inside portable hd > > > without > > > prior notice), it's still twice as much as 7200/60=120. > > > > > > > My hypothesis to explain the behavior you are seeing is that it takes 4ms > > (1/253) for your OS to flush its buffers to the hard drive cache, issue > the > > sync to the drive, and the hard drive to acknowledge the sync. When it > is > > convenient, the drive will write this data to the disk. The sync command > > may make this more urgent than usual, which would explain why I/O slows > > down > > if sync is used. Or, the drive could simply have a sync timer: the first > > sync command starts the timer, and when 4ms have passed, the drive dumps > > its > > cache to the platters. > > > > > Jim, I see your point, maybe really for removable media on Windows sync is > still lazy, but much less "lazy" than general cached operations. Another > possibility is that removable hds can report that they wrote some data while > actually working as you described, postponing it a little. > > But I thought about how it would be possible to test this explanation. I'm > going to do some tests that work like this. The same id updating, but in > the middle of 10,000 operations I will unplug the cord, sqlite will say > that it can't commit a particular insert, and I can see what is the value of > this insert.
> After replugging, two variants are possible:
> - I will find the previous value in the base. If for several dozen tests
> the result is the same, that would mean the situation still needs
> to be explained.
> - I will find some value less than the one previous to the one sqlite could not
> commit, at least in some of the tests (maybe -3, -4 to the failed one). In
> this case the explanation will be confirmed.
>
> How about this? Does it sound reasonable?

Sounds like a clever test to me! I also found this page, used to force a Linux system crash:

http://evuraan.blogspot.com/2005/06/how-to-force-paniccrash-in-linux.html

I seem to remember a post that SQLite commit/sync is tested with the kill command, but it seems like actually crashing the machine (which is done without a sync first) might be a better test.

Jim
Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected
On Mon, Feb 14, 2011 at 2:33 AM, Max Vlasov <max.vla...@gmail.com> wrote:
> On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
>> [...]
>> Sounds like a clever test to me!
>
> Thanks for the support :) now the results.
>
> I switched off the journal:
> PRAGMA journal_mode=OFF;
> As I described, the only record contained id=1 and the sql query was
> UPDATE TestTable SET Id=Id + 1
> The cord was unplugged in the middle of the 10,000 operations, when about
> 5000 records were updated. The hd was a bus-powered external hard drive and I
> repeated the test several times. No test showed the expected value, which
> confirms Jim's explanation about postponed writing.
> The actual values are below
> (the first one is the expected value to be found after reopening, the next
> one is the actual value, and the difference is self-explanatory)
>
> 5094 -> 5086 = -8
> 5084 -> 5083 = -1
> 5070 -> 5049 = -21
> 5082 -> 5069 = -13
> 5095 -> 5086 = -9
> 5072 -> 5033 = -39
> 5118 -> 5053 = -65
> 5081 -> 5075 = -6
>
> So the maximum of non-flushed commits was 65.
>
> I also made a couple of tests with the journal on, to see the difference
> between the expected, non-restored, and restored values. One of the results:
> 5078 (expected) -> 5077 (non-restored) -> 5021 (restored)
> It seems that for correctly operating os/hardware the numbers should be
> 5078->5077->5077, or no journal present, depending on the moment of
> unplugging. So this postponing also made it possible for some prior
> version of the journal file to exist.
>
> So, the next question was 'where?'. Is software or hardware to blame?
> Richard always mentioned hardware in this context, so I decided to check this
> with another device. This time it was a harddisk box with separate bus and
> power (the hd was a Maxtor IDE 80Gb drive).
>
> The first variation was similar to the one with the bus-powered drive; this
> time I used the hard button on the box, which is equivalent to unplugging both
> connection and power. The difference for a single test was really impressive:
> 5355 -> 4445 = -910. And when I calculated numbers for a non-interrupted test,
> the drive showed about 300 commits per second.
>
> The second variation was just unplugging the cord but keeping the power
> intact, so if it's the drive that caches, it would finish its operations
> completely. This time the results were perfect, for example 4822 -> 4822,
> and even 5371 -> 5372 = +1, which would actually mean the process was
> interrupted after all data was written but before sqlite made winsync; the os
> reported failure, but the data was already there.
>
> So the sad news about faulty hardware is probably true once again.
Hi Max - this turned out to be a good test you devised! You know, you can never learn *too* much. :)

Your test confirms that the USB drive is saying data is written to the drive when it really isn't, which seems bad. But it got me thinking a little more. Your drive has for sure lost the Durability property of ACID: when SQLite comes back from a commit, the data just written may or may not be on the drive, as you proved. So my question is, does it maintain the other 3 parts of ACID, so that the database will never be in a corrupted state after a power loss, even though it may be missing some writes that were confirmed?

There are many use cases where a drive like yours would be usable by SQLite: it would be as if the async vfs was being used. But if using this drive could lead to a corrupted database, the drive isn't safe to use at all with SQLite, IMO. It would be interesting to know the difference, and even better, if SQLite or an SQLite app could somehow run a test on a drive to figure out whether the drive is: a) perfectly safe: every commit is on the platter
Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected
On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov <paiva...@gmail.com> wrote: >> So my question is, does it maintain the other 3 parts of ACID, so that >> the database will never be in a corrupted state after a power loss, >> even though it may be missing some writes that were confirmed? > > Jim, I think the answer to your question is already in Max's tests: > the USB drive is completely unreliable and can easily lead to database > corruption. I'll explain. Max's tests showed that there were > situations when database and journal were different not by one > transaction but by several ones. So if one makes changes to several > database pages (located in different disk sectors) and/or makes > transactions touching several database pages (which makes multi-page > journal files) then these pages can be way out of sync with each other > (including pages inside journal). And this will easily lead to > database corruption. > > > Pavel Hmm... I guess I am not quite convinced. :) I believe (though am not sure) that the drive can reorder individual sector/record writes between syncs however it likes, as long as it doesn't allow writes to cross a sync and also executes the sync groups (for lack of a better term - the group of writes between syncs) in order. As an example, say you are inserting records on each commit instead of updating. Maybe the last record inserted was 1000 and the plug is pulled. When plugged back in, if you only have 800 records in the db, it's not so bad. However, if there is a corrupt db instead, it is much worse. Maybe Max would be kind enough to re-run his test with an insert instead of an update and see if there ever is corruption or if just the last n inserts are missing. 
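Jim's constraint - the drive may reorder writes freely within a "sync group" but never across a sync barrier - can be modeled with a small toy sketch (hypothetical names; this is an illustration of the claim in the message above, not SQLite or drive firmware code):

```python
import itertools

def possible_states(groups, crashed_during):
    """Enumerate crash outcomes under the rule described above: every
    sync group completed before the crash is fully durable, in order,
    while the in-flight group may have reached the platter partially
    and in any order (so: any subset of it)."""
    durable = [w for g in groups[:crashed_during] for w in g]
    pending = groups[crashed_during]
    for r in range(len(pending) + 1):
        for subset in itertools.combinations(pending, r):
            yield durable + list(subset)

# Crash during the second sync group: group 0 is always intact, and the
# only uncertainty is which of w3/w4 made it to the platter.
states = list(possible_states([['w1', 'w2'], ['w3', 'w4']], 1))
print(len(states))  # 4 legal outcomes
```

Under this model a crash can only lose a suffix of recent commits (Jim's "800 of 1000 records" case) rather than corrupt earlier state - which is exactly what his proposed insert-instead-of-update rerun would check.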
Jim
Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected
On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov <paiva...@gmail.com> wrote:
> Jim, I think the answer to your question is already in Max's tests:
> the USB drive is completely unreliable and can easily lead to database
> corruption. [...]

You are right I think. I wrote my own test program and ran it on a Macbook Pro with a Seagate USB drive. Here is the Python test program:

import os, sqlite3, time

path = '/Volumes/TM/file.db'
exists = os.path.exists(path)
con = sqlite3.connect(path)
if exists:
    k = con.execute('select max(k) from t').fetchone()[0]
    print 'Previous run:', k
    con.execute('drop table t')
    con.commit()
con.execute('create table t (k integer primary key)')
t0 = time.time()
try:
    for i in range(5000):
        con.execute('insert into t values(?)', (i,))
        con.commit()
    con.close()
except Exception, err:
    print err, 'at record', i
et = time.time() - t0
print i, 'records inserted in', et, 'seconds;', i/et, 'recs/sec'

Here are the results:

1. First run, no unplug:

[jim@mb backup]$ time py t.py
4999 records inserted in 17.7488458157 seconds; 281.652117097 recs/sec

real    0m17.794s
user    0m0.729s
sys     0m2.139s

2.
Unplugged during the run:

[jim@mb backup]$ time py t.py
Previous run: 4999
disk I/O error at record 2681
2681 records inserted in 9.2398519516 seconds; 290.156164194 recs/sec

real    0m9.294s
user    0m0.410s
sys     0m1.193s

3. This shows that only 161 records made it to disk in the previous run, not 2681:

[jim@mb backup]$ time py t.py
Previous run: 161
disk I/O error at record 977
977 records inserted in 3.34149599075 seconds; 292.384010845 recs/sec

real    0m3.453s
user    0m0.162s
sys     0m0.432s

4. This time there was no unplug, but SQLite could not recover the db:

[jim@mb backup]$ time py t.py
Previous run: 161
database disk image is malformed at record 1323
1323 records inserted in 4.26392698288 seconds; 310.277358246 recs/sec

real    0m4.341s
user    0m0.193s
sys     0m0.535s

Jim
Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected
On Mon, Feb 14, 2011 at 3:02 PM, Max Vlasov <max.vla...@gmail.com> wrote:
> On Mon, Feb 14, 2011 at 8:42 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
>> [...]
>> You are right I think. I wrote my own test program and ran it on a
>> Macbook Pro with a Seagate USB drive. [...]
>>
>> [jim@mb backup]$ time py t.py
>> 4999 records inserted in 17.7488458157 seconds; 281.652117097 recs/sec
>
> Jim, your tests also show (this time on a different os) that either you
> have a fantastic hd with 18,000 rpm, or someone at Seagate just _wants_ you to
> think you have a fantastic hd :)
>
> Just wondering, I know this maybe sounds fantastic, but I'm thinking whether
> some "acid-passed harddrives" page at sqlite.org could encourage manufacturers
> to hold their horses. The logic would be like this: if some model is present in
> either section, then googling it will make this page very high in the google
> results (due to the high pagerank of sqlite.org).
> So they would probably very quickly
> notice that this page can at least partly affect their sales. Unfortunately
> the technical side is more complex; the developers just can't rely on
> e-mails from users. This should be some sqlite-originated test performed
> on a known configuration, and it'd better be an oss os with known tuning.
> Maybe some other, less fantastic form of such tests could be possible...
>
> Max

I think what would be useful is for drives to have 3 settings:

a) the drive can do whatever it wants to optimize performance
b) the drive can reorder writes, but not across a sync (ACI, but no D)
c) the drive has to respect all syncs (ACID)

If the drive manufacturer wants to make a) the default, that's fine, but there should be an easy way to request the other 2 from an application. Users are not usually sophisticated enough to know when it's okay to cache writes or not okay. For my use of SQLite, b) would probably be fine, but a) apparently is not, since it corrupts databases.

Michael mentioned doing backups and forgetting about all of this, but many people are using USB drives to store their backups. So there is a bit of a Catch-22 here. Almost all modern-day filesystems are going to depend on some kind of journalling to prevent corrupted file systems, and as far as I know, journalling filesystems depend on syncs to maintain FS consistency.

Jim
Re: [sqlite] COUNT() extremely slow first time!
This is a common issue on the mailing list. The first time you do count(*), SQLite (actually your OS) has to load data into memory. Most OS's will keep the file in a buffer cache, so the 2nd count(*) doesn't have to read from disk.

Here's a timing from my own system, after a purge command to clear the buffer cache:

$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real    0m0.580s
user    0m0.190s
sys     0m0.034s

Same command again, with the file cached:

$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real    0m0.189s
user    0m0.165s
sys     0m0.019s

This time is consistent no matter how many times I run it, because the file is still cached. Doing a purge command to clear the cache and re-running the query, we get:

$ purge
$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real    0m0.427s
user    0m0.175s
sys     0m0.024s

On my system, there is not a huge difference, but it is consistent. Now, if you have a fragmented file system, you will see a much larger difference. There are many posts on the mailing list about both file system fragmentation and logical fragmentation within the SQLite file itself. Your first count(*) is subject to these fragmentation effects, while your 2nd usually is not, because the file is in memory.

Some people on the list believe fragmentation is an unimportant detail you shouldn't worry about, because you have little control over it. That may be true, but it's useful to understand how it can affect performance. I think you are seeing this first hand.
Jim

On Mon, Feb 21, 2011 at 9:37 AM, Sven L wrote:
> Same result :(
> Note that I have compiled SQLite with the following switches:
> SQLITE_ENABLE_STAT2
> SQLITE_THREADSAFE=2
>
> I've learnt that COUNT(*) is slower than COUNT(ID), since * means the engine
> has to traverse all columns and it might even return another value if there
> are NULL-values...
>
> Also, this is quite interesting:
>
> sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
> 0|0|0|SCAN TABLE Items (~100 rows)
> sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
> sqlite>
>
> I would expect an index scan on the first statement. The second statement
> tells me nada?!
>
> Thanks for your help!
>
>> From: slav...@bigfraud.org
>> Date: Mon, 21 Feb 2011 14:24:50 +
>> To: sqlite-users@sqlite.org
>> Subject: Re: [sqlite] COUNT() extremely slow first time!
>>
>> On 21 Feb 2011, at 2:23pm, Sven L wrote:
>>
>> > SELECT COUNT(ItemID) FROM Items;
>> >
>> > This takes around 40 seconds the first time! WHY?!
>>
>> Try again, doing everything identically except that instead of the above
>> line use
>>
>> SELECT COUNT(*) FROM Items;
>>
>> Simon.
Re: [sqlite] COUNT() extremely slow first time!
On Mon, Feb 21, 2011 at 10:38 AM, Puneet Kishor <punk.k...@gmail.com> wrote:
> On Mon, Feb 21, 2011 at 10:17:03AM -0500, Jim Wilcoxson wrote:
>> This is a common issue on the mailing list. The first time you do
>> count(*), SQLite (actually your OS) has to load data into memory.
>> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
>> doesn't have to read from disk.
>
> One question I have related to the above -- how long does that buffer
> cache remain filled with the data? I am assuming it is until new stuff
> to be cached pushes out old stuff, no?

For most OS's, the time data remains in the cache and the size of the cache will vary as a function of available RAM.

> I was doing some R*Tree selects, and the first query was dog slow,
> although benchmarking showed that the actual CPU time was very small.
> Subsequent queries were lightning fast. I am assuming that the buffer is
> not getting filled with the results as much as it is getting filled with
> whatever part of the db that the program needs to open to do its work.

Right. SQLite doesn't cache query results. It does cache database pages in its own cache, which by default is rather small: 2000 pages. At the default page size of 1K, that's a 2MB cache. To fetch records not in its own cache, SQLite will use the OS. If the page is in the OS cache, there is no seek time and no read latency (for spinning media).
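The two settings Jim describes can be inspected directly. A small sketch (note the caveat: the 2000-page / 1K defaults are from the SQLite 3.6/3.7 era discussed here; modern SQLite defaults to 4K pages and reports cache_size as a negative number meaning KiB rather than pages):

```python
import sqlite3

# Inspect the page size and page-cache size of a connection.
con = sqlite3.connect(':memory:')
page_size = con.execute('PRAGMA page_size').fetchone()[0]
cache = con.execute('PRAGMA cache_size').fetchone()[0]

# A negative cache_size means "this many KiB"; positive means pages.
cache_bytes = -cache * 1024 if cache < 0 else cache * page_size
print(page_size, cache, cache_bytes)
```

With the era's defaults (2000 pages of 1K) this works out to the 2MB cache mentioned above.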
Jim
Re: [sqlite] COUNT() extremely slow first time!
On Mon, Feb 21, 2011 at 11:05 AM, Sven L wrote:
> Thank you for your detailed explanation!
> First, can you please tell me how to purge the cache in Windows 7? This could
> be very useful for my tests!

Sorry, dunno for Windows. On Mac OSX it is the purge command, in the development tools. On Linux, you do:

echo 3 > /proc/sys/vm/drop_caches

> I'm quite sure my database itself is not fragmented, since I have only
> inserted data.

If you insert the data in primary key order, or you don't have any key (SQLite generates one), it's probably not very fragmented. But keep in mind that while to you, things are happening sequentially as you add records, inside SQLite, things are happening quite differently. For example, let's say you have 4 indexes on this database. So for every insert you do, there are 5 database pages being affected (one for the data row, one for each index). These pages will fill up at different rates, depending on the size of your keys, and will be written to disk at different times. So you are very likely to have data pages and various index pages all intermixed in your SQLite file. Also with multiple indexes, it's unlikely that every index will be in sorted order, based on the records you are inserting. So IMO, there's no practical way to avoid fragmentation within the SQLite file.

> The file system is in good shape too; Windows reports 0% fragmentation.
> Perhaps there is some other bottleneck, like disk performance in general
> (this is a 5400 rpm drive)? Or antivirus-related CPU holds? (I did turn
> real-time scanning off though.)
> I have even turned Windows Search off (and got a 20% performance gain!). My
> 32-bit application is running under Windows 7 (64-bit). Could WOW64 have
> something to do with this performance issue?
>
> The size of the database is 1,98 GB, with 1.5M rows. Couldn't it be that the
> table scan simply is darn slow for huge tables?

How much free RAM do you have?
Windows being the pig that it is, I'm guessing not much, unless your system has > 4GB of RAM. For comparison, you could write a program that just reads through the file, 4K at a time (I think that's the default page size for SQLite on Windows). Starting with a cold cache, this is near the best time you will ever see for count(*) in SQLite.

> In an ideal world the COUNT() would be performed on the primary key in RAM.
> That's not possible? :P

Sure, if all of the primary key records are in either the SQLite cache or OS cache. If they aren't, you have seek time. I just did a small test to compare count(*) with count(primary key). Here is count(*):

sqlite> explain select count(*) from logs;
addr  opcode        p1  p2  p3  p4                 p5  comment
----  ------------  --  --  --  -----------------  --  -------
0     Trace         0   0   0                      00
1     Goto          0   8   0                      00
2     OpenRead      1   12  0   keyinfo(1,BINARY)  00
3     Count         1   1   0                      00
4     Close         1   0   0                      00
5     SCopy         1   2   0                      00
6     ResultRow     2   1   0                      00
7     Halt          0   0   0                      00
8     Transaction   0   0   0                      00
9     VerifyCookie  0   23  0                      00
10    TableLock     0   8   0   logs               00
11    Goto          0   2   0                      00

Here is count(primary key):

sqlite> explain select count(logid) from logs;
addr  opcode        p1  p2  p3  p4        p5  comment
----  ------------  --  --  --  --------  --  -------
0     Trace         0   0   0             00
1     Null          0   2   0             00
2     Null          0   1   0             00
3     Goto          0   14  0             00
4     OpenRead      0   8   0   0         00
5     Rewind        0   9   0             00
6     Rowid         0   3   0             00
7     AggStep       0   3   1   count(1)  01
8     Next          0   6   0             01
9     Close         0   0   0             00
10    AggFinal      1   1   0   count(1)  00
11    SCopy         1   4   0             00
12    ResultRow     4   1   0             00
13    Halt          0   0   0             00
14    Transaction   0   0   0             00
15    VerifyCookie  0   23  0             00
16    TableLock     0   8   0   logs      00
17    Goto          0   4   0             00

It's not clear to me whether for count(*), SQLite is reading each data page or reading the primary key index pages. But when timed, count(*) is 16x faster than count(logid) the first time, and 8x faster once all pages are cached. This is just
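Jim's comparison can be reproduced with a short sketch (against an in-memory database here, so it only demonstrates that both queries agree on the count - it cannot show the cold-cache timing gap, and table/column names are illustrative):

```python
import sqlite3
import time

# Build a throwaway table with an integer primary key, mirroring the
# 'logs' table used in the EXPLAIN output above.
con = sqlite3.connect(':memory:')
con.execute('create table logs (logid integer primary key, msg text)')
con.executemany('insert into logs values (?, ?)',
                ((i, 'x' * 50) for i in range(200000)))
con.commit()

def bench(sql):
    t0 = time.time()
    (n,) = con.execute(sql).fetchone()
    return n, time.time() - t0

# count(*) can use the Count opcode seen in the first plan;
# count(logid) walks rows with Rowid/AggStep as in the second plan.
n_star, t_star = bench('select count(*) from logs')
n_pk, t_pk = bench('select count(logid) from logs')
assert n_star == n_pk  # same answer, since a primary key is never NULL
print(n_star, t_star, t_pk)
```

Note the equal results hold only because logid is a primary key; count(col) on a nullable column skips NULLs and can legitimately differ from count(*).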
Re: [sqlite] COUNT() extremely slow first time!
The SQLite cache size is in pages (2000 by default), so by increasing the page size 8x, you're also increasing the cache size and memory requirements by 8x. Not saying it's a bad thing, just something to be aware of.

If you want to compare 1K and 8K page sizes and only compare the effect page size has, you should either increase the cache size to 16000 for 1K pages or decrease the cache to 250 for 8K pages.

The other thing to be aware of is that SQLite will not allow a row to cross 2 pages. (It does allow a row to be larger than a page, using an overflow page.) So for example, if your page size is 1024 and row size is 512 bytes, you can fit 2 rows on a page. I'm simplifying this somewhat and ignoring internal SQLite data, but you get the idea. If your row size is 513 bytes, you will have 511 bytes of waste on each page, so 50% of your database will be "air". As your row size heads toward 1024 there will be less waste. At 1025 bytes, SQLite will start splitting rows into overflow pages, putting 1024 bytes into the overflow page and 1 byte in the btree page. These numbers aren't exactly right, but illustrate the point.

So to find a good page size, experiment and measure.

Jim

On Wed, Feb 23, 2011 at 10:20 AM, Teg wrote:
> Hello Greg,
>
> I found this to be the case too. The difference between 1K and 8K is
> staggering. I default all my windows DB's to 8K now.
>
> Tuesday, February 22, 2011, 1:59:29 PM, you wrote:
>
> GB> I'm currently dealing with a similar issue. I've found that the page_size
> GB> PRAGMA setting can have a dramatic effect on how long it takes to "warm up"
> GB> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 takes
> GB> 8.5 seconds. This was done with a reboot between each test.
> Best regards,
> Teg    mailto:t...@djii.com
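Jim's fit-per-page arithmetic can be sketched as a simplified model (it ignores SQLite's per-page header, cell pointers, and reserved free space, exactly as the post itself caveats; function names are hypothetical):

```python
def rows_per_page(page_size, row_size):
    """Rough model of row packing from the discussion above: a row never
    spans two btree pages, and a row larger than a page spills to
    overflow pages instead (modeled here as 0 in-page rows)."""
    if row_size > page_size:
        return 0  # stored via overflow pages
    return page_size // row_size

def wasted_bytes(page_size, row_size):
    """Slack left on a page after packing whole rows."""
    n = rows_per_page(page_size, row_size)
    return 0 if n == 0 else page_size - n * row_size

print(rows_per_page(1024, 512), wasted_bytes(1024, 512))  # 2 rows, no waste
print(rows_per_page(1024, 513), wasted_bytes(1024, 513))  # 1 row, 511 wasted
print(rows_per_page(4096, 20))                            # ~200 rows, Sven's case
```

This reproduces the post's worst case: at 513-byte rows in 1024-byte pages, roughly half of every page is "air".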
Re: [sqlite] COUNT() extremely slow first time!
On Wed, Feb 23, 2011 at 11:12 AM, Sven L wrote:
> Thanks for pointing this out!
>
> In my case I have spent much time on normalizing my tables, so the row size
> should be constant in most cases. I do wonder though, what if the row size is
> 32 bytes? Or is there a minimum?
>
> For instance, I have many lookup tables with ID+text (usually around 20
> characters):
> MyID|MyText
>
> With a page size of 4096, will SQLite put ~200 rows in one page?

Yes, very roughly. There is other internal information: a header on each page, on each row, on each field, ints are variable length, etc., and SQLite reserves some free space on each page for later inserts. Use sqlite3_analyzer for lots of useful info when picking a page size.

Jim
At 1025 bytes, SQLite will >> start splitting rows into overflow pages, putting 1024 bytes into the >> overflow page and 1 byte in the btree page. These numbers aren't >> right, but illustrate the point. >> >> So to find a good page size, experiment and measure. >> >> Jim >> -- >> HashBackup: easy onsite and offsite Unix backup >> http://www.hashbackup.com >> >> >> >> On Wed, Feb 23, 2011 at 10:20 AM, Teg wrote: >> > Hello Greg, >> > >> > I found this to be the case too. The difference between 1K and 8K is >> > staggering. I default all my windows DB's to 8K now. >> > >> > >> > Tuesday, February 22, 2011, 1:59:29 PM, you wrote: >> > >> > GB> I'm currently dealing with a similar issue. I've found that the >> > page_size >> > GB> PRAGMA setting can have a dramatic effect on how long it takes to >> > "warm up" >> > GB> the table. On Windows 7, with page_size=1024, a SELECT >> > COUNT(last_column) >> > GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 >> > takes >> > GB> 8.5 seconds. This was done with a reboot between each test. >> > >> > -- >> > Best regards, >> > Teg mailto:t...@djii.com
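The row-size/page-size interaction above is easy to check empirically. Below is a minimal sketch (not from the thread) that builds the same table twice with different `PRAGMA page_size` settings and compares the resulting file sizes; the table name and payload size are made up for illustration, and exact numbers will vary by SQLite version.

```python
# Sketch: measure how page_size affects on-disk size for a fixed row payload.
import os
import sqlite3
import tempfile

def db_size(page_size, n_rows=2000, payload=500):
    path = tempfile.NamedTemporaryFile(suffix=".db", delete=False).name
    con = sqlite3.connect(path)
    # page_size must be set before the first table is created
    con.execute("PRAGMA page_size = %d" % page_size)
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, data BLOB)")
    blob = b"x" * payload
    con.executemany("INSERT INTO t (data) VALUES (?)", [(blob,)] * n_rows)
    con.commit()
    con.close()
    size = os.path.getsize(path)
    os.remove(path)
    return size

small = db_size(1024)   # 1K pages
large = db_size(8192)   # 8K pages
print(small, large)
```

Running this with payloads just over and just under half the page size shows the "air" effect Jim describes.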
[sqlite] Commit frequency and performance
I am creating an SQLite database via Python, and trying to understand some performance issues. This application does 3.8M inserts. Most inserts are to a main database that ends up being around 293MB. Around 740K of the inserts with larger data records are to 25 related databases of around 600MB each. The main database and 1 subdatabase are active together, then the subdatabase is closed when it gets to around 600MB and a new one is opened. This is an archiving application. As a performance baseline, I changed the application to just write out plain sequential text files, and it takes around 62 minutes to complete. This includes all the Python overhead and the raw hard drive overhead to write out the same amount of data in a "best case" scenario: Time: 3754.16 seconds Files: 708120 Bytes: 31565490710 real 62m34.335s user 53m55.492s sys 2m58.305s If I use SQLite and commit every 5 seconds, I get this performance: Time: 11383.95 seconds Files: 708120 Bytes: 31565490710 real 189m45.061s user 55m58.638s sys 4m46.528s If I commit every 30 seconds, I get this: Time: 13021.34 seconds Files: 708120 Bytes: 31565490710 real 217m2.078s user 56m9.647s sys 4m59.850s I believe fsync/fdatasync are significant performance issues, so I thought that it should improve performance if I start a transaction, do ALL of the inserts, then commit. Since I'm starting with an empty database, the journal should stay mostly empty, the database will be built without any syncs, and when I commit, the small journal will be deleted. I also added pragma synchronous=off. (All of these tests are run with pragma locking_mode=exclusive.) But when I tried this, the performance was slowest of all: Time: 15356.42 seconds Files: 708120 Bytes: 31565490710 real 255m57.523s user 55m51.215s sys 4m22.173s Can anyone shed light on why building a database inside a single transaction would be slower than periodically committing?
Thanks, Jim
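The periodic-commit pattern being benchmarked above looks roughly like the following sketch. The table and column names are invented for illustration; the point is that one `commit()` (one fsync on a real on-disk database) covers many INSERTs instead of one per row.

```python
# Sketch of committing every BATCH inserts rather than per row.
import sqlite3

con = sqlite3.connect(":memory:")  # use a file path for a real test
con.execute("CREATE TABLE files (path TEXT, bytes INTEGER)")

BATCH = 10_000        # inserts per transaction; tune experimentally
pending = 0
for i in range(50_000):
    # the sqlite3 module implicitly opens a transaction on the first INSERT
    con.execute("INSERT INTO files VALUES (?, ?)", ("file%d" % i, i))
    pending += 1
    if pending >= BATCH:
        con.commit()  # one sync point per BATCH rows
        pending = 0
con.commit()          # flush the final partial batch
count = con.execute("SELECT count(*) FROM files").fetchone()[0]
print(count)
```

Committing on a row count rather than a wall-clock interval gives more repeatable transaction sizes when comparing runs.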
Re: [sqlite] Hundreds of thousands of INSERTs
I had a similar experience: changing my page size decreased the run time of my SQLite application from 4 hours to 80 minutes (Linux). I think it would be very good if the default page size was changed to 4096, at least for Linux builds. It makes a huge difference in performance. Jim On 2/17/09, Paolo Pisati wrote: > Alexey Pechnikov wrote: >> >> PRAGMA default_cache_size = 20; >> pragma page_size=4096; >> vacuum; >> > > sqlite> pragma page_size; > 4096 > sqlite> PRAGMA default_cache_size; > 20 > > p...@ferret:~/log_analyzer >date ; ./log_parser_ms_sqlite.pl 2 > vmta2-2009-02-16_1400-success.log.bz2.done.ec.ed ; date > Tue Feb 17 15:07:40 CET 2009 > Tue Feb 17 15:13:22 CET 2009 > > it's a 75% shave of application time, amazing!!! > > -- > > bye, > P. > -- Software first. Software lasts!
Re: [sqlite] Slow performance with Sum function
Have you tried changing the page size to 4096 or 8192? Doing this with my SQLite application and increasing the transaction size decreased runtime from over 4 hours to 75 minutes. The runtime for my app writing the same amount of data to flat files was 55 minutes, so the time penalty for building a database was about 35%, which seemed reasonable. I haven't tried changing the cache size yet, because I like that my app uses a small amount of memory. Good luck! Jim On 3/4/09, Alexey Pechnikov wrote: > Can enough cache size prevent fragmentation? And how to calculate the degree of > fragmentation, and when is a vacuum of the database needed? > > Best regards. > -- Software first. Software lasts!
Re: [sqlite] SQLite Transaction Rate and speed...
If your goal is to handle 1 million inserts/updates per second, and a hard transaction to disk takes 1/60th of a second, you need at least 1,000,000/60 ≈ 16,667 inserts/updates per transaction. Do your testing with a transaction size of 20,000 and see what kind of performance you get. I'd probably set it higher, like 100,000 inserts/updates per transaction, if you have a continuous stream of data coming in at this rate. You get no benefit from one insert/update per commit, unless you are acking the transaction back to the source and it has retry/resend capabilities. In that case, you need a battery backed-up disk controller, and still will probably benefit from a large transaction size. I sort of doubt you have retry/resend abilities though, because for a data stream coming in at 192 MBit/sec (3 doubles = 192 bits), any kind of latency for error recovery might put you in a situation where you could never catch back up. Also, if you use "insert or replace", you can avoid the select, speeding up your app. Jim On 3/6/09, Nuzzi wrote: > > > > ken-33 wrote: >> >> >> >> look at the sql syntax for insert or replace for sqlite. >> >> Also your goal to handle 1 million per minute is probably going to be >> dependent upon your hardware. >> >> For instance throughput greatly increases with disk striping. >> >> Also the faster the RPM of the drive the more transactions can be >> processed. Code it up and find out! >> >> > > I have actually coded it up. The way I am currently doing it is sending the > data to a function (the data is 3 doubles) and in that function doing a > SELECT to get the data currently in the DB, then updating the data, then > UPDATE or INSERT. The SQL calls are compiled statements with binds, etc. > It is woefully slow. I was kind of hoping that maybe I was missing a step > or just unfamiliar with the best techniques.
I know that when adding a lot > of data with the BEGIN and END TRANSACTION things speed up considerably, but > I can't use it in my case (I don't believe) because the UPDATE depends upon > the data in the SELECT. > > John > > -- > View this message in context: > http://www.nabble.com/SQLite-Transaction-Rate-and-speed...-tp22379931p22380539.html > Sent from the SQLite mailing list archive at Nabble.com. -- Software first. Software lasts!
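Jim's "insert or replace" suggestion replaces the SELECT-then-UPDATE-or-INSERT round trip with a single upsert statement, which also batches cleanly inside one transaction. A minimal sketch (table and column names invented; the thread's data is "3 doubles"):

```python
# Sketch: INSERT OR REPLACE keyed on the primary key, batched in one commit.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, a REAL, b REAL, c REAL)")

# id 1 appears twice: the second row replaces the first instead of
# needing a SELECT to decide between UPDATE and INSERT.
rows = [(1, 0.1, 0.2, 0.3),
        (2, 1.0, 2.0, 3.0),
        (1, 9.9, 8.8, 7.7)]
con.executemany("INSERT OR REPLACE INTO samples VALUES (?, ?, ?, ?)", rows)
con.commit()

n = con.execute("SELECT count(*) FROM samples").fetchone()[0]
a_of_1 = con.execute("SELECT a FROM samples WHERE id = 1").fetchone()[0]
print(n, a_of_1)
```

Note that `INSERT OR REPLACE` deletes and re-inserts the conflicting row, so it only substitutes for UPDATE when the new row carries complete values, not when the UPDATE must read and modify existing column data.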
Re: [sqlite] I need help with very complex queries
You have specified how the movies table relates to the other tables, but you haven't specified any independent selection criteria for any tables. For example, in your query you need to add something like: and genres.genre = 'drama'. For this query, only the movies and genres tables are needed because you aren't selecting any fields from the other tables, from what I can tell. Running through a few online SQL tutorials will help you get a better feel for how SQL works. Jim On 3/10/09, Yuzem wrote: > > As an example, I have: > tables: > movies genres keywords languages countries etc... > > movies columns: > id title year director etc > > The others are: > id "name of the table" (example: id genres) > > The id in all the tables are the same column that is the id of the movie so > that every movie can be many times in all the tables but only one time in movies. > > What I want to do is something like: > select ... where genres = Horror and countries = italy and keywords = > > This is what I have: > sqlite3 movies.db "select movies.id,title,year from > movies,genres,countries,languages,keywords,tags where movies.id = genres.id > and movies.id = countries.id and movies.id = languages.id and movies.id = > keywords.id and movies.id = tags.id" > > There is no query and it is giving me nothing... > Many thanks in advance! > -- > View this message in context: > http://www.nabble.com/I-need-help-with-very-complex-queries-tp22446301p22446301.html > Sent from the SQLite mailing list archive at Nabble.com. -- Software first. Software lasts!
Re: [sqlite] designing a db to hold repeating data
You could eliminate met_grid_id from the cells table and replace it with the expression cell_id/2500. Integer division truncates automatically, giving you met_grid_id whenever you need it. This will save around 5 MB for a 1M row cell table. Also, queries in the cells table by met_grid_id, if you do that, can be made much faster using the relationship above. If you query directly on met_grid_id, sqlite will have to do a table scan. But if you use the relationship, sqlite can use the index, for example: select * from cells where met_grid_id = N (table scan required) becomes: select *,cell_id/2500 as met_grid_id from cells where cell_id between N*2500 and (N+1)*2500 - 1 (can use the cell_id index) I think by the same method you can eliminate met_grid_id from the met table, saving around 14.5MB Jim On 3/15/09, P Kishor wrote: > I have a grid of 1000 x 1000 cells with their own data as well as 20 > years of daily weather data (20 * 365 = 7300 rows) for each contiguous > group of 50 x 50 cells. > > CREATE TABLE cells ( > cell_id INTEGER PRIMARY KEY, > other cell attributes, > lat, > lon, > met_grid_id INTEGER > ); > > cell_id met_grid_id > --- > 0 0 > 1 0 > .. > 2499 0 > 2500 1 > 2501 1 > .. > 4999 1 > > CREATE TABLE met ( > met_id INTEGER PRIMARY KEY, > other met attributes, > met_grid_id INTEGER > ); > > met_id met_grid_id > -- --- > 0 0 > 1 0 > .. > 7299 0 > 7300 1 > 7301 1 > .. > > CREATE VIRTUAL TABLE cell_index USING rtree ( > cell_id INTEGER, > minx REAL, > maxx REAL, > miny REAL, > maxy REAL > ) > > The db is about 350 MB with the cell table with 1000,000 rows and the > met table with 2,920,000 rows and the R*Tree index. > > Is there any other better way that jumps out at any of you? > > > -- > Puneet Kishor http://www.punkish.org/ > Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/ > Carbon Model http://carbonmodel.org/ > Open Source Geospatial Foundation http://www.osgeo.org/ > Sent from: Madison WI United States.
-- Software first. Software lasts!
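The derived-column trick in Jim's reply can be sketched concretely. Since each met grid covers 50 x 50 = 2500 cells, `cell_id/2500` recovers met_grid_id, and a BETWEEN range on the cell_id primary key (from N*2500 through (N+1)*2500 - 1) replaces a table scan on a stored met_grid_id column. A small self-contained demonstration:

```python
# Sketch: derive met_grid_id from cell_id so the primary-key index
# can answer "all cells in met grid N" with a range scan.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cells (cell_id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO cells VALUES (?)", [(i,) for i in range(10_000)])

N = 3  # look up all cells in met grid 3
rows = con.execute(
    "SELECT cell_id, cell_id/2500 AS met_grid_id FROM cells "
    "WHERE cell_id BETWEEN ? AND ?",
    (N * 2500, (N + 1) * 2500 - 1)).fetchall()
print(len(rows), rows[0])
```

The integer division truncates, so every cell_id in [7500, 9999] maps to met_grid_id 3, exactly the rows the range scan returns.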
Re: [sqlite] Speed of DROP INDEX
Drop is executed within a transaction, which means that every record you touch has to be backed up to the journal first, then modified in the database. I'm guessing that if you use pragma synchronous=off, it would speed up the drop index, but you'd take a chance on corrupting the database if the machine crashed during the drop. It would probably be faster to make a copy of the database (all sequential I/O), then drop the index with synchronous=off in one of the copies (no journal I/O), then use vacuum if you want to really clean up the DB. If something goes wrong, you still have your backup copy. I haven't actually tried this; let us know if it makes a big difference. Jim On 3/17/09, Nikolas Stevenson-Molnar wrote: > Hi, > > I'm trying to drop an index on a table with about 8 million rows and > it's taking a very long time. I can understand why building the index > would take some time, but why dropping it? And is there any way to > speed it up? > > Thanks! > _Nik > -- Software first. Software lasts!
Re: [sqlite] Speed of DROP INDEX
Hmm... Maybe it is creating the journal but not really using it with synchronous=off. You might try pragma journal_mode = off. That might keep it from creating a journal, but if you already tried using synchronous=off, my guess is journal_mode=off won't run any faster. Jim On 3/18/09, Nikolas Stevenson-Molnar <steve...@evergreen.edu> wrote: > I've actually been running it with synchronous=off. Unfortunately, it > doesn't seem to run any faster and still creates a journal file. > > _Nik > > On Mar 17, 2009, at 6:05 PM, Jim Wilcoxson wrote: > >> Drop is executed within a transaction, which means that every record >> you touch has to be backed up to the journal first, then modified in >> the database. I'm guessing that if you use pragma synchronous=off, it >> would speed up the drop index, but you'd take a chance on corrupting >> the database if the machine crashed during the drop. >> >> It would probably be faster to make a copy of the database (all >> sequential I/O), then drop the index with synchronous=off in one of >> the copies (no journal I/O), then use vacuum if you want to really >> clean up the DB. If something goes wrong, you still have your backup >> copy. >> >> I haven't actually tried this; let us know if it makes a big >> difference. >> >> Jim >> >> On 3/17/09, Nikolas Stevenson-Molnar <steve...@evergreen.edu> wrote: >>> Hi, >>> >>> I'm trying to drop an index on a table with about 8 million rows and >>> it's taking a very long time. I can understand why building the index >>> would take some time, but why dropping it? And is there any way to >>> speed it up? >>> >>> Thanks! >>> _Nik >>> ___ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> >> >> -- >> Software first. Software lasts! 
-- Software first. Software lasts!
Re: [sqlite] Newbie: round function
Well, it doesn't exactly say that an integer is returned. round() always returns a float: sqlite> select typeof(3); typeof(3) -- integer sqlite> select typeof(round(3)); typeof(round(3)) real sqlite> You can do this: sqlite> select typeof(cast(round(3.14) as integer)); typeof(cast(round(3.14) as integer)) integer sqlite> Jim On 3/18/09, sqlite.20.ede...@spamgourmet.com wrote: > Hi! > > Is this expected? > > $ sqlite3 > SQLite version 3.6.10 > Enter ".help" for instructions > Enter SQL statements terminated with a ";" > sqlite> select round(3.1416); > 3.0 > CPU Time: user 0.00 sys 0.00 > sqlite> > > I expected an integer 3 as documented: > > round(X) > round(X,Y) Round off the number X to Y digits to the right of the > decimal point. If the Y argument is omitted, 0 is assumed. > > Thanks in advance. > > Edésio > -- Software first. Software lasts!
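The same typeof()/cast() behavior is easy to reproduce from Python's sqlite3 module:

```python
# round() in SQLite yields a REAL; CAST(... AS INTEGER) converts it.
import sqlite3

con = sqlite3.connect(":memory:")
t_round = con.execute("SELECT typeof(round(3.1416))").fetchone()[0]
t_cast = con.execute(
    "SELECT typeof(cast(round(3.1416) AS integer))").fetchone()[0]
v = con.execute("SELECT cast(round(3.1416) AS integer)").fetchone()[0]
print(t_round, t_cast, v)  # real integer 3
```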
Re: [sqlite] speeding up row by row lookup in a large db
I'm not sure I completely understand your data structure, but here are some ideas: First, conduct experiments with different page and cache sizes. I don't know if you jumped to a 32K page size or experimented with each size, but experimentation is the way to go. I'm guessing that in your cell_lc and cell_dist tables, rows are unique on cell_id. If so, you could make cell_id the primary key and avoid the indexes altogether. When you posted before, you mentioned that you had multi-TB hard drives, tons of memory, quad CPU's, etc, and didn't care about saving space. I'd denormalize the data, putting lc and dist fields in the cell table. That will eliminate join #2 and join #3. You don't really say what you're doing with the results of these queries below. Are you doing a bunch of Perlish stuff with the results? It's possible that most of your time is being spent in the Perl interpreter rather than in SQLite. To check this, try sending the output to /dev/null and using the sqlite3 command line tool to execute your queries. If they happen quickly, then the overhead is in the manipulation of the results, not the queries. Good luck! Jim On 3/21/09, P Kishor wrote: > Part 1. > --- > > I have the following schema in a SQLite db that is 430 MB on my > Macbook laptop's 320 GB HFS+ formatted 7200 RPM disk with an 8 MB > cache.
> > -- 1000,000 rows > CREATE TABLE cell ( > cell_id INTEGER PRIMARY KEY, > met_cell_id INTEGER, > 8 other INTEGER or REAL columns > ) > > -- 38 rows > CREATE TABLE lc ( > lc_id INTEGER PRIMARY KEY, > 56 other INTEGER or REAL columns > ) > > -- 10 rows > CREATE TABLE dist ( > dist_id INTEGER PRIMARY KEY, > 37 other INTEGER or REAL columns > ) > > -- 2,920,000 > CREATE TABLE met ( > met_id INTEGER PRIMARY KEY, > met_cell_id INTEGER, > 9 other INTEGER or REAL columns > ) > > CREATE TABLE cell_lc (cell_id INTEGER, lc_id INTEGER) > CREATE TABLE cell_dist (cell_id INTEGER, dist_id INTEGER) > > CREATE INDEX idx_met_cell_id ON met (met_cell_id) > CREATE INDEX idx_cell_lc ON cell_lc (cell_id) > CREATE INDEX idx_cell_dist ON cell_dist (cell_id) > > I also have an R*Tree index, but that is a different story, not relevant > here. > > I retrieve *all* data for one cell ':cell_id' using the following queries > > [1] First retrieve all data from cell table > SELECT * FROM cell WHERE cell_id = :cell_id > > [2] Now retrieve the related lc, dist and met > SELECT lc.* > FROM lc l JOIN cell_lc c on l.lc_id = c.lc_id > WHERE c.cell_id = :cell_id > > [3] Retrieve the related dist > SELECT d.* > FROM dist d JOIN cell_lc c on d.dist_id = c.dist_id > WHERE c.cell_id = :cell_id > > [4] Retrieve the related met > SELECT * FROM met WHERE met_cell_id = > > I did some benchmarking with the above schema using Perl DBI, and I > get about 30 transactions per second as long as I returning the data > to memory. 
> > [08:38 AM] ~/Data/carbonmodel$perl carbonmodel.pl > > timethis 1: 0 wallclock secs ( 0.03 usr + 0.00 sys = 0.03 CPU) @ > 33.33/s (n=1) > > timethis 10: 0 wallclock secs ( 0.31 usr + 0.02 sys = 0.33 CPU) @ > 30.30/s (n=10) > > timethis 100: 3 wallclock secs ( 2.85 usr + 0.20 sys = 3.05 CPU) @ > 32.79/s (n=100) > > timethis 1000: 33 wallclock secs (31.08 usr + 1.22 sys = 32.30 CPU) @ > 30.96/s (n=1000) > > if I write the data to file, the speed drops to about 1 transaction per > second > > timethis 1000: 783 wallclock secs (732.26 usr + 18.22 sys = 750.48 > CPU) @ 1.33/s (n=1000) > > Even if I stick with manipulating the data in memory, at 30 > transactions per second (or 33 ms per transaction), it would take more > than 9 hours to query each of the 1 million cells one by one. > > In the real world, I will first find the relevant cell ids based on > lat-lon bounds (hence my R*Tree index) and then extract their data one > by one. > > How can I, if at all, speed this up? > > Part 2. > --- > > Alternatively, I could denormalize the data completely. Inspired by a > post on the Flickr blog > (http://code.flickr.com/blog/2009/03/18/building-fast-client-side-searches/), > in particular the para "To make this data available quickly from the > server, we maintain and update a per-member cache in our database, > where we store each member’s contact list in a text blob — this way > it’s a single quick DB query to retrieve it. We can format this blob > in any way we want: XML, JSON, etc" I decided to experiment with the > same technique. So... > > CREATE TABLE cell_blobs (cell_id INTEGER PRIMARY KEY, cell_data BLOB); > > I then queried each cell as in Part 1, serialized it and stored it in > the cell_blobs table. My intent is to simply retrieve a BLOB and > deserialize it... it would *possibly* be quicker than 33 ms per > retrieval. Well, I haven't yet completed this test because each BLOB > is taking about 430 KB. 
At 1 million rows, that is going to occupy > upward of 400 GB. I broke the load_blob_table routine after about a > third of the records had
Re: [sqlite] speeding up row by row lookup in a large db
The reason you're getting the same results is because you are CPU-bound. I/O has nothing to do with this problem. From your timings of your app, 31.76/33 = 96% CPU. If you were I/O bound, your real time would be 33 seconds and your sys+user time would be 3 seconds, or something low. My guess is you are spending more time looping around in Perl than you think. Try running your Perl code with one small, static set of test data in a loop and see how much time it takes w/o any DB interactions. That will give you a baseline for performance improvements. If it turns out that 1000 loops w/o SQL takes 25 seconds instead of 33, none of your SQL optimizations matter much. Jim On 3/21/09, P Kishor wrote: > So, I increased the cache_size to 1048576 but got the same results... > 30 odd SELECTs per second. > > Then I created an in-memory db and copied all the data from the > on-disk db to memory. I didn't use the backup API... simply opened a > db connection to an in-memory db, then created all the tables and > indexes, ATTACHed the on-disk db and did an INSERT .. SELECT * FROM > attached db. Interestingly, the same results -- > > [04:24 PM] ~/Data/carbonmodel$perl carbonmodel.pl > Creating in memory tables... done. > Transferring data to memory... done. Took: 90 wallclock secs (75.88 > usr + 8.44 sys = 84.32 CPU) > Creating indexes... done. Took: 38 wallclock secs (23.82 usr + 13.36 > sys = 37.18 CPU) > Prepare load testing > ...timethis 1000: 33 wallclock secs (30.74 usr + 1.02 sys = 31.76 > CPU) @ 31.49/s (n=1000) -- Software first. Software lasts!
Re: [sqlite] How to get the previous row before rows matching a where clause...
What about: select blah from blah where rowid < windowstartrowid order by rowid desc limit 1 to get the row before, and: select blah from blah where rowid > windowlastrowid order by rowid limit 1 to get the row after. Jim On 3/22/09, sorka wrote: > > I have a table of events that have a title, start time, and end time. > > The start time is guaranteed unique, so I've made it my primary integer key. > > I need all events that overlap a window of time between say windowstart > and windowend. Currently, the statement > > SELECT title FROM event WHERE startTime < windowEnd AND endTime > > windowStart. > > I've indexed the end time and the query is pretty fast, but it could be a > lot faster if I only had to use the integer primary key. > > If instead I do > > SELECT title from event WHERE startTime > windowStart AND startTime < > windowEnd > > this will get me almost the same thing except that it will be missing the > first event that overlaps the windowStart because its startTime is at or > before windowStart. > > In this case, if I can get exactly the previous row added to what is > returned in the results above, I'll have exactly what I need. > > So the question is, when a WHERE clause returns a set of rows, is there a > way to also get the row at the ROWID that comes just before the row that is > returned from above with the lowest ROWID? > > Another way of putting it, if I take the lowest ROWID that is returned in my > second example and get the next lowest ROW, the one that is less than the > lowest ROWID I got but closest to it, i.e., the one right before it, then it > would be complete. > > > -- > View this message in context: > http://www.nabble.com/How-to-get-the-previous-row-before-rows-matching-a-where-clause...-tp22650799p22650799.html > Sent from the SQLite mailing list archive at Nabble.com. -- Software first. Software lasts!
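Jim's "row just before the window" trick is an ORDER BY ... DESC LIMIT 1 on the integer primary key. A small sketch following the thread's schema (sample data invented):

```python
# Sketch: fetch the window contents plus the one event that may
# straddle the window start, using only the integer primary key.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE event (startTime INTEGER PRIMARY KEY, "
            "endTime INTEGER, title TEXT)")
con.executemany("INSERT INTO event VALUES (?, ?, ?)",
                [(10, 20, "a"), (30, 40, "b"), (50, 60, "c")])

window_start, window_end = 35, 55
inside = con.execute(
    "SELECT title FROM event WHERE startTime > ? AND startTime < ?",
    (window_start, window_end)).fetchall()
# The event starting at or before the window, which may overlap it:
before = con.execute(
    "SELECT title FROM event WHERE startTime <= ? "
    "ORDER BY startTime DESC LIMIT 1", (window_start,)).fetchone()
print(inside, before)
```

Here event "b" (30-40) overlaps the window but starts before it, so only the extra LIMIT 1 query finds it.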
Re: [sqlite] problems with shared cache?
Not sure if it will make a difference, but in your trigger stuff you explicitly coded null for the primary key value. Have you tried changing that so that you don't specify the primary key field at all? I can't remember from the previous post, but I think it was (or should be) set up as autoincrement. I think SQLite allows using multiple nulls for the primary key, but according to their docs, it is non-standard and it says something about "this may change in the future". Maybe you are getting caught in the middle of a change that is going to occur across multiple revisions of SQLite. Jim On 3/24/09, Damien Elmes wrote: > Sorry, my application's files are called decks, and I unwittingly used > the wrong terminology. > > Any ideas about the problem? > > On Sat, Mar 21, 2009 at 2:27 AM, Griggs, Donald > wrote: >> >>> However, when I ask the user to send me their deck, I find that: >>> >>> sqlite> pragma integrity_check; >>> integrity_check >>> --- >>> ok >>> sqlite> select id, count(id) from cards group by id having >>> count(id) > 1; >>> sqlite> >>> >>> Any ideas? >> >> Obviously, that user is not playing with a full deck. ;-) >> -- Software first. Software lasts!
Re: [sqlite] creating unique data takes many hours, help
Yes, you're right, but if the data is already in index order, you'll do less I/O when creating the index. Whether the sort + create DB time is less than "create DB from random input" time is another question. Jim On 3/29/09, mrobi...@cs.fiu.edu wrote: > question: > When you say >> 2) Sort your input file on the PRIMARY KEY >> or on some other INDEX > > I thought that while sqlite inserts the data it is creating the indexes, > therefore sorting the data by way of the index; is this correct? > I have declared one column ONLY, unique and indexed. -- Software first. Software lasts!
Re: [sqlite] delete with an "exists" clause
You want: delete from dummy where var=2; In years of DB work, I've never used exists. If you're reporting this as a bug, I guess it could be; I'd have to look up exists to see exactly how it's supposed to work. Jim On 4/9/09, Dave Dyer wrote: > > This little program deletes all rows. Is this a bug, or > perhaps I misunderstand how delete with an exists clause > is supposed to work. > > drop table if exists dummy; > create table dummy > ( var int > ); > insert into dummy (var) values (1); > insert into dummy (var) values (2); > insert into dummy (var) values (3); > > delete from dummy where exists (select * from dummy where var='2'); > -- Software first. Software lasts!
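This is not a bug: the EXISTS subquery never references the outer row, so it is uncorrelated and evaluates to the same truth value (true, since a row with var=2 exists) for every candidate row, deleting everything. A sketch contrasting it with the intended delete:

```python
# Demo: an uncorrelated EXISTS is true for every outer row, so the
# DELETE from the thread removes all rows; a plain WHERE removes one.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dummy (var INT)")
con.executemany("INSERT INTO dummy VALUES (?)", [(1,), (2,), (3,)])

# The thread's statement: subquery is true once, hence true everywhere.
con.execute("DELETE FROM dummy WHERE EXISTS "
            "(SELECT * FROM dummy WHERE var = 2)")
left_all = con.execute("SELECT count(*) FROM dummy").fetchone()[0]

# Rebuild and delete only the intended row.
con.executemany("INSERT INTO dummy VALUES (?)", [(1,), (2,), (3,)])
con.execute("DELETE FROM dummy WHERE var = 2")
left = [r[0] for r in con.execute("SELECT var FROM dummy ORDER BY var")]
print(left_all, left)
```

A correlated EXISTS (one whose subquery references a column of the outer row) would behave per-row; here the simple `WHERE var = 2` is all that's needed.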
Re: [sqlite] disk I/O error...?
You'll get this if you have a database with an active journal (incomplete transactions) and you don't have write access to the database. In other words, the database needs a rollback from some prior operations done under a different userid, but now you don't have write access to do the rollback. To me, it should be a permission error instead, to make it clear what's going on. Jim On 4/11/09, Alberto Simões wrote: > Hello > I am getting disk I/O error with: > > [a...@search1 align]$ du -hs T.3.sqlite > 122M T.3.sqlite > [a...@search1 align]$ sqlite3 T.3.sqlite > SQLite version 3.6.6 > Enter ".help" for instructions > Enter SQL statements terminated with a ";" > sqlite> .schema > CREATE TABLE trigrams (w1,w2,w3,occs); > CREATE INDEX idxT3w1 ON trigrams (w1); > sqlite> CREATE INDEX idxT3w2 ON trigrams (w2); > SQL error: disk I/O error > sqlite> [a...@search1 align]$ df -h . > Filesystem Size Used Avail Use% Mounted on > /dev/sdc1 148G 42G 100G 30% /export3 > [a...@search1 align]$ > > I tried to get I/O errors with other commands (for instance, yes > _) > but couldn't get any error. > > Any hint on what I can check to discover the problem? > Thank you > Alberto > > -- > Alberto Simões > -- Software first. Software lasts!
Re: [sqlite] Fixing a database
I think it would be a good idea for sqlite3 to display a message like "Database opened read-only" if you don't have permission to write. I saw this problem myself where a rollback was necessary from a previous root process, and as a new SQLite user, it was confusing and made me think "great, the database is already corrupted". Jim On 4/22/09, Igor Tandetnik wrote: > Alberto Simoes > wrote: >> I am not sure what happened, but I have a database that opens >> correctly with sqlite3 shell, I can select and query it, but I can't >> edit it. I get Disk I/O error. The disk is not full. > > You don't have write permissions to the file, perhaps? > > Igor Tandetnik > -- Software first. Software lasts!
Re: [sqlite] corrupt database recovery
You could do a binary search to find the highest accessible rowid: select rowid from mytable where rowid = 2147483647 (fails) select rowid from mytable where rowid = 1073741824 (fails) ... select rowid from mytable where rowid = 65536 (fails) select rowid from mytable where rowid = 32768 (works!) select rowid from mytable where rowid = 49152 (works!) Within 32 selects, you will find the highest accessible rowid. Then do an INSERT INTO ... SELECT * FROM mytable WHERE rowid <= XXX. Jim On 4/25/09, Gene wrote: > You are exactly right John...that is indeed what the code looks > like...except we have over 25 columns (it's a flat table). > > We've already fixed the bad code, but there are some customers who have old > versions...it didn't break very often with the old code, but it still did. > > I haven't tried a select min or max on the rowid, but a select count(*) > returns an error...that's how I know I need to do the row by row recovery > method. Select * from mytable also returns an error. > > The tables usually have tens of thousands of rows, sometimes over a couple > hundred thousand but that's rare. > > What seems to work is that I do a select * from myTable where rowId = 'X', > incrementing X until I get an error. After I get the error, every row higher than > X also returns an error. So as soon as I get an error, I stop trying to > recover more rows. -- Software first. Software lasts!
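The binary search Jim describes relies on the property Gene observed: reads succeed up to some rowid and fail for every rowid above it. A sketch of the search logic, with the "does SELECT ... WHERE rowid = ? succeed" probe abstracted into a callable (simulated here with a cutoff, since a genuinely corrupt database can't be fabricated portably):

```python
# Sketch: binary search for the largest rowid whose SELECT still works,
# assuming readability is monotone (ok up to a point, failing after).
def highest_readable_rowid(readable, hi=2**31 - 1):
    """Largest rowid in [1, hi] for which readable(rowid) is True."""
    lo, best = 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if readable(mid):       # in practice: try the SELECT, catch the error
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    return best

# Simulated corrupt db: rows up to 48_312 read fine, later rows error out.
cutoff = 48_312
best = highest_readable_rowid(lambda r: r <= cutoff)
print(best)
```

Against a real database, `readable` would wrap `cursor.execute("SELECT rowid FROM mytable WHERE rowid = ?", (r,))` in a try/except for `sqlite3.DatabaseError`; at most ~31 probes cover the full 32-bit rowid range.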
Re: [sqlite] Optimizing concurrency with sql query - locks?
I'm not sure what you are considering a massive slowdown, but let's assume that the entire database fits into memory and disk I/O isn't the bottleneck. You said you're running 300 instances of the query on several processors. If several means 3 CPUs, then in a perfect world, running 300 instances will be 100 times slower than running just 1. This assumes you are getting linear scalability, which no one ever does, so the slowdown will be more than 100x. If you have 4 processors, running 300 queries simultaneously should still be 75x slower (best case) than running 1.

You also mentioned seeing a wide variance in response times. This is typical, because most schedulers in multi-user systems won't perfectly distribute CPU time to 300 processes. If the scheduler decides to run task 1 to completion, then task 2, etc., your last task will appear to take much longer than the first task.

For example, let's say that each task by itself takes 1 second of CPU time and zero I/O time, assuming the database is all in memory. 300 queries will take 300 seconds to complete. If the system scheduler runs each task, in order, to completion, then the first task will take 1 second and the last task will take 300 seconds to complete. Wide variance. Or, the system scheduler could decide to give each task 1/100th of a second. It will take 3 seconds for all tasks to get a timeslice. In this scenario, it will still take 300 seconds to complete all 300 jobs, but they will complete within 1/100th of a second of each other, and each job will report that it took 300 seconds to complete. No variance.

The price you pay for no variance is increased multiprocessing overhead: instead of doing just 300 task switches to execute each job to completion, you are doing 300x100 = 30,000 task switches. Task switches aren't free, so the "no variance" schedule will take longer overall than the wide-variance one. This is a classic fairness vs. throughput dilemma; you can't have both.
If you are seeing something like 1000x slower performance, then as others have mentioned you could have a disk I/O or locking bottleneck. Jim On 5/6/09, Igor Tandetnik wrote: > "Rosemary Alles" wrote in message news:af79a266-b697-4924-b304-2b1feccba...@ipac.caltech.edu >> Run on a single processor, the following query is quite fast: >> >> When concurrency is introduced (simply running the query on several >> processors against the same database - say 300 instances of it), it causes >> a massive slowdown. > > Well, you may have multiple CPUs, but you only have a single hard drive. > That drive head can't be in multiple places simultaneously. > > Igor Tandetnik -- Software first. Software lasts!
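The scheduling arithmetic in the post above can be checked directly. The numbers are the post's own hypothetical example: 300 jobs, 1 CPU-second each, zero I/O, 1/100-second timeslices, one CPU:

```python
# Numbers from the example in the post: 300 jobs, 1 CPU-second each,
# zero I/O, one CPU, scheduler timeslice of 1/100 second.
jobs, secs_per_job = 300, 1.0
slices_per_job = 100          # 1 second of work / (1/100 s) per slice

# Run-to-completion: job i (1-based) finishes after i seconds.
rtc_finish = [i * secs_per_job for i in range(1, jobs + 1)]

# Fine-grained round-robin: every job's last timeslice lands in the
# final sweep, so all 300 jobs finish at essentially the same time.
rr_finish = [jobs * secs_per_job] * jobs

# Task-switch counts: one per job vs. one per timeslice.
rtc_switches = jobs
rr_switches = jobs * slices_per_job

print(rtc_finish[0], rtc_finish[-1])   # 1.0 300.0 -> wide variance
print(rr_finish[0], rr_finish[-1])     # 300.0 300.0 -> no variance
print(rtc_switches, rr_switches)       # 300 30000
```

Both schedules deliver the same total throughput bound (300 seconds of CPU work); they differ only in variance and in the 100x difference in task-switch overhead.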
Re: [sqlite] Optimizing concurrency with sql query - locks?
The original question was about 300 queries, which I took to mean selects. If the database is in memory, will 300 selects still cause synchronous disk I/O? Jim On 5/6/09, John Stanton <jo...@viacognis.com> wrote: > Sqlite is an ACID database - it ensures that data is written to disk, so > a database in memory still shares a single disk resource. > > Jim Wilcoxson wrote: >> I'm not sure what you are considering a massive slowdown, but let's >> assume that the entire database fits into memory and disk I/O isn't >> the bottleneck.
Re: [sqlite] select performance with join
I don't know if it makes any difference, but is that where clause the same as: WHERE name < 'zUmM' OR (name = 'zUmM' AND S.id < 8122)? The original way it was coded, all 3 conditions would have to be evaluated most of the time. The new way might get most rows with 1 condition. Depends on the data distribution of course. Jim On 5/8/09, galea...@korg.it wrote: > Citando Igor Tandetnik: >> Andrea Galeazzi wrote: >>> but when I execute: >>> >>> SELECT S.id,title,artist,bpm,name >>> FROM Song AS S >>> LEFT JOIN Genre AS G ON (S.genre_id = G.id) >>> WHERE name <= 'zUmM' AND (name < 'zUmM' OR S.id < 8122) >>> ORDER BY name DESC, S.id DESC LIMIT 20; >> >> Note that LEFT JOIN is pointless here, since any record with G.name=NULL >> won't make it past the WHERE clause. Replace it with plain JOIN, you >> should see an improvement. >> >> Igor Tandetnik > > I replaced LEFT JOIN with JOIN but it got worse; now the time is about 8700 ms! But I think I need to use LEFT JOIN because I also have to accept the records with S.genre_id = NULL. I also tried this query: > "SELECT S.id,title,artist,bpm,name " > "FROM Song AS S, Genre AS G " > "WHERE (S.genre_id = G.id) AND (name <= 'zUmM') AND (name < 'zUmM' OR S.id < 8122) " > "ORDER BY name DESC, S.id DESC LIMIT 20"; > Even though it doesn't work for me (it doesn't match S.genre_id = NULL), I noticed a little improvement, to 6000 ms. Then I deleted S.id DESC and the performance abruptly increased, to 400 ms. Anyway, probably the right statement is LEFT JOIN, but how can I optimize this kind of task? Is it really hard work, or does it depend on my lack of knowledge about SQLite? -- Software first. Software lasts!
Re: [sqlite] SQLite version 3.6.14
May I suggest an extension PRAGMA SYNCHRONOUS = 3 | ASYNC so that non-C bindings can use the async functionality? Thanks, this is a great enhancement! Jim >> On Wed, May 06, 2009 at 10:36:50PM -0400, D. Richard Hipp wrote: >>> >>> A new optional extension is included that implements an >>> asynchronous I/ >>> O backend for SQLite on either windows or unix. The asynchronous I/O >>> backend processes all writes using a background thread. This gives >>> the appearance of faster response time at the cost of durability and >>> additional memory usage. See http://www.sqlite.org/asyncvfs.html for >>> additional information. -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] select performance with join
Thanks for the explanation. I recall seeing posts suggesting the use of union instead of or, and thought "if it's that easy, why doesn't SQLite do it?" The optimizer documentation says: --- Suppose the OR clause consists of multiple subterms as follows: expr1 OR expr2 OR expr3 If every subterm of an OR clause is separately indexable and the transformation to an IN operator described above does not apply, then the OR clause is coded so that it logically works the same as the following: rowid IN (SELECT rowid FROM table WHERE expr1 UNION SELECT rowid FROM table WHERE expr2 UNION SELECT rowid FROM table WHERE expr3) The implementation of the OR clause does not really use subqueries. A more efficient internal mechanism is employed. The implementation also works even for tables where the "rowid" column name has been overloaded for other uses and no longer refers to the real rowid. But the essence of the implementation is captured by the statement above: separate indices are used to find rowids that satisfy each subterm of the OR clause, and then the union of those rowids is used to find all matching rows in the database. --- It sounds like it might use indexes for an OR after all. Jim On 5/8/09, Igor Tandetnik <itandet...@mvps.org> wrote: > "Jim Wilcoxson" <pri...@gmail.com> wrote in message news:c5830b750905080813p6bf901bn526c81ca8ce7a...@mail.gmail.com >> I don't know if it makes any difference, but is that where clause the same as: >> >> WHERE name < 'zUmM' OR (name = 'zUmM' AND S.id < 8122) > > SQLite's optimizer cannot use an index for any condition involving OR. That's why it's common to write an equivalent but somewhat unnatural > > name <= 'zUmM' AND (name < 'zUmM' OR S.id < 8122) > > This way, at least the first condition has a chance of being satisfied with an index. > > Igor Tandetnik -- Software first. Software lasts!
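Whether a given SQLite build applies the OR-to-rowid-union optimization quoted above can be inspected with EXPLAIN QUERY PLAN. A sketch using Python's bundled sqlite3 and a hypothetical table modeled on the thread's schema; the plan text varies by SQLite version, so it is only printed, not asserted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE song(id INTEGER PRIMARY KEY, name TEXT);
    CREATE INDEX song_name ON song(name);
""")

# The two equivalent WHERE clauses from the thread.
queries = {
    "OR form ": "SELECT id FROM song "
                "WHERE name < 'zUmM' OR (name = 'zUmM' AND id < 8122)",
    "AND form": "SELECT id FROM song "
                "WHERE name <= 'zUmM' AND (name < 'zUmM' OR id < 8122)",
}
for label, sql in queries.items():
    print(label)
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", row)  # look for index usage vs. a full table scan
```

On recent SQLite versions both forms can use the `song_name` index; on older ones only the AND form does, which is what made the rewrite worthwhile.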
Re: [sqlite] Corrupt Database Problems
I think you have answered your own question. If you use synchronous=off, you are saying "I don't care much about this database." When you "save" documents, you are merely putting them in the computer's cache (memory) and then confirming to the user that they are on the hard drive, when they aren't necessarily there. So: user clicks Save, program says it saved it, user turns off computer, database is corrupt. Don't know why this would happen all of a sudden, unless maybe they upgraded their OS and it has decided to cache volatile data longer, trading data integrity for performance. I hope you're able to rescue your data. Someone else mentioned on this list a while back that they could recover their data by doing retrieval based on rowid: do a select * where rowid=1, then 2, then 3, etc. until you get a failure. Once you get a failure, the rest is lost. Good luck, Jim On 5/15/09, Kevin Gale wrote: ... > 4. synchronous is OFF (we have stopped setting this in the new build of our > app). -- Software first. Software lasts!
Re: [sqlite] Corrupt Database Problems
On 5/18/09, Kevin Gale wrote: > According to the PRAGMA command documentation the database might become > corrupted if the o/s crashes or the computer loses power before the data has > been written to the disk surface. From the information we have from the > customer they continued to use the computer for some time after performing > the final save. This indicates that there could be other situations that > also cause this problem. It really depends on when the OS decides to flush its cache. All/most OSes cache data you write before actually putting it on the drive. These days, drives also cache writes. When the application says "sync this to disk", it's supposed to wait until the data is physically recorded on the drive before returning. But because of all the caching going on at various levels, it's sometimes hard to ensure this happens, and it's hard to verify that the write really did happen: when you try to read the data back, the OS gives it to you either from its cache (most likely) or from the drive, and you can't tell which was the source. You might want to do some crash tests with a virtual machine to find out what's happening. Good luck! Jim -- Software first. Software lasts!
Re: [sqlite] Feature request
I'd like to suggest that features such as this be implemented with PRAGMA commands whenever possible, the reason being that the new functionality is then immediately available to folks using non-C bindings, without waiting for an update to a binding package that may or may not occur. Jim On 5/23/09, Filip Navara wrote: > PRAGMA schema_version ... for the second case. > > F. > > On Sat, May 23, 2009 at 4:20 PM, Simon Slavin wrote: >> >> On 21 May 2009, at 5:59am, Simon Slavin wrote: >> >>> int sqlite3_change_count(sqlite3*) >>> >>> Returns an integer which is incremented when a change is made to any >>> table in the database. May be the value stored in bytes 24..27 of the >>> file header, or something else if that's not what I really want. >> >> Having thought about it some more, I want two different numbers. One >> of them changes when a field changes: INSERT/DELETE/UPDATE. The other >> changes when the schema changes: ALTER/DROP/CREATE. Rather than crowd >> the function library with multiple functions, it might be possible to >> implement it like so: >> >> int sqlite3_change_count(sqlite3*, int cTypes) >> >> when cTypes = 1, you get the number of changes to the schema >> when cTypes = 2, you get the number of changes to fields >> when cTypes = 3, you get the sum of the above two figures >> >> Or possibly 1 and 2 should be the other way around. >> >> Any ideas, criticism, or reasons it can't be done? >> >> Simon. -- Software first. Software lasts!
Re: [sqlite] Minor errors in CREATE TABLE syntax diagrams
For my money, I'd prefer to have a smaller, faster parser that works correctly on correct input, at the expense of not catching all possible syntax errors on silly input. There is a definite trade-off here, and I could see where a totally complete parser that caught every possible error in SQL grammar might be twice the size of the entire SQLite code base. Of course, you don't want an SQL syntax typo to trash your database either, without warning. I'm assuming the SQLite developers have made reasonable decisions about which parsing errors are important and which aren't. Jim On 5/26/09, John Machin wrote: > 1. SQLite allows NULL as a column-constraint. > > E.g. CREATE TABLE tname (col0 TEXT NOT NULL, col1 TEXT NULL); > > The column-constraint diagram doesn't show this possibility. > > Aside: The empirical evidence is that NULL is recognised and *ignored*; > consequently there is no warning about sillinesses and typos like in > these examples of column-def: > col1 INTEGER NOT NULL NULL > col1 INTEGER NOTE NULL -- type="INTEGER NOTE", constraint="NULL" > > 2. According to the diagram for foreign-key-clause, there is no "express > track" which allows skipping both "ON DELETE|UPDATE|INSERT etc" and > "MATCH name". However SQLite does permit all of that to be skipped. > > E.g. CREATE TABLE tname(col0 TEXT PRIMARY KEY, col1 TEXT REFERENCES > ftable(fcol)); -- Software first. Software lasts!
Re: [sqlite] Slow Transaction Speed?
I'm running on Linux with ext3 and just wrote a Python test program to insert rows into a table with a single column, no indexing, doing a commit after each insert. When I first ran it, I got around 440 inserts per second, which is clearly impossible. This is a 7200rpm drive, so even if I could write a row on every revolution, the maximum insert rate would be 120 per second. I tried adding "pragma synchronous=normal", and it sped up to 520 TPS. With synchronous=full it slowed back to 420-460, so this must be the Python default. Adding synchronous=off increased the rate to over 6000 TPS -- basically writing to memory instead of disk.

After using hdparm to turn off the drive cache (hdparm -W 0 /dev/sda), the insert rate was 15 rows per second. So it seems that for my particular hardware configuration, the Linux fsync call isn't doing what it should be doing. I have built the SQLite stuff from source, so perhaps my build has a problem. If you look on the Linux kernel mailing list archives, there are several discussions about drive write caching not playing nice with ext3's efforts to ensure data is actually on the disk.

$ uname -a
Linux amd 2.6.25-ARCH #1 SMP PREEMPT Sat Jun 14 18:07:19 CEST 2008 i686 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux

Here is a test program you can run, to show your system's maximum physical I/O rate:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

#define MAX 3000

int main() {
  int fd;
  int n;
  int loops;
  time_t elap;
  time_t start;
  char buf = 0;   /* 1-byte write buffer; elided in the archived copy */

  if ((fd=open("big", O_RDWR+O_CREAT, 0777)) == -1) {
    perror("Error opening file");
    exit(1);
  }
  start = time(NULL);
  for (loops=0; loops<MAX; loops++) {
    if (lseek(fd, 0, SEEK_SET) == -1) {
      perror("Error seeking file");
      exit(1);
    }
    n = write(fd, &buf, 1);
    if (n != 1) {
      perror("Error writing file");
      exit(1);
    }
    fsync(fd);
  }
  elap = time(NULL) - start;
  printf("Time: %d seconds; TPS=%f\n", (int)elap, MAX*1.0/elap);
  return 0;
}

On my system, I get these results:

[...@amd ~]$ cc -o sync sync.c
[...@amd ~]$ ./sync
Time: 25 seconds; TPS=120.00

Running vmstat concurrently shows about 480KBytes written per second during the steady state. Linux does I/O in 4K chunks, so dividing 480K by 4K gives I/O's per second: 120.

Jim

On 5/27/09, Marcus Grimm <mgr...@medcom-online.de> wrote: > Thanks Nick, > > good point. ahh yes, I've read about this somewhere... > > My extension is currently ".db", a quick check indicates that using ".abc" gives a slight speed improvement, maybe 10%. But that is already very close to the variation I get between different test runs, so I'm not really sure if I have this "microsoft effect" here.
> > Anyway, thanks for the reminder. > > Well, I think now there is nothing wrong here, it is just as it is... Currently I achieve about 10 transactions/second, maybe not that bad... still slower than the "few dozen" that are mentioned on the www pages, but I agree that there are too many issues that may affect the disk speed. > > Marcus > > Brandon, Nicholas (UK) wrote: >>> my development system is a Win XP, with of course NTFS, I >>> don't know which drive it has, I guess a standard 7200 rpm. >>> >> >> What file extension (i.e. the letters after the dot in the filename) do >> you give the database? >> >> I faintly recall there is a windows peculiarity with system restore or >> something similar that archives certain file extensions in the >> background. That may contribute to your slow down. >> >> Nick
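The Python test program Jim mentions isn't shown in the thread. A reconstruction under stated assumptions (one row per explicit commit, wall-clock timing; `commit_rate` is a hypothetical name, not the original script):

```python
import os
import sqlite3
import tempfile
import time

def commit_rate(n=200, synchronous="FULL"):
    """Insert n rows with one commit each and return commits/second.
    Hypothetical reconstruction of the test described in the thread.
    `synchronous` must be a trusted literal (OFF, NORMAL, FULL) since
    it is interpolated into the PRAGMA text."""
    path = os.path.join(tempfile.mkdtemp(), "tps.db")
    conn = sqlite3.connect(path, isolation_level=None)  # manual txns
    conn.execute("PRAGMA synchronous=%s" % synchronous)
    conn.execute("CREATE TABLE t(x INTEGER)")
    start = time.time()
    for i in range(n):
        conn.execute("BEGIN")
        conn.execute("INSERT INTO t VALUES (?)", (i,))
        conn.execute("COMMIT")
    elapsed = time.time() - start
    conn.close()
    return n / elapsed if elapsed else float("inf")
```

A rate far above what the drive's rotation allows (e.g. 440 TPS on a 7200rpm disk, as reported above) is the telltale sign that fsync is being absorbed by a cache rather than reaching the platters.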
Re: [sqlite] Slow Transaction Speed?
I should have mentioned that when running my C test program on Linux, with the hard drive cache enabled, it ran instantly. But it should have taken quite a while to do 3000 fsyncs. So this problem with defeating the hard drive cache is not an SQLite issue; it's a Linux issue. I also ran the same C test program on a virtual machine, without any hdparm tweaking, and the syncs worked correctly: I got about 115 TPS. This system is also using ext3, but a different version of Linux: $ uname -a Linux prirun 2.6.24-19-xen #1 SMP Sat Jul 12 00:15:59 UTC 2008 x86_64 Dual-Core AMD Opteron(tm) Processor 2212 AuthenticAMD GNU/Linux So I don't know why syncs work correctly in one place but not the other. Could be my hardware config, the virtualization layer fixing things, different version of Linux, ... If you see SQLite run much slower on ReiserFS than ext3, it probably means that ReiserFS is doing the syncs correctly, but ext3 is getting cached. Jim On 5/27/09, Jim Wilcoxson <pri...@gmail.com> wrote: > I'm running on Linux with ext3 and just wrote a Python test program to > insert rows into a table with a single column, no indexing, and doing > a commit after each insert. When I first ran it, I got around 440 > inserts per second, which is clearly impossible. This is a 7200rpm > drive, so even if I could write a row on every revolution, the maximum > insert rate would be 120 per second. I tried adding "pragma > sychronous=normal", and it sped up to 520 TPS. With synchronous=full > it slowed back to 420-460, so this must be the Python default. Adding > synchronous=off increased the rate to over 6000 TPS -- basically > writing to memory instead of disk. > > After using hdparm to turn off the drive cache (hdparm -W 0 /dev/sda), > the insert rate was 15 rows per second. So it seems that for my > particular hardware configuration, the Linux fsync call isn't doing > what it should be doing. I have built the SQLite stuff from source, > so perhaps my build has a problem. 
> If you look on the Linux kernel mailing list archives, there are several discussions about drive write caching not playing nice with ext3's efforts to ensure data is actually on the disk.
>
> $ uname -a
> Linux amd 2.6.25-ARCH #1 SMP PREEMPT Sat Jun 14 18:07:19 CEST 2008 i686 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux
>
> [snip: quoted test program, results, and vmstat output]
>
> During the steady state, there are 480KBytes written per second. Linux does I/O in 4K chunks, so dividing 480K by 4K gives you I/O's per second: 120.
>
> Jim
>
> On 5/27/09, Marcus Grimm <mgr...@medcom-online.de> wrote:
>> Thanks Nick,
>>
>> good point. ahh yes, I've read about this somewhere...
>>
>> My extension is currently ".db", a quick check indicates that using "
Re: [sqlite] corrupt database recovery
What platform is this? I just posted a note today that my Linux box running 2.6.25 and ext3 isn't doing fsyncs like it should, so I would be susceptible to DB corruption if my machine crashed during DB I/O. I posted a C program you could run on the specific machine with a corrupt DB to see if it's really doing synchronous I/O. The other thing you might want to check is hardware. Running memtest on the machine overnight could show something, unless you are on server-class hardware with ECC memory. Jim On 5/27/09, Gene wrote: > My code is outside the database layer. So I do all my database work, then > compress and encrypt it. No errors are returned anywhere. I'm guessing > that it's going to be an uninitialized variable or byte alignment problem > somewhere. > > This code is running on hundreds of machines without a problem and I've > never reproduced it, but every now and again I get a support ticket showing a > corrupt database. So I'm trying to figure out WHERE to look. > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick > Sent: Wednesday, May 27, 2009 10:59 AM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] corrupt database recovery > > Gene Allen wrote: >> Ok...it's happened again and I've decided that I need to track this down >> once and for all! >> >> Here is what I'm seeing: I get errors when I do an integrity_check (see >> below), but I can .dump it to a text file and then .read it into another >> database ok. >> >> It seems to me that I'm screwing up an index or something. Are indexes >> stored at the end of the database file? All I can think of is that my >> compression/encryption routines are messing something up and I'm trying to >> figure out 'where' to look. >> >> I guess the real question is, what would I have to do to make an >> integrity_check fail, but still let a dump work correctly? >> >> Many thanks for any advice on tracking down this ugliness.
>> > > SNIP > > Personally, I'd refactor the code to allow me to verify the operation of > the compression/encryption routines independently of the database > operation. How are you injecting the compression/encryption into the > database layer? > > John -- Software first. Software lasts!
[sqlite] Feature request: reset file mtime on rollback
It would be handy for file synchronization if SQLite stored the database file's mtime in the journal and reset it if a rollback occurs. It's difficult to do this in an application: 1. If a DB is opened, a transaction started, some stuff happens, and a rollback is executed, the file mtime only gets changed if a cache page had to be written. Only SQLite knows whether cache pages were written. 2. Suppose the DB has mtime T. An app runs and starts modifying the DB (and pages are written from cache), but it crashes before the commit. When the app starts again, if it looks at mtime before opening the DB, the DB appears to be modified. After the open, the rollback will occur and the DB will be in its prior state from when mtime was T, but the mtime is still updated. Not a huge deal, just would be nice. Jim -- Software first. Software lasts!
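For the non-crash case, an application can approximate the requested behavior itself. A sketch with a hypothetical helper name; as the post explains, this cannot cover the crash-then-journal-rollback case, which is exactly why journal support is being requested:

```python
import os
import sqlite3

def transaction_preserving_mtime(path, work):
    """Run work(conn) inside a transaction; on rollback, restore the
    database file's previous mtime.  Application-level approximation
    (hypothetical helper) of the feature requested above -- it only
    helps when the process survives to execute the except branch."""
    st = os.stat(path)
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("BEGIN")
        work(conn)
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        conn.close()
        # Put the mtime back so sync tools see an unmodified file.
        os.utime(path, (st.st_atime, st.st_mtime))
        raise
    conn.close()
```

Note this also papers over case 1 from the post (mtime bumped only when cache pages were flushed): restoring the saved mtime is harmless when nothing was written.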
Re: [sqlite] Slow Transaction Speed?
I agree that adding this to the library and making it accessible via a pragma command would be very useful. For example, pragma commitrate 1000 would test the commit rate with 1000 commits and return the result in transactions per second as a row. If I install my app on a client's machine, I could run this test periodically to ensure that the system environment is going to support "no corruption" operation, and/or send some kind of warning to my customer that there is a risk of corruption because their system environment has problems. If it were only an external program bundled with sqlite, I couldn't really make use of it, because I'd have to distribute the program and instructions for how to use it, and rely on customers to actually do it. This is a pretty small function. Just for my own use, I'd consider foreign key support to be way more bloated than this. Jim On 5/29/09, Marcus Grimm wrote: >> On Thu, May 28, 2009 at 03:12:55PM -0700, Allen Fowler scratched on the >> wall: >>> >>> > just for anybody who is interested: >>> > >>> > I translated Jim's function into Windows code and added >>> > a page of 1024 that will be written, instead of a single byte. >>> > On my Win-XP system I got 55 TPS, much faster than sqlite >>> > seems to write a page, but that might be related to the >>> > additional overhead sqlite needs to do. > > just to add: I traced a little what sqlite does when a > simple UPDATE is done within a transaction: it does > two syncs on the journal file and one final sync on the > db itself, so achieving something like 15 TPS is reasonable. > > >>> > >>> > This brings me to a nice-to-have feature request: >>> > How about adding a similar test function in the sqlite API? >>> > This might use the vfs to write pages and give some feedback >>> > on the performance of the system where sqlite runs on. >>> > It might also detect problems with the commit function... >>> > Just an idea... >>> > >>> >>> Interesting idea. 
>> >> It would make a lot more sense to make this an external utility >> or an extension of the sqlite3 shell. Adding it to the core library >> is a definite case of code bloat. > > Adding it into the API would allow my application to > easily make the test for example the first time it runs > on a system. But maybe a problem for the users that > apply a sqlite wrapper. > However, having it in sqlite3 shell would be very useful as > well. > > Marcus > >> >> Actually, a whole suite of performance related tests might be >> interesting. >> >>-j >> >> -- >> Jay A. Kreibich < J A Y @ K R E I B I.C H > >> >> "Our opponent is an alien starship packed with atomic bombs. We have >> a protractor." "I'll go home and see if I can scrounge up a ruler >> and a piece of string." --from Anathem by Neal Stephenson >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Join performance in SQLite
SQLite has surprised me with its quick performance, not the other way around. In fact, I've implemented all kinds of lookup queries that I knew could be optimized by caching results so I didn't have to keep repeating the SQL query, but the performance was so good even repeating the queries that I never bothered with the caching. I'm sure there are queries that SQLite doesn't run as fast as database product X, and I'm sure it goes the other way too. It's a balancing act, and as the primary developer, you have to choose for us what's important to optimize and what isn't. So far, I'm very happy with the choices and trade-offs that have been made in SQLite. :-) Jim On 5/30/09, Simon Slavin wrote: > I'm interested in how sqlite works differently to the SQL systems > which keep a daemon running as a background task. One of the > advantages of having a daemon which persists between runs of an > application is that the daemon can keep its own list of ORDERs and > JOINs which are asked for frequently, and decide to maintain them even > when no SQL-using application is running. This can give the > impression that something is being done very quickly, when in fact the > majority of the time was taken during a previous run of the > application. It can be particularly hard to figure out what a > performance test means under these circumstances. > > But the problem is that I like the way sqlite works. I like the tiny > library, I like the way that the SQL library is entirely inside my > application, and any CPU load is mine. I like knowing that when my > app quits, nothing is going on. > > Simon. -- Software first. Software lasts!
Re: [sqlite] Feature request
I read through the header after Simon's request, and saw the data change counter he mentions. What I wasn't sure about is whether this number gets reset during a rollback. For example, you start a transaction, make some changes, some cache pages get flushed, and SQLite changes this number so that other processes know that the database has changed. In fact, SQLite may change this number anytime it changes pages in my local cache - I'm not sure. The header word seems to be designed to prevent stale cache lookups. If I roll back my changes, it would be unimportant for cache management whether the value rolled back also; but if the data didn't actually change, it would be important for the value to roll back. Or, you could use the value as "data might have changed", but then it would be less useful for file synchronization. Jim On 5/30/09, Simon Slavin wrote: > > On 23 May 2009, at 3:32pm, Filip Navara wrote: > >> PRAGMA schema_version ... for the second case. > > Okay. Given that this does exactly what I want from one of my > requests, and given Jim Wilcoxson's point that adding a PRAGMA is > better than adding a function to the library, I can simplify my > feature request to asking for something like > > PRAGMA content_version OR PRAGMA data_version > > which returns the number stored in bytes 24 to 27 of the header. This > should have only a small impact on library size, and require very > little extra code since the number needs to be worked out anyway for > storage in the header. It should be useful for purposes associated > with synchronisation and journaling, but it's mostly so that > applications which store some data outside SQL and some inside can > tell if they need to worry about something messing with their data. > > It doesn't matter to me whether a schema-change is considered a > content-change or not. I can sum the two in my own code if needed. > But the documentation should describe whether field-changes, or > COMMITs, or whatever is counted. 
> > I know I can read the file's header myself (and that's what my current > solution does) but this means I need to include file-handling libraries > in my library which I don't otherwise need. It's not a neat solution. > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
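[Editor's note] For anyone doing what Simon describes - reading the header directly - the change counter is the 4-byte big-endian integer at bytes 24-27 of the database file. A minimal Python 3 sketch (the file name is made up; this assumes the default rollback-journal mode, where the counter is bumped by committed writes):

```python
import sqlite3
import struct

def change_counter(path):
    # Bytes 24-27 of the SQLite database header hold the file change
    # counter: a big-endian 32-bit integer incremented when the file
    # is modified (in rollback-journal mode).
    with open(path, 'rb') as f:
        f.seek(24)
        return struct.unpack('>I', f.read(4))[0]

# Demo: the counter moves after a committed write.
con = sqlite3.connect('demo.db')          # file name is arbitrary
con.execute('create table if not exists t (f)')
con.commit()
before = change_counter('demo.db')
con.execute('insert into t (f) values (1)')
con.commit()
after = change_counter('demo.db')
con.close()
print(before, after)
```

Later versions of SQLite did grow a PRAGMA data_version, which serves a similar purpose from within a connection, so reading the header by hand is rarely needed today.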
Re: [sqlite] Slow Transaction Speed?
Here's an update for the commit rate tester for Mac OSX. With the original version, using fsync(), OSX doesn't do real syncs because the fsync call isn't properly implemented (my opinion). Apple took a shortcut with fsync and only flushes OS buffers to the drive, but the drive is then free to reorder the requests (that's okay) and execute them at a later time (that's not okay). You have to use fcntl() with F_FULLFSYNC to get data actually written to the platters. At least Apple does document this behavior. Here's the updated program:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>

#define MAX 3000

int main() {
  int fd;
  int n;
  int loops;
  time_t elap;
  time_t start;
  char buf = 'x';

  if ((fd = open("big", O_RDWR|O_CREAT, 0777)) == -1) {
    perror("Error opening file");
    exit(1);
  }
  start = time(NULL);
  for (loops = 0; loops < MAX; loops++) {
    if (lseek(fd, 0, SEEK_SET) == -1) {
      perror("Error seeking file");
      exit(1);
    }
    n = write(fd, &buf, 1);
    if (n != 1) {
      perror("Error writing file");
      exit(1);
    }
#ifdef __APPLE__
    fcntl(fd, F_FULLFSYNC);
#else
    fsync(fd);
#endif
  }
  elap = time(NULL) - start;
  printf("Time: %ld seconds; TPS=%f\n", (long)elap, MAX*1.0/elap);
  return 0;
}

With F_FULLFSYNC, my G5 server and G4 laptop running 10.4 get these results:

g5:~ mac$ ./sync
Time: 174 seconds; TPS=17.241379

$ ./sync
Time: 265 seconds; TPS=11.320755

The G5 is running a software RAID driver, so I thought that might be contributing to the awful performance, but the G4 laptop w/o RAID is even worse - I'm assuming because the drive rotational speed is probably 4200RPM instead of 7200RPM. These results are pretty horrible. The SQLite commit documentation says that Apple is doing a hard drive reset to implement the cache flush, which is why the performance is so bad. Maybe they have improved things with Leopard (10.5). Since SQLite does 3-4 syncs per commit, I was curious what the commit performance would be like in this environment. 
Here are the results with a Python test program and SQLite 3.6.14.2:

import sqlite3
import time

def runtest(path, sync):
    con = sqlite3.connect(path)
    con.execute('create table if not exists t (f)')
    con.execute('pragma synchronous=' + sync)
    con.execute('pragma fullfsync=ON')
    con.execute('delete from t')
    con.commit()
    start = time.time()
    end = start + 60
    i = 0
    n = 3000
    while i < n and time.time() < end:
        con.execute("insert into t (f) values (?)", (i,))
        con.commit()
        i += 1
    elap = time.time() - start
    print "For pragma synchronous=", sync, "Time:", elap, "TPS:", n/elap
    con.close()

path = "table"
runtest(path, "off")
runtest(path, "normal")
runtest(path, "full")

g5:~ mac$ Python-2.6.1/python.exe dbsync.py
/Users/mac/Python-2.6.1/Lib/sqlite3/dbapi2.py:27: RuntimeWarning: Python C API version mismatch for module _sqlite3: This Python has API version 1013, module _sqlite3 has version 1012.
  from _sqlite3 import *
For pragma synchronous= off Time: 1.97417807579 TPS: 1519.61975305
For pragma synchronous= normal Time: 4.06918787956 TPS: 737.247846202
For pragma synchronous= full Time: 4.78447008133 TPS: 627.028688445

Obviously, there's no way a 7200RPM drive can actually achieve these results. I'm building my own versions of Python and SQLite here, and after some research, I think I didn't build SQLite with the -DHAVE_FULLFSYNC flag, so it isn't actually doing the fcntl() call. Jim On 5/29/09, Jim Wilcoxson <pri...@gmail.com> wrote: > I agree that adding this to the library, and making it accessible via > a pragma command would be very useful. For example, pragma commitrate > 1000 would test the commit rate of 1000 commits and return the results > in transactions per second as a row. > > If I install my app on a client's machine, I could run this test > periodically to ensure that the system environment is going to support > "no corruption" operation, and/or send some kind of warning to my > customer that there is a risk of corruption because their system > environment has problems. 
> > If it were only an external program bundled with sqlite, I couldn't > really make use of it, because I'd have to distribute the program and > instructions how to use it, and rely on customers to actually do it. > > This is a pretty small function. Just for my own use, I'd consider > foreign key support to be way more bloated than this. > > Jim > > On 5/29/09, Marcus Grimm <mgr...@medcom-online.de> wrote: >>> On Thu, May 28, 2009 at 03:12:55PM -0700, Allen Fowler scratched on the >>> wall: >>>> >>>> > just for anybody who is interested: >>>> >>>> > >>>> > I translated Jim's function into Windows code a
Re: [sqlite] Slow Transaction Speed?
Simon, the real point here is that Python and SQLite aren't doing real transactions. What we should be seeing here is AT MOST 30 TPS with synchronous=normal, assuming SQLite does 3 hard disk syncs per commit as was described in an earlier post. On my old AMD box, the one where fsync() is broken, I get this:

[...@amd toys]$ uname -a
Linux amd 2.6.25-ARCH #1 SMP PREEMPT Sat Jun 14 18:07:19 CEST 2008 i686 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux
[...@amd toys]$ py dbsync.py
For pragma synchronous= off Time: 0.380044937134 TPS: 7893.80335553
For pragma synchronous= normal Time: 5.70174884796 TPS: 526.154357197
For pragma synchronous= full Time: 7.00615310669 TPS: 428.195038606
[...@amd toys]$ py dbsync.py
For pragma synchronous= off Time: 0.361596107483 TPS: 8296.54948689
For pragma synchronous= normal Time: 5.72849082947 TPS: 523.698141327
For pragma synchronous= full Time: 7.06226301193 TPS: 424.793015345

But by disabling the IDE hard drive write cache:

[r...@amd toys]# hdparm -W 0 /dev/sda
/dev/sda:
 setting drive write-caching to 0 (off)
 write-caching = 0 (off)
[r...@amd toys]# py dbsync.py
For pragma synchronous= off Time: 0.361658096313 TPS: 8295.12744379
For pragma synchronous= normal Time: 60.0442349911 TPS: 49.9631646643
For pragma synchronous= full Time: 60.0657091141 TPS: 49.9453023072

These last 2 are expected numbers for a true "on the platters" commit that is limited by the drive's rotational speed. The CPU doesn't really matter for this test. Jim On 5/31/09, Simon Slavin wrote: > > On 31 May 2009, at 4:59pm, Simon Slavin wrote: > >> Standard MacBookPro3,1 Core 2 Duo 2.4 GHz, OS 10.5.7, whatever hard >> disk Apple puts in them. >> Using SQLite version 3.4.0 (installed as part of the OS installation). 
>> >> SimonsMBP:Documents simon$ python --version >> Python 2.5.1 >> >> SimonsMBP:Documents simon$ python dbsynch.py >> For pragma synchronous= off Time: 2.70638608932 TPS: 1108.48929199 >> For pragma synchronous= normal Time: 4.9895169735 TPS: 601.260606174 >> For pragma synchronous= full Time: 6.40981006622 TPS: 468.032588954 > > I am, of course, an idiot. That was running from the command line. > If I run it as a .pyc I get > > >>> import dbsynch > For pragma synchronous= off Time: 2.69734382629 TPS: 1112.2052631 > For pragma synchronous= normal Time: 4.79093813896 TPS: 626.182161611 > For pragma synchronous= full Time: 6.31202697754 TPS: 475.283139739 > > SimonsMBP:Documents simon$ python dbsynch.pyc > For pragma synchronous= off Time: 2.61415219307 TPS: 1147.59959575 > For pragma synchronous= normal Time: 4.84184503555 TPS: 619.598516262 > For pragma synchronous= full Time: 6.0191090107 TPS: 498.412637928 > > I don't see why the 'full' figures vary so much. > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow Transaction Speed?
Microsoft has an interesting article on hard drive caches re: SQL Server: http://support.microsoft.com/kb/234656 "Many disk drives (SATA, ATA, SCSI and IDE based) contain onboard caches of 512 KB, 1 MB, and larger. Drive caches usually rely on a capacitor and not a battery-backed solution. These caching mechanisms cannot guarantee writes across a power cycle or similar failure point. They only guarantee the completion of the sector write operations. As the drives continue to grow in size, the caches become larger, and they can expose larger amounts of data during a failure." On 6/1/09, John Stanton wrote: > Simon Slavin wrote: >> On 31 May 2009, at 11:56pm, John Stanton wrote: >> >> >>> Try studying basic database theory and technology to get a better >>> understanding of the problem. >>> >> >> I have a pretty good understanding, I think. Plus 25 years experience. >> > Is it 25 years experience or 1 year 25 times? This forum is to share > information on Sqlite, for mutual advantage, not get into pointless > arguments. Let us keep it that way. > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow select from a single table when using a view
If you only have a handful of values for C and are already going to the trouble of creating separate views for each C, you could partition your data into separate tables, one per value of C, and create another table containing the list of values of C, the number of items in each C table (if that would be useful), or other attributes common to each value of C. Whether that makes sense depends on what else you're doing with the views. You might have to repeat queries across a set of tables and then do some work in your application to collate the results. That could be a big deal or not, depending on what you're doing. For this min() example, you'd only have to query the master table. Jim On 6/11/09, Antti Nietosvaara wrote: > On Thursday 11 June 2009 16:30:12 Igor Tandetnik wrote: >> > Ah, this would indeed explain the slowdown. I was hoping views would >> > translate into the "where" part of the query, like: >> > SELECT min(C) FROM T_view; -> SELECT min(C) FROM T WHERE (C=1 OR C=2); >> >> I predict this last query wouldn't run any faster. Try it. > > You are correct. It is way too slow. > > -- > Antti Nietosvaara > Turun Turvatekniikka Oy > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
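[Editor's note] Jim's partition-per-value idea can be sketched like this; the table and column names are invented for illustration, with C taking just two values. Table names can't be bound as SQL parameters, so they are built with string formatting here:

```python
import sqlite3

con = sqlite3.connect(':memory:')
values_of_c = [1, 2]

# One data table per value of C, plus a master table describing the
# partitions (value of C and row count per partition).
con.execute('create table c_values (c integer primary key, nitems integer)')
for c in values_of_c:
    con.execute('create table t_c%d (a integer, b integer)' % c)

# Inserts are routed to the partition for their C value.
rows = [(10, 100, 1), (20, 200, 1), (30, 300, 2)]
for a, b, c in rows:
    con.execute('insert into t_c%d (a, b) values (?, ?)' % c, (a, b))
for c in values_of_c:
    n = con.execute('select count(*) from t_c%d' % c).fetchone()[0]
    con.execute('insert into c_values values (?, ?)', (c, n))

# A query over "C in (1, 2)" becomes one query per partition,
# collated in the application.
minimum = min(con.execute('select min(a) from t_c%d' % c).fetchone()[0]
              for c in values_of_c)
print(minimum)
```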
Re: [sqlite] Advice on which to use (SQLite or SQL Server) for the following app.
Here's what I'd try: 1. Write a small server that accepts connections and writes to the SQLite database using prepared statements. If you require 500 transactions per second, that's simply not possible with rotating media. So the solution is to either turn off synchronous, which is dangerous, or use group commit, ie, batch your external transactions into larger database commits. It's possible you might lose a batch of transactions from a hardware failure, but at least your database will be usable afterwards. With synchronous=off, your database may be corrupted. You might object to group commit because when you "ack" your external connection, you want to ensure you have done a commit before closing the connection. However, with synchronous=off, your commit is only in memory, so it's basically the same as group commit. Using group commit, it should be easy to do 500 external SQL "transactions" per second. 2. If this is still too slow, it's likely because of establishing the TCP connection. If possible, you could switch to UDP, which has much lower overhead. You'd have to be able to live with losing or repeating data points sometimes, but maybe that would be easy to manage in your mini server above by ignoring repeated data points or using averaging to fill in missing data points. Jim On 6/11/09, Pavel Ivanov wrote: > I bet "synchronous"ness will not be your only bottleneck. Opening > connection, preparing statement and closing connection will take in > total much longer than executing statement itself. So that doing all > these operations 500 times per second will not be possible I think. If > you keep pool of connections along with already prepared statements > then your application will have chances for survival in such > contention environment. > And yes, your application will not have any chances without > "synchronous = OFF". Without it you're able to do only 10 to 20 > transactions per second. 
> > Pavel > > On Thu, Jun 11, 2009 at 9:53 AM, Robel Girma wrote: >> Thanks all for your input, very helpful. And yes, there will be 500 >> separate >> connections to the db per seconds, each updating 1 record. I've read about >> setting PRAGMA synchronous=OFF to cause SQLite to not wait on data to >> reach >> the disk surface, which will make write operations appear to be much >> faster. >> "But if you lose power in the middle of a transaction, your database file >> might go corrupt"==> I can live with this risk if it makes an huge >> improvement with the possible contention issue I'm facing. Any input with >> this setting you can provide will be greatly appreciated as always. >> Robel >> >> -Original Message- >> From: sqlite-users-boun...@sqlite.org >> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Simon Slavin >> Sent: Thursday, June 11, 2009 4:49 AM >> To: General Discussion of SQLite Database >> Subject: Re: [sqlite] Advice on which to use (SQLite or SQL Server) for >> the >> following app. >> >> >> On 11 Jun 2009, at 8:23am, Roger Binns wrote: >> >>> It depends very strongly on how the app is structured and in >>> particular >>> if there are a few persistent connections to the SQLite database, or >>> if >>> each request involves a separate connection to the database. If you >>> have lots of connections then there will be contention. >> >> 500 connections a second, each from a different computer. If the OP >> handles each one with a separate run of his/her application, that's >> 500 connections to the database a second, each updating one record in >> one table. >> >>> If the work done during contention is quick and simple then you are >>> fine. If it is long running then you will benefit from a server based >>> approach. But when you have commits then disk access is serialized >>> and >>> you will have performance limitations no matter what the database >>> server or SQLite. (That is the point Florian is making.) 
>> >> As far as I can tell, with correct programming each query would be one >> connection for all the data the query would want. So the OP's >> objective is /probably/ achievable with SQLite but I'd want to >> prototype it to be sure. >> >> Sorry, Robel, but we can only guess. Try it. >> >> Simon. >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
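[Editor's note] The group-commit scheme Jim describes - acknowledging each request once it is batched, and paying for one durable commit per batch rather than per request - might look roughly like this. The schema and batch size are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('create table readings (sensor integer, value real)')

BATCH = 100          # commit once per 100 incoming requests
pending = 0

def handle_request(sensor, value):
    # Each network request becomes one prepared-statement insert;
    # the durable commit happens once per batch, not per request.
    global pending
    con.execute('insert into readings values (?, ?)', (sensor, value))
    pending += 1
    if pending >= BATCH:
        con.commit()
        pending = 0

for i in range(500):              # stand-ins for incoming requests
    handle_request(i % 10, i / 2.0)
con.commit()                      # flush any final partial batch
count = con.execute('select count(*) from readings').fetchone()[0]
print(count)
```

A production version would also flush on a timer, so a slow trickle of requests doesn't leave a batch sitting uncommitted.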
Re: [sqlite] Advice on which to use (SQLite or SQL Server) for the following app.
I should have mentioned, if it were me, I'd write the mini server first as a single process in a loop, and make it as fast as possible. If you try to do db updates with multiple processes, you'll have concurrency issues. It might make sense to use multiple processes if you also have lots of queries, and have only 1 process (or thread) writing, while a pool of processes handles queries. Not sure how well SQLite handles this situation, but since you are doing group commits, it will greatly decrease your write load and potential concurrency issues. Jim On 6/11/09, Jim Wilcoxson <pri...@gmail.com> wrote: > Here's what I'd try: ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
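[Editor's note] The single-writer layout Jim suggests can be sketched with one thread owning the connection and a queue feeding it; all names here are invented for illustration:

```python
import queue
import sqlite3
import threading

# Request handlers put work on a queue; one thread owns the SQLite
# connection and does all writes, committing in batches (group commit).
q = queue.Queue()
DONE = object()       # sentinel telling the writer to stop
written = 0

def writer():
    global written
    con = sqlite3.connect(':memory:')
    con.execute('create table events (payload)')
    batch = 0
    while True:
        item = q.get()
        if item is DONE:
            break
        con.execute('insert into events values (?)', (item,))
        batch += 1
        if batch >= 50:           # group commit every 50 items
            con.commit()
            batch = 0
    con.commit()                  # flush the final partial batch
    written = con.execute('select count(*) from events').fetchone()[0]
    con.close()

t = threading.Thread(target=writer)
t.start()
for i in range(300):              # stand-ins for incoming requests
    q.put(i)
q.put(DONE)
t.join()
print(written)
```

Because only the writer thread ever touches the connection, there are no cross-thread locking issues; query traffic could go to separate read-only connections.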
Re: [sqlite] Advice on which to use (SQLite or SQL Server) for the following app.
SSD's usually have poor write performance, because to do a write, they have to use read, erase, write sequences across large blocks like 64K. Most of the SSD benchmarks that quote good write performance are quoting sequential write performance. If you skip all over the disk doing small writes, like a database does, I suspect you'll see pretty bad performance with most SSD's. In most of the SSD benchmarks I've seen, random write performance is worse than rotating media. Actually, most of the SSD benchmarks I've seen completely skip over this point, perhaps not even testing random write performance, but only sequential writes. Using a separate copy of the database on SSD for queries would probably work well, assuming the database doesn't fit into RAM. If the db does fit in RAM, SSD won't buy you anything. Companies are putting all kinds of smarts into SSD's to try to minimize the effects of the read, erase, write cycle, usually by some form of caching, but then you are also playing with losing the transaction guarantees of a commit. Can't really have it both ways. Jim On 6/11/09, Sam Carleton <scarle...@gmail.com> wrote: > Jim Wilcoxson wrote: >> Here's what I'd try: >> >> 1. Write a small server that accepts connections and writes to the >> SQLite database using prepared statements. If you require 500 >> transactions per second, it's simply not possible with rotating media. > > I am a late comer to this discussion, so this might have already been > proposed... > > Ever consider having all the updates done on non-rotating media > (SSD) and then having 1 process that reads in blocks of the data and > puts it into its final home? Purge the SSD SQLite db simply by rolling > to a new one from time to time and deleting the old one once all the > connections have moved on to the new SQLite db. > > Sam > -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Advice on which to use (SQLite or SQL Server) for the following app.
Hey, if anybody has an SSD laying around, it would be interesting to run that commit test program I posted a while back to see what kind of transaction rates are possible. Although, verifying whether the SSD is actually doing the commits or just saying it is would be very difficult. With rotating media, you at least have an upper bound (120/sec for 7200rpm, 166/sec for 10Krpm, 250/sec for 15Krpm) and if you go outside that, you know it's a lie. Not sure how you could verify that commits/syncs with an SSD are actually working other than repeatedly pulling the plug and seeing if the db survives intact. Jim On 6/11/09, Jim Wilcoxson <pri...@gmail.com> wrote: > SSD's usually have poor write performance, because to do a write, they > have to use read, erase, write sequences across large blocks like 64K. -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Advice on which to use (SQLite or SQL Server) for the following app.
Yes, good point. If you partition the database into multiple databases, you will have to place each on its own physical disk drive to increase transaction rates. If your base transaction rate with one drive is T, with N drives it should be N*T; 4 drives gives you 4x the transaction rate, etc. Each of the drives has to be completely independent - no filesystems crossing drives. If you partition into multiple databases and keep them on the same drive, your transaction rate will likely go down, because now you are introducing seeks back and forth between the two databases as you commit. I think someone mentioned using the data=journal option with ext3 to increase performance. In theory, it possibly could, but on my Linux system, it didn't. I got the same results when I tried using tune2fs to change it, though I'm not sure it actually did anything. Jim On 6/11/09, Petite Abeille wrote: > > On Jun 11, 2009, at 4:53 PM, Sam Carleton wrote: > >> I am a late comer to this discussion, so this might have already >> been proposed... > > Additionally, if this was not mentioned already, you can partition > your database across multiple physical files through the magic of > 'attach database' or something and load balance across those. > > http://en.wikipedia.org/wiki/Partition_(database) > > Cheers, > > -- > PA. > http://alt.textdrive.com/nanoki/ > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] sqlite3_step performance degradation
There was a recent SQLite bug that caused the size of the SQLite cache to shrink in some circumstances, and the longer a program ran, the smaller the cache became. Maybe you are running into this bug. IIRC, you had to do an update in the select loop to trigger the bug, so if you're not doing that, maybe this doesn't apply. Jim On 6/15/09, Mike Borland wrote: > Nuno, unfortunately your psychic skills are a bit off on this one. Sorry I > wasn't more explicit. I am not using any LIMIT or OFFSET to do any virtual > scrolling. Basically I have table A which has 900 rows. Table B has > 180,000 rows (900 * 200) which has a foreign key relationship back to table > A. So for each row in table A, there are 200 rows in table B. My query is > basically a "SELECT * FROM Table B WHERE ID = TableA.ID". I'm executing > this query 900 times, once for each row in table A. > > When I start the 900 read iterations (always in the same order), the first > one generally reads in about 50ms and by the last read, it's taking roughly > 1000ms. Sometimes it slows down immediately, sometimes after the 100th > iteration. The only absolutely reproducible aspect is that it always slows > down eventually and once it slows down, it never speeds back up. I don't > believe it's a locking issue since my timer doesn't start until the query is > successfully executed. > > Any ideas? Would the occasional write operation in the midst of these reads > cause any permanent slow down to the read time? Thanks. > > Mike Borland > > -Original Message- > From: Nuno Lucas [mailto:ntlu...@gmail.com] > Sent: Friday, June 12, 2009 7:16 PM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] sqlite3_step performance degradation > > On Sat, Jun 13, 2009 at 1:52 AM, Mike > Borland wrote: >> I have a fairly complex program where I am seeing a performance >> degradation of the sqlite3_step() function. Basically I'm iterating >> roughly 200 rows at a time, over and over. 
The only work happening >> during the iteration is I'm copying the record into an array. At first, >> sqlite3_step() takes less than a millisecond to run. After 0-50 >> iterations, it's taking anywhere from 10-100ms. >> >> Does anybody have any insight into what's happening behind the scenes >> with this function to help me track down the cause? I appreciate it! > > You should explicitly say what your SQL query is. Without that we can > only guess. > > My current "psychic" guess is that you are using LIMIT to obtain those > 200 rows, one "page" at a time, and as you go advancing "pages" it > becomes slower and slower. > If this is true, then you should re-think your design as LIMIT just > skips the rows, but it will "generate" them before, meaning it > becomes slower as you advance on the offset given. > Look at http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor (and > notice the "What not to do" at the end, talking about "LIMIT" and > "OFFSET"). > > If my psychic abilities are becoming weak, then please supply your > exact query that is getting slower (and maybe your database schema) > and then someone can give you an exact answer. > > > Regards, > ~Nuno Lucas > >> >> Mike Borland >> >> ___ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
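[Editor's note] The "what not to do" advice Nuno links to boils down to: remember the last key returned and seek past it, rather than using OFFSET, which regenerates and discards all earlier rows on every call. A sketch of the seek-based approach, with a schema invented to mimic reading 200 rows at a time:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('create table b (id integer, payload text)')
con.executemany('insert into b values (?, ?)',
                [(i % 900, 'x') for i in range(5000)])
con.execute('create index b_id on b (id)')

def pages(con, page_size=200):
    # Seek past the last rowid seen; each page costs one index seek,
    # not a re-scan of all previously returned rows as OFFSET would.
    last = 0
    while True:
        rows = con.execute(
            'select rowid, id, payload from b where rowid > ? '
            'order by rowid limit ?', (last, page_size)).fetchall()
        if not rows:
            break
        last = rows[-1][0]
        yield rows

total = sum(len(p) for p in pages(con))
print(total)
```

The same pattern works with any unique, ordered key, not just rowid.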
Re: [sqlite] Database inserts gradually slowing down
You are doing transactions here, which is a very different thing from normal disk I/O. Your CPU is idle because it is waiting for the disk. Your disk is idle because it is waiting for the platters to rotate around again. The best case you can achieve on a 7200RPM disk is 120 transactions (commits) per second. In practice, you will see much lower TPS rates, and they will typically be simple fractions of the maximum. For 1 sync per transaction, the max rate is 120 TPS. For 2 syncs per transaction, the max rate is 60 TPS. For 3 syncs per transaction, the max rate is 30 TPS. SQLite always does at least 2 syncs per transaction, and sometimes 3, depending on the options you use, so a transaction rate of 30 TPS is reasonable. The only way you can speed this up is to get a disk that rotates faster, ie, a 10K or 15K rpm drive will do faster transactions, but even so, you are still limited to 250 TPS with a 15K drive. Or, get a battery-backed caching controller that will lie to your OS and tell it that the data is on the media, when really it is only stored in the controller's memory. This allows the controller to combine write requests to increase the transaction rate. Jim On 6/16/09, Jens Páll Hafsteinsson wrote: > Yes, I'm perfectly aware of this and hence I would expect the disk to be > sweating like hell running this test while the CPU is relatively relaxed > (given that sqlite is disk bound in this case and not CPU bound). > > But this is not happening; neither the disk nor the CPU are practically > doing anything, which is a bit strange. It's as if both the disk and the CPU > are waiting for each other or that sqlite is playing 'nice' behind my back > and giving up the CPU when it shouldn't. > > JP > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Adam DeVita > Sent: 16. 
júní 2009 14:06 > To: General Discussion of SQLite Database > Subject: Re: [sqlite] Database inserts gradually slowing down > > Remember the implications of Moore's law and how much time has passed. > > CPU speed is much faster than memory speed. > Memory speed is much faster than disk access. > > This is why hardware folks play all sorts of tricks with pipelines, caches, > interleaving, and parallelism. > > For a single process that interacts with the HDD, the HDD will be the bottle > neck and the CPU will spend lots of time waiting for the rest of the machine > to catch up. Even if you have a RAID system, the CPU is still much faster > than the bus the hard drives are on. > > > On Tue, Jun 16, 2009 at 9:52 AM, Jens Páll Hafsteinsson > wrote: > >> In step 5 I execute "delete from t1" without any where clause. >> >> I haven't monitored the disk space used (does sqlite use temporary files >> beside the database file?) but the database file itself has been fixed in >> size at around 12MB (12.461.056 bytes) the whole time. >> >> The load on the disk is minimal (sustained 1MB/s) and the CPU load is >> about >> 6%, which is a bit surprising since I thought I would be putting a huge >> load >> on the computer running a loop like this. I'm not at all happy to see >> these >> low load numbers given how the test is programmed (it should practically >> own >> the machine). The database should utilize the computer much better than >> this. >> >> I've been running the test now for about 10 minutes using 3.6.15 and it >> 'seems' to be behaving as before, slowly increasing in execution time. I >> want to run this a bit longer to be absolutely sure and will let you know. >> >> JP >> >> -Original Message- >> From: sqlite-users-boun...@sqlite.org [mailto: >> sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov >> Sent: 16. júní 2009 12:15 >> To: General Discussion of SQLite Database >> Subject: Re: [sqlite] Database inserts gradually slowing down >> >> How do you do step 5? 
Like "delete from table" or "delete from table >> where ..."? Do you see any degrade in disk space used by database >> along with slowness? >> >> Pavel >> >> On Tue, Jun 16, 2009 at 8:06 AM, Jens Páll >> Hafsteinsson wrote: >> > Hi >> > >> > I've been running some tests against sqlite and have found that inserts >> are gradually slowing down. Since I'm new to sqlite I might be doing >> something wrong and thought I'd ask here if anyone has seen this before or >> know what might be causing this. >> > >> > The test I'm running creates a database containing a single table (int, >> int, varchar(100)) along with an index (not unique) on the first field. I >> then perform the following operations (all records have unique data in the >> first field): >> > >> > >> > 1. start a transaction >> > >> > 2. insert 1000 records >> > >> > 3. commit >> > >> > 4. repeat steps 1-3 100 times >> > >> > 5. delete everything from the table >> > >> > 6. Start again at step 1 >> > >> > The time taken to execute steps 1-4 increases
Re: [sqlite] async io and locks
Async I/O fits perfectly with my app: I don't need the durability guarantee. But I do need the ability to activate it with a pragma since it isn't a C app. Are there plans to make async I/O available via pragma? Jim On 6/20/09, Dan wrote: > > I think we have quite different approaches. > > The SQLite asynchronous IO backend is queueing IO, not queries. It > intercepts > the equivalent of the write() system call. It does not overload query > processing > in any way. The things on the write-queue are blobs of data to be > written > to specific offsets within the database file, not SQL queries. > > It's more complex than that of course. But the point is that the write- > queue is > a layer between SQLite and the OS, not between the user and SQLite. > > Dan. > > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Software first. Software lasts! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Python sqlite binding: commit inside select loop
I'm using the Python sqlite3 (pysqlite) bindings. I'd like to be able to do a select, and while fetching those rows and inserting new ones, periodically do a commit. With the Python bindings, an error occurs because a commit resets all pending select statements. Here is an example showing the error:

import sqlite3

con = sqlite3.connect('testdb')
con.execute('drop table t')
con.execute('create table t(f)')
con.execute('insert into t (f) values (1)')
con.execute('insert into t (f) values (2)')
con.execute('insert into t (f) values (3)')

print '1st select:'
cur = con.execute('select * from t')
for row in cur:
    print row

print '2nd select:'
cur = con.execute('select * from t')
for row in cur:
    print row
    con.execute('update t set f=2 where f=1')
    con.commit()

Running it shows:

1st select:
(1,)
(2,)
(3,)
2nd select:
(1,)
Traceback (most recent call last):
  File "dbbug.py", line 17, in <module>
    for row in cur:
sqlite3.InterfaceError: Cursor needed to be reset because of commit/rollback and can no longer be fetched from.

I can understand that there is some potential weirdness here in the case where rows are being deleted or rowids are getting changed inside a select, and I think deciding I want to see those updates in the select results is an isolation level setting - not sure of the details. Right now, I don't think it matters whether I see the old or new values, because I'm mostly just adding rows inside the transaction, updating some stuff where the old or new value would be okay, and not doing any deletes. So I have a couple of questions/observations: 1. Is the thing about resetting cursors an SQLite requirement, or something the Python binding is doing to protect people from the confusion of seeing the updated rows? 2. If I were deleting rows, I can see where that would be a problem: what if some of the rows satisfying the query were deleted? Once the commit happens, the rows are gone unless there is a delete/add with the same rowid. 
I'm thinking about removing this reset from pysqlite, but don't want to trash my database either. Thanks for any insight. Jim ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
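One common workaround (a sketch, not from the thread; an in-memory database stands in for testdb) is to materialize the select results with fetchall() before inserting and committing, so no select statement is still pending when the commit runs:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('create table t(f)')
con.executemany('insert into t (f) values (?)', [(1,), (2,), (3,)])
con.commit()

# Materialize the select results up front so the cursor is finished
# before any commit happens; the commit then has nothing to reset.
rows = con.execute('select f from t').fetchall()
for (f,) in rows:
    con.execute('insert into t (f) values (?)', (f + 10,))
    con.commit()   # safe: no pending select statement

print(sorted(r[0] for r in con.execute('select f from t')))
# -> [1, 2, 3, 11, 12, 13]
```

The trade-off is memory: fetchall() holds the whole result set at once, so for very large selects you would batch with fetchmany() and commit between batches instead.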
Re: [sqlite] Python sqlite binding: commit inside select loop
I guess I am a bit confused. Igor says it's not possible, but Roger says my example works.

One other point of confusion is that sometimes an interface layer will grab all or a bunch of the rows after a select, even though it may hand them to upper layers one row at a time. For example, with pysqlite, cur.fetchmany() does this. When this happens, it appears as if the transaction & commit occur inside the select loop, but the select & fetch have already finished behind the scenes.

For my particular app, I'm only adding new rows, so there wouldn't be too many weird effects.

Jim

On 6/30/09, Roger Binns <rog...@rogerbinns.com> wrote:
> Jim Wilcoxson wrote:
>> With the Python bindings, an error occurs
>> because a commit resets all pending select statements.
>
> Note that there are some constraints about how SQLite works detailed by
> Igor, but in this particular case you are being stymied by pysqlite.
>
> By default pysqlite tries to parse the SQL you execute and does
> transaction management behind the scenes on your behalf (ie calling
> begin and commit as it deems fit). This is allegedly something required
> by the Python DBAPI standard, although I admit I don't understand it
> myself. Apparently you use the isolation_level parameter with pysqlite
> to control how much of this nonsense it does behind the scenes.
>
> If you use apsw (disclaimer: I am the author) then it doesn't do any
> nonsense behind your back, and you get (1,) (2,) (3,) printed out twice
> without any exceptions, as you were expecting.
>
> Roger

--
Software first. Software lasts!
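The prefetch effect described above is easy to see directly: fetchmany() pulls a batch of rows off the cursor in one call, even if the caller then consumes them one at a time. A small sketch:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('create table t(f)')
con.executemany('insert into t (f) values (?)', [(i,) for i in range(5)])

cur = con.execute('select f from t')
batch = cur.fetchmany(3)   # the layer grabs several rows at once...
print(batch)               # ...before the caller sees any of them
# -> [(0,), (1,), (2,)]
```

So by the time upper layers process the third row, the underlying fetches for all three already happened, which is why a commit interleaved at that point may not conflict with the (already finished) fetch.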
Re: [sqlite] Python sqlite binding: commit inside select loop
Thanks for the comments and explanations everyone - much appreciated. It seems there are a few alternatives I can check out.

Jim

On 6/30/09, Roger Binns <rog...@rogerbinns.com> wrote:
> Jim Wilcoxson wrote:
>> I guess I am a bit confused. Igor says it's not possible, but Roger
>> says my example works.
>
> For the exact code you were running, the error message came from pysqlite
> and not from SQLite. Other posters are not aware of the underlying
> implementation details in pysqlite and the transaction-related work it
> does behind the scenes, and so were telling you about SQLite behaviour in
> general. In summary, the behaviour you see from pysqlite will not
> necessarily match the behaviour you would see if using the SQLite C API
> directly to do the same thing.
>
> It was issues like this that led me to create APSW in the first place,
> because I wanted the exact SQLite semantics and not some random
> "standard" that tried to pretend all database apis behave in exactly the
> same way. I also document which SQLite apis are called from each Python
> api, and the index lets you work in the reverse direction:
>
> http://apsw.googlecode.com/svn/publish/genindex.html#S
>
>> One other point of confusion is that sometimes an interface layer will
>> grab all or a bunch of the rows after a select, even though it may
>> hand them to upper layers one row at a time.
>
> Note that none of the Python apis do that by default; you have to
> call a method to grab all the results. The apis for other databases do
> do that by default, since they use the network and grabbing result rows
> one at a time would be far slower than batching them.
>
>> For my particular app, I'm only adding new rows, so there wouldn't be
>> too many weird effects.
>
> There was a poster (I forget if it was this list or the pysqlite/APSW
> one) who managed to get an infinite loop. Effectively he was changing the
> rowid of each row found (an UPDATE was doing it - rowid wasn't
> explicitly mentioned in the query) while doing a select over all items.
> The constantly updated rowid effectively added the changed row to the
> end of the select results, and so the select carried on forever.
>
> Although the relevant information is in my APSW docs, I think I need to
> pull more of it together and make things clearer.
>
> Roger

--
Software first. Software lasts!
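The runaway-select hazard Roger describes can be sidestepped by snapshotting the keys before mutating them. A sketch (the items table, its columns, and the id-shifting UPDATE are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('create table items(id integer primary key, val)')
con.executemany('insert into items(val) values (?)',
                [('a',), ('b',), ('c',)])

# Snapshot the primary keys first. An UPDATE that moves a row to a new
# rowid inside a live 'select * from items' loop can re-append the row
# to the scan - the infinite loop described above. Iterating over a
# pre-built list of keys cannot be affected by the updates.
ids = [r[0] for r in con.execute('select id from items')]
for i in ids:
    con.execute('update items set id = id + 100 where id = ?', (i,))
con.commit()

print([r[0] for r in con.execute('select id from items order by id')])
# -> [101, 102, 103]
```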
Re: [sqlite] Python sqlite binding: commit inside select loop
I did some checking on the pysqlite mailing list. Apparently the change to reset all cursors before commit is recent, and the reason is that pysqlite or SQLite itself would return bogus rows. There is an example here:

http://article.gmane.org/gmane.comp.python.db.pysqlite.user/2217

Returning old or new data would be one thing, but in this example, totally wrong data is returned. There's no explanation of why it was happening.

--
Software first. Software lasts!
[sqlite] shell .schema command doesn't display attached table schemas
Example:

[jim@mb ~]$ sqlite3 dummy
-- Loading resources from /Users/jim/.sqliterc
SQLite version 3.15.1 2016-11-04 12:08:49 with the Encryption Extension
Copyright 2016 Hipp, Wyrick & Company, Inc.
Enter ".help" for usage hints.
sqlite> create table t1(text);
sqlite> ^D
[jim@mb ~]$ ls -l dummy
-rw-r--r--  1 jim  staff  8192 Feb 21 10:23 dummy
[jim@mb ~]$ sqlite3 dummy2
-- Loading resources from /Users/jim/.sqliterc
SQLite version 3.15.1 2016-11-04 12:08:49 with the Encryption Extension
Copyright 2016 Hipp, Wyrick & Company, Inc.
Enter ".help" for usage hints.
sqlite> create table t2(text);
sqlite> ^D
[jim@mb ~]$ ls -l dummy2
-rw-r--r--  1 jim  staff  8192 Feb 21 10:24 dummy2
[jim@mb ~]$ sqlite3 dummy
-- Loading resources from /Users/jim/.sqliterc
SQLite version 3.15.1 2016-11-04 12:08:49 with the Encryption Extension
Copyright 2016 Hipp, Wyrick & Company, Inc.
Enter ".help" for usage hints.
sqlite> attach dummy2 as d2;
sqlite> .tables
d2.t2  t1
sqlite> .schema
CREATE TABLE t1(text);
sqlite> ^D

HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com
___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
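As a workaround, the attached database's schema can be read directly by querying its own sqlite_master, qualified with the attach name. A sketch in Python (in-memory databases stand in for dummy and dummy2):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('create table t1(text)')
con.execute("attach ':memory:' as d2")
con.execute('create table d2.t2(text)')

# An unqualified sqlite_master covers only the main database;
# each attached schema has its own, reachable by qualified name.
main_tables = [r[0] for r in con.execute(
    "select name from sqlite_master where type = 'table'")]
d2_tables = [r[0] for r in con.execute(
    "select name from d2.sqlite_master where type = 'table'")]
print(main_tables, d2_tables)
# -> ['t1'] ['t2']
```

Selecting the sql column instead of name returns the full CREATE statements, which is roughly what .schema prints for the main database.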