Re: [sqlite] expected performance of selecting large numbers of records
On 1/31/15, Godmar Back wrote:
> Thank you very much for those suggestions!
>
> Select max(rowid) completes immediately; I haven't tried WAL mode or the
> idea of sorting yet.
>
> https://www.sqlite.org/wal.html appears to be down right now, but
> http://www.sqlite.org/wal.html is up.

There was some server maintenance. And for whatever reason, the HTTPS side
didn't come back up after the reboot. I've manually restarted it, so it
should be working now.

For future reference, if www.sqlite.org is ever down, you can always visit
one of the (geographically dispersed) backup sites:

    http://www2.sqlite.org/
    http://www3.sqlite.org/

--
D. Richard Hipp
d...@sqlite.org
Re: [sqlite] expected performance of selecting large numbers of records
Thank you very much for those suggestions!

Select max(rowid) completes immediately; I haven't tried WAL mode or the
idea of sorting yet.

https://www.sqlite.org/wal.html appears to be down right now, but
http://www.sqlite.org/wal.html is up.

On Sat, Jan 31, 2015 at 5:31 AM, Igor Tandetnik wrote:
>
> I'm not sure what you mean by "dump an approximate snapshot".

I meant that I don't care whether records that were added recently, say
during the last few seconds, are included in the total count()/dump of all
records. So I'm fine with an approximate snapshot that may miss a few of
the latest updates.

 - Godmar
Re: [sqlite] expected performance of selecting large numbers of records
On Sat, 31 Jan 2015 00:21:18 -0500, Godmar Back wrote:

>I have a single SQLite 2-column table with a primary TEXT key and a value,
>like so (id TEXT PRIMARY KEY, value TEXT). One process adds new records
>periodically, perhaps 1-10 per minute. The database currently has 850,000
>entries and is 7.3GB large on disk.

You could benchmark an alternative structure with

    CREATE TABLE ... (
        id TEXT PRIMARY KEY,
        value TEXT
    ) WITHOUT ROWID;

http://www.sqlite.org/withoutrowid.html

>I also need to perform bulk queries, which appear to be very slow. On an
>average desktop PC, for instance, a "SELECT COUNT(*)" takes over 5 minutes.

count(*) does a full table scan, which takes time. You could add ON
INSERT/ON DELETE triggers to maintain a running count in a separate table
(a sketch follows below).

>If I want to do a dump of the table, as in "SELECT value FROM " I'll
>quickly get "database is locked" errors. Googling revealed that those are
>because a long-running select keeps a cursor, and thus a read lock on the
>entire database.
>
>I have since rewritten the query using multiple SELECT *
>FROM ... LIMIT a, b where b = 50 and a = 0, 50, 100, 150, ... However,
>it takes 20 hours to fully extract the table's 850,000 records, with only
>minimal per-record processing.

With a LIMIT clause, SELECT will usually be slower, especially when the
ORDER BY is something other than the primary key.

>My question: is this performance expected, or am I doing something wrong?
>Is there a quick way to count and/or dump an approximate snapshot of a
>single 2-column table such as the one shown above, even while keeping the
>database available for concurrent writes?

You probably should use

    PRAGMA journal_mode=WAL;

which allows one writer and many readers concurrently.
http://www.sqlite.org/pragma.html#pragma_journal_mode

Also, have a look at
http://www.sqlite.org/pragma.html#pragma_busy_timeout

>I'm using Ubuntu 12.10 with 3.7.9-2ubuntu1.1 libsqlite3 using the Python
>binding.

Check whether the version reported by

    SELECT sqlite_version();

has the features I mentioned above (WITHOUT ROWID and WAL mode). If
necessary, compile your own library from source:
http://www.sqlite.org/download.html (bottom)

Hope this helps.

--
Regards,
Kees Nuyt
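A minimal sketch of the trigger-maintained count Kees describes, using
Python's sqlite3 module (the binding the original poster mentions). The
table name "kv", the file name "data.db", and the singleton "row_count"
table are assumptions made for the example, not part of the original
schema:

    import sqlite3

    conn = sqlite3.connect("data.db")
    conn.executescript("""
        -- Singleton table holding the running count.
        CREATE TABLE IF NOT EXISTS row_count (
            id INTEGER PRIMARY KEY CHECK (id = 1),
            n  INTEGER NOT NULL
        );

        -- Seed it once with a full scan (slow, but paid only once).
        -- Do this while the writer is idle, so no insert slips in
        -- between the initial COUNT(*) and the trigger creation.
        INSERT OR IGNORE INTO row_count (id, n)
            VALUES (1, (SELECT COUNT(*) FROM kv));

        CREATE TRIGGER IF NOT EXISTS kv_count_ins AFTER INSERT ON kv
        BEGIN
            UPDATE row_count SET n = n + 1 WHERE id = 1;
        END;

        CREATE TRIGGER IF NOT EXISTS kv_count_del AFTER DELETE ON kv
        BEGIN
            UPDATE row_count SET n = n - 1 WHERE id = 1;
        END;
    """)

    # The count is now an O(1) lookup instead of a multi-minute scan:
    print(conn.execute("SELECT n FROM row_count").fetchone()[0])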
Re: [sqlite] expected performance of selecting large numbers of records
On 1/31/2015 12:21 AM, Godmar Back wrote:
> If I want to do a dump of the table, as in "SELECT value FROM " I'll
> quickly get "database is locked" errors. Googling revealed that those are
> because a long-running select keeps a cursor, and thus a read lock on the
> entire database.
>
> I have since rewritten the query using multiple SELECT * FROM ... LIMIT
> a, b where b = 50 and a = 0, 50, 100, 150, ... However, it takes 20 hours
> to fully extract the table's 850,000 records, with only minimal
> per-record processing.

That's a quadratic algorithm you have here. In order to start reading from
the b-th record, SQLite needs to read and discard the previous b records.
You would be better off with a "WHERE id > $LastID ORDER BY id" clause,
where $LastID is the key of the last record in the previous chunk (see the
sketch after this message).

Better still, switch to WAL mode ( https://www.sqlite.org/wal.html ). It
allows readers to work simultaneously with a single writer. Then just run
a single query across the whole table.

> My question: is this performance expected, or am I doing something wrong?
> Is there a quick way to count and/or dump an approximate snapshot of a
> single 2-column table such as the one shown above, even while keeping the
> database available for concurrent writes?

Do you ever delete records from this table? If you don't, you can get a
count with

    select max(rowid) from mytable;

That should be nearly instantaneous. If you do delete rows, and all you
need is a quick count, then I would suggest adding ON INSERT and ON DELETE
triggers that update a count stored in a separate, singleton table.

I'm not sure what you mean by "dump an approximate snapshot".
--
Igor Tandetnik
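A minimal sketch of the keyset pagination Igor suggests, in Python. The
table and column names ("kv", "id", "value"), the file name, and the
chunk size are illustrative assumptions:

    import sqlite3

    conn = sqlite3.connect("data.db")
    conn.execute("PRAGMA journal_mode=WAL")   # one writer, many readers
    conn.execute("PRAGMA busy_timeout=5000")  # wait instead of failing

    last_id = ""  # sorts below every non-empty TEXT key
    chunk = 1000
    while True:
        rows = conn.execute(
            "SELECT id, value FROM kv WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk),
        ).fetchall()
        if not rows:
            break
        for row_id, value in rows:
            pass                 # minimal per-record processing here
        last_id = rows[-1][0]    # resume just after the last key seen

Because "id" is the primary key, each chunk starts with an index seek
rather than a scan from the beginning of the table, so the whole dump does
work roughly linear in the number of rows instead of quadratic.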
[sqlite] expected performance of selecting large numbers of records
I have a single SQLite 2-column table with a primary TEXT key and a value,
like so: (id TEXT PRIMARY KEY, value TEXT). One process adds new records
periodically, perhaps 1-10 per minute. The database currently has 850,000
entries and is 7.3 GB on disk.

I also need to perform bulk queries, which appear to be very slow. On an
average desktop PC, for instance, a "SELECT COUNT(*)" takes over 5
minutes.

If I want to do a dump of the table, as in "SELECT value FROM ", I'll
quickly get "database is locked" errors. Googling revealed that those are
because a long-running select keeps a cursor, and thus a read lock on the
entire database.

I have since rewritten the query using multiple SELECT * FROM ... LIMIT a,
b queries, where b = 50 and a = 0, 50, 100, 150, ... However, it takes 20
hours to fully extract the table's 850,000 records, with only minimal
per-record processing.

My question: is this performance expected, or am I doing something wrong?
Is there a quick way to count and/or dump an approximate snapshot of a
single 2-column table such as the one shown above, even while keeping the
database available for concurrent writes?

I'm using Ubuntu 12.10 with 3.7.9-2ubuntu1.1 libsqlite3 using the Python
binding.

 - Godmar
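For concreteness, a hypothetical reconstruction of the OFFSET-style loop
described above; the actual table name and per-record processing are not
given in the message, so "kv" and "data.db" are stand-ins:

    import sqlite3

    conn = sqlite3.connect("data.db")
    chunk = 50
    offset = 0
    while True:
        # SQLite accepts the MySQL-style "LIMIT offset, count" form
        # used in the message. To serve each chunk, SQLite must walk
        # and discard the first `offset` rows, so the total work grows
        # quadratically with the table size -- hence the 20-hour dump.
        rows = conn.execute(
            "SELECT id, value FROM kv LIMIT ?, ?",
            (offset, chunk),
        ).fetchall()
        if not rows:
            break
        for row_id, value in rows:
            pass                # minimal per-record processing
        offset += chunk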