Re: [sqlite] expected performance of selecting large numbers of records

2015-01-31 Thread Richard Hipp
On 1/31/15, Godmar Back  wrote:
> Thank you very much for those suggestions!
>
> Select max(rowid) completes immediately; I haven't tried WAL mode or the
> idea of sorting yet.
>
> https://www.sqlite.org/wal.html appears to be down right now, but
> http://www.sqlite.org/wal.html is up.
>

There was some server maintenance.  And for whatever reason, the HTTPS
side didn't come back up after the reboot.  I've manually restarted it, so
it should be working now.

For future reference, if www.sqlite.org is ever down, you can always
visit one of the (geographically dispersed) backup sites:

http://www2.sqlite.org/
http://www3.sqlite.org/

--
D. Richard Hipp
d...@sqlite.org


Re: [sqlite] expected performance of selecting large numbers of records

2015-01-31 Thread Godmar Back
Thank you very much for those suggestions!

Select max(rowid) completes immediately; I haven't tried WAL mode or the
idea of sorting yet.

https://www.sqlite.org/wal.html appears to be down right now, but
http://www.sqlite.org/wal.html is up.

On Sat, Jan 31, 2015 at 5:31 AM, Igor Tandetnik  wrote:

>
> I'm not sure what you mean by "dump an approximate snapshot".


I meant that I don't care whether recently added records, say from the
last few seconds, are included in the total count()/dump of all records.
So I'm fine with an approximate snapshot that may be missing a few of the
latest updates.

 - Godmar


Re: [sqlite] expected performance of selecting large numbers of records

2015-01-31 Thread Kees Nuyt
On Sat, 31 Jan 2015 00:21:18 -0500, Godmar Back  wrote:

>I have a single SQLite 2-column table with a primary TEXT key and a value,
>like so (id TEXT PRIMARY KEY, value TEXT). One process adds new records
>periodically, perhaps 1-10 per minute. The database currently has 850,000
>entries and is 7.3 GB on disk.

You could benchmark an alternative structure with 
 CREATE TABLE ... ( id TEXT PRIMARY KEY, value TEXT ) WITHOUT ROWID;
http://www.sqlite.org/withoutrowid.html

>I also need to perform bulk queries, which appear to be very slow. On an
>average desktop PC, for instance, a "SELECT COUNT(*)" takes over 5 minutes.

count(*) does a full table scan, which takes time. You could add
ON INSERT/ON DELETE triggers to maintain a running count in a separate table.
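
A minimal sketch of that approach via the Python binding the original
poster mentioned; the names mytable, row_count, and example.db are
assumptions, since the thread never gives the actual ones:

    import sqlite3

    conn = sqlite3.connect("example.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS mytable (id TEXT PRIMARY KEY, value TEXT);

        -- Singleton table holding the running count.
        CREATE TABLE IF NOT EXISTS row_count (n INTEGER NOT NULL);

        -- Seed it with the current count; this full scan is paid only once.
        INSERT INTO row_count (n)
            SELECT (SELECT COUNT(*) FROM mytable)
            WHERE NOT EXISTS (SELECT 1 FROM row_count);

        -- Keep the count in sync on every insert and delete.
        CREATE TRIGGER IF NOT EXISTS mytable_ins AFTER INSERT ON mytable
            BEGIN UPDATE row_count SET n = n + 1; END;
        CREATE TRIGGER IF NOT EXISTS mytable_del AFTER DELETE ON mytable
            BEGIN UPDATE row_count SET n = n - 1; END;
    """)

    # O(1) lookup instead of a multi-minute full table scan:
    print(conn.execute("SELECT n FROM row_count").fetchone()[0])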

>If I want to do a dump of the table, as in "SELECT value FROM <table>", I'll
>quickly get "database is locked" errors. Googling revealed that those are
>because a long-running select keeps a cursor, and thus a read lock on the
>entire database.
>
> I have since rewritten the query using multiple SELECT *
>FROM ... LIMIT a, b where b = 50 and a = 0, 50, 100, 150, ...  However,
>it takes 20 hours to fully extract the table's 850,000 records, with only
>minimal per-record processing.

With a LIMIT clause, SELECT will usually be slower, especially when the
ORDER BY is on something other than the primary key.

>My question: is this performance expected, or am I doing something wrong?
>Is there a quick way to count and/or dump an approximate snapshot of a
>single 2-column table such as the one shown above, even while keeping the
>database available for concurrent writes?

You should probably use PRAGMA journal_mode=WAL, which allows one writer
and many readers to work concurrently.
http://www.sqlite.org/pragma.html#pragma_journal_mode

Also, have a look at 
http://www.sqlite.org/pragma.html#pragma_busy_timeout
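
For what it's worth, both pragmas can be set from the Python binding the
original poster mentioned; a sketch, with example.db standing in for the
actual database file:

    import sqlite3

    conn = sqlite3.connect("example.db")

    # journal_mode=WAL is persistent: once set, it sticks to the
    # database file, and later connections open in WAL mode too.
    conn.execute("PRAGMA journal_mode=WAL;")

    # Wait up to 5000 ms for a competing lock instead of failing
    # immediately with "database is locked" (per connection).
    conn.execute("PRAGMA busy_timeout=5000;")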

>I'm using Ubuntu 12.10 with 3.7.9-2ubuntu1.1 libsqlite3 using the Python
>binding.

Check whether the version reported by SELECT sqlite_version();
has the features mentioned above (WITHOUT ROWID and WAL mode).
If necessary, compile your own library from source:
http://www.sqlite.org/download.html (bottom of the page)

Hope this helps.

-- 
Regards, 

Kees Nuyt



Re: [sqlite] expected performance of selecting large numbers of records

2015-01-31 Thread Igor Tandetnik

On 1/31/2015 12:21 AM, Godmar Back wrote:

If I want to do a dump of the table, as in "SELECT value FROM <table>", I'll
quickly get "database is locked" errors. Googling revealed that those are
because a long-running select keeps a cursor, and thus a read lock on the
entire database. I have since rewritten the query using multiple SELECT *
FROM ... LIMIT a, b where b = 50 and a = 0, 50, 100, 150, ...  However,
it takes 20 hours to fully extract the table's 850,000 records, with only
minimal per-record processing.


That's a quadratic algorithm you have here. In order to start reading
from offset a, SQLite needs to read and discard the previous a
records. You would be better off with a "WHERE id > $LastID ORDER BY id"
clause, where $LastID is the key of the last record in the previous chunk.
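
A sketch of that keyset-style loop in the Python binding; the names
mytable and example.db and the chunk size of 50 are stand-ins for the
poster's actual values, and it assumes no id is the empty string:

    import sqlite3

    conn = sqlite3.connect("example.db")

    last_id = ""  # every non-empty TEXT key sorts after ""
    while True:
        rows = conn.execute(
            "SELECT id, value FROM mytable "
            "WHERE id > ? ORDER BY id LIMIT 50",
            (last_id,),
        ).fetchall()
        if not rows:
            break
        for row_id, value in rows:
            pass  # per-record processing goes here
        last_id = rows[-1][0]  # key of the last record in this chunk

    conn.close()

Each chunk then begins with an index seek on the primary key, so the cost
per chunk stays roughly constant instead of growing with the offset.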


Better still, switch to WAL mode ( https://www.sqlite.org/wal.html ). It
allows readers to work simultaneously with a single writer. Then just
run a single query across the whole table.



My question: is this performance expected, or am I doing something wrong?
Is there a quick way to count and/or dump an approximate snapshot of a
single 2-column table such as the one shown above, even while keeping the
database available for concurrent writes?


Do you ever delete records from this table? If you don't, you can get a 
count with


select max(rowid) from mytable;

That should be nearly instantaneous. If you do delete rows, and all you
need is a quick count, then I would suggest adding ON INSERT and ON
DELETE triggers that update a count stored in a separate,
singleton table.


I'm not sure what you mean by "dump an approximate snapshot".
--
Igor Tandetnik



[sqlite] expected performance of selecting large numbers of records

2015-01-30 Thread Godmar Back
I have a single SQLite 2-column table with a primary TEXT key and a value,
like so (id TEXT PRIMARY KEY, value TEXT). One process adds new records
periodically, perhaps 1-10 per minute. The database currently has 850,000
entries and is 7.3 GB on disk.

I also need to perform bulk queries, which appear to be very slow. On an
average desktop PC, for instance, a "SELECT COUNT(*)" takes over 5 minutes.

If I want to do a dump of the table, as in "SELECT value FROM <table>", I'll
quickly get "database is locked" errors. Googling revealed that those are
because a long-running select keeps a cursor, and thus a read lock on the
entire database. I have since rewritten the query using multiple SELECT *
FROM ... LIMIT a, b where b = 50 and a = 0, 50, 100, 150, ...  However,
it takes 20 hours to fully extract the table's 850,000 records, with only
minimal per-record processing.

My question: is this performance expected, or am I doing something wrong?
Is there a quick way to count and/or dump an approximate snapshot of a
single 2-column table such as the one shown above, even while keeping the
database available for concurrent writes?

I'm using Ubuntu 12.10 with 3.7.9-2ubuntu1.1 libsqlite3 using the Python
binding.

 - Godmar