Re: [sqlite] count(gid) takes too long

2006-08-30 Thread Martin Jenkins

Sripathi Raj wrote:


This is on a NFS on Windows XP Xeon - 2.8 Ghz, 1 Gig RAM and the database
size is 395 MB. I'm connecting to the database from Perl. There is no
discernible difference b/w Perl and sqlite shell.


NFS? Hmm... You have been following the locking threads, haven't you? ;)

How long does it take to read the entire file? Don't forget that network 
bandwidth is << than disk bandwidth. The disks in my Samba server may do 
>60MB/sec but my 100Mb network restricts this to about 8.5MB/sec. 
That's around 50 seconds for a straight read of your 400MB file before 
you even do anything with it. If you have to search indexes, go back to 
read data etc etc then it's bound to be slow. If your network is at all 
loaded the performance will be even worse.


It takes about 5 seconds to copy my 43MB test database to a Samba server
(1GHz Athlon, new SATA2 disks) over 100Mb CAT5. The SQLite shell takes a
good 9 seconds for your count(gid) query. It's 8.5 seconds from Python.

If I repeat the test the time goes down to about 0.4 seconds - XP cached 
the data.


If I (roughly) simulate lots of network traffic by listing all the files 
on the server the test takes about 28 seconds.


If I "saturate" the network by copying a ~4GB file from the server to my 
PC the test takes about 43 seconds.


That's a >100x slowdown without even trying.

Martin

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] count(gid) takes too long

2006-08-30 Thread Roger Binns
Sripathi Raj wrote:
> This is on a NFS

If you really mean that then you have two problems.

#1 - http://www.sqlite.org/faq.html#q7

The second is that you have network and server latency in addition to
disk latency for all disk accesses.  When iterating over so much data,
those latencies soon add up to a very big number.  If you really need
your client code to run on a different machine than the data is stored
on, then you should a client server database (eg Postgres, MySQL, Oracle).

Roger

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] count(gid) takes too long

2006-08-30 Thread Sripathi Raj

Hi,

This is on a NFS on Windows XP Xeon - 2.8 Ghz, 1 Gig RAM and the database
size is 395 MB. I'm connecting to the database from Perl. There is no
discernible difference b/w Perl and sqlite shell.

Raj

On 8/29/06, Martin Jenkins <[EMAIL PROTECTED]> wrote:


Sripathi Raj wrote:

> Main question: Using DBD-SQLite, select count(gid) from
es_src_media_info
> takes 130 secs. What gives?

Repeatably 0.3 seconds or less here with apsw and python2.4 on Windows
XP on a dual Athlon1600 with ~4 year old disks. Same sort of times in
the sqlite command line shell. Obviously I don't have real data so I
made some up - my database is about 43MB. What are you running on? Do
the times change if you use the sqlite shell? Are your discs fragged?

Martin





-
To unsubscribe, send email to [EMAIL PROTECTED]

-




Re: [sqlite] count(gid) takes too long

2006-08-29 Thread Martin Jenkins

Sripathi Raj wrote:


Main question: Using DBD-SQLite, select count(gid) from es_src_media_info
takes 130 secs. What gives?


Repeatably 0.3 seconds or less here with apsw and python2.4 on Windows
XP on a dual Athlon1600 with ~4 year old disks. Same sort of times in
the sqlite command line shell. Obviously I don't have real data so I
made some up - my database is about 43MB. What are you running on? Do
the times change if you use the sqlite shell? Are your discs fragged?

Martin




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] count(gid) takes too long

2006-08-29 Thread John Stanton

Sripathi Raj wrote:

Hi,

I have a table with 500,000 records. The following is the schema of that
table:

CREATE TABLE ES_SRC_MEDIA_INFO (GID INTEGER PRIMARY KEY AUTOINCREMENT,
MEDIAPATH VARCHAR(256) NOT NULL UNIQUE,
BYTES_USED LONG,
BYTES_ON_DISK LONG,
MTIME LONG,
CTIME LONG,
TYPE VARCHAR(20),
CATEGORY VARCHAR(20),
TIMESTAMP LONG,
JOBID VARCHAR(30)

Main question: Using DBD-SQLite, select count(gid) from es_src_media_info
takes 130 secs. What gives?

I added an unique index on GID and tried it select count(gid) from
es_src_media_info where gid >= 1. It took around 90 seconds this time. Is
there any way to speed this up other than adding a trigger?

Small question: Does sqlite take the column types and length into
consideration while creating the table?

Thanks,

Raj

With or without an index Sqlite has to access every row to get a count. 
 If you want a rapidly accessed count keep a running total.


Sqlite ignores the text field sizes and makes them all of type TEXT.  In 
stores integers as 65 bit signed integers and other numbers as 64 bit 
floating point.  Read up on Sqlite "manifest typing" to learn more on 
how it stores data according to actual type rather than declared type.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-