Re: [sqlite] Terrible performance for one of our tables

2008-11-19 Thread Gerry Snyder
Jens Miltner wrote:
>
> Any ideas why there would be such a _huge_ performance hit after  
> deleting and re-inserting records for a while?
Without deletes/inserts the reads are sequential, and the OS and/or the 
drive/controller are reading ahead for you, hiding much of the disk read 
and seek times.

After fragmentation reads are (to some extent) random, and fancy 
read-aheads do not help.

Instead of deleting, could you just somehow mark obsoleted rows as 
"deleted" (possibly with a new column, but probably some existing column 
could just have a unique value for that)? The file would grow a lot 
larger, and you would be doing a lot of unneeded reads, but the fact 
that they are sequential might more than make up for that.

Just a thought.


Gerry


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Terrible performance for one of our tables

2008-11-19 Thread Alexey Pechnikov
Hello!

In a message dated Wednesday 19 November 2008 15:05:01, D. Richard Hipp wrote:
> That's because "SELECT count(*) FROM t" has to read and count every  
> row in the table - all 1.2 million rows.  And this involves reading  
> all 1.2 million rows from disk.  If each row is 100 bytes in size,  
> that involves reading 120MB.

Is it possible to use only the PRIMARY KEY index for SELECT count(*) queries? For 
other indexes it's possible, e.g.:

explain query plan select count(*) from street where code >= '770' and 
code < '770'+1;
0|0|TABLE street WITH INDEX street_code_idx
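
For comparison, the plain count on the same table falls back to a full table scan (output shown in the pre-3.8 explain-query-plan format, for illustration only):

explain query plan select count(*) from street;
0|0|TABLE street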


Best regards, Alexey.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Terrible performance for one of our tables

2008-11-19 Thread John Stanton
Jens Miltner wrote:
> Am 19.11.2008 um 13:05 schrieb D. Richard Hipp:
> 
>> On Nov 19, 2008, at 3:08 AM, Jens Miltner wrote:
>>
>>> Hi,
>>>
>>> we're seeing terrible performance problems when fetching data from  
>>> one
>>> of our tables:
>>> The table contains roughly 1.2 Million rows and a plain "SELECT
>>> COUNT(*) FROM t" query takes 8 minutes to finish.
>> That's because "SELECT count(*) FROM t" has to read and count every
>> row in the table - all 1.2 million rows.  And this involves reading
>> all 1.2 million rows from disk.  If each row is 100 bytes in size,
>> that involves reading 120MB.
>>
>> An SQLite database does *not* maintain the number of rows in a table
>> as separate metadata.  Many client/server database engines do maintain
>> such metadata.  When I was designing the SQLite file format, I
>> deliberately chose to omit such metadata because including it would
>> slow down inserts and deletes.
> 
> Yes, I know about this and it's usually not a problem.
> It only turned into a problem for this specific table.
> 
> As I mentioned in my original e-mail, after vacuuming the database,  
> the time to run the COUNT(*) query went down to about 5 seconds, so it  
> looks like somehow the database became fragmented enough to seriously  
> hit the performance of the serial row access...
> 
> BTW: we don't actually run the COUNT(*) query, but we see major  
> performance hits after a while with this table, so I figured I'd run  
> the most simple query first and found that the time needed for  
> COUNT(*) goes through the ceiling (from something like 5 seconds to  
> almost 9 minutes -  roughly a 100x time increase) after having deleted  
> and re-inserted rows in that table for a while.
> 
> Any ideas why there would be such a _huge_ performance hit after  
> deleting and re-inserting records for a while?
> Anything we can do to avoid this kind of performance-penalizing 
> fragmentation (other than regularly vacuuming)?
> 
> Thanks,
> 
>
Fragmentation is inherent when there are deletions and insertions, and 
involves orders of magnitude increases in access times.  Just look at 
the example of before and after when a badly fragmented disk is 
defragged to see the effect.

Many years ago we learned from practical experience to avoid such 
situations when designing applications.  High availability applications 
cannot tolerate an increase in chaos during operation.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Terrible performance for one of our tables

2008-11-19 Thread Jens Miltner

Am 19.11.2008 um 13:05 schrieb D. Richard Hipp:

>
> On Nov 19, 2008, at 3:08 AM, Jens Miltner wrote:
>
>> Hi,
>>
>> we're seeing terrible performance problems when fetching data from  
>> one
>> of our tables:
>> The table contains roughly 1.2 Million rows and a plain "SELECT
>> COUNT(*) FROM t" query takes 8 minutes to finish.
>
> That's because "SELECT count(*) FROM t" has to read and count every
> row in the table - all 1.2 million rows.  And this involves reading
> all 1.2 million rows from disk.  If each row is 100 bytes in size,
> that involves reading 120MB.
>
> An SQLite database does *not* maintain the number of rows in a table
> as separate metadata.  Many client/server database engines do maintain
> such metadata.  When I was designing the SQLite file format, I
> deliberately chose to omit such metadata because including it would
> slow down inserts and deletes.

Yes, I know about this and it's usually not a problem.
It only turned into a problem for this specific table.

As I mentioned in my original e-mail, after vacuuming the database,  
the time to run the COUNT(*) query went down to about 5 seconds, so it  
looks like somehow the database became fragmented enough to seriously  
hit the performance of the serial row access...

BTW: we don't actually run the COUNT(*) query, but we see major  
performance hits after a while with this table, so I figured I'd run  
the most simple query first and found that the time needed for  
COUNT(*) goes through the ceiling (from something like 5 seconds to  
almost 9 minutes -  roughly a 100x time increase) after having deleted  
and re-inserted rows in that table for a while.

Any ideas why there would be such a _huge_ performance hit after  
deleting and re-inserting records for a while?
Anything we can do to avoid this kind of performance-penalizing 
fragmentation (other than regularly vacuuming)?

Thanks,


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Terrible performance for one of our tables

2008-11-19 Thread Brad Stiles
>> Out of interest why must it completely READ each entire row?  Is it
>> because '*' has been used?
>
> The database reads a page at a time.  A page is, by default, 1024
> bytes.  A single page might contain multiple rows, or a single large
> row might be spread across multiple pages.

> When rows are only about 100 bytes in size, they will all fit on a
> single page, so the entire row ends up being read, though only the
> header is decoded and interpreted.

So what is the explanation for the 1.2 million row table taking 8
minutes, and the 5 million row table taking 40 seconds, when the rows in
the larger, faster table are larger than those in the smaller, slower one?  The
OP claims that the "average" row size is about 100 bytes for the
smaller table, and a "few hundred" for the larger table, which
suggests that either his estimates of row size are incorrect, or
something else is going on.  Perhaps the variance from "average"
accounts for the difference?

I'm interested because I might have this problem shortly, and I'd like
to understand the causes in order to attempt to avoid this type of
problem.

Brad
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Terrible performance for one of our tables

2008-11-19 Thread D. Richard Hipp

On Nov 19, 2008, at 7:57 AM, Hardy, Andrew wrote:

> Out of interest why must it completely READ each entire row?  Is it  
> because '*' has been used?

The database reads a page at a time.  A page is, by default, 1024  
bytes.  A single page might contain multiple rows, or a single large  
row might be spread across multiple pages.

To do a count(*), only the header of each row actually needs to be  
seen.  For smaller rows, the header and the data are all on the same  
page, so the entire row ends up being read.  However, if a row  
contains large CLOBs or BLOBs and is spread across multiple pages,  
only the first page (the page containing the header) is read.

When rows are only about 100 bytes in size, they will all fit on a  
single page, so the entire row ends up being read, though only the  
header is decoded and interpreted.
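
As a rough sanity check on how much data such a scan has to touch, one can look at the page size and total page count (the numbers in the comments are made up for illustration, and PRAGMA page_count reports pages for the whole file, not a single table):

PRAGMA page_size;     -- e.g. 1024
PRAGMA page_count;    -- e.g. 3000000  -> a worst-case scan of roughly 3 GB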


>
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> ] On Behalf Of D. Richard Hipp
> Sent: 19 November 2008 12:05
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Terrible performance for one of our tables
>
>
> On Nov 19, 2008, at 3:08 AM, Jens Miltner wrote:
>
>> Hi,
>>
>> we're seeing terrible performance problems when fetching data from  
>> one
>> of our tables:
>> The table contains roughly 1.2 Million rows and a plain "SELECT
>> COUNT(*) FROM t" query takes 8 minutes to finish.
>
> That's because "SELECT count(*) FROM t" has to read and count every  
> row in the table - all 1.2 million rows.  And this involves reading  
> all 1.2 million rows from disk.  If each row is 100 bytes in size,  
> that involves reading 120MB.
>
> An SQLite database does *not* maintain the number of rows in a table  
> as separate metadata.  Many client/server database engines do  
> maintain such metadata.  When I was designing the SQLite file  
> format, I deliberately chose to omit such metadata because  
> including it would slow down inserts and deletes.
>
> If you frequently need to know how many rows are in a certain table,  
> use insert and delete triggers to maintain the count yourself in a  
> separate table.  Then just read out the count from the separate  
> table when you need it, rather than recomputing it by reading all  
> 1.2 million rows of the original table.
>
>
> D. Richard Hipp
> [EMAIL PROTECTED]
>
>
>

D. Richard Hipp
[EMAIL PROTECTED]



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Terrible performance for one of our tables

2008-11-19 Thread Hardy, Andrew
Out of interest why must it completely READ each entire row?  Is it because '*' 
has been used?

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of D. Richard Hipp
Sent: 19 November 2008 12:05
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Terrible performance for one of our tables


On Nov 19, 2008, at 3:08 AM, Jens Miltner wrote:

> Hi,
>
> we're seeing terrible performance problems when fetching data from one
> of our tables:
> The table contains roughly 1.2 Million rows and a plain "SELECT
> COUNT(*) FROM t" query takes 8 minutes to finish.

That's because "SELECT count(*) FROM t" has to read and count every row in the 
table - all 1.2 million rows.  And this involves reading all 1.2 million rows 
from disk.  If each row is 100 bytes in size, that involves reading 120MB.

An SQLite database does *not* maintain the number of rows in a table as 
separate metadata.  Many client/server database engines do maintain such 
metadata.  When I was designing the SQLite file format, I deliberately chose 
to omit such metadata because including it would slow down inserts and deletes.

If you frequently need to know how many rows are in a certain table, use insert 
and delete triggers to maintain the count yourself in a separate table.  Then 
just read out the count from the separate table when you need it, rather than 
recomputing it by reading all 1.2 million rows of the original table.


D. Richard Hipp
[EMAIL PROTECTED]



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Terrible performance for one of our tables

2008-11-19 Thread D. Richard Hipp

On Nov 19, 2008, at 3:08 AM, Jens Miltner wrote:

> Hi,
>
> we're seeing terrible performance problems when fetching data from one
> of our tables:
> The table contains roughly 1.2 Million rows and a plain "SELECT
> COUNT(*) FROM t" query takes 8 minutes to finish.

That's because "SELECT count(*) FROM t" has to read and count every  
row in the table - all 1.2 million rows.  And this involves reading  
all 1.2 million rows from disk.  If each row is 100 bytes in size,  
that involves reading 120MB.

An SQLite database does *not* maintain the number of rows in a table  
as separate metadata.  Many client/server database engines do maintain  
such metadata.  When I was designing the SQLite file format, I  
deliberately chose to omit such metadata because including it would  
slow down inserts and deletes.

If you frequently need to know how many rows are in a certain table,  
use insert and delete triggers to maintain the count yourself in a  
separate table.  Then just read out the count from the separate table  
when you need it, rather than recomputing it by reading all 1.2  
million rows of the original table.
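
For illustration, a minimal sketch of that trigger approach (the counter-table name t_rowcount is made up; seed it once from the existing data):

CREATE TABLE t_rowcount(cnt INTEGER NOT NULL);
INSERT INTO t_rowcount(cnt) SELECT count(*) FROM t;

CREATE TRIGGER t_count_ins AFTER INSERT ON t
BEGIN
  UPDATE t_rowcount SET cnt = cnt + 1;
END;

CREATE TRIGGER t_count_del AFTER DELETE ON t
BEGIN
  UPDATE t_rowcount SET cnt = cnt - 1;
END;

-- then, instead of SELECT count(*) FROM t:
SELECT cnt FROM t_rowcount;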


D. Richard Hipp
[EMAIL PROTECTED]



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Terrible performance for one of our tables

2008-11-19 Thread Elefterios Stamatogiannakis
What page size do you use in your database? Maybe you should increase it?

Assuming a fully fragmented table and a page size of 16KB, the I/O 
rate should be (if my calculations are correct) in the 2MB/s range 
(assuming a hard disk with 50MB/s bandwidth and a 7ms seek time).

For a 32KB page size you should get 4MB/s. Because the I/O rate is heavily 
affected by seeks (the rate doubles for 32KB because the hard disk does 
roughly half as many seeks), any additional seeks SQLite performs (e.g. for 
locking) will halve it again, so try to avoid any locking if you can.
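
For reference, a sketch of how one could raise the page size on an existing database (the new size only takes effect when the file is rebuilt, so this costs about as much as the VACUUM already discussed):

PRAGMA page_size = 32768;
VACUUM;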

lefteris

Jens Miltner wrote:
> Hi,
> 
> we're seeing terrible performance problems when fetching data from one  
> of our tables:
> The table contains roughly 1.2 Million rows and a plain "SELECT  
> COUNT(*) FROM t" query takes 8 minutes to finish.
> The table contains 10 fields, but the records average to about 100  
> Bytes of data total. The largest records are ~1k of data, but there  
> are only  a few hundred records with more than 256 Bytes of data.
> 
> Another table in the same database contains over 5 Million records  
> (and probably an average of a few hundred Bytes per record). Counting  
> rows in this table only takes about 40 seconds.
> 
> (All times taken from first run and they're reproducible on several  
> machines).
> 
> 
> When looking at a sample taken while sqlite is executing the COUNT(*)  
> query, almost all the time is spent inside read() calls.
> The average I/O rate is about 500kB - 800kB / seconds read  
> performance. When reading from other databases / other tables, I can  
> easily get 20-30 MB/s overall read performance.
> So, I can only conclude that for some reason, there seems to be some  
> terrible I/O behavior for the data in this specific table.
> 
> Running fs_usage shows that there are almost no consecutive pages  
> being read (i.e. the file offsets are not consecutive), so I suspect  
> that for some reason, this table is fragmented all across the 3 GB  
> database file, which may explain the slow overall read performance.
> 
> Now, there are two things that happened to this table, which didn't  
> happen to the other big table:
> 
> (1) we've updated the schema by appending a couple of columns
> (2) when data is updated, it's updated by first deleting a bunch of  
> rows and then re-inserting the new data. This happens once or twice a  
> day for almost all entries, thus the
> 
> Would either (1) or (2) lead to heavy fragmentation of this kind?
> 
> Vacuuming the database does resolve the performance issue, but  
> vacuuming that specific database took about 75 minutes (MacPro @  
> 3GHz), so it's not something we can do frequently (especially, since  
> the database is used in a server product that has 24/7 uptime) :(
> 
> Is there anything we could do to avoid the table fragmentation in the  
> first place (write queries in a different way, avoid deletions/re- 
> insertions, etc.)?
> (Note that I'm not talking about reclaiming free space here - that's  
> not really an issue for us. We need to avoid the performance hit we've  
> seen which IMHO is caused by the pages for a single table being  
> scattered all over the database file, thus requiring the paging code  
> to jump a lot in the file).
> 
> 
> Thanks,
> -jens
> 
> 
> BTW: this is SQLite 3.6.3, but at least running the COUNT queries is  
> not any faster with 3.5.x. Tests run on Mac OS X.
> 
> 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Terrible performance for one of our tables

2008-11-19 Thread Marcus Grimm
Hi,

Do you have an autoincrement primary key in that table?
If not, try to add one, giving sqlite the chance to
query an internal index rather than the table itself.
I don't see why sqlite should read all the data from that table.

I've read somewhere that count(*) may scan the whole table;
if you already have a primary key (autoincrement) you may
try "SELECT COUNT(PrimaryKey) FROM t".

Or you may try SELECT ID FROM t, step through the results,
and count manually.

I'm interested in the results.

Hope this helps.

Marcus

Jens Miltner wrote:
> Hi,
> 
> we're seeing terrible performance problems when fetching data from one  
> of our tables:
> The table contains roughly 1.2 Million rows and a plain "SELECT  
> COUNT(*) FROM t" query takes 8 minutes to finish.
> The table contains 10 fields, but the records average to about 100  
> Bytes of data total. The largest records are ~1k of data, but there  
> are only  a few hundred records with more than 256 Bytes of data.
> 
> Another table in the same database contains over 5 Million records  
> (and probably an average of a few hundred Bytes per record). Counting  
> rows in this table only takes about 40 seconds.
> 
> (All times taken from first run and they're reproducible on several  
> machines).
> 
> 
> When looking at a sample taken while sqlite is executing the COUNT(*)  
> query, almost all the time is spent inside read() calls.
> The average I/O rate is about 500kB - 800kB / seconds read  
> performance. When reading from other databases / other tables, I can  
> easily get 20-30 MB/s overall read performance.
> So, I can only conclude that for some reason, there seems to be some  
> terrible I/O behavior for the data in this specific table.
> 
> Running fs_usage shows that there are almost no consecutive pages  
> being read (i.e. the file offsets are not consecutive), so I suspect  
> that for some reason, this table is fragmented all across the 3 GB  
> database file, which may explain the slow overall read performance.
> 
> Now, there are two things that happened to this table, which didn't  
> happen to the other big table:
> 
> (1) we've updated the schema by appending a couple of columns
> (2) when data is updated, it's updated by first deleting a bunch of  
> rows and then re-inserting the new data. This happens once or twice a  
> day for almost all entries, thus the
> 
> Would either (1) or (2) lead to heavy fragmentation of this kind?
> 
> Vacuuming the database does resolve the performance issue, but  
> vacuuming that specific database took about 75 minutes (MacPro @  
> 3GHz), so it's not something we can do frequently (especially, since  
> the database is used in a server product that has 24/7 uptime) :(
> 
> Is there anything we could do to avoid the table fragmentation in the  
> first place (write queries in a different way, avoid deletions/re- 
> insertions, etc.)?
> (Note that I'm not talking about reclaiming free space here - that's  
> not really an issue for us. We need to avoid the performance hit we've  
> seen which IMHO is caused by the pages for a single table being  
> scattered all over the database file, thus requiring the paging code  
> to jump a lot in the file).
> 
> 
> Thanks,
> -jens
> 
> 
> BTW: this is SQLite 3.6.3, but at least running the COUNT queries is  
> not any faster with 3.5.x. Tests run on Mac OS X.
> 
> 
> 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Terrible performance for one of our tables

2008-11-19 Thread Jens Miltner
Hi,

we're seeing terrible performance problems when fetching data from one  
of our tables:
The table contains roughly 1.2 Million rows and a plain "SELECT  
COUNT(*) FROM t" query takes 8 minutes to finish.
The table contains 10 fields, but the records average to about 100  
Bytes of data total. The largest records are ~1k of data, but there  
are only  a few hundred records with more than 256 Bytes of data.

Another table in the same database contains over 5 Million records  
(and probably an average of a few hundred Bytes per record). Counting  
rows in this table only takes about 40 seconds.

(All times taken from first run and they're reproducible on several  
machines).


When looking at a sample taken while sqlite is executing the COUNT(*)  
query, almost all the time is spent inside read() calls.
The average I/O rate is about 500kB - 800kB / seconds read  
performance. When reading from other databases / other tables, I can  
easily get 20-30 MB/s overall read performance.
So, I can only conclude that for some reason, there seems to be some  
terrible I/O behavior for the data in this specific table.

Running fs_usage shows that there are almost no consecutive pages  
being read (i.e. the file offsets are not consecutive), so I suspect  
that for some reason, this table is fragmented all across the 3 GB  
database file, which may explain the slow overall read performance.

Now, there are two things that happened to this table, which didn't  
happen to the other big table:

(1) we've updated the schema by appending a couple of columns
(2) when data is updated, it's updated by first deleting a bunch of  
rows and then re-inserting the new data. This happens once or twice a  
day for almost all entries, thus the

Would either (1) or (2) lead to heavy fragmentation of this kind?

Vacuuming the database does resolve the performance issue, but  
vacuuming that specific database took about 75 minutes (MacPro @  
3GHz), so it's not something we can do frequently (especially, since  
the database is used in a server product that has 24/7 uptime) :(

Is there anything we could do to avoid the table fragmentation in the  
first place (write queries in a different way, avoid deletions/re- 
insertions, etc.)?
(Note that I'm not talking about reclaiming free space here - that's  
not really an issue for us. We need to avoid the performance hit we've  
seen which IMHO is caused by the pages for a single table being  
scattered all over the database file, thus requiring the paging code  
to jump a lot in the file).


Thanks,
-jens


BTW: this is SQLite 3.6.3, but at least running the COUNT queries is  
not any faster with 3.5.x. Tests run on Mac OS X.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users