Re: [sqlite] Anomalously slow performance on updates to early entries in a DB

2012-11-09 Thread Clemens Ladisch
O'Toole, Eamonn wrote:
>> From: Clemens Ladisch
>> Is it the DELETE or the INSERT that is slow, or both?
>
> Good question - I don't know
>
>> Are you addressing the records by ROWID or by name?
>
>  By name

So when the DELETE searches for the record to be deleted, it has to scan
the *entire* table ...

... unless you add an index on that field.

(And does this mean that the name, and not the ROWID, should be the
primary key?)

>> How big are the records, compared to the page size?
>> (And what is the output of sqlite3_analyzer?)
>
> I'll post the sqlite_analyzer output in a separate mail.

Nothing interesting there.

You might consider increasing the page size, which reduces page
management overhead slightly.  (All of today's file systems and OS
caches use a page size of 4 KB anyway, so using smaller DB pages does
not make sense.)


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Anomalously slow performance on updates to early entries in a DB

2012-11-09 Thread O'Toole, Eamonn
Sqlite_analyzer output:

/** Disk-Space Utilization Report For container10M.db
*** As of 2012-Nov-09 13:04:53

Page size in bytes 1024  
Pages in the whole file (measured) 3412536   
Pages in the whole file (calculated).. 3412535   
Pages that store data. 3412497 99.999% 
Pages on the freelist (per header) 38   0.001% 
Pages on the freelist (calculated) 39   0.001% 
Pages of auto-vacuum overhead. 00.0% 
Number of tables in the database.. 6 
Number of indices. 3 
Number of named indices... 1 
Automatically generated indices... 2 
Size of the file in bytes. 3494436864
Bytes of user payload stored.. 1768889095  50.6% 

*** Page counts for all tables with their indices 

OBJECT 3412485 99.999% 
SQLITE_MASTER. 60.0% 
INCOMING_SYNC. 20.0% 
OUTGOING_SYNC. 20.0% 
CONTAINER_STAT 10.0% 
SQLITE_SEQUENCE... 10.0% 

*** All tables and indices ***

Percentage of total database..  99.999%  
Number of entries. 2017  
Bytes of storage consumed. 3494396928
Bytes of payload.. 2941393332  84.2% 
Average payload per entry. 147.07
Average unused bytes per entry 19.51 
Average fanout 90.00 
Fragmentation.  75.9%
Maximum payload per entry. 929   
Entries that use overflow. 00.0% 
Index pages used.. 22459 
Primary pages used 3390038   
Overflow pages used... 0 
Total pages used.. 3412497   
Unused bytes on index pages... 2745050 11.9% 
Unused bytes on primary pages. 387423351   11.2% 
Unused bytes on overflow pages 0 
Unused bytes on all pages. 390168401   11.2% 

*** All tables ***

Percentage of total database..  59.3%
Number of entries. 1017  
Bytes of storage consumed. 2071290880
Bytes of payload.. 1768893042  85.4% 
Average payload per entry. 176.89
Average unused bytes per entry 18.62 
Average fanout 90.00 
Fragmentation.  59.3%
Maximum payload per entry. 929   
Entries that use overflow. 00.0% 
Index pages used.. 22459 
Primary pages used 2000286   
Overflow pages used... 0 
Total pages used.. 2022745   
Unused bytes on index pages... 2745050 11.9% 
Unused bytes on primary pages. 1834946059.0% 
Unused bytes on overflow pages 0 
Unused bytes on all pages. 1862396559.0% 

*** All indices **

Percentage of total database..  40.7%
Number of entries. 1000  
Bytes of storage consumed. 1423106048
Bytes of payload.. 1172500290  82.4% 
Average payload per entry. 117.25
Average unused bytes per entry 20.39 
Fragmentation.  99.910%  
Maximum payload per entry. 119   
Entries that use overflow. 00.0% 
Primary pages used 1389752   
Overflow pages used... 0 
Total pages used.. 1389752   
Unused bytes on primary pages. 203928746   14.3% 
Unused bytes on overflow pages 0 
Unused bytes on all pages. 203928746   14.3% 

*** Table CONTAINER_STAT *

Percentage of total database..   0.0%
Number of entries. 1 
Bytes of storage consumed. 1024  
Bytes of payload.. 185 18.1% 
Average payload per entry. 185.00
Average unused bytes per entry 826.00
Maximum payload per entry. 185   
Entries that use overflow. 00.0% 
Primary pages used 1 
Overflow pages used... 0 
Total pages used.. 1 
Unused bytes on primary pages. 826 80.7% 
Unused bytes on overflow pages 0 
Unused bytes on all pages. 826 80.7% 

*** Table INCOMING_SYNC and all its indices **


Re: [sqlite] Anomalously slow performance on updates to early entries in a DB

2012-11-09 Thread O'Toole, Eamonn
Thanks Clemens, comments below:

> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Clemens Ladisch
> Sent: 09 November 2012 12:26
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Anomalously slow performance on updates to early
> entries in a DB
> 
> O'Toole, Eamonn wrote:
> > We are seeing a performance anomaly on updates to existing object
> > records in the SQLite DB.  If the container DB is sufficiently large
> > (about 10 million objects, 3.3GB) then the time to update records at
> > the beginning of the database by order of entry is anomalously high.
> 
> Does VACUUM change anything?
[O'Toole, Eamonn] It improves the times, but there is still a very large 
performance penalty on updates to the beginning of the db.
> 
> > The time is particularly bad for the first approx. 100K records, is
> > somewhat better for the next 900K records, and settles down to
> > a consistent average from approx 1 million records on.  If this
> > consistent average time is around 7 seconds for 10,000 updates, then
> > we see times of about 170 seconds for 10,000 updates on the first
> 100K
> > records.  We don't see this anomalously high update time if we start
> > the updates after the first 1 million records.
> >
> > Note that table updates are performed by first DELETEing the entry
> and
> > then INSERTing the changed entry.
> 
> Is it the DELETE or the INSERT that is slow, or both?
[O'Toole, Eamonn] Good question - I don't know
> 
> Are you addressing the records by ROWID or by name?
[O'Toole, Eamonn] By name
> 
> > Note that Chunky_Ks (IRC) has suggested that B-tree rebalancing could
> > be the fundamental issue.
> 
> How big are the records, compared to the page size?
> (And what is the output of sqlite3_analyzer?)
[O'Toole, Eamonn] I'll post the sqlite_analyzer output in a separate mail.
> 
> What kind of updates are these?  If you're changing many values in the
> 'deleted' column in the same way, it might be possible that the
> ix_object_deleted_name index needs to be rebalanced.
[O'Toole, Eamonn] In our tests we change just the "created_at" field in the db 
entry.  However, more commonly the "size" and "etag" (md5sum) fields would also 
change.
> 
> > CREATE TRIGGER object_update BEFORE UPDATE ON object
> > BEGIN
> > SELECT RAISE(FAIL, 'UPDATE not allowed; DELETE and
> INSERT');
> > END;
> 
> Couldn't you implement this by running the code of both DELETE and
> INSERT triggers for the old and new records?  (This should avoid the
> need to rebalance the object table.)
> 
> 
[O'Toole, Eamonn] Good point.  On the face of it, I think that you're correct.  
I'll try this and see.

> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Anomalously slow performance on updates to early entries in a DB

2012-11-09 Thread O'Toole, Eamonn
Thanks Michael.  Comments below:

> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Black, Michael (IS)
> Sent: 09 November 2012 13:49
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Anomalously slow performance on updates to early
> entries in a DB
> 
> Hmmm...is this a disk head seeking problem?
> You've got several TEXT entries which are either NULL or default to '';
> I did a small experiment: on 3.7.13
> CREATE TABLE container_stat (
> account TEXT,
> container TEXT,
> created_at TEXT,
> put_timestamp TEXT DEFAULT '0',
> delete_timestamp TEXT DEFAULT '0',
> object_count INTEGER,
> bytes_used INTEGER,
> reported_put_timestamp TEXT DEFAULT '0',
> reported_delete_timestamp TEXT DEFAULT '0',
> reported_object_count INTEGER DEFAULT 0,
> reported_bytes_used INTEGER DEFAULT 0,
> hash TEXT default '',
> id TEXT,
> status TEXT DEFAULT '',
> status_changed_at TEXT DEFAULT '0',
> metadata TEXT DEFAULT '',
> x_container_sync_point1 INTEGER DEFAULT -1,
> x_container_sync_point2 INTEGER DEFAULT -1);
> insert into container_stat(id,status,status_changed_at)
> values('id1','status1','');
> insert into container_stat(id,status,status_changed_at)
> values('id2','status2','');
> insert into container_stat(id,status,status_changed_at)
> values('id3','status3','');
> insert into container_stat(id,status,status_changed_at)
> values('id4','status4','');
> insert into container_stat(id,status,status_changed_at)
> values('id5','status5','');
> delete from container_stat where account='id1';
> insert into container_stat(account,status,status_changed_at)
> values('id1','status1 change1 to something
> else','status_changhed_at_1');
> 
> If you look at the order of data BEFORE the delete/insert occurs you
> get this (using "strings");
> id5status5
> id4status4
> id3status3
> id2status2
> id1status1
> 
> After a delete/insert where one of the fields grows in size you get
> this:
> id1status1 change1 to something
> elsestatus_changed_at_1
> id5status5
> id4status4
> id3status3
> id2status2
> id1status1
> 
> So...the probability of an update needing to move due to larger data
> increases the closer you are to the beginning of the database.  Each
> update would reduce the likelihood of that record getting relocated
> again as the fields grow in size.
>
[O'Toole, Eamonn] This definitely sounds like it could be an issue.  There is 
just one container_stat entry (confirmed by sqlite3_analyzer output which I'll 
post later).  So you're saying that the single container_stat table entry is 
potentially being relocated very frequently the closer the update is to the 
beginning of the db?


 
> If you use default values that are 2 sigma of the sizes of your strings
> (or maybe just 2X the average length or so) you would reduce the head
> seek time on updating records.
> 
> It basically sounds possibly like your disk head is slewing from the
> beginning of the file to the end on many of your updates.
[O'Toole, Eamonn] What we do see is that the disk utilization is at 100% during 
updates to the beginning of the file, but the actual throughput in terms of 
MB/s is low - which tallies with your hypothesis.

> 
> Also...would "insert or update" help you a bit?
> http://www.sqlite.org/lang_conflict.html
[O'Toole, Eamonn] I'll have a look at this, thanks.
> 
> 
> 
> Michael D. Black
> Senior Scientist
> Advanced Analytics Directorate
> Advanced GEOINT Solutions Operating Unit
> Northrop Grumman Information Systems
> 
> 
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org]
> on behalf of O'Toole, Eamonn [eamonn.oto...@hp.com]
> Sent: Friday, November 09, 2012 5:07 AM
> To: sqlite-users@sqlite.org
> Subject: EXT :[sqlite] Anomalously slow performance on updates to early
> entries in a DB
> 
> Hello all,
> 
> First of all, I'm a complete novice with respect to SQLite so apologies
> if there is an obvious answer to my question.  I've also posted this
> question in IRC, and it was suggested that I post the question to this
> mailing list.  We're running a test-bed of an object store (Openstack
> Swift) which uses SQLite to record information on the objects that are
> stored in a container.  The 

Re: [sqlite] Anomalously slow performance on updates to early entries in a DB

2012-11-09 Thread Black, Michael (IS)
Hmmm...is this a disk head seeking problem?
You've got several TEXT entries which are either NULL or default to '';
I did a small experiment: on 3.7.13
CREATE TABLE container_stat (
account TEXT,
container TEXT,
created_at TEXT,
put_timestamp TEXT DEFAULT '0',
delete_timestamp TEXT DEFAULT '0',
object_count INTEGER,
bytes_used INTEGER,
reported_put_timestamp TEXT DEFAULT '0',
reported_delete_timestamp TEXT DEFAULT '0',
reported_object_count INTEGER DEFAULT 0,
reported_bytes_used INTEGER DEFAULT 0,
hash TEXT default '',
id TEXT,
status TEXT DEFAULT '',
status_changed_at TEXT DEFAULT '0',
metadata TEXT DEFAULT '',
x_container_sync_point1 INTEGER DEFAULT -1,
x_container_sync_point2 INTEGER DEFAULT -1);
insert into container_stat(id,status,status_changed_at) 
values('id1','status1','');
insert into container_stat(id,status,status_changed_at) 
values('id2','status2','');
insert into container_stat(id,status,status_changed_at) 
values('id3','status3','');
insert into container_stat(id,status,status_changed_at) 
values('id4','status4','');
insert into container_stat(id,status,status_changed_at) 
values('id5','status5','');
delete from container_stat where account='id1';
insert into container_stat(account,status,status_changed_at) 
values('id1','status1 change1 to something else','status_changhed_at_1');

If you look at the order of data BEFORE the delete/insert occurs you get this 
(using "strings");
id5status5
id4status4
id3status3
id2status2
id1status1

After a delete/insert where one of the fields grows in size you get this:
id1status1 change1 to something 
elsestatus_changed_at_1
id5status5
id4status4
id3status3
id2status2
id1status1

So...the probability of an update needing to move due to larger data increases 
the closer you are to the beginning of the database.  Each update would reduce 
the likelihood of that record getting relocated again as the fields grow in 
size.

If you use default values that are 2 sigma of the sizes of your strings (or 
maybe just 2X the average length or so) you would reduce the head seek time on 
updating records.

It basically sounds possibly like your disk head is slewing from the beginning 
of the file to the end on many of your updates. 

Also...would "insert or update" help you a bit?
http://www.sqlite.org/lang_conflict.html



Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems


From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of O'Toole, Eamonn [eamonn.oto...@hp.com]
Sent: Friday, November 09, 2012 5:07 AM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] Anomalously slow performance on updates to early entries 
in a DB

Hello all,

First of all, I'm a complete novice with respect to SQLite so apologies if 
there is an obvious answer to my question.  I've also posted this question in 
IRC, and it was suggested that I post the question to this mailing list.  We're 
running a test-bed of an object store (Openstack Swift) which uses SQLite to 
record information on the objects that are stored in a container.  The table 
itself (called "object") is very simple, it contains the modification time of 
the object, the size of object, the md5sum of the object, and the content-type 
of the object.  We are seeing a performance anomaly on updates to existing 
object records in the SQLite DB.  If the container DB is sufficiently large 
(about 10 million objects, 3.3GB) then the time to update records at the 
beginning of the database by order of entry is anomalously high.  The time is 
particularly bad for the first approx. 100K records, is somewhat better for the 
next 900K records, and settles down to a consistent average from
  approx 1 million records on.  If this consistent average time is around 7 
seconds for 10,000 updates, then we see times of about 170 seconds for 10,000 
updates on the first 100K records.  We don't see this anomalously high update 
time if we start the updates after the first 1 million records.

Note that table updates are performed by first DELETEing the entry and then 
INSERTing the changed entry.

Does anybody have any idea why we're seeing this behaviour, and what we can do 
to fix it?

Re: [sqlite] Anomalously slow performance on updates to early entries in a DB

2012-11-09 Thread Clemens Ladisch
O'Toole, Eamonn wrote:
> We are seeing a performance anomaly on updates to existing object
> records in the SQLite DB.  If the container DB is sufficiently large
> (about 10 million objects, 3.3GB) then the time to update records at
> the beginning of the database by order of entry is anomalously high.

Does VACUUM change anything?

> The time is particularly bad for the first approx. 100K records, is
> somewhat better for the next 900K records, and settles down to
> a consistent average from approx 1 million records on.  If this
> consistent average time is around 7 seconds for 10,000 updates, then
> we see times of about 170 seconds for 10,000 updates on the first 100K
> records.  We don't see this anomalously high update time if we start
> the updates after the first 1 million records.
>
> Note that table updates are performed by first DELETEing the entry and
> then INSERTing the changed entry.

Is it the DELETE or the INSERT that is slow, or both?

Are you addressing the records by ROWID or by name?

> Note that Chunky_Ks (IRC) has suggested that B-tree rebalancing could
> be the fundamental issue.

How big are the records, compared to the page size?
(And what is the output of sqlite3_analyzer?)

What kind of updates are these?  If you're changing many values in the
'deleted' column in the same way, it might be possible that the
ix_object_deleted_name index needs to be rebalanced.

> CREATE TRIGGER object_update BEFORE UPDATE ON object
> BEGIN
> SELECT RAISE(FAIL, 'UPDATE not allowed; DELETE and INSERT');
> END;

Couldn't you implement this by running the code of both DELETE and
INSERT triggers for the old and new records?  (This should avoid the
need to rebalance the object table.)


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users