Hmmm...is this a disk head seeking problem? You've got several TEXT columns which are either NULL or default to ''. I ran a small experiment on 3.7.13:

CREATE TABLE container_stat (
    account TEXT,
    container TEXT,
    created_at TEXT,
    put_timestamp TEXT DEFAULT '0',
    delete_timestamp TEXT DEFAULT '0',
    object_count INTEGER,
    bytes_used INTEGER,
    reported_put_timestamp TEXT DEFAULT '0',
    reported_delete_timestamp TEXT DEFAULT '0',
    reported_object_count INTEGER DEFAULT 0,
    reported_bytes_used INTEGER DEFAULT 0,
    hash TEXT DEFAULT '00000000000000000000000000000000',
    id TEXT,
    status TEXT DEFAULT '',
    status_changed_at TEXT DEFAULT '0',
    metadata TEXT DEFAULT '',
    x_container_sync_point1 INTEGER DEFAULT -1,
    x_container_sync_point2 INTEGER DEFAULT -1
);
insert into container_stat(id,status,status_changed_at) values('id1','status1','');
insert into container_stat(id,status,status_changed_at) values('id2','status2','');
insert into container_stat(id,status,status_changed_at) values('id3','status3','');
insert into container_stat(id,status,status_changed_at) values('id4','status4','');
insert into container_stat(id,status,status_changed_at) values('id5','status5','');
delete from container_stat where account='id1';
insert into container_stat(account,status,status_changed_at) values('id1','status1 change1 to something else','status_changed_at_1');
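For reference, the experiment above can be reproduced with Python's sqlite3 module and a raw scan of the database file (a rough stand-in for "strings"). This is a sketch: the schema is trimmed to the relevant columns, and the temp-file path is illustrative.

```python
# Reproduce the delete/insert-with-growth experiment and look at where
# the row payloads end up in the raw database file.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "exp.db")
con = sqlite3.connect(path)
con.execute("""CREATE TABLE container_stat (
    id TEXT,
    status TEXT,
    status_changed_at TEXT DEFAULT '0',
    hash TEXT DEFAULT '00000000000000000000000000000000')""")
for i in range(1, 6):
    con.execute("INSERT INTO container_stat(id, status, status_changed_at) "
                "VALUES (?, ?, '')", (f"id{i}", f"status{i}"))
con.commit()

# Replace id1 with a row whose fields have grown in size.
con.execute("DELETE FROM container_stat WHERE id = 'id1'")
con.execute("INSERT INTO container_stat(id, status, status_changed_at) "
            "VALUES ('id1', 'status1 change1 to something else', "
            "'status_changed_at_1')")
con.commit()
con.close()

with open(path, "rb") as f:
    raw = f.read()

# Byte offsets of the grown payload vs. an untouched row's payload.
grown = raw.find(b"status1 change1 to something else")
untouched = raw.find(b"status5")
print(grown, untouched)
```

Comparing the two offsets shows whether the re-inserted, larger record landed in the same region of the file as the rows that were never touched.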
If you look at the order of the data BEFORE the delete/insert occurs you get this (using "strings"):

000000000000000000000000000000000000id5status5
000000000000000000000000000000000000id4status4
000000000000000000000000000000000000id3status3
000000000000000000000000000000000000id2status2
000000000000000000000000000000000000id1status1

After a delete/insert where one of the fields grows in size you get this:

id1000000000000000000000000000000000000status1 change1 to something elsestatus_changed_at_1
000000000000000000000000000000000000id5status5
000000000000000000000000000000000000id4status4
000000000000000000000000000000000000id3status3
000000000000000000000000000000000000id2status2
000000000000000000000000000000000000id1status1

So...the probability that an update has to relocate a record because its data grew is higher the closer the record is to the beginning of the database, and each such update reduces the likelihood of that record being relocated again as its fields grow. If you use default values sized at around two standard deviations of your string lengths (or maybe just 2X the average length or so), you would reduce the head seek time when updating records. It basically sounds like your disk head may be slewing from the beginning of the file to the end on many of your updates.

Also...would "INSERT OR REPLACE" help you a bit? http://www.sqlite.org/lang_conflict.html

Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems

________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of O'Toole, Eamonn [eamonn.oto...@hp.com]
Sent: Friday, November 09, 2012 5:07 AM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] Anomalously slow performance on updates to early entries in a DB

Hello all,

First of all, I'm a complete novice with respect to SQLite, so apologies if there is an obvious answer to my question.
I've also posted this question in IRC, and it was suggested that I post it to this mailing list.

We're running a test-bed of an object store (Openstack Swift) which uses SQLite to record information on the objects that are stored in a container. The table itself (called "object") is very simple: it contains the modification time of the object, the size of the object, the md5sum of the object, and the content-type of the object.

We are seeing a performance anomaly on updates to existing object records in the SQLite DB. If the container DB is sufficiently large (about 10 million objects, 3.3GB), then the time to update records at the beginning of the database, by order of entry, is anomalously high. The time is particularly bad for the first approx. 100K records, is somewhat better for the next 900K records, and settles down to a consistent average from approx. 1 million records on. Where this consistent average is around 7 seconds per 10,000 updates, we see times of about 170 seconds per 10,000 updates on the first 100K records. We don't see this anomalously high update time if we start the updates after the first 1 million records. Note that table updates are performed by first DELETEing the entry and then INSERTing the changed entry.

Does anybody have any idea why we're seeing this behaviour, and what we can do to fix it? Note that Chunky_Ks (IRC) has suggested that B-tree rebalancing could be the fundamental issue.

Thanks.
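For concreteness, the DELETE-then-INSERT update pattern described above looks roughly like the sketch below. This is an illustration against a trimmed-down object table, not Swift's actual code; wrapping both statements in one transaction is assumed so that each update lands in a single commit.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE object (
    ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    created_at TEXT,
    size INTEGER,
    etag TEXT,
    deleted INTEGER DEFAULT 0)""")
con.execute("INSERT INTO object(name, created_at, size, etag) "
            "VALUES ('obj1', '100.0', 10, 'aaa')")
con.commit()

# The update path: DELETE the old record, then INSERT the changed one,
# inside a single transaction (the connection context manager commits).
with con:
    con.execute("DELETE FROM object WHERE name = 'obj1'")
    con.execute("INSERT INTO object(name, created_at, size, etag) "
                "VALUES ('obj1', '200.0', 20, 'bbb')")

row = con.execute(
    "SELECT created_at, size FROM object WHERE name = 'obj1'").fetchone()
print(row)  # ('200.0', 20)
```

If the replacement row is larger than the one it replaces, it may not fit back into the freed slot, which is exactly the relocation effect discussed in the reply above.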
SQL Schema follows:

Database: main
Tables: container_stat, incoming_sync, object, outgoing_sync

container_stat:
CREATE TABLE container_stat (
    account TEXT,
    container TEXT,
    created_at TEXT,
    put_timestamp TEXT DEFAULT '0',
    delete_timestamp TEXT DEFAULT '0',
    object_count INTEGER,
    bytes_used INTEGER,
    reported_put_timestamp TEXT DEFAULT '0',
    reported_delete_timestamp TEXT DEFAULT '0',
    reported_object_count INTEGER DEFAULT 0,
    reported_bytes_used INTEGER DEFAULT 0,
    hash TEXT DEFAULT '00000000000000000000000000000000',
    id TEXT,
    status TEXT DEFAULT '',
    status_changed_at TEXT DEFAULT '0',
    metadata TEXT DEFAULT '',
    x_container_sync_point1 INTEGER DEFAULT -1,
    x_container_sync_point2 INTEGER DEFAULT -1
);

incoming_sync:
CREATE TABLE incoming_sync (
    remote_id TEXT UNIQUE,
    sync_point INTEGER,
    updated_at TEXT DEFAULT 0
);
CREATE TRIGGER incoming_sync_insert AFTER INSERT ON incoming_sync
BEGIN
    UPDATE incoming_sync
    SET updated_at = STRFTIME('%s', 'NOW')
    WHERE ROWID = new.ROWID;
END;
CREATE TRIGGER incoming_sync_update AFTER UPDATE ON incoming_sync
BEGIN
    UPDATE incoming_sync
    SET updated_at = STRFTIME('%s', 'NOW')
    WHERE ROWID = new.ROWID;
END;

object:
CREATE TABLE object (
    ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    created_at TEXT,
    size INTEGER,
    content_type TEXT,
    etag TEXT,
    deleted INTEGER DEFAULT 0
);
CREATE INDEX ix_object_deleted_name ON object (deleted, name);
CREATE TRIGGER object_delete AFTER DELETE ON object
BEGIN
    UPDATE container_stat
    SET object_count = object_count - (1 - old.deleted),
        bytes_used = bytes_used - old.size,
        hash = chexor(hash, old.name, old.created_at);
END;
CREATE TRIGGER object_insert AFTER INSERT ON object
BEGIN
    UPDATE container_stat
    SET object_count = object_count + (1 - new.deleted),
        bytes_used = bytes_used + new.size,
        hash = chexor(hash, new.name, new.created_at);
END;
CREATE TRIGGER object_update BEFORE UPDATE ON object
BEGIN
    SELECT RAISE(FAIL, 'UPDATE not allowed; DELETE and INSERT');
END;

outgoing_sync:
CREATE TABLE outgoing_sync (
    remote_id TEXT UNIQUE,
    sync_point INTEGER,
    updated_at TEXT DEFAULT 0
);
CREATE TRIGGER outgoing_sync_insert AFTER INSERT ON outgoing_sync
BEGIN
    UPDATE outgoing_sync
    SET updated_at = STRFTIME('%s', 'NOW')
    WHERE ROWID = new.ROWID;
END;
CREATE TRIGGER outgoing_sync_update AFTER UPDATE ON outgoing_sync
BEGIN
    UPDATE outgoing_sync
    SET updated_at = STRFTIME('%s', 'NOW')
    WHERE ROWID = new.ROWID;
END;

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
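As a side note on the schema: the chexor() referenced by the object triggers is a user-defined function that Swift registers on the connection, not a SQLite built-in. The sketch below exercises the object triggers with a stand-in chexor (the XOR-of-md5 implementation here is an assumption, not Swift's verified code) and shows the object_update trigger rejecting UPDATEs, which is why updates go through DELETE + INSERT.

```python
import hashlib
import sqlite3

def chexor(old, name, timestamp):
    # Stand-in for Swift's chexor(): XOR the running hash with the md5
    # of the record's name and timestamp. Illustrative only.
    new = hashlib.md5(f"{name}-{timestamp}".encode()).hexdigest()
    return "%032x" % (int(old, 16) ^ int(new, 16))

con = sqlite3.connect(":memory:")
con.create_function("chexor", 3, chexor)
con.executescript("""
CREATE TABLE container_stat (
    object_count INTEGER, bytes_used INTEGER,
    hash TEXT DEFAULT '00000000000000000000000000000000');
CREATE TABLE object (
    ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT, created_at TEXT, size INTEGER, deleted INTEGER DEFAULT 0);
CREATE TRIGGER object_insert AFTER INSERT ON object BEGIN
    UPDATE container_stat
    SET object_count = object_count + (1 - new.deleted),
        bytes_used = bytes_used + new.size,
        hash = chexor(hash, new.name, new.created_at);
END;
CREATE TRIGGER object_update BEFORE UPDATE ON object BEGIN
    SELECT RAISE(FAIL, 'UPDATE not allowed; DELETE and INSERT');
END;
INSERT INTO container_stat (object_count, bytes_used) VALUES (0, 0);
""")

# The insert trigger maintains the container's running stats.
con.execute("INSERT INTO object(name, created_at, size) VALUES ('o1', '1.0', 5)")
stats = con.execute(
    "SELECT object_count, bytes_used FROM container_stat").fetchone()
print(stats)  # (1, 5)

# Direct UPDATEs are refused by the object_update trigger.
try:
    con.execute("UPDATE object SET size = 6 WHERE name = 'o1'")
    updated = True
except sqlite3.DatabaseError:
    updated = False
print(updated)  # False
```

Because every insert and delete on object fires a trigger that rewrites the single container_stat row, each object update also touches the stats page, which adds to the write traffic described in the thread.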