I know the insert into an indexed table performance has been discussed
here before, but I wanted to add to it and ask some questions.

I'm benchmarking loading INSERT statements, dumped to a 3.7GB file,
into an empty database with sqlite.exe:

BEGIN TRANSACTION;
CREATE TABLE Articles(RID integer primary key, RefCnt integer, MessageID
text, DomainID integer, Date integer, Lines integer, Flags integer,
PartSize integer, FileSize integer, SetSize integer, CurPart integer,
MaxPart integer, CurFile integer, MaxFile integer, Subject integer,
Author integer, FileName integer, DlDir integer, PrevPart integer,
NextPart integer, Score integer);
CREATE INDEX Articles_Index ON Articles (MessageID, DomainID);
INSERT INTO Articles
VALUES(1,3,'dXOuc.4884$pX3.1156',1,817862976,2521,1328512,317646,10499202,0,2,33,0,0,1,1,1,0,109099,109100,0);
INSERT INTO Articles
VALUES(2,3,'z3Puc.4938$pX3.1969',1,817863259,2524,1328512,318024,10523016,0,22,33,0,0,1,1,2,0,109676,109701,0);
-- etc --
INSERT INTO Articles
VALUES(10602742,1,'8OWdnQHIA84fFSLdRVn-uA',11,818109133,3050,1327744,384300,384300,0,1,40,23,36,84580,3412,243311,0,0,10602743,0);
INSERT INTO Articles
VALUES(10602743,1,'8OWdnQDIA84aFSLdRVn-uA',11,818109135,3051,1065600,384426,384426,0,2,40,23,36,84580,3412,243311,0,10602742,0,0);
COMMIT;

The operation starts off loading data at about 500K/sec, but drops to
about 50K/sec over 30 minutes and stays around there (see the attached
graph, if it comes through).  At that point about 1.2GB of the 3.7GB
has loaded.  The full 3.7GB takes about 16 hours to complete.

Without the index, the whole 3.7GB loads in about 20 minutes.
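For comparison, the only workaround I know of is to build the index after
the bulk load instead of before it, so the keys get sorted in one pass
rather than inserted one at a time into a growing B-tree.  A minimal
sketch using Python's sqlite3 module (table trimmed to three columns,
data made up, in-memory DB standing in for the real file):

```python
import sqlite3

# autocommit mode so BEGIN/COMMIT can be issued explicitly, as in the dump
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE Articles(RID integer primary key, "
            "MessageID text, DomainID integer)")

# fabricated stand-in rows
rows = [(i, "msgid-%d" % i, 1) for i in range(1, 1001)]

con.execute("BEGIN")
con.executemany("INSERT INTO Articles VALUES(?,?,?)", rows)
con.execute("COMMIT")

# Index built after the load: keys are sorted once instead of
# being merged piecemeal into the index during every INSERT.
con.execute("CREATE INDEX Articles_Index ON Articles (MessageID, DomainID)")

count = con.execute("SELECT COUNT(*) FROM Articles "
                    "WHERE MessageID = 'msgid-500' AND DomainID = 1"
                    ).fetchone()[0]
print(count)  # 1
```

That matches what I see here: the same data without the index loads 50x
faster, so the question is really about the cost of maintaining the index
incrementally.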

CPU starts at 100% but soon starts to come down.  I expect my hard
drives to deliver about 50MB/sec sequentially, about 5MB/sec for random
access, with an 8-10 msec seek time.  Memory allocation is stable,
sqlite.exe is not paging, and the thread is either "running" or
"waiting for executive".  Using taskinfo2003, the only files with
activity are the input SQL file and the output SQLite DB file; the temp
and journal files are quiet, with no movement.  The file position for
the SQL input file grows linearly toward its end.  The file position
for the SQLite DB file is another story.

As the DB grows, you can see the locality deteriorate by watching the
file position (Taskinfo can give half-second samples).  It starts out
spending most of its time growing the file at the end.  As time goes
on, it starts jumping to positions back in the file, and after 30
minutes or so it spends most of its time jumping around.  I presume it
is looking for where new records belong in the index ordering.
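If that guess is right, pre-sorting the inserts by the index key
(MessageID, DomainID) should keep the index writes localized, since each
insert would land at or after the previous index position instead of on
a random B-tree page.  A small sketch of the idea (trimmed table,
fabricated keys; I haven't verified it pays off at 3.7GB):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE Articles(RID integer primary key, "
            "MessageID text, DomainID integer)")
con.execute("CREATE INDEX Articles_Index ON Articles (MessageID, DomainID)")

# fabricated rows whose MessageIDs arrive in scrambled order,
# like the real dump
rows = [(i, "msgid-%04d" % (i * 7919 % 1000), 1) for i in range(1, 1001)]

# Sort by the index key before inserting, so index insertions
# proceed in key order rather than seeking all over the B-tree.
rows.sort(key=lambda r: (r[1], r[2]))

con.execute("BEGIN")
con.executemany("INSERT INTO Articles VALUES(?,?,?)", rows)
con.execute("COMMIT")

loaded = con.execute("SELECT COUNT(*) FROM Articles").fetchone()[0]
print(loaded)  # 1000
```

Of course that requires an external sort of the 3.7GB dump first, which
may or may not be cheaper than the 16 hours I'm seeing.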

So, what is the limiting resource?  The seek time?

Can the index be configured to be kept more in memory?  Can it be kept
in a different file, and would that help (I suspect only if that file
is on a different physical disk)?
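For the in-memory part, the closest knob I'm aware of is PRAGMA
cache_size, which sets how many pages the connection keeps cached, so a
larger value should let more of the index B-tree stay resident during
the load.  A sketch (the value 100000 is just a guess, to be tuned to
available RAM):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # stand-in for the real DB file

# Default page cache is small (historically ~2000 pages); raise it so
# more of the index stays in memory instead of being re-read from disk.
con.execute("PRAGMA cache_size = 100000")  # guessed value, tune to RAM

size = con.execute("PRAGMA cache_size").fetchone()[0]
print(size)  # 100000
```

Whether that is enough to hold the hot part of a multi-GB index is
exactly what I'm unsure about.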


TIA


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
