Re: [sqlite] Curious performance issue with large number of inserts

Joe Wilson Sat, 05 May 2007 08:00:21 -0700

--- Joe Wilson <[EMAIL PROTECTED]> wrote:
> > create table filemap(id integer primary key,
> >    uid integer, gid integer, mtime integer,
> >    vol integer,
> >    path varchar(1024));
> > 
> > It has no indices built yet.
> > 
> > I'm adding quite a lot of records to it using a perl script which  
> > generates SQL like this:
> > 
> > begin;
> >   insert into filemap values(null, 1, 1, 1111, 0, "/path/to/file");
> >   ... 9999 more like this ...
> > commit;
> > ... repeat above ...
> > 
> > The uid, gid and mtime fields vary obviously, but there are very many  
> > paths for each uid/gid pair.  The idea is that I need to be able to  
> > say `show me all the files owned by UID x on volume y?', and we have  
> > enough data that awk can't hack it.
> > 
> > before doing this I've done a
> > 
> > pragma synchronous=off;
> > 
> > All this is just being piped into sqlite3 (I know I should use the  
> > proper interface, but it's a very quick & dirty hack).
> > 
> > I have about 16,000,000 records.  When adding them it goes really  
> > quickly for about the first 1,000,000 (using the big transaction  
> > trick made it much faster, as did the synchronous=off thing).  But  
> > then it slows down dramatically, perhaps by a factor of 100 or 1000  
> > or something.  I've actually left it running, but I'm not convinced  
> > it will have done all 16 million by Monday.
> > 
> > I have not looked at what the program is doing in the sense of system  
> > calls or any more detailed profiling.  It is clear that disk activity  
> > falls right off when it becomes slow.
> > 
> > Am I doing anything obviously stupid here?  I suspect I must be.
> 
> The batch insert you describe ought to be pretty fast since you're only
> appending data.
> 
> This is a guess - instead of inserting NULL for the INTEGER PRIMARY KEY 
> column try assigning an increasing number for each new row.
> This would avoid an OP_NewRowid per insert, which I would not think to
> be slow, but it's worth trying.


Ignore the idea above - an insert of NULL or an incrementing integer for the
INTEGER PRIMARY KEY yields the same timings.

I can't reproduce your problem. I can insert 16M records into your table
schema in 25 minutes on a 5 year old Windows machine. The sqlite3 process
had peak RAM usage of less than 20M.

If you had additional indexes on the table prior to insert (other than the 
INTEGER PRIMARY KEY), that could explain the slow results you are seeing.


$ cat ins.pl

print "
pragma page_size=8192;
pragma temp_store=memory;
pragma synchronous=off;
create table filemap(id integer primary key, uid integer, gid integer,
                     mtime integer, vol integer, path varchar(1024));
begin;
";
for (my $i = 1; $i <= 16000000; ++$i) {
  if ($i % 1000 == 0) { print "commit;\nbegin;\n"; }
  print "insert into filemap values(null,1,1,1111,0,'/path/to/file');\n";
}
print "commit;\n";

$ rm -f ins.db ; perl ins.pl | time ./sqlite3 ins.db
1389.89user 61.98system 25:13.57elapsed 95%CPU (0avgtext+0avgdata 
218880maxresident)k
0inputs+0outputs (4938major+0minor)pagefaults 0swaps

$ ls -l ins.db
-rw-r--r-- 1 User Nobody 512417792 May  5 10:47 ins.db

$ ./sqlite3 ins.db "select count(*) from filemap"
16000000


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Re: [sqlite] Curious performance issue with large number of inserts

Reply via email to