I've a perl script that reads rows from a file, does some processing and 
then insert the resulting data in a db, one INSERT per row.
Usually the input file is made of hundreds of thousands, thus a typical 
run consists of hundreds of thousands of INSERTions.

p...@ferret:~/log_analyzer >wc -l 
XXX-2009-02-16_1400-success.log.bz2.done.ec.ed
793835 XXX-2009-02-16_1400-success.log.bz2.done.ec.ed

The db is extremly simple, with just one table:

sqlite> .schema
CREATE TABLE envelope (
    smtp_id unsigned int(10), date date, time time, mailq_sndr unsigned 
int(10), delivery_sndr unsigned int(10), customer_sndr unsigned int(10),
    rpath varchar(50), domain_rcvr varchar(25), user_rcvr varchar(25), 
size unsigned int(10), res unsigned int(10), msg varchar(250),
    ip unsigned int(10), vsmtp varchar(50), retries int(10)
);

p...@ferret:~/log_analyzer >date ; ./log_parser_ms_sqlite.pl 2 
XXX-2009-02-16_1400-success.log.bz2.done.ec.ed ; date
Tue Feb 17 10:42:34 CET 2009
Tue Feb 17 10:45:47 CET 2009

the performance was good until i decided to build indexes:

sqlite> delete from envelope;
sqlite> CREATE INDEX i1 on envelope (user_rcvr);

p...@ferret:~/log_analyzer >date ; ./log_parser_ms_sqlite.pl 2 
XXX-2009-02-16_1400-success.log.bz2.done.ec.ed ; date
Tue Feb 17 10:47:48 CET 2009
Tue Feb 17 11:01:53 CET 2009

sqlite> delete from envelope;
sqlite> CREATE INDEX i2 on envelope (domain_rcvr);

p...@ferret:~/log_analyzer >date ; ./log_parser_ms_sqlite.pl 2 
XXX-2009-02-16_1400-success.log.bz2.done.ec.ed ; date
Tue Feb 17 11:06:51 CET 2009
Tue Feb 17 11:31:21 CET 2009

and with 8-12 files created per hour, the db is always running too late 
to be usable.

Insertions are made in group of X entries, where X is usually a power of 
2 (default = 4096):

begin;
INSERT into ....
[repeat 4096 times]
commit;

I already played with PRAGMA synchronous = OFF and journal = MEMORY, but 
in the end
the speedup was barely noticeable (and all the above examples were made 
with these pragmas).
any other idea i could apply to squeeze some more performance?

-- 

bye,
P.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to