I've a perl script that reads rows from a file, does some processing and
then insert the resulting data in a db, one INSERT per row.
Usually the input file is made of hundreds of thousands, thus a typical
run consists of hundreds of thousands of INSERTions.
p...@ferret:~/log_analyzer >wc -l
XXX-2009-02-16_1400-success.log.bz2.done.ec.ed
793835 XXX-2009-02-16_1400-success.log.bz2.done.ec.ed
The db is extremly simple, with just one table:
sqlite> .schema
CREATE TABLE envelope (
smtp_id unsigned int(10), date date, time time, mailq_sndr unsigned
int(10), delivery_sndr unsigned int(10), customer_sndr unsigned int(10),
rpath varchar(50), domain_rcvr varchar(25), user_rcvr varchar(25),
size unsigned int(10), res unsigned int(10), msg varchar(250),
ip unsigned int(10), vsmtp varchar(50), retries int(10)
);
p...@ferret:~/log_analyzer >date ; ./log_parser_ms_sqlite.pl 2
XXX-2009-02-16_1400-success.log.bz2.done.ec.ed ; date
Tue Feb 17 10:42:34 CET 2009
Tue Feb 17 10:45:47 CET 2009
the performance was good until i decided to build indexes:
sqlite> delete from envelope;
sqlite> CREATE INDEX i1 on envelope (user_rcvr);
p...@ferret:~/log_analyzer >date ; ./log_parser_ms_sqlite.pl 2
XXX-2009-02-16_1400-success.log.bz2.done.ec.ed ; date
Tue Feb 17 10:47:48 CET 2009
Tue Feb 17 11:01:53 CET 2009
sqlite> delete from envelope;
sqlite> CREATE INDEX i2 on envelope (domain_rcvr);
p...@ferret:~/log_analyzer >date ; ./log_parser_ms_sqlite.pl 2
XXX-2009-02-16_1400-success.log.bz2.done.ec.ed ; date
Tue Feb 17 11:06:51 CET 2009
Tue Feb 17 11:31:21 CET 2009
and with 8-12 files created per hour, the db is always running too late
to be usable.
Insertions are made in group of X entries, where X is usually a power of
2 (default = 4096):
begin;
INSERT into ....
[repeat 4096 times]
commit;
I already played with PRAGMA synchronous = OFF and journal = MEMORY, but
in the end
the speedup was barely noticeable (and all the above examples were made
with these pragmas).
any other idea i could apply to squeeze some more performance?
--
bye,
P.
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users