Are you wrapping your data dump into a "BEGIN" "COMMIT"?

Also...you mention using a select...I assume you are testing first to see if 
you need to update or insert?

You may want to look at the REPLACE clause and see if you can use that.

Here's a little benchmark program I wrote to test a super-simple one-integer 
insert to test sql speed and commit interval behavior.

Running it on my machine (Windows XP64 8-core 3Ghz gcc (tdm64-1) 4.5.1) I go 
from 320M inserts per second to 361K inserts per second when no begin/commit 
occurs.  With WAL mode turned on it only drops to 5.9M inserts per second.

D:\SQLite>batch 5000 5000
320000000.0 inserts per sec
D:\SQLite>batch 5000 2500
320000000.0 inserts per sec
D:\SQLite>batch 5000 1200
160000000.0 inserts per sec
D:\SQLite>batch 5000 600
160000000.0 inserts per sec
D:\SQLite>batch 5000 300
106666666.7 inserts per sec
D:\SQLite>batch 5000 150
53333333.3 inserts per sec
D:\SQLite>batch 5000 75
32000000.0 inserts per sec
D:\SQLite>batch 5000 40
17777777.8 inserts per sec
D:\SQLite>batch 5000 20
10000000.0 inserts per sec
D:\SQLite>batch 5000 10
3333333.3 inserts per sec
D:\SQLite>batch 5000 5
2406015.0 inserts per sec
D:\SQLite>batch 5000 2
526315.8 inserts per sec
D:\SQLite>batch 5000 1
360766.6 inserts per sec

#include <stdio.h>
#include <windows.h>
#include <winbase.h>
#include <time.h>
#include "sqlite3.h"
double elapsed()
{
    double t1;
    struct timeval tv;
    static long base_seconds;
    gettimeofday(&tv,NULL);
    if (base_seconds==0) base_seconds=tv.tv_sec;
    return (tv.tv_sec-base_seconds)+tv.tv_usec/(double)1000000;
}
int main(int argc, char *argv[])
{
 sqlite3 *db;
 sqlite3_stmt *stmt=NULL;
 int rc;
 int n=0;
 int nrec=0;
 int interval=0;
 double t1;
 if (argc != 3) {
  fprintf(stderr,"Usage: %s nrecords commit_interval\n",argv[0]);
  exit(-1);
 }
 nrec=atoi(argv[1]);
 interval=atoi(argv[2]);
 remove("batch.db");
 rc=sqlite3_open("batch.db",&db);
 if (rc != SQLITE_OK) {
  puts(sqlite3_errmsg(db));
 }
 sqlite3_exec(db,"create table t (i integer)",NULL,NULL,NULL)
 // turn on WAL mode if you want to test it
 //sqlite3_exec(db,"pragma journal_mode=WAL",NULL,NULL,NULL);
 sqlite3_prepare_v2(db,"insert into t values(?)",-1,&stmt,NULL);
 t1=elapsed();
 if (interval != 1) rc=sqlite3_exec(db,"begin",NULL,NULL,NULL);
 while(n < nrec) {
  ++n;
  if (interval != 1 && (n% interval)==0) {
   sqlite3_exec(db,"commit",NULL,NULL,NULL);
   sqlite3_exec(db,"begin",NULL,NULL,NULL);
  }
  sqlite3_bind_int(stmt,n,1);
  rc = sqlite3_step(stmt);
  if (rc != SQLITE_DONE) {
   puts(sqlite3_errmsg(db));
  }
 }
 sqlite3_exec(db,"commit",NULL,NULL,NULL);
 printf("%.1f inserts per sec\n",nrec/((elapsed()-t1)/CLOCKS_PER_SEC));
 sqlite3_exec(db,"commit",NULL,NULL,NULL);
 sqlite3_close(db);
}



Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate



________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Thomas Fjellstrom [tfjellst...@strangesoft.net]
Sent: Friday, February 11, 2011 9:50 PM
To: Teg
Cc: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

On February 11, 2011, Teg wrote:
> Hello Thomas,
>
> I download off the net at between 400Mbps to 1 Gbps and have to pack
> the data away into a Sqlite DB. I use an intermediate buffer on disk
> and a background thread to write the data to the DB. Obviously, I
> couldn't keep up if the data was continuous but, the lulls in between
> the peak data rate times give me time to pack it away. Even if I get
> an extended burst of data, it'll eventually make it in. If I was you,
> I'd start by having something service the interface and buffer the
> packet data and something else pulling the packet data from the buffer
> and feeding it to Sqlite. Then you won't be dropping packets.

The program is split into two separate threads, one is dedicated to just
capturing packets off the network, and storing them in a circular buffer in
memory. The second thread is dedicated to parsing the packets, calculating
stats, and occasionally flushing data to storage.

What normally happens, is as hosts haven't been seen for a while, they get
saved to the db, and are removed from the in memory hash table. But that
doesn't happen for local hosts, the fake host used to capture traffic totals, or
any external hosts that keep showing up. And temporarily I've added some code
to auto save all in memory hosts to the database, as I'm seeing some
instability in the interface with sqlite, causing the process to loose
information once in a while.

I've seen numbers on the internet stating that sqlite is super fast, should be
able to handle tens of thousands of inserts or updates in a second or two,
even in sync mode. So I'm a bit stumped as to why its performance is so low in
my case. One time, I saw it take 5 seconds to flush almost 3k rows back to the
db.

No hosts are actually ever removed from the database, which has about 120k
hosts in it by now, totaling up to a file size of around 7-8MB total. So its
not a lot of data, and the queries are pretty simple. So I really am stumped.

>
> T
>
> Friday, February 11, 2011, 6:49:16 PM, you wrote:
>
> TF> Hi, I have a small problem with a program I've developed.
>
> TF> It basically captures packets from a network device on linux and stores
> TF> transfer stats on each host seen. To store the stats long term I
> decided to TF> use sqlite, and when the program saves the stats every few
> minutes, it takes TF> about 4-5 seconds, and if I have the synchronous
> pragma turned off, it takes TF> 1-2 seconds.
>
> TF> These are the relevant sql commands:
>
> TF> CREATE TABLE hosts (id INTEGER PRIMARY KEY,\
> TF>   address INTEGER UNIQUE, \
> TF>   first_seen INTEGER DEFAULT CURRENT_DATETIME, \
> TF>   last_on INTEGER DEFAULT CURRENT_DATETIME, \
> TF>   last_off INTEGER, \
> TF>   rx_bytes INTEGER, \
> TF>   tx_bytes INTEGER);
>
> TF> UPDATE hosts SET last_on = ?, last_off = ?, rx_bytes = ?, tx_bytes = ?
> WHERE id TF> = ?
> TF> INSERT INTO hosts ( address, first_seen, last_on, rx_bytes, tx_bytes )
> VALUES ( TF> ?, ?, ?, ?, ? )
>
> TF> The stats saving code will interleave a bunch of UPDATEs and INSERTs,
> inside a TF> single transaction, maybe that has something to do with it,
> I'm not sure.
>
> TF> 1s to update them all isn't too bad, but it still can mean I'm
> potentially TF> dropping packets, which I'd really rather not do.
>
> TF> Thanks.


--
Thomas Fjellstrom
tfjellst...@strangesoft.net
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to