Brannon King wrote:
Or is an Index sufficient?
Brandon,
Having a primary key is the same thing as adding a UNIQUE index on the
desired columns except for the case where the key is on a single column
with integer type (i.e. integer primary key). The later case is
optimized in sqlite, but must be done before the table is populated.
I have a table that looks like this:
CREATE TABLE IF NOT EXISTS results (qi INTEGER NOT NULL, ri INTEGER
NOT NULL, run INTEGER NOT NULL,
drl INTEGER NOT NULL, score INTEGER NOT NULL, qis INTEGER NOT NULL,
ris INTEGER NOT NULL );
I originally had this line as well: , PRIMARY KEY (qi, ri, run)
I need to do billions (no joke) of consecutive inserts into this
table. Writing binary out with std::ofstream was using 26% of my code
execution time according to the profiler. I changed to sqlite3, and
now the sqlite3_step function uses 50% of my execution time. After
reading the news groups, I removed the primary key. That dropped it
down to about 41%. That was significant. However, I was still
significantly slower than binary writes with ofstream. Then, I tried
the PRAGMA temp_store = 2. That made absolutely no difference. I'll
assume that's what it was to begin with, though it reports 0. Also,
from the profiler, it seems that the step function does a lot of
memory allocations and deallocations, yet I cannot find them in the code.
If it helps, I was testing 600k inserts in transactions of 0xFFF
inserts and my current settings:
PRAGMA auto_vacuum = 0; \
PRAGMA case_sensitive_like = 1; \
PRAGMA cache_size = 32768; \
PRAGMA default_cache_size = 32768; \
PRAGMA count_changes = 0; \
PRAGMA synchronous = 0; \
PRAGMA page_size = 4096; \
PRAGMA temp_store = 2;
What else can I do to speed up my inserts?
There are a couple of things to look at. First, you don't say exactly
how you are executing the inserts. You should use sqlite3_prepare() to
compile an insert statement with a your seven columns as parameters.
insert into results values (?, ?, ?, ?, ?, ?, ?)
Then inside your insert loop you should bind values to each of these
parameters. If you have known patterns in your input data, and some of
the parameter values repeat, you can eliminate re-binding the same
value to a parameter for each loop since sqlite will retain the last
bound value. Finally, step the statement and then reset it for the next
loop iteration.
You should add your index after you have inserted all the data.
You can also speed the execution time of the insert statement somewhat
by dropping the NOT NULL constraints on your data columns. Each one of
those is checked for every insert. For comparison here is a dump of the
code generated for a simple two column table with and without the not
null constraints.
SQLite version 3.3.5
Enter ".help" for instructions
sqlite> create table t1 (a integer, b integer);
sqlite> create table t2 (a integer not null, b integer not null);
sqlite> .explain on
sqlite> explain insert into t1 values (?, ?);
addr opcode p1 p2 p3
---- -------------- ---------- ----------
---------------------------------
0 Goto 0 11
1 Integer 0 0
2 OpenWrite 0 2
3 SetNumColumns 0 2
4 NewRowid 0 0
5 Variable 1 0
6 Variable 2 0
7 MakeRecord 2 0 dd
8 Insert 0 3 t1
9 Close 0 0
10 Halt 0 0
11 Transaction 0 1
12 VerifyCookie 0 2
13 Goto 0 1
14 Noop 0 0
sqlite> explain insert into t2 values (?, ?);
addr opcode p1 p2 p3
---- -------------- ---------- ----------
---------------------------------
0 Goto 0 17
1 Integer 0 0
2 OpenWrite 0 3
3 SetNumColumns 0 2
4 NewRowid 0 0
5 Variable 1 0
6 Variable 2 0
7 Dup 1 1
8 NotNull 1 10
9 Halt 19 2 t2.a may not be NULL
10 Dup 0 1
11 NotNull 1 13
12 Halt 19 2 t2.b may not be NULL
13 MakeRecord 2 0 dd
14 Insert 0 3 t2
15 Close 0 0
16 Halt 0 0
17 Transaction 0 1
18 VerifyCookie 0 2
19 Goto 0 1
20 Noop 0 0
sqlite>
Dropping the not null constraints makes perfect sense if you are always
using sqlite3_bind_int() to assign the column values, since there is no
way for your application to insert a null value (i.e. you need to call
sqlite3_bind_null() to insert a null value).
HTH
Dennis Cote