Dennis Cote wrote:
John Stanton wrote:
I suspect that the timing difference is due to page overflows.
John,
I doubt that that is the case. The two fields being indexed are the
first field, and a second one that is only separated from the first by
the size of the first string (10 bytes) and three integers (max 27
bytes, typically less). The length of the second field is only 15 bytes
so all the information that sqlite needs to read during the indexing (52
bytes) should be contained in the initial part of the record even if the
other fields do spill onto overflow pages. SQLite does not need to read
those fields and won't follow the overflow chain (if one exists, which I
doubt). You can use the sqlite3_analyzer tool at
http://www.sqlite.org/download.html to see if your table is using any
overflow pages.
SQLite version 3.3.13
Enter ".help" for instructions
sqlite> CREATE TABLE keyword (key, contyp int, imagecount int, searchcat
int,
...> value, nextword, sec, ipr, fldseq int);
sqlite> .explain on
sqlite> explain CREATE INDEX valuekey on keyword (value, key);
addr opcode p1 p2 p3
---- -------------- ---------- ----------
---------------------------------
0 Goto 0 39
1 Noop 0 0
2 CreateIndex 0 0
3 MemStore 0 0
4 Dup 0 0
5 MemStore 1 1
6 Integer 0 0
7 OpenWrite 0 1
8 SetNumColumns 0 5
9 NewRowid 0 0
10 String8 0 0 index
11 String8 0 0 valuekey
12 String8 0 0 keyword
13 MemLoad 1 0
14 String8 0 0 CREATE INDEX valuekey on
keyword
15 MakeRecord 5 0 aaada
16 Insert 0 0
17 Close 0 0
18 Pop 1 0
19 MemLoad 0 0
20 Integer 0 0
21 OpenWrite 2 0 keyinfo(2,BINARY,BINARY)
22 Integer 0 0
23 OpenRead 1 2
24 SetNumColumns 1 9
25 Rewind 1 32
26 Rowid 1 0
27 Column 1 4
28 Column 1 0
29 MakeIdxRec 2 0 bb
30 IdxInsert 2 0
31 Next 1 26
32 Close 1 0
33 Close 2 0
34 Integer 2 0
35 SetCookie 0 0
36 ParseSchema 0 0 name='valuekey'
37 Expire 0 0
38 Halt 0 0
39 Transaction 0 1
40 VerifyCookie 0 1
41 Goto 0 1
42 Noop 0 0
sqlite>
The main index operation occurs on lines 26-31. For each record in the
table it pushes the rowid, column 4 (value), and column 0 (key) onto the
stack, builds an index record, and finally inserts the record into the
index.
The only difference in the single field case is that only one column is
pushed in this loop. That is what seems peculiar to me. The only thing I
can think of is that the index records are about double the size in the
compound index case, and therefore fewer records fit on a page, and
hence more pages must be allocated and linked to build the compound
index. I am surprised that this makes it take 5 times as long.
Dennis Cote
I retract the overflow page theory on your compelling evidence and now
understand better what it is doing after looking at the VDBE. By
building an index by successive insertions the tree is splitting and
balancing as it grows, and that is expensive. Double the size of the
key and you get twice as many leaf nodes and quite a few more interior
nodes.
If the keys order is very random the keys are being inserted all over
the tree which is slow. Presenting the keys in sorted sequence should
cut back on the fragmentation and will very likely build a more compact
tree by ensuring that each leaf node is filled.
An optimization for building such a tree would be to extract the keys,
sort them and build the tree bottom up. By avoiding all splitting and
jumping around the tree it should be an order of magnitude faster or
better. I took a quick look at the code and got the impression that a
fast index option could be built by a motivated user as a seperate
program and might be a handy tool for people managing very large Sqlite
databases. Cutting back a 20 hour run to 1-2 hours can be a big win.
Sqlite is something of a victim of its success. The embedded lite
database is being asked to perform enterprise level tasks which stretch
its envelope.
-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------