Re: [sqlite] CREATE INDEX performance

John Stanton Tue, 27 Mar 2007 16:00:53 -0800

Dennis Cote wrote:

John Stanton wrote:
I suspect that the timing difference is due to page overflows.
John,
I doubt that that is the case. The two fields being indexed are thefirst field, and a second one that is only separated from the first bythe size of the first string (10 bytes) and three integers (max 27bytes, typically less). The length of the second field is only 15 bytesso all the information that sqlite needs to read during the indexing (52bytes) should be contained in the initial part of the record even if theother fields do spill onto overflow pages. SQLite does not need to readthose fields and won't follow the overflow chain (if one exists, which Idoubt). You can use the sqlite3_analyzer tool athttp://www.sqlite.org/download.html to see if your table is using anyoverflow pages.
SQLite version 3.3.13
Enter ".help" for instructions
sqlite> CREATE TABLE keyword (key, contyp int, imagecount int, searchcatint,
  ...> value, nextword, sec, ipr, fldseq int);
sqlite> .explain on
sqlite> explain CREATE INDEX valuekey on keyword (value, key);
addr  opcode          p1          p2          p3
---- -------------- ---------- -------------------------------------------
0     Goto            0           39
1     Noop            0           0
2     CreateIndex     0           0
3     MemStore        0           0
4     Dup             0           0
5     MemStore        1           1
6     Integer         0           0
7     OpenWrite       0           1
8     SetNumColumns   0           5
9     NewRowid        0           0
10    String8         0           0           index
11    String8         0           0           valuekey
12    String8         0           0           keyword
13    MemLoad         1           0
14 String8 0 0 CREATE INDEX valuekey onkeyword
15    MakeRecord      5           0           aaada
16    Insert          0           0
17    Close           0           0
18    Pop             1           0
19    MemLoad         0           0
20    Integer         0           0
21    OpenWrite       2           0           keyinfo(2,BINARY,BINARY)
22    Integer         0           0
23    OpenRead        1           2
24    SetNumColumns   1           9
25    Rewind          1           32
26    Rowid           1           0
27    Column          1           4
28    Column          1           0
29    MakeIdxRec      2           0           bb
30    IdxInsert       2           0
31    Next            1           26
32    Close           1           0
33    Close           2           0
34    Integer         2           0
35    SetCookie       0           0
36    ParseSchema     0           0           name='valuekey'
37    Expire          0           0
38    Halt            0           0
39    Transaction     0           1
40    VerifyCookie    0           1
41    Goto            0           1
42    Noop            0           0
sqlite>
The main index operation occurs on lines 26-31. For each record in thetable it pushes the rowid, column 4 (value), and column 0 (key) onto thestack, builds an index record, and finally inserts the record into theindex.
The only difference in the single field case is that only one column ispushed in this loop. That is what seems peculiar to me. The only thing Ican think of is that the index records are about double the size in thecompound index case, and therefore fewer records fit on a page, andhence more pages must be allocated and linked to build the compoundindex. I am surprised that this makes it take 5 times as long.
Dennis Cote

I retract the overflow page theory on your compelling evidence and nowunderstand better what it is doing after looking at the VDBE. Bybuilding an index by successive insertions the tree is splitting andbalancing as it grows, and that is expensive. Double the size of thekey and you get twice as many leaf nodes and quite a few more interiornodes.

If the keys order is very random the keys are being inserted all overthe tree which is slow. Presenting the keys in sorted sequence shouldcut back on the fragmentation and will very likely build a more compacttree by ensuring that each leaf node is filled.

An optimization for building such a tree would be to extract the keys,sort them and build the tree bottom up. By avoiding all splitting andjumping around the tree it should be an order of magnitude faster orbetter. I took a quick look at the code and got the impression that afast index option could be built by a motivated user as a seperateprogram and might be a handy tool for people managing very large Sqlitedatabases. Cutting back a 20 hour run to 1-2 hours can be a big win.

Sqlite is something of a victim of its success. The embedded litedatabase is being asked to perform enterprise level tasks which stretchits envelope.

-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------



-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------

Re: [sqlite] CREATE INDEX performance

Reply via email to