You should learn how to use "explain query plan".

Indexes are a trade-off between insert speed (slow) and select speed (fast).  
If you have a static database there is no such thing as too many indexes as 
they never would get changed.  Most of us live in the world between.

Generally, don't worry about indexes until you have something that runs slower 
than you need.  Then use 'explain query plan' to see what's happening.

sqlite> CREATE TABLE CostItems(
   ...> RecNo INTEGER PRIMARY KEY AUTOINCREMENT,
   ...> CustomerCode TEXT,
   ...> ProjectName TEXT,
   ...> SupplierName TEXT,
   ...> WhatExactly TEXT,
   ...> CostDate TEXT
   ...> );
sqlite> SELECT * FROM CostItems WHERE CostDate >= '2011-01-01' AND CostDate <= 
'2011-03-01'
   ...> ORDER BY CostDate,CustomerCode;
sqlite> explain query plan SELECT * FROM CostItems WHERE CostDate >= 
'2011-01-01' AND CostDate <= '2011-03-01'
   ...> ORDER BY CostDate,CustomerCode;
sele  order          from  deta
----  -------------  ----  ----
0     0              0     SCAN TABLE CostItems (~111111 rows)
0     0              0     USE TEMP B-TREE FOR ORDER BY
With no indexes we get a SCAN TABLE -- as slow as you can get -- but once the 
data is selected a temporary btree is used for the order by...no index needed.
So we add an index with the item from your WHERE clause.

sqlite> create index index1 on CostItems(CostDate);
sqlite> explain query plan SELECT * FROM CostItems WHERE CostDate >= 
'2011-01-01' AND CostDate <= '2011-03-01'
   ...> ORDER BY CostDate,CustomerCode;
sele  order          from  deta
----  -------------  ----  ----
0     0              0     SEARCH TABLE CostItems USING INDEX index1 
(CostDate>? AND CostDate<?) (~110000 rows)
0     0              0     USE TEMP B-TREE FOR ORDER BY

Problem solved...USING INDEX and USE TEMP B-TREE are both good things.  Not 
gonna' get much faster using more indexes.

For your 2nd query
sqlite> explain query plan SELECT * FROM CostItems WHERE CustomerCode = 'X' AND 
ProjectName='P' ORDER BY CostDate;
sele  order          from  deta
----  -------------  ----  ----
0     0              0     SCAN TABLE CostItems USING INDEX index1 (~10000 rows)
Then INDEX referred to here is the ORDER BY...not real clear that's what's 
happening...and not real good..
Can be improved with the AND items in your WHERE clause
sqlite> create index index2 on CostItems(CustomerCode,ProjectName);
sqlite> explain query plan SELECT * FROM CostItems WHERE CustomerCode = 'X' AND 
ProjectName='P' ORDER BY CostDate;
sele  order          from  deta
----  -------------  ----  ----
0     0              0     SEARCH TABLE CostItems USING INDEX index2 
(CustomerCode=? AND ProjectName=?) (~9 rows)
0     0              0     USE TEMP B-TREE FOR ORDER BY

When you see all your WHERE clause items in the USING INDEX reference and no 
SCAN TABLES you're doing good.


Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate



________________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of J Trahair [j.trah...@foreversoftware.co.uk]
Sent: Monday, March 07, 2011 5:07 AM
To: sqlite-users@sqlite.org
Subject: EXT :[sqlite] Indexing - a test example

I have looked at some information on indexing, and watched a 44 minute 
presentation by Dr Richard Hipp.

I looked at the project I am converting to SQLite, and I have identited all the 
SELECTs for all the tables. They do not fall easily into the 'SELECT x, y, z 
FROM Table1 WHERE w = 5 AND x = 6 ORDER BY x, y' example.

Picking one of my tables at random (the first one, in fact):
CREATE TABLE CostItems(
RecNo INTEGER PRIMARY KEY AUTOINCREMENT,
CustomerCode TEXT,
ProjectName TEXT,
SupplierName TEXT,
WhatExactly TEXT,
CostDate TEXT
etc.
);

I have 2 SELECTs for this table:
Find all cost items between two dates and present in groups of date, and in 
order of Customer Code within any date:
"SELECT * FROM CostItems WHERE CostDate >= '" & Format(dateStartDate, 
"yyyy-MM-dd") & "' AND CostDate <= '" & Format(dateEndDate, "yyyy-MM-dd") & "' 
ORDER BY CostDate, CustomerCode"

and
Find cost items for one of a customer's projects for all dates and present in 
date order:
"SELECT * FROM CostItems WHERE CustomerCode = '" & gstrCustomerCodeOnly & "' 
AND ProjectName = '" & ComboBox2.Text & "' ORDER BY CostDate"

I am proposing therefore to have 2 indexes:

CREATE INDEX idxCostItems1 ON CostItems(CostDate,CustomerCode);
and
CREATE INDEX idxCostItems1 ON CostItems(CustomerCode,CostDate);

Questions:
1. Do the differing orders of fields in the 2 SELECTs require me to use 2 
indexes as above?
2. In the event that other SELECTs for other tables turn out to be more complex 
than these two (which they are), would a maximum of (say) 4 indexes per table 
be too many?

Thanks in advance.

Jonathan Trahair
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to