RE: [sqlite] Index Usage

2004-10-28 Thread Christian Jensen
I noticed that you use { instead of (

What do those do?
 

-Original Message-
From: Ulrik Petersen [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 28, 2004 10:28 AM
To: [EMAIL PROTECTED]
Subject: Re: [sqlite] Index Usage

William,

William Hachfeld wrote:

>Hi,
>
>Have a question for everyone regarding index usage in SQLite... Say 
>that I have the following database schema:
>
>CREATE TABLE Example (
>   id INTEGER PRIMARY KEY,
>   grp INTEGER,
>   begin INTEGER,
>   end INTEGER
>);
>
>and I want to perform the following query:
>
>SELECT id FROM Example WHERE grp=g AND x < end AND y >= begin;
>
>on a large number of rows (say around one million) for some group 'g' 
>and an interval '[x, y)'. And, of course, with the assumption that (end

>> begin) for all rows. Will my query performance be substantially 
>improved by creating an index such as:
>
>CREATE INDEX MultiColumnIndex ON Example (grp, begin, end)
>
>or will the operators "<" and ">=" prohibit SQLite from using the
index?
>  
>
I have almost the same table in my linguistic database, Emdros
(http://emdros.org).  What I have found that works best for me is to put
an index on what you call "begin" (not a double index), then do

SELECT id
FROM Example
WHERE grp = g
AND begin BETWEEN {x} AND {y-1}
AND end BETWEEN {x} AND {y-1}.

For some strange reason, this is about 5% faster than what you were
proposing.  It could be because SQLite does not know that begin <= end,
and so can't make optimizations about when to stop looking.


>Also, I'm aware that SQLite supports multi-column indicies, but not the

>use of multiple indicies per query. Is it possible to get around the 
>later restriction by expressing my above query using a sub-select:
>
>SELECT id FROM (SELECT * FROM Example WHERE grp=g)
>   WHERE x < end AND y >= begin;
>
>and then creating the following indicies instead:
>
>CREATE INDEX GroupIndex ON Example (group)
>CREATE INDEX IntervalIndex ON Example (begin, end)
>
>And if so, can any generalizations be made regarding the performance of

>using the two indicies versus the first, single, index? How about disk
usage?
>  
>
I cannot comment on this, except that I've run EXPLAIN on my versions of
the above queries, and found that SQLite wouldn't consult the "end" part
of the (begin,end) index.  Instead, it would consult the "end" part of
the table column, and then only use the "begin" part of the index.  At
least that's how I understood the EXPLAIN output, but I may be wrong.

The upshot of the above is that you can save diskspace by not doing the
double index, and only indexing "begin", since for these queries, the
"end" part is redundant (i.e., not used) in the index.

Cheers,

Ulrik

--
Ulrik Petersen, MA, B.Sc.
Emdros -- the text database engine for analyzed or annotated text
http://emdros.org/






RE: [sqlite] Index Usage

2004-10-27 Thread Christian Jensen
Great Question! I am eager to hear the response! I use a ton of JOINs
and INTERSECTs. Coverage on that topic would be great too! 

-Original Message-
From: William Hachfeld [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 27, 2004 9:34 AM
To: [EMAIL PROTECTED]
Subject: [sqlite] Index Usage


Hi,

Have a question for everyone regarding index usage in SQLite... Say that
I have the following database schema:

CREATE TABLE Example (
id INTEGER PRIMARY KEY,
grp INTEGER,
begin INTEGER,
end INTEGER
);

and I want to perform the following query:

SELECT id FROM Example WHERE grp=g AND x < end AND y >= begin;

on a large number of rows (say around one million) for some group 'g'
and an interval '[x, y)'. And, of course, with the assumption that (end
> begin) for all rows. Will my query performance be substantially
improved by creating an index such as:

CREATE INDEX MultiColumnIndex ON Example (grp, begin, end)

or will the operators "<" and ">=" prohibit SQLite from using the index?

Also, I'm aware that SQLite supports multi-column indicies, but not the
use of multiple indicies per query. Is it possible to get around the
later restriction by expressing my above query using a sub-select:

SELECT id FROM (SELECT * FROM Example WHERE grp=g)
WHERE x < end AND y >= begin;

and then creating the following indicies instead:

CREATE INDEX GroupIndex ON Example (group)
CREATE INDEX IntervalIndex ON Example (begin, end)

And if so, can any generalizations be made regarding the performance of
using the two indicies versus the first, single, index? How about disk
usage?

Thanks in advance for any information regarding the above!

-- William Hachfeld