Re: [sqlite] Optimizing a query with range comparison.

2011-12-28 Thread Bo Peng
Hi, Everyone,

I tried to use btree but I only noticed a slight increase of
performance. I guess this module is better suited for more complex
boundaries and does not really help one-dimensional searches that I am
running. Because I am not sure if btree works for all python/sqlite
installations, I eventually gave up this approach.

I also tried many combinations of join, where, indexes and order of
tables. They all yield similar performance after I ANALYZE the tables.
This again proves sqlite's ability to use the best strategies for a
query, regardless of how it is written.

I also tried Don V Nielsen's method to expand the ranges, but this
does not work either because my ranges are large and there will be
billions of positions inside these ranges.

However, the last attempt motivated me to use a binning method. More
specifically, I

1. add a bin field to tableA to let bin=pos/1, which groups
positions in bins of size 1.

2. add a separate tableC with (bin INT, range_id INT) to store all
bins each range covers. More specifically, for each tange in tableB,
sbin=start/1, ebin=end/1, I insert sbin, sbin+1, sbin+2,...,
ebin+1 to tableC with the rowid of the range.

3. Index bin and range_id of tableC.

4. Change the query from

INSERT INTO ids SELECT id FROM tableA, tableB WHERE pos BETWEEN start AND end;

to

INSERT INTO ids SELECT id FROM tableA, tableB, tableC WHERE
tableA.bin=tableC.bin AND tableC.range_id=tableB.rowid AND pos BETWEEN
start AND end;

Using this method, a query that needed more than 40min to execute
could now finish in 2 minutes. The major change here is that for each
position, tableC contains a small number of ranges that might cover
it. Instead of searching all ranges, the new query only searches
ranges specified in tableC, which significantly boost the performance.

I am very happy that I finally find a solution for my problem and I
appreciate all the responses from the list. Thank you!

Bo

On Tue, Dec 27, 2011 at 11:57 AM, Bo Peng  wrote:
> I will report back my if I can use this module
> to optimize my query.
>
> Thanks,
> Bo
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Bo Peng
On Tue, Dec 27, 2011 at 10:47 AM, Igor Tandetnik  wrote:
> If you need to do this with any regularity, you should look at R-Tree
> module:
>
> http://www.sqlite.org/rtree.html

I do have a lot of range-based queries and rtree seems to be a perfect
solution for my problem. I am using the sqlite module from Python and
a good news is that, at least on Mac, RTREE is enabled by default
(python 2.7, Mac/BuildScript/build-installer.py has
-DSQLITE_ENABLE_RTREE). I will report back my if I can use this module
to optimize my query.

Thanks,
Bo
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Simon Slavin

On 27 Dec 2011, at 5:36pm, Igor Tandetnik wrote:

> On 12/27/2011 12:17 PM, Bo Peng wrote:
>> sqlite selects all ids before it uses B-TREE for DISTINCT. Is there a
>> way to tell sqlite to return an id when it founds the first range that
>> the id falls into?
> 
> Without a temporary set to store the IDs it has already retrieved, how do you 
> expect SQLite to figure out whether a given range is in fact the first one a 
> given ID falls into?

Yeah.  You do need to store the results.  If you use

INSERT OR IGNORE

(or some equivalent way of keeping only the first found result) you can kind of 
hack it by relying on how SQLite uses indexes, but it is a hack and might fail 
in some future version of SQLite.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Igor Tandetnik

On 12/27/2011 12:17 PM, Bo Peng wrote:

sqlite selects all ids before it uses B-TREE for DISTINCT. Is there a
way to tell sqlite to return an id when it founds the first range that
the id falls into?


Without a temporary set to store the IDs it has already retrieved, how 
do you expect SQLite to figure out whether a given range is in fact the 
first one a given ID falls into?

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Bo Peng
>> Try using a JOIN instead.  In fact, try both ways around:
>>
>> DELETE FROM ids;
>> INSERT INTO ids SELECT tableA.id FROM tableA JOIN tableB ON pos BETWEEN 
>> start AND end;
>>
>> then try
>>
>> DELETE FROM ids;
>> INSERT INTO ids SELECT tableA.id FROM tableB JOIN tableA ON pos BETWEEN 
>> start AND end;
>>
>> Which one is faster depends on some aspects about your data and it's easier 
>> for you to test it than for me to guess.
>
> If these two don't behave identically to each other and don't behave
> identically to the original query then there's bug in SQLite. With
> inner join it shouldn't matter for optimizer which form your query is
> written in.

Thank everyone for your quick replies. I trimmed down my tables to
have 10,000 and 1000 rows respectively, and re-create a test database
test1_fresh.DB WITHOUT analyze it.

bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select
tableA.id from tableA join tableB on pos between start and end;'

real0m28.026s
user0m27.994s
sys 0m0.016s
bpeng@bp8:annoDB % sqlite3 test1.db 'select count(*) from ids;'
32486
bpeng@bp8:annoDB % cp test1_fresh.db test1.db
bpeng@bp8:annoDB % time sqlite3 test1.db 'insert into ids select
tableA.id from tableB join tableA on pos between start and end;'

real0m0.085s
user0m0.061s
sys 0m0.010s
bpeng@bp8:annoDB % sqlite3 test1.db 'select count(*) from ids;'
32486

To my surprise, the order of join has a significant impact on the
performance of query. The explain command shows why:

bpeng@bp8:annoDB % sqlite3 test1.db 'explain query plan insert into
ids select tableA.id from tableA join tableB on pos between start and
end;'
0|0|0|SCAN TABLE tableA (~100 rows)
0|1|1|SEARCH TABLE tableB USING COVERING INDEX tableB_idx (start? AND pos? AND pos? AND poshttp://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Simon Slavin

On 27 Dec 2011, at 4:46pm, Pavel Ivanov wrote:

>>> INSERT INTO ids SELECT id FROM tableA, tableB WHERE pos BETWEEN start AND 
>>> end;
>> 
>> Try using a JOIN instead.  In fact, try both ways around:
>> 
>> DELETE FROM ids;
>> INSERT INTO ids SELECT tableA.id FROM tableA JOIN tableB ON pos BETWEEN 
>> start AND end;
>> 
>> then try
>> 
>> DELETE FROM ids;
>> INSERT INTO ids SELECT tableA.id FROM tableB JOIN tableA ON pos BETWEEN 
>> start AND end;
>> 
>> Which one is faster depends on some aspects about your data and it's easier 
>> for you to test it than for me to guess.
> 
> If these two don't behave identically to each other and don't behave
> identically to the original query then there's bug in SQLite. With
> inner join it shouldn't matter for optimizer which form your query is
> written in.

After an 'ANALYZE', perhaps.  But without it my two forms give different 
'EXPLAIN QUERY PLAN' results to one-another, though one of them is the same as 
the OP's version.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Igor Tandetnik

On 12/27/2011 10:39 AM, Bo Peng wrote:

The schema of my test tables is as follows:

CREATE TABLE tableA (id INT, pos INT);
CREATE TABLE tableB (start INT, end INT);

CREATE INDEX tableA_idx on tableA (pos asc);
CREATE INDEX tableB_idx on tableB (start asc, end asc);

CREATE TABLE ids (id INT);

tableA saves position of about 8 million objects, and table B saves
about 40 thousand ranges. I need to find out all ids in tableA that
falls into one of the ranges in tableB, and insert the results into
table ids.


If you need to do this with any regularity, you should look at R-Tree 
module:


http://www.sqlite.org/rtree.html


I am using a query

INSERT INTO ids SELECT id FROM tableA, tableB WHERE pos BETWEEN start AND end;

with indexes on tableA.pos and tableB.start, tableB.end (combined),
this query takes hours to execute.


An index on tableB(start, end) can only be used to satisfy one condition 
(pos >= start). After that, it's a linear scan.



Is there anyway to optimize this
query? My understanding is that, if the query takes each pos and
compare it to all ranges, it will be slow. If it takes each range and
get all pos fall into the range, the query will be much faster.


Try changing the order of tables in the FROM clause:

SELECT id FROM tableB, tableA WHERE pos BETWEEN start AND end;

All other things being equal (and here SQLite has no reason to believe 
they are not), SQLite tends to scan on the left hand side of the join, 
and search on the right hand side.


Alternatively, drop an index on TableB(start, end). It doesn't help much 
anyway, and without it, things are no longer equal and SQLite should 
choose the plan you want.


Note that, unless you know that ranges don't overlap, you may be getting 
duplicate IDs. You may want to change your query to


SELECT DISTINCT id FROM tableB, tableA WHERE pos BETWEEN start AND end;


I have
tried to 'EXPLAIN' the query but I do not understand the output


Use EXPLAIN QUERY PLAN instead. This produces a human-readable summary 
of the plan.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Pavel Ivanov
>> INSERT INTO ids SELECT id FROM tableA, tableB WHERE pos BETWEEN start AND 
>> end;
>
> Try using a JOIN instead.  In fact, try both ways around:
>
> DELETE FROM ids;
> INSERT INTO ids SELECT tableA.id FROM tableA JOIN tableB ON pos BETWEEN start 
> AND end;
>
> then try
>
> DELETE FROM ids;
> INSERT INTO ids SELECT tableA.id FROM tableB JOIN tableA ON pos BETWEEN start 
> AND end;
>
> Which one is faster depends on some aspects about your data and it's easier 
> for you to test it than for me to guess.

If these two don't behave identically to each other and don't behave
identically to the original query then there's bug in SQLite. With
inner join it shouldn't matter for optimizer which form your query is
written in.


For OP: please issue "EXPLAIN QUERY PLAN" instead of "EXPLAIN" on your
query. It will give more understandable information on how SQLite
processes your query.


Pavel


On Tue, Dec 27, 2011 at 11:39 AM, Simon Slavin  wrote:
>
> On 27 Dec 2011, at 3:39pm, Bo Peng wrote:
>
>> The schema of my test tables is as follows:
>>
>> CREATE TABLE tableA (id INT, pos INT);
>> CREATE TABLE tableB (start INT, end INT);
>>
>> CREATE INDEX tableA_idx on tableA (pos asc);
>> CREATE INDEX tableB_idx on tableB (start asc, end asc);
>>
>> CREATE TABLE ids (id INT);
>
> First, thanks a lot for posting that, which saves us all a huge amount of 
> guessing.
>
>> tableA saves position of about 8 million objects, and table B saves
>> about 40 thousand ranges. I need to find out all ids in tableA that
>> falls into one of the ranges in tableB, and insert the results into
>> table ids. I am using a query
>
> So you don't care how many ranges in tableB an object falls into, you just 
> want it to appear once ?  Or your data is structures so that ranges don't 
> overlap ?  Either way, what you really want for ids is something more like
>
> CREATE TABLE ids (id INTEGER PRIMARY KEY, ON CONFLICT IGNORE);
>
> To understand this better, read
>
> 
>
>> INSERT INTO ids SELECT id FROM tableA, tableB WHERE pos BETWEEN start AND 
>> end;
>
> Try using a JOIN instead.  In fact, try both ways around:
>
> DELETE FROM ids;
> INSERT INTO ids SELECT tableA.id FROM tableA JOIN tableB ON pos BETWEEN start 
> AND end;
>
> then try
>
> DELETE FROM ids;
> INSERT INTO ids SELECT tableA.id FROM tableB JOIN tableA ON pos BETWEEN start 
> AND end;
>
> Which one is faster depends on some aspects about your data and it's easier 
> for you to test it than for me to guess.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Simon Slavin

On 27 Dec 2011, at 3:39pm, Bo Peng wrote:

> The schema of my test tables is as follows:
> 
> CREATE TABLE tableA (id INT, pos INT);
> CREATE TABLE tableB (start INT, end INT);
> 
> CREATE INDEX tableA_idx on tableA (pos asc);
> CREATE INDEX tableB_idx on tableB (start asc, end asc);
> 
> CREATE TABLE ids (id INT);

First, thanks a lot for posting that, which saves us all a huge amount of 
guessing.

> tableA saves position of about 8 million objects, and table B saves
> about 40 thousand ranges. I need to find out all ids in tableA that
> falls into one of the ranges in tableB, and insert the results into
> table ids. I am using a query

So you don't care how many ranges in tableB an object falls into, you just want 
it to appear once ?  Or your data is structures so that ranges don't overlap ?  
Either way, what you really want for ids is something more like

CREATE TABLE ids (id INTEGER PRIMARY KEY, ON CONFLICT IGNORE);

To understand this better, read



> INSERT INTO ids SELECT id FROM tableA, tableB WHERE pos BETWEEN start AND end;

Try using a JOIN instead.  In fact, try both ways around:

DELETE FROM ids;
INSERT INTO ids SELECT tableA.id FROM tableA JOIN tableB ON pos BETWEEN start 
AND end;

then try

DELETE FROM ids;
INSERT INTO ids SELECT tableA.id FROM tableB JOIN tableA ON pos BETWEEN start 
AND end;

Which one is faster depends on some aspects about your data and it's easier for 
you to test it than for me to guess.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing a query with range comparison.

2011-12-27 Thread Don V Nielsen
I do something similar, where the ranges are zip codes.  However, my tableb
is arranged vertically with one key (zip code) and one value (geographic
zone).  I would then join the two tables using the zip code, rather than
trying to identify the zip code within a range of zip codes in tableb.
Matching 3 million tablea rows to 49,000 tableb rows takes 90 seconds, I
think.


create tablea (id int, pos int);
create tableb (pos int, ?? int);  /* not sure what is represented by index
position of range */

create table ids as
select ?? from tableb b join tablea a on b.pos = a.pos;


On Tue, Dec 27, 2011 at 9:39 AM, Bo Peng  wrote:

> Dear Sqlite experts,
>
> The schema of my test tables is as follows:
>
> CREATE TABLE tableA (id INT, pos INT);
> CREATE TABLE tableB (start INT, end INT);
>
> CREATE INDEX tableA_idx on tableA (pos asc);
> CREATE INDEX tableB_idx on tableB (start asc, end asc);
>
> CREATE TABLE ids (id INT);
>
> tableA saves position of about 8 million objects, and table B saves
> about 40 thousand ranges. I need to find out all ids in tableA that
> falls into one of the ranges in tableB, and insert the results into
> table ids. I am using a query
>
> INSERT INTO ids SELECT id FROM tableA, tableB WHERE pos BETWEEN start AND
> end;
>
> with indexes on tableA.pos and tableB.start, tableB.end (combined),
> this query takes hours to execute. Is there anyway to optimize this
> query? My understanding is that, if the query takes each pos and
> compare it to all ranges, it will be slow. If it takes each range and
> get all pos fall into the range, the query will be much faster. I have
> tried to 'EXPLAIN' the query but I do not understand the output
> because it looks different from what is described in
> http://www.sqlite.org/eqp.html. I will appreciate it if someone can
> tell me what sqlite is doing for this query.
>
> > explain select id from tableA, tableB where pos between start and end;
> 0|Trace|0|0|0||00|
> 1|Goto|0|26|0||00|
> 2|OpenRead|1|2|0|2|00|
> 3|OpenRead|0|1446|0|2|00|
> 4|OpenRead|2|133259|0|keyinfo(1,BINARY)|00|
> 5|Rewind|1|22|0||00|
> 6|Column|1|0|1||00|
> 7|IsNull|1|21|0||00|
> 8|Affinity|1|1|0|d|00|
> 9|SeekGe|2|21|1|1|00|
> 10|Column|1|1|1||00|
> 11|IsNull|1|21|0||00|
> 12|Affinity|1|1|0|d|00|
> 13|IdxGE|2|21|1|1|01|
> 14|Column|2|0|2||00|
> 15|IsNull|2|20|0||00|
> 16|IdxRowid|2|2|0||00|
> 17|Seek|0|2|0||00|
> 18|Column|0|0|3||00|
> 19|ResultRow|3|1|0||00|
> 20|Next|2|13|0||00|
> 21|Next|1|6|0||01|
> 22|Close|1|0|0||00|
> 23|Close|0|0|0||00|
> 24|Close|2|0|0||00|
> 25|Halt|0|0|0||00|
> 26|Transaction|0|0|0||00|
> 27|VerifyCookie|0|6|0||00|
> 28|TableLock|0|2|0|tableB|00|
> 29|TableLock|0|1446|0|tableA|00|
> 30|Goto|0|2|0||00|
>
> Many thanks in advance,
> Bo
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] optimizing a query with ORDER BY and LIMIT

2007-07-12 Thread Joe Wilson
> CREATE INDEX dpi5 on device_perf_interval(
>   interval_end_date,
>   interval_type,
>   interval_duration
> );
> 
> explain query plan
>  SELECT d.device_type, dpi.* 
>  FROM device d, device_perf_interval dpi
>  WHERE d.device_id=dpi.device_id AND 
>dpi.interval_type=1 AND
>dpi.interval_duration=300
>  ORDER BY dpi.interval_end_date 
>  LIMIT 5;
> 0|1|TABLE device_perf_interval AS dpi WITH INDEX dpi5 ORDER BY
> 1|0|TABLE device AS d USING PRIMARY KEY

Actually, I wouldn't recommend using the above index.
Despite the use of the dpi5 index in ORDER BY, this query would 
likely be much slower than the previous dpi1 index query plan 
if just a small subset of the device_perf_interval table need 
be scanned.

Don't obsess with the output of EXPLAIN QUERY PLAN. Use whatever
set of indexes is fastest for most of your queries for your dataset 
within reason. Too many indexes bloat the size of your database
and slow down insert speed.



 

The fish are biting. 
Get more visitors on your site using Yahoo! Search Marketing.
http://searchmarketing.yahoo.com/arp/sponsoredsearch_v2.php

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] optimizing a query with ORDER BY and LIMIT

2007-07-11 Thread Joe Wilson
> SELECT d.device_type, dpi.* FROM device d, device_perf_interval dpi
>   WHERE d.device_id=dpi.device_id AND dpi.interval_type=1 AND
> dpi.interval_duration=300
>   ORDER BY dpi.interval_end_date LIMIT ;
> 
> What can I do to speed this up? I tried a third index on interval_end_date
> but can't get SQLite to notice it (at least in EXPLAIN QUERY PLAN output).

-- drop all other indexes on these tables
drop index dpi1;
drop index dpi2;

CREATE INDEX dpi5 on device_perf_interval(
  interval_end_date,
  interval_type,
  interval_duration
);

explain query plan
 SELECT d.device_type, dpi.* 
 FROM device d, device_perf_interval dpi
 WHERE d.device_id=dpi.device_id AND 
   dpi.interval_type=1 AND
   dpi.interval_duration=300
 ORDER BY dpi.interval_end_date 
 LIMIT 5;
0|1|TABLE device_perf_interval AS dpi WITH INDEX dpi5 ORDER BY
1|0|TABLE device AS d USING PRIMARY KEY



   

Take the Internet to Go: Yahoo!Go puts the Internet in your pocket: mail, news, 
photos & more. 
http://mobile.yahoo.com/go?refer=1GNXIC

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Re: [inbox] Re: [sqlite] Optimizing a query

2004-01-14 Thread Michael Hunley
At 11:12 AM 1/14/2004 -0500, D. Richard Hipp wrote:

   SELECT count(*) FROM table WHERE col1>'abc' AND col1<'xyz';
In the original query, the result was indeed a count(*) so no
access to the data we required there.  But access to the data
was required in order to evaluate the WHERE clause.  So it is
still O(NlogN).
What is different between his where clause and the one you cite as an 
example that only takes O(N)?  Is it just that in your example col1 is 
(part of) the index?  So, wouldn't Ken be able to do the same, except that 
he needs to step through two indices?  That, it seems to me, is the crux of 
the issue.

I don't mean to belabor this issue, but I am curious as to the workings.

michael 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [sqlite] Re: [inbox] Re: [sqlite] Optimizing a query

2004-01-14 Thread D. Richard Hipp
Michael Hunley wrote:
At 10:37 AM 1/14/2004 -0500, D. Richard Hipp wrote:

In some cases you can avoid the O(logN) lookup of the
main table entry and just use the index.  For example:
   SELECT count(*) FROM table WHERE col1>'abc' AND col1<'xyz';


Wasn't that the original question, Ken?  Except it was a count(*) on a 
JOIN.

Dr Hipp, would the same optimization apply if it is stepping through two 
indices?  In which case Ken should see a huge speed improvement to his 
original question by adding an index and updating to the latest SQLite 
(after you release 2.8.11, that is ;).

In the original query, the result was indeed a count(*) so no
access to the data we required there.  But access to the data
was required in order to evaluate the WHERE clause.  So it is
still O(NlogN).
But it is also still really fast.  Do this:  Run the same query
on SQLite, PostgreSQL, MySQL, and Oracle and see which one
finishes first.  I've not tried it so I don't know what will
happen.  But I'm guessing SQLite will be the clear winner.
Somebody please correct me if my guess is wrong.
--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


[sqlite] Re: [inbox] Re: [sqlite] Optimizing a query

2004-01-14 Thread Michael Hunley
At 10:37 AM 1/14/2004 -0500, D. Richard Hipp wrote:
In some cases you can avoid the O(logN) lookup of the
main table entry and just use the index.  For example:
   SELECT count(*) FROM table WHERE col1>'abc' AND col1<'xyz';
Wasn't that the original question, Ken?  Except it was a count(*) on a JOIN.

Dr Hipp, would the same optimization apply if it is stepping through two 
indices?  In which case Ken should see a huge speed improvement to his 
original question by adding an index and updating to the latest SQLite 
(after you release 2.8.11, that is ;).

Just trying to keep track.

michael 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [sqlite] Optimizing a query

2004-01-14 Thread D. Richard Hipp
Williams, Ken wrote:

So, no way to make it O(N)?  If the two indexes could be 
Retrieving a single record from a BTree is an O(logN) operation.
Doing so N times gives O(NlogN).


Oh, I thought it was also possible to step straight through an index,

You can step straight through the index in linear time.  But
for each index entry you encounter, you have to look up a
record in the main table in order to get the data.  It's the
second step, the table lookup, that takes O(logN).
In some cases you can avoid the O(logN) lookup of the
main table entry and just use the index.  For example:
   SELECT count(*) FROM table WHERE col1>'abc' AND col1<'xyz';

In this case, since you are never using any data from the
table, just counting entries, it is sufficient to step
through through the index and the operation runs in
linear time.  As recently as 2 weeks ago, SQLite would
go ahead and look up the main table entry even though
the data was needed.  That extra lookup was optimized
out with check-in [1165].
   http://www.sqlite.org/cvstrac/chngview?cn=1165

After check-in [1165], queries such as the above are about
10x faster on large tables.
--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: [sqlite] Optimizing a query

2004-01-14 Thread Williams, Ken

> -Original Message-
> From: D. Richard Hipp [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 14, 2004 9:22 AM
> To: Williams, Ken
> Cc: [EMAIL PROTECTED]
> Subject: Re: [sqlite] Optimizing a query
> 
> 
> Williams, Ken wrote:
> > 
> > So, no way to make it O(N)?  If the two indexes could be 
> iterated together,
> > as in the following pseudocode, it would seem to be an O(N) 
> operation.
> > 
> 
> Retrieving a single record from a BTree is an O(logN) operation.
> Doing so N times gives O(NlogN).

Oh, I thought it was also possible to step straight through an index, but I
guess I was mistaken.

 -Ken

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [sqlite] Optimizing a query

2004-01-14 Thread D. Richard Hipp
Williams, Ken wrote:
CREATE INDEX whatever ON output(verb_id,tag);

That will make it O(NlogN) instead of O(N**2).


So, no way to make it O(N)?  If the two indexes could be iterated together,
as in the following pseudocode, it would seem to be an O(N) operation.
Retrieving a single record from a BTree is an O(logN) operation.
Doing so N times gives O(NlogN).
O(N) would be possible if you were to step straight through
both tables in native order (INTEGER PRIMARY KEY order) without
having to do a search for each record using an index.  But
that would only work, of course, if the join was on the
INTEGER PRIMARY KEY of both tables.  And even then, SQLite
doesn't do that particular optimization.
--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: [sqlite] Optimizing a query

2004-01-14 Thread Williams, Ken


> -Original Message-
> From: D. Richard Hipp [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 13, 2004 6:17 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [sqlite] Optimizing a query
> 
> 
> > Can anyone suggest a good way to optimize the following query?
> > 
> > SELECT count(*) FROM propositions p, output o
> >  WHERE p.verb_id=o.verb_id
> >AND p.tag=o.tag
> >AND (p.stop!=o.stop OR p.start!=o.start);
> > 
> 
> CREATE INDEX whatever ON output(verb_id,tag);
> 
> That will make it O(NlogN) instead of O(N**2).

So, no way to make it O(N)?  If the two indexes could be iterated together,
as in the following pseudocode, it would seem to be an O(N) operation.

  P_INDEX:
  while ($p_entry = p_index.next) {
while ($o_entry = o_index.next) {
  if ($o_entry == $p_entry) {
...do the rest of the query criteria...
  } elsif ($o_entry > $p_entry {
next P_INDEX;
  }
}
  }

 -Ken


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[sqlite] Re: [inbox] Re: [sqlite] Optimizing a query

2004-01-13 Thread Michael Hunley
At 07:17 PM 1/13/2004 -0500, D. Richard Hipp wrote:
Actually, SQLite implements JOIN USING by translating the
USING clausing into some extra WHERE clause terms.  It does
the same with NATURAL JOIN and JOIN ON. So while those
constructs might be helpful to the human reader, they don't
really make any difference to SQLite's query optimizer.
Thanks for the specifics, very good to know.

My suggestion was based on the idea that, while it may be equivalent now, 
you could roll out a new version tomorrow with optimizations for JOINS.  It 
is far less likely to roll out a new optimizer that would catch 
convolutions of where clauses that could be optimized.  The latter would be 
more universally useful, but immensely harder to write, I expect.

thanks again.
michael 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [sqlite] Optimizing a query

2004-01-13 Thread D. Richard Hipp
Can anyone suggest a good way to optimize the following query?

SELECT count(*) FROM propositions p, output o
 WHERE p.verb_id=o.verb_id
   AND p.tag=o.tag
   AND (p.stop!=o.stop OR p.start!=o.start);
CREATE INDEX whatever ON output(verb_id,tag);

That will make it O(NlogN) instead of O(N**2).

>
> I don't think this will be much help and is very implementation specific, I
> expect, but Your first two where clauses are effectively a "JOIN USING
> verb_id,tag", which has a much better chance of using any built in
> optimizations for cross indexing.
>
Actually, SQLite implements JOIN USING by translating the
USING clausing into some extra WHERE clause terms.  It does
the same with NATURAL JOIN and JOIN ON. So while those
constructs might be helpful to the human reader, they don't
really make any difference to SQLite's query optimizer.
The index should solve your problem.

--
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]