Re: [sqlite] Optmize queries on ranges

2018-10-31 Thread Jens Alfke


> On Oct 25, 2018, at 10:45 AM, Keith Medcalf  wrote:
> 
> There is an extra column load and compare when using the between version of 
> the query (this is because although the optimization of the index use is the 
> same, the use of x BETWEEN y AND z adds both the y <= x and x <= z checks as 
> where clause tests that are executed within the loop, whereas when using the 
> devolved query (the later form) one of the constraints is used against the 
> index and only the other one is tested.  

This seems like an optimization opportunity … is it already a known issue, to 
be addressed in the query optimizer at some point?

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-27 Thread E.Pasma

> Keith Medcalf wrote:
>  .. Am I doing something wrong here ..

No! The query with order by + limit 1 is superior, also in my test. 

Still I am surprised that the rtree extension is available by default
(at least in the sqlite version 3.25 command line)


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-26 Thread Keith Medcalf

You also need to make sure the "no hit" does not degenerate into a table scan.  
RTree works well for this but is overall significantly slower than not using 
RTree since the purpose of RTree is to find the "small number of candidate 
records" that could possibly satisfy the query out of a haystack of records 
(that is, find the candidate needles in the haystack, so that you only need to 
closely examine that small number of candidates to find the needle rather than 
test the whole haystack).  

However, if you know that there can only be one possible record which can 
satisfy the query (ie, there is only one possible needle in the haystack, and 
only one possible candidate, and you can find this candidate directly for 
testing), then the overhead of using RTree where it is not needed exceeds the 
benefits of using it.

I see that the performance of the RTree is significantly slower than the 
equivalent "direct" method.  Am I doing something wrong here or is that 
overhead simply because of the data structures that the RTRee implementation 
must maintain (which are not required in this case).

Without RTree:
>py -3 test.py
Created 10 random ranges in 00:00:00.681118 Creation Rate = 146817 
Ranges/Second
Looked up 1102019 random range values in 00:00:04.598245 Lookup Rate = 239660 
Values/Second
Failure Rate = 257270 Values/Second
Success Rate = 228828 Values/Second

With RTree:
>py -3 test.py --rtree
Created 10 random ranges in 00:00:02.139742 Creation Rate = 46734 
Ranges/Second
Looked up 1100681 random range values in 00:00:13.662556 Lookup Rate = 80561 
Values/Second
Failure Rate = 119874 Values/Second
Success Rate = 65627 Values/Second

And that came from the following test program (in python) where the only 
difference is the SQL statements being used.  Because the ranges are random and 
the lookups are random, the timings given are subject to differences on every 
run, however, the averaged rates are relatively stable given a large number of 
random ranges and query values.

--- test.py ---
from __future__ import absolute_import, division, print_function, 
unicode_literals

import datetime
import random
import sys
import time

import sqlite3

# Convert a value in seconds to HMS format
HMS = lambda t: (datetime.datetime.min + 
datetime.timedelta(seconds=t)).time().isoformat()

# Create constants for the SQL statements we will use

if '--rtree' in sys.argv:
create_sql = 'create virtual table ranges using rtree(id, start, stop, 
+value);'
query_sql = 'select value from ranges where ? between start and stop;'
else:
create_sql = 'create table ranges (start integer primary key, stop integer 
not null, value integer not null);'
query_sql = 'select value from (select stop, value from ranges where start 
<= ?1 order by start desc limit 1) where ?1 <= stop;'

insert_sql = 'insert into ranges (start, stop, value) values (?, ?, ?);'


# Open our database and do not use automagical transactions
db = sqlite3.connect(':memory:', isolation_level=None)

# Create our table
db.execute(create_sql)

# Create our random range data
recs = 10
start = 0
st = time.time()
for cnt in range(recs):
start += random.randint(1, 10)
stop = start + random.randint(1, 10)
value = int((start + stop) / 2)
db.execute(insert_sql, (start, stop, value))
start = stop
stop = stop + random.randint(1, 10)
et = time.time() - st
print('Created', recs, 'random ranges in', HMS(et), 'Creation Rate =', int(recs 
/ et), 'Ranges/Second')

db.execute('analyze;')

# Generate a bunch of random values and perform the range query
eta = 0.0
ets = 0.0
etf = 0.0
fcnt = 0
scnt = 0
tcnt = 0
for i in range(stop):
x = random.randint(0, stop)
lst = time.time()
row = db.execute(query_sql, (x, )).fetchone()
let = time.time() - lst
if row:
value = row[0]
ets += let
scnt += 1
else:
value = None
etf += let
fcnt += 1
eta += let
tcnt += 1
print('Looked up', stop, 'random range values in', HMS(eta), 'Lookup Rate =', 
int(tcnt / eta), 'Values/Second')
print('Failure Rate =', int(fcnt / etf), 'Values/Second')
print('Success Rate =', int(scnt / ets), 'Values/Second')

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of E.Pasma
>Sent: Friday, 26 October, 2018 16:28
>To: SQLite mailing list
>Subject: Re: [sqlite] Optmize queries on ranges
>
>About the rtree extension, which was the first idea.
>
>The extension appears available without any special installation
>option. This is easier than 

Re: [sqlite] Optmize queries on ranges

2018-10-26 Thread E.Pasma
About the rtree extension, which was the first idea.

The extension appears available without any special installation option. This 
is easier than what is mentioned in https://sqlite.org/rtree.html 
 chapter 2: "Compiling The R*Tree Module". 
This chapter may as well be left out?

With test data where the ranges are mostly non-overlapping, the query now runs 
faster than without rtree. Even though both run within a millisecond rtree is 
ten times faster.
With order by and limit the timing remains superior. But this relies on 
strictly non-overlapping ranges.
Below my test script


/* query 1: using rtree built-in extension */
;
create virtual table ranges using rtree(id, minX, maxX, +value);
with r as (select 0 as r union all select r+1 from r where r<100)
insert into ranges (minX, maxX, value) 
select r*10+1,r*10+10,r*10+5 from r
;
select value from ranges where 123456 between minx and maxx
;
123455
Run Time: real 0.000 user 0.000135 sys 0.18

/* query 2: using index on minx+maxx */
drop table ranges
;
create table ranges (minx int, maxx int, value int)
;
with r as (select 0 as r union all select r+1 from r where r<100)
insert into ranges (minX, maxX, value) 
select r*10+1,r*10+10,r*10+5 from r
;
create unique index ranges_minx_maxx on ranges(minx,maxx)
;
select value from ranges where 123456 between minx and maxx
;
123455
Run Time: real 0.002 user 0.001415 sys 0.16

/* query 3: same, assuming non-overlapping ranges */
select value from ranges where 123456 between minx and maxx
order by minx desc limit 1
;
123455
Run Time: real 0.000 user 0.57 sys 0.00

 

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-26 Thread Keith Medcalf

Limit 1 says to stop after returning 1 row.  If the "first row" being searched 
is not the one containing "the answer" then the search will continue until the 
row that does not match the index constraint is hit, after which it is known 
that no answer is possible (without returning a row).


---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of siscia
>Sent: Friday, 26 October, 2018 01:49
>To: sqlite-users@mailinglists.sqlite.org
>Subject: Re: [sqlite] Optmize queries on ranges
>
>Ok, after the message I thought a little bit more.
>
>And it turns out that in the database the `start`s are not unique how
>they
>should.
>Making them unique, seems to solve the performance problem
>completely.
>
>However, still, I am not sure why the `LIMIT 1` does not help at all.
>
>Can you guys shed some light on this?
>
>Cheers,
>Simone
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-26 Thread Keith Medcalf

Based on your assumptions being correct
 (a) start is unique
 (b) start end ranges do not overlap 

create table ranges
(
  start integer primary key,
  stop  integer not null,
  value integer not null
);

INSERT INTO ranges values (1, 10, 5);
INSERT INTO ranges values (15, 29, 8);
INSERT INTO ranges values (30, 32, 9);

select value
  from (select stop, value
  from ranges
 where start <= ?1
  order by start desc
 limit 1)
 where ?1 <= stop;

If your data does not meet the constraints you have specified then the query 
will not work properly.  The resulting value (if there is one) will be returned 
with a single index lookup and a single comparison.  (Note that you can create 
a covering index on your existing table if you do not want to remake it).

This works as it does because the answer, if there is one, can only be located 
on the row where start <= ?1 (for the biggest numerical value of start) and 
then only if the correspondingly found row also meets the requirement that ?1 
<= stop

Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit 
(AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import apsw
>>> db = apsw.Connection(':memory:')
>>>
>>> create_sql = """create table ranges
... (
...   start integer primary key,
...   stop  integer not null,
...   value integer not null
... );
...
... INSERT INTO ranges values (1, 10, 5);
... INSERT INTO ranges values (15, 29, 8);
... INSERT INTO ranges values (30, 32, 9);
... """
>>>
>>> sql = """select value
...   from (select stop, value
...   from ranges
...  where start <= ?1
...   order by start desc
...  limit 1)
...  where ?1 <= stop;
... """
>>>
>>> db.execute(create_sql)

>>> for row in db.execute('select * from ranges;'):
...  print(row)
...
Row(start=1, stop=10, value=5)
Row(start=15, stop=29, value=8)
Row(start=30, stop=32, value=9)
>>> for i in range(35):
...  for row in db.execute(sql, (i, )):
...   print(i, row)
...
1 Row(value=5)
2 Row(value=5)
3 Row(value=5)
4 Row(value=5)
5 Row(value=5)
6 Row(value=5)
7 Row(value=5)
8 Row(value=5)
9 Row(value=5)
10 Row(value=5)
15 Row(value=8)
16 Row(value=8)
17 Row(value=8)
18 Row(value=8)
19 Row(value=8)
20 Row(value=8)
21 Row(value=8)
22 Row(value=8)
23 Row(value=8)
24 Row(value=8)
25 Row(value=8)
26 Row(value=8)
27 Row(value=8)
28 Row(value=8)
29 Row(value=8)
30 Row(value=9)
31 Row(value=9)
32 Row(value=9)
>>>

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of siscia
>Sent: Friday, 26 October, 2018 01:27
>To: sqlite-users@mailinglists.sqlite.org
>Subject: Re: [sqlite] Optmize queries on ranges
>
>Hi all,
>
>thanks for your suggestions, unfortunately, I already tried all of
>them,
>except for the rtrees.
>
>Actually, my request for help wasn't complete.
>
>The ranges I am storing in the table are not overlapping.
>
>To make an example in SQL.
>
>The following can be in the dataset:
>INSERT INTO ranges(1, 10, 5);
>INSERT INTO ranges(15, 29, 8);
>INSERT INTO ranges(30, 32, 9);
>
>However, there will never be something like:
>INSERT INTO ranges(1, 10, 5);
>INSERT INTO ranges(5, 15, 8); -- impossible, overlap with the first
>one
>
>So all the queries are actually:
>
>`SELECT value FROM ranges WHERE (? BETWEEN start AND end) LIMIT 1`
>
>Now suppose there is an index on start and so we are looking for
>(start < ?)
>
>What happen could be that we begin from (start = 0) and move up to
>(start <=
>?) which is basically a full scan.
>Or we could begin from (start <= ?) and move down towards (start = 0)
>which
>would be optimal.
>
>I am afraid that we are hitting the first case, which really is a
>pity.
>
>Is there a way to suggest to the index how to work on these cases?
>
>Cheers,
>
>Simone
>
>
>
>
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-26 Thread Olivier Mascia
> Le 26 oct. 2018 à 09:27, siscia  a écrit :
> 
> thanks for your suggestions, unfortunately, I already tried all of them,
> except for the rtrees.
> 
> Actually, my request for help wasn't complete.
> 
> The ranges I am storing in the table are not overlapping.
> 
> To make an example in SQL.
> 
> The following can be in the dataset:
> INSERT INTO ranges(1, 10, 5);
> INSERT INTO ranges(15, 29, 8);
> INSERT INTO ranges(30, 32, 9);
> 
> However, there will never be something like:
> INSERT INTO ranges(1, 10, 5);
> INSERT INTO ranges(5, 15, 8); -- impossible, overlap with the first one

What if the data was structured differently?

> CREATE TABLE ranges (
>start int,
>end int,
>value int,
> );

becomes:

CREATE TABLE ranges (
   start int,
   range int,   -- on the basis that start + range = end
   value int,
);

> INSERT INTO ranges(1, 10, 5);
> INSERT INTO ranges(15, 29, 8);
> INSERT INTO ranges(30, 32, 9);

becomes:

INSERT INTO ranges(1, 9, 5);
INSERT INTO ranges(15, 14, 8);
INSERT INTO ranges(30, 2, 9);

and you have:

CREATE INDEX idx_ranges on ranges(start);

> select value from ranges
> where (? between start and end)

becomes:

SELECT value FROM ranges where (? between start AND start+range);

-- 
Best Regards, Meilleures salutations, Met vriendelijke groeten,
Olivier Mascia


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-26 Thread siscia
Sorry,

I was a little too optimistic.

Making the starts unique does help only for some queries, not for all.

Why?

Cheers,
Simone



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-26 Thread siscia
Ok, after the message I thought a little bit more.

And it turns out that in the database the `start`s are not unique how they
should.
Making them unique, seems to solve the performance problem completely.

However, still, I am not sure why the `LIMIT 1` does not help at all.

Can you guys shed some light on this?

Cheers,
Simone



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-26 Thread Dan Kennedy

On 10/26/2018 02:27 PM, siscia wrote:

Hi all,

thanks for your suggestions, unfortunately, I already tried all of them,
except for the rtrees.

Actually, my request for help wasn't complete.

The ranges I am storing in the table are not overlapping.

To make an example in SQL.

The following can be in the dataset:
INSERT INTO ranges(1, 10, 5);
INSERT INTO ranges(15, 29, 8);
INSERT INTO ranges(30, 32, 9);

However, there will never be something like:
INSERT INTO ranges(1, 10, 5);
INSERT INTO ranges(5, 15, 8); -- impossible, overlap with the first one

So all the queries are actually:

`SELECT value FROM ranges WHERE (? BETWEEN start AND end) LIMIT 1`

Now suppose there is an index on start and so we are looking for (start < ?)

What happen could be that we begin from (start = 0) and move up to (start <=
?) which is basically a full scan.
Or we could begin from (start <= ?) and move down towards (start = 0) which
would be optimal.



In SQL, I guess that is:

  SELECT value FROM ranges WHERE (? BETWEEN start AND end)
  ORDER BY start DESC LIMIT 1

Or, perhaps more efficient for the cases where there is no such range:

  SELECT value FROM (
SELECT value, start, end FROM ranges
WHERE start <= ?
ORDER BY start DESC LIMIT 1
  ) WHERE end >= ?

Dan.




I am afraid that we are hitting the first case, which really is a pity.

Is there a way to suggest to the index how to work on these cases?

Cheers,

Simone







--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-26 Thread siscia
Hi all,

thanks for your suggestions, unfortunately, I already tried all of them,
except for the rtrees.

Actually, my request for help wasn't complete.

The ranges I am storing in the table are not overlapping.

To make an example in SQL.

The following can be in the dataset:
INSERT INTO ranges(1, 10, 5);
INSERT INTO ranges(15, 29, 8);
INSERT INTO ranges(30, 32, 9);

However, there will never be something like:
INSERT INTO ranges(1, 10, 5);
INSERT INTO ranges(5, 15, 8); -- impossible, overlap with the first one

So all the queries are actually:

`SELECT value FROM ranges WHERE (? BETWEEN start AND end) LIMIT 1`

Now suppose there is an index on start and so we are looking for (start < ?)

What happen could be that we begin from (start = 0) and move up to (start <=
?) which is basically a full scan.
Or we could begin from (start <= ?) and move down towards (start = 0) which
would be optimal.

I am afraid that we are hitting the first case, which really is a pity.

Is there a way to suggest to the index how to work on these cases?

Cheers,

Simone



 



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-25 Thread Keith Medcalf




---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


On Thursday, 25 October, 2018 10:48, Dan Kennedy  wrote:
>On 10/25/2018 11:13 PM, siscia wrote:
>>
>> I am facing an interesting optimization problem.
>>
>> I have a table like this:
>>
>> CREATE TABLE ranges (
>> start int,
>> end int,
>> value int,
>> );
>>
>> The query that I am interested in optimizing is "select value from
>ranges
>> where (? between start and end)"
>>
>> The max performance that I was able to get is 250 results/second
>with a
>> covering index on all three columns.

>And so you might be iterating through a very large set of records
>to extract the ones you want.

>R-tree might work for you:
>   https://sqlite.org/rtree.html

create virtual table ranges using rtree(id, minX, maxX, +value);
insert into ranges (minX, maxX, value) 
select x.value - y.value, x.value + y.value, x.value 
  from generate_series as x 
  join generate_series as y 
 where x.start = 1 and x.stop = 1 
   and y.start = 1 and y.stop = 1;
-- data insertion is about 10 times slower
select count(*) from ranges where 25 between minX and maxX;
-- 50254399
-- Run Time: real 3.473 user 3.468750 sys 0.00
select count(*) from ranges where minX <= 25 and 25 <= MaxX;
-- 50254399
-- Run Time: real 3.533 user 3.546875 sys 0.00

Execution time is not significantly quicker, at least for this data ... but the 
"BETWEEN" version is a tad quicker than the devolved version ...





___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-25 Thread Keith Medcalf

On Thursday, 25 October, 2018 10:48, Dan Kennedy  wrote:

>On 10/25/2018 11:13 PM, siscia wrote:

>> Hi all,

>> CREATE TABLE ranges (
>> start int,
>> end int,
>> value int,
>> );

>> The query that I am interested in optimizing is "select value from
>> ranges where (? between start and end)"

>Your query is the same as "start <= ? AND end >= ?". The trouble is
>that SQlite can only use the index to optimize one of "start <= ?" 
>or "end >= ?". 

select value from ranges where ? between start and end;
is *almost* the same as
select value from ranges where start <= ?1 and ?1 <= end;

There is an extra column load and compare when using the between version of the 
query (this is because although the optimization of the index use is the same, 
the use of x BETWEEN y AND z adds both the y <= x and x <= z checks as where 
clause tests that are executed within the loop, whereas when using the devolved 
query (the later form) one of the constraints is used against the index and 
only the other one is tested.  

CREATE TABLE ranges 
(
start integer,
end integer,
value integer
);
create index ranges_index on ranges (start, end, value);
insert into ranges (start, end, value) 
 select x.value - y.value, x.value + y.value, x.value 
   from generate_series as x 
   join generate_series as y 
  where x.start = 1 and x.stop = 1 
and y.start = 1 and y.stop = 1;
-- count of ranges records is  1000
select count(*) from ranges where 25 between start and end;
-- 50254399
-- Run Time: real 4.860 user 4.859375 sys 0.00
select count(*) from ranges where start <= 25 and 25 <= end;
-- 50254399
-- Run Time: real 3.982 user 3.984375 sys 0.00

Thus the extra column load and compare comprises 18% of the time used to 
execute the query.

It is slightly faster in if the table is a without rowid table, but not 
significantly (though the space used is halved).  Note however the overhead of 
using x BETWEEN y AND z -vs- y <= x AND x <= z is now 21% of the query 
execution time (which is probably not a significant difference) ...

create table ranges 
(
 start integer, 
 end integer, 
 value integer, 
 primary key (start, end)
) without rowid;
insert into ranges (start, end, value) 
 select x.value - y.value, x.value + y.value, x.value 
   from generate_series as x 
   join generate_series as y 
  where x.start = 1 and x.stop = 1 
and y.start = 1 and y.stop = 1;
select count(*) from ranges where 25 between start and end;
-- 50254399
-- Run Time: real 4.854 user 4.843750 sys 0.00
select count(*) from ranges where start <= 25 and 25 <= end;
-- 50254399
-- Run Time: real 3.814 user 3.812500 sys 0.00

>And so you might be iterating through a very large set of records
>to extract the ones you want.

>R-tree might work for you:
>
>   https://sqlite.org/rtree.html

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.




___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-25 Thread Simon Slavin
On 25 Oct 2018, at 5:13pm, siscia  wrote:

> CREATE TABLE ranges (
>start int,
>end int,
>value int,
> );
> 
> The query that I am interested in optimizing is
> "select value from ranges
> where (? between start and end)"

First, "END" is a reserved keyword in SQLite.  Your use of it might work right 
now but you may find yourself in trouble later when you introduce a trigger or 
some other construction.  I suggest you replace it as a column name with 
"finish" or perhaps both ends with "low" and "high".  See



As an experiment to figure out a good optimization for your search problem, try 
the following:

1. Create two indexes on that table, one on (low,high,value), the other on 
(high,low,value).
2. Ensure that your 'ranges' table has plausible data in, both the number of 
rows and the contents of those rows must be similar to what you expect the 
table to contain in normal use.
3. Run the SQL command "ANALYZE".  This tells SQLite to look at the table and 
figure out good ways to run future searches and sorts.  The results of this are 
stored in the database.  You will not need to run the command again even if you 
change the content of the database.

Now run your query again and see whether the timing has changed.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optmize queries on ranges

2018-10-25 Thread Dan Kennedy

On 10/25/2018 11:13 PM, siscia wrote:

Hi all,

I am facing an interesting optimization problem.

I have a table like this:

CREATE TABLE ranges (
start int,
end int,
value int,
);

The query that I am interested in optimizing is "select value from ranges
where (? between start and end)"

The max performance that I was able to get is 250 results/second with a
covering index on all three columns.

Now, if I do a more classic "select value from ranges where start = ?" this
provides 14 results/second

So I am pretty sure that I am doing something quite wrong.

Do you guys have any idea of what it could be? How can I obtain better
results?


Your query is the same as "start <= ? AND end >= ?". The trouble is that 
SQlite can only use the index to optimize one of "start <= ?" or "end >= 
?". And so you might be iterating through a very large set of records to 
extract the ones you want.


R-tree might work for you:

  https://sqlite.org/rtree.html

Dan.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Optmize queries on ranges

2018-10-25 Thread siscia
Hi all,

I am facing an interesting optimization problem.

I have a table like this:

CREATE TABLE ranges (
start int,
end int,
value int,
);

The query that I am interested in optimizing is "select value from ranges
where (? between start and end)"

The max performance that I was able to get is 250 results/second with a
covering index on all three columns.

Now, if I do a more classic "select value from ranges where start = ?" this
provides 14 results/second

So I am pretty sure that I am doing something quite wrong.

Do you guys have any idea of what it could be? How can I obtain better
results?

Cheers,

Simone



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users