You also need to make sure the "no hit" does not degenerate into a table scan. RTree works well for this but is overall significantly slower than not using RTree since the purpose of RTree is to find the "small number of candidate records" that could possibly satisfy the query out of a haystack of records (that is, find the candidate needles in the haystack, so that you only need to closely examine that small number of candidates to find the needle rather than test the whole haystack).
However, if you know that there can only be one possible record which can satisfy the query (ie, there is only one possible needle in the haystack, and only one possible candidate, and you can find this candidate directly for testing), then the overhead of using RTree where it is not needed exceeds the benefits of using it. I see that the performance of the RTree is significantly slower than the equivalent "direct" method. Am I doing something wrong here or is that overhead simply because of the data structures that the RTRee implementation must maintain (which are not required in this case). Without RTree: >py -3 test.py Created 100000 random ranges in 00:00:00.681118 Creation Rate = 146817 Ranges/Second Looked up 1102019 random range values in 00:00:04.598245 Lookup Rate = 239660 Values/Second Failure Rate = 257270 Values/Second Success Rate = 228828 Values/Second With RTree: >py -3 test.py --rtree Created 100000 random ranges in 00:00:02.139742 Creation Rate = 46734 Ranges/Second Looked up 1100681 random range values in 00:00:13.662556 Lookup Rate = 80561 Values/Second Failure Rate = 119874 Values/Second Success Rate = 65627 Values/Second And that came from the following test program (in python) where the only difference is the SQL statements being used. Because the ranges are random and the lookups are random, the timings given are subject to differences on every run, however, the averaged rates are relatively stable given a large number of random ranges and query values. --- test.py --- from __future__ import absolute_import, division, print_function, unicode_literals import datetime import random import sys import time import sqlite3 # Convert a value in seconds to HMS format HMS = lambda t: (datetime.datetime.min + datetime.timedelta(seconds=t)).time().isoformat() # Create constants for the SQL statements we will use if '--rtree' in sys.argv: create_sql = 'create virtual table ranges using rtree(id, start, stop, +value);' query_sql = 'select value from ranges where ? between start and stop;' else: create_sql = 'create table ranges (start integer primary key, stop integer not null, value integer not null);' query_sql = 'select value from (select stop, value from ranges where start <= ?1 order by start desc limit 1) where ?1 <= stop;' insert_sql = 'insert into ranges (start, stop, value) values (?, ?, ?);' # Open our database and do not use automagical transactions db = sqlite3.connect(':memory:', isolation_level=None) # Create our table db.execute(create_sql) # Create our random range data recs = 100000 start = 0 st = time.time() for cnt in range(recs): start += random.randint(1, 10) stop = start + random.randint(1, 10) value = int((start + stop) / 2) db.execute(insert_sql, (start, stop, value)) start = stop stop = stop + random.randint(1, 10) et = time.time() - st print('Created', recs, 'random ranges in', HMS(et), 'Creation Rate =', int(recs / et), 'Ranges/Second') db.execute('analyze;') # Generate a bunch of random values and perform the range query eta = 0.0 ets = 0.0 etf = 0.0 fcnt = 0 scnt = 0 tcnt = 0 for i in range(stop): x = random.randint(0, stop) lst = time.time() row = db.execute(query_sql, (x, )).fetchone() let = time.time() - lst if row: value = row[0] ets += let scnt += 1 else: value = None etf += let fcnt += 1 eta += let tcnt += 1 print('Looked up', stop, 'random range values in', HMS(eta), 'Lookup Rate =', int(tcnt / eta), 'Values/Second') print('Failure Rate =', int(fcnt / etf), 'Values/Second') print('Success Rate =', int(scnt / ets), 'Values/Second') --- The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume. >-----Original Message----- >From: sqlite-users [mailto:sqlite-users- >boun...@mailinglists.sqlite.org] On Behalf Of E.Pasma >Sent: Friday, 26 October, 2018 16:28 >To: SQLite mailing list >Subject: Re: [sqlite] Optmize queries on ranges > >About the rtree extension, which was the first idea. > >The extension appears available without any special installation >option. This is easier than what is mentioned in >https://sqlite.org/rtree.html <https://sqlite.org/rtree.html> chapter >2: "Compiling The R*Tree Module". >This chapter may as well be left out? > >With test data where the ranges are mostly non-overlapping, the query >now runs faster than without rtree. Even though both run within a >millisecond rtree is ten times faster. >With order by and limit the timing remains superior. But this relies >on strictly non-overlapping ranges. >Below my test script > > >/* query 1: using rtree built-in extension */ >; >create virtual table ranges using rtree(id, minX, maxX, +value); >with r as (select 0 as r union all select r+1 from r where r<1000000) >insert into ranges (minX, maxX, value) >select r*10+1,r*10+10,r*10+5 from r >; >select value from ranges where 123456 between minx and maxx >; >123455 >Run Time: real 0.000 user 0.000135 sys 0.000018 > >/* query 2: using index on minx+maxx */ >drop table ranges >; >create table ranges (minx int, maxx int, value int) >; >with r as (select 0 as r union all select r+1 from r where r<1000000) >insert into ranges (minX, maxX, value) >select r*10+1,r*10+10,r*10+5 from r >; >create unique index ranges_minx_maxx on ranges(minx,maxx) >; >select value from ranges where 123456 between minx and maxx >; >123455 >Run Time: real 0.002 user 0.001415 sys 0.000016 > >/* query 3: same, assuming non-overlapping ranges */ >select value from ranges where 123456 between minx and maxx >order by minx desc limit 1 >; >123455 >Run Time: real 0.000 user 0.000057 sys 0.000000 > > > >_______________________________________________ >sqlite-users mailing list >sqlite-users@mailinglists.sqlite.org >http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users