Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

Andrea Aime Mon, 01 Jan 2018 08:23:04 -0800

On Mon, Jan 1, 2018 at 10:45 AM, Clemens Ladisch <clem...@ladisch.de> wrote:

> Wolfgang Enzinger wrote:
> > First, query the overall extent of your data, like this:
> > SELECT MIN(mingx),MAX(maxgx),MIN(mingy),MAX(maxgy) FROM
> flst_shape_index;
>
> This results in a full table scan.  Instead of caching these values
> manually,
> it would be a better idea to read them from the index:
>
>   SELECT rtreenode(2, data) FROM flst_shape_index_node WHERE nodeno = 1;
>
> (rtreenode() is undocumented; maybe you should use your own decoder.)
>

Intriguing! I'm looking at the output on my local data set (all Texas
roads, 3+ million records, large,
but not all that much in GIS systems):

SELECT rtreenode(2, data) FROM rtree_tiger_shp_geom_node WHERE nodeno = 1;
{26338 -100.598 -96.2836 25.8411 30.6962} {26337 -96.9349 -93.5195 28.598
31.1915} {43295 -96.9621 -93.6133 30.6486 33.9442} {62086 -106.644 -98.0079
28.887 32.1568} {80094 -97.6371 -96.4565 28.5382 33.9419} {97065 -98.5755
-97.2869 28.686 34.1405} {108645 -104.096 -98.3331 31.6511 36.5007}

So, if I'm guessing right, each set of numbers is a count of features and
then the rectangle intersecting them?
I've prepared a picture with the data:

https://drive.google.com/file/d/15WpyfWNKdVeBoLDDRCoCW2xx87T6qf_Z/view?usp=sharing

This query is indeed quite a bit faster and more interesting than scanning
the entire index virtual table, wondering,
is there also a way to get the "next level" of the rtree?
This one returns only a few records, I would not mind having a bit more in
memory for more accurate pre-checks (dabase
access can be optimized out fully if the requested rectangle falls outside
of the rtree boxes no?)

What worries me is that these functions are undocumented, and thus, I
imaging, unsupported and at risk
of being removed at any time. Do you have a feeling of how likely is this
to happen?

>
> > Let SQLite know about that likelihood in a JOIN query
>

The query is being generated after translation from another generic filter
language and at the moment it's
expressed as a sub-query (cause it can be negated, related with other
conditions by and/or, and so on).
I've tried to use it with little luck, the subquery is always run according
to explain plan:

sqlite> explain query plan SELECT "fid","geom" as "geom" FROM "tiger_shp"
WHERE  likelihood("fid" IN (SELECT id FROM "rtree_tiger_shp_geom" r WHERE
r.maxx >= -98.41324970985764 AND r.minx <= -98.33594825643836 AND r.maxy >=
30.532285631231357 AND r.miny <= 30.595068029284644), *0.01*);
0|0|0|SEARCH TABLE tiger_shp USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE LIST SUBQUERY 1
1|0|0|SCAN TABLE rtree_tiger_shp_geom AS r VIRTUAL TABLE INDEX 2:D1B0D3B2
sqlite> explain query plan SELECT "fid","geom" as "geom" FROM "tiger_shp"
WHERE  likelihood("fid" IN (SELECT id FROM "rtree_tiger_shp_geom" r WHERE
r.maxx >= -98.41324970985764 AND r.minx <= -98.33594825643836 AND r.maxy >=
30.532285631231357 AND r.miny <= 30.595068029284644), *0.99*);
0|0|0|SEARCH TABLE tiger_shp USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE LIST SUBQUERY 1
1|0|0|SCAN TABLE rtree_tiger_shp_geom AS r VIRTUAL TABLE INDEX 2:D1B0D3B2

Oh well, since the code is generating the SQL, it can directly omit the
subquery. Is there any literature on what a good threshold
would be?

Cheers
Andrea
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Knowing when not to use a R-Tree index, and clustering tables along spatial indexes

Reply via email to