On Mon, Jan 1, 2018 at 10:45 AM, Clemens Ladisch <clem...@ladisch.de> wrote:
> Wolfgang Enzinger wrote: > > First, query the overall extent of your data, like this: > > SELECT MIN(mingx),MAX(maxgx),MIN(mingy),MAX(maxgy) FROM > flst_shape_index; > > This results in a full table scan. Instead of caching these values > manually, > it would be a better idea to read them from the index: > > SELECT rtreenode(2, data) FROM flst_shape_index_node WHERE nodeno = 1; > > (rtreenode() is undocumented; maybe you should use your own decoder.) > Intriguing! I'm looking at the output on my local data set (all Texas roads, 3+ million records, large, but not all that much in GIS systems): SELECT rtreenode(2, data) FROM rtree_tiger_shp_geom_node WHERE nodeno = 1; {26338 -100.598 -96.2836 25.8411 30.6962} {26337 -96.9349 -93.5195 28.598 31.1915} {43295 -96.9621 -93.6133 30.6486 33.9442} {62086 -106.644 -98.0079 28.887 32.1568} {80094 -97.6371 -96.4565 28.5382 33.9419} {97065 -98.5755 -97.2869 28.686 34.1405} {108645 -104.096 -98.3331 31.6511 36.5007} So, if I'm guessing right, each set of numbers is a count of features and then the rectangle intersecting them? I've prepared a picture with the data: https://drive.google.com/file/d/15WpyfWNKdVeBoLDDRCoCW2xx87T6qf_Z/view?usp=sharing This query is indeed quite a bit faster and more interesting than scanning the entire index virtual table, wondering, is there also a way to get the "next level" of the rtree? This one returns only a few records, I would not mind having a bit more in memory for more accurate pre-checks (dabase access can be optimized out fully if the requested rectangle falls outside of the rtree boxes no?) What worries me is that these functions are undocumented, and thus, I imaging, unsupported and at risk of being removed at any time. Do you have a feeling of how likely is this to happen? > > > Let SQLite know about that likelihood in a JOIN query > The query is being generated after translation from another generic filter language and at the moment it's expressed as a sub-query (cause it can be negated, related with other conditions by and/or, and so on). I've tried to use it with little luck, the subquery is always run according to explain plan: sqlite> explain query plan SELECT "fid","geom" as "geom" FROM "tiger_shp" WHERE likelihood("fid" IN (SELECT id FROM "rtree_tiger_shp_geom" r WHERE r.maxx >= -98.41324970985764 AND r.minx <= -98.33594825643836 AND r.maxy >= 30.532285631231357 AND r.miny <= 30.595068029284644), *0.01*); 0|0|0|SEARCH TABLE tiger_shp USING INTEGER PRIMARY KEY (rowid=?) 0|0|0|EXECUTE LIST SUBQUERY 1 1|0|0|SCAN TABLE rtree_tiger_shp_geom AS r VIRTUAL TABLE INDEX 2:D1B0D3B2 sqlite> explain query plan SELECT "fid","geom" as "geom" FROM "tiger_shp" WHERE likelihood("fid" IN (SELECT id FROM "rtree_tiger_shp_geom" r WHERE r.maxx >= -98.41324970985764 AND r.minx <= -98.33594825643836 AND r.maxy >= 30.532285631231357 AND r.miny <= 30.595068029284644), *0.99*); 0|0|0|SEARCH TABLE tiger_shp USING INTEGER PRIMARY KEY (rowid=?) 0|0|0|EXECUTE LIST SUBQUERY 1 1|0|0|SCAN TABLE rtree_tiger_shp_geom AS r VIRTUAL TABLE INDEX 2:D1B0D3B2 Oh well, since the code is generating the SQL, it can directly omit the subquery. Is there any literature on what a good threshold would be? Cheers Andrea _______________________________________________ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users