Dear group,
since a couple of years now, I'm using SQLite for storing graphic objects
like polygons and lines, with great success. Using the rtree extension
makes spatial queries very efficient.
In my lastest project, however, these objects are just single points
(coordinate pairs). As I understand it, rtree isn't the appropriate tool
here because it's for range queries. Of course, I could store every
coordinate twice in an rtee index, thus simulating ranges (with a width of
zero). However, I guess that would be a waste of space with probably no benefit.
So I tried a few things. The table definition looks like this:
CREATE TABLE "pdata"(
"gx" DOUBLE NOT NULL, -- x coordinate
"gy" DOUBLE NOT NULL, -- y coordinate
"ID" INTEGER PRIMARY KEY NOT NULL
/* a couple more fields here */
)
Then I established an index over gx and gy:
CREATE INDEX "pdata_spatial_index" ON "pdata"("gx","gy");
EQP shows that this index is used (not surprisingly) for every spatial query
like:
SELECT * FROM pdata WHERE gx BETWEEN 210000 AND 220000 AND gy BETWEEN 200000
AND 390000
However, EQP also shows that it is only used with the value of gx, not gy:
SEARCH TABLE pdata USING INDEX pdata_spatial_index (gx>? AND gx<?)
Since storing "gy" in this index again seems just a waste of space, I left
it off. OTOH, it doesn't feel right to have no index at all on "gy", so this
fields got its own index:
CREATE INDEX "pdata_spatial_index_x" ON "pdata"("gx");
CREATE INDEX "pdata_spatial_index_y" ON "pdata"("gy");
With ANALYZE data available, SQLite is now smart enough to choose the most
efficient search path, depending on the width / height of the search range:
SELECT * FROM pdata WHERE gx BETWEEN 210000 AND 220000 and gy BETWEEN 200000
AND 390000
--> SEARCH TABLE pdata USING INDEX pdata_spatial_index_x (gx>? AND gx<?)
SELECT * FROM pdata WHERE gx BETWEEN 200000 AND 300000 and gy BETWEEN 210000
AND 220000
--> SEARCH TABLE pdata USING INDEX pdata_spatial_index_y (gy>? AND gy<?)
So far this seems the most efficient solution to me. Any input on this?
Wolfgang