Hi PostGISers,
sorry this is a long one ;)
I am trying to use the ST_DWithin and ST_Distance functions to get a list of
nearest neighboring roads in relation to a point location. The dataset is big
its the tiger shape files which I loaded into PostGIS and clustered on the
gist_index, SRID is 9102003 (or EPSG code 102003) an Albers-Equal-Area
projection which has units in meters. I researched earlier posts and it seemed
that several recommendations pointed out that the use of ST_DWithin and
ST_Distance would do exactly what I wanted to do.
I created an spatial index on the roads layer and afterwards ran vacuum analyze
on the roads layer.
CREATE INDEX roads_gidx ON roads USING gist (the_geom); ALTER TABLE roads
CLUSTER ON roads_gidx;
and a second index on the transformed geom:
CREATE INDEX roads_tranform_9102003_gidx ON roads USING gist
(st_transform(the_geom, 9102003)) WHERE the_geom IS NOT NULL;
My "dream" SQL query which I wanted to run is "Find all road segments within 50
meter from a point"" e.g. like this
SELECT * FROM roads WHERE ST_DWithin(transform(roads.the_geom,9102003),
GeomFromText('POINT(-2026631.3499604 1355302.2855377)', 9102003), 50)
ORDER BY ST_Distance(transform(roads.the_geom,9102003),
GeomFromText('POINT(-2026631.3499604 1355302.2855377)', 9102003)) limit 1;
However this query was so slow that I never managed yet to wait it out yet :(
.. I believe it would be a couple of hours ...
Thus I tried to run it in parts to fnd out what is =going on
Since no index was used I tried to convince the query planner to use a bounding
box (which ST_DWithin should use anyway) by adding one manually:
SELECT * FROM roads WHERE ST_DWithin(transform(roads.the_geom,9102003),
GeomFromText('POINT(-2260485.2563774 251816.03973006)', 9102003), 50) and
transform(roads.the_geom,9102003) && GeomFromText('POINT(-2260485.2563774
251816.03973006)', 9102003) LIMIT 1;
The query returns one row and
Explain Analyze give me:
"Limit (cost=0.00..2450234.77 rows=1 width=879) (actual
time=1277669.052..1277669.053 rows=1 loops=1)"
" -> Seq Scan on roads (cost=0.00..2450234.77 rows=1 width=879) (actual
time=1277669.051..1277669.051 rows=1 loops=1)"
" Filter: ((transform(the_geom, 9102003) &&
'0103000020B3E28A000100000005000000000000C01B3F41C100000040B0BB0E41000000C01B3F41C100000060D0BE0E41000000A0E93E41C100000060D0BE0E41000000A0E93E41C100000040B0BB0E41000000C01B3F41C100000040B0BB0E41'::geometry)
AND _st_dwithin(transform(the_geom, 9102003),
'0101000020B3E28A0082F9D0A0023F41C163FE5D5140BD0E41'::geometry, 50::double
precision) AND (transform(the_geom, 9102003) &&
'0101000020B3E28A0082F9D0A0023F41C163FE5D5140BD0E41'::geometry) AND
('0101000020B3E28A0082F9D0A0023F41C163FE5D5140BD0E41'::geometry &&
st_expand(transform(the_geom, 9102003), 50::double precision)))"
"Total runtime: 1277669.134 ms"
woaahh!
I am wondering why the spatial index was not used which - at least in theory -
should be obvious to the query planner ....
This discussion forum thread from August 2009 discusses similar issues:
<http://www.nabble.com/-PostGIS---230%3A-st_expand-seems-to-affect-the-execution-order-wich-affects-st_dwithin-tt24809346.html#a24809346>
http://www.nabble.com/-PostGIS---230%3A-st_expand-seems-to-affect-the-execution-order-wich-affects-st_dwithin-tt24809346.html#a24809346
Namely that PostGIS performs an sequential scan instead of an index scan. From
what I understand in the above thread it seems that the query planner for
large/huge tables (like my tables of all US roads loaded from TIGER) does
somehow not get the correct cost inputs to make the decision to use the spatial
index and this might get solved in PostGIS 1.1 or 1.5 possibly (I'm using
PostGIS 1.3.6RC1)?
I also tried to change the cost for the functions used to convince PostGIS to
do the bounding box operation first but with no success. I also temporarily to
"disabling" or better "discouraging" seq scans by: set enable_seqscan=false and
after the query setting it back to "normal": set enable_seqscan=true; - still
PostGIS does the seq scan first.
Another forum thread seems to report the same issue:
<http://www.nabble.com/use-of-spatial-index-in-nearest-neighbour-query---to25291925.html#a25291925>
http://www.nabble.com/use-of-spatial-index-in-nearest-neighbour-query---to25291925.html#a25291925
So would anybody have suggestions how I can work around this behavior and still
get the closets roads from a point location or will that just not work right
now on huge data sets?
Is there any way to really force PostGIS to use the gist index ?
I am using the following software versions:
PostgreSQL 8.3.7 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Ubuntu
4.3.3-5ubuntu4) 4.3.3
POSTGIS="1.3.6RC1" GEOS="3.1.0-CAPI-1.5.0" PROJ="Rel. 4.6.1, 21 August 2008"
USE_STATS
Cheers
Karsten
Karsten Vennemann
Terra GIS LTD
Seattle, WA, USA
<http://www.terragis.net> www.terragis.net
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
More details:
Strangely the above query also works faster when using SRID 4326 in degress
which doesn't seem to make sense (radius = 50 degrees - no way I don't get it)
SELECT * FROM roads WHERE ST_DWithin(roads.the_geom,
transform(GeomFromText('POINT(-2260485.2563774 251816.03973006)',
9102003),4326), 50) and
roads.the_geom && transform(GeomFromText('POINT(-2260485.2563774
251816.03973006)', 9102003),4326) LIMIT 1;
This returns one record in 4607 ms.
Explain of this query spits out:
"Limit (cost=0.00..9.24 rows=1 width=879) (actual time=437.187..437.188 rows=1
loops=1)"
" -> Index Scan using roads_gidx on roads (cost=0.00..9.24 rows=1 width=879)
(actual time=437.184..437.184 rows=1 loops=1)"
" Index Cond: ((the_geom &&
'0103000020E6100000010000000500000000000060837E65C000000080970C2AC000000060837E65C0000000206DBE5540000000A006FD51C0000000206DBE5540000000A006FD51C000000080970C2AC000000060837E65C000000080970C2AC0'::geometry)
AND (the_geom &&
'0101000020E610000018925BB2067D5EC0C8CF2D22DA7C4240'::geometry))"
" Filter: ((the_geom &&
'0103000020E6100000010000000500000000000060837E65C000000080970C2AC000000060837E65C0000000206DBE5540000000A006FD51C0000000206DBE5540000000A006FD51C000000080970C2AC000000060837E65C000000080970C2AC0'::geometry)
AND _st_dwithin(the_geom,
'0101000020E610000018925BB2067D5EC0C8CF2D22DA7C4240'::geometry, 50::double
precision) AND (the_geom &&
'0101000020E610000018925BB2067D5EC0C8CF2D22DA7C4240'::geometry) AND
('0101000020E610000018925BB2067D5EC0C8CF2D22DA7C4240'::geometry &&
st_expand(the_geom, 50::double precision)))"
"Total runtime: 437.500 ms"
I also tried to use a bounding box with a manually created polygon but I am
running into the same issues - its running slow doing a seq scan: 1519138 ms:
like:
SELECT * FROM roads WHERE transform(roads.the_geom,9102003) &&
ST_Polygon(ST_GeomFromText('LINESTRING(-2026098.7705385
1354954.1944117,-2026979.4410859 1354954.1944117,-2026979.4410859
1356078.528747,-2026098.7705385 1356078.528747,-2026098.7705385
1354954.1944117)'), 9102003) and
ST_intersects(transform(roads.the_geom,9102003),
ST_Polygon(ST_GeomFromText('LINESTRING(-2026098.7705385
1354954.1944117,-2026979.4410859 1354954.1944117,-2026979.4410859
1356078.528747,-2026098.7705385 1356078.528747,-2026098.7705385
1354954.1944117)'), 9102003));
Explain Analyze of the query above gives me:
"Seq Scan on roads (cost=0.00..2144881.90 rows=1 width=879) (actual
time=1491788.995..1519081.606 rows=45 loops=1)"
" Filter: ((transform(the_geom, 9102003) &&
'0103000020B3E28A000100000005000000DA0242C572EA3EC115F7C431CAAC34416B01EB70E3ED3EC115F7C431CAAC34416B01EB70E3ED3EC1A1F65B872EB13441DA0242C572EA3EC1A1F65B872EB13441DA0242C572EA3EC115F7C431CAAC3441'::geometry)
AND (transform(the_geom, 9102003) &&
'0103000020B3E28A000100000005000000DA0242C572EA3EC115F7C431CAAC34416B01EB70E3ED3EC115F7C431CAAC34416B01EB70E3ED3EC1A1F65B872EB13441DA0242C572EA3EC1A1F65B872EB13441DA0242C572EA3EC115F7C431CAAC3441'::geometry)
AND _st_intersects(transform(the_geom, 9102003),
'0103000020B3E28A000100000005000000DA0242C572EA3EC115F7C431CAAC34416B01EB70E3ED3EC115F7C431CAAC34416B01EB70E3ED3EC1A1F65B872EB13441DA0242C572EA3EC1A1F65B872EB13441DA0242C572EA3EC115F7C431CAAC3441'::geometry))"
"Total runtime: 1519138.528 ms"
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users