Thank you for these suggestions. I haven't replied yet because I am testing them, and the queries take an hour or more to run on the 2 million plus table. I will report back with some results.
On Wed, Nov 20, 2013 at 4:46 AM, Rémi Cura <remi.c...@gmail.com> wrote: > > Maybe you can try the stupidest way to go, > anyway you have to do a inner product because you have to consider each > pair of polygons. > You don't have to do inner product. `a LEFT JOIN b ... WHERE b.gid IS NULL` is a typical unmatched query. By doing self join ON ST_Touches(), the polygons with no neighbors are returned because the LEFT JOIN returns all records from a, while ST_Touches() produces no match (because a polygon doesn't touch itself) so b.gid IS NULL. > CREATE TABLE result AS > SELECT DISTINCT ON (a.gid) a.* > FROM table AS a , table AS b > WHERE ST_Intersects(a.geom, b.geom) = TRUE > AND a.gid != b.gid > Just to be clear, this produces the polygons *with* neighbors, which is why I use this as a subselect with gid NOT IN (...) Still testing, --Lee > Now you can improve, because in the previous query you will compute > (geom1,geom2) and (geom2,geom1), which is duplicate > so you can try creating an index on gid , and > CREATE TABLE result2 AS > WITH direct_comp AS (SELECT a.gid AS agid, a.geom AS ageom , b.gid AS > bgid, b.geom AS bgeom > FROM table AS a , table AS b > WHERE ST_Intersects(a.geom, b.geom) = TRUE > AND a.gid < b.gid) > SELECT agid AS gid, ageom AS geom > FROM direct_comp > UNION > SELECT bgid AS gid, bgeom AS geom > FROM direct_comp > > > > > Now if you want to go really big, this will be difficult. > Assuming your polygons bboxes have a homogenous size and/or are not too > big reguarding the total area . > The idea is to group polygons into same cells, so when you try to find > which polygons intersects, instead of comparing a polygon to each other > polygon, you compare a polygon to each other polygon in the same cell. > > > create a grid of cells (big enough or you will have lot's of cells) > index the table > > for each cell, get polygons intersecting it. This defines groups of > polygons in the same cell, defined by a new id "group_id" > index "group_id" > > Now you will have several options depending on your hardware/data > 1) process a cell at a time if limited hardware, adding "WHERE group_id=XX" > > 2) do a massiv inner join on the group_id and compute all > > In fact you could avoid to create explicitly the cells, but it will be > more complicated. > > Cheers, > Rémi-C > > > 2013/11/20 Brent Wood <pcr...@pcreso.com> > >> I figure you have spatially indexed the polygons already? >> >> Any way of pre-categorising your polygons - binning them in some way that >> allows a non spatial test in the where clause to replace the spatial test... >> >> eg: a boolean attribute set to indicate if a feature has any part <= a >> particular x value or not. >> >> run this against the 2m features once to populate it, and assuming you >> get a 50/50 split of T/F features, your 2m^2 query can instead include a >> "where a.bool = b.bool", as we already know that if the boolean flag is >> different, they cannot touch, so your query should involve a spatial test >> only over 1m^2 instead... if you do this on both x & y, the boolean filter >> will replace even more spatial calculations & drop them to 500,000^2 >> tests... >> >> If you can pre-classify features so that non-spatial tests can reduce the >> spatial ones (esp for features with lots of vertices) in a where clause, >> such queries do run much faster, but you have the trade-off of the time >> taken to carry out the classification... which is only n*no_classes/2, >> generally much faster than n^n >> >> Hope this makes sense... >> >> Brent Wood >> ------------------------------ >> *From:* Lee Hachadoorian <lee.hachadooria...@gmail.com> >> *To:* PostGIS Users <postgis-us...@postgis.refractions.net> >> *Sent:* Wednesday, November 20, 2013 8:52 PM >> *Subject:* [postgis-users] Finding Islands >> >> I am trying to find "islands", polygons in a (multi)polygon layer which >> are not connected to any other polygons in the same layer. What I came >> up with runs in a couple of seconds on a layer with ~1000 geometries, >> and a couple of minutes on a layer with ~23,000 geometries, but I want >> to run it on a layer of 2 million+ and it's taking a L-O-O-O-NG time, >> presumably because it's making (2 million)^2 comparisons. >> >> What I came up with is: >> >> SELECT a.* >> FROM table a LEFT JOIN table b >> ON (ST_Touches(a.geom, b.geom)) >> WHERE b.gid is null >> >> or >> >> SELECT * >> FROM table >> WHERE gid NOT IN ( >> SELECT DISTINCT a.gid >> FROM table a JOIN table b >> ON (ST_Intersects(a.geom, b.geom) AND a.gid != b.gid) >> ) >> >> The first variant raises "NOTICE: geometry_gist_joinsel called with >> incorrect join type". So I thought I could improve performance with an >> INNER JOIN instead of an OUTER JOIN, and came up with the second >> variant, and it does seem to perform somewhat better. Any suggestions >> for how to speed up either approach? >> >> Best, >> --Lee >> >> -- >> Lee Hachadoorian >> Assistant Professor in Geography, Dartmouth College >> http://freecity.commons.gc.cuny.edu >> >> >> _______________________________________________ >> postgis-users mailing list >> postgis-users@lists.osgeo.org >> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >> >> >> >> _______________________________________________ >> postgis-users mailing list >> postgis-users@lists.osgeo.org >> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >> > > > _______________________________________________ > postgis-users mailing list > postgis-users@lists.osgeo.org > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users > -- Lee Hachadoorian Asst Professor of Geography, Dartmouth College http://freecity.commons.gc.cuny.edu/
_______________________________________________ postgis-users mailing list postgis-users@lists.osgeo.org http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users