And Brent & Rémi, thanks for your suggestions. --Lee
On Thu, Nov 21, 2013 at 4:40 PM, Lee Hachadoorian < [email protected]> wrote: > This is what I came up with: > > SELECT * FROM table WHERE gid IN ( > SELECT DISTINCT unnest(array[a.gid, b.gid]) AS gid > FROM table a JOIN table b > ON (ST_Intersects(a.geom, b.geom) AND a.gid < b.gid) > ) > > I've posted my original attempts, your suggestions, and the final version > to my blog (with copious commentary) at > http://geospatial.commons.gc.cuny.edu/2013/11/21/finding-islands-or-the-converse/ > > Best, > --Lee > > > > On Thu, Nov 21, 2013 at 7:28 AM, Rémi Cura <[email protected]> wrote: > >> Hey , >> Oops, >> vocaulary error >> "because it's making (2 million)^2 comparisons." -> it's cartesian >> product. >> >> basically you want to compare every polygon to every other, this is a >> cartesian product, and this is what takes time. >> Choosing wel the function used (touches or intersects or ...) won't >> change much computing time as long as it uses index. >> In the same way, the logic you do after is not what takes time. >> >> Maybe you should not use the NOT IN syntaxe, but instead a where, or >> better, an "except", this should work much better >> http://www.postgresql.org/docs/9.3/static/queries-union.html >> >> >> Cheers, >> >> Rémi-C >> >> >> >> 2013/11/20 Lee Hachadoorian <[email protected]> >> >>> Thank you for these suggestions. I haven't replied yet because I am >>> testing them, and the queries take an hour or more to run on the 2 million >>> plus table. I will report back with some results. >>> >>> >>> On Wed, Nov 20, 2013 at 4:46 AM, Rémi Cura <[email protected]> wrote: >>> >>>> >>>> Maybe you can try the stupidest way to go, >>>> anyway you have to do a inner product because you have to consider each >>>> pair of polygons. >>>> >>> >>> You don't have to do inner product. `a LEFT JOIN b ... WHERE b.gid IS >>> NULL` is a typical unmatched query. By doing self join ON ST_Touches(), the >>> polygons with no neighbors are returned because the LEFT JOIN returns all >>> records from a, while ST_Touches() produces no match (because a polygon >>> doesn't touch itself) so b.gid IS NULL. >>> >>> >>>> CREATE TABLE result AS >>>> SELECT DISTINCT ON (a.gid) a.* >>>> FROM table AS a , table AS b >>>> WHERE ST_Intersects(a.geom, b.geom) = TRUE >>>> AND a.gid != b.gid >>>> >>> >>> Just to be clear, this produces the polygons *with* neighbors, which is >>> why I use this as a subselect with gid NOT IN (...) >>> >>> Still testing, >>> --Lee >>> >>> >>> >>> >>>> Now you can improve, because in the previous query you will compute >>>> (geom1,geom2) and (geom2,geom1), which is duplicate >>>> so you can try creating an index on gid , and >>>> CREATE TABLE result2 AS >>>> WITH direct_comp AS (SELECT a.gid AS agid, a.geom AS ageom , b.gid AS >>>> bgid, b.geom AS bgeom >>>> FROM table AS a , table AS b >>>> WHERE ST_Intersects(a.geom, b.geom) = TRUE >>>> AND a.gid < b.gid) >>>> SELECT agid AS gid, ageom AS geom >>>> FROM direct_comp >>>> UNION >>>> SELECT bgid AS gid, bgeom AS geom >>>> FROM direct_comp >>>> >>>> >>>> >>>> >>>> Now if you want to go really big, this will be difficult. >>>> Assuming your polygons bboxes have a homogenous size and/or are not too >>>> big reguarding the total area . >>>> The idea is to group polygons into same cells, so when you try to find >>>> which polygons intersects, instead of comparing a polygon to each other >>>> polygon, you compare a polygon to each other polygon in the same cell. >>>> >>>> >>>> create a grid of cells (big enough or you will have lot's of cells) >>>> index the table >>>> >>>> for each cell, get polygons intersecting it. This defines groups of >>>> polygons in the same cell, defined by a new id "group_id" >>>> index "group_id" >>>> >>>> Now you will have several options depending on your hardware/data >>>> 1) process a cell at a time if limited hardware, adding "WHERE >>>> group_id=XX" >>>> >>>> 2) do a massiv inner join on the group_id and compute all >>>> >>>> In fact you could avoid to create explicitly the cells, but it will be >>>> more complicated. >>>> >>>> Cheers, >>>> Rémi-C >>>> >>>> >>>> 2013/11/20 Brent Wood <[email protected]> >>>> >>>>> I figure you have spatially indexed the polygons already? >>>>> >>>>> Any way of pre-categorising your polygons - binning them in some way >>>>> that allows a non spatial test in the where clause to replace the spatial >>>>> test... >>>>> >>>>> eg: a boolean attribute set to indicate if a feature has any part <= a >>>>> particular x value or not. >>>>> >>>>> run this against the 2m features once to populate it, and assuming you >>>>> get a 50/50 split of T/F features, your 2m^2 query can instead include a >>>>> "where a.bool = b.bool", as we already know that if the boolean flag is >>>>> different, they cannot touch, so your query should involve a spatial test >>>>> only over 1m^2 instead... if you do this on both x & y, the boolean filter >>>>> will replace even more spatial calculations & drop them to 500,000^2 >>>>> tests... >>>>> >>>>> If you can pre-classify features so that non-spatial tests can reduce >>>>> the spatial ones (esp for features with lots of vertices) in a where >>>>> clause, such queries do run much faster, but you have the trade-off of the >>>>> time taken to carry out the classification... which is only >>>>> n*no_classes/2, >>>>> generally much faster than n^n >>>>> >>>>> Hope this makes sense... >>>>> >>>>> Brent Wood >>>>> ------------------------------ >>>>> *From:* Lee Hachadoorian <[email protected]> >>>>> *To:* PostGIS Users <[email protected]> >>>>> *Sent:* Wednesday, November 20, 2013 8:52 PM >>>>> *Subject:* [postgis-users] Finding Islands >>>>> >>>>> I am trying to find "islands", polygons in a (multi)polygon layer >>>>> which >>>>> are not connected to any other polygons in the same layer. What I came >>>>> up with runs in a couple of seconds on a layer with ~1000 geometries, >>>>> and a couple of minutes on a layer with ~23,000 geometries, but I want >>>>> to run it on a layer of 2 million+ and it's taking a L-O-O-O-NG time, >>>>> presumably because it's making (2 million)^2 comparisons. >>>>> >>>>> What I came up with is: >>>>> >>>>> SELECT a.* >>>>> FROM table a LEFT JOIN table b >>>>> ON (ST_Touches(a.geom, b.geom)) >>>>> WHERE b.gid is null >>>>> >>>>> or >>>>> >>>>> SELECT * >>>>> FROM table >>>>> WHERE gid NOT IN ( >>>>> SELECT DISTINCT a.gid >>>>> FROM table a JOIN table b >>>>> ON (ST_Intersects(a.geom, b.geom) AND a.gid != b.gid) >>>>> ) >>>>> >>>>> The first variant raises "NOTICE: geometry_gist_joinsel called with >>>>> incorrect join type". So I thought I could improve performance with an >>>>> INNER JOIN instead of an OUTER JOIN, and came up with the second >>>>> variant, and it does seem to perform somewhat better. Any suggestions >>>>> for how to speed up either approach? >>>>> >>>>> Best, >>>>> --Lee >>>>> >>>>> -- >>>>> Lee Hachadoorian >>>>> Assistant Professor in Geography, Dartmouth College >>>>> http://freecity.commons.gc.cuny.edu >>>>> >>>>> >>>>> _______________________________________________ >>>>> postgis-users mailing list >>>>> [email protected] >>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> postgis-users mailing list >>>>> [email protected] >>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >>>>> >>>> >>>> >>>> _______________________________________________ >>>> postgis-users mailing list >>>> [email protected] >>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >>>> >>> >>> >>> >>> -- >>> Lee Hachadoorian >>> Asst Professor of Geography, Dartmouth College >>> http://freecity.commons.gc.cuny.edu/ >>> >>> _______________________________________________ >>> postgis-users mailing list >>> [email protected] >>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >>> >> >> >> _______________________________________________ >> postgis-users mailing list >> [email protected] >> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users >> > > > > -- > Lee Hachadoorian > Asst Professor of Geography, Dartmouth College > http://freecity.commons.gc.cuny.edu/ > -- Lee Hachadoorian Asst Professor of Geography, Dartmouth College http://freecity.commons.gc.cuny.edu/
_______________________________________________ postgis-users mailing list [email protected] http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
