Thanks Mark. I will indeed do the ST_Dump, but I don't think it will help much (very few multis). I think the tiling will help a lot. What I wonder, though, is how long the tiling itself will take? After all, it's still an ST_Intersection operation to tile each geometry against each tile.

Is there a quad-tree-style tiling algorithm in a function? If I do 256 x 256 tiles, doing it all at once would be 65536 operations of ST_Intersection(complex_geom, square). With a quad tree I'd only have 4 operations of ST_Intersection(complex_geom, square), then 16 of (a_little_less_complex_geom, square), then 64 of (even_less_complex_geom, square), and so on, for 8 levels of nesting. The last level would still be 65536 operations, but the geometries should be a lot simpler by the time I get down to that level. What do you think, is it worth trying? Or is intersecting with a square fairly cheap regardless of how complex the other geometry is?
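Something like the following is what I have in mind - an untested sketch, where parent_polys, its geom column, the extent and the SRID are all placeholders for my real data:

-- Quad-tree tiling sketch: each level splits every tile into 4 quadrants
-- and clips, so the deepest level intersects already-simplified geometry.
WITH RECURSIVE quadtree AS (
    SELECT poly_id,
           ST_Intersection(geom, ST_MakeEnvelope(0, 0, 1024, 1024, 3776)) AS geom,
           ST_MakeEnvelope(0, 0, 1024, 1024, 3776) AS tile,  -- placeholder extent/SRID
           0 AS level
    FROM parent_polys
    UNION ALL
    SELECT q.poly_id,
           ST_Intersection(q.geom, sub.tile),
           sub.tile,
           q.level + 1
    FROM quadtree q
    CROSS JOIN LATERAL (
        -- the four quadrants of the current tile
        SELECT ST_MakeEnvelope(
                   ST_XMin(q.tile) + dx * (ST_XMax(q.tile) - ST_XMin(q.tile)) / 2,
                   ST_YMin(q.tile) + dy * (ST_YMax(q.tile) - ST_YMin(q.tile)) / 2,
                   ST_XMin(q.tile) + (dx + 1) * (ST_XMax(q.tile) - ST_XMin(q.tile)) / 2,
                   ST_YMin(q.tile) + (dy + 1) * (ST_YMax(q.tile) - ST_YMin(q.tile)) / 2,
                   ST_SRID(q.tile)) AS tile
        FROM generate_series(0, 1) AS dx, generate_series(0, 1) AS dy
    ) AS sub
    WHERE q.level < 8
      AND ST_Intersects(q.geom, sub.tile)
)
SELECT poly_id, tile, geom
FROM quadtree
WHERE level = 8 AND NOT ST_IsEmpty(geom);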
(Your reply initially fell into my junk mail, so I missed it, and I also haven't tried these things yet, partly because I missed your email.)

--
John Abraham
[email protected]
403-232-1060

On Feb 19, 2015, at 6:52 PM, Mark Wynter <[email protected]> wrote:

>> So I was running this query for 866000 s (10 days) before I decided to kill it:
>
>> One potential thing I've realized is that a few of the geometries in tazjan2 are multipolygons, not single polygons. But it's only a few. There are a few very large and complex polygons in lancover_polygons_snap, but again, most of the 998031 are simple small polygons; about half would be ST_CoveredBy the polygons in tazjan2, and most of the rest would only overlap two or three of the polygons in tazjan2.
>
> A common offender - the multipolygons.
>
> You can get an immediate performance improvement by dumping ST_Dump(wkb_geometry).geom into a new table, with a new serial id but retaining the original poly_id, so you don't lose the multipolygon provenance of each polygon. Complex operations on the multipolygon table, like ST_Intersection(), are extremely slow, because the query has to access and compute the operation on each of the geometry elements contained within the multi.
>
> Another massive performance gain can be achieved by "tiling" your dumped polygon table, which has the effect of breaking your geometries into smaller features that you can "union" back together again after running ST_Intersection on your vector tiles. Don't try to tile your multipolygon table - you'll still be waiting days.
>
> Take the coastline of Australia - a single multipolygon:
>
> SELECT Count(ogc_fid) AS num_rows, SUM(ST_NumGeometries(wkb_geometry)) AS num_geoms, SUM(ST_NPoints(wkb_geometry)) AS num_points FROM abs_aus11;
>
>  num_rows | num_geoms | num_points
> ----------+-----------+------------
>         1 |      6718 |    1622598
>
> If you need to query the intersection of a polygon with the land, and the water, it's near impossible. Many, many days.
>
> Hence we need to take a different approach.
>
> The first thing you can do is to dump the multipolygon table.
> Then create a regular vector grid.
> Then tile both your polygon tables using the vector grid.
> Put indexes on the resultant tiled tables.
> Then run your ST_Intersects.
>
> The result is many more geometries, and points, but now we get the benefit of geometry indexes.
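> In SQL the workflow looks roughly like this (a sketch only - the dumped and grid table names, the 100 km cell size, the extent and SRID 3577 are illustrative, not the exact code I ran):
>
> -- 1. dump multipolygons to single polygons, keeping provenance
> CREATE TABLE abs_aus11_dumped AS
> SELECT row_number() OVER () AS gid, poly_id, geom
> FROM (SELECT ogc_fid AS poly_id,
>              (ST_Dump(wkb_geometry)).geom AS geom
>       FROM abs_aus11) AS d;
> CREATE INDEX ON abs_aus11_dumped USING gist (geom);
>
> -- 2. regular vector grid over the data extent (100 km cells here)
> CREATE TABLE grid AS
> SELECT row_number() OVER () AS tile_id,
>        ST_MakeEnvelope(x, y, x + 100000, y + 100000, 3577) AS geom
> FROM generate_series(-2000000, 2100000, 100000) AS x,
>      generate_series(-5000000, -900000, 100000) AS y;
> CREATE INDEX ON grid USING gist (geom);
>
> -- 3. tile the dumped polygons against the grid
> CREATE TABLE abs_aus11_tiled AS
> SELECT row_number() OVER () AS tid, d.poly_id, g.tile_id,
>        ST_Intersection(d.geom, g.geom) AS wkb_geometry
> FROM abs_aus11_dumped AS d
> JOIN grid AS g ON ST_Intersects(d.geom, g.geom);
> CREATE INDEX ON abs_aus11_tiled USING gist (wkb_geometry);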
> As a result of tiling the Australian coastline, we now get:
>
> SELECT Count(tid) AS num_rows, SUM(ST_NumGeometries(wkb_geometry)) AS num_geoms, SUM(ST_NPoints(wkb_geometry)) AS num_points FROM abs_aus11_tiled;
>
>  num_rows | num_geoms | num_points
> ----------+-----------+------------
>     17222 |     29020 |    6513770
>
> Each geometry also has a tile_id and the original poly_id. The tile_ids are really useful for subsetting your queries, and for use as an additional join condition. At any time you can rapidly rebuild your original geometries by doing ST_Union() GROUP BY poly_id.
>
> Using the approach of dumping and tiling, queries that once took days now take minutes at most. And as Remi mentioned, you can parallelise the process. The concept of vector tiling in PostGIS is analogous to Map Reduce and assigning key-value pairs: it's about chunking your data, doing lots of fast operations on the smaller bits, and then consolidating your outputs. You don't necessarily have to write a SQL function to do the parallelisation. My preference is Bash and GNU Parallel, because you can write vanilla SQL and execute it via psql in parallel. For me it's now about the speed at which I can write the SQL.
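> The rebuild is then a single aggregate (again with the illustrative names from above); the trailing comments sketch one way - a modulo split on tile_id, my GNU Parallel runs do the equivalent - to slice the tiled work across parallel psql sessions:
>
> -- reassemble the original features from their tiles
> CREATE TABLE abs_aus11_rebuilt AS
> SELECT poly_id, ST_Union(wkb_geometry) AS wkb_geometry
> FROM abs_aus11_tiled
> GROUP BY poly_id;
>
> -- for parallelisation, give each psql session a slice of the tiles, e.g.
> --   ... WHERE tile_id % 8 = 0;   -- session 1
> --   ... WHERE tile_id % 8 = 1;   -- session 2, and so on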
> So we're now starting to apply Big Data concepts within PostGIS… Holy smoke… and you can apply the same concepts to a wide range of PostGIS operations - scale up, scale out…
>
> I will write up a tutorial explaining the benefits of vector tiling in PostGIS, with examples and parallelisation code patterns. If not today, hopefully over the next week.
>
> Mark

_______________________________________________
postgis-users mailing list
[email protected]
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
