Thanks Mark.  I will indeed do the st_dump, but I don't think it will help much 
(very few multis).  I think the tiling will help a lot.  What I wonder, though, 
is how long it will take to tile?  After all, it's still an st_intersection 
operation to tile each geometry against each tile.

Is there a quad-tree-type tiling algorithm available as a function?  If I do 256 x 256 
tiles, doing it all at once would be 65536 operations of 
st_intersection(complex_geom, square).  With a quad tree I'll only have 4 
operations of st_intersection(complex_geom, square), and then 16 of 
(a_little_less_complex_geom, square), and then 64 of (even_less_complex_geom, 
square) and so on, for 8 nests.  The last one will still be 65536 operations, 
but the complex geoms should be a lot simpler by the time I get down to that 
level.  What do you think, is this worth trying?  Or is intersecting with a 
square fairly simple regardless of how complex the other geometry is?
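
To make the idea concrete, here's a rough, untested sketch of the recursion I 
have in mind (the inner subquery just grabs one of my big polygons as a 
stand-in, and depth 8 gives the 256 x 256 leaf tiles):

WITH RECURSIVE quad(depth, cell, geom) AS (
  -- level 0: the bounding box of one complex geometry
  SELECT 0, ST_Envelope(g.geom), g.geom
  FROM (SELECT wkb_geometry AS geom
        FROM lancover_polygons_snap
        ORDER BY ST_NPoints(wkb_geometry) DESC LIMIT 1) g
  UNION ALL
  -- split each cell into 4 quadrants, clipping the geometry as we descend
  SELECT q.depth + 1, sub.cell, ST_Intersection(q.geom, sub.cell)
  FROM quad q
  CROSS JOIN LATERAL (
    SELECT ST_MakeEnvelope(
             ST_XMin(q.cell) + dx     * (ST_XMax(q.cell) - ST_XMin(q.cell)) / 2,
             ST_YMin(q.cell) + dy     * (ST_YMax(q.cell) - ST_YMin(q.cell)) / 2,
             ST_XMin(q.cell) + (dx+1) * (ST_XMax(q.cell) - ST_XMin(q.cell)) / 2,
             ST_YMin(q.cell) + (dy+1) * (ST_YMax(q.cell) - ST_YMin(q.cell)) / 2,
             ST_SRID(q.cell)) AS cell
    FROM generate_series(0, 1) AS dx, generate_series(0, 1) AS dy
  ) sub
  WHERE q.depth < 8
    AND ST_Intersects(q.geom, sub.cell)
)
SELECT cell, geom FROM quad WHERE depth = 8;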

(Your reply initially fell into my junk mail, so I missed it at first - which 
is partly why I haven't tried these things yet.)

--
John Abraham
[email protected]
403-232-1060

On Feb 19, 2015, at 6:52 PM, Mark Wynter <[email protected]> wrote:

> 
>> So I was running this query for 866000 s (10 days) before I decided to 
>> kill it:
> 
> 
>> One potential thing I've realized is that a few of the geometries in tazjan2 
>> are multipolygons, not single polygons.  But it's only a few.  There are a 
>> few very large and complex polygons in lancover_polygons_snap, but again, 
>> most of the 998031 are simple small polygons, about half would be 
>> ST_CoveredBy the polygons in tazjan2 and most of the rest would only overlap 
>> two or three of the polygons in tazjan2.
> 
> 
> A common offender - the multipolygons.
> 
> You can get an immediate performance improvement by dumping 
> (ST_Dump(wkb_geometry)).geom into a new table, with a new serial id, but 
> retaining the original poly_id so you don’t lose the multipolygon 
> provenance of each polygon.
> Complex operations on the multipolygon table, like ST_Intersection(), are 
> extremely slow, because the query has to access and compute the operation on 
> each of the geometry elements contained within the multi.
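> 
> A sketch of that dump step (assuming tazjan2 is the multipolygon table and 
> poly_id is its key - adjust names to suit):
> 
> CREATE TABLE tazjan2_dump AS
> SELECT row_number() OVER () AS id,   -- new serial-style id
>        poly_id,                      -- keep the multipolygon provenance
>        (ST_Dump(wkb_geometry)).geom AS wkb_geometry
> FROM tazjan2;
> 
> CREATE INDEX tazjan2_dump_gix ON tazjan2_dump USING GIST (wkb_geometry);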
> 
> Another massive performance gain can be achieved by “Tiling” your dumped 
> polygon table - which has the effect of breaking your geometries into smaller 
> features, which you can then “union” back together again, after running 
> ST_Intersection on your vector tiles.  Don’t try to tile your multipolygon 
> table - you’ll still be waiting days.
> 
> Take the coastline of Australia - a single multipolygon:
> 
> SELECT Count(ogc_fid) as num_rows, SUM(ST_NumGeometries(wkb_geometry)) as 
> num_geoms, SUM(ST_NPoints(wkb_geometry)) as num_points FROM abs_aus11;
>  num_rows | num_geoms | num_points 
> ----------+-----------+------------
>         1 |      6718 |    1622598
> 
> If you need to query the intersection of a polygon with the land, and with 
> the water, it’s near impossible.  Many, many days.
> 
> Hence we need to take a different approach.
> 
> The first thing you can do is to dump the multi polygon table.
> Then create a regular vector grid.
> Then tile both your polygon tables using the vector grid (see the sketch 
> after this list).
> Put indexes on the resultant tiled tables.
> Then run your ST_Intersects.
> The result is many more geometries, and points, but now we get the benefit of 
> geometry indexes.
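> 
> For example (a sketch - the 256 x 256 grid size and all names are 
> illustrative, and Find_SRID assumes the table lives in the public schema):
> 
> -- a regular 256 x 256 vector grid over the extent of the dumped table
> CREATE TABLE grid AS
> WITH ext AS (SELECT ST_Extent(wkb_geometry) AS b FROM tazjan2_dump)
> SELECT i + j * 256 AS tile_id,
>        ST_SetSRID(ST_MakeEnvelope(
>          ST_XMin(b) + i     * (ST_XMax(b) - ST_XMin(b)) / 256,
>          ST_YMin(b) + j     * (ST_YMax(b) - ST_YMin(b)) / 256,
>          ST_XMin(b) + (i+1) * (ST_XMax(b) - ST_XMin(b)) / 256,
>          ST_YMin(b) + (j+1) * (ST_YMax(b) - ST_YMin(b)) / 256),
>          Find_SRID('public', 'tazjan2_dump', 'wkb_geometry')) AS geom
> FROM ext, generate_series(0, 255) AS i, generate_series(0, 255) AS j;
> 
> -- tile the dumped polygons against the grid, keeping both ids
> CREATE TABLE tazjan2_tiled AS
> SELECT g.tile_id, d.poly_id,
>        ST_Intersection(d.wkb_geometry, g.geom) AS wkb_geometry
> FROM tazjan2_dump d
> JOIN grid g ON ST_Intersects(d.wkb_geometry, g.geom);
> 
> CREATE INDEX tazjan2_tiled_gix ON tazjan2_tiled USING GIST (wkb_geometry);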
> 
> As a result of tiling Australia coastline, we now get
> 
> SELECT Count(tid) as num_rows, SUM(ST_NumGeometries(wkb_geometry)) as 
> num_geoms, SUM(ST_NPoints(wkb_geometry)) as num_points FROM abs_aus11_tiled;
>  num_rows | num_geoms | num_points 
> ----------+-----------+------------
>     17222 |     29020 |    6513770
> 
> Each geometry also has a tile_id, and the original poly_id.  The tile_ids 
> are really useful for subsetting your queries, and as an additional join 
> condition.  At any time you can rapidly rebuild your original geometries 
> by running ST_Union() grouped by poly_id.
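> 
> e.g., for the tiled Australia table above:
> 
> SELECT poly_id, ST_Union(wkb_geometry) AS wkb_geometry
> FROM abs_aus11_tiled
> GROUP BY poly_id;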
> 
> Using the approach of dumping and tiling, queries that once took days now 
> take minutes at most.  And as Remi mentioned, you can parallelise the 
> process.  The concept of vector tiling in PostGIS is analogous to Map Reduce 
> and assigning Key Value Pairs.  It’s about chunking your data, doing lots of 
> fast operations on the smaller bits, and then consolidating your outputs.  
> You don’t necessarily have to write a SQL function to do the SQL 
> parallelisation.  My preference is Bash and GNU Parallel, because you can 
> write vanilla SQL and execute it via psql in parallel.  For me it’s now 
> about how fast I can write the SQL.
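> 
> As a sketch of that pattern (database, table, and column names are made up 
> here; each of the 8 jobs clips one slice of tiles):
> 
> seq 0 7 | parallel -j 8 '
>   psql -d mydb -c "
>     CREATE TABLE clipped_{} AS
>     SELECT t.tile_id, t.poly_id,
>            ST_Intersection(t.wkb_geometry, l.wkb_geometry) AS geom
>     FROM tazjan2_tiled t
>     JOIN landcover_tiled l
>       ON t.tile_id = l.tile_id          -- tile_id as a join condition
>      AND ST_Intersects(t.wkb_geometry, l.wkb_geometry)
>     WHERE t.tile_id % 8 = {}"'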
> 
> So we’re now starting to apply Big Data concepts within PostGIS….    Holy 
> smoke… and you can apply the same concepts to a wide range of PostGIS 
> operations - scale up, scale out….
> 
> I will write up a tutorial explaining the benefits of vector tiling in 
> PostGIS, with examples and parallelisation code patterns.  If not today, 
> hopefully over the next week.  
> 
> Mark
> 
> 

_______________________________________________
postgis-users mailing list
[email protected]
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
