Re: [postgis-users] [postgis-devel] ST_Union() performance problem (with possiblefunding)

Brent Wood Wed, 16 Oct 2013 20:47:56 -0700

These tend to assume each operation is constrained to a single cell, hence 
parallelizable, hence undertaking an operation on multiple cells concurrently.


While I'm dealing with many cells - they are all being merged into a single 
multipolygon feature. I can't load two cells into the same multipolygon at the 
same time - two writes to the same record - so can't run concurrently.


What may be possible is to perhaps run multiple processes on subsets to create 
intermediate (larger) merged polygons which can then be merged themselves to 
create the final single feature.

This would probably allow better use of resources on a multi core system... at 
present I'm using 1.7% of memory on a 100% cpu process, so I'll look into this 
approach - 8 cores running concurrently giving giving close to 8x faster is 
useful.

Brent




________________________________
 From: "[email protected]" <[email protected]>
To: Brent Wood <[email protected]>; PostGIS Development Discussion 
<[email protected]>; Bborie Park <[email protected]> 
Cc: PostGIS Users Discussion <[email protected]> 
Sent: Thursday, October 17, 2013 3:16 PM
Subject: Re: [postgis-devel] ST_Union() performance problem (with       
possiblefunding)
 


yes,
true.. but with a thorough read you might notice that the gdal_retile.py
experiment was largely ineffective, 
but if you click on the link at the
top to the *next post*  
      Variable Buffers in
PostGIS
you will find the one that really worked well.. in fact, we used
that 2nd post in production for months, to great effect.
The trick on one
machine was to split to work by some constant, and then make psycopg2
connections for each "bucket."

This worked very well.. 

Since then I have experimented only a tiny bit with SPARK from
the Berkeley Amp Lab for a distributed work load on a Hadoop file system, but
that world has no GEOS (yet) 

--
Brian M
Hamlin
OSGeo California
Chapter
blog.light42.com



On Wed, 16 Oct 2013
17:28:27 -0700, Bborie Park <[email protected]>
wrote:

Your best bet is to consider splitting the workload among several
postgresql connections.
>
>
>darkblueb had a blog post about
this...
>
>
>http://blog.light42.com/wordpress/?p=23
>
>
>
>
>
>
>
>On Wed, Oct 16, 2013 at 5:21
PM, Brent Wood <[email protected]> wrote:
>
>Hi, 
>>
>>
>>
>>Any advice appreciated!!
>>
>>
>>I'm undertaking a spatial analysis using Postgis (what else would I use!!!). 
>>The
first part works well.
>>
>>
>>
>>I take a large number (potentially millions) of lines defined
by start & end points & buffer them to create polygons. (I'm working in
lat/long EPSG:4326 but transforming to a custom equal area projection for the
buffering operation).
>>
>>
>>I generate a grid of 5x5km cells (polygons) covering the region of
interest.
>>
>>
>>I clip the line based polygons to the grid, so I can generate statistics for
each cell describing the lines that intersect with it, various quantitative
measures such as ST_Union() the clipped line polygons to generate a footprint in
each cell to work out how much is/is not covered, or sum the ST_Area() of the
clipped polygons grouped by cell to calculate an aggregate cover, which can be
several times the actual cell area.
>>
>>
>>
>>So far so good, it works
well, the code is clear & transparent & provides a good result. At least
as good as any commercial software can do. My test
 data subset is processed from scratch in about 30 minutes.
>>
>>Now I want to ST_Union() all the cell based polygons into an overall
single multipolygon representing the footprint. The code is simple. The
performance, even with my subset,  is a problem.
>>
>>I have thousands of
cell based footprint multipolygons, each potentially with thousands of vertices
to be ST_Union()ed. Runtime is weeks for an iteration. If I need separate total
footprints for 20 different species annually for 5 years, that is 100
iterations. Memory & I/O use is minimal - it is totally cpu bound.
>>
>>I
am looking at trying to simplify the polygons to be unioned to reduce the number
of vertices (& hence processing) involved, but to achieve any significant
benefit I'm having to change the shape of the polygons to ST_Union() too much. 
>>
>>
>>
>>Does anyone have any suggestions as to how this could be made
significantly faster? 
>>If I
 had $$ to throw at developers to work on the codebase (presumably GEOS?) could
performance be significantly improved?
>>
>>
>>Thanks,
>>
>>   Brent
Wood
>>_______________________________________________
>>postgis-devel mailing list
>>[email protected]
>>http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
>>
>
>>________________________________
>
>_______________________________________________
>postgis-devel mailing list
>[email protected]
>http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel

_______________________________________________
postgis-users mailing list
[email protected]
http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users

Re: [postgis-users] [postgis-devel] ST_Union() performance problem (with possiblefunding)

Reply via email to