Hi Roger,

You may very well be pushing the bounds of what geomunion can do efficiently. Back in Nov, as Martin pointed out, Lee Keel had a similar problem. He was trying to dissolve a set of 32000 polygons, and even that was a long time (about the size of one of your 8 classes you are grouping on). As you can see from Martin's link reference, however, we were able to bring this time down to 30 secs.

The problem here is two fold:
1. You input geometry may not be spatially close to eachother. As Paul Ramsey mentioned, "The goal is to cause each individual geometry + geometry union to *reduce* the amount of aggregate linework. When the g+g ops have no locality, each addition *adds* to the amount of linework, making successive ops slower and slower and slower." 2. Through testing, I found that union does not perform well with large collections. This is may be due to the size of your work_mem and shared_buffer settings and the size of the ever growing collected geometry, I'm not sure. In any case, I found that a collected size of about 10 geometries performs well.

My solution to you would be:
1. Do as Paul Ramsey suggested and sort your input geometries so they are spatially close to each other before dissolving them together. Paul did this by ordering by the x + y of the extents of the geometry. This may work well, test it out. In my solution earlier, I sorted by the centroid of the geometry, snapped to some grid. But then Lee's data was fairly evenly scattered across his dataset extents - I'm not sure about yours. 2. Perform your dissolve on smaller groups (191000 records grouped into 8 classes is far too large, leaving each group to be ~24000). I would recommend you group on your class and a unique id, rounded to the nearest 10th digit. This will ensure you union will not aggregate more than 10 geometries at a time, which I found is the entire key to this problem.

ie. (I haven't tested this, but you could try something like this).

CREATE TEMP TABLE tmp
 (id serial, class char(8), the_geom geometry);

INSERT INTO tmp (class, the_geom)
 SELECT class, the_geom
 FROM test_suit_h_crop3_class
 ORDER BY class, X(Extent(the_geom)) + Y(Extent(the_geom));
 -- OR
 ORDER BY class, the_geom;
 -- OR
ORDER BY class, SnapToGrid(X(Extent(the_geom)), <some resonable precision>);
-- I'd be curious to see which is fastest

CREATE TABLE union_result AS
SELECT class, ST_Union(the_geom) AS the_geom
FROM ( SELECT min(id) AS id, class, ST_Union(the_geom) AS the_geom
 FROM (
   SELECT min(id) AS id, class, ST_Union(the_geom) AS the_geom
   FROM (
     SELECT min(id) AS id, class, ST_Union(the_geom) AS the_geom
     FROM (
       SELECT min(id) AS id, class, ST_Union(the_geom) AS the_geom
       FROM tmp
       GROUP BY class, round(id/10)
       ORDER BY class, id) AS tmp1
     GROUP BY class, round(id/100)
     ORDER BY class, id) AS tmp2
   GROUP BY class, round(id/1000)
   ORDER BY class, id) AS tmp3
 GROUP BY class, round(id/10000)
 ORDER BY class, id) AS tmp4
GROUP BY class, round(id/100000)
ORDER BY class, id;

Hope this clarifies things.
-- Kevin

Martin Davis wrote:
Yep, that's the kind of number I was asking for. Not surprising that it takes a long time!

I think you might be pushing the bounds of what can be handled by PostGIS/GEOS at the moment...

Roger André wrote:
Hi, and thanks for the feedback. I have 8 classes, spread across 191,317 total records. Not sure is that was what you meant by number of geometries.
--

On Thu, Mar 13, 2008 at 3:56 PM, Paragon Corporation <[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>> wrote:

     Slight correction

     SELECT st_multi(st_union(the_geom)) AS the_geom, class FROM
    "test_suit_h_crop3_class" GROUP BY class;

    or

     SELECT st_multi(st_collect(the_geom)) AS the_geom, class FROM
    "test_suit_h_crop3_class" GROUP BY class;


    -----Original Message-----
    From: Paragon Corporation [mailto:[EMAIL PROTECTED] <mailto:[EMAIL 
PROTECTED]>]
    Sent: Thursday, March 13, 2008 6:54 PM
    To: 'PostGIS Users Discussion'
    Subject: RE: [postgis-users] geomunion HOWTO?

    Also scrap the AsText call you have.   I'm guessing its slowing
    things down
    a bit, although probably not much, but its totally unnecessary at
    anyrate.
    Should just be

     SELECT st_multi(st_geomunion(the_geom)) AS the_geom, class FROM
    "test_suit_h_crop3_class" GROUP BY class;

    You may also want to consider using ST_Collect instead of
    ST_GeomUnion,
    although for large files may not help much.


    Hope that helps,
    Regina


    -----Original Message-----
    From: [EMAIL PROTECTED]
    <mailto:[EMAIL PROTECTED]>
    [mailto:[EMAIL PROTECTED]
    <mailto:[EMAIL PROTECTED]>] On Behalf
    Of Paul
    Ramsey
    Sent: Thursday, March 13, 2008 6:39 PM
    To: PostGIS Users Discussion
    Subject: Re: [postgis-users] geomunion HOWTO?

    No, you are probably just exercising the geometric operators a
    lot. It is
    possible a cascaded union would do better, but we don't have that
    programmed
    right now.  You could try and make it mildly faster by forcing the
    union to
    happen in a minimally more efficient order, by sorting when you
    create your
    first table, see below...

    No guarantees this makes anything better, just a random guess at a
    hack.

    On 3/13/08, Roger André <[EMAIL PROTECTED]
    <mailto:[EMAIL PROTECTED]>> wrote:
> I'm trying to find a way to generate "dissolved" geometries without
    > exporting shapefiles from PostGIS and performing the operating in
    > ArcGIS.  I found some instructions online at
    >
http://www.paolocorti.net/public/wordpress/index.php/2007/03/30/union-of-two
    -geometries-in-postgis/
<http://www.paolocorti.net/public/wordpress/index.php/2007/03/30/union-of-two-geometries-in-postgis/>. > These work fine on their example, but the opeartion when applied to > my data set never completes. I realize my data set is pretty large
    > (), but the same dissolve operation when done via ArcGIS on a
> shapefile exported by pgsql2shp takes around 5 minutes to complete. > This leads me to believe I'm doing something completely wrong, and I
    > would love to get some feedback from those of you with
    experience doing
    this.  Below are the steps I've done.
    >
    > Step 1 - create a "crop_3" table that contains only crop3
    values, and
    > a class.  This completes within 30 secs:
    >
    > begin;
> create table "test_suit_h_crop3_class" ( "alloc_id" char(8) PRIMARY
    > KEY,  "crop3" numeric, "class" char(8) ); select
    >
AddGeometryColumn('','test_suit_h_crop3_class','the_geom','-1','MULTIP
    > OLYGON',2); insert into "test_suit_h_crop3_class" ("alloc_id",
    > "crop3", "class",
    >  "the_geom")
    > select vw_suit_area_h.alloc_id, vw_suit_area_h.crop3, case when
    crop3
    > < 1 then 'class_0'
    > when crop3 >= 1 and crop3 < 860 then 'class_1'
    > when crop3 >= 860 and crop3 < 1720 then 'class_2'
    >  when crop3 >= 1720 and crop3 < 3440 then 'class_3'
    > when crop3 >= 3440 and crop3 < 5160 then 'class_4'
    > when crop3 >= 5160 and crop3 < 6880 then 'class_5'
    > when crop3 >= 6880 and crop3 < 7740 then 'class_6'
    >  when crop3 >= 7740 then 'class_7'
    > ELSE 'other'
    > end AS class,
    > vw_suit_area_h.the_geom
    > FROM vw_suit_area_h

    ORDER BY X(Extent(the_geom)) + Y(Extent(the_geom))

    > end;

More ideally, we would bit-interleave the X and Y values, to force the
    ordering of the inputs to be very well localized, and even more
    ideally do
    an actual cascaded union.

    The goal is to cause each individual geometry + geometry union to
    *reduce* the amount of aggregate linework. When the g+g ops have no
    locality, each addition *adds* to the amount of linework, making
    successive
    ops slower and slower and slower.

    > Step 2 - create a temp "dissolve" table to store the results of a
    > geometric union run of the above table, grouped by "class".  I
    run out
> of patience before this ever completes (I've let it run for hours.)
    >
    > begin;
    > CREATE TABLE "test_suit_area_h_crop3_diss" (  gid serial PRIMARY
    KEY,
    > "class" char(8) ); select
    >
AddGeometryColumn('','test_suit_area_h_crop3_diss','the_geom','-1','MU
    > LTIPOLYGON',2); INSERT INTO "test_suit_area_h_crop3_diss"
    > (the_geom,class)  SELECT astext(multi(geomunion(the_geom))) AS
> the_geom, class FROM "test_suit_h_crop3_class" GROUP BY class; end;
    >
    > Thanks,
    >
    > Roger
    >
    > _______________________________________________
    >  postgis-users mailing list
    >  [email protected]
    <mailto:[email protected]>
    > http://postgis.refractions.net/mailman/listinfo/postgis-users
    >
    >
    _______________________________________________
    postgis-users mailing list
    [email protected]
    <mailto:[email protected]>
    http://postgis.refractions.net/mailman/listinfo/postgis-users



    _______________________________________________
    postgis-users mailing list
    [email protected]
    <mailto:[email protected]>
    http://postgis.refractions.net/mailman/listinfo/postgis-users


------------------------------------------------------------------------

_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

Reply via email to