Good afternoon,

I'm taking the liberty of adding one more question. The script proposed in this post works; I "just" need two small changes (a sketch of what I mean follows the list):

- In the final layer, I don't want to retrieve only the res_id, data_id_ori, num_sources, and geom fields, but also many other fields from my source layer, layer_union. Could you tell me at which point in the script I should add these fields that I ultimately want to keep? Is it where the res_assoc table is created, or where the res_final table is created? The problem is that if it is at the res_assoc level, I would have no choice but to add them to the GROUP BY.

- Second complexity: in the current script, I am building the union from a single source table, layer_union. But I need to combine this layer_union data with another source, land use.
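To make it concrete, here is the kind of change I have in mind (an untested sketch; the extra fields code and label, and the land_use table, are placeholders for my real columns and data):

-- 1) add the extra fields, and the second source, when building layer_union:
drop table if exists layer_union;
create table layer_union as
select data_id, code, label, geom from inrae_2014 UNION ALL
select data_id, code, label, geom from aesn_2006 UNION ALL
-- (...) the other source tables, then the land-use data:
select data_id, code, label, geom from land_use;

-- 2) carry the fields through res_assoc as aggregates, which avoids listing
--    them in the GROUP BY (one array entry per matching source polygon,
--    since a result polygon can match several source polygons):
drop table if exists res_assoc;
create table res_assoc as
select res.res_id,
       array_agg(l.data_id) as data_id_ori,
       count(distinct l.data_id) as num_sources,
       array_agg(l.code) as code_ori,
       array_agg(l.label) as label_ori
from res
join layer_union l on st_contains(l.geom, st_pontonsurface(res.geom))
group by res.res_id;

(The final SELECT building res_final would then also list code_ori and label_ori.) Is that the right level, or should the extra fields only be joined in when res_final is created?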
Thanks so much.

On Sat, Jul 12, 2025 at 11:04, Franck Theeten <franck.thee...@africamuseum.be> wrote:

> Hi,
>
> Out of curiosity, why are you indexing the intermediate table by ST_PointOnSurface?
> The documentation doesn't guarantee that the returned point is always the same, so it introduces some degree of randomness.
> This should not have an influence on the number of rows in res, but it could have one on the default sort order when joining it with layer_union.
> Maybe indexing on ST_Centroid and/or using an explicit ORDER BY on res_id when building res_assoc could make the process more stable?
>
> ------------------------------
> *From:* Martin Davis <mtncl...@gmail.com>
> *Sent:* Friday, July 11, 2025 21:24
> *To:* celati Laurent <laurent.cel...@gmail.com>
> *Cc:* postgis-users@lists.osgeo.org <postgis-users@lists.osgeo.org>
> *Subject:* Re: Qgis/Postgis : Multi Union (25 polygonal layers)
>
> The reply on GIS StackExchange was a good one, so reproducing it here for this list:
>
> Can you investigate further as to the differences between two identical runs, apart from the number of features? Do you get the same behaviour working on a small subset of the data? Can you see if it's "sliver" features or other tiny differences due perhaps to arithmetic rounding errors - these can result in non-reproducibility in parallel processing systems when the order of operations isn't well-defined (i.e. (a+b+c) could be evaluated as (a+b)+c or a+(b+c), which might not be equal because of floating-point precision...).
>
> I would add that this situation can happen even when there is no parallelism. It can arise whenever there is non-determinism in the order of inputs to operations throughout the process. You might be able to enforce determinism by sorting every input and intermediate result, but it's questionable whether this would be worth the time and effort.
>
> It would be interesting to know the difference in total result area between the different runs. This should be very low.
>
> A more detailed test would be to match the two results (via point-in-polygon) and then compare the matched polygon boundaries (via Hausdorff distance). If there is a significant difference between the two results, that would be worth investigating.
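> For concreteness, a rough sketch of those checks (untested, and assuming the two runs were saved as tables res_final_v1 and res_final_v2, which are hypothetical names):
>
> -- floating-point addition is not associative; this returns false:
> select (0.1::float8 + 0.2::float8) + 0.3::float8
>      = 0.1::float8 + (0.2::float8 + 0.3::float8);
>
> -- difference in total result area between the two runs (should be very low):
> select (select sum(st_area(geom)) from res_final_v1)
>      - (select sum(st_area(geom)) from res_final_v2) as area_diff;
>
> -- match polygons across runs via point-in-polygon, then compare the
> -- matched boundaries with the Hausdorff distance; large values would
> -- be worth investigating:
> select v1.res_id as id_v1, v2.res_id as id_v2,
>        st_hausdorffdistance(v1.geom, v2.geom) as hdist
> from res_final_v1 v1
> join res_final_v2 v2
>   on st_contains(v2.geom, st_pointonsurface(v1.geom))
> order by hdist desc
> limit 20;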
> On Fri, Jul 11, 2025 at 3:22 AM celati Laurent <laurent.cel...@gmail.com> wrote:
>
> Good afternoon,
> Thanks so much for your message.
>
> I succeeded in executing a PostGIS script without any error message; it works. However, I notice a lack of reliability/stability: when I rerun the same process several times, I never end up with exactly the same number of features in my intermediate and final result tables. I'm taking the liberty of sharing the SQL script with you. The attached screenshot compares the number of objects for test v1 and test v2 (which are identical tests). We can see that there is a difference in the res_final table, but also in the res table. I ran several tests again, still with different numbers of objects for the res and res_final tables, often with larger differences than the one shown in the screenshot.
>
> Number of entities for each table:
>
> *Test 1*:
> layer_union table: 1026194
> res table: 1462661
> res_assoc table: 1462661
> res_final table: 1462661
>
> *Test 2*:
> layer_union table: 1026194
> res table: 1462645
> res_assoc table: 1462645
> res_final table: 1462635
>
> I share the script below (and attach it):
>
> -- import all shp into the PostGIS db
> -- UNION ALL of the geometries and attributes of the 28 data sources into a single table
> drop table if exists layer_union;
> create table layer_union as
> select inrae_2014.data_id, inrae_2014.geom from inrae_2014 UNION ALL
> select aesn_2006.data_id, aesn_2006.geom from aesn_2006 UNION ALL
> select aesn_2019.data_id, aesn_2019.geom from aesn_2019 UNION ALL
> -- (...) etc.
>
> -- res table
> drop table if exists res;
> create table res as
> with tmp as (
>   select st_union(ST_Force2D(st_boundary(geom))) as geom from layer_union
> )
> select (st_dump(st_collectionextract(st_polygonize(geom), 3))).path[1] as id,
>        (st_dump(st_collectionextract(st_polygonize(geom), 3))).geom::geometry(polygon, 2154) as geom
> from tmp;
>
> -- unique id for the res table
> alter table res add column res_id int generated always as identity primary key;
> -- index res on ST_PointOnSurface
> create index on res using gist (st_pointOnSurface(geom));
> analyze res;
>
> -- res_assoc table
> -- simple JOIN, to filter out polygons with no link to the input polygons
> -- (for instance: holes in the input data)
> drop table if exists res_assoc;
> create table res_assoc as
> select res.res_id, array_agg(l.data_id) as data_id_ori, count(distinct l.data_id) as num_sources
> from res join layer_union l on st_contains(l.geom, st_pointonsurface(res.geom))
> group by res.res_id;
> -- index on res_assoc
> create index on res_assoc(res_id);
> analyze res_assoc;
>
> -- cleaning: remove from the res table the polygons that did not match in res_assoc;
> -- these are polygons representing holes in the input layers
> delete from res
> where not exists (
>   select null
>   from res_assoc ra
>   where ra.res_id = res.res_id);
>
> -- final table with the new polygons and the source polygon information, as a join:
> -- much faster to create a table than to update the res table (after adding the desired columns)
> drop table if exists res_final;
> create table res_final as
> select ra.res_id, data_id_ori, num_sources, geom::geometry(polygon, 2154) as geom
> from res_assoc ra join res r on ra.res_id = r.res_id;
>
> Thanks so much
>
> On Tue, Jul 8, 2025 at 02:54, <snor...@hillcrestgeo.ca> wrote:
>
> Here is a working example of Martin's suggestion, for a job that sounds fairly similar:
> https://github.com/bcgov/harvest-restrictions/blob/main/sql/overlay.sql
>
> On Jul 7, 2025, at 4:45 PM, Martin Davis <mtncl...@gmail.com> wrote:
>
> I'd characterize your use case as "Overlay of overlapping polygonal datasets". The basic state of the art for solving this using PostGIS is still the solution Paul outlined in https://blog.cleverelephant.ca/2019/07/postgis-overlays.html (or see https://dr-jts.github.io/postgis-patterns/overlay/overlay.html#count-overlap-depth-in-set-of-polygons for more ideas).
>
> Basically, you node and polygonize to make a flat coverage, and then join back to the parent layers to determine attribution (including counts).
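> In its most minimal form, that pattern looks something like this (an illustrative sketch only, assuming two polygon tables a and b, each with a geom column):
>
> with noded as (
>   -- node all boundaries together
>   select st_union(st_boundary(geom)) as geom
>   from (select geom from a union all select geom from b) s
> ),
> pieces as (
>   -- polygonize the noded linework into a flat coverage
>   select (st_dump(st_polygonize(geom))).geom as geom from noded
> ),
> pieces_id as (
>   select row_number() over () as pid, geom from pieces
> )
> -- join each piece back to its parent layers for attribution
> select p.pid, array_agg(src.tbl) as sources, count(*) as num_sources, p.geom
> from pieces_id p
> join (select 'a' as tbl, geom from a
>       union all
>       select 'b' as tbl, geom from b) src
>   on st_contains(src.geom, st_pointonsurface(p.geom))
> group by p.pid, p.geom;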
> Doing this in a single query might be slow for very large datasets like yours, though. You might be able to partition your large dataset and run smaller queries, possibly in parallel. Also, it might be better to overlay the small layers first, and then overlay that with the big layer. And if you don't care about overlaps in the big layer (or if there are none), that makes it much easier, since you can process each big-layer polygon independently (and ideally in parallel).
>
> On Mon, Jul 7, 2025 at 1:16 PM celati Laurent <laurent.cel...@gmail.com> wrote:
>
> Dear all,
> I'm working with QGIS and PostGIS. As input, I have 25 polygonal layers covering a large area (a multi-city area). One of these datasets is very large (1 million objects); the other 24 are much smaller (a hundred objects at most).
> For information, some of these polygonal datasets are in "multi-part features" mode and others in "single-part features" mode; I imagine this may ultimately have a slight impact on the method/result. These 25 polygonal .shp files have highly variable, non-homogeneous/non-harmonized data structures. Each layer has a "data_id" field that identifies, for each feature, the layer it belongs to. For example, all values of the "data_id" field in the first layer are '1'; in the second layer the values are '2', etc.
>
> My goal would be to apply/adapt the existing QGIS geoprocessing tool called "Multiple Union":
> https://docs.qgis.org/3.40/en/docs/user_manual/processing_algs/qgis/vectoroverlay.html#union-multiple
>
> Below is a screenshot from the QGIS documentation:
>
> <image.png>
>
> My goal would be to have an output file:
>
> - Which would be the result of the union/overlay of the 25 input datasets. In the terms of the QGIS documentation, the processing should check for overlaps between features within the 25 layers and create separate features for the overlapping and non-overlapping parts. This "multiple union" geoprocessing seems to fit my goal where there is no overlap (a, NULL; b, NULL; c, NULL).
>
> - For areas where there is an overlap, the QGIS union geoprocessing creates as many identical overlapping features as there are features participating in the overlap. This doesn't bother me. But since, ultimately, I'd like a field in the result/output file that lets me retrieve, for each feature, the list of input layers contributing to it (in order to trace the origin/source of the data), I was wondering whether it might be better if only one feature were created per overlapping area.
>
> - I'd like a field in the result file that allows, for each feature, retrieving the list of input layers that contribute to that result feature, in order to trace the origin/source of the data.
>
> - Ideally, a field giving the number (COUNT) of layers contributing to the feature (at least 1 layer, at most 25).
>
> - Regarding the non-geometric attributes/fields, I would like to be able to specify the list of fields I ultimately want to keep. I don't want to keep all the fields, but only some of them for each of the 25 input layers.
> I imagine it's recommended to do this processing in PostGIS rather than in QGIS? I can, if necessary, import my 25 SHP files into a PostGIS database. I also imagine it's important to keep in mind that the "multi-part features" / "single-part features" mode of the input layers can affect the result. If I use a PostGIS query, I was thinking it might be helpful to first force all features into single-part mode (using the PostGIS ST_Dump function?), as in the sketch below.
>
> Thanks in advance for your help and guidance.
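> (A minimal sketch of that ST_Dump idea, assuming a source table named src with data_id and multi-polygon geom columns in EPSG:2154; untested:)
>
> -- explode multi-part features into single-part polygons
> create table src_single as
> select data_id,
>        (st_dump(geom)).geom::geometry(polygon, 2154) as geom
> from src;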