ST_Area and ST_Perimeter functions are relatively low cost, so that fact you 
discovered is not surprising.

 

I think if you were doing something like ST_Distance then the CTE or subquery 
would be more efficient than your direct solution.

 

From: postgis-users [mailto:[email protected]] On Behalf Of 
Marco Boeringa
Sent: Monday, September 28, 2020 10:15 AM
To: [email protected]
Subject: Re: [postgis-users] Setting multiple columns in one UPDATE request

 

Hi Regina,

I can now partially answer my question about performance myself: 

It turns out that for datasets having relatively small geometries (in terms of 
number of vertices, not area, e.g. a few dozen to a few hundred vertices 
maximum) there is actually *NO* benefit at all of rewriting the query either 
with a WITH (CTE) or FROM (Subquery). This may be different though for other 
datasets having much larger geometries, but needs further testing.

In fact, processing is marginally slower, but only by 5-10% or so, compared to 
the original query. 

In my setup, I can also run the query both in a single thread, or using a 
custom Python multi-threaded implementation sending SQL statements in parallel 
to PostgreSQL. Since the test system has a very limited 4 core multi-threaded 
processor, the benefits of the multi-threading versus single threaded 
processing in this case are nil, obviously due to the overhead of the 
multi-threading. The multi-threaded application is as fast as the single 
threaded PostgreSQL worker, or even a bit slower, but puts a far higher load on 
the processor. Of course, with a more modern processor with high core count, 
this experience likely changes.

There also appears to be virtually no difference between using a CTE or the 
subquery as you suggested: subquery is only very marginally faster than CTE.

So for datasets having small geometries, just sticking to the original query 
like:

UPDATE <MY_TABLE> SET area = ST_Area(<GEOMETRY_COLUMN>), area_perimeter = 
ST_Area(<GEOMETRY_COLUMN>) / ST_Perimeter(<GEOMETRY_COLUMN>)

is fine for those datasets. 

I think this result is caused by the fact that the retrieving and storing 
overhead of the geometries (tables stored on SSD), is simply far bigger than 
the actual cost of calculating the area or perimeter for such datasets where 
the majority of geometries is of very limited size (e.g. OSM buildings, simple 
landuse polygons). Additionally, there may be an extra cost due to the needed 
join for the CTE and subquery statements. Lastly, the cost of running ST_Area 
and ST_Perimeter may just be to low as well. There may be other functions in 
PostGIS with a much higher computational cost that would show a benefit from 
rewriting the query.

I will attempt to run a second benchmark using a dataset with much larger 
geometries though (some with well over > 10k vertices), to see if that gives 
the same result, and report back. There may be a difference, but we will see...

Marco

*** Dataset with small geometries (most < 200 vertices) *********

- Single-threaded using ORIGINAL QUERY: 8m45s

- Single-threaded using SUBQUERY (FROM): 8m52s

- Single-threaded using CTE (WITH): 9m13s

- Multi-threaded using ORIGINAL QUERY: 9m27s

- Multi-threaded using SUBQUERY (FROM): 9m44s

- Multi-threaded using CTE (WITH): 9m50s

*******************************************************

 

Op 28-9-2020 om 09:36 schreef Marco Boeringa:

Regina,

Thanks for your suggestion.

How is this performance wise? Is not using a CTE as in your suggestion, 
supposedly faster than with using a CTE, or is this just a syntax thing and 
performance is expected to be equal?

It would still be nice though, if PostgreSQL somehow handled this 
automatically, and one could use the most basic form yet be sure it was 
optimized. It also reads more easily to just see:

UPDATE <MY_TABLE> SET area = ST_Area(<GEOMETRY_COLUMN>), area_perimeter = 
ST_Area(<GEOMETRY_COLUMN>) / ST_Perimeter(<GEOMETRY_COLUMN>)

in your code, instead of more elaborate construct involving a join.

Marco

Op 28-9-2020 om 03:26 schreef Regina Obe:

I prefer doing it in the FROM and not bothering using a CTE.

 

So something like

 

UPDATE <MY_TABLE> SET area = f.area, area_perimeter = f.area/f.perimeter
FROM (SELECT id, ST_Area(<GEOMETRY_COLUMN>) AS area, ST_Perimeter(<GEOMETRY 
COLUMN>) AS perimeter
        FROM <MY TABLE> ) AS f
WHERE f.id = <MY TABLE>.id;
 
 

 

_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users

Reply via email to