[postgis-users] ST_Subdivide vertex limit and TOAST overhead

Marco Boeringa Wed, 26 Jan 2022 01:32:53 -0800

Hi all,

After Paul's remarks here on the list about the cost of TOAST in relation to the optimal default for ST_Subdivide's vertex limit (96 according to Paul's tests), I got a bit fascinated and wanted to do some testing myself.

Until Paul's remark, I never gave much thought to TOAST overhead in relation to my OpenStreetMap database. I simply accepted it as a fact, since big geometries were likely to need TOASTing in many cases.

However, since Paul gave a clear guideline to prevent TOASTing, I gave it a try and collected some rough statistics.

The data is from generalized OpenStreetMap woodland polygons, some of which are absolutely huge (> 100k vertices) before ST_Subdivide kicks in during the generalization processing, as I amalgamate them into bigger structures along the way.

I tested with two subdivide limits. The first is 5000, the default I had been using up to now, which seemed a reasonable compromise between limiting the number of vertices in a polygon and limiting the number of output polygons: not so small as to generate large numbers of splits, but also not so big as to cause issues with display times.

Next, I used Paul's recommended "prevent TOAST" limit of 96 vertices. I then measured display times for the entire dataset in QGIS by zooming to the dataset's extent and timing how long the display took.
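For anyone wanting to repeat the comparison, a minimal sketch of the two steps (subdividing with a given vertex limit, then checking how much of the table lives outside the main heap) might look like the following. The table and column names are made up for illustration and are not the ones used in this test, and the Python wrapper only assembles the SQL text, so it can be inspected without a database connection.

```python
# Hypothetical names: the source table and geometry column are placeholders,
# not the ones used in the test described in this post.
SOURCE_TABLE = "woodland_generalized"
GEOM_COLUMN = "geom"


def subdivide_sql(max_vertices: int) -> str:
    """Build a CREATE TABLE statement that splits polygons with ST_Subdivide."""
    target = f"{SOURCE_TABLE}_sub{max_vertices}"
    return (
        f"CREATE TABLE {target} AS\n"
        f"SELECT ST_Subdivide({GEOM_COLUMN}, {max_vertices}) AS {GEOM_COLUMN}\n"
        f"FROM {SOURCE_TABLE};"
    )


def size_sql(table: str) -> str:
    """Build a query comparing the main heap with everything stored beside it.

    pg_relation_size() reports the main heap only, while pg_table_size() also
    includes TOAST (plus the small free space and visibility maps), so the
    difference is an upper bound on the TOAST share.
    """
    return (
        f"SELECT pg_size_pretty(pg_relation_size('{table}')) AS heap,\n"
        f"       pg_size_pretty(pg_table_size('{table}')\n"
        f"                      - pg_relation_size('{table}')) AS toast_and_maps;"
    )


if __name__ == "__main__":
    # Print the statements for the two limits compared in this post.
    for limit in (96, 5000):
        print(subdivide_sql(limit))
        print(size_sql(f"{SOURCE_TABLE}_sub{limit}"))
        print()
```

The printed statements can be pasted into psql or DBeaver; indexes are left out on purpose, since pg_table_size() does not count them.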


The results are as follows:

Vertex limit   Records     Disk size   Display time (4 runs)
96             1,996,226   1.8 GB      33s / 32s / 33s / 32s
5000           1,332,258   2.7 GB      45s / 39s / 38s / 39s

A few takeaways:

- What I never realised before is the disk size cost of TOAST: as can be seen, the '5000' limit, which requires many geometries to be TOASTed, results in an about 50% larger disk size for the relation according to DBeaver (2.7 versus 1.8 GB for the '5000' versus the '96' vertex limit).

- Non-TOASTed records have an about 20-35% faster retrieval time, although the initial retrieval of the TOASTed data in particular has a bigger delay (45s); I guess this is because the de-TOASTed records are subsequently cached. Even taking that into account, the overhead seems to plateau at 20% minimum.

- Counter-intuitively, displaying roughly 660k more (non-TOASTed) records, due to the much smaller ST_Subdivide vertex limit, is still considerably faster than displaying the dataset that is smaller in terms of records but did get TOASTed.
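As a quick sanity check on these takeaways, the ratios can be recomputed directly from the figures listed above. This is only a rough sketch: treating the first run as "cold" and the remaining three as "warm" is an assumption about caching, not something that was measured.

```python
# Figures copied from the results listed above.
records = {"96": 1_996_226, "5000": 1_332_258}
size_gb = {"96": 1.8, "5000": 2.7}
times_s = {"96": [33, 32, 33, 32], "5000": [45, 39, 38, 39]}

# How many more rows the 96-vertex limit produces.
extra_records = records["96"] - records["5000"]

# How much larger the 5000-vertex table is on disk, relative to the 96 one.
size_increase = size_gb["5000"] / size_gb["96"] - 1

# Assumption: the first run is a cold cache, runs 2-4 are warm.
cold_overhead = times_s["5000"][0] / times_s["96"][0] - 1
warm_96 = sum(times_s["96"][1:]) / len(times_s["96"][1:])
warm_5000 = sum(times_s["5000"][1:]) / len(times_s["5000"][1:])
warm_overhead = warm_5000 / warm_96 - 1

print(f"extra records with limit 96: {extra_records:,}")
print(f"disk size increase with limit 5000: {size_increase:.0%}")
print(f"display time overhead, first run: {cold_overhead:.0%}")
print(f"display time overhead, later runs: {warm_overhead:.0%}")
```

Since the disk sizes are rounded to one decimal, the size ratio is only approximate.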


Does this all seem about right? And does this fit other users' experiences?

Of course, despite the gains of not TOASTing, you still have to evaluate for each dataset whether subdividing even makes sense: it is usually the last step in processing, and if you actually need the entire polygon, e.g. for labelling purposes in QGIS, then subdividing into pieces small enough to prevent TOASTing doesn't make sense at all.


Marco

_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users
