Hi all,

After Paul's remarks here on the list about the cost of TOAST in relation to the optimal default for ST_Subdivide's vertex limit (96 according to Paul's tests), I got a bit fascinated and wanted to do some testing myself.

Until Paul's remark, I never gave much thought about TOAST overhead in relation to my OpenStreetMap database. I just simply assumed it as a fact, as it was likely big geometries needed TOASTing in many cases.

However, since Paul gave a clear guideline to prevent TOASTing, I gave it a try and collected some rough statistics.

The data is from generalized OpenStreetMap woodland polygons, some of which are absolutely huge before ST_Subdivide kicks in in the generalization processing (> 100k vertices), as I amalgamate them to bigger structures in the generalization processing.

I now tested with two subdivide limits: the default 5000 I had been using up to now, which seemed a reasonable compromise between limiting the number of vertices in a polygon and the number of output polygons at the same time: not to small to generate large amounts of splits, but also not to big to cause issues with display times.

Next, I used Paul's recommended "prevent TOAST" limit of 96 vertices. I subsequently looked at display times for the entire dataset in QGIS by zooming to the dataset's extent and timing the display time.

The result are as follows:

96 vertices: 1,996,226 records: 1.8GB disk size: 33s / 32s /33s / 32s
5000 vertices: 1,332,258 records: 2.7GB disk size: 45s / 39s / 38s / 39s

A few take aways:

- What I never realised before, is also the disk size cost of TOAST: as can be seen, the '5000' limit size, which requires many geometries to be TOASTed, results in an almost 40% larger disk size for the relation according to DBeaver (2.7 versus 1.8 GB for '5000' versus '96' vertex limit).

- Non-TOASTed records have an about 20-35% faster retrieval time, although it seems that especially the initial time for TOASTed has a bigger delay (45s), I guess this is because the de-TOASTed records are subsequently cached. Even taking that in account, the overhead seems to plateau at 20% minimum.

- Counter-intuitively, displaying almost 600k (non-TOASTed) records  more due to much smaller ST_Subdivide vertex limit, is still considerably faster than the displaying the smaller (in terms of records) dataset that did get TOASTed.

Does this all seem about right? And does this fit other users experiences?

Of course, despite the gains of non TOASTing, you still have to evaluate for each dataset whether subdividing even makes sense: it is usually the last step in processing, and if you actually need the entire polygon for e.g. labelling purposes in QGIS, than subdividing in such small pieces as to prevent TOASTing, doesn't make sense at all.

Marco

_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users

Reply via email to