Hi Paul,
Appreciate your insights. Good to hear there appear to be opportunities
for improving GiST index build speed in the future, even if no
active work is being done right now. Yes, I do think a lot of people,
and an increasing number, could benefit from such work. I personally
would certainly applaud any improvements. It is especially clear that
disk speed is not the bottleneck in most of the processing involved,
and is therefore unlikely to become limiting if index creation gets
faster, so there is likely real room to improve GiST index build speed.
Marco
On 16-9-2020 at 19:05, Paul Ramsey wrote:
On Sep 16, 2020, at 7:35 AM, Marco Boeringa <[email protected]> wrote:
Hi all,
This is probably more of a PostgreSQL question than a PostGIS one, but I have
wondered whether there is any work going on to allow PostgreSQL / PostGIS to
build GiST spatial indexes in parallel, and whether this is even logically
and technically feasible. According to the PostgreSQL documentation, only
B-tree indexes can be built in parallel.
With the ever-growing size of spatial databases like OpenStreetMap, with tables
running into the hundreds of millions of records, spatial indexing using GiST
is one of the major bottlenecks in re-creating or reloading a spatial PostGIS
database. The indexing process seems highly CPU bound, with negligible disk
activity for the majority of the time it runs, so being able to take advantage
of multiple cores seems like a potentially big win. Nonetheless, I find little
to no mention of such a (future) option for GiST indexing when searching the
internet for relevant information.
Marco,
I do not know if there is active work in the area of making GIST index builds
faster, but I have heard discussions of various approaches from people much
smarter than I, so I am sure there are potential areas of improvement
available. The single-threaded performance of the index build might be made
faster with some bulk/batch handling of inserts, though I do not know how that
interacts with the generic GIST API expectation of one-at-a-time insertion.
Probably the biggest hurdle is just that the number of size-constrained GIST
data sets is much smaller than that of BTREE data sets, so it's a lower
priority. Certainly the growth in OSM ubiquity is increasing the number of
users with very large spatial databases they need to index though, so we can
expect more pressure as time goes on.
ATB,
P
_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users