Hi Paul,

I appreciate your insights. It is good to hear that there appear to be opportunities for improving GiST index build speed in the future, even if no active work is being done right now. Yes, I do think a lot of people, and an increasing number, could benefit from such work. I personally would certainly applaud any improvements. It is especially clear that disk speed is not a limiting factor in most of the processing involved, and it is therefore unlikely to become limiting if index creation gets faster, so there is likely a good opportunity for improving GiST index build speed.

Marco

On 16-9-2020 at 19:05, Paul Ramsey wrote:

On Sep 16, 2020, at 7:35 AM, Marco Boeringa <[email protected]> wrote:

Hi all,

This is probably more of a PostgreSQL question than a PostGIS one, but I have
wondered whether there is actually any work going on to allow PostgreSQL /
PostGIS to build GiST-type spatial indexes in parallel, and / or whether this is
even logically and technically feasible. According to the PostgreSQL
documentation, only B-tree indexes can be built in parallel.

With the ever-growing size of spatial databases like OpenStreetMap, with tables
running into the hundreds of millions of records, spatial indexing using GiST is
one of the major bottlenecks in re-creating or reloading a spatial PostGIS
database. The indexing process seems highly CPU bound, with negligible disk
activity for the majority of the time it runs, so being able to take
advantage of multiple cores seems like a possible big win. Nonetheless, there
seems to be little to no mention of such a (future) option for GiST-type
indexing when searching the internet for relevant information.

Marco,
I do not know if there is active work in the area of making GiST index builds
faster, but I have heard discussions of various approaches from people much
smarter than I, so I am sure there are potential areas of improvement
available. The single-threaded performance of index build might be made faster
with some bulk/batch handling of inserts, though how that interacts with the
generic GiST API expectation of one-at-a-time insertion I do not know.

Probably the biggest hurdle is just that the number of size-constrained GiST
data sets is much smaller than that of B-tree data sets, so it's a lower
priority. Certainly the growth in OSM ubiquity is increasing the number of
users with very large spatial databases they need to index though, so we can
expect more pressure as time goes on.

ATB,
P

_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users
