Re: [postgis-users] Parallel spatial indexing for GiST?

Marco Boeringa Wed, 16 Sep 2020 11:36:33 -0700

Hi Giuseppe,

Thanks for your insights regarding BRIN.

I actually do employ BRIN, but only for Point type geometry, where I have the (subjective) feeling that the performance is least degraded compared to GiST. I also set the 'pages_per_range' parameter to a much smaller value than the default. Even with small values for this parameter, the size and creation time of the resulting index are nothing compared to GiST.

For Polygon data, the few times I tried using a BRIN type spatial index, I had the feeling it was probably some 3-4 times slower than GiST in terms of display times in a GIS, but these aren't hard figures, because I did not really time it. I also had the feeling that considerably more disk activity was needed to access the relevant geometries.

The data is from osm2pgsql, which initially sorts the data spatially using the default PostGIS spatial sorting / clustering based on the Hilbert curve. This should be efficient. I derive tables from that, some of which are additionally spatially clustered, depending on the processing they have had. For those tables I actually also need to create GiST type spatial indexes: CLUSTER needs a (spatial) index as input, and the PostgreSQL CLUSTER command cannot use BRIN for this due to the nature of the index. It fails with an error / warning when you attempt it.

osm2pgsql itself already seems to optimize indexing, in the sense that it launches multiple index processes against different tables in parallel. This is a kind of "parallel indexing", but not against a single table / spatial column. For the processes I developed myself, this is not feasible though, and I would benefit from having parallel GiST index creation for a single geometry column.


Marco

On 16-9-2020 at 18:55, Giuseppe Broccolo wrote:

Hi Marco,
On Wed, 16 Sep 2020 at 15:35, Marco Boeringa <[email protected]> wrote:
    [...]
    Yes, I know there are BRIN type spatial indexes for PostGIS, which are
    comparatively super fast to create and lead to very small indexes even
    for ultra large tables, but from the little information and personal
    experience I gathered, BRIN seems most suited for Point data only, and
    for static, not updated data, due to its requirement of clustered data
    for efficiency (actually not a problem in my particular case, since I
    don't do updates, but only reloads). The few times I tried to use it for
    large, spatially clustered, Polygon data sets, it seemed less efficient
    when accessing the data spatially in a GIS, with clearly longer display
    times, although I don't have real benchmarks for that.

    Most OpenStreetMap related tools like e.g. osm2pgsql also default to
    GiST, and probably with good reason.
About BRIN in PostGIS: internally it works on the bounding boxes of geometries, as GiST does, so in principle you can use this index for any geometry type, as long as your geospatial queries use the intersects, contains and is-contained operators for 2D geometries, and intersects for 3D ones.

You are right when you say that BRIN is more suitable for "static" data, because of how it works internally: put briefly, it creates a sort of summary of the range of tuples contained in each group of physically stored data pages. New entries added by INSERTs or UPDATEs are properly summarised in the BRIN as long as the new indexed values/geometries fall within ranges/bounding boxes already present in the index. If new pages are created with data that does not fall within the last summarized range, the new ranges are not automatically added to the summary, and the related tuples remain unsummarized until a new summarization is invoked, either automatically through a VACUUM or manually through the brin_summarize_range or brin_summarize_new_values functions. This allows some maintenance of the index even with non-static data, of course with some limitations compared to GiST.
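
The summarization behaviour described above can be sketched with a small toy model. This is only an illustration in Python, not the PostgreSQL BRIN code: the class ToyBrin and its methods are hypothetical names, pages hold plain (xmin, ymin, xmax, ymax) tuples, and the model assumes that a range without a summary has to be scanned for every query, while a summarized range is skipped when its bounding box does not intersect the query.

```python
def union(boxes):
    """Bounding box (xmin, ymin, xmax, ymax) covering all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    """True if two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class ToyBrin:
    """Hypothetical toy model of a block-range index (not PostgreSQL code)."""

    def __init__(self, pages_per_range):
        self.pages_per_range = pages_per_range
        self.pages = []       # each page is a list of bounding boxes
        self.summaries = {}   # range number -> bounding box of that range

    def append_page(self, boxes):
        boxes = list(boxes)
        self.pages.append(boxes)
        r = (len(self.pages) - 1) // self.pages_per_range
        # Assumption: a range that already has a summary is widened in place;
        # a brand-new range stays unsummarized until summarization is run.
        if r in self.summaries:
            self.summaries[r] = union([self.summaries[r]] + boxes)

    def summarize_new_values(self):
        """Summarize every range without a summary; return how many were added."""
        ppr = self.pages_per_range
        n_ranges = -(-len(self.pages) // ppr)  # ceiling division
        added = 0
        for r in range(n_ranges):
            if r not in self.summaries:
                boxes = [b for page in self.pages[r * ppr:(r + 1) * ppr] for b in page]
                self.summaries[r] = union(boxes)
                added += 1
        return added

    def pages_to_scan(self, query):
        """Count pages a query must read: unsummarized ranges always count."""
        ppr = self.pages_per_range
        total = 0
        for start in range(0, len(self.pages), ppr):
            n_pages = len(self.pages[start:start + ppr])
            summary = self.summaries.get(start // ppr)
            if summary is None or intersects(summary, query):
                total += n_pages
        return total


index = ToyBrin(pages_per_range=2)
for _ in range(4):
    index.append_page([(0, 0, 10, 10)])
index.summarize_new_values()              # initial build: two ranges summarized

for _ in range(2):                        # new pages appended after the build
    index.append_page([(100, 100, 110, 110)])

far_query = (200, 200, 210, 210)          # intersects no stored geometry
before = index.pages_to_scan(far_query)   # the unsummarized range is still read
added = index.summarize_new_values()
after = index.pages_to_scan(far_query)    # now every range can be skipped
print(before, added, after)
```

In this model the query pays for the new pages only until the summarization step runs, which is the maintenance cost the paragraph above refers to.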

About the performance: being a range index, it surely performs worse than R-tree indexes like GiST. How much worse depends on several factors:

1) How the data pages are physically stored: ranges are most effective when spatially close geometries are also stored adjacently in the physical pages of the storage, so the initial import of spatial data should follow some sorting criterion.

2) BRIN granularity: performance gets closer to that of an R-tree as the size of the block range gets smaller. This can be configured at index creation with the parameter pages_per_range, i.e. how many pages are summarised per range. Of course, the smaller the number, the larger the resulting BRIN and the more time is needed to create it.
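
Both factors can be illustrated with a rough simulation. This is a sketch under stated assumptions, not a benchmark of PostGIS: points are random integers on a 1024 x 1024 grid, a "page" is assumed to hold 50 rows, the sort uses a Morton (Z-order) key as a simple stand-in for a real spatial sort such as the Hilbert curve, and the helper names (morton_key, build_ranges, pages_to_scan) are made up for this example. It counts how many pages a small window query would have to read for different storage orders and pages_per_range values.

```python
import random


def morton_key(x, y, bits=10):
    """Interleave the bits of x and y (Z-order), a simple spatial sort key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def build_ranges(points, rows_per_page, pages_per_range):
    """Return one (bounding box, page count) summary per block range."""
    rows_per_range = rows_per_page * pages_per_range
    ranges = []
    for start in range(0, len(points), rows_per_range):
        chunk = points[start:start + rows_per_range]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        n_pages = -(-len(chunk) // rows_per_page)  # ceiling division
        ranges.append(((min(xs), min(ys), max(xs), max(ys)), n_pages))
    return ranges


def pages_to_scan(ranges, query):
    """Sum the pages of every range whose bounding box intersects the query."""
    qx0, qy0, qx1, qy1 = query
    return sum(n for (x0, y0, x1, y1), n in ranges
               if x0 <= qx1 and qx0 <= x1 and y0 <= qy1 and qy0 <= y1)


random.seed(42)
points = [(random.randrange(1024), random.randrange(1024)) for _ in range(20000)]
clustered = sorted(points, key=lambda p: morton_key(*p))
query = (100, 100, 140, 140)  # small window, as when panning in a GIS

results = {}
for label, rows in (("random", points), ("sorted", clustered)):
    for ppr in (128, 16, 1):
        ranges = build_ranges(rows, rows_per_page=50, pages_per_range=ppr)
        results[(label, ppr)] = pages_to_scan(ranges, query)
        print(f"{label:>6} order, pages_per_range={ppr:>3}: "
              f"{len(ranges):>3} ranges, {results[(label, ppr)]:>3} pages to scan")
```

With randomly ordered rows nearly every range's bounding box covers the whole extent, so a small pages_per_range helps little. With spatially sorted rows, lowering pages_per_range reduces the pages read, at the price of more ranges to store in the index.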

GiSTs remain faster even with 2), but I'd suggest checking how the data was originally imported into the geospatial DB, to be sure you benefit as much as possible from a range index.

Hope it helps,
Giuseppe.

_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users
