Hi Giuseppe,

Thanks for your insights regarding BRIN.

I actually do employ BRIN, but only for Point type geometry, where I have the (subjective) feeling that the performance is least degraded compared to GiST. I also set the 'pages_per_range' parameter to a much smaller value than the default. Even with small values for this parameter, the size and creation time of the resulting index are negligible compared to those of GiST.

For Polygon data, the few times I tried using a BRIN type spatial index, I had the feeling it was probably some 3-4 times slower than GiST in terms of display times in a GIS, but these aren't hard figures, because I did not really time it. I also had the feeling that there was considerably more disk activity needed to access the relevant geometries.

The data comes from osm2pgsql, which initially sorts it spatially using the default PostGIS spatial sorting/clustering based on a Hilbert curve. This should be efficient. I derive tables from that, some of which are additionally spatially clustered depending on the processing they have had. For those tables I actually also need to create GiST spatial indexes: the PostgreSQL CLUSTER command needs a (spatial) index as input, and due to the nature of the index it cannot use a BRIN index for spatial clustering; it fails with an error/warning if you attempt it.
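
As an illustration of what Hilbert-curve sorting does, here is a minimal Python sketch using the classic iterative "xy2d" algorithm: each point is mapped to its position along a Hilbert curve covering an n x n grid, and sorting by that position keeps spatially close points close in storage order. This is not the code PostGIS or osm2pgsql actually run; `hilbert_index` is a hypothetical helper, and it assumes coordinates have already been scaled to integer grid cells.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve filling an n x n grid.

    Classic iterative "xy2d" algorithm. Assumes n is a power of two and that
    x and y have already been scaled to integer cells in [0, n).
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the coordinates so the next, finer level is read
        # in the orientation this quadrant of the curve expects.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Sorting by Hilbert position keeps neighbouring points near each other in
# storage order (a hypothetical handful of points on a 4 x 4 grid).
points = [(3, 0), (0, 0), (1, 1), (0, 3), (2, 2)]
ordered = sorted(points, key=lambda p: hilbert_index(4, p[0], p[1]))
print(ordered)  # [(0, 0), (1, 1), (0, 3), (2, 2), (3, 0)]
```

Consecutive positions on the curve are always adjacent grid cells, which is what makes the resulting physical order friendly to range-based indexes.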

osm2pgsql itself already seems to optimize indexing, in the sense that it launches multiple index builds against different tables in parallel. This is a kind of "parallel indexing", but not against a single table / spatial column. For the processes I developed myself this is not feasible, though, and I would benefit from having parallel GiST index creation for a single geometry column.
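
A minimal Python sketch of this "one index build per table, several tables at once" approach. `execute_sql` is a hypothetical callable standing in for a real database call (ideally one connection per worker thread), and the table/column names in the usage lines are osm2pgsql-style names used only for illustration. Each CREATE INDEX is still a single build, so one large geometry column gains nothing from this.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def gist_index_sql(table: str, column: str) -> str:
    """CREATE INDEX statement for a GiST index on one geometry column."""
    return f"CREATE INDEX {table}_{column}_gist ON {table} USING GIST ({column});"


def build_indexes_in_parallel(
    tables: Dict[str, str],
    execute_sql: Callable[[str], None],
    max_workers: int = 4,
) -> List[str]:
    """Run one CREATE INDEX per table, several tables at a time.

    The parallelism is only across tables: each individual index is still
    built by a single CREATE INDEX. `execute_sql` is a hypothetical stand-in
    for a real database call and should use its own connection per worker,
    because statements sent over one shared connection run one after another.
    """
    statements = [gist_index_sql(table, column) for table, column in tables.items()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every build and re-raises a worker's exception.
        list(pool.map(execute_sql, statements))
    return statements


# Usage with a fake executor that only records the statements.
executed: List[str] = []
statements = build_indexes_in_parallel(
    {"planet_osm_point": "way", "planet_osm_polygon": "way"},
    execute_sql=executed.append,
)
for sql in statements:
    print(sql)
```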

Marco

On 16-9-2020 at 18:55, Giuseppe Broccolo wrote:
Hi Marco,

On Wed, 16 Sep 2020 at 15:35, Marco Boeringa <[email protected]> wrote:

    [...]
    Yes, I know there are BRIN type spatial indexes for PostGIS, which are
    comparatively super fast to create and lead to very small indexes even
    for ultra large tables, but from the little information and personal
    experience I gathered, BRIN seems most suited for Point data only, and
    for static, not updated data, due to its requirement of clustered data
    for efficiency (actually not a problem in my particular case, since I
    don't do updates, but only reloads). The few times I tried to use it for
    large, spatially clustered, Polygon data sets, it seemed less efficient
    when accessing the data spatially in a GIS, with clearly longer display
    times, although I don't have real benchmarks for that.

    Most OpenStreetMap related tools like e.g. osm2pgsql also default to
    GiST, and probably with good reason.


About BRIN in PostGIS: like GiST, it internally works on the bounding boxes of the geometries, so in principle you can use this index for any geometry type, as long as your geospatial queries use the intersects, contains and is-contained operators for 2D geometries, and the intersects operator for 3D ones.
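
To make the bounding-box point concrete, a minimal Python sketch: any geometry, point or polygon, is reduced to its 2D bounding box, and the three 2D operators become plain coordinate comparisons. The function names are hypothetical; they only mirror the semantics of the PostGIS box operators `&&` (intersects), `~` (contains) and `@` (is contained by), ignore 3D, and say nothing about the exact geometry test that follows the index lookup.

```python
from typing import Iterable, NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned 2D bounding box."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bbox(vertices: Iterable[Tuple[float, float]]) -> Box:
    """Bounding box of any geometry given by its vertices (point, line or ring)."""
    xs, ys = zip(*vertices)
    return Box(min(xs), min(ys), max(xs), max(ys))


def intersects(a: Box, b: Box) -> bool:    # analogous to PostGIS &&
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


def contains(a: Box, b: Box) -> bool:      # analogous to PostGIS ~
    return a.xmin <= b.xmin and a.ymin <= b.ymin and a.xmax >= b.xmax and a.ymax >= b.ymax


def is_contained(a: Box, b: Box) -> bool:  # analogous to PostGIS @
    return contains(b, a)


# A point and a polygon reduce to the same kind of object, so the index
# logic never needs to know the geometry type.
point = bbox([(2.0, 3.0)])
polygon = bbox([(0.0, 0.0), (4.0, 0.0), (4.0, 5.0), (0.0, 5.0), (0.0, 0.0)])
query = Box(1.0, 1.0, 3.0, 4.0)
print(intersects(point, query), is_contained(point, query))  # True True
print(intersects(polygon, query), contains(polygon, query))  # True True
print(is_contained(polygon, query))                          # False
```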

You are right when you say that BRIN is more suitable for "static" data, because of how it internally works: to put it in a few words, it stores a summary (for PostGIS, a bounding box) of the values found in each range of physically adjacent data pages. New entries added by INSERTs or UPDATEs are properly summarised in the BRIN as long as they fall within page ranges that are already summarized in the index. If new pages are created whose data does not fall within the last summarized range, those new ranges are not automatically added to the summary, and the related tuples remain unsummarized until a new summarization is invoked, either automatically through a VACUUM or manually through the brin_summarize_range() or brin_summarize_new_values() functions. This allows some maintenance of the index even with non-static data, though of course with some limitations compared to GiST.
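
The summarization behaviour can be mimicked with a toy model in Python. `ToyBrin` and its methods are hypothetical and heavily simplified (no deletions, concurrency or crash safety); `summarize_new_values()` only mirrors the effect of VACUUM or brin_summarize_new_values(), not their implementation. What it shows: a row landing in an already summarized range just widens that range's box, while a page in an unsummarized range can never be excluded, so every bounding-box query has to read it until the range is summarized.

```python
from typing import Dict, List, NamedTuple


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def union(a: Box, b: Box) -> Box:
    return Box(min(a.xmin, b.xmin), min(a.ymin, b.ymin), max(a.xmax, b.xmax), max(a.ymax, b.ymax))


def intersects(a: Box, b: Box) -> bool:
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


class ToyBrin:
    """Hypothetical, heavily simplified model of a bounding-box BRIN index."""

    def __init__(self, pages_per_range: int) -> None:
        self.pages_per_range = pages_per_range
        self.pages: List[List[Box]] = []   # heap pages, each holding row bounding boxes
        self.summary: Dict[int, Box] = {}  # range number -> union box of its rows

    def insert(self, page_no: int, box: Box) -> None:
        while len(self.pages) <= page_no:
            self.pages.append([])
        self.pages[page_no].append(box)
        rng = page_no // self.pages_per_range
        if rng in self.summary:
            # Range already summarized: widen its box so the new row stays findable.
            self.summary[rng] = union(self.summary[rng], box)
        # Otherwise the range stays unsummarized until summarize_new_values().

    def summarize_new_values(self) -> int:
        """Summarize every non-empty range without a summary; return how many were added."""
        added = 0
        for start in range(0, len(self.pages), self.pages_per_range):
            rng = start // self.pages_per_range
            boxes = [b for page in self.pages[start:start + self.pages_per_range] for b in page]
            if rng in self.summary or not boxes:
                continue
            total = boxes[0]
            for b in boxes[1:]:
                total = union(total, b)
            self.summary[rng] = total
            added += 1
        return added

    def pages_to_read(self, query: Box) -> List[int]:
        """Heap pages a `geom && query` bitmap scan would have to visit."""
        result = []
        for page_no in range(len(self.pages)):
            summary = self.summary.get(page_no // self.pages_per_range)
            # An unsummarized range can never be ruled out, so it is always read.
            if summary is None or intersects(summary, query):
                result.append(page_no)
        return result


idx = ToyBrin(pages_per_range=2)
for page_no, box in [(0, Box(0, 0, 1, 1)), (1, Box(1, 1, 2, 2)),
                     (2, Box(10, 10, 11, 11)), (3, Box(11, 11, 12, 12))]:
    idx.insert(page_no, box)
idx.summarize_new_values()             # like building the index on existing data
idx.insert(4, Box(20, 20, 21, 21))     # new page outside any summarized range
near_origin = Box(0, 0, 2, 2)
print(idx.pages_to_read(near_origin))  # [0, 1, 4]: page 4 cannot be excluded
idx.summarize_new_values()             # like VACUUM / brin_summarize_new_values()
print(idx.pages_to_read(near_origin))  # [0, 1]
```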

About the performance: being a range index, BRIN will certainly perform worse than an R-tree index such as GiST. How much worse depends on several factors:

1) How the data pages are physically stored: ranges are most effective when spatially close geometries are also stored in adjacent physical pages, so the initial import of the spatial data should follow some spatial sorting criterion.

2) BRIN granularity: the smaller the block range, the closer the performance gets to that of an R-tree. This can be configured at index creation time with the pages_per_range parameter, i.e. how many pages are summarised per range. Of course, the smaller the number, the larger the resulting BRIN and the more time is needed to create it.
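
Both factors can be illustrated with a small Python simulation under loudly simplified assumptions: 4096 points on a 64 x 64 grid, a hypothetical 16 rows per heap page, and a plain row-major sort standing in for a proper spatial sort such as a Hilbert curve. For each pages_per_range value it counts the block ranges (index entries) and the heap pages a `geom && query` lookup would have to read, for sorted versus randomly shuffled storage.

```python
import random
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Box, b: Box) -> bool:
    return a.xmin <= b.xmax and b.xmin <= a.xmax and a.ymin <= b.ymax and b.ymin <= a.ymax


ROWS_PER_PAGE = 16  # hypothetical: 16 points fit on one heap page


def range_summaries(points: Sequence[Tuple[int, int]], pages_per_range: int) -> List[Box]:
    """Union bounding box of each block range, for points in physical storage order."""
    rows_per_range = ROWS_PER_PAGE * pages_per_range
    summaries = []
    for start in range(0, len(points), rows_per_range):
        chunk = points[start:start + rows_per_range]
        xs = [x for x, _ in chunk]
        ys = [y for _, y in chunk]
        summaries.append(Box(min(xs), min(ys), max(xs), max(ys)))
    return summaries


def pages_scanned(points: Sequence[Tuple[int, int]], pages_per_range: int, query: Box) -> int:
    """Heap pages read for `geom && query`: all pages of every matching range."""
    n_pages = -(-len(points) // ROWS_PER_PAGE)  # ceiling division
    total = 0
    for i, summary in enumerate(range_summaries(points, pages_per_range)):
        if intersects(summary, query):
            total += min(pages_per_range, n_pages - i * pages_per_range)
    return total


# 64 x 64 grid of points; row-major order stands in for a spatial sort.
ordered = [(x, y) for y in range(64) for x in range(64)]
shuffled = ordered[:]
random.Random(42).shuffle(shuffled)
query = Box(20, 20, 27, 27)

entries: Dict[int, int] = {}
ordered_scans: Dict[int, int] = {}
shuffled_scans: Dict[int, int] = {}
for ppr in (1, 4, 16, 64):
    entries[ppr] = len(range_summaries(ordered, ppr))
    ordered_scans[ppr] = pages_scanned(ordered, ppr, query)
    shuffled_scans[ppr] = pages_scanned(shuffled, ppr, query)
    print(f"pages_per_range={ppr:>2}  ranges={entries[ppr]:>3}  "
          f"pages read: sorted={ordered_scans[ppr]:>3}  shuffled={shuffled_scans[ppr]:>3}")
```

With the sorted order the lookup reads 8, 32, 32 and 64 pages for pages_per_range = 1, 4, 16 and 64, while the index shrinks from 256 to 4 ranges; with the shuffled order nearly all 256 pages are read whatever the setting, so a small pages_per_range cannot compensate for poor physical ordering.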

GiST remains faster even with a small pages_per_range (point 2), but I'd suggest checking how the data was originally imported into the geospatial DB, to be sure you can benefit as much as possible from a range index.

Hope it helps,
Giuseppe.

_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users