Hi Giuseppe,
Thanks for your insights regarding BRIN.
I actually do employ BRIN, but only for Point type geometry, where I
have the (subjective) feeling that the performance is least degraded
compared to GiST. I also set the 'pages_per_range' parameter to a much
smaller value than the default. Even with small values for this
parameter, the size and creation time of the resulting index are
negligible compared to GiST.
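To put the size difference in perspective: BRIN keeps one summary entry per block range instead of one entry per row, so the number of entries is roughly the table's page count divided by pages_per_range. A minimal sketch of that arithmetic follows. The function name and the 1,000,000-page table are made up for illustration; 128 is the PostgreSQL default for pages_per_range.

```python
def brin_range_count(table_pages: int, pages_per_range: int = 128) -> int:
    """Number of block ranges (summary entries) a BRIN index would hold."""
    if table_pages < 0 or pages_per_range < 1:
        raise ValueError("table_pages must be >= 0 and pages_per_range >= 1")
    # Integer ceiling division: a partially filled last range still counts.
    return (table_pages + pages_per_range - 1) // pages_per_range


# Hypothetical table of 1,000,000 pages (about 8 GB at the usual 8 kB page size).
for ppr in (128, 16, 4):
    print(f"pages_per_range={ppr}: {brin_range_count(1_000_000, ppr)} ranges")
```

Even at pages_per_range = 4 the entry count stays far below the row count of such a table, which is in line with the small index sizes I see.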
For Polygon data, the few times I tried using a BRIN type spatial index,
I had the feeling it was probably some 3-4 times slower than GiST in
terms of display times in a GIS, but these aren't hard figures, because
I did not really time it. I also had the feeling that there was
considerably more disk activity needed to access the relevant geometries.
The data comes from osm2pgsql, which initially sorts the data spatially
using the default PostGIS spatial ordering based on a Hilbert curve.
This should be efficient. I derive tables from that, some of which are
additionally clustered spatially, depending on the processing they have
had. For those tables I also need to create GiST type spatial indexes,
because the PostgreSQL CLUSTER command needs a (spatial) index as input
and cannot use BRIN for that, due to the nature of the index: the
attempt fails with an error / warning saying so.
osm2pgsql itself already seems to optimize indexing, in the sense that
it launches multiple index processes against different tables in
parallel. This is a kind of "parallel indexing", but not against a
single table / spatial column. For the processes I developed myself,
this is not feasible though, and I would benefit from having parallel GiST
index creation for a single geometry column.
Marco
On 16-9-2020 at 18:55, Giuseppe Broccolo wrote:
Hi Marco,
On Wed, 16 Sep 2020 at 15:35, Marco Boeringa
<[email protected]> wrote:
[...]
Yes, I know there are BRIN type spatial indexes for PostGIS, which are
comparatively super fast to create and lead to very small indexes even
for ultra large tables, but from the little information and personal
experience I gathered, BRIN seems most suited for Point data only, and
for static, not updated data, due to its requirement of clustered data
for efficiency (actually not a problem in my particular case, since I
don't do updates, but only reloads). The few times I tried to use it for
large, spatially clustered, Polygon data sets, it seemed less efficient
when accessing the data spatially in a GIS, with clearly longer display
times, although I don't have real benchmarks for that.
Most OpenStreetMap related tools like e.g. osm2pgsql also default to
GiST, and probably with good reason.
About BRIN in PostGIS: internally it works on the bounding boxes of the
geometries, as GiST does, so in principle you can use this index for any
geometry type, as long as your geospatial queries use the intersects,
contains and is-contained operators (&&, ~, @) for 2D geometries, or the
intersects operator (&&&) for 3D ones.
You are right when you say that BRIN is more suitable for "static" data,
because of how it works internally: put briefly, it keeps a summary of
the range of values contained in each group of physically stored data
pages. New entries added by INSERTs or UPDATEs are properly summarised
in the BRIN as long as the new indexed values/geometries fall in
ranges/bounding boxes already present in the index. If new pages are
created with data that does not fall within the last summarised range,
the new ranges are not automatically added to the summary, and the
related tuples remain unsummarised until a new summarisation is invoked,
either automatically through a VACUUM or manually through the
brin_summarize_range or brin_summarize_new_values functions. This allows
some maintenance of the index even with non-static data, of course with
some limitations compared to GiST.

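The summarisation behaviour described above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not PostgreSQL's implementation: the ToyBrin class, its methods and the sample boxes are all hypothetical, and only the method name summarize_new_values is borrowed from the real brin_summarize_new_values function.

```python
def overlaps(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a, b):
    """Smallest box covering both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


class ToyBrin:
    """Toy model of BRIN summarisation (hypothetical, for illustration only)."""

    def __init__(self, pages_per_range=4):
        self.pages_per_range = pages_per_range
        self.heap = {}       # range number -> boxes stored in that range
        self.summaries = {}  # range number -> summary box (summarised ranges only)

    def insert(self, page_no, box):
        rng = page_no // self.pages_per_range
        self.heap.setdefault(rng, []).append(box)
        if rng in self.summaries:
            # Range already summarised: its summary is widened right away.
            self.summaries[rng] = union(self.summaries[rng], box)
        # Otherwise the range stays unsummarised until summarisation runs.

    def summarize_new_values(self):
        """Summarise every range without a summary; return how many were added."""
        pending = [r for r in self.heap if r not in self.summaries]
        for r in pending:
            summary = self.heap[r][0]
            for box in self.heap[r][1:]:
                summary = union(summary, box)
            self.summaries[r] = summary
        return len(pending)

    def ranges_to_scan(self, query):
        """Ranges that cannot be skipped: overlapping or unsummarised ones."""
        return [r for r in sorted(self.heap)
                if r not in self.summaries or overlaps(self.summaries[r], query)]


idx = ToyBrin(pages_per_range=4)
idx.insert(0, (0, 0, 1, 1))
idx.insert(1, (1, 1, 2, 2))
idx.summarize_new_values()          # initial build: range 0 gets its summary
idx.insert(2, (1.5, 1.5, 3, 3))     # same range: summary is widened in place
idx.insert(4, (50, 50, 51, 51))     # new range 1: not summarised yet

query = (10, 10, 11, 11)
before = idx.ranges_to_scan(query)  # range 1 must be read, it has no summary
added = idx.summarize_new_values()
after = idx.ranges_to_scan(query)   # now range 1 can be skipped as well
print(before, added, after)
```

The point of the sketch is that an unsummarised range cannot be excluded by any query, so it is always read until summarisation has run.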
About the performance: being a range index, it surely performs worse
than R-tree indexes like GiST. How much worse depends on several factors:

1) how the data pages are physically stored: ranges are most effective
when spatially close geometries are also stored in adjacent physical
pages, so the initial import of spatial data should follow some sorting
criterion

2) BRIN granularity: performance gets closer to that of an R-tree as the
block range gets smaller. This can be configured at index creation with
the pages_per_range parameter, i.e. how many pages are summarised per
range. Of course, the smaller the number, the larger the resulting BRIN
and the more time is needed to create it

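Both factors can be illustrated with a small simulation. It is a sketch under loud assumptions: rows are random points in the unit square, a "page" holds 50 rows, and the "clustered" order is a crude strip sort standing in for a real Hilbert-curve sort. All names and numbers below are hypothetical, and the model only mimics the idea of one bounding box per block range.

```python
import random


def overlaps(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def make_pages(boxes, rows_per_page=50):
    """Split rows into fixed-size 'pages', keeping their physical order."""
    return [boxes[i:i + rows_per_page] for i in range(0, len(boxes), rows_per_page)]


def pages_to_scan(pages, pages_per_range, query):
    """Count pages in block ranges whose summary box overlaps the query."""
    scanned = 0
    for start in range(0, len(pages), pages_per_range):
        block = pages[start:start + pages_per_range]
        boxes = [box for page in block for box in page]
        summary = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                   max(b[2] for b in boxes), max(b[3] for b in boxes))
        if overlaps(summary, query):
            scanned += len(block)
    return scanned


rng = random.Random(42)
points = []
for _ in range(20_000):
    x, y = rng.random(), rng.random()
    points.append((x, y, x, y))  # a point is a zero-size box

shuffled = make_pages(points)
# Crude spatial order: 20 vertical strips, sorted by y inside each strip.
# This is only a stand-in for a real Hilbert-curve sort.
clustered = make_pages(sorted(points, key=lambda b: (int(b[0] * 20), b[1])))

query = (0.40, 0.40, 0.45, 0.45)
results = {}
for label, pages in (("shuffled", shuffled), ("clustered", clustered)):
    for ppr in (32, 4):
        results[(label, ppr)] = pages_to_scan(pages, ppr, query)
        print(label, ppr, results[(label, ppr)], "of", len(pages), "pages")
```

In this toy setup the shuffled layout gives summaries that each span almost the whole square, so hardly any range can be skipped, while the sorted layout lets most ranges be excluded, more so with the smaller pages_per_range.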
GiST remains faster even with 2), but I'd suggest checking how the data
was originally imported into the geospatial DB, to be sure you benefit
as much as possible from a range index.

Hope it helps,
Giuseppe.
_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users