In addition to the aforementioned solutions and recommendations (this is a fun 
thread), if your geometry are points than you could leverage the pointcloud 
extension which would do a bit of the clustering for you by storing the points 
in patches.

[http://sig.cmparks.net/cmp-ms-90x122.png]Stephen V. Mather
GIS Manager
(216) 635-3243<tel:(216)%20635-3243> (Work)
--sent from phone--



On Jan 15, 2019, at 16:45, Felix Kunde 
<felix-ku...@gmx.de<mailto:felix-ku...@gmx.de>> wrote:

As already mentioned by others, the more rows the longer it takes to build the 
index.

In preparation for a conference talk, I'm currently testing the performance of 
different spatial indexes that PostGIS offers: GiST, sp-GiST and BRIN. I'm also 
testing with 1 billion randomly generated points, lines and polygons and it 
took me around 6 hours to build the GiST index on my laptop. sp-GIST was 
usually twice as fast.

So far, I can say that GiST tends to slightly faster for large data sets than 
sp-GIST and often twice as fast as BRIN. The big advantage of BRIN imo is that 
it's fast to build (6hrs vs. 3,5 min) and requires hardly any disk space (50 GB 
vs. 3,6 MB) due to its simplicity. Just give it a try and see if the executions 
times are fine for you. Note, that it can be necessary to execute SET 
enable_seqscan = false before. In my case, only then did the query planner 
recognize my BRIN index.

For optimal performance your spatial can be clustered on disk. For this, I 
created a functional btree index transforming the geometries with ST_GeoHash. 
Btree indexes can be created in parallel with the latest Postgres version (took 
me 30min). Unfortunately, the CLUSTER command then takes a long time (8hrs on 
my machine). In the end my test query (a simple containment test) took around 
250ms with BRIN (and 220ms with GIST and sp-GIST).

lg Felix


Gesendet: Sonntag, 13. Januar 2019 um 18:38 Uhr
Von: "Giuseppe Broccolo" <g.broccol...@gmail.com<mailto:g.broccol...@gmail.com>>
An: "PostGIS Users Discussion" 
<postgis-users@lists.osgeo.org<mailto:postgis-users@lists.osgeo.org>>
Betreff: Re: [postgis-users] Slow construction of GiST index, but better with 
smaller # of big rows
Hi Wembo,

Il giorno sab 12 gen 2019 alle ore 16:29 Wenbo Tao 
<taowenbo1...@gmail.com<mailto:taowenbo1...@gmail.com>> ha scritto:
Hello,

    I was trying to build a GiST index on a geometry column in a table with 1 
billion rows. It took an entire week to finish.

    Then I reduced the number of rows by grouping closer objects into one clump 
(using some clustering algorithm), and then compressed the clump as one row 
(the geometry column becomes the bounding box of all objects in that clump). 
The construction then went way faster -- down to 12 hours. I did this because 
the query I need to answer is finding all objects whose bbox intersects with a 
given rectangle. I can now query all clumps whose bbox intersects with the 
rectangle.

   So essentially, the index construction is slow for too many rows, but much 
faster for a smaller # of bigger rows. Any intuition why this is the case would 
be greatly appreciated!

Well, building GiST indexes requires an execution time that grows linearly with 
the size of the dataset (~O(N)). Of course, also hardware (CPU, storage, ...) 
impacts the build. So long execution times for one billion rows sound 
reasonable.

Your solution could be fine: you cluster close objects and index the obtained 
rows, than you can retrieve the clusters themselves and finally find the exact 
match. Of course, it is not an "elegant" solution.

You already had the suggestion to partition your table, and then index the 
single partitions, that could be completely fine.

A second suggestion I would like to give you, is to consider BRIN indexing, 
thought specifically for large datasets:

https://postgis.net/docs/using_postgis_dbmanagement.html#brin_indexes

Of course, there are some limitations with this index, so I invite you to read 
the linked documentation and consider your specific use case. But for 
intersections between bbox (and your case looks to be the case), BRINs could be 
a really good solution.

Hope this can help,
Giuseppe.
_______________________________________________ postgis-users mailing list 
postgis-users@lists.osgeo.org<mailto:postgis-users@lists.osgeo.org> 
https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org<mailto:postgis-users@lists.osgeo.org>
https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users

Reply via email to