Hello all: I am humbly asking for help in dealing with slow st_within & st_intersection queries with large (multi)polygons.
In my use case, I need to allow users to enter geographic areas (mosty polygons) through a web interface, and then determine if these areas lie completely with various continent or regional shapes (e.g. Europe, South Asia). The database also includes all countries in the world, as well as stats/oblasts/cantons/etc. for large countries. As the precision in these areas is relatively high (Europe is a multipolygon defined by ~1e6 points), this takes quite a long time. The Postgis manual says: "Just because you *can* store all of Europe in one polygon doesn't mean you *should*." However, in my case I do need to store the entire polygon, as all polygons defined in the system will be regularly exported. In my case, simplifying polygons is not really an option, as user inputs can be points (e.g. factories), and it is quite important to know whether they are on the Rhein in Germany or in France, for example. This affects the electricity mix they use, the markets they buy their inputs from, etc. Looking through the mailing list & the web, it seems like there are several ways to improve database performance: 1) Performing queries initially against a simplified set of polygons. However, I am not sure how this is faster than using the bounding box that is already indexed, and queries would still be made against the most complex shapes, as they are quite large. 2) "Denormalizing" data, i.e. splitting polygons up into each separate UTM zone, and then running queries against input polygons which are similarly separated. This is recommended by the user manual, but I am not sure if it makes sense in my case. Is there a working code sample somewhere that does these calculations automatically? Is the easiest way simply to create polygons that define each UTM zone, and then do an intersection query for each one? 3) Adding extra indices. I am not sure what else could be indexed, but maybe there is something that I am not seeing. The only queries I need are within and intersects/intersection. 4) Caching results. As the base data set does not change often, and the number of total shapes is relatively small (<1000 even after user input), calculations could be done once, slowly, and then referenced, with the appropriate triggers to flush the cache upon updates or deletes. Any ideas, comments, or questions would be greatly appreciated. Yours, Chris -- ############################ Chris Mutel Ökologisches Systemdesign - Ecological Systems Design Institut f.Umweltingenieurwissenschaften - Institute for Environmental Engineering ETH Zürich - HIF C 42 - Schafmattstr. 6 8093 Zürich Telefon: +41 44 633 71 45 - Fax: +41 44 633 10 61 ############################ _______________________________________________ postgis-users mailing list [email protected] http://postgis.refractions.net/mailman/listinfo/postgis-users
