We have done some work implementing parallel spatial queries using a spatial declustering algorithm. How large are your datasets?
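The quoted thread below suggests two ways to split the work: Rémi's workers that each process one country, and Vincent's map-reduce across several servers. Here is a minimal Python sketch of that partition-and-combine idea. It uses a plain regular grid as the partitioning scheme. That grid is an assumption made for illustration and is not the spatial declustering algorithm mentioned above. `grid_tiles` and `run_partitioned` are hypothetical helper names, and the per-partition `work` function stands in for whatever query each worker would actually run.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple, TypeVar

# (xmin, ymin, xmax, ymax)
BBox = Tuple[float, float, float, float]
T = TypeVar("T")
R = TypeVar("R")


def grid_tiles(extent: BBox, nx: int, ny: int) -> List[BBox]:
    """Split an extent into nx * ny equal, non-overlapping tiles (row by row).

    A plain grid is an illustrative assumption, not a declustering algorithm.
    """
    xmin, ymin, xmax, ymax = extent
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    return [
        (xmin + i * dx, ymin + j * dy, xmin + (i + 1) * dx, ymin + (j + 1) * dy)
        for j in range(ny)
        for i in range(nx)
    ]


def run_partitioned(
    partitions: Iterable[T],
    work: Callable[[T], R],
    combine: Callable[[List[R]], R],
    workers: int = 20,
) -> R:
    """'Map' work over partitions in a thread pool, then 'reduce' on the caller."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with partitions.
        results = list(pool.map(work, partitions))
    return combine(results)


if __name__ == "__main__":
    tiles = grid_tiles((0.0, 0.0, 10.0, 10.0), nx=2, ny=2)

    # Hypothetical stand-in for a per-tile query: just the tile's area.
    def tile_area(t: BBox) -> float:
        return (t[2] - t[0]) * (t[3] - t[1])

    print(len(tiles), run_partitioned(tiles, tile_area, sum, workers=4))
    # prints: 4 100.0
```

Threads are enough when each worker only sends SQL and waits, because the heavy lifting happens in separate Postgres backends. In a real job, each `work` call would open its own connection and restrict its query to one tile, for example with `geom && ST_MakeEnvelope(xmin, ymin, xmax, ymax, srid)`. Features that straddle tile edges need a tie-break rule so they are not processed twice. One option is to assign each feature to the tile that contains its `ST_PointOnSurface`.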
On Mon, Jan 18, 2016 at 1:51 PM, Rémi Cura <remi.c...@gmail.com> wrote:
> Hey,
> if you have one beefy server, you can parallelize by running several
> queries, each working on a subset of your data
> (aka parallel processing through data partitioning).
> One conceptual example: you want to process the world, so you create 20
> workers and a list of countries, and then have the workers process the
> list country by country.
>
> If you think one Postgres server will not be sufficient,
> you could of course shard your data across several servers,
> with options ranging from writing it from scratch (you rewrite everything),
> to using existing open source code, to dedicated solutions like
> Postgres-XC, Greenplum, ...
>
> However, sorry to say this, but in your case it looks like your first
> improvement will not come from massive parallelization but from first
> better understanding the world of geospatial data and PostGIS.
>
> Cheers,
> Rémi-C
>
> 2016-01-18 19:30 GMT+01:00 Vincent Picavet (ml) <vincent...@oslandia.com>:
>
>> Hi Ravi,
>>
>> On 18/01/2016 19:14, Ravi Pavuluri wrote:
>> > Hi All,
>> >
>> > I am checking whether there is a way to quickly process large datasets,
>> > such as census blocks, in PostGIS, and also by leveraging a big data
>> > platform. I have a few questions in this regard.
>> >
>> > 1) When I intersect sample census blocks with another polygon layer,
>> > PostGIS 2.2 (on Postgres 9.4) takes ~60 minutes (after optimizing per
>> > http://postgis.net/2014/03/14/tip_intersection_faster/ ), while ESRI
>> > ArcMap takes ~10 minutes. The PostGIS layers already have spatial
>> > indexes. Is there any way to optimize this further?
>>
>> Following the links on your page, here is a good answer from Paul
>> (TL;DR: ST_Intersection is slow, avoid it):
>>
>> http://gis.stackexchange.com/questions/31310/acquiring-arcgis-like-speed-in-postgis/31562
>>
>> > 2) What is the equivalent of ESRI Union in PostGIS? I didn't see any
>> > out-of-the-box functions, so any tips here are appreciated.
>>
>> If ESRI Union makes a union, maybe ST_Union? But I guess there are some
>> semantic issues here.
>>
>> > 3) Is there any way we can expedite these geoprocessing tasks
>> > (union/intersect etc.) using a big data platform (e.g. Hadoop)? Most
>> > examples talk about analysis (contains etc.) but not about
>> > geoprocessing on geospatial data. Any input is appreciated.
>>
>> Lots of people do geoprocessing with PostGIS too, including long-running
>> jobs on large volumes of data (worldwide OSM data processing, namely).
>> "Big data" is a really subjective term. Are your geoprocessing needs
>> really parallelizable? What kind of volumes are we talking about: MB,
>> GB, TB? What kind of hardware do you have at hand?
>>
>> One way to do a sort of map-reduce with PostGIS is to use a bunch of
>> servers with FDW connections between a source master and these slaves,
>> map the data processing to the slave servers, and reduce it on the main
>> server. With a bit of Python as glue code, this can be automated and
>> quite efficient, even though this kind of sharding is not automated
>> (yet?).
>>
>> Vincent
>>
>> >
>> > Thanks,
>> > Ravi.
>> >
>> > _______________________________________________
>> > postgis-users mailing list
>> > postgis-users@lists.osgeo.org
>> > http://lists.osgeo.org/mailman/listinfo/postgis-users
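One common way to follow Vincent's TL;DR ("ST_Intersection is slow, avoid it") for the census-block overlay in question 1 has three parts. First, an index-backed `ST_Intersects` join picks the candidate pairs. Second, pairs that merely touch are dropped. Third, `ST_Intersection` runs only for blocks that are not already entirely covered by the other polygon. The Python helper below is a minimal sketch that only builds that SQL as text, in the spirit of the "bit of Python as glue code" mentioned in the thread. It does not connect to a database. The table names, the `id` key column, the `geom` column and the `clip_sql` name are all hypothetical.

```python
import re

# Guards for building SQL text (not a replacement for proper quoting):
# tables may be schema-qualified, columns must be plain identifiers.
_TABLE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")
_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def clip_sql(blocks: str, zones: str, geom: str = "geom", key: str = "id") -> str:
    """Return an overlay query that skips ST_Intersection when it is not needed.

    - ST_Intersects in the join lets a GiST index prune candidate pairs.
    - NOT ST_Touches drops pairs whose interiors do not meet (shared edge only).
    - ST_CoveredBy passes blocks lying entirely inside a zone through as-is
      instead of recomputing their geometry.
    """
    for name in (blocks, zones):
        if not _TABLE.match(name):
            raise ValueError(f"unsafe table name: {name!r}")
    for name in (geom, key):
        if not _COLUMN.match(name):
            raise ValueError(f"unsafe column name: {name!r}")
    return f"""SELECT a.{key} AS block_id,
       b.{key} AS zone_id,
       CASE
         WHEN ST_CoveredBy(a.{geom}, b.{geom}) THEN a.{geom}
         ELSE ST_Multi(ST_Intersection(a.{geom}, b.{geom}))
       END AS {geom}
FROM {blocks} AS a
JOIN {zones} AS b
  ON ST_Intersects(a.{geom}, b.{geom})
 AND NOT ST_Touches(a.{geom}, b.{geom});"""


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    print(clip_sql("census_blocks", "zones"))
```

With this shape, blocks that sit fully inside a zone are returned untouched, so the expensive clipping runs only on blocks that cross zone boundaries. When one layer is much finer than the other, those boundary-crossing blocks are often a small fraction of the total.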