Hello,

I am trying to determine the suitability of Postgres for a significant
climate risk modelling project.

We are batch processing a large (500 million) collection of
geographical points. For each point, we store ~6 dimensions of various
risks (total cardinality of several millions of floats per
geographical point).

We need to perform various ad-hoc aggregations on geographical subsets
of the values associated with these points. These aggregations could
require median/percentiles, so they won't be as simple as mean/sum,
and we expect we may have to write custom aggregations for some cases.

Because we may want to run computations that would use PostGIS
features (certainly polygon containment; potentially others), and
because our existing applications already use Postgres, we have some
degree of preference to do this in Postgres.

I'd like to know if anyone here has successfully built a system to run
this sort of computation at this scale in Postgres. If so, what sort
of schema design did you use? Columnar stores referencing spatially
indexed row stores that contain the spatial references, sharded by
geographical region? What sort of throughput did you achieve?

I'm also interested in any general observations folks may have about
this project. Perhaps we should use Clickhouse (for the main data)
together with Postgres (for the GIS computations)? Perhaps our float
dataset should live outside any kind of oltp/olap database at all?
Something else?

And finally, if you have developed a system like this, are you
available to assist us with building this system on a consulting
basis?

Thanks in advance,
—Sohum


Reply via email to