Re: Suitability of postgres for high cardinality high volume usecase?

Brent Wood Wed, 17 Jun 2026 16:54:08 -0700

Hi,

We have a time-series database using Postgres/PostGIS/Timescale that holds 
around 400 billion sensor readings, collected since 1990 from sensors deployed 
on research vessels at sea. It performs very well.


This is a different scenario from the one you describe, as we store the sensor 
readings as a timestamped hstore. We use this because the number of readings 
per timestamp (and which readings they are) is highly variable. You, however, 
are describing a few fixed values per location.
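
A minimal sketch of what a timestamped hstore layout can look like. The table, 
column, and key names are invented for illustration and are not our actual 
schema:

```sql
-- Hypothetical names throughout; illustration only.
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE sensor_readings (
    ts       timestamptz NOT NULL,
    readings hstore      NOT NULL   -- one key per sensor value at this timestamp
);

-- With TimescaleDB installed, the table could be turned into a hypertable:
-- SELECT create_hypertable('sensor_readings', 'ts');

-- Rows may carry different sets of keys.
INSERT INTO sensor_readings (ts, readings) VALUES
    ('2026-06-17 00:00:00+00', '"lat"=>"-41.30", "lon"=>"174.80", "sst"=>"14.2"'),
    ('2026-06-17 00:00:05+00', '"lat"=>"-41.31", "lon"=>"174.81"');

-- hstore values are text, so cast on extraction; a missing key yields NULL.
SELECT ts, (readings -> 'sst')::float8 AS sst
FROM   sensor_readings
ORDER  BY ts;
```

With a few fixed values per location, plain typed columns would probably be 
simpler than hstore.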

An example of how this is used:

For a deepwater camera deployment we plot vessel & camera positions live in 
QGIS.
The SQL extracts vessel & camera GPS lat & long coordinate values, converts 
these to points & assembles them into linestrings so we can see this on screen.
QGIS auto refreshes the layer every 5 seconds.
It is a hot query, retrieving only a few values from the entire database, 
taking < 50ms (from the 400,000,000,000 readings in the db)
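
The query is roughly of the following shape. This is a sketch only: it assumes 
a hypothetical table sensor_readings(ts timestamptz, readings hstore) with 
hypothetical keys vessel_lat and vessel_lon, and the one-hour window is an 
arbitrary choice:

```sql
-- Requires the hstore and PostGIS extensions; all names are hypothetical.
SELECT ST_MakeLine(
           ST_SetSRID(
               ST_MakePoint((readings -> 'vessel_lon')::float8,   -- x = longitude
                            (readings -> 'vessel_lat')::float8),  -- y = latitude
               4326)
           ORDER BY ts) AS vessel_track
FROM   sensor_readings
WHERE  ts >= now() - interval '1 hour'                     -- narrow time window
  AND  readings ?& ARRAY['vessel_lat', 'vessel_lon'];      -- rows with both keys
```

The time predicate is what keeps the query cheap: only the most recent slice of 
the data is touched, however large the table is.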

This massively leverages Timescale indexes, which won't apply in your case, but 
suggests you may not have any performance issues.

One aspect I suggest you consider:
Even when indexed, spatial queries (point in poly) can take a while with 
complex polygons (lots of vertices).
For frequent or slow spatial queries you can add an indexed boolean column 
representing each polygon & populate it with a flag as to whether each record 
is inside or outside the specified polygon.
This runs the spatial query once & essentially caches the result for future 
use. Much faster, and the approach might help with some non-spatial queries as 
well.
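
A sketch of the flag-column approach, assuming hypothetical tables 
points(id, geom) and regions(name, geom) in the same SRID, and a hypothetical 
region called 'region_a':

```sql
-- All table, column, and region names here are hypothetical.
ALTER TABLE points ADD COLUMN in_region_a boolean NOT NULL DEFAULT false;

-- Run the expensive containment test once and store the result.
UPDATE points p
SET    in_region_a = true
FROM   regions r
WHERE  r.name = 'region_a'
  AND  ST_Contains(r.geom, p.geom);

-- Index the flag so later lookups avoid the spatial test entirely.
CREATE INDEX points_in_region_a_idx ON points (in_region_a);

-- Subsequent queries filter on the cached flag.
SELECT count(*) FROM points WHERE in_region_a;
```

Note that ST_Contains excludes points lying exactly on the polygon boundary; 
ST_Covers includes them. The flag has to be recomputed if the polygon or the 
points change.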

I also suggest you not get overly concerned about possible performance issues 
requiring complex schemas & workarounds unless you know you need to. Postgres 
is generally pretty quick, so try a simple implementation, run some queries & 
find out if you have a performance issue that needs resolving before assuming 
you do. At that stage you'll also have a much better idea as to the specific 
problem which is a big help when looking at fixing it.

Postgres has two built-in percentile functions, percentile_cont() & 
percentile_disc(), that may provide what you require. There is no median 
function as such, but that is just a percentile call with a 0.5 parameter.
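
For example, assuming a hypothetical table risk_values(point_id bigint, 
risk float8):

```sql
-- Table and column names are hypothetical.
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY risk) AS median_interpolated,
       percentile_disc(0.5) WITHIN GROUP (ORDER BY risk) AS median_actual_value,
       percentile_cont(ARRAY[0.05, 0.95])
           WITHIN GROUP (ORDER BY risk)                  AS p05_p95
FROM   risk_values;
```

percentile_cont() interpolates between neighbouring values, while 
percentile_disc() returns a value that actually occurs in the data. Both are 
ordered-set aggregates, so they have to sort the input, which costs more than 
mean/sum on large groups.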


Cheers,

Brent Wood



________________________________
From: Sohum Banerjea <[email protected]>
Sent: Wednesday, 17 June 2026 3:21 pm
To: [email protected] <[email protected]>
Cc: Tim McEwan <[email protected]>; Waseem Girach 
<[email protected]>; [email protected] 
<[email protected]>
Subject: Suitability of postgres for high cardinality high volume usecase?

Hello,

I am trying to determine the suitability of Postgres for a significant
climate risk modelling project.

We are batch processing a large (500 million) collection of
geographical points. For each point, we store ~6 dimensions of various
risks (total cardinality of several millions of floats per
geographical point).

We need to perform various ad-hoc aggregations on geographical subsets
of the values associated with these points. These aggregations could
require median/percentiles, so they won't be as simple as mean/sum,
and we expect we may have to write custom aggregations for some cases.

Because we may want to run computations that would use PostGIS
features (certainly polygon containment; potentially others), and
because our existing applications already use Postgres, we have some
degree of preference to do this in Postgres.

I'd like to know if anyone here has successfully built a system to run
this sort of computation at this scale in Postgres. If so, what sort
of schema design did you use? Columnar stores referencing spatially
indexed row stores that contain the spatial references, sharded by
geographical region? What sort of throughput did you achieve?

I'm also interested in any general observations folks may have about
this project. Perhaps we should use Clickhouse (for the main data)
together with Postgres (for the GIS computations)? Perhaps our float
dataset should live outside any kind of oltp/olap database at all?
Something else?

And finally, if you have developed a system like this, are you
available to assist us with building this system on a consulting
basis?

Thanks in advance,
—Sohum



Brent Wood
Principal Technician - GIS and Spatial Data Management
+64-4-386-0529
301 Evans Bay Parade, Greta Point, Hataitai, Wellington, New Zealand
Earth Sciences New Zealand
[Earth Sciences New Zealand]<https://earthsciences.nz>

