Re: pg_stat_io_histogram

Andres Freund Mon, 02 Mar 2026 07:06:45 -0800

Hi,

On 2026-03-02 09:01:05 +0100, Jakub Wartak wrote:
> On Thu, Feb 26, 2026 at 5:13 PM Andres Freund <[email protected]> wrote:
> > > > > but I think having it in PgStat_BktypeIO is not great. This makes
> > > > > PgStat_IO 30k*BACKEND_NUM_TYPES bigger, or ~ 0.5MB. Having a stats 
> > > > > snapshot
> > > > > be half a megabyte bigger for no reason seems too wasteful.
> > > >
> > > > Yea, that's not awesome.
> > >
> > > Guys, question, could You please explain me what are the drawbacks of 
> > > having
> > > this semi-big (internal-only) stat snapshot of 0.5MB? I'm struggling to
> > > understand two things:
> > > a) 0.5MB is not a lot those days (ok my 286 had 1MB in the day ;))
> >
> > I don't really agree with that, I guess. And even if I did, it's one thing 
> > to
> > use 0.5MB when you actually use it, it's quite another when most of that
> > memory is never used.
> >
> >
> > With the patch, *every* backend ends up with a substantially larger
> > pgStatLocal. Before:
> >
> > nm -t d --size-sort -r -S src/backend/postgres|head -n20|less
> > (the second column is the decimal size, third the type of the symbol)
> >
> > 0000000004131808 0000000000297456 r yy_transition
> > ...
> > 0000000003916352 0000000000054744 r UnicodeDecompMain
> > 0000000021004896 0000000000052824 B pgStatLocal
> > 0000000003850592 0000000000040416 r unicode_categories
> > ...
> >
> > after:
> > 0000000023220512 0000000000329304 B pgStatLocal
> > 0000000018531648 0000000000297456 r yy_transition
> > ...
> >
> > And because pgStatLocal is zero initialized data, it'll be 
> > on-demand-allocated
> > in every single backend (whereas e.g. yy_transition is read-only shared).  
> > So
> > you're not talking a single time increase, you're multiplying it by the 
> > numer
> > of active connections
> >
> > Now, it's true that most backend won't ever touch pgStatLocal.  However, 
> > most
> > backends will touch Pending[Backend]IOStats, which also increased noticably:
> >
> > before:
> > 0000000021060960 0000000000002880 b PendingIOStats
> > 0000000021057792 0000000000002880 b PendingBackendStats
> >
> > after:
> > 0000000023568416 0000000000018240 b PendingIOStats
> > 0000000023549888 0000000000018240 b PendingBackendStats
> >
> >
> > Again, I think some increase here doesn't have to be fatal, but increasing
> > with mainly impossible-to-use memory seems just too much waste to mee.
> >
> >
> > This also increases the shared-memory usage of pgstats: Before it used 
> > ~300kB
> > on a small system. That nearly doubles with this patch. But that's perhaps
> > less concerning, given it's per-system, rather than per-backend memory 
> > usage.
> >
> >
> >
> > > b) how does it affect anything, because testing show it's not?
> >
> > Which of your testing would conceivably show the effect?  The concern here
> > isn't really performance, it's that it increases our memory usage, which 
> > you'd
> > only see having an effect if you are tight on memory or have a workload that
> > is cache sensitive.
> >
> 
> Oh ok, now I get understand the problem about pgStatLocal properly,
> thanks for detailed
> explanation! (but I'm somewhat I'm still lost a little in the woods of
> pgstat infra). Anyway, I
> agree that PgStat_IO started to be way too big especially when the
> pg_stat_io(_histogram)
> views wouldn't be really accessed.
> 
> How about the attached v6-0002? It now dynamically allocates PgStat_IO
> memory to avoid
> the memory cost (only allocated if pgstat_io_snapshot_cb() is used).Is
> that the right path? And
> if so, perhaps it should allocate it from mxct
> pgStatLocal.snapshot.context instead?


I think even the per-backend pending IO stats are too big. And for both
pending stats, stored stats and snapshots, I still don't think I am OK with
storing so many histograms that are not possible to use.  I think that needs
to be fixed first.

Greetings,

Andres Freund

Re: pg_stat_io_histogram

Reply via email to