Hi Michael,

Thank you for your feedback!

On Thu, Mar 26, 2026 at 7:59 PM Michael Paquier <[email protected]> wrote:

> On Thu, Mar 26, 2026 at 05:29:25PM -0700, SATYANARAYANA NARLAPURAM wrote:
> > Postgre already has pg_stat_wal for aggregate WAL volume (bytes,
> full-page
> > images, buffers), and pg_walinspect (superuser access required) for
> > post-hoc forensic analysis of individual WAL segments. But I don't see a
> > lightweight, observability tool that answers in real time which record
> > types are responsible for the WAL. Additionally, pg_walinspect runs
> against
> > on-disk WAL files, which is expensive. This view will be useful for
> > monitoring systems to poll cheaply.
> >
> > *Use cases:*
> > WAL volume investigation: see which record types dominate WAL generation
> in
> > real time without touching disk.
> > Monitoring integration: Prometheus/Grafana can poll the view to track WAL
> > composition over time and alert on anomalies.
> > Replication tuning: identify whether WAL volume is dominated by data
> > changes, index maintenance, FPIs, or vacuum activity to guide tuning.
> > Extension debugging: custom WAL resource managers get visibility
> > automatically.
>
> Why is it useful to have access to this information in live for
> monitoring purposes, divided by RMGR?


A per-RMGR breakdown is essentially a real-time X-ray of what the
database is actually doing. Existing views help us understand what
queries were submitted; this view would show which physical operations
resulted from them.
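To make that concrete, a monitoring query against the proposed view
might look like the sketch below. The view name and columns are
illustrative only, not part of any committed interface:

```sql
-- Hypothetical view: per-resource-manager WAL record counts since the
-- last stats reset.  pg_stat_wal_records, rmgr_name and record_count
-- are illustrative names, not an existing interface.
SELECT rmgr_name,
       record_count,
       round(100.0 * record_count / sum(record_count) OVER (), 1)
           AS pct_of_records
FROM pg_stat_wal_records
ORDER BY record_count DESC;
```

A monitoring agent could poll this once a minute and graph the
percentage column to watch the WAL composition shift over time.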



> What do you define as an
> anomaly in this context and what can be changed on the server side to
> get out of an anomaly, based on the definition you would give to it?
>

A few examples: the HOT update ratio dropping, a surge in B-tree page
splits, an application adding logical decoding messages, GIN generating
more WAL than the corresponding heap modifications, high lock record
counts, etc.
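For instance, the HOT update ratio can already be derived per table
from pg_stat_user_tables; a per-RMGR WAL view would let the same drop
be spotted at the WAL level without polling every table. A minimal
sketch of the existing per-table check:

```sql
-- HOT update ratio per table; a sustained drop suggests updates are
-- touching indexed columns or that pages lack free space (fillfactor).
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1)
           AS hot_ratio_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC;
```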


>
> The current WAL and IO stats are directly useful because they provide
> numbers about flush, read and write quantity and timings.  These are
> useful metrics for benchmarking.
>
> This proposal only informs about the number of records, and we have a
> various number of record types that have a variable length, that can
> influence the distribution of the data written on disk.
>

Yeah, that is a downside; I didn't add sizes to keep the overhead low.

> As a whole, I am doubtful that this information is worth counting in
> live, eating a stats kind ID.  One could also implement a background
> worker that provides more advanced aggregate stats outside the WAL
> insert path, with a custom stats kind.  No need to have that into
> core, especially if the code that increments the stats eats more and
> more cycles.


Your argument makes sense to me. I was a bit hesitant to write a
background worker because of the potential side effects of tailing the
WAL. Let me try a different approach: on-demand sniffing of the WAL,
which can be implemented as an extension without changes to core. Do
you have thoughts on whether this should be a new contrib module, or
should the existing pg_walinspect be modified to accommodate these
requirements?
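For reference, pg_walinspect's pg_get_wal_stats() already provides a
per-record-type aggregation over an LSN range, which the on-demand
approach could build on. A sketch against a stock installation (the
starting LSN below is a placeholder):

```sql
CREATE EXTENSION IF NOT EXISTS pg_walinspect;

-- Aggregate WAL statistics between two LSNs, broken down per record
-- type (per_record => true).  '0/1000000' is a placeholder start LSN;
-- in practice it would come from a prior pg_current_wal_lsn() sample.
SELECT "resource_manager/record_type" AS record_type,
       count,
       combined_size
FROM pg_get_wal_stats('0/1000000', pg_current_wal_lsn(),
                      per_record => true)
ORDER BY count DESC;
```

The main cost this proposal tried to avoid remains: this reads the
on-disk WAL, so an on-demand tool would want to bound the LSN range it
scans per call.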

Thanks,
Satya
