On Wed Jun 3, 2026 at 1:59 PM UTC, Bertrand Drouvot wrote:
> Hi hackers,
>
> Now that we have global lock statistics since 4019f725f5d, it could be useful
> to have the same kind of information on a per-backend basis.
>
> Indeed, pg_stat_lock gives us cluster-wide aggregates: total waits, total wait
> time, total fast-path exceeded across all backends since last reset.
>
> When we see high numbers, we can't answer:
>
> - Which backend is affected the most?
> - Is it one backend affected or many?
> - Is a specific application or connection pool suffering?
> - After a specific workload/application is improved, did its lock behavior
> improve?
>
> With per-backend lock stats, we could:
>
> 1/ Isolate problematic sessions. We can correlate locks behavior with specific
> PIDs visible in pg_stat_activity: identify the exact application_name or user
> experiencing lock waits.
>
> 2/ Debug live contention. During an incident, we could pinpoint which backends
> are experiencing fast-path exhaustion or lock waits without having to reset
> global stats and lose history.
>
> 3/ Define workload characterization. Different backend types may have very
> different lock profiles. Per-backend stats would let us see this directly.
>
> 4/ Compare before/after per session. We could measure a single backend's lock
> behavior across a specific operation, which is impossible with global counters
> that include metrics from all other backends.
>
> IO and WAL stats already have per-backend counterparts 
> (pg_stat_get_backend_io(),
> pg_stat_get_backend_wal()). Lock stats are the same class of operational data:
> having them only at the global level is an inconsistency that limits 
> observability.

The motivation makes sense to me.

> As far the technical implementation:
>
> This data can be retrieved with a new system function called
> pg_stat_get_backend_lock(), that returns one tuple per lock type based on the 
> PID
> provided in input.
>
> pgstat_flush_backend() gains a new flag value, able to control the flush of 
> the
> lock stats.
>
> This patch relies mostly on the infrastructure provided by 9aea73fc61d4, that
> has introduced backend statistics.
>
> The overhead (2 functions calls and counters increments) on the hot path 
> (normal
> lock acquisition) is zero: counters are only incremented on paths that are 
> already
> "slow" (post deadlock timeout waits, fast-path slot exhaustion) and does not 
> add
> that much memory per-backend: PgStat_PendingLock is 288 bytes.
>
> The patch is made of 2 sub-patches:
>
> 0001: Refactor pg_stat_get_lock() to use a helper function

> +static void
> +pg_stat_lock_build_tuples(ReturnSetInfo *rsinfo,
> +                         PgStat_LockEntry *lock_stats,
> +                         TimestampTz stat_reset_timestamp)

I think that the alignment of the second and third arguments could be 
off by one. They should line up with the capital R in ReturnSetInfo.

> -       values[i] = TimestampTzGetDatum(lock_stats->stat_reset_timestamp);
> +       if (stat_reset_timestamp != 0)
> +           values[i] = TimestampTzGetDatum(stat_reset_timestamp);
> +       else
> +           nulls[i] = true;

It's not super clear to me why this changed in the first patch. Perhaps 
it is meant to be in the second patch? I see in the second patch that we 
use the stat_reset_timestamp from the backend stats instead of the lock 
stats in pg_stat_get_backend_lock(). The motivation makes sense. It 
might be cleaner to move the change into patch 2.

> 0002: Add per-backend lock statistics

> +        Returns lock statistics about the backend with the specified
> +        process ID. The output fields are exactly the same as the ones in the
> +        <structname>pg_stat_lock</structname> view.

It probably makes sense to link to pg_stat_lock here.

Other than the few comments I had, this patchset looks good. It follows 
patterns that were already established with the per-backend IO and WAL 
stats.

-- 
Tristan Partin
PostgreSQL Contributors Team
AWS (https://aws.amazon.com)


Reply via email to