Hi,

I have been experimenting with pg_stat_lock and would like to see
*what kind of operation* is causing waits, not just that waits
happened.  This patch adds a "mode" dimension so the view can
distinguish, e.g., waits caused by VACUUM (ShareUpdateExclusiveLock)
from waits caused by DDL (AccessExclusiveLock), without parsing
log_lock_waits output.

The holder mode is already computed and emitted by log_lock_waits
via GetLockHoldersAndWaiters(), so aggregating it into pg_stat_lock
seems like a natural fit at low cost.

## Proposal

Add a mode column to pg_stat_lock:

  locktype | mode                     | waits | wait_time |
fastpath_exceeded

---------+--------------------------+-------+-----------+------------------
  relation | ShareUpdateExclusiveLock |   312 |     89200 |                0
  relation | AccessExclusiveLock      |    47 |    238400 |                0

For waits / wait_time, the mode records the *blocking* mode,
captured under the lock partition LWLock when the requester joins
the wait queue:

  1. Among modes that conflict with the requester and are currently
     held (lock->granted[m] > 0), pick the strongest one.
  2. If no held mode conflicts (queue-priority-only wait), pick the
     strongest mode in lock->waitMask that conflicts.

Rule (1) takes precedence so the recorded blocker reflects an actual
holder when one exists, falling back to a queued waiter only for
pure queue-priority waits.

For fastpath_exceeded, the mode records the *requested* mode (slot
exhaustion has no blocker); see open question 3 below.

## Use cases

Three operational questions that are awkward today and trivial
with this column:

1. "Is VACUUM impacting user workload?"
   -> sum(wait_time) where mode = 'ShareUpdateExclusiveLock'.

2. "Did the last DDL migration cause user-visible waits?"
   -> diff of waits/wait_time on mode = 'AccessExclusiveLock'
      across the deployment window.

3. "Are SELECTs being blocked indirectly via queue priority?"
   -> AccessShare requesters appearing with a stronger blocker mode
      indicate they were queued behind a stronger waiter.

## Cost

  - Shared memory: +~2.3 kB per cluster (one PgStatShared_Lock
    expanded from [locktype] to [locktype][mode]).
  - No new instrumentation in the fast path.  The blocker-mode
    snapshot loop runs only when the request would otherwise wait.

## Alternatives considered

  A) requester_mode only.  Same storage, simpler implementation,
     but the information is largely inferable from locktype plus
     context.  No operational question seemed to need it.

  B) Both requester_mode and blocker_mode.  Most informative and
     could be added, but the view output becomes much larger.
     Users wanting that detail probably reach for pg_wait_sampling
     or log_lock_waits already.

This patch proposes "blocker mode only" as a middle ground.

## Open questions

1. Multi-blocker representation.
   When several conflicting modes are held simultaneously, the
   proposal picks the strongest -- the one whose release the
   waiter needs.  Other conventions worth considering?

2. Per-row attribution for chained waits.
   Each wait event produces one row increment based on the
   proximate blocker.  When waits are chained, multiple rows are
   incremented for what is effectively the same contention:

     - TX1 holds AccessShareLock (long SELECT).
     - TX2 requests AccessExclusiveLock, queues behind TX1.
     - TX3 requests AccessShareLock, queues behind TX2.

   Result:
     locktype | mode                | waits | wait_time
     relation | AccessShareLock     | 1     | ~90s   (TX2)
     relation | AccessExclusiveLock | 1     | ~89s   (TX3)

   Both rows are individually accurate snapshots of "what each
   waiter was queued behind", but a naive read of the second row
   misses that TX1's long SELECT is the underlying cause.  Note
   that sum(wait_time) being per-waiter (not wall-clock contention
   time) is already true of the current pg_stat_lock; this proposal
   just makes it more visible.

   Is documenting this contract sufficient, or should the patch
   fold chained waits into a single attribution?  I lean toward
   documenting.

3. Mode column semantics across counters.
   The mode column has two interpretations depending on which
   counter is non-zero:

     - waits / wait_time: the *blocker* mode.
     - fastpath_exceeded: the *requested* mode (slot exhaustion
       has no blocker).

   Both are useful for diagnosis (the latter shows which workload
   types exhaust per-backend slots).  Carrying two semantics on
   one column is admittedly awkward.  Alternatives:

     a) Document the dual semantic, keep one column.
     b) Move fastpath_exceeded to a separate view.
     c) NULL out mode for fastpath_exceeded rows, losing per-mode
        breakdown.

   I lean toward (a); the column name "mode" (rather than
   "blocker_mode") was chosen with this in mind.

Draft patch v1 attached. Documentation updates are intentionally omitted
from v1.

Regards,
Tatsuya Kawata

Attachment: v1-0001-pg_stat_lock-add-blocker-mode-dimension.patch
Description: Binary data

Reply via email to