I have a patch which is actually not commitfest-ready now, but it
always better to start discussing proof of concept having some patch
instead of just an idea.

Since I'am a DBA rather than C programmer, I will appreciate any
suggestions/critics about the patch and code quality to make things

What is all about.

>From an Oracle DBA's point of view, currently we have a lack of
performance diagnostics tools. Saying that, principally they mean an
Oracle Wait Interface analogue. The Basic idea is to have counters or
sensors all around database kernel to measure what a particular
backend is currently waiting for and how long/how often it waits.
Obviously, implementing such a complex system is not an easy task.
However decomposing the task and implementing some small diagnostics
tools proved to be a good solution: things like pg_stat_bgwriter,
pg_stat_statements or pg_stat_archiver make life significantly easier.
Implementing such histogram for LWLock tracing was my goal.

Why LWLock tracing is important.

Suppose we have a PostgreSQL instance under heavy write workload, but
we do not know any details. We could pull from time to time
pg_stat_lwlock function which would say pid n1 currently in
WALWriteLock and pid n2 in WALInsertLock. That means we should think
about write ahead log tuning. Or pid n1 is in some clog-related
LWLock, which means we need move clog to ramdisk. This is a stupid
example, but it shows how useful LWLock tracing could be for DBAs.
Even better idea is to collect daily LWLock distribution, find most
frequent of them etc.


As far as I know, there are two major problems implementing LWLock
tracing in Postgres: Performance and stability of the server.

The patch https://commitfest.postgresql.org/action/patch_view?id=885
(discussion starts here I hope -
demonstrates performance problems; LWLOCK_STAT,  LOCK_DEBUG and
DTrace-like approach are slow, unsafe for production use and a bit
clumsy for using by DBA.

An Idea.

An idea of this patch is to trace LWLocks with the lowest possible
performance impact. We put integer lwLockID into procarray, then
acquiring the LWLock we put its id to procarray and now we could pull
procarray using a function to see if particular pid holds LWLock. Not
perfect, but if we see sometimes somebody consumes a lot of particular
LWLocks, we could investigate this matter in a more precise way using
another tool. Something like that was implemented in the attached

issuing pgbench  -c 50 -t 1000 -j 50

we have something like that:

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
 2014-10-01 15:11:43.848765+02 |       57 | 4257
(1 row)

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
 2014-10-01 15:11:45.892428+02 |       67 | 4269
 2014-10-01 15:11:45.892428+02 |       67 | 4258
 2014-10-01 15:11:45.892428+02 |       57 | 4270
 2014-10-01 15:11:45.892428+02 |       67 | 4245
 2014-10-01 15:11:45.892428+02 |       67 | 4271
 2014-10-01 15:11:45.892428+02 |       57 | 4256
 2014-10-01 15:11:45.892428+02 |       54 | 4241
(7 rows)

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
 2014-10-01 15:11:47.211024+02 |       58 | 4262
 2014-10-01 15:11:47.211024+02 |       69 | 4243
 2014-10-01 15:11:47.211024+02 |       69 | 4246
(3 rows)

postgres=# select now(),* from pg_stat_lwlock ;
 now | lwlockid | pid
(0 rows)

postgres=# select now(),* from pg_stat_lwlock ;
              now              | lwlockid | pid
 2014-10-01 15:11:49.897357+02 |       55 | 4240
 2014-10-01 15:11:49.897357+02 |       61 | 4264
 2014-10-01 15:11:49.897357+02 |       55 | 4258
 2014-10-01 15:11:49.897357+02 |       61 | 4260
 2014-10-01 15:11:49.897357+02 |       61 | 4283
 2014-10-01 15:11:49.897357+02 |       62 | 4242

We could collect it to some view or table on a periodic basis.


1. I've decided to put pg_stat_lwlock into extension pg_stat_lwlock
(simply for test purposes). Is it OK, or better to implement it
somewhere inside pg_catalog or in another extension (for example

2. Currently lwLockID is in main procarray (this is for PoC purpose
only). I know why procarray was split into two and I know why the main
one should be kept as small as possible. Anyway, which design approach
better: to keep it inside the main one (this is an important feature,
lwLockID is small and I use proc->lwLockId = T_ID(l) to get the id) or
to put it into another procarray (create new one or use existent)?

3. Which is the best way to retrieve the name of LWLock instead of
only its ID (for usability reasons, WALWriteLock looks more useful
than just 8)? To get it from tranche as T_NAME(lock) at the same time
as lwLockId seems to be not a nice idea, especially to put the name
into procarray to pull it. It spoils whole idea. Any advice?

4. Going through procarray to get pairs pid, lwlockid I do
LWLockAcquire(ProcArrayLock, LW_SHARED); I am not quite sure is a good
idea, because of its performance impact. Probably such a
histogram-style thing does not need strong consistency and
ProcArrayLock is not needed?

So, any thoughts, should I proceed implementing this feature?

Best regards,

PS I am thankful to Simon Riggs and Bruce Momjian for discussing this
idea with me on PGCon, and especially to Simon who gave me a brief but
pretty useful tutorial how LWLock-related code lives in Postgres. Also
I am thankful to Heikki Linnakangas and Magnus Hagander for answering
some of my stupid questions about procarray internals.

Ilya Kosmodemiansky,

tel. +14084142500
cell. +4915144336040

Attachment: pg_stat_lwlock_0.1.patch
Description: Binary data

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to