Re: Improving LWLock wait events

2020-12-22 Thread Craig Ringer
On Wed, 23 Dec 2020 at 15:51, Craig Ringer wrote:

>
> I've struggled with this quite a bit myself.
>
>
By the way, I sent in a patch to enhance the static tracepoints available
for LWLocks. See
https://www.postgresql.org/message-id/cagry4nxjo+-hcc2i5h93ttsz4gzo-fsddcwvkb-qafq1zdx...@mail.gmail.com .

It'd benefit significantly from the sort of changes you mentioned in #4.
For most purposes I've been able to just use the raw LWLock*, but having a
nice neat (tranche, index) value would be ideal.

The trace patch has helped me identify some excessively long LWLock waits
in tools I work on. I'll share another of the systemtap scripts I used with
it soon.


Re: Improving LWLock wait events

2020-12-22 Thread Craig Ringer
On Mon, 21 Dec 2020 at 05:27, Andres Freund wrote:

> Hi,
>
> The current wait events are already pretty useful. But I think we could
> make them more informative without adding real runtime overhead.
>
>
All 1-3 sound pretty sensible to me.

> I also think there's a 4, but I think the tradeoffs are a bit more
> complicated:
>

> 4) For a few types of lwlock just knowing the tranche isn't
> sufficient. E.g. knowing whether it's one buffer mapping lock or several
> different ones that are being waited on is important for judging
> contention.
>

I've struggled with this quite a bit myself.

In particular, for tools that validate acquire-ordering safety, it's
desirable to be able to identify a specific lock in a backend-independent
way.

> The hardest part would be to know how to identify individual locks. The
> easiest would probably be to just mask in part of the lwlock address
> (e.g. shift it right by INTALIGN, and then mask the result into the
> eventId). That seems a bit unsatisfying.
>

It also won't work reliably for locks in DSM segments, since the same lock
can be mapped at a different address in different backends.

> We could probably do a bit better: We could just store the information
> about tranche / offset within tranche at LWLockInitialize() time, instead
> of computing something just before waiting. While LWLock.tranche is only
> 16 bits right now, the following two bytes are currently padding...
>
> That'd allow us to have proper numerical identification for nearly all
> tranches, without needing to go back to the complexity of having tranches
> specify base & stride.
>

That sounds appealing. It'd work for any lock in MainLWLockArray - all the
built-in individual LWLocks, LWTRANCHE_BUFFER_MAPPING,
LWTRANCHE_LOCK_MANAGER, LWTRANCHE_PREDICATE_LOCK_MANAGER, and any lock
allocated by RequestNamedLWLockTranche().

Some of the other tranches allocate their locks in contiguous fixed blocks,
or in ways that would let them maintain a counter, so they could assign
indexes too.

We'd need some kind of "unknown" placeholder value for LWLocks where that
doesn't make sense, though, such as most locks allocated by callers that
make their own LWLockNewTrancheId() call, and locks in some of the built-in
tranches that aren't allocated in MainLWLockArray.

So I suggest retaining the current LWLockInitialize() and making it a
wrapper around an LWLockInitializeWithIndex() or similar. Either use 1-based
indexes and reserve 0 as "unknown", or use 0-based indexes and reserve the
maximum value as "unknown".
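
Roughly something like this (sketch only - the function name, the new field,
and the 0-as-unknown convention are all placeholders, not existing API):

/* Sketch: keep LWLockInitialize() as a thin wrapper and reserve index 0 as
 * "unknown". Assumes a new uint16 "index" field in the padding bytes that
 * follow LWLock.tranche. */
#define LWLOCK_INDEX_UNKNOWN	0

void
LWLockInitializeWithIndex(LWLock *lock, int tranche_id, uint16 index)
{
	pg_atomic_init_u32(&lock->state, LW_FLAG_RELEASE_OK);
	lock->tranche = tranche_id;
	lock->index = index;		/* hypothetical new field */
	proclist_init(&lock->waiters);
}

void
LWLockInitialize(LWLock *lock, int tranche_id)
{
	/* existing callers keep working and simply report "unknown" */
	LWLockInitializeWithIndex(lock, tranche_id, LWLOCK_INDEX_UNKNOWN);
}

Callers that can compute an index, like the loops in InitializeLWLocks(),
would pass it; everything else keeps calling LWLockInitialize() unchanged.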


Improving LWLock wait events

2020-12-20 Thread Andres Freund
Hi,

The current wait events are already pretty useful. But I think we could
make them more informative without adding real runtime overhead.


1) For lwlocks I think it'd be quite useful to show the mode of acquisition in
pg_stat_activity.wait_event_type, instead of just saying 'LWLock'.

I think we should split PG_WAIT_LWLOCK into
PG_WAIT_LWLOCK_{EXCLUSIVE,SHARED,WAIT_UNTIL_FREE}, and report a different
wait_event_type based on the acquisition mode.
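
Roughly, e.g. (sketch only - the new class names and values are just for
illustration, and this assumes passing the mode down to
LWLockReportWaitStart()):

#define PG_WAIT_LWLOCK_EXCLUSIVE		PG_WAIT_LWLOCK
#define PG_WAIT_LWLOCK_SHARED			0x0B000000U
#define PG_WAIT_LWLOCK_WAIT_UNTIL_FREE	0x0C000000U

static inline void
LWLockReportWaitStart(LWLock *lock, LWLockMode mode)
{
	uint32		wait_class;

	switch (mode)
	{
		case LW_SHARED:
			wait_class = PG_WAIT_LWLOCK_SHARED;
			break;
		case LW_WAIT_UNTIL_FREE:
			wait_class = PG_WAIT_LWLOCK_WAIT_UNTIL_FREE;
			break;
		default:
			wait_class = PG_WAIT_LWLOCK_EXCLUSIVE;
			break;
	}
	pgstat_report_wait_start(wait_class | (uint32) lock->tranche);
}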

The fact that it'd break people explicitly looking for LWLock in
pg_stat_activity doesn't seem to outweigh the benefits to me.


2) I think it's unhelpful that waits for WAL insertion locks to progress show
up as LWLock acquisitions. LWLockWaitForVar() feels like a distinct enough
operation that passing in a user-specified wait event is worth the minuscule
incremental overhead that'd add.

I'd probably just make it a different wait class, and have xlog.c compute that
based on the number of the slot being waited for.
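
I.e. roughly (sketch only - the extra parameter and the PG_WAIT_WAL_INSERT
class are made up for illustration):

/* let callers pass a wait event into LWLockWaitForVar() */
extern bool LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval,
							 uint64 *newval, uint32 wait_event_info);

/* and in xlog.c's WaitXLogInsertionsToFinish(), something like: */
LWLockWaitForVar(&WALInsertLocks[i].l.lock,
				 &WALInsertLocks[i].l.insertingAt,
				 insertingat, &insertingat,
				 PG_WAIT_WAL_INSERT | (uint32) i);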


3) I have observed that waking up other processes as part of a lock release
can be a significant performance factor. I would like to add a separate wait
event type for that. That'd be a near-trivial extension to 1).
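
E.g. something like this around the wakeup path in LWLockRelease() (the
class name is just a placeholder):

/* account the time spent waking waiters separately */
pgstat_report_wait_start(PG_WAIT_LWLOCK_RELEASE_WAKEUP | (uint32) lock->tranche);
LWLockWakeup(lock);
pgstat_report_wait_end();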


I also think there's a 4, but I think the tradeoffs are a bit more
complicated:

4) For a few types of lwlock just knowing the tranche isn't
sufficient. E.g. knowing whether it's one buffer mapping lock or several
different ones that are being waited on is important for judging contention.

For wait events right now we use 1 byte for the class, 1 byte is unused, and
2 bytes are used for event-specific information (the tranche in the case of
lwlocks).
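
I.e. lwlock.c currently reports roughly:

/* current encoding: class in the top byte, tranche in the low 16 bits */
pgstat_report_wait_start(PG_WAIT_LWLOCK | (uint32) lock->tranche);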

Seems like we could change the split to be a 4-bit class and leave 28 bits to
the specific wait event type? And in the lwlock case we could make something
like a 4-bit class, 10-bit tranche, 18-bit sub-tranche?

18 bits aren't enough to uniquely identify a lock for the larger tranches
(mostly buffer locks, I think), but I think it'd still be enough to
disambiguate.
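
For example (sketch only, all names made up):

/* 4-bit class, 10-bit tranche, 18-bit sub-tranche in the 32-bit wait info */
#define WAIT_EVENT_CLASS_BITS	4
#define WAIT_EVENT_TRANCHE_BITS	10
#define WAIT_EVENT_SUB_BITS		18

#define LWLOCK_WAIT_INFO(wclass, tranche, sub) \
	(((uint32) (wclass) << (WAIT_EVENT_TRANCHE_BITS + WAIT_EVENT_SUB_BITS)) | \
	 (((uint32) (tranche) & ((1U << WAIT_EVENT_TRANCHE_BITS) - 1)) \
	  << WAIT_EVENT_SUB_BITS) | \
	 ((uint32) (sub) & ((1U << WAIT_EVENT_SUB_BITS) - 1)))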

The hardest part would be to know how to identify individual locks. The
easiest would probably be to just mask in part of the lwlock address
(e.g. shift it right by INTALIGN, and then mask the result into the
eventId). That seems a bit unsatisfying.
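
I.e. something like (sketch only, reusing the made-up LWLOCK_WAIT_INFO() from
above; WAIT_CLASS_LWLOCK is likewise made up):

/* derive the sub-tranche bits from the lock's address */
uint32		sub = ((uint32) ((uintptr_t) lock >> 2)) & ((1U << 18) - 1);

pgstat_report_wait_start(LWLOCK_WAIT_INFO(WAIT_CLASS_LWLOCK,
										  lock->tranche, sub));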

We could probably do a bit better: We could just store the information about
tranche / offset within tranche at LWLockInitialize() time, instead of
computing something just before waiting. While LWLock.tranche is only 16 bits
right now, the following two bytes are currently padding...

That'd allow us to have proper numerical identification for nearly all
tranches, without needing to go back to the complexity of having tranches
specify base & stride.

Even more API churn around lwlock initialization isn't desirable :(, but we
could just add an LWLockInitializeIdentified() or such.
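
Roughly (sketch only; the field and function names are just for
illustration):

/* use the padding after ->tranche for an offset within the tranche */
typedef struct LWLock
{
	uint16		tranche;		/* tranche ID */
	uint16		offset;			/* new: index within tranche, 0 = unknown */
	pg_atomic_uint32 state;		/* state of exclusive/nonexclusive lockers */
	proclist_head waiters;		/* list of waiting PGPROCs */
} LWLock;

extern void LWLockInitializeIdentified(LWLock *lock, int tranche_id,
									   uint16 offset);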


Greetings,

Andres Freund