On 10/07/2014 05:04 PM, Robert Haas wrote:
On Tue, Oct 7, 2014 at 8:03 AM, Bruce Momjian <br...@momjian.us> wrote:
On Fri, Oct  3, 2014 at 06:06:24PM -0400, Bruce Momjian wrote:
I actually don't think that's true. Every lock acquisition implies a
number of atomic operations. Those are expensive. And if you see individual
locks acquired a high number of times in multiple processes, that's
something important. It causes significant bus traffic between sockets,
while not necessarily being visible in the lock-held times.

True, but I don't think users are going to get much value from those
numbers, and they are hard to get.  Server developers might want to know
lock counts, but in those cases performance might not be as important.

In summary, I think there are three measurements we can take on locks:

1.  lock wait, from request to acquisition
2.  lock duration, from acquisition to release
3.  lock count

I think #1 is the most useful, and can be tracked by scanning a single
PGPROC lock entry per session (as already outlined), because you can't
wait on more than one lock at a time.
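
To make that concrete, the single per-session entry might look something
like the sketch below; every name in it is hypothetical, not the actual
PGPROC layout:

#include <stdint.h>

/*
 * Hypothetical per-session wait entry for measurement #1.  Because a
 * backend can wait on at most one lock at a time, one entry per session
 * is enough; another process can scan these entries and compute
 * "now - wait_start_usec" for every session whose wait_lock_id is set.
 */
typedef struct SessionWaitEntry
{
    uint32_t    wait_lock_id;       /* lock being waited on; 0 = not waiting */
    uint64_t    wait_start_usec;    /* when the lock request began waiting */
} SessionWaitEntry;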

#2 would probably require multiple PGPROC lock entries, though I am
unclear how often a session holds multiple light-weight locks
concurrently.  #3 might require global counters in memory.

#1 seems the most useful from a user perspective, and we can perhaps
experiment with #2 and #3 once that is done.

I agree with some of your thoughts on this, Bruce, but there are some
points I'm not so sure about.

I have a feeling that any system that involves repeatedly scanning the
procarray will either have a painful performance impact (if the scans
are frequent) or catch only a statistically insignificant fraction of lock
acquisitions (if they are infrequent).  The reason I think there may be a
performance impact is that quite a number of heavily-trafficked
shared memory structures are bottlenecked on memory latency, so it's
easy to imagine that having an additional process periodically reading
them would increase cache-line bouncing and hurt performance.  We will
probably need some experimentation to find the best approach.

I think the easiest way to measure lwlock contention would be to put
some counters in the lwlock itself.  My guess, based on a lot of
fiddling with LWLOCK_STATS over the years, is that there's no way to
count lock acquisitions and releases without harming performance
significantly - no matter where we put the counters, it's just going
to be too expensive.  However, I believe that incrementing a counter -
even in the lwlock itself - might not be too expensive if we only do
it when (1) a process goes to sleep or (2) spindelays occur.  Those
operations are expensive enough that I think the cost of an extra
shared memory access won't be too significant.
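
A rough sketch of that idea, using made-up names (this is not the real
LWLock struct or acquire path):

#include <stdint.h>

typedef struct SketchLWLock
{
    /* ... the usual lock state: internal mutex, holders, wait queue ... */
    uint64_t    sleep_count;        /* times a backend slept on this lock */
    uint64_t    spindelay_count;    /* times its spinlock hit a spin-delay */
} SketchLWLock;

static void
sketch_lwlock_block(SketchLWLock *lock)
{
    /*
     * We are about to enqueue ourselves and sleep.  The context switch
     * costs far more than one extra write to a cache line we already
     * own, so counting here should be close to free.
     */
    lock->sleep_count++;

    /* ... add self to the wait queue, release the mutex, sleep ... */
}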

FWIW, I liked Ilya's design. Before going to sleep, store the lock ID in shared memory. When you wake up, clear it. That should be cheap enough to have it always enabled. And it can easily be extended to other "waits", e.g. when you're waiting for input from the client.
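
A rough sketch of that scheme, with assumed names throughout (wait_slots,
my_backend_id, and publish_wait are placeholders, not existing APIs):

#include <stdint.h>

/* one published wait per backend, in shared memory; 0 = not waiting */
typedef struct BackendWait
{
    volatile uint32_t   wait_id;
} BackendWait;

extern BackendWait *wait_slots;     /* assumed shared-memory array */
extern int          my_backend_id;  /* assumed index of this backend */

static inline void
publish_wait(uint32_t wait_id)
{
    /* a single aligned store: cheap enough to leave always enabled */
    wait_slots[my_backend_id].wait_id = wait_id;
}

A backend would call publish_wait(lock_id) just before blocking and
publish_wait(0) after waking, and the same slot could carry IDs for other
wait types, such as waiting for client input.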

I don't think counting the number of lock acquisitions is that interesting. It doesn't give you any information on how long the waits were, for example. I think the question the user or DBA is trying to answer is "Why is this query taking so long, even though the CPU is sitting idle?". A sampling approach works well for that.

For comparison, the "perf" tool works great for figuring out where the CPU time is spent in a process. It works by sampling. This is similar, but for wall-clock time, and hopefully we can produce more user-friendly output.
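
A sampling collector over per-backend wait slots could then be as simple
as the sketch below (again, every name is assumed for illustration):

#include <stdint.h>

#define MAX_WAIT_IDS 1024           /* assumed size of the wait-ID space */

/* same per-backend slot as sketched above */
typedef struct BackendWait { volatile uint32_t wait_id; } BackendWait;

extern BackendWait *wait_slots;     /* assumed shared-memory array */
extern int          n_backends;     /* assumed number of backends */

static uint64_t sample_counts[MAX_WAIT_IDS];

/* Called at a fixed interval; longer waits accumulate more samples. */
static void
sampler_tick(void)
{
    for (int i = 0; i < n_backends; i++)
    {
        uint32_t id = wait_slots[i].wait_id;

        if (id != 0 && id < MAX_WAIT_IDS)
            sample_counts[id]++;    /* one wall-clock sample on this wait */
    }
}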

- Heikki