Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

Ilya Kosmodemiansky Thu, 25 Jun 2015 05:42:07 -0700

On Thu, Jun 25, 2015 at 1:49 PM, Amit Kapila <[email protected]> wrote:
>> Personally I think, that tracking waits is a not a good idea for
>> pg_stat_activity (at least in that straight-forward manner).
>
> As mentioned in the initial mail by Robert, that sometimes system becomes
> slow (either due to contention on various kinds of locks or due to I/O or
> due
> to some other such reasons) that such kind of handy information via some
> view is quite useful.  Recently while working on one of the
> performance/scalability
> projects, I need to use gdb to attach to different processes to see what
> they
> are doing (of course one can use perf or some other utilities as well) and I
> found most of them were trying to wait on some LW locks, now having such
> an information available via view could be really useful, because sometimes
> at customer sites, we can't use gdb or perf to see what's going on.


Yes, I understand such a use-case. But I hardly see if suggested
design can help for such cases.

Basically, a DBA has two reasons to take a look on waits:

1. Long response time for particular query (or some type of queries).
In that case it is good to know how much time we spend on waiting for
particular resources we need to get query results
2. Overall bad performance of a database. We know, that something goes
wrong and consumes resources, we need to identify which backend, which
query causes the most of waits.

In both cases we need a) some historical data rather than simple
snapshot b) some approach how to aggregate it  because the will be
certainly a lot of events

So my point is, we need separate interface for waits, instead of
integrating in pg_stat_activity. And it should be several interfaces:
one for approximate top of waiting sessions (like
active_sessions_history in oracle), one for detailed tracing of a
session, one for waits per resource statistics etc.

>>  One
>> process can wait for lots of things between 2 sampling of
>> pg_stat_activity and that sampling can be pretty useless.
>>
>
> Yeah, that's right and I am not sure if we should bother about such
> scenario's
> as the system is generally fine in such situations, however there are other
> cases where we can find most of the backends are waiting on one or other
> thing.

I think approach with top of waiting sessions covers both scenarios
(well, with only one exception: if we have billions of very short
waits and high  contention is the problem)

However, it maybe a good idea, to identify the resource we are waiting
for from pg_stat_activity if we are waiting for a long time.

>
> I think this is some what different kind of utility which can give us
> aggregated information and I think this will address different kind of
> usecase and will have somewhat more complex design and it doesn't
> look impossible to use part of what will be developed as part of this
> proposal.
>

I think it is more than possible to mix both approaches. My proof of
concept now is only about LWLocks - yours and Robert's is more
general, and certainly some wait event classification will be needed
for both approaches and its much better to implement one rather than
two different.

And at least, I will be interesting in reviewing your approach.

>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com



-- 
Ilya Kosmodemiansky,

PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
[email protected]


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

Reply via email to