Hello Guys,

As you mentioned Oracle-like active session history sampling in this
thread, I'd like to let you know that I am working on a new extension
that provides this feature.

You can find the extension here: https://github.com/pgsentinel/pgsentinel

Basically, you can see it as sampling of pg_stat_activity (one-second
interval by default), currently providing this additional information:

ash_time: the sampling time
blockers: the number of blockers
blockerpid: the pid of the blocker (if blockers = 1), or the pid of one
blocker (if blockers > 1)
top_level_query: the top-level statement (in case PL/pgSQL is used)
query: the statement being executed (not normalised as it is in
pg_stat_statements, so you see the actual parameter values)
cmdtype: the statement type
(SELECT, UPDATE, INSERT, DELETE, UTILITY, UNKNOWN, NOTHING)
queryid: the queryid of the statement (the one coming from
pg_stat_statements)

Thanks to the queryid field, you can link session activity with SQL
activity.
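As an illustration of that linking, here is a hypothetical query sketch.
The view name (pg_active_session_history) and column names (ash_time,
queryid) are assumptions based on the field list above, not guaranteed by
the extension; check the README for the actual names. Since samples are
taken at a fixed interval, the sample count per queryid is a rough proxy
for time spent in that statement:

```
-- Hypothetical sketch: view and column names are assumptions.
-- Top statements by sampled activity over the last 5 minutes,
-- joined to pg_stat_statements via queryid.
SELECT s.query,
       count(*) AS samples
FROM   pg_active_session_history ash
JOIN   pg_stat_statements s USING (queryid)
WHERE  ash.ash_time > now() - interval '5 minutes'
GROUP  BY s.query
ORDER  BY samples DESC
LIMIT  10;
```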

It is implemented as an in-memory ring buffer into which samples are
written at a given (configurable) interval. Users can therefore see a
number of recent samples, depending on the (configurable) history size.
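The ring-buffer behaviour described above can be sketched in Python (a toy
model for illustration, not the actual C implementation; the Sample fields
mirror a few of the columns listed earlier): a fixed-size, pre-allocated
buffer where the writer wraps around, so only the most recent
history-size samples survive.

```python
from collections import namedtuple

Sample = namedtuple("Sample", "ash_time pid queryid")

class AshRingBuffer:
    """Toy model of a fixed-size ring buffer of activity samples."""

    def __init__(self, size):
        self.size = size          # configurable history size
        self.buf = [None] * size  # pre-allocated slots
        self.next = 0             # next write position
        self.count = 0            # total samples ever written

    def write(self, sample):
        # Overwrite the oldest slot once the buffer is full.
        self.buf[self.next] = sample
        self.next = (self.next + 1) % self.size
        self.count += 1

    def recent(self):
        # Return the surviving samples, oldest to newest.
        if self.count < self.size:
            return self.buf[:self.count]
        return self.buf[self.next:] + self.buf[:self.next]

# One write per sampling tick; with size=3, only the last 3 samples remain.
rb = AshRingBuffer(3)
for t in range(5):
    rb.write(Sample(ash_time=t, pid=42, queryid=7))
print([s.ash_time for s in rb.recent()])  # -> [2, 3, 4]
```

The point of the design is that memory use is bounded by the history size
regardless of how long the server runs; the cost is that older samples are
silently overwritten.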

Current caveats: at a high query rate per pid (I observed this at more
than 1000 queries per second), you could see top_level_query and query
that are not "correlated" (query, queryid and cmdtype are still correctly
linked together). This is because those two pieces of information are
currently collected independently.

You are welcome to have a look and share your thoughts.

Bertrand

On 26 July 2018 at 03:24, Michael Paquier <mich...@paquier.xyz> wrote:

> On Tue, Jul 24, 2018 at 04:23:03PM +0000, Phil Florent wrote:
> > It loses non meaningful details and it's in fact a good point. In this
> > example, sampling will definitely find the cause and won't cost
> > resources.
>
> The higher the sampling frequency, the more details you get, and the
> more load you put on the instance.  So if you are able to take an infinity of
> samples, where registering multiple times the same event for the same
> backend also matters because its overall weight gets higher and it shows
> up higher in profiles, then you would be able to converge to the set of
> results that this patch adds.  The sampling method, especially its
> frequency, is something controlled by the client and not the server.
> Approaches like the one proposed here push the load on the server-side,
> unconditionally, for *all* backends, and this has its cost.
>
> Even if you have spiky workloads, sampling may miss them; but even if
> you added counters for each event, you would need to query the table
> holding the counters at an insane frequency to perhaps get something
> out of it, as you need to sample the counters as well to extract
> deltas.
>
> As Tomas has mentioned up-thread, sampling is light-weight, as is the
> current design for wait events.  Even if it is not perfect because it
> cannot give exact numbers, it would find bottlenecks in most real
> cases, and that's what matters.  If not, increasing the sampling
> frequency makes things easier to detect as well.  What would be the
> point of taking only one sample every checkpoint for example?
>
> There may be a benefit in having counters, I don't know the answer to
> that, though the point would be to make sure that there is a specific
> set of workloads where it makes sense, still my gut feeling is that
> sampling would be able to detect those anyway.
>
> (I am not a computer scientist by training but a physicist, think fluid
> dynamics and turbulence, and I had my load of random events and signal
> analysis as well.  All that is just statistics with attempts to approach
> reality, where sampling is a life-saver over exactitude of
> measurements.)
>
> Adding hooks is not acceptable to me either, those have a cost, and it
> is not clear what's the benefit we can get into putting hooks in such a
> place for cases other than sampling and counters...
> --
> Michael
>
