Hello, As you mentioned Oracle-like active session history sampling in this thread, I just want to let you know that I am working on a new extension to provide this feature.
You can find the extension here: https://github.com/pgsentinel/pgsentinel

Basically, you can see it as sampling of pg_stat_activity (one-second interval by default), currently providing this additional information:

ash_time: the sampling time
blockers: the number of blockers
blockerpid: the pid of the blocker (if blockers = 1), or the pid of one blocker (if blockers > 1)
top_level_query: the top-level statement (in case PL/pgSQL is used)
query: the statement being executed (not normalised as it is in pg_stat_statements, meaning you see the values)
cmdtype: the statement type (SELECT, UPDATE, INSERT, DELETE, UTILITY, UNKNOWN, NOTHING)
queryid: the queryid of the statement (the one coming from pg_stat_statements)

Thanks to the queryid field you are able to link session activity with SQL activity.

It is implemented as an in-memory ring buffer into which samples are written at a given (configurable) period, so the user can see some number of recent samples depending on the (configurable) history size.

Current caveat: in case of a high query rate per pid (I saw it at more than 1000 queries per second), you may see top_level_query and query not "correlated" (query, queryid and cmdtype are still correctly linked together). This is because these two pieces of information are currently collected independently.

If you want to have a look and give your thoughts, you are welcome.

Bertrand

On 26 July 2018 at 03:24, Michael Paquier <mich...@paquier.xyz> wrote:
> On Tue, Jul 24, 2018 at 04:23:03PM +0000, Phil Florent wrote:
> > It loses non-meaningful details, and that's in fact a good point. In this
> > example, sampling will definitely find the cause and won't cost
> > resources.
>
> The higher the sampling frequency, the more details you get, and the
> more load on the instance.
> So if you were able to take an infinite number of samples, where
> registering the same event multiple times for the same backend also
> matters because its overall weight gets higher and it shows up higher
> in profiles, then you would be able to converge to the set of results
> that this patch adds. The sampling method, especially its frequency,
> is something controlled by the client and not the server. Approaches
> like the one proposed here push the load onto the server side,
> unconditionally, for *all* backends, and this has its cost.
>
> Even if you have spiky workloads, sampling may miss those; but even
> after adding counters for each event, you would need to query the table
> holding the counters at an insane frequency to perhaps get something
> out of it, as you need to sample the counters as well to extract
> deltas.
>
> As Tomas has mentioned up-thread, sampling is lightweight, as is the
> current design for wait events. Even if it is not perfect because it
> cannot give exact numbers, it would find bottlenecks in most cases,
> and that's what matters. If not, increasing the sampling frequency
> makes things easier to detect as well. What would be the point of
> taking only one sample every checkpoint, for example?
>
> There may be a benefit in having counters; I don't know the answer to
> that, though the point would be to make sure that there is a specific
> set of workloads where it makes sense. Still, my gut feeling is that
> sampling would be able to detect those anyway.
>
> (I am not a computer scientist by training but a physicist; think fluid
> dynamics and turbulence, and I had my load of random events and signal
> analysis as well. All that is just statistics with attempts to approach
> reality, where sampling is a life-saver over exactitude of
> measurements.)
>
> Adding hooks is not acceptable to me either; those have a cost, and it
> is not clear what benefit we would get from putting hooks in such a
> place for cases other than sampling and counters...
> --
> Michael
>
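To illustrate the two ideas in this thread, here is a minimal sketch, not pgsentinel's actual C implementation: simulated backends (with a made-up wait-event profile chosen purely for illustration) are sampled at a fixed period into a fixed-size ring buffer, and a profile built from the retained samples converges toward the true time distribution, which is Michael's point about sampling.

```python
import random
from collections import Counter, deque

# Hypothetical workload: each backend spends a known fraction of its
# time in each wait event; sampling should approximately recover this.
TRUE_PROFILE = {"CPU": 0.6, "LWLock": 0.3, "IO": 0.1}

# Models a configurable history size: the deque's maxlen makes it a
# ring buffer where the oldest samples are overwritten automatically.
HISTORY_SIZE = 10_000
ring = deque(maxlen=HISTORY_SIZE)

events = list(TRUE_PROFILE)
weights = list(TRUE_PROFILE.values())

random.seed(42)
for tick in range(50_000):        # one iteration = one sampling period
    for backend_pid in range(5):  # five simulated backends
        # Record (sample time, pid, current wait event), loosely
        # mirroring one row of a pg_stat_activity snapshot.
        ring.append((tick, backend_pid, random.choices(events, weights)[0]))

# Build a wait-event profile from whatever the ring still holds.
counts = Counter(event for _, _, event in ring)
total = sum(counts.values())
for event, n in counts.most_common():
    print(f"{event:8s} {n / total:.3f}")
```

With 10,000 retained samples the printed fractions land close to 0.6 / 0.3 / 0.1; shrinking the history or lengthening the sampling period widens the error, which is the frequency-versus-overhead trade-off discussed above.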