Re: [Proposal] Add accumulated statistics for wait event
Hi Andres, On Mon, 14 Jun 2021 15:01:14 -0700 Andres Freund wrote: > On 2021-06-14 23:20:47 +0200, Jehan-Guillaume de Rorthais wrote: > > > On 2021-06-14 16:10:32 +0200, Jehan-Guillaume de Rorthais wrote: > > > > In the patch in attachment, I tried to fix this by using kind of an > > > > internal hook for pgstat_report_wait_start and pgstat_report_wait_end. > > > > This allows to "instrument" wait events only when required, on the fly, > > > > dynamically. > > > > > > That's *far worse*. You're adding an indirect function call. Which > > > requires loading a global variable and then a far call to a different > > > function. You're changing a path that's ~2 instructions with minimal > > > dependencies (and no branches (i.e. fully out of order executable) to > > > something on the order of ~15 instructions with plenty dependencies and > > > at least two branches (call, ret). > > > > Oh, I didn't realized it would affect all queries, even when > > log_statement_stats was off. Thank you for your explanation. > > Maybe I just am misunderstanding what you were doing? As far as I can > tell your patch changed pgstat_report_wait_start() to be an indirect > function call - right? Exact. I didn't realized this indirection would be so costy on every single calls, after the variable assignation itself. > Then yes, this adds overhead to everything. I understand now, thank you for the explanation. For my own curiosity and study, I'll remove this indirection and bench my patch anyway. > You *could* add a pgstat_report_wait_(start|end)_with_time() or such and > only use that in places that won't have a high frequency. But I just > don't quite see the use-case for that. Well, it could be useful if we decide to only track a subset of wait event. In my scenario, I originally wanted to only track ClientWrite, but then I realized this might be too specific and tried to generalize. There are probably some other way to deal with this issue. Eg.: * do NOT include the time lost waiting for the frontend side in the query execution time * expose the frontend part of the query time in log_min_duration_stmt, auto_explain, pg_stat_statements, in the same fashion we currently do with planning and execution time * having some wait-even sampling mechanism in core or as easy and hot-loadable than auto_explain Thoughts? Thanks again! Regards,
Re: [Proposal] Add accumulated statistics for wait event
Hi, On 2021-06-14 23:20:47 +0200, Jehan-Guillaume de Rorthais wrote: > > On 2021-06-14 16:10:32 +0200, Jehan-Guillaume de Rorthais wrote: > > > In the patch in attachment, I tried to fix this by using kind of an > > > internal > > > hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows > > > to > > > "instrument" wait events only when required, on the fly, dynamically. > > > > That's *far worse*. You're adding an indirect function call. Which requires > > loading a global variable and then a far call to a different function. > > You're > > changing a path that's ~2 instructions with minimal dependencies (and no > > branches (i.e. fully out of order executable) to something on the order of > > ~15 > > instructions with plenty dependencies and at least two branches (call, ret). > > Oh, I didn't realized it would affect all queries, even when > log_statement_stats > was off. Thank you for your explanation. Maybe I just am misunderstanding what you were doing? As far as I can tell your patch changed pgstat_report_wait_start() to be an indirect function call - right? Then yes, this adds overhead to everything. You *could* add a pgstat_report_wait_(start|end)_with_time() or such and only use that in places that won't have a high frequency. But I just don't quite see the use-case for that. Greetings, Andres Freund
Re: [Proposal] Add accumulated statistics for wait event
Hi, On Mon, 14 Jun 2021 11:27:21 -0700 Andres Freund wrote: > On 2021-06-14 16:10:32 +0200, Jehan-Guillaume de Rorthais wrote: > > In the patch in attachment, I tried to fix this by using kind of an internal > > hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows to > > "instrument" wait events only when required, on the fly, dynamically. > > That's *far worse*. You're adding an indirect function call. Which requires > loading a global variable and then a far call to a different function. You're > changing a path that's ~2 instructions with minimal dependencies (and no > branches (i.e. fully out of order executable) to something on the order of ~15 > instructions with plenty dependencies and at least two branches (call, ret). Oh, I didn't realized it would affect all queries, even when log_statement_stats was off. Thank you for your explanation. > I doubt there's a path towards this feature without adding the necessary > infrastructure to hot-patch the code - which is obviously quite a > substantial project. Right. Sadly, this kind of project is far above what I can do. So I suppose it's a dead end for me. I'll study if/how the sampling approach can be done dynamically. Thank you,
Re: [Proposal] Add accumulated statistics for wait event
Hi, On 2021-06-14 11:27:21 -0700, Andres Freund wrote: > On 2021-06-14 16:10:32 +0200, Jehan-Guillaume de Rorthais wrote: > > In the patch in attachment, I tried to fix this by using kind of an internal > > hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows to > > "instrument" wait events only when required, on the fly, dynamically. > > That's *far worse*. You're adding an indirect function call. Which requires > loading a global variable and then a far call to a different function. You're > changing a path that's ~2 instructions with minimal dependencies (and no > branches (i.e. fully out of order executable) to something on the order of ~15 > instructions with plenty dependencies and at least two branches (call, ret). In the case at hand it might even be worse, because the external function call will require registers to be spilled for the function call. Right now wait events "use" two register (one for the wait event, one for my_wait_event_info), but otherwise don't cause additional spilling. With your change you'd see register spill/reload around both wait start and end. Greetings, Andres Freund
Re: [Proposal] Add accumulated statistics for wait event
Hi, On 2021-06-14 16:10:32 +0200, Jehan-Guillaume de Rorthais wrote: > In the patch in attachment, I tried to fix this by using kind of an internal > hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows to > "instrument" wait events only when required, on the fly, dynamically. That's *far worse*. You're adding an indirect function call. Which requires loading a global variable and then a far call to a different function. You're changing a path that's ~2 instructions with minimal dependencies (and no branches (i.e. fully out of order executable) to something on the order of ~15 instructions with plenty dependencies and at least two branches (call, ret). I doubt there's a path towards this feature without adding the necessary infrastructure to hot-patch the code - which is obviously quite a substantial project. Greetings, Andres Freund
Re: [Proposal] Add accumulated statistics for wait event
Hi Andres, Hi all, First, thank you for your feedback! Please find in attachment a patch implementing accumulated wait event stats only from the backend point of view. As I wrote when I reviewed and rebased the existing patch, I was uncomfortable with the global approach. I still volunteer to work/review the original approach is required. See bellow for comments and some more explanations about what I think might be improvements over the previous patch. On Fri, 11 Jun 2021 12:18:07 -0700 Andres Freund wrote: > On 2021-06-05 00:53:44 +0200, Jehan-Guillaume de Rorthais wrote: > > From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001 > > From: Jehan-Guillaume de Rorthais > > Date: Tue, 1 Jun 2021 13:25:57 +0200 > > Subject: [PATCH 1/2] Add pg_stat_waitaccum view. > > > > pg_stat_waitaccum shows counts and duration of each wait events. > > Each backend/backgrounds counts and measures the time of wait event > > in every pgstat_report_wait_start and pgstat_report_wait_end. They > > store those info into their local variables and send to Statistics > > Collector. We can get those info via Statistics Collector. > > > > For reducing overhead, I implemented statistic hash instead of > > dynamic hash. I also implemented track_wait_timing which > > determines wait event duration is collected or not. > > I object to adding this overhead. The whole selling point for wait > events was that they are low overhead. I since spent time reducing the > overhead further, because even just the branches for checking if > track_activity is enabled are measurable (225a22b19ed). Agree. The previous patch I rebased was to review it and reopen this discussion, I even added a small FIXME in pgstat_report_wait_end and pgstat_report_wait_start about your work: //FIXME: recent patch to speed up this call. In the patch in attachment, I tried to fix this by using kind of an internal hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows to "instrument" wait events only when required, on the fly, dynamically. Moreover, I removed the hash structure for a simple static array for faster access. > > From ddb1adc5cd9acc9bc9de16d0cf057124b09fe1e3 Mon Sep 17 00:00:00 2001 > > From: Jehan-Guillaume de Rorthais > > Date: Fri, 4 Jun 2021 18:14:51 +0200 > > Subject: [PATCH 2/2] [POC] Change measuring method of wait event time from > > INSTR_TIME to rdtsc. > > > > This patch changes measuring method of wait event time from INSTR_TIME > > (which uses gettimeofday or clock_gettime) to rdtsc. This might reduce the > > overhead of measuring overhead. > > > > Any supports like changing clock cycle to actual time or error handling are > > not currently implemented. > > rdtsc is a serializing (*) instruction - that's the expensive part. On linux > clock_gettime() doesn't actually need a syscall. While the vdso call > implies a bit of overhead over a raw rdtsc, it's a relatively small part > of the overhead. See > https://www.postgresql.org/message-id/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de I choose to remove all this rdtsc part from my patch as this wasn't clear how much faster it was compare to simpler vdso functions and how to accurately extract a human time. About my take on $subject, for the sake of simplicity of this PoC, I added instrumentation to log_statement_stats. Despite the query context of the reported log, they are really accumulated stats. The patch updated pg_stat_get_waitaccum() as well to be able to report the accumulated wait events from your interactive or batch session. So using my previous fe-time demo client, you can test it using: PGOPTIONS="--log_statement_stats=on" ./fe-time 100 From logs, I now have (notice the last line): LOG: duration: 3837.194 ms statement: SELECT * FROM pgbench_accounts LOG: QUERY STATISTICS DETAIL: ! system usage stats: ! 0.087444 s user, 0.002106 s system, 3.837202 s elapsed ! [0.087444 s user, 0.003974 s system total] ! 25860 kB max resident size ! 0/0 [0/0] filesystem blocks in/out ! 0/303 [0/697] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent ! 4/18 [5/18] voluntary/involuntary context switches ! Client/ClientWrite 4 calls, 3747102 us elapsed Using pgbench scale factor 10, the copy query for pgbench_accounts looks like: LOG: duration: 2388.081 ms statement: copy pgbench_accounts from stdin LOG: QUERY STATISTICS DETAIL: ! system usage stats: ! 1.373756 s user, 0.252860 s system, 2.388100 s elapsed ! [1.397015 s user, 0.264951 s system total] ! 37788 kB max resident size ! 0/641584 [0/642056] filesystem blocks in/out ! 194/4147 [195/4728] page faults/reclaims, 0 [0] swaps ! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
Re: [Proposal] Add accumulated statistics for wait event
HHi, On 2021-06-05 00:53:44 +0200, Jehan-Guillaume de Rorthais wrote: > From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001 > From: Jehan-Guillaume de Rorthais > Date: Tue, 1 Jun 2021 13:25:57 +0200 > Subject: [PATCH 1/2] Add pg_stat_waitaccum view. > > pg_stat_waitaccum shows counts and duration of each wait events. > Each backend/backgrounds counts and measures the time of wait event > in every pgstat_report_wait_start and pgstat_report_wait_end. They > store those info into their local variables and send to Statistics > Collector. We can get those info via Statistics Collector. > > For reducing overhead, I implemented statistic hash instead of > dynamic hash. I also implemented track_wait_timing which > determines wait event duration is collected or not. I object to adding this overhead. The whole selling point for wait events was that they are low overhead. I since spent time reducing the overhead further, because even just the branches for checking if track_activity is enabled are measurable (225a22b19ed). > From ddb1adc5cd9acc9bc9de16d0cf057124b09fe1e3 Mon Sep 17 00:00:00 2001 > From: Jehan-Guillaume de Rorthais > Date: Fri, 4 Jun 2021 18:14:51 +0200 > Subject: [PATCH 2/2] [POC] Change measuring method of wait event time from > INSTR_TIME to rdtsc. > > This patch changes measuring method of wait event time from INSTR_TIME (which > uses gettimeofday or clock_gettime) to rdtsc. This might reduce the overhead > of measuring overhead. > > Any supports like changing clock cycle to actual time or error handling are > not currently implemented. rdtsc is a serializing (*) instruction - that's the expensive part. On linux clock_gettime() doesn't actually need a syscall. While the vdso call implies a bit of overhead over a raw rdtsc, it's a relatively small part of the overhead. See https://www.postgresql.org/message-id/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de Greetings, Andres Freund (*) it's not fully serializing, iirc it allows later instructions to be started, but it does wait for all earlier in-flight instructions to finish.
Re: [Proposal] Add accumulated statistics for wait event
Hi All, I faced a few times a situation where a long running query is actually including the time the backend is waiting for the frontend to fetch all the rows (see [1] for details). See a sample code fe-time.c and its comments in attachment to reproduce this behavior. There's no simple way today to pinpoint the problem in production without advance interactive auditing, and/or using system tools. After studying the problem, I believe it boil down to track the wait event ClientWrite, so I ended up on this thread. You might catch some mismatching times thanks to auto_explain as well. Using the fe-time.c demo code with the following command: PGDATABASE=postgres PGHOST=::1 time ./fe-time 100 The frontend time is 10s, the query time reported is 3228.631ms, but last row has been produced after 20.672ms: LOG: duration: 3228.631 ms plan: Query Text: SELECT * FROM pgbench_accounts Seq Scan on pgbench_accounts (time=0.005..20.672 rows=10 loops=1) (Note that in contrast with localhost, through the unix socket, the backend reported query time is always really close to 10s). I re-based the existing patch (see in attachment), to look at the ClientWrite for this exact query: # SELECT wait_event, calls, times FROM pg_stat_get_waitaccum(NULL) WHERE wait_event = 'ClientWrite'; wait_event | calls | times -+---+- ClientWrite | 4 | 3132266 The "time" is expressed as µs in the patch, so 3132.266ms of the total 3228.631ms is spent sending the result to the frontend. I'm not sure where are the missing 75ms. The pg_wait_sampling extension might help but it requires a production restart to install, then enable it. Whatever if the solution is sampling or cumulative, an in-core and hot-switchable solution would be much more convenient. But anyway, looking at pg_wait_sampling, we have a clear suspect as well for the later query run: # SELECT event, count FROM pg_wait_sampling_profile WHERE queryid=4045741516911800313; event| count -+--- ClientWrite | 309 The default profil period of pg_wait_sampling being 10ms, we can roughly estimate the ClientWrite around 3090ms. Note that this is close enough because we know 3132266µs has been accumulated among only 4 large wait events. Finishing bellow. On Mon, 3 Aug 2020 00:00:40 +0200 Daniel Gustafsson wrote: > > On 31 Jul 2020, at 07:23, imai.yoshik...@fujitsu.com wrote: > > > >> This patch fails to apply to HEAD, please submit a rebased version. I've > >> marked this as as Waiting on Author. Please, find in attachment a rebase of both patches. I did some small editing on the way. I didn't bench them. I'm not sure this patch is the best approach though. Receive it as a motivation to keep up with this discussion. As I wrote, whatever if the solution is sampling or cumulative, an in-core and hot-switchable solution would be much more convenient. The fact is that this patch was already available and ready to keep up with a discussion. Collecting and summing all wait events from all backends in the same place forbid to track precisely wait events from a specific backends. Especially on a busy system where numbers can quickly be buried by all other activities around. I wonder if wait events should only be accumulated on backend side, making possible to enable/disable them on the fly and to collect some reports eg. in logs or to output. Most of the code from these patch could be recycled in a simpler patch implementing this. Thoughts? > > Sorry for my absence. Unfortunately I couldn't have time to work on this > > patch in this cf. I believe I will be back in next cf, work on this patch > > and also review other patches. > > No worries, it happens. Since the thread has stalled and there is no updated > patch I've marked this entry Returned with Feedback. Please feel free to > re-open a new CF entry if you return to this patch. I volunteer to be a reviewer on this patch. Imai-san, do you agree to add it as new CF entry? Regards, [1]: Last time I had such situation was few weeks ago. A customer was reporting a query being randomly slow, running bellow 100ms most of the time and sometime hitting 28s. Long story short, the number of row was the same (10-15k), but the result set size was 9x bigger (1MB vs 9MB). As the same query was running fine from psql, I suspected the frontend was somehow saturated. Tcpdump helped me to compute that the throughput fall to 256kB/s after the first 2MB of data transfert with a very narrow TCP window. I explained to the customer their app probably doesn't pull the rows fast enough and that some buffers were probably saturated on the frontend side, waiting for the app and slowing down the whole transfert. Devels fixed the problem by moving away two fields transformations (unaccent) from their loop fetching the rows. >From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001 From: Jehan-Guillaume de Rorthais
Re: [Proposal] Add accumulated statistics for wait event
> On 31 Jul 2020, at 07:23, imai.yoshik...@fujitsu.com wrote: > >> This patch fails to apply to HEAD, please submit a rebased version. I've >> marked this as as Waiting on Author. > > Sorry for my absence. Unfortunately I couldn't have time to work on this > patch in this cf. > I believe I will be back in next cf, work on this patch and also review other > patches. No worries, it happens. Since the thread has stalled and there is no updated patch I've marked this entry Returned with Feedback. Please feel free to re-open a new CF entry if you return to this patch. cheers ./daniel
RE: [Proposal] Add accumulated statistics for wait event
> This patch fails to apply to HEAD, please submit a rebased version. I've > marked this as as Waiting on Author. Sorry for my absence. Unfortunately I couldn't have time to work on this patch in this cf. I believe I will be back in next cf, work on this patch and also review other patches. --- Yoshikazu IMAI
Re: [Proposal] Add accumulated statistics for wait event
Hi, This patch fails to apply to HEAD, please submit a rebased version. I've marked this as as Waiting on Author. cheers ./daniel
Re: [Proposal] Add accumulated statistics for wait event
Hi Imai-san, I feel your 'pg_stat_waitaccum' will help us investigate the bottleneck. So I'd like to do some benchmarks but unfortunately, the latest v6 patch couldn't be applied to HEAD anymore. Is it possible to share the latest patches? If not, I'll make v6 applicable to the HEAD. Regards, -- Atsushi Torikoshi On Fri, Feb 28, 2020 at 5:17 PM imai.yoshik...@fujitsu.com < imai.yoshik...@fujitsu.com> wrote: > On Wed, Feb 26, 2020 at 1:39 AM, Kyotaro Horiguchi wrote: > > Hello. I had a brief look on this and have some comments on this. > > Hi, Horiguchi-san. Thank you for looking at this! > > > It uses its own hash implement. Aside from the appropriateness of > > having another implement of existing tool, in the first place hash > > works well for wide, sparse and uncertain set of keys. But they are in > > rather a dense and narrow set with certain and fixed members. It > > registers more than 200 entries but bucket size is 461. It would be > > needed to avoid colliisions, but seems a bit too wasting. > > Yes, wait events are grouped and wait events ID are defined as a sequential > number, starting from specified number for each group(like 0x0100U), > thus > keys are in a dense and narrow set. > > = > #define PG_WAIT_LWLOCK 0x0100U > #define PG_WAIT_LOCK0x0300U > #define PG_WAIT_BUFFER_PIN 0x0400U > #define PG_WAIT_ACTIVITY0x0500U > ... > > typedef enum > { > WAIT_EVENT_ARCHIVER_MAIN = PG_WAIT_ACTIVITY, > WAIT_EVENT_AUTOVACUUM_MAIN, > ... > WAIT_EVENT_WAL_WRITER_MAIN > } WaitEventActivity; > = > > The number 461 is the lowest number which avoids collisions in the hash for > current wait events set. As you pointed out, there are a bit too much > unused > entries. > > > If we prepare arrays for each wait classes with appropriate size which just > fits the number of each wait event class, we can store wait event > statistics > into those arrays with no more wastes. > Also we calculate hash number by "(wait event ID) % (bucket size)" in hash > case, while we calculate index of arrays by "(wait event ID) - (wait event > class id)" > in array case. The latter one's cost is lower than the former one. > > Currently I implement wait event statistics store by hash because of its > easiness of implementation, but I think it is good to implement it by array > for above reasons. One concern is that we put restrictions on wait events > that it must be defined as it is sequenced in its wait class so that there > are no unused entries in the array. > > > > It seems trying to avoid doing needless work when the feature is not > > activated by checking "if (wa_hash==NULL)", but the hash is created > > being filled with possible entries regardless of whether > > track_wait_timing is on or off. > > This might be bad implementation but there are the cases "wa_hash = NULL" > where pgstat_init() is not called like when executing "initdb". I insert > that check for avoiding unexpected crash in those cases. > > I also noticed debug codes exist around that code...I will modify it. > > > As the result > > pgstat_report_waitaccum_end calls pgstat_get_wa_entry tremendously > > frequently. This should be the reason for the recent benchmark result. > > I'm not sure such frequency of hash-searching is acceptable even if > > the feature is turned on. > > track_wait_timing parameter determines whether we collect wait time. > Regardless of that parameter, we always collect wait count every wait event > happens. I think calling pgstat_get_wa_entry frequently is not critical to > performance. From pavel's benchmark results, if track_wait_timing parameter > is off, there are +/-1.0% performance difference between patched and > unpatched > and it is just considered as a measurement error. > > > Thanks > -- > Yoshikazu Imai > > >
RE: [Proposal] Add accumulated statistics for wait event
On Wed, Feb 26, 2020 at 1:39 AM, Kyotaro Horiguchi wrote: > Hello. I had a brief look on this and have some comments on this. Hi, Horiguchi-san. Thank you for looking at this! > It uses its own hash implement. Aside from the appropriateness of > having another implement of existing tool, in the first place hash > works well for wide, sparse and uncertain set of keys. But they are in > rather a dense and narrow set with certain and fixed members. It > registers more than 200 entries but bucket size is 461. It would be > needed to avoid colliisions, but seems a bit too wasting. Yes, wait events are grouped and wait events ID are defined as a sequential number, starting from specified number for each group(like 0x0100U), thus keys are in a dense and narrow set. = #define PG_WAIT_LWLOCK 0x0100U #define PG_WAIT_LOCK0x0300U #define PG_WAIT_BUFFER_PIN 0x0400U #define PG_WAIT_ACTIVITY0x0500U ... typedef enum { WAIT_EVENT_ARCHIVER_MAIN = PG_WAIT_ACTIVITY, WAIT_EVENT_AUTOVACUUM_MAIN, ... WAIT_EVENT_WAL_WRITER_MAIN } WaitEventActivity; = The number 461 is the lowest number which avoids collisions in the hash for current wait events set. As you pointed out, there are a bit too much unused entries. If we prepare arrays for each wait classes with appropriate size which just fits the number of each wait event class, we can store wait event statistics into those arrays with no more wastes. Also we calculate hash number by "(wait event ID) % (bucket size)" in hash case, while we calculate index of arrays by "(wait event ID) - (wait event class id)" in array case. The latter one's cost is lower than the former one. Currently I implement wait event statistics store by hash because of its easiness of implementation, but I think it is good to implement it by array for above reasons. One concern is that we put restrictions on wait events that it must be defined as it is sequenced in its wait class so that there are no unused entries in the array. > It seems trying to avoid doing needless work when the feature is not > activated by checking "if (wa_hash==NULL)", but the hash is created > being filled with possible entries regardless of whether > track_wait_timing is on or off. This might be bad implementation but there are the cases "wa_hash = NULL" where pgstat_init() is not called like when executing "initdb". I insert that check for avoiding unexpected crash in those cases. I also noticed debug codes exist around that code...I will modify it. > As the result > pgstat_report_waitaccum_end calls pgstat_get_wa_entry tremendously > frequently. This should be the reason for the recent benchmark result. > I'm not sure such frequency of hash-searching is acceptable even if > the feature is turned on. track_wait_timing parameter determines whether we collect wait time. Regardless of that parameter, we always collect wait count every wait event happens. I think calling pgstat_get_wa_entry frequently is not critical to performance. From pavel's benchmark results, if track_wait_timing parameter is off, there are +/-1.0% performance difference between patched and unpatched and it is just considered as a measurement error. Thanks -- Yoshikazu Imai
Re: [Proposal] Add accumulated statistics for wait event
Hello. I had a brief look on this and have some comments on this. At Tue, 25 Feb 2020 07:53:26 +, "imai.yoshik...@fujitsu.com" wrote in > Thanks for Wang's mail, I noticed my 0002 patch was wrong from v3. > > Here, I attach correct patches. > > Also I will begin to do some benchmark with higher scale and higher number of > users and try to change stats reporting implementation to not affect > performance, which I couldn't have not started because of another tasks. It uses its own hash implement. Aside from the appropriateness of having another implement of existing tool, in the first place hash works well for wide, sparse and uncertain set of keys. But they are in rather a dense and narrow set with certain and fixed members. It registers more than 200 entries but bucket size is 461. It would be needed to avoid colliisions, but seems a bit too wasting. It seems trying to avoid doing needless work when the feature is not activated by checking "if (wa_hash==NULL)", but the hash is created being filled with possible entries regardless of whether track_wait_timing is on or off. As the result pgstat_report_waitaccum_end calls pgstat_get_wa_entry tremendously frequently. This should be the reason for the recent benchmark result. I'm not sure such frequency of hash-searching is acceptable even if the feature is turned on. I think we need a smarter mapping scheme of events to entries. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [Proposal] Add accumulated statistics for wait event
Thank you for this update! I will try it. Best regards, Victor wang | | 王胜利 | | 邮箱:atti...@126.com | 签名由 网易邮箱大师 定制 On 02/25/2020 15:53, imai.yoshik...@fujitsu.com wrote: On Fri, Feb 14, 2020 at 11:59 AM, 王胜利 wrote: >I am glad to know you are working on PG accumulated statistics > feature, and I am interested on it. >I see these two patch file you made, can you let me know which branch > of PG code based? > >when I use this: https://github.com/postgres/postgres/commits/master, > and apply these patches, report some error. Thanks for Wang's mail, I noticed my 0002 patch was wrong from v3. Here, I attach correct patches. Also I will begin to do some benchmark with higher scale and higher number of users and try to change stats reporting implementation to not affect performance, which I couldn't have not started because of another tasks. -- Yoshikazu Imai
RE: [Proposal] Add accumulated statistics for wait event
On Fri, Feb 14, 2020 at 11:59 AM, 王胜利 wrote: >I am glad to know you are working on PG accumulated statistics > feature, and I am interested on it. >I see these two patch file you made, can you let me know which branch > of PG code based? > >when I use this: https://github.com/postgres/postgres/commits/master, > and apply these patches, report some error. Thanks for Wang's mail, I noticed my 0002 patch was wrong from v3. Here, I attach correct patches. Also I will begin to do some benchmark with higher scale and higher number of users and try to change stats reporting implementation to not affect performance, which I couldn't have not started because of another tasks. -- Yoshikazu Imai 0001-Add-pg_stat_waitaccum-view-v6.patch Description: 0001-Add-pg_stat_waitaccum-view-v6.patch 0002-POC-Change-measuring-method-of-wait-event-time-fr-v6.patch Description: 0002-POC-Change-measuring-method-of-wait-event-time-fr-v6.patch
RE: [Proposal] Add accumulated statistics for wait event
On Wed, Feb 12, 2020 at 5:42 AM, Craig Ringer wrote: > > It seems performance difference is big in case of read only tests. The > > reason is that write time is relatively longer than the > > processing time of the logic I added in the patch. > > That's going to be a pretty difficult performance hit to justify. > > Can we buffer collected wait events locally and spit the buffer to the > stats collector at convenient moments? We can use a limited buffer > size with an overflow flag, so we degrade the results rather than > falling over or forcing excessive stats reporting at inappropriate times. IIUC, currently each backend collects wait events locally. When each backend goes to idle (telling the frontend that it is ready-for-query), it reports wait event statistics to the stats collector. The interval of each report is over than PGSTAT_STAT_INTERVAL(default 500ms). Also when each backend exits, it also does report. So if we do the read only test with 50 clients, each 50 backend reports wait events statistics to the stats collector for almost every 500ms. If that causes performance degradation, we can improve performance by letting backends to report its statistics, for example, only at the backend's exit. (I think I can easily test this by building postgres with setting PGSTAT_STAT_INTERVAL to a large value >> 500ms.) > I suggest that this is also a good opportunity to add some more > tracepoints to PostgreSQL. The wait events facilities are not very > traceable right now. Does that mean we will add TRACE_POSTGRESQL_ to every before/after pgstat_report_wait_start? > That way we have a zero-overhead-when-unused option that can also be > used to aggregate the information per-query, per-user, etc. I see. In that way, we can accomplish no overhead when DTrace is not enabled and what we can measure is more customizable. It is also curious how will overhead be if we implement wait events statistics on DTrace scripts though I can't imagine how it will be because I haven't used DTrace. -- Yoshikazu Imai
Re: [Proposal] Add accumulated statistics for wait event
On Wed, 12 Feb 2020 at 12:36, imai.yoshik...@fujitsu.com wrote: > It seems performance difference is big in case of read only tests. The reason > is that write time is relatively longer than the > processing time of the logic I added in the patch. That's going to be a pretty difficult performance hit to justify. Can we buffer collected wait events locally and spit the buffer to the stats collector at convenient moments? We can use a limited buffer size with an overflow flag, so we degrade the results rather than falling over or forcing excessive stats reporting at inappropriate times. I suggest that this is also a good opportunity to add some more tracepoints to PostgreSQL. The wait events facilities are not very traceable right now. Exposing some better TRACE_POSTGRESQL_ tracepoints (SDTs) via probes.d would help users collect better information using external tools like perf, bpftrace and systemtap. That way we have a zero-overhead-when-unused option that can also be used to aggregate the information per-query, per-user, etc. (I really need to add a bunch more tracepoints to make this easier...) -- Craig Ringer http://www.2ndQuadrant.com/ 2ndQuadrant - PostgreSQL Solutions for the Enterprise
RE: [Proposal] Add accumulated statistics for wait event
On Sat, Feb 1, 2020 at 5:50 AM, Pavel Stehule wrote: > today I run 120 5minutes pgbench tests to measure impact of this patch. > Result is attached. ... > Thanks to Tomas Vondra and 2ndq for hw for testing Thank you for doing a lot of these benchmarks! > The result is interesting - when I run pgbench in R/W mode, then I got +/- 1% > changes in performance. Isn't important if > tracking time is active or not (tested on Linux). In this mode the new code > is not on critical path. It seems performance difference is big in case of read only tests. The reason is that write time is relatively longer than the processing time of the logic I added in the patch. > Looks so for higher scale than 5 and higher number of users 50 the results > are not too much stable (for read only tests - I > repeated tests) and there overhead is about 10% from 60K tps to 55Ktps - > maybe I hit a hw limits (it running with 4CPU) Yes, I suspect some other bottlenecks may be happened and it causes the results unstable. However, it may be better to investigate what is actually happened and why performance is increased/decreased for over 10%. I will inspect it. Also I attach v5 patches which corresponds to other committed patches. -- Yoshikazu Imai 0001-Add-pg_stat_waitaccum-view-v5.patch Description: 0001-Add-pg_stat_waitaccum-view-v5.patch 0002-POC-Change-measuring-method-of-wait-event-time-fr-v5.patch Description: 0002-POC-Change-measuring-method-of-wait-event-time-fr-v5.patch
Re: [Proposal] Add accumulated statistics for wait event
so 1. 2. 2020 v 12:34 odesílatel Tomas Vondra napsal: > This patch was in WoA, but that was wrong I think - we got a patch on > January 15, followed by a benchmark by Pavel Stehule, so I think it > should still be in "needs review". So I've updated it and moved it to > the next CF. > currently this patch needs a rebase Pavel > > regards > > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [Proposal] Add accumulated statistics for wait event
This patch was in WoA, but that was wrong I think - we got a patch on January 15, followed by a benchmark by Pavel Stehule, so I think it should still be in "needs review". So I've updated it and moved it to the next CF. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Proposal] Add accumulated statistics for wait event
Hi st 15. 1. 2020 v 14:15 odesílatel Imai Yoshikazu napsal: > On 2020/01/13 4:11, Pavel Stehule wrote: > > The following review has been posted through the commitfest application: > > make installcheck-world: tested, passed > > Implements feature: tested, passed > > Spec compliant: not tested > > Documentation:tested, passed > > > > I like this patch, because I used similar functionality some years ago > very successfully. The implementation is almost simple, and the result > should be valid by used method. > > Thanks for your review! > > > The potential problem is performance impact. Very early test show impact > cca 3% worst case, but I'll try to repeat these tests. > > Yes, performance impact is the main concern. I want to know how it > affects performance in various test cases or on various environments. > > > There are some ending whitespaces and useless tabs. > > > > The new status of this patch is: Waiting on Author > I attach v4 patches removing those extra whitespaces of the end of lines > and useless tabs. > today I run 120 5minutes pgbench tests to measure impact of this patch. Result is attached. test PSQL="/usr/local/pgsql/bin/psql" PGBENCH="/usr/local/pgsql/bin/pgbench" export PGDATABASE=postgres echo "*** START ***" > ~/result.txt for i in 1 5 10 50 100 do echo "scale factor $i" >> ~/result.txt $PSQL -c "create database bench$i" $PGBENCH -i -s $i "bench$i" for c in 1 5 10 50 do $PGBENCH -c $c -T 300 "bench$i" >> ~/result.txt done $PSQL -c "vacuum full" "bench$i" $PSQL -c "vacuum analyze" "bench$i" for c in 1 5 10 50 do $PGBENCH -S -c $c -T 300 "bench$i" >> ~/result.txt done $PSQL -c "drop database bench$i" done Tested on computer with 4CPU, 8GB RAM - configuration: shared buffers 2GB, work mem 20MB The result is interesting - when I run pgbench in R/W mode, then I got +/- 1% changes in performance. Isn't important if tracking time is active or not (tested on Linux). In this mode the new code is not on critical path. More interesting results are from read only tests (there are visible some higher differences) for scale 5/ and 50 users - the tracking time increase performance about 12% (same result I got for scale/users 10/50), in other direction patched but without tracking time decreases performance about 10% for for 50/50 (with without tracking time) and 100/5 Looks so for higher scale than 5 and higher number of users 50 the results are not too much stable (for read only tests - I repeated tests) and there overhead is about 10% from 60K tps to 55Ktps - maybe I hit a hw limits (it running with 4CPU) Thanks to Tomas Vondra and 2ndq for hw for testing Regards Pavel wait_event_type | wait_event | calls|times -+---++-- Client | ClientRead| 1489681408 | 221616362961 Lock| transactionid | 103113369 | 71918794185 LWLock | WALWriteLock | 104781468 | 20865855903 Lock| tuple | 21323744 | 15800875242 IO | WALSync | 50862170 | 8666988491 LWLock | lock_manager | 18415423 |575308266 IO | WALWrite | 51482764 |205775892 LWLock | buffer_content| 15385387 |168446128 LWLock | wal_insert|1502092 | 90019731 IPC | ProcArrayGroupUpdate | 178238 | 46527364 LWLock | ProcArrayLock | 587356 | 13298246 IO | DataFileExtend|2715557 | 11615216 IPC | ClogGroupUpdate | 54319 | 10622013 IO | DataFileRead |5805298 | 9596545 IO | SLRURead |9518930 | 7166217 LWLock | CLogControlLock | 746759 | 6783602 > -- > Yoshikazu Imai > pgbench.ods Description: application/vnd.oasis.opendocument.spreadsheet
Re: [Proposal] Add accumulated statistics for wait event
On 2020/01/13 4:11, Pavel Stehule wrote: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant: not tested > Documentation:tested, passed > > I like this patch, because I used similar functionality some years ago very > successfully. The implementation is almost simple, and the result should be > valid by used method. Thanks for your review! > The potential problem is performance impact. Very early test show impact cca > 3% worst case, but I'll try to repeat these tests. Yes, performance impact is the main concern. I want to know how it affects performance in various test cases or on various environments. > There are some ending whitespaces and useless tabs. > > The new status of this patch is: Waiting on Author I attach v4 patches removing those extra whitespaces of the end of lines and useless tabs. -- Yoshikazu Imai From b009b1f8f6be47ae61b5e4538e2730d721ee60db Mon Sep 17 00:00:00 2001 From: "imai.yoshikazu" Date: Wed, 15 Jan 2020 09:13:19 + Subject: [PATCH v4 1/2] Add pg_stat_waitaccum view. pg_stat_waitaccum shows counts and duration of each wait events. Each backend/backgrounds counts and measures the time of wait event in every pgstat_report_wait_start and pgstat_report_wait_end. They store those info into their local variables and send to Statistics Collector. We can get those info via Statistics Collector. For reducing overhead, I implemented statistic hash instead of dynamic hash. I also implemented track_wait_timing which determines wait event duration is collected or not. On windows, this function might be not worked correctly, because now it initializes local variables in pg_stat_init which is not passed to fork processes on windows. --- src/backend/catalog/system_views.sql | 8 + src/backend/postmaster/pgstat.c | 344 ++ src/backend/storage/lmgr/lwlock.c | 19 ++ src/backend/utils/adt/pgstatfuncs.c | 80 ++ src/backend/utils/misc/guc.c | 9 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/catalog/pg_proc.dat | 9 + src/include/pgstat.h | 123 - src/include/storage/lwlock.h | 1 + src/include/storage/proc.h| 1 + src/test/regress/expected/rules.out | 5 + 11 files changed, 598 insertions(+), 2 deletions(-) diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 773edf8..80f2caa 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -957,6 +957,14 @@ CREATE VIEW pg_stat_bgwriter AS pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset; +CREATE VIEW pg_stat_waitaccum AS +SELECT + S.wait_event_type AS wait_event_type, + S.wait_event AS wait_event, + S.calls AS calls, + S.times AS times + FROM pg_stat_get_waitaccum(NULL) AS S; + CREATE VIEW pg_stat_progress_vacuum AS SELECT S.pid AS pid, S.datid AS datid, D.datname AS datname, diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 51c486b..08e10ad 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -123,6 +123,7 @@ */ bool pgstat_track_activities = false; bool pgstat_track_counts = false; +bool pgstat_track_wait_timing = false; intpgstat_track_functions = TRACK_FUNC_OFF; intpgstat_track_activity_query_size = 1024; @@ -153,6 +154,10 @@ static time_t last_pgstat_start_time; static bool pgStatRunningInCollector = false; +WAHash *wa_hash; + +instr_time waitStart; + /* * Structures in which backends store per-table info that's waiting to be * sent to the collector. @@ -255,6 +260,7 @@ static int localNumBackends = 0; */ static PgStat_ArchiverStats archiverStats; static PgStat_GlobalStats globalStats; +static PgStat_WaitAccumStats waitAccumStats; /* * List of OIDs of databases we need to write out. If an entry is InvalidOid, @@ -280,6 +286,8 @@ static pid_t pgstat_forkexec(void); #endif NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn(); +static void pgstat_init_waitaccum_hash(WAHash **hash); +static PgStat_WaitAccumEntry *pgstat_add_wa_entry(WAHash *hash, uint32 key); static void pgstat_beshutdown_hook(int code, Datum arg); static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create); @@ -287,8 +295,11 @@ static PgStat_StatTabEntry *pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create); static void
Re: [Proposal] Add accumulated statistics for wait event
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation:tested, passed I like this patch, because I used similar functionality some years ago very successfully. The implementation is almost simple, and the result should be valid by used method. The potential problem is performance impact. Very early test show impact cca 3% worst case, but I'll try to repeat these tests. There are some ending whitespaces and useless tabs. The new status of this patch is: Waiting on Author
RE: [Proposal] Add accumulated statistics for wait event
On Sun, Oct 28, 2018 at 6:39 PM, legrand legrand wrote: Hi, Thanks for comments. I had overlooked the reply from you. >You are right, sampling has to be "tuned" regarding the event(s) you want to >catch. I see. For tuning, you need to know the length of processing you want to sample? > May I invite you to try it, using PASH-viewer (github) with pgsentinel > (github). I'll check and try it. -- Naoki Yotsunaga
RE: [Proposal] Add accumulated statistics for wait event
On Mon, Oct 29, 2018 at 1:52 AM, Phil Florent wrote: Hi, thank you for comments. >Yes you will be able to solve bottlenecks with sampling. In interactive mode, >a 1s interval is probably too large. I use 0s1 - 0s01 with my tool and it is >normally OK. With the tool you are using, can you sample at intervals shorter than 1 second? If you can, you can get enough sampling number and you can also acquire short events. >Since grafana is now able to connect directly to a postgresql source, I use it >to display the information collected from pg_stat_activity and psutil ( e.g >https://pgphil.ovh/traqueur_dashboard_02.php page is written in french but >panels are in english) It is wonderful to visualize. Especially for beginners like me. >Other DB have accumulated statistics but you can notice that sampling is also >their most modern method. >E.g Oracle DB : 20 years ago you already had tools like "utlbstat/utlestat" . >Then you had "statspack". Those tools were based on accumulated statistics and >the reports were based on differences between 2 points. It was useful to solve >major problems but it was limited and not precise enough in many cases. >The preferred feature to identify bottlenecks in the Oracle world is now ASH >(active session history). It can help with major problems, specific problems >AND it can identify short blockages. >Too bad it is licensed as an option of their Enterprise Edition but similar >tools exist and they are also based on sampling of the activity. >With the "official" ASH, sampling and archiving are done internally and you >have a circular memory zone dedicated to the feature. Hence the overhead is >lower but that's all. >The most advanced interactive tool is called "snapper" and it is also based on >sampling. Thanks. I will check it. The current bottleneck survey method, from sampling, I can know the number (ratio) of waiting events. Then, investigate from those with a high number of times (ratio). Do you agree with this recognition? --- Naoki, Yotsunaga.
RE: [Proposal] Add accumulated statistics for wait event
Hi, Is DBA really able to solve bottlenecks with sampling? What I would like to say is that if we have information on the number of wait events and the wait time(like other DB), we can investigate more easily. Yes you will be able to solve bottlenecks with sampling. In interactive mode, a 1s interval is probably too large. I use 0s1 - 0s01 with my tool and it is normally OK. In batch mode I use 1s=>10s. If you want to visualize the results it's easy to use a dedicated tool and bottlenecks will clearly appear . Since grafana is now able to connect directly to a postgresql source, I use it to display the information collected from pg_stat_activity and psutil ( e.g https://pgphil.ovh/traqueur_dashboard_02.php page is written in french but panels are in english) Other DB have accumulated statistics but you can notice that sampling is also their most modern method. E.g Oracle DB : 20 years ago you already had tools like "utlbstat/utlestat" . Then you had "statspack". Those tools were based on accumulated statistics and the reports were based on differences between 2 points. It was useful to solve major problems but it was limited and not precise enough in many cases. The preferred feature to identify bottlenecks in the Oracle world is now ASH (active session history). It can help with major problems, specific problems AND it can identify short blockages. Too bad it is licensed as an option of their Enterprise Edition but similar tools exist and they are also based on sampling of the activity. With the "official" ASH, sampling and archiving are done internally and you have a circular memory zone dedicated to the feature. Hence the overhead is lower but that's all. The most advanced interactive tool is called "snapper" and it is also based on sampling. Best regards Phil De : Yotsunaga, Naoki Envoyé : lundi 29 octobre 2018 02:20 À : 'Phil Florent'; 'Michael Paquier' Cc : 'Tomas Vondra'; 'pgsql-hackers@lists.postgresql.org' Objet : RE: [Proposal] Add accumulated statistics for wait event On Thu, Oct 4, 2018 at 8:22 PM, Yotsunaga Naoki wrote: Hi, I understood and thought of your statistic comment once again. In the case of sampling, is there enough statistic to investigate? In the case of a long SQL, I think that it is possible to obtain a sufficient sampling number. However, in the case of about 1 minute of SQL, only 60 samples can be obtained at most. #Because legard’s comment. https://www.postgresql.org/message-id/1539158356795-0.post%40n3.nabble.com Does this sampling number of 60 give information that I really want? Perhaps it is not to miss the real problem part? --- Naoki, Yotsunaga.
RE: [Proposal] Add accumulated statistics for wait event
Hello, You are right, sampling has to be "tuned" regarding the event(s) you want to catch. Sampling of 1 second interval is good with treatments that take hours, and not enough for a minute or a second analysis. May I invite you to try it, using PASH-viewer (github) with pgsentinel (github). Changing pgsentiel.c sampling from 1 second rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, ash_sampling_period * 1000L,PG_WAIT_EXTENSION); to 1/10 second rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, ash_sampling_period * 100L,PG_WAIT_EXTENSION); seems the good balance for me (for analysis periods from a few seconds to minutes). Regards PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
RE: [Proposal] Add accumulated statistics for wait event
On Thu, Oct 4, 2018 at 8:22 PM, Yotsunaga Naoki wrote: Hi, I understood and thought of your statistic comment once again. In the case of sampling, is there enough statistic to investigate? In the case of a long SQL, I think that it is possible to obtain a sufficient sampling number. However, in the case of about 1 minute of SQL, only 60 samples can be obtained at most. #Because legard’s comment. https://www.postgresql.org/message-id/1539158356795-0.post%40n3.nabble.com Does this sampling number of 60 give information that I really want? Perhaps it is not to miss the real problem part? --- Naoki, Yotsunaga.
Re: [Proposal] Add accumulated statistics for wait event
Bertrand DROUVOT wrote > Hello Guys, > > As you mentioned Oracle like active session history sampling in this > thread, I just want to let you know that I am working on a brand new > extension to provide this feature. > > You can find the extension here: https://github.com/pgsentinel/pgsentinel > > [...] > > If you want to have a look, give your thoughts, you are welcome. > > Bertrand +1! just a regret: ash sampling interval can not be smaller than a second ;o( Cordialement PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
RE: [Proposal] Add accumulated statistics for wait event
On Thu, Oct 4, 2018 at 0:54 AM, Phil Florent wrote: Phil, Michael, I appreciate your polite comments. I understand as follows. We can find it if we shorten the sampling interval, but a lot of information comes out. # The balance is important. Also, it is not good unless we have enough samples. And I have to do various other things. Is my understand correct? It seems to me that it is difficult for me if my understanding is right. Is DBA really able to solve bottlenecks with sampling? # Since I am a beginner, I feel that way. And other people may not feel it difficult. What I would like to say is that if we have information on the number of wait events and the wait time(like other DB), we can investigate more easily. Of course, I understand that it also affects performance. So, I suggest a way that it can switch on and off, defaults is off. - Naoki, Yotsunaga.
RE: [Proposal] Add accumulated statistics for wait event
Hi, It's the same logic with any polling system. An integration calculation using monte-carlo method with only a few points won't be accurate enough and can even be completely wrong etc. Polling is OK to troubleshoot a problem on the fly but 2 points are not enough. A few seconds are needed to obtain good enough data, e.g 5-10 seconds of polling with a 0.1=>0.01s interval between 2 queries of the activity. Polling a few seconds while the user is waiting is normally enough to say if a significant part of the waits are on the database. It's very important to know that. With 1 hour of accumulated statistics, a DBA will always see something to fix. But if the user waits 10 seconds on a particular screen and 1 second is spent on the database it often won't directly help. Polling gives great information with postgreSQL 10 but it was already useful to catch top queries etc. in older versions. I always check if activity is adequately reported by my tool using known cases. I want to be sure it will report adequately things in real-world troubleshooting sessions. Sometimes there are bugs in my tool, once there was an issue with postgres (pgstat_report_activty() was not called by workers in parallel index creation) Best regards Phil De : Michael Paquier Envoyé : jeudi 4 octobre 2018 12:58 À : Phil Florent Cc : Yotsunaga, Naoki; Tomas Vondra; pgsql-hackers@lists.postgresql.org Objet : Re: [Proposal] Add accumulated statistics for wait event On Thu, Oct 04, 2018 at 09:32:37AM +, Phil Florent wrote: > I am a DB beginner, so please tell me. It says that you can find > events that are bottlenecks in sampling, but as you saw above, you can > not find events shorter than the sampling interval, right? Yes, which is why it would be as simple as making the interval shorter, still not too short so as it bloats the amount of information fetched which needs to be stored and afterwards (perhaps) treated for analysis. This gets rather close to signal processing. A simple image is for example, assuming that event A happens 100 times in an interval of 1s, and event B only once in the same interval of 1s, then if the snapshot interval is only 1s, then in the worst case A would be treated an equal of B, which would be wrong. -- Michael
Re: [Proposal] Add accumulated statistics for wait event
On Thu, Oct 04, 2018 at 09:32:37AM +, Phil Florent wrote: > I am a DB beginner, so please tell me. It says that you can find > events that are bottlenecks in sampling, but as you saw above, you can > not find events shorter than the sampling interval, right? Yes, which is why it would be as simple as making the interval shorter, still not too short so as it bloats the amount of information fetched which needs to be stored and afterwards (perhaps) treated for analysis. This gets rather close to signal processing. A simple image is for example, assuming that event A happens 100 times in an interval of 1s, and event B only once in the same interval of 1s, then if the snapshot interval is only 1s, then in the worst case A would be treated an equal of B, which would be wrong. -- Michael signature.asc Description: PGP signature
RE: [Proposal] Add accumulated statistics for wait event
Hi, I am a DB beginner, so please tell me. It says that you can find events that are bottlenecks in sampling, but as you saw above, you can not find events shorter than the sampling interval, right? If an event occurs frequently and if it is reported in pg_stat_activity, you will catch it again and again while sampling, no matter it duration. Hence you just need to * Sample the sessions and consider the active ones. You need to know if they are waiting (PostgreSQL now provides detailed wait events) or if they are on the CPU * Optionally collect information on the system context at the time of sampling (CPU, memory...), it can be provided by many tools like psutil python library for example If the client application itself provides information it's even more interesting. With something like program/module/action/client_info/sofar/totalwork in application_name you are able to focus directly on different kind of activity. It can give you information like "I/O waits are meaningful for my batch activity but not for my OLTP activity, if my goal is to improve response time for end users I have to consider that." Best regards Phil De : Yotsunaga, Naoki Envoyé : jeudi 4 octobre 2018 10:31 À : 'Michael Paquier'; Phil Florent Cc : Tomas Vondra; pgsql-hackers@lists.postgresql.org Objet : RE: [Proposal] Add accumulated statistics for wait event On Thu, July 26, 2018 at 1:25 AM, Michael Paquier wrote: > Even if you have spiky workloads, sampling may miss those, but even with > adding counters for each event > you would need to query the table holding the counters at an insane frequency > to be able to perhaps get > something out of it as you need to do sampling of the counters as well to > extract deltas. Hi, I was wondering why PostgreSQL did not have the number of wait events and wait time that other databases such as Oracle had as a function, and when I was looking for related threads, I got to this thread. I am a DB beginner, so please tell me. It says that you can find events that are bottlenecks in sampling, but as you saw above, you can not find events shorter than the sampling interval, right? If this short event has occurred quite a number of times and it was a considerable amount of time in total, can you solve this problem with sampling? # I have asked, but I could not understand much of the discussion above and I do not know if such a case can exist. Also, I think that it can be solved by higher the sampling frequency, but the load will be high, right? I think this method is not very practical. Moreover, I think that it is not implemented due to the reason that sampling is good as described above and because it affects performance. How about switching the function on / off and implementing with default off? Do you care about performance degradation during bottleneck investigation? When investigating the bottleneck, I think that it is better to find the cause even at the expense of performance. # If you can detect with high frequency sampling, I think that it depends on whether the sampling or the function(the number of wait events and wait time) is high load. Since I am a DB beginner, I think that it is saying strange things. I am glad if you tell me. - Naoki Yotsunaga
RE: [Proposal] Add accumulated statistics for wait event
On Thu, July 26, 2018 at 1:25 AM, Michael Paquier wrote: > Even if you have spiky workloads, sampling may miss those, but even with > adding counters for each event > you would need to query the table holding the counters at an insane frequency > to be able to perhaps get > something out of it as you need to do sampling of the counters as well to > extract deltas. Hi, I was wondering why PostgreSQL did not have the number of wait events and wait time that other databases such as Oracle had as a function, and when I was looking for related threads, I got to this thread. I am a DB beginner, so please tell me. It says that you can find events that are bottlenecks in sampling, but as you saw above, you can not find events shorter than the sampling interval, right? If this short event has occurred quite a number of times and it was a considerable amount of time in total, can you solve this problem with sampling? # I have asked, but I could not understand much of the discussion above and I do not know if such a case can exist. Also, I think that it can be solved by higher the sampling frequency, but the load will be high, right? I think this method is not very practical. Moreover, I think that it is not implemented due to the reason that sampling is good as described above and because it affects performance. How about switching the function on / off and implementing with default off? Do you care about performance degradation during bottleneck investigation? When investigating the bottleneck, I think that it is better to find the cause even at the expense of performance. # If you can detect with high frequency sampling, I think that it depends on whether the sampling or the function(the number of wait events and wait time) is high load. Since I am a DB beginner, I think that it is saying strange things. I am glad if you tell me. - Naoki Yotsunaga
RE: [Proposal] Add accumulated statistics for wait event
Hi, I agree with that. PostgreSQL 10 is really great, tuning tools based on sampling of pg_stat_activity became accurate without any modification. Best regards Phil De : Michael Paquier Envoyé : jeudi 26 juillet 2018 03:24 À : Phil Florent Cc : Tomas Vondra; pgsql-hackers@lists.postgresql.org Objet : Re: [Proposal] Add accumulated statistics for wait event On Tue, Jul 24, 2018 at 04:23:03PM +, Phil Florent wrote: > It loses non meaningful details and it's in fact a good point. In this > example, sampling will definitely find the cause and won't cost > resources. The higher the sampling frequency, the more details you get, with the most load on the instance. So if you are able to take an infinity of samples, where registering multiple times the same event for the same backend also matters because its overall weight gets higher and it shows up higher in profiles, then you would be able converge to the set of results that this patch adds. Sampling method, especially its frequency, is something controlled by the client and not the server. Approaches like the one proposed here push the load on the server-side, unconditionally, for *all* backends, and this has its cost. Even if you have spiky workloads, sampling may miss those, but even with adding counters for each event you would need to query the table holding the counters at an insane frequency to be able to perhaps get something out of it as you need to do sampling of the counters as well to extract deltas. As Tomas has mentioned up-thread, sampling is light-weight, as-is the current design for wait events. Even if it is not perfect because it cannot give exact numbers, it would find bottlenecks in really most cases, and that's what matters. If not, increasing the sampling frequency makes things easier to detect as well. What would be the point of taking only one sample every checkpoint for example? There may be a benefit in having counters, I don't know the answer to that, though the point would be to make sure that there is a specific set of workloads where it makes sense, still my gut feeling is that sampling would be able to detect those anyway. (I am not a computer scientist by default but a physicist, think fluid dynamics and turbulence, and I had my load of random events and signal analysis as well. All that is just statistics with attempts to approach reality, where sampling is a life-saver over exactitude of measurements.) Adding hooks is not acceptable to me either, those have a cost, and it is not clear what's the benefit we can get into putting hooks in such a place for cases other than sampling and counters... -- Michael
Re: [Proposal] Add accumulated statistics for wait event
Hello Guys, As you mentioned Oracle like active session history sampling in this thread, I just want to let you know that I am working on a brand new extension to provide this feature. You can find the extension here: https://github.com/pgsentinel/pgsentinel Basically, you could see it as samplings of pg_stat_activity (one second interval as default) currently providing more information: ash_time: the sampling time blockers: the number of blockers blockerpid: the pid of the blocker (if blockers = 1), the pid of one blocker (if blockers > 1) top_level_query: the top level statement (in case PL/pgSQL is used) query: the statement being executed (not normalised, as it is in pg_stat_statements, means you see the values) cmdtype: the statement type (SELECT,UPDATE,INSERT,DELETE,UTILITY,UNKNOWN,NOTHING) queryid: the queryid of the statement (the one coming from pg_stat_statements) Thanks to the queryid field you are able to link the session activity with the sql activity. It's implemented as in-memory ring buffer where samples are written with given (configurable) period. Therefore, user can see some number of recent samples depending on history size (configurable). Current caveats: In case of high query rate per pid, you could see (I saw it at more than 1000 queries per second) top_level_query and query not "correlated" (query, queryid and cmdtype are still well linked together). This is due to the fact that those 2 informations are currently collected independently. If you want to have a look, give your thoughts, you are welcome. Bertrand On 26 July 2018 at 03:24, Michael Paquier wrote: > On Tue, Jul 24, 2018 at 04:23:03PM +, Phil Florent wrote: > > It loses non meaningful details and it's in fact a good point. In this > > example, sampling will definitely find the cause and won't cost > > resources. > > The higher the sampling frequency, the more details you get, with the > most load on the instance. So if you are able to take an infinity of > samples, where registering multiple times the same event for the same > backend also matters because its overall weight gets higher and it shows > up higher in profiles, then you would be able converge to the set of > results that this patch adds. Sampling method, especially its > frequency, is something controlled by the client and not the server. > Approaches like the one proposed here push the load on the server-side, > unconditionally, for *all* backends, and this has its cost. > > Even if you have spiky workloads, sampling may miss those, but even with > adding counters for each event you would need to query the table holding > the counters at an insane frequency to be able to perhaps get something > out of it as you need to do sampling of the counters as well to extract > deltas. > > As Tomas has mentioned up-thread, sampling is light-weight, as-is the > current design for wait events. Even if it is not perfect because it > cannot give exact numbers, it would find bottlenecks in really most > cases, and that's what matters. If not, increasing the sampling > frequency makes things easier to detect as well. What would be the > point of taking only one sample every checkpoint for example? > > There may be a benefit in having counters, I don't know the answer to > that, though the point would be to make sure that there is a specific > set of workloads where it makes sense, still my gut feeling is that > sampling would be able to detect those anyway. > > (I am not a computer scientist by default but a physicist, think fluid > dynamics and turbulence, and I had my load of random events and signal > analysis as well. All that is just statistics with attempts to approach > reality, where sampling is a life-saver over exactitude of > measurements.) > > Adding hooks is not acceptable to me either, those have a cost, and it > is not clear what's the benefit we can get into putting hooks in such a > place for cases other than sampling and counters... > -- > Michael >
Re: [Proposal] Add accumulated statistics for wait event
On Tue, Jul 24, 2018 at 04:23:03PM +, Phil Florent wrote: > It loses non meaningful details and it's in fact a good point. In this > example, sampling will definitely find the cause and won't cost > resources. The higher the sampling frequency, the more details you get, with the most load on the instance. So if you are able to take an infinity of samples, where registering multiple times the same event for the same backend also matters because its overall weight gets higher and it shows up higher in profiles, then you would be able converge to the set of results that this patch adds. Sampling method, especially its frequency, is something controlled by the client and not the server. Approaches like the one proposed here push the load on the server-side, unconditionally, for *all* backends, and this has its cost. Even if you have spiky workloads, sampling may miss those, but even with adding counters for each event you would need to query the table holding the counters at an insane frequency to be able to perhaps get something out of it as you need to do sampling of the counters as well to extract deltas. As Tomas has mentioned up-thread, sampling is light-weight, as-is the current design for wait events. Even if it is not perfect because it cannot give exact numbers, it would find bottlenecks in really most cases, and that's what matters. If not, increasing the sampling frequency makes things easier to detect as well. What would be the point of taking only one sample every checkpoint for example? There may be a benefit in having counters, I don't know the answer to that, though the point would be to make sure that there is a specific set of workloads where it makes sense, still my gut feeling is that sampling would be able to detect those anyway. (I am not a computer scientist by default but a physicist, think fluid dynamics and turbulence, and I had my load of random events and signal analysis as well. All that is just statistics with attempts to approach reality, where sampling is a life-saver over exactitude of measurements.) Adding hooks is not acceptable to me either, those have a cost, and it is not clear what's the benefit we can get into putting hooks in such a place for cases other than sampling and counters... -- Michael signature.asc Description: PGP signature
RE: [Proposal] Add accumulated statistics for wait event
Hi, > Some case, sampling of events can not find the cause of issue. It lose detail > data. > For example, some throughput issue occur(ex : disk io), but each wait point > occurs only a few milliseconds. It loses non meaningful details and it's in fact a good point. In this example, sampling will definitely find the cause and won't cost resources. Being as precise as possible to define a wait event is very useful but knowing precisely the duration of each event is less useful in terms of tuning. Example of sampling + group by/order by percentage of activity : ./t -d 5 -o "application_name, wait_event_type" -o "application_name, wait_event, wait_event_type" traqueur 2.05.00 - performance tool for PostgreSQL 9.3 => 11 INFORMATION, no connection parameters provided, connecting to dedicated database ... INFORMATION, connected to dedicated database traqueur INFORMATION, PostgreSQL version : 11 INFORMATION, sql preparation ... INFORMATION, sql execution ... busy_pc | distinct_exe | application_name | wait_event_type -+--+--+- 206 | 8 / 103 | mperf| 62 | 2 / 31 | mperf| LWLock 20 | 3 / 10 | mperf| IO 12 | 1 / 6| mperf| Client (4 rows) busy_pc | distinct_exe | application_name | wait_event | wait_event_type -+--+--+---+- 206 | 8 / 103 | mperf| | 62 | 2 / 31 | mperf| WALWriteLock | LWLock 14 | 1 / 7| mperf| DataFileImmediateSync | IO 12 | 1 / 6| mperf| ClientRead| Client 2 | 1 / 1| mperf| DataFileWrite | IO 2 | 1 / 1| mperf| DataFileRead | IO 2 | 1 / 1| mperf| WALInitWrite | IO No need to know the exact duration of each event to identify the bottleneck(s)... Best regards Phil De : Tomas Vondra Envoyé : mardi 24 juillet 2018 17:45 À : pgsql-hackers@lists.postgresql.org Objet : Re: [Proposal] Add accumulated statistics for wait event On 07/24/2018 12:06 PM, MyungKyu LIM wrote: > 2018-07-23 16:53 (GMT+9), Michael Paquier wrote: >> On Mon, Jul 23, 2018 at 04:04:42PM +0900, 임명규 wrote: >>> This proposal is about recording additional statistics of wait events. > >> I have comments about your patch. First, I don't think that you need to >> count precisely the number of wait events triggered as usually when it >> comes to analyzing a workload's bottleneck what counts is a periodic >> *sampling* of events, patterns which can be fetched already from >> pg_stat_activity and stored say in a different place. > > Thanks for your feedback. > > This proposal is not about *sampling*. > Accumulated statistics of wait events information is useful for solving > issue. It can measure accurate data. > > Some case, sampling of events can not find the cause of issue. It lose detail > data. > For example, some throughput issue occur(ex : disk io), but each wait point > occurs only a few milliseconds. > In this case, it is highly likely that will not find the cause. > I think it's highly likely that it will find the cause. The idea of sampling is that while you don't measure the timing directly, you can infer it from the frequency of the wait events in the samples. So if you see the backend reports a particular wait event in 75% of samples, it probably spent 75% time waiting on it. I'm not saying sampling is perfect and it certainly is less convenient than what you propose. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Proposal] Add accumulated statistics for wait event
On 07/24/2018 12:06 PM, MyungKyu LIM wrote: 2018-07-23 16:53 (GMT+9), Michael Paquier wrote: On Mon, Jul 23, 2018 at 04:04:42PM +0900, 임명규 wrote: This proposal is about recording additional statistics of wait events. I have comments about your patch. First, I don't think that you need to count precisely the number of wait events triggered as usually when it comes to analyzing a workload's bottleneck what counts is a periodic *sampling* of events, patterns which can be fetched already from pg_stat_activity and stored say in a different place. Thanks for your feedback. This proposal is not about *sampling*. Accumulated statistics of wait events information is useful for solving issue. It can measure accurate data. Some case, sampling of events can not find the cause of issue. It lose detail data. For example, some throughput issue occur(ex : disk io), but each wait point occurs only a few milliseconds. In this case, it is highly likely that will not find the cause. I think it's highly likely that it will find the cause. The idea of sampling is that while you don't measure the timing directly, you can infer it from the frequency of the wait events in the samples. So if you see the backend reports a particular wait event in 75% of samples, it probably spent 75% time waiting on it. I'm not saying sampling is perfect and it certainly is less convenient than what you propose. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [Proposal] Add accumulated statistics for wait event
On 07/23/2018 03:57 PM, Tom Lane wrote: Michael Paquier writes: This does not need a configure switch. It probably is there because the OP realizes that most people wouldn't accept having this code compiled in. What's the performance penalty? I am pretty sure that this is measurable as wait events are stored for a backend for each I/O operation as well, and you are calling a C routine within an inlined function which is designed to be light-weight, doing only a four-byte atomic operation. On machines with slow gettimeofday(), I suspect the cost of this patch would be staggering. Even with relatively fast gettimeofday, it doesn't look acceptable for calls in hot code paths (for instance, lwlock.c). Yeah. I wonder if we could measure the time for a small fraction of the wait events, and estimate the actual duration from that. A bigger problem is that it breaks stuff. There are countless calls to pgstat_report_wait_start/pgstat_report_wait_end that assume they have no side-effects (for example, on errno) and can never fail. I wouldn't trust GetCurrentTimestamp() for either. If the report_wait calls can't be dropped into code with *complete* certainty that they're safe, that's a big cost. Why exactly is this insisting on logging timestamps and not, say, just incrementing a counter? I think doing it like this is almost certain to end in rejection. Because the number of times you hit wait event may not correlate with the time you spent waiting on it. So a simple counter is not the most useful thing. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Re: [Proposal] Add accumulated statistics for wait event
Hi, I am skeptical about accumulated statistics. pg_stat_activity now gives necessary information about wait events. It can be easily be used with a polling system that sleeps most of the time to limit the overhead. Measuring the duration of individual wait events is not necessary to know the repartition of the charge. You can aggregate the results of the polling by application, query, wait events or whatever you want. I wrote a script for that that can be used interactively or in batch mode to produce reports but many solutions exist . Best regards Phil <http://aka.ms/weboutlook> De : MyungKyu LIM Envoyé : mardi 24 juillet 2018 12:10 À : Alexander Korotkov; pgsql-hack...@postgresql.org Cc : Woosung Sohn; DoHyung HONG Objet : RE: Re: [Proposal] Add accumulated statistics for wait event > On Mon, Jul 23, 2018 at 10:53 AM Michael Paquier wrote: >> What's the performance penalty? I am pretty sure that this is >> measurable as wait events are stored for a backend for each I/O >> operation as well, and you are calling a C routine within an inlined >> function which is designed to be light-weight, doing only a four-byte >> atomic operation. > Yes, the question is overhead of measuring durations of individual wait > events. It has been proposed before, and there been heated debates about > that (see threads [1-3]). It doesn't seem > to be a conclusion about this feature. The thing to be said for sure: > performance penalty heavily depends on OS/hardware/workload. In some cases > overhead is negligible, but in other cases it appears to be huge. Thanks for good information. I agree. Performance penalty is exist. But wait stats are demandable and useful. In some cases, it is worth sacrificing performance and using it. So, what do you think about developing as extension? I have another concept proposal. 2. This feature can be implemented as extension if some hooks were provided in following functions, - pgstat_report_wait_start - pgstat_report_wait_end This feature can be turned on/off by on-line config when necessary. Best regards, MyungKyu, Lim
RE: Re: [Proposal] Add accumulated statistics for wait event
> On Mon, Jul 23, 2018 at 10:53 AM Michael Paquier wrote: >> What's the performance penalty? I am pretty sure that this is >> measurable as wait events are stored for a backend for each I/O >> operation as well, and you are calling a C routine within an inlined >> function which is designed to be light-weight, doing only a four-byte >> atomic operation. > Yes, the question is overhead of measuring durations of individual wait > events. It has been proposed before, and there been heated debates about > that (see threads [1-3]). It doesn't seem > to be a conclusion about this feature. The thing to be said for sure: > performance penalty heavily depends on OS/hardware/workload. In some cases > overhead is negligible, but in other cases it appears to be huge. Thanks for good information. I agree. Performance penalty is exist. But wait stats are demandable and useful. In some cases, it is worth sacrificing performance and using it. So, what do you think about developing as extension? I have another concept proposal. 2. This feature can be implemented as extension if some hooks were provided in following functions, - pgstat_report_wait_start - pgstat_report_wait_end This feature can be turned on/off by on-line config when necessary. Best regards, MyungKyu, Lim
RE: Re: [Proposal] Add accumulated statistics for wait event
2018-07-23 16:53 (GMT+9), Michael Paquier wrote: > On Mon, Jul 23, 2018 at 04:04:42PM +0900, 임명규 wrote: >> This proposal is about recording additional statistics of wait events. > I have comments about your patch. First, I don't think that you need to > count precisely the number of wait events triggered as usually when it > comes to analyzing a workload's bottleneck what counts is a periodic > *sampling* of events, patterns which can be fetched already from > pg_stat_activity and stored say in a different place. Thanks for your feedback. This proposal is not about *sampling*. Accumulated statistics of wait events information is useful for solving issue. It can measure accurate data. Some case, sampling of events can not find the cause of issue. It lose detail data. For example, some throughput issue occur(ex : disk io), but each wait point occurs only a few milliseconds. In this case, it is highly likely that will not find the cause. > This is ugly and unmaintainable style. I'm sorry. You're right. Think as the PoC. > What's the performance penalty? I have same worries. I just tried pgbench several times. Let me know what some good performance check method. Best regards, MyungKyu, Lim
Re: [Proposal] Add accumulated statistics for wait event
Michael Paquier writes: > This does not need a configure switch. It probably is there because the OP realizes that most people wouldn't accept having this code compiled in. > What's the performance penalty? I am pretty sure that this is > measurable as wait events are stored for a backend for each I/O > operation as well, and you are calling a C routine within an inlined > function which is designed to be light-weight, doing only a four-byte > atomic operation. On machines with slow gettimeofday(), I suspect the cost of this patch would be staggering. Even with relatively fast gettimeofday, it doesn't look acceptable for calls in hot code paths (for instance, lwlock.c). A bigger problem is that it breaks stuff. There are countless calls to pgstat_report_wait_start/pgstat_report_wait_end that assume they have no side-effects (for example, on errno) and can never fail. I wouldn't trust GetCurrentTimestamp() for either. If the report_wait calls can't be dropped into code with *complete* certainty that they're safe, that's a big cost. Why exactly is this insisting on logging timestamps and not, say, just incrementing a counter? I think doing it like this is almost certain to end in rejection. regards, tom lane
Re: [Proposal] Add accumulated statistics for wait event
On Mon, Jul 23, 2018 at 10:53 AM Michael Paquier wrote: > What's the performance penalty? I am pretty sure that this is > measurable as wait events are stored for a backend for each I/O > operation as well, and you are calling a C routine within an inlined > function which is designed to be light-weight, doing only a four-byte > atomic operation. Yes, the question is overhead of measuring durations of individual wait events. It has been proposed before, and there been heated debates about that (see threads [1-3]). It doesn't seem to be a conclusion about this feature. The thing to be said for sure: performance penalty heavily depends on OS/hardware/workload. In some cases overhead is negligible, but in other cases it appears to be huge. 1. https://www.postgresql.org/message-id/flat/559D4729.9080704%40postgrespro.ru 2. https://www.postgresql.org/message-id/flat/CA%2BTgmoYd3GTz2_mJfUHF%2BRPe-bCy75ytJeKVv9x-o%2BSonCGApw%40mail.gmail.com 3. https://www.postgresql.org/message-id/flat/CAG95seUAQVj09KzLwU%2Bz1B-GqdMqerzEkPFR3hn0q88XzMq-PA%40mail.gmail.com -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [Proposal] Add accumulated statistics for wait event
Hi, that will be a great feature. On 23.07.2018 10:53, Michael Paquier wrote: I have comments about your patch. First, I don't think that you need to count precisely the number of wait events triggered as usually when it comes to analyzing a workload's bottleneck what counts is a periodic *sampling* of events If I get it right, this patch is not about sampling. It gathers cumulative statistics, which is regrettably missing in PostgreSQL. That's why it should not only count exact number of wait events, but also min time and stddev would be helpful (as in pg_stat_statements). -- Egor Rogov Postgres Professional http://www.postgrespro.com
Re: [Proposal] Add accumulated statistics for wait event
> This proposal is about recording additional statisticsofwait events. > The pg_stat_activity view is very useful in analysis for performance > issues. > But it is difficult to get information of wait events in detail, > when you need to deep dive into analysis of performance. > It is because pg_stat_activity just shows the current wait status of > backend. There is an extension that samples the information from pg_stat_activity (similar to Oracle's ASH). https://github.com/postgrespro/pg_wait_sampling Maybe it's worthwhile to combine the efforts? -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: [Proposal] Add accumulated statistics for wait event
On Mon, Jul 23, 2018 at 04:04:42PM +0900, 임명규 wrote: > This proposal is about recording additional statistics of wait events. You should avoid sending things in html format, text format being recommended on those mailing lists... The patch applies after using patch -p0 by the way. I would recommend that you generate your patches using "git format-patch". Here are general guidelines on the matter: https://wiki.postgresql.org/wiki/Submitting_a_Patch Please study those guidelines, those are helpful if you want to get yourself familiar with community process. I have comments about your patch. First, I don't think that you need to count precisely the number of wait events triggered as usually when it comes to analyzing a workload's bottleneck what counts is a periodic *sampling* of events, patterns which can be fetched already from pg_stat_activity and stored say in a different place. This can be achieved easily by using a cron job with an INSERT SELECT query which adds data on a relation storing the event counts. I took the time to look at your patch, and here is some feedback. This does not need a configure switch. I assume that what you did is good to learn the internals of ./configure though. There is no documentation. What does the patch do? What is it useful for? + case PG_WAIT_IPC: + arrayIndex = NUM_WAIT_LWLOCK + + NUM_WAIT_LOCK + NUM_WAIT_BUFFER_PIN + + NUM_WAIT_ACTIVITY + NUM_WAIT_CLIENT + + NUM_WAIT_EXTENSION + eventId; + break; This is ugly and unmaintainable style. You could perhaps have considered an enum instead. pg_stat_get_wait_events should be a set-returning function, which you could filter at SQL level using a PID, so no need of it as an argument. What's the performance penalty? I am pretty sure that this is measurable as wait events are stored for a backend for each I/O operation as well, and you are calling a C routine within an inlined function which is designed to be light-weight, doing only a four-byte atomic operation. -- Michael signature.asc Description: PGP signature