Re: [Proposal] Add accumulated statistics for wait event

2021-06-16 Thread Jehan-Guillaume de Rorthais
Hi Andres,

On Mon, 14 Jun 2021 15:01:14 -0700
Andres Freund  wrote:

> On 2021-06-14 23:20:47 +0200, Jehan-Guillaume de Rorthais wrote:
> > > On 2021-06-14 16:10:32 +0200, Jehan-Guillaume de Rorthais wrote:  
> > > > In the patch in attachment, I tried to fix this by using kind of an
> > > > internal hook for pgstat_report_wait_start and pgstat_report_wait_end.
> > > > This allows to "instrument" wait events only when required, on the fly,
> > > > dynamically.
> > > 
> > > That's *far worse*. You're adding an indirect function call. Which
> > > requires loading a global variable and then a far call to a different
> > > function. You're changing a path that's ~2 instructions with minimal
> > > dependencies (and no branches (i.e. fully out of order executable) to
> > > something on the order of ~15 instructions with plenty dependencies and
> > > at least two branches (call, ret).  
> > 
> > Oh, I didn't realized it would affect all queries, even when
> > log_statement_stats was off. Thank you for your explanation.  
> 
> Maybe I just am misunderstanding what you were doing? As far as I can
> tell your patch changed pgstat_report_wait_start() to be an indirect
> function call - right?

Exactly.

I didn't realize this indirection would be so costly on every single call,
on top of the variable assignment itself.

> Then yes, this adds overhead to everything.

I understand now, thank you for the explanation.
For my own curiosity and study, I'll remove this indirection and bench my patch
anyway.

> You *could* add a pgstat_report_wait_(start|end)_with_time() or such and
> only use that in places that won't have a high frequency. But I just
> don't quite see the use-case for that.

Well, it could be useful if we decide to track only a subset of wait events.
In my scenario, I originally wanted to track only ClientWrite, but then I
realized this might be too specific and tried to generalize.

There are probably other ways to deal with this issue, e.g.:

* do NOT include the time spent waiting for the frontend in the query
  execution time
* expose the frontend part of the query time in log_min_duration_statement,
  auto_explain and pg_stat_statements, in the same fashion we currently do
  with planning and execution time
* have some wait-event sampling mechanism in core, or one as easy to load on
  the fly as auto_explain

Thoughts?

Thanks again!

Regards,




Re: [Proposal] Add accumulated statistics for wait event

2021-06-14 Thread Andres Freund
Hi,

On 2021-06-14 23:20:47 +0200, Jehan-Guillaume de Rorthais wrote:
> > On 2021-06-14 16:10:32 +0200, Jehan-Guillaume de Rorthais wrote:
> > > In the patch in attachment, I tried to fix this by using kind of an
> > > internal hook for pgstat_report_wait_start and pgstat_report_wait_end.
> > > This allows to "instrument" wait events only when required, on the fly,
> > > dynamically.
> > 
> > That's *far worse*. You're adding an indirect function call. Which requires
> > loading a global variable and then a far call to a different function.
> > You're changing a path that's ~2 instructions with minimal dependencies
> > (and no branches (i.e. fully out of order executable) to something on the
> > order of ~15 instructions with plenty dependencies and at least two
> > branches (call, ret).
> 
> Oh, I didn't realized it would affect all queries, even when
> log_statement_stats was off. Thank you for your explanation.

Maybe I just am misunderstanding what you were doing? As far as I can
tell your patch changed pgstat_report_wait_start() to be an indirect
function call - right? Then yes, this adds overhead to everything.

You *could* add a pgstat_report_wait_(start|end)_with_time() or such and
only use that in places that won't have a high frequency. But I just
don't quite see the use-case for that.
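
For illustration only -- no such functions exist in the tree or in the patches
posted here. A rough sketch of what that variant could look like, assuming the
INSTR_TIME macros and a hypothetical per-backend accumulator:

  static instr_time wait_start_time;   /* hypothetical, per backend */

  static inline void
  pgstat_report_wait_start_with_time(uint32 wait_event_info)
  {
      pgstat_report_wait_start(wait_event_info);  /* existing cheap path */
      INSTR_TIME_SET_CURRENT(wait_start_time);    /* timing only on this path */
  }

  static inline void
  pgstat_report_wait_end_with_time(uint32 wait_event_info)
  {
      instr_time  elapsed;

      INSTR_TIME_SET_CURRENT(elapsed);
      INSTR_TIME_SUBTRACT(elapsed, wait_start_time);
      /* accumulate INSTR_TIME_GET_MICROSEC(elapsed) for wait_event_info here */
      pgstat_report_wait_end();
  }

The point being that the clock_gettime/gettimeofday cost would only be paid at
the few call sites that opt into the _with_time variants.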

Greetings,

Andres Freund




Re: [Proposal] Add accumulated statistics for wait event

2021-06-14 Thread Jehan-Guillaume de Rorthais
Hi,

On Mon, 14 Jun 2021 11:27:21 -0700
Andres Freund  wrote:

> On 2021-06-14 16:10:32 +0200, Jehan-Guillaume de Rorthais wrote:
> > In the patch in attachment, I tried to fix this by using kind of an internal
> > hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows to
> > "instrument" wait events only when required, on the fly, dynamically.  
> 
> That's *far worse*. You're adding an indirect function call. Which requires
> loading a global variable and then a far call to a different function. You're
> changing a path that's ~2 instructions with minimal dependencies (and no
> branches (i.e. fully out of order executable) to something on the order of ~15
> instructions with plenty dependencies and at least two branches (call, ret).

Oh, I didn't realize it would affect all queries, even when log_statement_stats
was off. Thank you for your explanation.

> I doubt there's a path towards this feature without adding the necessary
> infrastructure to hot-patch the code - which is obviously quite a
> substantial project.

Right. Sadly, this kind of project is far above what I can do. So I suppose
it's a dead end for me.

I'll study if/how the sampling approach can be done dynamically.

Thank you,




Re: [Proposal] Add accumulated statistics for wait event

2021-06-14 Thread Andres Freund
Hi,

On 2021-06-14 11:27:21 -0700, Andres Freund wrote:
> On 2021-06-14 16:10:32 +0200, Jehan-Guillaume de Rorthais wrote:
> > In the patch in attachment, I tried to fix this by using kind of an internal
> > hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows to
> > "instrument" wait events only when required, on the fly, dynamically.
> 
> That's *far worse*. You're adding an indirect function call. Which requires
> loading a global variable and then a far call to a different function. You're
> changing a path that's ~2 instructions with minimal dependencies (and no
> branches (i.e. fully out of order executable) to something on the order of ~15
> instructions with plenty dependencies and at least two branches (call, ret).

In the case at hand it might even be worse, because the external function call
will require registers to be spilled around it. Right now wait events "use" two
registers (one for the wait event, one for my_wait_event_info), but otherwise
don't cause additional spilling. With your change you'd see register
spill/reload around both wait start and end.

Greetings,

Andres Freund




Re: [Proposal] Add accumulated statistics for wait event

2021-06-14 Thread Andres Freund
Hi,

On 2021-06-14 16:10:32 +0200, Jehan-Guillaume de Rorthais wrote:
> In the patch in attachment, I tried to fix this by using kind of an internal
> hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows to
> "instrument" wait events only when required, on the fly, dynamically.

That's *far worse*. You're adding an indirect function call. Which requires
loading a global variable and then a far call to a different function. You're
changing a path that's ~2 instructions with minimal dependencies (and no
branches (i.e. fully out of order executable) to something on the order of ~15
instructions with plenty dependencies and at least two branches (call, ret).
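
For readers following along: the fast path being discussed is, simplified, a
single inline store through a global pointer, while the proposed hook turns it
into a load of a global function pointer plus an indirect call. A sketch only,
not the exact code of the tree or of the patch:

  /* Today (simplified): one inline store, ~2 instructions, no branch. */
  static inline void
  pgstat_report_wait_start(uint32 wait_event_info)
  {
      *(volatile uint32 *) my_wait_event_info = wait_event_info;
  }

  /* Hooked variant (sketch): load a function pointer, then an indirect
   * call -- extra instructions, dependencies, spills and branches. */
  typedef void (*wait_start_hook_type) (uint32 wait_event_info);
  extern wait_start_hook_type wait_start_hook;

  static inline void
  pgstat_report_wait_start_hooked(uint32 wait_event_info)
  {
      wait_start_hook(wait_event_info);
  }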

I doubt there's a path towards this feature without adding the necessary
infrastructure to hot-patch the code - which is obviously quite a
substantial project.

Greetings,

Andres Freund




Re: [Proposal] Add accumulated statistics for wait event

2021-06-14 Thread Jehan-Guillaume de Rorthais
Hi Andres, Hi all,

First, thank you for your feedback!

Please find in attachment a patch implementing accumulated wait event stats
only from the backend point of view. As I wrote when I reviewed and rebased the
existing patch, I was uncomfortable with the global approach. I still volunteer
to work on/review the original approach if required.

See below for comments and some more explanations about what I think might be
improvements over the previous patch.

On Fri, 11 Jun 2021 12:18:07 -0700
Andres Freund  wrote:

> On 2021-06-05 00:53:44 +0200, Jehan-Guillaume de Rorthais wrote:
> > From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001
> > From: Jehan-Guillaume de Rorthais 
> > Date: Tue, 1 Jun 2021 13:25:57 +0200
> > Subject: [PATCH 1/2] Add pg_stat_waitaccum view.
> > 
> > pg_stat_waitaccum shows counts and duration of each wait events.
> > Each backend/backgrounds counts and measures the time of wait event
> > in every pgstat_report_wait_start and pgstat_report_wait_end. They
> > store those info into their local variables and send to Statistics
> > Collector. We can get those info via Statistics Collector.
> > 
> > For reducing overhead, I implemented statistic hash instead of
> > dynamic hash. I also implemented track_wait_timing which
> > determines wait event duration is collected or not.  
> 
> I object to adding this overhead. The whole selling point for wait
> events was that they are low overhead. I since spent time reducing the
> overhead further, because even just the branches for checking if
> track_activity is enabled are measurable (225a22b19ed).

Agreed. The previous patch I rebased was meant for review and to reopen this discussion;
I even added a small FIXME in pgstat_report_wait_end and
pgstat_report_wait_start about your work:

  //FIXME: recent patch to speed up this call.

In the patch in attachment, I tried to fix this by using kind of an internal
hook for pgstat_report_wait_start and pgstat_report_wait_end. This allows to
"instrument" wait events only when required, on the fly, dynamically.

Moreover, I replaced the hash structure with a simple static array, for faster
access.

> > From ddb1adc5cd9acc9bc9de16d0cf057124b09fe1e3 Mon Sep 17 00:00:00 2001
> > From: Jehan-Guillaume de Rorthais 
> > Date: Fri, 4 Jun 2021 18:14:51 +0200
> > Subject: [PATCH 2/2] [POC] Change measuring method of wait event time from
> >  INSTR_TIME to rdtsc.
> > 
> > This patch changes measuring method of wait event time from INSTR_TIME
> > (which uses gettimeofday or clock_gettime) to rdtsc. This might reduce the
> > overhead of measuring overhead.
> > 
> > Any supports like changing clock cycle to actual time or error handling are
> > not currently implemented.  
> 
> rdtsc is a serializing (*) instruction - that's the expensive part. On linux
> clock_gettime() doesn't actually need a syscall. While the vdso call
> implies a bit of overhead over a raw rdtsc, it's a relatively small part
> of the overhead.  See
> https://www.postgresql.org/message-id/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de

I chose to remove the rdtsc part from my patch, as it wasn't clear how much
faster it is compared to the simpler vdso functions, nor how to accurately
convert clock cycles to human-readable time.


Regarding my take on $subject: for the sake of simplicity of this PoC, I added
the instrumentation to log_statement_stats. Despite being reported in the
context of a query log entry, these really are accumulated stats.

The patch also updates pg_stat_get_waitaccum() so that the accumulated wait
events can be reported from your interactive or batch session.

So using my previous fe-time demo client, you can test it using:

  PGOPTIONS="--log_statement_stats=on" ./fe-time 100

From logs, I now have (notice the last line):

  LOG:  duration: 3837.194 ms  statement: SELECT * FROM pgbench_accounts
  LOG:  QUERY STATISTICS
  DETAIL:  ! system usage stats:
   !   0.087444 s user, 0.002106 s system, 3.837202 s elapsed
   !   [0.087444 s user, 0.003974 s system total]
   !   25860 kB max resident size
   !   0/0 [0/0] filesystem blocks in/out
   !   0/303 [0/697] page faults/reclaims, 0 [0] swaps
   !   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
   !   4/18 [5/18] voluntary/involuntary context switches
   !   Client/ClientWrite 4 calls, 3747102 us elapsed


Using pgbench scale factor 10, the copy query for pgbench_accounts looks like:

  LOG:  duration: 2388.081 ms  statement: copy pgbench_accounts from stdin
  LOG:  QUERY STATISTICS
  DETAIL:  ! system usage stats:
   !   1.373756 s user, 0.252860 s system, 2.388100 s elapsed
   !   [1.397015 s user, 0.264951 s system total]
   !   37788 kB max resident size
   !   0/641584 [0/642056] filesystem blocks in/out
   !   194/4147 [195/4728] page faults/reclaims, 0 [0] swaps
   !   0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
   

Re: [Proposal] Add accumulated statistics for wait event

2021-06-11 Thread Andres Freund
Hi,

On 2021-06-05 00:53:44 +0200, Jehan-Guillaume de Rorthais wrote:
> From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001
> From: Jehan-Guillaume de Rorthais 
> Date: Tue, 1 Jun 2021 13:25:57 +0200
> Subject: [PATCH 1/2] Add pg_stat_waitaccum view.
> 
> pg_stat_waitaccum shows counts and duration of each wait events.
> Each backend/backgrounds counts and measures the time of wait event
> in every pgstat_report_wait_start and pgstat_report_wait_end. They
> store those info into their local variables and send to Statistics
> Collector. We can get those info via Statistics Collector.
> 
> For reducing overhead, I implemented statistic hash instead of
> dynamic hash. I also implemented track_wait_timing which
> determines wait event duration is collected or not.

I object to adding this overhead. The whole selling point for wait
events was that they are low overhead. I since spent time reducing the
overhead further, because even just the branches for checking if
track_activity is enabled are measurable (225a22b19ed).


> From ddb1adc5cd9acc9bc9de16d0cf057124b09fe1e3 Mon Sep 17 00:00:00 2001
> From: Jehan-Guillaume de Rorthais 
> Date: Fri, 4 Jun 2021 18:14:51 +0200
> Subject: [PATCH 2/2] [POC] Change measuring method of wait event time from
>  INSTR_TIME to rdtsc.
> 
> This patch changes measuring method of wait event time from INSTR_TIME (which
> uses gettimeofday or clock_gettime) to rdtsc. This might reduce the overhead
> of measuring overhead.
> 
> Any supports like changing clock cycle to actual time or error handling are
> not currently implemented.

rdtsc is a serializing (*) instruction - that's the expensive part. On linux
clock_gettime() doesn't actually need a syscall. While the vdso call
implies a bit of overhead over a raw rdtsc, it's a relatively small part
of the overhead.  See
https://www.postgresql.org/message-id/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de

Greetings,

Andres Freund

(*) it's not fully serializing, iirc it allows later instructions to be
started, but it does wait for all earlier in-flight instructions to
finish.




Re: [Proposal] Add accumulated statistics for wait event

2021-06-04 Thread Jehan-Guillaume de Rorthais
Hi All,

A few times, I have faced a situation where a long-running query actually
includes the time the backend is waiting for the frontend to fetch all the
rows (see [1] for details). See the sample code fe-time.c and its comments in
attachment to reproduce this behavior.

There's no simple way today to pinpoint the problem in production without
advanced interactive auditing and/or system tools. After studying the problem,
I believe it boils down to tracking the ClientWrite wait event, so I ended
up on this thread.

You might catch some mismatching times thanks to auto_explain as well. Using
the fe-time.c demo code with the following command:

  PGDATABASE=postgres PGHOST=::1 time ./fe-time 100

The frontend time is 10s and the reported query time is 3228.631ms, but the
last row was produced after 20.672ms:

  LOG:  duration: 3228.631 ms  plan:
Query Text: SELECT * FROM pgbench_accounts
Seq Scan on pgbench_accounts (time=0.005..20.672 rows=10 loops=1)

(Note that in contrast, through the unix socket, the backend-reported query
time is always really close to 10s.)

I re-based the existing patch (see in attachment), to look at the ClientWrite
for this exact query: 

  # SELECT wait_event, calls, times 
FROM pg_stat_get_waitaccum(NULL)
WHERE wait_event = 'ClientWrite';

   wait_event  | calls |  times  
  -+---+-
   ClientWrite | 4 | 3132266

The "time" is expressed as µs in the patch, so 3132.266ms of the total
3228.631ms is spent sending the result to the frontend. I'm not sure where are
the missing 75ms.

The pg_wait_sampling extension might help, but it requires a production restart
to install before it can be enabled. Whether the solution is sampling or
cumulative, an in-core and hot-switchable solution would be much more
convenient. But anyway, looking at pg_wait_sampling, we have a clear suspect as
well for a later run of the query:

  # SELECT event, count
FROM pg_wait_sampling_profile
WHERE queryid=4045741516911800313;

  event| count
  -+---
   ClientWrite |   309

The default profile period of pg_wait_sampling being 10ms, we can roughly
estimate ClientWrite at around 3090ms. This is close enough, given that we know
3132266µs were accumulated across only 4 large wait events.

Finishing below.

On Mon, 3 Aug 2020 00:00:40 +0200
Daniel Gustafsson  wrote:

> > On 31 Jul 2020, at 07:23, imai.yoshik...@fujitsu.com wrote:
> >   
> >> This patch fails to apply to HEAD, please submit a rebased version.  I've
> >> marked this as as Waiting on Author.

Please, find in attachment a rebase of both patches. I did some small editing
on the way. I didn't bench them.

I'm not sure this patch is the best approach though. Consider it a motivation
to keep this discussion going. As I wrote, whether the solution is sampling or
cumulative, an in-core and hot-switchable solution would be much more
convenient. The fact is that this patch was already available and ready to
support a discussion.

Collecting and summing all wait events from all backends in the same place
makes it impossible to precisely track the wait events of a specific backend,
especially on a busy system where the numbers can quickly be buried by all the
other activity around.

I wonder if wait events should only be accumulated on the backend side, making
it possible to enable/disable them on the fly and to collect reports, e.g. in
the logs or some other output. Most of the code from these patches could be
recycled into a simpler patch implementing this.

Thoughts?

> > Sorry for my absence. Unfortunately I couldn't have time to work on this
> > patch in this cf. I believe I will be back in next cf, work on this patch
> > and also review other patches.  
> 
> No worries, it happens.  Since the thread has stalled and there is no updated
> patch I've marked this entry Returned with Feedback.  Please feel free to
> re-open a new CF entry if you return to this patch.

I volunteer to be a reviewer on this patch.

Imai-san, do you agree to add it as new CF entry?

Regards,

[1]:  The last time I had such a situation was a few weeks ago. A customer was
reporting a query being randomly slow, running below 100ms most of the time
and sometimes hitting 28s. Long story short, the number of rows was the same
(10-15k), but the result set size was 9x bigger (1MB vs 9MB). As the same query
was running fine from psql, I suspected the frontend was somehow saturated.
Tcpdump helped me compute that the throughput fell to 256kB/s after the
first 2MB of data transfer, with a very narrow TCP window. I explained to the
customer that their app probably doesn't pull the rows fast enough and that
some buffers were probably saturated on the frontend side, waiting for
the app and slowing down the whole transfer.
The developers fixed the problem by moving two field transformations (unaccent)
out of their loop fetching the rows.

>From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001
From: Jehan-Guillaume de Rorthais 

Re: [Proposal] Add accumulated statistics for wait event

2020-08-02 Thread Daniel Gustafsson
> On 31 Jul 2020, at 07:23, imai.yoshik...@fujitsu.com wrote:
> 
>> This patch fails to apply to HEAD, please submit a rebased version.  I've
>> marked this as as Waiting on Author.
> 
> Sorry for my absence. Unfortunately I couldn't have time to work on this 
> patch in this cf.
> I believe I will be back in next cf, work on this patch and also review other 
> patches.

No worries, it happens.  Since the thread has stalled and there is no updated
patch I've marked this entry Returned with Feedback.  Please feel free to
re-open a new CF entry if you return to this patch.

cheers ./daniel



RE: [Proposal] Add accumulated statistics for wait event

2020-07-30 Thread imai.yoshik...@fujitsu.com
> This patch fails to apply to HEAD, please submit a rebased version.  I've
> marked this as as Waiting on Author.

Sorry for my absence. Unfortunately I couldn't find time to work on this patch
in this CF. I believe I will be back in the next CF, work on this patch and
also review other patches.

---
Yoshikazu IMAI




Re: [Proposal] Add accumulated statistics for wait event

2020-07-01 Thread Daniel Gustafsson
Hi,

This patch fails to apply to HEAD, please submit a rebased version.  I've
marked this as Waiting on Author.

cheers ./daniel




Re: [Proposal] Add accumulated statistics for wait event

2020-04-23 Thread Atsushi Torikoshi
Hi Imai-san,

I feel your 'pg_stat_waitaccum' will help us investigate bottlenecks.
So I'd like to do some benchmarks, but unfortunately the latest v6 patch
can no longer be applied to HEAD.

Is it possible to share the latest patches?
If not, I'll make v6 applicable to HEAD.

Regards,

--
Atsushi Torikoshi

On Fri, Feb 28, 2020 at 5:17 PM imai.yoshik...@fujitsu.com <
imai.yoshik...@fujitsu.com> wrote:

> On Wed, Feb 26, 2020 at 1:39 AM, Kyotaro Horiguchi wrote:
> > Hello.  I had a brief look on this and have some comments on this.
>
> Hi, Horiguchi-san. Thank you for looking at this!
>
> > It uses its own hash implement. Aside from the appropriateness of
> > having another implement of existing tool, in the first place hash
> > works well for wide, sparse and uncertain set of keys. But they are in
> > rather a dense and narrow set with certain and fixed members. It
> > registers more than 200 entries but bucket size is 461. It would be
> > needed to avoid colliisions, but seems a bit too wasting.
>
> Yes, wait events are grouped and wait events ID are defined as a sequential
> number, starting from specified number for each group(like 0x0100U),
> thus
> keys are in a dense and narrow set.
>
> =
> #define PG_WAIT_LWLOCK  0x0100U
> #define PG_WAIT_LOCK0x0300U
> #define PG_WAIT_BUFFER_PIN  0x0400U
> #define PG_WAIT_ACTIVITY0x0500U
> ...
>
> typedef enum
> {
> WAIT_EVENT_ARCHIVER_MAIN = PG_WAIT_ACTIVITY,
> WAIT_EVENT_AUTOVACUUM_MAIN,
> ...
> WAIT_EVENT_WAL_WRITER_MAIN
> } WaitEventActivity;
> =
>
> The number 461 is the lowest number which avoids collisions in the hash for
> current wait events set. As you pointed out, there are a bit too much
> unused
> entries.
>
>
> If we prepare arrays for each wait classes with appropriate size which just
> fits the number of each wait event class, we can store wait event
> statistics
> into those arrays with no more wastes.
> Also we calculate hash number by "(wait event ID) % (bucket size)" in hash
> case, while we calculate index of arrays by "(wait event ID) - (wait event
> class id)"
> in array case. The latter one's cost is lower than the former one.
>
> Currently I implement wait event statistics store by hash because of its
> easiness of implementation, but I think it is good to implement it by array
> for above reasons. One concern is that we put restrictions on wait events
> that it must be defined as it is sequenced in its wait class so that there
> are no unused entries in the array.
>
>
> > It seems trying to avoid doing needless work when the feature is not
> > activated by checking "if (wa_hash==NULL)", but the hash is created
> > being filled with possible entries regardless of whether
> > track_wait_timing is on or off.
>
> This might be bad implementation but there are the cases "wa_hash = NULL"
> where pgstat_init() is not called like when executing "initdb". I insert
> that check for avoiding unexpected crash in those cases.
>
> I also noticed debug codes exist around that code...I will modify it.
>
> > As the result
> > pgstat_report_waitaccum_end calls pgstat_get_wa_entry tremendously
> > frequently. This should be the reason for the recent benchmark result.
> > I'm not sure such frequency of hash-searching is acceptable even if
> > the feature is turned on.
>
> track_wait_timing parameter determines whether we collect wait time.
> Regardless of that parameter, we always collect wait count every wait event
> happens. I think calling pgstat_get_wa_entry frequently is not critical to
> performance. From pavel's benchmark results, if track_wait_timing parameter
> is off, there are +/-1.0% performance difference between patched and
> unpatched
> and it is just considered as a measurement error.
>
>
> Thanks
> --
> Yoshikazu Imai
>
>
>


RE: [Proposal] Add accumulated statistics for wait event

2020-02-28 Thread imai.yoshik...@fujitsu.com
On Wed, Feb 26, 2020 at 1:39 AM, Kyotaro Horiguchi wrote: 
> Hello.  I had a brief look on this and have some comments on this.

Hi, Horiguchi-san. Thank you for looking at this!

> It uses its own hash implement. Aside from the appropriateness of
> having another implement of existing tool, in the first place hash
> works well for wide, sparse and uncertain set of keys. But they are in
> rather a dense and narrow set with certain and fixed members. It
> registers more than 200 entries but bucket size is 461. It would be
> needed to avoid colliisions, but seems a bit too wasting.

Yes, wait events are grouped and wait event IDs are defined as sequential
numbers, starting from a specified number for each group (like 0x0100U), thus
the keys are in a dense and narrow set.

=
#define PG_WAIT_LWLOCK      0x0100U
#define PG_WAIT_LOCK        0x0300U
#define PG_WAIT_BUFFER_PIN  0x0400U
#define PG_WAIT_ACTIVITY    0x0500U
...

typedef enum 
{
WAIT_EVENT_ARCHIVER_MAIN = PG_WAIT_ACTIVITY,
WAIT_EVENT_AUTOVACUUM_MAIN,
...
WAIT_EVENT_WAL_WRITER_MAIN
} WaitEventActivity;
=

The number 461 is the lowest number which avoids collisions in the hash for the
current set of wait events. As you pointed out, there are a few too many unused
entries.


If we prepare arrays for each wait class, sized to just fit the number of wait
events in that class, we can store wait event statistics into those arrays with
no waste.
Also, we compute the hash number by "(wait event ID) % (bucket size)" in the
hash case, while we compute the array index by "(wait event ID) - (wait event
class ID)" in the array case. The latter is cheaper than the former.

Currently I implement the wait event statistics store with a hash because of
its ease of implementation, but I think it is better to implement it with
arrays for the above reasons. One concern is that this puts a restriction on
wait events: each one must be defined sequentially within its wait class so
that there are no unused entries in the array.
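
As a minimal sketch of that array approach (illustrative names, not code from
the patch, shown for the Activity class only):

  typedef struct WaitAccumEntry
  {
      uint64      calls;
      uint64      times;      /* accumulated wait time, in microseconds */
  } WaitAccumEntry;

  /* one slot per member of WaitEventActivity, no collisions, no holes */
  static WaitAccumEntry activity_stats[
      WAIT_EVENT_WAL_WRITER_MAIN - PG_WAIT_ACTIVITY + 1];

  static inline WaitAccumEntry *
  get_activity_entry(uint32 wait_event_info)
  {
      /* index = (wait event ID) - (wait event class ID), no hashing */
      return &activity_stats[wait_event_info - PG_WAIT_ACTIVITY];
  }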


> It seems trying to avoid doing needless work when the feature is not
> activated by checking "if (wa_hash==NULL)", but the hash is created
> being filled with possible entries regardless of whether
> track_wait_timing is on or off.

This might be a bad implementation, but there are cases where "wa_hash == NULL"
because pgstat_init() is not called, like when executing "initdb". I inserted
that check to avoid unexpected crashes in those cases.

I also noticed some debug code exists around that code... I will modify it.

> As the result
> pgstat_report_waitaccum_end calls pgstat_get_wa_entry tremendously
> frequently. This should be the reason for the recent benchmark result.
> I'm not sure such frequency of hash-searching is acceptable even if
> the feature is turned on.

The track_wait_timing parameter determines whether we collect wait time.
Regardless of that parameter, we always increment the wait count every time a
wait event happens. I think calling pgstat_get_wa_entry frequently is not
critical to performance. From Pavel's benchmark results, if the
track_wait_timing parameter is off, there is a +/-1.0% performance difference
between patched and unpatched, which can just be considered a measurement error.


Thanks
--
Yoshikazu Imai




Re: [Proposal] Add accumulated statistics for wait event

2020-02-25 Thread Kyotaro Horiguchi
Hello.  I had a brief look on this and have some comments on this.

At Tue, 25 Feb 2020 07:53:26 +, "imai.yoshik...@fujitsu.com" 
 wrote in 
> Thanks for Wang's mail, I noticed my 0002 patch was wrong from v3.
> 
> Here, I attach correct patches.
> 
> Also I will begin to do some benchmark with higher scale and higher number of
> users and try to change stats reporting implementation to not affect
> performance, which I couldn't have not started because of another tasks.

It uses its own hash implementation. Aside from the appropriateness of having
another implementation of an existing tool, a hash works well for a wide,
sparse and uncertain set of keys in the first place. But wait events are rather
a dense and narrow set with certain and fixed members. It registers more than
200 entries but the bucket size is 461. That may be needed to avoid collisions,
but it seems a bit too wasteful.

It seems to try to avoid doing needless work when the feature is not activated
by checking "if (wa_hash==NULL)", but the hash is created and filled with all
possible entries regardless of whether track_wait_timing is on or off. As a
result, pgstat_report_waitaccum_end calls pgstat_get_wa_entry tremendously
frequently. This should be the reason for the recent benchmark results. I'm not
sure such a frequency of hash searching is acceptable even when the feature is
turned on.

I think we need a smarter mapping scheme of events to entries.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [Proposal] Add accumulated statistics for wait event

2020-02-25 Thread 王胜利
Thank you for this update! I will try it.

Best regards,
Victor wang

On 02/25/2020 15:53, imai.yoshik...@fujitsu.com wrote:
On Fri, Feb 14, 2020 at 11:59 AM, 王胜利 wrote:
>I am glad to know you are working on PG accumulated statistics 
> feature, and I am interested on it.
>I see these two patch file you made, can you let me know which branch 
> of PG code based?
>
>when I use this:  https://github.com/postgres/postgres/commits/master, 
> and apply these patches, report some error.

Thanks for Wang's mail, I noticed my 0002 patch was wrong from v3.

Here, I attach correct patches.

Also, I will begin doing some benchmarks with a higher scale and a higher number
of users, and try to change the stats reporting implementation so it does not
affect performance, which I couldn't start earlier because of other tasks.

--
Yoshikazu Imai


RE: [Proposal] Add accumulated statistics for wait event

2020-02-24 Thread imai.yoshik...@fujitsu.com
On Fri, Feb 14, 2020 at 11:59 AM, 王胜利 wrote: 
>I am glad to know you are working on PG accumulated statistics 
> feature, and I am interested on it.
>I see these two patch file you made, can you let me know which branch 
> of PG code based?
>
>when I use this:  https://github.com/postgres/postgres/commits/master, 
> and apply these patches, report some error.

Thanks for Wang's mail, I noticed my 0002 patch was wrong from v3.

Here, I attach correct patches.

Also, I will begin doing some benchmarks with a higher scale and a higher number
of users, and try to change the stats reporting implementation so it does not
affect performance, which I couldn't start earlier because of other tasks.

--
Yoshikazu Imai


0001-Add-pg_stat_waitaccum-view-v6.patch
Description: 0001-Add-pg_stat_waitaccum-view-v6.patch


0002-POC-Change-measuring-method-of-wait-event-time-fr-v6.patch
Description:  0002-POC-Change-measuring-method-of-wait-event-time-fr-v6.patch


RE: [Proposal] Add accumulated statistics for wait event

2020-02-12 Thread imai.yoshik...@fujitsu.com
On Wed, Feb 12, 2020 at 5:42 AM, Craig Ringer wrote:
> > It seems performance difference is big in case of read only tests. The 
> > reason is that write time is relatively longer than the
> > processing time of the logic I added in the patch.
> 
> That's going to be a pretty difficult performance hit to justify.
> 
> Can we buffer collected wait events locally and spit the buffer to the
> stats collector at convenient moments? We can use a limited buffer
> size with an overflow flag, so we degrade the results rather than
> falling over or forcing excessive stats reporting at inappropriate
times.

IIUC, currently each backend collects wait events locally. When a backend goes
idle (telling the frontend that it is ready for query), it reports its wait
event statistics to the stats collector. The interval between reports is at
least PGSTAT_STAT_INTERVAL (default 500ms). A backend also reports when it
exits.

So if we run the read-only test with 50 clients, each of the 50 backends
reports wait event statistics to the stats collector roughly every 500ms. If
that causes performance degradation, we can improve performance by letting
backends report their statistics, for example, only at backend exit.

(I think I can easily test this by building postgres with setting
PGSTAT_STAT_INTERVAL to a large value >> 500ms.)
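
A sketch of that pacing as I understand it (hypothetical names, the actual
patch code may differ):

  static TimestampTz last_waitaccum_report = 0;

  static void
  pgstat_maybe_report_waitaccum(bool force)
  {
      TimestampTz now = GetCurrentTimestamp();

      /* report when going idle, but at most once per PGSTAT_STAT_INTERVAL;
       * "force" is used at backend exit */
      if (!force &&
          !TimestampDifferenceExceeds(last_waitaccum_report, now,
                                      PGSTAT_STAT_INTERVAL))
          return;

      last_waitaccum_report = now;
      /* ... pack the local counters into a message and send it to the
       * stats collector ... */
  }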

> I suggest that this is also a good opportunity to add some more
> tracepoints to PostgreSQL. The wait events facilities are not very
> traceable right now.

Does that mean we would add TRACE_POSTGRESQL_ tracepoints before/after every
pgstat_report_wait_start?

> That way we have a zero-overhead-when-unused option that can also be
> used to aggregate the information per-query, per-user, etc.

I see. That way, we get zero overhead when DTrace is not enabled, and what we
measure is more customizable.

I am also curious what the overhead would be if we implemented wait event
statistics with DTrace scripts, though I can't guess, because I haven't used
DTrace.


--
Yoshikazu Imai



Re: [Proposal] Add accumulated statistics for wait event

2020-02-11 Thread Craig Ringer
On Wed, 12 Feb 2020 at 12:36, imai.yoshik...@fujitsu.com
 wrote:

> It seems performance difference is big in case of read only tests. The reason 
> is that write time is relatively longer than the
> processing time of the logic I added in the patch.

That's going to be a pretty difficult performance hit to justify.

Can we buffer collected wait events locally and spit the buffer to the
stats collector at convenient moments? We can use a limited buffer
size with an overflow flag, so we degrade the results rather than
falling over or forcing excessive stats reporting at inappropriate
times.
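
A minimal sketch of such local buffering, assuming a fixed-size per-backend
array and an overflow flag (purely illustrative, none of these names exist in
the patches):

  #define WAIT_EVENT_BUF_SIZE 1024

  typedef struct WaitEventSample
  {
      uint32      wait_event_info;
      uint64      duration_us;
  } WaitEventSample;

  static WaitEventSample wait_buf[WAIT_EVENT_BUF_SIZE];
  static int  wait_buf_used = 0;
  static bool wait_buf_overflowed = false;

  static inline void
  record_wait_event(uint32 wait_event_info, uint64 duration_us)
  {
      if (wait_buf_used >= WAIT_EVENT_BUF_SIZE)
      {
          wait_buf_overflowed = true;   /* degrade the results, don't block */
          return;
      }
      wait_buf[wait_buf_used].wait_event_info = wait_event_info;
      wait_buf[wait_buf_used].duration_us = duration_us;
      wait_buf_used++;
  }

The buffer would then be spat out to the stats collector at a convenient
moment (e.g. when the backend goes idle) and reset, with the overflow flag
reported alongside it.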

I suggest that this is also a good opportunity to add some more
tracepoints to PostgreSQL. The wait events facilities are not very
traceable right now. Exposing some better TRACE_POSTGRESQL_
tracepoints (SDTs) via probes.d would help users collect better
information using external tools like perf, bpftrace and systemtap.
That way we have a zero-overhead-when-unused option that can also be
used to aggregate the information per-query, per-user, etc.

(I really need to add a bunch more tracepoints to make this easier...)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise




RE: [Proposal] Add accumulated statistics for wait event

2020-02-11 Thread imai.yoshik...@fujitsu.com
On Sat, Feb 1, 2020 at 5:50 AM, Pavel Stehule wrote:
> today I run 120 5minutes pgbench tests to measure impact of this patch. 
> Result is attached.
...
> Thanks to Tomas Vondra and 2ndq for hw for testing

Thank you for doing a lot of these benchmarks!

> The result is interesting - when I run pgbench in R/W mode, then I got +/- 1% 
> changes in performance. Isn't important if
> tracking time is active or not (tested on Linux). In this mode the new code 
> is not on critical path.

It seems the performance difference is big in the case of read-only tests. The
reason is that the write time is relatively long compared to the processing
time of the logic I added in the patch, so the overhead is hidden in the
read/write tests.

> Looks so for higher scale than 5 and higher number of users 50 the results 
> are not too much stable (for read only tests - I
> repeated tests) and there overhead is about 10% from 60K tps to 55Ktps - 
> maybe I hit a hw limits (it running with 4CPU)

Yes, I suspect some other bottleneck is being hit and that makes the results
unstable. However, it may be better to investigate what actually happened and
why performance increased/decreased by over 10%. I will inspect it.


Also, I attach v5 patches, updated to correspond to other committed patches.

--
Yoshikazu Imai


0001-Add-pg_stat_waitaccum-view-v5.patch
Description: 0001-Add-pg_stat_waitaccum-view-v5.patch


0002-POC-Change-measuring-method-of-wait-event-time-fr-v5.patch
Description:  0002-POC-Change-measuring-method-of-wait-event-time-fr-v5.patch


Re: [Proposal] Add accumulated statistics for wait event

2020-02-01 Thread Pavel Stehule
On Sat, Feb 1, 2020 at 12:34, Tomas Vondra wrote:

> This patch was in WoA, but that was wrong I think - we got a patch on
> January 15, followed by a benchmark by Pavel Stehule, so I think it
> should still be in "needs review". So I've updated it and moved it to
> the next CF.
>

currently this patch needs a rebase

Pavel


>
> regards
>
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [Proposal] Add accumulated statistics for wait event

2020-02-01 Thread Tomas Vondra

This patch was in WoA, but that was wrong I think - we got a patch on
January 15, followed by a benchmark by Pavel Stehule, so I think it
should still be in "needs review". So I've updated it and moved it to
the next CF.


regards


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Add accumulated statistics for wait event

2020-01-31 Thread Pavel Stehule
Hi

On Wed, Jan 15, 2020 at 14:15, Imai Yoshikazu wrote:

> On 2020/01/13 4:11, Pavel Stehule wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:   tested, passed
> > Spec compliant:   not tested
> > Documentation:tested, passed
> >
> > I like this patch, because I used similar functionality some years ago
> very successfully. The implementation is almost simple, and the result
> should be valid by used method.
>
> Thanks for your review!
>
> > The potential problem is performance impact. Very early test show impact
> cca 3% worst case, but I'll try to repeat these tests.
>
> Yes, performance impact is the main concern. I want to know how it
> affects performance in various test cases or on various environments.
>
> > There are some ending whitespaces and useless tabs.
> >
> > The new status of this patch is: Waiting on Author
> I attach v4 patches removing those extra whitespaces of the end of lines
> and useless tabs.
>

Today I ran 120 five-minute pgbench tests to measure the impact of this patch.
The results are attached.

test

PSQL="/usr/local/pgsql/bin/psql"
PGBENCH="/usr/local/pgsql/bin/pgbench"
export PGDATABASE=postgres

echo "*** START ***" > ~/result.txt

for i in 1 5 10 50 100
do
  echo "scale factor $i" >> ~/result.txt

  $PSQL -c "create database bench$i"
  $PGBENCH -i -s $i "bench$i"

  for c in 1 5 10 50
  do
$PGBENCH -c $c -T 300 "bench$i" >> ~/result.txt
  done

  $PSQL -c "vacuum full" "bench$i"
  $PSQL -c "vacuum analyze" "bench$i"

  for c in 1 5 10 50
  do
$PGBENCH -S -c $c -T 300 "bench$i" >> ~/result.txt
  done

  $PSQL -c "drop database bench$i"
done

Tested on computer with 4CPU, 8GB RAM - configuration: shared buffers 2GB,
work mem 20MB

The result is interesting - when I run pgbench in R/W mode, I get +/- 1%
changes in performance. It doesn't matter whether tracking time is active or
not (tested on Linux). In this mode the new code is not on the critical path.

More interesting results come from the read-only tests, where some larger
differences are visible.

For scale 5 and 50 users, tracking time increases performance by about 12%
(I got the same result for scale/users 10/50); in the other direction, the
patched build decreases performance by about 10% for 50/50 (with and without
tracking time) and for 100/5.

It looks like, for scales higher than 5 and 50 users, the results are not very
stable (for the read-only tests - I repeated them), and the overhead is about
10%, from 60K tps down to 55K tps - maybe I hit a hw limit (it was running with
4 CPUs).

Thanks to Tomas Vondra and 2ndq for hw for testing

Regards

Pavel

 wait_event_type |  wait_event   |   calls|times
-+---++--
 Client  | ClientRead| 1489681408 | 221616362961
 Lock| transactionid |  103113369 |  71918794185
 LWLock  | WALWriteLock  |  104781468 |  20865855903
 Lock| tuple |   21323744 |  15800875242
 IO  | WALSync   |   50862170 |   8666988491
 LWLock  | lock_manager  |   18415423 |575308266
 IO  | WALWrite  |   51482764 |205775892
 LWLock  | buffer_content|   15385387 |168446128
 LWLock  | wal_insert|1502092 | 90019731
 IPC | ProcArrayGroupUpdate  | 178238 | 46527364
 LWLock  | ProcArrayLock | 587356 | 13298246
 IO  | DataFileExtend|2715557 | 11615216
 IPC | ClogGroupUpdate   |  54319 | 10622013
 IO  | DataFileRead  |5805298 |  9596545
 IO  | SLRURead  |9518930 |  7166217
 LWLock  | CLogControlLock   | 746759 |  6783602





> --
> Yoshikazu Imai
>


pgbench.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: [Proposal] Add accumulated statistics for wait event

2020-01-15 Thread Imai Yoshikazu
On 2020/01/13 4:11, Pavel Stehule wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:tested, passed
> 
> I like this patch, because I used similar functionality some years ago very 
> successfully. The implementation is almost simple, and the result should be 
> valid by used method.

Thanks for your review!

> The potential problem is performance impact. Very early test show impact cca 
> 3% worst case, but I'll try to repeat these tests.

Yes, performance impact is the main concern. I want to know how it 
affects performance in various test cases or on various environments.

> There are some ending whitespaces and useless tabs.
> 
> The new status of this patch is: Waiting on Author
I attach v4 patches removing those extra whitespaces at the ends of lines
and the useless tabs.

--
Yoshikazu Imai
From b009b1f8f6be47ae61b5e4538e2730d721ee60db Mon Sep 17 00:00:00 2001
From: "imai.yoshikazu" 
Date: Wed, 15 Jan 2020 09:13:19 +
Subject: [PATCH v4 1/2] Add pg_stat_waitaccum view.

pg_stat_waitaccum shows counts and duration of each wait events.
Each backend/backgrounds counts and measures the time of wait event
in every pgstat_report_wait_start and pgstat_report_wait_end. They
store those info into their local variables and send to Statistics
Collector. We can get those info via Statistics Collector.

For reducing overhead, I implemented statistic hash instead of
dynamic hash. I also implemented track_wait_timing which
determines wait event duration is collected or not.

On windows, this function might be not worked correctly, because
now it initializes local variables in pg_stat_init which is not
passed to fork processes on windows.
---
 src/backend/catalog/system_views.sql  |   8 +
 src/backend/postmaster/pgstat.c   | 344 ++
 src/backend/storage/lmgr/lwlock.c |  19 ++
 src/backend/utils/adt/pgstatfuncs.c   |  80 ++
 src/backend/utils/misc/guc.c  |   9 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/catalog/pg_proc.dat   |   9 +
 src/include/pgstat.h  | 123 -
 src/include/storage/lwlock.h  |   1 +
 src/include/storage/proc.h|   1 +
 src/test/regress/expected/rules.out   |   5 +
 11 files changed, 598 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 773edf8..80f2caa 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -957,6 +957,14 @@ CREATE VIEW pg_stat_bgwriter AS
 pg_stat_get_buf_alloc() AS buffers_alloc,
 pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_waitaccum AS
+SELECT
+   S.wait_event_type AS wait_event_type,
+   S.wait_event AS wait_event,
+   S.calls AS calls,
+   S.times AS times
+   FROM pg_stat_get_waitaccum(NULL) AS S;
+
 CREATE VIEW pg_stat_progress_vacuum AS
 SELECT
 S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 51c486b..08e10ad 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -123,6 +123,7 @@
  */
 bool   pgstat_track_activities = false;
 bool   pgstat_track_counts = false;
+bool   pgstat_track_wait_timing = false;
 intpgstat_track_functions = TRACK_FUNC_OFF;
 intpgstat_track_activity_query_size = 1024;
 
@@ -153,6 +154,10 @@ static time_t last_pgstat_start_time;
 
 static bool pgStatRunningInCollector = false;
 
+WAHash *wa_hash;
+
+instr_time waitStart;
+
 /*
  * Structures in which backends store per-table info that's waiting to be
  * sent to the collector.
@@ -255,6 +260,7 @@ static int  localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_WaitAccumStats waitAccumStats;
 
 /*
  * List of OIDs of databases we need to write out.  If an entry is InvalidOid,
@@ -280,6 +286,8 @@ static pid_t pgstat_forkexec(void);
 #endif
 
 NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]) 
pg_attribute_noreturn();
+static void pgstat_init_waitaccum_hash(WAHash **hash);
+static PgStat_WaitAccumEntry *pgstat_add_wa_entry(WAHash *hash, uint32 key);
 static void pgstat_beshutdown_hook(int code, Datum arg);
 
 static PgStat_StatDBEntry *pgstat_get_db_entry(Oid databaseid, bool create);
@@ -287,8 +295,11 @@ static PgStat_StatTabEntry 
*pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry,

 Oid tableoid, bool create);
 static void 

Re: [Proposal] Add accumulated statistics for wait event

2020-01-12 Thread Pavel Stehule
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I like this patch, because I used similar functionality some years ago very
successfully. The implementation is fairly simple, and the result should be
valid given the method used.

The potential problem is the performance impact. Very early tests show an impact
of about 3% in the worst case, but I'll try to repeat these tests.

There are some ending whitespaces and useless tabs.

The new status of this patch is: Waiting on Author


RE: [Proposal] Add accumulated statistics for wait event

2018-11-12 Thread Yotsunaga, Naoki
On Sun, Oct 28, 2018 at 6:39 PM, legrand legrand wrote:

Hi, thanks for the comments.
I had overlooked your reply.

>You are right, sampling has to be "tuned" regarding the event(s) you want to 
>catch.
 I see. For tuning, do you need to know the length of the processing you want
to sample?
 
> May I invite you to try it, using PASH-viewer (github) with pgsentinel 
> (github).
 I'll check and try it.

--
Naoki Yotsunaga





RE: [Proposal] Add accumulated statistics for wait event

2018-11-01 Thread Yotsunaga, Naoki
On Mon, Oct 29, 2018 at 1:52 AM, Phil Florent wrote:

Hi, thank you for comments.

>Yes you will be able to solve bottlenecks with sampling. In interactive mode, 
>a 1s interval is probably too large. I use 0s1 - 0s01 with my tool and it is 
>normally OK.

With the tool you are using, can you sample at intervals shorter than 1 second?
If you can, you can get a sufficient number of samples and also capture short
events.

>Since grafana is now able to connect directly to a postgresql source, I use it 
>to display the information collected from pg_stat_activity and psutil ( e.g 
>https://pgphil.ovh/traqueur_dashboard_02.php page is written in french but 
>panels are in english)

Being able to visualize it is wonderful,
especially for beginners like me.


>Other DB have accumulated statistics but you can notice that sampling is also 
>their most modern method.
>E.g Oracle DB : 20 years ago you already had tools like "utlbstat/utlestat" . 
>Then you had "statspack". Those tools were based on accumulated statistics and 
>the reports were based on differences between 2 points. It was useful to solve 
>major problems but it was limited and not precise enough in many cases.

>The preferred feature to identify bottlenecks in the Oracle world is now ASH 
>(active session history). It can help with major problems, specific problems 
>AND it can identify short blockages.
>Too bad it is licensed as an option of their Enterprise Edition but similar 
>tools exist and they are also based on sampling of the activity.

>With the "official" ASH, sampling and archiving are done internally and you 
>have a circular memory zone dedicated to the feature. Hence the overhead is 
>lower but that's all.

>The most advanced interactive tool is called "snapper" and it is also based on 
>sampling.

Thanks. I will check it.

With the current bottleneck survey method, sampling, I can learn the number
(ratio) of wait events, and then investigate those with a high count (ratio).
Do you agree with this understanding?


---

Naoki, Yotsunaga.


RE: [Proposal] Add accumulated statistics for wait event

2018-10-29 Thread Phil Florent
Hi,

Is DBA really able to solve bottlenecks with sampling?

What I would like to say is that if we have information on the number of wait 
events and the wait time(like other DB), we can investigate more easily.

Yes you will be able to solve bottlenecks with sampling. In interactive mode, a 
1s interval is probably too large. I use 0s1 - 0s01 with my tool and it is 
normally OK. In batch mode I use 1s=>10s. If you want to visualize the results 
it's easy to use a dedicated tool and bottlenecks will clearly appear .
Since grafana is now able to connect directly to a postgresql source, I use it 
to display the information collected from pg_stat_activity and psutil ( e.g 
https://pgphil.ovh/traqueur_dashboard_02.php page is written in french but 
panels are in english)

Other DB have accumulated statistics but you can notice that sampling is also 
their most modern method.
E.g Oracle DB : 20 years ago you already had tools like "utlbstat/utlestat" . 
Then you had "statspack". Those tools were based on accumulated statistics and 
the reports were based on differences between 2 points. It was useful to solve 
major problems but it was limited and not precise enough in many cases.

The preferred feature to identify bottlenecks in the Oracle world is now ASH 
(active session history). It can help with major problems, specific problems 
AND it can identify short blockages.
Too bad it is licensed as an option of their Enterprise Edition but similar 
tools exist and they are also based on sampling of the activity.

With the "official" ASH, sampling and archiving are done internally and you 
have a circular memory zone dedicated to the feature. Hence the overhead is 
lower but that's all.

The most advanced interactive tool is called "snapper" and it is also based on 
sampling.

Best regards
Phil



From: Yotsunaga, Naoki
Sent: Monday, October 29, 2018 02:20
To: 'Phil Florent'; 'Michael Paquier'
Cc: 'Tomas Vondra'; 'pgsql-hackers@lists.postgresql.org'
Subject: RE: [Proposal] Add accumulated statistics for wait event


On Thu, Oct 4, 2018 at 8:22 PM, Yotsunaga Naoki wrote:



Hi, I understood and thought of your statistic comment once again. In the case 
of sampling, is there enough statistic to investigate?

In the case of a long SQL, I think that it is possible to obtain a sufficient 
sampling number.



However, in the case of about 1 minute of SQL, only 60 samples can be obtained 
at most.

#Because legard’s comment.

https://www.postgresql.org/message-id/1539158356795-0.post%40n3.nabble.com



Does this sampling number of 60 give information that I really want?

Perhaps it is not to miss the real problem part?

---

Naoki, Yotsunaga.


RE: [Proposal] Add accumulated statistics for wait event

2018-10-29 Thread legrand legrand
Hello,
You are right, sampling has to be "tuned" regarding the event(s) you want to
catch.

Sampling at a 1-second interval is fine for treatments that take hours, but
not enough for analyses at the minute or second level.

May I invite you to try it, using PASH-viewer (github) with pgsentinel
(github).

Changing the pgsentinel.c sampling from 1 second

rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
ash_sampling_period * 1000L,PG_WAIT_EXTENSION);

to 1/10 second
rc = WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
ash_sampling_period * 100L,PG_WAIT_EXTENSION);

seems the good balance for me (for analysis periods from a few seconds to
minutes).

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



RE: [Proposal] Add accumulated statistics for wait event

2018-10-28 Thread Yotsunaga, Naoki
On Thu, Oct 4, 2018 at 8:22 PM, Yotsunaga Naoki wrote:

Hi, I understood and thought about your statistics comment once again. In the
case of sampling, are there enough statistics to investigate?
In the case of a long SQL statement, I think it is possible to obtain a
sufficient number of samples.

However, in the case of an SQL statement of about 1 minute, only 60 samples can
be obtained at most.
# Because of legrand's comment.
https://www.postgresql.org/message-id/1539158356795-0.post%40n3.nabble.com

Do these 60 samples give the information that I really want?
Perhaps they miss the real problem part?
---
Naoki, Yotsunaga.


Re: [Proposal] Add accumulated statistics for wait event

2018-10-10 Thread legrand legrand
Bertrand DROUVOT wrote
> Hello Guys,
> 
> As you mentioned Oracle like active session history sampling in this
> thread, I just want to let you know that I am working on a brand new
> extension to provide this feature.
> 
> You can find the extension here: https://github.com/pgsentinel/pgsentinel
> 
> [...]
> 
> If you want to have a look, give your thoughts, you are welcome.
> 
> Bertrand


+1!

just a regret: ash sampling interval can not be smaller than a second ;o(

Cordialement
PAscal




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



RE: [Proposal] Add accumulated statistics for wait event

2018-10-05 Thread Yotsunaga, Naoki
On Thu, Oct 4, 2018 at 0:54 AM, Phil Florent wrote:


Phil, Michael, I appreciate your polite comments.
I understand it as follows.
We can find it if we shorten the sampling interval, but then a lot of
information comes out.
# The balance is important.
Also, it is no good unless we have enough samples.
And I have to do various other things.
Is my understanding correct?

If my understanding is right, it seems difficult to me.
Is a DBA really able to solve bottlenecks with sampling?
# Since I am a beginner, I feel that way. Other people may not find it
difficult.

What I would like to say is that if we have information on the number of wait
events and the wait time (like other DBs), we can investigate more easily.
Of course, I understand that it also affects performance, so I suggest a way to
switch it on and off, with the default off.
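
For what it's worth, such a switch is naturally a boolean GUC defaulting to
off, which is what the pg_stat_waitaccum patches elsewhere in this thread do
with track_wait_timing. A simplified sketch of the idea (names and details may
differ from the actual patches):

  bool        pgstat_track_wait_timing = false;   /* GUC, default off */

  typedef struct WaitAccumEntry
  {
      uint64      calls;
      uint64      times;          /* microseconds */
  } WaitAccumEntry;

  static inline void
  waitaccum_record(WaitAccumEntry *entry, uint64 elapsed_us)
  {
      entry->calls++;                             /* counting stays cheap */

      if (pgstat_track_wait_timing)               /* timing only when enabled */
          entry->times += elapsed_us;
  }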

-
Naoki, Yotsunaga.


RE: [Proposal] Add accumulated statistics for wait event

2018-10-04 Thread Phil Florent
Hi,

It's the same logic as with any polling system. An integration calculation
using the Monte Carlo method with only a few points won't be accurate enough
and can even be completely wrong, etc.
Polling is OK to troubleshoot a problem on the fly but 2 points are not enough. 
A few seconds are needed to obtain good enough data, e.g 5-10 seconds of 
polling with a 0.1=>0.01s interval between 2 queries of the activity.
Polling a few seconds while the user is waiting is normally enough to say if a 
significant part of the waits are on the database. It's very important to know 
that. With 1 hour of accumulated statistics, a DBA will always see something to 
fix. But if the user waits 10 seconds on a particular screen and 1 second is 
spent on the database it often won't directly help.
Polling gives great information with postgreSQL 10 but it was already useful to 
catch top queries etc. in older versions.
I always check if activity is adequately reported by my tool using known cases.
I want to be sure it will report things adequately in real-world
troubleshooting sessions. Sometimes there are bugs in my tool; once there was
an issue with postgres (pgstat_report_activity() was not called by workers in
parallel index creation).

Best regards
Phil

From: Michael Paquier
Sent: Thursday, October 4, 2018 12:58
To: Phil Florent
Cc: Yotsunaga, Naoki; Tomas Vondra; pgsql-hackers@lists.postgresql.org
Subject: Re: [Proposal] Add accumulated statistics for wait event

On Thu, Oct 04, 2018 at 09:32:37AM +, Phil Florent wrote:
> I am a DB beginner, so please tell me. It says that you can find
> events that are bottlenecks in sampling, but as you saw above, you can
> not find events shorter than the sampling interval, right?

Yes, which is why it would be as simple as making the interval shorter,
though not so short that it bloats the amount of information fetched,
which needs to be stored and afterwards (perhaps) treated for analysis.
This gets rather close to signal processing.  A simple image is, for
example: assuming that event A happens 100 times in an interval of 1s,
and event B only once in the same interval of 1s, then if the snapshot
interval is only 1s, in the worst case A would be treated as an equal
of B, which would be wrong.
--
Michael


Re: [Proposal] Add accumulated statistics for wait event

2018-10-04 Thread Michael Paquier
On Thu, Oct 04, 2018 at 09:32:37AM +, Phil Florent wrote:
> I am a DB beginner, so please tell me. It says that you can find
> events that are bottlenecks in sampling, but as you saw above, you can
> not find events shorter than the sampling interval, right?

Yes, which is why it would be as simple as making the interval shorter,
though not so short that it bloats the amount of information fetched,
which needs to be stored and afterwards (perhaps) treated for analysis.
This gets rather close to signal processing.  A simple image is, for
example: assuming that event A happens 100 times in an interval of 1s,
and event B only once in the same interval of 1s, then if the snapshot
interval is only 1s, in the worst case A would be treated as an equal
of B, which would be wrong.
--
Michael
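
The aliasing described here can be stated compactly: a snapshot catches a 
backend inside event E with a probability roughly equal to the fraction of 
wall-clock time spent in E, independent of how many occurrences make up that 
time. With n_E occurrences of mean duration d_E during an elapsed time T,

    P(\text{a snapshot shows } E) \approx \frac{n_E \, d_E}{T}

so event A with 100 occurrences of 1 ms per second and event B with a single 
100 ms occurrence per second both show up in roughly 10 % of snapshots; only a 
per-event counter (or an accumulated duration) tells them apart.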




RE: [Proposal] Add accumulated statistics for wait event

2018-10-04 Thread Phil Florent
Hi,
I am a DB beginner, so please tell me. It says that you can find events that 
are bottlenecks in sampling, but as you saw above, you can not find events 
shorter than the sampling interval, right?

If an event occurs frequently and is reported in pg_stat_activity, you will 
catch it again and again while sampling, no matter its duration.
Hence you just need to

  *   Sample the sessions and consider the active ones. You need to know whether 
they are waiting (PostgreSQL now provides detailed wait events) or on the CPU
  *   Optionally collect information on the system context at the time of 
sampling (CPU, memory...); it can be provided by many tools, like the psutil 
Python library for example

If the client application itself provides information, it's even more 
interesting. With something like 
program/module/action/client_info/sofar/totalwork in application_name you are 
able to focus directly on different kinds of activity. It can give you 
information like "I/O waits are meaningful for my batch activity but not for 
my OLTP activity; if my goal is to improve response time for end users I have 
to consider that."
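
As a minimal illustration of that convention (the application_name value and 
its program/module/action encoding below are made up for the example; 
PostgreSQL does not interpret them), a client can tag its session when 
connecting, and the sampling side can then group on application_name:

# Sketch: tag a client session so that samples of pg_stat_activity can be
# grouped by application context. The encoding is only an example convention.
import psycopg2

conn = psycopg2.connect(
    dbname="postgres",                           # placeholder connection settings
    application_name="batch/load_orders/step3",  # hypothetical encoded context
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT current_setting('application_name')")
    print(cur.fetchone()[0])   # -> batch/load_orders/step3
conn.close()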

Best regards
Phil


From: Yotsunaga, Naoki 
Sent: Thursday, October 4, 2018 10:31
To: 'Michael Paquier'; Phil Florent
Cc: Tomas Vondra; pgsql-hackers@lists.postgresql.org
Subject: RE: [Proposal] Add accumulated statistics for wait event

On Thu, July 26, 2018 at 1:25 AM, Michael Paquier wrote:
> Even if you have spiky workloads, sampling may miss those, but even with 
> adding counters for each event
> you would need to query the table holding the counters at an insane frequency 
> to be able to perhaps get
> something out of it as you need to do sampling of the counters as well to 
> extract deltas.

Hi, I was wondering why PostgreSQL does not have the counts and times of wait 
events that other databases such as Oracle provide, and while looking for 
related threads I got to this one.

I am a DB beginner, so please tell me. It is said that you can find the events 
that are bottlenecks by sampling, but as seen above, you cannot find events 
shorter than the sampling interval, right?
If such a short event occurs quite a number of times and adds up to a 
considerable amount of time in total, can sampling solve this problem?
# I have asked, but I could not follow much of the discussion above, and I do 
not know whether such a case can actually exist.
Also, I think it could be solved by raising the sampling frequency, but then the 
load will be high, right? I think that method is not very practical.

Moreover, I think this is not implemented because sampling is considered good 
enough, as described above, and because it affects performance.
How about making the feature switchable on/off, with the default off?
Do you care about performance degradation during a bottleneck investigation?
When investigating a bottleneck, I think it is better to find the cause even at 
the expense of some performance.
# If high-frequency sampling can detect it, I think it comes down to which is 
the heavier load: the sampling, or the feature (counting wait events and wait 
time).

Since I am a DB beginner, I may be saying strange things.
I would be glad if you could tell me.

-
Naoki Yotsunaga



RE: [Proposal] Add accumulated statistics for wait event

2018-10-04 Thread Yotsunaga, Naoki
On Thu, July 26, 2018 at 1:25 AM, Michael Paquier wrote:
> Even if you have spiky workloads, sampling may miss those, but even with 
> adding counters for each event 
> you would need to query the table holding the counters at an insane frequency 
> to be able to perhaps get
> something out of it as you need to do sampling of the counters as well to 
> extract deltas.

Hi, I was wondering why PostgreSQL does not have the counts and times of wait 
events that other databases such as Oracle provide, and while looking for 
related threads I got to this one.

I am a DB beginner, so please tell me. It is said that you can find the events 
that are bottlenecks by sampling, but as seen above, you cannot find events 
shorter than the sampling interval, right?
If such a short event occurs quite a number of times and adds up to a 
considerable amount of time in total, can sampling solve this problem?
# I have asked, but I could not follow much of the discussion above, and I do 
not know whether such a case can actually exist.
Also, I think it could be solved by raising the sampling frequency, but then the 
load will be high, right? I think that method is not very practical.

Moreover, I think this is not implemented because sampling is considered good 
enough, as described above, and because it affects performance.
How about making the feature switchable on/off, with the default off?
Do you care about performance degradation during a bottleneck investigation?
When investigating a bottleneck, I think it is better to find the cause even at 
the expense of some performance.
# If high-frequency sampling can detect it, I think it comes down to which is 
the heavier load: the sampling, or the feature (counting wait events and wait 
time).

Since I am a DB beginner, I may be saying strange things.
I would be glad if you could tell me.

-
Naoki Yotsunaga




RE: [Proposal] Add accumulated statistics for wait event

2018-07-28 Thread Phil Florent
Hi,

I agree with that. PostgreSQL 10 is really great; tuning tools based on 
sampling of pg_stat_activity became accurate without any modification.

Best regards

Phil



From: Michael Paquier 
Sent: Thursday, July 26, 2018 03:24
To: Phil Florent
Cc: Tomas Vondra; pgsql-hackers@lists.postgresql.org
Subject: Re: [Proposal] Add accumulated statistics for wait event

On Tue, Jul 24, 2018 at 04:23:03PM +, Phil Florent wrote:
> It loses non meaningful details and it's in fact a good point. In this
> example, sampling will definitely find the cause and won't cost
> resources.

The higher the sampling frequency, the more details you get, and the
more load on the instance.  So if you are able to take an infinity of
samples, where registering the same event multiple times for the same
backend also matters because its overall weight gets higher and it shows
up higher in profiles, then you would be able to converge to the set of
results that this patch adds.  The sampling method, especially its
frequency, is something controlled by the client and not the server.
Approaches like the one proposed here push the load on the server-side,
unconditionally, for *all* backends, and this has its cost.

Even if you have spiky workloads, sampling may miss those, but even with
adding counters for each event you would need to query the table holding
the counters at an insane frequency to be able to perhaps get something
out of it as you need to do sampling of the counters as well to extract
deltas.

As Tomas has mentioned up-thread, sampling is light-weight, as-is the
current design for wait events.  Even if it is not perfect because it
cannot give exact numbers, it would find bottlenecks in most
cases, and that's what matters.  If not, increasing the sampling
frequency makes things easier to detect as well.  What would be the
point of taking only one sample every checkpoint for example?

There may be a benefit in having counters, I don't know the answer to
that, though the point would be to make sure that there is a specific
set of workloads where it makes sense, still my gut feeling is that
sampling would be able to detect those anyway.

(I am not a computer scientist by training but a physicist, think fluid
dynamics and turbulence, and I had my share of random events and signal
analysis as well.  All that is just statistics with attempts to approach
reality, where sampling is a life-saver over exactitude of
measurements.)

Adding hooks is not acceptable to me either; they have a cost, and it
is not clear what benefit we would get from putting hooks in such a
place for cases other than sampling and counters...
--
Michael


Re: [Proposal] Add accumulated statistics for wait event

2018-07-26 Thread Bertrand DROUVOT
Hello Guys,

As you mentioned Oracle like active session history sampling in this
thread, I just want to let you know that I am working on a brand new
extension to provide this feature.

You can find the extension here: https://github.com/pgsentinel/pgsentinel

Basically, you could see it as sampling of pg_stat_activity (one-second
interval by default), currently providing more information:

ash_time: the sampling time
blockers:  the number of blockers
blockerpid: the pid of the blocker (if blockers = 1), the pid of
one blocker (if blockers > 1)
top_level_query: the top level statement (in case PL/pgSQL is used)
query: the statement being executed (not normalised, as it is in
pg_stat_statements, means you see the values)
cmdtype: the statement type
(SELECT,UPDATE,INSERT,DELETE,UTILITY,UNKNOWN,NOTHING)
queryid: the queryid of the statement (the one coming from
pg_stat_statements)

Thanks to the queryid field you are able to link the session activity with
the sql activity.

It's implemented as an in-memory ring buffer where samples are written at a
given (configurable) period.
Therefore, the user can see some number of recent samples depending on the
(configurable) history size.

Current caveats: in case of a high query rate per pid, you could see (I saw
it at more than 1000 queries per second) top_level_query and query not
"correlated" (query, queryid and cmdtype are still correctly linked together).
This is because those two pieces of information are currently collected
independently.

If you want to have a look, give your thoughts, you are welcome.

Bertrand

On 26 July 2018 at 03:24, Michael Paquier  wrote:

> On Tue, Jul 24, 2018 at 04:23:03PM +, Phil Florent wrote:
> > It loses non meaningful details and it's in fact a good point. In this
> > example, sampling will definitely find the cause and won't cost
> > resources.
>
> The higher the sampling frequency, the more details you get, and the
> more load on the instance.  So if you are able to take an infinity of
> samples, where registering the same event multiple times for the same
> backend also matters because its overall weight gets higher and it shows
> up higher in profiles, then you would be able to converge to the set of
> results that this patch adds.  The sampling method, especially its
> frequency, is something controlled by the client and not the server.
> Approaches like the one proposed here push the load on the server-side,
> unconditionally, for *all* backends, and this has its cost.
>
> Even if you have spiky workloads, sampling may miss those, but even with
> adding counters for each event you would need to query the table holding
> the counters at an insane frequency to be able to perhaps get something
> out of it as you need to do sampling of the counters as well to extract
> deltas.
>
> As Tomas has mentioned up-thread, sampling is light-weight, as-is the
> current design for wait events.  Even if it is not perfect because it
> cannot give exact numbers, it would find bottlenecks in most
> cases, and that's what matters.  If not, increasing the sampling
> frequency makes things easier to detect as well.  What would be the
> point of taking only one sample every checkpoint for example?
>
> There may be a benefit in having counters, I don't know the answer to
> that, though the point would be to make sure that there is a specific
> set of workloads where it makes sense, still my gut feeling is that
> sampling would be able to detect those anyway.
>
> (I am not a computer scientist by training but a physicist, think fluid
> dynamics and turbulence, and I had my share of random events and signal
> analysis as well.  All that is just statistics with attempts to approach
> reality, where sampling is a life-saver over exactitude of
> measurements.)
>
> Adding hooks is not acceptable to me either; they have a cost, and it
> is not clear what benefit we would get from putting hooks in such a
> place for cases other than sampling and counters...
> --
> Michael
>


Re: [Proposal] Add accumulated statistics for wait event

2018-07-25 Thread Michael Paquier
On Tue, Jul 24, 2018 at 04:23:03PM +, Phil Florent wrote:
> It loses non meaningful details and it's in fact a good point. In this
> example, sampling will definitely find the cause and won't cost
> resources.

The higher the sampling frequency, the more details you get, and the
more load on the instance.  So if you are able to take an infinity of
samples, where registering the same event multiple times for the same
backend also matters because its overall weight gets higher and it shows
up higher in profiles, then you would be able to converge to the set of
results that this patch adds.  The sampling method, especially its
frequency, is something controlled by the client and not the server.
Approaches like the one proposed here push the load on the server-side,
unconditionally, for *all* backends, and this has its cost.

Even if you have spiky workloads, sampling may miss those, but even with
adding counters for each event you would need to query the table holding
the counters at an insane frequency to be able to perhaps get something
out of it as you need to do sampling of the counters as well to extract
deltas.

As Tomas has mentioned up-thread, sampling is light-weight, as-is the
current design for wait events.  Even if it is not perfect because it
cannot give exact numbers, it would find bottlenecks in most
cases, and that's what matters.  If not, increasing the sampling
frequency makes things easier to detect as well.  What would be the
point of taking only one sample every checkpoint for example?

There may be a benefit in having counters, I don't know the answer to
that, though the point would be to make sure that there is a specific
set of workloads where it makes sense, still my gut feeling is that
sampling would be able to detect those anyway.

(I am not a computer scientist by training but a physicist, think fluid
dynamics and turbulence, and I had my share of random events and signal
analysis as well.  All that is just statistics with attempts to approach
reality, where sampling is a life-saver over exactitude of
measurements.)

Adding hooks is not acceptable to me either; they have a cost, and it
is not clear what benefit we would get from putting hooks in such a
place for cases other than sampling and counters...
--
Michael




RE: [Proposal] Add accumulated statistics for wait event

2018-07-24 Thread Phil Florent

Hi,


> In some cases, sampling of events cannot find the cause of an issue, because 
> it loses detailed data.
> For example, some throughput issue occurs (e.g. disk I/O), but each wait point
> lasts only a few milliseconds.


It loses non meaningful details and it's in fact a good point. In this example, 
sampling will definitely find the cause and won't cost resources.

Being as precise as possible to define a wait event is very useful but knowing 
precisely the duration of each event is less useful in terms of tuning.


Example of sampling + group by/order by percentage of activity :


./t -d 5 -o "application_name, wait_event_type" -o "application_name, 
wait_event, wait_event_type"
traqueur 2.05.00 - performance tool for PostgreSQL 9.3 => 11
INFORMATION, no connection parameters provided, connecting to dedicated 
database ...
INFORMATION, connected to dedicated database traqueur
INFORMATION, PostgreSQL version : 11
INFORMATION, sql preparation ...
INFORMATION, sql execution ...
 busy_pc | distinct_exe | application_name | wait_event_type
---------+--------------+------------------+-----------------
     206 | 8 / 103      | mperf            |
      62 | 2 / 31       | mperf            | LWLock
      20 | 3 / 10       | mperf            | IO
      12 | 1 / 6        | mperf            | Client
(4 rows)

 busy_pc | distinct_exe | application_name |      wait_event       | wait_event_type
---------+--------------+------------------+-----------------------+-----------------
     206 | 8 / 103      | mperf            |                       |
      62 | 2 / 31       | mperf            | WALWriteLock          | LWLock
      14 | 1 / 7        | mperf            | DataFileImmediateSync | IO
      12 | 1 / 6        | mperf            | ClientRead            | Client
       2 | 1 / 1        | mperf            | DataFileWrite         | IO
       2 | 1 / 1        | mperf            | DataFileRead          | IO
       2 | 1 / 1        | mperf            | WALInitWrite          | IO


No need to know the exact duration of each event to identify the 
bottleneck(s)...


Best regards

Phil




From: Tomas Vondra 
Sent: Tuesday, July 24, 2018 17:45
To: pgsql-hackers@lists.postgresql.org
Subject: Re: [Proposal] Add accumulated statistics for wait event



On 07/24/2018 12:06 PM, MyungKyu LIM wrote:
>   2018-07-23 16:53 (GMT+9), Michael Paquier wrote:
>> On Mon, Jul 23, 2018 at 04:04:42PM +0900, 임명규 wrote:
>>> This proposal is about recording additional statistics of wait events.
>
>> I have comments about your patch.  First, I don't think that you need to
>> count precisely the number of wait events triggered as usually when it
>> comes to analyzing a workload's bottleneck what counts is a periodic
>> *sampling* of events, patterns which can be fetched already from
>> pg_stat_activity and stored say in a different place.
>
> Thanks for your feedback.
>
> This proposal is not about *sampling*.
> Accumulated statistics of wait event information are useful for solving
> issues; they can measure accurate data.
>
> In some cases, sampling of events cannot find the cause of an issue, because 
> it loses detailed data.
> For example, some throughput issue occurs (e.g. disk I/O), but each wait point
> lasts only a few milliseconds.
> In this case, it is highly likely that sampling will not find the cause.
>

I think it's highly likely that it will find the cause. The idea of
sampling is that while you don't measure the timing directly, you can
infer it from the frequency of the wait events in the samples. So if you
see the backend reporting a particular wait event in 75% of samples, it
probably spent 75% of its time waiting on it.

I'm not saying sampling is perfect and it certainly is less convenient
than what you propose.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] Add accumulated statistics for wait event

2018-07-24 Thread Tomas Vondra




On 07/24/2018 12:06 PM, MyungKyu LIM wrote:

  2018-07-23 16:53 (GMT+9), Michael Paquier wrote:

On Mon, Jul 23, 2018 at 04:04:42PM +0900, 임명규 wrote:

This proposal is about recording additional statistics of wait events.
  

I have comments about your patch.  First, I don't think that you need to
count precisely the number of wait events triggered as usually when it
comes to analyzing a workload's bottleneck what counts is a periodic
*sampling* of events, patterns which can be fetched already from
pg_stat_activity and stored say in a different place.


Thanks for your feedback.

This proposal is not about *sampling*.
Accumulated statistics of wait event information are useful for solving
issues; they can measure accurate data.

In some cases, sampling of events cannot find the cause of an issue, because it 
loses detailed data.
For example, some throughput issue occurs (e.g. disk I/O), but each wait point
lasts only a few milliseconds.
In this case, it is highly likely that sampling will not find the cause.



I think it's highly likely that it will find the cause. The idea of 
sampling is that while you don't measure the timing directly, you can 
infer it from the frequency of the wait events in the samples. So if you 
see the backend reporting a particular wait event in 75% of samples, it 
probably spent 75% of its time waiting on it.


I'm not saying sampling is perfect and it certainly is less convenient 
than what you propose.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
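
That estimate is easy to check numerically. The sketch below (plain Python, no 
database involved; the 75 % share and the sample counts are arbitrary 
illustration values) draws N samples of a backend that spends a fixed share of 
its time in one wait event and shows how the observed frequency converges, with 
the usual binomial error of sqrt(p(1-p)/N):

# Sketch: how well does sampling recover the share of time spent in a wait event?
# The true share (0.75) and the sample counts are arbitrary illustration values.
import math
import random

TRUE_SHARE = 0.75   # the backend waits on event X 75% of the time

for n_samples in (60, 600, 6000):
    hits = sum(random.random() < TRUE_SHARE for _ in range(n_samples))
    estimate = hits / n_samples
    stderr = math.sqrt(TRUE_SHARE * (1 - TRUE_SHARE) / n_samples)
    print(f"{n_samples:5d} samples: estimated share = {estimate:.3f} "
          f"(expected error ~ {stderr:.3f})")

With only 60 samples (one per second over a minute), the expected error is about 
0.056, so the 75 % figure is recovered to within a few percentage points.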



Re: [Proposal] Add accumulated statistics for wait event

2018-07-24 Thread Tomas Vondra




On 07/23/2018 03:57 PM, Tom Lane wrote:

Michael Paquier  writes:

This does not need a configure switch.


It probably is there because the OP realizes that most people wouldn't
accept having this code compiled in.


What's the performance penalty?  I am pretty sure that this is
measurable as wait events are stored for a backend for each I/O
operation as well, and you are calling a C routine within an inlined
function which is designed to be light-weight, doing only a four-byte
atomic operation.


On machines with slow gettimeofday(), I suspect the cost of this
patch would be staggering.  Even with relatively fast gettimeofday,
it doesn't look acceptable for calls in hot code paths (for instance,
lwlock.c).



Yeah. I wonder if we could measure the time for a small fraction of the 
wait events, and estimate the actual duration from that.



A bigger problem is that it breaks stuff.  There are countless
calls to pgstat_report_wait_start/pgstat_report_wait_end that
assume they have no side-effects (for example, on errno) and
can never fail.  I wouldn't trust GetCurrentTimestamp() for either.
If the report_wait calls can't be dropped into code with *complete*
certainty that they're safe, that's a big cost.

Why exactly is this insisting on logging timestamps and not,
say, just incrementing a counter?  I think doing it like this
is almost certain to end in rejection.



Because the number of times you hit a wait event may not correlate with 
the time you spent waiting on it. So a simple counter is not the most 
useful thing.
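
As a two-line illustration of that mismatch, suppose one code path hits a lock 
wait 10,000 times at 0.1 ms each while another hits an I/O wait 10 times at 
1 s each:

    10\,000 \times 0.1\ \text{ms} = 1\ \text{s} \qquad \text{versus} \qquad 10 \times 1\ \text{s} = 10\ \text{s}

A bare counter ranks the first path three orders of magnitude higher, while the 
accumulated time ranks the second one ten times higher.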


regards


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: Re: [Proposal] Add accumulated statistics for wait event

2018-07-24 Thread Phil Florent
Hi,

I am skeptical about accumulated statistics.

pg_stat_activity now gives the necessary information about wait events. It can 
easily be used with a polling system that sleeps most of the time to limit the 
overhead. Measuring the duration of individual wait events is not necessary to 
know the distribution of the load.

You can aggregate the results of the polling by application, query, wait events 
or whatever you want.

I wrote a script for that, which can be used interactively or in batch mode to 
produce reports, but many solutions exist.

Best regards

Phil




From: MyungKyu LIM 
Sent: Tuesday, July 24, 2018 12:10
To: Alexander Korotkov; pgsql-hack...@postgresql.org
Cc: Woosung Sohn; DoHyung HONG
Subject: RE: Re: [Proposal] Add accumulated statistics for wait event

> On Mon, Jul 23, 2018 at 10:53 AM Michael Paquier  wrote:
>> What's the performance penalty?  I am pretty sure that this is
>> measurable as wait events are stored for a backend for each I/O
>> operation as well, and you are calling a C routine within an inlined
>> function which is designed to be light-weight, doing only a four-byte
>> atomic operation.

> Yes, the question is overhead of measuring durations of individual wait 
> events.  It has been proposed before, and there have been heated debates about 
> that (see threads [1-3]).  There doesn't seem
> to be a conclusion about this feature.  The thing to be said for sure:
> performance penalty heavily depends on OS/hardware/workload.  In some cases 
> overhead is negligible, but in other cases it appears to be huge.

Thanks for the good information.
I agree, the performance penalty exists.
But wait statistics are in demand and useful. In some cases, it is worth 
sacrificing some performance to use them.

So, what do you think about developing this as an extension? I have another 
concept proposal.
2. This feature could be implemented as an extension if hooks were provided in 
the following functions:
 - pgstat_report_wait_start
 - pgstat_report_wait_end
The feature could then be turned on/off via online configuration when necessary.

Best regards,
MyungKyu, Lim




RE: Re: [Proposal] Add accumulated statistics for wait event

2018-07-24 Thread MyungKyu LIM
> On Mon, Jul 23, 2018 at 10:53 AM Michael Paquier  wrote:
>> What's the performance penalty?  I am pretty sure that this is 
>> measurable as wait events are stored for a backend for each I/O 
>> operation as well, and you are calling a C routine within an inlined 
>> function which is designed to be light-weight, doing only a four-byte 
>> atomic operation.

> Yes, the question is overhead of measuring durations of individual wait 
> events.  It has been proposed before, and there have been heated debates about 
> that (see threads [1-3]).  There doesn't seem 
> to be a conclusion about this feature.  The thing to be said for sure:
> performance penalty heavily depends on OS/hardware/workload.  In some cases 
> overhead is negligible, but in other cases it appears to be huge.

Thanks for the good information.
I agree, the performance penalty exists.
But wait statistics are in demand and useful. In some cases, it is worth 
sacrificing some performance to use them.

So, what do you think about developing this as an extension? I have another 
concept proposal.
2. This feature could be implemented as an extension if hooks were provided in 
the following functions:
 - pgstat_report_wait_start
 - pgstat_report_wait_end
The feature could then be turned on/off via online configuration when necessary.

Best regards,
MyungKyu, Lim
 



RE: Re: [Proposal] Add accumulated statistics for wait event

2018-07-24 Thread MyungKyu LIM
 2018-07-23 16:53 (GMT+9), Michael Paquier wrote:
> On Mon, Jul 23, 2018 at 04:04:42PM +0900, 임명규 wrote:
>> This proposal is about recording additional statistics of wait events.
 
> I have comments about your patch.  First, I don't think that you need to
> count precisely the number of wait events triggered as usually when it
> comes to analyzing a workload's bottleneck what counts is a periodic
> *sampling* of events, patterns which can be fetched already from
> pg_stat_activity and stored say in a different place.

Thanks for your feedback.

This proposal is not about *sampling*. 
Accumulated statistics of wait event information are useful for solving
issues; they can measure accurate data.

In some cases, sampling of events cannot find the cause of an issue, because it 
loses detailed data.
For example, some throughput issue occurs (e.g. disk I/O), but each wait point 
lasts only a few milliseconds.
In this case, it is highly likely that sampling will not find the cause.

> This is ugly and unmaintainable style.

I'm sorry, you're right.
Please consider it a PoC.

> What's the performance penalty?

I have the same worries. I just tried pgbench several times.
Please let me know a good method for checking performance.

Best regards,
MyungKyu, Lim



Re: [Proposal] Add accumulated statistics for wait event

2018-07-23 Thread Tom Lane
Michael Paquier  writes:
> This does not need a configure switch.

It probably is there because the OP realizes that most people wouldn't
accept having this code compiled in.

> What's the performance penalty?  I am pretty sure that this is
> measurable as wait events are stored for a backend for each I/O
> operation as well, and you are calling a C routine within an inlined
> function which is designed to be light-weight, doing only a four-byte
> atomic operation.

On machines with slow gettimeofday(), I suspect the cost of this
patch would be staggering.  Even with relatively fast gettimeofday,
it doesn't look acceptable for calls in hot code paths (for instance,
lwlock.c).

A bigger problem is that it breaks stuff.  There are countless
calls to pgstat_report_wait_start/pgstat_report_wait_end that
assume they have no side-effects (for example, on errno) and
can never fail.  I wouldn't trust GetCurrentTimestamp() for either.
If the report_wait calls can't be dropped into code with *complete*
certainty that they're safe, that's a big cost.

Why exactly is this insisting on logging timestamps and not,
say, just incrementing a counter?  I think doing it like this
is almost certain to end in rejection.

regards, tom lane



Re: [Proposal] Add accumulated statistics for wait event

2018-07-23 Thread Alexander Korotkov
On Mon, Jul 23, 2018 at 10:53 AM Michael Paquier  wrote:
> What's the performance penalty?  I am pretty sure that this is
> measurable as wait events are stored for a backend for each I/O
> operation as well, and you are calling a C routine within an inlined
> function which is designed to be light-weight, doing only a four-byte
> atomic operation.

Yes, the question is overhead of measuring durations of individual
wait events.  It has been proposed before, and there have been heated
debates about that (see threads [1-3]).  There doesn't seem to be a
conclusion about this feature.  The thing to be said for sure:
performance penalty heavily depends on OS/hardware/workload.  In some
cases overhead is negligible, but in other cases it appears to be
huge.

1. https://www.postgresql.org/message-id/flat/559D4729.9080704%40postgrespro.ru
2. 
https://www.postgresql.org/message-id/flat/CA%2BTgmoYd3GTz2_mJfUHF%2BRPe-bCy75ytJeKVv9x-o%2BSonCGApw%40mail.gmail.com
3. 
https://www.postgresql.org/message-id/flat/CAG95seUAQVj09KzLwU%2Bz1B-GqdMqerzEkPFR3hn0q88XzMq-PA%40mail.gmail.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [Proposal] Add accumulated statistics for wait event

2018-07-23 Thread Egor Rogov

Hi,

that will be a great feature.


On 23.07.2018 10:53, Michael Paquier wrote:

I have comments about your patch.  First, I don't think that you need to
count precisely the number of wait events triggered as usually when it
comes to analyzing a workload's bottleneck what counts is a periodic
*sampling* of events


If I get it right, this patch is not about sampling. It gathers 
cumulative statistics, which are regrettably missing in PostgreSQL. 
That's why it should not only count the exact number of wait events; 
min time and stddev would also be helpful (as in pg_stat_statements).


--
Egor Rogov
Postgres Professional http://www.postgrespro.com




Re: [Proposal] Add accumulated statistics for wait event

2018-07-23 Thread Thomas Kellerer
> This proposal is about recording additional statistics of wait
> events. 
> The pg_stat_activity view is very useful in analysis for performance
> issues. 
> But it is difficult to get information of wait events in detail, 
> when you need to deep dive into analysis of performance. 
> It is because pg_stat_activity just shows the current wait status of
> backend. 

There is an extension that samples the information from pg_stat_activity
(similar to Oracle's ASH). 

https://github.com/postgrespro/pg_wait_sampling

Maybe it's worthwhile to combine the efforts? 







Re: [Proposal] Add accumulated statistics for wait event

2018-07-23 Thread Michael Paquier
On Mon, Jul 23, 2018 at 04:04:42PM +0900, 임명규 wrote:
> This proposal is about recording additional statistics of wait events.

You should avoid sending things in html format, text format being
recommended on those mailing lists...  The patch applies after using
patch -p0 by the way.

I would recommend that you generate your patches using "git
format-patch".  Here are general guidelines on the matter:
https://wiki.postgresql.org/wiki/Submitting_a_Patch
Please study those guidelines, those are helpful if you want to get
yourself familiar with community process.

I have comments about your patch.  First, I don't think that you need to
count precisely the number of wait events triggered as usually when it
comes to analyzing a workload's bottleneck what counts is a periodic
*sampling* of events, patterns which can be fetched already from
pg_stat_activity and stored say in a different place.  This can be
achieved easily by using a cron job with an INSERT SELECT query which
adds data on a relation storing the event counts.  I took the time to
look at your patch, and here is some feedback.

This does not need a configure switch.  I assume that what you did is
good to learn the internals of ./configure though.

There is no documentation.  What does the patch do?  What is it useful
for?

+   case PG_WAIT_IPC:
+   arrayIndex = NUM_WAIT_LWLOCK +
+   NUM_WAIT_LOCK + NUM_WAIT_BUFFER_PIN +
+   NUM_WAIT_ACTIVITY + NUM_WAIT_CLIENT +
+   NUM_WAIT_EXTENSION + eventId;
+   break;
This is ugly and unmaintainable style.  You could perhaps have
considered an enum instead.

pg_stat_get_wait_events should be a set-returning function, which you
could filter at SQL level using a PID, so no need of it as an argument.

What's the performance penalty?  I am pretty sure that this is
measurable as wait events are stored for a backend for each I/O
operation as well, and you are calling a C routine within an inlined
function which is designed to be light-weight, doing only a four-byte
atomic operation.
--
Michael
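
For reference, a minimal sketch of the kind of periodic collection suggested 
above (Python with psycopg2, meant to be run from cron; the wait_event_samples 
table and its columns are made-up names assumed to have been created 
beforehand):

# Sketch of a periodic wait-event sample collector, e.g. run once per minute
# from cron. The target table is illustrative only and must exist already:
#   CREATE TABLE wait_event_samples (
#       sample_time     timestamptz,
#       pid             integer,
#       wait_event_type text,
#       wait_event      text
#   );
import psycopg2

conn = psycopg2.connect("dbname=postgres")   # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute("""
        INSERT INTO wait_event_samples
        SELECT now(), pid, wait_event_type, wait_event
        FROM pg_stat_activity
        WHERE state = 'active' AND pid <> pg_backend_pid()
    """)
conn.close()

Deltas and aggregates can then be computed later from the stored samples with 
plain SQL.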

