Hi,

> In some cases, sampling of events cannot find the cause of an issue,
> because it loses detailed data.
> For example, a throughput issue occurs (e.g. disk I/O), but each wait
> point lasts only a few milliseconds.


It loses only non-meaningful details, which is in fact a good point. In this 
example, sampling will definitely find the cause, at almost no cost in 
resources.

Defining a wait event as precisely as possible is very useful, but knowing 
the exact duration of each event is much less useful for tuning.


Example of sampling + group by/order by percentage of activity:


./t -d 5 -o "application_name, wait_event_type" -o "application_name, wait_event, wait_event_type"
traqueur 2.05.00 - performance tool for PostgreSQL 9.3 => 11
INFORMATION, no connection parameters provided, connecting to dedicated database ...
INFORMATION, connected to dedicated database traqueur
INFORMATION, PostgreSQL version : 110000
INFORMATION, sql preparation ...
INFORMATION, sql execution ...
 busy_pc | distinct_exe | application_name | wait_event_type
---------+--------------+------------------+-----------------
     206 | 8 / 103      | mperf            |
      62 | 2 / 31       | mperf            | LWLock
      20 | 3 / 10       | mperf            | IO
      12 | 1 / 6        | mperf            | Client
(4 rows)

 busy_pc | distinct_exe | application_name |      wait_event       | wait_event_type
---------+--------------+------------------+-----------------------+-----------------
     206 | 8 / 103      | mperf            |                       |
      62 | 2 / 31       | mperf            | WALWriteLock          | LWLock
      14 | 1 / 7        | mperf            | DataFileImmediateSync | IO
      12 | 1 / 6        | mperf            | ClientRead            | Client
       2 | 1 / 1        | mperf            | DataFileWrite         | IO
       2 | 1 / 1        | mperf            | DataFileRead          | IO
       2 | 1 / 1        | mperf            | WALInitWrite          | IO
(7 rows)


No need to know the exact duration of each event to identify the 
bottleneck(s)...
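
For reference, here is a rough do-it-yourself sketch of the same sampling 
approach with plain SQL (a minimal sketch only; the wait_samples table and 
the polling loop are assumptions for illustration, traqueur's internals may 
differ):

-- Take one sample of the active backends:
CREATE TABLE wait_samples AS
    SELECT now() AS ts, application_name, wait_event_type, wait_event
    FROM pg_stat_activity
    WHERE state = 'active';

-- Keep sampling, e.g. once per second from psql:
--   INSERT INTO wait_samples
--       SELECT now(), application_name, wait_event_type, wait_event
--       FROM pg_stat_activity
--       WHERE state = 'active';
--   \watch 1

-- Rank wait events by their share of the collected samples:
SELECT round(100.0 * count(*) / sum(count(*)) OVER (), 0) AS busy_pc,
       application_name, wait_event, wait_event_type
FROM wait_samples
GROUP BY application_name, wait_event, wait_event_type
ORDER BY busy_pc DESC;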


Best regards

Phil



________________________________
From: Tomas Vondra <tomas.von...@2ndquadrant.com>
Sent: Tuesday, July 24, 2018 17:45
To: pgsql-hackers@lists.postgresql.org
Subject: Re: [Proposal] Add accumulated statistics for wait event



On 07/24/2018 12:06 PM, MyungKyu LIM wrote:
>   2018-07-23 16:53 (GMT+9), Michael Paquier wrote:
>> On Mon, Jul 23, 2018 at 04:04:42PM +0900, 임명규 wrote:
>>> This proposal is about recording additional statistics of wait events.
>
>> I have comments about your patch.  First, I don't think that you need to
>> count precisely the number of wait events triggered as usually when it
>> comes to analyzing a workload's bottleneck what counts is a periodic
>> *sampling* of events, patterns which can be fetched already from
>> pg_stat_activity and stored say in a different place.
>
> Thanks for your feedback.
>
> This proposal is not about *sampling*.
> Accumulated statistics of wait events are useful for solving issues,
> because they measure accurate data.
>
> In some cases, sampling of events cannot find the cause of an issue,
> because it loses detailed data.
> For example, a throughput issue occurs (e.g. disk I/O), but each wait
> point lasts only a few milliseconds.
> In this case, it is highly likely that sampling will not find the cause.
>

I think it's highly likely that it will find the cause. The idea of
sampling is that while you don't measure the timing directly, you can
infer it from the frequency of the wait events in the samples. So if you
see a backend report a particular wait event in 75% of samples, it
probably spent about 75% of its time waiting on it.
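
For instance, with hypothetical numbers: 1000 samples taken one second
apart (roughly a 1000 s window), 750 of them showing WALWriteLock,
suggest about 750 seconds spent waiting on that lock:

SELECT 750.0 / 1000 * 100               AS est_wait_pct,   -- 75
       750.0 / 1000 * interval '1000 s' AS est_wait_time;  -- ~750 s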

I'm not saying sampling is perfect and it certainly is less convenient
than what you propose.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
