Re: [PATCH v2 0/4] perf: Fix perf_event_attr::exclusive rotation

2020-11-09 Thread Peter Zijlstra
On Mon, Nov 02, 2020 at 06:41:43PM -0800, Andi Kleen wrote:
> On Mon, Nov 02, 2020 at 03:16:25PM +0100, Peter Zijlstra wrote:
> > On Sun, Nov 01, 2020 at 07:52:38PM -0800, Andi Kleen wrote:
> > > The main motivation is actually that the "multiple groups" algorithm
> > > in perf doesn't work all that great: it has quite a few cases where it
> > > starves groups or makes the wrong decisions. That is because it is a very
> > > difficult (likely NP-complete) problem and the kernel takes a lot
> > > of shortcuts to avoid spending too much time on it.
> > 
> > The event scheduling should be starvation free, except in the presence
> > of pinned events.
> > 
> > If you can show starvation without pinned events, it's a bug.
> > 
> > It will also always do equal or better than exclusive mode wrt PMU
> > utilization. Again, if it doesn't it's a bug.
> 
> Simple example (I think we've shown that one before):
> 
> (on skylake)
> $ cat /proc/sys/kernel/nmi_watchdog
> 0
> $ perf stat -e instructions,cycles,frontend_retired.latency_ge_2,frontend_retired.latency_ge_16 -a sleep 2
> 
>  Performance counter stats for 'system wide':
> 
>        654,514,990      instructions                     #    0.34  insn per cycle   (50.67%)
>      1,924,297,028      cycles                                                       (74.28%)
>         21,708,935      frontend_retired.latency_ge_2                                (75.01%)
>          1,769,952      frontend_retired.latency_ge_16                               (24.99%)
> 
>        2.002426541 seconds time elapsed
> 
> Both frontend_retired events should be getting 50% and the fixed events
> should be getting 100%. So several events are starved.

*should* how? Also, nothing is 0% so nothing is getting starved.

> Another similar example is trying to schedule the topdown events on Icelake
> in parallel to other groups. It works with one extra group, but breaks with two.
> 
> (on icelake)
> $ cat /proc/sys/kernel/nmi_watchdog
> 0
> $ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring},{branches,branches,branches,branches,branches,branches,branches,branches},{branches,branches,branches,branches,branches,branches,branches,branches}' -a sleep 1
> 
>  Performance counter stats for 'system wide':
> 
>         71,229,087      slots                                                   (60.65%)
>          5,066,320      topdown-bad-spec          #      7.1% bad speculation   (60.65%)
>         35,080,387      topdown-be-bound          #     49.2% backend bound     (60.65%)
>         22,769,750      topdown-fe-bound          #     32.0% frontend bound    (60.65%)
>          8,336,760      topdown-retiring          #     11.7% retiring          (60.65%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>            424,584      branches                                                (70.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
>          3,634,075      branches                                                (30.00%)
> 
>        1.001312511 seconds time elapsed
> 
> A tool using exclusive hopefully will be able to do better than this.

I don't see how, exclusive will always result in equal or worse PMU
utilization, never better.
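
For reference, the percentage in parentheses on each perf stat line is the share of the measurement window during which that event was actually counting (time running over time enabled). The following is a minimal Python sketch, assuming output laid out like the Skylake example above, that pulls out those percentages and lists any event at 0%. The names `parse_perf_stat` and `starved` are made up for illustration and are not part of perf.

```python
import re

# Matches lines such as:
#   "1,769,952  frontend_retired.latency_ge_16  (24.99%)"
# Group 1: raw count, group 2: event name, group 3: running percentage.
LINE = re.compile(r"^\s*([\d,]+)\s+(\S+).*\((\d+(?:\.\d+)?)%\)\s*$")

SAMPLE = """\
 Performance counter stats for 'system wide':

       654,514,990      instructions      #    0.34  insn per cycle   (50.67%)
     1,924,297,028      cycles                                        (74.28%)
        21,708,935      frontend_retired.latency_ge_2                 (75.01%)
         1,769,952      frontend_retired.latency_ge_16                (24.99%)

       2.002426541 seconds time elapsed
"""


def parse_perf_stat(text):
    """Return a list of (event, count, running_pct) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            count = int(m.group(1).replace(",", ""))
            rows.append((m.group(2), count, float(m.group(3))))
    return rows


def starved(rows):
    """Events that never got onto the PMU (0% running time)."""
    return [name for name, _count, pct in rows if pct == 0.0]


rows = parse_perf_stat(SAMPLE)
for name, count, pct in rows:
    print(f"{name:35s} {count:>15,d} {pct:6.2f}%")
print("starved:", starved(rows))
```

With the sample above no event is at 0%, so the starved list comes back empty. Whether the shares are distributed fairly is a separate question from whether anything is starved.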


Re: [PATCH v2 0/4] perf: Fix perf_event_attr::exclusive rotation

2020-11-02 Thread Andi Kleen
On Mon, Nov 02, 2020 at 03:16:25PM +0100, Peter Zijlstra wrote:
> On Sun, Nov 01, 2020 at 07:52:38PM -0800, Andi Kleen wrote:
> > The main motivation is actually that the "multiple groups" algorithm
> > in perf doesn't work all that great: it has quite a few cases where it
> > starves groups or makes the wrong decisions. That is because it is a very
> > difficult (likely NP-complete) problem and the kernel takes a lot
> > of shortcuts to avoid spending too much time on it.
> 
> The event scheduling should be starvation free, except in the presence
> of pinned events.
> 
> If you can show starvation without pinned events, it's a bug.
> 
> It will also always do equal or better than exclusive mode wrt PMU
> utilization. Again, if it doesn't it's a bug.

Simple example (I think we've shown that one before):

(on skylake)
$ cat /proc/sys/kernel/nmi_watchdog
0
$ perf stat -e instructions,cycles,frontend_retired.latency_ge_2,frontend_retired.latency_ge_16 -a sleep 2

 Performance counter stats for 'system wide':

       654,514,990      instructions                     #    0.34  insn per cycle   (50.67%)
     1,924,297,028      cycles                                                       (74.28%)
        21,708,935      frontend_retired.latency_ge_2                                (75.01%)
         1,769,952      frontend_retired.latency_ge_16                               (24.99%)

       2.002426541 seconds time elapsed

Both frontend_retired events should be getting 50% and the fixed events
should be getting 100%. So several events are starved.

Another similar example is trying to schedule the topdown events on Icelake
in parallel to other groups. It works with one extra group, but breaks with two.

(on icelake)
$ cat /proc/sys/kernel/nmi_watchdog
0
$ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring},{branches,branches,branches,branches,branches,branches,branches,branches},{branches,branches,branches,branches,branches,branches,branches,branches}' -a sleep 1

 Performance counter stats for 'system wide':

        71,229,087      slots                                                   (60.65%)
         5,066,320      topdown-bad-spec          #      7.1% bad speculation   (60.65%)
        35,080,387      topdown-be-bound          #     49.2% backend bound     (60.65%)
        22,769,750      topdown-fe-bound          #     32.0% frontend bound    (60.65%)
         8,336,760      topdown-retiring          #     11.7% retiring          (60.65%)
           424,584      branches                                                (70.00%)
           424,584      branches                                                (70.00%)
           424,584      branches                                                (70.00%)
           424,584      branches                                                (70.00%)
           424,584      branches                                                (70.00%)
           424,584      branches                                                (70.00%)
           424,584      branches                                                (70.00%)
           424,584      branches                                                (70.00%)
         3,634,075      branches                                                (30.00%)
         3,634,075      branches                                                (30.00%)
         3,634,075      branches                                                (30.00%)
         3,634,075      branches                                                (30.00%)
         3,634,075      branches                                                (30.00%)
         3,634,075      branches                                                (30.00%)
         3,634,075      branches                                                (30.00%)
         3,634,075      branches                                                (30.00%)

       1.001312511 seconds time elapsed

A tool using exclusive hopefully will be able to do better than this.

-Andi


Re: [PATCH v2 0/4] perf: Fix perf_event_attr::exclusive rotation

2020-11-02 Thread Peter Zijlstra
On Sun, Nov 01, 2020 at 07:52:38PM -0800, Andi Kleen wrote:
> The main motivation is actually that the "multiple groups" algorithm
> in perf doesn't work all that great: it has quite a few cases where it
> starves groups or makes the wrong decisions. That is because it is a very
> difficult (likely NP-complete) problem and the kernel takes a lot
> of shortcuts to avoid spending too much time on it.

The event scheduling should be starvation free, except in the presence
of pinned events.

If you can show starvation without pinned events, it's a bug.

It will also always do equal or better than exclusive mode wrt PMU
utilization. Again, if it doesn't it's a bug.

Please provide concrete examples for these two cases, or stop spreading
FUD like this.


Re: [PATCH v2 0/4] perf: Fix perf_event_attr::exclusive rotation

2020-11-01 Thread Andi Kleen
> hm, it's too late for me to check ;-) but should I be able to do
> this with exclusive events? Running both commands at the same time:

Yes. The exclusive part only applies during a given context,
but the two commands are different contexts.

You would only see a difference when in the same context,
and you have multiple groups (or events) that could in theory schedule
in parallel.

e.g. something like

perf stat -e '{cycles,cycles},{cycles,cycles}'  ...

The main motivation is actually that the "multiple groups" algorithm
in perf doesn't work all that great: it has quite a few cases where it
starves groups or makes the wrong decisions. That is because it is a very
difficult (likely NP-complete) problem and the kernel takes a lot
of shortcuts to avoid spending too much time on it.

With exclusive it will be possible for a tool to generate "perfect groups"
in user space and assume the kernel schedules them dumbly, but at least
without any starvation.

-Andi
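
To make the difference between the two modes concrete, here is a toy Python model of group rotation. It is only a sketch under stated assumptions and not the kernel's scheduling code: the PMU is assumed to have a fixed number of interchangeable counters, the group list is rotated by one position per tick, groups are placed greedily in list order, and placement stops at the first group that does not fit. The function name `simulate` and its parameters are hypothetical.

```python
from collections import deque


def simulate(group_sizes, counters, ticks, exclusive=False):
    """Return, per group, the fraction of ticks it spent on the PMU.

    group_sizes: number of counters each group needs (toy assumption:
                 all counters are interchangeable).
    counters:    number of counters the PMU has.
    exclusive:   if True, a scheduled group keeps the PMU to itself.
    """
    order = deque(range(len(group_sizes)))
    running = [0] * len(group_sizes)
    for _ in range(ticks):
        free = counters
        for g in order:
            if group_sizes[g] > free:
                break  # toy rule: stop at the first group that does not fit
            free -= group_sizes[g]
            running[g] += 1
            if exclusive:
                break  # nothing else may share the PMU this tick
        order.rotate(-1)  # the next group goes first on the next tick
    return [r / ticks for r in running]


# Two groups of two events each, roughly the shape of
# '{cycles,cycles},{cycles,cycles}', on an assumed 4-counter PMU.
print(simulate([2, 2], counters=4, ticks=100))
print(simulate([2, 2], counters=4, ticks=100, exclusive=True))
```

In this model the two groups both run all the time when they are allowed to share the PMU, and each runs half the time when they are exclusive. That is the "same context, multiple groups that could schedule in parallel" case described above; the model says nothing about the harder cases with counter constraints.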


Re: [PATCH v2 0/4] perf: Fix perf_event_attr::exclusive rotation

2020-10-31 Thread Jiri Olsa
On Thu, Oct 29, 2020 at 05:27:19PM +0100, Peter Zijlstra wrote:
> Hi,
> 
> Andi recently added exclusive event group support to tools/perf:
> 
>   https://lkml.kernel.org/r/20201014144255.22699-1-a...@firstfloor.org
> 
> and promptly found that they didn't work as specified.
> 
> (sorry for the resend, I forgot LKML the first time)

hm, it's too late for me to check ;-) but should I be able to do
this with exclusive events? Running both commands at the same time:

$ sudo ./perf stat -e cycles:e -I 1000
#           time             counts unit events
     1.002430650         33,946,849      cycles:e
     2.004920725        502,399,986      cycles:e                  (67.57%)
     3.007087631        859,745,048      cycles:e                  (50.00%)
     4.009078254        845,860,723      cycles:e                  (50.00%)
     5.011086104        838,457,275      cycles:e                  (50.00%)

$ sudo ./perf stat -e cycles:e -I 1000
#           time             counts unit events
     1.001665466        848,973,404      cycles:e                  (50.01%)
     2.003658048        856,505,255      cycles:e                  (50.00%)
     3.005658022        842,737,973      cycles:e                  (50.00%)
     4.007657797        844,800,598      cycles:e                  (50.00%)

jirka



[PATCH v2 0/4] perf: Fix perf_event_attr::exclusive rotation

2020-10-29 Thread Peter Zijlstra
Hi,

Andi recently added exclusive event group support to tools/perf:

  https://lkml.kernel.org/r/20201014144255.22699-1-a...@firstfloor.org

and promptly found that they didn't work as specified.

(sorry for the resend, I forgot LKML the first time)