Re: [PATCH] perf report: Fix wrong LBR block sorting

2021-04-07 Thread Arnaldo Carvalho de Melo
On Wed, Apr 07, 2021 at 06:49:57AM -0700, Andi Kleen wrote:
> > Now the hottest block is reported at the top of output.
> > 
> > Fixes: b65a7d372b1a ("perf hist: Support block formats with compare/sort/display")
> > Signed-off-by: Jin Yao 
> 
> 
> Reviewed-by: Andi Kleen 

Thanks, applied.

- Arnaldo



Re: [PATCH] perf report: Fix wrong LBR block sorting

2021-04-07 Thread Andi Kleen
> Now the hottest block is reported at the top of output.
> 
> Fixes: b65a7d372b1a ("perf hist: Support block formats with compare/sort/display")
> Signed-off-by: Jin Yao 


Reviewed-by: Andi Kleen 
-Andi


[PATCH] perf report: Fix wrong LBR block sorting

2021-04-06 Thread Jin Yao
When '--total-cycles' is specified, perf report sorts all blocks
by 'Sampled Cycles%'. This is useful for concentrating on the globally
hottest blocks.

'Sampled Cycles%' - block sampled cycles aggregation / total sampled cycles

But the current code doesn't use the cycles aggregation. Part of the
'cycles' count may be dropped for some overlapping jumps, but to identify
the hot block we always need the full cycles.
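To illustrate why the ranking key matters, here is a minimal Python sketch, not the perf implementation. The `Block` type, its `cycles` and `cycles_aggr` fields, the function names and all numbers are hypothetical, and the total is assumed to be the sum of the per-block aggregations. It only shows that ranking by a partial count can bury a block that ranking by the full aggregation puts first.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    cycles: int       # hypothetical: partial count, may miss cycles dropped for overlapping jumps
    cycles_aggr: int  # hypothetical: full aggregated sampled cycles for the block


def sampled_cycles_pct(block, total_cycles):
    """Sampled Cycles% = block sampled cycles aggregation / total sampled cycles."""
    return 100.0 * block.cycles_aggr / total_cycles


def rank_by_partial(blocks):
    # Ranking on the partial count: a hot block can sink to the bottom.
    return sorted(blocks, key=lambda b: b.cycles, reverse=True)


def rank_by_aggregation(blocks):
    # Ranking on the full aggregation: the hottest block comes first.
    return sorted(blocks, key=lambda b: b.cycles_aggr, reverse=True)


# Illustrative numbers only, not taken from a real profile.
blocks = [
    Block("init", cycles=800, cycles_aggr=800),
    Block("msr_write", cycles=200, cycles_aggr=200),
    Block("hot_loop", cycles=4, cycles_aggr=9000),
]

# Assumption: the total is the sum of the per-block aggregations.
total_cycles = sum(b.cycles_aggr for b in blocks)

for b in rank_by_aggregation(blocks):
    print(f"{sampled_cycles_pct(b, total_cycles):6.2f}%  {b.cycles_aggr:6d}  {b.name}")
```

With these made-up values, `rank_by_partial` puts `init` first, while `rank_by_aggregation` puts `hot_loop` first.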

  # perf record -b ./triad_loop
  # perf report --total-cycles --stdio

Before:

  #
  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles  [Program Block Range]                                      Shared Object
  # ...............  ..............  ...........  ..........  .........................................................  .................
  #
              0.81%             793        4.32%         793  [setup-vdso.h:34 -> setup-vdso.h:40]                       ld-2.27.so
              0.49%             480        0.87%         160  [native_write_msr+0 -> native_write_msr+16]                [kernel.kallsyms]
              0.48%             476        0.52%          95  [native_read_msr+0 -> native_read_msr+29]                  [kernel.kallsyms]
              0.31%             303        1.65%         303  [nmi_restore+0 -> nmi_restore+37]                          [kernel.kallsyms]
              0.26%             255        1.39%         255  [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162]  [kernel.kallsyms]
              0.24%             234        1.28%         234  [end_repeat_nmi+67 -> end_repeat_nmi+83]                   [kernel.kallsyms]
              0.23%             227        1.24%         227  [__irqentry_text_end+96 -> __irqentry_text_end+126]        [kernel.kallsyms]
              0.20%             194        1.06%         194  [native_set_debugreg+52 -> native_set_debugreg+56]         [kernel.kallsyms]
              0.11%             106        0.14%          26  [native_sched_clock+0 -> native_sched_clock+98]            [kernel.kallsyms]
              0.10%              97        0.53%          97  [trigger_load_balance+0 -> trigger_load_balance+67]        [kernel.kallsyms]
              0.09%              85        0.46%          85  [get-dynamic-info.h:102 -> get-dynamic-info.h:111]         ld-2.27.so
  ...
              0.00%           92.7K        0.02%           4  [triad_loop.c:64 -> triad_loop.c:65]                       triad_loop

The hottest block '[triad_loop.c:64 -> triad_loop.c:65]' is not at
the top of the output.

After:

  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles  [Program Block Range]                                      Shared Object
  # ...............  ..............  ...........  ..........  .........................................................  .................
  #
             94.35%           92.7K        0.02%           4  [triad_loop.c:64 -> triad_loop.c:65]                       triad_loop
              0.81%             793        4.32%         793  [setup-vdso.h:34 -> setup-vdso.h:40]                       ld-2.27.so
              0.49%             480        0.87%         160  [native_write_msr+0 -> native_write_msr+16]                [kernel.kallsyms]
              0.48%             476        0.52%          95  [native_read_msr+0 -> native_read_msr+29]                  [kernel.kallsyms]
              0.31%             303        1.65%         303  [nmi_restore+0 -> nmi_restore+37]                          [kernel.kallsyms]
              0.26%             255        1.39%         255  [nohz_balance_exit_idle+75 -> nohz_balance_exit_idle+162]  [kernel.kallsyms]
              0.24%             234        1.28%         234  [end_repeat_nmi+67 -> end_repeat_nmi+83]                   [kernel.kallsyms]
              0.23%             227        1.24%         227  [__irqentry_text_end+96 -> __irqentry_text_end+126]        [kernel.kallsyms]
              0.20%             194        1.06%         194  [native_set_debugreg+52 -> native_set_debugreg+56]         [kernel.kallsyms]
              0.11%             106        0.14%          26  [native_sched_clock+0 -> native_sched_clock+98]            [kernel.kallsyms]
              0.10%              97        0.53%          97  [trigger_load_balance+0 -> trigger_load_balance+67]        [kernel.kallsyms]
              0.09%              85        0.46%          85  [get-dynamic-info.h:102 -> get-dynamic-info.h:111]         ld-2.27.so