[PATCH 4.9 101/310] perf report: Fix off-by-one for non-activation frames

2018-04-11 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Milian Wolff 


[ Upstream commit 1982ad48fc82c284a5cc55697a012d3357e84d01 ]

As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
  g++ -std=c++11 -g -O2 test.cpp
  echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
  perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
  echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
  perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Similarly it works for signal frames:

~~~
  __noinline void bar(void)
  {
volatile long cnt = 0;

for (cnt = 0; cnt < 1; cnt++);
  }

  __noinline void foo(void)
  {
bar();
  }

  void sig_handler(int sig)
  {
foo();
  }

  int main(void)
  {
signal(SIGUSR1, sig_handler);
raise(SIGUSR1);

foo();
return 0;
  }


Before, the report wrongly points to `signal.c:29` after raise():


  $ perf report --stdio --no-children -g srcline -s srcline
  ...
   100.00%  signal.c:11
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:29
  __libc_start_main
  _start


With this patch in, the issue is fixed and we instead get:


   100.00%  signal   signal[.] bar
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:27
  __libc_start_main
  _start


Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc(). For libunwind, we replace the functionality via
unw_is_signal_frame() for any but the very first frame.

Signed-off-by: 

[PATCH 4.9 101/310] perf report: Fix off-by-one for non-activation frames

2018-04-11 Thread Greg Kroah-Hartman
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Milian Wolff 


[ Upstream commit 1982ad48fc82c284a5cc55697a012d3357e84d01 ]

As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
  g++ -std=c++11 -g -O2 test.cpp
  echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
  perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
  echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
  perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Similarly it works for signal frames:

~~~
  __noinline void bar(void)
  {
volatile long cnt = 0;

for (cnt = 0; cnt < 1; cnt++);
  }

  __noinline void foo(void)
  {
bar();
  }

  void sig_handler(int sig)
  {
foo();
  }

  int main(void)
  {
signal(SIGUSR1, sig_handler);
raise(SIGUSR1);

foo();
return 0;
  }


Before, the report wrongly points to `signal.c:29` after raise():


  $ perf report --stdio --no-children -g srcline -s srcline
  ...
   100.00%  signal.c:11
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:29
  __libc_start_main
  _start


With this patch in, the issue is fixed and we instead get:


   100.00%  signal   signal[.] bar
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:27
  __libc_start_main
  _start


Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc(). For libunwind, we replace the functionality via
unw_is_signal_frame() for any but the very first frame.

Signed-off-by: Milian Wolff 
Signed-off-by: Namhyung Kim 
Cc: Arnaldo Carvalho de Melo 
Cc: 

[PATCH AUTOSEL for 4.9 102/281] perf report: Fix off-by-one for non-activation frames

2018-03-19 Thread Sasha Levin
From: Milian Wolff 

[ Upstream commit 1982ad48fc82c284a5cc55697a012d3357e84d01 ]

As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
  g++ -std=c++11 -g -O2 test.cpp
  echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
  perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
  echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
  perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Similarly it works for signal frames:

~~~
  __noinline void bar(void)
  {
volatile long cnt = 0;

for (cnt = 0; cnt < 1; cnt++);
  }

  __noinline void foo(void)
  {
bar();
  }

  void sig_handler(int sig)
  {
foo();
  }

  int main(void)
  {
signal(SIGUSR1, sig_handler);
raise(SIGUSR1);

foo();
return 0;
  }


Before, the report wrongly points to `signal.c:29` after raise():


  $ perf report --stdio --no-children -g srcline -s srcline
  ...
   100.00%  signal.c:11
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:29
  __libc_start_main
  _start


With this patch in, the issue is fixed and we instead get:


   100.00%  signal   signal[.] bar
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:27
  __libc_start_main
  _start


Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc(). For libunwind, we replace the functionality via
unw_is_signal_frame() for any but the very first frame.

Signed-off-by: Milian Wolff 
Signed-off-by: Namhyung Kim 
Cc: Arnaldo 

[PATCH AUTOSEL for 4.9 102/281] perf report: Fix off-by-one for non-activation frames

2018-03-19 Thread Sasha Levin
From: Milian Wolff 

[ Upstream commit 1982ad48fc82c284a5cc55697a012d3357e84d01 ]

As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
  g++ -std=c++11 -g -O2 test.cpp
  echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
  perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
  echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
  perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Similarly it works for signal frames:

~~~
  __noinline void bar(void)
  {
volatile long cnt = 0;

for (cnt = 0; cnt < 1; cnt++);
  }

  __noinline void foo(void)
  {
bar();
  }

  void sig_handler(int sig)
  {
foo();
  }

  int main(void)
  {
signal(SIGUSR1, sig_handler);
raise(SIGUSR1);

foo();
return 0;
  }


Before, the report wrongly points to `signal.c:29` after raise():


  $ perf report --stdio --no-children -g srcline -s srcline
  ...
   100.00%  signal.c:11
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:29
  __libc_start_main
  _start


With this patch in, the issue is fixed and we instead get:


   100.00%  signal   signal[.] bar
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:27
  __libc_start_main
  _start


Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc(). For libunwind, we replace the functionality via
unw_is_signal_frame() for any but the very first frame.

Signed-off-by: Milian Wolff 
Signed-off-by: Namhyung Kim 
Cc: Arnaldo Carvalho de Melo 
Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: 

Re: perf report: fix off-by-one for non-activation frames

2017-07-04 Thread Milian Wolff
On Monday, June 19, 2017 8:59:39 PM CEST Arnaldo Carvalho de Melo wrote:
> Em Sat, Jun 17, 2017 at 01:13:11PM +0200, Milian Wolff escreveu:
> > On Samstag, 17. Juni 2017 10:04:02 CEST Jan Kratochvil wrote:
> > > On Sat, 17 Jun 2017 09:56:57 +0200, Namhyung Kim wrote:
> > > > Not sure whether it needs be fixed or not.  If we fix it, srcline and
> > > > address would not match so it can give its own confusion to users.
> > > > Ideally it should display an addressof the instruction before the
> > > > address IMHO.
> > > 
> > > One can figure million ways how it can behave and each one has its pros
> > > and
> > > cons.  I was just describing the current behavior of GDB and LLDB which
> > > people are used to already.
> > 
> > Personally, I agree with Jan that we should mimick existing tool's
> > behavior. I just fear that it's not trivial to do it with the current
> > code base...
> But we agree it is a worthwhile change (have backtraces in perf match
> what gdb, etc show), right?
> 
> If you can, please try to do this, your attempt will help us understand
> more the extent of the changes needed and perhaps someonw can come up
> with simplifications...

Sorry for the (long) delay, but I'm sadly busy on other things right now. I 
have this on my radar and will try to find time to look into it. But it 
probably won't happen before end of July. If this is urgent, maybe someone 
else needs to look into it before me.

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: perf report: fix off-by-one for non-activation frames

2017-07-04 Thread Milian Wolff
On Monday, June 19, 2017 8:59:39 PM CEST Arnaldo Carvalho de Melo wrote:
> Em Sat, Jun 17, 2017 at 01:13:11PM +0200, Milian Wolff escreveu:
> > On Samstag, 17. Juni 2017 10:04:02 CEST Jan Kratochvil wrote:
> > > On Sat, 17 Jun 2017 09:56:57 +0200, Namhyung Kim wrote:
> > > > Not sure whether it needs be fixed or not.  If we fix it, srcline and
> > > > address would not match so it can give its own confusion to users.
> > > > Ideally it should display an addressof the instruction before the
> > > > address IMHO.
> > > 
> > > One can figure million ways how it can behave and each one has its pros
> > > and
> > > cons.  I was just describing the current behavior of GDB and LLDB which
> > > people are used to already.
> > 
> > Personally, I agree with Jan that we should mimick existing tool's
> > behavior. I just fear that it's not trivial to do it with the current
> > code base...
> But we agree it is a worthwhile change (have backtraces in perf match
> what gdb, etc show), right?
> 
> If you can, please try to do this, your attempt will help us understand
> more the extent of the changes needed and perhaps someonw can come up
> with simplifications...

Sorry for the (long) delay, but I'm sadly busy on other things right now. I 
have this on my radar and will try to find time to look into it. But it 
probably won't happen before end of July. If this is urgent, maybe someone 
else needs to look into it before me.

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: perf report: fix off-by-one for non-activation frames

2017-06-19 Thread Arnaldo Carvalho de Melo
Em Sat, Jun 17, 2017 at 01:13:11PM +0200, Milian Wolff escreveu:
> On Samstag, 17. Juni 2017 10:04:02 CEST Jan Kratochvil wrote:
> > On Sat, 17 Jun 2017 09:56:57 +0200, Namhyung Kim wrote:
> > > Not sure whether it needs be fixed or not.  If we fix it, srcline and
> > > address would not match so it can give its own confusion to users.
> > > Ideally it should display an addressof the instruction before the
> > > address IMHO.
> > 
> > One can figure million ways how it can behave and each one has its pros and
> > cons.  I was just describing the current behavior of GDB and LLDB which
> > people are used to already.
> 
> Personally, I agree with Jan that we should mimick existing tool's behavior. 
> I 
> just fear that it's not trivial to do it with the current code base...

But we agree it is a worthwhile change (have backtraces in perf match
what gdb, etc show), right?

If you can, please try to do this, your attempt will help us understand
more the extent of the changes needed and perhaps someonw can come up
with simplifications...

- Arnaldo


Re: perf report: fix off-by-one for non-activation frames

2017-06-19 Thread Arnaldo Carvalho de Melo
Em Sat, Jun 17, 2017 at 01:13:11PM +0200, Milian Wolff escreveu:
> On Samstag, 17. Juni 2017 10:04:02 CEST Jan Kratochvil wrote:
> > On Sat, 17 Jun 2017 09:56:57 +0200, Namhyung Kim wrote:
> > > Not sure whether it needs be fixed or not.  If we fix it, srcline and
> > > address would not match so it can give its own confusion to users.
> > > Ideally it should display an addressof the instruction before the
> > > address IMHO.
> > 
> > One can figure million ways how it can behave and each one has its pros and
> > cons.  I was just describing the current behavior of GDB and LLDB which
> > people are used to already.
> 
> Personally, I agree with Jan that we should mimick existing tool's behavior. 
> I 
> just fear that it's not trivial to do it with the current code base...

But we agree it is a worthwhile change (have backtraces in perf match
what gdb, etc show), right?

If you can, please try to do this, your attempt will help us understand
more the extent of the changes needed and perhaps someonw can come up
with simplifications...

- Arnaldo


Re: perf report: fix off-by-one for non-activation frames

2017-06-17 Thread Milian Wolff
On Samstag, 17. Juni 2017 10:04:02 CEST Jan Kratochvil wrote:
> On Sat, 17 Jun 2017 09:56:57 +0200, Namhyung Kim wrote:
> > Not sure whether it needs be fixed or not.  If we fix it, srcline and
> > address would not match so it can give its own confusion to users.
> > Ideally it should display an addressof the instruction before the
> > address IMHO.
> 
> One can figure million ways how it can behave and each one has its pros and
> cons.  I was just describing the current behavior of GDB and LLDB which
> people are used to already.

Personally, I agree with Jan that we should mimick existing tool's behavior. I 
just fear that it's not trivial to do it with the current code base...

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


Re: perf report: fix off-by-one for non-activation frames

2017-06-17 Thread Milian Wolff
On Samstag, 17. Juni 2017 10:04:02 CEST Jan Kratochvil wrote:
> On Sat, 17 Jun 2017 09:56:57 +0200, Namhyung Kim wrote:
> > Not sure whether it needs be fixed or not.  If we fix it, srcline and
> > address would not match so it can give its own confusion to users.
> > Ideally it should display an addressof the instruction before the
> > address IMHO.
> 
> One can figure million ways how it can behave and each one has its pros and
> cons.  I was just describing the current behavior of GDB and LLDB which
> people are used to already.

Personally, I agree with Jan that we should mimick existing tool's behavior. I 
just fear that it's not trivial to do it with the current code base...

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


Re: perf report: fix off-by-one for non-activation frames

2017-06-17 Thread Jan Kratochvil
On Sat, 17 Jun 2017 09:56:57 +0200, Namhyung Kim wrote:
> Not sure whether it needs be fixed or not.  If we fix it, srcline and
> address would not match so it can give its own confusion to users.
> Ideally it should display an addressof the instruction before the
> address IMHO.

One can figure million ways how it can behave and each one has its pros and
cons.  I was just describing the current behavior of GDB and LLDB which people
are used to already.


Jan


Re: perf report: fix off-by-one for non-activation frames

2017-06-17 Thread Jan Kratochvil
On Sat, 17 Jun 2017 09:56:57 +0200, Namhyung Kim wrote:
> Not sure whether it needs be fixed or not.  If we fix it, srcline and
> address would not match so it can give its own confusion to users.
> Ideally it should display an addressof the instruction before the
> address IMHO.

One can figure million ways how it can behave and each one has its pros and
cons.  I was just describing the current behavior of GDB and LLDB which people
are used to already.


Jan


Re: perf report: fix off-by-one for non-activation frames

2017-06-17 Thread Namhyung Kim
On Sat, Jun 17, 2017 at 4:54 AM, Milian Wolff  wrote:
> On Freitag, 16. Juni 2017 13:57:44 CEST Jan Kratochvil wrote:
>> On Fri, 16 Jun 2017 13:51:37 +0200, Milian Wolff wrote:
>> > > perf-4.12.0-0.rc5.git0.1.fc27.x86_64
>> > >
>> > >   39e32e gdb_main (/usr/libexec/gdb)
>> > >   10b6fa main (/usr/libexec/gdb)
>> > >
>> > >0x5565f6f6 <+54>:callq  0x558f17a0
>> > >:mov
>> > >0x18(%rsp),%rcx
>>
>> [...]
>>
>> > Excuse me, but I'm having trouble following you. The non-GDB backtraces
>> > you
>> > are pasting do not show srcline information. So what exactly is broken?
>>
>> There is broken that perf now reports address 10b6fa (corresponding to
>> relocated address 0x5565f6fa) but there is no instruction on address
>> 0x5565f6fa.  If you 'objdump -d' it you cannot find any instruction
>> on adress 0x5565f6fa (or on address 0x10b6fa).  There is
>> instruction on address 0x5565f6fb.
>>
>> > Maybe paste the perf output you get now and highlight what you'd expect
>> > instead?
>>
>> Actual:
>>   39e32e gdb_main (/usr/libexec/gdb)
>>   10b6fa main (/usr/libexec/gdb)
>> Expected:
>>   39e32f gdb_main (/usr/libexec/gdb)
>>   10b6fb main (/usr/libexec/gdb)
>>
>> I agree perf needs to calculate with 39e32e and 10b6fa.  But it should
>> display to user 39e32f and 10b6fb.
>
> Hmmm this will require some more changes throughout the stack then. I.e. we'll
> have to remember the "isactivation" flag along with the original IP, and only
> apply the offset then when we query for inliners or srcline information. Maybe
> I can pull that off somehow in the patch series I'm working on currently,
> which refactors the whole inline/srcline/callchain logic anyways.
>
> I don't see an easy way to fix the behavior. Does anyone else? So how do we
> deal with this situation in the interim? I'd prefer we keep the current
> "broken" state, as I consider it less broken than what we had before... I
> guess some of the core perf developers should decide how to handle this.

Not sure whether it needs be fixed or not.  If we fix it, srcline and
address would not match so it can give its own confusion to users.
Ideally it should display an addressof the instruction before the
address IMHO.

Thanks,
Namhyung


Re: perf report: fix off-by-one for non-activation frames

2017-06-17 Thread Namhyung Kim
On Sat, Jun 17, 2017 at 4:54 AM, Milian Wolff  wrote:
> On Freitag, 16. Juni 2017 13:57:44 CEST Jan Kratochvil wrote:
>> On Fri, 16 Jun 2017 13:51:37 +0200, Milian Wolff wrote:
>> > > perf-4.12.0-0.rc5.git0.1.fc27.x86_64
>> > >
>> > >   39e32e gdb_main (/usr/libexec/gdb)
>> > >   10b6fa main (/usr/libexec/gdb)
>> > >
>> > >0x5565f6f6 <+54>:callq  0x558f17a0
>> > >:mov
>> > >0x18(%rsp),%rcx
>>
>> [...]
>>
>> > Excuse me, but I'm having trouble following you. The non-GDB backtraces
>> > you
>> > are pasting do not show srcline information. So what exactly is broken?
>>
>> There is broken that perf now reports address 10b6fa (corresponding to
>> relocated address 0x5565f6fa) but there is no instruction on address
>> 0x5565f6fa.  If you 'objdump -d' it you cannot find any instruction
>> on adress 0x5565f6fa (or on address 0x10b6fa).  There is
>> instruction on address 0x5565f6fb.
>>
>> > Maybe paste the perf output you get now and highlight what you'd expect
>> > instead?
>>
>> Actual:
>>   39e32e gdb_main (/usr/libexec/gdb)
>>   10b6fa main (/usr/libexec/gdb)
>> Expected:
>>   39e32f gdb_main (/usr/libexec/gdb)
>>   10b6fb main (/usr/libexec/gdb)
>>
>> I agree perf needs to calculate with 39e32e and 10b6fa.  But it should
>> display to user 39e32f and 10b6fb.
>
> Hmmm this will require some more changes throughout the stack then. I.e. we'll
> have to remember the "isactivation" flag along with the original IP, and only
> apply the offset then when we query for inliners or srcline information. Maybe
> I can pull that off somehow in the patch series I'm working on currently,
> which refactors the whole inline/srcline/callchain logic anyways.
>
> I don't see an easy way to fix the behavior. Does anyone else? So how do we
> deal with this situation in the interim? I'd prefer we keep the current
> "broken" state, as I consider it less broken than what we had before... I
> guess some of the core perf developers should decide how to handle this.

Not sure whether it needs be fixed or not.  If we fix it, srcline and
address would not match so it can give its own confusion to users.
Ideally it should display an addressof the instruction before the
address IMHO.

Thanks,
Namhyung


Re: perf report: fix off-by-one for non-activation frames

2017-06-16 Thread Milian Wolff
On Freitag, 16. Juni 2017 13:57:44 CEST Jan Kratochvil wrote:
> On Fri, 16 Jun 2017 13:51:37 +0200, Milian Wolff wrote:
> > > perf-4.12.0-0.rc5.git0.1.fc27.x86_64
> > > 
> > >   39e32e gdb_main (/usr/libexec/gdb)
> > >   10b6fa main (/usr/libexec/gdb)
> > >
> > >0x5565f6f6 <+54>:callq  0x558f17a0
> > >:mov   
> > >0x18(%rsp),%rcx
> 
> [...]
> 
> > Excuse me, but I'm having trouble following you. The non-GDB backtraces
> > you
> > are pasting do not show srcline information. So what exactly is broken?
> 
> There is broken that perf now reports address 10b6fa (corresponding to
> relocated address 0x5565f6fa) but there is no instruction on address
> 0x5565f6fa.  If you 'objdump -d' it you cannot find any instruction
> on adress 0x5565f6fa (or on address 0x10b6fa).  There is
> instruction on address 0x5565f6fb.
> 
> > Maybe paste the perf output you get now and highlight what you'd expect
> > instead?
> 
> Actual:
>   39e32e gdb_main (/usr/libexec/gdb)
>   10b6fa main (/usr/libexec/gdb)
> Expected:
>   39e32f gdb_main (/usr/libexec/gdb)
>   10b6fb main (/usr/libexec/gdb)
> 
> I agree perf needs to calculate with 39e32e and 10b6fa.  But it should
> display to user 39e32f and 10b6fb.

Hmmm this will require some more changes throughout the stack then. I.e. we'll 
have to remember the "isactivation" flag along with the original IP, and only 
apply the offset then when we query for inliners or srcline information. Maybe 
I can pull that off somehow in the patch series I'm working on currently, 
which refactors the whole inline/srcline/callchain logic anyways.

I don't see an easy way to fix the behavior. Does anyone else? So how do we 
deal with this situation in the interim? I'd prefer we keep the current 
"broken" state, as I consider it less broken than what we had before... I 
guess some of the core perf developers should decide how to handle this.

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


Re: perf report: fix off-by-one for non-activation frames

2017-06-16 Thread Milian Wolff
On Freitag, 16. Juni 2017 13:57:44 CEST Jan Kratochvil wrote:
> On Fri, 16 Jun 2017 13:51:37 +0200, Milian Wolff wrote:
> > > perf-4.12.0-0.rc5.git0.1.fc27.x86_64
> > > 
> > >   39e32e gdb_main (/usr/libexec/gdb)
> > >   10b6fa main (/usr/libexec/gdb)
> > >
> > >0x5565f6f6 <+54>:callq  0x558f17a0
> > >:mov   
> > >0x18(%rsp),%rcx
> 
> [...]
> 
> > Excuse me, but I'm having trouble following you. The non-GDB backtraces
> > you
> > are pasting do not show srcline information. So what exactly is broken?
> 
> There is broken that perf now reports address 10b6fa (corresponding to
> relocated address 0x5565f6fa) but there is no instruction on address
> 0x5565f6fa.  If you 'objdump -d' it you cannot find any instruction
> on adress 0x5565f6fa (or on address 0x10b6fa).  There is
> instruction on address 0x5565f6fb.
> 
> > Maybe paste the perf output you get now and highlight what you'd expect
> > instead?
> 
> Actual:
>   39e32e gdb_main (/usr/libexec/gdb)
>   10b6fa main (/usr/libexec/gdb)
> Expected:
>   39e32f gdb_main (/usr/libexec/gdb)
>   10b6fb main (/usr/libexec/gdb)
> 
> I agree perf needs to calculate with 39e32e and 10b6fa.  But it should
> display to user 39e32f and 10b6fb.

Hmmm this will require some more changes throughout the stack then. I.e. we'll 
have to remember the "isactivation" flag along with the original IP, and only 
apply the offset then when we query for inliners or srcline information. Maybe 
I can pull that off somehow in the patch series I'm working on currently, 
which refactors the whole inline/srcline/callchain logic anyways.

I don't see an easy way to fix the behavior. Does anyone else? So how do we 
deal with this situation in the interim? I'd prefer we keep the current 
"broken" state, as I consider it less broken than what we had before... I 
guess some of the core perf developers should decide how to handle this.

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


Re: perf report: fix off-by-one for non-activation frames

2017-06-16 Thread Jan Kratochvil
On Fri, 16 Jun 2017 13:51:37 +0200, Milian Wolff wrote:
> > perf-4.12.0-0.rc5.git0.1.fc27.x86_64
> >   39e32e gdb_main (/usr/libexec/gdb)
> >   10b6fa main (/usr/libexec/gdb)
> >0x5565f6f6 <+54>:callq  0x558f17a0 
> >  >0x5565f6fb <+59>:mov0x18(%rsp),%rcx
[...]
> Excuse me, but I'm having trouble following you. The non-GDB backtraces you 
> are pasting do not show srcline information. So what exactly is broken?

There is broken that perf now reports address 10b6fa (corresponding to
relocated address 0x5565f6fa) but there is no instruction on address
0x5565f6fa.  If you 'objdump -d' it you cannot find any instruction on
adress 0x5565f6fa (or on address 0x10b6fa).  There is instruction on
address 0x5565f6fb.


> Maybe paste the perf output you get now and highlight what you'd expect
> instead?

Actual:
  39e32e gdb_main (/usr/libexec/gdb)
  10b6fa main (/usr/libexec/gdb)
Expected:
  39e32f gdb_main (/usr/libexec/gdb)
  10b6fb main (/usr/libexec/gdb)

I agree perf needs to calculate with 39e32e and 10b6fa.  But it should display
to user 39e32f and 10b6fb.


Jan


Re: perf report: fix off-by-one for non-activation frames

2017-06-16 Thread Jan Kratochvil
On Fri, 16 Jun 2017 13:51:37 +0200, Milian Wolff wrote:
> > perf-4.12.0-0.rc5.git0.1.fc27.x86_64
> >   39e32e gdb_main (/usr/libexec/gdb)
> >   10b6fa main (/usr/libexec/gdb)
> >0x5565f6f6 <+54>:callq  0x558f17a0 
> >  >0x5565f6fb <+59>:mov0x18(%rsp),%rcx
[...]
> Excuse me, but I'm having trouble following you. The non-GDB backtraces you 
> are pasting do not show srcline information. So what exactly is broken?

There is broken that perf now reports address 10b6fa (corresponding to
relocated address 0x5565f6fa) but there is no instruction on address
0x5565f6fa.  If you 'objdump -d' it you cannot find any instruction on
adress 0x5565f6fa (or on address 0x10b6fa).  There is instruction on
address 0x5565f6fb.


> Maybe paste the perf output you get now and highlight what you'd expect
> instead?

Actual:
  39e32e gdb_main (/usr/libexec/gdb)
  10b6fa main (/usr/libexec/gdb)
Expected:
  39e32f gdb_main (/usr/libexec/gdb)
  10b6fb main (/usr/libexec/gdb)

I agree perf needs to calculate with 39e32e and 10b6fa.  But it should display
to user 39e32f and 10b6fb.


Jan


Re: perf report: fix off-by-one for non-activation frames

2017-06-16 Thread Milian Wolff
On Freitag, 16. Juni 2017 08:14:56 CEST Jan Kratochvil wrote:
> On Mon, 15 May 2017 17:04:44 +0200, Milian Wolff wrote:
> 
> commit 1982ad48fc82c284a5cc55697a012d3357e84d01
> Author: Milian Wolff 
> Date:   Wed May 24 15:21:25 2017 +0900
> 
> > --- a/tools/perf/util/unwind-libdw.c
> > +++ b/tools/perf/util/unwind-libdw.c
> > @@ -168,12 +168,16 @@ frame_callback(Dwfl_Frame *state, void *arg)
> 
> ...
> 
> > +   if (!isactivation)
> > +   --pc;
> > +
> 
> FYI I find it as a regression a bit:
> 
> perf-4.11.4-200.fc25.x86_64
>   30c563 gdb_main (/usr/libexec/gdb)
>fae48 main (/usr/libexec/gdb)
>0x5564ee43 <+51>:callq  0x5585f340
>  0x5564ee48 <+56>:mov   
> 0x18(%rsp),%rcx
> 
> perf-4.12.0-0.rc5.git0.1.fc27.x86_64
>   39e32e gdb_main (/usr/libexec/gdb)
>   10b6fa main (/usr/libexec/gdb)
>0x5565f6f6 <+54>:callq  0x558f17a0
>  0x5565f6fb <+59>:mov   
> 0x18(%rsp),%rcx
> 
> In backtraces it is correct to show the source line of the calling line - as
> perf does now after your fix - but one still should report PC address of
> the start of the next instruction.  At least this is what debuggers are
> used to do:
> 
> #9  gdb_main (args=0x7fffe2e0) at ../../gdb/main.c:1257
> #10 0x5565f6fb in main (argc=, argv=)
> at ../../gdb/gdb.c:40 0x5565f6f6 <+54>:   callq  0x558f17a0
>  => 0x5565f6fb <+59>:  mov   
> 0x18(%rsp),%rcx
> Line 40 of "../../gdb/gdb.c" starts at address 0x5565f6f6  char**)+54> and ends at 0x5565f6fb . Line 41 of
> "../../gdb/gdb.c" starts at address 0x5565f6fb 
> and ends at 0x5565f715.
> 
> You see "gdb.c:40" and 0x5565f6fb in the backtrace despite
> 0x5565f6fb is already line 41.
> 
> This is also why elfutils reports separately PC and 'isactivation' flag.
> Instead of just reporting decreased PC.

Excuse me, but I'm having trouble following you. The non-GDB backtraces you 
are pasting do not show srcline information. So what exactly is broken? Can 
you show me the differences a bit more clearly? Maybe paste the perf output 
you get now and highlight what you'd expect instead? Best would be an 
accompanying test case that I can use to improve the situation, if possible?

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


Re: perf report: fix off-by-one for non-activation frames

2017-06-16 Thread Milian Wolff
On Freitag, 16. Juni 2017 08:14:56 CEST Jan Kratochvil wrote:
> On Mon, 15 May 2017 17:04:44 +0200, Milian Wolff wrote:
> 
> commit 1982ad48fc82c284a5cc55697a012d3357e84d01
> Author: Milian Wolff 
> Date:   Wed May 24 15:21:25 2017 +0900
> 
> > --- a/tools/perf/util/unwind-libdw.c
> > +++ b/tools/perf/util/unwind-libdw.c
> > @@ -168,12 +168,16 @@ frame_callback(Dwfl_Frame *state, void *arg)
> 
> ...
> 
> > +   if (!isactivation)
> > +   --pc;
> > +
> 
> FYI I find it as a regression a bit:
> 
> perf-4.11.4-200.fc25.x86_64
>   30c563 gdb_main (/usr/libexec/gdb)
>fae48 main (/usr/libexec/gdb)
>0x5564ee43 <+51>:callq  0x5585f340
>  0x5564ee48 <+56>:mov   
> 0x18(%rsp),%rcx
> 
> perf-4.12.0-0.rc5.git0.1.fc27.x86_64
>   39e32e gdb_main (/usr/libexec/gdb)
>   10b6fa main (/usr/libexec/gdb)
>0x5565f6f6 <+54>:callq  0x558f17a0
>  0x5565f6fb <+59>:mov   
> 0x18(%rsp),%rcx
> 
> In backtraces it is correct to show the source line of the calling line - as
> perf does now after your fix - but one still should report PC address of
> the start of the next instruction.  At least this is what debuggers are
> used to do:
> 
> #9  gdb_main (args=0x7fffe2e0) at ../../gdb/main.c:1257
> #10 0x5565f6fb in main (argc=, argv=)
> at ../../gdb/gdb.c:40 0x5565f6f6 <+54>:   callq  0x558f17a0
>  => 0x5565f6fb <+59>:  mov   
> 0x18(%rsp),%rcx
> Line 40 of "../../gdb/gdb.c" starts at address 0x5565f6f6  char**)+54> and ends at 0x5565f6fb . Line 41 of
> "../../gdb/gdb.c" starts at address 0x5565f6fb 
> and ends at 0x5565f715.
> 
> You see "gdb.c:40" and 0x5565f6fb in the backtrace despite
> 0x5565f6fb is already line 41.
> 
> This is also why elfutils reports separately PC and 'isactivation' flag.
> Instead of just reporting decreased PC.

Excuse me, but I'm having trouble following you. The non-GDB backtraces you 
are pasting do not show srcline information. So what exactly is broken? Can 
you show me the differences a bit more clearly? Maybe paste the perf output 
you get now and highlight what you'd expect instead? Best would be an 
accompanying test case that I can use to improve the situation, if possible?

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


Re: perf report: fix off-by-one for non-activation frames

2017-06-16 Thread Jan Kratochvil
On Mon, 15 May 2017 17:04:44 +0200, Milian Wolff wrote:

commit 1982ad48fc82c284a5cc55697a012d3357e84d01
Author: Milian Wolff 
Date:   Wed May 24 15:21:25 2017 +0900

> --- a/tools/perf/util/unwind-libdw.c
> +++ b/tools/perf/util/unwind-libdw.c
> @@ -168,12 +168,16 @@ frame_callback(Dwfl_Frame *state, void *arg)
...
> + if (!isactivation)
> + --pc;
> +

FYI I find it as a regression a bit:

perf-4.11.4-200.fc25.x86_64
  30c563 gdb_main (/usr/libexec/gdb)
   fae48 main (/usr/libexec/gdb)
   0x5564ee43 <+51>:callq  0x5585f340 

   0x5564ee48 <+56>:mov0x18(%rsp),%rcx

perf-4.12.0-0.rc5.git0.1.fc27.x86_64
  39e32e gdb_main (/usr/libexec/gdb)
  10b6fa main (/usr/libexec/gdb)
   0x5565f6f6 <+54>:callq  0x558f17a0 

   0x5565f6fb <+59>:mov0x18(%rsp),%rcx

In backtraces it is correct to show the source line of the calling line - as
perf does now after your fix - but one still should report PC address of the
start of the next instruction.  At least this is what debuggers are used to
do:

#9  gdb_main (args=0x7fffe2e0) at ../../gdb/main.c:1257
#10 0x5565f6fb in main (argc=, argv=) at 
../../gdb/gdb.c:40
   0x5565f6f6 <+54>:callq  0x558f17a0 

=> 0x5565f6fb <+59>:mov0x18(%rsp),%rcx
Line 40 of "../../gdb/gdb.c" starts at address 0x5565f6f6  and ends at 0x5565f6fb .
Line 41 of "../../gdb/gdb.c" starts at address 0x5565f6fb  and ends at 0x5565f715.

You see "gdb.c:40" and 0x5565f6fb in the backtrace despite
0x5565f6fb is already line 41.

This is also why elfutils reports separately PC and 'isactivation' flag.
Instead of just reporting decreased PC.


Jan


Re: perf report: fix off-by-one for non-activation frames

2017-06-16 Thread Jan Kratochvil
On Mon, 15 May 2017 17:04:44 +0200, Milian Wolff wrote:

commit 1982ad48fc82c284a5cc55697a012d3357e84d01
Author: Milian Wolff 
Date:   Wed May 24 15:21:25 2017 +0900

> --- a/tools/perf/util/unwind-libdw.c
> +++ b/tools/perf/util/unwind-libdw.c
> @@ -168,12 +168,16 @@ frame_callback(Dwfl_Frame *state, void *arg)
...
> + if (!isactivation)
> + --pc;
> +

FYI I find it as a regression a bit:

perf-4.11.4-200.fc25.x86_64
  30c563 gdb_main (/usr/libexec/gdb)
   fae48 main (/usr/libexec/gdb)
   0x5564ee43 <+51>:callq  0x5585f340 

   0x5564ee48 <+56>:mov0x18(%rsp),%rcx

perf-4.12.0-0.rc5.git0.1.fc27.x86_64
  39e32e gdb_main (/usr/libexec/gdb)
  10b6fa main (/usr/libexec/gdb)
   0x5565f6f6 <+54>:callq  0x558f17a0 

   0x5565f6fb <+59>:mov0x18(%rsp),%rcx

In backtraces it is correct to show the source line of the calling line - as
perf does now after your fix - but one still should report PC address of the
start of the next instruction.  At least this is what debuggers are used to
do:

#9  gdb_main (args=0x7fffe2e0) at ../../gdb/main.c:1257
#10 0x5565f6fb in main (argc=, argv=) at 
../../gdb/gdb.c:40
   0x5565f6f6 <+54>:callq  0x558f17a0 

=> 0x5565f6fb <+59>:mov0x18(%rsp),%rcx
Line 40 of "../../gdb/gdb.c" starts at address 0x5565f6f6  and ends at 0x5565f6fb .
Line 41 of "../../gdb/gdb.c" starts at address 0x5565f6fb  and ends at 0x5565f715.

You see "gdb.c:40" and 0x5565f6fb in the backtrace despite
0x5565f6fb is already line 41.

This is also why elfutils reports separately PC and 'isactivation' flag.
Instead of just reporting decreased PC.


Jan


[tip:perf/urgent] perf report: Fix off-by-one for non-activation frames

2017-05-24 Thread tip-bot for Milian Wolff
Commit-ID:  1982ad48fc82c284a5cc55697a012d3357e84d01
Gitweb: http://git.kernel.org/tip/1982ad48fc82c284a5cc55697a012d3357e84d01
Author: Milian Wolff <milian.wo...@kdab.com>
AuthorDate: Wed, 24 May 2017 15:21:25 +0900
Committer:  Ingo Molnar <mi...@kernel.org>
CommitDate: Wed, 24 May 2017 08:41:48 +0200

perf report: Fix off-by-one for non-activation frames

As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
  g++ -std=c++11 -g -O2 test.cpp
  echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
  perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
  echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
  perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for<long, std::ratio<1l, 1000l> > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for<long, std::ratio<1l, 1000l> > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Similarly it works for signal frames:

~~~
  __noinline void bar(void)
  {
volatile long cnt = 0;

for (cnt = 0; cnt < 1; cnt++);
  }

  __noinline void foo(void)
  {
bar();
  }

  void sig_handler(int sig)
  {
foo();
  }

  int main(void)
  {
signal(SIGUSR1, sig_handler);
raise(SIGUSR1);

foo();
return 0;
  }


Before, the report wrongly points to `signal.c:29` after raise():


  $ perf report --stdio --no-children -g srcline -s srcline
  ...
   100.00%  signal.c:11
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:29
  __libc_start_main
  _start


With this patch in, the issue is fixed and we instead get:


   100.00%  signal   signal[.] bar
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:27
  __libc_start_main
  _start


Note how this patch fixes this issue for both unwinding metho

[tip:perf/urgent] perf report: Fix off-by-one for non-activation frames

2017-05-24 Thread tip-bot for Milian Wolff
Commit-ID:  1982ad48fc82c284a5cc55697a012d3357e84d01
Gitweb: http://git.kernel.org/tip/1982ad48fc82c284a5cc55697a012d3357e84d01
Author: Milian Wolff 
AuthorDate: Wed, 24 May 2017 15:21:25 +0900
Committer:  Ingo Molnar 
CommitDate: Wed, 24 May 2017 08:41:48 +0200

perf report: Fix off-by-one for non-activation frames

As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
  g++ -std=c++11 -g -O2 test.cpp
  echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
  perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
  echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
  perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Similarly it works for signal frames:

~~~
  __noinline void bar(void)
  {
volatile long cnt = 0;

for (cnt = 0; cnt < 1; cnt++);
  }

  __noinline void foo(void)
  {
bar();
  }

  void sig_handler(int sig)
  {
foo();
  }

  int main(void)
  {
signal(SIGUSR1, sig_handler);
raise(SIGUSR1);

foo();
return 0;
  }


Before, the report wrongly points to `signal.c:29` after raise():


  $ perf report --stdio --no-children -g srcline -s srcline
  ...
   100.00%  signal.c:11
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:29
  __libc_start_main
  _start


With this patch in, the issue is fixed and we instead get:


   100.00%  signal   signal[.] bar
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:27
  __libc_start_main
  _start


Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc(). For libunwind, we replace th

[PATCH 3/7] perf report: fix off-by-one for non-activation frames

2017-05-24 Thread Namhyung Kim
From: Milian Wolff 

As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
g++ -std=c++11 -g -O2 test.cpp
echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Similarly it works for signal frames:

~~~

__noinline void bar(void)
{
  volatile long cnt = 0;

  for (cnt = 0; cnt < 1; cnt++);
}

__noinline void foo(void)
{
  bar();
}

void sig_handler(int sig)
{
  foo();
}

int main(void)
{
  signal(SIGUSR1, sig_handler);
  raise(SIGUSR1);

  foo();
  return 0;
}


Before, the report wrongly points to `signal.c:29` after raise():


$ perf report --stdio --no-children -g srcline -s srcline
...
   100.00%  signal.c:11
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:29
  __libc_start_main
  _start


With this patch in, the issue is fixed and we instead get:


   100.00%  signal   signal[.] bar
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:27
  __libc_start_main
  _start


Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc. For libunwind, we replace the functionality via
unw_is_signal_frame for any but the very first frame.

Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Peter Zijlstra 
Cc: Yao Jin 
Signed-off-by: Milian Wolff 
Signed-off-by: 

[PATCH 3/7] perf report: fix off-by-one for non-activation frames

2017-05-24 Thread Namhyung Kim
From: Milian Wolff 

As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
g++ -std=c++11 -g -O2 test.cpp
echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Similarly it works for signal frames:

~~~

__noinline void bar(void)
{
  volatile long cnt = 0;

  for (cnt = 0; cnt < 1; cnt++);
}

__noinline void foo(void)
{
  bar();
}

void sig_handler(int sig)
{
  foo();
}

int main(void)
{
  signal(SIGUSR1, sig_handler);
  raise(SIGUSR1);

  foo();
  return 0;
}


Before, the report wrongly points to `signal.c:29` after raise():


$ perf report --stdio --no-children -g srcline -s srcline
...
   100.00%  signal.c:11
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:29
  __libc_start_main
  _start


With this patch in, the issue is fixed and we instead get:


   100.00%  signal   signal[.] bar
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:27
  __libc_start_main
  _start


Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc. For libunwind, we replace the functionality via
unw_is_signal_frame for any but the very first frame.

Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Peter Zijlstra 
Cc: Yao Jin 
Signed-off-by: Milian Wolff 
Signed-off-by: Namhyung Kim 
---
 tools/perf/util/unwind-libdw.c   |  6 +-
 tools/perf/util/unwind-libunwind-local.c | 11 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff 

Re: [PATCH v3] perf report: fix off-by-one for non-activation frames

2017-05-18 Thread Namhyung Kim
Hi Milian,

On Thu, May 18, 2017 at 10:37:35AM +0200, Milian Wolff wrote:
> As the documentation for dwfl_frame_pc says, frames that
> are no activation frames need to have their program counter
> decremented by one to properly find the function of the caller.
> 
> This fixes many cases where perf report currently attributes
> the cost to the next line. I.e. I have code like this:
> 
> ~~~
>   #include 
>   #include 
> 
>   using namespace std;
> 
>   int main()
>   {
> this_thread::sleep_for(chrono::milliseconds(1000));
> this_thread::sleep_for(chrono::milliseconds(100));
> this_thread::sleep_for(chrono::milliseconds(10));
> 
> return 0;
>   }
> ~~~
> 
> Now compile and record it:
> 
> ~~~
> g++ -std=c++11 -g -O2 test.cpp
> echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
> perf record \
> --event sched:sched_stat_sleep \
> --event sched:sched_process_exit \
> --event sched:sched_switch --call-graph=dwarf \
> --output perf.data.raw \
> ./a.out
> echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
> perf inject --sched-stat --input perf.data.raw --output perf.data
> ~~~
> 
> Before this patch, the report clearly shows the off-by-one issue.
> Most notably, the last sleep invocation is incorrectly attributed
> to the "return 0;" line:
> 
> ~~~
>   Overhead  Source:Line
>     ...
> 
>100.00%  core.c:0
> |
> ---__schedule core.c:0
>schedule
>do_nanosleep hrtimer.c:0
>hrtimer_nanosleep
>sys_nanosleep
>entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
>__nanosleep_nocancel .:0
>std::this_thread::sleep_for > 
> thread:323
>|
>|--90.08%--main test.cpp:9
>|  __libc_start_main
>|  _start
>|
>|--9.01%--main test.cpp:10
>|  __libc_start_main
>|  _start
>|
> --0.91%--main test.cpp:13
>   __libc_start_main
>   _start
> ~~~
> 
> With this patch here applied, the issue is fixed. The report becomes
> much more usable:
> 
> ~~~
>   Overhead  Source:Line
>     ...
> 
>100.00%  core.c:0
> |
> ---__schedule core.c:0
>schedule
>do_nanosleep hrtimer.c:0
>hrtimer_nanosleep
>sys_nanosleep
>entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
>__nanosleep_nocancel .:0
>std::this_thread::sleep_for > 
> thread:323
>|
>|--90.08%--main test.cpp:8
>|  __libc_start_main
>|  _start
>|
>|--9.01%--main test.cpp:9
>|  __libc_start_main
>|  _start
>|
> --0.91%--main test.cpp:10
>   __libc_start_main
>   _start
> ~~~
> 
> Similarly it works for signal frames:
> 
> ~~~
> #include 
> #include 
> #include 
> 
> #define __noinline  __attribute__((noinline))
> 
> __noinline void bar(void)
> {
>   volatile long cnt = 0;
> 
>   for (cnt = 0; cnt < 1; cnt++);
> }
> 
> __noinline void foo(void)
> {
>   bar();
> }
> 
> void sig_handler(int sig)
> {
>   foo();
> }
> 
> int main(void)
> {
>   signal(SIGUSR1, sig_handler);
>   raise(SIGUSR1);
> 
>   foo();
>   return 0;
> }
> 
> 
> Before, the report wrongly points to `signal.c:29` after raise():
> 
> 
> $ perf report --stdio --no-children -g srcline -s srcline
> ...
>100.00%  signal.c:11
> |
> ---bar signal.c:11
>|
>|--50.49%--main signal.c:29
>|  __libc_start_main
>|  _start
>|
> --49.51%--0x33a8f
>   raise .:0
>   main signal.c:29
>   __libc_start_main
>   _start
> 
> 
> With this patch in, the issue is fixed and we instead get:
> 
> 
>100.00%  signal   signal[.] bar
> |
> ---bar signal.c:11
>|
>|--50.49%--main signal.c:29
>|  __libc_start_main
>|  _start
>|
> --49.51%--0x33a8f
>   raise .:0
>   main signal.c:27
>   __libc_start_main
>   _start
> 
> 
> Note how this patch fixes 

Re: [PATCH v3] perf report: fix off-by-one for non-activation frames

2017-05-18 Thread Namhyung Kim
Hi Milian,

On Thu, May 18, 2017 at 10:37:35AM +0200, Milian Wolff wrote:
> As the documentation for dwfl_frame_pc says, frames that
> are no activation frames need to have their program counter
> decremented by one to properly find the function of the caller.
> 
> This fixes many cases where perf report currently attributes
> the cost to the next line. I.e. I have code like this:
> 
> ~~~
>   #include 
>   #include 
> 
>   using namespace std;
> 
>   int main()
>   {
> this_thread::sleep_for(chrono::milliseconds(1000));
> this_thread::sleep_for(chrono::milliseconds(100));
> this_thread::sleep_for(chrono::milliseconds(10));
> 
> return 0;
>   }
> ~~~
> 
> Now compile and record it:
> 
> ~~~
> g++ -std=c++11 -g -O2 test.cpp
> echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
> perf record \
> --event sched:sched_stat_sleep \
> --event sched:sched_process_exit \
> --event sched:sched_switch --call-graph=dwarf \
> --output perf.data.raw \
> ./a.out
> echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
> perf inject --sched-stat --input perf.data.raw --output perf.data
> ~~~
> 
> Before this patch, the report clearly shows the off-by-one issue.
> Most notably, the last sleep invocation is incorrectly attributed
> to the "return 0;" line:
> 
> ~~~
>   Overhead  Source:Line
>     ...
> 
>100.00%  core.c:0
> |
> ---__schedule core.c:0
>schedule
>do_nanosleep hrtimer.c:0
>hrtimer_nanosleep
>sys_nanosleep
>entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
>__nanosleep_nocancel .:0
>std::this_thread::sleep_for > 
> thread:323
>|
>|--90.08%--main test.cpp:9
>|  __libc_start_main
>|  _start
>|
>|--9.01%--main test.cpp:10
>|  __libc_start_main
>|  _start
>|
> --0.91%--main test.cpp:13
>   __libc_start_main
>   _start
> ~~~
> 
> With this patch here applied, the issue is fixed. The report becomes
> much more usable:
> 
> ~~~
>   Overhead  Source:Line
>     ...
> 
>100.00%  core.c:0
> |
> ---__schedule core.c:0
>schedule
>do_nanosleep hrtimer.c:0
>hrtimer_nanosleep
>sys_nanosleep
>entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
>__nanosleep_nocancel .:0
>std::this_thread::sleep_for > 
> thread:323
>|
>|--90.08%--main test.cpp:8
>|  __libc_start_main
>|  _start
>|
>|--9.01%--main test.cpp:9
>|  __libc_start_main
>|  _start
>|
> --0.91%--main test.cpp:10
>   __libc_start_main
>   _start
> ~~~
> 
> Similarly it works for signal frames:
> 
> ~~~
> #include 
> #include 
> #include 
> 
> #define __noinline  __attribute__((noinline))
> 
> __noinline void bar(void)
> {
>   volatile long cnt = 0;
> 
>   for (cnt = 0; cnt < 1; cnt++);
> }
> 
> __noinline void foo(void)
> {
>   bar();
> }
> 
> void sig_handler(int sig)
> {
>   foo();
> }
> 
> int main(void)
> {
>   signal(SIGUSR1, sig_handler);
>   raise(SIGUSR1);
> 
>   foo();
>   return 0;
> }
> 
> 
> Before, the report wrongly points to `signal.c:29` after raise():
> 
> 
> $ perf report --stdio --no-children -g srcline -s srcline
> ...
>100.00%  signal.c:11
> |
> ---bar signal.c:11
>|
>|--50.49%--main signal.c:29
>|  __libc_start_main
>|  _start
>|
> --49.51%--0x33a8f
>   raise .:0
>   main signal.c:29
>   __libc_start_main
>   _start
> 
> 
> With this patch in, the issue is fixed and we instead get:
> 
> 
>100.00%  signal   signal[.] bar
> |
> ---bar signal.c:11
>|
>|--50.49%--main signal.c:29
>|  __libc_start_main
>|  _start
>|
> --49.51%--0x33a8f
>   raise .:0
>   main signal.c:27
>   __libc_start_main
>   _start
> 
> 
> Note how this patch fixes this issue for both unwinding methods, i.e.
> both dwfl 

[PATCH v3] perf report: fix off-by-one for non-activation frames

2017-05-18 Thread Milian Wolff
As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
g++ -std=c++11 -g -O2 test.cpp
echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Similarly it works for signal frames:

~~~
#include 
#include 
#include 

#define __noinline  __attribute__((noinline))

__noinline void bar(void)
{
  volatile long cnt = 0;

  for (cnt = 0; cnt < 1; cnt++);
}

__noinline void foo(void)
{
  bar();
}

void sig_handler(int sig)
{
  foo();
}

int main(void)
{
  signal(SIGUSR1, sig_handler);
  raise(SIGUSR1);

  foo();
  return 0;
}


Before, the report wrongly points to `signal.c:29` after raise():


$ perf report --stdio --no-children -g srcline -s srcline
...
   100.00%  signal.c:11
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:29
  __libc_start_main
  _start


With this patch in, the issue is fixed and we instead get:


   100.00%  signal   signal[.] bar
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:27
  __libc_start_main
  _start


Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc. For libunwind, we replace the functionality via
unw_is_signal_frame for any but the very first frame.

Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Yao Jin 

[PATCH v3] perf report: fix off-by-one for non-activation frames

2017-05-18 Thread Milian Wolff
As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
g++ -std=c++11 -g -O2 test.cpp
echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Similarly it works for signal frames:

~~~
#include 
#include 
#include 

#define __noinline  __attribute__((noinline))

__noinline void bar(void)
{
  volatile long cnt = 0;

  for (cnt = 0; cnt < 1; cnt++);
}

__noinline void foo(void)
{
  bar();
}

void sig_handler(int sig)
{
  foo();
}

int main(void)
{
  signal(SIGUSR1, sig_handler);
  raise(SIGUSR1);

  foo();
  return 0;
}


Before, the report wrongly points to `signal.c:29` after raise():


$ perf report --stdio --no-children -g srcline -s srcline
...
   100.00%  signal.c:11
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:29
  __libc_start_main
  _start


With this patch in, the issue is fixed and we instead get:


   100.00%  signal   signal[.] bar
|
---bar signal.c:11
   |
   |--50.49%--main signal.c:29
   |  __libc_start_main
   |  _start
   |
--49.51%--0x33a8f
  raise .:0
  main signal.c:27
  __libc_start_main
  _start


Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc. For libunwind, we replace the functionality via
unw_is_signal_frame for any but the very first frame.

Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Yao Jin 
Signed-off-by: Milian Wolff 
---
 tools/perf/util/unwind-libdw.c   |  6 +-
 tools/perf/util/unwind-libunwind-local.c | 11 +++
 2 files 

Re: [PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-18 Thread Milian Wolff
On Mittwoch, 17. Mai 2017 07:12:21 CEST Namhyung Kim wrote:
> On Tue, May 16, 2017 at 06:26:47PM +0200, Milian Wolff wrote:
> > On Dienstag, 16. Mai 2017 18:17:26 CEST Milian Wolff wrote:
> > > On Dienstag, 16. Mai 2017 16:38:29 CEST Namhyung Kim wrote:
> > > > On Tue, May 16, 2017 at 10:59:51AM +0200, Milian Wolff wrote:
> > > > > As the documentation for dwfl_frame_pc says, frames that
> > > > > are no activation frames need to have their program counter
> > > > > decremented by one to properly find the function of the caller.
> > > > > 
> > > > > This fixes many cases where perf report currently attributes
> > > > > the cost to the next line. I.e. I have code like this:
> > > > > 
> > > > > ~~~
> > > > > 
> > > > >   #include 
> > > > >   #include 
> > > > >   
> > > > >   using namespace std;
> > > > >   
> > > > >   int main()
> > > > >   {
> > > > >   
> > > > > this_thread::sleep_for(chrono::milliseconds(1000));
> > > > > this_thread::sleep_for(chrono::milliseconds(100));
> > > > > this_thread::sleep_for(chrono::milliseconds(10));
> > > > > 
> > > > > return 0;
> > > > >   
> > > > >   }
> > > > > 
> > > > > ~~~
> > > > 
> > > > It'd be nice if the test program has a signal frame for verification.
> > > 
> > > I have pretty much zero experience about signals. Would it be enough to
> > > add
> > > a signal handler for, say, SIGUSR1 to my test application and then
> > > trigger
> > > a sleep when that signal is delivered? If that should be enough, I'll
> > > write
> > > and test it out.
> > 
> > ~~~
> > #include 
> > #include 
> > #include 
> > 
> > using namespace std;
> > 
> > volatile bool run_loop = true;
> > 
> > void my_handler(int signum)
> > {
> > 
> > this_thread::sleep_for(chrono::milliseconds(1000));
> > this_thread::sleep_for(chrono::milliseconds(100));
> > this_thread::sleep_for(chrono::milliseconds(10));
> > run_loop = false;
> > 
> > }
> > 
> > int main()
> > {
> > 
> > signal(SIGUSR1, my_handler);
> > 
> > while (run_loop) {}
> > 
> > return 0;
> > 
> > }
> > ~~~
> > 
> > This does not properly unwind neither before nor after this patch. I only
> > ever> 
> > get:
> >100.00%  core.c:0
> >
> > ---__schedule core.c:0
> > 
> >schedule
> >do_nanosleep hrtimer.c:0
> >hrtimer_nanosleep
> >sys_nanosleep
> >entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
> >__nanosleep_nocancel .:0
> >std::this_thread::sleep_for >
> > 
> > thread:323
> > 
> > So... should this work? Please tell me how to test this properly.
> 
> How did you send the SIGUSR1 to the process?
> 
> Anyway it does nothing to do with inlining, right?  I just wrote a
> test code below to burn a cpu with or without a signal frame.
> 
>   $ nl -ba frame-addr.c
>  1#include 
>  2#include 
>  3#include 
>  4
>  5#define __noinline  __attribute__((noinline))
>  6
>  7__noinline void bar(void)
>  8{
>  9  volatile long cnt = 0;
> 10
> 11  for (cnt = 0; cnt < 1; cnt++);
> 12}
> 13
> 14__noinline void foo(void)
> 15{
> 16  bar();
> 17}
> 18
> 19void sig_handler(int sig)
> 20{
> 21  foo();
> 22}
> 23
> 24int main(void)
> 25{
> 26  signal(SIGUSR1, sig_handler);
> 27  raise(SIGUSR1);
> 28
> 29  foo();
> 30  return 0;
> 31}
> 
>   $ gcc -O2 -g -fno-optimize-sibling-calls -o frame-addr frame-addr.c
> 
>   $ perf record --call-graph dwarf ./frame-addr
> 
>   $ perf report -q -g srcline | head -15
>   99.88%  frame-addr  frame-addr[.] bar
> 
>   ---bar frame-addr.c:11
>  foo frame-addr.c:16
> 
>  |--51.12%--main frame-addr.c:29
>  |
>  |  __libc_start_main
>  |  _start
> 
>   --48.76%--sig_handler frame-addr.c:21
> 0x33a8f
> raise .:0
> main frame-addr.c:29   <--- bad
> __libc_start_main
> _start
> 
> Note that 'raise' was called at line 27.  It seems that simply
> checking current frame fixes it.

Got it - thanks for your test case. I fixed it and will resend v3 of this 
patch shortly.

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-18 Thread Milian Wolff
On Mittwoch, 17. Mai 2017 07:12:21 CEST Namhyung Kim wrote:
> On Tue, May 16, 2017 at 06:26:47PM +0200, Milian Wolff wrote:
> > On Dienstag, 16. Mai 2017 18:17:26 CEST Milian Wolff wrote:
> > > On Dienstag, 16. Mai 2017 16:38:29 CEST Namhyung Kim wrote:
> > > > On Tue, May 16, 2017 at 10:59:51AM +0200, Milian Wolff wrote:
> > > > > As the documentation for dwfl_frame_pc says, frames that
> > > > > are no activation frames need to have their program counter
> > > > > decremented by one to properly find the function of the caller.
> > > > > 
> > > > > This fixes many cases where perf report currently attributes
> > > > > the cost to the next line. I.e. I have code like this:
> > > > > 
> > > > > ~~~
> > > > > 
> > > > >   #include 
> > > > >   #include 
> > > > >   
> > > > >   using namespace std;
> > > > >   
> > > > >   int main()
> > > > >   {
> > > > >   
> > > > > this_thread::sleep_for(chrono::milliseconds(1000));
> > > > > this_thread::sleep_for(chrono::milliseconds(100));
> > > > > this_thread::sleep_for(chrono::milliseconds(10));
> > > > > 
> > > > > return 0;
> > > > >   
> > > > >   }
> > > > > 
> > > > > ~~~
> > > > 
> > > > It'd be nice if the test program has a signal frame for verification.
> > > 
> > > I have pretty much zero experience about signals. Would it be enough to
> > > add
> > > a signal handler for, say, SIGUSR1 to my test application and then
> > > trigger
> > > a sleep when that signal is delivered? If that should be enough, I'll
> > > write
> > > and test it out.
> > 
> > ~~~
> > #include 
> > #include 
> > #include 
> > 
> > using namespace std;
> > 
> > volatile bool run_loop = true;
> > 
> > void my_handler(int signum)
> > {
> > 
> > this_thread::sleep_for(chrono::milliseconds(1000));
> > this_thread::sleep_for(chrono::milliseconds(100));
> > this_thread::sleep_for(chrono::milliseconds(10));
> > run_loop = false;
> > 
> > }
> > 
> > int main()
> > {
> > 
> > signal(SIGUSR1, my_handler);
> > 
> > while (run_loop) {}
> > 
> > return 0;
> > 
> > }
> > ~~~
> > 
> > This does not properly unwind neither before nor after this patch. I only
> > ever> 
> > get:
> >100.00%  core.c:0
> >
> > ---__schedule core.c:0
> > 
> >schedule
> >do_nanosleep hrtimer.c:0
> >hrtimer_nanosleep
> >sys_nanosleep
> >entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
> >__nanosleep_nocancel .:0
> >std::this_thread::sleep_for >
> > 
> > thread:323
> > 
> > So... should this work? Please tell me how to test this properly.
> 
> How did you send the SIGUSR1 to the process?
> 
> Anyway it does nothing to do with inlining, right?  I just wrote a
> test code below to burn a cpu with or without a signal frame.
> 
>   $ nl -ba frame-addr.c
>  1#include 
>  2#include 
>  3#include 
>  4
>  5#define __noinline  __attribute__((noinline))
>  6
>  7__noinline void bar(void)
>  8{
>  9  volatile long cnt = 0;
> 10
> 11  for (cnt = 0; cnt < 1; cnt++);
> 12}
> 13
> 14__noinline void foo(void)
> 15{
> 16  bar();
> 17}
> 18
> 19void sig_handler(int sig)
> 20{
> 21  foo();
> 22}
> 23
> 24int main(void)
> 25{
> 26  signal(SIGUSR1, sig_handler);
> 27  raise(SIGUSR1);
> 28
> 29  foo();
> 30  return 0;
> 31}
> 
>   $ gcc -O2 -g -fno-optimize-sibling-calls -o frame-addr frame-addr.c
> 
>   $ perf record --call-graph dwarf ./frame-addr
> 
>   $ perf report -q -g srcline | head -15
>   99.88%  frame-addr  frame-addr[.] bar
> 
>   ---bar frame-addr.c:11
>  foo frame-addr.c:16
> 
>  |--51.12%--main frame-addr.c:29
>  |
>  |  __libc_start_main
>  |  _start
> 
>   --48.76%--sig_handler frame-addr.c:21
> 0x33a8f
> raise .:0
> main frame-addr.c:29   <--- bad
> __libc_start_main
> _start
> 
> Note that 'raise' was called at line 27.  It seems that simply
> checking current frame fixes it.

Got it - thanks for your test case. I fixed it and will resend v3 of this 
patch shortly.

Thanks

-- 
Milian Wolff | milian.wo...@kdab.com | Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Namhyung Kim
On Tue, May 16, 2017 at 06:26:47PM +0200, Milian Wolff wrote:
> On Dienstag, 16. Mai 2017 18:17:26 CEST Milian Wolff wrote:
> > On Dienstag, 16. Mai 2017 16:38:29 CEST Namhyung Kim wrote:
> > > On Tue, May 16, 2017 at 10:59:51AM +0200, Milian Wolff wrote:
> > > > As the documentation for dwfl_frame_pc says, frames that
> > > > are no activation frames need to have their program counter
> > > > decremented by one to properly find the function of the caller.
> > > > 
> > > > This fixes many cases where perf report currently attributes
> > > > the cost to the next line. I.e. I have code like this:
> > > > 
> > > > ~~~
> > > > 
> > > >   #include 
> > > >   #include 
> > > >   
> > > >   using namespace std;
> > > >   
> > > >   int main()
> > > >   {
> > > >   
> > > > this_thread::sleep_for(chrono::milliseconds(1000));
> > > > this_thread::sleep_for(chrono::milliseconds(100));
> > > > this_thread::sleep_for(chrono::milliseconds(10));
> > > > 
> > > > return 0;
> > > >   
> > > >   }
> > > > 
> > > > ~~~
> > > 
> > > It'd be nice if the test program has a signal frame for verification.
> > 
> > I have pretty much zero experience about signals. Would it be enough to add
> > a signal handler for, say, SIGUSR1 to my test application and then trigger
> > a sleep when that signal is delivered? If that should be enough, I'll write
> > and test it out.
> 
> ~~~
> #include 
> #include 
> #include 
> 
> using namespace std;
> 
> volatile bool run_loop = true;
> 
> void my_handler(int signum)
> {
> this_thread::sleep_for(chrono::milliseconds(1000));
> this_thread::sleep_for(chrono::milliseconds(100));
> this_thread::sleep_for(chrono::milliseconds(10));
> run_loop = false;
> }
> 
> int main()
> {
> signal(SIGUSR1, my_handler);
> 
> while (run_loop) {}
> 
> return 0;
> }
> ~~~
> 
> This does not properly unwind neither before nor after this patch. I only 
> ever 
> get:
> 
>100.00%  core.c:0
> |
> ---__schedule core.c:0
>schedule
>do_nanosleep hrtimer.c:0
>hrtimer_nanosleep
>sys_nanosleep
>entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
>__nanosleep_nocancel .:0
>std::this_thread::sleep_for > 
> thread:323
> 
> So... should this work? Please tell me how to test this properly.

How did you send the SIGUSR1 to the process?

Anyway it does nothing to do with inlining, right?  I just wrote a
test code below to burn a cpu with or without a signal frame.

  $ nl -ba frame-addr.c
 1  #include 
 2  #include 
 3  #include 
 4  
 5  #define __noinline  __attribute__((noinline))
 6  
 7  __noinline void bar(void)
 8  {
 9volatile long cnt = 0;
10  
11for (cnt = 0; cnt < 1; cnt++);
12  }
13  
14  __noinline void foo(void)
15  {
16bar();
17  }
18  
19  void sig_handler(int sig)
20  {
21foo();
22  }
23  
24  int main(void)
25  {
26signal(SIGUSR1, sig_handler);
27raise(SIGUSR1);
28  
29foo();
30return 0;
31  }

  $ gcc -O2 -g -fno-optimize-sibling-calls -o frame-addr frame-addr.c

  $ perf record --call-graph dwarf ./frame-addr

  $ perf report -q -g srcline | head -15
  99.88%  frame-addr  frame-addr[.] bar
  |
  ---bar frame-addr.c:11
 foo frame-addr.c:16
 |  
 |--51.12%--main frame-addr.c:29
 |  __libc_start_main
 |  _start
 |  
  --48.76%--sig_handler frame-addr.c:21
0x33a8f
raise .:0
main frame-addr.c:29   <--- bad
__libc_start_main
_start

Note that 'raise' was called at line 27.  It seems that simply
checking current frame fixes it.

Thanks,
Namhyung



Re: [PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Namhyung Kim
On Tue, May 16, 2017 at 06:26:47PM +0200, Milian Wolff wrote:
> On Dienstag, 16. Mai 2017 18:17:26 CEST Milian Wolff wrote:
> > On Dienstag, 16. Mai 2017 16:38:29 CEST Namhyung Kim wrote:
> > > On Tue, May 16, 2017 at 10:59:51AM +0200, Milian Wolff wrote:
> > > > As the documentation for dwfl_frame_pc says, frames that
> > > > are no activation frames need to have their program counter
> > > > decremented by one to properly find the function of the caller.
> > > > 
> > > > This fixes many cases where perf report currently attributes
> > > > the cost to the next line. I.e. I have code like this:
> > > > 
> > > > ~~~
> > > > 
> > > >   #include 
> > > >   #include 
> > > >   
> > > >   using namespace std;
> > > >   
> > > >   int main()
> > > >   {
> > > >   
> > > > this_thread::sleep_for(chrono::milliseconds(1000));
> > > > this_thread::sleep_for(chrono::milliseconds(100));
> > > > this_thread::sleep_for(chrono::milliseconds(10));
> > > > 
> > > > return 0;
> > > >   
> > > >   }
> > > > 
> > > > ~~~
> > > 
> > > It'd be nice if the test program has a signal frame for verification.
> > 
> > I have pretty much zero experience about signals. Would it be enough to add
> > a signal handler for, say, SIGUSR1 to my test application and then trigger
> > a sleep when that signal is delivered? If that should be enough, I'll write
> > and test it out.
> 
> ~~~
> #include 
> #include 
> #include 
> 
> using namespace std;
> 
> volatile bool run_loop = true;
> 
> void my_handler(int signum)
> {
> this_thread::sleep_for(chrono::milliseconds(1000));
> this_thread::sleep_for(chrono::milliseconds(100));
> this_thread::sleep_for(chrono::milliseconds(10));
> run_loop = false;
> }
> 
> int main()
> {
> signal(SIGUSR1, my_handler);
> 
> while (run_loop) {}
> 
> return 0;
> }
> ~~~
> 
> This does not properly unwind neither before nor after this patch. I only 
> ever 
> get:
> 
>100.00%  core.c:0
> |
> ---__schedule core.c:0
>schedule
>do_nanosleep hrtimer.c:0
>hrtimer_nanosleep
>sys_nanosleep
>entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
>__nanosleep_nocancel .:0
>std::this_thread::sleep_for > 
> thread:323
> 
> So... should this work? Please tell me how to test this properly.

How did you send the SIGUSR1 to the process?

Anyway it does nothing to do with inlining, right?  I just wrote a
test code below to burn a cpu with or without a signal frame.

  $ nl -ba frame-addr.c
 1  #include 
 2  #include 
 3  #include 
 4  
 5  #define __noinline  __attribute__((noinline))
 6  
 7  __noinline void bar(void)
 8  {
 9volatile long cnt = 0;
10  
11for (cnt = 0; cnt < 1; cnt++);
12  }
13  
14  __noinline void foo(void)
15  {
16bar();
17  }
18  
19  void sig_handler(int sig)
20  {
21foo();
22  }
23  
24  int main(void)
25  {
26signal(SIGUSR1, sig_handler);
27raise(SIGUSR1);
28  
29foo();
30return 0;
31  }

  $ gcc -O2 -g -fno-optimize-sibling-calls -o frame-addr frame-addr.c

  $ perf record --call-graph dwarf ./frame-addr

  $ perf report -q -g srcline | head -15
  99.88%  frame-addr  frame-addr[.] bar
  |
  ---bar frame-addr.c:11
 foo frame-addr.c:16
 |  
 |--51.12%--main frame-addr.c:29
 |  __libc_start_main
 |  _start
 |  
  --48.76%--sig_handler frame-addr.c:21
0x33a8f
raise .:0
main frame-addr.c:29   <--- bad
__libc_start_main
_start

Note that 'raise' was called at line 27.  It seems that simply
checking current frame fixes it.

Thanks,
Namhyung



Re: [PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Milian Wolff
On Dienstag, 16. Mai 2017 18:17:26 CEST Milian Wolff wrote:
> On Dienstag, 16. Mai 2017 16:38:29 CEST Namhyung Kim wrote:
> > On Tue, May 16, 2017 at 10:59:51AM +0200, Milian Wolff wrote:
> > > As the documentation for dwfl_frame_pc says, frames that
> > > are no activation frames need to have their program counter
> > > decremented by one to properly find the function of the caller.
> > > 
> > > This fixes many cases where perf report currently attributes
> > > the cost to the next line. I.e. I have code like this:
> > > 
> > > ~~~
> > > 
> > >   #include 
> > >   #include 
> > >   
> > >   using namespace std;
> > >   
> > >   int main()
> > >   {
> > >   
> > > this_thread::sleep_for(chrono::milliseconds(1000));
> > > this_thread::sleep_for(chrono::milliseconds(100));
> > > this_thread::sleep_for(chrono::milliseconds(10));
> > > 
> > > return 0;
> > >   
> > >   }
> > > 
> > > ~~~
> > 
> > It'd be nice if the test program has a signal frame for verification.
> 
> I have pretty much zero experience about signals. Would it be enough to add
> a signal handler for, say, SIGUSR1 to my test application and then trigger
> a sleep when that signal is delivered? If that should be enough, I'll write
> and test it out.

~~~
#include 
#include 
#include 

using namespace std;

volatile bool run_loop = true;

void my_handler(int signum)
{
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));
run_loop = false;
}

int main()
{
signal(SIGUSR1, my_handler);

while (run_loop) {}

return 0;
}
~~~

This does not properly unwind neither before nor after this patch. I only ever 
get:

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323

So... should this work? Please tell me how to test this properly.

Thanks
-- 
Milian Wolff | milian.wo...@kdab.com | Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Milian Wolff
On Dienstag, 16. Mai 2017 18:17:26 CEST Milian Wolff wrote:
> On Dienstag, 16. Mai 2017 16:38:29 CEST Namhyung Kim wrote:
> > On Tue, May 16, 2017 at 10:59:51AM +0200, Milian Wolff wrote:
> > > As the documentation for dwfl_frame_pc says, frames that
> > > are no activation frames need to have their program counter
> > > decremented by one to properly find the function of the caller.
> > > 
> > > This fixes many cases where perf report currently attributes
> > > the cost to the next line. I.e. I have code like this:
> > > 
> > > ~~~
> > > 
> > >   #include 
> > >   #include 
> > >   
> > >   using namespace std;
> > >   
> > >   int main()
> > >   {
> > >   
> > > this_thread::sleep_for(chrono::milliseconds(1000));
> > > this_thread::sleep_for(chrono::milliseconds(100));
> > > this_thread::sleep_for(chrono::milliseconds(10));
> > > 
> > > return 0;
> > >   
> > >   }
> > > 
> > > ~~~
> > 
> > It'd be nice if the test program has a signal frame for verification.
> 
> I have pretty much zero experience about signals. Would it be enough to add
> a signal handler for, say, SIGUSR1 to my test application and then trigger
> a sleep when that signal is delivered? If that should be enough, I'll write
> and test it out.

~~~
#include 
#include 
#include 

using namespace std;

volatile bool run_loop = true;

void my_handler(int signum)
{
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));
run_loop = false;
}

int main()
{
signal(SIGUSR1, my_handler);

while (run_loop) {}

return 0;
}
~~~

This does not properly unwind neither before nor after this patch. I only ever 
get:

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323

So... should this work? Please tell me how to test this properly.

Thanks
-- 
Milian Wolff | milian.wo...@kdab.com | Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Milian Wolff
On Dienstag, 16. Mai 2017 16:38:29 CEST Namhyung Kim wrote:
> On Tue, May 16, 2017 at 10:59:51AM +0200, Milian Wolff wrote:
> > As the documentation for dwfl_frame_pc says, frames that
> > are no activation frames need to have their program counter
> > decremented by one to properly find the function of the caller.
> > 
> > This fixes many cases where perf report currently attributes
> > the cost to the next line. I.e. I have code like this:
> > 
> > ~~~
> > 
> >   #include 
> >   #include 
> >   
> >   using namespace std;
> >   
> >   int main()
> >   {
> >   
> > this_thread::sleep_for(chrono::milliseconds(1000));
> > this_thread::sleep_for(chrono::milliseconds(100));
> > this_thread::sleep_for(chrono::milliseconds(10));
> > 
> > return 0;
> >   
> >   }
> > 
> > ~~~
> 
> It'd be nice if the test program has a signal frame for verification.

I have pretty much zero experience about signals. Would it be enough to add a 
signal handler for, say, SIGUSR1 to my test application and then trigger a 
sleep when that signal is delivered? If that should be enough, I'll write and 
test it out.



> > diff --git a/tools/perf/util/unwind-libunwind-local.c
> > b/tools/perf/util/unwind-libunwind-local.c index
> > f8455bed6e65..30ab26375c80 100644
> > --- a/tools/perf/util/unwind-libunwind-local.c
> > +++ b/tools/perf/util/unwind-libunwind-local.c
> > @@ -690,8 +690,22 @@ static int get_entries(struct unwind_info *ui,
> > unwind_entry_cb_t cb,> 
> > if (ret)
> > 
> > display_error(ret);
> > 
> > +   bool previous_frame_was_signal = false;
> > 
> > while (!ret && (unw_step() > 0) && i < max_stack) {
> > 
> > unw_get_reg(, UNW_REG_IP, [i]);
> > 
> > +
> > +   /*
> > +* Decrement the IP for any non-activation frames.
> > +* this is required to properly find the srcline
> > +* for caller frames.
> > +* See also the documentation for dwfl_frame_pc,
> > +* which this code tries to replicate.
> > +*/
> > +   bool frame_is_signal = unw_is_signal_frame() > 0;
> > +   if (!previous_frame_was_signal && !frame_is_signal)
> > +   --ips[i];
> > +   previous_frame_was_signal = frame_is_signal;
> 
> Does it need to check previous frame too?

That's what dwfl_frame_pc does, if I'm not misunderstanding it's source code?

Bye

-- 
Milian Wolff | milian.wo...@kdab.com | Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Milian Wolff
On Dienstag, 16. Mai 2017 16:38:29 CEST Namhyung Kim wrote:
> On Tue, May 16, 2017 at 10:59:51AM +0200, Milian Wolff wrote:
> > As the documentation for dwfl_frame_pc says, frames that
> > are no activation frames need to have their program counter
> > decremented by one to properly find the function of the caller.
> > 
> > This fixes many cases where perf report currently attributes
> > the cost to the next line. I.e. I have code like this:
> > 
> > ~~~
> > 
> >   #include 
> >   #include 
> >   
> >   using namespace std;
> >   
> >   int main()
> >   {
> >   
> > this_thread::sleep_for(chrono::milliseconds(1000));
> > this_thread::sleep_for(chrono::milliseconds(100));
> > this_thread::sleep_for(chrono::milliseconds(10));
> > 
> > return 0;
> >   
> >   }
> > 
> > ~~~
> 
> It'd be nice if the test program has a signal frame for verification.

I have pretty much zero experience about signals. Would it be enough to add a 
signal handler for, say, SIGUSR1 to my test application and then trigger a 
sleep when that signal is delivered? If that should be enough, I'll write and 
test it out.



> > diff --git a/tools/perf/util/unwind-libunwind-local.c
> > b/tools/perf/util/unwind-libunwind-local.c index
> > f8455bed6e65..30ab26375c80 100644
> > --- a/tools/perf/util/unwind-libunwind-local.c
> > +++ b/tools/perf/util/unwind-libunwind-local.c
> > @@ -690,8 +690,22 @@ static int get_entries(struct unwind_info *ui,
> > unwind_entry_cb_t cb,> 
> > if (ret)
> > 
> > display_error(ret);
> > 
> > +   bool previous_frame_was_signal = false;
> > 
> > while (!ret && (unw_step() > 0) && i < max_stack) {
> > 
> > unw_get_reg(, UNW_REG_IP, [i]);
> > 
> > +
> > +   /*
> > +* Decrement the IP for any non-activation frames.
> > +* this is required to properly find the srcline
> > +* for caller frames.
> > +* See also the documentation for dwfl_frame_pc,
> > +* which this code tries to replicate.
> > +*/
> > +   bool frame_is_signal = unw_is_signal_frame() > 0;
> > +   if (!previous_frame_was_signal && !frame_is_signal)
> > +   --ips[i];
> > +   previous_frame_was_signal = frame_is_signal;
> 
> Does it need to check previous frame too?

That's what dwfl_frame_pc does, if I'm not misunderstanding it's source code?

Bye

-- 
Milian Wolff | milian.wo...@kdab.com | Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Namhyung Kim
On Tue, May 16, 2017 at 10:59:51AM +0200, Milian Wolff wrote:
> As the documentation for dwfl_frame_pc says, frames that
> are no activation frames need to have their program counter
> decremented by one to properly find the function of the caller.
> 
> This fixes many cases where perf report currently attributes
> the cost to the next line. I.e. I have code like this:
> 
> ~~~
>   #include 
>   #include 
> 
>   using namespace std;
> 
>   int main()
>   {
> this_thread::sleep_for(chrono::milliseconds(1000));
> this_thread::sleep_for(chrono::milliseconds(100));
> this_thread::sleep_for(chrono::milliseconds(10));
> 
> return 0;
>   }
> ~~~

It'd be nice if the test program has a signal frame for verification.


> 
> Now compile and record it:
> 
> ~~~
> g++ -std=c++11 -g -O2 test.cpp
> echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
> perf record \
> --event sched:sched_stat_sleep \
> --event sched:sched_process_exit \
> --event sched:sched_switch --call-graph=dwarf \
> --output perf.data.raw \
> ./a.out
> echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
> perf inject --sched-stat --input perf.data.raw --output perf.data
> ~~~
> 
> Before this patch, the report clearly shows the off-by-one issue.
> Most notably, the last sleep invocation is incorrectly attributed
> to the "return 0;" line:
> 
> ~~~
>   Overhead  Source:Line
>     ...
> 
>100.00%  core.c:0
> |
> ---__schedule core.c:0
>schedule
>do_nanosleep hrtimer.c:0
>hrtimer_nanosleep
>sys_nanosleep
>entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
>__nanosleep_nocancel .:0
>std::this_thread::sleep_for > 
> thread:323
>|
>|--90.08%--main test.cpp:9
>|  __libc_start_main
>|  _start
>|
>|--9.01%--main test.cpp:10
>|  __libc_start_main
>|  _start
>|
> --0.91%--main test.cpp:13
>   __libc_start_main
>   _start
> ~~~
> 
> With this patch here applied, the issue is fixed. The report becomes
> much more usable:
> 
> ~~~
>   Overhead  Source:Line
>     ...
> 
>100.00%  core.c:0
> |
> ---__schedule core.c:0
>schedule
>do_nanosleep hrtimer.c:0
>hrtimer_nanosleep
>sys_nanosleep
>entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
>__nanosleep_nocancel .:0
>std::this_thread::sleep_for > 
> thread:323
>|
>|--90.08%--main test.cpp:8
>|  __libc_start_main
>|  _start
>|
>|--9.01%--main test.cpp:9
>|  __libc_start_main
>|  _start
>|
> --0.91%--main test.cpp:10
>   __libc_start_main
>   _start
> ~~~
> 
> Note how this patch fixes this issue for both unwinding methods, i.e.
> both dwfl and libunwind. The former case is straight-forward thanks
> to dwfl_frame_pc. For libunwind, we replace the functionality via
> unw_is_signal_frame for any but the very first frame.
> 
> Cc: Arnaldo Carvalho de Melo 
> Cc: David Ahern 
> Cc: Namhyung Kim 
> Cc: Peter Zijlstra 
> Cc: Yao Jin 
> Signed-off-by: Milian Wolff 
> ---
>  tools/perf/util/unwind-libdw.c   |  6 +-
>  tools/perf/util/unwind-libunwind-local.c | 14 ++
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> v2:
> - use unw_is_signal_frame to also fix this issue for libunwind
> 
> diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
> index f90e11a555b2..943a06291587 100644
> --- a/tools/perf/util/unwind-libdw.c
> +++ b/tools/perf/util/unwind-libdw.c
> @@ -168,12 +168,16 @@ frame_callback(Dwfl_Frame *state, void *arg)
>  {
>   struct unwind_info *ui = arg;
>   Dwarf_Addr pc;
> + bool isactivation;
>  
> - if (!dwfl_frame_pc(state, , NULL)) {
> + if (!dwfl_frame_pc(state, , )) {
>   pr_err("%s", dwfl_errmsg(-1));
>   return DWARF_CB_ABORT;
>   }
>  
> + if (!isactivation)
> + --pc;
> +
>   return entry(pc, ui) || !(--ui->max_stack) ?
>  DWARF_CB_ABORT : DWARF_CB_OK;
>  }
> diff --git a/tools/perf/util/unwind-libunwind-local.c 
> b/tools/perf/util/unwind-libunwind-local.c
> index f8455bed6e65..30ab26375c80 100644
> 

Re: [PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Namhyung Kim
On Tue, May 16, 2017 at 10:59:51AM +0200, Milian Wolff wrote:
> As the documentation for dwfl_frame_pc says, frames that
> are no activation frames need to have their program counter
> decremented by one to properly find the function of the caller.
> 
> This fixes many cases where perf report currently attributes
> the cost to the next line. I.e. I have code like this:
> 
> ~~~
>   #include 
>   #include 
> 
>   using namespace std;
> 
>   int main()
>   {
> this_thread::sleep_for(chrono::milliseconds(1000));
> this_thread::sleep_for(chrono::milliseconds(100));
> this_thread::sleep_for(chrono::milliseconds(10));
> 
> return 0;
>   }
> ~~~

It'd be nice if the test program has a signal frame for verification.


> 
> Now compile and record it:
> 
> ~~~
> g++ -std=c++11 -g -O2 test.cpp
> echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
> perf record \
> --event sched:sched_stat_sleep \
> --event sched:sched_process_exit \
> --event sched:sched_switch --call-graph=dwarf \
> --output perf.data.raw \
> ./a.out
> echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
> perf inject --sched-stat --input perf.data.raw --output perf.data
> ~~~
> 
> Before this patch, the report clearly shows the off-by-one issue.
> Most notably, the last sleep invocation is incorrectly attributed
> to the "return 0;" line:
> 
> ~~~
>   Overhead  Source:Line
>     ...
> 
>100.00%  core.c:0
> |
> ---__schedule core.c:0
>schedule
>do_nanosleep hrtimer.c:0
>hrtimer_nanosleep
>sys_nanosleep
>entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
>__nanosleep_nocancel .:0
>std::this_thread::sleep_for > 
> thread:323
>|
>|--90.08%--main test.cpp:9
>|  __libc_start_main
>|  _start
>|
>|--9.01%--main test.cpp:10
>|  __libc_start_main
>|  _start
>|
> --0.91%--main test.cpp:13
>   __libc_start_main
>   _start
> ~~~
> 
> With this patch here applied, the issue is fixed. The report becomes
> much more usable:
> 
> ~~~
>   Overhead  Source:Line
>     ...
> 
>100.00%  core.c:0
> |
> ---__schedule core.c:0
>schedule
>do_nanosleep hrtimer.c:0
>hrtimer_nanosleep
>sys_nanosleep
>entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
>__nanosleep_nocancel .:0
>std::this_thread::sleep_for > 
> thread:323
>|
>|--90.08%--main test.cpp:8
>|  __libc_start_main
>|  _start
>|
>|--9.01%--main test.cpp:9
>|  __libc_start_main
>|  _start
>|
> --0.91%--main test.cpp:10
>   __libc_start_main
>   _start
> ~~~
> 
> Note how this patch fixes this issue for both unwinding methods, i.e.
> both dwfl and libunwind. The former case is straight-forward thanks
> to dwfl_frame_pc. For libunwind, we replace the functionality via
> unw_is_signal_frame for any but the very first frame.
> 
> Cc: Arnaldo Carvalho de Melo 
> Cc: David Ahern 
> Cc: Namhyung Kim 
> Cc: Peter Zijlstra 
> Cc: Yao Jin 
> Signed-off-by: Milian Wolff 
> ---
>  tools/perf/util/unwind-libdw.c   |  6 +-
>  tools/perf/util/unwind-libunwind-local.c | 14 ++
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> v2:
> - use unw_is_signal_frame to also fix this issue for libunwind
> 
> diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
> index f90e11a555b2..943a06291587 100644
> --- a/tools/perf/util/unwind-libdw.c
> +++ b/tools/perf/util/unwind-libdw.c
> @@ -168,12 +168,16 @@ frame_callback(Dwfl_Frame *state, void *arg)
>  {
>   struct unwind_info *ui = arg;
>   Dwarf_Addr pc;
> + bool isactivation;
>  
> - if (!dwfl_frame_pc(state, , NULL)) {
> + if (!dwfl_frame_pc(state, , )) {
>   pr_err("%s", dwfl_errmsg(-1));
>   return DWARF_CB_ABORT;
>   }
>  
> + if (!isactivation)
> + --pc;
> +
>   return entry(pc, ui) || !(--ui->max_stack) ?
>  DWARF_CB_ABORT : DWARF_CB_OK;
>  }
> diff --git a/tools/perf/util/unwind-libunwind-local.c 
> b/tools/perf/util/unwind-libunwind-local.c
> index f8455bed6e65..30ab26375c80 100644
> --- a/tools/perf/util/unwind-libunwind-local.c
> +++ b/tools/perf/util/unwind-libunwind-local.c
> @@ -690,8 +690,22 @@ static int get_entries(struct unwind_info *ui, 
> 

Re: [PATCH] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Milian Wolff
On Dienstag, 16. Mai 2017 03:57:53 CEST Namhyung Kim wrote:
> On Mon, May 15, 2017 at 05:13:06PM +0200, Milian Wolff wrote:
> > On Monday, May 15, 2017 5:04:44 PM CEST Milian Wolff wrote:
> > > As the documentation for dwfl_frame_pc says, frames that
> > > are no activation frames need to have their program counter
> > > decremented by one to properly find the function of the caller.
> > 
> > Note that this leaves the perf build against libunwind in the current,
> > broken state. I do not know how to detect the activation property there.
> > Does anyone else? See elfutils source code for what it is doing:
> > 
> > https://sourceware.org/git/?p=elfutils.git;a=blob;f=libdwfl/
> > dwfl_frame_pc.c;h=296c815b9c73f42d79ac1778d2a0c420b89ee4eb;hb=HEAD
> 
> It seems that you can use unw_is_signal_frame().

Thank you! The v2 patch I just sent now uses that and it seems to work 
reliably in my quick test.

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Milian Wolff
On Dienstag, 16. Mai 2017 03:57:53 CEST Namhyung Kim wrote:
> On Mon, May 15, 2017 at 05:13:06PM +0200, Milian Wolff wrote:
> > On Monday, May 15, 2017 5:04:44 PM CEST Milian Wolff wrote:
> > > As the documentation for dwfl_frame_pc says, frames that
> > > are no activation frames need to have their program counter
> > > decremented by one to properly find the function of the caller.
> > 
> > Note that this leaves the perf build against libunwind in the current,
> > broken state. I do not know how to detect the activation property there.
> > Does anyone else? See elfutils source code for what it is doing:
> > 
> > https://sourceware.org/git/?p=elfutils.git;a=blob;f=libdwfl/
> > dwfl_frame_pc.c;h=296c815b9c73f42d79ac1778d2a0c420b89ee4eb;hb=HEAD
> 
> It seems that you can use unw_is_signal_frame().

Thank you! The v2 patch I just sent now uses that and it seems to work 
reliably in my quick test.

Cheers

-- 
Milian Wolff | milian.wo...@kdab.com | Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


[PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Milian Wolff
As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
g++ -std=c++11 -g -O2 test.cpp
echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc. For libunwind, we replace the functionality via
unw_is_signal_frame for any but the very first frame.

Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Yao Jin 
Signed-off-by: Milian Wolff 
---
 tools/perf/util/unwind-libdw.c   |  6 +-
 tools/perf/util/unwind-libunwind-local.c | 14 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

v2:
- use unw_is_signal_frame to also fix this issue for libunwind

diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index f90e11a555b2..943a06291587 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -168,12 +168,16 @@ frame_callback(Dwfl_Frame *state, void *arg)
 {
struct unwind_info *ui = arg;
Dwarf_Addr pc;
+   bool isactivation;
 
-   if (!dwfl_frame_pc(state, , NULL)) {
+   if (!dwfl_frame_pc(state, , )) {
pr_err("%s", dwfl_errmsg(-1));
return DWARF_CB_ABORT;
}
 
+   if (!isactivation)
+   --pc;
+
return entry(pc, ui) || !(--ui->max_stack) ?
   DWARF_CB_ABORT : DWARF_CB_OK;
 }
diff --git a/tools/perf/util/unwind-libunwind-local.c 
b/tools/perf/util/unwind-libunwind-local.c
index f8455bed6e65..30ab26375c80 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -690,8 +690,22 @@ static int get_entries(struct unwind_info *ui, 
unwind_entry_cb_t cb,
if (ret)
display_error(ret);
 
+   bool previous_frame_was_signal = false;
while (!ret && (unw_step() > 0) && i < max_stack) {

[PATCH v2] perf report: fix off-by-one for non-activation frames

2017-05-16 Thread Milian Wolff
As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
g++ -std=c++11 -g -O2 test.cpp
echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

With this patch here applied, the issue is fixed. The report becomes
much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Note how this patch fixes this issue for both unwinding methods, i.e.
both dwfl and libunwind. The former case is straight-forward thanks
to dwfl_frame_pc. For libunwind, we replace the functionality via
unw_is_signal_frame for any but the very first frame.

Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Yao Jin 
Signed-off-by: Milian Wolff 
---
 tools/perf/util/unwind-libdw.c   |  6 +-
 tools/perf/util/unwind-libunwind-local.c | 14 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

v2:
- use unw_is_signal_frame to also fix this issue for libunwind

diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index f90e11a555b2..943a06291587 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -168,12 +168,16 @@ frame_callback(Dwfl_Frame *state, void *arg)
 {
struct unwind_info *ui = arg;
Dwarf_Addr pc;
+   bool isactivation;
 
-   if (!dwfl_frame_pc(state, , NULL)) {
+   if (!dwfl_frame_pc(state, , )) {
pr_err("%s", dwfl_errmsg(-1));
return DWARF_CB_ABORT;
}
 
+   if (!isactivation)
+   --pc;
+
return entry(pc, ui) || !(--ui->max_stack) ?
   DWARF_CB_ABORT : DWARF_CB_OK;
 }
diff --git a/tools/perf/util/unwind-libunwind-local.c 
b/tools/perf/util/unwind-libunwind-local.c
index f8455bed6e65..30ab26375c80 100644
--- a/tools/perf/util/unwind-libunwind-local.c
+++ b/tools/perf/util/unwind-libunwind-local.c
@@ -690,8 +690,22 @@ static int get_entries(struct unwind_info *ui, 
unwind_entry_cb_t cb,
if (ret)
display_error(ret);
 
+   bool previous_frame_was_signal = false;
while (!ret && (unw_step() > 0) && i < max_stack) {
unw_get_reg(, UNW_REG_IP, [i]);
+
+   /*
+* Decrement the IP for any non-activation frames.
+* this is required to 

Re: [PATCH] perf report: fix off-by-one for non-activation frames

2017-05-15 Thread Namhyung Kim
On Mon, May 15, 2017 at 05:13:06PM +0200, Milian Wolff wrote:
> On Monday, May 15, 2017 5:04:44 PM CEST Milian Wolff wrote:
> > As the documentation for dwfl_frame_pc says, frames that
> > are no activation frames need to have their program counter
> > decremented by one to properly find the function of the caller.
> 
> Note that this leaves the perf build against libunwind in the current, broken 
> state. I do not know how to detect the activation property there. Does anyone 
> else? See elfutils source code for what it is doing:
> 
> https://sourceware.org/git/?p=elfutils.git;a=blob;f=libdwfl/
> dwfl_frame_pc.c;h=296c815b9c73f42d79ac1778d2a0c420b89ee4eb;hb=HEAD

It seems that you can use unw_is_signal_frame().

Thanks,
Namhyung


Re: [PATCH] perf report: fix off-by-one for non-activation frames

2017-05-15 Thread Namhyung Kim
On Mon, May 15, 2017 at 05:13:06PM +0200, Milian Wolff wrote:
> On Monday, May 15, 2017 5:04:44 PM CEST Milian Wolff wrote:
> > As the documentation for dwfl_frame_pc says, frames that
> > are no activation frames need to have their program counter
> > decremented by one to properly find the function of the caller.
> 
> Note that this leaves the perf build against libunwind in the current, broken 
> state. I do not know how to detect the activation property there. Does anyone 
> else? See elfutils source code for what it is doing:
> 
> https://sourceware.org/git/?p=elfutils.git;a=blob;f=libdwfl/
> dwfl_frame_pc.c;h=296c815b9c73f42d79ac1778d2a0c420b89ee4eb;hb=HEAD

It seems that you can use unw_is_signal_frame().

Thanks,
Namhyung


Re: [PATCH] perf report: fix off-by-one for non-activation frames

2017-05-15 Thread Milian Wolff
On Monday, May 15, 2017 5:04:44 PM CEST Milian Wolff wrote:
> As the documentation for dwfl_frame_pc says, frames that
> are no activation frames need to have their program counter
> decremented by one to properly find the function of the caller.

Note that this leaves the perf build against libunwind in the current, broken 
state. I do not know how to detect the activation property there. Does anyone 
else? See elfutils source code for what it is doing:

https://sourceware.org/git/?p=elfutils.git;a=blob;f=libdwfl/
dwfl_frame_pc.c;h=296c815b9c73f42d79ac1778d2a0c420b89ee4eb;hb=HEAD

Cheers
-- 
Milian Wolff | milian.wo...@kdab.com | Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


Re: [PATCH] perf report: fix off-by-one for non-activation frames

2017-05-15 Thread Milian Wolff
On Monday, May 15, 2017 5:04:44 PM CEST Milian Wolff wrote:
> As the documentation for dwfl_frame_pc says, frames that
> are no activation frames need to have their program counter
> decremented by one to properly find the function of the caller.

Note that this leaves the perf build against libunwind in the current, broken 
state. I do not know how to detect the activation property there. Does anyone 
else? See elfutils source code for what it is doing:

https://sourceware.org/git/?p=elfutils.git;a=blob;f=libdwfl/
dwfl_frame_pc.c;h=296c815b9c73f42d79ac1778d2a0c420b89ee4eb;hb=HEAD

Cheers
-- 
Milian Wolff | milian.wo...@kdab.com | Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

smime.p7s
Description: S/MIME cryptographic signature


[PATCH] perf report: fix off-by-one for non-activation frames

2017-05-15 Thread Milian Wolff
As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
g++ -std=c++11 -g -O2 test.cpp
echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

When compiling perf using libdwfl for unwinding instead of libunwind
and having this patch here applied, the issue is fixed. The report
becomes much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Yao Jin 
Signed-off-by: Milian Wolff 
---
 tools/perf/util/unwind-libdw.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index f90e11a555b2..943a06291587 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -168,12 +168,16 @@ frame_callback(Dwfl_Frame *state, void *arg)
 {
struct unwind_info *ui = arg;
Dwarf_Addr pc;
+   bool isactivation;
 
-   if (!dwfl_frame_pc(state, , NULL)) {
+   if (!dwfl_frame_pc(state, , )) {
pr_err("%s", dwfl_errmsg(-1));
return DWARF_CB_ABORT;
}
 
+   if (!isactivation)
+   --pc;
+
return entry(pc, ui) || !(--ui->max_stack) ?
   DWARF_CB_ABORT : DWARF_CB_OK;
 }
-- 
2.13.0



[PATCH] perf report: fix off-by-one for non-activation frames

2017-05-15 Thread Milian Wolff
As the documentation for dwfl_frame_pc says, frames that
are no activation frames need to have their program counter
decremented by one to properly find the function of the caller.

This fixes many cases where perf report currently attributes
the cost to the next line. I.e. I have code like this:

~~~
  #include 
  #include 

  using namespace std;

  int main()
  {
this_thread::sleep_for(chrono::milliseconds(1000));
this_thread::sleep_for(chrono::milliseconds(100));
this_thread::sleep_for(chrono::milliseconds(10));

return 0;
  }
~~~

Now compile and record it:

~~~
g++ -std=c++11 -g -O2 test.cpp
echo 1 | sudo tee /proc/sys/kernel/sched_schedstats
perf record \
--event sched:sched_stat_sleep \
--event sched:sched_process_exit \
--event sched:sched_switch --call-graph=dwarf \
--output perf.data.raw \
./a.out
echo 0 | sudo tee /proc/sys/kernel/sched_schedstats
perf inject --sched-stat --input perf.data.raw --output perf.data
~~~

Before this patch, the report clearly shows the off-by-one issue.
Most notably, the last sleep invocation is incorrectly attributed
to the "return 0;" line:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:10
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:13
  __libc_start_main
  _start
~~~

When compiling perf using libdwfl for unwinding instead of libunwind
and having this patch here applied, the issue is fixed. The report
becomes much more usable:

~~~
  Overhead  Source:Line
    ...

   100.00%  core.c:0
|
---__schedule core.c:0
   schedule
   do_nanosleep hrtimer.c:0
   hrtimer_nanosleep
   sys_nanosleep
   entry_SYSCALL_64_fastpath .tmp_entry_64.o:0
   __nanosleep_nocancel .:0
   std::this_thread::sleep_for > 
thread:323
   |
   |--90.08%--main test.cpp:8
   |  __libc_start_main
   |  _start
   |
   |--9.01%--main test.cpp:9
   |  __libc_start_main
   |  _start
   |
--0.91%--main test.cpp:10
  __libc_start_main
  _start
~~~

Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Yao Jin 
Signed-off-by: Milian Wolff 
---
 tools/perf/util/unwind-libdw.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/unwind-libdw.c b/tools/perf/util/unwind-libdw.c
index f90e11a555b2..943a06291587 100644
--- a/tools/perf/util/unwind-libdw.c
+++ b/tools/perf/util/unwind-libdw.c
@@ -168,12 +168,16 @@ frame_callback(Dwfl_Frame *state, void *arg)
 {
struct unwind_info *ui = arg;
Dwarf_Addr pc;
+   bool isactivation;
 
-   if (!dwfl_frame_pc(state, , NULL)) {
+   if (!dwfl_frame_pc(state, , )) {
pr_err("%s", dwfl_errmsg(-1));
return DWARF_CB_ABORT;
}
 
+   if (!isactivation)
+   --pc;
+
return entry(pc, ui) || !(--ui->max_stack) ?
   DWARF_CB_ABORT : DWARF_CB_OK;
 }
-- 
2.13.0