Re: gcc: optimizations, and stack traces

2018-02-23 Thread matthew green
Andrew Cagney writes:
> On 23 February 2018 at 03:41, Maxime Villard  wrote:
> 
> > Many of our ASM functions don't push frames, but that's a different issue.
> 
> /me mumbles something about the assembler needing to be marked up with
> .cfi directives

yup -- with proper directives you can debug complex asm.

this should not require codegen to change for C code, only
ensuring that the compiler emits the right directives.
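
As a rough illustration of what such markup looks like, here is a minimal sketch of a hand-written amd64 routine carrying .cfi directives, embedded in a C file so it can be built and called. The routine add_one() is hypothetical and not taken from the NetBSD tree; the directives are the standard GNU as ones. On anything other than x86-64 ELF the sketch falls back to plain C.

```c
/* Hypothetical routine, used only to show the CFI markup. */
long add_one(long x);

#if defined(__x86_64__) && defined(__ELF__)
__asm__(
    ".pushsection .text\n"
    ".globl add_one\n"
    ".type add_one, @function\n"
    "add_one:\n"
    "    .cfi_startproc\n"              /* entry: CFA = rsp+8, ra at CFA-8 */
    "    pushq %rbp\n"
    "    .cfi_def_cfa_offset 16\n"      /* the push moved rsp: CFA = rsp+16 */
    "    .cfi_offset %rbp, -16\n"       /* caller's rbp is saved at CFA-16 */
    "    movq %rsp, %rbp\n"
    "    .cfi_def_cfa_register %rbp\n"  /* from here on: CFA = rbp+16 */
    "    leaq 1(%rdi), %rax\n"          /* return x + 1 */
    "    popq %rbp\n"
    "    .cfi_restore %rbp\n"           /* rbp holds the caller's value again */
    "    .cfi_def_cfa %rsp, 8\n"        /* frame is gone: CFA = rsp+8 */
    "    ret\n"
    "    .cfi_endproc\n"
    ".size add_one, .-add_one\n"
    ".popsection\n"
);
#else
/* Fallback so the sketch still builds on other targets. */
long add_one(long x) { return x + 1; }
#endif
```

With the directives in place, a debugger that reads .eh_frame or .debug_frame should be able to unwind from any instruction in add_one(), including the ones before the push.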


.mrg.


Re: gcc: optimizations, and stack traces

2018-02-23 Thread Andrew Cagney
Two things come to mind:

- was this the innermost (newest) frame?  If it wasn't something
earlier could be the problem
- is the dwarf debug info being used, or is it relying on heuristics
(annoyingly I can't spot an easy way to tell)

and when this happens, 'info frame' may help diagnose things.

Looking at:

>
> void
> kernfs_get_rrootdev(void)
> {
>     static int tried = 0;
>
>     if (tried) {
>         /* Already did it once. */
>         return;
>     }
>     tried = 1;
>
>     if (rootdev == NODEV)
>         return;
>     rrootdev = devsw_blk2chr(rootdev);
>     if (rrootdev != NODEV)
>         return;
>     rrootdev = NODEV;
>     printf("kernfs_get_rrootdev: no raw root device\n");
> }

I get:

043c <kernfs_get_rrootdev>:
 43c:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 442
 442:   85 c0                   test   %eax,%eax
 444:   75 2e                   jne    474
 446:   c7 05 00 00 00 00 01    movl   $0x1,0x0(%rip)        # 450
 44d:   00 00 00
 450:   48 8b 3d 00 00 00 00    mov    0x0(%rip),%rdi        # 457
 457:   48 83 ff ff             cmp    $0xffffffffffffffff,%rdi
 45b:   74 17                   je     474
 45d:   55                      push   %rbp
 45e:   48 89 e5                mov    %rsp,%rbp
 ...

and has CFI (readelf --debug-dump=frames-interp
amd64/sys/arch/amd64/compile/GENERIC/kernfs_vfsops.o):

01a4 0028 01a8 FDE cie=
pc=043c..0484
   LOC   CFA      rbp   ra
043c     rsp+8    u     c-8
045e     rsp+16   c-16  c-8
...

so, until the push, the CFI hasn't specified RBP, but a reasonable
interpretation is its current value.  So this, to me, looks ok.
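
The two table rows above can be read mechanically. The sketch below uses a made-up struct cfi_row and helper names (none of them are part of GDB or readelf) to pick the row covering a given pc and compute where the return address is stored. It models only the two rows shown, not the rest of the table, and treats 'u' as "rbp still holds the caller's value".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One interpreted CFI row; the layout is an assumption for illustration. */
struct cfi_row {
    uint64_t loc;       /* first address the row applies to */
    int64_t  cfa_off;   /* CFA = rsp + cfa_off */
    int      rbp_saved; /* 0 means 'u': rbp has not been saved yet */
    int64_t  rbp_off;   /* if saved: caller's rbp is at CFA + rbp_off */
    int64_t  ra_off;    /* return address is at CFA + ra_off */
};

/* The two rows quoted from the readelf output, in ascending order. */
static const struct cfi_row kernfs_rows[] = {
    { 0x43c,  8, 0,   0, -8 },
    { 0x45e, 16, 1, -16, -8 },
};

/* Return the last row whose start address is <= pc, or NULL. */
static const struct cfi_row *
cfi_lookup(uint64_t pc)
{
    const struct cfi_row *best = NULL;
    size_t i;

    for (i = 0; i < sizeof(kernfs_rows) / sizeof(kernfs_rows[0]); i++)
        if (kernfs_rows[i].loc <= pc)
            best = &kernfs_rows[i];
    return best;
}

/* Address of the stack slot holding the return address. */
uint64_t
return_address_slot(uint64_t pc, uint64_t rsp)
{
    const struct cfi_row *row = cfi_lookup(pc);

    assert(row != NULL);
    return rsp + row->cfa_off + row->ra_off;
}
```

Before the push (say pc 0x450) the slot is simply rsp; at 0x45e, with rsp 8 bytes lower, the second row yields the same slot, which is why the unwind stays correct across the delayed frame setup.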


Re: gcc: optimizations, and stack traces

2018-02-23 Thread Andrew Cagney
On 23 February 2018 at 03:41, Maxime Villard  wrote:

> Many of our ASM functions don't push frames, but that's a different issue.

/me mumbles something about the assembler needing to be marked up with
.cfi directives


Re: gcc: optimizations, and stack traces

2018-02-23 Thread Maxime Villard

On 18/02/2018 at 21:37, Maxime Villard wrote:
> On 11/02/2018 at 12:04, Krister Walfridsson wrote:
> > On Sun, Feb 11, 2018 at 9:11 AM, Maxime Villard wrote:
> > > [...] we need to find a way
> > > to tell GCC to always push the frame at the beginning of the functions.
> >
> > This is done by passing the -fno-shrink-wrap flag to GCC.
> >
> > /Krister
>
> Sorry for the delay; I tested a week ago with -fno-shrink-wrap and it didn't
> change anything. I'll retry properly soon.


I re-tested properly, and indeed it works. I could verify that the frame is
always pushed at the very beginning of the functions.

Many of our ASM functions don't push frames, but that's a different issue.

Thanks for the info,
Maxime


Re: gcc: optimizations, and stack traces

2018-02-18 Thread Maxime Villard

On 11/02/2018 at 12:04, Krister Walfridsson wrote:
> On Sun, Feb 11, 2018 at 9:11 AM, Maxime Villard wrote:
> > [...] we need to find a way
> > to tell GCC to always push the frame at the beginning of the functions.
>
> This is done by passing the -fno-shrink-wrap flag to GCC.
>
> /Krister


Sorry for the delay; I tested a week ago with -fno-shrink-wrap and it didn't
change anything. I'll retry properly soon.

Thanks,
Maxime


Re: gcc: optimizations, and stack traces

2018-02-11 Thread Joerg Sonnenberger
On Sun, Feb 11, 2018 at 04:13:56PM +0700, Robert Elz wrote:
> Date:        Sun, 11 Feb 2018 09:11:45 +0100
> From:        Maxime Villard
> Message-ID:  <2c83e9d9-f49c-479b-7a4c-1df581a2b...@m00nbsd.net>
> 
>   | So we have the same problem, and we need to find a way
>   | to tell GCC to always push the frame at the beginning of the functions.
> 
> Either that or the stack unwind code needs to become smarter - which
> would be a better solution, as it avoids dropping the (admittedly minor)
> benefit obtained from deferring the frame pointer update (which, to be a
> useful solution, would need to be universal) and adds a (not insignificant)
> cost to the stack unwind code - but performance there usually does
> not matter.

Again, the logic for that already exists. -fomit-frame-pointer would not
be acceptable otherwise.

Joerg


Re: gcc: optimizations, and stack traces

2018-02-11 Thread Krister Walfridsson
On Sun, Feb 11, 2018 at 9:11 AM, Maxime Villard  wrote:
> [...] we need to find a way
> to tell GCC to always push the frame at the beginning of the functions.

This is done by passing the -fno-shrink-wrap flag to GCC.

   /Krister
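
One way to see the effect of the flag is to compile a small function that has early exits ahead of its first call. The function below is a hypothetical stand-in for kernfs_get_rrootdev(), not kernel code. Building it with something like `gcc -O2 -fno-omit-frame-pointer -S`, and then again with `-fno-shrink-wrap` added, and comparing where the frame pointer push lands in the two listings should show the difference; the exact output depends on the GCC version and target, so treat this as an experiment rather than a guaranteed result.

```c
#include <stdio.h>

static int tried;

/* Hypothetical example: two cheap early exits, then one real call. */
int
report_once(int value)
{
    if (tried)
        return 0;       /* early exit, no call made on this path */
    tried = 1;

    if (value < 0)
        return 0;       /* early exit, no call made on this path */

    /* First point where the function actually calls something. */
    printf("report_once: value %d\n", value);
    return 1;
}
```

With shrink-wrapping enabled (the default at -O and higher), the compiler is allowed to place the prologue just before the printf() path; with -fno-shrink-wrap it should stay at function entry.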


Re: gcc: optimizations, and stack traces

2018-02-11 Thread Robert Elz
Date:        Sun, 11 Feb 2018 09:11:45 +0100
From:        Maxime Villard
Message-ID:  <2c83e9d9-f49c-479b-7a4c-1df581a2b...@m00nbsd.net>

  | So we have the same problem, and we need to find a way
  | to tell GCC to always push the frame at the beginning of the functions.

Either that or the stack unwind code needs to become smarter - which
would be a better solution, as it avoids dropping the (admittedly minor)
benefit obtained from deferring the frame pointer update (which, to be a
useful solution, would need to be universal) and adds a (not insignificant)
cost to the stack unwind code - but performance there usually does
not matter.

We know all the information needed to unwind the stack correctly is
available, as (assuming gdb is not involved at all, and the code just
executes) the stack is correctly unwound when the function returns.

The only issue then is finding it - either the frame pointer or the stack
pointer references the current frame, the return address is obtained
from one or the other (and if needed, so is the previous frame pointer).

We know where the current function starts - the lookup of rip gave us
that info along with the function name, so we can look at that and find
the location where the frame pointer push happens (gdb already knows
how to decode instructions), we should be able to work out (by
simulating the instructions from function entry if needed) whether or
not the frame pointer push happened - at least with a fairly high
degree of confidence - enough to usually get things right.  Once
we know that, we have all the info needed to unwind properly.

Of course, this is no trivial amount of work, but it should be possible
to achieve something reasonable, if you really need it.
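
A very crude version of that idea can be sketched as follows. Instead of decoding instructions, it just searches the function's bytes for the amd64 sequence `push %rbp; mov %rsp,%rbp` (55 48 89 e5) and compares its position with the faulting pc. The names are made up, the byte array is the start of kernfs_get_rrootdev as disassembled elsewhere in this thread, and the sketch assumes nothing else touches %rsp before the push. A real unwinder would have to decode instructions properly, since a raw byte search can match in the middle of another instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Where the return address can be found, relative to the live registers. */
enum ra_location { RA_AT_RSP, RA_AT_RSP_PLUS_8, RA_AT_RBP_PLUS_8 };

/* First 37 bytes of kernfs_get_rrootdev (amd64), through the frame setup. */
static const unsigned char kernfs_entry[] = {
    0x8b, 0x05, 0x00, 0x00, 0x00, 0x00,             /* mov  0x0(%rip),%eax */
    0x85, 0xc0,                                     /* test %eax,%eax */
    0x75, 0x2e,                                     /* jne  (early return) */
    0xc7, 0x05, 0x00, 0x00, 0x00, 0x00,
    0x01, 0x00, 0x00, 0x00,                         /* movl $0x1,0x0(%rip) */
    0x48, 0x8b, 0x3d, 0x00, 0x00, 0x00, 0x00,       /* mov  0x0(%rip),%rdi */
    0x48, 0x83, 0xff, 0xff,                         /* cmp  $-1,%rdi */
    0x74, 0x17,                                     /* je   (early return) */
    0x55,                                           /* push %rbp */
    0x48, 0x89, 0xe5,                               /* mov  %rsp,%rbp */
};

/*
 * pc_off is the offset, from function entry, of the instruction that was
 * about to execute when the fault happened.
 */
enum ra_location
locate_return_address(const unsigned char *code, size_t len, size_t pc_off)
{
    static const unsigned char setup[] = { 0x55, 0x48, 0x89, 0xe5 };
    size_t i;

    assert(code != NULL);
    for (i = 0; i + sizeof(setup) <= len; i++) {
        if (memcmp(code + i, setup, sizeof(setup)) != 0)
            continue;
        if (pc_off <= i)
            return RA_AT_RSP;           /* push has not executed yet */
        if (pc_off < i + sizeof(setup))
            return RA_AT_RSP_PLUS_8;    /* pushed, but rbp not updated */
        return RA_AT_RBP_PLUS_8;        /* frame is fully set up */
    }
    /* No frame setup found: assume a frameless function (a guess). */
    return RA_AT_RSP;
}
```

For the bytes above the push sits at offset 33, so any fault in the "tried" or "rootdev" checks reports RA_AT_RSP, and only offsets past the mov report an rbp-based frame.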

Personally, I've never found this kind of thing (including when the
compiler has optimised out tail recursion, and similar) all that
"impossible to debug" - somewhere back in the stack trace you
get accurate information about where a call originated.  From the
deepest such point you can work out what was called (it is either
obvious, or generally possible to determine by looking at register/mem
contents).  From that, and from knowledge of where the code was when
it stopped, it is usually not all that hard to determine what must have
happened - and in the investigation you tend to be forced to look at the
code involved so closely that you sometimes encounter the bug that
was the cause of the problem before you were actually ready to start
looking for it.

kre



Re: gcc: optimizations, and stack traces

2018-02-11 Thread Maxime Villard

On 09/02/2018 at 13:32, Joerg Sonnenberger wrote:
> On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
> > It implies that if a bug occurs _before_ these two instructions are executed,
> > we have a %rbp that points to the _previous_ function, the one we got called
> > from. And therefore, GDB does not display the current function (where the bug
> > actually happened), but displays its caller.
>
> This analysis is wrong. GDB will first of all look for frame annotation
> data, i.e. .eh_frame or the corresponding .debug_frame. Only if it can't
> find such annotation will it fall back to guessing from the function
> itself. We default to building .eh_frame for all binaries, but I'm not
> completely sure if GCC will create async unwind tables by default.


I've investigated the issue. My analysis was only partly incorrect.

In fact, GDB _does_ display the current function: it reads the %rip from which
we faulted, and finds the function name by looking at the symbol table.

However, it may not display the caller of the function. In order to obtain
the caller GDB will iterate as I said:

    /* [the current function was displayed] */
    uint64_t *rbp = read_rbp();
    uint64_t rip;

    while (1) {
        if (rbp == NULL)
            break; /* End of the chain */
        rip = *(rbp + 1);   /* return address sits just above the saved %rbp */
        name = find_function_from_rip(rip); /* whatever */
        print(name);
        rbp = (uint64_t *)*rbp;   /* follow the saved %rbp to the next frame */
    }

Here, in the first iteration, %rbp points to the frame the caller pushed, and
therefore it indicates the %rip of the caller of the caller. But the %rip of
the caller itself is skipped.

If you add a global function that dereferences a pointer _before_ pushing a
frame, and then call this function from sys_rasctl(), the GDB trace you get
is:

my_deref_func()
syscall()

sys_rasctl is missing. So we have the same problem, and we need to find a way
to tell GCC to always push the frame at the beginning of the functions.
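
The skipped caller can be reproduced with a small simulation of the %rbp chain. This is only a sketch: struct frame, walk_frames() and the addresses used with it are invented for illustration and have nothing to do with GDB's real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What "push %rbp; mov %rsp,%rbp" leaves on the stack, seen from %rbp. */
struct frame {
    const struct frame *prev;   /* saved %rbp of the caller */
    uintptr_t ret;              /* return address into the caller */
};

/*
 * Collect return addresses by following the saved-%rbp chain, the same
 * way the loop above does.  Returns the number of entries written.
 */
size_t
walk_frames(const struct frame *rbp, uintptr_t *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (rbp != NULL && n < max) {
        out[n++] = rbp->ret;
        rbp = rbp->prev;
    }
    return n;
}
```

Suppose syscall() and sys_rasctl() have both pushed frames, and my_deref_func() faults before pushing its own. %rbp then still points at the frame sys_rasctl() pushed, whose `ret` field is an address inside syscall(). Walking from there yields syscall() and whatever called it; the return address into sys_rasctl(), which sits at (%rsp), is never read.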


On 09/02/2018 at 12:13, Valery Ushakov wrote:
> Does gcc actually generate code like that?  I thought that it can
> delay frame pointer creation, but only until it needs to make a nested
> call, to C in your example, (as in the sample I showed in another mail
> to this thread).


Indeed, it can't generate code like that. I was confused, because I had
specific requirements when I first investigated this (getting a trace across
interrupts).

Maxime


Re: gcc: optimizations, and stack traces

2018-02-09 Thread Maxime Villard

On 09/02/2018 at 13:32, Joerg Sonnenberger wrote:
> On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
> > It implies that if a bug occurs _before_ these two instructions are executed,
> > we have a %rbp that points to the _previous_ function, the one we got called
> > from. And therefore, GDB does not display the current function (where the bug
> > actually happened), but displays its caller.
>
> This analysis is wrong. GDB will first of all look for frame annotation
> data, i.e. .eh_frame or the corresponding .debug_frame. Only if it can't
> find such annotation will it fall back to guessing from the function
> itself. We default to building .eh_frame for all binaries, but I'm not
> completely sure if GCC will create async unwind tables by default.


I'll have to re-check the GDB code, but that the previous function was
displayed and not the current one is the conclusion I came to back then. Will
verify tomorrow.

Maxime


Re: gcc: optimizations, and stack traces

2018-02-09 Thread Maxime Villard

On 09/02/2018 at 12:13, Valery Ushakov wrote:
> [Summoning Krister]
>
> On Fri, Feb 09, 2018 at 11:23:17 +0100, Maxime Villard wrote:
>
> > There are also several cases where functions in the call tree can disappear
> > from the backtrace. In the following call tree:
> >
> >   A -> B -> C -> D   (and D panics)
> >
> > if, in B, GCC put the two instructions after the instruction that calls C,
> > the backtrace will be:
> >
> >   A -> C -> D
> >
> > This can make a bug completely undebuggable.
>
> Does gcc actually generate code like that?  I thought that it can
> delay frame pointer creation, but only until it needs to make a nested
> call, to C in your example, (as in the sample I showed in another mail
> to this thread).


Mmh, now I'm not so sure about this. Wait a minute, I'll take another look and
try to understand what I was doing.

Maxime


Re: gcc: optimizations, and stack traces

2018-02-09 Thread Joerg Sonnenberger
On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
> It implies that if a bug occurs _before_ these two instructions are executed,
> we have a %rbp that points to the _previous_ function, the one we got called
> from. And therefore, GDB does not display the current function (where the bug
> actually happened), but displays its caller.

This analysis is wrong. GDB will first of all look for frame annotation
data, i.e. .eh_frame or the corresponding .debug_frame. Only if it can't
find such annotation will it fall back to guessing from the function
itself. We default to building .eh_frame for all binaries, but I'm not
completely sure if GCC will create async unwind tables by default.

Joerg


Re: gcc: optimizations, and stack traces

2018-02-09 Thread Valery Ushakov
[Summoning Krister]

On Fri, Feb 09, 2018 at 11:23:17 +0100, Maxime Villard wrote:

> There are also several cases where functions in the call tree can disappear
> from the backtrace. In the following call tree:
> 
>   A -> B -> C -> D   (and D panics)
> 
> if, in B, GCC put the two instructions after the instruction that calls C,
> the backtrace will be:
> 
>   A -> C -> D
> 
> This can make a bug completely undebuggable.

Does gcc actually generate code like that?  I thought that it can
delay frame pointer creation, but only until it needs to make a nested
call, to C in your example, (as in the sample I showed in another mail
to this thread).

-uwe


Re: gcc: optimizations, and stack traces

2018-02-09 Thread Maxime Villard

On 09/02/2018 at 12:08, Valery Ushakov wrote:
> On Fri, Feb 09, 2018 at 11:38:47 +0100, Martin Husemann wrote:
>
> > On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
> >
> > > When I spotted this several months ago (while developing Live
> > > Kernel ASLR), I tried to look for GCC options that say "optimize
> > > with -O2, but keep the stack trace intact". I couldn't find one,
> > > and the only thing I ended up doing was disabling -O2 in the
> > > makefiles.
> >
> > -fno-omit-frame-pointer?
>
> That won't help.
>
>   `-O' also turns on `-fomit-frame-pointer' on machines where doing
>   so does not interfere with debugging.
>
> so it's not turned off in the first place.  The problem is that some
> of the later optimization passes may push frame pointer setup to some
> place later in function.  E.g. on -7
>
>     void
>     kernfs_get_rrootdev(void)
>     {
>         static int tried = 0;
>
>         if (tried) {
>             /* Already did it once. */
>             return;
>         }
>         tried = 1;
>
>         if (rootdev == NODEV)
>             return;
>         rrootdev = devsw_blk2chr(rootdev);
>         if (rrootdev != NODEV)
>             return;
>         rrootdev = NODEV;
>         printf("kernfs_get_rrootdev: no raw root device\n");
>     }
>
> is compiled to
>
>     c068f81b <kernfs_get_rrootdev>:
>     c068f81b:   mov    0xc0fc6b40,%eax
>     c068f820:   test   %eax,%eax
>     c068f822:   jne    c068f867
>     c068f824:   movl   $0x1,0xc0fc6b40
>     c068f82e:   mov    0xc0fde0b8,%edx
>     c068f834:   mov    0xc0fde0bc,%eax
>     c068f839:   mov    %edx,%ecx
>     c068f83b:   and    %eax,%ecx
>     c068f83d:   cmp    $0xffffffff,%ecx
>     c068f840:   je     c068f867
> ->  c068f842:   push   %ebp
> ->  c068f843:   mov    %esp,%ebp
>     c068f845:   sub    $0x8,%esp
>     c068f848:   mov    %edx,(%esp)
>     c068f84b:   mov    %eax,0x4(%esp)
>     c068f84f:   call   c091ce52


Yes, exactly. -fno-omit-frame-pointer doesn't change anything here, GCC
does not omit the frame pointer but moves the instructions a little later
in the function.

So we need to find a way to keep the two instructions at the beginning...

Maxime


Re: gcc: optimizations, and stack traces

2018-02-09 Thread Valery Ushakov
On Fri, Feb 09, 2018 at 11:38:47 +0100, Martin Husemann wrote:

> On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
>
> > When I spotted this several months ago (while developing Live
> > Kernel ASLR), I tried to look for GCC options that say "optimize
> > with -O2, but keep the stack trace intact". I couldn't find one,
> > and the only thing I ended up doing was disabling -O2 in the
> > makefiles.
> 
> -fno-omit-frame-pointer?

That won't help.

 `-O' also turns on `-fomit-frame-pointer' on machines where doing
 so does not interfere with debugging.

so it's not turned off in the first place.  The problem is that some
of the later optimization passes may push frame pointer setup to some
place later in function.  E.g. on -7 

void
kernfs_get_rrootdev(void)
{
    static int tried = 0;

    if (tried) {
        /* Already did it once. */
        return;
    }
    tried = 1;

    if (rootdev == NODEV)
        return;
    rrootdev = devsw_blk2chr(rootdev);
    if (rrootdev != NODEV)
        return;
    rrootdev = NODEV;
    printf("kernfs_get_rrootdev: no raw root device\n");
}

is compiled to 

    c068f81b <kernfs_get_rrootdev>:
    c068f81b:   mov    0xc0fc6b40,%eax
    c068f820:   test   %eax,%eax
    c068f822:   jne    c068f867
    c068f824:   movl   $0x1,0xc0fc6b40
    c068f82e:   mov    0xc0fde0b8,%edx
    c068f834:   mov    0xc0fde0bc,%eax
    c068f839:   mov    %edx,%ecx
    c068f83b:   and    %eax,%ecx
    c068f83d:   cmp    $0xffffffff,%ecx
    c068f840:   je     c068f867
->  c068f842:   push   %ebp
->  c068f843:   mov    %esp,%ebp
    c068f845:   sub    $0x8,%esp
    c068f848:   mov    %edx,(%esp)
    c068f84b:   mov    %eax,0x4(%esp)
    c068f84f:   call   c091ce52

So the "tried" check and the first "rootdev" check happen before the
frame pointer is set up.

-uwe


Re: gcc: optimizations, and stack traces

2018-02-09 Thread Martin Husemann
On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
> When I spotted this several months ago (while developing Live Kernel ASLR), I
> tried to look for GCC options that say "optimize with -O2, but keep the stack
> trace intact". I couldn't find one, and the only thing I ended up doing was
> disabling -O2 in the makefiles.

-fno-omit-frame-pointer?

Martin