Re: gcc: optimizations, and stack traces
Andrew Cagney writes:
> On 23 February 2018 at 03:41, Maxime Villard wrote:
>
> > Many of our ASM functions don't push frames, but that's a different issue.
>
> /me mumbles something about the assembler needing to be marked up with
> .cfi directives

Yup. With proper directives you can debug complex asm. This should not require changing code generation for C code, only ensuring that the compiler emits the right directives.

.mrg.
Re: gcc: optimizations, and stack traces
Two things come to mind:

- Was this the innermost (newest) frame? If it wasn't, something earlier could be the problem.
- Is the DWARF debug info being used, or is GDB relying on heuristics? (Annoyingly, I can't spot an easy way to tell.)

When this happens, 'info frame' may help diagnose things.

Looking at:

> void
> kernfs_get_rrootdev(void)
> {
>         static int tried = 0;
>
>         if (tried) {
>                 /* Already did it once. */
>                 return;
>         }
>         tried = 1;
>
>         if (rootdev == NODEV)
>                 return;
>         rrootdev = devsw_blk2chr(rootdev);
>         if (rrootdev != NODEV)
>                 return;
>         rrootdev = NODEV;
>         printf("kernfs_get_rrootdev: no raw root device\n");
> }

I get:

    043c :
     43c:  8b 05 00 00 00 00       mov    0x0(%rip),%eax    # 442
     442:  85 c0                   test   %eax,%eax
     444:  75 2e                   jne    474
     446:  c7 05 00 00 00 00 01    movl   $0x1,0x0(%rip)    # 450
     44d:  00 00 00
     450:  48 8b 3d 00 00 00 00    mov    0x0(%rip),%rdi    # 457
     457:  48 83 ff ff             cmp    $0x,%rdi
     45b:  74 17                   je     474
     45d:  55                      push   %rbp
     45e:  48 89 e5                mov    %rsp,%rbp
     ...

and it has CFI (readelf --debug-dump=frames-interp amd64/sys/arch/amd64/compile/GENERIC/kernfs_vfsops.o):

    01a4 0028 01a8 FDE cie= pc=043c..0484
       LOC   CFA      rbp   ra
    043c     rsp+8    u     c-8
    045e     rsp+16   c-16  c-8
    ...

So, until the push, the CFI hasn't specified RBP, but a reasonable interpretation is its current value. So this, to me, looks OK.
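To illustrate how the two CFI rows above are read, here is a minimal C sketch. It is not GDB's implementation: the struct layout and the names cfi_row, cfi_lookup and ra_slot are made up for this example. Only the row values and the FDE range (pc=043c..0484) come from the readelf output.

```c
#include <stddef.h>
#include <stdint.h>

/* How a register is recovered in a given CFI row. */
enum rule { RULE_UNDEFINED, RULE_AT_CFA_OFFSET };

/* One row of the interpreted frame table (hypothetical layout). */
struct cfi_row {
    uint64_t  loc;       /* first pc this row applies to */
    int64_t   cfa_off;   /* CFA = rsp + cfa_off */
    enum rule rbp_rule;  /* "u" or "c-16" in the readelf output */
    int64_t   rbp_off;   /* offset from CFA when rbp_rule is RULE_AT_CFA_OFFSET */
    int64_t   ra_off;    /* return address is stored at CFA + ra_off */
};

/* FDE range and rows quoted above for kernfs_get_rrootdev. */
#define FDE_START 0x043c
#define FDE_END   0x0484

static const struct cfi_row rows[] = {
    { 0x043c,  8, RULE_UNDEFINED,       0, -8 },
    { 0x045e, 16, RULE_AT_CFA_OFFSET, -16, -8 },
};

/* Return the last row whose loc is <= pc, or NULL if pc is outside the FDE. */
static const struct cfi_row *
cfi_lookup(uint64_t pc)
{
    const struct cfi_row *found = NULL;

    if (pc < FDE_START || pc >= FDE_END)
        return NULL;
    for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
        if (rows[i].loc <= pc)
            found = &rows[i];
    }
    return found;
}

/* Address of the stack slot holding the return address, given rsp. */
static uint64_t
ra_slot(const struct cfi_row *row, uint64_t rsp)
{
    return rsp + (uint64_t)(row->cfa_off + row->ra_off);
}
```

With these rows, any pc before 045e finds the return address at (%rsp) and leaves %rbp untouched, while a pc at or after 045e finds it at 8(%rsp). That is why the unwinder does not need the frame pointer to have been pushed yet.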
Re: gcc: optimizations, and stack traces
On 23 February 2018 at 03:41, Maxime Villard wrote:
> Many of our ASM functions don't push frames, but that's a different issue.

/me mumbles something about the assembler needing to be marked up with .cfi directives
Re: gcc: optimizations, and stack traces
On 18/02/2018 at 21:37, Maxime Villard wrote:
> On 11/02/2018 at 12:04, Krister Walfridsson wrote:
> > On Sun, Feb 11, 2018 at 9:11 AM, Maxime Villard wrote:
> > > [...] we need to find a way to tell GCC to always push the frame at
> > > the beginning of the functions.
> >
> > This is done by passing the -fno-shrink-wrap flag to GCC.
> >
> > /Krister
>
> Sorry for the delay; I tested a week ago with -fno-shrink-wrap and it
> didn't change anything. I'll retry properly soon.

I re-tested properly, and indeed it works. I could verify that the frame is always pushed at the very beginning of the functions.

Many of our ASM functions don't push frames, but that's a different issue.

Thanks for the info,
Maxime
Re: gcc: optimizations, and stack traces
On 11/02/2018 at 12:04, Krister Walfridsson wrote:
> On Sun, Feb 11, 2018 at 9:11 AM, Maxime Villard wrote:
> > [...] we need to find a way to tell GCC to always push the frame at
> > the beginning of the functions.
>
> This is done by passing the -fno-shrink-wrap flag to GCC.
>
> /Krister

Sorry for the delay; I tested a week ago with -fno-shrink-wrap and it didn't change anything. I'll retry properly soon.

Thanks,
Maxime
Re: gcc: optimizations, and stack traces
On Sun, Feb 11, 2018 at 04:13:56PM +0700, Robert Elz wrote:
> Date: Sun, 11 Feb 2018 09:11:45 +0100
> From: Maxime Villard
> Message-ID: <2c83e9d9-f49c-479b-7a4c-1df581a2b...@m00nbsd.net>
>
> | So we have the same problem, and we need to find a way
> | to tell GCC to always push the frame at the beginning of the functions.
>
> Either that or the stack unwind code needs to become smarter - which
> would be a better solution, as it avoids dropping (the admittedly minor)
> benefit obtained from deferring the frame pointer update (which to be a
> useful solution would need to be universal) and adds a (not insignificant)
> cost to the stack unwind code - but performance there usually does
> not matter.

Again, the logic for that already exists. -fomit-frame-pointer would not be acceptable otherwise.

Joerg
Re: gcc: optimizations, and stack traces
On Sun, Feb 11, 2018 at 9:11 AM, Maxime Villard wrote:
> [...] we need to find a way
> to tell GCC to always push the frame at the beginning of the functions.

This is done by passing the -fno-shrink-wrap flag to GCC.

/Krister
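One way to check the effect of the flag is to compile a small test file twice and compare the disassembly. The file below is a hypothetical stand-in shaped like kernfs_get_rrootdev (cheap early returns, one call on the slow path); the names wrap.c and report_once are made up. Whether the prologue actually moves depends on the GCC version and target, so treat the commands as something to try rather than a guaranteed result.

```c
/*
 * wrap.c: hypothetical test file for comparing prologue placement.
 *
 * Build twice and compare where "push %rbp" lands, for example:
 *   gcc -O2 -fno-omit-frame-pointer -c wrap.c && objdump -d wrap.o
 *   gcc -O2 -fno-omit-frame-pointer -fno-shrink-wrap -c wrap.c && objdump -d wrap.o
 */
#include <stdio.h>

static int tried;

/* The early-return paths need no frame; only the slow path makes a call. */
int
report_once(int value)
{
    if (tried) {
        /* Already did it once. */
        return 0;
    }
    tried = 1;

    if (value < 0)
        return 0;
    printf("report_once: value %d\n", value);
    return 1;
}
```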
Re: gcc: optimizations, and stack traces
Date: Sun, 11 Feb 2018 09:11:45 +0100
From: Maxime Villard
Message-ID: <2c83e9d9-f49c-479b-7a4c-1df581a2b...@m00nbsd.net>

| So we have the same problem, and we need to find a way
| to tell GCC to always push the frame at the beginning of the functions.

Either that, or the stack unwind code needs to become smarter, which would be the better solution: it keeps the (admittedly minor) benefit obtained from deferring the frame pointer update (giving that up would only be a useful solution if done universally), at the price of a (not insignificant) cost added to the stack unwind code, where performance usually does not matter.

We know all the information needed to unwind the stack correctly is available, as (assuming gdb is not involved at all, and the code just executes) the stack is correctly unwound when the function returns. The only issue then is finding it: either the frame pointer or the stack pointer references the current frame, and the return address is obtained from one or the other (and if needed, so is the previous frame pointer).

We know where the current function starts, since the lookup of rip gave us that info along with the function name, so we can look at that and find the location where the frame pointer push happens (gdb already knows how to decode instructions). We should be able to work out (by simulating the instructions from function entry if needed) whether or not the frame pointer push happened, at least with a fairly high degree of confidence, enough to usually get things right. Once we know that, we have all the info needed to unwind properly.

Of course, this is no trivial amount of work, but it should be possible to achieve something reasonable, if you really need it.

Personally, I've never found this kind of thing (including when the compiler has optimised out tail recursion, and similar) all that "impossible to debug": somewhere back in the stack trace you get accurate information about where a call originated. From the deepest such point you can work out what was called (it is either obvious, or generally possible to determine by looking at register/mem contents). From that, and from knowledge of where the code was when it stopped, it is usually not all that hard to determine what must have happened. In the investigation you tend to be forced to look at the code involved so closely that you sometimes encounter the bug that was the cause of the problem before you were actually ready to start looking for it.

kre
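A rough C sketch of the simplest form of that idea follows: locate "push %rbp; mov %rsp,%rbp" in the function's bytes and compare the faulting pc against it. Everything here is an assumption made for illustration (the names find_prologue and classify, and the naive byte scan in place of real instruction decoding). It also judges by position only, so it gives the wrong answer for code that sits after the prologue but is reached by a jump over it, such as the early-return path in kernfs_get_rrootdev. Handling that is what simulating the instructions from function entry (or using the CFI) is for.

```c
#include <stddef.h>
#include <stdint.h>

enum frame_state {
    FRAME_NOT_PUSHED,   /* return address is at (%rsp) */
    FRAME_PUSHED,       /* push done, mov not yet: return address at 8(%rsp) */
    FRAME_ESTABLISHED,  /* return address is at 8(%rbp) */
    FRAME_UNKNOWN       /* no prologue found in the function */
};

/*
 * Find the offset of "push %rbp" (55) followed by "mov %rsp,%rbp"
 * (48 89 e5) with a naive byte scan. A real unwinder would decode
 * instructions, since 0x55 can also occur inside operand bytes.
 * Returns -1 if the pattern is not present.
 */
static long
find_prologue(const uint8_t *code, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (code[i] == 0x55 && code[i + 1] == 0x48 &&
            code[i + 2] == 0x89 && code[i + 3] == 0xe5)
            return (long)i;
    }
    return -1;
}

/*
 * Guess the frame state for a pc at offset pc_off from function entry,
 * assuming straight-line execution from the entry point.
 */
static enum frame_state
classify(const uint8_t *code, size_t len, size_t pc_off)
{
    long p = find_prologue(code, len);

    if (p < 0)
        return FRAME_UNKNOWN;
    if (pc_off <= (size_t)p)
        return FRAME_NOT_PUSHED;   /* push not executed yet */
    if (pc_off == (size_t)p + 1)
        return FRAME_PUSHED;       /* between push and mov */
    return FRAME_ESTABLISHED;
}
```

Once the state is known, the unwinder knows whether to read the return address relative to %rsp or to %rbp, which is the piece of information the plain frame-pointer walk lacks.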
Re: gcc: optimizations, and stack traces
On 09/02/2018 at 13:32, Joerg Sonnenberger wrote:
> On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
> > It implies that if a bug occurs _before_ these two instructions are
> > executed, we have a %rbp that points to the _previous_ function, the one
> > we got called from. And therefore, GDB does not display the current
> > function (where the bug actually happened), but displays its caller.
>
> This analysis is wrong. GDB will first of all look for frame annotation
> data, i.e. .eh_frame or the corresponding .debug_frame. Only if it can't
> find such annotation will it fall back to guessing from the function
> itself. We default to building .eh_frame for all binaries, but I'm not
> completely sure if GCC will create async unwind tables by default.

I've investigated the issue. My analysis was only partly incorrect. In fact, GDB _does_ display the current function: it reads the %rip from which we faulted, and finds the function name by looking at the symbol table.

However, it may not display the caller of the function. In order to obtain the caller GDB will iterate as I said:

    [the current function was displayed]
    uint64_t *rbp = read_rbp();
    uint64_t rip;
    while (1) {
        if (rbp == NULL)
            break; /* End of the chain */
        rip = *(rbp + 1); /* return address, just above the saved %rbp */
        name = find_function_from_rip(rip); /* whatever */
        print(name);
        rbp = (uint64_t *)*rbp; /* follow the saved %rbp to the next frame */
    }

Here, in the first iteration, %rbp points to the frame the caller pushed, and therefore it indicates the %rip of the caller of the caller. But the %rip of the caller itself is skipped.

If you add a global function that dereferences a pointer _before_ pushing a frame, and then call this function from sys_rasctl(), the GDB trace you get is:

    my_deref_func()
    syscall()

sys_rasctl is missing. So we have the same problem, and we need to find a way to tell GCC to always push the frame at the beginning of the functions.

On 09/02/2018 at 12:13, Valery Ushakov wrote:
> Does gcc actually generate code like that? I thought that it can delay
> frame pointer creation, but only until it needs to make a nested call, to
> C in your example, (as in the sample I showed in another mail to this
> thread).

Indeed, it can't generate code like that. I was confused, because I had specific requirements when I first investigated this (getting a trace across interrupts).

Maxime
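The skipped caller can be reproduced on a simulated stack. The sketch below is an illustration under assumptions, not GDB code: the stack is an array of slots, a saved %rbp is stored as a slot index (0 marks the end of the chain), and walk_frames and the sample return addresses are made up.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a saved-%rbp chain held in a simulated stack.
 * stack[rbp] holds the previous frame's %rbp (a slot index, 0 = end of
 * chain) and stack[rbp + 1] holds the return address pushed by the call.
 * Stores the return addresses into out[] and returns how many were found.
 */
static size_t
walk_frames(const uint64_t *stack, size_t nslots, uint64_t rbp,
    uint64_t *out, size_t max)
{
    size_t n = 0;

    while (rbp != 0 && rbp + 1 < nslots && n < max) {
        out[n++] = stack[rbp + 1];  /* return address above the saved %rbp */
        rbp = stack[rbp];           /* follow the chain to the caller's frame */
    }
    return n;
}
```

As a usage example, take a stack laid out as { 0, 3, RA_in_sys_rasctl, 5, RA_in_syscall, 0, RA_in_entry, 0 }. If my_deref_func() has pushed its frame, the walk starts at slot 1 and reports all three return addresses. If it faults before the push, %rbp still points at slot 3 (the frame sys_rasctl pushed), the walk reports only the addresses in syscall and its caller, and the return address into sys_rasctl in slot 2 is never read.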
Re: gcc: optimizations, and stack traces
On 09/02/2018 at 13:32, Joerg Sonnenberger wrote:
> On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
> > It implies that if a bug occurs _before_ these two instructions are
> > executed, we have a %rbp that points to the _previous_ function, the one
> > we got called from. And therefore, GDB does not display the current
> > function (where the bug actually happened), but displays its caller.
>
> This analysis is wrong. GDB will first of all look for frame annotation
> data, i.e. .eh_frame or the corresponding .debug_frame. Only if it can't
> find such annotation will it fall back to guessing from the function
> itself. We default to building .eh_frame for all binaries, but I'm not
> completely sure if GCC will create async unwind tables by default.

I'll have to re-check the GDB code, but my conclusion back then was that the previous function was displayed and not the current one. Will verify tomorrow.

Maxime
Re: gcc: optimizations, and stack traces
On 09/02/2018 at 12:13, Valery Ushakov wrote:
> [Summoning Krister]
>
> On Fri, Feb 09, 2018 at 11:23:17 +0100, Maxime Villard wrote:
> > There are also several cases where functions in the call tree can
> > disappear from the backtrace. In the following call tree:
> >
> >     A -> B -> C -> D (and D panics)
> >
> > if, in B, GCC put the two instructions after the instruction that calls
> > C, the backtrace will be:
> >
> >     A -> C -> D
> >
> > This can make a bug completely undebuggable.
>
> Does gcc actually generate code like that? I thought that it can delay
> frame pointer creation, but only until it needs to make a nested call, to
> C in your example, (as in the sample I showed in another mail to this
> thread).

Mmh, now I'm not so sure about this. Wait a minute, I'll take another look and try to understand what I was doing.

Maxime
Re: gcc: optimizations, and stack traces
On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
> It implies that if a bug occurs _before_ these two instructions are executed,
> we have a %rbp that points to the _previous_ function, the one we got called
> from. And therefore, GDB does not display the current function (where the bug
> actually happened), but displays its caller.

This analysis is wrong. GDB will first of all look for frame annotation data, i.e. .eh_frame or the corresponding .debug_frame. Only if it can't find such annotation will it fall back to guessing from the function itself. We default to building .eh_frame for all binaries, but I'm not completely sure if GCC will create async unwind tables by default.

Joerg
Re: gcc: optimizations, and stack traces
[Summoning Krister]

On Fri, Feb 09, 2018 at 11:23:17 +0100, Maxime Villard wrote:
> There are also several cases where functions in the call tree can disappear
> from the backtrace. In the following call tree:
>
>     A -> B -> C -> D (and D panics)
>
> if, in B, GCC put the two instructions after the instruction that calls C,
> the backtrace will be:
>
>     A -> C -> D
>
> This can make a bug completely undebuggable.

Does gcc actually generate code like that? I thought that it can delay frame pointer creation, but only until it needs to make a nested call, to C in your example, (as in the sample I showed in another mail to this thread).

-uwe
Re: gcc: optimizations, and stack traces
On 09/02/2018 at 12:08, Valery Ushakov wrote:
> On Fri, Feb 09, 2018 at 11:38:47 +0100, Martin Husemann wrote:
> > On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
> > > When I spotted this several months ago (while developing Live Kernel
> > > ASLR), I tried to look for GCC options that say "optimize with -O2,
> > > but keep the stack trace intact". I couldn't find one, and the only
> > > thing I ended up doing was disabling -O2 in the makefiles.
> >
> > -fno-omit-frame-pointer?
>
> That won't help.
>
>     `-O' also turns on `-fomit-frame-pointer' on machines where doing
>     so does not interfere with debugging.
>
> so it's not turned off in the first place. The problem is that some of
> the later optimization passes may push frame pointer setup to some place
> later in function. E.g. on -7
>
>     void
>     kernfs_get_rrootdev(void)
>     {
>             static int tried = 0;
>
>             if (tried) {
>                     /* Already did it once. */
>                     return;
>             }
>             tried = 1;
>
>             if (rootdev == NODEV)
>                     return;
>             rrootdev = devsw_blk2chr(rootdev);
>             if (rrootdev != NODEV)
>                     return;
>             rrootdev = NODEV;
>             printf("kernfs_get_rrootdev: no raw root device\n");
>     }
>
> is compiled to
>
>     c068f81b :
>        c068f81b:  mov    0xc0fc6b40,%eax
>        c068f820:  test   %eax,%eax
>        c068f822:  jne    c068f867
>        c068f824:  movl   $0x1,0xc0fc6b40
>        c068f82e:  mov    0xc0fde0b8,%edx
>        c068f834:  mov    0xc0fde0bc,%eax
>        c068f839:  mov    %edx,%ecx
>        c068f83b:  and    %eax,%ecx
>        c068f83d:  cmp    $0x,%ecx
>        c068f840:  je     c068f867
>     -> c068f842:  push   %ebp
>     -> c068f843:  mov    %esp,%ebp
>        c068f845:  sub    $0x8,%esp
>        c068f848:  mov    %edx,(%esp)
>        c068f84b:  mov    %eax,0x4(%esp)
>        c068f84f:  call   c091ce52

Yes, exactly. -fno-omit-frame-pointer doesn't change anything here: GCC does not omit the frame pointer but moves the instructions a little later in the function. So we need to find a way to keep the two instructions at the beginning...

Maxime
Re: gcc: optimizations, and stack traces
On Fri, Feb 09, 2018 at 11:38:47 +0100, Martin Husemann wrote:
> On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
>
> > When I spotted this several months ago (while developing Live
> > Kernel ASLR), I tried to look for GCC options that say "optimize
> > with -O2, but keep the stack trace intact". I couldn't find one,
> > and the only thing I ended up doing was disabling -O2 in the
> > makefiles.
>
> -fno-omit-frame-pointer?

That won't help.

    `-O' also turns on `-fomit-frame-pointer' on machines where doing
    so does not interfere with debugging.

so it's not turned off in the first place. The problem is that some of the later optimization passes may push frame pointer setup to some place later in function. E.g. on -7

    void
    kernfs_get_rrootdev(void)
    {
            static int tried = 0;

            if (tried) {
                    /* Already did it once. */
                    return;
            }
            tried = 1;

            if (rootdev == NODEV)
                    return;
            rrootdev = devsw_blk2chr(rootdev);
            if (rrootdev != NODEV)
                    return;
            rrootdev = NODEV;
            printf("kernfs_get_rrootdev: no raw root device\n");
    }

is compiled to

    c068f81b :
       c068f81b:  mov    0xc0fc6b40,%eax
       c068f820:  test   %eax,%eax
       c068f822:  jne    c068f867
       c068f824:  movl   $0x1,0xc0fc6b40
       c068f82e:  mov    0xc0fde0b8,%edx
       c068f834:  mov    0xc0fde0bc,%eax
       c068f839:  mov    %edx,%ecx
       c068f83b:  and    %eax,%ecx
       c068f83d:  cmp    $0x,%ecx
       c068f840:  je     c068f867
    -> c068f842:  push   %ebp
    -> c068f843:  mov    %esp,%ebp
       c068f845:  sub    $0x8,%esp
       c068f848:  mov    %edx,(%esp)
       c068f84b:  mov    %eax,0x4(%esp)
       c068f84f:  call   c091ce52

So the "tried" check and the first "rootdev" check happen before the frame pointer is set up.

-uwe
Re: gcc: optimizations, and stack traces
On Fri, Feb 09, 2018 at 11:23:17AM +0100, Maxime Villard wrote:
> When I spotted this several months ago (while developing Live Kernel ASLR), I
> tried to look for GCC options that say "optimize with -O2, but keep the stack
> trace intact". I couldn't find one, and the only thing I ended up doing was
> disabling -O2 in the makefiles.

-fno-omit-frame-pointer?

Martin