Re: [Dwarf-Discuss] Stack tracing and return addresses off by 1
To add to what Greg described you may also want to look at section 6.4.4 of the DWARF 5 specification that discusses these issues. In particular note the non-normative description in the fourth paragraph.

> 6.4.4 Call Frame Calling Address
>
> *When virtually unwinding frames, consumers frequently wish to obtain the
> address of the instruction which called a subroutine. This information is not
> always provided. Typically, however, one of the registers in the virtual
> unwind table is the Return Address.*
>
> If a Return Address register is defined in the virtual unwind table, and its
> rule is undefined (for example, by DW_CFA_undefined), then there is no
> return address and no call address, and the virtual unwind of stack
> activations is complete.
>
> *In most cases the return address is in the same context as the calling
> address, but that need not be the case, especially if the producer knows in
> some way the call never will return. The context of the ’return address’
> might be on a different line, in a different lexical block, or past the end
> of the calling subroutine. If a consumer were to assume that it was in the
> same context as the calling address, the virtual unwind might fail.*
>
> *For architectures with constant-length instructions where the return address
> immediately follows the call instruction, a simple solution is to subtract
> the length of an instruction from the return address to obtain the calling
> instruction. For architectures with variable-length instructions (for
> example, x86), this is not possible. However, subtracting 1 from the return
> address, although not guaranteed to provide the exact calling address,
> generally will produce an address within the same context as the calling
> address, and that usually is sufficient.*

Thanks,
-Tony Tye

___
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
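As a rough illustration of the last quoted paragraph, here is a minimal Python sketch of how a consumer might pick the address it uses for context lookups. The function name and the `fixed_insn_length` parameter are made up for this example; 0x6c9c1252 is the address of the instruction following the call in the disassembly posted elsewhere in this thread, and 0x1008 is an arbitrary made-up address.

```python
def call_site_lookup_address(return_address, fixed_insn_length=None):
    """Return an address expected to lie in the context of the calling instruction.

    fixed_insn_length is a hypothetical parameter: pass the instruction size on
    architectures with constant-length instructions, or None for
    variable-length ones such as x86.
    """
    if fixed_insn_length is not None:
        # Constant-length instructions: this is the exact call instruction.
        return return_address - fixed_insn_length
    # Variable-length instructions: not the exact call address, but generally
    # an address within the same context, which usually is sufficient.
    return return_address - 1


x86_lookup = call_site_lookup_address(0x6c9c1252)
fixed_lookup = call_site_lookup_address(0x1008, fixed_insn_length=4)
print(hex(x86_lookup), hex(fixed_lookup))  # 0x6c9c1251 0x1004
```

On x86 the result generally falls in the middle of the call instruction, so it is only good for range lookups (line tables, lexical blocks, unwind rows), not for disassembling.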
Re: [Dwarf-Discuss] Stack tracing and return addresses off by 1
You are correct. For the first frame you don't adjust the PC when looking up the unwind row. For the second frame on up you can decrement the PC value by 1 when doing the row lookup, and that is usually enough to get you to the correct unwind row. The issue is that the return address points to the next instruction after the instruction that called the function. This also fixes issues with tail calls.

Now you might ask why decrementing by 1 works when instructions can often be larger than 1 byte. We just need to get to the previous row in the unwind table, and decrementing by 1 is usually enough to get us there because unwind rows start at valid instruction opcode addresses.

The other tricky thing to watch out for is that the unwind information isn't always valid for the first frame for all values of the PC in a function. Why? Most unwind information is only valid at places that can throw exceptions. This means that when unwinding the first frame, you really can't trust the unwind info unless you know you are at a location that can throw an exception, which is hard to detect by just looking at disassembly.

Compilers have the ability to enable asynchronous unwinding with a compiler option, but even if we do enable this, there is no way to look at the unwind information at run time and tell the difference between synchronous unwind info (not valid everywhere in the function, only at places that can throw exceptions) and asynchronous unwind info (valid for any PC value in the function). To make things worse, the information that is put into .debug_frame often is just the same unwind info that is put into .eh_frame (with a few syntactic differences in encoding), so just know that .debug_frame is often only valid at exception call sites, but there is no way to tell unless your compiler emits its compiler invocation flags in the DWARF in the DW_TAG_compile_unit as an attribute.
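The lookup rule described above could be sketched like this in Python. Everything here is illustrative: the row table, the addresses, and the function names are made up, and a real unwinder would carry full register rules per row rather than a single CFA offset. The made-up layout assumes a 5-byte call at 0x101d, so the return address is 0x1022, which is also where the next unwind row starts.

```python
import bisect

# Hypothetical unwind rows: (start address, CFA offset from the stack pointer).
ROWS = [(0x1000, 4), (0x1003, 32), (0x1022, 28), (0x1025, 32), (0x1028, 4)]
STARTS = [start for start, _ in ROWS]


def find_row(pc):
    """Return the last row whose start address is <= pc."""
    i = bisect.bisect_right(STARTS, pc) - 1
    if i < 0:
        raise LookupError(f"no unwind row covers {pc:#x}")
    return ROWS[i]


def lookup_pc(frame_index, pc):
    """Frame 0 holds a real PC; later frames hold return addresses."""
    return pc if frame_index == 0 else pc - 1


def row_for_frame(frame_index, pc):
    return find_row(lookup_pc(frame_index, pc))


# As a return address (frame 1 and up), 0x1022 is looked up as 0x1021,
# which still belongs to the row that covers the call instruction.
print(row_for_frame(1, 0x1022))  # (4099, 32), i.e. the row starting at 0x1003
# As a genuine PC in frame 0, 0x1022 uses the row that starts there.
print(row_for_frame(0, 0x1022))  # (4130, 28), i.e. the row starting at 0x1022
```

Every address from 0x101d through 0x1021 maps to the same row, which is why subtracting 1 is enough even though the call is longer than one byte.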
The LLDB debugger will use unwind info from the binary for all frames except the first frame. For the first frame, we actually decode assembly and create our own unwind information that is valid everywhere in the function.

Now, the good news is, if you are making a backtrace for a thread that has crashed, it should be at a valid address and allow you to use the unwind information. If you also have other threads that were stopped when one thread crashes, they won't be at valid locations for unwind for the first frames.

> On Jul 31, 2020, at 5:37 AM, Jayvee Neumann via Dwarf-Discuss wrote:
>
> Hello together!
>
> I am running into a problem while performing a stack trace of x86 code. The
> assembly code I am running has been generated by mingw from C++ code and
> looks like this:
>
> 6c9c1210 <__ZN7my_class9my_methodEs>:
> 6c9c1210: sub    $0x1c,%esp
> 6c9c1213: mov    0x20(%esp),%edx
> 6c9c1217: mov    %edx,%eax
> 6c9c1219: test   %dx,%dx
> 6c9c121c: jle    6c9c1221 <__ZN7my_class9my_methodEs+0x11>
> 6c9c121e: lea    0x1(%edx),%eax
> 6c9c1221: cwtl
> 6c9c1222: mov    %eax,(%esp)
> 6c9c1225: call   6c9c1190 <__Z12my_dummy_functions>
> 6c9c122a: add    $0x1c,%esp
> 6c9c122d: ret    $0x4
>
> 6c9c1230 <__ZN8my_struct9my_methodEv>:
> 6c9c1230: sub    $0x1c,%esp
> 6c9c1233: movzwl 0x4(%ecx),%eax
> 6c9c1237: test   %ax,%ax
> 6c9c123a: jle    6c9c1241 <__ZN8my_struct9my_methodEv+0x11>
> 6c9c123c: add    $0x1,%eax
> 6c9c123f: jmp    6c9c1246 <__ZN8my_struct9my_methodEv+0x16>
> 6c9c1241: mov    $0x0,%eax
> 6c9c1246: cwtl
> 6c9c1247: add    $0x8,%ecx
> 6c9c124a: mov    %eax,(%esp)
> 6c9c124d: call   6c9c1210 <__ZN7my_class9my_methodEs>
> 6c9c1252: sub    $0x4,%esp
> 6c9c1255: add    $0x1c,%esp
> 6c9c1258: ret
> 6c9c1259: nop
> 6c9c125a: lea    0x0(%esi),%esi
>
> The problem manifests itself when the instruction pointer is inside
> "__ZN7my_class9my_methodEs" (called at 0x6c9c124d).
>
> In order to perform the stack trace, I use the DWARF frame information for
> calculating the previous instruction pointer. This is done by assuming the
> return address is the instruction pointer of the previous frame. This is
> obviously not entirely correct, since the return address points to a location
> AFTER the previous call. Nevertheless, this assumption seems to be standard
> for other stack tracers.
>
> I am having a problem with this though:
> The address where I start is 0x6c9c121e. Frame information tells me the
> following:
>
> 0144 001c FDE cie= pc=6c9c1210...6c9c1230
>   DW_CFA_advance_loc4: 3
>   DW_CFA_def_cfa_offset: +32
>   DW_CFA_advance_loc4: 26
>   DW_CFA_def_cfa_offset: +4
>   DW_CFA_nop:
>   DW_CFA_nop:
>
> So the CFA offset is 32. There I find the next return address 0x6c9c124d.
> Frame information tells me the following:
>
> 0164 0028 FDE cie= pc=6c9c1230...6c9c1259
>   DW_CFA_advance_loc4: 3
>   DW_CFA_def_cfa_offset: +32
>   DW_CFA_advance_loc4: 31
>   DW_CFA_def_cfa_offset: +28
>   DW_CFA_advance_loc4: 3
>   DW_CFA_def_
Re: [Dwarf-Discuss] modeling different address spaces
A compiler may promote part of a variable to a scratch pad memory address space.

Thanks,
-Tony Tye

-----Original Message-----
From: Michael Eager

On 7/30/20 5:17 PM, Tye, Tony via Dwarf-Discuss wrote:
> For optimized code involving multiple address spaces it is possible to
> run into cases where the location of a source language variable
> requires multiple address spaces. For example, a source variable may
> be optimized and different pieces may be in different places including
> memory of multiple address spaces, registers, etc.

Can you explain this more?

DWARF handles the situation where part of a variable is in memory and part in a register or in multiple registers. When would you have a variable which was in multiple address spaces?

--
Michael Eager
Re: [Dwarf-Discuss] modeling different address spaces
Hello Michael,

Sorry for the late reply. I found the email in the spam folder today.

> >>> We'd also want an unbounded piece operator to describe partially registerized
> >>> unbounded arrays, but I have not worked that out in detail, yet, and we're a bit
> >>> farther away from an implementation.
> >>
> >> Can you describe this more?
> >
> > Consider a large array kept in memory and a for loop iterating over the array. If that
> > loop gets vectorized, compilers would load a portion of the array into registers at the
> > beginning of the loop body, operate on the registers, and write them back at the end
> > of the loop body.
> >
> > The entire array can be split into three pieces:
> > - elements that have already been processed: in memory
> > - elements that are currently being processed: in registers
> > - elements that will be processed in future iterations: in memory
> >
> > For unbounded arrays, the size of the third piece is not known.
>
> When would you need to know the third piece?
>
> How is this different from a non-vector processor doing an optimized
> string operation, loading 4 characters into a register at a time? If
> the string is nul-terminated, the string length might be unknown.

I don't think that this is different from the use-case I sketched above. We wouldn't even need to load a sequence of elements. Loading a single element would produce the same scenario.

Can this be described in DWARF today?

Markus.
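To make the three-piece split from the quoted text concrete, here is a small Python model. It is not DWARF expression syntax and says nothing about whether DWARF can encode this; the function name, the tuple layout, and the use of `None` for an unknown end are all assumptions made for illustration.

```python
def array_pieces(loop_index, vector_width, total_elements=None):
    """Describe where each element range [begin, end) of the array lives.

    total_elements=None models an unbounded array: the end of the last
    piece is simply not known.
    """
    processed = ("memory", 0, loop_index)
    in_flight = ("registers", loop_index, loop_index + vector_width)
    remaining = ("memory", loop_index + vector_width, total_elements)
    return [processed, in_flight, remaining]


# A loop that has reached element 8 and handles 4 elements per iteration.
pieces = array_pieces(loop_index=8, vector_width=4)
for where, begin, end in pieces:
    print(where, begin, end)
# memory 0 8
# registers 8 12
# memory 12 None
```

With `vector_width=1` the same shape results, which matches the point that loading a single element produces the same scenario.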
Re: [Dwarf-Discuss] modeling different address spaces
Hi -

> Can you explain this more?
>
> DWARF handles the situation where part of a variable is in memory and part
> in a register or in multiple registers. When would you have a variable
> which was in multiple address spaces?

Remember the "how can a debugger WRITE safely to variables" discussion last year? A variable may reside in some unusual memory segment, AND may have been loaded into a register, AND maybe even spilled to the normal stack temporarily. Could be three different address spaces valid for the same variable at the same PC address. (And a debugger that needs to update the value would need to find them all.)

- FChE
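A toy Python model of that last point, where a write has to reach every live copy of the variable. The `Location` and `FakeTarget` classes, the address-space names, and the location strings are all hypothetical; nothing here corresponds to a real debugger API or to DWARF syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    address_space: str  # hypothetical label, e.g. "scratchpad" or "stack"
    where: str          # hypothetical address or register name


class FakeTarget:
    """Stand-in for a debuggee: just remembers what was written where."""

    def __init__(self):
        self.storage = {}

    def write(self, location, value):
        self.storage[location] = value

    def read(self, location):
        return self.storage[location]


def write_variable(target, live_locations, value):
    # Update every copy that is valid at the current PC, otherwise the
    # program may keep using a stale one.
    for location in live_locations:
        target.write(location, value)


live = [
    Location("scratchpad", "0x40"),
    Location("register", "r5"),
    Location("stack", "sp+16"),
]
target = FakeTarget()
write_variable(target, live, 42)
print([target.read(loc) for loc in live])  # [42, 42, 42]
```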
[Dwarf-Discuss] Stack tracing and return addresses off by 1
Hello together!

I am running into a problem while performing a stack trace of x86 code. The assembly code I am running has been generated by mingw from C++ code and looks like this:

6c9c1210 <__ZN7my_class9my_methodEs>:
6c9c1210: sub    $0x1c,%esp
6c9c1213: mov    0x20(%esp),%edx
6c9c1217: mov    %edx,%eax
6c9c1219: test   %dx,%dx
6c9c121c: jle    6c9c1221 <__ZN7my_class9my_methodEs+0x11>
6c9c121e: lea    0x1(%edx),%eax
6c9c1221: cwtl
6c9c1222: mov    %eax,(%esp)
6c9c1225: call   6c9c1190 <__Z12my_dummy_functions>
6c9c122a: add    $0x1c,%esp
6c9c122d: ret    $0x4

6c9c1230 <__ZN8my_struct9my_methodEv>:
6c9c1230: sub    $0x1c,%esp
6c9c1233: movzwl 0x4(%ecx),%eax
6c9c1237: test   %ax,%ax
6c9c123a: jle    6c9c1241 <__ZN8my_struct9my_methodEv+0x11>
6c9c123c: add    $0x1,%eax
6c9c123f: jmp    6c9c1246 <__ZN8my_struct9my_methodEv+0x16>
6c9c1241: mov    $0x0,%eax
6c9c1246: cwtl
6c9c1247: add    $0x8,%ecx
6c9c124a: mov    %eax,(%esp)
6c9c124d: call   6c9c1210 <__ZN7my_class9my_methodEs>
6c9c1252: sub    $0x4,%esp
6c9c1255: add    $0x1c,%esp
6c9c1258: ret
6c9c1259: nop
6c9c125a: lea    0x0(%esi),%esi

The problem manifests itself when the instruction pointer is inside "__ZN7my_class9my_methodEs" (called at 0x6c9c124d).

In order to perform the stack trace, I use the DWARF frame information for calculating the previous instruction pointer. This is done by assuming the return address is the instruction pointer of the previous frame. This is obviously not entirely correct, since the return address points to a location AFTER the previous call. Nevertheless, this assumption seems to be standard for other stack tracers.

I am having a problem with this though:
The address where I start is 0x6c9c121e. Frame information tells me the following:

0144 001c FDE cie= pc=6c9c1210...6c9c1230
  DW_CFA_advance_loc4: 3
  DW_CFA_def_cfa_offset: +32
  DW_CFA_advance_loc4: 26
  DW_CFA_def_cfa_offset: +4
  DW_CFA_nop:
  DW_CFA_nop:

So the CFA offset is 32. There I find the next return address 0x6c9c124d. Frame information tells me the following:

0164 0028 FDE cie= pc=6c9c1230...6c9c1259
  DW_CFA_advance_loc4: 3
  DW_CFA_def_cfa_offset: +32
  DW_CFA_advance_loc4: 31
  DW_CFA_def_cfa_offset: +28
  DW_CFA_advance_loc4: 3
  DW_CFA_def_cfa_offset: +32
  DW_CFA_advance_loc4: 3
  DW_CFA_def_cfa_offset: +4

And here the problem arises. Due to 0x6c9c124d being the return address, the CFA offset I read is invalid. By assuming an instruction pointer of 0x6c9c124d I also assume that the "ret $0x4" instruction from 0x6c9c122d has been executed and the stack is 4 bytes shorter. This is however not the case. The return has not been executed yet.

So my question here is, how shall the stack tracer solve this issue? My first idea is to decrement the instruction pointer when looking through the frame information (except for the deepest frame, where the instruction pointer is correct). Would that be an approach that always works? How do other consumers solve this issue?

Best regards
Jayvee
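For readers who want to check the numbers, here is a minimal Python sketch that replays the second FDE above into a row table and compares lookups at a few addresses. Two things are assumptions rather than data from this message: the initial CFA offset of 4 (it would come from the CIE, which is not shown) and the helper names. Going by the disassembly, the instruction following the call at 0x6c9c124d starts at 0x6c9c1252, so that is the address used below as the un-decremented lookup.

```python
import bisect


def build_rows(start_pc, initial_cfa_offset, program):
    """Replay advance_loc / def_cfa_offset into (start address, CFA offset) rows."""
    rows = [(start_pc, initial_cfa_offset)]
    pc = start_pc
    for op, arg in program:
        if op == "advance_loc":
            pc += arg
        elif op == "def_cfa_offset":
            if rows[-1][0] == pc:
                rows[-1] = (pc, arg)   # same location: update the row in place
            else:
                rows.append((pc, arg))  # new location: start a new row
    return rows


def cfa_offset_at(rows, pc):
    starts = [start for start, _ in rows]
    i = bisect.bisect_right(starts, pc) - 1
    if i < 0:
        raise LookupError(f"no row covers {pc:#x}")
    return rows[i][1]


# FDE pc=6c9c1230...6c9c1259, as listed in the message.
program = [
    ("advance_loc", 3), ("def_cfa_offset", 32),
    ("advance_loc", 31), ("def_cfa_offset", 28),
    ("advance_loc", 3), ("def_cfa_offset", 32),
    ("advance_loc", 3), ("def_cfa_offset", 4),
]
# Assumption: the CIE sets the CFA offset to 4 at function entry.
rows = build_rows(0x6c9c1230, 4, program)

print(cfa_offset_at(rows, 0x6c9c1252))      # 28: row that assumes "ret $0x4" already ran
print(cfa_offset_at(rows, 0x6c9c1252 - 1))  # 32: row covering the call instruction
print(cfa_offset_at(rows, 0x6c9c124d))      # 32: the call instruction itself
```

Under these assumptions the offset of 28 only appears at 0x6c9c1252 and later, so decrementing the address by 1 before the lookup selects the row that is still valid while the callee is running.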