Re: [PATCH 7/7] DWARF: add the config option
* h...@zytor.comwrote: > This assumes that it actually ends up being feasible for objtool to do so. Yes, agreed, that's a big precondition. I'm cautiously optimistic based on Josh's experiments that he posted about in this thread. Thanks, Ingo
Re: [PATCH 7/7] DWARF: add the config option
* h...@zytor.com wrote: > This assumes that it actually ends up being feasible for objtool to do so. Yes, agreed, that's a big precondition. I'm cautiously optimistic based on Josh's experiments that he posted about in this thread. Thanks, Ingo
Re: [PATCH 7/7] DWARF: add the config option
On May 22, 2017 10:49:06 PM PDT, Ingo Molnarwrote: > >* H. Peter Anvin wrote: > >> On 05/22/17 04:12, Ingo Molnar wrote: >> \>> >> >> This construct might be useful for other arches, which is why I >called >> >> it "FP" instead of "BP". But then I ruined that with the last 3 >:-) >> > >> > Please call it BP - 'FP' can easily be read as floating-point, >making it all >> > super-confusing. We should use canonical x86 register names and >ordering - even >> > if not all registers are used straight away. >> > >> >> Seriously, I suspect that at the end of the day we will have >reinvented >> DWARF. > >Absolutely - the main difference is: > >- the debug-info implementation is _internal_ to the kernel so it can >be fixed >instead of "oh, wait 2 years for the toolchain to fix this particular >bug, work >it around in the kernel meanwhile" kind of crazy flow and dependencies. >I.e. >the debug-info generation and parsing code is both part of the kernel >Git tree > and can be iterated (and fixed) at once with. > >- the debug-info is auto-generated for assembly as well, leaving >assembly code > maintainable. > >- the debug-info has a sane data structure designed for robustness and > compactness > >So even if it's a subset of the existing complexity of dwarf et al we >are still >literally infinitely better off with this model. > >Thanks, > > Ingo This assumes that it actually ends up being feasible for objtool to do so. It is worth noting that using DWARF for unwinding vs auto-generating the unwind information are independent issues. Another option is to use (or postprocess) the compiler-generated DWARF for C modules and pursue autogeneration only for assembly modules, which ought to be a much easier problem and is less dependent on discovering new compiler-generated patterns. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH 7/7] DWARF: add the config option
On May 22, 2017 10:49:06 PM PDT, Ingo Molnar wrote: > >* H. Peter Anvin wrote: > >> On 05/22/17 04:12, Ingo Molnar wrote: >> \>> >> >> This construct might be useful for other arches, which is why I >called >> >> it "FP" instead of "BP". But then I ruined that with the last 3 >:-) >> > >> > Please call it BP - 'FP' can easily be read as floating-point, >making it all >> > super-confusing. We should use canonical x86 register names and >ordering - even >> > if not all registers are used straight away. >> > >> >> Seriously, I suspect that at the end of the day we will have >reinvented >> DWARF. > >Absolutely - the main difference is: > >- the debug-info implementation is _internal_ to the kernel so it can >be fixed >instead of "oh, wait 2 years for the toolchain to fix this particular >bug, work >it around in the kernel meanwhile" kind of crazy flow and dependencies. >I.e. >the debug-info generation and parsing code is both part of the kernel >Git tree > and can be iterated (and fixed) at once with. > >- the debug-info is auto-generated for assembly as well, leaving >assembly code > maintainable. > >- the debug-info has a sane data structure designed for robustness and > compactness > >So even if it's a subset of the existing complexity of dwarf et al we >are still >literally infinitely better off with this model. > >Thanks, > > Ingo This assumes that it actually ends up being feasible for objtool to do so. It is worth noting that using DWARF for unwinding vs auto-generating the unwind information are independent issues. Another option is to use (or postprocess) the compiler-generated DWARF for C modules and pursue autogeneration only for assembly modules, which ought to be a much easier problem and is less dependent on discovering new compiler-generated patterns. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH 7/7] DWARF: add the config option
On Fri, May 26, 2017 at 01:29:01PM +0200, Jiri Slaby wrote: > On 05/26/2017, 08:54 AM, Jiri Slaby wrote: > > On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote: > >> > >> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c > > > > JFYI, it crashes in sha1_transform_avx due to crypto changes. You > > perhaps missed that this beast uses ebp (not rbp) register for > > computations. I had to do: > > > > --- a/arch/x86/crypto/sha1_ssse3_asm.S > > +++ b/arch/x86/crypto/sha1_ssse3_asm.S > > @@ -37,7 +37,7 @@ > > #define REG_A %ecx > > #define REG_B %esi > > #define REG_C %edi > > -#define REG_D %ebp > > +#define REG_D %r12d > > #define REG_E %edx > > > > #define REG_T1 %eax > > @@ -74,6 +74,7 @@ > > SYM_FUNC_START(\name) > > > > push%rbx > > + push%r12 > > push%rbp > > > > mov %rsp, %rbp > > @@ -99,6 +100,7 @@ > > rep stosq > > > > leaveq # deallocate workspace > > + pop %r12 > > pop %rbx > > ret > > > > > > I am afraid there are more of these, e.g. in aesni-intel_asm.S. > > aesni-intel_asm.S is OK -- only untouched x86_32 part uses ebp. > > But sha1_avx2_x86_64_asm.S is not. They use *all* usable registers > including ebp in the computations hidden behind the > SHA1_PIPELINED_MAIN_BODY macro. The only work around I can see is to > push rbp/pop rbp around the computation as it used to do with rbx: > > --- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S > +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S > @@ -636,6 +636,7 @@ _loop3: > /* Align stack */ > mov %rsp, %rbp > and $~(0x20-1), %rsp > + push%rbp > sub $RESERVE_STACK, %rsp > > avx2_zeroupper > @@ -661,6 +662,7 @@ _loop3: > avx2_zeroupper > > add $RESERVE_STACK, %rsp > + pop %rbp > > leaveq > pop %r15 Thanks, the first fix looks good. Is the second one needed though? It already pushes rbp before it aligns the stack. DWARF/undwarf will be immune to these issues, so I'll be moving a lot of these crypto changes to a separate branch. They were only in this branch because the new-and-improved objtool can now find rbp misusage in leaf functions. It seems that most of the crypto code is frame pointer ignorant. IMO, leaf functions shouldn't be allowed to use rbp because it breaks frame pointers when preempted by an interrupt. GCC seems to agree. I added a check to objtool to find the ones which use rbp badly. Here are the ones I see with my config: arch/x86/crypto/des3_ede-asm_64.o: warning: objtool: des3_ede_x86_64_crypt_blk uses BP as a scratch register arch/x86/crypto/des3_ede-asm_64.o: warning: objtool: des3_ede_x86_64_crypt_blk_3way uses BP as a scratch register arch/x86/crypto/blowfish-x86_64-asm_64.o: warning: objtool: __blowfish_enc_blk_4way uses BP as a scratch register arch/x86/crypto/blowfish-x86_64-asm_64.o: warning: objtool: blowfish_dec_blk_4way uses BP as a scratch register arch/x86/crypto/twofish-x86_64-asm_64-3way.o: warning: objtool: __twofish_enc_blk_3way uses BP as a scratch register arch/x86/crypto/twofish-x86_64-asm_64-3way.o: warning: objtool: twofish_dec_blk_3way uses BP as a scratch register arch/x86/crypto/sha256-avx2-asm.o: warning: objtool: sha256_transform_rorx uses BP as a scratch register arch/x86/crypto/cast5-avx-x86_64-asm_64.o: warning: objtool: __cast5_enc_blk16 uses BP as a scratch register arch/x86/crypto/cast5-avx-x86_64-asm_64.o: warning: objtool: __cast5_dec_blk16 uses BP as a scratch register arch/x86/crypto/cast6-avx-x86_64-asm_64.o: warning: objtool: __cast6_enc_blk8 uses BP as a scratch register arch/x86/crypto/cast6-avx-x86_64-asm_64.o: warning: objtool: __cast6_dec_blk8 uses BP as a scratch register arch/x86/crypto/twofish-avx-x86_64-asm_64.o: warning: objtool: __twofish_enc_blk8 uses BP as a scratch register arch/x86/crypto/twofish-avx-x86_64-asm_64.o: warning: objtool: __twofish_dec_blk8 uses BP as a scratch register (And that doesn't include the ones which misuse ebp.) It may be a challenge to fix some of those which use all available registers. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Fri, May 26, 2017 at 01:29:01PM +0200, Jiri Slaby wrote: > On 05/26/2017, 08:54 AM, Jiri Slaby wrote: > > On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote: > >> > >> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c > > > > JFYI, it crashes in sha1_transform_avx due to crypto changes. You > > perhaps missed that this beast uses ebp (not rbp) register for > > computations. I had to do: > > > > --- a/arch/x86/crypto/sha1_ssse3_asm.S > > +++ b/arch/x86/crypto/sha1_ssse3_asm.S > > @@ -37,7 +37,7 @@ > > #define REG_A %ecx > > #define REG_B %esi > > #define REG_C %edi > > -#define REG_D %ebp > > +#define REG_D %r12d > > #define REG_E %edx > > > > #define REG_T1 %eax > > @@ -74,6 +74,7 @@ > > SYM_FUNC_START(\name) > > > > push%rbx > > + push%r12 > > push%rbp > > > > mov %rsp, %rbp > > @@ -99,6 +100,7 @@ > > rep stosq > > > > leaveq # deallocate workspace > > + pop %r12 > > pop %rbx > > ret > > > > > > I am afraid there are more of these, e.g. in aesni-intel_asm.S. > > aesni-intel_asm.S is OK -- only untouched x86_32 part uses ebp. > > But sha1_avx2_x86_64_asm.S is not. They use *all* usable registers > including ebp in the computations hidden behind the > SHA1_PIPELINED_MAIN_BODY macro. The only work around I can see is to > push rbp/pop rbp around the computation as it used to do with rbx: > > --- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S > +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S > @@ -636,6 +636,7 @@ _loop3: > /* Align stack */ > mov %rsp, %rbp > and $~(0x20-1), %rsp > + push%rbp > sub $RESERVE_STACK, %rsp > > avx2_zeroupper > @@ -661,6 +662,7 @@ _loop3: > avx2_zeroupper > > add $RESERVE_STACK, %rsp > + pop %rbp > > leaveq > pop %r15 Thanks, the first fix looks good. Is the second one needed though? It already pushes rbp before it aligns the stack. DWARF/undwarf will be immune to these issues, so I'll be moving a lot of these crypto changes to a separate branch. They were only in this branch because the new-and-improved objtool can now find rbp misusage in leaf functions. It seems that most of the crypto code is frame pointer ignorant. IMO, leaf functions shouldn't be allowed to use rbp because it breaks frame pointers when preempted by an interrupt. GCC seems to agree. I added a check to objtool to find the ones which use rbp badly. Here are the ones I see with my config: arch/x86/crypto/des3_ede-asm_64.o: warning: objtool: des3_ede_x86_64_crypt_blk uses BP as a scratch register arch/x86/crypto/des3_ede-asm_64.o: warning: objtool: des3_ede_x86_64_crypt_blk_3way uses BP as a scratch register arch/x86/crypto/blowfish-x86_64-asm_64.o: warning: objtool: __blowfish_enc_blk_4way uses BP as a scratch register arch/x86/crypto/blowfish-x86_64-asm_64.o: warning: objtool: blowfish_dec_blk_4way uses BP as a scratch register arch/x86/crypto/twofish-x86_64-asm_64-3way.o: warning: objtool: __twofish_enc_blk_3way uses BP as a scratch register arch/x86/crypto/twofish-x86_64-asm_64-3way.o: warning: objtool: twofish_dec_blk_3way uses BP as a scratch register arch/x86/crypto/sha256-avx2-asm.o: warning: objtool: sha256_transform_rorx uses BP as a scratch register arch/x86/crypto/cast5-avx-x86_64-asm_64.o: warning: objtool: __cast5_enc_blk16 uses BP as a scratch register arch/x86/crypto/cast5-avx-x86_64-asm_64.o: warning: objtool: __cast5_dec_blk16 uses BP as a scratch register arch/x86/crypto/cast6-avx-x86_64-asm_64.o: warning: objtool: __cast6_enc_blk8 uses BP as a scratch register arch/x86/crypto/cast6-avx-x86_64-asm_64.o: warning: objtool: __cast6_dec_blk8 uses BP as a scratch register arch/x86/crypto/twofish-avx-x86_64-asm_64.o: warning: objtool: __twofish_enc_blk8 uses BP as a scratch register arch/x86/crypto/twofish-avx-x86_64-asm_64.o: warning: objtool: __twofish_dec_blk8 uses BP as a scratch register (And that doesn't include the ones which misuse ebp.) It may be a challenge to fix some of those which use all available registers. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On 05/26/2017, 08:54 AM, Jiri Slaby wrote: > On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote: >> >> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c > > JFYI, it crashes in sha1_transform_avx due to crypto changes. You > perhaps missed that this beast uses ebp (not rbp) register for > computations. I had to do: > > --- a/arch/x86/crypto/sha1_ssse3_asm.S > +++ b/arch/x86/crypto/sha1_ssse3_asm.S > @@ -37,7 +37,7 @@ > #define REG_A %ecx > #define REG_B %esi > #define REG_C %edi > -#define REG_D %ebp > +#define REG_D %r12d > #define REG_E %edx > > #define REG_T1 %eax > @@ -74,6 +74,7 @@ > SYM_FUNC_START(\name) > > push%rbx > + push%r12 > push%rbp > > mov %rsp, %rbp > @@ -99,6 +100,7 @@ > rep stosq > > leaveq # deallocate workspace > + pop %r12 > pop %rbx > ret > > > I am afraid there are more of these, e.g. in aesni-intel_asm.S. aesni-intel_asm.S is OK -- only untouched x86_32 part uses ebp. But sha1_avx2_x86_64_asm.S is not. They use *all* usable registers including ebp in the computations hidden behind the SHA1_PIPELINED_MAIN_BODY macro. The only work around I can see is to push rbp/pop rbp around the computation as it used to do with rbx: --- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S @@ -636,6 +636,7 @@ _loop3: /* Align stack */ mov %rsp, %rbp and $~(0x20-1), %rsp + push%rbp sub $RESERVE_STACK, %rsp avx2_zeroupper @@ -661,6 +662,7 @@ _loop3: avx2_zeroupper add $RESERVE_STACK, %rsp + pop %rbp leaveq pop %r15 regards, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
On 05/26/2017, 08:54 AM, Jiri Slaby wrote: > On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote: >> >> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c > > JFYI, it crashes in sha1_transform_avx due to crypto changes. You > perhaps missed that this beast uses ebp (not rbp) register for > computations. I had to do: > > --- a/arch/x86/crypto/sha1_ssse3_asm.S > +++ b/arch/x86/crypto/sha1_ssse3_asm.S > @@ -37,7 +37,7 @@ > #define REG_A %ecx > #define REG_B %esi > #define REG_C %edi > -#define REG_D %ebp > +#define REG_D %r12d > #define REG_E %edx > > #define REG_T1 %eax > @@ -74,6 +74,7 @@ > SYM_FUNC_START(\name) > > push%rbx > + push%r12 > push%rbp > > mov %rsp, %rbp > @@ -99,6 +100,7 @@ > rep stosq > > leaveq # deallocate workspace > + pop %r12 > pop %rbx > ret > > > I am afraid there are more of these, e.g. in aesni-intel_asm.S. aesni-intel_asm.S is OK -- only untouched x86_32 part uses ebp. But sha1_avx2_x86_64_asm.S is not. They use *all* usable registers including ebp in the computations hidden behind the SHA1_PIPELINED_MAIN_BODY macro. The only work around I can see is to push rbp/pop rbp around the computation as it used to do with rbx: --- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S @@ -636,6 +636,7 @@ _loop3: /* Align stack */ mov %rsp, %rbp and $~(0x20-1), %rsp + push%rbp sub $RESERVE_STACK, %rsp avx2_zeroupper @@ -661,6 +662,7 @@ _loop3: avx2_zeroupper add $RESERVE_STACK, %rsp + pop %rbp leaveq pop %r15 regards, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote: > > https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c JFYI, it crashes in sha1_transform_avx due to crypto changes. You perhaps missed that this beast uses ebp (not rbp) register for computations. I had to do: --- a/arch/x86/crypto/sha1_ssse3_asm.S +++ b/arch/x86/crypto/sha1_ssse3_asm.S @@ -37,7 +37,7 @@ #define REG_A %ecx #define REG_B %esi #define REG_C %edi -#define REG_D %ebp +#define REG_D %r12d #define REG_E %edx #define REG_T1 %eax @@ -74,6 +74,7 @@ SYM_FUNC_START(\name) push%rbx + push%r12 push%rbp mov %rsp, %rbp @@ -99,6 +100,7 @@ rep stosq leaveq # deallocate workspace + pop %r12 pop %rbx ret I am afraid there are more of these, e.g. in aesni-intel_asm.S. thanks, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote: > > https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c JFYI, it crashes in sha1_transform_avx due to crypto changes. You perhaps missed that this beast uses ebp (not rbp) register for computations. I had to do: --- a/arch/x86/crypto/sha1_ssse3_asm.S +++ b/arch/x86/crypto/sha1_ssse3_asm.S @@ -37,7 +37,7 @@ #define REG_A %ecx #define REG_B %esi #define REG_C %edi -#define REG_D %ebp +#define REG_D %r12d #define REG_E %edx #define REG_T1 %eax @@ -74,6 +74,7 @@ SYM_FUNC_START(\name) push%rbx + push%r12 push%rbp mov %rsp, %rbp @@ -99,6 +100,7 @@ rep stosq leaveq # deallocate workspace + pop %r12 pop %rbx ret I am afraid there are more of these, e.g. in aesni-intel_asm.S. thanks, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
* Peter Zijlstrawrote: > On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote: > > > But it does hurt, in the sense that the complicated format of DWARF CFI > > > means the unwinder has to jump through a lot more hoops to read it. > > > > Why that matters, actually? Unwinder is nothing to be performance > > oriented. And if somebody is doing a lot of unwinding during runtime, > > they can switch to in-this-case-faster FP unwinder. > > perf (and ftrace) like the unwinder to be considered performance > oriented. Yes, and given how critical debugging is there's a kind of useful synergy here: overall the perf unwinder is run about 10 orders of magnitude more often than the debugging unwinder, so it's a very big help in shaking out unwinder/debug-info bugs and increasing robustness overall. The 'price' for using the unwinder in perf is that it has to be fast. Thanks, Ingo
Re: [PATCH 7/7] DWARF: add the config option
* Peter Zijlstra wrote: > On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote: > > > But it does hurt, in the sense that the complicated format of DWARF CFI > > > means the unwinder has to jump through a lot more hoops to read it. > > > > Why that matters, actually? Unwinder is nothing to be performance > > oriented. And if somebody is doing a lot of unwinding during runtime, > > they can switch to in-this-case-faster FP unwinder. > > perf (and ftrace) like the unwinder to be considered performance > oriented. Yes, and given how critical debugging is there's a kind of useful synergy here: overall the perf unwinder is run about 10 orders of magnitude more often than the debugging unwinder, so it's a very big help in shaking out unwinder/debug-info bugs and increasing robustness overall. The 'price' for using the unwinder in perf is that it has to be fast. Thanks, Ingo
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote: > > But it does hurt, in the sense that the complicated format of DWARF CFI > > means the unwinder has to jump through a lot more hoops to read it. > > Why that matters, actually? Unwinder is nothing to be performance > oriented. And if somebody is doing a lot of unwinding during runtime, > they can switch to in-this-case-faster FP unwinder. perf (and ftrace) like the unwinder to be considered performance oriented.
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote: > > But it does hurt, in the sense that the complicated format of DWARF CFI > > means the unwinder has to jump through a lot more hoops to read it. > > Why that matters, actually? Unwinder is nothing to be performance > oriented. And if somebody is doing a lot of unwinding during runtime, > they can switch to in-this-case-faster FP unwinder. perf (and ftrace) like the unwinder to be considered performance oriented.
Re: [PATCH 7/7] DWARF: add the config option
* H. Peter Anvinwrote: > On 05/22/17 04:12, Ingo Molnar wrote: > \>> > >> This construct might be useful for other arches, which is why I called > >> it "FP" instead of "BP". But then I ruined that with the last 3 :-) > > > > Please call it BP - 'FP' can easily be read as floating-point, making it > > all > > super-confusing. We should use canonical x86 register names and ordering - > > even > > if not all registers are used straight away. > > > > Seriously, I suspect that at the end of the day we will have reinvented > DWARF. Absolutely - the main difference is: - the debug-info implementation is _internal_ to the kernel so it can be fixed instead of "oh, wait 2 years for the toolchain to fix this particular bug, work it around in the kernel meanwhile" kind of crazy flow and dependencies. I.e. the debug-info generation and parsing code is both part of the kernel Git tree and can be iterated (and fixed) at once with. - the debug-info is auto-generated for assembly as well, leaving assembly code maintainable. - the debug-info has a sane data structure designed for robustness and compactness So even if it's a subset of the existing complexity of dwarf et al we are still literally infinitely better off with this model. Thanks, Ingo
Re: [PATCH 7/7] DWARF: add the config option
* H. Peter Anvin wrote: > On 05/22/17 04:12, Ingo Molnar wrote: > \>> > >> This construct might be useful for other arches, which is why I called > >> it "FP" instead of "BP". But then I ruined that with the last 3 :-) > > > > Please call it BP - 'FP' can easily be read as floating-point, making it > > all > > super-confusing. We should use canonical x86 register names and ordering - > > even > > if not all registers are used straight away. > > > > Seriously, I suspect that at the end of the day we will have reinvented > DWARF. Absolutely - the main difference is: - the debug-info implementation is _internal_ to the kernel so it can be fixed instead of "oh, wait 2 years for the toolchain to fix this particular bug, work it around in the kernel meanwhile" kind of crazy flow and dependencies. I.e. the debug-info generation and parsing code is both part of the kernel Git tree and can be iterated (and fixed) at once with. - the debug-info is auto-generated for assembly as well, leaving assembly code maintainable. - the debug-info has a sane data structure designed for robustness and compactness So even if it's a subset of the existing complexity of dwarf et al we are still literally infinitely better off with this model. Thanks, Ingo
Re: [PATCH 7/7] DWARF: add the config option
On Mon, 22 May 2017, H. Peter Anvin wrote: > > Please call it BP - 'FP' can easily be read as floating-point, making > > it all super-confusing. We should use canonical x86 register names and > > ordering - even if not all registers are used straight away. > > Seriously, I suspect that at the end of the day we will have reinvented > DWARF. If this is the fear, what is the proposal then? We simply have to deal with !FRAME_POINTER builds. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH 7/7] DWARF: add the config option
On Mon, 22 May 2017, H. Peter Anvin wrote: > > Please call it BP - 'FP' can easily be read as floating-point, making > > it all super-confusing. We should use canonical x86 register names and > > ordering - even if not all registers are used straight away. > > Seriously, I suspect that at the end of the day we will have reinvented > DWARF. If this is the fear, what is the proposal then? We simply have to deal with !FRAME_POINTER builds. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH 7/7] DWARF: add the config option
On Mon, May 22, 2017 at 02:37:50PM -0700, H. Peter Anvin wrote: > On 05/22/17 14:07, H. Peter Anvin wrote: > > On 05/20/17 13:01, H.J. Lu wrote: > >> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf> >> wrote: > >> > > (H.J., could we get a binutils feature that allows is to do: > > pushq %whatever > .cfi_adjust_sp -8 > ... > popq %whatever > .cfi_adjust_sp 8 > > >> > >> Np. Compiler needs to generate this. > >> > > > > For actual assembly we have such a feature, it is called macros. > > > > push/pop is the easy stuff; macros take care of that, but the real pain > > is dealing with the flow of control. > > > > My biggest beef with the CFI directives that gas uses is that there is > that .cfi_remember_state/.cfi_restore_state doesn't have a way to > specify more than one state. That makes it really hard to get sanity > around control flow changes, especially with code that is intentionally > out of line. > > That, and some of the CFI directives seem to be a bit ill-defined in > their definition (are they even applicable to anything other than > DWARF?) They almost seem to be referencing some external specification, > but the only thing I'm finding is the DWARF documentation which is > written in very different terms. > > The best description of what a personality routine is I found in an > article by Ian Lance Taylor. It doesn't seem to be applicable to C as > far as I can tell. So my understanding is that there's stock DWARF (.debug_frame) and then there's souped-up DWARF (.eh_frame), which is basically DWARF with a few extensions. The remember/restore state thing is stock DWARF (DW_CFA_remember_state and DW_CFA_restore_state). The personality routine thing is in the .eh_frame extension which is documented here: http://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Mon, May 22, 2017 at 02:37:50PM -0700, H. Peter Anvin wrote: > On 05/22/17 14:07, H. Peter Anvin wrote: > > On 05/20/17 13:01, H.J. Lu wrote: > >> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf > >> wrote: > >> > > (H.J., could we get a binutils feature that allows is to do: > > pushq %whatever > .cfi_adjust_sp -8 > ... > popq %whatever > .cfi_adjust_sp 8 > > >> > >> Np. Compiler needs to generate this. > >> > > > > For actual assembly we have such a feature, it is called macros. > > > > push/pop is the easy stuff; macros take care of that, but the real pain > > is dealing with the flow of control. > > > > My biggest beef with the CFI directives that gas uses is that there is > that .cfi_remember_state/.cfi_restore_state doesn't have a way to > specify more than one state. That makes it really hard to get sanity > around control flow changes, especially with code that is intentionally > out of line. > > That, and some of the CFI directives seem to be a bit ill-defined in > their definition (are they even applicable to anything other than > DWARF?) They almost seem to be referencing some external specification, > but the only thing I'm finding is the DWARF documentation which is > written in very different terms. > > The best description of what a personality routine is I found in an > article by Ian Lance Taylor. It doesn't seem to be applicable to C as > far as I can tell. So my understanding is that there's stock DWARF (.debug_frame) and then there's souped-up DWARF (.eh_frame), which is basically DWARF with a few extensions. The remember/restore state thing is stock DWARF (DW_CFA_remember_state and DW_CFA_restore_state). The personality routine thing is in the .eh_frame extension which is documented here: http://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On 05/22/17 14:07, H. Peter Anvin wrote: > On 05/20/17 13:01, H.J. Lu wrote: >> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeufwrote: >> (H.J., could we get a binutils feature that allows is to do: pushq %whatever .cfi_adjust_sp -8 ... popq %whatever .cfi_adjust_sp 8 >> >> Np. Compiler needs to generate this. >> > > For actual assembly we have such a feature, it is called macros. > > push/pop is the easy stuff; macros take care of that, but the real pain > is dealing with the flow of control. > My biggest beef with the CFI directives that gas uses is that there is that .cfi_remember_state/.cfi_restore_state doesn't have a way to specify more than one state. That makes it really hard to get sanity around control flow changes, especially with code that is intentionally out of line. That, and some of the CFI directives seem to be a bit ill-defined in their definition (are they even applicable to anything other than DWARF?) They almost seem to be referencing some external specification, but the only thing I'm finding is the DWARF documentation which is written in very different terms. The best description of what a personality routine is I found in an article by Ian Lance Taylor. It doesn't seem to be applicable to C as far as I can tell. -hpa
Re: [PATCH 7/7] DWARF: add the config option
On 05/22/17 14:07, H. Peter Anvin wrote: > On 05/20/17 13:01, H.J. Lu wrote: >> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf wrote: >> (H.J., could we get a binutils feature that allows is to do: pushq %whatever .cfi_adjust_sp -8 ... popq %whatever .cfi_adjust_sp 8 >> >> Np. Compiler needs to generate this. >> > > For actual assembly we have such a feature, it is called macros. > > push/pop is the easy stuff; macros take care of that, but the real pain > is dealing with the flow of control. > My biggest beef with the CFI directives that gas uses is that there is that .cfi_remember_state/.cfi_restore_state doesn't have a way to specify more than one state. That makes it really hard to get sanity around control flow changes, especially with code that is intentionally out of line. That, and some of the CFI directives seem to be a bit ill-defined in their definition (are they even applicable to anything other than DWARF?) They almost seem to be referencing some external specification, but the only thing I'm finding is the DWARF documentation which is written in very different terms. The best description of what a personality routine is I found in an article by Ian Lance Taylor. It doesn't seem to be applicable to C as far as I can tell. -hpa
Re: [PATCH 7/7] DWARF: add the config option
On 05/22/17 04:12, Ingo Molnar wrote: \>> >> This construct might be useful for other arches, which is why I called >> it "FP" instead of "BP". But then I ruined that with the last 3 :-) > > Please call it BP - 'FP' can easily be read as floating-point, making it all > super-confusing. We should use canonical x86 register names and ordering - > even > if not all registers are used straight away. > Seriously, I suspect that at the end of the day we will have reinvented DWARF. -hpa
Re: [PATCH 7/7] DWARF: add the config option
On 05/22/17 04:12, Ingo Molnar wrote: \>> >> This construct might be useful for other arches, which is why I called >> it "FP" instead of "BP". But then I ruined that with the last 3 :-) > > Please call it BP - 'FP' can easily be read as floating-point, making it all > super-confusing. We should use canonical x86 register names and ordering - > even > if not all registers are used straight away. > Seriously, I suspect that at the end of the day we will have reinvented DWARF. -hpa
Re: [PATCH 7/7] DWARF: add the config option
On 05/20/17 13:01, H.J. Lu wrote: > On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeufwrote: > >>> >>> (H.J., could we get a binutils feature that allows is to do: >>> >>> pushq %whatever >>> .cfi_adjust_sp -8 >>> ... >>> popq %whatever >>> .cfi_adjust_sp 8 >>> > > Np. Compiler needs to generate this. > For actual assembly we have such a feature, it is called macros. push/pop is the easy stuff; macros take care of that, but the real pain is dealing with the flow of control. -hpa
Re: [PATCH 7/7] DWARF: add the config option
On 05/20/17 13:01, H.J. Lu wrote: > On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf wrote: > >>> >>> (H.J., could we get a binutils feature that allows is to do: >>> >>> pushq %whatever >>> .cfi_adjust_sp -8 >>> ... >>> popq %whatever >>> .cfi_adjust_sp 8 >>> > > Np. Compiler needs to generate this. > For actual assembly we have such a feature, it is called macros. push/pop is the easy stuff; macros take care of that, but the real pain is dealing with the flow of control. -hpa
Re: [PATCH 7/7] DWARF: add the config option
On Mon, May 22, 2017 at 4:34 AM, Jiri Kosinawrote: > On Sat, 20 May 2017, H.J. Lu wrote: > >> pushq %whatever >> .cfi_adjust_sp -8 >> ... >> popq %whatever >> .cfi_adjust_sp 8 >> >> >> >> >> Np. Compiler needs to generate this. >> >> >> > >> > How would the compiler generate this when inline asm is involved? For >> > the kernel, objtool could get around the need to have these >> > annotations, but not so much for user code? Is the compiler supposed >> > to parse the inline asm? Would the compiler provide some magic % code >> > to represent the current CFA base register? >> >> Here is one example of inline asm with call frame info: >> >> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD > > That brings us basically pretty close to square one though; having to > maintain "manual" anotations. Something we're pretty much trying to avoid > through this excercise. Assembler only encodes instructions. You need to a different tool to figure out what an instruction does. -- H.J.
Re: [PATCH 7/7] DWARF: add the config option
On Mon, May 22, 2017 at 4:34 AM, Jiri Kosina wrote: > On Sat, 20 May 2017, H.J. Lu wrote: > >> pushq %whatever >> .cfi_adjust_sp -8 >> ... >> popq %whatever >> .cfi_adjust_sp 8 >> >> >> >> >> Np. Compiler needs to generate this. >> >> >> > >> > How would the compiler generate this when inline asm is involved? For >> > the kernel, objtool could get around the need to have these >> > annotations, but not so much for user code? Is the compiler supposed >> > to parse the inline asm? Would the compiler provide some magic % code >> > to represent the current CFA base register? >> >> Here is one example of inline asm with call frame info: >> >> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD > > That brings us basically pretty close to square one though; having to > maintain "manual" anotations. Something we're pretty much trying to avoid > through this excercise. Assembler only encodes instructions. You need to a different tool to figure out what an instruction does. -- H.J.
Re: [PATCH 7/7] DWARF: add the config option
On Sat, 20 May 2017, H.J. Lu wrote: > pushq %whatever > .cfi_adjust_sp -8 > ... > popq %whatever > .cfi_adjust_sp 8 > > >> > >> Np. Compiler needs to generate this. > >> > > > > How would the compiler generate this when inline asm is involved? For > > the kernel, objtool could get around the need to have these > > annotations, but not so much for user code? Is the compiler supposed > > to parse the inline asm? Would the compiler provide some magic % code > > to represent the current CFA base register? > > Here is one example of inline asm with call frame info: > > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD That brings us basically pretty close to square one though; having to maintain "manual" anotations. Something we're pretty much trying to avoid through this excercise. -- Jiri Kosina SUSE Labs
Re: [PATCH 7/7] DWARF: add the config option
On Sat, 20 May 2017, H.J. Lu wrote: > pushq %whatever > .cfi_adjust_sp -8 > ... > popq %whatever > .cfi_adjust_sp 8 > > >> > >> Np. Compiler needs to generate this. > >> > > > > How would the compiler generate this when inline asm is involved? For > > the kernel, objtool could get around the need to have these > > annotations, but not so much for user code? Is the compiler supposed > > to parse the inline asm? Would the compiler provide some magic % code > > to represent the current CFA base register? > > Here is one example of inline asm with call frame info: > > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD That brings us basically pretty close to square one though; having to maintain "manual" anotations. Something we're pretty much trying to avoid through this excercise. -- Jiri Kosina SUSE Labs
Re: [PATCH 7/7] DWARF: add the config option
* Josh Poimboeufwrote: > > How are you handling control flow? > > Control flow of what? > > > > Here's the struct in its current state: > > > > > > #define UNDWARF_REG_UNDEFINED 0 > > > #define UNDWARF_REG_CFA 1 > > > #define UNDWARF_REG_SP 2 > > > #define UNDWARF_REG_FP 3 > > > #define UNDWARF_REG_SP_INDIRECT 4 > > > #define UNDWARF_REG_FP_INDIRECT 5 > > > #define UNDWARF_REG_R10 6 > > > #define UNDWARF_REG_DI 7 > > > #define UNDWARF_REG_DX 8 > > > > > > > Why only those registers? Also, if you have the option I would really > > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, > > si, di, r8-r15 in that order.) > > Those are the only registers which are ever needed as the base for > finding the previous stack frame. 99% of the time it's sp or bp, the > other registers are needed for aligned stacks and entry code. > > Using the actual register numbers isn't an option because I don't need > them all and they need to fit in a small number of bits. > > This construct might be useful for other arches, which is why I called > it "FP" instead of "BP". But then I ruined that with the last 3 :-) Please call it BP - 'FP' can easily be read as floating-point, making it all super-confusing. We should use canonical x86 register names and ordering - even if not all registers are used straight away. Thanks, Ingo
Re: [PATCH 7/7] DWARF: add the config option
* Josh Poimboeuf wrote: > > How are you handling control flow? > > Control flow of what? > > > > Here's the struct in its current state: > > > > > > #define UNDWARF_REG_UNDEFINED 0 > > > #define UNDWARF_REG_CFA 1 > > > #define UNDWARF_REG_SP 2 > > > #define UNDWARF_REG_FP 3 > > > #define UNDWARF_REG_SP_INDIRECT 4 > > > #define UNDWARF_REG_FP_INDIRECT 5 > > > #define UNDWARF_REG_R10 6 > > > #define UNDWARF_REG_DI 7 > > > #define UNDWARF_REG_DX 8 > > > > > > > Why only those registers? Also, if you have the option I would really > > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, > > si, di, r8-r15 in that order.) > > Those are the only registers which are ever needed as the base for > finding the previous stack frame. 99% of the time it's sp or bp, the > other registers are needed for aligned stacks and entry code. > > Using the actual register numbers isn't an option because I don't need > them all and they need to fit in a small number of bits. > > This construct might be useful for other arches, which is why I called > it "FP" instead of "BP". But then I ruined that with the last 3 :-) Please call it BP - 'FP' can easily be read as floating-point, making it all super-confusing. We should use canonical x86 register names and ordering - even if not all registers are used straight away. Thanks, Ingo
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 4:00 PM, Linus Torvaldswrote: > > hjl already posted an example of the kinds of horrors glibc does to do > things "right". Side note: we'd hopefully/presumably never need anything _that_ disgusting for the kernel, so hjl's example is probably an extreme one. But even when we just did the pushq/popq_cfi macros etc to try to have simple and reasonably legible annotations for the common cases, it got pretty ugly. It wasn't that extreme glibc kind of "50 lines of ugly for two instructions of code", but it was pretty bad. And as far as I know we never even tried to annotate places where we did "pushf/pop %reg" in inline asm (for saving/restoring flags) Linus
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 4:00 PM, Linus Torvalds wrote: > > hjl already posted an example of the kinds of horrors glibc does to do > things "right". Side note: we'd hopefully/presumably never need anything _that_ disgusting for the kernel, so hjl's example is probably an extreme one. But even when we just did the pushq/popq_cfi macros etc to try to have simple and reasonably legible annotations for the common cases, it got pretty ugly. It wasn't that extreme glibc kind of "50 lines of ugly for two instructions of code", but it was pretty bad. And as far as I know we never even tried to annotate places where we did "pushf/pop %reg" in inline asm (for saving/restoring flags) Linus
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 2:56 PM, Andy Lutomirskiwrote: > On Sat, May 20, 2017 at 1:16 PM, Linus Torvalds > wrote: >> >> The amount of unreadable crap and bugs it requires is not worth the >> pain. Not for *any* amount of gain, and the gain here is basically >> zero. > > But what if objtool autogenerated the annotations, perhaps with a tiny > bit of help telling it "hardware frame goes here" or "pt_regs goes > here"? You snipped the next part of my email, where I said: > The *only* acceptable model is automated tools (ie objtool). Don't > even bother to try to go any other way. Because I will not accept that > shit. so yes, objtool parsing things on its own is acceptable (and it had better not need any help - it already checks frame pointer data). The CFI annotations needed in asm are horrendous. We used to have them, and we didn't have even _remotely_ complete annotations and despite that they were (a) wrong (b) incomplete (c) made the asm impossible to read and even worse to modify. hjl already posted an example of the kinds of horrors glibc does to do things "right". And those rabbit ears around "right" are there for a reason. There's no way that is ever right - even if it gets the right results, it's an unmaintainable piece of crap. So no, we're never ever adding that CFI garbage back into the kernel. A tool that can generate it is ok, but even then we should expect inevitable bugs and not trust the end result blindly. Because dwarf info is complex enough that other tools have gotten it wrong many many times. Just google for "gcc bugzilla cfi" or go to the gcc bugzilla and search for "DWARF" or whatever. It's not "oh, we once had a bug". It's constant. One of the reasons I like the idea of having objtool generate something *simpler* than dwarf is that it not only is much easier to parse, it has a much higher likelihood of not having some crazy bug. If objtool mainly looks at the actual instructions, and perhaps uses dwarf information as additional input and creates something much simpler than dwarf, it might have a chance in hell of occasionally even getting it right. Because dwarf information is really really complicated. It's complicated because it contains *way* more information than just how to find the next stack frame. I mean, it has basically a RPN interpreter built in, and that's the _simple_ part. Linus
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 2:56 PM, Andy Lutomirski wrote: > On Sat, May 20, 2017 at 1:16 PM, Linus Torvalds > wrote: >> >> The amount of unreadable crap and bugs it requires is not worth the >> pain. Not for *any* amount of gain, and the gain here is basically >> zero. > > But what if objtool autogenerated the annotations, perhaps with a tiny > bit of help telling it "hardware frame goes here" or "pt_regs goes > here"? You snipped the next part of my email, where I said: > The *only* acceptable model is automated tools (ie objtool). Don't > even bother to try to go any other way. Because I will not accept that > shit. so yes, objtool parsing things on its own is acceptable (and it had better not need any help - it already checks frame pointer data). The CFI annotations needed in asm are horrendous. We used to have them, and we didn't have even _remotely_ complete annotations and despite that they were (a) wrong (b) incomplete (c) made the asm impossible to read and even worse to modify. hjl already posted an example of the kinds of horrors glibc does to do things "right". And those rabbit ears around "right" are there for a reason. There's no way that is ever right - even if it gets the right results, it's an unmaintainable piece of crap. So no, we're never ever adding that CFI garbage back into the kernel. A tool that can generate it is ok, but even then we should expect inevitable bugs and not trust the end result blindly. Because dwarf info is complex enough that other tools have gotten it wrong many many times. Just google for "gcc bugzilla cfi" or go to the gcc bugzilla and search for "DWARF" or whatever. It's not "oh, we once had a bug". It's constant. One of the reasons I like the idea of having objtool generate something *simpler* than dwarf is that it not only is much easier to parse, it has a much higher likelihood of not having some crazy bug. If objtool mainly looks at the actual instructions, and perhaps uses dwarf information as additional input and creates something much simpler than dwarf, it might have a chance in hell of occasionally even getting it right. Because dwarf information is really really complicated. It's complicated because it contains *way* more information than just how to find the next stack frame. I mean, it has basically a RPN interpreter built in, and that's the _simple_ part. Linus
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 2:58 PM, Andy Lutomirskiwrote: > On Sat, May 20, 2017 at 1:01 PM, H.J. Lu wrote: >> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf wrote: >> (H.J., could we get a binutils feature that allows is to do: pushq %whatever .cfi_adjust_sp -8 ... popq %whatever .cfi_adjust_sp 8 >> >> Np. Compiler needs to generate this. >> > > How would the compiler generate this when inline asm is involved? For > the kernel, objtool could get around the need to have these > annotations, but not so much for user code? Is the compiler supposed > to parse the inline asm? Would the compiler provide some magic % code > to represent the current CFA base register? Here is one example of inline asm with call frame info: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD -- H.J.
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 2:58 PM, Andy Lutomirski wrote: > On Sat, May 20, 2017 at 1:01 PM, H.J. Lu wrote: >> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf wrote: >> (H.J., could we get a binutils feature that allows is to do: pushq %whatever .cfi_adjust_sp -8 ... popq %whatever .cfi_adjust_sp 8 >> >> Np. Compiler needs to generate this. >> > > How would the compiler generate this when inline asm is involved? For > the kernel, objtool could get around the need to have these > annotations, but not so much for user code? Is the compiler supposed > to parse the inline asm? Would the compiler provide some magic % code > to represent the current CFA base register? Here is one example of inline asm with call frame info: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD -- H.J.
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 1:01 PM, H.J. Luwrote: > On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf wrote: > >>> >>> (H.J., could we get a binutils feature that allows is to do: >>> >>> pushq %whatever >>> .cfi_adjust_sp -8 >>> ... >>> popq %whatever >>> .cfi_adjust_sp 8 >>> > > Np. Compiler needs to generate this. > How would the compiler generate this when inline asm is involved? For the kernel, objtool could get around the need to have these annotations, but not so much for user code? Is the compiler supposed to parse the inline asm? Would the compiler provide some magic % code to represent the current CFA base register?
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 1:01 PM, H.J. Lu wrote: > On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf wrote: > >>> >>> (H.J., could we get a binutils feature that allows is to do: >>> >>> pushq %whatever >>> .cfi_adjust_sp -8 >>> ... >>> popq %whatever >>> .cfi_adjust_sp 8 >>> > > Np. Compiler needs to generate this. > How would the compiler generate this when inline asm is involved? For the kernel, objtool could get around the need to have these annotations, but not so much for user code? Is the compiler supposed to parse the inline asm? Would the compiler provide some magic % code to represent the current CFA base register?
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 1:16 PM, Linus Torvaldswrote: > On Fri, May 19, 2017 at 10:23 PM, Andy Lutomirski wrote: >> >> I personally like the idea of using real DWARF annotations in the >> entry code because it makes gdb work better (not kgdb -- real gdb >> attached to KVM). I bet that we could get entry asm annotations into >> good shape if we really wanted to. OTOH, getting DWARF to work well >> for inline asm is really nasty IIRC. > > No. I will NAK *any* attempt to make our asm contain the crazy > shit-for-brains annotations. > > Been there, done that, got the T-shirt, and then doused the T-shirt in > gasoline and put it on fire. > > The amount of unreadable crap and bugs it requires is not worth the > pain. Not for *any* amount of gain, and the gain here is basically > zero. But what if objtool autogenerated the annotations, perhaps with a tiny bit of help telling it "hardware frame goes here" or "pt_regs goes here"?
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 1:16 PM, Linus Torvalds wrote: > On Fri, May 19, 2017 at 10:23 PM, Andy Lutomirski wrote: >> >> I personally like the idea of using real DWARF annotations in the >> entry code because it makes gdb work better (not kgdb -- real gdb >> attached to KVM). I bet that we could get entry asm annotations into >> good shape if we really wanted to. OTOH, getting DWARF to work well >> for inline asm is really nasty IIRC. > > No. I will NAK *any* attempt to make our asm contain the crazy > shit-for-brains annotations. > > Been there, done that, got the T-shirt, and then doused the T-shirt in > gasoline and put it on fire. > > The amount of unreadable crap and bugs it requires is not worth the > pain. Not for *any* amount of gain, and the gain here is basically > zero. But what if objtool autogenerated the annotations, perhaps with a tiny bit of help telling it "hardware frame goes here" or "pt_regs goes here"?
Re: [PATCH 7/7] DWARF: add the config option
On Fri, May 19, 2017 at 10:23 PM, Andy Lutomirskiwrote: > > I personally like the idea of using real DWARF annotations in the > entry code because it makes gdb work better (not kgdb -- real gdb > attached to KVM). I bet that we could get entry asm annotations into > good shape if we really wanted to. OTOH, getting DWARF to work well > for inline asm is really nasty IIRC. No. I will NAK *any* attempt to make our asm contain the crazy shit-for-brains annotations. Been there, done that, got the T-shirt, and then doused the T-shirt in gasoline and put it on fire. The amount of unreadable crap and bugs it requires is not worth the pain. Not for *any* amount of gain, and the gain here is basically zero. > (H.J., could we get a binutils feature that allows is to do: > > pushq %whatever > .cfi_adjust_sp -8 > ... > popq %whatever > .cfi_adjust_sp 8 > > that will emit the right DWARF instructions regardless of whether > there's a frame pointer or not? .cfi_adjust_cfa_offset is not > particularly helpful here because it's totally wrong if the CFA is > currently being computed based on BP.) Yeah, that's just a small example of the kind of crap people have to deal with. Not going to happen. Assembler files are hard enough to write (and read) as-is, anything that expects us to go back to the bad old days with crazy shit cfi annotations is going to get violently NAK'ed and vetoed forever. The *only* acceptable model is automated tools (ie objtool). Don't even bother to try to go any other way. Because I will not accept that shit. Linus
Re: [PATCH 7/7] DWARF: add the config option
On Fri, May 19, 2017 at 10:23 PM, Andy Lutomirski wrote: > > I personally like the idea of using real DWARF annotations in the > entry code because it makes gdb work better (not kgdb -- real gdb > attached to KVM). I bet that we could get entry asm annotations into > good shape if we really wanted to. OTOH, getting DWARF to work well > for inline asm is really nasty IIRC. No. I will NAK *any* attempt to make our asm contain the crazy shit-for-brains annotations. Been there, done that, got the T-shirt, and then doused the T-shirt in gasoline and put it on fire. The amount of unreadable crap and bugs it requires is not worth the pain. Not for *any* amount of gain, and the gain here is basically zero. > (H.J., could we get a binutils feature that allows is to do: > > pushq %whatever > .cfi_adjust_sp -8 > ... > popq %whatever > .cfi_adjust_sp 8 > > that will emit the right DWARF instructions regardless of whether > there's a frame pointer or not? .cfi_adjust_cfa_offset is not > particularly helpful here because it's totally wrong if the CFA is > currently being computed based on BP.) Yeah, that's just a small example of the kind of crap people have to deal with. Not going to happen. Assembler files are hard enough to write (and read) as-is, anything that expects us to go back to the bad old days with crazy shit cfi annotations is going to get violently NAK'ed and vetoed forever. The *only* acceptable model is automated tools (ie objtool). Don't even bother to try to go any other way. Because I will not accept that shit. Linus
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeufwrote: >> >> (H.J., could we get a binutils feature that allows is to do: >> >> pushq %whatever >> .cfi_adjust_sp -8 >> ... >> popq %whatever >> .cfi_adjust_sp 8 >> Np. Compiler needs to generate this. -- H.J.
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf wrote: >> >> (H.J., could we get a binutils feature that allows is to do: >> >> pushq %whatever >> .cfi_adjust_sp -8 >> ... >> popq %whatever >> .cfi_adjust_sp 8 >> Np. Compiler needs to generate this. -- H.J.
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 11:20:34AM -0500, Josh Poimboeuf wrote: >But then, if we're going that far, why not just have objtool reformat >the data into something much simpler? It already has the knowledge >to do so. Then we don't have to jump through all those hoops to >justify jumping through more hoops in the kernel (i.e., having a >complex DWARF state machine). With a simple debuginfo format, the >kernel unwinder is simple enough that we don't need to validate its >functionality in a simulator. I should clarify that it doesn't have to be objtool which does this. It could instead be a simple DWARF-to-undwarf conversion tool which runs during the vmlinux linking stage. Anyway we're both proposing simplifying the DWARF data into an easier-to-parse format. I think the question is whether we want that simplification process to happen in the kernel (in the middle of a kernel unwind operation), or at build time. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Sat, May 20, 2017 at 11:20:34AM -0500, Josh Poimboeuf wrote: >But then, if we're going that far, why not just have objtool reformat >the data into something much simpler? It already has the knowledge >to do so. Then we don't have to jump through all those hoops to >justify jumping through more hoops in the kernel (i.e., having a >complex DWARF state machine). With a simple debuginfo format, the >kernel unwinder is simple enough that we don't need to validate its >functionality in a simulator. I should clarify that it doesn't have to be objtool which does this. It could instead be a simple DWARF-to-undwarf conversion tool which runs during the vmlinux linking stage. Anyway we're both proposing simplifying the DWARF data into an easier-to-parse format. I think the question is whether we want that simplification process to happen in the kernel (in the middle of a kernel unwind operation), or at build time. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Fri, May 19, 2017 at 10:23:53PM -0700, Andy Lutomirski wrote: > On Fri, May 19, 2017 at 2:35 PM, Josh Poimboeufwrote: > > On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote: > >> > How are you handling control flow? > >> > >> Control flow of what? > >> > >> > > Here's the struct in its current state: > >> > > > >> > > #define UNDWARF_REG_UNDEFINED 0 > >> > > #define UNDWARF_REG_CFA 1 > >> > > #define UNDWARF_REG_SP 2 > >> > > #define UNDWARF_REG_FP 3 > >> > > #define UNDWARF_REG_SP_INDIRECT 4 > >> > > #define UNDWARF_REG_FP_INDIRECT 5 > >> > > #define UNDWARF_REG_R10 6 > >> > > #define UNDWARF_REG_DI 7 > >> > > #define UNDWARF_REG_DX 8 > >> > > > >> > > >> > Why only those registers? Also, if you have the option I would really > >> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, > >> > si, di, r8-r15 in that order.) > >> > >> Those are the only registers which are ever needed as the base for > >> finding the previous stack frame. 99% of the time it's sp or bp, the > >> other registers are needed for aligned stacks and entry code. > >> > >> Using the actual register numbers isn't an option because I don't need > >> them all and they need to fit in a small number of bits. > >> > >> This construct might be useful for other arches, which is why I called > >> it "FP" instead of "BP". But then I ruined that with the last 3 :-) > > > > BTW, here's the link to the unwinder code if you're interested: > > > > > > https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c > > At the risk of potentially overcomplicating matters, here's a > suggestion. As far as I know, all (or most all?) unwinders > effectively do the following in a loop: > > 1. Look up the IP to figure out how to unwind from that IP. > 2. Use the results of that lookup to compute the previous frame state. > > The results of step 1 could perhaps be expressed like this: > > struct reg_formula { > unsigned int source_reg :4; > long offset; > bool dereference; /* true: *(reg + offset); false: (reg + offset) */ > /* For DWARF, I think this can be considerably more complicated, but > I doubt it's useful. */ > }; > > struct unwind_step { > u16 available_regs; /* mask of the caller frame regs that we are > able to recover */ > struct reg_formula[16]; > }; Ok, so I assume you mean we would need to have an in-kernel DWARF reader which reads .eh_frame and converts it to the above, which is called for every step of the unwind phase. > The CFA computation is just reg_formula[UNWIND_REG_SP] (or that plus > or minus sizeof(unsigned long) or whatever -- I can never remember > exactly what CFA refers to.) For a frame pointer-based unwinder, the > entire unwind_step is a foregone conclusion independent of IP: SP = BP > + 8 (or whatever), BP = *(BP + whatever), all other regs unknown. > > Could it make sense to actually structure the code this way? I can > see a few advantages. It would make the actual meat of the unwind > loop totally independent of the unwinding algorithm in use, Yes, this part of it is an interesting idea, separating the debuginfo reading step from the unwinding step. And for people who don't want to carry around megs of DWARF data, get_dwarf_step() could just be a fake lookup which always returns the frame pointer version. > it would > make the meat be dead simple (and thus easy to verify for > non-crashiness), and I think it just might open the door for a real > in-kernel DWARF unwinder that Linus would be okay with. Specifically, > write a function like: > > bool get_dwarf_step(struct unwind_step *step, unsigned long ip); If we keep the frame pointer encoding thing for non-DWARF kernels then we may also need to pass in bp as well. Or maybe we could force frame pointer users to at least have DWARF data for the entry code. Then even frame pointer kernels could detect entry regs and we could get rid of that nasty frame pointer encoding thing. > Put this function in its own file and make it buildable as kernel code > or as user code. Write a test case that runs it on every single > address on the kernel image (in user mode!) with address-sanitizer > enabled (or in Valgrind or both) and make sure that (a) it doesn't > blow up and (b) that the results are credible (e.g. by comparing to > objtool output). Heck, you could even fuzz-test it where the fuzzer > is allowed to corrupt the actual DWARF data. You could do the same > thing with whatever crazy super-compacted undwarf scheme someone comes > up with down the road, too. I think your proposal can be separated into two ideas, which can each be considered on their own merit: 1) Put .eh_frame in the kernel, along with an in-kernel DWARF unwinder. Manage the complexity of the unwinder by validating the output of the complex
Re: [PATCH 7/7] DWARF: add the config option
On Fri, May 19, 2017 at 10:23:53PM -0700, Andy Lutomirski wrote: > On Fri, May 19, 2017 at 2:35 PM, Josh Poimboeuf wrote: > > On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote: > >> > How are you handling control flow? > >> > >> Control flow of what? > >> > >> > > Here's the struct in its current state: > >> > > > >> > > #define UNDWARF_REG_UNDEFINED 0 > >> > > #define UNDWARF_REG_CFA 1 > >> > > #define UNDWARF_REG_SP 2 > >> > > #define UNDWARF_REG_FP 3 > >> > > #define UNDWARF_REG_SP_INDIRECT 4 > >> > > #define UNDWARF_REG_FP_INDIRECT 5 > >> > > #define UNDWARF_REG_R10 6 > >> > > #define UNDWARF_REG_DI 7 > >> > > #define UNDWARF_REG_DX 8 > >> > > > >> > > >> > Why only those registers? Also, if you have the option I would really > >> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, > >> > si, di, r8-r15 in that order.) > >> > >> Those are the only registers which are ever needed as the base for > >> finding the previous stack frame. 99% of the time it's sp or bp, the > >> other registers are needed for aligned stacks and entry code. > >> > >> Using the actual register numbers isn't an option because I don't need > >> them all and they need to fit in a small number of bits. > >> > >> This construct might be useful for other arches, which is why I called > >> it "FP" instead of "BP". But then I ruined that with the last 3 :-) > > > > BTW, here's the link to the unwinder code if you're interested: > > > > > > https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c > > At the risk of potentially overcomplicating matters, here's a > suggestion. As far as I know, all (or most all?) unwinders > effectively do the following in a loop: > > 1. Look up the IP to figure out how to unwind from that IP. > 2. Use the results of that lookup to compute the previous frame state. > > The results of step 1 could perhaps be expressed like this: > > struct reg_formula { > unsigned int source_reg :4; > long offset; > bool dereference; /* true: *(reg + offset); false: (reg + offset) */ > /* For DWARF, I think this can be considerably more complicated, but > I doubt it's useful. */ > }; > > struct unwind_step { > u16 available_regs; /* mask of the caller frame regs that we are > able to recover */ > struct reg_formula[16]; > }; Ok, so I assume you mean we would need to have an in-kernel DWARF reader which reads .eh_frame and converts it to the above, which is called for every step of the unwind phase. > The CFA computation is just reg_formula[UNWIND_REG_SP] (or that plus > or minus sizeof(unsigned long) or whatever -- I can never remember > exactly what CFA refers to.) For a frame pointer-based unwinder, the > entire unwind_step is a foregone conclusion independent of IP: SP = BP > + 8 (or whatever), BP = *(BP + whatever), all other regs unknown. > > Could it make sense to actually structure the code this way? I can > see a few advantages. It would make the actual meat of the unwind > loop totally independent of the unwinding algorithm in use, Yes, this part of it is an interesting idea, separating the debuginfo reading step from the unwinding step. And for people who don't want to carry around megs of DWARF data, get_dwarf_step() could just be a fake lookup which always returns the frame pointer version. > it would > make the meat be dead simple (and thus easy to verify for > non-crashiness), and I think it just might open the door for a real > in-kernel DWARF unwinder that Linus would be okay with. Specifically, > write a function like: > > bool get_dwarf_step(struct unwind_step *step, unsigned long ip); If we keep the frame pointer encoding thing for non-DWARF kernels then we may also need to pass in bp as well. Or maybe we could force frame pointer users to at least have DWARF data for the entry code. Then even frame pointer kernels could detect entry regs and we could get rid of that nasty frame pointer encoding thing. > Put this function in its own file and make it buildable as kernel code > or as user code. Write a test case that runs it on every single > address on the kernel image (in user mode!) with address-sanitizer > enabled (or in Valgrind or both) and make sure that (a) it doesn't > blow up and (b) that the results are credible (e.g. by comparing to > objtool output). Heck, you could even fuzz-test it where the fuzzer > is allowed to corrupt the actual DWARF data. You could do the same > thing with whatever crazy super-compacted undwarf scheme someone comes > up with down the road, too. I think your proposal can be separated into two ideas, which can each be considered on their own merit: 1) Put .eh_frame in the kernel, along with an in-kernel DWARF unwinder. Manage the complexity of the unwinder by validating the output of the complex part of the algorithm
Re: [PATCH 7/7] DWARF: add the config option
On Fri, May 19, 2017 at 2:35 PM, Josh Poimboeufwrote: > On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote: >> > How are you handling control flow? >> >> Control flow of what? >> >> > > Here's the struct in its current state: >> > > >> > > #define UNDWARF_REG_UNDEFINED 0 >> > > #define UNDWARF_REG_CFA 1 >> > > #define UNDWARF_REG_SP 2 >> > > #define UNDWARF_REG_FP 3 >> > > #define UNDWARF_REG_SP_INDIRECT 4 >> > > #define UNDWARF_REG_FP_INDIRECT 5 >> > > #define UNDWARF_REG_R10 6 >> > > #define UNDWARF_REG_DI 7 >> > > #define UNDWARF_REG_DX 8 >> > > >> > >> > Why only those registers? Also, if you have the option I would really >> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, >> > si, di, r8-r15 in that order.) >> >> Those are the only registers which are ever needed as the base for >> finding the previous stack frame. 99% of the time it's sp or bp, the >> other registers are needed for aligned stacks and entry code. >> >> Using the actual register numbers isn't an option because I don't need >> them all and they need to fit in a small number of bits. >> >> This construct might be useful for other arches, which is why I called >> it "FP" instead of "BP". But then I ruined that with the last 3 :-) > > BTW, here's the link to the unwinder code if you're interested: > > > https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c At the risk of potentially overcomplicating matters, here's a suggestion. As far as I know, all (or most all?) unwinders effectively do the following in a loop: 1. Look up the IP to figure out how to unwind from that IP. 2. Use the results of that lookup to compute the previous frame state. The results of step 1 could perhaps be expressed like this: struct reg_formula { unsigned int source_reg :4; long offset; bool dereference; /* true: *(reg + offset); false: (reg + offset) */ /* For DWARF, I think this can be considerably more complicated, but I doubt it's useful. */ }; struct unwind_step { u16 available_regs; /* mask of the caller frame regs that we are able to recover */ struct reg_formula[16]; }; The CFA computation is just reg_formula[UNWIND_REG_SP] (or that plus or minus sizeof(unsigned long) or whatever -- I can never remember exactly what CFA refers to.) For a frame pointer-based unwinder, the entire unwind_step is a foregone conclusion independent of IP: SP = BP + 8 (or whatever), BP = *(BP + whatever), all other regs unknown. Could it make sense to actually structure the code this way? I can see a few advantages. It would make the actual meat of the unwind loop totally independent of the unwinding algorithm in use, it would make the meat be dead simple (and thus easy to verify for non-crashiness), and I think it just might open the door for a real in-kernel DWARF unwinder that Linus would be okay with. Specifically, write a function like: bool get_dwarf_step(struct unwind_step *step, unsigned long ip); Put this function in its own file and make it buildable as kernel code or as user code. Write a test case that runs it on every single address on the kernel image (in user mode!) with address-sanitizer enabled (or in Valgrind or both) and make sure that (a) it doesn't blow up and (b) that the results are credible (e.g. by comparing to objtool output). Heck, you could even fuzz-test it where the fuzzer is allowed to corrupt the actual DWARF data. You could do the same thing with whatever crazy super-compacted undwarf scheme someone comes up with down the road, too. I personally like the idea of using real DWARF annotations in the entry code because it makes gdb work better (not kgdb -- real gdb attached to KVM). I bet that we could get entry asm annotations into good shape if we really wanted to. OTOH, getting DWARF to work well for inline asm is really nasty IIRC. (H.J., could we get a binutils feature that allows is to do: pushq %whatever .cfi_adjust_sp -8 ... popq %whatever .cfi_adjust_sp 8 that will emit the right DWARF instructions regardless of whether there's a frame pointer or not? .cfi_adjust_cfa_offset is not particularly helpful here because it's totally wrong if the CFA is currently being computed based on BP.) Also, you read the stack like this: *val = *(unsigned long *)addr; how about probe_kernel_read() instead? --Andy
Re: [PATCH 7/7] DWARF: add the config option
On Fri, May 19, 2017 at 2:35 PM, Josh Poimboeuf wrote: > On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote: >> > How are you handling control flow? >> >> Control flow of what? >> >> > > Here's the struct in its current state: >> > > >> > > #define UNDWARF_REG_UNDEFINED 0 >> > > #define UNDWARF_REG_CFA 1 >> > > #define UNDWARF_REG_SP 2 >> > > #define UNDWARF_REG_FP 3 >> > > #define UNDWARF_REG_SP_INDIRECT 4 >> > > #define UNDWARF_REG_FP_INDIRECT 5 >> > > #define UNDWARF_REG_R10 6 >> > > #define UNDWARF_REG_DI 7 >> > > #define UNDWARF_REG_DX 8 >> > > >> > >> > Why only those registers? Also, if you have the option I would really >> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, >> > si, di, r8-r15 in that order.) >> >> Those are the only registers which are ever needed as the base for >> finding the previous stack frame. 99% of the time it's sp or bp, the >> other registers are needed for aligned stacks and entry code. >> >> Using the actual register numbers isn't an option because I don't need >> them all and they need to fit in a small number of bits. >> >> This construct might be useful for other arches, which is why I called >> it "FP" instead of "BP". But then I ruined that with the last 3 :-) > > BTW, here's the link to the unwinder code if you're interested: > > > https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c At the risk of potentially overcomplicating matters, here's a suggestion. As far as I know, all (or most all?) unwinders effectively do the following in a loop: 1. Look up the IP to figure out how to unwind from that IP. 2. Use the results of that lookup to compute the previous frame state. The results of step 1 could perhaps be expressed like this: struct reg_formula { unsigned int source_reg :4; long offset; bool dereference; /* true: *(reg + offset); false: (reg + offset) */ /* For DWARF, I think this can be considerably more complicated, but I doubt it's useful. */ }; struct unwind_step { u16 available_regs; /* mask of the caller frame regs that we are able to recover */ struct reg_formula[16]; }; The CFA computation is just reg_formula[UNWIND_REG_SP] (or that plus or minus sizeof(unsigned long) or whatever -- I can never remember exactly what CFA refers to.) For a frame pointer-based unwinder, the entire unwind_step is a foregone conclusion independent of IP: SP = BP + 8 (or whatever), BP = *(BP + whatever), all other regs unknown. Could it make sense to actually structure the code this way? I can see a few advantages. It would make the actual meat of the unwind loop totally independent of the unwinding algorithm in use, it would make the meat be dead simple (and thus easy to verify for non-crashiness), and I think it just might open the door for a real in-kernel DWARF unwinder that Linus would be okay with. Specifically, write a function like: bool get_dwarf_step(struct unwind_step *step, unsigned long ip); Put this function in its own file and make it buildable as kernel code or as user code. Write a test case that runs it on every single address on the kernel image (in user mode!) with address-sanitizer enabled (or in Valgrind or both) and make sure that (a) it doesn't blow up and (b) that the results are credible (e.g. by comparing to objtool output). Heck, you could even fuzz-test it where the fuzzer is allowed to corrupt the actual DWARF data. You could do the same thing with whatever crazy super-compacted undwarf scheme someone comes up with down the road, too. I personally like the idea of using real DWARF annotations in the entry code because it makes gdb work better (not kgdb -- real gdb attached to KVM). I bet that we could get entry asm annotations into good shape if we really wanted to. OTOH, getting DWARF to work well for inline asm is really nasty IIRC. (H.J., could we get a binutils feature that allows is to do: pushq %whatever .cfi_adjust_sp -8 ... popq %whatever .cfi_adjust_sp 8 that will emit the right DWARF instructions regardless of whether there's a frame pointer or not? .cfi_adjust_cfa_offset is not particularly helpful here because it's totally wrong if the CFA is currently being computed based on BP.) Also, you read the stack like this: *val = *(unsigned long *)addr; how about probe_kernel_read() instead? --Andy
Re: [PATCH 7/7] DWARF: add the config option
On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote: > > How are you handling control flow? > > Control flow of what? > > > > Here's the struct in its current state: > > > > > > #define UNDWARF_REG_UNDEFINED 0 > > > #define UNDWARF_REG_CFA 1 > > > #define UNDWARF_REG_SP 2 > > > #define UNDWARF_REG_FP 3 > > > #define UNDWARF_REG_SP_INDIRECT 4 > > > #define UNDWARF_REG_FP_INDIRECT 5 > > > #define UNDWARF_REG_R10 6 > > > #define UNDWARF_REG_DI 7 > > > #define UNDWARF_REG_DX 8 > > > > > > > Why only those registers? Also, if you have the option I would really > > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, > > si, di, r8-r15 in that order.) > > Those are the only registers which are ever needed as the base for > finding the previous stack frame. 99% of the time it's sp or bp, the > other registers are needed for aligned stacks and entry code. > > Using the actual register numbers isn't an option because I don't need > them all and they need to fit in a small number of bits. > > This construct might be useful for other arches, which is why I called > it "FP" instead of "BP". But then I ruined that with the last 3 :-) BTW, here's the link to the unwinder code if you're interested: https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote: > > How are you handling control flow? > > Control flow of what? > > > > Here's the struct in its current state: > > > > > > #define UNDWARF_REG_UNDEFINED 0 > > > #define UNDWARF_REG_CFA 1 > > > #define UNDWARF_REG_SP 2 > > > #define UNDWARF_REG_FP 3 > > > #define UNDWARF_REG_SP_INDIRECT 4 > > > #define UNDWARF_REG_FP_INDIRECT 5 > > > #define UNDWARF_REG_R10 6 > > > #define UNDWARF_REG_DI 7 > > > #define UNDWARF_REG_DX 8 > > > > > > > Why only those registers? Also, if you have the option I would really > > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, > > si, di, r8-r15 in that order.) > > Those are the only registers which are ever needed as the base for > finding the previous stack frame. 99% of the time it's sp or bp, the > other registers are needed for aligned stacks and entry code. > > Using the actual register numbers isn't an option because I don't need > them all and they need to fit in a small number of bits. > > This construct might be useful for other arches, which is why I called > it "FP" instead of "BP". But then I ruined that with the last 3 :-) BTW, here's the link to the unwinder code if you're interested: https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c -- Josh
Re: [PATCH 7/7] DWARF: add the config option
> How are you handling control flow? Control flow of what? > > Here's the struct in its current state: > > > > #define UNDWARF_REG_UNDEFINED 0 > > #define UNDWARF_REG_CFA 1 > > #define UNDWARF_REG_SP 2 > > #define UNDWARF_REG_FP 3 > > #define UNDWARF_REG_SP_INDIRECT 4 > > #define UNDWARF_REG_FP_INDIRECT 5 > > #define UNDWARF_REG_R10 6 > > #define UNDWARF_REG_DI 7 > > #define UNDWARF_REG_DX 8 > > > > Why only those registers? Also, if you have the option I would really > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, > si, di, r8-r15 in that order.) Those are the only registers which are ever needed as the base for finding the previous stack frame. 99% of the time it's sp or bp, the other registers are needed for aligned stacks and entry code. Using the actual register numbers isn't an option because I don't need them all and they need to fit in a small number of bits. This construct might be useful for other arches, which is why I called it "FP" instead of "BP". But then I ruined that with the last 3 :-) -- Josh
Re: [PATCH 7/7] DWARF: add the config option
> How are you handling control flow? Control flow of what? > > Here's the struct in its current state: > > > > #define UNDWARF_REG_UNDEFINED 0 > > #define UNDWARF_REG_CFA 1 > > #define UNDWARF_REG_SP 2 > > #define UNDWARF_REG_FP 3 > > #define UNDWARF_REG_SP_INDIRECT 4 > > #define UNDWARF_REG_FP_INDIRECT 5 > > #define UNDWARF_REG_R10 6 > > #define UNDWARF_REG_DI 7 > > #define UNDWARF_REG_DX 8 > > > > Why only those registers? Also, if you have the option I would really > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, > si, di, r8-r15 in that order.) Those are the only registers which are ever needed as the base for finding the previous stack frame. 99% of the time it's sp or bp, the other registers are needed for aligned stacks and entry code. Using the actual register numbers isn't an option because I don't need them all and they need to fit in a small number of bits. This construct might be useful for other arches, which is why I called it "FP" instead of "BP". But then I ruined that with the last 3 :-) -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On 05/19/17 13:53, Josh Poimboeuf wrote: > > Here's the struct in its current state: > > #define UNDWARF_REG_UNDEFINED 0 > #define UNDWARF_REG_CFA 1 > #define UNDWARF_REG_SP 2 > #define UNDWARF_REG_FP 3 > #define UNDWARF_REG_SP_INDIRECT 4 > #define UNDWARF_REG_FP_INDIRECT 5 > #define UNDWARF_REG_R10 6 > #define UNDWARF_REG_DI 7 > #define UNDWARF_REG_DX 8 > Why only those registers? Also, if you have the option I would really suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, si, di, r8-r15 in that order.) -hpa
Re: [PATCH 7/7] DWARF: add the config option
On 05/19/17 13:53, Josh Poimboeuf wrote: > > Here's the struct in its current state: > > #define UNDWARF_REG_UNDEFINED 0 > #define UNDWARF_REG_CFA 1 > #define UNDWARF_REG_SP 2 > #define UNDWARF_REG_FP 3 > #define UNDWARF_REG_SP_INDIRECT 4 > #define UNDWARF_REG_FP_INDIRECT 5 > #define UNDWARF_REG_R10 6 > #define UNDWARF_REG_DI 7 > #define UNDWARF_REG_DX 8 > Why only those registers? Also, if you have the option I would really suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp, si, di, r8-r15 in that order.) -hpa
Re: [PATCH 7/7] DWARF: add the config option
On 05/19/17 13:53, Josh Poimboeuf wrote: >> >> One instance of the structure would exist for each time the stack >> pointer changes, e.g. for every function entry, push/pop, and rsp >> add/subtract. The data could be assembled and sorted offline, possibly >> derived from DWARF, or more likely, generated by objtool. After doing >> some rough calculations, I think the section size would be comparable to >> the sizes of the DWARF .eh_frame sections it would replace. >> >> If it worked, the "undwarf" unwinder would be a lot simpler than a real >> DWARF unwinder. And validating the sanity of the data at runtime would >> be a lot more straightforward. It could ensure that each stack pointer >> is within the bounds of the current stack, like our current unwinder >> does. > > I've been hacking away at this, and so far it's working well. The code > is much simpler than a DWARF unwinder. Right now the kernel piece is > only ~350 lines of code. The vast majority of the changes are in > objtool. > > It's now successfully unwinding through entry code and most other asm > files, dumping entry regs, dealing with aligned stacks, dynamic stacks, > etc. > > Here's the struct in its current state: How are you handling control flow? -hpa
Re: [PATCH 7/7] DWARF: add the config option
On 05/19/17 13:53, Josh Poimboeuf wrote: >> >> One instance of the structure would exist for each time the stack >> pointer changes, e.g. for every function entry, push/pop, and rsp >> add/subtract. The data could be assembled and sorted offline, possibly >> derived from DWARF, or more likely, generated by objtool. After doing >> some rough calculations, I think the section size would be comparable to >> the sizes of the DWARF .eh_frame sections it would replace. >> >> If it worked, the "undwarf" unwinder would be a lot simpler than a real >> DWARF unwinder. And validating the sanity of the data at runtime would >> be a lot more straightforward. It could ensure that each stack pointer >> is within the bounds of the current stack, like our current unwinder >> does. > > I've been hacking away at this, and so far it's working well. The code > is much simpler than a DWARF unwinder. Right now the kernel piece is > only ~350 lines of code. The vast majority of the changes are in > objtool. > > It's now successfully unwinding through entry code and most other asm > files, dumping entry regs, dealing with aligned stacks, dynamic stacks, > etc. > > Here's the struct in its current state: How are you handling control flow? -hpa
Re: [PATCH 7/7] DWARF: add the config option
On Sun, May 07, 2017 at 11:55:24AM -0500, Josh Poimboeuf wrote: > I'm thinking/hoping that information can be expressed in a simple, easy > to parse, reasonably sized data structure. Something like a sorted > array of this: > > struct undwarf { > unsigned int ip;/* instruction pointer (relative offset > from base) */ > unsigned prev_frame:13; /* offset to previous frame from > current stack pointer */ > unsigned regs:1;/* whether prev_frame contains entry > regs (regs->ip) */ > unsigned align:2; /* some details for dealing with gcc > stack realignment */ > } __packed; > > extern struct undwarf undwarves[]; > > One instance of the structure would exist for each time the stack > pointer changes, e.g. for every function entry, push/pop, and rsp > add/subtract. The data could be assembled and sorted offline, possibly > derived from DWARF, or more likely, generated by objtool. After doing > some rough calculations, I think the section size would be comparable to > the sizes of the DWARF .eh_frame sections it would replace. > > If it worked, the "undwarf" unwinder would be a lot simpler than a real > DWARF unwinder. And validating the sanity of the data at runtime would > be a lot more straightforward. It could ensure that each stack pointer > is within the bounds of the current stack, like our current unwinder > does. I've been hacking away at this, and so far it's working well. The code is much simpler than a DWARF unwinder. Right now the kernel piece is only ~350 lines of code. The vast majority of the changes are in objtool. It's now successfully unwinding through entry code and most other asm files, dumping entry regs, dealing with aligned stacks, dynamic stacks, etc. Here's the struct in its current state: #define UNDWARF_REG_UNDEFINED 0 #define UNDWARF_REG_CFA 1 #define UNDWARF_REG_SP 2 #define UNDWARF_REG_FP 3 #define UNDWARF_REG_SP_INDIRECT 4 #define UNDWARF_REG_FP_INDIRECT 5 #define UNDWARF_REG_R10 6 #define UNDWARF_REG_DI 7 #define UNDWARF_REG_DX 8 #define UNDWARF_TYPE_NORMAL 0 #define UNDWARF_TYPE_REGS 1 #define UNDWARF_TYPE_REGS_IRET 2 struct undwarf_state { int ip; unsigned int len; short cfa_offset; short fp_offset; unsigned cfa_reg:4; unsigned fp_reg:4; unsigned type:2; }; With frame pointers disabled, around 300,000 of those structs are needed for my kernel, which works out to be 4.7M of data. By comparison, the DWARF eh_frame sections would be 2.1M. I think we should be able to compress it down to a comparable size by rearranging the data a little bit. The entry code needs some annotations to give some hints to objtool about how to generate the data, but it's not bad: https://paste.fedoraproject.org/paste/Xq3bPlx5An0Si7AshZTdkF5M1UNdIGYhyRLivL9gydE= I still have a lot of work to do on the tooling side: module support, sorting the undwarf table at build time, and a lot of cleanups. But overall it's looking feasible. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Sun, May 07, 2017 at 11:55:24AM -0500, Josh Poimboeuf wrote: > I'm thinking/hoping that information can be expressed in a simple, easy > to parse, reasonably sized data structure. Something like a sorted > array of this: > > struct undwarf { > unsigned int ip;/* instruction pointer (relative offset > from base) */ > unsigned prev_frame:13; /* offset to previous frame from > current stack pointer */ > unsigned regs:1;/* whether prev_frame contains entry > regs (regs->ip) */ > unsigned align:2; /* some details for dealing with gcc > stack realignment */ > } __packed; > > extern struct undwarf undwarves[]; > > One instance of the structure would exist for each time the stack > pointer changes, e.g. for every function entry, push/pop, and rsp > add/subtract. The data could be assembled and sorted offline, possibly > derived from DWARF, or more likely, generated by objtool. After doing > some rough calculations, I think the section size would be comparable to > the sizes of the DWARF .eh_frame sections it would replace. > > If it worked, the "undwarf" unwinder would be a lot simpler than a real > DWARF unwinder. And validating the sanity of the data at runtime would > be a lot more straightforward. It could ensure that each stack pointer > is within the bounds of the current stack, like our current unwinder > does. I've been hacking away at this, and so far it's working well. The code is much simpler than a DWARF unwinder. Right now the kernel piece is only ~350 lines of code. The vast majority of the changes are in objtool. It's now successfully unwinding through entry code and most other asm files, dumping entry regs, dealing with aligned stacks, dynamic stacks, etc. Here's the struct in its current state: #define UNDWARF_REG_UNDEFINED 0 #define UNDWARF_REG_CFA 1 #define UNDWARF_REG_SP 2 #define UNDWARF_REG_FP 3 #define UNDWARF_REG_SP_INDIRECT 4 #define UNDWARF_REG_FP_INDIRECT 5 #define UNDWARF_REG_R10 6 #define UNDWARF_REG_DI 7 #define UNDWARF_REG_DX 8 #define UNDWARF_TYPE_NORMAL 0 #define UNDWARF_TYPE_REGS 1 #define UNDWARF_TYPE_REGS_IRET 2 struct undwarf_state { int ip; unsigned int len; short cfa_offset; short fp_offset; unsigned cfa_reg:4; unsigned fp_reg:4; unsigned type:2; }; With frame pointers disabled, around 300,000 of those structs are needed for my kernel, which works out to be 4.7M of data. By comparison, the DWARF eh_frame sections would be 2.1M. I think we should be able to compress it down to a comparable size by rearranging the data a little bit. The entry code needs some annotations to give some hints to objtool about how to generate the data, but it's not bad: https://paste.fedoraproject.org/paste/Xq3bPlx5An0Si7AshZTdkF5M1UNdIGYhyRLivL9gydE= I still have a lot of work to do on the tooling side: module support, sorting the undwarf table at build time, and a lot of cleanups. But overall it's looking feasible. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 12:39 AM, Jiri Slabywrote: > > Every SUSE user has been using this for almost a decade and we are not > about to switch to FP for performance reasons as noted by Jiri Kosina. The whole "not about to switch on frame pointers" argument is bogus. Lots of people don't have frame pointers. The tracebacks don't look all that horrible despite that. If the problem is that some debug option that you want to use do that "select FRAME_POINTER" thing, then maybe we should just fix that. For example, I think it's annoying that the LATENCYTOP helper config option basically forces frame pointers. That just seems stupid. Even enabling CONFIG_STACKTRACE doesn't do that. We do fine without frame pointers. Do traces get a bit uglier? Sure. But that's not a huge deal. > Well, reliable stack-traces with minimal performance impact thanks to > out-of-band data is hell good reason in my opinion. It's not the performance impact. It's the other crap. It's the fact that you have a whole state machine that isn't even used. The only reason for that state machine is for register contents, but then the register contents aren't actually used by the stack trace code afaik. Yeah, in theory that register engine might be used for dynamic stack sizes too, but I don't think gcc actually generates code like that - it uses frame pointers for variable-sized stacks, doesn't it? But historically, it's even more the "oops, the unwind tables are buggy because the test coverage is horrible, and we walked off into the weeds while walking them, taking a recursive page fault, which turned a WARN_ON() into a dead machine that didn't even give us the information we wanted in the first place". Now, it may be that with tools like objtool, we might be able to *validate* the tables, which might actually be a good safety net. So I'm not saying that the unwinder is a total no-go. But I want to get rid of these red herring arguments in its favor. If the argument *for* the unwinder i scrazy bullshit (eg "I want to use LATENCYTOP without CONFIG_FRAME_POINTER"), then we should fix those things independently. >> The fact that it gets disabled for KASAN also makes me suspicious. It >> basically means that now all the accesses it does are not even >> validated. > > OK, I inclined to disable KASAN when I started cleaning this up for > _performance_ reasons. The system was so slow, that the RCU stall or > soft-lockup detectors came up complaining. From that time, I measured > the bottlenecks and optimized the unwinder so that 1000 iterations of > unwinding takes: > > Before: > real0m1.808s > user0m0.001s > sys 0m1.807s > > After: > real0m0.018s > user0m0.001s > sys 0m0.017s > > So let me check, whether KASAN still has to be disabled globally. I do > not think so. > > OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as > holds now for the rest of the current unwinding: > KASAN_SANITIZE_dumpstack.o := n > KASAN_SANITIZE_dumpstack_$(BITS).o := n > KASAN_SANITIZE_stacktrace.o := n That's fine. But if the unwinder means no KASAN at all, then I don't think the unwinder is good. > Now we have objtool. My objtool clone: > 1) verifies the DWARF data (prepared by Josh) > > 2) generates DWARF data for assembly -- incomplete yet: see the thread > about x86 assembly cleanup which is a pre-requisite for this to work. > This is BTW the reason why the DWARF unwinder is default-off in this > series yet. > > And we can add: > 3) fix up the data, if they are wrong Yes. objtool might make the unwinder acceptable. One of the things that caused the old unwinder to be absolutely incredible *crap* was how it turned assembly language (both inline and separate .S files) from a useful thing to absolutely horrible line noise that was illegible and unmaintainable, and likely to be buggy to boot. So we may be in a different situation that we used to. But still.. Linus
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 12:39 AM, Jiri Slaby wrote: > > Every SUSE user has been using this for almost a decade and we are not > about to switch to FP for performance reasons as noted by Jiri Kosina. The whole "not about to switch on frame pointers" argument is bogus. Lots of people don't have frame pointers. The tracebacks don't look all that horrible despite that. If the problem is that some debug option that you want to use do that "select FRAME_POINTER" thing, then maybe we should just fix that. For example, I think it's annoying that the LATENCYTOP helper config option basically forces frame pointers. That just seems stupid. Even enabling CONFIG_STACKTRACE doesn't do that. We do fine without frame pointers. Do traces get a bit uglier? Sure. But that's not a huge deal. > Well, reliable stack-traces with minimal performance impact thanks to > out-of-band data is hell good reason in my opinion. It's not the performance impact. It's the other crap. It's the fact that you have a whole state machine that isn't even used. The only reason for that state machine is for register contents, but then the register contents aren't actually used by the stack trace code afaik. Yeah, in theory that register engine might be used for dynamic stack sizes too, but I don't think gcc actually generates code like that - it uses frame pointers for variable-sized stacks, doesn't it? But historically, it's even more the "oops, the unwind tables are buggy because the test coverage is horrible, and we walked off into the weeds while walking them, taking a recursive page fault, which turned a WARN_ON() into a dead machine that didn't even give us the information we wanted in the first place". Now, it may be that with tools like objtool, we might be able to *validate* the tables, which might actually be a good safety net. So I'm not saying that the unwinder is a total no-go. But I want to get rid of these red herring arguments in its favor. If the argument *for* the unwinder i scrazy bullshit (eg "I want to use LATENCYTOP without CONFIG_FRAME_POINTER"), then we should fix those things independently. >> The fact that it gets disabled for KASAN also makes me suspicious. It >> basically means that now all the accesses it does are not even >> validated. > > OK, I inclined to disable KASAN when I started cleaning this up for > _performance_ reasons. The system was so slow, that the RCU stall or > soft-lockup detectors came up complaining. From that time, I measured > the bottlenecks and optimized the unwinder so that 1000 iterations of > unwinding takes: > > Before: > real0m1.808s > user0m0.001s > sys 0m1.807s > > After: > real0m0.018s > user0m0.001s > sys 0m0.017s > > So let me check, whether KASAN still has to be disabled globally. I do > not think so. > > OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as > holds now for the rest of the current unwinding: > KASAN_SANITIZE_dumpstack.o := n > KASAN_SANITIZE_dumpstack_$(BITS).o := n > KASAN_SANITIZE_stacktrace.o := n That's fine. But if the unwinder means no KASAN at all, then I don't think the unwinder is good. > Now we have objtool. My objtool clone: > 1) verifies the DWARF data (prepared by Josh) > > 2) generates DWARF data for assembly -- incomplete yet: see the thread > about x86 assembly cleanup which is a pre-requisite for this to work. > This is BTW the reason why the DWARF unwinder is default-off in this > series yet. > > And we can add: > 3) fix up the data, if they are wrong Yes. objtool might make the unwinder acceptable. One of the things that caused the old unwinder to be absolutely incredible *crap* was how it turned assembly language (both inline and separate .S files) from a useful thing to absolutely horrible line noise that was illegible and unmaintainable, and likely to be buggy to boot. So we may be in a different situation that we used to. But still.. Linus
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 6:09 AM, Josh Poimboeufwrote: > On Wed, May 10, 2017 at 10:15:09AM +0200, Jiri Slaby wrote: >> > I'm, ahem, highly skeptical to creating our own unwinding data >> > format unless there is *documented, supported, and tested* way to >> > force the compiler to *automatically* fall back to frame pointers >> > any time there may be complexity involved, >> >> I second this. Inventing a new format like this mostly ends up with >> using the standard one after several iterations. One cannot think of all >> the consequences and needs while proposing a new one. >> >> The memory footprint is ~2M for average vmlinux. And people who need to >> access: >> * either need it frequently -- those do not need performance (LOCKDEP, >> KASAN, or other debug builds) >> * or are in the middle of WARNING, BUG, crash, panic or such and this is >> not that often... >> >> And we would need *both*. The limited proprietary one in some sort of >> .kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash, >> gdb and so on. > > I don't think so. DWARF CFI is optimized for size. My proposal is to > take the same data (or some subset of it) and reformat it to optimize > for simplicity. > > If, for some reason, we ended up needing *all* the original DWARF data, > we could still have it in the simpler format. In that case it might end > up being 8M instead of 2M :-) But I don't see that being possible. There is a compact EH for MIPS: https://github.com/itanium-cxx-abi/cxx-abi/blob/master/MIPSCompactEH.pdf It can be extended to other targets. -- H.J.
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 6:09 AM, Josh Poimboeuf wrote: > On Wed, May 10, 2017 at 10:15:09AM +0200, Jiri Slaby wrote: >> > I'm, ahem, highly skeptical to creating our own unwinding data >> > format unless there is *documented, supported, and tested* way to >> > force the compiler to *automatically* fall back to frame pointers >> > any time there may be complexity involved, >> >> I second this. Inventing a new format like this mostly ends up with >> using the standard one after several iterations. One cannot think of all >> the consequences and needs while proposing a new one. >> >> The memory footprint is ~2M for average vmlinux. And people who need to >> access: >> * either need it frequently -- those do not need performance (LOCKDEP, >> KASAN, or other debug builds) >> * or are in the middle of WARNING, BUG, crash, panic or such and this is >> not that often... >> >> And we would need *both*. The limited proprietary one in some sort of >> .kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash, >> gdb and so on. > > I don't think so. DWARF CFI is optimized for size. My proposal is to > take the same data (or some subset of it) and reformat it to optimize > for simplicity. > > If, for some reason, we ended up needing *all* the original DWARF data, > we could still have it in the simpler format. In that case it might end > up being 8M instead of 2M :-) But I don't see that being possible. There is a compact EH for MIPS: https://github.com/itanium-cxx-abi/cxx-abi/blob/master/MIPSCompactEH.pdf It can be extended to other targets. -- H.J.
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote: > On 05/09/2017, 09:22 PM, Josh Poimboeuf wrote: > > On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote: > >> On Sun, 7 May 2017, Josh Poimboeuf wrote: > >> > >>> DWARF is great for debuggers. It helps you find all the registers on > >>> the stack, so you can see function arguments and local variables. All > >>> expressed in a nice compact format. > >>> > >>> But that's overkill for unwinders. We don't need all those registers, > >>> and the state machine is too complicated. > >> > >> OTOH if we make the failures in processing of those "auxiliary" > >> information non-fatal (in a sense that it neither causes kernel bug nor > >> does it actually corrupt the unwinding process, but the only effect is > >> losing "optional" information), having this data available doesn't hurt. > > > > But it does hurt, in the sense that the complicated format of DWARF CFI > > means the unwinder has to jump through a lot more hoops to read it. > > Why that matters, actually? Unwinder is nothing to be performance > oriented. And if somebody is doing a lot of unwinding during runtime, > they can switch to in-this-case-faster FP unwinder. More complexity == more bugs. > > And anyway, fixing the correctness of the DWARF data is only half the > > problem IMO. The other half of the problem is unwinder complexity. > > Complex, but generic and working. IMO, it would be rather though to come > up with some tool working on different compilers or even different > versions of gcc. I mean some tool to convert the DWARF data to something > proprietary. The conversion would be as complex as is the unwinder plus > conversion to the proprietary format and its dump into ELF. We would > still rely on a (now out-of-kernel-runtime-code) complex monolith to do > it right. Complexity outside of the kernel is infinitely better than complexity in mission critical oops code. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote: > On 05/09/2017, 09:22 PM, Josh Poimboeuf wrote: > > On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote: > >> On Sun, 7 May 2017, Josh Poimboeuf wrote: > >> > >>> DWARF is great for debuggers. It helps you find all the registers on > >>> the stack, so you can see function arguments and local variables. All > >>> expressed in a nice compact format. > >>> > >>> But that's overkill for unwinders. We don't need all those registers, > >>> and the state machine is too complicated. > >> > >> OTOH if we make the failures in processing of those "auxiliary" > >> information non-fatal (in a sense that it neither causes kernel bug nor > >> does it actually corrupt the unwinding process, but the only effect is > >> losing "optional" information), having this data available doesn't hurt. > > > > But it does hurt, in the sense that the complicated format of DWARF CFI > > means the unwinder has to jump through a lot more hoops to read it. > > Why that matters, actually? Unwinder is nothing to be performance > oriented. And if somebody is doing a lot of unwinding during runtime, > they can switch to in-this-case-faster FP unwinder. More complexity == more bugs. > > And anyway, fixing the correctness of the DWARF data is only half the > > problem IMO. The other half of the problem is unwinder complexity. > > Complex, but generic and working. IMO, it would be rather though to come > up with some tool working on different compilers or even different > versions of gcc. I mean some tool to convert the DWARF data to something > proprietary. The conversion would be as complex as is the unwinder plus > conversion to the proprietary format and its dump into ELF. We would > still rely on a (now out-of-kernel-runtime-code) complex monolith to do > it right. Complexity outside of the kernel is infinitely better than complexity in mission critical oops code. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 10:15:09AM +0200, Jiri Slaby wrote: > > I'm, ahem, highly skeptical to creating our own unwinding data > > format unless there is *documented, supported, and tested* way to > > force the compiler to *automatically* fall back to frame pointers > > any time there may be complexity involved, > > I second this. Inventing a new format like this mostly ends up with > using the standard one after several iterations. One cannot think of all > the consequences and needs while proposing a new one. > > The memory footprint is ~2M for average vmlinux. And people who need to > access: > * either need it frequently -- those do not need performance (LOCKDEP, > KASAN, or other debug builds) > * or are in the middle of WARNING, BUG, crash, panic or such and this is > not that often... > > And we would need *both*. The limited proprietary one in some sort of > .kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash, > gdb and so on. I don't think so. DWARF CFI is optimized for size. My proposal is to take the same data (or some subset of it) and reformat it to optimize for simplicity. If, for some reason, we ended up needing *all* the original DWARF data, we could still have it in the simpler format. In that case it might end up being 8M instead of 2M :-) But I don't see that being possible. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 10:15:09AM +0200, Jiri Slaby wrote: > > I'm, ahem, highly skeptical to creating our own unwinding data > > format unless there is *documented, supported, and tested* way to > > force the compiler to *automatically* fall back to frame pointers > > any time there may be complexity involved, > > I second this. Inventing a new format like this mostly ends up with > using the standard one after several iterations. One cannot think of all > the consequences and needs while proposing a new one. > > The memory footprint is ~2M for average vmlinux. And people who need to > access: > * either need it frequently -- those do not need performance (LOCKDEP, > KASAN, or other debug builds) > * or are in the middle of WARNING, BUG, crash, panic or such and this is > not that often... > > And we would need *both*. The limited proprietary one in some sort of > .kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash, > gdb and so on. I don't think so. DWARF CFI is optimized for size. My proposal is to take the same data (or some subset of it) and reformat it to optimize for simplicity. If, for some reason, we ended up needing *all* the original DWARF data, we could still have it in the simpler format. In that case it might end up being 8M instead of 2M :-) But I don't see that being possible. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On 05/10/2017, 02:42 PM, Josh Poimboeuf wrote: > On Wed, May 10, 2017 at 09:39:39AM +0200, Jiri Slaby wrote: >> OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as >> holds now for the rest of the current unwinding: >> KASAN_SANITIZE_dumpstack.o := n >> KASAN_SANITIZE_dumpstack_$(BITS).o := n >> KASAN_SANITIZE_stacktrace.o := n > > Most of the unwinder code is now in unwind_frame.c, which *does* have > KASAN enabled. > > I think the above is leftover from the days before I rewrote the > unwinder. I'll look at renabling KASAN for those files. Ok, fair enough. I will do some measurements in the FP field while I will be at it. thanks, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
On 05/10/2017, 02:42 PM, Josh Poimboeuf wrote: > On Wed, May 10, 2017 at 09:39:39AM +0200, Jiri Slaby wrote: >> OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as >> holds now for the rest of the current unwinding: >> KASAN_SANITIZE_dumpstack.o := n >> KASAN_SANITIZE_dumpstack_$(BITS).o := n >> KASAN_SANITIZE_stacktrace.o := n > > Most of the unwinder code is now in unwind_frame.c, which *does* have > KASAN enabled. > > I think the above is leftover from the days before I rewrote the > unwinder. I'll look at renabling KASAN for those files. Ok, fair enough. I will do some measurements in the FP field while I will be at it. thanks, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 09:39:39AM +0200, Jiri Slaby wrote: > OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as > holds now for the rest of the current unwinding: > KASAN_SANITIZE_dumpstack.o := n > KASAN_SANITIZE_dumpstack_$(BITS).o := n > KASAN_SANITIZE_stacktrace.o := n Most of the unwinder code is now in unwind_frame.c, which *does* have KASAN enabled. I think the above is leftover from the days before I rewrote the unwinder. I'll look at renabling KASAN for those files. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Wed, May 10, 2017 at 09:39:39AM +0200, Jiri Slaby wrote: > OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as > holds now for the rest of the current unwinding: > KASAN_SANITIZE_dumpstack.o := n > KASAN_SANITIZE_dumpstack_$(BITS).o := n > KASAN_SANITIZE_stacktrace.o := n Most of the unwinder code is now in unwind_frame.c, which *does* have KASAN enabled. I think the above is leftover from the days before I rewrote the unwinder. I'll look at renabling KASAN for those files. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On 05/09/2017, 09:22 PM, Josh Poimboeuf wrote: > On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote: >> On Sun, 7 May 2017, Josh Poimboeuf wrote: >> >>> DWARF is great for debuggers. It helps you find all the registers on >>> the stack, so you can see function arguments and local variables. All >>> expressed in a nice compact format. >>> >>> But that's overkill for unwinders. We don't need all those registers, >>> and the state machine is too complicated. >> >> OTOH if we make the failures in processing of those "auxiliary" >> information non-fatal (in a sense that it neither causes kernel bug nor >> does it actually corrupt the unwinding process, but the only effect is >> losing "optional" information), having this data available doesn't hurt. > > But it does hurt, in the sense that the complicated format of DWARF CFI > means the unwinder has to jump through a lot more hoops to read it. Why that matters, actually? Unwinder is nothing to be performance oriented. And if somebody is doing a lot of unwinding during runtime, they can switch to in-this-case-faster FP unwinder. > And if we wanted it to be reasonably reliable, we'd also need to fix up > the DWARF data somehow before converting it, presumably with objtool. We have to do this anyway. Be it the DWARF info or whatever we end up with. >>> Unwinders basically only need to know one thing: given an instruction >>> address and a stack pointer, where is the caller's stack frame? >> >> Again, DWARF should be able to give us all of this (including the >> FP-fallback etc). It feels a bit silly to purposedly ignore it and >> reinvent parts of it again, instead of fixing (read: "asking toolchain >> guys to fix") And we can just do, if a totally broken compiler emerges: #if defined(CONFIG_DWARF_UNWINDER) && GCC_VERSION == 59000 #error Sorry, choose a different compiler or disable DWARF unwinder #endif We haven't to do this during the past decade and I am sceptic if we would have to do it in the next one. >> the cases where we actually are not getting the proper data >> in DWARF. That's a win-win at the end of the day. > > Most of the kernel DWARF issues I've seen aren't caused by toolchain > bugs. They're caused by the kernel's quirks: asm, inline asm, special > sections. Right. > And anyway, fixing the correctness of the DWARF data is only half the > problem IMO. The other half of the problem is unwinder complexity. Complex, but generic and working. IMO, it would be rather though to come up with some tool working on different compilers or even different versions of gcc. I mean some tool to convert the DWARF data to something proprietary. The conversion would be as complex as is the unwinder plus conversion to the proprietary format and its dump into ELF. We would still rely on a (now out-of-kernel-runtime-code) complex monolith to do it right. thanks, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
On 05/09/2017, 09:22 PM, Josh Poimboeuf wrote: > On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote: >> On Sun, 7 May 2017, Josh Poimboeuf wrote: >> >>> DWARF is great for debuggers. It helps you find all the registers on >>> the stack, so you can see function arguments and local variables. All >>> expressed in a nice compact format. >>> >>> But that's overkill for unwinders. We don't need all those registers, >>> and the state machine is too complicated. >> >> OTOH if we make the failures in processing of those "auxiliary" >> information non-fatal (in a sense that it neither causes kernel bug nor >> does it actually corrupt the unwinding process, but the only effect is >> losing "optional" information), having this data available doesn't hurt. > > But it does hurt, in the sense that the complicated format of DWARF CFI > means the unwinder has to jump through a lot more hoops to read it. Why that matters, actually? Unwinder is nothing to be performance oriented. And if somebody is doing a lot of unwinding during runtime, they can switch to in-this-case-faster FP unwinder. > And if we wanted it to be reasonably reliable, we'd also need to fix up > the DWARF data somehow before converting it, presumably with objtool. We have to do this anyway. Be it the DWARF info or whatever we end up with. >>> Unwinders basically only need to know one thing: given an instruction >>> address and a stack pointer, where is the caller's stack frame? >> >> Again, DWARF should be able to give us all of this (including the >> FP-fallback etc). It feels a bit silly to purposedly ignore it and >> reinvent parts of it again, instead of fixing (read: "asking toolchain >> guys to fix") And we can just do, if a totally broken compiler emerges: #if defined(CONFIG_DWARF_UNWINDER) && GCC_VERSION == 59000 #error Sorry, choose a different compiler or disable DWARF unwinder #endif We haven't to do this during the past decade and I am sceptic if we would have to do it in the next one. >> the cases where we actually are not getting the proper data >> in DWARF. That's a win-win at the end of the day. > > Most of the kernel DWARF issues I've seen aren't caused by toolchain > bugs. They're caused by the kernel's quirks: asm, inline asm, special > sections. Right. > And anyway, fixing the correctness of the DWARF data is only half the > problem IMO. The other half of the problem is unwinder complexity. Complex, but generic and working. IMO, it would be rather though to come up with some tool working on different compilers or even different versions of gcc. I mean some tool to convert the DWARF data to something proprietary. The conversion would be as complex as is the unwinder plus conversion to the proprietary format and its dump into ELF. We would still rely on a (now out-of-kernel-runtime-code) complex monolith to do it right. thanks, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
On 05/09/2017, 12:00 PM, h...@zytor.com wrote: > As far as I understand, the .eh_frame section is supposed to contain the > subset of the DWARF bytecode needed to do a stack unwind when an exception is > thrown, whereas the .debug* sections contain the full DWARF data a debugger > might want. Thus .eh_frame is mapped into the runtime process while .debug* > [usually?] is not. .debug* can easily be 10x larger than the executable text > segments. As it currently stands, the (same) data is generated either to .eh_frame, or to .debug_frame. Depending if DWARF_UNWINDER is turned on, or off, respectively. vdso has the same data in both, always. > Assembly language routines become problematic no matter what you do unless > you restrict the way the assembly can be written. Experience has shown us > that hand-maintaining annotations in assembly code is doomed to failure, and > in many cases it isn't even clear to even experienced programmers how to do > it. Sure, manual annotations of assembly will be avoided as much as possible. We have to rely objtool to generate them in most cases. > [H.J.: is the .cfi_* operations set even documented anywhere in a way that > non-compiler-writers can comprehend it?] Until now, I always looked into as manual: $ pinfo --node='CFI directives' as > I'm, ahem, highly skeptical to creating our own unwinding data format unless > there is *documented, supported, and tested* way to force the compiler to > *automatically* fall back to frame pointers any time there may be complexity > involved, I second this. Inventing a new format like this mostly ends up with using the standard one after several iterations. One cannot think of all the consequences and needs while proposing a new one. The memory footprint is ~2M for average vmlinux. And people who need to access: * either need it frequently -- those do not need performance (LOCKDEP, KASAN, or other debug builds) * or are in the middle of WARNING, BUG, crash, panic or such and this is not that often... And we would need *both*. The limited proprietary one in some sort of .kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash, gdb and so on. And yes, the DWARF unwinder falls back to FP if they are available (see function dw_fp_fallback). thanks, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
On 05/09/2017, 12:00 PM, h...@zytor.com wrote: > As far as I understand, the .eh_frame section is supposed to contain the > subset of the DWARF bytecode needed to do a stack unwind when an exception is > thrown, whereas the .debug* sections contain the full DWARF data a debugger > might want. Thus .eh_frame is mapped into the runtime process while .debug* > [usually?] is not. .debug* can easily be 10x larger than the executable text > segments. As it currently stands, the (same) data is generated either to .eh_frame, or to .debug_frame. Depending if DWARF_UNWINDER is turned on, or off, respectively. vdso has the same data in both, always. > Assembly language routines become problematic no matter what you do unless > you restrict the way the assembly can be written. Experience has shown us > that hand-maintaining annotations in assembly code is doomed to failure, and > in many cases it isn't even clear to even experienced programmers how to do > it. Sure, manual annotations of assembly will be avoided as much as possible. We have to rely objtool to generate them in most cases. > [H.J.: is the .cfi_* operations set even documented anywhere in a way that > non-compiler-writers can comprehend it?] Until now, I always looked into as manual: $ pinfo --node='CFI directives' as > I'm, ahem, highly skeptical to creating our own unwinding data format unless > there is *documented, supported, and tested* way to force the compiler to > *automatically* fall back to frame pointers any time there may be complexity > involved, I second this. Inventing a new format like this mostly ends up with using the standard one after several iterations. One cannot think of all the consequences and needs while proposing a new one. The memory footprint is ~2M for average vmlinux. And people who need to access: * either need it frequently -- those do not need performance (LOCKDEP, KASAN, or other debug builds) * or are in the middle of WARNING, BUG, crash, panic or such and this is not that often... And we would need *both*. The limited proprietary one in some sort of .kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash, gdb and so on. And yes, the DWARF unwinder falls back to FP if they are available (see function dw_fp_fallback). thanks, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
On 05/06/2017, 09:19 AM, Ingo Molnar wrote: > > * Linus Torvaldswrote: > >> On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby wrote: >>> The DWARF unwinder is in place and ready. So introduce the config option >>> to allow users to enable it. It is by default off due to missing >>> assembly annotations. >> >> Who actually ends up using this? > > Also, why wasn't Josh Poimboeuf Cc:-ed, who de-facto maintains the x86 > unwinding > code? I explicitly CCed Josh on 5/7 and 6/7 which touches the code. Besides that, I assumed he is implicitly CCed via live-patch...@vger.kernel.org which is carbon-copied on each of the patches. > AFAICS this series is just repeating the old mistakes of the old Dwarf > unwinders > of trusting GCC's unwinder data. So NAK for the time being on the whole > approach: > > NAcked-by: Ingo Molnar OK, as the series stands now, we indeed do. Noteworthy, we, in SUSE, had no problems with this reliance for all the time we have been using the unwinder. Anyway, objtool is about to vaidate the DWARF data, generate it for assembly and potentially fix it if problems occur. Could you elaborate on what else would help you to change your stance? thanks, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
On 05/06/2017, 09:19 AM, Ingo Molnar wrote: > > * Linus Torvalds wrote: > >> On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby wrote: >>> The DWARF unwinder is in place and ready. So introduce the config option >>> to allow users to enable it. It is by default off due to missing >>> assembly annotations. >> >> Who actually ends up using this? > > Also, why wasn't Josh Poimboeuf Cc:-ed, who de-facto maintains the x86 > unwinding > code? I explicitly CCed Josh on 5/7 and 6/7 which touches the code. Besides that, I assumed he is implicitly CCed via live-patch...@vger.kernel.org which is carbon-copied on each of the patches. > AFAICS this series is just repeating the old mistakes of the old Dwarf > unwinders > of trusting GCC's unwinder data. So NAK for the time being on the whole > approach: > > NAcked-by: Ingo Molnar OK, as the series stands now, we indeed do. Noteworthy, we, in SUSE, had no problems with this reliance for all the time we have been using the unwinder. Anyway, objtool is about to vaidate the DWARF data, generate it for assembly and potentially fix it if problems occur. Could you elaborate on what else would help you to change your stance? thanks, -- js suse labs
Re: [PATCH 7/7] DWARF: add the config option
On 05/05/2017, 09:57 PM, Linus Torvalds wrote: > On Fri, May 5, 2017 at 5:22 AM, Jiri Slabywrote: >> The DWARF unwinder is in place and ready. So introduce the config option >> to allow users to enable it. It is by default off due to missing >> assembly annotations. > > Who actually ends up using this? Every SUSE user has been using this for almost a decade and we are not about to switch to FP for performance reasons as noted by Jiri Kosina. So SUSE users are going to be exposed to DWARF unwinder for another decade or so at least. Therefore, this is another attempt to make the unwinder (in some form) upstream. Since this was first proposed many years ago, we have been forced to forward-port it over and over and everyone knows what pain it is. So it is nice, that this opened the discussion at least. > Because from the last time we had fancy unwindoers, and all the > problems it caused for oops handling with absolutely _zero_ upsides > ever, I do not ever again want to see fancy unwinders with complex > state machine handling used by the oopsing code. Well, reliable stack-traces with minimal performance impact thanks to out-of-band data is hell good reason in my opinion. > The fact that it gets disabled for KASAN also makes me suspicious. It > basically means that now all the accesses it does are not even > validated. OK, I inclined to disable KASAN when I started cleaning this up for _performance_ reasons. The system was so slow, that the RCU stall or soft-lockup detectors came up complaining. From that time, I measured the bottlenecks and optimized the unwinder so that 1000 iterations of unwinding takes: Before: real0m1.808s user0m0.001s sys 0m1.807s After: real0m0.018s user0m0.001s sys 0m0.017s So let me check, whether KASAN still has to be disabled globally. I do not think so. OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as holds now for the rest of the current unwinding: KASAN_SANITIZE_dumpstack.o := n KASAN_SANITIZE_dumpstack_$(BITS).o := n KASAN_SANITIZE_stacktrace.o := n Still, I can let KASAN := y for dwarf.c for testing purposes locally and smoke-test the unwinder. > The fact that the most of the code seems to be disabled for the first > six patches, and then just enabled in the last patch, also seems to > mean that the series also gets no bisection coverage or testing that > the individual patches make any sense. (ie there's a lot of code > inside "CONFIG_DWARF_UNWIND" in the early patches but that config > option cannot even be enabled until the last patch). Correct. This was one big patch previously. I separated that patch into several smaller commits touching different places of the kernel for easier review. It does not make sense to test any of the patches separately except the first. Hence the config option which enables the rest of the series is the last one. I deemed this as one of possible approaches to split patches (I have seen this many times in the past.) I can of course squash this back into a single patch (or two). > We used to have nasty issues with not just missing dwarf info, but > also actively *wrong* dwarf info. Compiler versions that generated > subtly wrong info, because nobody actually really depended on it, and > the people who had tested it seldom did the kinds of things we do in > the kernel (eg inline asms etc). I must admit I am not aware of any issues in this manner during the years. Again, this unwinder is the default in SUSE kernels since ever, so we have been using gcc from at least 3.2 to 7. But see below. > So I'm personally still very suspicious of these things. > > Last time I had huge issues with people also always blaming *anything* > else than that unwinder. It was always "oh, somebody wrote asm without > getting it right". Or "oh, the compiler generated bad tables, it's not > *my* fault that now the kernel oopsing code no longer works". Now we have objtool. My objtool clone: 1) verifies the DWARF data (prepared by Josh) 2) generates DWARF data for assembly -- incomplete yet: see the thread about x86 assembly cleanup which is a pre-requisite for this to work. This is BTW the reason why the DWARF unwinder is default-off in this series yet. And we can add: 3) fix up the data, if they are wrong That said, objtool could handle the data so they are correct and as-expected for the unwinder. Without objtool, the data (and unwinder) is hopeless (only vdso from all the assembly is annotated.) > When I asked for more stricter debug table validation to avoid issues, > it was always "oh, we fixed it, no worries", and then two months later > somebody hit another issue. Reasonable, indeed. I am all for strict checking. objtool is to do that. > Put another way; the last time we did crazy stuff like this, it got > reverted. For a damn good reason, despite some people being in denial > about those reasons. Speaking for myself, having
Re: [PATCH 7/7] DWARF: add the config option
On 05/05/2017, 09:57 PM, Linus Torvalds wrote: > On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby wrote: >> The DWARF unwinder is in place and ready. So introduce the config option >> to allow users to enable it. It is by default off due to missing >> assembly annotations. > > Who actually ends up using this? Every SUSE user has been using this for almost a decade and we are not about to switch to FP for performance reasons as noted by Jiri Kosina. So SUSE users are going to be exposed to DWARF unwinder for another decade or so at least. Therefore, this is another attempt to make the unwinder (in some form) upstream. Since this was first proposed many years ago, we have been forced to forward-port it over and over and everyone knows what pain it is. So it is nice, that this opened the discussion at least. > Because from the last time we had fancy unwindoers, and all the > problems it caused for oops handling with absolutely _zero_ upsides > ever, I do not ever again want to see fancy unwinders with complex > state machine handling used by the oopsing code. Well, reliable stack-traces with minimal performance impact thanks to out-of-band data is hell good reason in my opinion. > The fact that it gets disabled for KASAN also makes me suspicious. It > basically means that now all the accesses it does are not even > validated. OK, I inclined to disable KASAN when I started cleaning this up for _performance_ reasons. The system was so slow, that the RCU stall or soft-lockup detectors came up complaining. From that time, I measured the bottlenecks and optimized the unwinder so that 1000 iterations of unwinding takes: Before: real0m1.808s user0m0.001s sys 0m1.807s After: real0m0.018s user0m0.001s sys 0m0.017s So let me check, whether KASAN still has to be disabled globally. I do not think so. OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as holds now for the rest of the current unwinding: KASAN_SANITIZE_dumpstack.o := n KASAN_SANITIZE_dumpstack_$(BITS).o := n KASAN_SANITIZE_stacktrace.o := n Still, I can let KASAN := y for dwarf.c for testing purposes locally and smoke-test the unwinder. > The fact that the most of the code seems to be disabled for the first > six patches, and then just enabled in the last patch, also seems to > mean that the series also gets no bisection coverage or testing that > the individual patches make any sense. (ie there's a lot of code > inside "CONFIG_DWARF_UNWIND" in the early patches but that config > option cannot even be enabled until the last patch). Correct. This was one big patch previously. I separated that patch into several smaller commits touching different places of the kernel for easier review. It does not make sense to test any of the patches separately except the first. Hence the config option which enables the rest of the series is the last one. I deemed this as one of possible approaches to split patches (I have seen this many times in the past.) I can of course squash this back into a single patch (or two). > We used to have nasty issues with not just missing dwarf info, but > also actively *wrong* dwarf info. Compiler versions that generated > subtly wrong info, because nobody actually really depended on it, and > the people who had tested it seldom did the kinds of things we do in > the kernel (eg inline asms etc). I must admit I am not aware of any issues in this manner during the years. Again, this unwinder is the default in SUSE kernels since ever, so we have been using gcc from at least 3.2 to 7. But see below. > So I'm personally still very suspicious of these things. > > Last time I had huge issues with people also always blaming *anything* > else than that unwinder. It was always "oh, somebody wrote asm without > getting it right". Or "oh, the compiler generated bad tables, it's not > *my* fault that now the kernel oopsing code no longer works". Now we have objtool. My objtool clone: 1) verifies the DWARF data (prepared by Josh) 2) generates DWARF data for assembly -- incomplete yet: see the thread about x86 assembly cleanup which is a pre-requisite for this to work. This is BTW the reason why the DWARF unwinder is default-off in this series yet. And we can add: 3) fix up the data, if they are wrong That said, objtool could handle the data so they are correct and as-expected for the unwinder. Without objtool, the data (and unwinder) is hopeless (only vdso from all the assembly is annotated.) > When I asked for more stricter debug table validation to avoid issues, > it was always "oh, we fixed it, no worries", and then two months later > somebody hit another issue. Reasonable, indeed. I am all for strict checking. objtool is to do that. > Put another way; the last time we did crazy stuff like this, it got > reverted. For a damn good reason, despite some people being in denial > about those reasons. Speaking for myself, having it out-of-tree
Re: [PATCH 7/7] DWARF: add the config option
On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote: > On Sun, 7 May 2017, Josh Poimboeuf wrote: > > > DWARF is great for debuggers. It helps you find all the registers on > > the stack, so you can see function arguments and local variables. All > > expressed in a nice compact format. > > > > But that's overkill for unwinders. We don't need all those registers, > > and the state machine is too complicated. > > OTOH if we make the failures in processing of those "auxiliary" > information non-fatal (in a sense that it neither causes kernel bug nor > does it actually corrupt the unwinding process, but the only effect is > losing "optional" information), having this data available doesn't hurt. But it does hurt, in the sense that the complicated format of DWARF CFI means the unwinder has to jump through a lot more hoops to read it. > It's there anyway for builds containing debuginfo, and the information is > all there so that it can be used by things like gdb or crash, so it seems > natural to re-use as much as possible of it. There's a valid argument to be made that we should start with the DWARF data instead of creating the new data from scratch. That might be fine. Right now I don't have a strong feeling about it either way. But if we do that, we should still convert the DWARF data to a simple streamlined format for the in-kernel unwinder, so it can easily be read by the kernel without having to fire up a DWARF state machine in the middle of an oops. And if we wanted it to be reasonably reliable, we'd also need to fix up the DWARF data somehow before converting it, presumably with objtool. > > Unwinders basically only need to know one thing: given an instruction > > address and a stack pointer, where is the caller's stack frame? > > Again, DWARF should be able to give us all of this (including the > FP-fallback etc). It feels a bit silly to purposedly ignore it and > reinvent parts of it again, instead of fixing (read: "asking toolchain > guys to fix") the cases where we actually are not getting the proper data > in DWARF. That's a win-win at the end of the day. Most of the kernel DWARF issues I've seen aren't caused by toolchain bugs. They're caused by the kernel's quirks: asm, inline asm, special sections. And anyway, fixing the correctness of the DWARF data is only half the problem IMO. The other half of the problem is unwinder complexity. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote: > On Sun, 7 May 2017, Josh Poimboeuf wrote: > > > DWARF is great for debuggers. It helps you find all the registers on > > the stack, so you can see function arguments and local variables. All > > expressed in a nice compact format. > > > > But that's overkill for unwinders. We don't need all those registers, > > and the state machine is too complicated. > > OTOH if we make the failures in processing of those "auxiliary" > information non-fatal (in a sense that it neither causes kernel bug nor > does it actually corrupt the unwinding process, but the only effect is > losing "optional" information), having this data available doesn't hurt. But it does hurt, in the sense that the complicated format of DWARF CFI means the unwinder has to jump through a lot more hoops to read it. > It's there anyway for builds containing debuginfo, and the information is > all there so that it can be used by things like gdb or crash, so it seems > natural to re-use as much as possible of it. There's a valid argument to be made that we should start with the DWARF data instead of creating the new data from scratch. That might be fine. Right now I don't have a strong feeling about it either way. But if we do that, we should still convert the DWARF data to a simple streamlined format for the in-kernel unwinder, so it can easily be read by the kernel without having to fire up a DWARF state machine in the middle of an oops. And if we wanted it to be reasonably reliable, we'd also need to fix up the DWARF data somehow before converting it, presumably with objtool. > > Unwinders basically only need to know one thing: given an instruction > > address and a stack pointer, where is the caller's stack frame? > > Again, DWARF should be able to give us all of this (including the > FP-fallback etc). It feels a bit silly to purposedly ignore it and > reinvent parts of it again, instead of fixing (read: "asking toolchain > guys to fix") the cases where we actually are not getting the proper data > in DWARF. That's a win-win at the end of the day. Most of the kernel DWARF issues I've seen aren't caused by toolchain bugs. They're caused by the kernel's quirks: asm, inline asm, special sections. And anyway, fixing the correctness of the DWARF data is only half the problem IMO. The other half of the problem is unwinder complexity. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Sun, 7 May 2017, Josh Poimboeuf wrote: > DWARF is great for debuggers. It helps you find all the registers on > the stack, so you can see function arguments and local variables. All > expressed in a nice compact format. > > But that's overkill for unwinders. We don't need all those registers, > and the state machine is too complicated. OTOH if we make the failures in processing of those "auxiliary" information non-fatal (in a sense that it neither causes kernel bug nor does it actually corrupt the unwinding process, but the only effect is losing "optional" information), having this data available doesn't hurt. It's there anyway for builds containing debuginfo, and the information is all there so that it can be used by things like gdb or crash, so it seems natural to re-use as much as possible of it. > Unwinders basically only need to know one thing: given an instruction > address and a stack pointer, where is the caller's stack frame? Again, DWARF should be able to give us all of this (including the FP-fallback etc). It feels a bit silly to purposedly ignore it and reinvent parts of it again, instead of fixing (read: "asking toolchain guys to fix") the cases where we actually are not getting the proper data in DWARF. That's a win-win at the end of the day. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH 7/7] DWARF: add the config option
On Sun, 7 May 2017, Josh Poimboeuf wrote: > DWARF is great for debuggers. It helps you find all the registers on > the stack, so you can see function arguments and local variables. All > expressed in a nice compact format. > > But that's overkill for unwinders. We don't need all those registers, > and the state machine is too complicated. OTOH if we make the failures in processing of those "auxiliary" information non-fatal (in a sense that it neither causes kernel bug nor does it actually corrupt the unwinding process, but the only effect is losing "optional" information), having this data available doesn't hurt. It's there anyway for builds containing debuginfo, and the information is all there so that it can be used by things like gdb or crash, so it seems natural to re-use as much as possible of it. > Unwinders basically only need to know one thing: given an instruction > address and a stack pointer, where is the caller's stack frame? Again, DWARF should be able to give us all of this (including the FP-fallback etc). It feels a bit silly to purposedly ignore it and reinvent parts of it again, instead of fixing (read: "asking toolchain guys to fix") the cases where we actually are not getting the proper data in DWARF. That's a win-win at the end of the day. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH 7/7] DWARF: add the config option
On Tue, May 9, 2017 at 7:58 AM, Josh Poimboeufwrote: > On Tue, May 09, 2017 at 03:00:45AM -0700, h...@zytor.com wrote: >> I'm, ahem, highly skeptical to creating our own unwinding data format >> unless there is *documented, supported, and tested* way to force the >> compiler to *automatically* fall back to frame pointers any time there >> may be complexity involved, which at a very minimum includes any kind >> of data-dependent manipulation of the stack pointer. > > That would be nice. But isn't falling back to a frame pointer (or > another callee-saved reg or a stack location) already needed in such > cases? Otherwise how could DWARF unwinding work? > >> Otherwise you will have to fail the kernel build when your static tool >> runs into instruction sequences it can't deal with, but the compiler >> may generate them - boom. > > Failing the build is harsh, we could just warn about it and skip the > data for the affected function(s). > > BTW, there is another option. Instead of generating the data from > scratch, we could just convert gcc's DWARF CFI to the format we need. > > However that wouldn't solve the problems we have with the holes and > inaccuracies in DWARF from our hand-annotated asm, inline asm, and > special sections (extable, alternatives, etc). We'd still have to rely > on objtool for that, so we'd still be in the same boat of needing > objtool to be able to follow gcc code paths. CFI directives are documented in GNU assembler manual. They store unwind info in .eh_frame section. They work well with assembly codes in glibc. But I don't know how well they work with kernel unwind. > So yes, it sucks that objtool needs to work for unwinding to work. But > if we want decent DWARF-esque unwinding, I don't see any way around > that due to the low-level nature of the kernel. > >> Worse, your tool will not even recognize the problem and you're in a >> worse place than when you started. > > We could have a runtime NMI-based stack checker which ensures it can > always unwind to the bottom of the stack. Over time this would > hopefully provide full validation of the unwinder data and > functionality. > > -- > Josh -- H.J.
Re: [PATCH 7/7] DWARF: add the config option
On Tue, May 9, 2017 at 7:58 AM, Josh Poimboeuf wrote: > On Tue, May 09, 2017 at 03:00:45AM -0700, h...@zytor.com wrote: >> I'm, ahem, highly skeptical to creating our own unwinding data format >> unless there is *documented, supported, and tested* way to force the >> compiler to *automatically* fall back to frame pointers any time there >> may be complexity involved, which at a very minimum includes any kind >> of data-dependent manipulation of the stack pointer. > > That would be nice. But isn't falling back to a frame pointer (or > another callee-saved reg or a stack location) already needed in such > cases? Otherwise how could DWARF unwinding work? > >> Otherwise you will have to fail the kernel build when your static tool >> runs into instruction sequences it can't deal with, but the compiler >> may generate them - boom. > > Failing the build is harsh, we could just warn about it and skip the > data for the affected function(s). > > BTW, there is another option. Instead of generating the data from > scratch, we could just convert gcc's DWARF CFI to the format we need. > > However that wouldn't solve the problems we have with the holes and > inaccuracies in DWARF from our hand-annotated asm, inline asm, and > special sections (extable, alternatives, etc). We'd still have to rely > on objtool for that, so we'd still be in the same boat of needing > objtool to be able to follow gcc code paths. CFI directives are documented in GNU assembler manual. They store unwind info in .eh_frame section. They work well with assembly codes in glibc. But I don't know how well they work with kernel unwind. > So yes, it sucks that objtool needs to work for unwinding to work. But > if we want decent DWARF-esque unwinding, I don't see any way around > that due to the low-level nature of the kernel. > >> Worse, your tool will not even recognize the problem and you're in a >> worse place than when you started. > > We could have a runtime NMI-based stack checker which ensures it can > always unwind to the bottom of the stack. Over time this would > hopefully provide full validation of the unwinder data and > functionality. > > -- > Josh -- H.J.
Re: [PATCH 7/7] DWARF: add the config option
On Tue, May 09, 2017 at 03:00:45AM -0700, h...@zytor.com wrote: > I'm, ahem, highly skeptical to creating our own unwinding data format > unless there is *documented, supported, and tested* way to force the > compiler to *automatically* fall back to frame pointers any time there > may be complexity involved, which at a very minimum includes any kind > of data-dependent manipulation of the stack pointer. That would be nice. But isn't falling back to a frame pointer (or another callee-saved reg or a stack location) already needed in such cases? Otherwise how could DWARF unwinding work? > Otherwise you will have to fail the kernel build when your static tool > runs into instruction sequences it can't deal with, but the compiler > may generate them - boom. Failing the build is harsh, we could just warn about it and skip the data for the affected function(s). BTW, there is another option. Instead of generating the data from scratch, we could just convert gcc's DWARF CFI to the format we need. However that wouldn't solve the problems we have with the holes and inaccuracies in DWARF from our hand-annotated asm, inline asm, and special sections (extable, alternatives, etc). We'd still have to rely on objtool for that, so we'd still be in the same boat of needing objtool to be able to follow gcc code paths. So yes, it sucks that objtool needs to work for unwinding to work. But if we want decent DWARF-esque unwinding, I don't see any way around that due to the low-level nature of the kernel. > Worse, your tool will not even recognize the problem and you're in a > worse place than when you started. We could have a runtime NMI-based stack checker which ensures it can always unwind to the bottom of the stack. Over time this would hopefully provide full validation of the unwinder data and functionality. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Tue, May 09, 2017 at 03:00:45AM -0700, h...@zytor.com wrote: > I'm, ahem, highly skeptical to creating our own unwinding data format > unless there is *documented, supported, and tested* way to force the > compiler to *automatically* fall back to frame pointers any time there > may be complexity involved, which at a very minimum includes any kind > of data-dependent manipulation of the stack pointer. That would be nice. But isn't falling back to a frame pointer (or another callee-saved reg or a stack location) already needed in such cases? Otherwise how could DWARF unwinding work? > Otherwise you will have to fail the kernel build when your static tool > runs into instruction sequences it can't deal with, but the compiler > may generate them - boom. Failing the build is harsh, we could just warn about it and skip the data for the affected function(s). BTW, there is another option. Instead of generating the data from scratch, we could just convert gcc's DWARF CFI to the format we need. However that wouldn't solve the problems we have with the holes and inaccuracies in DWARF from our hand-annotated asm, inline asm, and special sections (extable, alternatives, etc). We'd still have to rely on objtool for that, so we'd still be in the same boat of needing objtool to be able to follow gcc code paths. So yes, it sucks that objtool needs to work for unwinding to work. But if we want decent DWARF-esque unwinding, I don't see any way around that due to the low-level nature of the kernel. > Worse, your tool will not even recognize the problem and you're in a > worse place than when you started. We could have a runtime NMI-based stack checker which ensures it can always unwind to the bottom of the stack. Over time this would hopefully provide full validation of the unwinder data and functionality. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On May 8, 2017 8:38:31 PM PDT, Josh Poimboeufwrote: >On Mon, May 08, 2017 at 07:31:50PM -0700, Andy Lutomirski wrote: >> On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeuf >wrote: >> >> Also, don't you need some indication of which reg is the base from >> >> which you find previous frame? After all, sometimes GCC will emit >a >> >> frame pointer even in an otherwise frame-pointer-omitting kernel. >> > >> > I don't think we *need* to do that. I believe the base reg can >just >> > always[*] be the stack pointer, even with frame pointers. >> >> What if there are functions that use alloca or variable length arrays >> on the stack? Functions using AHASH_REQUEST_ON_STACK come to mind. > >Wow, mind blown. This is why I added you to CC! > >Ok, I guess we'll need to be able to use the frame pointer as a base >reg. It should be easy anyway. [Adding H.J. Lu for obvious expertise - H.J., the issue at hand is doing stack unwinding to display the call stack on a kernel failure. Right now the standard kernel configuration is using frame pointers even on x86-64 because we had very bad experiences with fragility of .eh_frame-based unwinding. However, on some kernel workloads the cost of frame pointers I is quite high.] Yes is exactly why. I believe gcc will still use a frame pointer when the relationship between sp and the incoming stack frame is indeterminate for some reason, but I have to admit that a) I'm not 100% sure and b) there are DWARF annotations for exactly this, so even if it is true for current gcc that a frame pointer is not needed it might not be in the future. As far as I understand, the .eh_frame section is supposed to contain the subset of the DWARF bytecode needed to do a stack unwind when an exception is thrown, whereas the .debug* sections contain the full DWARF data a debugger might want. Thus .eh_frame is mapped into the runtime process while .debug* [usually?] is not. .debug* can easily be 10x larger than the executable text segments. Since C doesn't *have* exceptions, the main user of this for C code is the case of calls C++ > C > C++ or equivalent; thus the vulnerability to toolchain filters failures – the case of an exception-caused unwind in the middle of a C function might not have been extensively tested. [H.J.: is that correct? Or could an asynchronous event like a signal cause an unwind of the .eh_frame from an arbitrary point? If so, how is that tested?] In the case of the kernel, size matters in a different way, because even though it will be cache cold this data takes up unreclaimable RAM. However, frame pointer-related code eats up not just RAM but cache. Assembly language routines become problematic no matter what you do unless you restrict the way the assembly can be written. Experience has shown us that hand-maintaining annotations in assembly code is doomed to failure, and in many cases it isn't even clear to even experienced programmers how to do it. [H.J.: is the .cfi_* operations set even documented anywhere in a way that non-compiler-writers can comprehend it?] Inline assembly becomes problematic in the case of non-frame-pointer builds if the assembly changes the stack pointer. A static tool might be able to annotate simple cases like push/pop. I'm, ahem, highly skeptical to creating our own unwinding data format unless there is *documented, supported, and tested* way to force the compiler to *automatically* fall back to frame pointers any time there may be complexity involved, which at a very minimum includes any kind of data-dependent manipulation of the stack pointer. Otherwise you will have to fail the kernel build when your static tool runs into instruction sequences it can't deal with, but the compiler may generate them - boom. Worse, your tool will not even recognize the problem and you're in a worse place than when you started. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH 7/7] DWARF: add the config option
On May 8, 2017 8:38:31 PM PDT, Josh Poimboeuf wrote: >On Mon, May 08, 2017 at 07:31:50PM -0700, Andy Lutomirski wrote: >> On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeuf >wrote: >> >> Also, don't you need some indication of which reg is the base from >> >> which you find previous frame? After all, sometimes GCC will emit >a >> >> frame pointer even in an otherwise frame-pointer-omitting kernel. >> > >> > I don't think we *need* to do that. I believe the base reg can >just >> > always[*] be the stack pointer, even with frame pointers. >> >> What if there are functions that use alloca or variable length arrays >> on the stack? Functions using AHASH_REQUEST_ON_STACK come to mind. > >Wow, mind blown. This is why I added you to CC! > >Ok, I guess we'll need to be able to use the frame pointer as a base >reg. It should be easy anyway. [Adding H.J. Lu for obvious expertise - H.J., the issue at hand is doing stack unwinding to display the call stack on a kernel failure. Right now the standard kernel configuration is using frame pointers even on x86-64 because we had very bad experiences with fragility of .eh_frame-based unwinding. However, on some kernel workloads the cost of frame pointers I is quite high.] Yes is exactly why. I believe gcc will still use a frame pointer when the relationship between sp and the incoming stack frame is indeterminate for some reason, but I have to admit that a) I'm not 100% sure and b) there are DWARF annotations for exactly this, so even if it is true for current gcc that a frame pointer is not needed it might not be in the future. As far as I understand, the .eh_frame section is supposed to contain the subset of the DWARF bytecode needed to do a stack unwind when an exception is thrown, whereas the .debug* sections contain the full DWARF data a debugger might want. Thus .eh_frame is mapped into the runtime process while .debug* [usually?] is not. .debug* can easily be 10x larger than the executable text segments. Since C doesn't *have* exceptions, the main user of this for C code is the case of calls C++ > C > C++ or equivalent; thus the vulnerability to toolchain filters failures – the case of an exception-caused unwind in the middle of a C function might not have been extensively tested. [H.J.: is that correct? Or could an asynchronous event like a signal cause an unwind of the .eh_frame from an arbitrary point? If so, how is that tested?] In the case of the kernel, size matters in a different way, because even though it will be cache cold this data takes up unreclaimable RAM. However, frame pointer-related code eats up not just RAM but cache. Assembly language routines become problematic no matter what you do unless you restrict the way the assembly can be written. Experience has shown us that hand-maintaining annotations in assembly code is doomed to failure, and in many cases it isn't even clear to even experienced programmers how to do it. [H.J.: is the .cfi_* operations set even documented anywhere in a way that non-compiler-writers can comprehend it?] Inline assembly becomes problematic in the case of non-frame-pointer builds if the assembly changes the stack pointer. A static tool might be able to annotate simple cases like push/pop. I'm, ahem, highly skeptical to creating our own unwinding data format unless there is *documented, supported, and tested* way to force the compiler to *automatically* fall back to frame pointers any time there may be complexity involved, which at a very minimum includes any kind of data-dependent manipulation of the stack pointer. Otherwise you will have to fail the kernel build when your static tool runs into instruction sequences it can't deal with, but the compiler may generate them - boom. Worse, your tool will not even recognize the problem and you're in a worse place than when you started. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH 7/7] DWARF: add the config option
On Mon, May 08, 2017 at 07:31:50PM -0700, Andy Lutomirski wrote: > On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeufwrote: > >> Also, don't you need some indication of which reg is the base from > >> which you find previous frame? After all, sometimes GCC will emit a > >> frame pointer even in an otherwise frame-pointer-omitting kernel. > > > > I don't think we *need* to do that. I believe the base reg can just > > always[*] be the stack pointer, even with frame pointers. > > What if there are functions that use alloca or variable length arrays > on the stack? Functions using AHASH_REQUEST_ON_STACK come to mind. Wow, mind blown. This is why I added you to CC! Ok, I guess we'll need to be able to use the frame pointer as a base reg. It should be easy anyway. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Mon, May 08, 2017 at 07:31:50PM -0700, Andy Lutomirski wrote: > On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeuf wrote: > >> Also, don't you need some indication of which reg is the base from > >> which you find previous frame? After all, sometimes GCC will emit a > >> frame pointer even in an otherwise frame-pointer-omitting kernel. > > > > I don't think we *need* to do that. I believe the base reg can just > > always[*] be the stack pointer, even with frame pointers. > > What if there are functions that use alloca or variable length arrays > on the stack? Functions using AHASH_REQUEST_ON_STACK come to mind. Wow, mind blown. This is why I added you to CC! Ok, I guess we'll need to be able to use the frame pointer as a base reg. It should be easy anyway. -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeufwrote: > On Mon, May 08, 2017 at 05:21:24PM -0700, Andy Lutomirski wrote: >> On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeuf wrote: >> > struct undwarf { >> > unsigned int ip;/* instruction pointer (relative >> > offset from base) */ >> > unsigned prev_frame:13; /* offset to previous frame from >> > current stack pointer */ >> > unsigned regs:1;/* whether prev_frame contains >> > entry regs (regs->ip) */ >> > unsigned align:2; /* some details for dealing with >> > gcc stack realignment */ >> > } __packed; >> > >> > extern struct undwarf undwarves[]; >> >> Some comments in case you're actually planning to do this: >> >> 'unsigned int ip' is the majority of the size of this thing. It might >> be worth trying to store a lot fewer bits. You could split the >> structure into: >> >> struct undwarf_header { >> unsigned long start_ip; >> unsigned align:2; /* i'm assuming this rarely changes */ >> ...; >> unsigned int offset_to_details; >> }; >> >> and >> >> struct undwarf_details { >> unsigned short ip_offset; >> unsigned short prev_frame; >> }; >> >> and you'd find the details by first looking up the last header before >> the ip and then finding the details starting at (uintptr_t)header + >> header->offset_to_details. > > Good idea. According to some back-of-a-napkin math, a scheme like this > could reduce the data size from 1.8M down to 1.2M with my kernel config, > a not-too-shabby 600k savings. > >> Also, don't you need some indication of which reg is the base from >> which you find previous frame? After all, sometimes GCC will emit a >> frame pointer even in an otherwise frame-pointer-omitting kernel. > > I don't think we *need* to do that. I believe the base reg can just > always[*] be the stack pointer, even with frame pointers. What if there are functions that use alloca or variable length arrays on the stack? Functions using AHASH_REQUEST_ON_STACK come to mind. --Andy
Re: [PATCH 7/7] DWARF: add the config option
On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeuf wrote: > On Mon, May 08, 2017 at 05:21:24PM -0700, Andy Lutomirski wrote: >> On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeuf wrote: >> > struct undwarf { >> > unsigned int ip;/* instruction pointer (relative >> > offset from base) */ >> > unsigned prev_frame:13; /* offset to previous frame from >> > current stack pointer */ >> > unsigned regs:1;/* whether prev_frame contains >> > entry regs (regs->ip) */ >> > unsigned align:2; /* some details for dealing with >> > gcc stack realignment */ >> > } __packed; >> > >> > extern struct undwarf undwarves[]; >> >> Some comments in case you're actually planning to do this: >> >> 'unsigned int ip' is the majority of the size of this thing. It might >> be worth trying to store a lot fewer bits. You could split the >> structure into: >> >> struct undwarf_header { >> unsigned long start_ip; >> unsigned align:2; /* i'm assuming this rarely changes */ >> ...; >> unsigned int offset_to_details; >> }; >> >> and >> >> struct undwarf_details { >> unsigned short ip_offset; >> unsigned short prev_frame; >> }; >> >> and you'd find the details by first looking up the last header before >> the ip and then finding the details starting at (uintptr_t)header + >> header->offset_to_details. > > Good idea. According to some back-of-a-napkin math, a scheme like this > could reduce the data size from 1.8M down to 1.2M with my kernel config, > a not-too-shabby 600k savings. > >> Also, don't you need some indication of which reg is the base from >> which you find previous frame? After all, sometimes GCC will emit a >> frame pointer even in an otherwise frame-pointer-omitting kernel. > > I don't think we *need* to do that. I believe the base reg can just > always[*] be the stack pointer, even with frame pointers. What if there are functions that use alloca or variable length arrays on the stack? Functions using AHASH_REQUEST_ON_STACK come to mind. --Andy
Re: [PATCH 7/7] DWARF: add the config option
On Mon, May 08, 2017 at 05:21:24PM -0700, Andy Lutomirski wrote: > On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeufwrote: > > struct undwarf { > > unsigned int ip;/* instruction pointer (relative > > offset from base) */ > > unsigned prev_frame:13; /* offset to previous frame from > > current stack pointer */ > > unsigned regs:1;/* whether prev_frame contains > > entry regs (regs->ip) */ > > unsigned align:2; /* some details for dealing with > > gcc stack realignment */ > > } __packed; > > > > extern struct undwarf undwarves[]; > > Some comments in case you're actually planning to do this: > > 'unsigned int ip' is the majority of the size of this thing. It might > be worth trying to store a lot fewer bits. You could split the > structure into: > > struct undwarf_header { > unsigned long start_ip; > unsigned align:2; /* i'm assuming this rarely changes */ > ...; > unsigned int offset_to_details; > }; > > and > > struct undwarf_details { > unsigned short ip_offset; > unsigned short prev_frame; > }; > > and you'd find the details by first looking up the last header before > the ip and then finding the details starting at (uintptr_t)header + > header->offset_to_details. Good idea. According to some back-of-a-napkin math, a scheme like this could reduce the data size from 1.8M down to 1.2M with my kernel config, a not-too-shabby 600k savings. > Also, don't you need some indication of which reg is the base from > which you find previous frame? After all, sometimes GCC will emit a > frame pointer even in an otherwise frame-pointer-omitting kernel. I don't think we *need* to do that. I believe the base reg can just always[*] be the stack pointer, even with frame pointers. That said, it might be beneficial to use the frame pointer as the base reg where applicable, because it would shrink the data size, especially in a kernel with frame pointers enabled. And I think we would only need an extra bit to store that info. First I'll just start with the simplest possible scheme (i.e., what I proposed above). Even that calculates out to be smaller than the DWARF .eh_frame stuff. Then after that we can look at compression techniques (and their associated tradeoffs). [*] ignoring rare gcc stack realignments -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Mon, May 08, 2017 at 05:21:24PM -0700, Andy Lutomirski wrote: > On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeuf wrote: > > struct undwarf { > > unsigned int ip;/* instruction pointer (relative > > offset from base) */ > > unsigned prev_frame:13; /* offset to previous frame from > > current stack pointer */ > > unsigned regs:1;/* whether prev_frame contains > > entry regs (regs->ip) */ > > unsigned align:2; /* some details for dealing with > > gcc stack realignment */ > > } __packed; > > > > extern struct undwarf undwarves[]; > > Some comments in case you're actually planning to do this: > > 'unsigned int ip' is the majority of the size of this thing. It might > be worth trying to store a lot fewer bits. You could split the > structure into: > > struct undwarf_header { > unsigned long start_ip; > unsigned align:2; /* i'm assuming this rarely changes */ > ...; > unsigned int offset_to_details; > }; > > and > > struct undwarf_details { > unsigned short ip_offset; > unsigned short prev_frame; > }; > > and you'd find the details by first looking up the last header before > the ip and then finding the details starting at (uintptr_t)header + > header->offset_to_details. Good idea. According to some back-of-a-napkin math, a scheme like this could reduce the data size from 1.8M down to 1.2M with my kernel config, a not-too-shabby 600k savings. > Also, don't you need some indication of which reg is the base from > which you find previous frame? After all, sometimes GCC will emit a > frame pointer even in an otherwise frame-pointer-omitting kernel. I don't think we *need* to do that. I believe the base reg can just always[*] be the stack pointer, even with frame pointers. That said, it might be beneficial to use the frame pointer as the base reg where applicable, because it would shrink the data size, especially in a kernel with frame pointers enabled. And I think we would only need an extra bit to store that info. First I'll just start with the simplest possible scheme (i.e., what I proposed above). Even that calculates out to be smaller than the DWARF .eh_frame stuff. Then after that we can look at compression techniques (and their associated tradeoffs). [*] ignoring rare gcc stack realignments -- Josh
Re: [PATCH 7/7] DWARF: add the config option
On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeufwrote: > On Fri, May 05, 2017 at 12:57:11PM -0700, Linus Torvalds wrote: >> On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby wrote: >> > The DWARF unwinder is in place and ready. So introduce the config option >> > to allow users to enable it. It is by default off due to missing >> > assembly annotations. >> >> Who actually ends up using this? >> >> Because from the last time we had fancy unwindoers, and all the >> problems it caused for oops handling with absolutely _zero_ upsides >> ever, I do not ever again want to see fancy unwinders with complex >> state machine handling used by the oopsing code. >> >> The fact that it gets disabled for KASAN also makes me suspicious. It >> basically means that now all the accesses it does are not even >> validated. >> >> The fact that the most of the code seems to be disabled for the first >> six patches, and then just enabled in the last patch, also seems to >> mean that the series also gets no bisection coverage or testing that >> the individual patches make any sense. (ie there's a lot of code >> inside "CONFIG_DWARF_UNWIND" in the early patches but that config >> option cannot even be enabled until the last patch). >> >> We used to have nasty issues with not just missing dwarf info, but >> also actively *wrong* dwarf info. Compiler versions that generated >> subtly wrong info, because nobody actually really depended on it, and >> the people who had tested it seldom did the kinds of things we do in >> the kernel (eg inline asms etc). >> >> So I'm personally still very suspicious of these things. >> >> Last time I had huge issues with people also always blaming *anything* >> else than that unwinder. It was always "oh, somebody wrote asm without >> getting it right". Or "oh, the compiler generated bad tables, it's not >> *my* fault that now the kernel oopsing code no longer works". >> >> When I asked for more stricter debug table validation to avoid issues, >> it was always "oh, we fixed it, no worries", and then two months later >> somebody hit another issue. >> >> Put another way; the last time we did crazy stuff like this, it got >> reverted. For a damn good reason, despite some people being in denial >> about those reasons. > > Here's another possible idea that's been rattling around in my head. > It's purely theoretical at this point, so I don't know for sure that it > would work. But I haven't been able to find any major issues with it > yet. > > DWARF is great for debuggers. It helps you find all the registers on > the stack, so you can see function arguments and local variables. All > expressed in a nice compact format. > > But that's overkill for unwinders. We don't need all those registers, > and the state machine is too complicated. Unwinders basically only need > to know one thing: given an instruction address and a stack pointer, > where is the caller's stack frame? > > I'm thinking/hoping that information can be expressed in a simple, easy > to parse, reasonably sized data structure. Something like a sorted > array of this: > > struct undwarf { > unsigned int ip;/* instruction pointer (relative > offset from base) */ > unsigned prev_frame:13; /* offset to previous frame from > current stack pointer */ > unsigned regs:1;/* whether prev_frame contains entry > regs (regs->ip) */ > unsigned align:2; /* some details for dealing with gcc > stack realignment */ > } __packed; > > extern struct undwarf undwarves[]; Some comments in case you're actually planning to do this: 'unsigned int ip' is the majority of the size of this thing. It might be worth trying to store a lot fewer bits. You could split the structure into: struct undwarf_header { unsigned long start_ip; unsigned align:2; /* i'm assuming this rarely changes */ ...; unsigned int offset_to_details; }; and struct undwarf_details { unsigned short ip_offset; unsigned short prev_frame; }; and you'd find the details by first looking up the last header before the ip and then finding the details starting at (uintptr_t)header + header->offset_to_details. Also, don't you need some indication of which reg is the base from which you find previous frame? After all, sometimes GCC will emit a frame pointer even in an otherwise frame-pointer-omitting kernel. --Andy
Re: [PATCH 7/7] DWARF: add the config option
On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeuf wrote: > On Fri, May 05, 2017 at 12:57:11PM -0700, Linus Torvalds wrote: >> On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby wrote: >> > The DWARF unwinder is in place and ready. So introduce the config option >> > to allow users to enable it. It is by default off due to missing >> > assembly annotations. >> >> Who actually ends up using this? >> >> Because from the last time we had fancy unwindoers, and all the >> problems it caused for oops handling with absolutely _zero_ upsides >> ever, I do not ever again want to see fancy unwinders with complex >> state machine handling used by the oopsing code. >> >> The fact that it gets disabled for KASAN also makes me suspicious. It >> basically means that now all the accesses it does are not even >> validated. >> >> The fact that the most of the code seems to be disabled for the first >> six patches, and then just enabled in the last patch, also seems to >> mean that the series also gets no bisection coverage or testing that >> the individual patches make any sense. (ie there's a lot of code >> inside "CONFIG_DWARF_UNWIND" in the early patches but that config >> option cannot even be enabled until the last patch). >> >> We used to have nasty issues with not just missing dwarf info, but >> also actively *wrong* dwarf info. Compiler versions that generated >> subtly wrong info, because nobody actually really depended on it, and >> the people who had tested it seldom did the kinds of things we do in >> the kernel (eg inline asms etc). >> >> So I'm personally still very suspicious of these things. >> >> Last time I had huge issues with people also always blaming *anything* >> else than that unwinder. It was always "oh, somebody wrote asm without >> getting it right". Or "oh, the compiler generated bad tables, it's not >> *my* fault that now the kernel oopsing code no longer works". >> >> When I asked for more stricter debug table validation to avoid issues, >> it was always "oh, we fixed it, no worries", and then two months later >> somebody hit another issue. >> >> Put another way; the last time we did crazy stuff like this, it got >> reverted. For a damn good reason, despite some people being in denial >> about those reasons. > > Here's another possible idea that's been rattling around in my head. > It's purely theoretical at this point, so I don't know for sure that it > would work. But I haven't been able to find any major issues with it > yet. > > DWARF is great for debuggers. It helps you find all the registers on > the stack, so you can see function arguments and local variables. All > expressed in a nice compact format. > > But that's overkill for unwinders. We don't need all those registers, > and the state machine is too complicated. Unwinders basically only need > to know one thing: given an instruction address and a stack pointer, > where is the caller's stack frame? > > I'm thinking/hoping that information can be expressed in a simple, easy > to parse, reasonably sized data structure. Something like a sorted > array of this: > > struct undwarf { > unsigned int ip;/* instruction pointer (relative > offset from base) */ > unsigned prev_frame:13; /* offset to previous frame from > current stack pointer */ > unsigned regs:1;/* whether prev_frame contains entry > regs (regs->ip) */ > unsigned align:2; /* some details for dealing with gcc > stack realignment */ > } __packed; > > extern struct undwarf undwarves[]; Some comments in case you're actually planning to do this: 'unsigned int ip' is the majority of the size of this thing. It might be worth trying to store a lot fewer bits. You could split the structure into: struct undwarf_header { unsigned long start_ip; unsigned align:2; /* i'm assuming this rarely changes */ ...; unsigned int offset_to_details; }; and struct undwarf_details { unsigned short ip_offset; unsigned short prev_frame; }; and you'd find the details by first looking up the last header before the ip and then finding the details starting at (uintptr_t)header + header->offset_to_details. Also, don't you need some indication of which reg is the base from which you find previous frame? After all, sometimes GCC will emit a frame pointer even in an otherwise frame-pointer-omitting kernel. --Andy