Re: [PATCH 7/7] DWARF: add the config option

2017-05-28 Thread Ingo Molnar

* h...@zytor.com  wrote:

> This assumes that it actually ends up being feasible for objtool to do so.

Yes, agreed, that's a big precondition. I'm cautiously optimistic based on 
Josh's 
experiments that he posted about in this thread.

Thanks,

Ingo


Re: [PATCH 7/7] DWARF: add the config option

2017-05-28 Thread Ingo Molnar

* h...@zytor.com  wrote:

> This assumes that it actually ends up being feasible for objtool to do so.

Yes, agreed, that's a big precondition. I'm cautiously optimistic based on 
Josh's 
experiments that he posted about in this thread.

Thanks,

Ingo


Re: [PATCH 7/7] DWARF: add the config option

2017-05-26 Thread hpa
On May 22, 2017 10:49:06 PM PDT, Ingo Molnar  wrote:
>
>* H. Peter Anvin  wrote:
>
>> On 05/22/17 04:12, Ingo Molnar wrote:
>> \>>
>> >> This construct might be useful for other arches, which is why I
>called
>> >> it "FP" instead of "BP".  But then I ruined that with the last 3
>:-)
>> > 
>> > Please call it BP - 'FP' can easily be read as floating-point,
>making it all 
>> > super-confusing. We should use canonical x86 register names and
>ordering - even
>> > if not all registers are used straight away.
>> > 
>> 
>> Seriously, I suspect that at the end of the day we will have
>reinvented
>> DWARF.
>
>Absolutely - the main difference is:
>
>- the debug-info implementation is _internal_ to the kernel so it can
>be fixed
>instead of "oh, wait 2 years for the toolchain to fix this particular
>bug, work
>it around in the kernel meanwhile" kind of crazy flow and dependencies.
>I.e. 
>the debug-info generation and parsing code is both part of the kernel
>Git tree 
>   and can be iterated (and fixed) at once with.
>
>- the debug-info is auto-generated for assembly as well, leaving
>assembly code 
>   maintainable.
>
>- the debug-info has a sane data structure designed for robustness and 
>   compactness
>
>So even if it's a subset of the existing complexity of dwarf et al we
>are still 
>literally infinitely better off with this model.
>
>Thanks,
>
>   Ingo

This assumes that it actually ends up being feasible for objtool to do so.

It is worth noting that using DWARF for unwinding vs auto-generating the unwind 
information are independent issues.

Another option is to use (or postprocess) the compiler-generated DWARF for C 
modules and pursue autogeneration only for assembly modules, which ought to be 
a much easier problem and is less dependent on discovering new 
compiler-generated patterns.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-26 Thread hpa
On May 22, 2017 10:49:06 PM PDT, Ingo Molnar  wrote:
>
>* H. Peter Anvin  wrote:
>
>> On 05/22/17 04:12, Ingo Molnar wrote:
>> \>>
>> >> This construct might be useful for other arches, which is why I
>called
>> >> it "FP" instead of "BP".  But then I ruined that with the last 3
>:-)
>> > 
>> > Please call it BP - 'FP' can easily be read as floating-point,
>making it all 
>> > super-confusing. We should use canonical x86 register names and
>ordering - even
>> > if not all registers are used straight away.
>> > 
>> 
>> Seriously, I suspect that at the end of the day we will have
>reinvented
>> DWARF.
>
>Absolutely - the main difference is:
>
>- the debug-info implementation is _internal_ to the kernel so it can
>be fixed
>instead of "oh, wait 2 years for the toolchain to fix this particular
>bug, work
>it around in the kernel meanwhile" kind of crazy flow and dependencies.
>I.e. 
>the debug-info generation and parsing code is both part of the kernel
>Git tree 
>   and can be iterated (and fixed) at once with.
>
>- the debug-info is auto-generated for assembly as well, leaving
>assembly code 
>   maintainable.
>
>- the debug-info has a sane data structure designed for robustness and 
>   compactness
>
>So even if it's a subset of the existing complexity of dwarf et al we
>are still 
>literally infinitely better off with this model.
>
>Thanks,
>
>   Ingo

This assumes that it actually ends up being feasible for objtool to do so.

It is worth noting that using DWARF for unwinding vs auto-generating the unwind 
information are independent issues.

Another option is to use (or postprocess) the compiler-generated DWARF for C 
modules and pursue autogeneration only for assembly modules, which ought to be 
a much easier problem and is less dependent on discovering new 
compiler-generated patterns.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-26 Thread Josh Poimboeuf
On Fri, May 26, 2017 at 01:29:01PM +0200, Jiri Slaby wrote:
> On 05/26/2017, 08:54 AM, Jiri Slaby wrote:
> > On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote:
> >>   
> >> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c
> > 
> > JFYI, it crashes in sha1_transform_avx due to crypto changes. You
> > perhaps missed that this beast uses ebp (not rbp) register for
> > computations. I had to do:
> > 
> > --- a/arch/x86/crypto/sha1_ssse3_asm.S
> > +++ b/arch/x86/crypto/sha1_ssse3_asm.S
> > @@ -37,7 +37,7 @@
> >  #define REG_A  %ecx
> >  #define REG_B  %esi
> >  #define REG_C  %edi
> > -#define REG_D  %ebp
> > +#define REG_D  %r12d
> >  #define REG_E  %edx
> > 
> >  #define REG_T1 %eax
> > @@ -74,6 +74,7 @@
> > SYM_FUNC_START(\name)
> > 
> > push%rbx
> > +   push%r12
> > push%rbp
> > 
> > mov %rsp, %rbp
> > @@ -99,6 +100,7 @@
> > rep stosq
> > 
> > leaveq  # deallocate workspace
> > +   pop %r12
> > pop %rbx
> > ret
> > 
> > 
> > I am afraid there are more of these, e.g. in aesni-intel_asm.S.
> 
> aesni-intel_asm.S is OK -- only untouched x86_32 part uses ebp.
> 
> But sha1_avx2_x86_64_asm.S is not. They use *all* usable registers
> including ebp in the computations hidden behind the
> SHA1_PIPELINED_MAIN_BODY macro. The only work around I can see is to
> push rbp/pop rbp around the computation as it used to do with rbx:
> 
> --- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
> +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
> @@ -636,6 +636,7 @@ _loop3:
> /* Align stack */
> mov %rsp, %rbp
> and $~(0x20-1), %rsp
> +   push%rbp
> sub $RESERVE_STACK, %rsp
> 
> avx2_zeroupper
> @@ -661,6 +662,7 @@ _loop3:
> avx2_zeroupper
> 
> add $RESERVE_STACK, %rsp
> +   pop %rbp
> 
> leaveq
> pop %r15

Thanks, the first fix looks good.  Is the second one needed though?  It
already pushes rbp before it aligns the stack.

DWARF/undwarf will be immune to these issues, so I'll be moving a lot of
these crypto changes to a separate branch.  They were only in this
branch because the new-and-improved objtool can now find rbp misusage in
leaf functions.

It seems that most of the crypto code is frame pointer ignorant.  IMO,
leaf functions shouldn't be allowed to use rbp because it breaks frame
pointers when preempted by an interrupt.  GCC seems to agree.

I added a check to objtool to find the ones which use rbp badly.  Here
are the ones I see with my config:

  arch/x86/crypto/des3_ede-asm_64.o: warning: objtool: 
des3_ede_x86_64_crypt_blk uses BP as a scratch register
  arch/x86/crypto/des3_ede-asm_64.o: warning: objtool: 
des3_ede_x86_64_crypt_blk_3way uses BP as a scratch register
  arch/x86/crypto/blowfish-x86_64-asm_64.o: warning: objtool: 
__blowfish_enc_blk_4way uses BP as a scratch register
  arch/x86/crypto/blowfish-x86_64-asm_64.o: warning: objtool: 
blowfish_dec_blk_4way uses BP as a scratch register
  arch/x86/crypto/twofish-x86_64-asm_64-3way.o: warning: objtool: 
__twofish_enc_blk_3way uses BP as a scratch register
  arch/x86/crypto/twofish-x86_64-asm_64-3way.o: warning: objtool: 
twofish_dec_blk_3way uses BP as a scratch register
  arch/x86/crypto/sha256-avx2-asm.o: warning: objtool: sha256_transform_rorx 
uses BP as a scratch register
  arch/x86/crypto/cast5-avx-x86_64-asm_64.o: warning: objtool: 
__cast5_enc_blk16 uses BP as a scratch register
  arch/x86/crypto/cast5-avx-x86_64-asm_64.o: warning: objtool: 
__cast5_dec_blk16 uses BP as a scratch register
  arch/x86/crypto/cast6-avx-x86_64-asm_64.o: warning: objtool: __cast6_enc_blk8 
uses BP as a scratch register
  arch/x86/crypto/cast6-avx-x86_64-asm_64.o: warning: objtool: __cast6_dec_blk8 
uses BP as a scratch register
  arch/x86/crypto/twofish-avx-x86_64-asm_64.o: warning: objtool: 
__twofish_enc_blk8 uses BP as a scratch register
  arch/x86/crypto/twofish-avx-x86_64-asm_64.o: warning: objtool: 
__twofish_dec_blk8 uses BP as a scratch register

(And that doesn't include the ones which misuse ebp.)

It may be a challenge to fix some of those which use all available
registers.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-26 Thread Josh Poimboeuf
On Fri, May 26, 2017 at 01:29:01PM +0200, Jiri Slaby wrote:
> On 05/26/2017, 08:54 AM, Jiri Slaby wrote:
> > On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote:
> >>   
> >> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c
> > 
> > JFYI, it crashes in sha1_transform_avx due to crypto changes. You
> > perhaps missed that this beast uses ebp (not rbp) register for
> > computations. I had to do:
> > 
> > --- a/arch/x86/crypto/sha1_ssse3_asm.S
> > +++ b/arch/x86/crypto/sha1_ssse3_asm.S
> > @@ -37,7 +37,7 @@
> >  #define REG_A  %ecx
> >  #define REG_B  %esi
> >  #define REG_C  %edi
> > -#define REG_D  %ebp
> > +#define REG_D  %r12d
> >  #define REG_E  %edx
> > 
> >  #define REG_T1 %eax
> > @@ -74,6 +74,7 @@
> > SYM_FUNC_START(\name)
> > 
> > push%rbx
> > +   push%r12
> > push%rbp
> > 
> > mov %rsp, %rbp
> > @@ -99,6 +100,7 @@
> > rep stosq
> > 
> > leaveq  # deallocate workspace
> > +   pop %r12
> > pop %rbx
> > ret
> > 
> > 
> > I am afraid there are more of these, e.g. in aesni-intel_asm.S.
> 
> aesni-intel_asm.S is OK -- only untouched x86_32 part uses ebp.
> 
> But sha1_avx2_x86_64_asm.S is not. They use *all* usable registers
> including ebp in the computations hidden behind the
> SHA1_PIPELINED_MAIN_BODY macro. The only work around I can see is to
> push rbp/pop rbp around the computation as it used to do with rbx:
> 
> --- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
> +++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
> @@ -636,6 +636,7 @@ _loop3:
> /* Align stack */
> mov %rsp, %rbp
> and $~(0x20-1), %rsp
> +   push%rbp
> sub $RESERVE_STACK, %rsp
> 
> avx2_zeroupper
> @@ -661,6 +662,7 @@ _loop3:
> avx2_zeroupper
> 
> add $RESERVE_STACK, %rsp
> +   pop %rbp
> 
> leaveq
> pop %r15

Thanks, the first fix looks good.  Is the second one needed though?  It
already pushes rbp before it aligns the stack.

DWARF/undwarf will be immune to these issues, so I'll be moving a lot of
these crypto changes to a separate branch.  They were only in this
branch because the new-and-improved objtool can now find rbp misusage in
leaf functions.

It seems that most of the crypto code is frame pointer ignorant.  IMO,
leaf functions shouldn't be allowed to use rbp because it breaks frame
pointers when preempted by an interrupt.  GCC seems to agree.

I added a check to objtool to find the ones which use rbp badly.  Here
are the ones I see with my config:

  arch/x86/crypto/des3_ede-asm_64.o: warning: objtool: 
des3_ede_x86_64_crypt_blk uses BP as a scratch register
  arch/x86/crypto/des3_ede-asm_64.o: warning: objtool: 
des3_ede_x86_64_crypt_blk_3way uses BP as a scratch register
  arch/x86/crypto/blowfish-x86_64-asm_64.o: warning: objtool: 
__blowfish_enc_blk_4way uses BP as a scratch register
  arch/x86/crypto/blowfish-x86_64-asm_64.o: warning: objtool: 
blowfish_dec_blk_4way uses BP as a scratch register
  arch/x86/crypto/twofish-x86_64-asm_64-3way.o: warning: objtool: 
__twofish_enc_blk_3way uses BP as a scratch register
  arch/x86/crypto/twofish-x86_64-asm_64-3way.o: warning: objtool: 
twofish_dec_blk_3way uses BP as a scratch register
  arch/x86/crypto/sha256-avx2-asm.o: warning: objtool: sha256_transform_rorx 
uses BP as a scratch register
  arch/x86/crypto/cast5-avx-x86_64-asm_64.o: warning: objtool: 
__cast5_enc_blk16 uses BP as a scratch register
  arch/x86/crypto/cast5-avx-x86_64-asm_64.o: warning: objtool: 
__cast5_dec_blk16 uses BP as a scratch register
  arch/x86/crypto/cast6-avx-x86_64-asm_64.o: warning: objtool: __cast6_enc_blk8 
uses BP as a scratch register
  arch/x86/crypto/cast6-avx-x86_64-asm_64.o: warning: objtool: __cast6_dec_blk8 
uses BP as a scratch register
  arch/x86/crypto/twofish-avx-x86_64-asm_64.o: warning: objtool: 
__twofish_enc_blk8 uses BP as a scratch register
  arch/x86/crypto/twofish-avx-x86_64-asm_64.o: warning: objtool: 
__twofish_dec_blk8 uses BP as a scratch register

(And that doesn't include the ones which misuse ebp.)

It may be a challenge to fix some of those which use all available
registers.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-26 Thread Jiri Slaby
On 05/26/2017, 08:54 AM, Jiri Slaby wrote:
> On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote:
>>   
>> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c
> 
> JFYI, it crashes in sha1_transform_avx due to crypto changes. You
> perhaps missed that this beast uses ebp (not rbp) register for
> computations. I had to do:
> 
> --- a/arch/x86/crypto/sha1_ssse3_asm.S
> +++ b/arch/x86/crypto/sha1_ssse3_asm.S
> @@ -37,7 +37,7 @@
>  #define REG_A  %ecx
>  #define REG_B  %esi
>  #define REG_C  %edi
> -#define REG_D  %ebp
> +#define REG_D  %r12d
>  #define REG_E  %edx
> 
>  #define REG_T1 %eax
> @@ -74,6 +74,7 @@
> SYM_FUNC_START(\name)
> 
> push%rbx
> +   push%r12
> push%rbp
> 
> mov %rsp, %rbp
> @@ -99,6 +100,7 @@
> rep stosq
> 
> leaveq  # deallocate workspace
> +   pop %r12
> pop %rbx
> ret
> 
> 
> I am afraid there are more of these, e.g. in aesni-intel_asm.S.

aesni-intel_asm.S is OK -- only untouched x86_32 part uses ebp.

But sha1_avx2_x86_64_asm.S is not. They use *all* usable registers
including ebp in the computations hidden behind the
SHA1_PIPELINED_MAIN_BODY macro. The only work around I can see is to
push rbp/pop rbp around the computation as it used to do with rbx:

--- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
+++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
@@ -636,6 +636,7 @@ _loop3:
/* Align stack */
mov %rsp, %rbp
and $~(0x20-1), %rsp
+   push%rbp
sub $RESERVE_STACK, %rsp

avx2_zeroupper
@@ -661,6 +662,7 @@ _loop3:
avx2_zeroupper

add $RESERVE_STACK, %rsp
+   pop %rbp

leaveq
pop %r15

regards,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-26 Thread Jiri Slaby
On 05/26/2017, 08:54 AM, Jiri Slaby wrote:
> On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote:
>>   
>> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c
> 
> JFYI, it crashes in sha1_transform_avx due to crypto changes. You
> perhaps missed that this beast uses ebp (not rbp) register for
> computations. I had to do:
> 
> --- a/arch/x86/crypto/sha1_ssse3_asm.S
> +++ b/arch/x86/crypto/sha1_ssse3_asm.S
> @@ -37,7 +37,7 @@
>  #define REG_A  %ecx
>  #define REG_B  %esi
>  #define REG_C  %edi
> -#define REG_D  %ebp
> +#define REG_D  %r12d
>  #define REG_E  %edx
> 
>  #define REG_T1 %eax
> @@ -74,6 +74,7 @@
> SYM_FUNC_START(\name)
> 
> push%rbx
> +   push%r12
> push%rbp
> 
> mov %rsp, %rbp
> @@ -99,6 +100,7 @@
> rep stosq
> 
> leaveq  # deallocate workspace
> +   pop %r12
> pop %rbx
> ret
> 
> 
> I am afraid there are more of these, e.g. in aesni-intel_asm.S.

aesni-intel_asm.S is OK -- only untouched x86_32 part uses ebp.

But sha1_avx2_x86_64_asm.S is not. They use *all* usable registers
including ebp in the computations hidden behind the
SHA1_PIPELINED_MAIN_BODY macro. The only work around I can see is to
push rbp/pop rbp around the computation as it used to do with rbx:

--- a/arch/x86/crypto/sha1_avx2_x86_64_asm.S
+++ b/arch/x86/crypto/sha1_avx2_x86_64_asm.S
@@ -636,6 +636,7 @@ _loop3:
/* Align stack */
mov %rsp, %rbp
and $~(0x20-1), %rsp
+   push%rbp
sub $RESERVE_STACK, %rsp

avx2_zeroupper
@@ -661,6 +662,7 @@ _loop3:
avx2_zeroupper

add $RESERVE_STACK, %rsp
+   pop %rbp

leaveq
pop %r15

regards,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-26 Thread Jiri Slaby
On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote:
>   
> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c

JFYI, it crashes in sha1_transform_avx due to crypto changes. You
perhaps missed that this beast uses ebp (not rbp) register for
computations. I had to do:

--- a/arch/x86/crypto/sha1_ssse3_asm.S
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -37,7 +37,7 @@
 #define REG_A  %ecx
 #define REG_B  %esi
 #define REG_C  %edi
-#define REG_D  %ebp
+#define REG_D  %r12d
 #define REG_E  %edx

 #define REG_T1 %eax
@@ -74,6 +74,7 @@
SYM_FUNC_START(\name)

push%rbx
+   push%r12
push%rbp

mov %rsp, %rbp
@@ -99,6 +100,7 @@
rep stosq

leaveq  # deallocate workspace
+   pop %r12
pop %rbx
ret


I am afraid there are more of these, e.g. in aesni-intel_asm.S.

thanks,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-26 Thread Jiri Slaby
On 05/19/2017, 11:35 PM, Josh Poimboeuf wrote:
>   
> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c

JFYI, it crashes in sha1_transform_avx due to crypto changes. You
perhaps missed that this beast uses ebp (not rbp) register for
computations. I had to do:

--- a/arch/x86/crypto/sha1_ssse3_asm.S
+++ b/arch/x86/crypto/sha1_ssse3_asm.S
@@ -37,7 +37,7 @@
 #define REG_A  %ecx
 #define REG_B  %esi
 #define REG_C  %edi
-#define REG_D  %ebp
+#define REG_D  %r12d
 #define REG_E  %edx

 #define REG_T1 %eax
@@ -74,6 +74,7 @@
SYM_FUNC_START(\name)

push%rbx
+   push%r12
push%rbp

mov %rsp, %rbp
@@ -99,6 +100,7 @@
rep stosq

leaveq  # deallocate workspace
+   pop %r12
pop %rbx
ret


I am afraid there are more of these, e.g. in aesni-intel_asm.S.

thanks,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-23 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote:
> > > But it does hurt, in the sense that the complicated format of DWARF CFI
> > > means the unwinder has to jump through a lot more hoops to read it.
> > 
> > Why that matters, actually? Unwinder is nothing to be performance
> > oriented. And if somebody is doing a lot of unwinding during runtime,
> > they can switch to in-this-case-faster FP unwinder.
> 
> perf (and ftrace) like the unwinder to be considered performance
> oriented.

Yes, and given how critical debugging is there's a kind of useful synergy here: 
overall the perf unwinder is run about 10 orders of magnitude more often than 
the 
debugging unwinder, so it's a very big help in shaking out unwinder/debug-info 
bugs and increasing robustness overall.

The 'price' for using the unwinder in perf is that it has to be fast.

Thanks,

Ingo


Re: [PATCH 7/7] DWARF: add the config option

2017-05-23 Thread Ingo Molnar

* Peter Zijlstra  wrote:

> On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote:
> > > But it does hurt, in the sense that the complicated format of DWARF CFI
> > > means the unwinder has to jump through a lot more hoops to read it.
> > 
> > Why that matters, actually? Unwinder is nothing to be performance
> > oriented. And if somebody is doing a lot of unwinding during runtime,
> > they can switch to in-this-case-faster FP unwinder.
> 
> perf (and ftrace) like the unwinder to be considered performance
> oriented.

Yes, and given how critical debugging is there's a kind of useful synergy here: 
overall the perf unwinder is run about 10 orders of magnitude more often than 
the 
debugging unwinder, so it's a very big help in shaking out unwinder/debug-info 
bugs and increasing robustness overall.

The 'price' for using the unwinder in perf is that it has to be fast.

Thanks,

Ingo


Re: [PATCH 7/7] DWARF: add the config option

2017-05-23 Thread Peter Zijlstra
On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote:
> > But it does hurt, in the sense that the complicated format of DWARF CFI
> > means the unwinder has to jump through a lot more hoops to read it.
> 
> Why that matters, actually? Unwinder is nothing to be performance
> oriented. And if somebody is doing a lot of unwinding during runtime,
> they can switch to in-this-case-faster FP unwinder.

perf (and ftrace) like the unwinder to be considered performance
oriented.



Re: [PATCH 7/7] DWARF: add the config option

2017-05-23 Thread Peter Zijlstra
On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote:
> > But it does hurt, in the sense that the complicated format of DWARF CFI
> > means the unwinder has to jump through a lot more hoops to read it.
> 
> Why that matters, actually? Unwinder is nothing to be performance
> oriented. And if somebody is doing a lot of unwinding during runtime,
> they can switch to in-this-case-faster FP unwinder.

perf (and ftrace) like the unwinder to be considered performance
oriented.



Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread Ingo Molnar

* H. Peter Anvin  wrote:

> On 05/22/17 04:12, Ingo Molnar wrote:
> \>>
> >> This construct might be useful for other arches, which is why I called
> >> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)
> > 
> > Please call it BP - 'FP' can easily be read as floating-point, making it 
> > all 
> > super-confusing. We should use canonical x86 register names and ordering - 
> > even
> > if not all registers are used straight away.
> > 
> 
> Seriously, I suspect that at the end of the day we will have reinvented
> DWARF.

Absolutely - the main difference is:

 - the debug-info implementation is _internal_ to the kernel so it can be fixed
   instead of "oh, wait 2 years for the toolchain to fix this particular bug, 
work
   it around in the kernel meanwhile" kind of crazy flow and dependencies. I.e. 
   the debug-info generation and parsing code is both part of the kernel Git 
tree 
   and can be iterated (and fixed) at once with.

 - the debug-info is auto-generated for assembly as well, leaving assembly code 
   maintainable.

 - the debug-info has a sane data structure designed for robustness and 
   compactness

So even if it's a subset of the existing complexity of dwarf et al we are still 
literally infinitely better off with this model.

Thanks,

Ingo


Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread Ingo Molnar

* H. Peter Anvin  wrote:

> On 05/22/17 04:12, Ingo Molnar wrote:
> \>>
> >> This construct might be useful for other arches, which is why I called
> >> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)
> > 
> > Please call it BP - 'FP' can easily be read as floating-point, making it 
> > all 
> > super-confusing. We should use canonical x86 register names and ordering - 
> > even
> > if not all registers are used straight away.
> > 
> 
> Seriously, I suspect that at the end of the day we will have reinvented
> DWARF.

Absolutely - the main difference is:

 - the debug-info implementation is _internal_ to the kernel so it can be fixed
   instead of "oh, wait 2 years for the toolchain to fix this particular bug, 
work
   it around in the kernel meanwhile" kind of crazy flow and dependencies. I.e. 
   the debug-info generation and parsing code is both part of the kernel Git 
tree 
   and can be iterated (and fixed) at once with.

 - the debug-info is auto-generated for assembly as well, leaving assembly code 
   maintainable.

 - the debug-info has a sane data structure designed for robustness and 
   compactness

So even if it's a subset of the existing complexity of dwarf et al we are still 
literally infinitely better off with this model.

Thanks,

Ingo


Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread Jiri Kosina
On Mon, 22 May 2017, H. Peter Anvin wrote:

> > Please call it BP - 'FP' can easily be read as floating-point, making 
> > it all super-confusing. We should use canonical x86 register names and 
> > ordering - even if not all registers are used straight away.
> 
> Seriously, I suspect that at the end of the day we will have reinvented
> DWARF.

If this is the fear, what is the proposal then? We simply have to deal 
with !FRAME_POINTER builds.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread Jiri Kosina
On Mon, 22 May 2017, H. Peter Anvin wrote:

> > Please call it BP - 'FP' can easily be read as floating-point, making 
> > it all super-confusing. We should use canonical x86 register names and 
> > ordering - even if not all registers are used straight away.
> 
> Seriously, I suspect that at the end of the day we will have reinvented
> DWARF.

If this is the fear, what is the proposal then? We simply have to deal 
with !FRAME_POINTER builds.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread Josh Poimboeuf
On Mon, May 22, 2017 at 02:37:50PM -0700, H. Peter Anvin wrote:
> On 05/22/17 14:07, H. Peter Anvin wrote:
> > On 05/20/17 13:01, H.J. Lu wrote:
> >> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  
> >> wrote:
> >>
> 
>  (H.J., could we get a binutils feature that allows is to do:
> 
>  pushq %whatever
>  .cfi_adjust_sp -8
>  ...
>  popq %whatever
>  .cfi_adjust_sp 8
> 
> >>
> >> Np.  Compiler needs to generate this.
> >>
> > 
> > For actual assembly we have such a feature, it is called macros.
> > 
> > push/pop is the easy stuff; macros take care of that, but the real pain
> > is dealing with the flow of control.
> > 
> 
> My biggest beef with the CFI directives that gas uses is that there is
> that .cfi_remember_state/.cfi_restore_state doesn't have a way to
> specify more than one state.  That makes it really hard to get sanity
> around control flow changes, especially with code that is intentionally
> out of line.
> 
> That, and some of the CFI directives seem to be a bit ill-defined in
> their definition (are they even applicable to anything other than
> DWARF?)  They almost seem to be referencing some external specification,
> but the only thing I'm finding is the DWARF documentation which is
> written in very different terms.
> 
> The best description of what a personality routine is I found in an
> article by Ian Lance Taylor.  It doesn't seem to be applicable to C as
> far as I can tell.

So my understanding is that there's stock DWARF (.debug_frame) and then
there's souped-up DWARF (.eh_frame), which is basically DWARF with a few
extensions.

The remember/restore state thing is stock DWARF (DW_CFA_remember_state
and DW_CFA_restore_state).  The personality routine thing is in the
.eh_frame extension which is documented here:

  
http://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread Josh Poimboeuf
On Mon, May 22, 2017 at 02:37:50PM -0700, H. Peter Anvin wrote:
> On 05/22/17 14:07, H. Peter Anvin wrote:
> > On 05/20/17 13:01, H.J. Lu wrote:
> >> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  
> >> wrote:
> >>
> 
>  (H.J., could we get a binutils feature that allows is to do:
> 
>  pushq %whatever
>  .cfi_adjust_sp -8
>  ...
>  popq %whatever
>  .cfi_adjust_sp 8
> 
> >>
> >> Np.  Compiler needs to generate this.
> >>
> > 
> > For actual assembly we have such a feature, it is called macros.
> > 
> > push/pop is the easy stuff; macros take care of that, but the real pain
> > is dealing with the flow of control.
> > 
> 
> My biggest beef with the CFI directives that gas uses is that there is
> that .cfi_remember_state/.cfi_restore_state doesn't have a way to
> specify more than one state.  That makes it really hard to get sanity
> around control flow changes, especially with code that is intentionally
> out of line.
> 
> That, and some of the CFI directives seem to be a bit ill-defined in
> their definition (are they even applicable to anything other than
> DWARF?)  They almost seem to be referencing some external specification,
> but the only thing I'm finding is the DWARF documentation which is
> written in very different terms.
> 
> The best description of what a personality routine is I found in an
> article by Ian Lance Taylor.  It doesn't seem to be applicable to C as
> far as I can tell.

So my understanding is that there's stock DWARF (.debug_frame) and then
there's souped-up DWARF (.eh_frame), which is basically DWARF with a few
extensions.

The remember/restore state thing is stock DWARF (DW_CFA_remember_state
and DW_CFA_restore_state).  The personality routine thing is in the
.eh_frame extension which is documented here:

  
http://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread H. Peter Anvin
On 05/22/17 14:07, H. Peter Anvin wrote:
> On 05/20/17 13:01, H.J. Lu wrote:
>> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  wrote:
>>

 (H.J., could we get a binutils feature that allows is to do:

 pushq %whatever
 .cfi_adjust_sp -8
 ...
 popq %whatever
 .cfi_adjust_sp 8

>>
>> Np.  Compiler needs to generate this.
>>
> 
> For actual assembly we have such a feature, it is called macros.
> 
> push/pop is the easy stuff; macros take care of that, but the real pain
> is dealing with the flow of control.
> 

My biggest beef with the CFI directives that gas uses is that there is
that .cfi_remember_state/.cfi_restore_state doesn't have a way to
specify more than one state.  That makes it really hard to get sanity
around control flow changes, especially with code that is intentionally
out of line.

That, and some of the CFI directives seem to be a bit ill-defined in
their definition (are they even applicable to anything other than
DWARF?)  They almost seem to be referencing some external specification,
but the only thing I'm finding is the DWARF documentation which is
written in very different terms.

The best description of what a personality routine is I found in an
article by Ian Lance Taylor.  It doesn't seem to be applicable to C as
far as I can tell.

-hpa




Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread H. Peter Anvin
On 05/22/17 14:07, H. Peter Anvin wrote:
> On 05/20/17 13:01, H.J. Lu wrote:
>> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  wrote:
>>

 (H.J., could we get a binutils feature that allows is to do:

 pushq %whatever
 .cfi_adjust_sp -8
 ...
 popq %whatever
 .cfi_adjust_sp 8

>>
>> Np.  Compiler needs to generate this.
>>
> 
> For actual assembly we have such a feature, it is called macros.
> 
> push/pop is the easy stuff; macros take care of that, but the real pain
> is dealing with the flow of control.
> 

My biggest beef with the CFI directives that gas uses is that there is
that .cfi_remember_state/.cfi_restore_state doesn't have a way to
specify more than one state.  That makes it really hard to get sanity
around control flow changes, especially with code that is intentionally
out of line.

That, and some of the CFI directives seem to be a bit ill-defined in
their definition (are they even applicable to anything other than
DWARF?)  They almost seem to be referencing some external specification,
but the only thing I'm finding is the DWARF documentation which is
written in very different terms.

The best description of what a personality routine is I found in an
article by Ian Lance Taylor.  It doesn't seem to be applicable to C as
far as I can tell.

-hpa




Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread H. Peter Anvin
On 05/22/17 04:12, Ingo Molnar wrote:
\>>
>> This construct might be useful for other arches, which is why I called
>> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)
> 
> Please call it BP - 'FP' can easily be read as floating-point, making it all 
> super-confusing. We should use canonical x86 register names and ordering - 
> even
> if not all registers are used straight away.
> 

Seriously, I suspect that at the end of the day we will have reinvented
DWARF.

-hpa



Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread H. Peter Anvin
On 05/22/17 04:12, Ingo Molnar wrote:
\>>
>> This construct might be useful for other arches, which is why I called
>> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)
> 
> Please call it BP - 'FP' can easily be read as floating-point, making it all 
> super-confusing. We should use canonical x86 register names and ordering - 
> even
> if not all registers are used straight away.
> 

Seriously, I suspect that at the end of the day we will have reinvented
DWARF.

-hpa



Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread H. Peter Anvin
On 05/20/17 13:01, H.J. Lu wrote:
> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  wrote:
> 
>>>
>>> (H.J., could we get a binutils feature that allows is to do:
>>>
>>> pushq %whatever
>>> .cfi_adjust_sp -8
>>> ...
>>> popq %whatever
>>> .cfi_adjust_sp 8
>>>
> 
> Np.  Compiler needs to generate this.
> 

For actual assembly we have such a feature, it is called macros.

push/pop is the easy stuff; macros take care of that, but the real pain
is dealing with the flow of control.

-hpa



Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread H. Peter Anvin
On 05/20/17 13:01, H.J. Lu wrote:
> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  wrote:
> 
>>>
>>> (H.J., could we get a binutils feature that allows is to do:
>>>
>>> pushq %whatever
>>> .cfi_adjust_sp -8
>>> ...
>>> popq %whatever
>>> .cfi_adjust_sp 8
>>>
> 
> Np.  Compiler needs to generate this.
> 

For actual assembly we have such a feature, it is called macros.

push/pop is the easy stuff; macros take care of that, but the real pain
is dealing with the flow of control.

-hpa



Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread H.J. Lu
On Mon, May 22, 2017 at 4:34 AM, Jiri Kosina  wrote:
> On Sat, 20 May 2017, H.J. Lu wrote:
>
>>  pushq %whatever
>>  .cfi_adjust_sp -8
>>  ...
>>  popq %whatever
>>  .cfi_adjust_sp 8
>> 
>> >>
>> >> Np.  Compiler needs to generate this.
>> >>
>> >
>> > How would the compiler generate this when inline asm is involved?  For
>> > the kernel, objtool could get around the need to have these
>> > annotations, but not so much for user code?  Is the compiler supposed
>> > to parse the inline asm?  Would the compiler provide some magic % code
>> > to represent the current CFA base register?
>>
>> Here is one example of inline asm with call frame info:
>>
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD
>
> That brings us basically pretty close to square one though; having to
> maintain "manual" anotations. Something we're pretty much trying to avoid
> through this excercise.

Assembler only encodes instructions.  You need to a different tool
to figure out what an instruction does.


-- 
H.J.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread H.J. Lu
On Mon, May 22, 2017 at 4:34 AM, Jiri Kosina  wrote:
> On Sat, 20 May 2017, H.J. Lu wrote:
>
>>  pushq %whatever
>>  .cfi_adjust_sp -8
>>  ...
>>  popq %whatever
>>  .cfi_adjust_sp 8
>> 
>> >>
>> >> Np.  Compiler needs to generate this.
>> >>
>> >
>> > How would the compiler generate this when inline asm is involved?  For
>> > the kernel, objtool could get around the need to have these
>> > annotations, but not so much for user code?  Is the compiler supposed
>> > to parse the inline asm?  Would the compiler provide some magic % code
>> > to represent the current CFA base register?
>>
>> Here is one example of inline asm with call frame info:
>>
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD
>
> That brings us basically pretty close to square one though; having to
> maintain "manual" anotations. Something we're pretty much trying to avoid
> through this excercise.

Assembler only encodes instructions.  You need to a different tool
to figure out what an instruction does.


-- 
H.J.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread Jiri Kosina
On Sat, 20 May 2017, H.J. Lu wrote:

>  pushq %whatever
>  .cfi_adjust_sp -8
>  ...
>  popq %whatever
>  .cfi_adjust_sp 8
> 
> >>
> >> Np.  Compiler needs to generate this.
> >>
> >
> > How would the compiler generate this when inline asm is involved?  For
> > the kernel, objtool could get around the need to have these
> > annotations, but not so much for user code?  Is the compiler supposed
> > to parse the inline asm?  Would the compiler provide some magic % code
> > to represent the current CFA base register?
> 
> Here is one example of inline asm with call frame info:
> 
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD

That brings us basically pretty close to square one though; having to 
maintain "manual" anotations. Something we're pretty much trying to avoid 
through this excercise.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread Jiri Kosina
On Sat, 20 May 2017, H.J. Lu wrote:

>  pushq %whatever
>  .cfi_adjust_sp -8
>  ...
>  popq %whatever
>  .cfi_adjust_sp 8
> 
> >>
> >> Np.  Compiler needs to generate this.
> >>
> >
> > How would the compiler generate this when inline asm is involved?  For
> > the kernel, objtool could get around the need to have these
> > annotations, but not so much for user code?  Is the compiler supposed
> > to parse the inline asm?  Would the compiler provide some magic % code
> > to represent the current CFA base register?
> 
> Here is one example of inline asm with call frame info:
> 
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD

That brings us basically pretty close to square one though; having to 
maintain "manual" anotations. Something we're pretty much trying to avoid 
through this excercise.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> > How are you handling control flow?
> 
> Control flow of what?
> 
> > > Here's the struct in its current state:
> > > 
> > >   #define UNDWARF_REG_UNDEFINED   0
> > >   #define UNDWARF_REG_CFA 1
> > >   #define UNDWARF_REG_SP  2
> > >   #define UNDWARF_REG_FP  3
> > >   #define UNDWARF_REG_SP_INDIRECT 4
> > >   #define UNDWARF_REG_FP_INDIRECT 5
> > >   #define UNDWARF_REG_R10 6
> > >   #define UNDWARF_REG_DI  7
> > >   #define UNDWARF_REG_DX  8
> > > 
> > 
> > Why only those registers?  Also, if you have the option I would really
> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
> > si, di, r8-r15 in that order.)
> 
> Those are the only registers which are ever needed as the base for
> finding the previous stack frame.  99% of the time it's sp or bp, the
> other registers are needed for aligned stacks and entry code.
> 
> Using the actual register numbers isn't an option because I don't need
> them all and they need to fit in a small number of bits.
> 
> This construct might be useful for other arches, which is why I called
> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)

Please call it BP - 'FP' can easily be read as floating-point, making it all 
super-confusing. We should use canonical x86 register names and ordering - even
if not all registers are used straight away.

Thanks,

Ingo


Re: [PATCH 7/7] DWARF: add the config option

2017-05-22 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> > How are you handling control flow?
> 
> Control flow of what?
> 
> > > Here's the struct in its current state:
> > > 
> > >   #define UNDWARF_REG_UNDEFINED   0
> > >   #define UNDWARF_REG_CFA 1
> > >   #define UNDWARF_REG_SP  2
> > >   #define UNDWARF_REG_FP  3
> > >   #define UNDWARF_REG_SP_INDIRECT 4
> > >   #define UNDWARF_REG_FP_INDIRECT 5
> > >   #define UNDWARF_REG_R10 6
> > >   #define UNDWARF_REG_DI  7
> > >   #define UNDWARF_REG_DX  8
> > > 
> > 
> > Why only those registers?  Also, if you have the option I would really
> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
> > si, di, r8-r15 in that order.)
> 
> Those are the only registers which are ever needed as the base for
> finding the previous stack frame.  99% of the time it's sp or bp, the
> other registers are needed for aligned stacks and entry code.
> 
> Using the actual register numbers isn't an option because I don't need
> them all and they need to fit in a small number of bits.
> 
> This construct might be useful for other arches, which is why I called
> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)

Please call it BP - 'FP' can easily be read as floating-point, making it all 
super-confusing. We should use canonical x86 register names and ordering - even
if not all registers are used straight away.

Thanks,

Ingo


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Linus Torvalds
On Sat, May 20, 2017 at 4:00 PM, Linus Torvalds
 wrote:
>
> hjl already posted an example of the kinds of horrors glibc does to do
> things "right".

Side note: we'd hopefully/presumably never need anything _that_
disgusting for the kernel, so hjl's example is probably an extreme
one.

But even when we just did the pushq/popq_cfi macros etc to try to have
simple and reasonably legible annotations for the common cases, it got
pretty ugly.

It wasn't that extreme glibc kind of "50 lines of ugly for two
instructions of code", but it was pretty bad. And as far as I know we
never even tried to annotate places where we did "pushf/pop %reg" in
inline asm (for saving/restoring flags)

   Linus


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Linus Torvalds
On Sat, May 20, 2017 at 4:00 PM, Linus Torvalds
 wrote:
>
> hjl already posted an example of the kinds of horrors glibc does to do
> things "right".

Side note: we'd hopefully/presumably never need anything _that_
disgusting for the kernel, so hjl's example is probably an extreme
one.

But even when we just did the pushq/popq_cfi macros etc to try to have
simple and reasonably legible annotations for the common cases, it got
pretty ugly.

It wasn't that extreme glibc kind of "50 lines of ugly for two
instructions of code", but it was pretty bad. And as far as I know we
never even tried to annotate places where we did "pushf/pop %reg" in
inline asm (for saving/restoring flags)

   Linus


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Linus Torvalds
On Sat, May 20, 2017 at 2:56 PM, Andy Lutomirski  wrote:
> On Sat, May 20, 2017 at 1:16 PM, Linus Torvalds
>  wrote:
>>
>> The amount of unreadable crap and bugs it requires is not worth the
>> pain. Not for *any* amount of gain, and the gain here is basically
>> zero.
>
> But what if objtool autogenerated the annotations, perhaps with a tiny
> bit of help telling it "hardware frame goes here" or "pt_regs goes
> here"?

You snipped the next part of my email, where I said:

> The *only* acceptable model is automated tools (ie objtool). Don't
> even bother to try to go any other way. Because I will not accept that
> shit.

so yes, objtool parsing things on its own is acceptable (and it had
better not need any help - it already checks frame pointer data).

The CFI annotations needed in asm are horrendous. We used to have
them, and we didn't have even _remotely_ complete annotations and
despite that they were
 (a) wrong
 (b) incomplete
 (c) made the asm impossible to read and even worse to modify.

hjl already posted an example of the kinds of horrors glibc does to do
things "right". And those rabbit ears around "right" are there for a
reason. There's no way that is ever right - even if it gets the right
results, it's an unmaintainable piece of crap.

So no, we're never ever adding that CFI garbage back into the kernel.

A tool that can generate it is ok, but even then we should expect
inevitable bugs and not trust the end result blindly.

Because dwarf info is complex enough that other tools have gotten it
wrong many many times. Just google for "gcc bugzilla cfi" or go to the
gcc bugzilla and search for "DWARF" or whatever. It's not "oh, we once
had a bug". It's constant.

One of the reasons I like the idea of having objtool generate
something *simpler* than dwarf is that it not only is much easier to
parse, it has a much higher likelihood of not having some crazy bug.
If objtool mainly looks at the actual instructions, and perhaps uses
dwarf information as additional input and creates something much
simpler than dwarf, it might have a chance in hell of occasionally
even getting it right.

Because dwarf information is really really complicated. It's
complicated because it contains *way* more information than just how
to find the next stack frame.

I mean, it has basically a RPN interpreter built in, and that's the
_simple_ part.

  Linus


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Linus Torvalds
On Sat, May 20, 2017 at 2:56 PM, Andy Lutomirski  wrote:
> On Sat, May 20, 2017 at 1:16 PM, Linus Torvalds
>  wrote:
>>
>> The amount of unreadable crap and bugs it requires is not worth the
>> pain. Not for *any* amount of gain, and the gain here is basically
>> zero.
>
> But what if objtool autogenerated the annotations, perhaps with a tiny
> bit of help telling it "hardware frame goes here" or "pt_regs goes
> here"?

You snipped the next part of my email, where I said:

> The *only* acceptable model is automated tools (ie objtool). Don't
> even bother to try to go any other way. Because I will not accept that
> shit.

so yes, objtool parsing things on its own is acceptable (and it had
better not need any help - it already checks frame pointer data).

The CFI annotations needed in asm are horrendous. We used to have
them, and we didn't have even _remotely_ complete annotations and
despite that they were
 (a) wrong
 (b) incomplete
 (c) made the asm impossible to read and even worse to modify.

hjl already posted an example of the kinds of horrors glibc does to do
things "right". And those rabbit ears around "right" are there for a
reason. There's no way that is ever right - even if it gets the right
results, it's an unmaintainable piece of crap.

So no, we're never ever adding that CFI garbage back into the kernel.

A tool that can generate it is ok, but even then we should expect
inevitable bugs and not trust the end result blindly.

Because dwarf info is complex enough that other tools have gotten it
wrong many many times. Just google for "gcc bugzilla cfi" or go to the
gcc bugzilla and search for "DWARF" or whatever. It's not "oh, we once
had a bug". It's constant.

One of the reasons I like the idea of having objtool generate
something *simpler* than dwarf is that it not only is much easier to
parse, it has a much higher likelihood of not having some crazy bug.
If objtool mainly looks at the actual instructions, and perhaps uses
dwarf information as additional input and creates something much
simpler than dwarf, it might have a chance in hell of occasionally
even getting it right.

Because dwarf information is really really complicated. It's
complicated because it contains *way* more information than just how
to find the next stack frame.

I mean, it has basically a RPN interpreter built in, and that's the
_simple_ part.

  Linus


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread H.J. Lu
On Sat, May 20, 2017 at 2:58 PM, Andy Lutomirski  wrote:
> On Sat, May 20, 2017 at 1:01 PM, H.J. Lu  wrote:
>> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  wrote:
>>

 (H.J., could we get a binutils feature that allows is to do:

 pushq %whatever
 .cfi_adjust_sp -8
 ...
 popq %whatever
 .cfi_adjust_sp 8

>>
>> Np.  Compiler needs to generate this.
>>
>
> How would the compiler generate this when inline asm is involved?  For
> the kernel, objtool could get around the need to have these
> annotations, but not so much for user code?  Is the compiler supposed
> to parse the inline asm?  Would the compiler provide some magic % code
> to represent the current CFA base register?

Here is one example of inline asm with call frame info:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD

-- 
H.J.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread H.J. Lu
On Sat, May 20, 2017 at 2:58 PM, Andy Lutomirski  wrote:
> On Sat, May 20, 2017 at 1:01 PM, H.J. Lu  wrote:
>> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  wrote:
>>

 (H.J., could we get a binutils feature that allows is to do:

 pushq %whatever
 .cfi_adjust_sp -8
 ...
 popq %whatever
 .cfi_adjust_sp 8

>>
>> Np.  Compiler needs to generate this.
>>
>
> How would the compiler generate this when inline asm is involved?  For
> the kernel, objtool could get around the need to have these
> annotations, but not so much for user code?  Is the compiler supposed
> to parse the inline asm?  Would the compiler provide some magic % code
> to represent the current CFA base register?

Here is one example of inline asm with call frame info:

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86_64/sigaction.c;h=be058bac436d1cc9794b2b03107676ed99f6b872;hb=HEAD

-- 
H.J.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Andy Lutomirski
On Sat, May 20, 2017 at 1:01 PM, H.J. Lu  wrote:
> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  wrote:
>
>>>
>>> (H.J., could we get a binutils feature that allows is to do:
>>>
>>> pushq %whatever
>>> .cfi_adjust_sp -8
>>> ...
>>> popq %whatever
>>> .cfi_adjust_sp 8
>>>
>
> Np.  Compiler needs to generate this.
>

How would the compiler generate this when inline asm is involved?  For
the kernel, objtool could get around the need to have these
annotations, but not so much for user code?  Is the compiler supposed
to parse the inline asm?  Would the compiler provide some magic % code
to represent the current CFA base register?


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Andy Lutomirski
On Sat, May 20, 2017 at 1:01 PM, H.J. Lu  wrote:
> On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  wrote:
>
>>>
>>> (H.J., could we get a binutils feature that allows is to do:
>>>
>>> pushq %whatever
>>> .cfi_adjust_sp -8
>>> ...
>>> popq %whatever
>>> .cfi_adjust_sp 8
>>>
>
> Np.  Compiler needs to generate this.
>

How would the compiler generate this when inline asm is involved?  For
the kernel, objtool could get around the need to have these
annotations, but not so much for user code?  Is the compiler supposed
to parse the inline asm?  Would the compiler provide some magic % code
to represent the current CFA base register?


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Andy Lutomirski
On Sat, May 20, 2017 at 1:16 PM, Linus Torvalds
 wrote:
> On Fri, May 19, 2017 at 10:23 PM, Andy Lutomirski  wrote:
>>
>> I personally like the idea of using real DWARF annotations in the
>> entry code because it makes gdb work better (not kgdb -- real gdb
>> attached to KVM).  I bet that we could get entry asm annotations into
>> good shape if we really wanted to.  OTOH, getting DWARF to work well
>> for inline asm is really nasty IIRC.
>
> No. I will NAK *any* attempt to make our asm contain the crazy
> shit-for-brains annotations.
>
> Been there, done that, got the T-shirt, and then doused the T-shirt in
> gasoline and put it on fire.
>
> The amount of unreadable crap and bugs it requires is not worth the
> pain. Not for *any* amount of gain, and the gain here is basically
> zero.

But what if objtool autogenerated the annotations, perhaps with a tiny
bit of help telling it "hardware frame goes here" or "pt_regs goes
here"?


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Andy Lutomirski
On Sat, May 20, 2017 at 1:16 PM, Linus Torvalds
 wrote:
> On Fri, May 19, 2017 at 10:23 PM, Andy Lutomirski  wrote:
>>
>> I personally like the idea of using real DWARF annotations in the
>> entry code because it makes gdb work better (not kgdb -- real gdb
>> attached to KVM).  I bet that we could get entry asm annotations into
>> good shape if we really wanted to.  OTOH, getting DWARF to work well
>> for inline asm is really nasty IIRC.
>
> No. I will NAK *any* attempt to make our asm contain the crazy
> shit-for-brains annotations.
>
> Been there, done that, got the T-shirt, and then doused the T-shirt in
> gasoline and put it on fire.
>
> The amount of unreadable crap and bugs it requires is not worth the
> pain. Not for *any* amount of gain, and the gain here is basically
> zero.

But what if objtool autogenerated the annotations, perhaps with a tiny
bit of help telling it "hardware frame goes here" or "pt_regs goes
here"?


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Linus Torvalds
On Fri, May 19, 2017 at 10:23 PM, Andy Lutomirski  wrote:
>
> I personally like the idea of using real DWARF annotations in the
> entry code because it makes gdb work better (not kgdb -- real gdb
> attached to KVM).  I bet that we could get entry asm annotations into
> good shape if we really wanted to.  OTOH, getting DWARF to work well
> for inline asm is really nasty IIRC.

No. I will NAK *any* attempt to make our asm contain the crazy
shit-for-brains annotations.

Been there, done that, got the T-shirt, and then doused the T-shirt in
gasoline and put it on fire.

The amount of unreadable crap and bugs it requires is not worth the
pain. Not for *any* amount of gain, and the gain here is basically
zero.

> (H.J., could we get a binutils feature that allows is to do:
>
> pushq %whatever
> .cfi_adjust_sp -8
> ...
> popq %whatever
> .cfi_adjust_sp 8
>
> that will emit the right DWARF instructions regardless of whether
> there's a frame pointer or not?  .cfi_adjust_cfa_offset is not
> particularly helpful here because it's totally wrong if the CFA is
> currently being computed based on BP.)

Yeah, that's just a small example of the kind of crap people have to deal with.

Not going to happen. Assembler files are hard enough to write (and
read) as-is, anything that expects us to go back to the bad old days
with crazy shit cfi annotations is going to get violently NAK'ed and
vetoed forever.

The *only* acceptable model is automated tools (ie objtool). Don't
even bother to try to go any other way. Because I will not accept that
shit.

 Linus


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Linus Torvalds
On Fri, May 19, 2017 at 10:23 PM, Andy Lutomirski  wrote:
>
> I personally like the idea of using real DWARF annotations in the
> entry code because it makes gdb work better (not kgdb -- real gdb
> attached to KVM).  I bet that we could get entry asm annotations into
> good shape if we really wanted to.  OTOH, getting DWARF to work well
> for inline asm is really nasty IIRC.

No. I will NAK *any* attempt to make our asm contain the crazy
shit-for-brains annotations.

Been there, done that, got the T-shirt, and then doused the T-shirt in
gasoline and put it on fire.

The amount of unreadable crap and bugs it requires is not worth the
pain. Not for *any* amount of gain, and the gain here is basically
zero.

> (H.J., could we get a binutils feature that allows is to do:
>
> pushq %whatever
> .cfi_adjust_sp -8
> ...
> popq %whatever
> .cfi_adjust_sp 8
>
> that will emit the right DWARF instructions regardless of whether
> there's a frame pointer or not?  .cfi_adjust_cfa_offset is not
> particularly helpful here because it's totally wrong if the CFA is
> currently being computed based on BP.)

Yeah, that's just a small example of the kind of crap people have to deal with.

Not going to happen. Assembler files are hard enough to write (and
read) as-is, anything that expects us to go back to the bad old days
with crazy shit cfi annotations is going to get violently NAK'ed and
vetoed forever.

The *only* acceptable model is automated tools (ie objtool). Don't
even bother to try to go any other way. Because I will not accept that
shit.

 Linus


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread H.J. Lu
On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  wrote:

>>
>> (H.J., could we get a binutils feature that allows is to do:
>>
>> pushq %whatever
>> .cfi_adjust_sp -8
>> ...
>> popq %whatever
>> .cfi_adjust_sp 8
>>

Np.  Compiler needs to generate this.

-- 
H.J.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread H.J. Lu
On Sat, May 20, 2017 at 9:20 AM, Josh Poimboeuf  wrote:

>>
>> (H.J., could we get a binutils feature that allows is to do:
>>
>> pushq %whatever
>> .cfi_adjust_sp -8
>> ...
>> popq %whatever
>> .cfi_adjust_sp 8
>>

Np.  Compiler needs to generate this.

-- 
H.J.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Josh Poimboeuf
On Sat, May 20, 2017 at 11:20:34AM -0500, Josh Poimboeuf wrote:
>But then, if we're going that far, why not just have objtool reformat
>the data into something much simpler?  It already has the knowledge
>to do so.  Then we don't have to jump through all those hoops to
>justify jumping through more hoops in the kernel (i.e., having a
>complex DWARF state machine).  With a simple debuginfo format, the
>kernel unwinder is simple enough that we don't need to validate its
>functionality in a simulator.

I should clarify that it doesn't have to be objtool which does this.  It
could instead be a simple DWARF-to-undwarf conversion tool which runs
during the vmlinux linking stage.

Anyway we're both proposing simplifying the DWARF data into an
easier-to-parse format.  I think the question is whether we want that
simplification process to happen in the kernel (in the middle of a
kernel unwind operation), or at build time.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Josh Poimboeuf
On Sat, May 20, 2017 at 11:20:34AM -0500, Josh Poimboeuf wrote:
>But then, if we're going that far, why not just have objtool reformat
>the data into something much simpler?  It already has the knowledge
>to do so.  Then we don't have to jump through all those hoops to
>justify jumping through more hoops in the kernel (i.e., having a
>complex DWARF state machine).  With a simple debuginfo format, the
>kernel unwinder is simple enough that we don't need to validate its
>functionality in a simulator.

I should clarify that it doesn't have to be objtool which does this.  It
could instead be a simple DWARF-to-undwarf conversion tool which runs
during the vmlinux linking stage.

Anyway we're both proposing simplifying the DWARF data into an
easier-to-parse format.  I think the question is whether we want that
simplification process to happen in the kernel (in the middle of a
kernel unwind operation), or at build time.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Josh Poimboeuf
On Fri, May 19, 2017 at 10:23:53PM -0700, Andy Lutomirski wrote:
> On Fri, May 19, 2017 at 2:35 PM, Josh Poimboeuf  wrote:
> > On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote:
> >> > How are you handling control flow?
> >>
> >> Control flow of what?
> >>
> >> > > Here's the struct in its current state:
> >> > >
> >> > >   #define UNDWARF_REG_UNDEFINED   0
> >> > >   #define UNDWARF_REG_CFA 1
> >> > >   #define UNDWARF_REG_SP  2
> >> > >   #define UNDWARF_REG_FP  3
> >> > >   #define UNDWARF_REG_SP_INDIRECT 4
> >> > >   #define UNDWARF_REG_FP_INDIRECT 5
> >> > >   #define UNDWARF_REG_R10 6
> >> > >   #define UNDWARF_REG_DI  7
> >> > >   #define UNDWARF_REG_DX  8
> >> > >
> >> >
> >> > Why only those registers?  Also, if you have the option I would really
> >> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
> >> > si, di, r8-r15 in that order.)
> >>
> >> Those are the only registers which are ever needed as the base for
> >> finding the previous stack frame.  99% of the time it's sp or bp, the
> >> other registers are needed for aligned stacks and entry code.
> >>
> >> Using the actual register numbers isn't an option because I don't need
> >> them all and they need to fit in a small number of bits.
> >>
> >> This construct might be useful for other arches, which is why I called
> >> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)
> >
> > BTW, here's the link to the unwinder code if you're interested:
> >
> >   
> > https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c
> 
> At the risk of potentially overcomplicating matters, here's a
> suggestion.  As far as I know, all (or most all?) unwinders
> effectively do the following in a loop:
> 
> 1. Look up the IP to figure out how to unwind from that IP.
> 2. Use the results of that lookup to compute the previous frame state.
> 
> The results of step 1 could perhaps be expressed like this:
> 
> struct reg_formula {
>   unsigned int source_reg :4;
>   long offset;
>   bool dereference;  /* true: *(reg + offset); false: (reg + offset) */
>   /* For DWARF, I think this can be considerably more complicated, but
> I doubt it's useful. */
> };
> 
> struct unwind_step {
>   u16 available_regs;  /* mask of the caller frame regs that we are
> able to recover */
>   struct reg_formula[16];
> };

Ok, so I assume you mean we would need to have an in-kernel DWARF reader
which reads .eh_frame and converts it to the above, which is called for
every step of the unwind phase.

> The CFA computation is just reg_formula[UNWIND_REG_SP] (or that plus
> or minus sizeof(unsigned long) or whatever -- I can never remember
> exactly what CFA refers to.)  For a frame pointer-based unwinder, the
> entire unwind_step is a foregone conclusion independent of IP: SP = BP
> + 8 (or whatever), BP = *(BP + whatever), all other regs unknown.
> 
> Could it make sense to actually structure the code this way?  I can
> see a few advantages.  It would make the actual meat of the unwind
> loop totally independent of the unwinding algorithm in use,

Yes, this part of it is an interesting idea, separating the debuginfo
reading step from the unwinding step.  And for people who don't want to
carry around megs of DWARF data, get_dwarf_step() could just be a fake
lookup which always returns the frame pointer version.  

> it would
> make the meat be dead simple (and thus easy to verify for
> non-crashiness), and I think it just might open the door for a real
> in-kernel DWARF unwinder that Linus would be okay with.  Specifically,
> write a function like:
> 
> bool get_dwarf_step(struct unwind_step *step, unsigned long ip);

If we keep the frame pointer encoding thing for non-DWARF kernels then
we may also need to pass in bp as well.

Or maybe we could force frame pointer users to at least have DWARF data
for the entry code.  Then even frame pointer kernels could detect entry
regs and we could get rid of that nasty frame pointer encoding thing.

> Put this function in its own file and make it buildable as kernel code
> or as user code.  Write a test case that runs it on every single
> address on the kernel image (in user mode!) with address-sanitizer
> enabled (or in Valgrind or both) and make sure that (a) it doesn't
> blow up and (b) that the results are credible (e.g. by comparing to
> objtool output).  Heck, you could even fuzz-test it where the fuzzer
> is allowed to corrupt the actual DWARF data.  You could do the same
> thing with whatever crazy super-compacted undwarf scheme someone comes
> up with down the road, too.

I think your proposal can be separated into two ideas, which can each be
considered on their own merit:

1) Put .eh_frame in the kernel, along with an in-kernel DWARF unwinder.
   Manage the complexity of the unwinder by validating the output of the
   complex 

Re: [PATCH 7/7] DWARF: add the config option

2017-05-20 Thread Josh Poimboeuf
On Fri, May 19, 2017 at 10:23:53PM -0700, Andy Lutomirski wrote:
> On Fri, May 19, 2017 at 2:35 PM, Josh Poimboeuf  wrote:
> > On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote:
> >> > How are you handling control flow?
> >>
> >> Control flow of what?
> >>
> >> > > Here's the struct in its current state:
> >> > >
> >> > >   #define UNDWARF_REG_UNDEFINED   0
> >> > >   #define UNDWARF_REG_CFA 1
> >> > >   #define UNDWARF_REG_SP  2
> >> > >   #define UNDWARF_REG_FP  3
> >> > >   #define UNDWARF_REG_SP_INDIRECT 4
> >> > >   #define UNDWARF_REG_FP_INDIRECT 5
> >> > >   #define UNDWARF_REG_R10 6
> >> > >   #define UNDWARF_REG_DI  7
> >> > >   #define UNDWARF_REG_DX  8
> >> > >
> >> >
> >> > Why only those registers?  Also, if you have the option I would really
> >> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
> >> > si, di, r8-r15 in that order.)
> >>
> >> Those are the only registers which are ever needed as the base for
> >> finding the previous stack frame.  99% of the time it's sp or bp, the
> >> other registers are needed for aligned stacks and entry code.
> >>
> >> Using the actual register numbers isn't an option because I don't need
> >> them all and they need to fit in a small number of bits.
> >>
> >> This construct might be useful for other arches, which is why I called
> >> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)
> >
> > BTW, here's the link to the unwinder code if you're interested:
> >
> >   
> > https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c
> 
> At the risk of potentially overcomplicating matters, here's a
> suggestion.  As far as I know, all (or most all?) unwinders
> effectively do the following in a loop:
> 
> 1. Look up the IP to figure out how to unwind from that IP.
> 2. Use the results of that lookup to compute the previous frame state.
> 
> The results of step 1 could perhaps be expressed like this:
> 
> struct reg_formula {
>   unsigned int source_reg :4;
>   long offset;
>   bool dereference;  /* true: *(reg + offset); false: (reg + offset) */
>   /* For DWARF, I think this can be considerably more complicated, but
> I doubt it's useful. */
> };
> 
> struct unwind_step {
>   u16 available_regs;  /* mask of the caller frame regs that we are
> able to recover */
>   struct reg_formula[16];
> };

Ok, so I assume you mean we would need to have an in-kernel DWARF reader
which reads .eh_frame and converts it to the above, which is called for
every step of the unwind phase.

> The CFA computation is just reg_formula[UNWIND_REG_SP] (or that plus
> or minus sizeof(unsigned long) or whatever -- I can never remember
> exactly what CFA refers to.)  For a frame pointer-based unwinder, the
> entire unwind_step is a foregone conclusion independent of IP: SP = BP
> + 8 (or whatever), BP = *(BP + whatever), all other regs unknown.
> 
> Could it make sense to actually structure the code this way?  I can
> see a few advantages.  It would make the actual meat of the unwind
> loop totally independent of the unwinding algorithm in use,

Yes, this part of it is an interesting idea, separating the debuginfo
reading step from the unwinding step.  And for people who don't want to
carry around megs of DWARF data, get_dwarf_step() could just be a fake
lookup which always returns the frame pointer version.  

> it would
> make the meat be dead simple (and thus easy to verify for
> non-crashiness), and I think it just might open the door for a real
> in-kernel DWARF unwinder that Linus would be okay with.  Specifically,
> write a function like:
> 
> bool get_dwarf_step(struct unwind_step *step, unsigned long ip);

If we keep the frame pointer encoding thing for non-DWARF kernels then
we may also need to pass in bp as well.

Or maybe we could force frame pointer users to at least have DWARF data
for the entry code.  Then even frame pointer kernels could detect entry
regs and we could get rid of that nasty frame pointer encoding thing.

> Put this function in its own file and make it buildable as kernel code
> or as user code.  Write a test case that runs it on every single
> address on the kernel image (in user mode!) with address-sanitizer
> enabled (or in Valgrind or both) and make sure that (a) it doesn't
> blow up and (b) that the results are credible (e.g. by comparing to
> objtool output).  Heck, you could even fuzz-test it where the fuzzer
> is allowed to corrupt the actual DWARF data.  You could do the same
> thing with whatever crazy super-compacted undwarf scheme someone comes
> up with down the road, too.

I think your proposal can be separated into two ideas, which can each be
considered on their own merit:

1) Put .eh_frame in the kernel, along with an in-kernel DWARF unwinder.
   Manage the complexity of the unwinder by validating the output of the
   complex part of the algorithm 

Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread Andy Lutomirski
On Fri, May 19, 2017 at 2:35 PM, Josh Poimboeuf  wrote:
> On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote:
>> > How are you handling control flow?
>>
>> Control flow of what?
>>
>> > > Here's the struct in its current state:
>> > >
>> > >   #define UNDWARF_REG_UNDEFINED   0
>> > >   #define UNDWARF_REG_CFA 1
>> > >   #define UNDWARF_REG_SP  2
>> > >   #define UNDWARF_REG_FP  3
>> > >   #define UNDWARF_REG_SP_INDIRECT 4
>> > >   #define UNDWARF_REG_FP_INDIRECT 5
>> > >   #define UNDWARF_REG_R10 6
>> > >   #define UNDWARF_REG_DI  7
>> > >   #define UNDWARF_REG_DX  8
>> > >
>> >
>> > Why only those registers?  Also, if you have the option I would really
>> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
>> > si, di, r8-r15 in that order.)
>>
>> Those are the only registers which are ever needed as the base for
>> finding the previous stack frame.  99% of the time it's sp or bp, the
>> other registers are needed for aligned stacks and entry code.
>>
>> Using the actual register numbers isn't an option because I don't need
>> them all and they need to fit in a small number of bits.
>>
>> This construct might be useful for other arches, which is why I called
>> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)
>
> BTW, here's the link to the unwinder code if you're interested:
>
>   
> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c

At the risk of potentially overcomplicating matters, here's a
suggestion.  As far as I know, all (or most all?) unwinders
effectively do the following in a loop:

1. Look up the IP to figure out how to unwind from that IP.
2. Use the results of that lookup to compute the previous frame state.

The results of step 1 could perhaps be expressed like this:

struct reg_formula {
  unsigned int source_reg :4;
  long offset;
  bool dereference;  /* true: *(reg + offset); false: (reg + offset) */
  /* For DWARF, I think this can be considerably more complicated, but
I doubt it's useful. */
};

struct unwind_step {
  u16 available_regs;  /* mask of the caller frame regs that we are
able to recover */
  struct reg_formula[16];
};

The CFA computation is just reg_formula[UNWIND_REG_SP] (or that plus
or minus sizeof(unsigned long) or whatever -- I can never remember
exactly what CFA refers to.)  For a frame pointer-based unwinder, the
entire unwind_step is a foregone conclusion independent of IP: SP = BP
+ 8 (or whatever), BP = *(BP + whatever), all other regs unknown.

Could it make sense to actually structure the code this way?  I can
see a few advantages.  It would make the actual meat of the unwind
loop totally independent of the unwinding algorithm in use, it would
make the meat be dead simple (and thus easy to verify for
non-crashiness), and I think it just might open the door for a real
in-kernel DWARF unwinder that Linus would be okay with.  Specifically,
write a function like:

bool get_dwarf_step(struct unwind_step *step, unsigned long ip);

Put this function in its own file and make it buildable as kernel code
or as user code.  Write a test case that runs it on every single
address on the kernel image (in user mode!) with address-sanitizer
enabled (or in Valgrind or both) and make sure that (a) it doesn't
blow up and (b) that the results are credible (e.g. by comparing to
objtool output).  Heck, you could even fuzz-test it where the fuzzer
is allowed to corrupt the actual DWARF data.  You could do the same
thing with whatever crazy super-compacted undwarf scheme someone comes
up with down the road, too.

I personally like the idea of using real DWARF annotations in the
entry code because it makes gdb work better (not kgdb -- real gdb
attached to KVM).  I bet that we could get entry asm annotations into
good shape if we really wanted to.  OTOH, getting DWARF to work well
for inline asm is really nasty IIRC.

(H.J., could we get a binutils feature that allows is to do:

pushq %whatever
.cfi_adjust_sp -8
...
popq %whatever
.cfi_adjust_sp 8

that will emit the right DWARF instructions regardless of whether
there's a frame pointer or not?  .cfi_adjust_cfa_offset is not
particularly helpful here because it's totally wrong if the CFA is
currently being computed based on BP.)


Also, you read the stack like this:

*val = *(unsigned long *)addr;

how about probe_kernel_read() instead?

--Andy


Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread Andy Lutomirski
On Fri, May 19, 2017 at 2:35 PM, Josh Poimboeuf  wrote:
> On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote:
>> > How are you handling control flow?
>>
>> Control flow of what?
>>
>> > > Here's the struct in its current state:
>> > >
>> > >   #define UNDWARF_REG_UNDEFINED   0
>> > >   #define UNDWARF_REG_CFA 1
>> > >   #define UNDWARF_REG_SP  2
>> > >   #define UNDWARF_REG_FP  3
>> > >   #define UNDWARF_REG_SP_INDIRECT 4
>> > >   #define UNDWARF_REG_FP_INDIRECT 5
>> > >   #define UNDWARF_REG_R10 6
>> > >   #define UNDWARF_REG_DI  7
>> > >   #define UNDWARF_REG_DX  8
>> > >
>> >
>> > Why only those registers?  Also, if you have the option I would really
>> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
>> > si, di, r8-r15 in that order.)
>>
>> Those are the only registers which are ever needed as the base for
>> finding the previous stack frame.  99% of the time it's sp or bp, the
>> other registers are needed for aligned stacks and entry code.
>>
>> Using the actual register numbers isn't an option because I don't need
>> them all and they need to fit in a small number of bits.
>>
>> This construct might be useful for other arches, which is why I called
>> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)
>
> BTW, here's the link to the unwinder code if you're interested:
>
>   
> https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c

At the risk of potentially overcomplicating matters, here's a
suggestion.  As far as I know, all (or most all?) unwinders
effectively do the following in a loop:

1. Look up the IP to figure out how to unwind from that IP.
2. Use the results of that lookup to compute the previous frame state.

The results of step 1 could perhaps be expressed like this:

struct reg_formula {
  unsigned int source_reg :4;
  long offset;
  bool dereference;  /* true: *(reg + offset); false: (reg + offset) */
  /* For DWARF, I think this can be considerably more complicated, but
I doubt it's useful. */
};

struct unwind_step {
  u16 available_regs;  /* mask of the caller frame regs that we are
able to recover */
  struct reg_formula[16];
};

The CFA computation is just reg_formula[UNWIND_REG_SP] (or that plus
or minus sizeof(unsigned long) or whatever -- I can never remember
exactly what CFA refers to.)  For a frame pointer-based unwinder, the
entire unwind_step is a foregone conclusion independent of IP: SP = BP
+ 8 (or whatever), BP = *(BP + whatever), all other regs unknown.

Could it make sense to actually structure the code this way?  I can
see a few advantages.  It would make the actual meat of the unwind
loop totally independent of the unwinding algorithm in use, it would
make the meat be dead simple (and thus easy to verify for
non-crashiness), and I think it just might open the door for a real
in-kernel DWARF unwinder that Linus would be okay with.  Specifically,
write a function like:

bool get_dwarf_step(struct unwind_step *step, unsigned long ip);

Put this function in its own file and make it buildable as kernel code
or as user code.  Write a test case that runs it on every single
address on the kernel image (in user mode!) with address-sanitizer
enabled (or in Valgrind or both) and make sure that (a) it doesn't
blow up and (b) that the results are credible (e.g. by comparing to
objtool output).  Heck, you could even fuzz-test it where the fuzzer
is allowed to corrupt the actual DWARF data.  You could do the same
thing with whatever crazy super-compacted undwarf scheme someone comes
up with down the road, too.

I personally like the idea of using real DWARF annotations in the
entry code because it makes gdb work better (not kgdb -- real gdb
attached to KVM).  I bet that we could get entry asm annotations into
good shape if we really wanted to.  OTOH, getting DWARF to work well
for inline asm is really nasty IIRC.

(H.J., could we get a binutils feature that allows is to do:

pushq %whatever
.cfi_adjust_sp -8
...
popq %whatever
.cfi_adjust_sp 8

that will emit the right DWARF instructions regardless of whether
there's a frame pointer or not?  .cfi_adjust_cfa_offset is not
particularly helpful here because it's totally wrong if the CFA is
currently being computed based on BP.)


Also, you read the stack like this:

*val = *(unsigned long *)addr;

how about probe_kernel_read() instead?

--Andy


Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread Josh Poimboeuf
On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote:
> > How are you handling control flow?
> 
> Control flow of what?
> 
> > > Here's the struct in its current state:
> > > 
> > >   #define UNDWARF_REG_UNDEFINED   0
> > >   #define UNDWARF_REG_CFA 1
> > >   #define UNDWARF_REG_SP  2
> > >   #define UNDWARF_REG_FP  3
> > >   #define UNDWARF_REG_SP_INDIRECT 4
> > >   #define UNDWARF_REG_FP_INDIRECT 5
> > >   #define UNDWARF_REG_R10 6
> > >   #define UNDWARF_REG_DI  7
> > >   #define UNDWARF_REG_DX  8
> > > 
> > 
> > Why only those registers?  Also, if you have the option I would really
> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
> > si, di, r8-r15 in that order.)
> 
> Those are the only registers which are ever needed as the base for
> finding the previous stack frame.  99% of the time it's sp or bp, the
> other registers are needed for aligned stacks and entry code.
> 
> Using the actual register numbers isn't an option because I don't need
> them all and they need to fit in a small number of bits.
> 
> This construct might be useful for other arches, which is why I called
> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)

BTW, here's the link to the unwinder code if you're interested:

  
https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread Josh Poimboeuf
On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote:
> > How are you handling control flow?
> 
> Control flow of what?
> 
> > > Here's the struct in its current state:
> > > 
> > >   #define UNDWARF_REG_UNDEFINED   0
> > >   #define UNDWARF_REG_CFA 1
> > >   #define UNDWARF_REG_SP  2
> > >   #define UNDWARF_REG_FP  3
> > >   #define UNDWARF_REG_SP_INDIRECT 4
> > >   #define UNDWARF_REG_FP_INDIRECT 5
> > >   #define UNDWARF_REG_R10 6
> > >   #define UNDWARF_REG_DI  7
> > >   #define UNDWARF_REG_DX  8
> > > 
> > 
> > Why only those registers?  Also, if you have the option I would really
> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
> > si, di, r8-r15 in that order.)
> 
> Those are the only registers which are ever needed as the base for
> finding the previous stack frame.  99% of the time it's sp or bp, the
> other registers are needed for aligned stacks and entry code.
> 
> Using the actual register numbers isn't an option because I don't need
> them all and they need to fit in a small number of bits.
> 
> This construct might be useful for other arches, which is why I called
> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)

BTW, here's the link to the unwinder code if you're interested:

  
https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread Josh Poimboeuf
> How are you handling control flow?

Control flow of what?

> > Here's the struct in its current state:
> > 
> > #define UNDWARF_REG_UNDEFINED   0
> > #define UNDWARF_REG_CFA 1
> > #define UNDWARF_REG_SP  2
> > #define UNDWARF_REG_FP  3
> > #define UNDWARF_REG_SP_INDIRECT 4
> > #define UNDWARF_REG_FP_INDIRECT 5
> > #define UNDWARF_REG_R10 6
> > #define UNDWARF_REG_DI  7
> > #define UNDWARF_REG_DX  8
> > 
> 
> Why only those registers?  Also, if you have the option I would really
> suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
> si, di, r8-r15 in that order.)

Those are the only registers which are ever needed as the base for
finding the previous stack frame.  99% of the time it's sp or bp, the
other registers are needed for aligned stacks and entry code.

Using the actual register numbers isn't an option because I don't need
them all and they need to fit in a small number of bits.

This construct might be useful for other arches, which is why I called
it "FP" instead of "BP".  But then I ruined that with the last 3 :-)

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread Josh Poimboeuf
> How are you handling control flow?

Control flow of what?

> > Here's the struct in its current state:
> > 
> > #define UNDWARF_REG_UNDEFINED   0
> > #define UNDWARF_REG_CFA 1
> > #define UNDWARF_REG_SP  2
> > #define UNDWARF_REG_FP  3
> > #define UNDWARF_REG_SP_INDIRECT 4
> > #define UNDWARF_REG_FP_INDIRECT 5
> > #define UNDWARF_REG_R10 6
> > #define UNDWARF_REG_DI  7
> > #define UNDWARF_REG_DX  8
> > 
> 
> Why only those registers?  Also, if you have the option I would really
> suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
> si, di, r8-r15 in that order.)

Those are the only registers which are ever needed as the base for
finding the previous stack frame.  99% of the time it's sp or bp, the
other registers are needed for aligned stacks and entry code.

Using the actual register numbers isn't an option because I don't need
them all and they need to fit in a small number of bits.

This construct might be useful for other arches, which is why I called
it "FP" instead of "BP".  But then I ruined that with the last 3 :-)

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread H. Peter Anvin
On 05/19/17 13:53, Josh Poimboeuf wrote:
> 
> Here's the struct in its current state:
> 
>   #define UNDWARF_REG_UNDEFINED   0
>   #define UNDWARF_REG_CFA 1
>   #define UNDWARF_REG_SP  2
>   #define UNDWARF_REG_FP  3
>   #define UNDWARF_REG_SP_INDIRECT 4
>   #define UNDWARF_REG_FP_INDIRECT 5
>   #define UNDWARF_REG_R10 6
>   #define UNDWARF_REG_DI  7
>   #define UNDWARF_REG_DX  8
> 

Why only those registers?  Also, if you have the option I would really
suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
si, di, r8-r15 in that order.)

-hpa



Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread H. Peter Anvin
On 05/19/17 13:53, Josh Poimboeuf wrote:
> 
> Here's the struct in its current state:
> 
>   #define UNDWARF_REG_UNDEFINED   0
>   #define UNDWARF_REG_CFA 1
>   #define UNDWARF_REG_SP  2
>   #define UNDWARF_REG_FP  3
>   #define UNDWARF_REG_SP_INDIRECT 4
>   #define UNDWARF_REG_FP_INDIRECT 5
>   #define UNDWARF_REG_R10 6
>   #define UNDWARF_REG_DI  7
>   #define UNDWARF_REG_DX  8
> 

Why only those registers?  Also, if you have the option I would really
suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
si, di, r8-r15 in that order.)

-hpa



Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread H. Peter Anvin
On 05/19/17 13:53, Josh Poimboeuf wrote:
>>
>> One instance of the structure would exist for each time the stack
>> pointer changes, e.g. for every function entry, push/pop, and rsp
>> add/subtract.  The data could be assembled and sorted offline, possibly
>> derived from DWARF, or more likely, generated by objtool.  After doing
>> some rough calculations, I think the section size would be comparable to
>> the sizes of the DWARF .eh_frame sections it would replace.
>>
>> If it worked, the "undwarf" unwinder would be a lot simpler than a real
>> DWARF unwinder.  And validating the sanity of the data at runtime would
>> be a lot more straightforward.  It could ensure that each stack pointer
>> is within the bounds of the current stack, like our current unwinder
>> does.
> 
> I've been hacking away at this, and so far it's working well.  The code
> is much simpler than a DWARF unwinder.  Right now the kernel piece is
> only ~350 lines of code.  The vast majority of the changes are in
> objtool.
> 
> It's now successfully unwinding through entry code and most other asm
> files, dumping entry regs, dealing with aligned stacks, dynamic stacks,
> etc.
> 
> Here's the struct in its current state:

How are you handling control flow?

-hpa



Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread H. Peter Anvin
On 05/19/17 13:53, Josh Poimboeuf wrote:
>>
>> One instance of the structure would exist for each time the stack
>> pointer changes, e.g. for every function entry, push/pop, and rsp
>> add/subtract.  The data could be assembled and sorted offline, possibly
>> derived from DWARF, or more likely, generated by objtool.  After doing
>> some rough calculations, I think the section size would be comparable to
>> the sizes of the DWARF .eh_frame sections it would replace.
>>
>> If it worked, the "undwarf" unwinder would be a lot simpler than a real
>> DWARF unwinder.  And validating the sanity of the data at runtime would
>> be a lot more straightforward.  It could ensure that each stack pointer
>> is within the bounds of the current stack, like our current unwinder
>> does.
> 
> I've been hacking away at this, and so far it's working well.  The code
> is much simpler than a DWARF unwinder.  Right now the kernel piece is
> only ~350 lines of code.  The vast majority of the changes are in
> objtool.
> 
> It's now successfully unwinding through entry code and most other asm
> files, dumping entry regs, dealing with aligned stacks, dynamic stacks,
> etc.
> 
> Here's the struct in its current state:

How are you handling control flow?

-hpa



Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread Josh Poimboeuf
On Sun, May 07, 2017 at 11:55:24AM -0500, Josh Poimboeuf wrote:
> I'm thinking/hoping that information can be expressed in a simple, easy
> to parse, reasonably sized data structure.  Something like a sorted
> array of this:
> 
>   struct undwarf {
>   unsigned int ip;/* instruction pointer (relative offset 
> from base) */
>   unsigned prev_frame:13; /* offset to previous frame from 
> current stack pointer */
>   unsigned regs:1;/* whether prev_frame contains entry 
> regs (regs->ip) */
>   unsigned align:2;   /* some details for dealing with gcc 
> stack realignment */
>   } __packed;
> 
>   extern struct undwarf undwarves[];
> 
> One instance of the structure would exist for each time the stack
> pointer changes, e.g. for every function entry, push/pop, and rsp
> add/subtract.  The data could be assembled and sorted offline, possibly
> derived from DWARF, or more likely, generated by objtool.  After doing
> some rough calculations, I think the section size would be comparable to
> the sizes of the DWARF .eh_frame sections it would replace.
> 
> If it worked, the "undwarf" unwinder would be a lot simpler than a real
> DWARF unwinder.  And validating the sanity of the data at runtime would
> be a lot more straightforward.  It could ensure that each stack pointer
> is within the bounds of the current stack, like our current unwinder
> does.

I've been hacking away at this, and so far it's working well.  The code
is much simpler than a DWARF unwinder.  Right now the kernel piece is
only ~350 lines of code.  The vast majority of the changes are in
objtool.

It's now successfully unwinding through entry code and most other asm
files, dumping entry regs, dealing with aligned stacks, dynamic stacks,
etc.

Here's the struct in its current state:

#define UNDWARF_REG_UNDEFINED   0
#define UNDWARF_REG_CFA 1
#define UNDWARF_REG_SP  2
#define UNDWARF_REG_FP  3
#define UNDWARF_REG_SP_INDIRECT 4
#define UNDWARF_REG_FP_INDIRECT 5
#define UNDWARF_REG_R10 6
#define UNDWARF_REG_DI  7
#define UNDWARF_REG_DX  8

#define UNDWARF_TYPE_NORMAL 0
#define UNDWARF_TYPE_REGS   1
#define UNDWARF_TYPE_REGS_IRET  2

struct undwarf_state {
int ip;
unsigned int len;
short cfa_offset;
short fp_offset;
unsigned cfa_reg:4;
unsigned fp_reg:4;
unsigned type:2;
};

With frame pointers disabled, around 300,000 of those structs are needed
for my kernel, which works out to be 4.7M of data.  By comparison, the
DWARF eh_frame sections would be 2.1M.  I think we should be able to
compress it down to a comparable size by rearranging the data a little
bit.

The entry code needs some annotations to give some hints to objtool
about how to generate the data, but it's not bad:

  
https://paste.fedoraproject.org/paste/Xq3bPlx5An0Si7AshZTdkF5M1UNdIGYhyRLivL9gydE=

I still have a lot of work to do on the tooling side: module support,
sorting the undwarf table at build time, and a lot of cleanups.  But
overall it's looking feasible.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-19 Thread Josh Poimboeuf
On Sun, May 07, 2017 at 11:55:24AM -0500, Josh Poimboeuf wrote:
> I'm thinking/hoping that information can be expressed in a simple, easy
> to parse, reasonably sized data structure.  Something like a sorted
> array of this:
> 
>   struct undwarf {
>   unsigned int ip;/* instruction pointer (relative offset 
> from base) */
>   unsigned prev_frame:13; /* offset to previous frame from 
> current stack pointer */
>   unsigned regs:1;/* whether prev_frame contains entry 
> regs (regs->ip) */
>   unsigned align:2;   /* some details for dealing with gcc 
> stack realignment */
>   } __packed;
> 
>   extern struct undwarf undwarves[];
> 
> One instance of the structure would exist for each time the stack
> pointer changes, e.g. for every function entry, push/pop, and rsp
> add/subtract.  The data could be assembled and sorted offline, possibly
> derived from DWARF, or more likely, generated by objtool.  After doing
> some rough calculations, I think the section size would be comparable to
> the sizes of the DWARF .eh_frame sections it would replace.
> 
> If it worked, the "undwarf" unwinder would be a lot simpler than a real
> DWARF unwinder.  And validating the sanity of the data at runtime would
> be a lot more straightforward.  It could ensure that each stack pointer
> is within the bounds of the current stack, like our current unwinder
> does.

I've been hacking away at this, and so far it's working well.  The code
is much simpler than a DWARF unwinder.  Right now the kernel piece is
only ~350 lines of code.  The vast majority of the changes are in
objtool.

It's now successfully unwinding through entry code and most other asm
files, dumping entry regs, dealing with aligned stacks, dynamic stacks,
etc.

Here's the struct in its current state:

#define UNDWARF_REG_UNDEFINED   0
#define UNDWARF_REG_CFA 1
#define UNDWARF_REG_SP  2
#define UNDWARF_REG_FP  3
#define UNDWARF_REG_SP_INDIRECT 4
#define UNDWARF_REG_FP_INDIRECT 5
#define UNDWARF_REG_R10 6
#define UNDWARF_REG_DI  7
#define UNDWARF_REG_DX  8

#define UNDWARF_TYPE_NORMAL 0
#define UNDWARF_TYPE_REGS   1
#define UNDWARF_TYPE_REGS_IRET  2

struct undwarf_state {
int ip;
unsigned int len;
short cfa_offset;
short fp_offset;
unsigned cfa_reg:4;
unsigned fp_reg:4;
unsigned type:2;
};

With frame pointers disabled, around 300,000 of those structs are needed
for my kernel, which works out to be 4.7M of data.  By comparison, the
DWARF eh_frame sections would be 2.1M.  I think we should be able to
compress it down to a comparable size by rearranging the data a little
bit.

The entry code needs some annotations to give some hints to objtool
about how to generate the data, but it's not bad:

  
https://paste.fedoraproject.org/paste/Xq3bPlx5An0Si7AshZTdkF5M1UNdIGYhyRLivL9gydE=

I still have a lot of work to do on the tooling side: module support,
sorting the undwarf table at build time, and a lot of cleanups.  But
overall it's looking feasible.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Linus Torvalds
On Wed, May 10, 2017 at 12:39 AM, Jiri Slaby  wrote:
>
> Every SUSE user has been using this for almost a decade and we are not
> about to switch to FP for performance reasons as noted by Jiri Kosina.

The whole "not about to switch on frame pointers" argument is bogus.

Lots of people don't have frame pointers. The tracebacks don't look
all that horrible despite that.

If the problem is that some debug option that you want to use do that
"select FRAME_POINTER" thing, then maybe we should just fix that.

For example, I think it's annoying that the LATENCYTOP helper config
option basically forces frame pointers. That just seems stupid. Even
enabling CONFIG_STACKTRACE doesn't do that.

We do fine without frame pointers. Do traces get a bit uglier? Sure.
But that's not a huge deal.

> Well, reliable stack-traces with minimal performance impact thanks to
> out-of-band data is hell good reason in my opinion.

It's not the performance impact.

It's the other crap.

It's the fact that you have a whole state machine that isn't even
used. The only reason for that state machine is for register contents,
but then the register contents aren't actually used by the stack trace
code afaik.

Yeah, in theory that register engine might be used for dynamic stack
sizes too, but I don't think gcc actually generates code like that -
it uses frame pointers for variable-sized stacks, doesn't it?

But historically, it's even more the "oops, the unwind tables are
buggy because the test coverage is horrible, and we walked off into
the weeds while walking them, taking a recursive page fault, which
turned a WARN_ON() into a dead machine that didn't even give us the
information we wanted in the first place".

Now, it may be that with tools like objtool, we might be able to
*validate* the tables, which might actually be a good safety net.

So I'm not saying that the unwinder is a total no-go.

But I want to get rid of these red herring arguments in its favor.

If the argument *for* the unwinder i scrazy bullshit (eg "I want to
use LATENCYTOP without CONFIG_FRAME_POINTER"), then we should fix
those things independently.

>> The fact that it gets disabled for KASAN also makes me suspicious. It
>> basically means that now all the accesses it does are not even
>> validated.
>
> OK, I inclined to disable KASAN when I started cleaning this up for
> _performance_ reasons. The system was so slow, that the RCU stall or
> soft-lockup detectors came up complaining. From that time, I measured
> the bottlenecks and optimized the unwinder so that 1000 iterations of
> unwinding takes:
>
> Before:
> real0m1.808s
> user0m0.001s
> sys 0m1.807s
>
> After:
> real0m0.018s
> user0m0.001s
> sys 0m0.017s
>
> So let me check, whether KASAN still has to be disabled globally. I do
> not think so.
>
> OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as
> holds now for the rest of the current unwinding:
> KASAN_SANITIZE_dumpstack.o  := n
> KASAN_SANITIZE_dumpstack_$(BITS).o  := n
> KASAN_SANITIZE_stacktrace.o := n

That's fine. But if the unwinder means no KASAN at all, then I don't
think the unwinder is good.

> Now we have objtool. My objtool clone:
> 1) verifies the DWARF data (prepared by Josh)
>
> 2) generates DWARF data for assembly -- incomplete yet: see the thread
> about x86 assembly cleanup which is a pre-requisite for this to work.
> This is BTW the reason why the DWARF unwinder is default-off in this
> series yet.
>
> And we can add:
> 3) fix up the data, if they are wrong

Yes. objtool might make the unwinder acceptable.

One of the things that caused the old unwinder to be absolutely
incredible *crap* was how it turned assembly language (both inline and
separate .S files) from a useful thing to absolutely horrible line
noise that was illegible and unmaintainable, and likely to be buggy to
boot.

So we may be in a different situation that we used to. But still..

  Linus


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Linus Torvalds
On Wed, May 10, 2017 at 12:39 AM, Jiri Slaby  wrote:
>
> Every SUSE user has been using this for almost a decade and we are not
> about to switch to FP for performance reasons as noted by Jiri Kosina.

The whole "not about to switch on frame pointers" argument is bogus.

Lots of people don't have frame pointers. The tracebacks don't look
all that horrible despite that.

If the problem is that some debug option that you want to use do that
"select FRAME_POINTER" thing, then maybe we should just fix that.

For example, I think it's annoying that the LATENCYTOP helper config
option basically forces frame pointers. That just seems stupid. Even
enabling CONFIG_STACKTRACE doesn't do that.

We do fine without frame pointers. Do traces get a bit uglier? Sure.
But that's not a huge deal.

> Well, reliable stack-traces with minimal performance impact thanks to
> out-of-band data is hell good reason in my opinion.

It's not the performance impact.

It's the other crap.

It's the fact that you have a whole state machine that isn't even
used. The only reason for that state machine is for register contents,
but then the register contents aren't actually used by the stack trace
code afaik.

Yeah, in theory that register engine might be used for dynamic stack
sizes too, but I don't think gcc actually generates code like that -
it uses frame pointers for variable-sized stacks, doesn't it?

But historically, it's even more the "oops, the unwind tables are
buggy because the test coverage is horrible, and we walked off into
the weeds while walking them, taking a recursive page fault, which
turned a WARN_ON() into a dead machine that didn't even give us the
information we wanted in the first place".

Now, it may be that with tools like objtool, we might be able to
*validate* the tables, which might actually be a good safety net.

So I'm not saying that the unwinder is a total no-go.

But I want to get rid of these red herring arguments in its favor.

If the argument *for* the unwinder i scrazy bullshit (eg "I want to
use LATENCYTOP without CONFIG_FRAME_POINTER"), then we should fix
those things independently.

>> The fact that it gets disabled for KASAN also makes me suspicious. It
>> basically means that now all the accesses it does are not even
>> validated.
>
> OK, I inclined to disable KASAN when I started cleaning this up for
> _performance_ reasons. The system was so slow, that the RCU stall or
> soft-lockup detectors came up complaining. From that time, I measured
> the bottlenecks and optimized the unwinder so that 1000 iterations of
> unwinding takes:
>
> Before:
> real0m1.808s
> user0m0.001s
> sys 0m1.807s
>
> After:
> real0m0.018s
> user0m0.001s
> sys 0m0.017s
>
> So let me check, whether KASAN still has to be disabled globally. I do
> not think so.
>
> OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as
> holds now for the rest of the current unwinding:
> KASAN_SANITIZE_dumpstack.o  := n
> KASAN_SANITIZE_dumpstack_$(BITS).o  := n
> KASAN_SANITIZE_stacktrace.o := n

That's fine. But if the unwinder means no KASAN at all, then I don't
think the unwinder is good.

> Now we have objtool. My objtool clone:
> 1) verifies the DWARF data (prepared by Josh)
>
> 2) generates DWARF data for assembly -- incomplete yet: see the thread
> about x86 assembly cleanup which is a pre-requisite for this to work.
> This is BTW the reason why the DWARF unwinder is default-off in this
> series yet.
>
> And we can add:
> 3) fix up the data, if they are wrong

Yes. objtool might make the unwinder acceptable.

One of the things that caused the old unwinder to be absolutely
incredible *crap* was how it turned assembly language (both inline and
separate .S files) from a useful thing to absolutely horrible line
noise that was illegible and unmaintainable, and likely to be buggy to
boot.

So we may be in a different situation that we used to. But still..

  Linus


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread H.J. Lu
On Wed, May 10, 2017 at 6:09 AM, Josh Poimboeuf  wrote:
> On Wed, May 10, 2017 at 10:15:09AM +0200, Jiri Slaby wrote:
>> > I'm, ahem, highly skeptical to creating our own unwinding data
>> > format unless there is *documented, supported, and tested* way to
>> > force the compiler to *automatically* fall back to frame pointers
>> > any time there may be complexity involved,
>>
>> I second this. Inventing a new format like this mostly ends up with
>> using the standard one after several iterations. One cannot think of all
>> the consequences and needs while proposing a new one.
>>
>> The memory footprint is ~2M for average vmlinux. And people who need to
>> access:
>> * either need it frequently -- those do not need performance (LOCKDEP,
>> KASAN, or other debug builds)
>> * or are in the middle of WARNING, BUG, crash, panic or such and this is
>> not that often...
>>
>> And we would need *both*. The limited proprietary one in some sort of
>> .kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash,
>> gdb and so on.
>
> I don't think so.  DWARF CFI is optimized for size.  My proposal is to
> take the same data (or some subset of it) and reformat it to optimize
> for simplicity.
>
> If, for some reason, we ended up needing *all* the original DWARF data,
> we could still have it in the simpler format.  In that case it might end
> up being 8M instead of 2M :-) But I don't see that being possible.

There is a compact EH for MIPS:

https://github.com/itanium-cxx-abi/cxx-abi/blob/master/MIPSCompactEH.pdf

It can be extended to other targets.

-- 
H.J.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread H.J. Lu
On Wed, May 10, 2017 at 6:09 AM, Josh Poimboeuf  wrote:
> On Wed, May 10, 2017 at 10:15:09AM +0200, Jiri Slaby wrote:
>> > I'm, ahem, highly skeptical to creating our own unwinding data
>> > format unless there is *documented, supported, and tested* way to
>> > force the compiler to *automatically* fall back to frame pointers
>> > any time there may be complexity involved,
>>
>> I second this. Inventing a new format like this mostly ends up with
>> using the standard one after several iterations. One cannot think of all
>> the consequences and needs while proposing a new one.
>>
>> The memory footprint is ~2M for average vmlinux. And people who need to
>> access:
>> * either need it frequently -- those do not need performance (LOCKDEP,
>> KASAN, or other debug builds)
>> * or are in the middle of WARNING, BUG, crash, panic or such and this is
>> not that often...
>>
>> And we would need *both*. The limited proprietary one in some sort of
>> .kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash,
>> gdb and so on.
>
> I don't think so.  DWARF CFI is optimized for size.  My proposal is to
> take the same data (or some subset of it) and reformat it to optimize
> for simplicity.
>
> If, for some reason, we ended up needing *all* the original DWARF data,
> we could still have it in the simpler format.  In that case it might end
> up being 8M instead of 2M :-) But I don't see that being possible.

There is a compact EH for MIPS:

https://github.com/itanium-cxx-abi/cxx-abi/blob/master/MIPSCompactEH.pdf

It can be extended to other targets.

-- 
H.J.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Josh Poimboeuf
On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote:
> On 05/09/2017, 09:22 PM, Josh Poimboeuf wrote:
> > On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote:
> >> On Sun, 7 May 2017, Josh Poimboeuf wrote:
> >>
> >>> DWARF is great for debuggers.  It helps you find all the registers on 
> >>> the stack, so you can see function arguments and local variables.  All 
> >>> expressed in a nice compact format.
> >>>
> >>> But that's overkill for unwinders.  We don't need all those registers,
> >>> and the state machine is too complicated.  
> >>
> >> OTOH if we make the failures in processing of those "auxiliary" 
> >> information non-fatal (in a sense that it neither causes kernel bug nor 
> >> does it actually corrupt the unwinding process, but the only effect is 
> >> losing "optional" information), having this data available doesn't hurt. 
> > 
> > But it does hurt, in the sense that the complicated format of DWARF CFI
> > means the unwinder has to jump through a lot more hoops to read it.
> 
> Why that matters, actually? Unwinder is nothing to be performance
> oriented. And if somebody is doing a lot of unwinding during runtime,
> they can switch to in-this-case-faster FP unwinder.

More complexity == more bugs.

> > And anyway, fixing the correctness of the DWARF data is only half the
> > problem IMO.  The other half of the problem is unwinder complexity.
> 
> Complex, but generic and working. IMO, it would be rather though to come
> up with some tool working on different compilers or even different
> versions of gcc. I mean some tool to convert the DWARF data to something
> proprietary. The conversion would be as complex as is the unwinder plus
> conversion to the proprietary format and its dump into ELF. We would
> still rely on a (now out-of-kernel-runtime-code) complex monolith to do
> it right.

Complexity outside of the kernel is infinitely better than complexity in
mission critical oops code.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Josh Poimboeuf
On Wed, May 10, 2017 at 10:32:06AM +0200, Jiri Slaby wrote:
> On 05/09/2017, 09:22 PM, Josh Poimboeuf wrote:
> > On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote:
> >> On Sun, 7 May 2017, Josh Poimboeuf wrote:
> >>
> >>> DWARF is great for debuggers.  It helps you find all the registers on 
> >>> the stack, so you can see function arguments and local variables.  All 
> >>> expressed in a nice compact format.
> >>>
> >>> But that's overkill for unwinders.  We don't need all those registers,
> >>> and the state machine is too complicated.  
> >>
> >> OTOH if we make the failures in processing of those "auxiliary" 
> >> information non-fatal (in a sense that it neither causes kernel bug nor 
> >> does it actually corrupt the unwinding process, but the only effect is 
> >> losing "optional" information), having this data available doesn't hurt. 
> > 
> > But it does hurt, in the sense that the complicated format of DWARF CFI
> > means the unwinder has to jump through a lot more hoops to read it.
> 
> Why that matters, actually? Unwinder is nothing to be performance
> oriented. And if somebody is doing a lot of unwinding during runtime,
> they can switch to in-this-case-faster FP unwinder.

More complexity == more bugs.

> > And anyway, fixing the correctness of the DWARF data is only half the
> > problem IMO.  The other half of the problem is unwinder complexity.
> 
> Complex, but generic and working. IMO, it would be rather though to come
> up with some tool working on different compilers or even different
> versions of gcc. I mean some tool to convert the DWARF data to something
> proprietary. The conversion would be as complex as is the unwinder plus
> conversion to the proprietary format and its dump into ELF. We would
> still rely on a (now out-of-kernel-runtime-code) complex monolith to do
> it right.

Complexity outside of the kernel is infinitely better than complexity in
mission critical oops code.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Josh Poimboeuf
On Wed, May 10, 2017 at 10:15:09AM +0200, Jiri Slaby wrote:
> > I'm, ahem, highly skeptical to creating our own unwinding data
> > format unless there is *documented, supported, and tested* way to
> > force the compiler to *automatically* fall back to frame pointers
> > any time there may be complexity involved,
> 
> I second this. Inventing a new format like this mostly ends up with
> using the standard one after several iterations. One cannot think of all
> the consequences and needs while proposing a new one.
> 
> The memory footprint is ~2M for average vmlinux. And people who need to
> access:
> * either need it frequently -- those do not need performance (LOCKDEP,
> KASAN, or other debug builds)
> * or are in the middle of WARNING, BUG, crash, panic or such and this is
> not that often...
> 
> And we would need *both*. The limited proprietary one in some sort of
> .kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash,
> gdb and so on.

I don't think so.  DWARF CFI is optimized for size.  My proposal is to
take the same data (or some subset of it) and reformat it to optimize
for simplicity.

If, for some reason, we ended up needing *all* the original DWARF data,
we could still have it in the simpler format.  In that case it might end
up being 8M instead of 2M :-) But I don't see that being possible.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Josh Poimboeuf
On Wed, May 10, 2017 at 10:15:09AM +0200, Jiri Slaby wrote:
> > I'm, ahem, highly skeptical to creating our own unwinding data
> > format unless there is *documented, supported, and tested* way to
> > force the compiler to *automatically* fall back to frame pointers
> > any time there may be complexity involved,
> 
> I second this. Inventing a new format like this mostly ends up with
> using the standard one after several iterations. One cannot think of all
> the consequences and needs while proposing a new one.
> 
> The memory footprint is ~2M for average vmlinux. And people who need to
> access:
> * either need it frequently -- those do not need performance (LOCKDEP,
> KASAN, or other debug builds)
> * or are in the middle of WARNING, BUG, crash, panic or such and this is
> not that often...
> 
> And we would need *both*. The limited proprietary one in some sort of
> .kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash,
> gdb and so on.

I don't think so.  DWARF CFI is optimized for size.  My proposal is to
take the same data (or some subset of it) and reformat it to optimize
for simplicity.

If, for some reason, we ended up needing *all* the original DWARF data,
we could still have it in the simpler format.  In that case it might end
up being 8M instead of 2M :-) But I don't see that being possible.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Jiri Slaby
On 05/10/2017, 02:42 PM, Josh Poimboeuf wrote:
> On Wed, May 10, 2017 at 09:39:39AM +0200, Jiri Slaby wrote:
>> OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as
>> holds now for the rest of the current unwinding:
>> KASAN_SANITIZE_dumpstack.o  := n
>> KASAN_SANITIZE_dumpstack_$(BITS).o  := n
>> KASAN_SANITIZE_stacktrace.o := n
> 
> Most of the unwinder code is now in unwind_frame.c, which *does* have
> KASAN enabled.
> 
> I think the above is leftover from the days before I rewrote the
> unwinder.  I'll look at renabling KASAN for those files.

Ok, fair enough.

I will do some measurements in the FP field while I will be at it.

thanks,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Jiri Slaby
On 05/10/2017, 02:42 PM, Josh Poimboeuf wrote:
> On Wed, May 10, 2017 at 09:39:39AM +0200, Jiri Slaby wrote:
>> OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as
>> holds now for the rest of the current unwinding:
>> KASAN_SANITIZE_dumpstack.o  := n
>> KASAN_SANITIZE_dumpstack_$(BITS).o  := n
>> KASAN_SANITIZE_stacktrace.o := n
> 
> Most of the unwinder code is now in unwind_frame.c, which *does* have
> KASAN enabled.
> 
> I think the above is leftover from the days before I rewrote the
> unwinder.  I'll look at renabling KASAN for those files.

Ok, fair enough.

I will do some measurements in the FP field while I will be at it.

thanks,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Josh Poimboeuf
On Wed, May 10, 2017 at 09:39:39AM +0200, Jiri Slaby wrote:
> OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as
> holds now for the rest of the current unwinding:
> KASAN_SANITIZE_dumpstack.o  := n
> KASAN_SANITIZE_dumpstack_$(BITS).o  := n
> KASAN_SANITIZE_stacktrace.o := n

Most of the unwinder code is now in unwind_frame.c, which *does* have
KASAN enabled.

I think the above is leftover from the days before I rewrote the
unwinder.  I'll look at renabling KASAN for those files.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Josh Poimboeuf
On Wed, May 10, 2017 at 09:39:39AM +0200, Jiri Slaby wrote:
> OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as
> holds now for the rest of the current unwinding:
> KASAN_SANITIZE_dumpstack.o  := n
> KASAN_SANITIZE_dumpstack_$(BITS).o  := n
> KASAN_SANITIZE_stacktrace.o := n

Most of the unwinder code is now in unwind_frame.c, which *does* have
KASAN enabled.

I think the above is leftover from the days before I rewrote the
unwinder.  I'll look at renabling KASAN for those files.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Jiri Slaby
On 05/09/2017, 09:22 PM, Josh Poimboeuf wrote:
> On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote:
>> On Sun, 7 May 2017, Josh Poimboeuf wrote:
>>
>>> DWARF is great for debuggers.  It helps you find all the registers on 
>>> the stack, so you can see function arguments and local variables.  All 
>>> expressed in a nice compact format.
>>>
>>> But that's overkill for unwinders.  We don't need all those registers,
>>> and the state machine is too complicated.  
>>
>> OTOH if we make the failures in processing of those "auxiliary" 
>> information non-fatal (in a sense that it neither causes kernel bug nor 
>> does it actually corrupt the unwinding process, but the only effect is 
>> losing "optional" information), having this data available doesn't hurt. 
> 
> But it does hurt, in the sense that the complicated format of DWARF CFI
> means the unwinder has to jump through a lot more hoops to read it.

Why that matters, actually? Unwinder is nothing to be performance
oriented. And if somebody is doing a lot of unwinding during runtime,
they can switch to in-this-case-faster FP unwinder.

> And if we wanted it to be reasonably reliable, we'd also need to fix up
> the DWARF data somehow before converting it, presumably with objtool.

We have to do this anyway. Be it the DWARF info or whatever we end up with.

>>> Unwinders basically only need to know one thing: given an instruction 
>>> address and a stack pointer, where is the caller's stack frame?
>>
>> Again, DWARF should be able to give us all of this (including the 
>> FP-fallback etc). It feels a bit silly to purposedly ignore it and 
>> reinvent parts of it again, instead of fixing (read: "asking toolchain 
>> guys to fix")

And we can just do, if a totally broken compiler emerges:
#if defined(CONFIG_DWARF_UNWINDER) && GCC_VERSION == 59000
#error Sorry, choose a different compiler or disable DWARF unwinder
#endif

We haven't to do this during the past decade and I am sceptic if we
would have to do it in the next one.

>>  the cases where we actually are not getting the proper data 
>> in DWARF. That's a win-win at the end of the day.
> 
> Most of the kernel DWARF issues I've seen aren't caused by toolchain
> bugs.  They're caused by the kernel's quirks: asm, inline asm, special
> sections.

Right.

> And anyway, fixing the correctness of the DWARF data is only half the
> problem IMO.  The other half of the problem is unwinder complexity.

Complex, but generic and working. IMO, it would be rather though to come
up with some tool working on different compilers or even different
versions of gcc. I mean some tool to convert the DWARF data to something
proprietary. The conversion would be as complex as is the unwinder plus
conversion to the proprietary format and its dump into ELF. We would
still rely on a (now out-of-kernel-runtime-code) complex monolith to do
it right.

thanks,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Jiri Slaby
On 05/09/2017, 09:22 PM, Josh Poimboeuf wrote:
> On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote:
>> On Sun, 7 May 2017, Josh Poimboeuf wrote:
>>
>>> DWARF is great for debuggers.  It helps you find all the registers on 
>>> the stack, so you can see function arguments and local variables.  All 
>>> expressed in a nice compact format.
>>>
>>> But that's overkill for unwinders.  We don't need all those registers,
>>> and the state machine is too complicated.  
>>
>> OTOH if we make the failures in processing of those "auxiliary" 
>> information non-fatal (in a sense that it neither causes kernel bug nor 
>> does it actually corrupt the unwinding process, but the only effect is 
>> losing "optional" information), having this data available doesn't hurt. 
> 
> But it does hurt, in the sense that the complicated format of DWARF CFI
> means the unwinder has to jump through a lot more hoops to read it.

Why that matters, actually? Unwinder is nothing to be performance
oriented. And if somebody is doing a lot of unwinding during runtime,
they can switch to in-this-case-faster FP unwinder.

> And if we wanted it to be reasonably reliable, we'd also need to fix up
> the DWARF data somehow before converting it, presumably with objtool.

We have to do this anyway. Be it the DWARF info or whatever we end up with.

>>> Unwinders basically only need to know one thing: given an instruction 
>>> address and a stack pointer, where is the caller's stack frame?
>>
>> Again, DWARF should be able to give us all of this (including the 
>> FP-fallback etc). It feels a bit silly to purposedly ignore it and 
>> reinvent parts of it again, instead of fixing (read: "asking toolchain 
>> guys to fix")

And we can just do, if a totally broken compiler emerges:
#if defined(CONFIG_DWARF_UNWINDER) && GCC_VERSION == 59000
#error Sorry, choose a different compiler or disable DWARF unwinder
#endif

We haven't to do this during the past decade and I am sceptic if we
would have to do it in the next one.

>>  the cases where we actually are not getting the proper data 
>> in DWARF. That's a win-win at the end of the day.
> 
> Most of the kernel DWARF issues I've seen aren't caused by toolchain
> bugs.  They're caused by the kernel's quirks: asm, inline asm, special
> sections.

Right.

> And anyway, fixing the correctness of the DWARF data is only half the
> problem IMO.  The other half of the problem is unwinder complexity.

Complex, but generic and working. IMO, it would be rather though to come
up with some tool working on different compilers or even different
versions of gcc. I mean some tool to convert the DWARF data to something
proprietary. The conversion would be as complex as is the unwinder plus
conversion to the proprietary format and its dump into ELF. We would
still rely on a (now out-of-kernel-runtime-code) complex monolith to do
it right.

thanks,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Jiri Slaby
On 05/09/2017, 12:00 PM, h...@zytor.com wrote:
> As far as I understand, the .eh_frame section is supposed to contain the 
> subset of the DWARF bytecode needed to do a stack unwind when an exception is 
> thrown, whereas the .debug* sections contain the full DWARF data a debugger 
> might want.  Thus .eh_frame is mapped into the runtime process while .debug* 
> [usually?] is not.  .debug* can easily be 10x larger than the executable text 
> segments.

As it currently stands, the (same) data is generated either to
.eh_frame, or to .debug_frame. Depending if DWARF_UNWINDER is turned on,
or off, respectively. vdso has the same data in both, always.

> Assembly language routines become problematic no matter what you do unless 
> you restrict the way the assembly can be written.  Experience has shown us 
> that hand-maintaining annotations in assembly code is doomed to failure, and 
> in many cases it isn't even clear to even experienced programmers how to do 
> it.

Sure, manual annotations of assembly will be avoided as much as
possible. We have to rely objtool to generate them in most cases.

>  [H.J.: is the .cfi_* operations set even documented anywhere in a way that 
> non-compiler-writers can comprehend it?]

Until now, I always looked into as manual:
$ pinfo --node='CFI directives' as

> I'm, ahem, highly skeptical to creating our own unwinding data format unless 
> there is *documented, supported, and tested* way to force the compiler to 
> *automatically* fall back to frame pointers any time there may be complexity 
> involved,

I second this. Inventing a new format like this mostly ends up with
using the standard one after several iterations. One cannot think of all
the consequences and needs while proposing a new one.

The memory footprint is ~2M for average vmlinux. And people who need to
access:
* either need it frequently -- those do not need performance (LOCKDEP,
KASAN, or other debug builds)
* or are in the middle of WARNING, BUG, crash, panic or such and this is
not that often...

And we would need *both*. The limited proprietary one in some sort of
.kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash,
gdb and so on.

And yes, the DWARF unwinder falls back to FP if they are available (see
function dw_fp_fallback).

thanks,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Jiri Slaby
On 05/09/2017, 12:00 PM, h...@zytor.com wrote:
> As far as I understand, the .eh_frame section is supposed to contain the 
> subset of the DWARF bytecode needed to do a stack unwind when an exception is 
> thrown, whereas the .debug* sections contain the full DWARF data a debugger 
> might want.  Thus .eh_frame is mapped into the runtime process while .debug* 
> [usually?] is not.  .debug* can easily be 10x larger than the executable text 
> segments.

As it currently stands, the (same) data is generated either to
.eh_frame, or to .debug_frame. Depending if DWARF_UNWINDER is turned on,
or off, respectively. vdso has the same data in both, always.

> Assembly language routines become problematic no matter what you do unless 
> you restrict the way the assembly can be written.  Experience has shown us 
> that hand-maintaining annotations in assembly code is doomed to failure, and 
> in many cases it isn't even clear to even experienced programmers how to do 
> it.

Sure, manual annotations of assembly will be avoided as much as
possible. We have to rely objtool to generate them in most cases.

>  [H.J.: is the .cfi_* operations set even documented anywhere in a way that 
> non-compiler-writers can comprehend it?]

Until now, I always looked into as manual:
$ pinfo --node='CFI directives' as

> I'm, ahem, highly skeptical to creating our own unwinding data format unless 
> there is *documented, supported, and tested* way to force the compiler to 
> *automatically* fall back to frame pointers any time there may be complexity 
> involved,

I second this. Inventing a new format like this mostly ends up with
using the standard one after several iterations. One cannot think of all
the consequences and needs while proposing a new one.

The memory footprint is ~2M for average vmlinux. And people who need to
access:
* either need it frequently -- those do not need performance (LOCKDEP,
KASAN, or other debug builds)
* or are in the middle of WARNING, BUG, crash, panic or such and this is
not that often...

And we would need *both*. The limited proprietary one in some sort of
.kernel_eh_frame, and DWARF cfis in .debug_frame for tools like crash,
gdb and so on.

And yes, the DWARF unwinder falls back to FP if they are available (see
function dw_fp_fallback).

thanks,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Jiri Slaby
On 05/06/2017, 09:19 AM, Ingo Molnar wrote:
> 
> * Linus Torvalds  wrote:
> 
>> On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby  wrote:
>>> The DWARF unwinder is in place and ready. So introduce the config option
>>> to allow users to enable it. It is by default off due to missing
>>> assembly annotations.
>>
>> Who actually ends up using this?
> 
> Also, why wasn't Josh Poimboeuf Cc:-ed, who de-facto maintains the x86 
> unwinding 
> code?

I explicitly CCed Josh on 5/7 and 6/7 which touches the code. Besides
that, I assumed he is implicitly CCed via live-patch...@vger.kernel.org
which is carbon-copied on each of the patches.

> AFAICS this series is just repeating the old mistakes of the old Dwarf 
> unwinders 
> of trusting GCC's unwinder data. So NAK for the time being on the whole 
> approach:
> 
> NAcked-by: Ingo Molnar 

OK, as the series stands now, we indeed do. Noteworthy, we, in SUSE, had
no problems with this reliance for all the time we have been using the
unwinder.

Anyway, objtool is about to vaidate the DWARF data, generate it for
assembly and potentially fix it if problems occur. Could you elaborate
on what else would help you to change your stance?

thanks,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Jiri Slaby
On 05/06/2017, 09:19 AM, Ingo Molnar wrote:
> 
> * Linus Torvalds  wrote:
> 
>> On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby  wrote:
>>> The DWARF unwinder is in place and ready. So introduce the config option
>>> to allow users to enable it. It is by default off due to missing
>>> assembly annotations.
>>
>> Who actually ends up using this?
> 
> Also, why wasn't Josh Poimboeuf Cc:-ed, who de-facto maintains the x86 
> unwinding 
> code?

I explicitly CCed Josh on 5/7 and 6/7 which touches the code. Besides
that, I assumed he is implicitly CCed via live-patch...@vger.kernel.org
which is carbon-copied on each of the patches.

> AFAICS this series is just repeating the old mistakes of the old Dwarf 
> unwinders 
> of trusting GCC's unwinder data. So NAK for the time being on the whole 
> approach:
> 
> NAcked-by: Ingo Molnar 

OK, as the series stands now, we indeed do. Noteworthy, we, in SUSE, had
no problems with this reliance for all the time we have been using the
unwinder.

Anyway, objtool is about to vaidate the DWARF data, generate it for
assembly and potentially fix it if problems occur. Could you elaborate
on what else would help you to change your stance?

thanks,
-- 
js
suse labs


Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Jiri Slaby
On 05/05/2017, 09:57 PM, Linus Torvalds wrote:
> On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby  wrote:
>> The DWARF unwinder is in place and ready. So introduce the config option
>> to allow users to enable it. It is by default off due to missing
>> assembly annotations.
> 
> Who actually ends up using this?

Every SUSE user has been using this for almost a decade and we are not
about to switch to FP for performance reasons as noted by Jiri Kosina.
So SUSE users are going to be exposed to DWARF unwinder for another
decade or so at least.

Therefore, this is another attempt to make the unwinder (in some form)
upstream. Since this was first proposed many years ago, we have been
forced to forward-port it over and over and everyone knows what pain it
is. So it is nice, that this opened the discussion at least.

> Because from the last time we had fancy unwindoers, and all the
> problems it caused for oops handling with absolutely _zero_ upsides
> ever, I do not ever again want to see fancy unwinders with complex
> state machine handling used by the oopsing code.

Well, reliable stack-traces with minimal performance impact thanks to
out-of-band data is hell good reason in my opinion.

> The fact that it gets disabled for KASAN also makes me suspicious. It
> basically means that now all the accesses it does are not even
> validated.

OK, I inclined to disable KASAN when I started cleaning this up for
_performance_ reasons. The system was so slow, that the RCU stall or
soft-lockup detectors came up complaining. From that time, I measured
the bottlenecks and optimized the unwinder so that 1000 iterations of
unwinding takes:

Before:
real0m1.808s
user0m0.001s
sys 0m1.807s

After:
real0m0.018s
user0m0.001s
sys 0m0.017s

So let me check, whether KASAN still has to be disabled globally. I do
not think so.

OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as
holds now for the rest of the current unwinding:
KASAN_SANITIZE_dumpstack.o  := n
KASAN_SANITIZE_dumpstack_$(BITS).o  := n
KASAN_SANITIZE_stacktrace.o := n

Still, I can let KASAN := y for dwarf.c for testing purposes locally and
smoke-test the unwinder.

> The fact that the most of the code seems to be disabled for the first
> six patches, and then just enabled in the last patch, also seems to
> mean that the series also gets no bisection coverage or testing that
> the individual patches make any sense. (ie there's a lot of code
> inside "CONFIG_DWARF_UNWIND" in the early patches but that config
> option cannot even be enabled until the last patch).

Correct. This was one big patch previously. I separated that patch into
several smaller commits touching different places of the kernel for
easier review.

It does not make sense to test any of the patches separately except the
first. Hence the config option which enables the rest of the series is
the last one. I deemed this as one of possible approaches to split
patches (I have seen this many times in the past.) I can of course
squash this back into a single patch (or two).

> We used to have nasty issues with not just missing dwarf info, but
> also actively *wrong* dwarf info. Compiler versions that generated
> subtly wrong info, because nobody actually really depended on it, and
> the people who had tested it seldom did the kinds of things we do in 
> the kernel (eg inline asms etc).

I must admit I am not aware of any issues in this manner during the
years. Again, this unwinder is the default in SUSE kernels since ever,
so we have been using gcc from at least 3.2 to 7. But see below.

> So I'm personally still very suspicious of these things.
> 
> Last time I had huge issues with people also always blaming *anything*
> else than that unwinder. It was always "oh, somebody wrote asm without
> getting it right". Or "oh, the compiler generated bad tables, it's not
> *my* fault that now the kernel oopsing code no longer works".

Now we have objtool. My objtool clone:
1) verifies the DWARF data (prepared by Josh)

2) generates DWARF data for assembly -- incomplete yet: see the thread
about x86 assembly cleanup which is a pre-requisite for this to work.
This is BTW the reason why the DWARF unwinder is default-off in this
series yet.

And we can add:
3) fix up the data, if they are wrong

That said, objtool could handle the data so they are correct and
as-expected for the unwinder. Without objtool, the data (and unwinder)
is hopeless (only vdso from all the assembly is annotated.)

> When I asked for more stricter debug table validation to avoid issues,
> it was always "oh, we fixed it, no worries", and then two months later
> somebody hit another issue.

Reasonable, indeed. I am all for strict checking. objtool is to do that.

> Put another way; the last time we did crazy stuff like this, it got
> reverted. For a damn good reason, despite some people being in denial
> about those reasons.

Speaking for myself, having 

Re: [PATCH 7/7] DWARF: add the config option

2017-05-10 Thread Jiri Slaby
On 05/05/2017, 09:57 PM, Linus Torvalds wrote:
> On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby  wrote:
>> The DWARF unwinder is in place and ready. So introduce the config option
>> to allow users to enable it. It is by default off due to missing
>> assembly annotations.
> 
> Who actually ends up using this?

Every SUSE user has been using this for almost a decade and we are not
about to switch to FP for performance reasons as noted by Jiri Kosina.
So SUSE users are going to be exposed to DWARF unwinder for another
decade or so at least.

Therefore, this is another attempt to make the unwinder (in some form)
upstream. Since this was first proposed many years ago, we have been
forced to forward-port it over and over and everyone knows what pain it
is. So it is nice, that this opened the discussion at least.

> Because from the last time we had fancy unwindoers, and all the
> problems it caused for oops handling with absolutely _zero_ upsides
> ever, I do not ever again want to see fancy unwinders with complex
> state machine handling used by the oopsing code.

Well, reliable stack-traces with minimal performance impact thanks to
out-of-band data is hell good reason in my opinion.

> The fact that it gets disabled for KASAN also makes me suspicious. It
> basically means that now all the accesses it does are not even
> validated.

OK, I inclined to disable KASAN when I started cleaning this up for
_performance_ reasons. The system was so slow, that the RCU stall or
soft-lockup detectors came up complaining. From that time, I measured
the bottlenecks and optimized the unwinder so that 1000 iterations of
unwinding takes:

Before:
real0m1.808s
user0m0.001s
sys 0m1.807s

After:
real0m0.018s
user0m0.001s
sys 0m0.017s

So let me check, whether KASAN still has to be disabled globally. I do
not think so.

OTOH, TBH, I am not sure KASAN can be enabled for dwarf.c, the same as
holds now for the rest of the current unwinding:
KASAN_SANITIZE_dumpstack.o  := n
KASAN_SANITIZE_dumpstack_$(BITS).o  := n
KASAN_SANITIZE_stacktrace.o := n

Still, I can let KASAN := y for dwarf.c for testing purposes locally and
smoke-test the unwinder.

> The fact that the most of the code seems to be disabled for the first
> six patches, and then just enabled in the last patch, also seems to
> mean that the series also gets no bisection coverage or testing that
> the individual patches make any sense. (ie there's a lot of code
> inside "CONFIG_DWARF_UNWIND" in the early patches but that config
> option cannot even be enabled until the last patch).

Correct. This was one big patch previously. I separated that patch into
several smaller commits touching different places of the kernel for
easier review.

It does not make sense to test any of the patches separately except the
first. Hence the config option which enables the rest of the series is
the last one. I deemed this as one of possible approaches to split
patches (I have seen this many times in the past.) I can of course
squash this back into a single patch (or two).

> We used to have nasty issues with not just missing dwarf info, but
> also actively *wrong* dwarf info. Compiler versions that generated
> subtly wrong info, because nobody actually really depended on it, and
> the people who had tested it seldom did the kinds of things we do in 
> the kernel (eg inline asms etc).

I must admit I am not aware of any issues in this manner during the
years. Again, this unwinder is the default in SUSE kernels since ever,
so we have been using gcc from at least 3.2 to 7. But see below.

> So I'm personally still very suspicious of these things.
> 
> Last time I had huge issues with people also always blaming *anything*
> else than that unwinder. It was always "oh, somebody wrote asm without
> getting it right". Or "oh, the compiler generated bad tables, it's not
> *my* fault that now the kernel oopsing code no longer works".

Now we have objtool. My objtool clone:
1) verifies the DWARF data (prepared by Josh)

2) generates DWARF data for assembly -- incomplete yet: see the thread
about x86 assembly cleanup which is a pre-requisite for this to work.
This is BTW the reason why the DWARF unwinder is default-off in this
series yet.

And we can add:
3) fix up the data, if they are wrong

That said, objtool could handle the data so they are correct and
as-expected for the unwinder. Without objtool, the data (and unwinder)
is hopeless (only vdso from all the assembly is annotated.)

> When I asked for more stricter debug table validation to avoid issues,
> it was always "oh, we fixed it, no worries", and then two months later
> somebody hit another issue.

Reasonable, indeed. I am all for strict checking. objtool is to do that.

> Put another way; the last time we did crazy stuff like this, it got
> reverted. For a damn good reason, despite some people being in denial
> about those reasons.

Speaking for myself, having it out-of-tree 

Re: [PATCH 7/7] DWARF: add the config option

2017-05-09 Thread Josh Poimboeuf
On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote:
> On Sun, 7 May 2017, Josh Poimboeuf wrote:
> 
> > DWARF is great for debuggers.  It helps you find all the registers on 
> > the stack, so you can see function arguments and local variables.  All 
> > expressed in a nice compact format.
> > 
> > But that's overkill for unwinders.  We don't need all those registers,
> > and the state machine is too complicated.  
> 
> OTOH if we make the failures in processing of those "auxiliary" 
> information non-fatal (in a sense that it neither causes kernel bug nor 
> does it actually corrupt the unwinding process, but the only effect is 
> losing "optional" information), having this data available doesn't hurt. 

But it does hurt, in the sense that the complicated format of DWARF CFI
means the unwinder has to jump through a lot more hoops to read it.

> It's there anyway for builds containing debuginfo, and the information is 
> all there so that it can be used by things like gdb or crash, so it seems 
> natural to re-use as much as possible of it.

There's a valid argument to be made that we should start with the DWARF
data instead of creating the new data from scratch.  That might be fine.
Right now I don't have a strong feeling about it either way.

But if we do that, we should still convert the DWARF data to a simple
streamlined format for the in-kernel unwinder, so it can easily be read
by the kernel without having to fire up a DWARF state machine in the
middle of an oops.

And if we wanted it to be reasonably reliable, we'd also need to fix up
the DWARF data somehow before converting it, presumably with objtool.

> > Unwinders basically only need to know one thing: given an instruction 
> > address and a stack pointer, where is the caller's stack frame?
> 
> Again, DWARF should be able to give us all of this (including the 
> FP-fallback etc). It feels a bit silly to purposedly ignore it and 
> reinvent parts of it again, instead of fixing (read: "asking toolchain 
> guys to fix") the cases where we actually are not getting the proper data 
> in DWARF. That's a win-win at the end of the day.

Most of the kernel DWARF issues I've seen aren't caused by toolchain
bugs.  They're caused by the kernel's quirks: asm, inline asm, special
sections.

And anyway, fixing the correctness of the DWARF data is only half the
problem IMO.  The other half of the problem is unwinder complexity.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-09 Thread Josh Poimboeuf
On Tue, May 09, 2017 at 08:47:50PM +0200, Jiri Kosina wrote:
> On Sun, 7 May 2017, Josh Poimboeuf wrote:
> 
> > DWARF is great for debuggers.  It helps you find all the registers on 
> > the stack, so you can see function arguments and local variables.  All 
> > expressed in a nice compact format.
> > 
> > But that's overkill for unwinders.  We don't need all those registers,
> > and the state machine is too complicated.  
> 
> OTOH if we make the failures in processing of those "auxiliary" 
> information non-fatal (in a sense that it neither causes kernel bug nor 
> does it actually corrupt the unwinding process, but the only effect is 
> losing "optional" information), having this data available doesn't hurt. 

But it does hurt, in the sense that the complicated format of DWARF CFI
means the unwinder has to jump through a lot more hoops to read it.

> It's there anyway for builds containing debuginfo, and the information is 
> all there so that it can be used by things like gdb or crash, so it seems 
> natural to re-use as much as possible of it.

There's a valid argument to be made that we should start with the DWARF
data instead of creating the new data from scratch.  That might be fine.
Right now I don't have a strong feeling about it either way.

But if we do that, we should still convert the DWARF data to a simple
streamlined format for the in-kernel unwinder, so it can easily be read
by the kernel without having to fire up a DWARF state machine in the
middle of an oops.

And if we wanted it to be reasonably reliable, we'd also need to fix up
the DWARF data somehow before converting it, presumably with objtool.

> > Unwinders basically only need to know one thing: given an instruction 
> > address and a stack pointer, where is the caller's stack frame?
> 
> Again, DWARF should be able to give us all of this (including the 
> FP-fallback etc). It feels a bit silly to purposedly ignore it and 
> reinvent parts of it again, instead of fixing (read: "asking toolchain 
> guys to fix") the cases where we actually are not getting the proper data 
> in DWARF. That's a win-win at the end of the day.

Most of the kernel DWARF issues I've seen aren't caused by toolchain
bugs.  They're caused by the kernel's quirks: asm, inline asm, special
sections.

And anyway, fixing the correctness of the DWARF data is only half the
problem IMO.  The other half of the problem is unwinder complexity.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-09 Thread Jiri Kosina
On Sun, 7 May 2017, Josh Poimboeuf wrote:

> DWARF is great for debuggers.  It helps you find all the registers on 
> the stack, so you can see function arguments and local variables.  All 
> expressed in a nice compact format.
> 
> But that's overkill for unwinders.  We don't need all those registers,
> and the state machine is too complicated.  

OTOH if we make the failures in processing of those "auxiliary" 
information non-fatal (in a sense that it neither causes kernel bug nor 
does it actually corrupt the unwinding process, but the only effect is 
losing "optional" information), having this data available doesn't hurt. 

It's there anyway for builds containing debuginfo, and the information is 
all there so that it can be used by things like gdb or crash, so it seems 
natural to re-use as much as possible of it.

> Unwinders basically only need to know one thing: given an instruction 
> address and a stack pointer, where is the caller's stack frame?

Again, DWARF should be able to give us all of this (including the 
FP-fallback etc). It feels a bit silly to purposedly ignore it and 
reinvent parts of it again, instead of fixing (read: "asking toolchain 
guys to fix") the cases where we actually are not getting the proper data 
in DWARF. That's a win-win at the end of the day.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 7/7] DWARF: add the config option

2017-05-09 Thread Jiri Kosina
On Sun, 7 May 2017, Josh Poimboeuf wrote:

> DWARF is great for debuggers.  It helps you find all the registers on 
> the stack, so you can see function arguments and local variables.  All 
> expressed in a nice compact format.
> 
> But that's overkill for unwinders.  We don't need all those registers,
> and the state machine is too complicated.  

OTOH if we make the failures in processing of those "auxiliary" 
information non-fatal (in a sense that it neither causes kernel bug nor 
does it actually corrupt the unwinding process, but the only effect is 
losing "optional" information), having this data available doesn't hurt. 

It's there anyway for builds containing debuginfo, and the information is 
all there so that it can be used by things like gdb or crash, so it seems 
natural to re-use as much as possible of it.

> Unwinders basically only need to know one thing: given an instruction 
> address and a stack pointer, where is the caller's stack frame?

Again, DWARF should be able to give us all of this (including the 
FP-fallback etc). It feels a bit silly to purposedly ignore it and 
reinvent parts of it again, instead of fixing (read: "asking toolchain 
guys to fix") the cases where we actually are not getting the proper data 
in DWARF. That's a win-win at the end of the day.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 7/7] DWARF: add the config option

2017-05-09 Thread H.J. Lu
On Tue, May 9, 2017 at 7:58 AM, Josh Poimboeuf  wrote:
> On Tue, May 09, 2017 at 03:00:45AM -0700, h...@zytor.com wrote:
>> I'm, ahem, highly skeptical to creating our own unwinding data format
>> unless there is *documented, supported, and tested* way to force the
>> compiler to *automatically* fall back to frame pointers any time there
>> may be complexity involved, which at a very minimum includes any kind
>> of data-dependent manipulation of the stack pointer.
>
> That would be nice.  But isn't falling back to a frame pointer (or
> another callee-saved reg or a stack location) already needed in such
> cases?  Otherwise how could DWARF unwinding work?
>
>> Otherwise you will have to fail the kernel build when your static tool
>> runs into instruction sequences it can't deal with, but the compiler
>> may generate them - boom.
>
> Failing the build is harsh, we could just warn about it and skip the
> data for the affected function(s).
>
> BTW, there is another option.  Instead of generating the data from
> scratch, we could just convert gcc's DWARF CFI to the format we need.
>
> However that wouldn't solve the problems we have with the holes and
> inaccuracies in DWARF from our hand-annotated asm, inline asm, and
> special sections (extable, alternatives, etc).  We'd still have to rely
> on objtool for that, so we'd still be in the same boat of needing
> objtool to be able to follow gcc code paths.

CFI directives are documented in GNU assembler manual.  They
store unwind info in .eh_frame section.  They work well with assembly
codes in glibc.  But I don't know how well they work with kernel unwind.

> So yes, it sucks that objtool needs to work for unwinding to work.  But
> if we want decent DWARF-esque unwinding, I don't see any way around
> that due to the low-level nature of the kernel.
>
>> Worse, your tool will not even recognize the problem and you're in a
>> worse place than when you started.
>
> We could have a runtime NMI-based stack checker which ensures it can
> always unwind to the bottom of the stack.  Over time this would
> hopefully provide full validation of the unwinder data and
> functionality.
>
> --
> Josh



-- 
H.J.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-09 Thread H.J. Lu
On Tue, May 9, 2017 at 7:58 AM, Josh Poimboeuf  wrote:
> On Tue, May 09, 2017 at 03:00:45AM -0700, h...@zytor.com wrote:
>> I'm, ahem, highly skeptical to creating our own unwinding data format
>> unless there is *documented, supported, and tested* way to force the
>> compiler to *automatically* fall back to frame pointers any time there
>> may be complexity involved, which at a very minimum includes any kind
>> of data-dependent manipulation of the stack pointer.
>
> That would be nice.  But isn't falling back to a frame pointer (or
> another callee-saved reg or a stack location) already needed in such
> cases?  Otherwise how could DWARF unwinding work?
>
>> Otherwise you will have to fail the kernel build when your static tool
>> runs into instruction sequences it can't deal with, but the compiler
>> may generate them - boom.
>
> Failing the build is harsh, we could just warn about it and skip the
> data for the affected function(s).
>
> BTW, there is another option.  Instead of generating the data from
> scratch, we could just convert gcc's DWARF CFI to the format we need.
>
> However that wouldn't solve the problems we have with the holes and
> inaccuracies in DWARF from our hand-annotated asm, inline asm, and
> special sections (extable, alternatives, etc).  We'd still have to rely
> on objtool for that, so we'd still be in the same boat of needing
> objtool to be able to follow gcc code paths.

CFI directives are documented in GNU assembler manual.  They
store unwind info in .eh_frame section.  They work well with assembly
codes in glibc.  But I don't know how well they work with kernel unwind.

> So yes, it sucks that objtool needs to work for unwinding to work.  But
> if we want decent DWARF-esque unwinding, I don't see any way around
> that due to the low-level nature of the kernel.
>
>> Worse, your tool will not even recognize the problem and you're in a
>> worse place than when you started.
>
> We could have a runtime NMI-based stack checker which ensures it can
> always unwind to the bottom of the stack.  Over time this would
> hopefully provide full validation of the unwinder data and
> functionality.
>
> --
> Josh



-- 
H.J.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-09 Thread Josh Poimboeuf
On Tue, May 09, 2017 at 03:00:45AM -0700, h...@zytor.com wrote:
> I'm, ahem, highly skeptical to creating our own unwinding data format
> unless there is *documented, supported, and tested* way to force the
> compiler to *automatically* fall back to frame pointers any time there
> may be complexity involved, which at a very minimum includes any kind
> of data-dependent manipulation of the stack pointer.

That would be nice.  But isn't falling back to a frame pointer (or
another callee-saved reg or a stack location) already needed in such
cases?  Otherwise how could DWARF unwinding work?

> Otherwise you will have to fail the kernel build when your static tool
> runs into instruction sequences it can't deal with, but the compiler
> may generate them - boom.

Failing the build is harsh, we could just warn about it and skip the
data for the affected function(s).

BTW, there is another option.  Instead of generating the data from
scratch, we could just convert gcc's DWARF CFI to the format we need.

However that wouldn't solve the problems we have with the holes and
inaccuracies in DWARF from our hand-annotated asm, inline asm, and
special sections (extable, alternatives, etc).  We'd still have to rely
on objtool for that, so we'd still be in the same boat of needing
objtool to be able to follow gcc code paths.

So yes, it sucks that objtool needs to work for unwinding to work.  But
if we want decent DWARF-esque unwinding, I don't see any way around
that due to the low-level nature of the kernel.

> Worse, your tool will not even recognize the problem and you're in a
> worse place than when you started.

We could have a runtime NMI-based stack checker which ensures it can
always unwind to the bottom of the stack.  Over time this would
hopefully provide full validation of the unwinder data and
functionality.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-09 Thread Josh Poimboeuf
On Tue, May 09, 2017 at 03:00:45AM -0700, h...@zytor.com wrote:
> I'm, ahem, highly skeptical to creating our own unwinding data format
> unless there is *documented, supported, and tested* way to force the
> compiler to *automatically* fall back to frame pointers any time there
> may be complexity involved, which at a very minimum includes any kind
> of data-dependent manipulation of the stack pointer.

That would be nice.  But isn't falling back to a frame pointer (or
another callee-saved reg or a stack location) already needed in such
cases?  Otherwise how could DWARF unwinding work?

> Otherwise you will have to fail the kernel build when your static tool
> runs into instruction sequences it can't deal with, but the compiler
> may generate them - boom.

Failing the build is harsh, we could just warn about it and skip the
data for the affected function(s).

BTW, there is another option.  Instead of generating the data from
scratch, we could just convert gcc's DWARF CFI to the format we need.

However that wouldn't solve the problems we have with the holes and
inaccuracies in DWARF from our hand-annotated asm, inline asm, and
special sections (extable, alternatives, etc).  We'd still have to rely
on objtool for that, so we'd still be in the same boat of needing
objtool to be able to follow gcc code paths.

So yes, it sucks that objtool needs to work for unwinding to work.  But
if we want decent DWARF-esque unwinding, I don't see any way around
that due to the low-level nature of the kernel.

> Worse, your tool will not even recognize the problem and you're in a
> worse place than when you started.

We could have a runtime NMI-based stack checker which ensures it can
always unwind to the bottom of the stack.  Over time this would
hopefully provide full validation of the unwinder data and
functionality.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-09 Thread hpa
On May 8, 2017 8:38:31 PM PDT, Josh Poimboeuf  wrote:
>On Mon, May 08, 2017 at 07:31:50PM -0700, Andy Lutomirski wrote:
>> On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeuf 
>wrote:
>> >> Also, don't you need some indication of which reg is the base from
>> >> which you find previous frame?  After all, sometimes GCC will emit
>a
>> >> frame pointer even in an otherwise frame-pointer-omitting kernel.
>> >
>> > I don't think we *need* to do that.  I believe the base reg can
>just
>> > always[*] be the stack pointer, even with frame pointers.
>> 
>> What if there are functions that use alloca or variable length arrays
>> on the stack?  Functions using AHASH_REQUEST_ON_STACK come to mind.
>
>Wow, mind blown.  This is why I added you to CC!
>
>Ok, I guess we'll need to be able to use the frame pointer as a base
>reg.  It should be easy anyway.

[Adding H.J. Lu for obvious expertise - H.J., the issue at hand is doing stack 
unwinding to display the call stack on a kernel failure.  Right now the 
standard kernel configuration is using frame pointers even on x86-64 because we 
had very bad experiences with fragility of .eh_frame-based unwinding.  However, 
on some kernel workloads the cost of frame pointers  I is quite high.]

Yes is exactly why.  I believe gcc will still use a frame pointer when the 
relationship between sp and the incoming stack frame is indeterminate for some 
reason, but I have to admit that a) I'm not 100% sure and b) there are DWARF 
annotations for exactly this, so even if it is true for current gcc that a 
frame pointer is not needed it might not be in the future.

As far as I understand, the .eh_frame section is supposed to contain the subset 
of the DWARF bytecode needed to do a stack unwind when an exception is thrown, 
whereas the .debug* sections contain the full DWARF data a debugger might want. 
 Thus .eh_frame is mapped into the runtime process while .debug* [usually?] is 
not.  .debug* can easily be 10x larger than the executable text segments.

Since C doesn't *have* exceptions, the main user of this for C code is the case 
of calls C++ > C > C++ or equivalent; thus the vulnerability to toolchain 
filters failures – the case of an exception-caused unwind in the middle of a C 
function might not have been extensively tested.  [H.J.: is that correct?  Or 
could an asynchronous event like a signal cause an unwind of the .eh_frame from 
an arbitrary point?  If so, how is that tested?]

In the case of the kernel, size matters in a different way, because even though 
it will be cache cold this data takes up unreclaimable RAM.  However, frame 
pointer-related code eats up not just RAM but cache.

Assembly language routines become problematic no matter what you do unless you 
restrict the way the assembly can be written.  Experience has shown us that 
hand-maintaining annotations in assembly code is doomed to failure, and in many 
cases it isn't even clear to even experienced programmers how to do it.  [H.J.: 
is the .cfi_* operations set even documented anywhere in a way that 
non-compiler-writers can comprehend it?] Inline assembly becomes problematic in 
the case of non-frame-pointer builds if the assembly changes the stack pointer. 
 A static tool might be able to annotate simple cases like push/pop.

I'm, ahem, highly skeptical to creating our own unwinding data format unless 
there is *documented, supported, and tested* way to force the compiler to 
*automatically* fall back to frame pointers any time there may be complexity 
involved, which at a very minimum includes any kind of data-dependent 
manipulation of the stack pointer.  Otherwise you will have to fail the kernel 
build when your static tool runs into instruction sequences it can't deal with, 
but the compiler may generate them - boom.  Worse, your tool will not even 
recognize the problem and you're in a worse place than when you started.


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-09 Thread hpa
On May 8, 2017 8:38:31 PM PDT, Josh Poimboeuf  wrote:
>On Mon, May 08, 2017 at 07:31:50PM -0700, Andy Lutomirski wrote:
>> On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeuf 
>wrote:
>> >> Also, don't you need some indication of which reg is the base from
>> >> which you find previous frame?  After all, sometimes GCC will emit
>a
>> >> frame pointer even in an otherwise frame-pointer-omitting kernel.
>> >
>> > I don't think we *need* to do that.  I believe the base reg can
>just
>> > always[*] be the stack pointer, even with frame pointers.
>> 
>> What if there are functions that use alloca or variable length arrays
>> on the stack?  Functions using AHASH_REQUEST_ON_STACK come to mind.
>
>Wow, mind blown.  This is why I added you to CC!
>
>Ok, I guess we'll need to be able to use the frame pointer as a base
>reg.  It should be easy anyway.

[Adding H.J. Lu for obvious expertise - H.J., the issue at hand is doing stack 
unwinding to display the call stack on a kernel failure.  Right now the 
standard kernel configuration is using frame pointers even on x86-64 because we 
had very bad experiences with fragility of .eh_frame-based unwinding.  However, 
on some kernel workloads the cost of frame pointers  I is quite high.]

Yes is exactly why.  I believe gcc will still use a frame pointer when the 
relationship between sp and the incoming stack frame is indeterminate for some 
reason, but I have to admit that a) I'm not 100% sure and b) there are DWARF 
annotations for exactly this, so even if it is true for current gcc that a 
frame pointer is not needed it might not be in the future.

As far as I understand, the .eh_frame section is supposed to contain the subset 
of the DWARF bytecode needed to do a stack unwind when an exception is thrown, 
whereas the .debug* sections contain the full DWARF data a debugger might want. 
 Thus .eh_frame is mapped into the runtime process while .debug* [usually?] is 
not.  .debug* can easily be 10x larger than the executable text segments.

Since C doesn't *have* exceptions, the main user of this for C code is the case 
of calls C++ > C > C++ or equivalent; thus the vulnerability to toolchain 
filters failures – the case of an exception-caused unwind in the middle of a C 
function might not have been extensively tested.  [H.J.: is that correct?  Or 
could an asynchronous event like a signal cause an unwind of the .eh_frame from 
an arbitrary point?  If so, how is that tested?]

In the case of the kernel, size matters in a different way, because even though 
it will be cache cold this data takes up unreclaimable RAM.  However, frame 
pointer-related code eats up not just RAM but cache.

Assembly language routines become problematic no matter what you do unless you 
restrict the way the assembly can be written.  Experience has shown us that 
hand-maintaining annotations in assembly code is doomed to failure, and in many 
cases it isn't even clear to even experienced programmers how to do it.  [H.J.: 
is the .cfi_* operations set even documented anywhere in a way that 
non-compiler-writers can comprehend it?] Inline assembly becomes problematic in 
the case of non-frame-pointer builds if the assembly changes the stack pointer. 
 A static tool might be able to annotate simple cases like push/pop.

I'm, ahem, highly skeptical to creating our own unwinding data format unless 
there is *documented, supported, and tested* way to force the compiler to 
*automatically* fall back to frame pointers any time there may be complexity 
involved, which at a very minimum includes any kind of data-dependent 
manipulation of the stack pointer.  Otherwise you will have to fail the kernel 
build when your static tool runs into instruction sequences it can't deal with, 
but the compiler may generate them - boom.  Worse, your tool will not even 
recognize the problem and you're in a worse place than when you started.


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 7/7] DWARF: add the config option

2017-05-08 Thread Josh Poimboeuf
On Mon, May 08, 2017 at 07:31:50PM -0700, Andy Lutomirski wrote:
> On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeuf  wrote:
> >> Also, don't you need some indication of which reg is the base from
> >> which you find previous frame?  After all, sometimes GCC will emit a
> >> frame pointer even in an otherwise frame-pointer-omitting kernel.
> >
> > I don't think we *need* to do that.  I believe the base reg can just
> > always[*] be the stack pointer, even with frame pointers.
> 
> What if there are functions that use alloca or variable length arrays
> on the stack?  Functions using AHASH_REQUEST_ON_STACK come to mind.

Wow, mind blown.  This is why I added you to CC!

Ok, I guess we'll need to be able to use the frame pointer as a base
reg.  It should be easy anyway.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-08 Thread Josh Poimboeuf
On Mon, May 08, 2017 at 07:31:50PM -0700, Andy Lutomirski wrote:
> On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeuf  wrote:
> >> Also, don't you need some indication of which reg is the base from
> >> which you find previous frame?  After all, sometimes GCC will emit a
> >> frame pointer even in an otherwise frame-pointer-omitting kernel.
> >
> > I don't think we *need* to do that.  I believe the base reg can just
> > always[*] be the stack pointer, even with frame pointers.
> 
> What if there are functions that use alloca or variable length arrays
> on the stack?  Functions using AHASH_REQUEST_ON_STACK come to mind.

Wow, mind blown.  This is why I added you to CC!

Ok, I guess we'll need to be able to use the frame pointer as a base
reg.  It should be easy anyway.

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-08 Thread Andy Lutomirski
On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeuf  wrote:
> On Mon, May 08, 2017 at 05:21:24PM -0700, Andy Lutomirski wrote:
>> On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeuf  wrote:
>> > struct undwarf {
>> > unsigned int ip;/* instruction pointer (relative 
>> > offset from base) */
>> > unsigned prev_frame:13; /* offset to previous frame from 
>> > current stack pointer */
>> > unsigned regs:1;/* whether prev_frame contains 
>> > entry regs (regs->ip) */
>> > unsigned align:2;   /* some details for dealing with 
>> > gcc stack realignment */
>> > } __packed;
>> >
>> > extern struct undwarf undwarves[];
>>
>> Some comments in case you're actually planning to do this:
>>
>> 'unsigned int ip' is the majority of the size of this thing.  It might
>> be worth trying to store a lot fewer bits.  You could split the
>> structure into:
>>
>> struct undwarf_header {
>>   unsigned long start_ip;
>>   unsigned align:2;  /* i'm assuming this rarely changes */
>>   ...;
>>   unsigned int offset_to_details;
>> };
>>
>> and
>>
>> struct undwarf_details {
>>   unsigned short ip_offset;
>>   unsigned short prev_frame;
>> };
>>
>> and you'd find the details by first looking up the last header before
>> the ip and then finding the details starting at (uintptr_t)header +
>> header->offset_to_details.
>
> Good idea.  According to some back-of-a-napkin math, a scheme like this
> could reduce the data size from 1.8M down to 1.2M with my kernel config,
> a not-too-shabby 600k savings.
>
>> Also, don't you need some indication of which reg is the base from
>> which you find previous frame?  After all, sometimes GCC will emit a
>> frame pointer even in an otherwise frame-pointer-omitting kernel.
>
> I don't think we *need* to do that.  I believe the base reg can just
> always[*] be the stack pointer, even with frame pointers.

What if there are functions that use alloca or variable length arrays
on the stack?  Functions using AHASH_REQUEST_ON_STACK come to mind.

--Andy


Re: [PATCH 7/7] DWARF: add the config option

2017-05-08 Thread Andy Lutomirski
On Mon, May 8, 2017 at 6:38 PM, Josh Poimboeuf  wrote:
> On Mon, May 08, 2017 at 05:21:24PM -0700, Andy Lutomirski wrote:
>> On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeuf  wrote:
>> > struct undwarf {
>> > unsigned int ip;/* instruction pointer (relative 
>> > offset from base) */
>> > unsigned prev_frame:13; /* offset to previous frame from 
>> > current stack pointer */
>> > unsigned regs:1;/* whether prev_frame contains 
>> > entry regs (regs->ip) */
>> > unsigned align:2;   /* some details for dealing with 
>> > gcc stack realignment */
>> > } __packed;
>> >
>> > extern struct undwarf undwarves[];
>>
>> Some comments in case you're actually planning to do this:
>>
>> 'unsigned int ip' is the majority of the size of this thing.  It might
>> be worth trying to store a lot fewer bits.  You could split the
>> structure into:
>>
>> struct undwarf_header {
>>   unsigned long start_ip;
>>   unsigned align:2;  /* i'm assuming this rarely changes */
>>   ...;
>>   unsigned int offset_to_details;
>> };
>>
>> and
>>
>> struct undwarf_details {
>>   unsigned short ip_offset;
>>   unsigned short prev_frame;
>> };
>>
>> and you'd find the details by first looking up the last header before
>> the ip and then finding the details starting at (uintptr_t)header +
>> header->offset_to_details.
>
> Good idea.  According to some back-of-a-napkin math, a scheme like this
> could reduce the data size from 1.8M down to 1.2M with my kernel config,
> a not-too-shabby 600k savings.
>
>> Also, don't you need some indication of which reg is the base from
>> which you find previous frame?  After all, sometimes GCC will emit a
>> frame pointer even in an otherwise frame-pointer-omitting kernel.
>
> I don't think we *need* to do that.  I believe the base reg can just
> always[*] be the stack pointer, even with frame pointers.

What if there are functions that use alloca or variable length arrays
on the stack?  Functions using AHASH_REQUEST_ON_STACK come to mind.

--Andy


Re: [PATCH 7/7] DWARF: add the config option

2017-05-08 Thread Josh Poimboeuf
On Mon, May 08, 2017 at 05:21:24PM -0700, Andy Lutomirski wrote:
> On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeuf  wrote:
> > struct undwarf {
> > unsigned int ip;/* instruction pointer (relative 
> > offset from base) */
> > unsigned prev_frame:13; /* offset to previous frame from 
> > current stack pointer */
> > unsigned regs:1;/* whether prev_frame contains 
> > entry regs (regs->ip) */
> > unsigned align:2;   /* some details for dealing with 
> > gcc stack realignment */
> > } __packed;
> >
> > extern struct undwarf undwarves[];
> 
> Some comments in case you're actually planning to do this:
> 
> 'unsigned int ip' is the majority of the size of this thing.  It might
> be worth trying to store a lot fewer bits.  You could split the
> structure into:
> 
> struct undwarf_header {
>   unsigned long start_ip;
>   unsigned align:2;  /* i'm assuming this rarely changes */
>   ...;
>   unsigned int offset_to_details;
> };
> 
> and
> 
> struct undwarf_details {
>   unsigned short ip_offset;
>   unsigned short prev_frame;
> };
> 
> and you'd find the details by first looking up the last header before
> the ip and then finding the details starting at (uintptr_t)header +
> header->offset_to_details.

Good idea.  According to some back-of-a-napkin math, a scheme like this
could reduce the data size from 1.8M down to 1.2M with my kernel config,
a not-too-shabby 600k savings.

> Also, don't you need some indication of which reg is the base from
> which you find previous frame?  After all, sometimes GCC will emit a
> frame pointer even in an otherwise frame-pointer-omitting kernel.

I don't think we *need* to do that.  I believe the base reg can just
always[*] be the stack pointer, even with frame pointers.

That said, it might be beneficial to use the frame pointer as the base
reg where applicable, because it would shrink the data size, especially
in a kernel with frame pointers enabled.  And I think we would only need
an extra bit to store that info.

First I'll just start with the simplest possible scheme (i.e., what I
proposed above).  Even that calculates out to be smaller than the DWARF
.eh_frame stuff.  Then after that we can look at compression techniques
(and their associated tradeoffs).

[*] ignoring rare gcc stack realignments

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-08 Thread Josh Poimboeuf
On Mon, May 08, 2017 at 05:21:24PM -0700, Andy Lutomirski wrote:
> On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeuf  wrote:
> > struct undwarf {
> > unsigned int ip;/* instruction pointer (relative 
> > offset from base) */
> > unsigned prev_frame:13; /* offset to previous frame from 
> > current stack pointer */
> > unsigned regs:1;/* whether prev_frame contains 
> > entry regs (regs->ip) */
> > unsigned align:2;   /* some details for dealing with 
> > gcc stack realignment */
> > } __packed;
> >
> > extern struct undwarf undwarves[];
> 
> Some comments in case you're actually planning to do this:
> 
> 'unsigned int ip' is the majority of the size of this thing.  It might
> be worth trying to store a lot fewer bits.  You could split the
> structure into:
> 
> struct undwarf_header {
>   unsigned long start_ip;
>   unsigned align:2;  /* i'm assuming this rarely changes */
>   ...;
>   unsigned int offset_to_details;
> };
> 
> and
> 
> struct undwarf_details {
>   unsigned short ip_offset;
>   unsigned short prev_frame;
> };
> 
> and you'd find the details by first looking up the last header before
> the ip and then finding the details starting at (uintptr_t)header +
> header->offset_to_details.

Good idea.  According to some back-of-a-napkin math, a scheme like this
could reduce the data size from 1.8M down to 1.2M with my kernel config,
a not-too-shabby 600k savings.

> Also, don't you need some indication of which reg is the base from
> which you find previous frame?  After all, sometimes GCC will emit a
> frame pointer even in an otherwise frame-pointer-omitting kernel.

I don't think we *need* to do that.  I believe the base reg can just
always[*] be the stack pointer, even with frame pointers.

That said, it might be beneficial to use the frame pointer as the base
reg where applicable, because it would shrink the data size, especially
in a kernel with frame pointers enabled.  And I think we would only need
an extra bit to store that info.

First I'll just start with the simplest possible scheme (i.e., what I
proposed above).  Even that calculates out to be smaller than the DWARF
.eh_frame stuff.  Then after that we can look at compression techniques
(and their associated tradeoffs).

[*] ignoring rare gcc stack realignments

-- 
Josh


Re: [PATCH 7/7] DWARF: add the config option

2017-05-08 Thread Andy Lutomirski
On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeuf  wrote:
> On Fri, May 05, 2017 at 12:57:11PM -0700, Linus Torvalds wrote:
>> On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby  wrote:
>> > The DWARF unwinder is in place and ready. So introduce the config option
>> > to allow users to enable it. It is by default off due to missing
>> > assembly annotations.
>>
>> Who actually ends up using this?
>>
>> Because from the last time we had fancy unwindoers, and all the
>> problems it caused for oops handling with absolutely _zero_ upsides
>> ever, I do not ever again want to see fancy unwinders with complex
>> state machine handling used by the oopsing code.
>>
>> The fact that it gets disabled for KASAN also makes me suspicious. It
>> basically means that now all the accesses it does are not even
>> validated.
>>
>> The fact that the most of the code seems to be disabled for the first
>> six patches, and then just enabled in the last patch, also seems to
>> mean that the series also gets no bisection coverage or testing that
>> the individual patches make any sense. (ie there's a lot of code
>> inside "CONFIG_DWARF_UNWIND" in the early patches but that config
>> option cannot even be enabled until the last patch).
>>
>> We used to have nasty issues with not just missing dwarf info, but
>> also actively *wrong* dwarf info. Compiler versions that generated
>> subtly wrong info, because nobody actually really depended on it, and
>> the people who had tested it seldom did the kinds of things we do in
>> the kernel (eg inline asms etc).
>>
>> So I'm personally still very suspicious of these things.
>>
>> Last time I had huge issues with people also always blaming *anything*
>> else than that unwinder. It was always "oh, somebody wrote asm without
>> getting it right". Or "oh, the compiler generated bad tables, it's not
>> *my* fault that now the kernel oopsing code no longer works".
>>
>> When I asked for more stricter debug table validation to avoid issues,
>> it was always "oh, we fixed it, no worries", and then two months later
>> somebody hit another issue.
>>
>> Put another way; the last time we did crazy stuff like this, it got
>> reverted. For a damn good reason, despite some people being in denial
>> about those reasons.
>
> Here's another possible idea that's been rattling around in my head.
> It's purely theoretical at this point, so I don't know for sure that it
> would work.  But I haven't been able to find any major issues with it
> yet.
>
> DWARF is great for debuggers.  It helps you find all the registers on
> the stack, so you can see function arguments and local variables.  All
> expressed in a nice compact format.
>
> But that's overkill for unwinders.  We don't need all those registers,
> and the state machine is too complicated.  Unwinders basically only need
> to know one thing: given an instruction address and a stack pointer,
> where is the caller's stack frame?
>
> I'm thinking/hoping that information can be expressed in a simple, easy
> to parse, reasonably sized data structure.  Something like a sorted
> array of this:
>
> struct undwarf {
> unsigned int ip;/* instruction pointer (relative 
> offset from base) */
> unsigned prev_frame:13; /* offset to previous frame from 
> current stack pointer */
> unsigned regs:1;/* whether prev_frame contains entry 
> regs (regs->ip) */
> unsigned align:2;   /* some details for dealing with gcc 
> stack realignment */
> } __packed;
>
> extern struct undwarf undwarves[];

Some comments in case you're actually planning to do this:

'unsigned int ip' is the majority of the size of this thing.  It might
be worth trying to store a lot fewer bits.  You could split the
structure into:

struct undwarf_header {
  unsigned long start_ip;
  unsigned align:2;  /* i'm assuming this rarely changes */
  ...;
  unsigned int offset_to_details;
};

and

struct undwarf_details {
  unsigned short ip_offset;
  unsigned short prev_frame;
};

and you'd find the details by first looking up the last header before
the ip and then finding the details starting at (uintptr_t)header +
header->offset_to_details.

Also, don't you need some indication of which reg is the base from
which you find previous frame?  After all, sometimes GCC will emit a
frame pointer even in an otherwise frame-pointer-omitting kernel.

--Andy


Re: [PATCH 7/7] DWARF: add the config option

2017-05-08 Thread Andy Lutomirski
On Sun, May 7, 2017 at 9:55 AM, Josh Poimboeuf  wrote:
> On Fri, May 05, 2017 at 12:57:11PM -0700, Linus Torvalds wrote:
>> On Fri, May 5, 2017 at 5:22 AM, Jiri Slaby  wrote:
>> > The DWARF unwinder is in place and ready. So introduce the config option
>> > to allow users to enable it. It is by default off due to missing
>> > assembly annotations.
>>
>> Who actually ends up using this?
>>
>> Because from the last time we had fancy unwindoers, and all the
>> problems it caused for oops handling with absolutely _zero_ upsides
>> ever, I do not ever again want to see fancy unwinders with complex
>> state machine handling used by the oopsing code.
>>
>> The fact that it gets disabled for KASAN also makes me suspicious. It
>> basically means that now all the accesses it does are not even
>> validated.
>>
>> The fact that the most of the code seems to be disabled for the first
>> six patches, and then just enabled in the last patch, also seems to
>> mean that the series also gets no bisection coverage or testing that
>> the individual patches make any sense. (ie there's a lot of code
>> inside "CONFIG_DWARF_UNWIND" in the early patches but that config
>> option cannot even be enabled until the last patch).
>>
>> We used to have nasty issues with not just missing dwarf info, but
>> also actively *wrong* dwarf info. Compiler versions that generated
>> subtly wrong info, because nobody actually really depended on it, and
>> the people who had tested it seldom did the kinds of things we do in
>> the kernel (eg inline asms etc).
>>
>> So I'm personally still very suspicious of these things.
>>
>> Last time I had huge issues with people also always blaming *anything*
>> else than that unwinder. It was always "oh, somebody wrote asm without
>> getting it right". Or "oh, the compiler generated bad tables, it's not
>> *my* fault that now the kernel oopsing code no longer works".
>>
>> When I asked for more stricter debug table validation to avoid issues,
>> it was always "oh, we fixed it, no worries", and then two months later
>> somebody hit another issue.
>>
>> Put another way; the last time we did crazy stuff like this, it got
>> reverted. For a damn good reason, despite some people being in denial
>> about those reasons.
>
> Here's another possible idea that's been rattling around in my head.
> It's purely theoretical at this point, so I don't know for sure that it
> would work.  But I haven't been able to find any major issues with it
> yet.
>
> DWARF is great for debuggers.  It helps you find all the registers on
> the stack, so you can see function arguments and local variables.  All
> expressed in a nice compact format.
>
> But that's overkill for unwinders.  We don't need all those registers,
> and the state machine is too complicated.  Unwinders basically only need
> to know one thing: given an instruction address and a stack pointer,
> where is the caller's stack frame?
>
> I'm thinking/hoping that information can be expressed in a simple, easy
> to parse, reasonably sized data structure.  Something like a sorted
> array of this:
>
> struct undwarf {
> unsigned int ip;/* instruction pointer (relative 
> offset from base) */
> unsigned prev_frame:13; /* offset to previous frame from 
> current stack pointer */
> unsigned regs:1;/* whether prev_frame contains entry 
> regs (regs->ip) */
> unsigned align:2;   /* some details for dealing with gcc 
> stack realignment */
> } __packed;
>
> extern struct undwarf undwarves[];

Some comments in case you're actually planning to do this:

'unsigned int ip' is the majority of the size of this thing.  It might
be worth trying to store a lot fewer bits.  You could split the
structure into:

struct undwarf_header {
  unsigned long start_ip;
  unsigned align:2;  /* i'm assuming this rarely changes */
  ...;
  unsigned int offset_to_details;
};

and

struct undwarf_details {
  unsigned short ip_offset;
  unsigned short prev_frame;
};

and you'd find the details by first looking up the last header before
the ip and then finding the details starting at (uintptr_t)header +
header->offset_to_details.

Also, don't you need some indication of which reg is the base from
which you find previous frame?  After all, sometimes GCC will emit a
frame pointer even in an otherwise frame-pointer-omitting kernel.

--Andy


  1   2   >