Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-21 Thread Josh Poimboeuf
On Thu, Sep 21, 2017 at 05:35:11PM +0200, Ingo Molnar wrote:
> 
> * Josh Poimboeuf  wrote:
> 
> > On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> > > On 09/19/17 11:45, Josh Poimboeuf wrote:
> > > > For inline asm statements which have a CALL instruction, we list the
> > > > stack pointer as a constraint to convince GCC to ensure the frame
> > > > pointer is set up first:
> > > > 
> > > >   static inline void foo()
> > > >   {
> > > > register void *__sp asm(_ASM_SP);
> > > > asm("call bar" : "+r" (__sp))
> > > >   }
> > > > 
> > > > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > > > 
> > > > There's actually an easier way to achieve the same goal in GCC, without
> > > > causing trouble for clang.  If we declare the stack pointer register
> > > > variable as a global variable, and remove the constraint altogether,
> > > > that convinces GCC to always set up the frame pointer before inserting
> > > > *any* inline asm.
> > > > 
> > > > It basically acts as if *every* inline asm statement has a CALL
> > > > instruction.  It's a bit overkill, but the performance impact should be
> > > > negligible.
> > > > 
> > > 
> > > Again, probably negligible, but why do we need a frame pointer just
> > > because we have a call assembly instruction?
> > 
> > It's frame pointer convention.  Without it, if dumping the stack from
> > the called function, a function will get skipped in the stack trace.
> 
> BTW., could we perhaps relax this and simply phase out the frame pointer on 
> x86, 
> and simplify all our assembly in a cycle or two? ORC unwinder is working out 
> very 
> well so far. Live kernel patching can use ORC data just fine, and nothing 
> else 
> actually relies on frame pointers, right?
> 
> That would give one more register to assembly code.
> 
> I realize that we just rewrote a whole bunch of assembly code... but that was 
> the 
> price for ORC, in a way.

I think ORC isn't *quite* ready for livepatch yet, though it's close.
We could probably make it ready in a cycle or two.

However, I'm not sure whether we would want to remove livepatch support
for frame pointers yet:

- There might be some embedded livepatch users who can't deal with the
  extra 3MB of RAM needed by ORC.  (Though this is purely theoretical, I
  don't actually know anybody with this problem.  I suppose we could
  float the idea on the livepatch/kpatch mailing lists and see if
  anybody objects.)

- Objtool's ORC generation is working fine now, but objtool maintenance
  is currently a little heavy-handed, and I'm currently the only person
  who knows how to do it.  I've got an idea brewing for how to improve
  its maintainability with the help of compiler plugins, but until then,
  if I get hit by a bus tomorrow, who's going to pick it up?  It's nice
  to have frame pointers as a backup solution for live patching.

But also, is this a problem that's even worth fixing?  Do we have any
assembly code that would be noticeably better off with that extra
register?  Most of the changes were:

- adding FRAME_BEGIN/FRAME_END to some asm functions;
- juggling register usage in a few crypto functions; and
- adding the stack pointer constraint to 15 or so inline asm functions.

I think those changes weren't all that disruptive to start with.

-- 
Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-21 Thread Josh Poimboeuf
On Thu, Sep 21, 2017 at 05:35:11PM +0200, Ingo Molnar wrote:
> 
> * Josh Poimboeuf  wrote:
> 
> > On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> > > On 09/19/17 11:45, Josh Poimboeuf wrote:
> > > > For inline asm statements which have a CALL instruction, we list the
> > > > stack pointer as a constraint to convince GCC to ensure the frame
> > > > pointer is set up first:
> > > > 
> > > >   static inline void foo()
> > > >   {
> > > > register void *__sp asm(_ASM_SP);
> > > > asm("call bar" : "+r" (__sp))
> > > >   }
> > > > 
> > > > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > > > 
> > > > There's actually an easier way to achieve the same goal in GCC, without
> > > > causing trouble for clang.  If we declare the stack pointer register
> > > > variable as a global variable, and remove the constraint altogether,
> > > > that convinces GCC to always set up the frame pointer before inserting
> > > > *any* inline asm.
> > > > 
> > > > It basically acts as if *every* inline asm statement has a CALL
> > > > instruction.  It's a bit overkill, but the performance impact should be
> > > > negligible.
> > > > 
> > > 
> > > Again, probably negligible, but why do we need a frame pointer just
> > > because we have a call assembly instruction?
> > 
> > It's frame pointer convention.  Without it, if dumping the stack from
> > the called function, a function will get skipped in the stack trace.
> 
> BTW., could we perhaps relax this and simply phase out the frame pointer on 
> x86, 
> and simplify all our assembly in a cycle or two? ORC unwinder is working out 
> very 
> well so far. Live kernel patching can use ORC data just fine, and nothing 
> else 
> actually relies on frame pointers, right?
> 
> That would give one more register to assembly code.
> 
> I realize that we just rewrote a whole bunch of assembly code... but that was 
> the 
> price for ORC, in a way.

I think ORC isn't *quite* ready for livepatch yet, though it's close.
We could probably make it ready in a cycle or two.

However, I'm not sure whether we would want to remove livepatch support
for frame pointers yet:

- There might be some embedded livepatch users who can't deal with the
  extra 3MB of RAM needed by ORC.  (Though this is purely theoretical, I
  don't actually know anybody with this problem.  I suppose we could
  float the idea on the livepatch/kpatch mailing lists and see if
  anybody objects.)

- Objtool's ORC generation is working fine now, but objtool maintenance
  is currently a little heavy-handed, and I'm currently the only person
  who knows how to do it.  I've got an idea brewing for how to improve
  its maintainability with the help of compiler plugins, but until then,
  if I get hit by a bus tomorrow, who's going to pick it up?  It's nice
  to have frame pointers as a backup solution for live patching.

But also, is this a problem that's even worth fixing?  Do we have any
assembly code that would be noticeably better off with that extra
register?  Most of the changes were:

- adding FRAME_BEGIN/FRAME_END to some asm functions;
- juggling register usage in a few crypto functions; and
- adding the stack pointer constraint to 15 or so inline asm functions.

I think those changes weren't all that disruptive to start with.

-- 
Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-21 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> > On 09/19/17 11:45, Josh Poimboeuf wrote:
> > > For inline asm statements which have a CALL instruction, we list the
> > > stack pointer as a constraint to convince GCC to ensure the frame
> > > pointer is set up first:
> > > 
> > >   static inline void foo()
> > >   {
> > >   register void *__sp asm(_ASM_SP);
> > >   asm("call bar" : "+r" (__sp))
> > >   }
> > > 
> > > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > > 
> > > There's actually an easier way to achieve the same goal in GCC, without
> > > causing trouble for clang.  If we declare the stack pointer register
> > > variable as a global variable, and remove the constraint altogether,
> > > that convinces GCC to always set up the frame pointer before inserting
> > > *any* inline asm.
> > > 
> > > It basically acts as if *every* inline asm statement has a CALL
> > > instruction.  It's a bit overkill, but the performance impact should be
> > > negligible.
> > > 
> > 
> > Again, probably negligible, but why do we need a frame pointer just
> > because we have a call assembly instruction?
> 
> It's frame pointer convention.  Without it, if dumping the stack from
> the called function, a function will get skipped in the stack trace.

BTW., could we perhaps relax this and simply phase out the frame pointer on 
x86, 
and simplify all our assembly in a cycle or two? ORC unwinder is working out 
very 
well so far. Live kernel patching can use ORC data just fine, and nothing else 
actually relies on frame pointers, right?

That would give one more register to assembly code.

I realize that we just rewrote a whole bunch of assembly code... but that was 
the 
price for ORC, in a way.

Thanks,

Ingo


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-21 Thread Ingo Molnar

* Josh Poimboeuf  wrote:

> On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> > On 09/19/17 11:45, Josh Poimboeuf wrote:
> > > For inline asm statements which have a CALL instruction, we list the
> > > stack pointer as a constraint to convince GCC to ensure the frame
> > > pointer is set up first:
> > > 
> > >   static inline void foo()
> > >   {
> > >   register void *__sp asm(_ASM_SP);
> > >   asm("call bar" : "+r" (__sp))
> > >   }
> > > 
> > > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > > 
> > > There's actually an easier way to achieve the same goal in GCC, without
> > > causing trouble for clang.  If we declare the stack pointer register
> > > variable as a global variable, and remove the constraint altogether,
> > > that convinces GCC to always set up the frame pointer before inserting
> > > *any* inline asm.
> > > 
> > > It basically acts as if *every* inline asm statement has a CALL
> > > instruction.  It's a bit overkill, but the performance impact should be
> > > negligible.
> > > 
> > 
> > Again, probably negligible, but why do we need a frame pointer just
> > because we have a call assembly instruction?
> 
> It's frame pointer convention.  Without it, if dumping the stack from
> the called function, a function will get skipped in the stack trace.

BTW., could we perhaps relax this and simply phase out the frame pointer on 
x86, 
and simplify all our assembly in a cycle or two? ORC unwinder is working out 
very 
well so far. Live kernel patching can use ORC data just fine, and nothing else 
actually relies on frame pointers, right?

That would give one more register to assembly code.

I realize that we just rewrote a whole bunch of assembly code... but that was 
the 
price for ORC, in a way.

Thanks,

Ingo


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-21 Thread Dmitry Vyukov
On Thu, Sep 21, 2017 at 1:52 PM, Brian Gerst  wrote:
>>> I think we need just the frame itself and RSP pointing below this
>>> frame. If we don't have a frame, CALL instruction will smash whatever
>>> RSP happens to point to. Compiler doesn't have to setup RSP to point
>>> below used part of stack in leaf functions.
>>>
>>
>> In the kernel it does.  Redzoning is not allowed in the kernel, because
>> interrupts or exceptions would also smash the redzone.
>
> I see... But it's the same for user-space signals, the first thing a
> signal should do is to skip the redzone. I guess interrupt handlers
> should switch to interrupt stack which avoids smashing redzone
> altogether. Do you mean nested interrupts/exceptions in interrupts?
> In my experience frames in leaf functions can have pretty large
> performance penalty. Wonder if we have we considered changing
> interrupt/exception handlers to avoid smashing redzones and disable
> leaf frames?

 Currently, on x86-64, I believe all exceptions have their own dedicated
 stacks in the kernel, but IRQs still come in on the task's kernel stack.

 Andy, do you know if there's a reason why IRQs don't use a dedicated IST
 stack?

>>>
>>> Because IST is awful due to recursion issues.  We immediately switch to an 
>>> IRQ stack, though.
>>>
>>> If the kernel wanted a redzone, it would have to use IST for everything, 
>>> which would entail a bunch of unpleasant hackery.
>>
>> Thanks.
>>
>> I guess it must be finite recursion, because we could not handle
>> infinite with finite stack. I thing that solves it is simply:
>>
>> sub $256, %rsp
>> ... do stuff ...
>> add $256, %rsp
>>
>> Don't know if it's applicable to interrupts or not.
>
> No, it is not.  The processor pushes 5 or 6 words of data on the stack
> (the IRET frame plus an error code for certain exceptions) before the
> interrupt handler gets control.  So without using the IST for stack
> switching on every interrupt, the redzone cannot be used in the kernel
> as it will get smashed by the IRET frame.  In addition, since the
> kernel's stack is limited in size, skipping 128 bytes on every
> interrupt would overrun the stack faster.  The small gain from using
> the redzone in the kernel is outweighed by these limitations.

I see, thanks for educating.


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-21 Thread Dmitry Vyukov
On Thu, Sep 21, 2017 at 1:52 PM, Brian Gerst  wrote:
>>> I think we need just the frame itself and RSP pointing below this
>>> frame. If we don't have a frame, CALL instruction will smash whatever
>>> RSP happens to point to. Compiler doesn't have to setup RSP to point
>>> below used part of stack in leaf functions.
>>>
>>
>> In the kernel it does.  Redzoning is not allowed in the kernel, because
>> interrupts or exceptions would also smash the redzone.
>
> I see... But it's the same for user-space signals, the first thing a
> signal should do is to skip the redzone. I guess interrupt handlers
> should switch to interrupt stack which avoids smashing redzone
> altogether. Do you mean nested interrupts/exceptions in interrupts?
> In my experience frames in leaf functions can have pretty large
> performance penalty. Wonder if we have we considered changing
> interrupt/exception handlers to avoid smashing redzones and disable
> leaf frames?

 Currently, on x86-64, I believe all exceptions have their own dedicated
 stacks in the kernel, but IRQs still come in on the task's kernel stack.

 Andy, do you know if there's a reason why IRQs don't use a dedicated IST
 stack?

>>>
>>> Because IST is awful due to recursion issues.  We immediately switch to an 
>>> IRQ stack, though.
>>>
>>> If the kernel wanted a redzone, it would have to use IST for everything, 
>>> which would entail a bunch of unpleasant hackery.
>>
>> Thanks.
>>
>> I guess it must be finite recursion, because we could not handle
>> infinite with finite stack. I thing that solves it is simply:
>>
>> sub $256, %rsp
>> ... do stuff ...
>> add $256, %rsp
>>
>> Don't know if it's applicable to interrupts or not.
>
> No, it is not.  The processor pushes 5 or 6 words of data on the stack
> (the IRET frame plus an error code for certain exceptions) before the
> interrupt handler gets control.  So without using the IST for stack
> switching on every interrupt, the redzone cannot be used in the kernel
> as it will get smashed by the IRET frame.  In addition, since the
> kernel's stack is limited in size, skipping 128 bytes on every
> interrupt would overrun the stack faster.  The small gain from using
> the redzone in the kernel is outweighed by these limitations.

I see, thanks for educating.


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-21 Thread Brian Gerst
On Thu, Sep 21, 2017 at 4:12 AM, Dmitry Vyukov  wrote:
> On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirski  wrote:
 On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin  wrote:
>> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>
>> I think we need just the frame itself and RSP pointing below this
>> frame. If we don't have a frame, CALL instruction will smash whatever
>> RSP happens to point to. Compiler doesn't have to setup RSP to point
>> below used part of stack in leaf functions.
>>
>
> In the kernel it does.  Redzoning is not allowed in the kernel, because
> interrupts or exceptions would also smash the redzone.

 I see... But it's the same for user-space signals, the first thing a
 signal should do is to skip the redzone. I guess interrupt handlers
 should switch to interrupt stack which avoids smashing redzone
 altogether. Do you mean nested interrupts/exceptions in interrupts?
 In my experience frames in leaf functions can have pretty large
 performance penalty. Wonder if we have we considered changing
 interrupt/exception handlers to avoid smashing redzones and disable
 leaf frames?
>>>
>>> Currently, on x86-64, I believe all exceptions have their own dedicated
>>> stacks in the kernel, but IRQs still come in on the task's kernel stack.
>>>
>>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
>>> stack?
>>>
>>
>> Because IST is awful due to recursion issues.  We immediately switch to an 
>> IRQ stack, though.
>>
>> If the kernel wanted a redzone, it would have to use IST for everything, 
>> which would entail a bunch of unpleasant hackery.
>
> Thanks.
>
> I guess it must be finite recursion, because we could not handle
> infinite with finite stack. I thing that solves it is simply:
>
> sub $256, %rsp
> ... do stuff ...
> add $256, %rsp
>
> Don't know if it's applicable to interrupts or not.

No, it is not.  The processor pushes 5 or 6 words of data on the stack
(the IRET frame plus an error code for certain exceptions) before the
interrupt handler gets control.  So without using the IST for stack
switching on every interrupt, the redzone cannot be used in the kernel
as it will get smashed by the IRET frame.  In addition, since the
kernel's stack is limited in size, skipping 128 bytes on every
interrupt would overrun the stack faster.  The small gain from using
the redzone in the kernel is outweighed by these limitations.

--
Brian Gerst


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-21 Thread Brian Gerst
On Thu, Sep 21, 2017 at 4:12 AM, Dmitry Vyukov  wrote:
> On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirski  wrote:
 On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin  wrote:
>> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>
>> I think we need just the frame itself and RSP pointing below this
>> frame. If we don't have a frame, CALL instruction will smash whatever
>> RSP happens to point to. Compiler doesn't have to setup RSP to point
>> below used part of stack in leaf functions.
>>
>
> In the kernel it does.  Redzoning is not allowed in the kernel, because
> interrupts or exceptions would also smash the redzone.

 I see... But it's the same for user-space signals, the first thing a
 signal should do is to skip the redzone. I guess interrupt handlers
 should switch to interrupt stack which avoids smashing redzone
 altogether. Do you mean nested interrupts/exceptions in interrupts?
 In my experience frames in leaf functions can have pretty large
 performance penalty. Wonder if we have we considered changing
 interrupt/exception handlers to avoid smashing redzones and disable
 leaf frames?
>>>
>>> Currently, on x86-64, I believe all exceptions have their own dedicated
>>> stacks in the kernel, but IRQs still come in on the task's kernel stack.
>>>
>>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
>>> stack?
>>>
>>
>> Because IST is awful due to recursion issues.  We immediately switch to an 
>> IRQ stack, though.
>>
>> If the kernel wanted a redzone, it would have to use IST for everything, 
>> which would entail a bunch of unpleasant hackery.
>
> Thanks.
>
> I guess it must be finite recursion, because we could not handle
> infinite with finite stack. I thing that solves it is simply:
>
> sub $256, %rsp
> ... do stuff ...
> add $256, %rsp
>
> Don't know if it's applicable to interrupts or not.

No, it is not.  The processor pushes 5 or 6 words of data on the stack
(the IRET frame plus an error code for certain exceptions) before the
interrupt handler gets control.  So without using the IST for stack
switching on every interrupt, the redzone cannot be used in the kernel
as it will get smashed by the IRET frame.  In addition, since the
kernel's stack is limited in size, skipping 128 bytes on every
interrupt would overrun the stack faster.  The small gain from using
the redzone in the kernel is outweighed by these limitations.

--
Brian Gerst


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-21 Thread Dmitry Vyukov
On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirski  wrote:
>>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
 On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin  wrote:
> On 09/20/17 10:38, Dmitry Vyukov wrote:
>
> I think we need just the frame itself and RSP pointing below this
> frame. If we don't have a frame, CALL instruction will smash whatever
> RSP happens to point to. Compiler doesn't have to setup RSP to point
> below used part of stack in leaf functions.
>

 In the kernel it does.  Redzoning is not allowed in the kernel, because
 interrupts or exceptions would also smash the redzone.
>>>
>>> I see... But it's the same for user-space signals, the first thing a
>>> signal should do is to skip the redzone. I guess interrupt handlers
>>> should switch to interrupt stack which avoids smashing redzone
>>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>>> In my experience frames in leaf functions can have pretty large
>>> performance penalty. Wonder if we have we considered changing
>>> interrupt/exception handlers to avoid smashing redzones and disable
>>> leaf frames?
>>
>> Currently, on x86-64, I believe all exceptions have their own dedicated
>> stacks in the kernel, but IRQs still come in on the task's kernel stack.
>>
>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
>> stack?
>>
>
> Because IST is awful due to recursion issues.  We immediately switch to an 
> IRQ stack, though.
>
> If the kernel wanted a redzone, it would have to use IST for everything, 
> which would entail a bunch of unpleasant hackery.

Thanks.

I guess it must be finite recursion, because we could not handle
infinite with finite stack. I thing that solves it is simply:

sub $256, %rsp
... do stuff ...
add $256, %rsp

Don't know if it's applicable to interrupts or not.


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-21 Thread Dmitry Vyukov
On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirski  wrote:
>>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
 On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin  wrote:
> On 09/20/17 10:38, Dmitry Vyukov wrote:
>
> I think we need just the frame itself and RSP pointing below this
> frame. If we don't have a frame, CALL instruction will smash whatever
> RSP happens to point to. Compiler doesn't have to setup RSP to point
> below used part of stack in leaf functions.
>

 In the kernel it does.  Redzoning is not allowed in the kernel, because
 interrupts or exceptions would also smash the redzone.
>>>
>>> I see... But it's the same for user-space signals, the first thing a
>>> signal should do is to skip the redzone. I guess interrupt handlers
>>> should switch to interrupt stack which avoids smashing redzone
>>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>>> In my experience frames in leaf functions can have pretty large
>>> performance penalty. Wonder if we have we considered changing
>>> interrupt/exception handlers to avoid smashing redzones and disable
>>> leaf frames?
>>
>> Currently, on x86-64, I believe all exceptions have their own dedicated
>> stacks in the kernel, but IRQs still come in on the task's kernel stack.
>>
>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
>> stack?
>>
>
> Because IST is awful due to recursion issues.  We immediately switch to an 
> IRQ stack, though.
>
> If the kernel wanted a redzone, it would have to use IST for everything, 
> which would entail a bunch of unpleasant hackery.

Thanks.

I guess it must be finite recursion, because we could not handle
infinite with finite stack. I thing that solves it is simply:

sub $256, %rsp
... do stuff ...
add $256, %rsp

Don't know if it's applicable to interrupts or not.


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Andy Lutomirski


> On Sep 20, 2017, at 2:07 PM, Josh Poimboeuf  wrote:
> 
>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
>>> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin  wrote:
 On 09/20/17 10:38, Dmitry Vyukov wrote:
 
 I think we need just the frame itself and RSP pointing below this
 frame. If we don't have a frame, CALL instruction will smash whatever
 RSP happens to point to. Compiler doesn't have to setup RSP to point
 below used part of stack in leaf functions.
 
>>> 
>>> In the kernel it does.  Redzoning is not allowed in the kernel, because
>>> interrupts or exceptions would also smash the redzone.
>> 
>> I see... But it's the same for user-space signals, the first thing a
>> signal should do is to skip the redzone. I guess interrupt handlers
>> should switch to interrupt stack which avoids smashing redzone
>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>> In my experience frames in leaf functions can have pretty large
>> performance penalty. Wonder if we have we considered changing
>> interrupt/exception handlers to avoid smashing redzones and disable
>> leaf frames?
> 
> Currently, on x86-64, I believe all exceptions have their own dedicated
> stacks in the kernel, but IRQs still come in on the task's kernel stack.
> 
> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
> stack?
> 

Because IST is awful due to recursion issues.  We immediately switch to an IRQ 
stack, though.

If the kernel wanted a redzone, it would have to use IST for everything, which 
would entail a bunch of unpleasant hackery.

> -- 
> Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Andy Lutomirski


> On Sep 20, 2017, at 2:07 PM, Josh Poimboeuf  wrote:
> 
>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
>>> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin  wrote:
 On 09/20/17 10:38, Dmitry Vyukov wrote:
 
 I think we need just the frame itself and RSP pointing below this
 frame. If we don't have a frame, CALL instruction will smash whatever
 RSP happens to point to. Compiler doesn't have to setup RSP to point
 below used part of stack in leaf functions.
 
>>> 
>>> In the kernel it does.  Redzoning is not allowed in the kernel, because
>>> interrupts or exceptions would also smash the redzone.
>> 
>> I see... But it's the same for user-space signals, the first thing a
>> signal should do is to skip the redzone. I guess interrupt handlers
>> should switch to interrupt stack which avoids smashing redzone
>> altogether. Do you mean nested interrupts/exceptions in interrupts?
>> In my experience frames in leaf functions can have pretty large
>> performance penalty. Wonder if we have we considered changing
>> interrupt/exception handlers to avoid smashing redzones and disable
>> leaf frames?
> 
> Currently, on x86-64, I believe all exceptions have their own dedicated
> stacks in the kernel, but IRQs still come in on the task's kernel stack.
> 
> Andy, do you know if there's a reason why IRQs don't use a dedicated IST
> stack?
> 

Because IST is awful due to recursion issues.  We immediately switch to an IRQ 
stack, though.

If the kernel wanted a redzone, it would have to use IST for everything, which 
would entail a bunch of unpleasant hackery.

> -- 
> Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Josh Poimboeuf
On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin  wrote:
> > On 09/20/17 10:38, Dmitry Vyukov wrote:
> >>
> >> I think we need just the frame itself and RSP pointing below this
> >> frame. If we don't have a frame, CALL instruction will smash whatever
> >> RSP happens to point to. Compiler doesn't have to setup RSP to point
> >> below used part of stack in leaf functions.
> >>
> >
> > In the kernel it does.  Redzoning is not allowed in the kernel, because
> > interrupts or exceptions would also smash the redzone.
> 
> I see... But it's the same for user-space signals, the first thing a
> signal should do is to skip the redzone. I guess interrupt handlers
> should switch to interrupt stack which avoids smashing redzone
> altogether. Do you mean nested interrupts/exceptions in interrupts?
> In my experience frames in leaf functions can have pretty large
> performance penalty. Wonder if we have we considered changing
> interrupt/exception handlers to avoid smashing redzones and disable
> leaf frames?

Currently, on x86-64, I believe all exceptions have their own dedicated
stacks in the kernel, but IRQs still come in on the task's kernel stack.

Andy, do you know if there's a reason why IRQs don't use a dedicated IST
stack?

-- 
Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Josh Poimboeuf
On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote:
> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin  wrote:
> > On 09/20/17 10:38, Dmitry Vyukov wrote:
> >>
> >> I think we need just the frame itself and RSP pointing below this
> >> frame. If we don't have a frame, CALL instruction will smash whatever
> >> RSP happens to point to. Compiler doesn't have to setup RSP to point
> >> below used part of stack in leaf functions.
> >>
> >
> > In the kernel it does.  Redzoning is not allowed in the kernel, because
> > interrupts or exceptions would also smash the redzone.
> 
> I see... But it's the same for user-space signals, the first thing a
> signal should do is to skip the redzone. I guess interrupt handlers
> should switch to interrupt stack which avoids smashing redzone
> altogether. Do you mean nested interrupts/exceptions in interrupts?
> In my experience frames in leaf functions can have pretty large
> performance penalty. Wonder if we have we considered changing
> interrupt/exception handlers to avoid smashing redzones and disable
> leaf frames?

Currently, on x86-64, I believe all exceptions have their own dedicated
stacks in the kernel, but IRQs still come in on the task's kernel stack.

Andy, do you know if there's a reason why IRQs don't use a dedicated IST
stack?

-- 
Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Dmitry Vyukov
On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin  wrote:
> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>
>> I think we need just the frame itself and RSP pointing below this
>> frame. If we don't have a frame, CALL instruction will smash whatever
>> RSP happens to point to. Compiler doesn't have to setup RSP to point
>> below used part of stack in leaf functions.
>>
>
> In the kernel it does.  Redzoning is not allowed in the kernel, because
> interrupts or exceptions would also smash the redzone.

I see... But it's the same for user-space signals, the first thing a
signal should do is to skip the redzone. I guess interrupt handlers
should switch to interrupt stack which avoids smashing redzone
altogether. Do you mean nested interrupts/exceptions in interrupts?
In my experience frames in leaf functions can have pretty large
performance penalty. Wonder if we have we considered changing
interrupt/exception handlers to avoid smashing redzones and disable
leaf frames?


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Dmitry Vyukov
On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin  wrote:
> On 09/20/17 10:38, Dmitry Vyukov wrote:
>>
>> I think we need just the frame itself and RSP pointing below this
>> frame. If we don't have a frame, CALL instruction will smash whatever
>> RSP happens to point to. Compiler doesn't have to setup RSP to point
>> below used part of stack in leaf functions.
>>
>
> In the kernel it does.  Redzoning is not allowed in the kernel, because
> interrupts or exceptions would also smash the redzone.

I see... But it's the same for user-space signals, the first thing a
signal should do is to skip the redzone. I guess interrupt handlers
should switch to interrupt stack which avoids smashing redzone
altogether. Do you mean nested interrupts/exceptions in interrupts?
In my experience frames in leaf functions can have pretty large
performance penalty. Wonder if we have we considered changing
interrupt/exception handlers to avoid smashing redzones and disable
leaf frames?


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread H. Peter Anvin
On 09/20/17 10:38, Dmitry Vyukov wrote:
> 
> I think we need just the frame itself and RSP pointing below this
> frame. If we don't have a frame, CALL instruction will smash whatever
> RSP happens to point to. Compiler doesn't have to setup RSP to point
> below used part of stack in leaf functions.
> 

In the kernel it does.  Redzoning is not allowed in the kernel, because
interrupts or exceptions would also smash the redzone.

-hpa



Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread H. Peter Anvin
On 09/20/17 10:38, Dmitry Vyukov wrote:
> 
> I think we need just the frame itself and RSP pointing below this
> frame. If we don't have a frame, CALL instruction will smash whatever
> RSP happens to point to. Compiler doesn't have to setup RSP to point
> below used part of stack in leaf functions.
> 

In the kernel it does.  Redzoning is not allowed in the kernel, because
interrupts or exceptions would also smash the redzone.

-hpa



Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Josh Poimboeuf
On Tue, Sep 19, 2017 at 08:18:23PM -0500, Josh Poimboeuf wrote:
> On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote:
> > For inline asm statements which have a CALL instruction, we list the
> > stack pointer as a constraint to convince GCC to ensure the frame
> > pointer is set up first:
> > 
> >   static inline void foo()
> >   {
> > register void *__sp asm(_ASM_SP);
> > asm("call bar" : "+r" (__sp))
> >   }
> > 
> > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > 
> > There's actually an easier way to achieve the same goal in GCC, without
> > causing trouble for clang.  If we declare the stack pointer register
> > variable as a global variable, and remove the constraint altogether,
> > that convinces GCC to always set up the frame pointer before inserting
> > *any* inline asm.
> > 
> > It basically acts as if *every* inline asm statement has a CALL
> > instruction.  It's a bit overkill, but the performance impact should be
> > negligible.
> > 
> > Here are the vmlinux .text size differences with the following configs
> > on GCC:
> > 
> > - defconfig
> > - defconfig without frame pointers
> > - Fedora distro config
> > - Fedora distro config without frame pointers
> > 
> > defconfig   defconfig-nofp  distro  distro-nofp
> > before  9796300 9466764 9076191 8789745
> > after   9796941 9462859 9076381 8785325
> > 
> > With frame pointers, the text size increases slightly.  Without frame
> > pointers, the text size decreases, and a little more significantly.
> > 
> > Reported-by: Matthias Kaehlcke 
> > Signed-off-by: Josh Poimboeuf 
> 
> NAK - kbuild bot is reporting some cases where this patch doesn't force
> the frame pointer setup.

So it turns out that for GCC 7, it works as described above: the global
register variable results in the frame pointer getting set up before
*all* inline asm.

But for GCC 6, it doesn't work that way.  The global register variable
has no such effect.

So we need both the global register variable *and* the output constraint
after all.  Will post another patch after some more testing.

-- 
Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Josh Poimboeuf
On Tue, Sep 19, 2017 at 08:18:23PM -0500, Josh Poimboeuf wrote:
> On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote:
> > For inline asm statements which have a CALL instruction, we list the
> > stack pointer as a constraint to convince GCC to ensure the frame
> > pointer is set up first:
> > 
> >   static inline void foo()
> >   {
> > register void *__sp asm(_ASM_SP);
> > asm("call bar" : "+r" (__sp))
> >   }
> > 
> > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > 
> > There's actually an easier way to achieve the same goal in GCC, without
> > causing trouble for clang.  If we declare the stack pointer register
> > variable as a global variable, and remove the constraint altogether,
> > that convinces GCC to always set up the frame pointer before inserting
> > *any* inline asm.
> > 
> > It basically acts as if *every* inline asm statement has a CALL
> > instruction.  It's a bit overkill, but the performance impact should be
> > negligible.
> > 
> > Here are the vmlinux .text size differences with the following configs
> > on GCC:
> > 
> > - defconfig
> > - defconfig without frame pointers
> > - Fedora distro config
> > - Fedora distro config without frame pointers
> > 
> > defconfig   defconfig-nofp  distro  distro-nofp
> > before  9796300 9466764 9076191 8789745
> > after   9796941 9462859 9076381 8785325
> > 
> > With frame pointers, the text size increases slightly.  Without frame
> > pointers, the text size decreases, and a little more significantly.
> > 
> > Reported-by: Matthias Kaehlcke 
> > Signed-off-by: Josh Poimboeuf 
> 
> NAK - kbuild bot is reporting some cases where this patch doesn't force
> the frame pointer setup.

So it turns out that for GCC 7, it works as described above: the global
register variable results in the frame pointer getting set up before
*all* inline asm.

But for GCC 6, it doesn't work that way.  The global register variable
has no such effect.

So we need both the global register variable *and* the output constraint
after all.  Will post another patch after some more testing.

-- 
Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Josh Poimboeuf
On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> On 09/19/17 11:45, Josh Poimboeuf wrote:
> > For inline asm statements which have a CALL instruction, we list the
> > stack pointer as a constraint to convince GCC to ensure the frame
> > pointer is set up first:
> > 
> >   static inline void foo()
> >   {
> > register void *__sp asm(_ASM_SP);
> > asm("call bar" : "+r" (__sp))
> >   }
> > 
> > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > 
> > There's actually an easier way to achieve the same goal in GCC, without
> > causing trouble for clang.  If we declare the stack pointer register
> > variable as a global variable, and remove the constraint altogether,
> > that convinces GCC to always set up the frame pointer before inserting
> > *any* inline asm.
> > 
> > It basically acts as if *every* inline asm statement has a CALL
> > instruction.  It's a bit overkill, but the performance impact should be
> > negligible.
> > 
> 
> Again, probably negligible, but why do we need a frame pointer just
> because we have a call assembly instruction?

It's frame pointer convention.  Without it, if dumping the stack from
the called function, a function will get skipped in the stack trace.

-- 
Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Josh Poimboeuf
On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote:
> On 09/19/17 11:45, Josh Poimboeuf wrote:
> > For inline asm statements which have a CALL instruction, we list the
> > stack pointer as a constraint to convince GCC to ensure the frame
> > pointer is set up first:
> > 
> >   static inline void foo()
> >   {
> > register void *__sp asm(_ASM_SP);
> > asm("call bar" : "+r" (__sp))
> >   }
> > 
> > Unfortunately, that pattern causes clang to corrupt the stack pointer.
> > 
> > There's actually an easier way to achieve the same goal in GCC, without
> > causing trouble for clang.  If we declare the stack pointer register
> > variable as a global variable, and remove the constraint altogether,
> > that convinces GCC to always set up the frame pointer before inserting
> > *any* inline asm.
> > 
> > It basically acts as if *every* inline asm statement has a CALL
> > instruction.  It's a bit overkill, but the performance impact should be
> > negligible.
> > 
> 
> Again, probably negligible, but why do we need a frame pointer just
> because we have a call assembly instruction?

It's frame pointer convention.  Without it, if dumping the stack from
the called function, a function will get skipped in the stack trace.

-- 
Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread H. Peter Anvin
On 09/19/17 11:45, Josh Poimboeuf wrote:
> For inline asm statements which have a CALL instruction, we list the
> stack pointer as a constraint to convince GCC to ensure the frame
> pointer is set up first:
> 
>   static inline void foo()
>   {
>   register void *__sp asm(_ASM_SP);
>   asm("call bar" : "+r" (__sp))
>   }
> 
> Unfortunately, that pattern causes clang to corrupt the stack pointer.
> 
> There's actually an easier way to achieve the same goal in GCC, without
> causing trouble for clang.  If we declare the stack pointer register
> variable as a global variable, and remove the constraint altogether,
> that convinces GCC to always set up the frame pointer before inserting
> *any* inline asm.
> 
> It basically acts as if *every* inline asm statement has a CALL
> instruction.  It's a bit overkill, but the performance impact should be
> negligible.
> 

Again, probably negligible, but why do we need a frame pointer just
because we have a call assembly instruction?

-hpa




Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread H. Peter Anvin
On 09/19/17 11:45, Josh Poimboeuf wrote:
> For inline asm statements which have a CALL instruction, we list the
> stack pointer as a constraint to convince GCC to ensure the frame
> pointer is set up first:
> 
>   static inline void foo()
>   {
>   register void *__sp asm(_ASM_SP);
>   asm("call bar" : "+r" (__sp))
>   }
> 
> Unfortunately, that pattern causes clang to corrupt the stack pointer.
> 
> There's actually an easier way to achieve the same goal in GCC, without
> causing trouble for clang.  If we declare the stack pointer register
> variable as a global variable, and remove the constraint altogether,
> that convinces GCC to always set up the frame pointer before inserting
> *any* inline asm.
> 
> It basically acts as if *every* inline asm statement has a CALL
> instruction.  It's a bit overkill, but the performance impact should be
> negligible.
> 

Again, probably negligible, but why do we need a frame pointer just
because we have a call assembly instruction?

-hpa




Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Dmitry Vyukov
On Wed, Sep 20, 2017 at 7:32 PM, H. Peter Anvin  wrote:
> On 09/19/17 11:45, Josh Poimboeuf wrote:
>> For inline asm statements which have a CALL instruction, we list the
>> stack pointer as a constraint to convince GCC to ensure the frame
>> pointer is set up first:
>>
>>   static inline void foo()
>>   {
>>   register void *__sp asm(_ASM_SP);
>>   asm("call bar" : "+r" (__sp))
>>   }
>>
>> Unfortunately, that pattern causes clang to corrupt the stack pointer.
>>
>> There's actually an easier way to achieve the same goal in GCC, without
>> causing trouble for clang.  If we declare the stack pointer register
>> variable as a global variable, and remove the constraint altogether,
>> that convinces GCC to always set up the frame pointer before inserting
>> *any* inline asm.
>>
>> It basically acts as if *every* inline asm statement has a CALL
>> instruction.  It's a bit overkill, but the performance impact should be
>> negligible.
>>
>
> Again, probably negligible, but why do we need a frame pointer just
> because we have a call assembly instruction?

I think we need just the frame itself and RSP pointing below this
frame. If we don't have a frame, CALL instruction will smash whatever
RSP happens to point to. Compiler doesn't have to setup RSP to point
below used part of stack in leaf functions.


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-20 Thread Dmitry Vyukov
On Wed, Sep 20, 2017 at 7:32 PM, H. Peter Anvin  wrote:
> On 09/19/17 11:45, Josh Poimboeuf wrote:
>> For inline asm statements which have a CALL instruction, we list the
>> stack pointer as a constraint to convince GCC to ensure the frame
>> pointer is set up first:
>>
>>   static inline void foo()
>>   {
>>   register void *__sp asm(_ASM_SP);
>>   asm("call bar" : "+r" (__sp))
>>   }
>>
>> Unfortunately, that pattern causes clang to corrupt the stack pointer.
>>
>> There's actually an easier way to achieve the same goal in GCC, without
>> causing trouble for clang.  If we declare the stack pointer register
>> variable as a global variable, and remove the constraint altogether,
>> that convinces GCC to always set up the frame pointer before inserting
>> *any* inline asm.
>>
>> It basically acts as if *every* inline asm statement has a CALL
>> instruction.  It's a bit overkill, but the performance impact should be
>> negligible.
>>
>
> Again, probably negligible, but why do we need a frame pointer just
> because we have a call assembly instruction?

I think we need just the frame itself and RSP pointing below this
frame. If we don't have a frame, CALL instruction will smash whatever
RSP happens to point to. Compiler doesn't have to setup RSP to point
below used part of stack in leaf functions.


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-19 Thread Josh Poimboeuf
On Tue, Sep 19, 2017 at 03:25:59PM -0700, Alexander Potapenko wrote:
> On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko  
> wrote:
> > On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf  
> > wrote:
> >> For inline asm statements which have a CALL instruction, we list the
> >> stack pointer as a constraint to convince GCC to ensure the frame
> >> pointer is set up first:
> >>
> >>   static inline void foo()
> >>   {
> >> register void *__sp asm(_ASM_SP);
> >> asm("call bar" : "+r" (__sp))
> >>   }
> >>
> >> Unfortunately, that pattern causes clang to corrupt the stack pointer.
> >>
> >> There's actually an easier way to achieve the same goal in GCC, without
> >> causing trouble for clang.  If we declare the stack pointer register
> >> variable as a global variable, and remove the constraint altogether,
> >> that convinces GCC to always set up the frame pointer before inserting
> >> *any* inline asm.
> >>
> >> It basically acts as if *every* inline asm statement has a CALL
> >> instruction.  It's a bit overkill, but the performance impact should be
> >> negligible.
> >>
> >> Here are the vmlinux .text size differences with the following configs
> >> on GCC:
> >>
> >> - defconfig
> >> - defconfig without frame pointers
> >> - Fedora distro config
> >> - Fedora distro config without frame pointers
> >>
> >> defconfig   defconfig-nofp  distro  distro-nofp
> >> before  9796300 9466764 9076191 8789745
> >> after   9796941 9462859 9076381 8785325
> >>
> >> With frame pointers, the text size increases slightly.  Without frame
> >> pointers, the text size decreases, and a little more significantly.
> >>
> >> Reported-by: Matthias Kaehlcke 
> >> Signed-off-by: Josh Poimboeuf 
> >> ---
> >>  arch/x86/include/asm/alternative.h   |  3 +--
> >>  arch/x86/include/asm/asm.h   |  9 
> >>  arch/x86/include/asm/mshyperv.h  | 27 
> >> ++--
> >>  arch/x86/include/asm/paravirt_types.h| 14 ++--
> >>  arch/x86/include/asm/preempt.h   | 15 +
> >>  arch/x86/include/asm/processor.h |  6 ++
> >>  arch/x86/include/asm/rwsem.h |  6 +++---
> >>  arch/x86/include/asm/uaccess.h   |  5 ++---
> >>  arch/x86/include/asm/xen/hypercall.h |  5 ++---
> >>  arch/x86/kvm/emulate.c   |  3 +--
> >>  arch/x86/kvm/vmx.c   |  4 +---
> >>  arch/x86/mm/fault.c  |  3 +--
> >>  tools/objtool/Documentation/stack-validation.txt | 20 +-
> >>  13 files changed, 60 insertions(+), 60 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/alternative.h 
> >> b/arch/x86/include/asm/alternative.h
> >> index 1b020381ab38..7aeb1cef4204 100644
> >> --- a/arch/x86/include/asm/alternative.h
> >> +++ b/arch/x86/include/asm/alternative.h
> >> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void 
> >> *start, void *end)
> >>  #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, 
> >> feature2,   \
> >>output, input...)   
> >>\
> >>  { 
> >>\
> >> -   register void *__sp asm(_ASM_SP);  
> >>\
> >> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", 
> >> feature1,\
> >> "call %P[new2]", feature2) 
> >>\
> >> -   : output, "+r" (__sp)  
> >>\
> >> +   : output   
> >>\
> >> : [old] "i" (oldfunc), [new1] "i" (newfunc1),  
> >>\
> >>   [new2] "i" (newfunc2), ## input);
> >>\
> >>  }
> >> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
> >> index 676ee5807d86..ff8921d4615b 100644
> >> --- a/arch/x86/include/asm/asm.h
> >> +++ b/arch/x86/include/asm/asm.h
> >> @@ -132,4 +132,13 @@
> >>  /* For C file, we already have NOKPROBE_SYMBOL macro */
> >>  #endif
> >>
> >> +#ifndef __ASSEMBLY__
> >> +/*
> >> + * This variable declaration has the side effect of forcing GCC to always 
> >> set
> >> + * up the frame pointer before inserting any inline asm.  This is 
> >> necessary
> >> + * because some inline asm statements have CALL instructions.
> >> + */
> >> +register unsigned int __sp_unused asm("esp");

> > Shouldn't this be "asm(_ASM_SP)"?

> Answering my own question, this can't be _ASM_SP because of the
> realmode code, which is built with -m16 whereas _ASM_SP is "rsp".
> The patch works fine with "esp" though - most certainly declaring a
> ESP variable 

Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-19 Thread Josh Poimboeuf
On Tue, Sep 19, 2017 at 03:25:59PM -0700, Alexander Potapenko wrote:
> On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko  
> wrote:
> > On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf  
> > wrote:
> >> For inline asm statements which have a CALL instruction, we list the
> >> stack pointer as a constraint to convince GCC to ensure the frame
> >> pointer is set up first:
> >>
> >>   static inline void foo()
> >>   {
> >> register void *__sp asm(_ASM_SP);
> >> asm("call bar" : "+r" (__sp))
> >>   }
> >>
> >> Unfortunately, that pattern causes clang to corrupt the stack pointer.
> >>
> >> There's actually an easier way to achieve the same goal in GCC, without
> >> causing trouble for clang.  If we declare the stack pointer register
> >> variable as a global variable, and remove the constraint altogether,
> >> that convinces GCC to always set up the frame pointer before inserting
> >> *any* inline asm.
> >>
> >> It basically acts as if *every* inline asm statement has a CALL
> >> instruction.  It's a bit overkill, but the performance impact should be
> >> negligible.
> >>
> >> Here are the vmlinux .text size differences with the following configs
> >> on GCC:
> >>
> >> - defconfig
> >> - defconfig without frame pointers
> >> - Fedora distro config
> >> - Fedora distro config without frame pointers
> >>
> >> defconfig   defconfig-nofp  distro  distro-nofp
> >> before  9796300 9466764 9076191 8789745
> >> after   9796941 9462859 9076381 8785325
> >>
> >> With frame pointers, the text size increases slightly.  Without frame
> >> pointers, the text size decreases, and a little more significantly.
> >>
> >> Reported-by: Matthias Kaehlcke 
> >> Signed-off-by: Josh Poimboeuf 
> >> ---
> >>  arch/x86/include/asm/alternative.h   |  3 +--
> >>  arch/x86/include/asm/asm.h   |  9 
> >>  arch/x86/include/asm/mshyperv.h  | 27 
> >> ++--
> >>  arch/x86/include/asm/paravirt_types.h| 14 ++--
> >>  arch/x86/include/asm/preempt.h   | 15 +
> >>  arch/x86/include/asm/processor.h |  6 ++
> >>  arch/x86/include/asm/rwsem.h |  6 +++---
> >>  arch/x86/include/asm/uaccess.h   |  5 ++---
> >>  arch/x86/include/asm/xen/hypercall.h |  5 ++---
> >>  arch/x86/kvm/emulate.c   |  3 +--
> >>  arch/x86/kvm/vmx.c   |  4 +---
> >>  arch/x86/mm/fault.c  |  3 +--
> >>  tools/objtool/Documentation/stack-validation.txt | 20 +-
> >>  13 files changed, 60 insertions(+), 60 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/alternative.h 
> >> b/arch/x86/include/asm/alternative.h
> >> index 1b020381ab38..7aeb1cef4204 100644
> >> --- a/arch/x86/include/asm/alternative.h
> >> +++ b/arch/x86/include/asm/alternative.h
> >> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void 
> >> *start, void *end)
> >>  #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, 
> >> feature2,   \
> >>output, input...)   
> >>\
> >>  { 
> >>\
> >> -   register void *__sp asm(_ASM_SP);  
> >>\
> >> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", 
> >> feature1,\
> >> "call %P[new2]", feature2) 
> >>\
> >> -   : output, "+r" (__sp)  
> >>\
> >> +   : output   
> >>\
> >> : [old] "i" (oldfunc), [new1] "i" (newfunc1),  
> >>\
> >>   [new2] "i" (newfunc2), ## input);
> >>\
> >>  }
> >> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
> >> index 676ee5807d86..ff8921d4615b 100644
> >> --- a/arch/x86/include/asm/asm.h
> >> +++ b/arch/x86/include/asm/asm.h
> >> @@ -132,4 +132,13 @@
> >>  /* For C file, we already have NOKPROBE_SYMBOL macro */
> >>  #endif
> >>
> >> +#ifndef __ASSEMBLY__
> >> +/*
> >> + * This variable declaration has the side effect of forcing GCC to always 
> >> set
> >> + * up the frame pointer before inserting any inline asm.  This is 
> >> necessary
> >> + * because some inline asm statements have CALL instructions.
> >> + */
> >> +register unsigned int __sp_unused asm("esp");

> > Shouldn't this be "asm(_ASM_SP)"?

> Answering my own question, this can't be _ASM_SP because of the
> realmode code, which is built with -m16 whereas _ASM_SP is "rsp".
> The patch works fine with "esp" though - most certainly declaring a
> ESP variable is enough to reserve the whole RSP in an x86_64 build.

Right, clang failing to 

Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-19 Thread Josh Poimboeuf
On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote:
> For inline asm statements which have a CALL instruction, we list the
> stack pointer as a constraint to convince GCC to ensure the frame
> pointer is set up first:
> 
>   static inline void foo()
>   {
>   register void *__sp asm(_ASM_SP);
>   asm("call bar" : "+r" (__sp))
>   }
> 
> Unfortunately, that pattern causes clang to corrupt the stack pointer.
> 
> There's actually an easier way to achieve the same goal in GCC, without
> causing trouble for clang.  If we declare the stack pointer register
> variable as a global variable, and remove the constraint altogether,
> that convinces GCC to always set up the frame pointer before inserting
> *any* inline asm.
> 
> It basically acts as if *every* inline asm statement has a CALL
> instruction.  It's a bit overkill, but the performance impact should be
> negligible.
> 
> Here are the vmlinux .text size differences with the following configs
> on GCC:
> 
> - defconfig
> - defconfig without frame pointers
> - Fedora distro config
> - Fedora distro config without frame pointers
> 
>   defconfig   defconfig-nofp  distro  distro-nofp
> before9796300 9466764 9076191 8789745
> after 9796941 9462859 9076381 8785325
> 
> With frame pointers, the text size increases slightly.  Without frame
> pointers, the text size decreases, and a little more significantly.
> 
> Reported-by: Matthias Kaehlcke 
> Signed-off-by: Josh Poimboeuf 

NAK - kbuild bot is reporting some cases where this patch doesn't force
the frame pointer setup.

-- 
Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-19 Thread Josh Poimboeuf
On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote:
> For inline asm statements which have a CALL instruction, we list the
> stack pointer as a constraint to convince GCC to ensure the frame
> pointer is set up first:
> 
>   static inline void foo()
>   {
>   register void *__sp asm(_ASM_SP);
>   asm("call bar" : "+r" (__sp))
>   }
> 
> Unfortunately, that pattern causes clang to corrupt the stack pointer.
> 
> There's actually an easier way to achieve the same goal in GCC, without
> causing trouble for clang.  If we declare the stack pointer register
> variable as a global variable, and remove the constraint altogether,
> that convinces GCC to always set up the frame pointer before inserting
> *any* inline asm.
> 
> It basically acts as if *every* inline asm statement has a CALL
> instruction.  It's a bit overkill, but the performance impact should be
> negligible.
> 
> Here are the vmlinux .text size differences with the following configs
> on GCC:
> 
> - defconfig
> - defconfig without frame pointers
> - Fedora distro config
> - Fedora distro config without frame pointers
> 
>   defconfig   defconfig-nofp  distro  distro-nofp
> before9796300 9466764 9076191 8789745
> after 9796941 9462859 9076381 8785325
> 
> With frame pointers, the text size increases slightly.  Without frame
> pointers, the text size decreases, and a little more significantly.
> 
> Reported-by: Matthias Kaehlcke 
> Signed-off-by: Josh Poimboeuf 

NAK - kbuild bot is reporting some cases where this patch doesn't force
the frame pointer setup.

-- 
Josh


Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-19 Thread Alexander Potapenko
On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko  wrote:
> On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf  wrote:
>> For inline asm statements which have a CALL instruction, we list the
>> stack pointer as a constraint to convince GCC to ensure the frame
>> pointer is set up first:
>>
>>   static inline void foo()
>>   {
>> register void *__sp asm(_ASM_SP);
>> asm("call bar" : "+r" (__sp))
>>   }
>>
>> Unfortunately, that pattern causes clang to corrupt the stack pointer.
>>
>> There's actually an easier way to achieve the same goal in GCC, without
>> causing trouble for clang.  If we declare the stack pointer register
>> variable as a global variable, and remove the constraint altogether,
>> that convinces GCC to always set up the frame pointer before inserting
>> *any* inline asm.
>>
>> It basically acts as if *every* inline asm statement has a CALL
>> instruction.  It's a bit overkill, but the performance impact should be
>> negligible.
>>
>> Here are the vmlinux .text size differences with the following configs
>> on GCC:
>>
>> - defconfig
>> - defconfig without frame pointers
>> - Fedora distro config
>> - Fedora distro config without frame pointers
>>
>> defconfig   defconfig-nofp  distro  distro-nofp
>> before  9796300 9466764 9076191 8789745
>> after   9796941 9462859 9076381 8785325
>>
>> With frame pointers, the text size increases slightly.  Without frame
>> pointers, the text size decreases, and a little more significantly.
>>
>> Reported-by: Matthias Kaehlcke 
>> Signed-off-by: Josh Poimboeuf 
>> ---
>>  arch/x86/include/asm/alternative.h   |  3 +--
>>  arch/x86/include/asm/asm.h   |  9 
>>  arch/x86/include/asm/mshyperv.h  | 27 
>> ++--
>>  arch/x86/include/asm/paravirt_types.h| 14 ++--
>>  arch/x86/include/asm/preempt.h   | 15 +
>>  arch/x86/include/asm/processor.h |  6 ++
>>  arch/x86/include/asm/rwsem.h |  6 +++---
>>  arch/x86/include/asm/uaccess.h   |  5 ++---
>>  arch/x86/include/asm/xen/hypercall.h |  5 ++---
>>  arch/x86/kvm/emulate.c   |  3 +--
>>  arch/x86/kvm/vmx.c   |  4 +---
>>  arch/x86/mm/fault.c  |  3 +--
>>  tools/objtool/Documentation/stack-validation.txt | 20 +-
>>  13 files changed, 60 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/alternative.h 
>> b/arch/x86/include/asm/alternative.h
>> index 1b020381ab38..7aeb1cef4204 100644
>> --- a/arch/x86/include/asm/alternative.h
>> +++ b/arch/x86/include/asm/alternative.h
>> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void 
>> *start, void *end)
>>  #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, 
>>   \
>>output, input...) 
>>  \
>>  {   
>>  \
>> -   register void *__sp asm(_ASM_SP);
>>  \
>> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", 
>> feature1,\
>> "call %P[new2]", feature2)   
>>  \
>> -   : output, "+r" (__sp)
>>  \
>> +   : output 
>>  \
>> : [old] "i" (oldfunc), [new1] "i" (newfunc1),
>>  \
>>   [new2] "i" (newfunc2), ## input);  
>>  \
>>  }
>> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
>> index 676ee5807d86..ff8921d4615b 100644
>> --- a/arch/x86/include/asm/asm.h
>> +++ b/arch/x86/include/asm/asm.h
>> @@ -132,4 +132,13 @@
>>  /* For C file, we already have NOKPROBE_SYMBOL macro */
>>  #endif
>>
>> +#ifndef __ASSEMBLY__
>> +/*
>> + * This variable declaration has the side effect of forcing GCC to always 
>> set
>> + * up the frame pointer before inserting any inline asm.  This is necessary
>> + * because some inline asm statements have CALL instructions.
>> + */
>> +register unsigned int __sp_unused asm("esp");
> Shouldn't this be "asm(_ASM_SP)"?
Answering my own question, this can't be _ASM_SP because of the
realmode code, which is built with -m16 whereas _ASM_SP is "rsp".
The patch works fine with "esp" though - most certainly declaring a
ESP variable is enough to reserve the whole RSP in an x86_64 build.
>> +#endif
>> +
>>  #endif /* _ASM_X86_ASM_H */
>> diff --git a/arch/x86/include/asm/mshyperv.h 
>> b/arch/x86/include/asm/mshyperv.h
>> index 63cc96f064dc..1c913ae27f99 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ 

Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-19 Thread Alexander Potapenko
On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko  wrote:
> On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf  wrote:
>> For inline asm statements which have a CALL instruction, we list the
>> stack pointer as a constraint to convince GCC to ensure the frame
>> pointer is set up first:
>>
>>   static inline void foo()
>>   {
>> register void *__sp asm(_ASM_SP);
>> asm("call bar" : "+r" (__sp))
>>   }
>>
>> Unfortunately, that pattern causes clang to corrupt the stack pointer.
>>
>> There's actually an easier way to achieve the same goal in GCC, without
>> causing trouble for clang.  If we declare the stack pointer register
>> variable as a global variable, and remove the constraint altogether,
>> that convinces GCC to always set up the frame pointer before inserting
>> *any* inline asm.
>>
>> It basically acts as if *every* inline asm statement has a CALL
>> instruction.  It's a bit overkill, but the performance impact should be
>> negligible.
>>
>> Here are the vmlinux .text size differences with the following configs
>> on GCC:
>>
>> - defconfig
>> - defconfig without frame pointers
>> - Fedora distro config
>> - Fedora distro config without frame pointers
>>
>> defconfig   defconfig-nofp  distro  distro-nofp
>> before  9796300 9466764 9076191 8789745
>> after   9796941 9462859 9076381 8785325
>>
>> With frame pointers, the text size increases slightly.  Without frame
>> pointers, the text size decreases, and a little more significantly.
>>
>> Reported-by: Matthias Kaehlcke 
>> Signed-off-by: Josh Poimboeuf 
>> ---
>>  arch/x86/include/asm/alternative.h   |  3 +--
>>  arch/x86/include/asm/asm.h   |  9 
>>  arch/x86/include/asm/mshyperv.h  | 27 
>> ++--
>>  arch/x86/include/asm/paravirt_types.h| 14 ++--
>>  arch/x86/include/asm/preempt.h   | 15 +
>>  arch/x86/include/asm/processor.h |  6 ++
>>  arch/x86/include/asm/rwsem.h |  6 +++---
>>  arch/x86/include/asm/uaccess.h   |  5 ++---
>>  arch/x86/include/asm/xen/hypercall.h |  5 ++---
>>  arch/x86/kvm/emulate.c   |  3 +--
>>  arch/x86/kvm/vmx.c   |  4 +---
>>  arch/x86/mm/fault.c  |  3 +--
>>  tools/objtool/Documentation/stack-validation.txt | 20 +-
>>  13 files changed, 60 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/alternative.h 
>> b/arch/x86/include/asm/alternative.h
>> index 1b020381ab38..7aeb1cef4204 100644
>> --- a/arch/x86/include/asm/alternative.h
>> +++ b/arch/x86/include/asm/alternative.h
>> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void 
>> *start, void *end)
>>  #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, 
>>   \
>>output, input...) 
>>  \
>>  {   
>>  \
>> -   register void *__sp asm(_ASM_SP);
>>  \
>> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", 
>> feature1,\
>> "call %P[new2]", feature2)   
>>  \
>> -   : output, "+r" (__sp)
>>  \
>> +   : output 
>>  \
>> : [old] "i" (oldfunc), [new1] "i" (newfunc1),
>>  \
>>   [new2] "i" (newfunc2), ## input);  
>>  \
>>  }
>> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
>> index 676ee5807d86..ff8921d4615b 100644
>> --- a/arch/x86/include/asm/asm.h
>> +++ b/arch/x86/include/asm/asm.h
>> @@ -132,4 +132,13 @@
>>  /* For C file, we already have NOKPROBE_SYMBOL macro */
>>  #endif
>>
>> +#ifndef __ASSEMBLY__
>> +/*
>> + * This variable declaration has the side effect of forcing GCC to always 
>> set
>> + * up the frame pointer before inserting any inline asm.  This is necessary
>> + * because some inline asm statements have CALL instructions.
>> + */
>> +register unsigned int __sp_unused asm("esp");
> Shouldn't this be "asm(_ASM_SP)"?
Answering my own question, this can't be _ASM_SP because of the
realmode code, which is built with -m16 whereas _ASM_SP is "rsp".
The patch works fine with "esp" though - most certainly declaring a
ESP variable is enough to reserve the whole RSP in an x86_64 build.
>> +#endif
>> +
>>  #endif /* _ASM_X86_ASM_H */
>> diff --git a/arch/x86/include/asm/mshyperv.h 
>> b/arch/x86/include/asm/mshyperv.h
>> index 63cc96f064dc..1c913ae27f99 100644
>> --- a/arch/x86/include/asm/mshyperv.h
>> +++ b/arch/x86/include/asm/mshyperv.h
>> @@ -179,16 +179,15 @@ static inline u64 hv_do_hypercall(u64 

Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-19 Thread Alexander Potapenko
On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf  wrote:
> For inline asm statements which have a CALL instruction, we list the
> stack pointer as a constraint to convince GCC to ensure the frame
> pointer is set up first:
>
>   static inline void foo()
>   {
> register void *__sp asm(_ASM_SP);
> asm("call bar" : "+r" (__sp))
>   }
>
> Unfortunately, that pattern causes clang to corrupt the stack pointer.
>
> There's actually an easier way to achieve the same goal in GCC, without
> causing trouble for clang.  If we declare the stack pointer register
> variable as a global variable, and remove the constraint altogether,
> that convinces GCC to always set up the frame pointer before inserting
> *any* inline asm.
>
> It basically acts as if *every* inline asm statement has a CALL
> instruction.  It's a bit overkill, but the performance impact should be
> negligible.
>
> Here are the vmlinux .text size differences with the following configs
> on GCC:
>
> - defconfig
> - defconfig without frame pointers
> - Fedora distro config
> - Fedora distro config without frame pointers
>
> defconfig   defconfig-nofp  distro  distro-nofp
> before  9796300 9466764 9076191 8789745
> after   9796941 9462859 9076381 8785325
>
> With frame pointers, the text size increases slightly.  Without frame
> pointers, the text size decreases, and a little more significantly.
>
> Reported-by: Matthias Kaehlcke 
> Signed-off-by: Josh Poimboeuf 
> ---
>  arch/x86/include/asm/alternative.h   |  3 +--
>  arch/x86/include/asm/asm.h   |  9 
>  arch/x86/include/asm/mshyperv.h  | 27 
> ++--
>  arch/x86/include/asm/paravirt_types.h| 14 ++--
>  arch/x86/include/asm/preempt.h   | 15 +
>  arch/x86/include/asm/processor.h |  6 ++
>  arch/x86/include/asm/rwsem.h |  6 +++---
>  arch/x86/include/asm/uaccess.h   |  5 ++---
>  arch/x86/include/asm/xen/hypercall.h |  5 ++---
>  arch/x86/kvm/emulate.c   |  3 +--
>  arch/x86/kvm/vmx.c   |  4 +---
>  arch/x86/mm/fault.c  |  3 +--
>  tools/objtool/Documentation/stack-validation.txt | 20 +-
>  13 files changed, 60 insertions(+), 60 deletions(-)
>
> diff --git a/arch/x86/include/asm/alternative.h 
> b/arch/x86/include/asm/alternative.h
> index 1b020381ab38..7aeb1cef4204 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void 
> *start, void *end)
>  #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2,  
>  \
>output, input...)  
> \
>  {
> \
> -   register void *__sp asm(_ASM_SP); 
> \
> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", 
> feature1,\
> "call %P[new2]", feature2)
> \
> -   : output, "+r" (__sp) 
> \
> +   : output  
> \
> : [old] "i" (oldfunc), [new1] "i" (newfunc1), 
> \
>   [new2] "i" (newfunc2), ## input);   
> \
>  }
> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
> index 676ee5807d86..ff8921d4615b 100644
> --- a/arch/x86/include/asm/asm.h
> +++ b/arch/x86/include/asm/asm.h
> @@ -132,4 +132,13 @@
>  /* For C file, we already have NOKPROBE_SYMBOL macro */
>  #endif
>
> +#ifndef __ASSEMBLY__
> +/*
> + * This variable declaration has the side effect of forcing GCC to always set
> + * up the frame pointer before inserting any inline asm.  This is necessary
> + * because some inline asm statements have CALL instructions.
> + */
> +register unsigned int __sp_unused asm("esp");
Shouldn't this be "asm(_ASM_SP)"?
> +#endif
> +
>  #endif /* _ASM_X86_ASM_H */
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 63cc96f064dc..1c913ae27f99 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -179,16 +179,15 @@ static inline u64 hv_do_hypercall(u64 control, void 
> *input, void *output)
> u64 input_address = input ? virt_to_phys(input) : 0;
> u64 output_address = output ? virt_to_phys(output) : 0;
> u64 hv_status;
> -   register void *__sp asm(_ASM_SP);
>
>  #ifdef CONFIG_X86_64
> if (!hv_hypercall_pg)
> return U64_MAX;
>
> -   __asm__ __volatile__("mov %4, %%r8\n"
> - 

Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang

2017-09-19 Thread Alexander Potapenko
On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf  wrote:
> For inline asm statements which have a CALL instruction, we list the
> stack pointer as a constraint to convince GCC to ensure the frame
> pointer is set up first:
>
>   static inline void foo()
>   {
> register void *__sp asm(_ASM_SP);
> asm("call bar" : "+r" (__sp))
>   }
>
> Unfortunately, that pattern causes clang to corrupt the stack pointer.
>
> There's actually an easier way to achieve the same goal in GCC, without
> causing trouble for clang.  If we declare the stack pointer register
> variable as a global variable, and remove the constraint altogether,
> that convinces GCC to always set up the frame pointer before inserting
> *any* inline asm.
>
> It basically acts as if *every* inline asm statement has a CALL
> instruction.  It's a bit overkill, but the performance impact should be
> negligible.
>
> Here are the vmlinux .text size differences with the following configs
> on GCC:
>
> - defconfig
> - defconfig without frame pointers
> - Fedora distro config
> - Fedora distro config without frame pointers
>
> defconfig   defconfig-nofp  distro  distro-nofp
> before  9796300 9466764 9076191 8789745
> after   9796941 9462859 9076381 8785325
>
> With frame pointers, the text size increases slightly.  Without frame
> pointers, the text size decreases, and a little more significantly.
>
> Reported-by: Matthias Kaehlcke 
> Signed-off-by: Josh Poimboeuf 
> ---
>  arch/x86/include/asm/alternative.h   |  3 +--
>  arch/x86/include/asm/asm.h   |  9 
>  arch/x86/include/asm/mshyperv.h  | 27 
> ++--
>  arch/x86/include/asm/paravirt_types.h| 14 ++--
>  arch/x86/include/asm/preempt.h   | 15 +
>  arch/x86/include/asm/processor.h |  6 ++
>  arch/x86/include/asm/rwsem.h |  6 +++---
>  arch/x86/include/asm/uaccess.h   |  5 ++---
>  arch/x86/include/asm/xen/hypercall.h |  5 ++---
>  arch/x86/kvm/emulate.c   |  3 +--
>  arch/x86/kvm/vmx.c   |  4 +---
>  arch/x86/mm/fault.c  |  3 +--
>  tools/objtool/Documentation/stack-validation.txt | 20 +-
>  13 files changed, 60 insertions(+), 60 deletions(-)
>
> diff --git a/arch/x86/include/asm/alternative.h 
> b/arch/x86/include/asm/alternative.h
> index 1b020381ab38..7aeb1cef4204 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void 
> *start, void *end)
>  #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2,  
>  \
>output, input...)  
> \
>  {
> \
> -   register void *__sp asm(_ASM_SP); 
> \
> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", 
> feature1,\
> "call %P[new2]", feature2)
> \
> -   : output, "+r" (__sp) 
> \
> +   : output  
> \
> : [old] "i" (oldfunc), [new1] "i" (newfunc1), 
> \
>   [new2] "i" (newfunc2), ## input);   
> \
>  }
> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
> index 676ee5807d86..ff8921d4615b 100644
> --- a/arch/x86/include/asm/asm.h
> +++ b/arch/x86/include/asm/asm.h
> @@ -132,4 +132,13 @@
>  /* For C file, we already have NOKPROBE_SYMBOL macro */
>  #endif
>
> +#ifndef __ASSEMBLY__
> +/*
> + * This variable declaration has the side effect of forcing GCC to always set
> + * up the frame pointer before inserting any inline asm.  This is necessary
> + * because some inline asm statements have CALL instructions.
> + */
> +register unsigned int __sp_unused asm("esp");
Shouldn't this be "asm(_ASM_SP)"?
> +#endif
> +
>  #endif /* _ASM_X86_ASM_H */
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 63cc96f064dc..1c913ae27f99 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -179,16 +179,15 @@ static inline u64 hv_do_hypercall(u64 control, void 
> *input, void *output)
> u64 input_address = input ? virt_to_phys(input) : 0;
> u64 output_address = output ? virt_to_phys(output) : 0;
> u64 hv_status;
> -   register void *__sp asm(_ASM_SP);
>
>  #ifdef CONFIG_X86_64
> if (!hv_hypercall_pg)
> return U64_MAX;
>
> -   __asm__ __volatile__("mov %4, %%r8\n"
> -"call *%5"
> -