Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Thu, Sep 21, 2017 at 05:35:11PM +0200, Ingo Molnar wrote: > > * Josh Poimboeufwrote: > > > On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote: > > > On 09/19/17 11:45, Josh Poimboeuf wrote: > > > > For inline asm statements which have a CALL instruction, we list the > > > > stack pointer as a constraint to convince GCC to ensure the frame > > > > pointer is set up first: > > > > > > > > static inline void foo() > > > > { > > > > register void *__sp asm(_ASM_SP); > > > > asm("call bar" : "+r" (__sp)) > > > > } > > > > > > > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > > > > > > > There's actually an easier way to achieve the same goal in GCC, without > > > > causing trouble for clang. If we declare the stack pointer register > > > > variable as a global variable, and remove the constraint altogether, > > > > that convinces GCC to always set up the frame pointer before inserting > > > > *any* inline asm. > > > > > > > > It basically acts as if *every* inline asm statement has a CALL > > > > instruction. It's a bit overkill, but the performance impact should be > > > > negligible. > > > > > > > > > > Again, probably negligible, but why do we need a frame pointer just > > > because we have a call assembly instruction? > > > > It's frame pointer convention. Without it, if dumping the stack from > > the called function, a function will get skipped in the stack trace. > > BTW., could we perhaps relax this and simply phase out the frame pointer on > x86, > and simplify all our assembly in a cycle or two? ORC unwinder is working out > very > well so far. Live kernel patching can use ORC data just fine, and nothing > else > actually relies on frame pointers, right? > > That would give one more register to assembly code. > > I realize that we just rewrote a whole bunch of assembly code... but that was > the > price for ORC, in a way. I think ORC isn't *quite* ready for livepatch yet, though it's close. We could probably make it ready in a cycle or two. However, I'm not sure whether we would want to remove livepatch support for frame pointers yet: - There might be some embedded livepatch users who can't deal with the extra 3MB of RAM needed by ORC. (Though this is purely theoretical, I don't actually know anybody with this problem. I suppose we could float the idea on the livepatch/kpatch mailing lists and see if anybody objects.) - Objtool's ORC generation is working fine now, but objtool maintenance is currently a little heavy-handed, and I'm currently the only person who knows how to do it. I've got an idea brewing for how to improve its maintainability with the help of compiler plugins, but until then, if I get hit by a bus tomorrow, who's going to pick it up? It's nice to have frame pointers as a backup solution for live patching. But also, is this a problem that's even worth fixing? Do we have any assembly code that would be noticeably better off with that extra register? Most of the changes were: - adding FRAME_BEGIN/FRAME_END to some asm functions; - juggling register usage in a few crypto functions; and - adding the stack pointer constraint to 15 or so inline asm functions. I think those changes weren't all that disruptive to start with. -- Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Thu, Sep 21, 2017 at 05:35:11PM +0200, Ingo Molnar wrote: > > * Josh Poimboeuf wrote: > > > On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote: > > > On 09/19/17 11:45, Josh Poimboeuf wrote: > > > > For inline asm statements which have a CALL instruction, we list the > > > > stack pointer as a constraint to convince GCC to ensure the frame > > > > pointer is set up first: > > > > > > > > static inline void foo() > > > > { > > > > register void *__sp asm(_ASM_SP); > > > > asm("call bar" : "+r" (__sp)) > > > > } > > > > > > > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > > > > > > > There's actually an easier way to achieve the same goal in GCC, without > > > > causing trouble for clang. If we declare the stack pointer register > > > > variable as a global variable, and remove the constraint altogether, > > > > that convinces GCC to always set up the frame pointer before inserting > > > > *any* inline asm. > > > > > > > > It basically acts as if *every* inline asm statement has a CALL > > > > instruction. It's a bit overkill, but the performance impact should be > > > > negligible. > > > > > > > > > > Again, probably negligible, but why do we need a frame pointer just > > > because we have a call assembly instruction? > > > > It's frame pointer convention. Without it, if dumping the stack from > > the called function, a function will get skipped in the stack trace. > > BTW., could we perhaps relax this and simply phase out the frame pointer on > x86, > and simplify all our assembly in a cycle or two? ORC unwinder is working out > very > well so far. Live kernel patching can use ORC data just fine, and nothing > else > actually relies on frame pointers, right? > > That would give one more register to assembly code. > > I realize that we just rewrote a whole bunch of assembly code... but that was > the > price for ORC, in a way. I think ORC isn't *quite* ready for livepatch yet, though it's close. We could probably make it ready in a cycle or two. However, I'm not sure whether we would want to remove livepatch support for frame pointers yet: - There might be some embedded livepatch users who can't deal with the extra 3MB of RAM needed by ORC. (Though this is purely theoretical, I don't actually know anybody with this problem. I suppose we could float the idea on the livepatch/kpatch mailing lists and see if anybody objects.) - Objtool's ORC generation is working fine now, but objtool maintenance is currently a little heavy-handed, and I'm currently the only person who knows how to do it. I've got an idea brewing for how to improve its maintainability with the help of compiler plugins, but until then, if I get hit by a bus tomorrow, who's going to pick it up? It's nice to have frame pointers as a backup solution for live patching. But also, is this a problem that's even worth fixing? Do we have any assembly code that would be noticeably better off with that extra register? Most of the changes were: - adding FRAME_BEGIN/FRAME_END to some asm functions; - juggling register usage in a few crypto functions; and - adding the stack pointer constraint to 15 or so inline asm functions. I think those changes weren't all that disruptive to start with. -- Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
* Josh Poimboeufwrote: > On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote: > > On 09/19/17 11:45, Josh Poimboeuf wrote: > > > For inline asm statements which have a CALL instruction, we list the > > > stack pointer as a constraint to convince GCC to ensure the frame > > > pointer is set up first: > > > > > > static inline void foo() > > > { > > > register void *__sp asm(_ASM_SP); > > > asm("call bar" : "+r" (__sp)) > > > } > > > > > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > > > > > There's actually an easier way to achieve the same goal in GCC, without > > > causing trouble for clang. If we declare the stack pointer register > > > variable as a global variable, and remove the constraint altogether, > > > that convinces GCC to always set up the frame pointer before inserting > > > *any* inline asm. > > > > > > It basically acts as if *every* inline asm statement has a CALL > > > instruction. It's a bit overkill, but the performance impact should be > > > negligible. > > > > > > > Again, probably negligible, but why do we need a frame pointer just > > because we have a call assembly instruction? > > It's frame pointer convention. Without it, if dumping the stack from > the called function, a function will get skipped in the stack trace. BTW., could we perhaps relax this and simply phase out the frame pointer on x86, and simplify all our assembly in a cycle or two? ORC unwinder is working out very well so far. Live kernel patching can use ORC data just fine, and nothing else actually relies on frame pointers, right? That would give one more register to assembly code. I realize that we just rewrote a whole bunch of assembly code... but that was the price for ORC, in a way. Thanks, Ingo
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
* Josh Poimboeuf wrote: > On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote: > > On 09/19/17 11:45, Josh Poimboeuf wrote: > > > For inline asm statements which have a CALL instruction, we list the > > > stack pointer as a constraint to convince GCC to ensure the frame > > > pointer is set up first: > > > > > > static inline void foo() > > > { > > > register void *__sp asm(_ASM_SP); > > > asm("call bar" : "+r" (__sp)) > > > } > > > > > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > > > > > There's actually an easier way to achieve the same goal in GCC, without > > > causing trouble for clang. If we declare the stack pointer register > > > variable as a global variable, and remove the constraint altogether, > > > that convinces GCC to always set up the frame pointer before inserting > > > *any* inline asm. > > > > > > It basically acts as if *every* inline asm statement has a CALL > > > instruction. It's a bit overkill, but the performance impact should be > > > negligible. > > > > > > > Again, probably negligible, but why do we need a frame pointer just > > because we have a call assembly instruction? > > It's frame pointer convention. Without it, if dumping the stack from > the called function, a function will get skipped in the stack trace. BTW., could we perhaps relax this and simply phase out the frame pointer on x86, and simplify all our assembly in a cycle or two? ORC unwinder is working out very well so far. Live kernel patching can use ORC data just fine, and nothing else actually relies on frame pointers, right? That would give one more register to assembly code. I realize that we just rewrote a whole bunch of assembly code... but that was the price for ORC, in a way. Thanks, Ingo
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Thu, Sep 21, 2017 at 1:52 PM, Brian Gerstwrote: >>> I think we need just the frame itself and RSP pointing below this >>> frame. If we don't have a frame, CALL instruction will smash whatever >>> RSP happens to point to. Compiler doesn't have to setup RSP to point >>> below used part of stack in leaf functions. >>> >> >> In the kernel it does. Redzoning is not allowed in the kernel, because >> interrupts or exceptions would also smash the redzone. > > I see... But it's the same for user-space signals, the first thing a > signal should do is to skip the redzone. I guess interrupt handlers > should switch to interrupt stack which avoids smashing redzone > altogether. Do you mean nested interrupts/exceptions in interrupts? > In my experience frames in leaf functions can have pretty large > performance penalty. Wonder if we have we considered changing > interrupt/exception handlers to avoid smashing redzones and disable > leaf frames? Currently, on x86-64, I believe all exceptions have their own dedicated stacks in the kernel, but IRQs still come in on the task's kernel stack. Andy, do you know if there's a reason why IRQs don't use a dedicated IST stack? >>> >>> Because IST is awful due to recursion issues. We immediately switch to an >>> IRQ stack, though. >>> >>> If the kernel wanted a redzone, it would have to use IST for everything, >>> which would entail a bunch of unpleasant hackery. >> >> Thanks. >> >> I guess it must be finite recursion, because we could not handle >> infinite with finite stack. I thing that solves it is simply: >> >> sub $256, %rsp >> ... do stuff ... >> add $256, %rsp >> >> Don't know if it's applicable to interrupts or not. > > No, it is not. The processor pushes 5 or 6 words of data on the stack > (the IRET frame plus an error code for certain exceptions) before the > interrupt handler gets control. So without using the IST for stack > switching on every interrupt, the redzone cannot be used in the kernel > as it will get smashed by the IRET frame. In addition, since the > kernel's stack is limited in size, skipping 128 bytes on every > interrupt would overrun the stack faster. The small gain from using > the redzone in the kernel is outweighed by these limitations. I see, thanks for educating.
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Thu, Sep 21, 2017 at 1:52 PM, Brian Gerst wrote: >>> I think we need just the frame itself and RSP pointing below this >>> frame. If we don't have a frame, CALL instruction will smash whatever >>> RSP happens to point to. Compiler doesn't have to setup RSP to point >>> below used part of stack in leaf functions. >>> >> >> In the kernel it does. Redzoning is not allowed in the kernel, because >> interrupts or exceptions would also smash the redzone. > > I see... But it's the same for user-space signals, the first thing a > signal should do is to skip the redzone. I guess interrupt handlers > should switch to interrupt stack which avoids smashing redzone > altogether. Do you mean nested interrupts/exceptions in interrupts? > In my experience frames in leaf functions can have pretty large > performance penalty. Wonder if we have we considered changing > interrupt/exception handlers to avoid smashing redzones and disable > leaf frames? Currently, on x86-64, I believe all exceptions have their own dedicated stacks in the kernel, but IRQs still come in on the task's kernel stack. Andy, do you know if there's a reason why IRQs don't use a dedicated IST stack? >>> >>> Because IST is awful due to recursion issues. We immediately switch to an >>> IRQ stack, though. >>> >>> If the kernel wanted a redzone, it would have to use IST for everything, >>> which would entail a bunch of unpleasant hackery. >> >> Thanks. >> >> I guess it must be finite recursion, because we could not handle >> infinite with finite stack. I thing that solves it is simply: >> >> sub $256, %rsp >> ... do stuff ... >> add $256, %rsp >> >> Don't know if it's applicable to interrupts or not. > > No, it is not. The processor pushes 5 or 6 words of data on the stack > (the IRET frame plus an error code for certain exceptions) before the > interrupt handler gets control. So without using the IST for stack > switching on every interrupt, the redzone cannot be used in the kernel > as it will get smashed by the IRET frame. In addition, since the > kernel's stack is limited in size, skipping 128 bytes on every > interrupt would overrun the stack faster. The small gain from using > the redzone in the kernel is outweighed by these limitations. I see, thanks for educating.
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Thu, Sep 21, 2017 at 4:12 AM, Dmitry Vyukovwrote: > On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirski wrote: On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote: > On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin wrote: >> On 09/20/17 10:38, Dmitry Vyukov wrote: >> >> I think we need just the frame itself and RSP pointing below this >> frame. If we don't have a frame, CALL instruction will smash whatever >> RSP happens to point to. Compiler doesn't have to setup RSP to point >> below used part of stack in leaf functions. >> > > In the kernel it does. Redzoning is not allowed in the kernel, because > interrupts or exceptions would also smash the redzone. I see... But it's the same for user-space signals, the first thing a signal should do is to skip the redzone. I guess interrupt handlers should switch to interrupt stack which avoids smashing redzone altogether. Do you mean nested interrupts/exceptions in interrupts? In my experience frames in leaf functions can have pretty large performance penalty. Wonder if we have we considered changing interrupt/exception handlers to avoid smashing redzones and disable leaf frames? >>> >>> Currently, on x86-64, I believe all exceptions have their own dedicated >>> stacks in the kernel, but IRQs still come in on the task's kernel stack. >>> >>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST >>> stack? >>> >> >> Because IST is awful due to recursion issues. We immediately switch to an >> IRQ stack, though. >> >> If the kernel wanted a redzone, it would have to use IST for everything, >> which would entail a bunch of unpleasant hackery. > > Thanks. > > I guess it must be finite recursion, because we could not handle > infinite with finite stack. I thing that solves it is simply: > > sub $256, %rsp > ... do stuff ... > add $256, %rsp > > Don't know if it's applicable to interrupts or not. No, it is not. The processor pushes 5 or 6 words of data on the stack (the IRET frame plus an error code for certain exceptions) before the interrupt handler gets control. So without using the IST for stack switching on every interrupt, the redzone cannot be used in the kernel as it will get smashed by the IRET frame. In addition, since the kernel's stack is limited in size, skipping 128 bytes on every interrupt would overrun the stack faster. The small gain from using the redzone in the kernel is outweighed by these limitations. -- Brian Gerst
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Thu, Sep 21, 2017 at 4:12 AM, Dmitry Vyukov wrote: > On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirski wrote: On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote: > On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin wrote: >> On 09/20/17 10:38, Dmitry Vyukov wrote: >> >> I think we need just the frame itself and RSP pointing below this >> frame. If we don't have a frame, CALL instruction will smash whatever >> RSP happens to point to. Compiler doesn't have to setup RSP to point >> below used part of stack in leaf functions. >> > > In the kernel it does. Redzoning is not allowed in the kernel, because > interrupts or exceptions would also smash the redzone. I see... But it's the same for user-space signals, the first thing a signal should do is to skip the redzone. I guess interrupt handlers should switch to interrupt stack which avoids smashing redzone altogether. Do you mean nested interrupts/exceptions in interrupts? In my experience frames in leaf functions can have pretty large performance penalty. Wonder if we have we considered changing interrupt/exception handlers to avoid smashing redzones and disable leaf frames? >>> >>> Currently, on x86-64, I believe all exceptions have their own dedicated >>> stacks in the kernel, but IRQs still come in on the task's kernel stack. >>> >>> Andy, do you know if there's a reason why IRQs don't use a dedicated IST >>> stack? >>> >> >> Because IST is awful due to recursion issues. We immediately switch to an >> IRQ stack, though. >> >> If the kernel wanted a redzone, it would have to use IST for everything, >> which would entail a bunch of unpleasant hackery. > > Thanks. > > I guess it must be finite recursion, because we could not handle > infinite with finite stack. I thing that solves it is simply: > > sub $256, %rsp > ... do stuff ... > add $256, %rsp > > Don't know if it's applicable to interrupts or not. No, it is not. The processor pushes 5 or 6 words of data on the stack (the IRET frame plus an error code for certain exceptions) before the interrupt handler gets control. So without using the IST for stack switching on every interrupt, the redzone cannot be used in the kernel as it will get smashed by the IRET frame. In addition, since the kernel's stack is limited in size, skipping 128 bytes on every interrupt would overrun the stack faster. The small gain from using the redzone in the kernel is outweighed by these limitations. -- Brian Gerst
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirskiwrote: >>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote: On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin wrote: > On 09/20/17 10:38, Dmitry Vyukov wrote: > > I think we need just the frame itself and RSP pointing below this > frame. If we don't have a frame, CALL instruction will smash whatever > RSP happens to point to. Compiler doesn't have to setup RSP to point > below used part of stack in leaf functions. > In the kernel it does. Redzoning is not allowed in the kernel, because interrupts or exceptions would also smash the redzone. >>> >>> I see... But it's the same for user-space signals, the first thing a >>> signal should do is to skip the redzone. I guess interrupt handlers >>> should switch to interrupt stack which avoids smashing redzone >>> altogether. Do you mean nested interrupts/exceptions in interrupts? >>> In my experience frames in leaf functions can have pretty large >>> performance penalty. Wonder if we have we considered changing >>> interrupt/exception handlers to avoid smashing redzones and disable >>> leaf frames? >> >> Currently, on x86-64, I believe all exceptions have their own dedicated >> stacks in the kernel, but IRQs still come in on the task's kernel stack. >> >> Andy, do you know if there's a reason why IRQs don't use a dedicated IST >> stack? >> > > Because IST is awful due to recursion issues. We immediately switch to an > IRQ stack, though. > > If the kernel wanted a redzone, it would have to use IST for everything, > which would entail a bunch of unpleasant hackery. Thanks. I guess it must be finite recursion, because we could not handle infinite with finite stack. I thing that solves it is simply: sub $256, %rsp ... do stuff ... add $256, %rsp Don't know if it's applicable to interrupts or not.
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Wed, Sep 20, 2017 at 11:19 PM, Andy Lutomirski wrote: >>> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote: On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin wrote: > On 09/20/17 10:38, Dmitry Vyukov wrote: > > I think we need just the frame itself and RSP pointing below this > frame. If we don't have a frame, CALL instruction will smash whatever > RSP happens to point to. Compiler doesn't have to setup RSP to point > below used part of stack in leaf functions. > In the kernel it does. Redzoning is not allowed in the kernel, because interrupts or exceptions would also smash the redzone. >>> >>> I see... But it's the same for user-space signals, the first thing a >>> signal should do is to skip the redzone. I guess interrupt handlers >>> should switch to interrupt stack which avoids smashing redzone >>> altogether. Do you mean nested interrupts/exceptions in interrupts? >>> In my experience frames in leaf functions can have pretty large >>> performance penalty. Wonder if we have we considered changing >>> interrupt/exception handlers to avoid smashing redzones and disable >>> leaf frames? >> >> Currently, on x86-64, I believe all exceptions have their own dedicated >> stacks in the kernel, but IRQs still come in on the task's kernel stack. >> >> Andy, do you know if there's a reason why IRQs don't use a dedicated IST >> stack? >> > > Because IST is awful due to recursion issues. We immediately switch to an > IRQ stack, though. > > If the kernel wanted a redzone, it would have to use IST for everything, > which would entail a bunch of unpleasant hackery. Thanks. I guess it must be finite recursion, because we could not handle infinite with finite stack. I thing that solves it is simply: sub $256, %rsp ... do stuff ... add $256, %rsp Don't know if it's applicable to interrupts or not.
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
> On Sep 20, 2017, at 2:07 PM, Josh Poimboeufwrote: > >> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote: >>> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin wrote: On 09/20/17 10:38, Dmitry Vyukov wrote: I think we need just the frame itself and RSP pointing below this frame. If we don't have a frame, CALL instruction will smash whatever RSP happens to point to. Compiler doesn't have to setup RSP to point below used part of stack in leaf functions. >>> >>> In the kernel it does. Redzoning is not allowed in the kernel, because >>> interrupts or exceptions would also smash the redzone. >> >> I see... But it's the same for user-space signals, the first thing a >> signal should do is to skip the redzone. I guess interrupt handlers >> should switch to interrupt stack which avoids smashing redzone >> altogether. Do you mean nested interrupts/exceptions in interrupts? >> In my experience frames in leaf functions can have pretty large >> performance penalty. Wonder if we have we considered changing >> interrupt/exception handlers to avoid smashing redzones and disable >> leaf frames? > > Currently, on x86-64, I believe all exceptions have their own dedicated > stacks in the kernel, but IRQs still come in on the task's kernel stack. > > Andy, do you know if there's a reason why IRQs don't use a dedicated IST > stack? > Because IST is awful due to recursion issues. We immediately switch to an IRQ stack, though. If the kernel wanted a redzone, it would have to use IST for everything, which would entail a bunch of unpleasant hackery. > -- > Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
> On Sep 20, 2017, at 2:07 PM, Josh Poimboeuf wrote: > >> On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote: >>> On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin wrote: On 09/20/17 10:38, Dmitry Vyukov wrote: I think we need just the frame itself and RSP pointing below this frame. If we don't have a frame, CALL instruction will smash whatever RSP happens to point to. Compiler doesn't have to setup RSP to point below used part of stack in leaf functions. >>> >>> In the kernel it does. Redzoning is not allowed in the kernel, because >>> interrupts or exceptions would also smash the redzone. >> >> I see... But it's the same for user-space signals, the first thing a >> signal should do is to skip the redzone. I guess interrupt handlers >> should switch to interrupt stack which avoids smashing redzone >> altogether. Do you mean nested interrupts/exceptions in interrupts? >> In my experience frames in leaf functions can have pretty large >> performance penalty. Wonder if we have we considered changing >> interrupt/exception handlers to avoid smashing redzones and disable >> leaf frames? > > Currently, on x86-64, I believe all exceptions have their own dedicated > stacks in the kernel, but IRQs still come in on the task's kernel stack. > > Andy, do you know if there's a reason why IRQs don't use a dedicated IST > stack? > Because IST is awful due to recursion issues. We immediately switch to an IRQ stack, though. If the kernel wanted a redzone, it would have to use IST for everything, which would entail a bunch of unpleasant hackery. > -- > Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote: > On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvinwrote: > > On 09/20/17 10:38, Dmitry Vyukov wrote: > >> > >> I think we need just the frame itself and RSP pointing below this > >> frame. If we don't have a frame, CALL instruction will smash whatever > >> RSP happens to point to. Compiler doesn't have to setup RSP to point > >> below used part of stack in leaf functions. > >> > > > > In the kernel it does. Redzoning is not allowed in the kernel, because > > interrupts or exceptions would also smash the redzone. > > I see... But it's the same for user-space signals, the first thing a > signal should do is to skip the redzone. I guess interrupt handlers > should switch to interrupt stack which avoids smashing redzone > altogether. Do you mean nested interrupts/exceptions in interrupts? > In my experience frames in leaf functions can have pretty large > performance penalty. Wonder if we have we considered changing > interrupt/exception handlers to avoid smashing redzones and disable > leaf frames? Currently, on x86-64, I believe all exceptions have their own dedicated stacks in the kernel, but IRQs still come in on the task's kernel stack. Andy, do you know if there's a reason why IRQs don't use a dedicated IST stack? -- Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Wed, Sep 20, 2017 at 08:01:02PM +0200, Dmitry Vyukov wrote: > On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin wrote: > > On 09/20/17 10:38, Dmitry Vyukov wrote: > >> > >> I think we need just the frame itself and RSP pointing below this > >> frame. If we don't have a frame, CALL instruction will smash whatever > >> RSP happens to point to. Compiler doesn't have to setup RSP to point > >> below used part of stack in leaf functions. > >> > > > > In the kernel it does. Redzoning is not allowed in the kernel, because > > interrupts or exceptions would also smash the redzone. > > I see... But it's the same for user-space signals, the first thing a > signal should do is to skip the redzone. I guess interrupt handlers > should switch to interrupt stack which avoids smashing redzone > altogether. Do you mean nested interrupts/exceptions in interrupts? > In my experience frames in leaf functions can have pretty large > performance penalty. Wonder if we have we considered changing > interrupt/exception handlers to avoid smashing redzones and disable > leaf frames? Currently, on x86-64, I believe all exceptions have their own dedicated stacks in the kernel, but IRQs still come in on the task's kernel stack. Andy, do you know if there's a reason why IRQs don't use a dedicated IST stack? -- Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvinwrote: > On 09/20/17 10:38, Dmitry Vyukov wrote: >> >> I think we need just the frame itself and RSP pointing below this >> frame. If we don't have a frame, CALL instruction will smash whatever >> RSP happens to point to. Compiler doesn't have to setup RSP to point >> below used part of stack in leaf functions. >> > > In the kernel it does. Redzoning is not allowed in the kernel, because > interrupts or exceptions would also smash the redzone. I see... But it's the same for user-space signals, the first thing a signal should do is to skip the redzone. I guess interrupt handlers should switch to interrupt stack which avoids smashing redzone altogether. Do you mean nested interrupts/exceptions in interrupts? In my experience frames in leaf functions can have pretty large performance penalty. Wonder if we have we considered changing interrupt/exception handlers to avoid smashing redzones and disable leaf frames?
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Wed, Sep 20, 2017 at 7:46 PM, H. Peter Anvin wrote: > On 09/20/17 10:38, Dmitry Vyukov wrote: >> >> I think we need just the frame itself and RSP pointing below this >> frame. If we don't have a frame, CALL instruction will smash whatever >> RSP happens to point to. Compiler doesn't have to setup RSP to point >> below used part of stack in leaf functions. >> > > In the kernel it does. Redzoning is not allowed in the kernel, because > interrupts or exceptions would also smash the redzone. I see... But it's the same for user-space signals, the first thing a signal should do is to skip the redzone. I guess interrupt handlers should switch to interrupt stack which avoids smashing redzone altogether. Do you mean nested interrupts/exceptions in interrupts? In my experience frames in leaf functions can have pretty large performance penalty. Wonder if we have we considered changing interrupt/exception handlers to avoid smashing redzones and disable leaf frames?
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On 09/20/17 10:38, Dmitry Vyukov wrote: > > I think we need just the frame itself and RSP pointing below this > frame. If we don't have a frame, CALL instruction will smash whatever > RSP happens to point to. Compiler doesn't have to setup RSP to point > below used part of stack in leaf functions. > In the kernel it does. Redzoning is not allowed in the kernel, because interrupts or exceptions would also smash the redzone. -hpa
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On 09/20/17 10:38, Dmitry Vyukov wrote: > > I think we need just the frame itself and RSP pointing below this > frame. If we don't have a frame, CALL instruction will smash whatever > RSP happens to point to. Compiler doesn't have to setup RSP to point > below used part of stack in leaf functions. > In the kernel it does. Redzoning is not allowed in the kernel, because interrupts or exceptions would also smash the redzone. -hpa
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Tue, Sep 19, 2017 at 08:18:23PM -0500, Josh Poimboeuf wrote: > On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote: > > For inline asm statements which have a CALL instruction, we list the > > stack pointer as a constraint to convince GCC to ensure the frame > > pointer is set up first: > > > > static inline void foo() > > { > > register void *__sp asm(_ASM_SP); > > asm("call bar" : "+r" (__sp)) > > } > > > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > > > There's actually an easier way to achieve the same goal in GCC, without > > causing trouble for clang. If we declare the stack pointer register > > variable as a global variable, and remove the constraint altogether, > > that convinces GCC to always set up the frame pointer before inserting > > *any* inline asm. > > > > It basically acts as if *every* inline asm statement has a CALL > > instruction. It's a bit overkill, but the performance impact should be > > negligible. > > > > Here are the vmlinux .text size differences with the following configs > > on GCC: > > > > - defconfig > > - defconfig without frame pointers > > - Fedora distro config > > - Fedora distro config without frame pointers > > > > defconfig defconfig-nofp distro distro-nofp > > before 9796300 9466764 9076191 8789745 > > after 9796941 9462859 9076381 8785325 > > > > With frame pointers, the text size increases slightly. Without frame > > pointers, the text size decreases, and a little more significantly. > > > > Reported-by: Matthias Kaehlcke> > Signed-off-by: Josh Poimboeuf > > NAK - kbuild bot is reporting some cases where this patch doesn't force > the frame pointer setup. So it turns out that for GCC 7, it works as described above: the global register variable results in the frame pointer getting set up before *all* inline asm. But for GCC 6, it doesn't work that way. The global register variable has no such effect. So we need both the global register variable *and* the output constraint after all. Will post another patch after some more testing. -- Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Tue, Sep 19, 2017 at 08:18:23PM -0500, Josh Poimboeuf wrote: > On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote: > > For inline asm statements which have a CALL instruction, we list the > > stack pointer as a constraint to convince GCC to ensure the frame > > pointer is set up first: > > > > static inline void foo() > > { > > register void *__sp asm(_ASM_SP); > > asm("call bar" : "+r" (__sp)) > > } > > > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > > > There's actually an easier way to achieve the same goal in GCC, without > > causing trouble for clang. If we declare the stack pointer register > > variable as a global variable, and remove the constraint altogether, > > that convinces GCC to always set up the frame pointer before inserting > > *any* inline asm. > > > > It basically acts as if *every* inline asm statement has a CALL > > instruction. It's a bit overkill, but the performance impact should be > > negligible. > > > > Here are the vmlinux .text size differences with the following configs > > on GCC: > > > > - defconfig > > - defconfig without frame pointers > > - Fedora distro config > > - Fedora distro config without frame pointers > > > > defconfig defconfig-nofp distro distro-nofp > > before 9796300 9466764 9076191 8789745 > > after 9796941 9462859 9076381 8785325 > > > > With frame pointers, the text size increases slightly. Without frame > > pointers, the text size decreases, and a little more significantly. > > > > Reported-by: Matthias Kaehlcke > > Signed-off-by: Josh Poimboeuf > > NAK - kbuild bot is reporting some cases where this patch doesn't force > the frame pointer setup. So it turns out that for GCC 7, it works as described above: the global register variable results in the frame pointer getting set up before *all* inline asm. But for GCC 6, it doesn't work that way. The global register variable has no such effect. So we need both the global register variable *and* the output constraint after all. Will post another patch after some more testing. -- Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote: > On 09/19/17 11:45, Josh Poimboeuf wrote: > > For inline asm statements which have a CALL instruction, we list the > > stack pointer as a constraint to convince GCC to ensure the frame > > pointer is set up first: > > > > static inline void foo() > > { > > register void *__sp asm(_ASM_SP); > > asm("call bar" : "+r" (__sp)) > > } > > > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > > > There's actually an easier way to achieve the same goal in GCC, without > > causing trouble for clang. If we declare the stack pointer register > > variable as a global variable, and remove the constraint altogether, > > that convinces GCC to always set up the frame pointer before inserting > > *any* inline asm. > > > > It basically acts as if *every* inline asm statement has a CALL > > instruction. It's a bit overkill, but the performance impact should be > > negligible. > > > > Again, probably negligible, but why do we need a frame pointer just > because we have a call assembly instruction? It's frame pointer convention. Without it, if dumping the stack from the called function, a function will get skipped in the stack trace. -- Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Wed, Sep 20, 2017 at 10:32:43AM -0700, H. Peter Anvin wrote: > On 09/19/17 11:45, Josh Poimboeuf wrote: > > For inline asm statements which have a CALL instruction, we list the > > stack pointer as a constraint to convince GCC to ensure the frame > > pointer is set up first: > > > > static inline void foo() > > { > > register void *__sp asm(_ASM_SP); > > asm("call bar" : "+r" (__sp)) > > } > > > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > > > There's actually an easier way to achieve the same goal in GCC, without > > causing trouble for clang. If we declare the stack pointer register > > variable as a global variable, and remove the constraint altogether, > > that convinces GCC to always set up the frame pointer before inserting > > *any* inline asm. > > > > It basically acts as if *every* inline asm statement has a CALL > > instruction. It's a bit overkill, but the performance impact should be > > negligible. > > > > Again, probably negligible, but why do we need a frame pointer just > because we have a call assembly instruction? It's frame pointer convention. Without it, if dumping the stack from the called function, a function will get skipped in the stack trace. -- Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On 09/19/17 11:45, Josh Poimboeuf wrote: > For inline asm statements which have a CALL instruction, we list the > stack pointer as a constraint to convince GCC to ensure the frame > pointer is set up first: > > static inline void foo() > { > register void *__sp asm(_ASM_SP); > asm("call bar" : "+r" (__sp)) > } > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > There's actually an easier way to achieve the same goal in GCC, without > causing trouble for clang. If we declare the stack pointer register > variable as a global variable, and remove the constraint altogether, > that convinces GCC to always set up the frame pointer before inserting > *any* inline asm. > > It basically acts as if *every* inline asm statement has a CALL > instruction. It's a bit overkill, but the performance impact should be > negligible. > Again, probably negligible, but why do we need a frame pointer just because we have a call assembly instruction? -hpa
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On 09/19/17 11:45, Josh Poimboeuf wrote: > For inline asm statements which have a CALL instruction, we list the > stack pointer as a constraint to convince GCC to ensure the frame > pointer is set up first: > > static inline void foo() > { > register void *__sp asm(_ASM_SP); > asm("call bar" : "+r" (__sp)) > } > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > There's actually an easier way to achieve the same goal in GCC, without > causing trouble for clang. If we declare the stack pointer register > variable as a global variable, and remove the constraint altogether, > that convinces GCC to always set up the frame pointer before inserting > *any* inline asm. > > It basically acts as if *every* inline asm statement has a CALL > instruction. It's a bit overkill, but the performance impact should be > negligible. > Again, probably negligible, but why do we need a frame pointer just because we have a call assembly instruction? -hpa
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Wed, Sep 20, 2017 at 7:32 PM, H. Peter Anvinwrote: > On 09/19/17 11:45, Josh Poimboeuf wrote: >> For inline asm statements which have a CALL instruction, we list the >> stack pointer as a constraint to convince GCC to ensure the frame >> pointer is set up first: >> >> static inline void foo() >> { >> register void *__sp asm(_ASM_SP); >> asm("call bar" : "+r" (__sp)) >> } >> >> Unfortunately, that pattern causes clang to corrupt the stack pointer. >> >> There's actually an easier way to achieve the same goal in GCC, without >> causing trouble for clang. If we declare the stack pointer register >> variable as a global variable, and remove the constraint altogether, >> that convinces GCC to always set up the frame pointer before inserting >> *any* inline asm. >> >> It basically acts as if *every* inline asm statement has a CALL >> instruction. It's a bit overkill, but the performance impact should be >> negligible. >> > > Again, probably negligible, but why do we need a frame pointer just > because we have a call assembly instruction? I think we need just the frame itself and RSP pointing below this frame. If we don't have a frame, CALL instruction will smash whatever RSP happens to point to. Compiler doesn't have to setup RSP to point below used part of stack in leaf functions.
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Wed, Sep 20, 2017 at 7:32 PM, H. Peter Anvin wrote: > On 09/19/17 11:45, Josh Poimboeuf wrote: >> For inline asm statements which have a CALL instruction, we list the >> stack pointer as a constraint to convince GCC to ensure the frame >> pointer is set up first: >> >> static inline void foo() >> { >> register void *__sp asm(_ASM_SP); >> asm("call bar" : "+r" (__sp)) >> } >> >> Unfortunately, that pattern causes clang to corrupt the stack pointer. >> >> There's actually an easier way to achieve the same goal in GCC, without >> causing trouble for clang. If we declare the stack pointer register >> variable as a global variable, and remove the constraint altogether, >> that convinces GCC to always set up the frame pointer before inserting >> *any* inline asm. >> >> It basically acts as if *every* inline asm statement has a CALL >> instruction. It's a bit overkill, but the performance impact should be >> negligible. >> > > Again, probably negligible, but why do we need a frame pointer just > because we have a call assembly instruction? I think we need just the frame itself and RSP pointing below this frame. If we don't have a frame, CALL instruction will smash whatever RSP happens to point to. Compiler doesn't have to setup RSP to point below used part of stack in leaf functions.
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Tue, Sep 19, 2017 at 03:25:59PM -0700, Alexander Potapenko wrote: > On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko> wrote: > > On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf > > wrote: > >> For inline asm statements which have a CALL instruction, we list the > >> stack pointer as a constraint to convince GCC to ensure the frame > >> pointer is set up first: > >> > >> static inline void foo() > >> { > >> register void *__sp asm(_ASM_SP); > >> asm("call bar" : "+r" (__sp)) > >> } > >> > >> Unfortunately, that pattern causes clang to corrupt the stack pointer. > >> > >> There's actually an easier way to achieve the same goal in GCC, without > >> causing trouble for clang. If we declare the stack pointer register > >> variable as a global variable, and remove the constraint altogether, > >> that convinces GCC to always set up the frame pointer before inserting > >> *any* inline asm. > >> > >> It basically acts as if *every* inline asm statement has a CALL > >> instruction. It's a bit overkill, but the performance impact should be > >> negligible. > >> > >> Here are the vmlinux .text size differences with the following configs > >> on GCC: > >> > >> - defconfig > >> - defconfig without frame pointers > >> - Fedora distro config > >> - Fedora distro config without frame pointers > >> > >> defconfig defconfig-nofp distro distro-nofp > >> before 9796300 9466764 9076191 8789745 > >> after 9796941 9462859 9076381 8785325 > >> > >> With frame pointers, the text size increases slightly. Without frame > >> pointers, the text size decreases, and a little more significantly. > >> > >> Reported-by: Matthias Kaehlcke > >> Signed-off-by: Josh Poimboeuf > >> --- > >> arch/x86/include/asm/alternative.h | 3 +-- > >> arch/x86/include/asm/asm.h | 9 > >> arch/x86/include/asm/mshyperv.h | 27 > >> ++-- > >> arch/x86/include/asm/paravirt_types.h| 14 ++-- > >> arch/x86/include/asm/preempt.h | 15 + > >> arch/x86/include/asm/processor.h | 6 ++ > >> arch/x86/include/asm/rwsem.h | 6 +++--- > >> arch/x86/include/asm/uaccess.h | 5 ++--- > >> arch/x86/include/asm/xen/hypercall.h | 5 ++--- > >> arch/x86/kvm/emulate.c | 3 +-- > >> arch/x86/kvm/vmx.c | 4 +--- > >> arch/x86/mm/fault.c | 3 +-- > >> tools/objtool/Documentation/stack-validation.txt | 20 +- > >> 13 files changed, 60 insertions(+), 60 deletions(-) > >> > >> diff --git a/arch/x86/include/asm/alternative.h > >> b/arch/x86/include/asm/alternative.h > >> index 1b020381ab38..7aeb1cef4204 100644 > >> --- a/arch/x86/include/asm/alternative.h > >> +++ b/arch/x86/include/asm/alternative.h > >> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void > >> *start, void *end) > >> #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, > >> feature2, \ > >>output, input...) > >>\ > >> { > >>\ > >> - register void *__sp asm(_ASM_SP); > >>\ > >> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", > >> feature1,\ > >> "call %P[new2]", feature2) > >>\ > >> - : output, "+r" (__sp) > >>\ > >> + : output > >>\ > >> : [old] "i" (oldfunc), [new1] "i" (newfunc1), > >>\ > >> [new2] "i" (newfunc2), ## input); > >>\ > >> } > >> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h > >> index 676ee5807d86..ff8921d4615b 100644 > >> --- a/arch/x86/include/asm/asm.h > >> +++ b/arch/x86/include/asm/asm.h > >> @@ -132,4 +132,13 @@ > >> /* For C file, we already have NOKPROBE_SYMBOL macro */ > >> #endif > >> > >> +#ifndef __ASSEMBLY__ > >> +/* > >> + * This variable declaration has the side effect of forcing GCC to always > >> set > >> + * up the frame pointer before inserting any inline asm. This is > >> necessary > >> + * because some inline asm statements have CALL instructions. > >> + */ > >> +register unsigned int __sp_unused asm("esp"); > > Shouldn't this be "asm(_ASM_SP)"? > Answering my own question, this can't be _ASM_SP because of the > realmode code, which is built with -m16 whereas _ASM_SP is "rsp". > The patch works fine with "esp" though - most certainly declaring a > ESP variable
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Tue, Sep 19, 2017 at 03:25:59PM -0700, Alexander Potapenko wrote: > On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko > wrote: > > On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf > > wrote: > >> For inline asm statements which have a CALL instruction, we list the > >> stack pointer as a constraint to convince GCC to ensure the frame > >> pointer is set up first: > >> > >> static inline void foo() > >> { > >> register void *__sp asm(_ASM_SP); > >> asm("call bar" : "+r" (__sp)) > >> } > >> > >> Unfortunately, that pattern causes clang to corrupt the stack pointer. > >> > >> There's actually an easier way to achieve the same goal in GCC, without > >> causing trouble for clang. If we declare the stack pointer register > >> variable as a global variable, and remove the constraint altogether, > >> that convinces GCC to always set up the frame pointer before inserting > >> *any* inline asm. > >> > >> It basically acts as if *every* inline asm statement has a CALL > >> instruction. It's a bit overkill, but the performance impact should be > >> negligible. > >> > >> Here are the vmlinux .text size differences with the following configs > >> on GCC: > >> > >> - defconfig > >> - defconfig without frame pointers > >> - Fedora distro config > >> - Fedora distro config without frame pointers > >> > >> defconfig defconfig-nofp distro distro-nofp > >> before 9796300 9466764 9076191 8789745 > >> after 9796941 9462859 9076381 8785325 > >> > >> With frame pointers, the text size increases slightly. Without frame > >> pointers, the text size decreases, and a little more significantly. > >> > >> Reported-by: Matthias Kaehlcke > >> Signed-off-by: Josh Poimboeuf > >> --- > >> arch/x86/include/asm/alternative.h | 3 +-- > >> arch/x86/include/asm/asm.h | 9 > >> arch/x86/include/asm/mshyperv.h | 27 > >> ++-- > >> arch/x86/include/asm/paravirt_types.h| 14 ++-- > >> arch/x86/include/asm/preempt.h | 15 + > >> arch/x86/include/asm/processor.h | 6 ++ > >> arch/x86/include/asm/rwsem.h | 6 +++--- > >> arch/x86/include/asm/uaccess.h | 5 ++--- > >> arch/x86/include/asm/xen/hypercall.h | 5 ++--- > >> arch/x86/kvm/emulate.c | 3 +-- > >> arch/x86/kvm/vmx.c | 4 +--- > >> arch/x86/mm/fault.c | 3 +-- > >> tools/objtool/Documentation/stack-validation.txt | 20 +- > >> 13 files changed, 60 insertions(+), 60 deletions(-) > >> > >> diff --git a/arch/x86/include/asm/alternative.h > >> b/arch/x86/include/asm/alternative.h > >> index 1b020381ab38..7aeb1cef4204 100644 > >> --- a/arch/x86/include/asm/alternative.h > >> +++ b/arch/x86/include/asm/alternative.h > >> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void > >> *start, void *end) > >> #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, > >> feature2, \ > >>output, input...) > >>\ > >> { > >>\ > >> - register void *__sp asm(_ASM_SP); > >>\ > >> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", > >> feature1,\ > >> "call %P[new2]", feature2) > >>\ > >> - : output, "+r" (__sp) > >>\ > >> + : output > >>\ > >> : [old] "i" (oldfunc), [new1] "i" (newfunc1), > >>\ > >> [new2] "i" (newfunc2), ## input); > >>\ > >> } > >> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h > >> index 676ee5807d86..ff8921d4615b 100644 > >> --- a/arch/x86/include/asm/asm.h > >> +++ b/arch/x86/include/asm/asm.h > >> @@ -132,4 +132,13 @@ > >> /* For C file, we already have NOKPROBE_SYMBOL macro */ > >> #endif > >> > >> +#ifndef __ASSEMBLY__ > >> +/* > >> + * This variable declaration has the side effect of forcing GCC to always > >> set > >> + * up the frame pointer before inserting any inline asm. This is > >> necessary > >> + * because some inline asm statements have CALL instructions. > >> + */ > >> +register unsigned int __sp_unused asm("esp"); > > Shouldn't this be "asm(_ASM_SP)"? > Answering my own question, this can't be _ASM_SP because of the > realmode code, which is built with -m16 whereas _ASM_SP is "rsp". > The patch works fine with "esp" though - most certainly declaring a > ESP variable is enough to reserve the whole RSP in an x86_64 build. Right, clang failing to
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote: > For inline asm statements which have a CALL instruction, we list the > stack pointer as a constraint to convince GCC to ensure the frame > pointer is set up first: > > static inline void foo() > { > register void *__sp asm(_ASM_SP); > asm("call bar" : "+r" (__sp)) > } > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > There's actually an easier way to achieve the same goal in GCC, without > causing trouble for clang. If we declare the stack pointer register > variable as a global variable, and remove the constraint altogether, > that convinces GCC to always set up the frame pointer before inserting > *any* inline asm. > > It basically acts as if *every* inline asm statement has a CALL > instruction. It's a bit overkill, but the performance impact should be > negligible. > > Here are the vmlinux .text size differences with the following configs > on GCC: > > - defconfig > - defconfig without frame pointers > - Fedora distro config > - Fedora distro config without frame pointers > > defconfig defconfig-nofp distro distro-nofp > before9796300 9466764 9076191 8789745 > after 9796941 9462859 9076381 8785325 > > With frame pointers, the text size increases slightly. Without frame > pointers, the text size decreases, and a little more significantly. > > Reported-by: Matthias Kaehlcke> Signed-off-by: Josh Poimboeuf NAK - kbuild bot is reporting some cases where this patch doesn't force the frame pointer setup. -- Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Tue, Sep 19, 2017 at 01:45:28PM -0500, Josh Poimboeuf wrote: > For inline asm statements which have a CALL instruction, we list the > stack pointer as a constraint to convince GCC to ensure the frame > pointer is set up first: > > static inline void foo() > { > register void *__sp asm(_ASM_SP); > asm("call bar" : "+r" (__sp)) > } > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > There's actually an easier way to achieve the same goal in GCC, without > causing trouble for clang. If we declare the stack pointer register > variable as a global variable, and remove the constraint altogether, > that convinces GCC to always set up the frame pointer before inserting > *any* inline asm. > > It basically acts as if *every* inline asm statement has a CALL > instruction. It's a bit overkill, but the performance impact should be > negligible. > > Here are the vmlinux .text size differences with the following configs > on GCC: > > - defconfig > - defconfig without frame pointers > - Fedora distro config > - Fedora distro config without frame pointers > > defconfig defconfig-nofp distro distro-nofp > before9796300 9466764 9076191 8789745 > after 9796941 9462859 9076381 8785325 > > With frame pointers, the text size increases slightly. Without frame > pointers, the text size decreases, and a little more significantly. > > Reported-by: Matthias Kaehlcke > Signed-off-by: Josh Poimboeuf NAK - kbuild bot is reporting some cases where this patch doesn't force the frame pointer setup. -- Josh
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenkowrote: > On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf wrote: >> For inline asm statements which have a CALL instruction, we list the >> stack pointer as a constraint to convince GCC to ensure the frame >> pointer is set up first: >> >> static inline void foo() >> { >> register void *__sp asm(_ASM_SP); >> asm("call bar" : "+r" (__sp)) >> } >> >> Unfortunately, that pattern causes clang to corrupt the stack pointer. >> >> There's actually an easier way to achieve the same goal in GCC, without >> causing trouble for clang. If we declare the stack pointer register >> variable as a global variable, and remove the constraint altogether, >> that convinces GCC to always set up the frame pointer before inserting >> *any* inline asm. >> >> It basically acts as if *every* inline asm statement has a CALL >> instruction. It's a bit overkill, but the performance impact should be >> negligible. >> >> Here are the vmlinux .text size differences with the following configs >> on GCC: >> >> - defconfig >> - defconfig without frame pointers >> - Fedora distro config >> - Fedora distro config without frame pointers >> >> defconfig defconfig-nofp distro distro-nofp >> before 9796300 9466764 9076191 8789745 >> after 9796941 9462859 9076381 8785325 >> >> With frame pointers, the text size increases slightly. Without frame >> pointers, the text size decreases, and a little more significantly. >> >> Reported-by: Matthias Kaehlcke >> Signed-off-by: Josh Poimboeuf >> --- >> arch/x86/include/asm/alternative.h | 3 +-- >> arch/x86/include/asm/asm.h | 9 >> arch/x86/include/asm/mshyperv.h | 27 >> ++-- >> arch/x86/include/asm/paravirt_types.h| 14 ++-- >> arch/x86/include/asm/preempt.h | 15 + >> arch/x86/include/asm/processor.h | 6 ++ >> arch/x86/include/asm/rwsem.h | 6 +++--- >> arch/x86/include/asm/uaccess.h | 5 ++--- >> arch/x86/include/asm/xen/hypercall.h | 5 ++--- >> arch/x86/kvm/emulate.c | 3 +-- >> arch/x86/kvm/vmx.c | 4 +--- >> arch/x86/mm/fault.c | 3 +-- >> tools/objtool/Documentation/stack-validation.txt | 20 +- >> 13 files changed, 60 insertions(+), 60 deletions(-) >> >> diff --git a/arch/x86/include/asm/alternative.h >> b/arch/x86/include/asm/alternative.h >> index 1b020381ab38..7aeb1cef4204 100644 >> --- a/arch/x86/include/asm/alternative.h >> +++ b/arch/x86/include/asm/alternative.h >> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void >> *start, void *end) >> #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, >> \ >>output, input...) >> \ >> { >> \ >> - register void *__sp asm(_ASM_SP); >> \ >> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", >> feature1,\ >> "call %P[new2]", feature2) >> \ >> - : output, "+r" (__sp) >> \ >> + : output >> \ >> : [old] "i" (oldfunc), [new1] "i" (newfunc1), >> \ >> [new2] "i" (newfunc2), ## input); >> \ >> } >> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h >> index 676ee5807d86..ff8921d4615b 100644 >> --- a/arch/x86/include/asm/asm.h >> +++ b/arch/x86/include/asm/asm.h >> @@ -132,4 +132,13 @@ >> /* For C file, we already have NOKPROBE_SYMBOL macro */ >> #endif >> >> +#ifndef __ASSEMBLY__ >> +/* >> + * This variable declaration has the side effect of forcing GCC to always >> set >> + * up the frame pointer before inserting any inline asm. This is necessary >> + * because some inline asm statements have CALL instructions. >> + */ >> +register unsigned int __sp_unused asm("esp"); > Shouldn't this be "asm(_ASM_SP)"? Answering my own question, this can't be _ASM_SP because of the realmode code, which is built with -m16 whereas _ASM_SP is "rsp". The patch works fine with "esp" though - most certainly declaring a ESP variable is enough to reserve the whole RSP in an x86_64 build. >> +#endif >> + >> #endif /* _ASM_X86_ASM_H */ >> diff --git a/arch/x86/include/asm/mshyperv.h >> b/arch/x86/include/asm/mshyperv.h >> index 63cc96f064dc..1c913ae27f99 100644 >> --- a/arch/x86/include/asm/mshyperv.h >> +++
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Tue, Sep 19, 2017 at 2:55 PM, Alexander Potapenko wrote: > On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf wrote: >> For inline asm statements which have a CALL instruction, we list the >> stack pointer as a constraint to convince GCC to ensure the frame >> pointer is set up first: >> >> static inline void foo() >> { >> register void *__sp asm(_ASM_SP); >> asm("call bar" : "+r" (__sp)) >> } >> >> Unfortunately, that pattern causes clang to corrupt the stack pointer. >> >> There's actually an easier way to achieve the same goal in GCC, without >> causing trouble for clang. If we declare the stack pointer register >> variable as a global variable, and remove the constraint altogether, >> that convinces GCC to always set up the frame pointer before inserting >> *any* inline asm. >> >> It basically acts as if *every* inline asm statement has a CALL >> instruction. It's a bit overkill, but the performance impact should be >> negligible. >> >> Here are the vmlinux .text size differences with the following configs >> on GCC: >> >> - defconfig >> - defconfig without frame pointers >> - Fedora distro config >> - Fedora distro config without frame pointers >> >> defconfig defconfig-nofp distro distro-nofp >> before 9796300 9466764 9076191 8789745 >> after 9796941 9462859 9076381 8785325 >> >> With frame pointers, the text size increases slightly. Without frame >> pointers, the text size decreases, and a little more significantly. >> >> Reported-by: Matthias Kaehlcke >> Signed-off-by: Josh Poimboeuf >> --- >> arch/x86/include/asm/alternative.h | 3 +-- >> arch/x86/include/asm/asm.h | 9 >> arch/x86/include/asm/mshyperv.h | 27 >> ++-- >> arch/x86/include/asm/paravirt_types.h| 14 ++-- >> arch/x86/include/asm/preempt.h | 15 + >> arch/x86/include/asm/processor.h | 6 ++ >> arch/x86/include/asm/rwsem.h | 6 +++--- >> arch/x86/include/asm/uaccess.h | 5 ++--- >> arch/x86/include/asm/xen/hypercall.h | 5 ++--- >> arch/x86/kvm/emulate.c | 3 +-- >> arch/x86/kvm/vmx.c | 4 +--- >> arch/x86/mm/fault.c | 3 +-- >> tools/objtool/Documentation/stack-validation.txt | 20 +- >> 13 files changed, 60 insertions(+), 60 deletions(-) >> >> diff --git a/arch/x86/include/asm/alternative.h >> b/arch/x86/include/asm/alternative.h >> index 1b020381ab38..7aeb1cef4204 100644 >> --- a/arch/x86/include/asm/alternative.h >> +++ b/arch/x86/include/asm/alternative.h >> @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void >> *start, void *end) >> #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, >> \ >>output, input...) >> \ >> { >> \ >> - register void *__sp asm(_ASM_SP); >> \ >> asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", >> feature1,\ >> "call %P[new2]", feature2) >> \ >> - : output, "+r" (__sp) >> \ >> + : output >> \ >> : [old] "i" (oldfunc), [new1] "i" (newfunc1), >> \ >> [new2] "i" (newfunc2), ## input); >> \ >> } >> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h >> index 676ee5807d86..ff8921d4615b 100644 >> --- a/arch/x86/include/asm/asm.h >> +++ b/arch/x86/include/asm/asm.h >> @@ -132,4 +132,13 @@ >> /* For C file, we already have NOKPROBE_SYMBOL macro */ >> #endif >> >> +#ifndef __ASSEMBLY__ >> +/* >> + * This variable declaration has the side effect of forcing GCC to always >> set >> + * up the frame pointer before inserting any inline asm. This is necessary >> + * because some inline asm statements have CALL instructions. >> + */ >> +register unsigned int __sp_unused asm("esp"); > Shouldn't this be "asm(_ASM_SP)"? Answering my own question, this can't be _ASM_SP because of the realmode code, which is built with -m16 whereas _ASM_SP is "rsp". The patch works fine with "esp" though - most certainly declaring a ESP variable is enough to reserve the whole RSP in an x86_64 build. >> +#endif >> + >> #endif /* _ASM_X86_ASM_H */ >> diff --git a/arch/x86/include/asm/mshyperv.h >> b/arch/x86/include/asm/mshyperv.h >> index 63cc96f064dc..1c913ae27f99 100644 >> --- a/arch/x86/include/asm/mshyperv.h >> +++ b/arch/x86/include/asm/mshyperv.h >> @@ -179,16 +179,15 @@ static inline u64 hv_do_hypercall(u64
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeufwrote: > For inline asm statements which have a CALL instruction, we list the > stack pointer as a constraint to convince GCC to ensure the frame > pointer is set up first: > > static inline void foo() > { > register void *__sp asm(_ASM_SP); > asm("call bar" : "+r" (__sp)) > } > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > There's actually an easier way to achieve the same goal in GCC, without > causing trouble for clang. If we declare the stack pointer register > variable as a global variable, and remove the constraint altogether, > that convinces GCC to always set up the frame pointer before inserting > *any* inline asm. > > It basically acts as if *every* inline asm statement has a CALL > instruction. It's a bit overkill, but the performance impact should be > negligible. > > Here are the vmlinux .text size differences with the following configs > on GCC: > > - defconfig > - defconfig without frame pointers > - Fedora distro config > - Fedora distro config without frame pointers > > defconfig defconfig-nofp distro distro-nofp > before 9796300 9466764 9076191 8789745 > after 9796941 9462859 9076381 8785325 > > With frame pointers, the text size increases slightly. Without frame > pointers, the text size decreases, and a little more significantly. > > Reported-by: Matthias Kaehlcke > Signed-off-by: Josh Poimboeuf > --- > arch/x86/include/asm/alternative.h | 3 +-- > arch/x86/include/asm/asm.h | 9 > arch/x86/include/asm/mshyperv.h | 27 > ++-- > arch/x86/include/asm/paravirt_types.h| 14 ++-- > arch/x86/include/asm/preempt.h | 15 + > arch/x86/include/asm/processor.h | 6 ++ > arch/x86/include/asm/rwsem.h | 6 +++--- > arch/x86/include/asm/uaccess.h | 5 ++--- > arch/x86/include/asm/xen/hypercall.h | 5 ++--- > arch/x86/kvm/emulate.c | 3 +-- > arch/x86/kvm/vmx.c | 4 +--- > arch/x86/mm/fault.c | 3 +-- > tools/objtool/Documentation/stack-validation.txt | 20 +- > 13 files changed, 60 insertions(+), 60 deletions(-) > > diff --git a/arch/x86/include/asm/alternative.h > b/arch/x86/include/asm/alternative.h > index 1b020381ab38..7aeb1cef4204 100644 > --- a/arch/x86/include/asm/alternative.h > +++ b/arch/x86/include/asm/alternative.h > @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void > *start, void *end) > #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, > \ >output, input...) > \ > { > \ > - register void *__sp asm(_ASM_SP); > \ > asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", > feature1,\ > "call %P[new2]", feature2) > \ > - : output, "+r" (__sp) > \ > + : output > \ > : [old] "i" (oldfunc), [new1] "i" (newfunc1), > \ > [new2] "i" (newfunc2), ## input); > \ > } > diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h > index 676ee5807d86..ff8921d4615b 100644 > --- a/arch/x86/include/asm/asm.h > +++ b/arch/x86/include/asm/asm.h > @@ -132,4 +132,13 @@ > /* For C file, we already have NOKPROBE_SYMBOL macro */ > #endif > > +#ifndef __ASSEMBLY__ > +/* > + * This variable declaration has the side effect of forcing GCC to always set > + * up the frame pointer before inserting any inline asm. This is necessary > + * because some inline asm statements have CALL instructions. > + */ > +register unsigned int __sp_unused asm("esp"); Shouldn't this be "asm(_ASM_SP)"? > +#endif > + > #endif /* _ASM_X86_ASM_H */ > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h > index 63cc96f064dc..1c913ae27f99 100644 > --- a/arch/x86/include/asm/mshyperv.h > +++ b/arch/x86/include/asm/mshyperv.h > @@ -179,16 +179,15 @@ static inline u64 hv_do_hypercall(u64 control, void > *input, void *output) > u64 input_address = input ? virt_to_phys(input) : 0; > u64 output_address = output ? virt_to_phys(output) : 0; > u64 hv_status; > - register void *__sp asm(_ASM_SP); > > #ifdef CONFIG_X86_64 > if (!hv_hypercall_pg) > return U64_MAX; > > - __asm__ __volatile__("mov %4, %%r8\n" > -
Re: [PATCH 2/2] x86/asm: Fix inline asm call constraints for clang
On Tue, Sep 19, 2017 at 11:45 AM, Josh Poimboeuf wrote: > For inline asm statements which have a CALL instruction, we list the > stack pointer as a constraint to convince GCC to ensure the frame > pointer is set up first: > > static inline void foo() > { > register void *__sp asm(_ASM_SP); > asm("call bar" : "+r" (__sp)) > } > > Unfortunately, that pattern causes clang to corrupt the stack pointer. > > There's actually an easier way to achieve the same goal in GCC, without > causing trouble for clang. If we declare the stack pointer register > variable as a global variable, and remove the constraint altogether, > that convinces GCC to always set up the frame pointer before inserting > *any* inline asm. > > It basically acts as if *every* inline asm statement has a CALL > instruction. It's a bit overkill, but the performance impact should be > negligible. > > Here are the vmlinux .text size differences with the following configs > on GCC: > > - defconfig > - defconfig without frame pointers > - Fedora distro config > - Fedora distro config without frame pointers > > defconfig defconfig-nofp distro distro-nofp > before 9796300 9466764 9076191 8789745 > after 9796941 9462859 9076381 8785325 > > With frame pointers, the text size increases slightly. Without frame > pointers, the text size decreases, and a little more significantly. > > Reported-by: Matthias Kaehlcke > Signed-off-by: Josh Poimboeuf > --- > arch/x86/include/asm/alternative.h | 3 +-- > arch/x86/include/asm/asm.h | 9 > arch/x86/include/asm/mshyperv.h | 27 > ++-- > arch/x86/include/asm/paravirt_types.h| 14 ++-- > arch/x86/include/asm/preempt.h | 15 + > arch/x86/include/asm/processor.h | 6 ++ > arch/x86/include/asm/rwsem.h | 6 +++--- > arch/x86/include/asm/uaccess.h | 5 ++--- > arch/x86/include/asm/xen/hypercall.h | 5 ++--- > arch/x86/kvm/emulate.c | 3 +-- > arch/x86/kvm/vmx.c | 4 +--- > arch/x86/mm/fault.c | 3 +-- > tools/objtool/Documentation/stack-validation.txt | 20 +- > 13 files changed, 60 insertions(+), 60 deletions(-) > > diff --git a/arch/x86/include/asm/alternative.h > b/arch/x86/include/asm/alternative.h > index 1b020381ab38..7aeb1cef4204 100644 > --- a/arch/x86/include/asm/alternative.h > +++ b/arch/x86/include/asm/alternative.h > @@ -218,10 +218,9 @@ static inline int alternatives_text_reserved(void > *start, void *end) > #define alternative_call_2(oldfunc, newfunc1, feature1, newfunc2, feature2, > \ >output, input...) > \ > { > \ > - register void *__sp asm(_ASM_SP); > \ > asm volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", > feature1,\ > "call %P[new2]", feature2) > \ > - : output, "+r" (__sp) > \ > + : output > \ > : [old] "i" (oldfunc), [new1] "i" (newfunc1), > \ > [new2] "i" (newfunc2), ## input); > \ > } > diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h > index 676ee5807d86..ff8921d4615b 100644 > --- a/arch/x86/include/asm/asm.h > +++ b/arch/x86/include/asm/asm.h > @@ -132,4 +132,13 @@ > /* For C file, we already have NOKPROBE_SYMBOL macro */ > #endif > > +#ifndef __ASSEMBLY__ > +/* > + * This variable declaration has the side effect of forcing GCC to always set > + * up the frame pointer before inserting any inline asm. This is necessary > + * because some inline asm statements have CALL instructions. > + */ > +register unsigned int __sp_unused asm("esp"); Shouldn't this be "asm(_ASM_SP)"? > +#endif > + > #endif /* _ASM_X86_ASM_H */ > diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h > index 63cc96f064dc..1c913ae27f99 100644 > --- a/arch/x86/include/asm/mshyperv.h > +++ b/arch/x86/include/asm/mshyperv.h > @@ -179,16 +179,15 @@ static inline u64 hv_do_hypercall(u64 control, void > *input, void *output) > u64 input_address = input ? virt_to_phys(input) : 0; > u64 output_address = output ? virt_to_phys(output) : 0; > u64 hv_status; > - register void *__sp asm(_ASM_SP); > > #ifdef CONFIG_X86_64 > if (!hv_hypercall_pg) > return U64_MAX; > > - __asm__ __volatile__("mov %4, %%r8\n" > -"call *%5" > -