Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Mon, Mar 7, 2016 at 10:41 AM, Brian Gerstwrote: > On Mon, Mar 7, 2016 at 1:03 PM, Andy Lutomirski wrote: >> On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst wrote: >>> On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski wrote: Due to a blatant design error, SYSENTER doesn't clear TF. As a result, if a user does SYSENTER with TF set, we will single-step through the kernel until something clears TF. There is absolutely nothing we can do to prevent this short of turning off SYSENTER [1]. Simplify the handling considerably with two changes: 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can add TF to that list of flags to sanitize with no overhead whatsoever. 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. >>> >>> What is wrong with the current method of clearing TF and setting >>> TIF_SINGLESTEP on the first debug trap? This patch actually increases >>> complexity because it has to check for a range of addresses rather >>> than just the first instruction, plus it has to singlestep all the way >>> through the SYSENTER prologue. >>> >>> Unless there is an actual issue with TIF_SINGLESTEP, I don't think >>> this patch is an improvement. >> >> TIF_SINGLESTEP has bizarrely overloaded semantics. >> >> There's this: >> >> /* >> * If we stepped into a sysenter/syscall insn, it trapped in >> * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP. >> * If user-mode had set TF itself, then it's still clear from >> * do_debug() and we need to set it again to restore the user >> * state so we don't wrongly set TIF_FORCED_TF below. >> * If enable_single_step() was used last and that is what >> * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are >> * already set and our bookkeeping is fine. >> */ >> if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP))) >> regs->flags |= X86_EFLAGS_TF; >> >> but TIF_SINGLESTEP is also used for other things. (And I need to >> follow up with a patch to remove that comment.) This results in >> incomprehensible behavior: if a user program sets TF and does >> SYSENTER, then TIF_SINGLESTEP gets set (and stays set!). This does >> not happen if a user sets TF and does INT80 or SYSCALL. There was at >> least one bug in here that Oleg fixed a while back, and I wouldn't be >> at all surprised if there were others. >> >> I don't know what would happen if TF were set and SYSENTER were used >> to do a syscall where the __get_user to load the syscall nr failed. >> That happens before the TIF_SINGLESTEP fixup. >> >> Basically, the overloaded use of TIF_SINGLESTEP was complicated and >> hard to understand, and the new behavior is straightforward and >> consistent with other entries, even if it's a bit slower. >> >> We could introduce a new TIF_SYSENTER_TF and use it directly, or we >> could accelerate the TF fixup in regs->flags by switching to a >> different entry path (I had a draft patch to do that), but I tend to >> favor simplicity for things that aren't performance-critical. > > The alternate entry path wouldn't be very big, just 5 instructions > with the OR being the only difference. Agreed. It's just more code to maintain and test. I can certainly do it. I scrapped it the first time because it was messy on Xen PV. It was extra messy because I did this part of the series before I cleaned up the garbage ENTRY(debug) code, and that got in the way as well. > > Another option is to use a per-cpu var instead of a TIF flag. Then we'd need to add a branch in the TF-not-set path, which would be a bit unfortunate. As a third option, we could force BTF on, which would reduce the number of useless traps. In any case, I'd be glad to speed this up as a follow-up patch if anyone cares about performance. --Andy
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Mon, Mar 7, 2016 at 10:41 AM, Brian Gerst wrote: > On Mon, Mar 7, 2016 at 1:03 PM, Andy Lutomirski wrote: >> On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst wrote: >>> On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski wrote: Due to a blatant design error, SYSENTER doesn't clear TF. As a result, if a user does SYSENTER with TF set, we will single-step through the kernel until something clears TF. There is absolutely nothing we can do to prevent this short of turning off SYSENTER [1]. Simplify the handling considerably with two changes: 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can add TF to that list of flags to sanitize with no overhead whatsoever. 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. >>> >>> What is wrong with the current method of clearing TF and setting >>> TIF_SINGLESTEP on the first debug trap? This patch actually increases >>> complexity because it has to check for a range of addresses rather >>> than just the first instruction, plus it has to singlestep all the way >>> through the SYSENTER prologue. >>> >>> Unless there is an actual issue with TIF_SINGLESTEP, I don't think >>> this patch is an improvement. >> >> TIF_SINGLESTEP has bizarrely overloaded semantics. >> >> There's this: >> >> /* >> * If we stepped into a sysenter/syscall insn, it trapped in >> * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP. >> * If user-mode had set TF itself, then it's still clear from >> * do_debug() and we need to set it again to restore the user >> * state so we don't wrongly set TIF_FORCED_TF below. >> * If enable_single_step() was used last and that is what >> * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are >> * already set and our bookkeeping is fine. >> */ >> if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP))) >> regs->flags |= X86_EFLAGS_TF; >> >> but TIF_SINGLESTEP is also used for other things. (And I need to >> follow up with a patch to remove that comment.) This results in >> incomprehensible behavior: if a user program sets TF and does >> SYSENTER, then TIF_SINGLESTEP gets set (and stays set!). This does >> not happen if a user sets TF and does INT80 or SYSCALL. There was at >> least one bug in here that Oleg fixed a while back, and I wouldn't be >> at all surprised if there were others. >> >> I don't know what would happen if TF were set and SYSENTER were used >> to do a syscall where the __get_user to load the syscall nr failed. >> That happens before the TIF_SINGLESTEP fixup. >> >> Basically, the overloaded use of TIF_SINGLESTEP was complicated and >> hard to understand, and the new behavior is straightforward and >> consistent with other entries, even if it's a bit slower. >> >> We could introduce a new TIF_SYSENTER_TF and use it directly, or we >> could accelerate the TF fixup in regs->flags by switching to a >> different entry path (I had a draft patch to do that), but I tend to >> favor simplicity for things that aren't performance-critical. > > The alternate entry path wouldn't be very big, just 5 instructions > with the OR being the only difference. Agreed. It's just more code to maintain and test. I can certainly do it. I scrapped it the first time because it was messy on Xen PV. It was extra messy because I did this part of the series before I cleaned up the garbage ENTRY(debug) code, and that got in the way as well. > > Another option is to use a per-cpu var instead of a TIF flag. Then we'd need to add a branch in the TF-not-set path, which would be a bit unfortunate. As a third option, we could force BTF on, which would reduce the number of useless traps. In any case, I'd be glad to speed this up as a follow-up patch if anyone cares about performance. --Andy
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Mon, Mar 7, 2016 at 1:03 PM, Andy Lutomirskiwrote: > On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst wrote: >> On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski wrote: >>> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >>> if a user does SYSENTER with TF set, we will single-step through the >>> kernel until something clears TF. There is absolutely nothing we can >>> do to prevent this short of turning off SYSENTER [1]. >>> >>> Simplify the handling considerably with two changes: >>> >>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >>>add TF to that list of flags to sanitize with no overhead whatsoever. >>> >>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. >> >> What is wrong with the current method of clearing TF and setting >> TIF_SINGLESTEP on the first debug trap? This patch actually increases >> complexity because it has to check for a range of addresses rather >> than just the first instruction, plus it has to singlestep all the way >> through the SYSENTER prologue. >> >> Unless there is an actual issue with TIF_SINGLESTEP, I don't think >> this patch is an improvement. > > TIF_SINGLESTEP has bizarrely overloaded semantics. > > There's this: > > /* > * If we stepped into a sysenter/syscall insn, it trapped in > * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP. > * If user-mode had set TF itself, then it's still clear from > * do_debug() and we need to set it again to restore the user > * state so we don't wrongly set TIF_FORCED_TF below. > * If enable_single_step() was used last and that is what > * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are > * already set and our bookkeeping is fine. > */ > if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP))) > regs->flags |= X86_EFLAGS_TF; > > but TIF_SINGLESTEP is also used for other things. (And I need to > follow up with a patch to remove that comment.) This results in > incomprehensible behavior: if a user program sets TF and does > SYSENTER, then TIF_SINGLESTEP gets set (and stays set!). This does > not happen if a user sets TF and does INT80 or SYSCALL. There was at > least one bug in here that Oleg fixed a while back, and I wouldn't be > at all surprised if there were others. > > I don't know what would happen if TF were set and SYSENTER were used > to do a syscall where the __get_user to load the syscall nr failed. > That happens before the TIF_SINGLESTEP fixup. > > Basically, the overloaded use of TIF_SINGLESTEP was complicated and > hard to understand, and the new behavior is straightforward and > consistent with other entries, even if it's a bit slower. > > We could introduce a new TIF_SYSENTER_TF and use it directly, or we > could accelerate the TF fixup in regs->flags by switching to a > different entry path (I had a draft patch to do that), but I tend to > favor simplicity for things that aren't performance-critical. The alternate entry path wouldn't be very big, just 5 instructions with the OR being the only difference. Another option is to use a per-cpu var instead of a TIF flag. -- Brian Gerst
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Mon, Mar 7, 2016 at 1:03 PM, Andy Lutomirski wrote: > On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst wrote: >> On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski wrote: >>> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >>> if a user does SYSENTER with TF set, we will single-step through the >>> kernel until something clears TF. There is absolutely nothing we can >>> do to prevent this short of turning off SYSENTER [1]. >>> >>> Simplify the handling considerably with two changes: >>> >>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >>>add TF to that list of flags to sanitize with no overhead whatsoever. >>> >>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. >> >> What is wrong with the current method of clearing TF and setting >> TIF_SINGLESTEP on the first debug trap? This patch actually increases >> complexity because it has to check for a range of addresses rather >> than just the first instruction, plus it has to singlestep all the way >> through the SYSENTER prologue. >> >> Unless there is an actual issue with TIF_SINGLESTEP, I don't think >> this patch is an improvement. > > TIF_SINGLESTEP has bizarrely overloaded semantics. > > There's this: > > /* > * If we stepped into a sysenter/syscall insn, it trapped in > * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP. > * If user-mode had set TF itself, then it's still clear from > * do_debug() and we need to set it again to restore the user > * state so we don't wrongly set TIF_FORCED_TF below. > * If enable_single_step() was used last and that is what > * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are > * already set and our bookkeeping is fine. > */ > if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP))) > regs->flags |= X86_EFLAGS_TF; > > but TIF_SINGLESTEP is also used for other things. (And I need to > follow up with a patch to remove that comment.) This results in > incomprehensible behavior: if a user program sets TF and does > SYSENTER, then TIF_SINGLESTEP gets set (and stays set!). This does > not happen if a user sets TF and does INT80 or SYSCALL. There was at > least one bug in here that Oleg fixed a while back, and I wouldn't be > at all surprised if there were others. > > I don't know what would happen if TF were set and SYSENTER were used > to do a syscall where the __get_user to load the syscall nr failed. > That happens before the TIF_SINGLESTEP fixup. > > Basically, the overloaded use of TIF_SINGLESTEP was complicated and > hard to understand, and the new behavior is straightforward and > consistent with other entries, even if it's a bit slower. > > We could introduce a new TIF_SYSENTER_TF and use it directly, or we > could accelerate the TF fixup in regs->flags by switching to a > different entry path (I had a draft patch to do that), but I tend to > favor simplicity for things that aren't performance-critical. The alternate entry path wouldn't be very big, just 5 instructions with the OR being the only difference. Another option is to use a per-cpu var instead of a TIF flag. -- Brian Gerst
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerstwrote: > On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski wrote: >> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >> if a user does SYSENTER with TF set, we will single-step through the >> kernel until something clears TF. There is absolutely nothing we can >> do to prevent this short of turning off SYSENTER [1]. >> >> Simplify the handling considerably with two changes: >> >> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >>add TF to that list of flags to sanitize with no overhead whatsoever. >> >> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. > > What is wrong with the current method of clearing TF and setting > TIF_SINGLESTEP on the first debug trap? This patch actually increases > complexity because it has to check for a range of addresses rather > than just the first instruction, plus it has to singlestep all the way > through the SYSENTER prologue. > > Unless there is an actual issue with TIF_SINGLESTEP, I don't think > this patch is an improvement. TIF_SINGLESTEP has bizarrely overloaded semantics. There's this: /* * If we stepped into a sysenter/syscall insn, it trapped in * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP. * If user-mode had set TF itself, then it's still clear from * do_debug() and we need to set it again to restore the user * state so we don't wrongly set TIF_FORCED_TF below. * If enable_single_step() was used last and that is what * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are * already set and our bookkeeping is fine. */ if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP))) regs->flags |= X86_EFLAGS_TF; but TIF_SINGLESTEP is also used for other things. (And I need to follow up with a patch to remove that comment.) This results in incomprehensible behavior: if a user program sets TF and does SYSENTER, then TIF_SINGLESTEP gets set (and stays set!). This does not happen if a user sets TF and does INT80 or SYSCALL. There was at least one bug in here that Oleg fixed a while back, and I wouldn't be at all surprised if there were others. I don't know what would happen if TF were set and SYSENTER were used to do a syscall where the __get_user to load the syscall nr failed. That happens before the TIF_SINGLESTEP fixup. Basically, the overloaded use of TIF_SINGLESTEP was complicated and hard to understand, and the new behavior is straightforward and consistent with other entries, even if it's a bit slower. We could introduce a new TIF_SYSENTER_TF and use it directly, or we could accelerate the TF fixup in regs->flags by switching to a different entry path (I had a draft patch to do that), but I tend to favor simplicity for things that aren't performance-critical.
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst wrote: > On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski wrote: >> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >> if a user does SYSENTER with TF set, we will single-step through the >> kernel until something clears TF. There is absolutely nothing we can >> do to prevent this short of turning off SYSENTER [1]. >> >> Simplify the handling considerably with two changes: >> >> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >>add TF to that list of flags to sanitize with no overhead whatsoever. >> >> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. > > What is wrong with the current method of clearing TF and setting > TIF_SINGLESTEP on the first debug trap? This patch actually increases > complexity because it has to check for a range of addresses rather > than just the first instruction, plus it has to singlestep all the way > through the SYSENTER prologue. > > Unless there is an actual issue with TIF_SINGLESTEP, I don't think > this patch is an improvement. TIF_SINGLESTEP has bizarrely overloaded semantics. There's this: /* * If we stepped into a sysenter/syscall insn, it trapped in * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP. * If user-mode had set TF itself, then it's still clear from * do_debug() and we need to set it again to restore the user * state so we don't wrongly set TIF_FORCED_TF below. * If enable_single_step() was used last and that is what * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are * already set and our bookkeeping is fine. */ if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP))) regs->flags |= X86_EFLAGS_TF; but TIF_SINGLESTEP is also used for other things. (And I need to follow up with a patch to remove that comment.) This results in incomprehensible behavior: if a user program sets TF and does SYSENTER, then TIF_SINGLESTEP gets set (and stays set!). This does not happen if a user sets TF and does INT80 or SYSCALL. There was at least one bug in here that Oleg fixed a while back, and I wouldn't be at all surprised if there were others. I don't know what would happen if TF were set and SYSENTER were used to do a syscall where the __get_user to load the syscall nr failed. That happens before the TIF_SINGLESTEP fixup. Basically, the overloaded use of TIF_SINGLESTEP was complicated and hard to understand, and the new behavior is straightforward and consistent with other entries, even if it's a bit slower. We could introduce a new TIF_SYSENTER_TF and use it directly, or we could accelerate the TF fixup in regs->flags by switching to a different entry path (I had a draft patch to do that), but I tend to favor simplicity for things that aren't performance-critical.
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirskiwrote: > Due to a blatant design error, SYSENTER doesn't clear TF. As a result, > if a user does SYSENTER with TF set, we will single-step through the > kernel until something clears TF. There is absolutely nothing we can > do to prevent this short of turning off SYSENTER [1]. > > Simplify the handling considerably with two changes: > > 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >add TF to that list of flags to sanitize with no overhead whatsoever. > > 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. What is wrong with the current method of clearing TF and setting TIF_SINGLESTEP on the first debug trap? This patch actually increases complexity because it has to check for a range of addresses rather than just the first instruction, plus it has to singlestep all the way through the SYSENTER prologue. Unless there is an actual issue with TIF_SINGLESTEP, I don't think this patch is an improvement. -- Brian Gerst
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski wrote: > Due to a blatant design error, SYSENTER doesn't clear TF. As a result, > if a user does SYSENTER with TF set, we will single-step through the > kernel until something clears TF. There is absolutely nothing we can > do to prevent this short of turning off SYSENTER [1]. > > Simplify the handling considerably with two changes: > > 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >add TF to that list of flags to sanitize with no overhead whatsoever. > > 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. What is wrong with the current method of clearing TF and setting TIF_SINGLESTEP on the first debug trap? This patch actually increases complexity because it has to check for a range of addresses rather than just the first instruction, plus it has to singlestep all the way through the SYSENTER prologue. Unless there is an actual issue with TIF_SINGLESTEP, I don't think this patch is an improvement. -- Brian Gerst
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On 06/03/16 17:36, Andy Lutomirski wrote: > >> I haven't read the Xen hypervisor code, but what are those 5 words >> that were pushed on the stack by the hypervisor? It suspiciously is >> the size of an IRET frame. > I think it is nominally an IRET frame. It is a notminal IRET frame. Even to this day, Xen doesn't support anything other "making it appear as if an interrupt/exception occurred", even with the syscall/sysenter and artificial entrypoints. The Xen entrypoint logic predates the introduction of the syscall/sysenter support in Linux. At the point where your hammer is already iret shaped and you have a forked version of Linux for running as a guest, modifying sysenter to be iret shaped is an easy option. For better or worse, this is now the ABI. > One might wonder what's in it, given that the hypervisor doesn't know what > the old IP or SP was. Looking at the code which synthesizes the iret frame pushq $FLAT_USER_SS pushq $0 pushfq pushq $3 /* ring 3 null cs */ pushq $0 /* null rip */ Completely ignoring it definitely the best course of action. > >> Considering that we don't use SYSEXIT on >> Xen anymore, can we just redirect SYSENTER to the INT80 handler? >> Perhaps even just disable SYSENTER support in the VDSO on Xen. I >> can't imagine SYSENTER is any faster than INT80 on Xen, because it has >> to trap to the hypervisor first. >> > I think we should leave it enabled -- having user programs on Xen PV > trap into the hypervisor for a system call using SYSENTER will still > be much faster than using INT $0x80 as long as the hypervisor does > something reasonable. The trap into Xen has to happen either way (even for the INT $0x80 path). There is almost certainly room for improvement in both paths, but in principle the sysenter path will be faster. ~Andrew
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On 06/03/16 17:36, Andy Lutomirski wrote: > >> I haven't read the Xen hypervisor code, but what are those 5 words >> that were pushed on the stack by the hypervisor? It suspiciously is >> the size of an IRET frame. > I think it is nominally an IRET frame. It is a notminal IRET frame. Even to this day, Xen doesn't support anything other "making it appear as if an interrupt/exception occurred", even with the syscall/sysenter and artificial entrypoints. The Xen entrypoint logic predates the introduction of the syscall/sysenter support in Linux. At the point where your hammer is already iret shaped and you have a forked version of Linux for running as a guest, modifying sysenter to be iret shaped is an easy option. For better or worse, this is now the ABI. > One might wonder what's in it, given that the hypervisor doesn't know what > the old IP or SP was. Looking at the code which synthesizes the iret frame pushq $FLAT_USER_SS pushq $0 pushfq pushq $3 /* ring 3 null cs */ pushq $0 /* null rip */ Completely ignoring it definitely the best course of action. > >> Considering that we don't use SYSEXIT on >> Xen anymore, can we just redirect SYSENTER to the INT80 handler? >> Perhaps even just disable SYSENTER support in the VDSO on Xen. I >> can't imagine SYSENTER is any faster than INT80 on Xen, because it has >> to trap to the hypervisor first. >> > I think we should leave it enabled -- having user programs on Xen PV > trap into the hypervisor for a system call using SYSENTER will still > be much faster than using INT $0x80 as long as the hypervisor does > something reasonable. The trap into Xen has to happen either way (even for the INT $0x80 path). There is almost certainly room for improvement in both paths, but in principle the sysenter path will be faster. ~Andrew
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Sun, Mar 06, 2016 at 09:36:42AM -0800, Andy Lutomirski wrote: > > ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp", > > X86_FEATURE_XENPV > > That might work. Yap, and as we said on IRC, KALLSYMS can be off - even if it is "if EXPERT" so we need to think of a better way to compute start address and size of entry_SYSENTER_32 and the _compat one. We probably could reuse some of the ELF parsing machinery of relocs.c or so... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply.
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Sun, Mar 06, 2016 at 09:36:42AM -0800, Andy Lutomirski wrote: > > ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp", > > X86_FEATURE_XENPV > > That might work. Yap, and as we said on IRC, KALLSYMS can be off - even if it is "if EXPERT" so we need to think of a better way to compute start address and size of entry_SYSENTER_32 and the _compat one. We probably could reuse some of the ELF parsing machinery of relocs.c or so... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply.
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Sun, Mar 6, 2016 at 9:30 AM, Brian Gerstwrote: > On Sun, Mar 6, 2016 at 11:55 AM, Borislav Petkov wrote: >> On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote: >>> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >>> if a user does SYSENTER with TF set, we will single-step through the >>> kernel until something clears TF. There is absolutely nothing we can >>> do to prevent this short of turning off SYSENTER [1]. >>> >>> Simplify the handling considerably with two changes: >>> >>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >>>add TF to that list of flags to sanitize with no overhead whatsoever. >>> >>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. >>> >>> That's all we need to do. >>> >>> Don't get too excited -- our handling is still buggy on 32-bit >>> kernels. There's nothing wrong with the SYSENTER code itself, but >>> the #DB prologue has a clever fixup for traps on the very first >>> instruction of entry_SYSENTER_32, and the fixup doesn't work quite >>> correctly. The next two patches will fix that. >>> >>> [1] We could probably prevent it by forcing BTF on at all times and >>> making sure we clear TF before any branches in the SYSENTER >>> code. Needless to say, this is a bad idea. >>> >>> Signed-off-by: Andy Lutomirski >>> --- >>> arch/x86/entry/entry_32.S| 42 ++-- >>> arch/x86/entry/entry_64_compat.S | 9 ++- >>> arch/x86/include/asm/proto.h | 15 ++-- >>> arch/x86/kernel/traps.c | 52 >>> +--- >>> 4 files changed, 94 insertions(+), 24 deletions(-) >>> >>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S >>> index a8c3424c3392..7700cf695d8c 100644 >>> --- a/arch/x86/entry/entry_32.S >>> +++ b/arch/x86/entry/entry_32.S >>> @@ -287,7 +287,26 @@ need_resched: >>> END(resume_kernel) >>> #endif >>> >>> - # SYSENTER call handler stub >>> +GLOBAL(__begin_SYSENTER_singlestep_region) >>> +/* >>> + * All code from here through __end_SYSENTER_singlestep_region is subject >>> + * to being single-stepped if a user program sets TF and executes SYSENTER. >>> + * There is absolutely nothing that we can do to prevent this from >>> happening >>> + * (thanks Intel!). To keep our handling of this situation as simple as >>> + * possible, we handle TF just like AC and NT, except that our #DB handler >>> + * will ignore all of the single-step traps generated in this range. >>> + */ >>> + >>> +#ifdef CONFIG_XEN >>> +/* >>> + * Xen doesn't set %esp to be precisely what the normal SYSENTER >>> + * entry point expects, so fix it up before using the normal path. >>> + */ >>> +ENTRY(xen_sysenter_target) >>> + addl$5*4, %esp /* remove xen-provided frame >>> */ >>> + jmp sysenter_past_esp >>> +#endif >>> + >>> ENTRY(entry_SYSENTER_32) >>> movlTSS_sysenter_sp0(%esp), %esp >>> sysenter_past_esp: >> >> Can you do this below (ontop of your diff) and get rid of those >> __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat >> globals and use the entry_SYSENTER_{32|compat} symbols instead? >> >> From a quick scan, kallsyms can give you the symbol size too so that you >> can compute where it ends: >> >> readelf -a vmlinux | grep entry_SYSENTER >> 55454: 8170aff099 FUNCGLOBAL DEFAULT1 >> entry_SYSENTER_compat >> 62596: 8170b053 0 NOTYPE GLOBAL DEFAULT1 >> __end_entry_SYSENTER_comp >> >> 0x8170aff0 + 99 = 0x8170b053 >> >> --- >> Index: b/arch/x86/entry/entry_32.S >> === >> --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100 >> +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100 >> @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio >> * will ignore all of the single-step traps generated in this range. >> */ >> >> +ENTRY(entry_SYSENTER_32) >> #ifdef CONFIG_XEN >> -/* >> - * Xen doesn't set %esp to be precisely what the normal SYSENTER >> - * entry point expects, so fix it up before using the normal path. >> - */ >> -ENTRY(xen_sysenter_target) >> addl$5*4, %esp /* remove xen-provided frame >> */ >> jmp sysenter_past_esp >> -#endif >> - >> -ENTRY(entry_SYSENTER_32) >> +#else >> movlTSS_sysenter_sp0(%esp), %esp >> +#endif >> sysenter_past_esp: >> pushl $__USER_DS /* pt_regs->ss */ >> pushl %ebp/* pt_regs->sp (stashed in bp) */ > > > This won't work if you run a Xen enabled kernel on bare metal. This > would work though: > > ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp", > X86_FEATURE_XENPV That might work. > > I haven't read the Xen hypervisor code, but what are those 5 words > that
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Sun, Mar 6, 2016 at 9:30 AM, Brian Gerst wrote: > On Sun, Mar 6, 2016 at 11:55 AM, Borislav Petkov wrote: >> On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote: >>> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >>> if a user does SYSENTER with TF set, we will single-step through the >>> kernel until something clears TF. There is absolutely nothing we can >>> do to prevent this short of turning off SYSENTER [1]. >>> >>> Simplify the handling considerably with two changes: >>> >>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >>>add TF to that list of flags to sanitize with no overhead whatsoever. >>> >>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. >>> >>> That's all we need to do. >>> >>> Don't get too excited -- our handling is still buggy on 32-bit >>> kernels. There's nothing wrong with the SYSENTER code itself, but >>> the #DB prologue has a clever fixup for traps on the very first >>> instruction of entry_SYSENTER_32, and the fixup doesn't work quite >>> correctly. The next two patches will fix that. >>> >>> [1] We could probably prevent it by forcing BTF on at all times and >>> making sure we clear TF before any branches in the SYSENTER >>> code. Needless to say, this is a bad idea. >>> >>> Signed-off-by: Andy Lutomirski >>> --- >>> arch/x86/entry/entry_32.S| 42 ++-- >>> arch/x86/entry/entry_64_compat.S | 9 ++- >>> arch/x86/include/asm/proto.h | 15 ++-- >>> arch/x86/kernel/traps.c | 52 >>> +--- >>> 4 files changed, 94 insertions(+), 24 deletions(-) >>> >>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S >>> index a8c3424c3392..7700cf695d8c 100644 >>> --- a/arch/x86/entry/entry_32.S >>> +++ b/arch/x86/entry/entry_32.S >>> @@ -287,7 +287,26 @@ need_resched: >>> END(resume_kernel) >>> #endif >>> >>> - # SYSENTER call handler stub >>> +GLOBAL(__begin_SYSENTER_singlestep_region) >>> +/* >>> + * All code from here through __end_SYSENTER_singlestep_region is subject >>> + * to being single-stepped if a user program sets TF and executes SYSENTER. >>> + * There is absolutely nothing that we can do to prevent this from >>> happening >>> + * (thanks Intel!). To keep our handling of this situation as simple as >>> + * possible, we handle TF just like AC and NT, except that our #DB handler >>> + * will ignore all of the single-step traps generated in this range. >>> + */ >>> + >>> +#ifdef CONFIG_XEN >>> +/* >>> + * Xen doesn't set %esp to be precisely what the normal SYSENTER >>> + * entry point expects, so fix it up before using the normal path. >>> + */ >>> +ENTRY(xen_sysenter_target) >>> + addl$5*4, %esp /* remove xen-provided frame >>> */ >>> + jmp sysenter_past_esp >>> +#endif >>> + >>> ENTRY(entry_SYSENTER_32) >>> movlTSS_sysenter_sp0(%esp), %esp >>> sysenter_past_esp: >> >> Can you do this below (ontop of your diff) and get rid of those >> __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat >> globals and use the entry_SYSENTER_{32|compat} symbols instead? >> >> From a quick scan, kallsyms can give you the symbol size too so that you >> can compute where it ends: >> >> readelf -a vmlinux | grep entry_SYSENTER >> 55454: 8170aff099 FUNCGLOBAL DEFAULT1 >> entry_SYSENTER_compat >> 62596: 8170b053 0 NOTYPE GLOBAL DEFAULT1 >> __end_entry_SYSENTER_comp >> >> 0x8170aff0 + 99 = 0x8170b053 >> >> --- >> Index: b/arch/x86/entry/entry_32.S >> === >> --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100 >> +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100 >> @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio >> * will ignore all of the single-step traps generated in this range. >> */ >> >> +ENTRY(entry_SYSENTER_32) >> #ifdef CONFIG_XEN >> -/* >> - * Xen doesn't set %esp to be precisely what the normal SYSENTER >> - * entry point expects, so fix it up before using the normal path. >> - */ >> -ENTRY(xen_sysenter_target) >> addl$5*4, %esp /* remove xen-provided frame >> */ >> jmp sysenter_past_esp >> -#endif >> - >> -ENTRY(entry_SYSENTER_32) >> +#else >> movlTSS_sysenter_sp0(%esp), %esp >> +#endif >> sysenter_past_esp: >> pushl $__USER_DS /* pt_regs->ss */ >> pushl %ebp/* pt_regs->sp (stashed in bp) */ > > > This won't work if you run a Xen enabled kernel on bare metal. This > would work though: > > ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp", > X86_FEATURE_XENPV That might work. > > I haven't read the Xen hypervisor code, but what are those 5 words > that were pushed on the stack by the hypervisor? It
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Sun, Mar 6, 2016 at 11:55 AM, Borislav Petkovwrote: > On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote: >> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >> if a user does SYSENTER with TF set, we will single-step through the >> kernel until something clears TF. There is absolutely nothing we can >> do to prevent this short of turning off SYSENTER [1]. >> >> Simplify the handling considerably with two changes: >> >> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >>add TF to that list of flags to sanitize with no overhead whatsoever. >> >> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. >> >> That's all we need to do. >> >> Don't get too excited -- our handling is still buggy on 32-bit >> kernels. There's nothing wrong with the SYSENTER code itself, but >> the #DB prologue has a clever fixup for traps on the very first >> instruction of entry_SYSENTER_32, and the fixup doesn't work quite >> correctly. The next two patches will fix that. >> >> [1] We could probably prevent it by forcing BTF on at all times and >> making sure we clear TF before any branches in the SYSENTER >> code. Needless to say, this is a bad idea. >> >> Signed-off-by: Andy Lutomirski >> --- >> arch/x86/entry/entry_32.S| 42 ++-- >> arch/x86/entry/entry_64_compat.S | 9 ++- >> arch/x86/include/asm/proto.h | 15 ++-- >> arch/x86/kernel/traps.c | 52 >> +--- >> 4 files changed, 94 insertions(+), 24 deletions(-) >> >> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S >> index a8c3424c3392..7700cf695d8c 100644 >> --- a/arch/x86/entry/entry_32.S >> +++ b/arch/x86/entry/entry_32.S >> @@ -287,7 +287,26 @@ need_resched: >> END(resume_kernel) >> #endif >> >> - # SYSENTER call handler stub >> +GLOBAL(__begin_SYSENTER_singlestep_region) >> +/* >> + * All code from here through __end_SYSENTER_singlestep_region is subject >> + * to being single-stepped if a user program sets TF and executes SYSENTER. >> + * There is absolutely nothing that we can do to prevent this from happening >> + * (thanks Intel!). To keep our handling of this situation as simple as >> + * possible, we handle TF just like AC and NT, except that our #DB handler >> + * will ignore all of the single-step traps generated in this range. >> + */ >> + >> +#ifdef CONFIG_XEN >> +/* >> + * Xen doesn't set %esp to be precisely what the normal SYSENTER >> + * entry point expects, so fix it up before using the normal path. >> + */ >> +ENTRY(xen_sysenter_target) >> + addl$5*4, %esp /* remove xen-provided frame */ >> + jmp sysenter_past_esp >> +#endif >> + >> ENTRY(entry_SYSENTER_32) >> movlTSS_sysenter_sp0(%esp), %esp >> sysenter_past_esp: > > Can you do this below (ontop of your diff) and get rid of those > __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat > globals and use the entry_SYSENTER_{32|compat} symbols instead? > > From a quick scan, kallsyms can give you the symbol size too so that you > can compute where it ends: > > readelf -a vmlinux | grep entry_SYSENTER > 55454: 8170aff099 FUNCGLOBAL DEFAULT1 > entry_SYSENTER_compat > 62596: 8170b053 0 NOTYPE GLOBAL DEFAULT1 > __end_entry_SYSENTER_comp > > 0x8170aff0 + 99 = 0x8170b053 > > --- > Index: b/arch/x86/entry/entry_32.S > === > --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100 > +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100 > @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio > * will ignore all of the single-step traps generated in this range. > */ > > +ENTRY(entry_SYSENTER_32) > #ifdef CONFIG_XEN > -/* > - * Xen doesn't set %esp to be precisely what the normal SYSENTER > - * entry point expects, so fix it up before using the normal path. > - */ > -ENTRY(xen_sysenter_target) > addl$5*4, %esp /* remove xen-provided frame > */ > jmp sysenter_past_esp > -#endif > - > -ENTRY(entry_SYSENTER_32) > +#else > movlTSS_sysenter_sp0(%esp), %esp > +#endif > sysenter_past_esp: > pushl $__USER_DS /* pt_regs->ss */ > pushl %ebp/* pt_regs->sp (stashed in bp) */ This won't work if you run a Xen enabled kernel on bare metal. This would work though: ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp", X86_FEATURE_XENPV I haven't read the Xen hypervisor code, but what are those 5 words that were pushed on the stack by the hypervisor? It suspiciously is the size of an IRET frame. Considering that we don't use SYSEXIT on Xen anymore, can we just redirect SYSENTER to the INT80 handler? Perhaps even just disable
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Sun, Mar 6, 2016 at 11:55 AM, Borislav Petkov wrote: > On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote: >> Due to a blatant design error, SYSENTER doesn't clear TF. As a result, >> if a user does SYSENTER with TF set, we will single-step through the >> kernel until something clears TF. There is absolutely nothing we can >> do to prevent this short of turning off SYSENTER [1]. >> >> Simplify the handling considerably with two changes: >> >> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >>add TF to that list of flags to sanitize with no overhead whatsoever. >> >> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. >> >> That's all we need to do. >> >> Don't get too excited -- our handling is still buggy on 32-bit >> kernels. There's nothing wrong with the SYSENTER code itself, but >> the #DB prologue has a clever fixup for traps on the very first >> instruction of entry_SYSENTER_32, and the fixup doesn't work quite >> correctly. The next two patches will fix that. >> >> [1] We could probably prevent it by forcing BTF on at all times and >> making sure we clear TF before any branches in the SYSENTER >> code. Needless to say, this is a bad idea. >> >> Signed-off-by: Andy Lutomirski >> --- >> arch/x86/entry/entry_32.S| 42 ++-- >> arch/x86/entry/entry_64_compat.S | 9 ++- >> arch/x86/include/asm/proto.h | 15 ++-- >> arch/x86/kernel/traps.c | 52 >> +--- >> 4 files changed, 94 insertions(+), 24 deletions(-) >> >> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S >> index a8c3424c3392..7700cf695d8c 100644 >> --- a/arch/x86/entry/entry_32.S >> +++ b/arch/x86/entry/entry_32.S >> @@ -287,7 +287,26 @@ need_resched: >> END(resume_kernel) >> #endif >> >> - # SYSENTER call handler stub >> +GLOBAL(__begin_SYSENTER_singlestep_region) >> +/* >> + * All code from here through __end_SYSENTER_singlestep_region is subject >> + * to being single-stepped if a user program sets TF and executes SYSENTER. >> + * There is absolutely nothing that we can do to prevent this from happening >> + * (thanks Intel!). To keep our handling of this situation as simple as >> + * possible, we handle TF just like AC and NT, except that our #DB handler >> + * will ignore all of the single-step traps generated in this range. >> + */ >> + >> +#ifdef CONFIG_XEN >> +/* >> + * Xen doesn't set %esp to be precisely what the normal SYSENTER >> + * entry point expects, so fix it up before using the normal path. >> + */ >> +ENTRY(xen_sysenter_target) >> + addl$5*4, %esp /* remove xen-provided frame */ >> + jmp sysenter_past_esp >> +#endif >> + >> ENTRY(entry_SYSENTER_32) >> movlTSS_sysenter_sp0(%esp), %esp >> sysenter_past_esp: > > Can you do this below (ontop of your diff) and get rid of those > __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat > globals and use the entry_SYSENTER_{32|compat} symbols instead? > > From a quick scan, kallsyms can give you the symbol size too so that you > can compute where it ends: > > readelf -a vmlinux | grep entry_SYSENTER > 55454: 8170aff099 FUNCGLOBAL DEFAULT1 > entry_SYSENTER_compat > 62596: 8170b053 0 NOTYPE GLOBAL DEFAULT1 > __end_entry_SYSENTER_comp > > 0x8170aff0 + 99 = 0x8170b053 > > --- > Index: b/arch/x86/entry/entry_32.S > === > --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100 > +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100 > @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio > * will ignore all of the single-step traps generated in this range. > */ > > +ENTRY(entry_SYSENTER_32) > #ifdef CONFIG_XEN > -/* > - * Xen doesn't set %esp to be precisely what the normal SYSENTER > - * entry point expects, so fix it up before using the normal path. > - */ > -ENTRY(xen_sysenter_target) > addl$5*4, %esp /* remove xen-provided frame > */ > jmp sysenter_past_esp > -#endif > - > -ENTRY(entry_SYSENTER_32) > +#else > movlTSS_sysenter_sp0(%esp), %esp > +#endif > sysenter_past_esp: > pushl $__USER_DS /* pt_regs->ss */ > pushl %ebp/* pt_regs->sp (stashed in bp) */ This won't work if you run a Xen enabled kernel on bare metal. This would work though: ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp", X86_FEATURE_XENPV I haven't read the Xen hypervisor code, but what are those 5 words that were pushed on the stack by the hypervisor? It suspiciously is the size of an IRET frame. Considering that we don't use SYSEXIT on Xen anymore, can we just redirect SYSENTER to the INT80 handler? Perhaps even just disable SYSENTER support in the VDSO on Xen. I
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote: > Due to a blatant design error, SYSENTER doesn't clear TF. As a result, > if a user does SYSENTER with TF set, we will single-step through the > kernel until something clears TF. There is absolutely nothing we can > do to prevent this short of turning off SYSENTER [1]. > > Simplify the handling considerably with two changes: > > 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >add TF to that list of flags to sanitize with no overhead whatsoever. > > 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. > > That's all we need to do. > > Don't get too excited -- our handling is still buggy on 32-bit > kernels. There's nothing wrong with the SYSENTER code itself, but > the #DB prologue has a clever fixup for traps on the very first > instruction of entry_SYSENTER_32, and the fixup doesn't work quite > correctly. The next two patches will fix that. > > [1] We could probably prevent it by forcing BTF on at all times and > making sure we clear TF before any branches in the SYSENTER > code. Needless to say, this is a bad idea. > > Signed-off-by: Andy Lutomirski> --- > arch/x86/entry/entry_32.S| 42 ++-- > arch/x86/entry/entry_64_compat.S | 9 ++- > arch/x86/include/asm/proto.h | 15 ++-- > arch/x86/kernel/traps.c | 52 > +--- > 4 files changed, 94 insertions(+), 24 deletions(-) > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S > index a8c3424c3392..7700cf695d8c 100644 > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -287,7 +287,26 @@ need_resched: > END(resume_kernel) > #endif > > - # SYSENTER call handler stub > +GLOBAL(__begin_SYSENTER_singlestep_region) > +/* > + * All code from here through __end_SYSENTER_singlestep_region is subject > + * to being single-stepped if a user program sets TF and executes SYSENTER. > + * There is absolutely nothing that we can do to prevent this from happening > + * (thanks Intel!). To keep our handling of this situation as simple as > + * possible, we handle TF just like AC and NT, except that our #DB handler > + * will ignore all of the single-step traps generated in this range. > + */ > + > +#ifdef CONFIG_XEN > +/* > + * Xen doesn't set %esp to be precisely what the normal SYSENTER > + * entry point expects, so fix it up before using the normal path. > + */ > +ENTRY(xen_sysenter_target) > + addl$5*4, %esp /* remove xen-provided frame */ > + jmp sysenter_past_esp > +#endif > + > ENTRY(entry_SYSENTER_32) > movlTSS_sysenter_sp0(%esp), %esp > sysenter_past_esp: Can you do this below (ontop of your diff) and get rid of those __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat globals and use the entry_SYSENTER_{32|compat} symbols instead? >From a quick scan, kallsyms can give you the symbol size too so that you can compute where it ends: readelf -a vmlinux | grep entry_SYSENTER 55454: 8170aff099 FUNCGLOBAL DEFAULT1 entry_SYSENTER_compat 62596: 8170b053 0 NOTYPE GLOBAL DEFAULT1 __end_entry_SYSENTER_comp 0x8170aff0 + 99 = 0x8170b053 --- Index: b/arch/x86/entry/entry_32.S === --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100 +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100 @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio * will ignore all of the single-step traps generated in this range. */ +ENTRY(entry_SYSENTER_32) #ifdef CONFIG_XEN -/* - * Xen doesn't set %esp to be precisely what the normal SYSENTER - * entry point expects, so fix it up before using the normal path. - */ -ENTRY(xen_sysenter_target) addl$5*4, %esp /* remove xen-provided frame */ jmp sysenter_past_esp -#endif - -ENTRY(entry_SYSENTER_32) +#else movlTSS_sysenter_sp0(%esp), %esp +#endif sysenter_past_esp: pushl $__USER_DS /* pt_regs->ss */ pushl %ebp/* pt_regs->sp (stashed in bp) */ -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply.
Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling
On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote: > Due to a blatant design error, SYSENTER doesn't clear TF. As a result, > if a user does SYSENTER with TF set, we will single-step through the > kernel until something clears TF. There is absolutely nothing we can > do to prevent this short of turning off SYSENTER [1]. > > Simplify the handling considerably with two changes: > > 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can >add TF to that list of flags to sanitize with no overhead whatsoever. > > 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue. > > That's all we need to do. > > Don't get too excited -- our handling is still buggy on 32-bit > kernels. There's nothing wrong with the SYSENTER code itself, but > the #DB prologue has a clever fixup for traps on the very first > instruction of entry_SYSENTER_32, and the fixup doesn't work quite > correctly. The next two patches will fix that. > > [1] We could probably prevent it by forcing BTF on at all times and > making sure we clear TF before any branches in the SYSENTER > code. Needless to say, this is a bad idea. > > Signed-off-by: Andy Lutomirski > --- > arch/x86/entry/entry_32.S| 42 ++-- > arch/x86/entry/entry_64_compat.S | 9 ++- > arch/x86/include/asm/proto.h | 15 ++-- > arch/x86/kernel/traps.c | 52 > +--- > 4 files changed, 94 insertions(+), 24 deletions(-) > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S > index a8c3424c3392..7700cf695d8c 100644 > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -287,7 +287,26 @@ need_resched: > END(resume_kernel) > #endif > > - # SYSENTER call handler stub > +GLOBAL(__begin_SYSENTER_singlestep_region) > +/* > + * All code from here through __end_SYSENTER_singlestep_region is subject > + * to being single-stepped if a user program sets TF and executes SYSENTER. > + * There is absolutely nothing that we can do to prevent this from happening > + * (thanks Intel!). To keep our handling of this situation as simple as > + * possible, we handle TF just like AC and NT, except that our #DB handler > + * will ignore all of the single-step traps generated in this range. > + */ > + > +#ifdef CONFIG_XEN > +/* > + * Xen doesn't set %esp to be precisely what the normal SYSENTER > + * entry point expects, so fix it up before using the normal path. > + */ > +ENTRY(xen_sysenter_target) > + addl$5*4, %esp /* remove xen-provided frame */ > + jmp sysenter_past_esp > +#endif > + > ENTRY(entry_SYSENTER_32) > movlTSS_sysenter_sp0(%esp), %esp > sysenter_past_esp: Can you do this below (ontop of your diff) and get rid of those __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat globals and use the entry_SYSENTER_{32|compat} symbols instead? >From a quick scan, kallsyms can give you the symbol size too so that you can compute where it ends: readelf -a vmlinux | grep entry_SYSENTER 55454: 8170aff099 FUNCGLOBAL DEFAULT1 entry_SYSENTER_compat 62596: 8170b053 0 NOTYPE GLOBAL DEFAULT1 __end_entry_SYSENTER_comp 0x8170aff0 + 99 = 0x8170b053 --- Index: b/arch/x86/entry/entry_32.S === --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100 +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100 @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio * will ignore all of the single-step traps generated in this range. */ +ENTRY(entry_SYSENTER_32) #ifdef CONFIG_XEN -/* - * Xen doesn't set %esp to be precisely what the normal SYSENTER - * entry point expects, so fix it up before using the normal path. - */ -ENTRY(xen_sysenter_target) addl$5*4, %esp /* remove xen-provided frame */ jmp sysenter_past_esp -#endif - -ENTRY(entry_SYSENTER_32) +#else movlTSS_sysenter_sp0(%esp), %esp +#endif sysenter_past_esp: pushl $__USER_DS /* pt_regs->ss */ pushl %ebp/* pt_regs->sp (stashed in bp) */ -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply.