Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
Dmitry Safonov <0x7f454...@gmail.com> wrote:
> 2018-02-16 7:11 GMT+00:00 Cyrill Gorcunov :
>> On Thu, Feb 15, 2018 at 11:29:42PM +, Andy Lutomirski wrote:
>> ...
>> +bool pti_handle_segment_not_present(long error_code)
>> +{
>> +        if (!static_cpu_has(X86_FEATURE_PTI))
>> +                return false;
>> +
>> +        if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
>> +                return false;
>> +
>> +        pti_reenable();
>> +        return true;
>> +}
>
> Please don't. You're trying to emulate the old behavior here, but you're emulating it wrong. In particular, you won't trap on LAR.

Yes, I thought I’ll manage to address LAR, but failed. I thought you said this is not a “show-stopper”. I’ll adapt your approach of using prctl, although it really limits the benefit of this mechanism.

>>>
>>> It's possible we could get away with adding the prctl but making the default be that only the bitness that matches the program being run is allowed. After all, it's possible that CRIU is literally the only program that switches bitness using the GDT. (DOSEMU2 definitely does cross-bitness stuff, but it uses the LDT as far as I know.) And I've never been entirely sure that CRIU fully counts toward the Linux "don't break ABI" guarantee.
>>>
>>> Linus, how would you feel about, by default, preventing 64-bit programs from long-jumping to __USER32_CS and vice versa? I think it has some value as a hardening measure. I've certainly engaged in some exploit shenanigans myself that took advantage of the ability to long jump/ret to change bitness at will. This wouldn't affect users of modify_ldt() -- 64-bit programs could still create and use their own private 32-bit segments with modify_ldt(), and seccomp can (and should!) prevent that in sandboxed programs.
>>>
>>> In general, I prefer an approach where everything is explicit to an approach where we almost, but not quite, emulate the weird historical behavior.
>>>
>>> Pavel and Cyrill, how annoying would it be if CRIU had to do an extra arch_prctl() to enable its cross-bitness shenanigans when checkpointing and restoring a 32-bit program?
>>
>> I think this should not be a problem for criu (CC'ing Dima, who has been working on compat mode support in criu). As far as I remember we initiate restoring of 32 bit tasks in native 64 bit mode (well, ia32e to be precise :) mode and then, once everything is ready, we change the mode by doing a return to __USER32_CS descriptor. So this won't be painful to add additional prctl call here.
>
> Yeah, restoring will still be easy..
> But checkpointing will be harder if we can't switch to 64-bit mode.
> ATM we have one 64-bit parasite binary, which does all seizing job for both 64 and 32 bit binaries.
> So, if you can't switch back to 64-bit from 32-bit mode, we'll need to keep two parasites.

I can allow switching back and forth by dynamically enabling/disabling PTI. Andy, Dave, do you think this makes it a viable option? Should I respin another version of the patch-set?
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
2018-02-16 7:11 GMT+00:00 Cyrill Gorcunov :
> On Thu, Feb 15, 2018 at 11:29:42PM +, Andy Lutomirski wrote:
> ...
>> >>> +bool pti_handle_segment_not_present(long error_code)
>> >>> +{
>> >>> +        if (!static_cpu_has(X86_FEATURE_PTI))
>> >>> +                return false;
>> >>> +
>> >>> +        if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
>> >>> +                return false;
>> >>> +
>> >>> +        pti_reenable();
>> >>> +        return true;
>> >>> +}
>> >>
>> >> Please don't. You're trying to emulate the old behavior here, but you're emulating it wrong. In particular, you won't trap on LAR.
>> >
>> > Yes, I thought I’ll manage to address LAR, but failed. I thought you said this is not a “show-stopper”. I’ll adapt your approach of using prctl, although it really limits the benefit of this mechanism.
>>
>> It's possible we could get away with adding the prctl but making the default be that only the bitness that matches the program being run is allowed. After all, it's possible that CRIU is literally the only program that switches bitness using the GDT. (DOSEMU2 definitely does cross-bitness stuff, but it uses the LDT as far as I know.) And I've never been entirely sure that CRIU fully counts toward the Linux "don't break ABI" guarantee.
>>
>> Linus, how would you feel about, by default, preventing 64-bit programs from long-jumping to __USER32_CS and vice versa? I think it has some value as a hardening measure. I've certainly engaged in some exploit shenanigans myself that took advantage of the ability to long jump/ret to change bitness at will. This wouldn't affect users of modify_ldt() -- 64-bit programs could still create and use their own private 32-bit segments with modify_ldt(), and seccomp can (and should!) prevent that in sandboxed programs.
>>
>> In general, I prefer an approach where everything is explicit to an approach where we almost, but not quite, emulate the weird historical behavior.
>>
>> Pavel and Cyrill, how annoying would it be if CRIU had to do an extra arch_prctl() to enable its cross-bitness shenanigans when checkpointing and restoring a 32-bit program?
>
> I think this should not be a problem for criu (CC'ing Dima, who has been working on compat mode support in criu). As far as I remember we initiate restoring of 32 bit tasks in native 64 bit mode (well, ia32e to be precise :) mode and then, once everything is ready, we changing the mode by doing a return to __USER32_CS descriptor. So this won't be painful to add additional prctl call here.

Yeah, restoring will still be easy..
But checkpointing will be harder if we can't switch to 64-bit mode.
ATM we have one 64-bit parasite binary, which does all seizing job for both 64 and 32 bit binaries.
So, if you can't switch back to 64-bit from 32-bit mode, we'll need to keep two parasites.

-- 
Dmitry
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
2018-02-15 20:02 GMT+00:00 Andy Lutomirski :
> On Thu, Feb 15, 2018 at 4:36 PM, Nadav Amit wrote:
>> Based on the understanding that there should be no way for userspace to address the kernel-space from compatibility mode, disable it while running in compatibility mode as long as the 64-bit code segment of the user is not used.
>>
>> Reenabling PTI is performed by restoring NX-bits to the userspace mappings, flushing the TLBs, and notifying all the CPUs that use the affected mm to disable PTI. Each core responds by removing the present bit for the 64-bit code-segment, and marking that PTI is disabled on that core.
>
> I dislike this patch because it's conflating two things. The patch claims to merely disable PTI for compat tasks, whatever those are. But it's also introducing a much stronger concept of what a compat task is. The kernel currently mostly doesn't care whether a task is "compat" or not, and I think that most remaining code paths that do care are buggy and should be removed.

Yes, please, don't do a stronger concept.. Speaking from the CRIU side: it C/Rs ia32 tasks with x86_64 binaries.

-- 
Dmitry
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
>> On Feb 15, 2018, at 4:08 PM, Linus Torvalds wrote:
>>
>> On Thu, Feb 15, 2018 at 3:29 PM, Andy Lutomirski wrote:
>>> It's possible we could get away with adding the prctl but making the default be that only the bitness that matches the program being run is allowed. After all, it's possible that CRIU is literally the only program that switches bitness using the GDT. (DOSEMU2 definitely does cross-bitness stuff, but it uses the LDT as far as I know.) And I've never been entirely sure that CRIU fully counts toward the Linux "don't break ABI" guarantee.
>>
>> Ugh.
>>
>> There are just _so_ many reasons to dislike that.
>>
>> It's not that I don't think we could try to encourage it, but this whole "security depends on it being in sync" seems really like a fundamentally bad design.

If we're going to do Nadav's thing, I think we have no choice. We could say that Nadav's idea of turning off PTI for 32-bit is just too messy, though.

>>> Linus, how would you feel about, by default, preventing 64-bit programs from long-jumping to __USER32_CS and vice versa?
>>
>> How? It's a standard GDT entry. Are you going to start switching the GDT around every context switch?

That's the idea. We already switch out three GDT entries for TLS. Switching two more isn't going to kill us.

>> I *thought* that user space can just do a far jump on its own. But it's so long since I had to care that I may have forgotten all the requirements for going between "compatibility mode" and real long mode.
>>
>> I just feel this all is a nightmare. I can see how you would want to think that compatibility mode doesn't need PTI, but at the same time it feels like a really risky move to do this.
>>
>> I can see one thread being in compatibility mode, and another being in long mode, and sharing the address space. But even with just one thread, I'm not seeing how you keep user mode from going from compatibility mode to long mode with just a far jump.
>>
>> But maybe you have some clever scheme in mind that guarantees that there are no issues, or maybe I've just forgotten all the details of long mode vs compat mode.

The clever scheme is that we have a new (maybe default) compat-and-i-mean-it mode that removes the DPL=3 L code segment from the GDT and prevents opportunistic SYSRET.
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
On Thu, Feb 15, 2018 at 11:29:42PM +, Andy Lutomirski wrote:
...
> >>> +bool pti_handle_segment_not_present(long error_code)
> >>> +{
> >>> +        if (!static_cpu_has(X86_FEATURE_PTI))
> >>> +                return false;
> >>> +
> >>> +        if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
> >>> +                return false;
> >>> +
> >>> +        pti_reenable();
> >>> +        return true;
> >>> +}
> >>
> >> Please don't. You're trying to emulate the old behavior here, but you're emulating it wrong. In particular, you won't trap on LAR.
> >
> > Yes, I thought I’ll manage to address LAR, but failed. I thought you said this is not a “show-stopper”. I’ll adapt your approach of using prctl, although it really limits the benefit of this mechanism.
>
> It's possible we could get away with adding the prctl but making the default be that only the bitness that matches the program being run is allowed. After all, it's possible that CRIU is literally the only program that switches bitness using the GDT. (DOSEMU2 definitely does cross-bitness stuff, but it uses the LDT as far as I know.) And I've never been entirely sure that CRIU fully counts toward the Linux "don't break ABI" guarantee.
>
> Linus, how would you feel about, by default, preventing 64-bit programs from long-jumping to __USER32_CS and vice versa? I think it has some value as a hardening measure. I've certainly engaged in some exploit shenanigans myself that took advantage of the ability to long jump/ret to change bitness at will. This wouldn't affect users of modify_ldt() -- 64-bit programs could still create and use their own private 32-bit segments with modify_ldt(), and seccomp can (and should!) prevent that in sandboxed programs.
>
> In general, I prefer an approach where everything is explicit to an approach where we almost, but not quite, emulate the weird historical behavior.
>
> Pavel and Cyrill, how annoying would it be if CRIU had to do an extra arch_prctl() to enable its cross-bitness shenanigans when checkpointing and restoring a 32-bit program?

I think this should not be a problem for criu (CC'ing Dima, who has been working on compat mode support in criu). As far as I remember we initiate restoring of 32 bit tasks in native 64 bit mode (well, ia32e to be precise :) mode and then, once everything is ready, we changing the mode by doing a return to __USER32_CS descriptor. So this won't be painful to add additional prctl call here.
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
Andy Lutomirski wrote:
> On Fri, Feb 16, 2018 at 12:42 AM, Linus Torvalds wrote:
>> On Thu, Feb 15, 2018 at 4:22 PM, Nadav Amit wrote:
>>> It is not too pretty, I agree, but it should do the work. There is only one problematic descriptor that can be used to switch from compatibility-mode to long-mode in the GDT (LDT descriptors always have the L-bit cleared). Changing the descriptor's present bit on context switch when needed can do the work.
>>
>> Sure, I can see it working, but it's some really shady stuff, and now the scheduler needs to save/restore/check one more subtle bit.
>>
>> And if you get it wrong, things will happily work, except you've now defeated PTI. But you'll never notice, because you won't be testing for it, and the only people who will are the black hats.
>>
>> This is exactly the "security depends on it being in sync" thing that makes me go "eww" about the whole model. Get one thing wrong, and you'll blow all the PTI code out of the water.
>>
>> So now you tried to optimize one small case that most people won't use, but the downside is that you may make all our PTI work (and all the overhead for all the _normal_ cases) pointless.
>
> There's also the fact that, if this stuff goes in, we'll be encouraging people to deploy 32-bit binaries. Then they'll buy Meltdown-fixed CPUs (or AMD CPUs!) and they may well continue running 32-bit binaries. Sigh. I'm not totally a fan of this.

Ok, ok. Stop kicking the dead body...
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
On Fri, Feb 16, 2018 at 12:42 AM, Linus Torvalds wrote:
> On Thu, Feb 15, 2018 at 4:22 PM, Nadav Amit wrote:
>> It is not too pretty, I agree, but it should do the work. There is only one problematic descriptor that can be used to switch from compatibility-mode to long-mode in the GDT (LDT descriptors always have the L-bit cleared). Changing the descriptor's present bit on context switch when needed can do the work.
>
> Sure, I can see it working, but it's some really shady stuff, and now the scheduler needs to save/restore/check one more subtle bit.
>
> And if you get it wrong, things will happily work, except you've now defeated PTI. But you'll never notice, because you won't be testing for it, and the only people who will are the black hats.
>
> This is exactly the "security depends on it being in sync" thing that makes me go "eww" about the whole model. Get one thing wrong, and you'll blow all the PTI code out of the water.
>
> So now you tried to optimize one small case that most people won't use, but the downside is that you may make all our PTI work (and all the overhead for all the _normal_ cases) pointless.

There's also the fact that, if this stuff goes in, we'll be encouraging people to deploy 32-bit binaries. Then they'll buy Meltdown-fixed CPUs (or AMD CPUs!) and they may well continue running 32-bit binaries. Sigh. I'm not totally a fan of this.
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
On Thu, Feb 15, 2018 at 4:22 PM, Nadav Amit wrote:
> It is not too pretty, I agree, but it should do the work. There is only one problematic descriptor that can be used to switch from compatibility-mode to long-mode in the GDT (LDT descriptors always have the L-bit cleared). Changing the descriptor's present bit on context switch when needed can do the work.

Sure, I can see it working, but it's some really shady stuff, and now the scheduler needs to save/restore/check one more subtle bit.

And if you get it wrong, things will happily work, except you've now defeated PTI. But you'll never notice, because you won't be testing for it, and the only people who will are the black hats.

This is exactly the "security depends on it being in sync" thing that makes me go "eww" about the whole model. Get one thing wrong, and you'll blow all the PTI code out of the water.

So now you tried to optimize one small case that most people won't use, but the downside is that you may make all our PTI work (and all the overhead for all the _normal_ cases) pointless.

                Linus
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
On 16/02/2018 00:08, Linus Torvalds wrote:
> On Thu, Feb 15, 2018 at 3:29 PM, Andy Lutomirski wrote:
>> Linus, how would you feel about, by default, preventing 64-bit programs from long-jumping to __USER32_CS and vice versa?
>
> How? It's a standard GDT entry. Are you going to start switching the GDT around every context switch?
>
> I *thought* that user space can just do a far jump on its own. But it's so long since I had to care that I may have forgotten all the requirements for going between "compatibility mode" and real long mode.

Yes - it is just a straight far jump to switch between compat and long mode.

An evil^W cunning programmer can use the 286 world view and disable segments by clearing the present bit to yield #NP[sel] on use, which is liable to be rather faster than LGDT on a context switch.

Alternatively, setting both the L and D bits (code segments only), or playing with DPL/type, can all yield #GP[sel] on use, but these probably aren't as good options.

~Andrew
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
Linus Torvalds wrote:
> On Thu, Feb 15, 2018 at 3:29 PM, Andy Lutomirski wrote:
>> It's possible we could get away with adding the prctl but making the default be that only the bitness that matches the program being run is allowed. After all, it's possible that CRIU is literally the only program that switches bitness using the GDT. (DOSEMU2 definitely does cross-bitness stuff, but it uses the LDT as far as I know.) And I've never been entirely sure that CRIU fully counts toward the Linux "don't break ABI" guarantee.
>
> Ugh.
>
> There are just _so_ many reasons to dislike that.
>
> It's not that I don't think we could try to encourage it, but this whole "security depends on it being in sync" seems really like a fundamentally bad design.
>
>> Linus, how would you feel about, by default, preventing 64-bit programs from long-jumping to __USER32_CS and vice versa?
>
> How? It's a standard GDT entry. Are you going to start switching the GDT around every context switch?
>
> I *thought* that user space can just do a far jump on its own. But it's so long since I had to care that I may have forgotten all the requirements for going between "compatibility mode" and real long mode.
>
> I just feel this all is a nightmare. I can see how you would want to think that compatibility mode doesn't need PTI, but at the same time it feels like a really risky move to do this.
>
> I can see one thread being in compatibility mode, and another being in long mode, and sharing the address space. But even with just one thread, I'm not seeing how you keep user mode from going from compatibility mode to long mode with just a far jump.
>
> But maybe you have some clever scheme in mind that guarantees that there are no issues, or maybe I've just forgotten all the details of long mode vs compat mode.

It is not too pretty, I agree, but it should do the work.

There is only one problematic descriptor in the GDT that can be used to switch from compatibility-mode to long-mode (LDT descriptors always have the L-bit cleared). Changing the descriptor's present bit on context switch when needed can do the work.

I tried to do it transparently: if long-mode is entered by any thread, PTI is restored. There is one corner case I did not cover (LAR), and Andy felt this scheme is too complicated. Unfortunately, I don’t have a better scheme in mind.
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
On Thu, Feb 15, 2018 at 3:29 PM, Andy Lutomirski wrote:
> It's possible we could get away with adding the prctl but making the default be that only the bitness that matches the program being run is allowed. After all, it's possible that CRIU is literally the only program that switches bitness using the GDT. (DOSEMU2 definitely does cross-bitness stuff, but it uses the LDT as far as I know.) And I've never been entirely sure that CRIU fully counts toward the Linux "don't break ABI" guarantee.

Ugh.

There are just _so_ many reasons to dislike that.

It's not that I don't think we could try to encourage it, but this whole "security depends on it being in sync" seems really like a fundamentally bad design.

> Linus, how would you feel about, by default, preventing 64-bit programs from long-jumping to __USER32_CS and vice versa?

How? It's a standard GDT entry. Are you going to start switching the GDT around every context switch?

I *thought* that user space can just do a far jump on its own. But it's so long since I had to care that I may have forgotten all the requirements for going between "compatibility mode" and real long mode.

I just feel this all is a nightmare. I can see how you would want to think that compatibility mode doesn't need PTI, but at the same time it feels like a really risky move to do this.

I can see one thread being in compatibility mode, and another being in long mode, and sharing the address space. But even with just one thread, I'm not seeing how you keep user mode from going from compatibility mode to long mode with just a far jump.

But maybe you have some clever scheme in mind that guarantees that there are no issues, or maybe I've just forgotten all the details of long mode vs compat mode.

                Linus
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
On Thu, Feb 15, 2018 at 8:58 PM, Nadav Amit wrote:
> Andy Lutomirski wrote:
>> On Thu, Feb 15, 2018 at 4:36 PM, Nadav Amit wrote:
>>> Based on the understanding that there should be no way for userspace to address the kernel-space from compatibility mode, disable it while running in compatibility mode as long as the 64-bit code segment of the user is not used.
>>>
>>> Reenabling PTI is performed by restoring NX-bits to the userspace mappings, flushing the TLBs, and notifying all the CPUs that use the affected mm to disable PTI. Each core responds by removing the present bit for the 64-bit code-segment, and marking that PTI is disabled on that core.
>>
>> I dislike this patch because it's conflating two things. The patch claims to merely disable PTI for compat tasks, whatever those are. But it's also introducing a much stronger concept of what a compat task is. The kernel currently mostly doesn't care whether a task is "compat" or not, and I think that most remaining code paths that do care are buggy and should be removed.
>>
>> I think the right way to approach this is to add a new arch_prctl() that changes allowable bitness, like this:
>>
>> arch_prctl(ARCH_SET_ALLOWED_GDT_CS, X86_ALLOW_CS32 | X86_ALLOW_CS64);
>>
>> this would set the current task to work the normal way, where 32-bit and 64-bit CS are available. You could set just X86_ALLOW_CS32 to deny 64-bit mode and just X86_ALLOW_CS64 to deny 32-bit mode. This would make nice attack surface reduction tools for the more paranoid sandbox users to use. Doing arch_prctl(ARCH_SET_ALLOWED_GDT_CS, 0) would return -EINVAL.
>>
>> A separate patch would turn PTI off if you set X86_ALLOW_CS32.
>>
>> This has the downside that old code doesn't get the benefit without some code change, but that's not the end of the world.
>>
>>> +static void pti_cpu_update_func(void *info)
>>> +{
>>> +        struct mm_struct *mm = (struct mm_struct *)info;
>>> +
>>> +        if (mm != this_cpu_read(cpu_tlbstate.loaded_mm))
>>> +                return;
>>> +
>>> +        /*
>>> +         * Keep CS64 and CPU settings in sync despite potential concurrent
>>> +         * updates.
>>> +         */
>>> +        set_cpu_pti_disable(READ_ONCE(mm->context.pti_disable));
>>> +}
>>
>> I don't like this at all. IMO a sane implementation should never change PTI status on a remote CPU. Just track it per task.
>
> From your comments in the past, I understood that this requirement (enabling/disabling PTI per mm) came from Linus. I can do it per task, but that means that the NX-bit will always be cleared on the kernel page-table for user mappings.

I disagree with Linus here, although I could be convinced. But there's still no need for an IPI -- even if it were per-mm, PTI could stay on until the next kernel entry.

>>> +void __pti_reenable(void)
>>> +{
>>> +        struct mm_struct *mm = current->mm;
>>> +        int cpu;
>>> +
>>> +        if (!mm_pti_disable(mm))
>>> +                return;
>>> +
>>> +        /*
>>> +         * Prevent spurious page-fault storm while we set the NX-bit and
>>> +         * have not yet updated the per-CPU pti_disable flag.
>>> +         */
>>> +        down_write(&mm->mmap_sem);
>>> +
>>> +        if (!mm_pti_disable(mm))
>>> +                goto out;
>>> +
>>> +        /*
>>> +         * First, mark that PTI is enabled. Although we don't do anything
>>> +         * yet, we are safe as long as we do not reenable CS64. Since we
>>> +         * did not update the page tables yet, this may lead to spurious
>>> +         * page-faults, but we need the pti_disable in mm to be set for
>>> +         * __pti_set_user_pgd() to do the right thing. Holding mmap_sem
>>> +         * prevents them from swamping the system.
>>> +         */
>>> +        mm->context.pti_disable = PTI_DISABLE_OFF;
>>> +
>>> +        /* Second, restore the NX bits. */
>>> +        pti_update_user_pgds(mm, true);
>>
>> You're holding mmap_sem, but there are code paths that touch page tables that don't hold mmap_sem, such as the stack extension code.
>
> Do they change user mappings? These are the only ones pti_update_user_pgds() changes.

I think they do -- it's the user stack. page_table_lock may be more appropriate, although I'm far from an expert in this.

>>> +
>>> +bool pti_handle_segment_not_present(long error_code)
>>> +{
>>> +        if (!static_cpu_has(X86_FEATURE_PTI))
>>> +                return false;
>>> +
>>> +        if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
>>> +                return false;
>>> +
>>> +        pti_reenable();
>>> +        return true;
>>> +}
>>
>> Please don't. You're trying to emulate the old behavior here, but you're emulating it wrong. In particular, you won't trap on LAR.
>
> Yes, I thought I’ll manage to address LAR, but failed. I thought you said this is not a “show-stopper”. I’ll adapt your approach of using prctl, although it really limits the benefit of this mechanism.
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
Andy Lutomirski wrote:
> On Thu, Feb 15, 2018 at 4:36 PM, Nadav Amit wrote:
>> Based on the understanding that there should be no way for userspace to address the kernel-space from compatibility mode, disable it while running in compatibility mode as long as the 64-bit code segment of the user is not used.
>>
>> Reenabling PTI is performed by restoring NX-bits to the userspace mappings, flushing the TLBs, and notifying all the CPUs that use the affected mm to disable PTI. Each core responds by removing the present bit for the 64-bit code-segment, and marking that PTI is disabled on that core.
>
> I dislike this patch because it's conflating two things. The patch claims to merely disable PTI for compat tasks, whatever those are. But it's also introducing a much stronger concept of what a compat task is. The kernel currently mostly doesn't care whether a task is "compat" or not, and I think that most remaining code paths that do care are buggy and should be removed.
>
> I think the right way to approach this is to add a new arch_prctl() that changes allowable bitness, like this:
>
> arch_prctl(ARCH_SET_ALLOWED_GDT_CS, X86_ALLOW_CS32 | X86_ALLOW_CS64);
>
> this would set the current task to work the normal way, where 32-bit and 64-bit CS are available. You could set just X86_ALLOW_CS32 to deny 64-bit mode and just X86_ALLOW_CS64 to deny 32-bit mode. This would make nice attack surface reduction tools for the more paranoid sandbox users to use. Doing arch_prctl(ARCH_SET_ALLOWED_GDT_CS, 0) would return -EINVAL.
>
> A separate patch would turn PTI off if you set X86_ALLOW_CS32.
>
> This has the downside that old code doesn't get the benefit without some code change, but that's not the end of the world.
>
>> +static void pti_cpu_update_func(void *info)
>> +{
>> +        struct mm_struct *mm = (struct mm_struct *)info;
>> +
>> +        if (mm != this_cpu_read(cpu_tlbstate.loaded_mm))
>> +                return;
>> +
>> +        /*
>> +         * Keep CS64 and CPU settings in sync despite potential concurrent
>> +         * updates.
>> +         */
>> +        set_cpu_pti_disable(READ_ONCE(mm->context.pti_disable));
>> +}
>
> I don't like this at all. IMO a sane implementation should never change PTI status on a remote CPU. Just track it per task.

From your comments in the past, I understood that this requirement (enabling/disabling PTI per mm) came from Linus. I can do it per task, but that means that the NX-bit will always be cleared on the kernel page-table for user mappings.

>> +void __pti_reenable(void)
>> +{
>> +        struct mm_struct *mm = current->mm;
>> +        int cpu;
>> +
>> +        if (!mm_pti_disable(mm))
>> +                return;
>> +
>> +        /*
>> +         * Prevent spurious page-fault storm while we set the NX-bit and
>> +         * have not yet updated the per-CPU pti_disable flag.
>> +         */
>> +        down_write(&mm->mmap_sem);
>> +
>> +        if (!mm_pti_disable(mm))
>> +                goto out;
>> +
>> +        /*
>> +         * First, mark that PTI is enabled. Although we don't do anything
>> +         * yet, we are safe as long as we do not reenable CS64. Since we
>> +         * did not update the page tables yet, this may lead to spurious
>> +         * page-faults, but we need the pti_disable in mm to be set for
>> +         * __pti_set_user_pgd() to do the right thing. Holding mmap_sem
>> +         * prevents them from swamping the system.
>> +         */
>> +        mm->context.pti_disable = PTI_DISABLE_OFF;
>> +
>> +        /* Second, restore the NX bits. */
>> +        pti_update_user_pgds(mm, true);
>
> You're holding mmap_sem, but there are code paths that touch page tables that don't hold mmap_sem, such as the stack extension code.

Do they change user mappings? These are the only ones pti_update_user_pgds() changes.

>> +
>> +bool pti_handle_segment_not_present(long error_code)
>> +{
>> +        if (!static_cpu_has(X86_FEATURE_PTI))
>> +                return false;
>> +
>> +        if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
>> +                return false;
>> +
>> +        pti_reenable();
>> +        return true;
>> +}
>
> Please don't. You're trying to emulate the old behavior here, but you're emulating it wrong. In particular, you won't trap on LAR.

Yes, I thought I’ll manage to address LAR, but failed. I thought you said this is not a “show-stopper”. I’ll adapt your approach of using prctl, although it really limits the benefit of this mechanism.
Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
On Thu, Feb 15, 2018 at 4:36 PM, Nadav Amit wrote: > Based on the understanding that there should be no way for userspace to > address the kernel-space from compatibility mode, disable it while > running in compatibility mode as long as the 64-bit code segment of the > user is not used. > > Reenabling PTI is performed by restoring NX-bits to the userspace > mappings, flushing the TLBs, and notifying all the CPUs that use the > affected mm to disable PTI. Each core responds by removing the present > bit for the 64-bit code-segment, and marking that PTI is disabled on > that core. > I dislike this patch because it's conflating two things. The patch claims to merely disable PTI for compat tasks, whatever those are. But it's also introducing a much stronger concept of what a compat task is. The kernel currently mostly doesn't care whether a task is "compat" or not, and I think that most remaining code paths that do care are buggy and should be removed. I think the right way to approach this is to add a new arch_prctl() that changes allowable bitness, like this: arch_prctl(ARCH_SET_ALLOWED_GDT_CS, X86_ALLOW_CS32 | X86_ALLOW_CS64); this would set the current task to work the normal way, where 32-bit and 64-bit CS are available. You could set just X86_ALLOW_CS32 to deny 64-bit mode and just X86_ALLOW_CS64 to deny 32-bit mode. This would make nice attack surface reduction tools for the more paranoid sandbox users to use. Doing arch_prctl(ARCH_SET_ALLOWED_GDT_CS, 0) would return -EINVAL. A separate patch would turn PTI off if you set X86_ALLOW_CS32. This has the downside that old code doesn't get the benefit without some code change, but that's not the end of the world. > +static void pti_cpu_update_func(void *info) > +{ > + struct mm_struct *mm = (struct mm_struct *)info; > + > + if (mm != this_cpu_read(cpu_tlbstate.loaded_mm)) > + return; > + > + /* > +* Keep CS64 and CPU settings in sync despite potential concurrent > +* updates. 
> +	 */
> +	set_cpu_pti_disable(READ_ONCE(mm->context.pti_disable));
> +}

I don't like this at all. IMO a sane implementation should never
change PTI status on a remote CPU. Just track it per task.

> +void __pti_reenable(void)
> +{
> +	struct mm_struct *mm = current->mm;
> +	int cpu;
> +
> +	if (!mm_pti_disable(mm))
> +		return;
> +
> +	/*
> +	 * Prevent spurious page-fault storm while we set the NX-bit and have
> +	 * not yet updated the per-CPU pti_disable flag.
> +	 */
> +	down_write(&mm->mmap_sem);
> +
> +	if (!mm_pti_disable(mm))
> +		goto out;
> +
> +	/*
> +	 * First, mark that PTI is enabled. Although we do not do anything
> +	 * yet, we are safe as long as we do not reenable CS64. Since we did
> +	 * not update the page tables yet, this may lead to spurious
> +	 * page-faults, but we need the pti_disable in mm to be set for
> +	 * __pti_set_user_pgd() to do the right thing. We hold the mmap_sem
> +	 * to prevent these faults from swamping the system.
> +	 */
> +	mm->context.pti_disable = PTI_DISABLE_OFF;
> +
> +	/* Second, restore the NX bits. */
> +	pti_update_user_pgds(mm, true);

You're holding mmap_sem, but there are code paths that touch page
tables that don't hold mmap_sem, such as the stack extension code.

> +
> +bool pti_handle_segment_not_present(long error_code)
> +{
> +	if (!static_cpu_has(X86_FEATURE_PTI))
> +		return false;
> +
> +	if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
> +		return false;
> +
> +	pti_reenable();
> +	return true;
> +}

Please don't. You're trying to emulate the old behavior here, but
you're emulating it wrong. In particular, you won't trap on LAR.
[PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode
Based on the understanding that there should be no way for userspace to
address the kernel-space from compatibility mode, disable it while
running in compatibility mode as long as the 64-bit code segment of the
user is not used.

Reenabling PTI is performed by restoring NX-bits to the userspace
mappings, flushing the TLBs, and notifying all the CPUs that use the
affected mm to disable PTI. Each core responds by removing the present
bit for the 64-bit code-segment, and marking that PTI is disabled on
that core.

Signed-off-by: Nadav Amit
---
 arch/x86/include/asm/pti.h   |  39 +
 arch/x86/kernel/process_64.c |  13 -
 arch/x86/kernel/traps.c      |  23 +++-
 arch/x86/mm/pti.c            | 130 +++
 4 files changed, 201 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pti.h b/arch/x86/include/asm/pti.h
index 78a333699874..d04954ebb0d4 100644
--- a/arch/x86/include/asm/pti.h
+++ b/arch/x86/include/asm/pti.h
@@ -31,6 +31,42 @@ static inline void pti_update_user_cs64(unsigned short prev_pti_disable,
 	write_gdt_entry(d, GDT_ENTRY_DEFAULT_USER_CS, &user_cs, DESCTYPE_S);
 }
 
+void __pti_reenable(void);
+
+static inline void pti_reenable(void)
+{
+	if (!static_cpu_has(X86_FEATURE_PTI) || !mm_pti_disable(current->mm))
+		return;
+
+	__pti_reenable();
+}
+
+void __pti_disable(unsigned short type);
+
+static inline void pti_disable(unsigned short type)
+{
+	/*
+	 * To allow PTI to be disabled, we must:
+	 *
+	 * 1. Have PTI enabled.
+	 * 2. Have SMEP enabled, since the lack of NX-bit on user mappings
+	 *    raises general security concerns.
+	 * 3. Have NX-bit enabled, since reenabling PTI has a corner case in
+	 *    which the kernel tables are restored instead of those of the
+	 *    user. Having NX-bit causes this scenario to trigger a spurious
+	 *    page-fault when control is returned to the user, and allow the
+	 *    entry code to restore the page-tables to their correct state.
+	 */
+	if (!static_cpu_has(X86_FEATURE_PTI) ||
+	    !static_cpu_has(X86_FEATURE_SMEP) ||
+	    !static_cpu_has(X86_FEATURE_NX))
+		return;
+
+	__pti_disable(type);
+}
+
+bool pti_handle_segment_not_present(long error_code);
+
 extern void pti_init(void);
 extern void pti_check_boottime_disable(void);
 #else
@@ -38,6 +74,9 @@ static inline unsigned short mm_pti_disable(struct mm_struct *mm) { return 0; }
 static inline unsigned short mm_pti_disable(struct mm_struct *mm);
 static inline void pti_update_user_cs64(unsigned short prev_pti_disable,
 					unsigned short next_pti_disable) { }
+static inline void pti_disable(unsigned short type) { }
+static inline void pti_reenable(void) { }
+static inline bool pti_handle_segment_not_present(long error_code) { return false; }
 static inline void pti_check_boottime_disable(void) { }
 #endif

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index c75466232016..24d3429b4191 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -54,6 +54,7 @@
 #include
 #include
 #include
+#include
 #ifdef CONFIG_IA32_EMULATION
 /* Not included via unistd.h */
 #include
@@ -530,8 +531,10 @@ void set_personality_64bit(void)
 	task_pt_regs(current)->orig_ax = __NR_execve;
 
 	/* Ensure the corresponding mm is not marked. */
-	if (current->mm)
+	if (current->mm) {
 		current->mm->context.ia32_compat = 0;
+		pti_reenable();
+	}
 
 	/* TBD: overwrites user setup. Should have two bits.
 	   But 64bit processes have always behaved this way,
@@ -545,8 +548,10 @@ static void __set_personality_x32(void)
 #ifdef CONFIG_X86_X32
 	clear_thread_flag(TIF_IA32);
 	set_thread_flag(TIF_X32);
-	if (current->mm)
+	if (current->mm) {
 		current->mm->context.ia32_compat = TIF_X32;
+		pti_reenable();
+	}
 	current->personality &= ~READ_IMPLIES_EXEC;
 	/*
 	 * in_compat_syscall() uses the presence of the x32 syscall bit
@@ -566,8 +571,10 @@ static void __set_personality_ia32(void)
 #ifdef CONFIG_IA32_EMULATION
 	set_thread_flag(TIF_IA32);
 	clear_thread_flag(TIF_X32);
-	if (current->mm)
+	if (current->mm) {
 		current->mm->context.ia32_compat = TIF_IA32;
+		pti_disable(PTI_DISABLE_IA32);
+	}
 	current->personality |= force_personality32;
 	/* Prepare the first "return" to user space */
 	task_pt_regs(current)->orig_ax = __NR_ia32_execve;

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 446c9ef8cfc3..65d8ccb20175 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
 #include
 #include
 #include
+#include
 #ifdef CONFIG_X86_64
 #include
@@ -315,7 +316,6 @@