Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-16 Thread Nadav Amit
Dmitry Safonov <0x7f454...@gmail.com> wrote:

> 2018-02-16 7:11 GMT+00:00 Cyrill Gorcunov :
>> On Thu, Feb 15, 2018 at 11:29:42PM +, Andy Lutomirski wrote:
>> ...
>>> >>> +bool pti_handle_segment_not_present(long error_code)
>>> >>> +{
>>> >>> +   if (!static_cpu_has(X86_FEATURE_PTI))
>>> >>> +   return false;
>>> >>> +
>>> >>> +   if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
>>> >>> +   return false;
>>> >>> +
>>> >>> +   pti_reenable();
>>> >>> +   return true;
>>> >>> +}
>>> >>
>>> >> Please don't.  You're trying to emulate the old behavior here, but
>>> >> you're emulating it wrong.  In particular, you won't trap on LAR.
>>> >
>>> > Yes, I thought I’ll manage to address LAR, but failed. I thought you said
>>> > this is not a “show-stopper”. I’ll adapt your approach of using prctl,
>>> > although it really limits the benefit of this mechanism.
>>> 
>>> It's possible we could get away with adding the prctl but making the
>>> default be that only the bitness that matches the program being run is
>>> allowed.  After all, it's possible that CRIU is literally the only
>>> program that switches bitness using the GDT.  (DOSEMU2 definitely does
>>> cross-bitness stuff, but it uses the LDT as far as I know.)  And I've
>>> never been entirely sure that CRIU fully counts toward the Linux
>>> "don't break ABI" guarantee.
>>> 
>>> Linus, how would you feel about, by default, preventing 64-bit
>>> programs from long-jumping to __USER32_CS and vice versa?  I think it
>>> has some value as a hardening measure.  I've certainly engaged in some
>>> exploit shenanigans myself that took advantage of the ability to long
>>> jump/ret to change bitness at will.  This wouldn't affect users of
>>> modify_ldt() -- 64-bit programs could still create and use their own
>>> private 32-bit segments with modify_ldt(), and seccomp can (and
>>> should!) prevent that in sandboxed programs.
>>> 
>>> In general, I prefer an approach where everything is explicit to an
>>> approach where we almost, but not quite, emulate the weird historical
>>> behavior.
>>> 
>>> Pavel and Cyrill, how annoying would it be if CRIU had to do an extra
>>> arch_prctl() to enable its cross-bitness shenanigans when
>>> checkpointing and restoring a 32-bit program?
>> 
>> I think this should not be a problem for criu (CC'ing Dima, who has
>> been working on compat mode support in criu). As far as I remember,
>> we start restoring 32-bit tasks in native 64-bit (well, ia32e to be
>> precise :) mode and then, once everything is ready, we change the mode
>> by doing a return to the __USER32_CS descriptor. So it won't be painful
>> to add an additional prctl call here.
> 
> Yeah, restoring will still be easy.
> But checkpointing will be harder if we can't switch to 64-bit mode.
> ATM we have one 64-bit parasite binary, which does all the seizing work
> for both 64-bit and 32-bit binaries.
> So, if you can't switch back to 64-bit from 32-bit mode, we'll need
> to keep two parasites.

I can allow switching back and forth by dynamically enabling/disabling PTI.
Andy, Dave, do you think that makes it a viable option? Should I respin
another version of the patch set?



Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-16 Thread Dmitry Safonov
2018-02-16 7:11 GMT+00:00 Cyrill Gorcunov :
> On Thu, Feb 15, 2018 at 11:29:42PM +, Andy Lutomirski wrote:
> ...
>> >>> +bool pti_handle_segment_not_present(long error_code)
>> >>> +{
>> >>> +   if (!static_cpu_has(X86_FEATURE_PTI))
>> >>> +   return false;
>> >>> +
>> >>> +   if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
>> >>> +   return false;
>> >>> +
>> >>> +   pti_reenable();
>> >>> +   return true;
>> >>> +}
>> >>
>> >> Please don't.  You're trying to emulate the old behavior here, but
>> >> you're emulating it wrong.  In particular, you won't trap on LAR.
>> >
>> > Yes, I thought I’ll manage to address LAR, but failed. I thought you said
>> > this is not a “show-stopper”. I’ll adapt your approach of using prctl,
>> > although it really limits the benefit of this mechanism.
>> >
>>
>> It's possible we could get away with adding the prctl but making the
>> default be that only the bitness that matches the program being run is
>> allowed.  After all, it's possible that CRIU is literally the only
>> program that switches bitness using the GDT.  (DOSEMU2 definitely does
>> cross-bitness stuff, but it uses the LDT as far as I know.)  And I've
>> never been entirely sure that CRIU fully counts toward the Linux
>> "don't break ABI" guarantee.
>>
>> Linus, how would you feel about, by default, preventing 64-bit
>> programs from long-jumping to __USER32_CS and vice versa?  I think it
>> has some value as a hardening measure.  I've certainly engaged in some
>> exploit shenanigans myself that took advantage of the ability to long
>> jump/ret to change bitness at will.  This wouldn't affect users of
>> modify_ldt() -- 64-bit programs could still create and use their own
>> private 32-bit segments with modify_ldt(), and seccomp can (and
>> should!) prevent that in sandboxed programs.
>>
>> In general, I prefer an approach where everything is explicit to an
>> approach where we almost, but not quite, emulate the weird historical
>> behavior.
>>
>> Pavel and Cyrill, how annoying would it be if CRIU had to do an extra
>> arch_prctl() to enable its cross-bitness shenanigans when
>> checkpointing and restoring a 32-bit program?
>
> I think this should not be a problem for criu (CC'ing Dima, who has
> been working on compat mode support in criu). As far as I remember,
> we start restoring 32-bit tasks in native 64-bit (well, ia32e to be
> precise :) mode and then, once everything is ready, we change the mode
> by doing a return to the __USER32_CS descriptor. So it won't be painful
> to add an additional prctl call here.

Yeah, restoring will still be easy.
But checkpointing will be harder if we can't switch to 64-bit mode.
ATM we have one 64-bit parasite binary, which does all the seizing work
for both 64-bit and 32-bit binaries.
So, if you can't switch back to 64-bit from 32-bit mode, we'll need
to keep two parasites.

-- 
 Dmitry


Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-16 Thread Dmitry Safonov
2018-02-15 20:02 GMT+00:00 Andy Lutomirski :
> On Thu, Feb 15, 2018 at 4:36 PM, Nadav Amit  wrote:
>> Based on the understanding that there should be no way for userspace to
>> address kernel-space from compatibility mode, disable PTI while running
>> in compatibility mode, as long as the user's 64-bit code segment is not
>> in use.
>>
>> Reenabling PTI is performed by restoring NX-bits to the userspace
>> mappings, flushing the TLBs, and notifying all the CPUs that use the
>> affected mm to disable PTI. Each core responds by removing the present
>> bit for the 64-bit code-segment, and marking that PTI is disabled on
>> that core.
>>
>
> I dislike this patch because it's conflating two things.  The patch
> claims to merely disable PTI for compat tasks, whatever those are.
> But it's also introducing a much stronger concept of what a compat
> task is.  The kernel currently mostly doesn't care whether a task is
> "compat" or not, and I think that most remaining code paths that do
> care are buggy and should be removed.

Yes, please, don't introduce a stronger concept.
Speaking from the CRIU side, it C/Rs (checkpoint/restores) ia32 tasks with x86_64 binaries.

-- 
 Dmitry


Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-16 Thread Andy Lutomirski


On Feb 15, 2018, at 4:08 PM, Linus Torvalds  wrote:

> On Thu, Feb 15, 2018 at 3:29 PM, Andy Lutomirski  wrote:
>> 
>> It's possible we could get away with adding the prctl but making the
>> default be that only the bitness that matches the program being run is
>> allowed.  After all, it's possible that CRIU is literally the only
>> program that switches bitness using the GDT.  (DOSEMU2 definitely does
>> cross-bitness stuff, but it uses the LDT as far as I know.)  And I've
>> never been entirely sure that CRIU fully counts toward the Linux
>> "don't break ABI" guarantee.
> 
> Ugh.
> 
> There are just _so_ many reasons to dislike that.
> 
> It's not that I don't think we could try to encourage it, but this
> whole "security depends on it being in sync" seems really like a
> fundamentally bad design.

If we're going to do Nadav's thing, I think we have no choice.  We could say 
that Nadav's idea of turning off PTI for 32-bit is just too messy, though.

> 
>> Linus, how would you feel about, by default, preventing 64-bit
>> programs from long-jumping to __USER32_CS and vice versa?
> 
> How? It's a standard GDT entry. Are you going to start switching the
> GDT around every context switch?

That's the idea.  We already switch out three GDT entries for TLS.  Switching 
two more isn't going to kill us.

> 
> I *thought* that user space can just do a far jump on its own. But
> it's so long since I had to care that I may have forgotten all the
> requirements for going between "compatibility mode" and real long
> mode.
> 
> I just feel this all is a nightmare. I can see how you would want to
> think that compatibility mode doesn't need PTI, but at the same time
> it feels like a really risky move to do this.
> 
> I can see one thread being in compatibility mode, and another being in
> long mode, and sharing the address space. But even with just one
> thread, I'm not seeing how you keep user mode from going from
> compatibility mode to L mode with just a far jump.
> 
> But maybe you have some clever scheme in mind that guarantees that
> there are no issues, or maybe I've just forgotten all the details of
> long mode vs compat mode.

The clever scheme is that we have a new (maybe default) compat-and-i-mean-it 
mode that removes the DPL=3 L code segment from the GDT and prevents 
opportunistic SYSRET.

Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-15 Thread Cyrill Gorcunov
On Thu, Feb 15, 2018 at 11:29:42PM +, Andy Lutomirski wrote:
...
> >>> +bool pti_handle_segment_not_present(long error_code)
> >>> +{
> >>> +   if (!static_cpu_has(X86_FEATURE_PTI))
> >>> +   return false;
> >>> +
> >>> +   if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
> >>> +   return false;
> >>> +
> >>> +   pti_reenable();
> >>> +   return true;
> >>> +}
> >>
> >> Please don't.  You're trying to emulate the old behavior here, but
> >> you're emulating it wrong.  In particular, you won't trap on LAR.
> >
> > Yes, I thought I’ll manage to address LAR, but failed. I thought you said
> > this is not a “show-stopper”. I’ll adapt your approach of using prctl,
> > although it really limits the benefit of this mechanism.
> >
> 
> It's possible we could get away with adding the prctl but making the
> default be that only the bitness that matches the program being run is
> allowed.  After all, it's possible that CRIU is literally the only
> program that switches bitness using the GDT.  (DOSEMU2 definitely does
> cross-bitness stuff, but it uses the LDT as far as I know.)  And I've
> never been entirely sure that CRIU fully counts toward the Linux
> "don't break ABI" guarantee.
> 
> Linus, how would you feel about, by default, preventing 64-bit
> programs from long-jumping to __USER32_CS and vice versa?  I think it
> has some value as a hardening measure.  I've certainly engaged in some
> exploit shenanigans myself that took advantage of the ability to long
> jump/ret to change bitness at will.  This wouldn't affect users of
> modify_ldt() -- 64-bit programs could still create and use their own
> private 32-bit segments with modify_ldt(), and seccomp can (and
> should!) prevent that in sandboxed programs.
> 
> In general, I prefer an approach where everything is explicit to an
> approach where we almost, but not quite, emulate the weird historical
> behavior.
> 
> Pavel and Cyrill, how annoying would it be if CRIU had to do an extra
> arch_prctl() to enable its cross-bitness shenanigans when
> checkpointing and restoring a 32-bit program?

I think this should not be a problem for criu (CC'ing Dima, who has
been working on compat mode support in criu). As far as I remember,
we start restoring 32-bit tasks in native 64-bit (well, ia32e to be
precise :) mode and then, once everything is ready, we change the mode
by doing a return to the __USER32_CS descriptor. So it won't be painful
to add an additional prctl call here.


Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-15 Thread Nadav Amit
Andy Lutomirski  wrote:

> On Fri, Feb 16, 2018 at 12:42 AM, Linus Torvalds
>  wrote:
>> On Thu, Feb 15, 2018 at 4:22 PM, Nadav Amit  wrote:
>>> It is not too pretty, I agree, but it should do the work. There is only one
>>> problematic descriptor that can be used to switch from compatibility-mode to
>>> long-mode in the GDT (LDT descriptors always have the L-bit cleared).
>>> Changing the descriptor's present bit on context switch when needed can do
>>> the work.
>> 
>> Sure, I can see it working, but it's some really shady stuff, and now
>> the scheduler needs to save/restore/check one more subtle bit.
>> 
>> And if you get it wrong, things will happily work, except you've now
>> defeated PTI. But you'll never notice, because you won't be testing
>> for it, and the only people who will are the black hats.
>> 
>> This is exactly the "security depends on it being in sync" thing that
>> makes me go "eww" about the whole model. Get one thing wrong, and
>> you'll blow all the PTI code out of the water.
>> 
>> So now you tried to optimize one small case that most people won't
>> use, but the downside is that you may make all our PTI work (and all
>> the overhead for all the _normal_ cases) pointless.
> 
> There's also the fact that, if this stuff goes in, we'll be
> encouraging people to deploy 32-bit binaries.  Then they'll buy
> Meltdown-fixed CPUs (or AMD CPUs!) and they may well continue running
> 32-bit binaries.  Sigh.  I'm not totally a fan of this.

Ok, ok. Stop kicking the dead body...

Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-15 Thread Andy Lutomirski
On Fri, Feb 16, 2018 at 12:42 AM, Linus Torvalds
 wrote:
> On Thu, Feb 15, 2018 at 4:22 PM, Nadav Amit  wrote:
>>
>> It is not too pretty, I agree, but it should do the work. There is only one
>> problematic descriptor that can be used to switch from compatibility-mode to
>> long-mode in the GDT (LDT descriptors always have the L-bit cleared).
>> Changing the descriptor's present bit on context switch when needed can do
>> the work.
>
> Sure, I can see it working, but it's some really shady stuff, and now
> the scheduler needs to save/restore/check one more subtle bit.
>
> And if you get it wrong, things will happily work, except you've now
> defeated PTI. But you'll never notice, because you won't be testing
> for it, and the only people who will are the black hats.
>
> This is exactly the "security depends on it being in sync" thing that
> makes me go "eww" about the whole model. Get one thing wrong, and
> you'll blow all the PTI code out of the water.
>
> So now you tried to optimize one small case that most people won't
> use, but the downside is that you may make all our PTI work (and all
> the overhead for all the _normal_ cases) pointless.
>

There's also the fact that, if this stuff goes in, we'll be
encouraging people to deploy 32-bit binaries.  Then they'll buy
Meltdown-fixed CPUs (or AMD CPUs!) and they may well continue running
32-bit binaries.  Sigh.  I'm not totally a fan of this.


Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-15 Thread Linus Torvalds
On Thu, Feb 15, 2018 at 4:22 PM, Nadav Amit  wrote:
>
> It is not too pretty, I agree, but it should do the work. There is only one
> problematic descriptor that can be used to switch from compatibility-mode to
> long-mode in the GDT (LDT descriptors always have the L-bit cleared).
> Changing the descriptor's present bit on context switch when needed can do
> the work.

Sure, I can see it working, but it's some really shady stuff, and now
the scheduler needs to save/restore/check one more subtle bit.

And if you get it wrong, things will happily work, except you've now
defeated PTI. But you'll never notice, because you won't be testing
for it, and the only people who will are the black hats.

This is exactly the "security depends on it being in sync" thing that
makes me go "eww" about the whole model. Get one thing wrong, and
you'll blow all the PTI code out of the water.

So now you tried to optimize one small case that most people won't
use, but the downside is that you may make all our PTI work (and all
the overhead for all the _normal_ cases) pointless.

 Linus


Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-15 Thread Andrew Cooper
On 16/02/2018 00:08, Linus Torvalds wrote:
> On Thu, Feb 15, 2018 at 3:29 PM, Andy Lutomirski  wrote:
>> Linus, how would you feel about, by default, preventing 64-bit
>> programs from long-jumping to __USER32_CS and vice versa?
> How? It's a standard GDT entry. Are you going to start switching the
> GDT around every context switch?
>
> I *thought* that user space can just do a far jump on its own. But
> it's so long since I had to care that I may have forgotten all the
> requirements for going between "compatibility mode" and real long
> mode.

Yes - it is just a straight far jump to switch between compat and long mode.

An evil^W cunning programmer can use the 286 world view and disable
segments by clearing the present bit to yield #NP[sel] on use, which is
liable to be rather faster than LGDT on a context switch.

Alternatively, setting both the L and D bits (code segments only), or
playing with DPL/type, can all yield #GP[sel] on use, but these probably
aren't as good options.

~Andrew
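
For concreteness, the descriptor trick Andrew describes might look roughly
like the sketch below on the kernel side -- an illustration only, not code
from the patch, reusing the existing desc_struct/write_gdt_entry helpers;
whatever per-task or per-mm bookkeeping decides when to call it is left out:

#include <linux/types.h>
#include <asm/desc.h>
#include <asm/segment.h>

/*
 * Hide or unhide the 64-bit user code segment on the current CPU by
 * toggling its present bit.  While the descriptor is marked not-present,
 * a far jmp/call/ret to __USER_CS raises #NP instead of silently entering
 * long mode, giving the kernel a chance to re-enable PTI first.
 */
static void set_user_cs64_present(bool present)
{
	struct desc_struct *gdt = get_current_gdt_rw();
	struct desc_struct cs64 = gdt[GDT_ENTRY_DEFAULT_USER_CS];

	cs64.p = present;	/* the 286-style "segment not present" bit */
	write_gdt_entry(gdt, GDT_ENTRY_DEFAULT_USER_CS, &cs64, DESCTYPE_S);
}

The #NP side of this is the pti_handle_segment_not_present() hook quoted
elsewhere in the thread: the #NP error code carries the faulting selector,
which is why it is compared against GDT_ENTRY_DEFAULT_USER_CS << 3.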


Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-15 Thread Nadav Amit
Linus Torvalds  wrote:

> On Thu, Feb 15, 2018 at 3:29 PM, Andy Lutomirski  wrote:
>> It's possible we could get away with adding the prctl but making the
>> default be that only the bitness that matches the program being run is
>> allowed.  After all, it's possible that CRIU is literally the only
>> program that switches bitness using the GDT.  (DOSEMU2 definitely does
>> cross-bitness stuff, but it uses the LDT as far as I know.)  And I've
>> never been entirely sure that CRIU fully counts toward the Linux
>> "don't break ABI" guarantee.
> 
> Ugh.
> 
> There are just _so_ many reasons to dislike that.
> 
> It's not that I don't think we could try to encourage it, but this
> whole "security depends on it being in sync" seems really like a
> fundamentally bad design.
> 
>> Linus, how would you feel about, by default, preventing 64-bit
>> programs from long-jumping to __USER32_CS and vice versa?
> 
> How? It's a standard GDT entry. Are you going to start switching the
> GDT around every context switch?
> 
> I *thought* that user space can just do a far jump on its own. But
> it's so long since I had to care that I may have forgotten all the
> requirements for going between "compatibility mode" and real long
> mode.
> 
> I just feel this all is a nightmare. I can see how you would want to
> think that compatibility mode doesn't need PTI, but at the same time
> it feels like a really risky move to do this.
> 
> I can see one thread being in compatibility mode, and another being in
> long mode, and sharing the address space. But even with just one
> thread, I'm not seeing how you keep user mode from going from
> compatibility mode to L mode with just a far jump.
> 
> But maybe you have some clever scheme in mind that guarantees that
> there are no issues, or maybe I've just forgotten all the details of
> long mode vs compat mode.

It is not too pretty, I agree, but it should do the work. There is only one
problematic descriptor that can be used to switch from compatibility-mode to
long-mode in the GDT (LDT descriptors always have the L-bit cleared).
Changing the descriptor's present bit on context switch when needed can do
the work.

I tried to do it transparently: if long mode is entered by any thread, PTI
is restored. There is one corner case I did not cover (LAR), and Andy felt
this scheme is too complicated. Unfortunately, I don’t have a better scheme
in mind.





Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-15 Thread Linus Torvalds
On Thu, Feb 15, 2018 at 3:29 PM, Andy Lutomirski  wrote:
>
> It's possible we could get away with adding the prctl but making the
> default be that only the bitness that matches the program being run is
> allowed.  After all, it's possible that CRIU is literally the only
> program that switches bitness using the GDT.  (DOSEMU2 definitely does
> cross-bitness stuff, but it uses the LDT as far as I know.)  And I've
> never been entirely sure that CRIU fully counts toward the Linux
> "don't break ABI" guarantee.

Ugh.

There are just _so_ many reasons to dislike that.

It's not that I don't think we could try to encourage it, but this
whole "security depends on it being in sync" seems really like a
fundamentally bad design.

> Linus, how would you feel about, by default, preventing 64-bit
> programs from long-jumping to __USER32_CS and vice versa?

How? It's a standard GDT entry. Are you going to start switching the
GDT around every context switch?

I *thought* that user space can just do a far jump on its own. But
it's so long since I had to care that I may have forgotten all the
requirements for going between "compatibility mode" and real long
mode.

I just feel this all is a nightmare. I can see how you would want to
think that compatibility mode doesn't need PTI, but at the same time
it feels like a really risky move to do this.

I can see one thread being in compatibility mode, and another being in
long mode, and sharing the address space. But even with just one
thread, I'm not seeing how you keep user mode from going from
compatibility mode to L mode with just a far jump.

But maybe you have some clever scheme in mind that guarantees that
there are no issues, or maybe I've just forgotten all the details of
long mode vs compat mode.

   Linus
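
To make the far-jump concern concrete: with the stock GDT nothing stops a
64-bit task from changing bitness entirely in user space.  Below is a
minimal sketch (an illustration, not code from the thread), assuming the
standard Linux selectors __USER32_CS = 0x23 and __USER_CS = 0x33 and a
static, non-PIE build so the code sits below 4 GiB, where the 32-bit
far-jump target is reachable:

#include <stdio.h>

/* Drop into compatibility mode and immediately far-jump back to long mode. */
static void bounce_through_compat_mode(void)
{
	asm volatile(
		"pushq $0x23\n\t"		/* new CS: __USER32_CS */
		"leaq  1f(%%rip), %%rax\n\t"
		"pushq %%rax\n\t"		/* new instruction pointer */
		"lretq\n\t"			/* far return -> compat mode */
		".code32\n"
		"1: ljmp $0x33, $2f\n\t"	/* far jump back -> 64-bit mode */
		".code64\n"
		"2:\n\t"
		::: "rax", "memory");
}

int main(void)
{
	bounce_through_compat_mode();
	puts("bounced through compatibility mode and back");
	return 0;
}

The per-task GDT switching being discussed is about making the first
transfer above fault for a 64-bit-only task (or the second one for a
32-bit-only task) instead of silently succeeding.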


Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-15 Thread Andy Lutomirski
On Thu, Feb 15, 2018 at 8:58 PM, Nadav Amit  wrote:
> Andy Lutomirski  wrote:
>
>> On Thu, Feb 15, 2018 at 4:36 PM, Nadav Amit  wrote:
>>> Based on the understanding that there should be no way for userspace to
>>> address kernel-space from compatibility mode, disable PTI while running
>>> in compatibility mode, as long as the user's 64-bit code segment is not
>>> in use.
>>>
>>> Reenabling PTI is performed by restoring NX-bits to the userspace
>>> mappings, flushing the TLBs, and notifying all the CPUs that use the
>>> affected mm to disable PTI. Each core responds by removing the present
>>> bit for the 64-bit code-segment, and marking that PTI is disabled on
>>> that core.
>>
>> I dislike this patch because it's conflating two things.  The patch
>> claims to merely disable PTI for compat tasks, whatever those are.
>> But it's also introducing a much stronger concept of what a compat
>> task is.  The kernel currently mostly doesn't care whether a task is
>> "compat" or not, and I think that most remaining code paths that do
>> care are buggy and should be removed.
>>
>> I think the right way to approach this is to add a new arch_prctl()
>> that changes allowable bitness, like this:
>>
>> arch_prctl(ARCH_SET_ALLOWED_GDT_CS, X86_ALLOW_CS32 | X86_ALLOW_CS64);
>>
>> this would set the current task to work the normal way, where 32-bit
>> and 64-bit CS are available.  You could set just X86_ALLOW_CS32 to
>> deny 64-bit mode and just X86_ALLOW_CS64 to deny 32-bit mode.  This
>> would make nice attack surface reduction tools for the more paranoid
>> sandbox users to use.  Doing arch_prctl(ARCH_SET_ALLOWED_GDT_CS, 0)
>> would return -EINVAL.
>>
>> A separate patch would turn PTI off if you set X86_ALLOW_CS32.
>>
>> This has the downside that old code doesn't get the benefit without
>> some code change, but that's not the end of the world.
>>
>>> +static void pti_cpu_update_func(void *info)
>>> +{
>>> +   struct mm_struct *mm = (struct mm_struct *)info;
>>> +
>>> +   if (mm != this_cpu_read(cpu_tlbstate.loaded_mm))
>>> +   return;
>>> +
>>> +   /*
>>> +* Keep CS64 and CPU settings in sync despite potential concurrent
>>> +* updates.
>>> +*/
>>> +   set_cpu_pti_disable(READ_ONCE(mm->context.pti_disable));
>>> +}
>>
>> I don't like this at all.  IMO a sane implementation should never
>> change PTI status on a remote CPU.  Just track it per task.
>
> From your comments in the past, I understood that this requirement
> (enabling/disabling PTI per mm) came from Linus. I can do it per task, but
> that means that the NX-bit will always be cleared on the kernel page-table
> for user mappings.

I disagree with Linus here, although I could be convinced.  But
there's still no need for an IPI -- even if it were per-mm, PTI could
stay on until the next kernel entry.

>
>>> +void __pti_reenable(void)
>>> +{
>>> +   struct mm_struct *mm = current->mm;
>>> +   int cpu;
>>> +
>>> +   if (!mm_pti_disable(mm))
>>> +   return;
>>> +
>>> +   /*
>>> +* Prevent spurious page-fault storm while we set the NX-bit and have
>>> +* yet not updated the per-CPU pti_disable flag.
>>> +*/
>>> +   down_write(&mm->mmap_sem);
>>> +
>>> +   if (!mm_pti_disable(mm))
>>> +   goto out;
>>> +
>>> +   /*
>>> +* First, mark PTI as enabled. Although we have not done anything yet,
>>> +* we are safe as long as we do not reenable CS64. Since we did not
>>> +* update the page tables yet, this may lead to spurious page-faults,
>>> +* but we need the pti_disable in mm to be set for __pti_set_user_pgd()
>>> +* to do the right thing.  Holding mmap_sem prevents these spurious
>>> +* page-faults from swamping the system.
>>> +*/
>>> +   mm->context.pti_disable = PTI_DISABLE_OFF;
>>> +
>>> +   /* Second, restore the NX bits. */
>>> +   pti_update_user_pgds(mm, true);
>>
>> You're holding mmap_sem, but there are code paths that touch page
>> tables that don't hold mmap_sem, such as the stack extension code.
>
> Do they change user mappings? These are the only ones pti_update_user_pgds()
> changes.

I think they do -- it's the user stack.  page_table_lock may be more
appropriate, although I'm far from an expert in this.

>
>>> +
>>> +bool pti_handle_segment_not_present(long error_code)
>>> +{
>>> +   if (!static_cpu_has(X86_FEATURE_PTI))
>>> +   return false;
>>> +
>>> +   if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
>>> +   return false;
>>> +
>>> +   pti_reenable();
>>> +   return true;
>>> +}
>>
>> Please don't.  You're trying to emulate the old behavior here, but
>> you're emulating it wrong.  In particular, you won't trap on LAR.
>
> Yes, I thought I’ll manage to address LAR, but failed. I thought you said
> this is not a “show-stopper”. I’ll adapt your approach of using prctl,
> although it really limits the benefit of this mechanism.

Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-15 Thread Nadav Amit
Andy Lutomirski  wrote:

> On Thu, Feb 15, 2018 at 4:36 PM, Nadav Amit  wrote:
>> Based on the understanding that there should be no way for userspace to
>> address kernel-space from compatibility mode, disable PTI while running
>> in compatibility mode, as long as the user's 64-bit code segment is not
>> in use.
>> 
>> Reenabling PTI is performed by restoring NX-bits to the userspace
>> mappings, flushing the TLBs, and notifying all the CPUs that use the
>> affected mm to disable PTI. Each core responds by removing the present
>> bit for the 64-bit code-segment, and marking that PTI is disabled on
>> that core.
> 
> I dislike this patch because it's conflating two things.  The patch
> claims to merely disable PTI for compat tasks, whatever those are.
> But it's also introducing a much stronger concept of what a compat
> task is.  The kernel currently mostly doesn't care whether a task is
> "compat" or not, and I think that most remaining code paths that do
> care are buggy and should be removed.
> 
> I think the right way to approach this is to add a new arch_prctl()
> that changes allowable bitness, like this:
> 
> arch_prctl(ARCH_SET_ALLOWED_GDT_CS, X86_ALLOW_CS32 | X86_ALLOW_CS64);
> 
> this would set the current task to work the normal way, where 32-bit
> and 64-bit CS are available.  You could set just X86_ALLOW_CS32 to
> deny 64-bit mode and just X86_ALLOW_CS64 to deny 32-bit mode.  This
> would make nice attack surface reduction tools for the more paranoid
> sandbox users to use.  Doing arch_prctl(ARCH_SET_ALLOWED_GDT_CS, 0)
> would return -EINVAL.
> 
> A separate patch would turn PTI off if you set X86_ALLOW_CS32.
> 
> This has the downside that old code doesn't get the benefit without
> some code change, but that's not the end of the world.
> 
>> +static void pti_cpu_update_func(void *info)
>> +{
>> +   struct mm_struct *mm = (struct mm_struct *)info;
>> +
>> +   if (mm != this_cpu_read(cpu_tlbstate.loaded_mm))
>> +   return;
>> +
>> +   /*
>> +* Keep CS64 and CPU settings in sync despite potential concurrent
>> +* updates.
>> +*/
>> +   set_cpu_pti_disable(READ_ONCE(mm->context.pti_disable));
>> +}
> 
> I don't like this at all.  IMO a sane implementation should never
> change PTI status on a remote CPU.  Just track it per task.

From your comments in the past, I understood that this requirement
(enabling/disabling PTI per mm) came from Linus. I can do it per task, but
that means that the NX-bit will always be cleared on the kernel page-table
for user mappings.

>> +void __pti_reenable(void)
>> +{
>> +   struct mm_struct *mm = current->mm;
>> +   int cpu;
>> +
>> +   if (!mm_pti_disable(mm))
>> +   return;
>> +
>> +   /*
>> +* Prevent spurious page-fault storm while we set the NX-bit and have
>> +* yet not updated the per-CPU pti_disable flag.
>> +*/
>> +   down_write(&mm->mmap_sem);
>> +
>> +   if (!mm_pti_disable(mm))
>> +   goto out;
>> +
>> +   /*
>> +* First, mark PTI as enabled. Although we have not done anything yet,
>> +* we are safe as long as we do not reenable CS64. Since we did not
>> +* update the page tables yet, this may lead to spurious page-faults,
>> +* but we need the pti_disable in mm to be set for __pti_set_user_pgd()
>> +* to do the right thing.  Holding mmap_sem prevents these spurious
>> +* page-faults from swamping the system.
>> +*/
>> +   mm->context.pti_disable = PTI_DISABLE_OFF;
>> +
>> +   /* Second, restore the NX bits. */
>> +   pti_update_user_pgds(mm, true);
> 
> You're holding mmap_sem, but there are code paths that touch page
> tables that don't hold mmap_sem, such as the stack extension code.

Do they change user mappings? These are the only ones pti_update_user_pgds()
changes.

>> +
>> +bool pti_handle_segment_not_present(long error_code)
>> +{
>> +   if (!static_cpu_has(X86_FEATURE_PTI))
>> +   return false;
>> +
>> +   if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
>> +   return false;
>> +
>> +   pti_reenable();
>> +   return true;
>> +}
> 
> Please don't.  You're trying to emulate the old behavior here, but
> you're emulating it wrong.  In particular, you won't trap on LAR.

Yes, I thought I’ll manage to address LAR, but failed. I thought you said
this is not a “show-stopper”. I’ll adapt your approach of using prctl, although
it really limits the benefit of this mechanism.



Re: [PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-15 Thread Andy Lutomirski
On Thu, Feb 15, 2018 at 4:36 PM, Nadav Amit  wrote:
> Based on the understanding that there should be no way for userspace to
> address kernel-space from compatibility mode, disable PTI while running
> in compatibility mode, as long as the user's 64-bit code segment is not
> in use.
>
> Reenabling PTI is performed by restoring NX-bits to the userspace
> mappings, flushing the TLBs, and notifying all the CPUs that use the
> affected mm to disable PTI. Each core responds by removing the present
> bit for the 64-bit code-segment, and marking that PTI is disabled on
> that core.
>

I dislike this patch because it's conflating two things.  The patch
claims to merely disable PTI for compat tasks, whatever those are.
But it's also introducing a much stronger concept of what a compat
task is.  The kernel currently mostly doesn't care whether a task is
"compat" or not, and I think that most remaining code paths that do
care are buggy and should be removed.

I think the right way to approach this is to add a new arch_prctl()
that changes allowable bitness, like this:

arch_prctl(ARCH_SET_ALLOWED_GDT_CS, X86_ALLOW_CS32 | X86_ALLOW_CS64);

this would set the current task to work the normal way, where 32-bit
and 64-bit CS are available.  You could set just X86_ALLOW_CS32 to
deny 64-bit mode and just X86_ALLOW_CS64 to deny 32-bit mode.  This
would make nice attack surface reduction tools for the more paranoid
sandbox users to use.  Doing arch_prctl(ARCH_SET_ALLOWED_GDT_CS, 0)
would return -EINVAL.

A separate patch would turn PTI off if you set X86_ALLOW_CS32.

This has the downside that old code doesn't get the benefit without
some code change, but that's not the end of the world.
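
As a rough illustration of how the proposed interface might be used from
user space -- ARCH_SET_ALLOWED_GDT_CS and the X86_ALLOW_CS* flags are only
proposed in this message and exist in no kernel, so the numeric values
below are made-up placeholders:

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical constants from the proposal above; values invented here. */
#define ARCH_SET_ALLOWED_GDT_CS	0x1021
#define X86_ALLOW_CS32		0x1
#define X86_ALLOW_CS64		0x2

int main(void)
{
	/* A CRIU-style tool would explicitly ask for both bitnesses... */
	if (syscall(SYS_arch_prctl, ARCH_SET_ALLOWED_GDT_CS,
		    X86_ALLOW_CS32 | X86_ALLOW_CS64) != 0)
		perror("arch_prctl(ARCH_SET_ALLOWED_GDT_CS, CS32|CS64)");

	/* ...while a paranoid 64-bit sandbox would drop 32-bit CS entirely. */
	if (syscall(SYS_arch_prctl, ARCH_SET_ALLOWED_GDT_CS, X86_ALLOW_CS64) != 0)
		perror("arch_prctl(ARCH_SET_ALLOWED_GDT_CS, CS64)");

	return 0;
}

Per the proposal, passing 0 would return -EINVAL, and a separate patch
would switch PTI off for tasks restricted to X86_ALLOW_CS32.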

> +static void pti_cpu_update_func(void *info)
> +{
> +   struct mm_struct *mm = (struct mm_struct *)info;
> +
> +   if (mm != this_cpu_read(cpu_tlbstate.loaded_mm))
> +   return;
> +
> +   /*
> +* Keep CS64 and CPU settings in sync despite potential concurrent
> +* updates.
> +*/
> +   set_cpu_pti_disable(READ_ONCE(mm->context.pti_disable));
> +}

I don't like this at all.  IMO a sane implementation should never
change PTI status on a remote CPU.  Just track it per task.

> +void __pti_reenable(void)
> +{
> +   struct mm_struct *mm = current->mm;
> +   int cpu;
> +
> +   if (!mm_pti_disable(mm))
> +   return;
> +
> +   /*
> +* Prevent spurious page-fault storm while we set the NX-bit and have
> +* yet not updated the per-CPU pti_disable flag.
> +*/
> +   down_write(&mm->mmap_sem);
> +
> +   if (!mm_pti_disable(mm))
> +   goto out;
> +
> +   /*
> +* First, mark the PTI is enabled. Although we do anything yet, we are
> +* safe as long as we do not reenable CS64. Since we did not update 
> the
> +* page tables yet, this may lead to spurious page-faults, but we need
> +* the pti_disable in mm to be set for __pti_set_user_pgd() to do the
> +* right thing.  Holding mmap_sem would ensure matter we hold the
> +* mmap_sem to prevent them from swamping the system.
> +*/
> +   mm->context.pti_disable = PTI_DISABLE_OFF;
> +
> +   /* Second, restore the NX bits. */
> +   pti_update_user_pgds(mm, true);

You're holding mmap_sem, but there are code paths that touch page
tables that don't hold mmap_sem, such as the stack extension code.

> +
> +bool pti_handle_segment_not_present(long error_code)
> +{
> +   if (!static_cpu_has(X86_FEATURE_PTI))
> +   return false;
> +
> +   if ((unsigned short)error_code != GDT_ENTRY_DEFAULT_USER_CS << 3)
> +   return false;
> +
> +   pti_reenable();
> +   return true;
> +}

Please don't.  You're trying to emulate the old behavior here, but
you're emulating it wrong.  In particular, you won't trap on LAR.


[PATCH RFC v2 4/6] x86: Disable PTI on compatibility mode

2018-02-15 Thread Nadav Amit
Based on the understanding that there should be no way for userspace to
address kernel-space from compatibility mode, disable PTI while running
in compatibility mode, as long as the user's 64-bit code segment is not
in use.

Reenabling PTI is performed by restoring NX-bits to the userspace
mappings, flushing the TLBs, and notifying all the CPUs that use the
affected mm to disable PTI. Each core responds by removing the present
bit for the 64-bit code-segment, and marking that PTI is disabled on
that core.

Signed-off-by: Nadav Amit 
---
 arch/x86/include/asm/pti.h   |  39 +
 arch/x86/kernel/process_64.c |  13 -
 arch/x86/kernel/traps.c  |  23 +++-
 arch/x86/mm/pti.c| 130 +++
 4 files changed, 201 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/pti.h b/arch/x86/include/asm/pti.h
index 78a333699874..d04954ebb0d4 100644
--- a/arch/x86/include/asm/pti.h
+++ b/arch/x86/include/asm/pti.h
@@ -31,6 +31,42 @@ static inline void pti_update_user_cs64(unsigned short prev_pti_disable,
write_gdt_entry(d, GDT_ENTRY_DEFAULT_USER_CS, &user_cs, DESCTYPE_S);
 }
 
+void __pti_reenable(void);
+
+static inline void pti_reenable(void)
+{
+   if (!static_cpu_has(X86_FEATURE_PTI) || !mm_pti_disable(current->mm))
+   return;
+
+   __pti_reenable();
+}
+
+void __pti_disable(unsigned short type);
+
+static inline void pti_disable(unsigned short type)
+{
+   /*
+* To allow PTI to be disabled, we must:
+*
+* 1. Have PTI enabled.
+* 2. Have SMEP enabled, since the lack of NX-bit on user mappings
+*    raises general security concerns.
+* 3. Have NX-bit enabled, since reenabling PTI has a corner case in
+*    which the kernel tables are restored instead of those of the
+*    user. Having the NX-bit causes this scenario to trigger a
+*    spurious page-fault when control is returned to the user, and
+*    allows the entry code to restore the page-tables to their
+*    correct state.
+*/
+   if (!static_cpu_has(X86_FEATURE_PTI) ||
+   !static_cpu_has(X86_FEATURE_SMEP) ||
+   !static_cpu_has(X86_FEATURE_NX))
+   return;
+
+   __pti_disable(type);
+}
+
+bool pti_handle_segment_not_present(long error_code);
+
 extern void pti_init(void);
 extern void pti_check_boottime_disable(void);
 #else
@@ -38,6 +74,9 @@ static inline unsigned short mm_pti_disable(struct mm_struct *mm) { return 0; }
 static inline unsigned short mm_pti_disable(struct mm_struct *mm);
 static inline void pti_update_user_cs64(unsigned short prev_pti_disable,
unsigned short next_pti_disable) { }
+static inline void pti_disable(unsigned short type) { }
+static inline void pti_reenable(void) { }
+static inline bool pti_handle_segment_not_present(long error_code) { return false; }
 static inline void pti_check_boottime_disable(void) { }
 #endif
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index c75466232016..24d3429b4191 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -54,6 +54,7 @@
 #include 
 #include 
 #include 
+#include 
 #ifdef CONFIG_IA32_EMULATION
 /* Not included via unistd.h */
 #include 
@@ -530,8 +531,10 @@ void set_personality_64bit(void)
task_pt_regs(current)->orig_ax = __NR_execve;
 
/* Ensure the corresponding mm is not marked. */
-   if (current->mm)
+   if (current->mm) {
current->mm->context.ia32_compat = 0;
+   pti_reenable();
+   }
 
/* TBD: overwrites user setup. Should have two bits.
   But 64bit processes have always behaved this way,
@@ -545,8 +548,10 @@ static void __set_personality_x32(void)
 #ifdef CONFIG_X86_X32
clear_thread_flag(TIF_IA32);
set_thread_flag(TIF_X32);
-   if (current->mm)
+   if (current->mm) {
current->mm->context.ia32_compat = TIF_X32;
+   pti_reenable();
+   }
current->personality &= ~READ_IMPLIES_EXEC;
/*
 * in_compat_syscall() uses the presence of the x32 syscall bit
@@ -566,8 +571,10 @@ static void __set_personality_ia32(void)
 #ifdef CONFIG_IA32_EMULATION
set_thread_flag(TIF_IA32);
clear_thread_flag(TIF_X32);
-   if (current->mm)
+   if (current->mm) {
current->mm->context.ia32_compat = TIF_IA32;
+   pti_disable(PTI_DISABLE_IA32);
+   }
current->personality |= force_personality32;
/* Prepare the first "return" to user space */
task_pt_regs(current)->orig_ax = __NR_ia32_execve;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 446c9ef8cfc3..65d8ccb20175 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -315,7 +316,6 @@