Re: [patch V2 21/28] x86/speculation: Prepare for conditional IBPB in switch_mm()

2018-11-26 Thread Ingo Molnar


* Thomas Gleixner  wrote:

> On Sun, 25 Nov 2018, Andy Lutomirski wrote:
> > > On Nov 25, 2018, at 2:20 PM, Thomas Gleixner  wrote:
> > > On Sun, 25 Nov 2018, Andi Kleen wrote:
> > > 
> > >>> The current check whether two tasks belong to the same context uses the
> > >>> task's context id. While correct, it is simpler to use the mm pointer
> > >>> because it allows mangling the TIF_SPEC_IB bit into it. The context id
> > >>> based mechanism requires extra storage, which creates worse code.
> > >> 
> > >> [We tried similar in some really early versions, but it was replaced
> > >> with the context id later.]
> > >> 
> > >> One issue with using the pointer is that the pointer can be reused
> > >> when the original mm_struct is freed, and then gets reallocated
> > >> immediately to an attacker. Then the attacker may avoid the IBPB.
> > >> 
> > >> Granted, it's probably hard to generate any reasonable leak bandwidth with
> > >> such a complex scenario, but it still seemed better to close the hole.
> > > 
> > > Sorry, but that's really a purely academic exercise. 
> > 
> > I would guess that it’s actually very easy to force mm_struct* reuse.
> > Don’t the various allocators try to allocate hot memory?  There’s nothing
> > hotter than a just-freed allocation of the same size.
> 
> Sure, but this is about an indirect branch predictor attack against
> something which reuses the mm.
> 
> So you'd need to pull off:
> 
>P1 poisons branch predictor
>P1 exit
> 
>    P2 starts and reuses mm(P1) and uses the poisoned branch predictor
> 
> The only thing between P1 and P2 is either idle or some other kernel
> thread, but no other user task. If that happens then the code would not
> issue IBPB as it assumes it is switching back to the same process.
> 
> Even if you can pull that off, the speculation would hit the startup code of
> P2, which is truly a source of secret information. Creating a valuable
> attack based on mm reuse is really less probable than a lottery jackpot.
> 
> So using mm is really good enough and results in better assembly code which
> is surely more valuable than addressing some hypothetical hole.

OTOH we could probably close even this with very little cost if we added 
an IBPB to non-threaded fork() and vfork()+exec() paths? Those are really 
slow paths compared to all the context switch paths we are trying to 
optimize here.

Alternatively we could IBPB on the post-exit() final task struct freeing, 
which is also a relatively slow path compared to the context switch paths.

But no strong opinion.

Thanks,

Ingo


Re: [patch V2 21/28] x86/speculation: Prepare for conditional IBPB in switch_mm()

2018-11-25 Thread Thomas Gleixner
On Sun, 25 Nov 2018, Andy Lutomirski wrote:
> > On Nov 25, 2018, at 2:20 PM, Thomas Gleixner  wrote:
> > On Sun, 25 Nov 2018, Andi Kleen wrote:
> > 
> >>> The current check whether two tasks belong to the same context uses the
> >>> task's context id. While correct, it is simpler to use the mm pointer
> >>> because it allows mangling the TIF_SPEC_IB bit into it. The context id
> >>> based mechanism requires extra storage, which creates worse code.
> >> 
> >> [We tried similar in some really early versions, but it was replaced
> >> with the context id later.]
> >> 
> >> One issue with using the pointer is that the pointer can be reused
> >> when the original mm_struct is freed, and then gets reallocated
> >> immediately to an attacker. Then the attacker may avoid the IBPB.
> >> 
> >> Granted, it's probably hard to generate any reasonable leak bandwidth with
> >> such a complex scenario, but it still seemed better to close the hole.
> > 
> > Sorry, but that's really a purely academic exercise. 
> 
> I would guess that it’s actually very easy to force mm_struct* reuse.
> Don’t the various allocators try to allocate hot memory?  There’s nothing
> hotter than a just-freed allocation of the same size.

Sure, but this is about an indirect branch predictor attack against
something which reuses the mm.

So you'd need to pull off:

   P1 poisons branch predictor
   P1 exit

   P2 starts and reuses mm(P1) and uses the poisoned branch predictor

The only thing between P1 and P2 is either idle or some other kernel
thread, but no other user task. If that happens then the code would not
issue IBPB as it assumes it is switching back to the same process.

Even if you can pull that off, the speculation would hit the startup code of
P2, which is truly a source of secret information. Creating a valuable
attack based on mm reuse is really less probable than a lottery jackpot.

So using mm is really good enough and results in better assembly code which
is surely more valuable than addressing some hypothetical hole.

Thanks,

tglx

Re: [patch V2 21/28] x86/speculation: Prepare for conditional IBPB in switch_mm()

2018-11-25 Thread Thomas Gleixner
On Sun, 25 Nov 2018, Andi Kleen wrote:
> On Sun, Nov 25, 2018 at 11:20:50PM +0100, Thomas Gleixner wrote:
> > On Sun, 25 Nov 2018, Andi Kleen wrote:
> > 
> > > > The current check whether two tasks belong to the same context uses the
> > > > task's context id. While correct, it is simpler to use the mm pointer
> > > > because it allows mangling the TIF_SPEC_IB bit into it. The context id
> > > > based mechanism requires extra storage, which creates worse code.
> > > 
> > > [We tried similar in some really early versions, but it was replaced
> > > with the context id later.]
> > > 
> > > One issue with using the pointer is that the pointer can be reused
> > > when the original mm_struct is freed, and then gets reallocated
> > > immediately to an attacker. Then the attacker may avoid the IBPB.
> > > 
> > > Granted, it's probably hard to generate any reasonable leak bandwidth with
> > > such a complex scenario, but it still seemed better to close the hole.
> > 
> > Sorry, but that's really a purely academic exercise. 
> 
> Ok fair enough. I guess it's acceptable if you add a comment explaining it.

Will do.

Thanks,

tglx


Re: [patch V2 21/28] x86/speculation: Prepare for conditional IBPB in switch_mm()

2018-11-25 Thread Andi Kleen
On Sun, Nov 25, 2018 at 11:20:50PM +0100, Thomas Gleixner wrote:
> On Sun, 25 Nov 2018, Andi Kleen wrote:
> 
> > > The current check whether two tasks belong to the same context uses the
> > > task's context id. While correct, it is simpler to use the mm pointer
> > > because it allows mangling the TIF_SPEC_IB bit into it. The context id
> > > based mechanism requires extra storage, which creates worse code.
> > 
> > [We tried similar in some really early versions, but it was replaced
> > with the context id later.]
> > 
> > One issue with using the pointer is that the pointer can be reused
> > when the original mm_struct is freed, and then gets reallocated
> > immediately to an attacker. Then the attacker may avoid the IBPB.
> > 
> > Granted, it's probably hard to generate any reasonable leak bandwidth with
> > such a complex scenario, but it still seemed better to close the hole.
> 
> Sorry, but that's really a purely academic exercise. 

Ok fair enough. I guess it's acceptable if you add a comment explaining it.

-Andi


Re: [patch V2 21/28] x86/speculation: Prepare for conditional IBPB in switch_mm()

2018-11-25 Thread Andy Lutomirski



> On Nov 25, 2018, at 2:20 PM, Thomas Gleixner  wrote:
> 
> On Sun, 25 Nov 2018, Andi Kleen wrote:
> 
>>> The current check whether two tasks belong to the same context uses the
>>> task's context id. While correct, it is simpler to use the mm pointer
>>> because it allows mangling the TIF_SPEC_IB bit into it. The context id
>>> based mechanism requires extra storage, which creates worse code.
>> 
>> [We tried similar in some really early versions, but it was replaced
>> with the context id later.]
>> 
>> One issue with using the pointer is that the pointer can be reused
>> when the original mm_struct is freed, and then gets reallocated
>> immediately to an attacker. Then the attacker may avoid the IBPB.
>> 
>> Granted, it's probably hard to generate any reasonable leak bandwidth with
>> such a complex scenario, but it still seemed better to close the hole.
> 
> Sorry, but that's really a purely academic exercise. 
> 
> 

I would guess that it’s actually very easy to force mm_struct* reuse.  Don’t 
the various allocators try to allocate hot memory?  There’s nothing hotter than 
a just-freed allocation of the same size.

Can someone explain the actual problem with ctx_id?  If you just need an extra 
bit, how about:

2*ctx_id vs 2*ctx_id+1

Or any of the many variants of approximately the same thing?

—Andy

Re: [patch V2 21/28] x86/speculation: Prepare for conditional IBPB in switch_mm()

2018-11-25 Thread Thomas Gleixner
On Sun, 25 Nov 2018, Andi Kleen wrote:

> > The current check whether two tasks belong to the same context uses the
> > task's context id. While correct, it is simpler to use the mm pointer
> > because it allows mangling the TIF_SPEC_IB bit into it. The context id
> > based mechanism requires extra storage, which creates worse code.
> 
> [We tried similar in some really early versions, but it was replaced
> with the context id later.]
> 
> One issue with using the pointer is that the pointer can be reused
> when the original mm_struct is freed, and then gets reallocated
> immediately to an attacker. Then the attacker may avoid the IBPB.
> 
> Granted, it's probably hard to generate any reasonable leak bandwidth with
> such a complex scenario, but it still seemed better to close the hole.

Sorry, but that's really a purely academic exercise. 

Thanks,

tglx


Re: [patch V2 21/28] x86/speculation: Prepare for conditional IBPB in switch_mm()

2018-11-25 Thread Andi Kleen
> The current check whether two tasks belong to the same context uses the
> task's context id. While correct, it is simpler to use the mm pointer
> because it allows mangling the TIF_SPEC_IB bit into it. The context id
> based mechanism requires extra storage, which creates worse code.

[We tried similar in some really early versions, but it was replaced
with the context id later.]

One issue with using the pointer is that the pointer can be reused
when the original mm_struct is freed, and then gets reallocated
immediately to an attacker. Then the attacker may avoid the IBPB.

Granted, it's probably hard to generate any reasonable leak bandwidth with
such a complex scenario, but it still seemed better to close the hole.

Because of these concerns, the counter-based context ID was used instead.

The ID can wrap too, but since it's 64-bit, wrapping takes a very long time.

-Andi



Re: [patch V2 21/28] x86/speculation: Prepare for conditional IBPB in switch_mm()

2018-11-25 Thread Thomas Gleixner
On Sun, 25 Nov 2018, Thomas Gleixner wrote:
>  /*
> + * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is
> + * stored in cpu_tlb_state.last_user_mm_ibpb.
> + */
> +#define LAST_USER_MM_IBPB	0x1UL
> +
> +/*
> + unsigned long next_tif = task_thread_info(next)->flags;
> + unsigned long ibpb = (next_tif >> TIF_SPEC_IB) & LAST_USR_MM_IBPB;

That wants to be LAST_USER_ of course. That's what you get for last minute
changes...



[patch V2 21/28] x86/speculation: Prepare for conditional IBPB in switch_mm()

2018-11-25 Thread Thomas Gleixner
The IBPB speculation barrier is issued from switch_mm() when the kernel
switches to a user space task with a different mm than the user space task
which ran last on the same CPU.

An additional optimization is to avoid IBPB when the incoming task can be
ptraced by the outgoing task. This optimization only works when switching
directly between two user space tasks. When switching from a kernel task to
a user space task the optimization fails because the previous task cannot
be accessed anymore. So in quite a few scenarios the optimization just
adds overhead.

The upcoming conditional IBPB support will issue IBPB only for user space
tasks which have the TIF_SPEC_IB bit set. This requires handling the
following cases:

  1) Switch from a user space task (potential attacker) which has
 TIF_SPEC_IB set to a user space task (potential victim) which has
 TIF_SPEC_IB not set.

  2) Switch from a user space task (potential attacker) which has
 TIF_SPEC_IB not set to a user space task (potential victim) which has
 TIF_SPEC_IB set.

This needs to be optimized for the case where the IBPB can be avoided when
only kernel threads ran in between user space tasks which belong to the
same process.

The current check whether two tasks belong to the same context uses the
task's context id. While correct, it is simpler to use the mm pointer
because it allows mangling the TIF_SPEC_IB bit into it. The context id
based mechanism requires extra storage, which creates worse code.

When a task is scheduled out its TIF_SPEC_IB bit is mangled as bit 0 into
the per CPU storage which is used to track the last user space mm which was
running on a CPU. This bit can be used together with the TIF_SPEC_IB bit of
the incoming task to make the decision whether IBPB needs to be issued or
not to cover the two cases above.

As conditional IBPB is going to be the default, remove the dubious ptrace
check for the IBPB always case and simply issue IBPB always when the
process changes.

Move the storage to a different place in the struct as the original one
created a hole.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/nospec-branch.h |2 
 arch/x86/include/asm/tlbflush.h  |8 +-
 arch/x86/kernel/cpu/bugs.c   |   29 +++--
 arch/x86/mm/tlb.c|  109 +--
 4 files changed, 110 insertions(+), 38 deletions(-)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -312,6 +312,8 @@ do {
\
 } while (0)
 
 DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp);
+DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
+DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
 #endif /* __ASSEMBLY__ */
 
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -169,10 +169,14 @@ struct tlb_state {
 
 #define LOADED_MM_SWITCHING ((struct mm_struct *)1)
 
+   /* Last user mm for optimizing IBPB */
+   union {
+   struct mm_struct*last_user_mm;
+   unsigned long   last_user_mm_ibpb;
+   };
+
u16 loaded_mm_asid;
u16 next_asid;
-   /* last user mm's ctx id */
-   u64 last_ctx_id;
 
/*
 * We can be in one of several states:
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -56,6 +56,10 @@ u64 __ro_after_init x86_amd_ls_cfg_ssbd_
 
 /* Control conditional STIPB in switch_to() */
 DEFINE_STATIC_KEY_FALSE(switch_to_cond_stibp);
+/* Control conditional IBPB in switch_mm() */
+DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
+/* Control unconditional IBPB in switch_mm() */
+DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
 
 void __init check_bugs(void)
 {
@@ -331,7 +335,17 @@ spectre_v2_user_select_mitigation(enum s
/* Initialize Indirect Branch Prediction Barrier */
if (boot_cpu_has(X86_FEATURE_IBPB)) {
setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
-   pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n");
+
+   switch (mode) {
+   case SPECTRE_V2_USER_STRICT:
+   static_branch_enable(&switch_mm_always_ibpb);
+   break;
+   default:
+   break;
+   }
+
+   pr_info("mitigation: Enabling %s Indirect Branch Prediction Barrier\n",
+   mode == SPECTRE_V2_USER_STRICT ? "always-on" : "conditional");
}
 
/* If enhanced IBRS is enabled no STIPB required */
@@ -955,10 +969,15 @@ static char *stibp_state(void)
 
 static char *ibpb_state(void)
 {
-   if (boot_cpu_has(X86_FEATURE_USE_IBPB))
-   return ", IBPB";
-   else
-   return "";
+   if (boot_cpu_has(X86_FEATURE_IBPB)) {
+   switch (spectre_v2_user) {
+   case SPECTRE_V2_USER_NONE:
+   return ", IBPB: disabled";
+   
