Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-17 Thread Peter Zijlstra
On Tue, Jun 16, 2020 at 04:23:46PM -0700, Fenghua Yu wrote:
> Hi, Peter,
> 
> On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
> > 
> > > Or do you suggest to add a random new flag in struct thread_info instead
> > > of a TIF flag?
> > 
> > Why thread_info? What's wrong with something simple like the below. It
> > takes a bit from the 'strictly current' flags word.
> > 
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index b62e6aaf28f0..fca830b97055 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -801,6 +801,9 @@ struct task_struct {
> > /* Stalled due to lack of memory */
> > unsignedin_memstall:1;
> >  #endif
> > +#ifdef CONFIG_PCI_PASID
> > +   unsignedhas_valid_pasid:1;
> > +#endif
> >  
> > unsigned long   atomic_flags; /* Flags requiring atomic 
> > access. */
> >  
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 142b23645d82..10b3891be99e 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct 
> > task_struct *orig, int node)
> > tsk->use_memdelay = 0;
> >  #endif
> >  
> > +#ifdef CONFIG_PCI_PASID
> > +   tsk->has_valid_pasid = 0;
> > +#endif
> > +
> >  #ifdef CONFIG_MEMCG
> > tsk->active_memcg = NULL;
> >  #endif
> 
> Can I add "Signed-off-by: Peter Zijlstra "
> to this patch? I will send this patch in the next version of the series.

Sure, n/p.


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-16 Thread Fenghua Yu
Hi, Peter,

On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
> 
> > Or do you suggest to add a random new flag in struct thread_info instead
> > of a TIF flag?
> 
> Why thread_info? What's wrong with something simple like the below. It
> takes a bit from the 'strictly current' flags word.
> 
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b62e6aaf28f0..fca830b97055 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -801,6 +801,9 @@ struct task_struct {
>   /* Stalled due to lack of memory */
>   unsignedin_memstall:1;
>  #endif
> +#ifdef CONFIG_PCI_PASID
> + unsignedhas_valid_pasid:1;
> +#endif
>  
>   unsigned long   atomic_flags; /* Flags requiring atomic 
> access. */
>  
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 142b23645d82..10b3891be99e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct 
> task_struct *orig, int node)
>   tsk->use_memdelay = 0;
>  #endif
>  
> +#ifdef CONFIG_PCI_PASID
> + tsk->has_valid_pasid = 0;
> +#endif
> +
>  #ifdef CONFIG_MEMCG
>   tsk->active_memcg = NULL;
>  #endif

Can I add "Signed-off-by: Peter Zijlstra "
to this patch? I will send this patch in the next version of the series.

Thanks.

-Fenghua


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Raj, Ashok
On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote:
> 
> I don't get why you need a rdmsr here, or why not having one would
> require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?
> 
> > > > +*/
> > > > +   rdmsrl(MSR_IA32_PASID, pasid_msr);
> > > > +   if (pasid_msr & MSR_IA32_PASID_VALID)
> > > > +   return false;
> > > > +
> > > > +   /* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > > > +   wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
> 
> How much more expensive is the wrmsr over the rdmsr? Can't we just
> unconditionally write the current PASID and call it a day?

The reason to check the rdmsr() is because we are using a hueristic taking
GP faults. If we already setup the MSR, but we get it a second time it
means the reason is something other than PASID_MSR not being set.

Ideally we should use the TIF_ to track this would be cheaper, but we were
told those bits aren't easy to give out. 

Cheers,
Ashok


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Peter Zijlstra
On Mon, Jun 15, 2020 at 01:17:35PM -0700, Fenghua Yu wrote:
> Hi, Peter,
> 
> On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
> > 
> > > Or do you suggest to add a random new flag in struct thread_info instead
> > > of a TIF flag?
> > 
> > Why thread_info? What's wrong with something simple like the below. It
> > takes a bit from the 'strictly current' flags word.
> > 
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index b62e6aaf28f0..fca830b97055 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -801,6 +801,9 @@ struct task_struct {
> > /* Stalled due to lack of memory */
> > unsignedin_memstall:1;
> >  #endif
> > +#ifdef CONFIG_PCI_PASID
> > +   unsignedhas_valid_pasid:1;
> > +#endif
> >  
> > unsigned long   atomic_flags; /* Flags requiring atomic 
> > access. */
> >  
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 142b23645d82..10b3891be99e 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct 
> > task_struct *orig, int node)
> > tsk->use_memdelay = 0;
> >  #endif
> >  
> > +#ifdef CONFIG_PCI_PASID
> > +   tsk->has_valid_pasid = 0;
> > +#endif
> > +
> >  #ifdef CONFIG_MEMCG
> > tsk->active_memcg = NULL;
> >  #endif
> 
> The PASID MSR is x86 specific although PASID is PCIe concept and per-mm.
> Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work.
> The flag should be cleared in cloned()/forked() and is only set and
> read in fixup() in x86 #GP for heuristic. It's not used anywhere outside
> of x86.
> 
> That's why we think the flag should be in x86 struct thread_info instead
> of in generice struct task_struct.

I don't think anybody really cares, it's just one bit, we have plenty
left.

On x86_64 there's a u32 sized alignment hole in thread_info, also we
don't use the upper 32bit of thread_info::flags, however using those
would still mean you have to use atomic ops, which you really don't
need.




RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Luck, Tony
> So what’s the RDMSR for?  Surely you
> have some state somewhere that says “this task has a PASID.”
> Can’t you just make sure that stays in sync with the MSR?  Then, on #GP, if 
> the task already has a PASID, you know the MSR is set.

We have state that says the process ("mm") has been allocated a PASID. But not 
for each task.

E.g. a process may clone a bunch of tasks, then one of them opens a device that 
needs
a PASID.   We did internally have patches to go hunt down all those other tasks 
and
force a PASID onto each. But the code was big and ugly. Also maybe the wrong 
thing
to do if those threads didn't ever need to access the device.

PeterZ suggested that we can add a bit to the task structure for this purpose.

Fenghua is hesitant about adding an x86 only bit there.

-Tony


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Andy Lutomirski



> On Jun 15, 2020, at 1:56 PM, Luck, Tony  wrote:
> 
> 
>> 
>> Are we planning to keep PASID live once a task has used it once or are we 
>> going to swap it lazily?  If the latter, a percpu variable might be better.
> 
> Current plan is "touch it once and the task owns it until exit(2)"
> 
> Maybe someday in the future when we have data on how applications
> actually use accelerators we could look at something more complex
> if usage patterns look like it would be beneficial.
> 
> 

So what’s the RDMSR for?  Surely you
have some state somewhere that says “this task has a PASID.”  Can’t you just 
make sure that stays in sync with the MSR?  Then, on #GP, if the task already 
has a PASID, you know the MSR is set.

RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Luck, Tony
> Are we planning to keep PASID live once a task has used it once or are we 
> going to swap it lazily?  If the latter, a percpu variable might be better.

Current plan is "touch it once and the task owns it until exit(2)"

Maybe someday in the future when we have data on how applications
actually use accelerators we could look at something more complex
if usage patterns look like it would be beneficial.

-Tony


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Andy Lutomirski


> On Jun 15, 2020, at 1:17 PM, Fenghua Yu  wrote:
> 
> Hi, Peter,
> 
>> On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
>>> On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
>>> 
>>> Or do you suggest to add a random new flag in struct thread_info instead
>>> of a TIF flag?
>> 
>> Why thread_info? What's wrong with something simple like the below. It
>> takes a bit from the 'strictly current' flags word.
>> 
>> 
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index b62e6aaf28f0..fca830b97055 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -801,6 +801,9 @@ struct task_struct {
>>/* Stalled due to lack of memory */
>>unsignedin_memstall:1;
>> #endif
>> +#ifdef CONFIG_PCI_PASID
>> +unsignedhas_valid_pasid:1;
>> +#endif
>> 
>>unsigned longatomic_flags; /* Flags requiring atomic access. 
>> */
>> 
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 142b23645d82..10b3891be99e 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct 
>> task_struct *orig, int node)
>>tsk->use_memdelay = 0;
>> #endif
>> 
>> +#ifdef CONFIG_PCI_PASID
>> +tsk->has_valid_pasid = 0;
>> +#endif
>> +
>> #ifdef CONFIG_MEMCG
>>tsk->active_memcg = NULL;
>> #endif
> 
> The PASID MSR is x86 specific although PASID is PCIe concept and per-mm.
> Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work.
> The flag should be cleared in cloned()/forked() and is only set and
> read in fixup() in x86 #GP for heuristic. It's not used anywhere outside
> of x86.
> 
> That's why we think the flag should be in x86 struct thread_info instead
> of in generice struct task_struct.
> 

Are we planning to keep PASID live once a task has used it once or are we going 
to swap it lazily?  If the latter, a percpu variable might be better.

> Please advice.
> 
> Thanks.
> 
> -Fenghua


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Fenghua Yu
Hi, Peter,

On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
> 
> > Or do you suggest to add a random new flag in struct thread_info instead
> > of a TIF flag?
> 
> Why thread_info? What's wrong with something simple like the below. It
> takes a bit from the 'strictly current' flags word.
> 
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b62e6aaf28f0..fca830b97055 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -801,6 +801,9 @@ struct task_struct {
>   /* Stalled due to lack of memory */
>   unsignedin_memstall:1;
>  #endif
> +#ifdef CONFIG_PCI_PASID
> + unsignedhas_valid_pasid:1;
> +#endif
>  
>   unsigned long   atomic_flags; /* Flags requiring atomic 
> access. */
>  
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 142b23645d82..10b3891be99e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct 
> task_struct *orig, int node)
>   tsk->use_memdelay = 0;
>  #endif
>  
> +#ifdef CONFIG_PCI_PASID
> + tsk->has_valid_pasid = 0;
> +#endif
> +
>  #ifdef CONFIG_MEMCG
>   tsk->active_memcg = NULL;
>  #endif

The PASID MSR is x86 specific although PASID is PCIe concept and per-mm.
Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work.
The flag should be cleared in cloned()/forked() and is only set and
read in fixup() in x86 #GP for heuristic. It's not used anywhere outside
of x86.

That's why we think the flag should be in x86 struct thread_info instead
of in generice struct task_struct.

Please advice.

Thanks.

-Fenghua


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Peter Zijlstra
On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:

> Or do you suggest to add a random new flag in struct thread_info instead
> of a TIF flag?

Why thread_info? What's wrong with something simple like the below. It
takes a bit from the 'strictly current' flags word.


diff --git a/include/linux/sched.h b/include/linux/sched.h
index b62e6aaf28f0..fca830b97055 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -801,6 +801,9 @@ struct task_struct {
/* Stalled due to lack of memory */
unsignedin_memstall:1;
 #endif
+#ifdef CONFIG_PCI_PASID
+   unsignedhas_valid_pasid:1;
+#endif
 
unsigned long   atomic_flags; /* Flags requiring atomic 
access. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..10b3891be99e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig, int node)
tsk->use_memdelay = 0;
 #endif
 
+#ifdef CONFIG_PCI_PASID
+   tsk->has_valid_pasid = 0;
+#endif
+
 #ifdef CONFIG_MEMCG
tsk->active_memcg = NULL;
 #endif


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Fenghua Yu
Hi, Peter,

On Mon, Jun 15, 2020 at 08:31:16PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 11:12:59AM -0700, Fenghua Yu wrote:
> > > I don't get why you need a rdmsr here, or why not having one would
> > > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?
> > 
> > My concern is TIF flags are precious (only 3 slots available). Defining
> > a new TIF flag may be not worth it while rdmsr() can check if PASID
> > is valid in the MSR. And performance here might not be a big issue
> > in #GP.
> > 
> > But if you think using TIF flag is better, I can define a new TIF flag
> > and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()).
> > Then we can avoid using rdmsr() to check valid PASID in the MSR.
> 
> WHY ?!?! What do you need a TIF flag for?

We need "a way" to check if the per thread MSR has a valid PASID. If yes,
no need to fix up the MSR (wrmsr()), and let other handler to handle the #GP.
Otherwise, apply the heuristics and fix up the MSR and exit the #GP.

The way to check the valid PASID in the MSR is rdmsr() in this series.
A TIF flag will be much faster than rdmsr() and seems a sutiable way
to check valid PASID status per thread. That's why it could replace
rdmsr() to check PASID in the MSR.

Or do you suggest to add a random new flag in struct thread_info instead
of a TIF flag?

Thanks.

-Fenghua



Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Peter Zijlstra
On Mon, Jun 15, 2020 at 11:19:21AM -0700, Raj, Ashok wrote:
> On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote:
> > 
> > I don't get why you need a rdmsr here, or why not having one would
> > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?
> > 
> > > > > +  */
> > > > > + rdmsrl(MSR_IA32_PASID, pasid_msr);
> > > > > + if (pasid_msr & MSR_IA32_PASID_VALID)
> > > > > + return false;
> > > > > +
> > > > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > > > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
> > 
> > How much more expensive is the wrmsr over the rdmsr? Can't we just
> > unconditionally write the current PASID and call it a day?
> 
> The reason to check the rdmsr() is because we are using a hueristic taking
> GP faults. If we already setup the MSR, but we get it a second time it
> means the reason is something other than PASID_MSR not being set.
> 
> Ideally we should use the TIF_ to track this would be cheaper, but we were
> told those bits aren't easy to give out. 

Why do you need a TIF flag? Why not any other random flag in current?


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Peter Zijlstra
On Mon, Jun 15, 2020 at 11:12:59AM -0700, Fenghua Yu wrote:
> > I don't get why you need a rdmsr here, or why not having one would
> > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?
> 
> My concern is TIF flags are precious (only 3 slots available). Defining
> a new TIF flag may be not worth it while rdmsr() can check if PASID
> is valid in the MSR. And performance here might not be a big issue
> in #GP.
> 
> But if you think using TIF flag is better, I can define a new TIF flag
> and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()).
> Then we can avoid using rdmsr() to check valid PASID in the MSR.

WHY ?!?! What do you need a TIF flag for?


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Fenghua Yu
On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 08:48:54AM -0700, Fenghua Yu wrote:
> > Hi, Peter,
> > On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote:
> > > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> > > > +/*
> > > > + * Apply some heuristics to see if the #GP fault was caused by a thread
> > > > + * that hasn't had the IA32_PASID MSR initialized.  If it looks like 
> > > > that
> > > > + * is the problem, try initializing the IA32_PASID MSR. If the 
> > > > heuristic
> > > > + * guesses incorrectly, take one more #GP fault.
> > > 
> > > How is that going to help? Aren't we then going to run this same
> > > heuristic again and again and again?
> > 
> > The heuristic always initializes the MSR with the per mm PASID IIF the
> > mm has a valid PASID but the MSR doesn't have one. This heuristic usually
> > happens only once on the first #GP in a thread.
> 
> But it doesn't guarantee the PASID is the right one. Suppose both the mm
> has a PASID and the MSR has a VALID one, but the MSR isn't the mm one.
> Then we NO-OP. So if the exception was due to us having the wrong PASID,
> we stuck.

The MSR for each thread was cleared during fork() and clone(). The PASID
was cleared during mm_init(). The per-mm PASID was assigned when fist
bind_mm() is called and remains the same value until process exit().

The MSR is only fixed up when the first ENQCMD is executed in a thread:
bit 31 in the MSR is 0 and the PASID in the mm is non-zero.

The MSR remains the same PASID value once it's fixed up until the thread
exits.

So the work flow ensures the PASID goes from mm's PASID to the MSR.

The PASID could be unbund from the mm. In this case, iommu will generate
#PF and kernel oops instead of #GP.

> 
> > If the next #GP still comes in, the heuristic finds out the MSR already
> > has a valid PASID and thus will not fixup the MSR any more. The fixup()
> > returns "false" and lets others to handle the #GP.
> > 
> > So the heuristic will be executed once (at most) and won't be executed
> > again and again.
> 
> So I get that you want to set the PASID on-demand, but I find this all
> really weird code to make that happen.

We could keep PASID same in all threads sychronously by propogating the MSRs
when the PASID is bound to the mm via IPIs or taskworks to all
threads in the process. But the code is complex and error-prone and
overhead could be high:
1. The user can call driver to do binding and unbinding multiple times.
   The IPIs or taskworks will be sent multiple times to make sure only
   the last IPIs or taskworks take action.
2. Even if a thread never executes ENQCMD and thus never uses the MSR,
   the MSR still needs to be updated whenever bind_mm() and needs to be
   context switched each time. This could cause high overhead.

Setting the PASID on-demand is simpler and cleaner and was recommended
by Thomas.

> 
> > > > +bool __fixup_pasid_exception(void)
> > > > +{
> > > > +   u64 pasid_msr;
> > > > +   unsigned int pasid;
> > > > +
> > > > +   /*
> > > > +* This function is called only when this #GP was triggered 
> > > > from user
> > > > +* space. So the mm cannot be NULL.
> > > > +*/
> > > > +   pasid = current->mm->pasid;
> > > > +   /* If the mm doesn't have a valid PASID, then can't help. */
> > > > +   if (invalid_pasid(pasid))
> > > > +   return false;
> > > > +
> > > > +   /*
> > > > +* Since IRQ is disabled now, the current task still owns the 
> > > > FPU on
> > > 
> > > That's just weird and confusing. What you want to say is that you rely
> > > on the exception disabling the interrupt.
> > 
> > I checked SDM again. You are right. #GP can be interrupted by machine check
> > or other interrupts. So I cannot assume the current task still owns the FPU.
> > Instead of directly rdmsr() and wrmsr(), I will add helpers that can access
> > either the MSR on the processor or the PASID state in the memory.
> 
> That's not in fact what I meant, but yes, you can take exceptions while
> !IF just fine.
> 
> > > > +* this CPU and the PASID MSR can be directly accessed.
> > > > +*
> > > > +* If the MSR has a valid PASID, the #GP must be for some other 
> > > > reason.
> > > > +*
> > > > +* If rdmsr() is really a performance issue, a TIF_ flag may be
> > > > +* added to check if the thread has a valid PASID instead of 
> > > > rdmsr().
> > > 
> > > I don't understand any of this. Nobody except us writes to this MSR, we
> > > should bloody well know what's in it. What gives?
> > 
> > Patch 4 describes how to manage the MSR and patch 7 describes the format
> > of the MSR (20-bit PASID value and bit 31 is valid bit).
> > 
> > Are they sufficient to help? Or do you mean something else?
> 
> I don't get why you need a rdmsr here, or why not having one would
> require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?

My 

RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Luck, Tony
>> The heuristic always initializes the MSR with the per mm PASID IIF the
>> mm has a valid PASID but the MSR doesn't have one. This heuristic usually
>> happens only once on the first #GP in a thread.
>
> But it doesn't guarantee the PASID is the right one. Suppose both the mm
> has a PASID and the MSR has a VALID one, but the MSR isn't the mm one.
> Then we NO-OP. So if the exception was due to us having the wrong PASID,
> we stuck.

ENQCMD only checks the 'valid' bit of the IA32_PASID MSR to decide whether
to #GP or not.  H/W has no concept of the "right" pasid value.

If IA32_PASID is valid with the wrong value ... then the system is about to
see some major corruption because the operations in the accelerator are
not going to translate to the physical addresses for pages owned by the process
that issued the ENQCMD.

-Tony


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Peter Zijlstra
On Mon, Jun 15, 2020 at 08:48:54AM -0700, Fenghua Yu wrote:
> Hi, Peter,
> On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> > > +/*
> > > + * Apply some heuristics to see if the #GP fault was caused by a thread
> > > + * that hasn't had the IA32_PASID MSR initialized.  If it looks like that
> > > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic
> > > + * guesses incorrectly, take one more #GP fault.
> > 
> > How is that going to help? Aren't we then going to run this same
> > heuristic again and again and again?
> 
> The heuristic always initializes the MSR with the per mm PASID IIF the
> mm has a valid PASID but the MSR doesn't have one. This heuristic usually
> happens only once on the first #GP in a thread.

But it doesn't guarantee the PASID is the right one. Suppose both the mm
has a PASID and the MSR has a VALID one, but the MSR isn't the mm one.
Then we NO-OP. So if the exception was due to us having the wrong PASID,
we stuck.

> If the next #GP still comes in, the heuristic finds out the MSR already
> has a valid PASID and thus will not fixup the MSR any more. The fixup()
> returns "false" and lets others to handle the #GP.
> 
> So the heuristic will be executed once (at most) and won't be executed
> again and again.

So I get that you want to set the PASID on-demand, but I find this all
really weird code to make that happen.

> > > +bool __fixup_pasid_exception(void)
> > > +{
> > > + u64 pasid_msr;
> > > + unsigned int pasid;
> > > +
> > > + /*
> > > +  * This function is called only when this #GP was triggered from user
> > > +  * space. So the mm cannot be NULL.
> > > +  */
> > > + pasid = current->mm->pasid;
> > > + /* If the mm doesn't have a valid PASID, then can't help. */
> > > + if (invalid_pasid(pasid))
> > > + return false;
> > > +
> > > + /*
> > > +  * Since IRQ is disabled now, the current task still owns the FPU on
> > 
> > That's just weird and confusing. What you want to say is that you rely
> > on the exception disabling the interrupt.
> 
> I checked SDM again. You are right. #GP can be interrupted by machine check
> or other interrupts. So I cannot assume the current task still owns the FPU.
> Instead of directly rdmsr() and wrmsr(), I will add helpers that can access
> either the MSR on the processor or the PASID state in the memory.

That's not in fact what I meant, but yes, you can take exceptions while
!IF just fine.

> > > +  * this CPU and the PASID MSR can be directly accessed.
> > > +  *
> > > +  * If the MSR has a valid PASID, the #GP must be for some other reason.
> > > +  *
> > > +  * If rdmsr() is really a performance issue, a TIF_ flag may be
> > > +  * added to check if the thread has a valid PASID instead of rdmsr().
> > 
> > I don't understand any of this. Nobody except us writes to this MSR, we
> > should bloody well know what's in it. What gives?
> 
> Patch 4 describes how to manage the MSR and patch 7 describes the format
> of the MSR (20-bit PASID value and bit 31 is valid bit).
> 
> Are they sufficient to help? Or do you mean something else?

I don't get why you need a rdmsr here, or why not having one would
require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?

> > > +  */
> > > + rdmsrl(MSR_IA32_PASID, pasid_msr);
> > > + if (pasid_msr & MSR_IA32_PASID_VALID)
> > > + return false;
> > > +
> > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);

How much more expensive is the wrmsr over the rdmsr? Can't we just
unconditionally write the current PASID and call it a day?


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Fenghua Yu
Hi, Peter,
On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> > +/*
> > + * Apply some heuristics to see if the #GP fault was caused by a thread
> > + * that hasn't had the IA32_PASID MSR initialized.  If it looks like that
> > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic
> > + * guesses incorrectly, take one more #GP fault.
> 
> How is that going to help? Aren't we then going to run this same
> heuristic again and again and again?

The heuristic always initializes the MSR with the per mm PASID IIF the
mm has a valid PASID but the MSR doesn't have one. This heuristic usually
happens only once on the first #GP in a thread.

If the next #GP still comes in, the heuristic finds out the MSR already
has a valid PASID and thus will not fixup the MSR any more. The fixup()
returns "false" and lets others to handle the #GP.

So the heuristic will be executed once (at most) and won't be executed
again and again.

> 
> > + */
> > +bool __fixup_pasid_exception(void)
> > +{
> > +   u64 pasid_msr;
> > +   unsigned int pasid;
> > +
> > +   /*
> > +* This function is called only when this #GP was triggered from user
> > +* space. So the mm cannot be NULL.
> > +*/
> > +   pasid = current->mm->pasid;
> > +   /* If the mm doesn't have a valid PASID, then can't help. */
> > +   if (invalid_pasid(pasid))
> > +   return false;
> > +
> > +   /*
> > +* Since IRQ is disabled now, the current task still owns the FPU on
> 
> That's just weird and confusing. What you want to say is that you rely
> on the exception disabling the interrupt.

I checked SDM again. You are right. #GP can be interrupted by machine check
or other interrupts. So I cannot assume the current task still owns the FPU.
Instead of directly rdmsr() and wrmsr(), I will add helpers that can access
either the MSR on the processor or the PASID state in the memory.

> 
> > +* this CPU and the PASID MSR can be directly accessed.
> > +*
> > +* If the MSR has a valid PASID, the #GP must be for some other reason.
> > +*
> > +* If rdmsr() is really a performance issue, a TIF_ flag may be
> > +* added to check if the thread has a valid PASID instead of rdmsr().
> 
> I don't understand any of this. Nobody except us writes to this MSR, we
> should bloody well know what's in it. What gives?

Patch 4 describes how to manage the MSR and patch 7 describes the format
of the MSR (20-bit PASID value and bit 31 is valid bit).

Are they sufficient to help? Or do you mean something else?

> > +*/
> > +   rdmsrl(MSR_IA32_PASID, pasid_msr);
> > +   if (pasid_msr & MSR_IA32_PASID_VALID)
> > +   return false;
> > +
> > +   /* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > +   wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
> > +
> > +   return true;
> > +}
> > -- 
> > 2.19.1
> > 

Thanks.

-Fenghua


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Peter Zijlstra
On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> +/*
> + * Apply some heuristics to see if the #GP fault was caused by a thread
> + * that hasn't had the IA32_PASID MSR initialized.  If it looks like that
> + * is the problem, try initializing the IA32_PASID MSR. If the heuristic
> + * guesses incorrectly, take one more #GP fault.

How is that going to help? Aren't we then going to run this same
heuristic again and again and again?

> + */
> +bool __fixup_pasid_exception(void)
> +{
> + u64 pasid_msr;
> + unsigned int pasid;
> +
> + /*
> +  * This function is called only when this #GP was triggered from user
> +  * space. So the mm cannot be NULL.
> +  */
> + pasid = current->mm->pasid;
> + /* If the mm doesn't have a valid PASID, then can't help. */
> + if (invalid_pasid(pasid))
> + return false;
> +
> + /*
> +  * Since IRQ is disabled now, the current task still owns the FPU on

That's just weird and confusing. What you want to say is that you rely
on the exception disabling the interrupt.

> +  * this CPU and the PASID MSR can be directly accessed.
> +  *
> +  * If the MSR has a valid PASID, the #GP must be for some other reason.
> +  *
> +  * If rdmsr() is really a performance issue, a TIF_ flag may be
> +  * added to check if the thread has a valid PASID instead of rdmsr().

I don't understand any of this. Nobody except us writes to this MSR, we
should bloody well know what's in it. What gives?

> +  */
> + rdmsrl(MSR_IA32_PASID, pasid_msr);
> + if (pasid_msr & MSR_IA32_PASID_VALID)
> + return false;
> +
> + /* Fix up the MSR if the MSR doesn't have a valid PASID. */
> + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
> +
> + return true;
> +}
> -- 
> 2.19.1
> 


Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Peter Zijlstra
On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> @@ -447,6 +458,18 @@ dotraplinkage void do_general_protection(struct pt_regs 
> *regs, long error_code)
>   int ret;
>  
>   RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> + /*
> +  * Perform the check for a user mode PASID exception before enable
> +  * interrupts. Doing this here ensures that the PASID MSR can be simply
> +  * accessed because the contents are known to be still associated
> +  * with the current process.
> +  */
> + if (user_mode(regs) && fixup_pasid_exception()) {
> + cond_local_irq_enable(regs);
> + return;

OK, so we're done with the exception, lets enable interrupts?

> + }


[PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-12 Thread Fenghua Yu
A #GP fault is generated when ENQCMD instruction is executed without
a valid PASID value programmed in the current thread's PASID MSR. The
#GP fault handler will initialize the MSR if a PASID has been allocated
for this process.

Decoding the user instruction is ugly and sets a bad architecture
precedent. It may not function if the faulting instruction is modified
after #GP.

Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
without a valid PASID value programmed. #GP error codes are 16 bits and all
16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
when loading from the source operand, so its not fully comprehending all
the reasons. Rather than special case the ENQCMD, in future Intel may
choose a different fault mechanism for such cases if recovery is needed on
#GP.

The following heuristic is used to avoid decoding the user instructions
to determine the precise reason for the #GP fault:
1) If the mm for the process has not been allocated a PASID, this #GP
   cannot be fixed.
2) If the PASID MSR is already initialized, then the #GP was for some
   other reason
3) Try initializing the PASID MSR and returning. If the #GP was from
   an ENQCMD this will fix it. If not, the #GP fault will be repeated
   and will hit case "2".

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Update the first paragraph of the commit message (Thomas)
- Add reasons why don't decode the user instruction and don't use
  #GP error code (Thomas)
- Change get_task_mm() to current->mm (Thomas)
- Add comments on why IRQ is disabled during PASID fixup (Thomas)
- Add comment in fixup() that the function is called when #GP is from
  user (so mm is not NULL) (Dave Hansen)

 arch/x86/include/asm/iommu.h |  1 +
 arch/x86/kernel/traps.c  | 23 +
 drivers/iommu/intel/svm.c| 39 
 3 files changed, 63 insertions(+)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index ed41259fe7ac..e9365a5d6f7d 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -27,5 +27,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
 }
 
 void __free_pasid(struct mm_struct *mm);
+bool __fixup_pasid_exception(void);
 
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4cc541051994..0f78d5cdddfe 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -436,6 +437,16 @@ static enum kernel_gp_hint get_kernel_gp_address(struct 
pt_regs *regs,
return GP_CANONICAL;
 }
 
+static bool fixup_pasid_exception(void)
+{
+   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
+   return false;
+   if (!static_cpu_has(X86_FEATURE_ENQCMD))
+   return false;
+
+   return __fixup_pasid_exception();
+}
+
 #define GPFSTR "general protection fault"
 
 dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
@@ -447,6 +458,18 @@ dotraplinkage void do_general_protection(struct pt_regs 
*regs, long error_code)
int ret;
 
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+   /*
+* Perform the check for a user mode PASID exception before enable
+* interrupts. Doing this here ensures that the PASID MSR can be simply
+* accessed because the contents are known to be still associated
+* with the current process.
+*/
+   if (user_mode(regs) && fixup_pasid_exception()) {
+   cond_local_irq_enable(regs);
+   return;
+   }
+
cond_local_irq_enable(regs);
 
if (static_cpu_has(X86_FEATURE_UMIP)) {
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 27dc866b8461..81fd2380c0f9 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1078,3 +1078,42 @@ void __free_pasid(struct mm_struct *mm)
 */
ioasid_free(pasid);
 }
+
+/*
+ * Apply some heuristics to see if the #GP fault was caused by a thread
+ * that hasn't had the IA32_PASID MSR initialized.  If it looks like that
+ * is the problem, try initializing the IA32_PASID MSR. If the heuristic
+ * guesses incorrectly, take one more #GP fault.
+ */
+bool __fixup_pasid_exception(void)
+{
+   u64 pasid_msr;
+   unsigned int pasid;
+
+   /*
+* This function is called only when this #GP was triggered from user
+* space. So the mm cannot be NULL.
+*/
+   pasid = current->mm->pasid;
+   /* If the mm doesn't have a valid PASID, then can't help. */
+   if (invalid_pasid(pasid))
+   return false;
+
+   /*
+* Since IRQ is disabled now, the current task still owns the FPU on
+* this CPU and the PASID MSR can be