RE: [PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach ops

2022-03-16 Thread Luck, Tony
> I would expect real applications will try to use the same PASID for
> the same IOVA map to optimize IOTLB caching.

On Intel a ring3 application only gets to use one PASID.

The ENQCMD instruction pick up the PASID for the process
from the IA32_PASID MSR (set by OS when first access,
and on context switches thereafter).

Kernel users (ring0) can supply any PASID when they use
the ENQCMDS instruction. Is that what you mean when you
say "real applications"?

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-10 Thread Luck, Tony
On Thu, Feb 10, 2022 at 08:27:50AM -0800, Fenghua Yu wrote:
> Hi, Jacob,
> 
> On Wed, Feb 09, 2022 at 07:16:14PM -0800, Jacob Pan wrote:
> > Hi Fenghua,
> > 
> > On Mon,  7 Feb 2022 15:02:48 -0800, Fenghua Yu  wrote:
> > 
> > > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device
> > > *dev, struct mm_struct *mm, void }
> > >  
> > >   sva = intel_svm_bind_mm(iommu, dev, mm, flags);
> > > - if (IS_ERR_OR_NULL(sva))
> > > - intel_svm_free_pasid(mm);
> > If bind fails, the PASID has no IOMMU nor CPU context. It should be safe to
> > free here.
> 
> The PASID can not be freed even if bind fails. The PASID allocated earlier
> (either in this thread or in another thread) might be populated to other
> threads already and being used now.
> 
> Without freeing the PASID on bind failure, the worst case is the PASID might
> not be used in the process (and will be freed on process exit anyway).
> 
> This all matches with the PASID life time described in the commit message.

But what does this mean for the user that failed that intel_svm_bind_mm()?

Here's a scenario:

Process sets up to use PASID capable device #1. Everything works,
so the process has mm->pasid, and the IOMMU has the tables to map
virtual addresses coming from device #1 using that PASID.

Now the same process asks to start using PASID capable device #2,
but there is a failure at intel_svm_bind_mm().

Fenghua is right that we shouldn't free the PASID. It is in use
by at least one thread of the process to access device #1.

But what happens with device #2? Does the caller of intel_svm_bind()
do the right thing with the IS_ERR_OR_NULL return value to let the
user know that device #2 isn't accessible?

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-29 Thread Luck, Tony
On Wed, Sep 29, 2021 at 06:07:22PM +, Fenghua Yu wrote:
> Add
> +#ifdef CONFIG_IOMMU_SUPPORT
> because mm->pasid and current-pasid_activated are defined only if
> CONFIG_IOMMU_SUPPORT is defined.
> 
> > +   if (user_mode(regs) && current->mm->pasid && !current->pasid_activated) 
> > {
> 
> Maybe need to add "&& cpu_feature_enabled(X86_FEATURE_ENQCMD)" because
> the IA32_PASID MSR is only used when ENQCMD is enabled?
> 
> > +   current->pasid_activated = 1;
> > +   wrmsrl(MSR_IA32_PASID, current->mm->pasid | 
> > MSR_IA32_PASID_VALID);
> > +   return;
> > +   }
> > +
> 
> +endif

New version that addresses those issues.  Has ugly #ifdef in C
code :-(  If that's unacceptable, then could create some stub
functions, or add a call to __try_fixup_pasid() that's in a
file in the iommu code that is only built when CONFIG_IOMMU_SUPPORT
is set.  But either of those move the details far away from the
#GP handler so make extra work for anyone trying to follow along
with what is happening here.

-Tony

---

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a58800973aed..5a3c87fd65de 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -528,6 +528,32 @@ static enum kernel_gp_hint get_kernel_gp_address(struct 
pt_regs *regs,
 
 #define GPFSTR "general protection fault"
 
+/*
+ * When a user executes the ENQCMD instruction it will #GP
+ * fault if the IA32_PASID MSR has not been set up with a
+ * valid PASID.
+ * So if the process has been allocated a PASID (mm->pasid)
+ * AND the IA32_PASID MSR has not been initialized, try to
+ * fix this #GP by initializing the IA32_PASID MSR.
+ * If the #GP was for some other reason, it will trigger
+ * again, but this routine will return false and the #GP
+ * will be processed.
+ */
+static void try_fixup_pasid(void)
+{
+   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+   return false;
+
+#ifdef CONFIG_IOMMU_SUPPORT
+   if (current->mm->pasid && !current->pasid_activated) {
+   current->pasid_activated = 1;
+   wrmsrl(MSR_IA32_PASID, current->mm->pasid);
+   return true;
+   }
+#endif
+   return false;
+}
+
 DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 {
char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
@@ -536,6 +562,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
unsigned long gp_addr;
int ret;
 
+   if (user_mode(regs) && try_fixup_pasid())
+   return;
+
cond_local_irq_enable(regs);
 
if (static_cpu_has(X86_FEATURE_UMIP)) {
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-29 Thread Luck, Tony
On Wed, Sep 29, 2021 at 07:15:53PM +0200, Thomas Gleixner wrote:
> On Wed, Sep 29 2021 at 09:59, Andy Lutomirski wrote:
> > On 9/29/21 05:28, Thomas Gleixner wrote:
> >> Looking at that patch again, none of this muck in fpu__pasid_write() is
> >> required at all. The whole exception fixup is:
> >> 
> >>  if (!user_mode(regs))
> >>   return false;
> >> 
> >>  if (!current->mm->pasid)
> >>   return false;
> >> 
> >>  if (current->pasid_activated)
> >> return false;
> >
> > <-- preemption or BH here: kaboom.
> 
> Sigh, this had obviously to run in the early portion of #GP, i.e. before
> enabling interrupts.

Like this? Obviously with some comment about why this is being done.

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a58800973aed..a848a59291e7 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -536,6 +536,12 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
unsigned long gp_addr;
int ret;
 
+   if (user_mode(regs) && current->mm->pasid && !current->pasid_activated) 
{
+   current->pasid_activated = 1;
+   wrmsrl(MSR_IA32_PASID, current->mm->pasid | 
MSR_IA32_PASID_VALID);
+   return;
+   }
+
cond_local_irq_enable(regs);
 
if (static_cpu_has(X86_FEATURE_UMIP)) {

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-29 Thread Luck, Tony
On Wed, Sep 29, 2021 at 09:58:22AM -0700, Andy Lutomirski wrote:
> On 9/28/21 16:10, Luck, Tony wrote:
> > Moving beyond pseudo-code and into compiles-but-probably-broken-code.
> > 
> > 
> > The intent of the functions below is that Fenghua should be able to
> > do:
> > 
> > void fpu__pasid_write(u32 pasid)
> > {
> > u64 msr_val = pasid | MSR_IA32_PASID_VALID;
> > struct ia32_pasid_state *addr;
> > 
> > addr = begin_update_one_xsave_feature(current, XFEATURE_PASID, true);
> > addr->pasid = msr_val;
> > finish_update_one_xsave_feature(current);
> > }
> > 
> 
> This gets gnarly because we would presumably like to optimize the case where
> we can do the update directly in registers.  I wonder if we can do it with a
> bit of macro magic in a somewhat generic way:

Can we defere the optimizations until there is a use case that
cares? The existing use case (fixing up the #GP fault by setting
the PASID MSR) isn't performance critical.

Let's just have something that is clear (or as clear as any xsave
code can be) and correct.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-29 Thread Luck, Tony
> There is zero requirement to look at TIF_NEED_FPU_LOAD or
> fpregs_state_valid() simply because the #GP comes straight from user
> space which means the FPU registers contain the current tasks user space
> state.

Just to double confirm ... there is no point in the #GP handler up to this point
where pre-emption can occur?

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-28 Thread Luck, Tony
>>  if (!(xsave->header.xfeatures & fmask)) {
>>  xsave->header.xfeatures |= fmask;   //<
>>  xsaves(xsave, fmask);
>>  }
>
> I'm not sure why the FPU state is initialized here.
>
> For updating the PASID state, it's unnecessary to init the PASID state.
>
> Maybe it is necessary in other cases?

Dave had suggested initializing feature state when it is unknown (could
be garbage).  This is my attempt to follow that guidance. I'm not confident
that my tests for "is the state in registers, in memory, or is garbage"
really capture all the cases.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-28 Thread Luck, Tony
> But the helpers seem to be generic. They take "task" as a parameter and
> handle the task as non-current case. So the helpers are not for PASID only.
> They may be used by others to modify a running task's FPU state. But
> It's not safe to do so.
>
> At least need some comments/restriction for the helpers to be used on
> a running task?

Fenghua,

Correct. When I add some comments I'll make it very clear that it is the
responsibility of the caller to make sure that the task that is targeted
cannot run.

Earlier in this thread Dave suggested there are two cases where these
helpers might be useful:

1) Fork/clone - to set up some xsave state in the child ... but this would be
done before the child is allowed to run.

2) ptrace - this is a "maybe" because ptrace already has code to handle all
the xsave state as a single entity. Perhaps someone might want to change it
to only modify a single feature ... but this seems unlikely. In any case the
ptrace code already "stops" the target process while it is reading/writing 
state.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-28 Thread Luck, Tony
>>  fpregs_lock();
>
> I'm afraid we may hit the same locking issue when we send IPI to notify 
> another task to modify its
> PASID state. Here the API is called to modify another running task's PASID 
> state as well without a right lock.
> fpregs_lock() is not enough to deal with this, I'm afraid.

We don't send IPI any more to change PASID state. The only place that the
current patch series touches the PASID MSR is in the #GP fault handler.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-28 Thread Luck, Tony
On Tue, Sep 28, 2021 at 11:50:37PM +, Fenghua Yu wrote:
> If xfeatures's feature bit is 0, xsaves will not write its init value to the
> memory due to init optimization. So the xsaves will do nothing and the
> state is not initialized and may have random data.

> Setting TIF_NEED_FPU_LOAD cannot guaranteed to execute XRSTORS on exiting
> to user. In fpregs_restore_userregs():
>   if (!fpregs_state_valid(fpu, cpu)) {
>   ...
> __restore_fpregs_from_fpstate(>state, mask);
>   ...
>   }
> 
> fpregs state should be invalid to get the XRSTROS executed.
> 
> So setting TIF_NEED_FPU_LOAD may get the FPU register unchanged on exiting
> to user.

Does this help?
Changed lines marked with //<

-Tony

void *begin_update_one_xsave_feature(struct task_struct *tsk,
 enum xfeature xfeature, bool full)
{
struct xregs_state *xsave = >thread.fpu.state.xsave;
struct xregs_state *xinit = _fpstate.xsave;
u64 fmask = 1ull << xfeature;
void *addr;

BUG_ON(!(xsave->header.xcomp_bv & fmask));

fpregs_lock();

addr = __raw_xsave_addr(xsave, xfeature);

if (full || tsk != current) {
memcpy(addr, __raw_xsave_addr(xinit, xfeature), 
xstate_sizes[xfeature]);
goto out;
}

if (!(xsave->header.xfeatures & fmask)) {
xsave->header.xfeatures |= fmask;   //<
xsaves(xsave, fmask);
}

out:
xsave->header.xfeatures |= fmask;
return addr;
}

void finish_update_one_xsave_feature(struct task_struct *tsk)
{
set_ti_thread_flag(task_thread_info(tsk), TIF_NEED_FPU_LOAD);
if (tsk == current) //<
__cpu_invalidate_fpregs_state();//<
fpregs_unlock();
}
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-28 Thread Luck, Tony
Moving beyond pseudo-code and into compiles-but-probably-broken-code.


The intent of the functions below is that Fenghua should be able to
do:

void fpu__pasid_write(u32 pasid)
{
u64 msr_val = pasid | MSR_IA32_PASID_VALID;
struct ia32_pasid_state *addr;

addr = begin_update_one_xsave_feature(current, XFEATURE_PASID, true);
addr->pasid = msr_val;
finish_update_one_xsave_feature(current);
}

So here's the two new functions that would be added to
arch/x86/kernel/fpu/xstate.c



void *begin_update_one_xsave_feature(struct task_struct *tsk,
 enum xfeature xfeature, bool full)
{
struct xregs_state *xsave = >thread.fpu.state.xsave;
struct xregs_state *xinit = _fpstate.xsave;
u64 fmask = 1ull << xfeature;
void *addr;

BUG_ON(!(xsave->header.xcomp_bv & fmask));

fpregs_lock();

addr = __raw_xsave_addr(xsave, xfeature);

if (full || tsk != current) {
memcpy(addr, __raw_xsave_addr(xinit, xfeature), 
xstate_sizes[xfeature]);
goto out;
}

/* could optimize some cases where xsaves() isn't fastest option */
if (!(xsave->header.xfeatures & fmask))
xsaves(xsave, fmask);

out:
xsave->header.xfeatures |= fmask;
return addr;
}

void finish_update_one_xsave_feature(struct task_struct *tsk)
{
set_ti_thread_flag(task_thread_info(tsk), TIF_NEED_FPU_LOAD);
fpregs_unlock();
}



-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-28 Thread Luck, Tony
On Tue, Sep 28, 2021 at 12:19:22PM -0700, Dave Hansen wrote:
> On 9/28/21 11:50 AM, Luck, Tony wrote:
> > On Mon, Sep 27, 2021 at 04:51:25PM -0700, Dave Hansen wrote:
> ...
> >> 1. Hide whether we need to write to real registers
> >> 2. Hide whether we need to update the in-memory image
> >> 3. Hide other FPU infrastructure like the TIF flag.
> >> 4. Make the users deal with a *whole* state in the replace API
> > 
> > Is that difference just whether you need to save the
> > state from registers to memory (for the "update" case)
> > or not (for the "replace" case ... where you can ignore
> > the current register, overwrite the whole per-feature
> > xsave area and mark it to be restored to registers).
> > 
> > If so, just a "bool full" argument might do the trick?
> 
> I want to be able to hide the complexity of where the old state comes
> from.  It might be in registers or it might be in memory or it might be
> *neither*.  It's possible we're running with stale register state and a
> current->...->xsave buffer that has XFEATURES_FOO 0.
> 
> In that case, the "old" copy might be memcpy'd out of the init_task.
> Or, for pkeys, we might build it ourselves with init_pkru_val.

So should there be an error case if there isn't an "old" state, and
the user calls:

p = begin_update_one_xsave_feature(XFEATURE_something, false);

Maybe instead of an error, just fill it in with the init state for the feature?

> > Also - you have a "tsk" argument in your pseudo code. Is
> > this needed? Are there places where we need to perform
> > these operations on something other than "current"?
> 
> Two cases come to mind:
> 1. Fork/clone where we are doing things to our child's XSAVE buffer
> 2. ptrace() where we are poking into another task's state
> 
> ptrace() goes for the *whole* buffer now.  I'm not sure it would need
> this per-feature API.  I just call it out as something that we might
> need in the future.

Ok - those seem ok ... it is up to the caller to make sure that the
target task is in some "not running, and can't suddenly start running"
state before calling these functions.

> 
> > pseudo-code:
> > 
> > void *begin_update_one_xsave_feature(enum xfeature xfeature, bool full)
> > {
> > void *addr;
> > 
> > BUG_ON(!(xsave->header.xcomp_bv & xfeature));
> > 
> > addr = __raw_xsave_addr(xsave, xfeature);
> > 
> > fpregs_lock();
> > 
> > if (full)
> > return addr;
> 
> If the feature is marked as in the init state in the buffer
> (XSTATE_BV[feature]==0), this addr *could* contain total garbage.  So,
> we'd want to make sure that the memory contents have the init state
> written before handing them back to the caller.  That's not strictly
> required if the user is writing the whole thing, but it's the nice thing
> to do.

Nice guys waste CPU cycles writing to memory that is just going to get
written again.

> 
> > if (xfeature registers are "live")
> > xsaves(xstate, 1 << xfeature);
> 
> One little note: I don't think we would necessarily need to do an XSAVES
> here.  For PKRU, for instance, we could just do a rdpkru.

Like this?

if (tsk == current) {
switch (xfeature) {
case XFEATURE_PKRU:
*(u32 *)addr = rdpkru();
break;
case XFEATURE_PASID:
rdmsrl(MSR_IA32_PASID, msr);
*(u64 *)addr = msr;
break;
... any other "easy" states ...
default:
xsaves(xstate, 1 << xfeature);
break;
}
}

> 
> > return addr;
> > }
> > 
> > void finish_update_one_xsave_feature(enum xfeature xfeature)
> > {
> > mark feature modified
> 
> I think we'd want to do this at the "begin" time.  Also, do you mean we
> should set XSTATE_BV[feature]?

Begin? End? It's all inside fpregs_lock(). But whatever seems best.

Yes, I think that this means set XSTATE_BV[feature] ... but I'm
relying on you as the xsave expert to help get the subtle bits right so
the Andy Lutomirski can smile at this code.

> > set TIF bit
> 
> Since the XSAVE buffer was updated, it now contains the canonical FPU
> state.  It may have diverged from the register state, thus we need to
> set TIF_NEED_FPU_LOAD.

Yes, that's the TIF bit my pseudo-code intended.

> It's also worth noting that we *could*:
> 
>   xrstors(xstate, 1< 
> as well.

Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-28 Thread Luck, Tony
On Mon, Sep 27, 2021 at 04:51:25PM -0700, Dave Hansen wrote:
> That's close to what we want.
> 
> The size should probably be implicit.  If it isn't implicit, it needs to
> at least be double-checked against the state sizes.
> 
> Not to get too fancy, but I think we also want to have a "replace"
> operation which is separate from the "update".  Think of a case where
> you are trying to set a bit:
> 
>   struct pkru_state *pk = start_update_xstate(tsk, XSTATE_PKRU);
>   pk->pkru |= 0x100;
>   finish_update_xstate(tsk, XSTATE_PKRU, pk);
> 
> versus setting a whole new value:
> 
>   struct pkru_state *pk = start_replace_xstate(tsk, XSTATE_PKRU);
>   memset(pkru, sizeof(*pk), 0);
>   pk->pkru = 0x1234;
>   finish_replace_xstate(tsk, XSTATE_PKRU, pk);
> 
> They look similar, but they actually might have very different costs for
> big states like AMX.  We can also do some very different debugging for
> them.  In normal operation, these _can_ just return pointers directly to
> the fpu...->xstate in some case.  But, if we're debugging, we could
> kmalloc() a buffer and do sanity checking before it goes into the task
> buffer.
> 
> We don't have to do any of that fancy stuff now.  We don't even need to
> do an "update" if all we need for now for XFEATURE_PASID is a "replace".
> 
> 1. Hide whether we need to write to real registers
> 2. Hide whether we need to update the in-memory image
> 3. Hide other FPU infrastructure like the TIF flag.
> 4. Make the users deal with a *whole* state in the replace API

Is that difference just whether you need to save the
state from registers to memory (for the "update" case)
or not (for the "replace" case ... where you can ignore
the current register, overwrite the whole per-feature
xsave area and mark it to be restored to registers).

If so, just a "bool full" argument might do the trick?

Also - you have a "tsk" argument in your pseudo code. Is
this needed? Are there places where we need to perform
these operations on something other than "current"?

pseudo-code:

void *begin_update_one_xsave_feature(enum xfeature xfeature, bool full)
{
void *addr;

BUG_ON(!(xsave->header.xcomp_bv & xfeature));

addr = __raw_xsave_addr(xsave, xfeature);

fpregs_lock();

if (full)
return addr;

if (xfeature registers are "live")
xsaves(xstate, 1 << xfeature);

return addr;
}

void finish_update_one_xsave_feature(enum xfeature xfeature)
{
mark feature modified
set TIF bit
fpregs_unlock();
}

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-27 Thread Luck, Tony
On Thu, Sep 23, 2021 at 04:17:05PM -0700, Andy Lutomirski wrote:
> > +   fpregs_lock();
> > +
> > +   /*
> > +* If the task's FPU doesn't need to be loaded or is valid, directly
> > +* write the IA32_PASID MSR. Otherwise, write the PASID state and
> > +* the MSR will be loaded from the PASID state before returning to
> > +* the user.
> > +*/
> > +   if (!test_thread_flag(TIF_NEED_FPU_LOAD) ||
> > +   fpregs_state_valid(fpu, smp_processor_id())) {
> > +   wrmsrl(MSR_IA32_PASID, msr_val);
> 
> Let me try to decode this.
> 
> If the current task's FPU state is live or if the state is in memory but the 
> CPU regs just happen to match the copy in memory, then write the MSR.  Else 
> write the value to memory.
> 
> This is wrong.  If !TIF_NEED_FPU_LOAD && fpregs_state_valid, you MUST NOT 
> MODIFY FPU STATE.  This is not negotiable -- you will break coherence between 
> CPU regs and the memory image.  The way you modify the current task's state 
> is either you modify it in CPU regs (if the kernel knows that the CPU regs 
> are the one and only source of truth) OR you modify it in memory and 
> invalidate any preserved copies (by zapping last_cpu). 
> 
> In any event, that particular bit of logic really doesn't belong in here -- 
> it belongs in some helper that gets it right, once.

Andy,

A helper sounds like a good idea. Can you flesh out what
you would like that to look like?

Is it just the "where is the live register state?" so the
above could be written:

if (xsave_state_in_memory(args ...))
update pasid bit of xsave state in memory
else
wrmsrl(MSR_IA32_PASID, msr_val);

Or are you thinking of a helper that does both the check
and the update ... so the code here could be:

update_one_xsave_feature(XFEATURE_PASID, _val, sizeof(msr_val));

With the helper being something like:

void update_one_xsave_feature(enum xfeature xfeature, void *data, size_t size)
{
if (xsave_state_in_memory(args ...)) {
addr = get_xsave_addr(xsave, xfeature);
memcpy(addr, data, size);
xsave->header.xfeatures |= (1 << xfeature);
return;
}

switch (xfeature) {
case XFEATURE_PASID:
wrmsrl(MSR_IA32_PASID, *(u64 *)data);
break;

case each_of_the_other_XFEATURE_enums:
code to update registers for that XFEATURE
}
}

either way needs the definitive correct coding for xsave_state_in_memory()

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-24 Thread Luck, Tony
On Fri, Sep 24, 2021 at 04:03:53PM -0700, Andy Lutomirski wrote:
> 1. The context switch code needs to resync PASID.  Unfortunately, this adds 
> some overhead to every context switch, although a static_branch could 
> minimize it for non-PASID users.

Any solution that adds to context switch time isn't going to meet
the definition of "perfect" either.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-24 Thread Luck, Tony
On Fri, Sep 24, 2021 at 03:18:12PM +0200, Thomas Gleixner wrote:
> On Thu, Sep 23 2021 at 19:48, Thomas Gleixner wrote:
> > On Thu, Sep 23 2021 at 09:40, Tony Luck wrote:
> >
> > fpu_write_task_pasid() can just grab the pasid from current->mm->pasid
> > and be done with it.
> >
> > The task exit code can just call iommu_pasid_put_task_ref() from the
> > generic code and not from within x86.
> 
> But OTOH why do you need a per task reference count on the PASID at all?
> 
> The PASID is fundamentaly tied to the mm and the mm can't go away before
> the threads have gone away unless this magically changed after I checked
> that ~20 years ago.

It would be possible to avoid a per-task reference to the PASID by
taking an extra reference when mm->pasid is first allocated using
the CONFIG_PASID_TASK_REFS you proposed yesterday to define a function
to take the extra reference, and another to drop it when the mm is
finally freed ... with stubs to do nothing on architectures that
don't create per-task PASID context.

This solution works, but is functionally different from Fenghua's
proposal for this case:

Process clones a task
task binds a device
task accesses device using PASID
task unbinds device
task exits (but process lives on)

Fenghua will free the PASID because the reference count goes
back to zero. The "take an extra reference and keep until the
mm is freed" option would needlessly hold onto the PASID.

This seems like an odd usage case ... even if it does exist, a process
that does this may spawn another task that does the same thing.

If you think it is sufficiently simpler to use the "extra reference"
option (invoking the "perfect is the enemy of good enough" rule) then we
can change.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-24 Thread Luck, Tony
On Fri, Sep 24, 2021 at 03:37:22PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 23, 2021 at 10:14:42AM -0700, Luck, Tony wrote:
> > On Wed, Sep 22, 2021 at 11:07:22PM +0200, Peter Zijlstra wrote:
> > > On Mon, Sep 20, 2021 at 07:23:45PM +, Fenghua Yu wrote:
> > > > @@ -538,6 +547,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
> > > >  
> > > > cond_local_irq_enable(regs);
> > > >  
> > > > +   if (user_mode(regs) && fixup_pasid_exception())
> > > > +   goto exit;
> > > > +
> > 
> > > So you're eating any random #GP that might or might not be PASID
> > > related. And all that witout a comment... Enlighten?
> > 
> > This is moderately well commented inside the fixup_pasid_exception()
> > function. Another copy of the comments here at the call-site seems
> > overkill.
> 
> +static bool fixup_pasid_exception(void)
> +{
> +   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
> +   return false;
> +
> +   return __fixup_pasid_exception();
> +}
> 
> /me goes looking for comments in that function, lemme get out the
> electron microscope, because I can't seem to spot them with the naked
> eye.

If you have ctags installed then a ctrl-] on that
__fixup_pasid_exception() will take you to the function with
the comments. No electron microscope needed.

+
+/*
+ * Try to figure out if there is a PASID MSR value to propagate to the
+ * thread taking the #GP.
+ */
+bool __fixup_pasid_exception(void)
+{
+   u32 pasid;
+
+   /*
+* This function is called only when this #GP was triggered from user
+* space. So the mm cannot be NULL.
+*/
+   pasid = current->mm->pasid;
+
+   /* If no PASID is allocated, there is nothing to propagate. */
+   if (pasid == PASID_DISABLED)
+   return false;
+
+   /*
+* If the current task already has a valid PASID MSR, then the #GP
+* fault must be for some non-ENQCMD related reason.
+*/
+   if (current->has_valid_pasid)
+   return false;
+
+   /* Fix up the MSR by the PASID in the mm. */
+   fpu__pasid_write(pasid);
+   current->has_valid_pasid = 1;
+
+   return true;
+}

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-23 Thread Luck, Tony
On Thu, Sep 23, 2021 at 04:09:18PM -0700, Andy Lutomirski wrote:
> On Mon, Sep 20, 2021, at 12:23 PM, Fenghua Yu wrote:

> I think this is unnecessarily complicated because it's buying in to the
> existing ISA misconception that PASID has anything to do with a task.
> A PASID belongs to an mm, full stop.  Now the ISA is nasty and we have
> tasks that have *noticed* that their mm has a PASID and tasks that have
> not noticed this fact, but that should be irrelevant to essentially
> everything except the fault handler.
> 
> So just refcount the thing the obvious way: take a reference when you
> stick the PASID in the mm_struct and drop the reference in __mmdrop().
> Problem solved.  You could probably drop it more aggressively in
> __mmput(), and the comment explaining why is left as an exercise to the
> reader -- if a kernel thread starts doing ENQCMD, we have worse things
> to worry about :)

That doesn't match well with the non-x86 usage of PASIDs. The code there
bumps the reference count on each device bind, and decrements on each
device unbind.

If we don't keep a reference count for each task that has IA32_PASID
set up we could have this sequence

1) Process binds to a PASID capable device
2) Task uses ENQCMD, so PASID MSR is set up.
3) Process unbinds the device, reference count on PASID
   goes to zero. PASID is freed. PASID is reallocated to
   another task.
4) Task from step #2 uses ENQCMD to submit a descriptor
   and device now processes virtual addresses based on mappings
   in the new task.

Now you might say that at step 3 we need to hunt down all the
tasks that have PASID enabled and disabled ... but that's the
same painful code that we avoided when we said that we would
not make Linux hand out a PASID to all existing tasks in a
process on the first bind operation.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-23 Thread Luck, Tony
On Wed, Sep 22, 2021 at 11:07:22PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 20, 2021 at 07:23:45PM +, Fenghua Yu wrote:
> > @@ -538,6 +547,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
> >  
> > cond_local_irq_enable(regs);
> >  
> > +   if (user_mode(regs) && fixup_pasid_exception())
> > +   goto exit;
> > +

> So you're eating any random #GP that might or might not be PASID
> related. And all that witout a comment... Enlighten?

This is moderately well commented inside the fixup_pasid_exception()
function. Another copy of the comments here at the call-site seems
overkill.

Would it help to change the name to try_fixup_pasid_exception()
to make it clearer that this is just a heuristic that may or may
not fix this particular #GP?

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-23 Thread Luck, Tony
On Thu, Sep 23, 2021 at 04:36:50PM +0200, Thomas Gleixner wrote:
> On Mon, Sep 20 2021 at 19:23, Fenghua Yu wrote:
> >  
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +void pasid_put(struct task_struct *tsk, struct mm_struct *mm);
> > +#else
> > +static inline void pasid_put(struct task_struct *tsk, struct mm_struct 
> > *mm) { }
> > +#endif
> 
> This code is again defining that PASID is entirely restricted to
> INTEL. It's true, that no other vendor supports this, but PASID is
> a non-vendor specific concept.
> 
> Sticking this into INTEL code means that any other PASID implementation
> has to rip it out again from INTEL code and make it a run time property.
> 
> The refcounting issue should be the same for all PASID mechanisms which
> attach PASID to a mm. What's INTEL specific about that?
> 
> So can we pretty please do that correct right away?

It's a bit messy (surprise).

There are two reasons to hold a refcount on a PASID

1) The process has done a bind on a device that uses PASIDs

This one isn't dependent on Intel.

2) A task within a process is using ENQCMD (and thus holds
   a reference on the PASID because IA32_PASID MSR for this
   task has the PASID value loaded with the enable bit set).

This is (currently) Intel specific (until others
implement an ENQCMD-like feature to allow apps to
access PASID enabled devices without going through
the OS).

Perhaps some better function naming might help?  E.g. have
a task_pasid_put() function that handles the process exit
case separatley from the device unbind case.

void task_pasid_put(void)
{
if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
return;

if (current->has_valid_pasid) {
mutex_lock(_mutex);
iommu_sva_free_pasid(mm);
mutex_unlock(_mutex);
}
}

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-22 Thread Luck, Tony
>> > +static bool fixup_pasid_exception(void)
>> > +{
>> > +  if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
>> > +  return false;
>> > +
>> > +  return __fixup_pasid_exception();
>> > +}
>
> That is, shouldn't the above at the very least decode the instruction
> causing the #GP and check it's this ENQCMD thing?

It can't reliably do that because some other thread in the process may
have re-written the memory that regs->ip points at (bizarre case, but
I think Dave Hansen brought it up).

So it would just add extra code, and still only be a hint.

Without the check this sequence is possible:

1) Process binds an accelerator (so mm->pasid is set)
2) Task in the process executes something other than ENQCMD that gets a #GP
3) Kernel says "Oh, mm->pasid is set, I'll initialize the IA32_PASID MSR to see 
if that fixes it"
4) Nope. Re-executing the instruction at step #2 just gives another #GP
5) Kernel says "I already set IA32_PASID, so this must be something else ... do 
regular #GP actions"

Now if the task catches the signal that results from step #5 and avoids 
termination, it will have
IA32_PASID set ... but to the right value should it go on to actually execute 
ENQCMD at some
future point.

So the corner case from not knowing whether this #GP was from ENQCMD or not is 
harmless.

-Tony


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()

2021-06-25 Thread Luck, Tony
On Wed, Jun 09, 2021 at 04:34:31PM -0700, Andy Lutomirski wrote:
> On 6/9/21 10:32 AM, Luck, Tony wrote:
> >>> I've told Fenghua to dig out the previous iteration of this patch where
> >>> the plan was to lazily fix the PASID_MSR in other threads in the #GP
> >>> handler.
> >>
> >> Blech.  Also this won't work for other PASID-like features.
> >>
> >> I have a half-written patch to fix this up for real.  Stay tuned.
> > 
> > Andy: Any update on this?
> > 
> > -Tony
> > 
> 
> Let me try to merge my pile with tglx's pile and come up with something
> halfway sane.

Looks like Thomas' pile is done (well done enough to be queued in TIP).

Can we see your "pile" soon?

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()

2021-06-09 Thread Luck, Tony
>> I've told Fenghua to dig out the previous iteration of this patch where
>> the plan was to lazily fix the PASID_MSR in other threads in the #GP
>> handler.
>
> Blech.  Also this won't work for other PASID-like features.
>
> I have a half-written patch to fix this up for real.  Stay tuned.

Andy: Any update on this?

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()

2021-06-02 Thread Luck, Tony
>> ... so on a PASID system, your trivial reproducer would theoretically
>> fire the same way and corrupt FPU state just as well.
>
> This is worse and you can't selftest it because the IPI can just hit in
> the middle of _any_ FPU state operation and corrupt state.

That sounds like we should abandon the "IPI all the other threads
to force enable the PASID for them" approach. It would just be a
nightmare of papering over cracks when the IPI was delivered at
some inconvenient moment when the recipient was in the middle
of touching xsave state.

I've told Fenghua to dig out the previous iteration of this patch where
the plan was to lazily fix the PASID_MSR in other threads in the #GP
handler.

That algorithm is very simple and easy to check. Pseudo-code:

#GP
if (usermode && current->mm->pasid && rdmsr(PASID_MSR) != valid) {
wrmsr(PASID_MSR, current->mm->pasid | PASID_VALID);
return;
}

Worst case is that some thread of a multi-threaded process that is using PASID
takes some unrelated #GP ... this code will try to fix it by enabling the 
PASID_MSR.
That will just #GP a second time and this test will see the MSR is already set,
so fall into the usual #GP handling code.

Seems like a better direction than trying to fix the IPI method. The 
virtualization
folks will like this way more because IPI in guest causes a couple of VMEXIT
so is somewhat expensive.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Luck, Tony
On Thu, May 13, 2021 at 12:46:21PM -0700, Jacob Pan wrote:
> It seems there are two options:
> 1. Add a new IOMMU API to set up a system PASID with a *separate* IOMMU page
> table/domain, mark the device is PASID only with a flag. Use DMA APIs
> to explicit map/unmap. Based on this PASID-only flag, Vendor IOMMU driver
> will decide whether to use system PASID domain during map/unmap. Not clear
> if we also need to make IOVA==kernel VA.
> 
> 2. Add a new IOMMU API to setup a system PASID which points to init_mm.pgd.
> This API only allows trusted device to bind with the system PASID at its
> own risk. There is no need for DMA API. This is the same as the current
> code except with an explicit API.
> 
> Which option?

Option #1 looks cleaner to me. Option #2 gives access to bits
of memory that the users of system PASID shouldn't ever need
to touch ... just map regions of memory that the kernel has
a "struct page" for.

What does "use DMA APIs to explicitly map/unmap" mean? Is that
for the whole region?

I'm expecting that once this system PASID has been initialized,
then any accelerator device with a kernel use case would use the
same PASID. I.e. DSA for page clearing, IAX for ZSwap compression
& decompression, etc.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Luck, Tony
> If you want this then be explicit about what it is you are making when
> building the API. Don't try to hide it under some generic idea of
> "kernel PCI DMA SVA"

So, a special API call (that Dave can call from IDXD) to set up this
kernel PASID. With suitable documentation to explain the scope.
Maybe with a separate CONFIG option so it can be completely
stubbed out (IDXD does *NOT* "select" this option ... users have
to explicitly pick it).

> I could easily see an admin option to "turn this off" entirely as
> being too dangerous, especially if the users have no interest in IDXD.

And a kernel command line option to block IDXD from using that
special API ... for users on generic kernels who want to block
this use model (but still use IDXD in non-kernel cases). Users
who don't want IDXD at all can block loading of the driver.

Would that work?

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Luck, Tony
On Thu, May 13, 2021 at 02:33:03PM -0300, Jason Gunthorpe wrote:
> The page table under the kernel PASID should behave the same way that
> the kernel would operate the page table assigned to a kernel RID.
> 
> If the kernel has security off then the PASID should map to all
> physical memory, just like the RID does.
> 
> If security is on then every DMA map needs to be loaded into the
> PASID's io page table no different than a RID page table.
> 
> "kernel SVA" is, IMHO, not a desirable thing, it completely destroys
> the kernel's DMA security model.
> 
> > If people want to use an accelerator on memory allocated by vmalloc()
> > things will get more complicated. But maybe we can delay solving that
> > problem until someone comes up with a real use case that needs to
> > do this?
> 
> If you have a HW limitation that the device can only issue TLPs
> with a PASID, even for kernel users, then I think the proper thing is
> to tell the IOMMU layer than a certain 'struct device' enters
> PASID-only mode and the IOMMU layer should construct an appropriate
> PASID and flow the dma operations through it.
> 
> Pretending the DMA layer doesn't exist and that PASID gets a free pass
> is not OK in the kernel.

I can see why a tight security model is needed to stop
random devices having access to mamory that they should
not be able to access.  Now that PCIe devices can be plugged
into Thunderbolt ports on computers, nobody wants to repeat
the disaster that Firewire ports created for systems over
a decade ago.

But I'd like to challege the one-size-fits-all policy. There's
a big difference between a random device plugged into a port
(which may even lie about its VendorID/DeviceID) and a device
that is part of the CPU socket.

I'd like people to think of DSA as an extension to the instruction
set. It implements asynchronous instructions like "MEMFILL" and
"MEMCOPY". These can be limited in scope when executed in user
processes or guests. But when executed by the host OS ring0 code
they can have all the same access that ring0 code has when it
dereferences a pointer.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

2021-05-13 Thread Luck, Tony
> For shared workqueue, it can only generate DMA request with PASID. The
> submission is done by ENQCMDS (S for supervisor) instruction.
>
> If we were not to share page tables with init_mm, we need a system PASID
> that doing the same direct mapping in IOMMU page tables.

Note that for the currently envisioned kernel use cases for accelerators it
would be OK for this system PASID to just provide either:

1) A 1:1 mapping for physical addresses.  Kernel users of the accelerators
  would provide physical addresses in descriptors.
2) The same mapping that the kernel uses for its "1:1" map of all physical
memory. Users would use kernel virtual addresses in that "1:1" range
(e.g. those obtained from page_to_virt(struct page *p);)

If people want to use an accelerator on memory allocated by vmalloc()
things will get more complicated. But maybe we can delay solving that
problem until someone comes up with a real use case that needs to
do this?

-Tony


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 12/12] x86/traps: Fix up invalid PASID

2020-06-26 Thread Luck, Tony
On Fri, Jun 26, 2020 at 11:44:50AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 25, 2020 at 01:17:22PM -0700, Fenghua Yu wrote:
> 
> > +static bool fixup_pasid_exception(void)
> > +{
> > +   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
> > +   return false;
> > +   if (!static_cpu_has(X86_FEATURE_ENQCMD))
> > +   return false;
> 
> elsewhere you had another variation:
> 
> +   if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
> +   return;
> +
> +   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
> +   return;
> 
> Which is it, and why do we need the CONFIG thing when combined with the
> enabled thing?

Do we have a standard way of coding for a feature that depends on multiple
other features?  For this case the system needs to both suport the ENQCMD
instruction, and also have kernel code that programs the IOMMU.

And/or guidance on when to use each of the very somewhat simlar looking
boot_cpu_has()
static_cpu_has()
IS_ENABLED()
cpu_feature_enabled()
options?

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v3 01/13] iommu: Change type of pasid to unsigned int

2020-06-17 Thread Luck, Tony
> Is PASID in the uapi at all?


$ grep pasid include/uapi/linux/iommu.h
* @pasid: Process Address Space ID
__u32   pasid;
 * @pasid: Process Address Space ID
__u32   pasid;
 * @pasid: Process Address Space ID
__u32   pasid;
 * - If the PASID bit is set, the @pasid field is populated and the invalidation
 * @pasid: process address space ID
__u64   pasid;
 * struct iommu_inv_pasid_info - PASID Selective Invalidation Structure
 * - If the PASID bit is set, the @pasid field is populated and the invalidation
 * @pasid: process address space ID
struct iommu_inv_pasid_info {
__u64   pasid;
 * @pasid_info: invalidation data when @granularity is %IOMMU_INV_GRANU_PASID
struct iommu_inv_pasid_info pasid_info;
 * struct iommu_gpasid_bind_data_vtd - Intel VT-d specific data on device and 
guest
struct iommu_gpasid_bind_data_vtd {
 * struct iommu_gpasid_bind_data - Information about device and guest PASID 
binding
 * @hpasid: Process address space ID used for the guest mm in host IOMMU
 * @gpasid: Process address space ID used for the guest mm in guest IOMMU
struct iommu_gpasid_bind_data {
__u64 hpasid;
__u64 gpasid;
struct iommu_gpasid_bind_data_vtd vtd;
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Luck, Tony
> So what’s the RDMSR for?  Surely you
> have some state somewhere that says “this task has a PASID.”
> Can’t you just make sure that stays in sync with the MSR?  Then, on #GP, if 
> the task already has a PASID, you know the MSR is set.

We have state that says the process ("mm") has been allocated a PASID. But not 
for each task.

E.g. a process may clone a bunch of tasks, then one of them opens a device that 
needs
a PASID.   We did internally have patches to go hunt down all those other tasks 
and
force a PASID onto each. But the code was big and ugly. Also maybe the wrong 
thing
to do if those threads didn't ever need to access the device.

PeterZ suggested that we can add a bit to the task structure for this purpose.

Fenghua is hesitant about adding an x86 only bit there.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Luck, Tony
> Are we planning to keep PASID live once a task has used it once or are we 
> going to swap it lazily?  If the latter, a percpu variable might be better.

Current plan is "touch it once and the task owns it until exit(2)"

Maybe someday in the future when we have data on how applications
actually use accelerators we could look at something more complex
if usage patterns look like it would be beneficial.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

2020-06-15 Thread Luck, Tony
>> The heuristic always initializes the MSR with the per mm PASID IIF the
>> mm has a valid PASID but the MSR doesn't have one. This heuristic usually
>> happens only once on the first #GP in a thread.
>
> But it doesn't guarantee the PASID is the right one. Suppose both the mm
> has a PASID and the MSR has a VALID one, but the MSR isn't the mm one.
> Then we NO-OP. So if the exception was due to us having the wrong PASID,
> we stuck.

ENQCMD only checks the 'valid' bit of the IA32_PASID MSR to decide whether
to #GP or not.  H/W has no concept of the "right" pasid value.

If IA32_PASID is valid with the wrong value ... then the system is about to
see some major corruption because the operations in the accelerator are
not going to translate to the physical addresses for pages owned by the process
that issued the ENQCMD.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Luck, Tony
> There are two users of a PASID, mm and device driver(FD). If
> either one is not done with the PASID, it cannot be reclaimed. As you
> mentioned, it could take a long time for the driver to abort. If the
> abort ends *after* mmdrop, we are in trouble.
> If driver drops reference after abort/drain PASID is done, then we are
> safe.

I don't think there should be an abort ... suppose the application requested
the DSA to copy some large block of important results from DDR4 to
persistent memory.  Driver should wait for that copy operation to complete.

Note that for the operation to succeed, the kernel should still be processing
and fixing page faults for the "mm" (some parts of the data that the user wanted
to save to persistent memory may have been paged out).

The wait by the DSA diver needs to by synchronous ... the "mm" cannot be
freed until DSA says all the pending operations have completed.

Even without persistent memory, there are cases where you want the operations
to complete (mmap'd files, shared memory with other processes).

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Luck, Tony
>> So the driver needs to use flush/drain operations to make sure all
>> the in-flight work has completed before releasing/re-using the PASID.
>> 
> Are you suggesting we should let driver also hold a reference of the
> PASID?

The sequence for bare metal is:

process is queuing requests to DSA
process exits (either deliberately, or crashes, or is killed)
kernel does exit processing
DSA driver is called as part of tear down of "mm"
issues drain/flush commands to ensure that all
queued operations on the PASID for this mm have
completed
PASID can be freed

There's a 1:1 map from "mm" to PASID ... so reference counting seems
like overkill. Once the kernel is in the "exit" path, we know that no more
work can be queued using this PASID.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 5/7] x86/mmu: Allocate/free PASID

2020-04-28 Thread Luck, Tony
> If fd release cleans up then how should there be something in flight at
> the final mmdrop?

ENQCMD from the user is only synchronous in that it lets the user know their
request has been added to a queue (or not).  Execution of the request may happen
later (if the device is busy working on requests for other users).  The request 
will
take some time to complete. Someone told me the theoretical worst case once,
which I've since forgotten, but it can be a long time.

So the driver needs to use flush/drain operations to make sure all the in-flight
work has completed before releasing/re-using the PASID.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 6/7] x86/traps: Fix up invalid PASID

2020-04-27 Thread Luck, Tony
> Just for the record I also suggested to have a proper errorcode in the
> #GP for ENQCMD and I surely did not suggest to avoid decoding the user
> instructions.

Thomas,

Is the heuristic to avoid decoding the user instructions OK (you are just 
pointing
out that you should not be given credit for this part of the idea)?

Or are you saying that you'd like to see the instruction checked to see that it
was an ENQCMD?

-Tony

 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC] avoid indirect calls for DMA direct mappings v2

2018-12-11 Thread Luck, Tony
> But that might not be your fault. My ancient system is getting flaky. A v4.19 
> build that
> has booted before is also resetting :-(

After a power-cycle (and some time to let the machine cool off). System now 
boots
with your patch series plus the __phys_to_pfn() #define

So if you can figure the right way to fix that, you are good to go.

Tested-by: Tony Luck 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC] avoid indirect calls for DMA direct mappings v2

2018-12-11 Thread Luck, Tony
> This should fix it:
...
> +#include 

Not quite. Still have an issue with __phys_to_pfn(paddr)

Trying ti #include  gave we a raft of redefined
macros. So I just added

#define  __phys_to_pfn(paddr)PHYS_PFN(paddr)

to arch/ia64/mm/init.c

That made the build work. But boot spontaneously resets after:

mptsas: ioc1: attaching ssp device: fw_channel 0, fw_id 6, phy 6, sas_addr 
0x5000c5000ecada69
scsi 5:0:0:0: Direct-Access SEAGATE  ST9146802SS  0003 PQ: 0 ANSI: 5
EFI Variables Facility v0.08 2004-May-17
sd 5:0:0:0: [sdb] 286749488 512-byte logical blocks: (147 GB/137 GiB)
sd 5:0:0:0: [sdb] Write Protect is off
sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and 
FUA
 sdb: sdb1 sdb2
sd 5:0:0:0: [sdb] Attached SCSI disk

But that might not be your fault. My ancient system is getting flaky. A v4.19 
build that
has booted before is also resetting :-(

-Tony


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC] avoid indirect calls for DMA direct mappings v2

2018-12-10 Thread Luck, Tony
On Fri, Dec 07, 2018 at 11:07:05AM -0800, Christoph Hellwig wrote:
> This works is based on the dma-mapping tree, so you probably want to
> want this git tree for testing:
> 
> git://git.infradead.org/users/hch/misc.git dma-direct-calls.2

Pulled this tree. Got HEAD

33b9fc015171 ("dma-mapping: bypass indirect calls for dma-direct")

But the ia64 build fails with:

arch/ia64/mm/init.c:75:21: warning: 'enum dma_data_direction' declared inside 
parameter list [enabled by default]
arch/ia64/mm/init.c:75:21: warning: its scope is only this definition or 
declaration, which is probably not what you want [enabled by default]
arch/ia64/mm/init.c:75:40: error: parameter 4 ('dir') has incomplete type
arch/ia64/mm/init.c:74:6: error: function declaration isn't a prototype 
[-Werror=strict-prototypes]
arch/ia64/mm/init.c: In function 'arch_sync_dma_for_cpu':
arch/ia64/mm/init.c:77:2: error: implicit declaration of function 
'__phys_to_pfn' [-Werror=implicit-function-declaration]

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: misc ia64 cleanups (resend)

2018-09-20 Thread Luck, Tony
> A couple random cleanups I stumbled upon when doing dma related work.

Christoph,

Thanks.  Applied. Will send pull request for next merge window.

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu