Re: iommu_sva_bind_device question

2022-06-25 Thread Fenghua Yu
Hi, Jerry and Baolu,

On Fri, Jun 24, 2022 at 07:47:30AM -0700, Jerry Snitselaar wrote:
> > > > > > Hi Baolu & Dave,
> > > > fails.
> > > > 
> > > > You also will get the following warning if you don't have scalable
> > > > mode enabled (either not enabled by default, or if enabled by default
> > > > and passed intel_iommu=on,sm_off):
> > > 
> > > If scalable mode is disabled, iommu_dev_enable_feature(IOMMU_SVA) will
> > > return failure, hence driver should not call iommu_sva_bind_device().
> > > I guess below will disappear if above is fixed in the idxd driver.

Yes, Jerry's patch fixes the WARNING as well.

> > > 
> > > Best regards,
> > > baolu
> > >
> > 
> > It looks like there was a recent maintainer change, and Fenghua is now
> > the maintainer. Fenghua thoughts on this? With 42a1b73852c4
> > ("dmaengine: idxd: Separate user and kernel pasid enabling") the code
> > no longer depends on iommu_dev_feature_enable succeeding. Testing with
> > something like this works (ran dmatest without sm_on, and
> > dsa_user_test_runner.sh with sm_on, plus booting with various
> > intel_iommu= combinations):
> > 
> > diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> > index 355fb3ef4cbf..5b49fd5c1e25 100644
> > --- a/drivers/dma/idxd/init.c
> > +++ b/drivers/dma/idxd/init.c
> > @@ -514,13 +514,14 @@ static int idxd_probe(struct idxd_device *idxd)
> > if (IS_ENABLED(CONFIG_INTEL_IDXD_SVM) && sva) {
> > if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA))
> > dev_warn(dev, "Unable to turn on user SVA 
> > feature.\n");
> > -   else
> > +   else {
> > set_bit(IDXD_FLAG_USER_PASID_ENABLED, >flags);
> > 
> > -   if (idxd_enable_system_pasid(idxd))

Please add "{" after this if.

> > -   dev_warn(dev, "No in-kernel DMA with PASID.\n");
> > -   else
then "}" before this else.

> > -   set_bit(IDXD_FLAG_PASID_ENABLED, >flags);
> > +   if (idxd_enable_system_pasid(idxd))
> > +   dev_warn(dev, "No in-kernel DMA with 
> > PASID.\n");
> > +   else
> > +   set_bit(IDXD_FLAG_PASID_ENABLED, 
> > >flags);
> > +   }
> > } else if (!sva) {
> > dev_warn(dev, "User forced SVA off via module param.\n");
> > }

The patch was copied/pasted here. So the tabs are lost at beginning of each
line. So it cannot be applied. Please change the tabs back.

Could you please send this patch in a separate email so that it has a
right patch format and description and ready to be picked up?

> > 
> > The commit description is a bit confusing, because it talks about there
> > being no dependency, but ties user pasid to enabling/disabling the SVA
> > feature, which system pasid would depend on as well.
> > 
> > Regards,
> > Jerry
> 
> Things like that warning message "Unable to turn on user SVA feature" when
> iommu_dev_enable_feature fails though seems to be misleading with user
> inserted in there. I'll leave it to the idxd folks to figure out.

How about removing "user" from the warning message? So the message will
be "Unable to turn on SVA feature"?

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-29 Thread Fenghua Yu
Hi, Jean and Baolu,

On Fri, Apr 29, 2022 at 03:34:36PM +0100, Jean-Philippe Brucker wrote:
> On Fri, Apr 29, 2022 at 06:51:17AM -0700, Fenghua Yu wrote:
> > Hi, Baolu,
> > 
> > On Fri, Apr 29, 2022 at 03:53:57PM +0800, Baolu Lu wrote:
> > > On 2022/4/28 16:39, Jean-Philippe Brucker wrote:
> > > > > The address space is what the OOM killer is after.  That gets 
> > > > > refcounted
> > > > > with mmget()/mmput()/mm->mm_users.  The OOM killer is satiated by the
> > > > > page freeing done in __mmput()->exit_mmap().
> > > > > 
> > > > > Also, all the VMAs should be gone after exit_mmap().  So, even if
> > > > > vma->vm_file was holding a reference to a device driver, that 
> > > > > reference
> > > > > should be gone by the time __mmdrop() is actually freeing the PASID.
> > > > 
> > > > I agree with all that. The concern was about tearing down the PASID in 
> > > > the
> > > > IOMMU and device from the release() MMU notifier, which would happen in
> > > > exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And
> > > > since the IOMMU drivers need to hold mm->mm_count anyway between bind()
> > > > and unbind(), I think Fenghua's fix works.
> > > 
> > > But I didn't find mmgrab()/mmdrop() get called in both arm and intel
> > > IOMMU drivers.
> > > 
> > > $ git grep mmgrab drivers/iommu/
> > > [no output]
> > > 
> > > Do we need to add these in a separated fix patch, or I missed anything
> > > here?
> > 
> > On both ARM and X86, sva_bind() calls mmu_notifier_register()->mmgrab() and
> > sva_unbind() calls mmu_notifier_unregister()/mmu_notifier_put()->mmdrop().
> 
> Yes, although for Arm I realized the mmu_notifier grab wasn't sufficient
> so I sent a separate fix that should go in 5.18 as well
> https://lore.kernel.org/linux-iommu/20220426130444.300556-1-jean-phili...@linaro.org/
> The Arm driver still touches the arch mm context after mmu_notifier_put().
> I don't think X86 has that problem.

I think so too. On X86, the mm is not used after mmu_notifier_unregister().

In case of supervisor mode SVM (i.e. svm is bound to init_mm), the code
is right too because init_mm and its PASID cannot be dropped and
mmu_notifier_register()/mmu_notifier_unregister() are not called.

So I think no extra fix patch is needed on X86.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-29 Thread Fenghua Yu
Hi, Baolu,

On Fri, Apr 29, 2022 at 03:53:57PM +0800, Baolu Lu wrote:
> On 2022/4/28 16:39, Jean-Philippe Brucker wrote:
> > > The address space is what the OOM killer is after.  That gets refcounted
> > > with mmget()/mmput()/mm->mm_users.  The OOM killer is satiated by the
> > > page freeing done in __mmput()->exit_mmap().
> > > 
> > > Also, all the VMAs should be gone after exit_mmap().  So, even if
> > > vma->vm_file was holding a reference to a device driver, that reference
> > > should be gone by the time __mmdrop() is actually freeing the PASID.
> > 
> > I agree with all that. The concern was about tearing down the PASID in the
> > IOMMU and device from the release() MMU notifier, which would happen in
> > exit_mmap(). But doing the teardown at or before __mmdrop() is fine. And
> > since the IOMMU drivers need to hold mm->mm_count anyway between bind()
> > and unbind(), I think Fenghua's fix works.
> 
> But I didn't find mmgrab()/mmdrop() get called in both arm and intel
> IOMMU drivers.
> 
> $ git grep mmgrab drivers/iommu/
> [no output]
> 
> Do we need to add these in a separated fix patch, or I missed anything
> here?

On both ARM and X86, sva_bind() calls mmu_notifier_register()->mmgrab() and
sva_unbind() calls mmu_notifier_unregister()/mmu_notifier_put()->mmdrop().

So mm->mm_count are already counted in existing ARM and X86 binding and
unbinding. The fix patch just frees the PASID in __mmdrop() after
no more mm->mm_count.

There is no need to add extra  mmgrab() and mmdrop() pair.

Thanks.

-Fenghua

 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2] iommu/sva: Fix PASID use-after-free issue

2022-04-28 Thread Fenghua Yu
The PASID is being freed too early.  It needs to stay around until after
device drivers that might be using it have had a chance to clear it out
of the hardware.

As a reminder:

mmget() /mmput()  refcount the mm's address space
mmgrab()/mmdrop() refcount the mm itself

The PASID is currently tied to the life of the mm's address space and
freed in __mmput().  This makes logical sense because the PASID can't be
used once the address space is gone.

But, this misses an important point: even after the address space is
gone, the PASID will still be programmed into a device.  Device drivers
might, for instance, still need to flush operations that are outstanding
and need to use that PASID.  They do this at file->release() time.

Device drivers call the IOMMU driver to hold a reference on the mm itself
and drop it at file->release() time.  But, the IOMMU driver holds a
reference on the mm itself, not the address space.  The address space
(and the PASID) is long gone by the time the driver tries to clean up.
This is effectively a use-after-free bug on the PASID.

To fix this, move the PASID free operation from __mmput() to __mmdrop().
This ensures that the IOMMU driver's existing mmgrab() keeps the PASID
allocated until it drops its mm reference.

Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and 
free it on mm exit")

Reported-by: Zhangfei Gao 
Tested-by: Zhangfei Gao 
Suggested-by: Jean-Philippe Brucker 
Suggested-by: Jacob Pan 
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Fenghua Yu 
---

v2:
- Dave Hansen rewrites the change log.
- Add Tested-by: Zhangfei Gao 
- Add Reviewed-by: Jean-Philippe Brucker 

The original patch was posted and discussed in:
https://lore.kernel.org/lkml/ymdzffx7fn586...@fyu1.sc.intel.com/

 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..35a3beff140b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm)
mmu_notifier_subscriptions_destroy(mm);
check_mm(mm);
put_user_ns(mm->user_ns);
+   mm_pasid_drop(mm);
free_mm(mm);
 }
 EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm)
}
if (mm->binfmt)
module_put(mm->binfmt->module);
-   mm_pasid_drop(mm);
mmdrop(mm);
 }
 
-- 
2.32.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-28 Thread Fenghua Yu
Hi, Dave,

On Thu, Apr 28, 2022 at 08:09:04AM -0700, Dave Hansen wrote:
> On 4/25/22 21:20, Fenghua Yu wrote:
> >>From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001
> > From: Fenghua Yu 
> > Date: Fri, 15 Apr 2022 00:51:33 -0700
> > Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue
> > 
> > A PASID might be still used on ARM after it is freed in __mmput().
> 
> Is it really just ARM?

Actually it should happen on X86 as well. I will remove "on ARM" in the
changelog.

> 
> > process:
> > open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm
> > exit();
> > exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1
> > exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure
> > 
> > To avoid the use-after-free issue, free the PASID after no device uses it,
> > i.e. after all devices are unbound from the mm.
> > 
> > sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count.
> > __mmdrop() is called only after mm->mm_count is zero. So freeing the PASID
> > in __mmdrop() guarantees the PASID is safely freed only after no device
> > is bound to the mm.
> 
> Does this changelog work for everyone?
> 
> ==
> 
> tl;dr: The PASID is being freed too early.  It needs to stay around
> until after device drivers that might be using it have had a chance to
> clear it out of the hardware.
> 

Do you want me to change the changlog to add both this paragraph and the
following paragraph?

> --
> 
> As a reminder:
> 
> mmget() /mmput()  refcount the mm's address space
> mmgrab()/mmdrop() refcount the mm itself
> 
> The PASID is currently tied to the life of the mm's address space and
> freed in __mmput().  This makes logical sense because the PASID can't be
> used once the address space is gone.
> 
> But, this misses an important point: even after the address space is
> gone, the PASID will still be programmed into a device.  Device drivers
> might, for instance, still need to flush operations that are outstanding
> and need to use that PASID.  They do this at ->release() time.
> 
> Device drivers hold a reference on the mm itself and drop it at
> ->release() time.  But, the device driver holds a reference mm itself,
> not the address space.  The address space (and the PASID) is long gone
> by the time the driver tries to clean up.  This is effectively a
> use-after-free bug on the PASID.
> 
> To fix this, move the PASID free operation from __mmput() to __mmdrop().
>  This ensures that the device drivers' existing mmgrab() keeps the PASID
> allocated until they drop their mm reference.
> 

Thank you very much!

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-27 Thread Fenghua Yu
Hi, Dave and Jean,

On Tue, Apr 26, 2022 at 01:04:45PM +0800, Zhangfei Gao wrote:
> 
> 
> On 2022/4/26 下午12:20, Fenghua Yu wrote:
> > Hi, Jean and Zhangfei,
> > 
> > On Mon, Apr 25, 2022 at 05:13:02PM +0100, Jean-Philippe Brucker wrote:
> > > Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()?  For Arm
> > > we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
> > > also part of Lu's rework [1].
> > Is this a right fix for the issue? Could you please test it on ARM?
> > I don't have an ARM machine.
> > 
> > Thanks.
> > 
> > -Fenghua
> > 
> >  From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001
> > From: Fenghua Yu 
> > Date: Fri, 15 Apr 2022 00:51:33 -0700
> > Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue
> > 
> > A PASID might be still used on ARM after it is freed in __mmput().
> > 
> > process:
> > open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm
> > exit();
> > exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1
> > exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure
> > 
> > To avoid the use-after-free issue, free the PASID after no device uses it,
> > i.e. after all devices are unbound from the mm.
> > 
> > sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count.
> > __mmdrop() is called only after mm->mm_count is zero. So freeing the PASID
> > in __mmdrop() guarantees the PASID is safely freed only after no device
> > is bound to the mm.
> > 
> > Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation 
> > and free it on mm exit")
> > 
> > Reported-by: Zhangfei Gao 
> > Suggested-by: Jean-Philippe Brucker 
> > Suggested-by: Jacob Pan 
> > Signed-off-by: Fenghua Yu 
> Thanks for the fix.
> 
> Tested-by: Zhangfei Gao 
> 
> 
> > ---
> >   kernel/fork.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 9796897560ab..35a3beff140b 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm)
> > mmu_notifier_subscriptions_destroy(mm);
> > check_mm(mm);
> > put_user_ns(mm->user_ns);
> > +   mm_pasid_drop(mm);
> > free_mm(mm);
> >   }
> >   EXPORT_SYMBOL_GPL(__mmdrop);
> > @@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm)
> > }
> > if (mm->binfmt)
> > module_put(mm->binfmt->module);
> > -   mm_pasid_drop(mm);
> > mmdrop(mm);
> >   }
> 

Is this patch a good fix? Will you help push the fix into upstream?

Thank you very much!

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Fenghua Yu
On Tue, Apr 26, 2022 at 12:28:00PM +0800, Zhangfei Gao wrote:
> Hi, Jean
> 
> On 2022/4/26 上午12:13, Jean-Philippe Brucker wrote:
> > Hi Jacob,
> > 
> > On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote:
> > > Hi Jean-Philippe,
> > > 
> > > On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker
> > >  wrote:
> > > 
> > > > On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote:
> > > > > On 4/25/22 06:53, Jean-Philippe Brucker wrote:
> > > > > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com
> > > > > > wrote:
> > > > > > > > > On 5.17
> > > > > > > > > fops_release is called automatically, as well as
> > > > > > > > > iommu_sva_unbind_device. On 5.18-rc1.
> > > > > > > > > fops_release is not called, have to manually call close(fd)
> > > > > > > > Right that's weird
> > > > > > > Looks it is caused by the fix patch, via mmget, which may add
> > > > > > > refcount of fd.
> > > > > > Yes indirectly I think: when the process mmaps the queue,
> > > > > > mmap_region() takes a reference to the uacce fd. That reference is
> > > > > > released either by explicit close() or munmap(), or by exit_mmap()
> > > > > > (which is triggered by mmput()). Since there is an mm->fd 
> > > > > > dependency,
> > > > > > we cannot add a fd->mm dependency, so no mmget()/mmput() in
> > > > > > bind()/unbind().
> > > > > > 
> > > > > > I guess we should go back to refcounted PASIDs instead, to avoid
> > > > > > freeing them until unbind().
> > > > > Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
> > > > > nothing else simple we can do.
> > > > > 
> > > > > How does the IOMMU hardware know that all activity to a given PASID is
> > > > > finished?  That activity should, today, be independent of an mm or a
> > > > > fd's lifetime.
> > > > In the case of uacce, it's tied to the fd lifetime: opening an 
> > > > accelerator
> > > > queue calls iommu_sva_bind_device(), which sets up the PASID context in
> > > > the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
> > > > destroys the PASID context (after the device driver stopped all DMA for
> > > > this PASID).
> > > > 
> > > For VT-d, it is essentially the same flow except managed by the individual
> > > drivers such as DSA.
> > > If free() happens before unbind(), we deactivate the PASIDs and suppress
> > > faults from the device. When the unbind finally comes, we finalize the
> > > PASID teardown. It seems we have a need for an intermediate state where
> > > PASID is "pending free"?
> > Yes we do have that state, though I'm not sure we need to make it explicit
> > in the ioasid allocator.
> > 
> > Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()?  For Arm
> > we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
> > also part of Lu's rework [1].
> 
> Move mm_pasid_drop to __mmdrop looks workable.
> 
> The nginx works since ioasid is not freed when master exit until nginx stop.
> 
> The ioasid does not free immediately when fops_release->unbind finished.
> Instead, __mmdrop happens a bit lazy,  which has no issue though
> I passed 1 times exit without unbind test, the pasid allocation is ok.
> 
> Thanks
> 
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 9796897560ab..60f417f69367 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -792,6 +792,8 @@ void __mmdrop(struct mm_struct *mm)
>     mmu_notifier_subscriptions_destroy(mm);
>     check_mm(mm);
>     put_user_ns(mm->user_ns);
> +   mm_pasid_drop(mm);
>     free_mm(mm);
>  }
>  EXPORT_SYMBOL_GPL(__mmdrop);
> @@ -1190,7 +1192,6 @@ static inline void __mmput(struct mm_struct *mm)
>     }
>     if (mm->binfmt)
>     module_put(mm->binfmt->module);
> -   mm_pasid_drop(mm);
>     mmdrop(mm);
>  }

Thank you very much, Zhangfei!

I just now sent out an identical patch. It works on X86 as well.

So seems the patch is the right fix.

Either you can send out the patch or I add your Signed-off-by? Either way
is OK for me.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Fenghua Yu
Hi, Jean and Zhangfei,

On Mon, Apr 25, 2022 at 05:13:02PM +0100, Jean-Philippe Brucker wrote:
> Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()?  For Arm
> we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
> also part of Lu's rework [1].

Is this a right fix for the issue? Could you please test it on ARM?
I don't have an ARM machine.

Thanks.

-Fenghua

>From 84aa68f6174439d863c40cdc2db0e1b89d620dd0 Mon Sep 17 00:00:00 2001
From: Fenghua Yu 
Date: Fri, 15 Apr 2022 00:51:33 -0700
Subject: [PATCH] iommu/sva: Fix PASID use-after-free issue

A PASID might be still used on ARM after it is freed in __mmput().

process:
open()->sva_bind()->ioasid_alloc() = N; // Get PASID N for the mm
exit();
exit_mm()->__mmput()->mm_pasid_drop()->mm->pasid = -1; // PASID -1
exit_files()->release(dev)->sva_unbind()->use mm->pasid; // Failure

To avoid the use-after-free issue, free the PASID after no device uses it,
i.e. after all devices are unbound from the mm.

sva_bind()/sva_unbind() call mmgrab()/mmdrop() to track mm->mm_count.
__mmdrop() is called only after mm->mm_count is zero. So freeing the PASID
in __mmdrop() guarantees the PASID is safely freed only after no device
is bound to the mm.

Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and 
free it on mm exit")

Reported-by: Zhangfei Gao 
Suggested-by: Jean-Philippe Brucker 
Suggested-by: Jacob Pan 
Signed-off-by: Fenghua Yu 
---
 kernel/fork.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab..35a3beff140b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -792,6 +792,7 @@ void __mmdrop(struct mm_struct *mm)
mmu_notifier_subscriptions_destroy(mm);
check_mm(mm);
put_user_ns(mm->user_ns);
+   mm_pasid_drop(mm);
free_mm(mm);
 }
 EXPORT_SYMBOL_GPL(__mmdrop);
@@ -1190,7 +1191,6 @@ static inline void __mmput(struct mm_struct *mm)
}
if (mm->binfmt)
module_put(mm->binfmt->module);
-   mm_pasid_drop(mm);
mmdrop(mm);
 }
 
-- 
2.32.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-15 Thread Fenghua Yu
Hi, Zhangfei,

On Fri, Apr 15, 2022 at 07:52:03PM +0800, zhangfei@foxmail.com wrote:
> > On my X86 machine, nginx doesn't trigger the kernel sva binding function
> > to allocate ioasid. I tried pre- nstalled nginx/openssl and also tried my 
> > built
> > a few versions of nginx/openssl. nginx does call OPENSSL_init_ssl() but
> > doesn't go to the binding function. Don't know if it's my configuration 
> > issue.
> > Maybe you can give me some advice?
> I am using openssl engine, which use crypto driver and using sva via uacce.
> nginx -> openssl -> openssl engine -> sva related.

uacce is not used on X86. That's why I cannot test IOASID/PASID by nginx
on X86.

I only can test the RFC patch by other test tools via IDXD driver which uses
PASID on X86.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-15 Thread Fenghua Yu
Hi, Zhangfei,

On Fri, Apr 15, 2022 at 07:52:03PM +0800, zhangfei@foxmail.com wrote:
> 
> 
> On 2022/4/15 下午6:50, Fenghua Yu wrote:
> > Hi, Zhangfei,
> > 
> > On Fri, Apr 15, 2022 at 06:14:09PM +0800, zhangfei@foxmail.com wrote:
> > I download this patch from:
> > https://lore.kernel.org/lkml/ylladl6umolll...@fyu1.sc.intel.com/raw
> > git am to either v5.18-rc2 or the latest upstream without any issue.
> It is my copy paste issue.
> 
> I have tested, nginx woks well.

Great!

> 
> Other than the following issue,
> Each time /sbin/nginx will alloc ioasid but not free.
> which I think it maybe nginx issue or the mis-usage, will ask there.

Which nginx/openssl function is supposed to call kernel sva_unbind? I couldn't
find the function in nginx tree.

If nginx doesn't free ioasid, it will cause ioasid leak and memory leak.

> 
> Tested-by: Zhangfei Gao 

Thank you for your testing!

> 
> > 
> > > It should work for arm.
> > > 
> > > In fact I have a similar patch at hand but pending since I found an issue.
> > > 
> > > I start & stop nginx via this cmd.
> > > //start
> > > sudo sbin/nginx                    // this alloc an ioasid=1
> > > //stop
> > > sudo sbin/nginx -s quit    // this does not free ioasid=1, but still alloc
> > > ioasid=2.
> > > So ioasid will keep allocated but not freed if continue start/stop nginx,
> > > though not impact the nginx function.
> > > 
> > > stop nginx with -s quit still calls
> > > src/core/nginx.c
> > > main -> ngx_ssl_init -> openssl engine:    bind_fn -> ... -> alloc asid
> > > But openssl engine: ENGINE_free is not called
> > > 
> > > Still in checking nginx code.
> > > 
> > > Or do you test with nginx?
> > On my X86 machine, nginx doesn't trigger the kernel sva binding function
> > to allocate ioasid. I tried pre- nstalled nginx/openssl and also tried my 
> > built
> > a few versions of nginx/openssl. nginx does call OPENSSL_init_ssl() but
> > doesn't go to the binding function. Don't know if it's my configuration 
> > issue.
> > Maybe you can give me some advice?
> I am using openssl engine, which use crypto driver and using sva via uacce.
> nginx -> openssl -> openssl engine -> sva related.

I'll do more nginx experiments.

> 
> > 
> > I test the patch with a few internal test tools and observe mmget()/mmput()
> > works fine in various cases.
> OK, thanks

Thank you very much!

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-15 Thread Fenghua Yu
Hi, Zhangfei,

On Fri, Apr 15, 2022 at 06:14:09PM +0800, zhangfei@foxmail.com wrote:
> 
> 
> On 2022/4/15 下午5:51, Fenghua Yu wrote:
> > On Thu, Apr 14, 2022 at 06:08:09PM +0800, zhangfei@foxmail.com wrote:
> > > On 2022/4/12 下午11:35, zhangfei@foxmail.com wrote:
> > > > Hi, Fenghua
> > > > 
> > > > On 2022/4/12 下午9:41, Fenghua Yu wrote:
> >  From a6444e1e5bd8076f5e5c5e950d3192de327f0c9c Mon Sep 17 00:00:00 2001
> > From: Fenghua Yu 
> > Date: Fri, 15 Apr 2022 00:51:33 -0700
> > Subject: [RFC PATCH] iommu/sva: Fix PASID use-after-free issue
> > 
> > A PASID might be still used even though it is freed on mm exit.
> > 
> > process A:
> > sva_bind();
> > ioasid_alloc() = N; // Get PASID N for the mm
> > fork(): // spawn process B
> > exit();
> > ioasid_free(N);
> > 
> > process B:
> > device uses PASID N -> failure
> > sva_unbind();
> > 
> > Dave Hansen suggests to take a refcount on the mm whenever binding the
> > PASID to a device and drop the refcount on unbinding. The mm won't be
> > dropped if the PASID is still bound to it.
> > 
> > Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation 
> > and free it on mm exit")
> > 
> > Reported-by: Zhangfei Gao 
> > Suggested-by: Dave Hansen" 
> > Signed-off-by: Fenghua Yu 
> > ---
> >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 6 ++
> >   drivers/iommu/intel/svm.c   | 4 
> >   2 files changed, 10 insertions(+)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
> > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > index 22ddd05bbdcd..3fcb842a0df0 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > @@ -7,6 +7,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include "arm-smmu-v3.h"
> >   #include "../../iommu-sva-lib.h"
> > @@ -363,6 +364,9 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct 
> > *mm, void *drvdata)
> > mutex_lock(_lock);
> > handle = __arm_smmu_sva_bind(dev, mm);
> > +   /* Take an mm refcount on a successful bind. */
> > +   if (!IS_ERR(handle))
> > +   mmget(mm);
> > mutex_unlock(_lock);
> > return handle;
> >   }
> > @@ -372,6 +376,8 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle)
> > struct arm_smmu_bond *bond = sva_to_bond(handle);
> > mutex_lock(_lock);
> > +   /* Drop an mm refcount. */
> > +   mmput(bond->mm);
> > if (refcount_dec_and_test(>refs)) {
> > list_del(>list);
> > arm_smmu_mmu_notifier_put(bond->smmu_mn);
> > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> > index 23a38763c1d1..345a0d5d7922 100644
> > --- a/drivers/iommu/intel/svm.c
> > +++ b/drivers/iommu/intel/svm.c
> > @@ -403,6 +403,8 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
> > intel_iommu *iommu,
> > goto free_sdev;
> > list_add_rcu(>list, >devs);
> > +   /* Take an mm refcount on binding mm. */
> > +   mmget(mm);
> >   success:
> > return >sva;
> > @@ -465,6 +467,8 @@ static int intel_svm_unbind_mm(struct device *dev, u32 
> > pasid)
> > kfree(svm);
> > }
> > }
> > +   /* Drop an mm reference on unbinding mm. */
> > +   mmput(mm);
> > }
> >   out:
> > return ret;
> This patch can not be applied on 5.18-rc2 for intel part.

What error do you see? Could you please send to me errors?

I download this patch from:
https://lore.kernel.org/lkml/ylladl6umolll...@fyu1.sc.intel.com/raw
git am to either v5.18-rc2 or the latest upstream without any issue.

> It should work for arm.
> 
> In fact I have a similar patch at hand but pending since I found an issue.
> 
> I start & stop nginx via this cmd.
> //start
> sudo sbin/nginx                    // this alloc an ioasid=1
> //stop
> sudo sbin/nginx -s quit    // this does not free ioasid=1, but still alloc
> ioasid=2.
> So ioasid will keep allocated but not freed if continue start/stop nginx, 
> though not impact the nginx function.
> 
> stop nginx with -s quit still calls
> src/core/nginx.c
> main -> ngx_ssl_init -> openssl engine:    bind_fn -> ... -> alloc asid
> But openssl engine: ENGINE_free is not called
> 
> Still in checking nginx code.
> 
> Or do you test with nginx?

On my X86 machine, nginx doesn't trigger the kernel sva binding function
to allocate ioasid. I tried pre- nstalled nginx/openssl and also tried my built
a few versions of nginx/openssl. nginx does call OPENSSL_init_ssl() but
doesn't go to the binding function. Don't know if it's my configuration issue.
Maybe you can give me some advice?

I test the patch with a few internal test tools and observe mmget()/mmput()
works fine in various cases.

Thanks.

-Fenghua 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-15 Thread Fenghua Yu
Hi, Dave,

On Tue, Apr 12, 2022 at 07:39:10AM -0700, Dave Hansen wrote:
> On 4/12/22 06:41, Fenghua Yu wrote:
> >> master process quit, mmput ->  mm_pasid_drop->ioasid_free
> >> But this ignore driver's iommu_sva_unbind_device function,
> >> iommu_sva_bind_device and iommu_sva_unbind_device are not pair,  So driver
> >> does not know ioasid is freed.
> >>
> >> Any suggestion?
> > ioasid is per process or per mm. A daemon process shouldn't share the same 
> > ioasid with any other process with even its parent process. Its parent gets
> > an ioasid and frees it on exit. The ioasid is gone and shouldn't be used
> > by its child process.
> > 
> > Each daemon process should call driver -> iommu_sva_bind_device -> 
> > ioasid_alloc
> > to get its own ioasid/PASID. On daemon quit, the ioasid is freed.
> > 
> > That means nqnix needs to be changed.
> 
> Fenghua, please step back for a second and look at what you are saying.
>  Your patch caused userspace to break.  Now, you're telling someone that
> they need to go change that userspace to work around something that your
> patch.  How, exactly, are you suggesting that nginx could change to fix
> this?  What, specifically, was it doing with *fork()* that was wrong?
> 
> It sounds to me like you're saying that it's OK to break userspace.

You are right. The patch should not break userspace. I follow your
suggestion to fix the issue by mmget() in binding and mmput() in unbinding.
The RFC patch was sent out in another thread. Please review it.

Thank you very much for your advice.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-15 Thread Fenghua Yu
On Thu, Apr 14, 2022 at 06:08:09PM +0800, zhangfei@foxmail.com wrote:
> 
> On 2022/4/12 下午11:35, zhangfei@foxmail.com wrote:
> > Hi, Fenghua
> > 
> > On 2022/4/12 下午9:41, Fenghua Yu wrote:
> > > Hi, Zhangfei,
> > > 
> > > On Tue, Apr 12, 2022 at 03:04:09PM +0800, zhangfei@foxmail.com
> > > wrote:
> > > > 
> > > > On 2022/4/11 下午10:52, Dave Hansen wrote:
> > > > > On 4/11/22 07:44, zhangfei@foxmail.com wrote:
> > > > > > On 2022/4/11 下午10:36, Dave Hansen wrote:
> > > > > > > On 4/11/22 07:20, zhangfei@foxmail.com wrote:
> > Agree with Dave, I think user space should not be broken.
> > 
> > Thanks
> 
> Any plan about this regression?
> Currently I need this patch to workaround the issue.
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index 22ddd05bbdcd..2d74ac53d11c 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -4,6 +4,7 @@
>   */
> 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -363,6 +364,7 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct
> *mm, void *drvdata)
> 
>     mutex_lock(_lock);
>     handle = __arm_smmu_sva_bind(dev, mm);
> +   mmget(mm);
>     mutex_unlock(_lock);
>     return handle;
>  }
> @@ -377,6 +379,7 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle)
>     arm_smmu_mmu_notifier_put(bond->smmu_mn);
>     kfree(bond);
>     }
> +       mmput(bond->mm);
>     mutex_unlock(_lock);
>  }

Could you please review and/or test this patch? It's supposed to fix
the PASID issue on both ARM and X86.

From a6444e1e5bd8076f5e5c5e950d3192de327f0c9c Mon Sep 17 00:00:00 2001
From: Fenghua Yu 
Date: Fri, 15 Apr 2022 00:51:33 -0700
Subject: [RFC PATCH] iommu/sva: Fix PASID use-after-free issue

A PASID might be still used even though it is freed on mm exit.

process A:
sva_bind();
ioasid_alloc() = N; // Get PASID N for the mm
fork(): // spawn process B
exit();
ioasid_free(N);

process B:
device uses PASID N -> failure
sva_unbind();

Dave Hansen suggests to take a refcount on the mm whenever binding the
PASID to a device and drop the refcount on unbinding. The mm won't be
dropped if the PASID is still bound to it.

Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and 
free it on mm exit")

Reported-by: Zhangfei Gao 
Suggested-by: Dave Hansen" 
Signed-off-by: Fenghua Yu 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 6 ++
 drivers/iommu/intel/svm.c   | 4 
 2 files changed, 10 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 22ddd05bbdcd..3fcb842a0df0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "arm-smmu-v3.h"
 #include "../../iommu-sva-lib.h"
@@ -363,6 +364,9 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 
mutex_lock(_lock);
handle = __arm_smmu_sva_bind(dev, mm);
+   /* Take an mm refcount on a successful bind. */
+   if (!IS_ERR(handle))
+   mmget(mm);
mutex_unlock(_lock);
return handle;
 }
@@ -372,6 +376,8 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle)
struct arm_smmu_bond *bond = sva_to_bond(handle);
 
mutex_lock(_lock);
+   /* Drop an mm refcount. */
+   mmput(bond->mm);
if (refcount_dec_and_test(>refs)) {
list_del(>list);
arm_smmu_mmu_notifier_put(bond->smmu_mn);
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 23a38763c1d1..345a0d5d7922 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -403,6 +403,8 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
intel_iommu *iommu,
goto free_sdev;
 
list_add_rcu(>list, >devs);
+   /* Take an mm refcount on binding mm. */
+   mmget(mm);
 success:
return >sva;
 
@@ -465,6 +467,8 @@ static int intel_svm_unbind_mm(struct device *dev, u32 
pasid)
kfree(svm);
}
}
+   /* Drop an mm reference on unbinding mm. */
+   mmput(mm);
}
 out:
return ret;
-- 
2.32.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-12 Thread Fenghua Yu
Hi, Zhangfei,

On Tue, Apr 12, 2022 at 03:04:09PM +0800, zhangfei@foxmail.com wrote:
> 
> 
> On 2022/4/11 下午10:52, Dave Hansen wrote:
> > On 4/11/22 07:44, zhangfei@foxmail.com wrote:
> > > On 2022/4/11 下午10:36, Dave Hansen wrote:
> > > > On 4/11/22 07:20, zhangfei@foxmail.com wrote:
> > > > > > Is there nothing before this call trace?  Usually there will be at 
> > > > > > least
> > > > > > some warning text.
> > > > > I added dump_stack() in ioasid_free.
> > > > Hold on a sec, though...
> > > > 
> > > > What's the *problem* here?  Did something break or are you just saying
> > > > that something looks weird to _you_?
> > > After this, nginx is not working at all, and hardware reports error.
> > > Suppose the the master use the ioasid for init, but got freed.
> > > 
> > > hardware reports:
> > > [  152.731869] hisi_sec2 :76:00.0: qm_acc_do_task_timeout [error 
> > > status=0x20] found
> > > [  152.739657] hisi_sec2 :76:00.0: qm_acc_wb_not_ready_timeout [error 
> > > status=0x40] found
> > > [  152.747877] hisi_sec2 :76:00.0: sec_fsm_hbeat_rint [error 
> > > status=0x20] found
> > > [  152.755340] hisi_sec2 :76:00.0: Controller resetting...
> > > [  152.762044] hisi_sec2 :76:00.0: QM mailbox operation timeout!
> > > [  152.768198] hisi_sec2 :76:00.0: Failed to dump sqc!
> > > [  152.773490] hisi_sec2 :76:00.0: Failed to drain out data for 
> > > stopping!
> > > [  152.781426] hisi_sec2 :76:00.0: QM mailbox is busy to start!
> > > [  152.787468] hisi_sec2 :76:00.0: Failed to dump sqc!
> > > [  152.792753] hisi_sec2 :76:00.0: Failed to drain out data for 
> > > stopping!
> > > [  152.800685] hisi_sec2 :76:00.0: QM mailbox is busy to start!
> > > [  152.806730] hisi_sec2 :76:00.0: Failed to dump sqc!
> > > [  152.812017] hisi_sec2 :76:00.0: Failed to drain out data for 
> > > stopping!
> > > [  152.819946] hisi_sec2 :76:00.0: QM mailbox is busy to start!
> > > [  152.825992] hisi_sec2 :76:00.0: Failed to dump sqc!
> > That would have been awfully handy information to have in an initial bug 
> > report. :)
> > Is there a chance you could dump out that ioasid alloc *and* free 
> > information in ioasid_alloc/free()?  This could be some kind of problem 
> > with the allocator, or with copying the ioasid at fork.
> The issue is nginx master process init resource, start daemon process, then
> master process quit and free ioasid.
> The daemon nginx process is not the original master process.
> 
> master process:  init resource
> driver -> iommu_sva_bind_device -> ioasid_alloc

Which code in the master process/daemon calls driver->iommu_sva_unbind_device?

> 
> nginx : ngx_daemon
> fork daemon, without add mm's refcount.
> 
> src/os/unix/ngx_daemon.c
> ngx_daemon(ngx_log_t *log)
> {
>     int  fd;
> 
>     switch (fork()) {
>     case -1:
>     ngx_log_error(NGX_LOG_EMERG, log, ngx_errno, "fork() failed");
>     return NGX_ERROR;
> 
>     case 0:        // here master process is quit directly and will be
> released.
>     break;
> 
>     default:
>     exit(0);
>     }
> // here daemon process take control.
>     ngx_parent = ngx_pid;
>     ngx_pid = ngx_getpid();
> 
> 
> fork.c
> copy_mm
>     if (clone_flags & CLONE_VM) {
>     mmget(oldmm);
>     mm = oldmm;
>     } else {
>     mm = dup_mm(tsk, current->mm);    // here daemon process
> handling without mmget.
> 
> master process quit, mmput ->  mm_pasid_drop->ioasid_free
> But this ignore driver's iommu_sva_unbind_device function,
> iommu_sva_bind_device and iommu_sva_unbind_device are not pair,  So driver
> does not know ioasid is freed.
> 
> Any suggestion?

ioasid is per process or per mm. A daemon process shouldn't share the same 
ioasid with any other process with even its parent process. Its parent gets
an ioasid and frees it on exit. The ioasid is gone and shouldn't be used
by its child process.

Each daemon process should call driver -> iommu_sva_bind_device -> ioasid_alloc
to get its own ioasid/PASID. On daemon quit, the ioasid is freed.

That means nqnix needs to be changed.

> Or can we still use the original ioasid refcount mechanism?
> 

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 00/11] Re-enable ENQCMD and PASID MSR

2022-02-11 Thread Fenghua Yu
Hi, Thomas,

On Mon, Feb 07, 2022 at 03:02:43PM -0800, Fenghua Yu wrote:
> Problems in the old code to manage SVM (Shared Virtual Memory) devices
> and the PASID (Process Address Space ID) led to that code being
> disabled.
> 
> Subsequent discussions resulted in a far simpler approach:
> 
> 1) PASID life cycle is from first allocation by a process until that
>process exits.
> 2) All tasks begin with PASID disabled
> 3) The #GP fault handler tries to fix faulting ENQCMD instructions very
>early (thus avoiding complexities of the XSAVE infrastructure)
> 
> Change Log:
> v4:
> - Update commit message in patch #4 (Thomas).
> - Update commit message in patch #5 (Thomas).
> - Add "Reviewed-by: Thomas Gleixner " in patch #1-#3
>   and patch #6-#9 (Thomas).
> - Rebased to 5.17-rc3.

A friendly reminder. Any comment on this series? Will you pick up this
series in tip?

Thank you very much!

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-10 Thread Fenghua Yu
Hi, Tony,

On Thu, Feb 10, 2022 at 10:31:42AM -0800, Fenghua Yu wrote:
> 
> On Thu, Feb 10, 2022 at 09:24:50AM -0800, Luck, Tony wrote:
> > On Thu, Feb 10, 2022 at 08:27:50AM -0800, Fenghua Yu wrote:
> > > Hi, Jacob,
> > > 
> > > On Wed, Feb 09, 2022 at 07:16:14PM -0800, Jacob Pan wrote:
> > > > Hi Fenghua,
> > > > 
> > > > On Mon,  7 Feb 2022 15:02:48 -0800, Fenghua Yu  
> > > > wrote:
> > > > 
> > > > > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device
> > > > > *dev, struct mm_struct *mm, void }
> > > > >  
> > > > >   sva = intel_svm_bind_mm(iommu, dev, mm, flags);
> > > > > - if (IS_ERR_OR_NULL(sva))
> > > > > - intel_svm_free_pasid(mm);
> > > > If bind fails, the PASID has no IOMMU nor CPU context. It should be 
> > > > safe to
> > > > free here.
> > > 
> > > The PASID can not be freed even if bind fails. The PASID allocated earlier
> > > (either in this thread or in another thread) might be populated to other
> > > threads already and being used now.
> > > 
> > > Without freeing the PASID on bind failure, the worst case is the PASID 
> > > might
> > > not be used in the process (and will be freed on process exit anyway).
> > > 
> > > This all matches with the PASID life time described in the commit message.
> > 
> > But what does this mean for the user that failed that intel_svm_bind_mm()?
> > 
> 
> That means the user may have a PASID but there is no PASID entry for the
> device. Then ENQCMD cannot be executed successfully.
> 
> > Here's a scenario:
> > 
> > Process sets up to use PASID capable device #1. Everything works,
> > so the process has mm->pasid, and the IOMMU has the tables to map
> > virtual addresses coming from device #1 using that PASID.
> > 
> > Now the same process asks to start using PASID capable device #2,
> > but there is a failure at intel_svm_bind_mm().
> > 
> > Fenghua is right that we shouldn't free the PASID. It is in use
> > by at least one thread of the process to access device #1.
> > 
> > But what happens with device #2? Does the caller of intel_svm_bind()
> > do the right thing with the IS_ERR_OR_NULL return value to let the
> > user know that device #2 isn't accessible?
> 
> A driver caller of intel_svm_bind() should handle this failure, i.e. fail
> the binding and let the user know the failure.
> 
> Even if the driver doesn't do the right thing to handle the failure,
> intel_svm_bind() doesn't set up a PASID entry for device #2.
> 
> One example is IDXD driver. User calls open()->IDXD driver idxd_cdev_open()
> ->intel_svm_bind()->intel_svm_bind_mm(). idxd_cdev_open() gets failed "sva"
> and passes the PTR_ERR(sva) to the user and the user cannot open the device.
> Due to the failure, no PASID entry is set up for the device.
> 
> Even if the user ignores the open() failure and tries to run ENQCMD on
> device #2, a PASID table fault will be generated due to no PASID entry
> for the device and the ENQCMD execution will fail.

Plus, the above behavior of handling intel_svm_bind_mm() failure is expected
right behavior and the current series doesn't need to be changed.

Thanks.
 
-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-10 Thread Fenghua Yu
On Thu, Feb 10, 2022 at 10:49:04AM -0800, Jacob Pan wrote:
> 
> On Wed, 9 Feb 2022 19:16:14 -0800, Jacob Pan
>  wrote:
> 
> > Hi Fenghua,
> > 
> > On Mon,  7 Feb 2022 15:02:48 -0800, Fenghua Yu 
> > wrote:
> > 
> > > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device
> > > *dev, struct mm_struct *mm, void }
> > >  
> > >   sva = intel_svm_bind_mm(iommu, dev, mm, flags);
> > > - if (IS_ERR_OR_NULL(sva))
> > > - intel_svm_free_pasid(mm);  
> > If bind fails, the PASID has no IOMMU nor CPU context. It should be safe
> > to free here.
> > 
> Actually, here we cannot tell if the bind is the first of the mm so I think
> this is fine.
> 
> Reviewed-by: Jacob Pan 

Thank you very much for your review, Jacob!

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-10 Thread Fenghua Yu
Hi, Tony,

On Thu, Feb 10, 2022 at 09:24:50AM -0800, Luck, Tony wrote:
> On Thu, Feb 10, 2022 at 08:27:50AM -0800, Fenghua Yu wrote:
> > Hi, Jacob,
> > 
> > On Wed, Feb 09, 2022 at 07:16:14PM -0800, Jacob Pan wrote:
> > > Hi Fenghua,
> > > 
> > > On Mon,  7 Feb 2022 15:02:48 -0800, Fenghua Yu  
> > > wrote:
> > > 
> > > > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device
> > > > *dev, struct mm_struct *mm, void }
> > > >  
> > > > sva = intel_svm_bind_mm(iommu, dev, mm, flags);
> > > > -   if (IS_ERR_OR_NULL(sva))
> > > > -   intel_svm_free_pasid(mm);
> > > If bind fails, the PASID has no IOMMU nor CPU context. It should be safe 
> > > to
> > > free here.
> > 
> > The PASID can not be freed even if bind fails. The PASID allocated earlier
> > (either in this thread or in another thread) might be populated to other
> > threads already and being used now.
> > 
> > Without freeing the PASID on bind failure, the worst case is the PASID might
> > not be used in the process (and will be freed on process exit anyway).
> > 
> > This all matches with the PASID life time described in the commit message.
> 
> But what does this mean for the user that failed that intel_svm_bind_mm()?
> 

That means the user may have a PASID but there is no PASID entry for the
device. Then ENQCMD cannot be executed successfully.

> Here's a scenario:
> 
> Process sets up to use PASID capable device #1. Everything works,
> so the process has mm->pasid, and the IOMMU has the tables to map
> virtual addresses coming from device #1 using that PASID.
> 
> Now the same process asks to start using PASID capable device #2,
> but there is a failure at intel_svm_bind_mm().
> 
> Fenghua is right that we shouldn't free the PASID. It is in use
> by at least one thread of the process to access device #1.
> 
> But what happens with device #2? Does the caller of intel_svm_bind()
> do the right thing with the IS_ERR_OR_NULL return value to let the
> user know that device #2 isn't accessible?

A driver caller of intel_svm_bind() should handle this failure, i.e. fail
the binding and let the user know the failure.

Even if the driver doesn't do the right thing to handle the failure,
intel_svm_bind() doesn't set up a PASID entry for device #2.

One example is IDXD driver. User calls open()->IDXD driver idxd_cdev_open()
->intel_svm_bind()->intel_svm_bind_mm(). idxd_cdev_open() gets failed "sva"
and passes the PTR_ERR(sva) to the user and the user cannot open the device.
Due to the failure, no PASID entry is set up for the device.

Even if the user ignores the open() failure and tries to run ENQCMD on
device #2, a PASID table fault will be generated due to no PASID entry
for the device and the ENQCMD execution will fail.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-10 Thread Fenghua Yu
Hi, Jacob,

On Wed, Feb 09, 2022 at 07:16:14PM -0800, Jacob Pan wrote:
> Hi Fenghua,
> 
> On Mon,  7 Feb 2022 15:02:48 -0800, Fenghua Yu  wrote:
> 
> > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device
> > *dev, struct mm_struct *mm, void }
> >  
> > sva = intel_svm_bind_mm(iommu, dev, mm, flags);
> > -   if (IS_ERR_OR_NULL(sva))
> > -   intel_svm_free_pasid(mm);
> If bind fails, the PASID has no IOMMU nor CPU context. It should be safe to
> free here.

The PASID can not be freed even if bind fails. The PASID allocated earlier
(either in this thread or in another thread) might be populated to other
threads already and being used now.

Without freeing the PASID on bind failure, the worst case is the PASID might
not be used in the process (and will be freed on process exit anyway).

This all matches with the PASID life time described in the commit message.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-08 Thread Fenghua Yu
On Tue, Feb 08, 2022 at 10:41:39AM +0800, Lu Baolu wrote:
> On 2/8/22 7:02 AM, Fenghua Yu wrote:
> > PASIDs are process wide. It was attempted to use refcounted PASIDs to
> > free them when the last thread drops the refcount. This turned out to
> > be complex and error prone. Given the fact that the PASID space is 20
> > bits, which allows up to 1M processes to have a PASID associated
> > concurrently, PASID resource exhaustion is not a realistic concern.
> > 
> > Therefore it was decided to simplify the approach and stick with lazy
> > on demand PASID allocation, but drop the eager free approach and make
> > a allocated PASID lifetime bound to the life time of the process.
> > 
> > Get rid of the refcounting mechanisms and replace/rename the interfaces
> > to reflect this new approach.
> > 
> > Suggested-by: Dave Hansen 
> > Signed-off-by: Fenghua Yu 
> > Reviewed-by: Tony Luck 
> > ---
> > v4:
> > - Update the commit message (Thomas).
> 
> Reviewed-by: Lu Baolu 

Thank you very much for your review, Baolu!

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 08/11] x86/traps: Demand-populate PASID MSR via #GP

2022-02-07 Thread Fenghua Yu
All tasks start with PASID state disabled. This means that the first
time they execute an ENQCMD instruction they will take a #GP fault.

Modify the #GP fault handler to check if the "mm" for the task has
already been allocated a PASID. If so, try to fix the #GP fault by
loading the IA32_PASID MSR.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Thomas Gleixner 
---
v4:
- Add "Reviewed-by: Thomas Gleixner " (Thomas).

v2:
- Directly write IA32_PASID MSR in fixup while local IRQ is still disabled
  (Thomas)
- Move #ifdef over to CONFIG_IOMMU_SVA since it is what
  defines mm->pasid and ->pasid_activated (Dave Hansen).
- Rename try_fixup_pasid() -> try_fixup_enqcmd_gp(). This
  code really is highly specific to ENQCMD, not PASIDs (Dave Hansen).
- Add lockdep assert and comment about context (Dave Hansen).
- Re-flow the if() mess (Dave Hansen).

 arch/x86/kernel/traps.c | 55 +
 1 file changed, 55 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c9d566dcf89a..7ef00dee35be 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -559,6 +560,57 @@ static bool fixup_iopl_exception(struct pt_regs *regs)
return true;
 }
 
+/*
+ * The unprivileged ENQCMD instruction generates #GPs if the
+ * IA32_PASID MSR has not been populated.  If possible, populate
+ * the MSR from a PASID previously allocated to the mm.
+ */
+static bool try_fixup_enqcmd_gp(void)
+{
+#ifdef CONFIG_IOMMU_SVA
+   u32 pasid;
+
+   /*
+* MSR_IA32_PASID is managed using XSAVE.  Directly
+* writing to the MSR is only possible when fpregs
+* are valid and the fpstate is not.  This is
+* guaranteed when handling a userspace exception
+* in *before* interrupts are re-enabled.
+*/
+   lockdep_assert_irqs_disabled();
+
+   /*
+* Hardware without ENQCMD will not generate
+* #GPs that can be fixed up here.
+*/
+   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+   return false;
+
+   pasid = current->mm->pasid;
+
+   /*
+* If the mm has not been allocated a
+* PASID, the #GP can not be fixed up.
+*/
+   if (!pasid_valid(pasid))
+   return false;
+
+   /*
+* Did this thread already have its PASID activated?
+* If so, the #GP must be from something else.
+*/
+   if (current->pasid_activated)
+   return false;
+
+   wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
+   current->pasid_activated = 1;
+
+   return true;
+#else
+   return false;
+#endif
+}
+
 DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 {
char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
@@ -567,6 +619,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
unsigned long gp_addr;
int ret;
 
+   if (user_mode(regs) && try_fixup_enqcmd_gp())
+   return;
+
cond_local_irq_enable(regs);
 
if (static_cpu_has(X86_FEATURE_UMIP)) {
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 07/11] sched: Define and initialize a flag to identify valid PASID in the task

2022-02-07 Thread Fenghua Yu
From: Peter Zijlstra 

Add a new single bit field to the task structure to track whether this task
has initialized the IA32_PASID MSR to the mm's PASID.

Initialize the field to zero when creating a new task with fork/clone.

Signed-off-by: Peter Zijlstra 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Thomas Gleixner 
---
v4:
- Add "Reviewed-by: Thomas Gleixner " (Thomas).

v2:
- Change condition to more accurate CONFIG_IOMMU_SVA (Jacob)

 include/linux/sched.h | 3 +++
 kernel/fork.c | 4 
 2 files changed, 7 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75ba8aa60248..4e5de3aed410 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -938,6 +938,9 @@ struct task_struct {
/* Recursion prevention for eventfd_signal() */
unsignedin_eventfd_signal:1;
 #endif
+#ifdef CONFIG_IOMMU_SVA
+   unsignedpasid_activated:1;
+#endif
 
unsigned long   atomic_flags; /* Flags requiring atomic 
access. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index c03c6682464c..51fd1df994b7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -968,6 +968,10 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig, int node)
tsk->use_memdelay = 0;
 #endif
 
+#ifdef CONFIG_IOMMU_SVA
+   tsk->pasid_activated = 0;
+#endif
+
 #ifdef CONFIG_MEMCG
tsk->active_memcg = NULL;
 #endif
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-07 Thread Fenghua Yu
PASIDs are process wide. It was attempted to use refcounted PASIDs to
free them when the last thread drops the refcount. This turned out to
be complex and error prone. Given the fact that the PASID space is 20
bits, which allows up to 1M processes to have a PASID associated
concurrently, PASID resource exhaustion is not a realistic concern.

Therefore it was decided to simplify the approach and stick with lazy
on demand PASID allocation, but drop the eager free approach and make
a allocated PASID lifetime bound to the life time of the process.

Get rid of the refcounting mechanisms and replace/rename the interfaces
to reflect this new approach.

Suggested-by: Dave Hansen 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v4:
- Update the commit message (Thomas).

v3:
- Rename mm_pasid_get() to mm_pasid_set() (Thomas).
- Remove ioasid_get() because it's not used any more when the IOASID
  is freed on mm exit (Thomas).
- Remove PASID's refcount exercise in ioasid_put() and rename
  ioasid_put() to ioasid_free() (Thomas).

v2:
- Free PASID on mm exit instead of in exit(2) or unbind() (Thomas, AndyL,
  PeterZ)
- Add mm_pasid_init(), mm_pasid_get(), and mm_pasid_drop() functions in mm.
  So the mm's PASID operations are generic for both X86 and ARM
  (Dave Hansen)

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 +--
 drivers/iommu/intel/iommu.c   |  4 +-
 drivers/iommu/intel/svm.c |  9 -
 drivers/iommu/ioasid.c| 39 ++-
 drivers/iommu/iommu-sva-lib.c | 39 ++-
 drivers/iommu/iommu-sva-lib.h |  1 -
 include/linux/ioasid.h| 12 +-
 include/linux/sched/mm.h  | 16 
 kernel/fork.c |  1 +
 9 files changed, 38 insertions(+), 88 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a737ba5f727e..22ddd05bbdcd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -340,14 +340,12 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
bond->smmu_mn = arm_smmu_mmu_notifier_get(smmu_domain, mm);
if (IS_ERR(bond->smmu_mn)) {
ret = PTR_ERR(bond->smmu_mn);
-   goto err_free_pasid;
+   goto err_free_bond;
}
 
list_add(>list, >bonds);
return >sva;
 
-err_free_pasid:
-   iommu_sva_free_pasid(mm);
 err_free_bond:
kfree(bond);
return ERR_PTR(ret);
@@ -377,7 +375,6 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle)
if (refcount_dec_and_test(>refs)) {
list_del(>list);
arm_smmu_mmu_notifier_put(bond->smmu_mn);
-   iommu_sva_free_pasid(bond->mm);
kfree(bond);
}
mutex_unlock(_lock);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 92fea3fbbb11..ef03b2176bbd 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4781,7 +4781,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 link_failed:
spin_unlock_irqrestore(_domain_lock, flags);
if (list_empty(>subdevices) && domain->default_pasid > 0)
-   ioasid_put(domain->default_pasid);
+   ioasid_free(domain->default_pasid);
 
return ret;
 }
@@ -4811,7 +4811,7 @@ static void aux_domain_remove_dev(struct dmar_domain 
*domain,
spin_unlock_irqrestore(_domain_lock, flags);
 
if (list_empty(>subdevices) && domain->default_pasid > 0)
-   ioasid_put(domain->default_pasid);
+   ioasid_free(domain->default_pasid);
 }
 
 static int prepare_domain_attach_device(struct iommu_domain *domain,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5b5d69b04fcc..51ac2096b3da 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -514,11 +514,6 @@ static int intel_svm_alloc_pasid(struct device *dev, 
struct mm_struct *mm,
return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
 }
 
-static void intel_svm_free_pasid(struct mm_struct *mm)
-{
-   iommu_sva_free_pasid(mm);
-}
-
 static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
   struct device *dev,
   struct mm_struct *mm,
@@ -662,8 +657,6 @@ static int intel_svm_unbind_mm(struct device *dev, u32 
pasid)
kfree(svm);
}
}
-   /* Drop a PASID reference and free it if no reference. */
-   intel_svm_free_pasid(mm);
}
 out:
return ret;
@@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device *dev, 
struct mm_struct

[PATCH v4 01/11] iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA

2022-02-07 Thread Fenghua Yu
This CONFIG option originally only referred to the Shared
Virtual Address (SVA) library. But it is now also used for
non-library portions of code.

Drop the "_LIB" suffix so that there is just one configuration
options for all code relating to SVA.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Thomas Gleixner 
---
v4:
- Add "Reviewed-by: Thomas Gleixner " (Thomas).

v2:
- Add this patch for more meaningful name CONFIG_IOMMU_SVA

 drivers/iommu/Kconfig | 6 +++---
 drivers/iommu/Makefile| 2 +-
 drivers/iommu/intel/Kconfig   | 2 +-
 drivers/iommu/iommu-sva-lib.h | 6 +++---
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 3eb68fa1b8cc..c79a0df090c0 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -144,8 +144,8 @@ config IOMMU_DMA
select IRQ_MSI_IOMMU
select NEED_SG_DMA_LENGTH
 
-# Shared Virtual Addressing library
-config IOMMU_SVA_LIB
+# Shared Virtual Addressing
+config IOMMU_SVA
bool
select IOASID
 
@@ -379,7 +379,7 @@ config ARM_SMMU_V3
 config ARM_SMMU_V3_SVA
bool "Shared Virtual Addressing support for the ARM SMMUv3"
depends on ARM_SMMU_V3
-   select IOMMU_SVA_LIB
+   select IOMMU_SVA
select MMU_NOTIFIER
help
  Support for sharing process address spaces with devices using the
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index bc7f730edbb0..44475a9b3eea 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,6 +27,6 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
-obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva-lib.o io-pgfault.o
 obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
 obj-$(CONFIG_APPLE_DART) += apple-dart.o
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 247d0f2d5fdf..39a06d245f12 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -52,7 +52,7 @@ config INTEL_IOMMU_SVM
select PCI_PRI
select MMU_NOTIFIER
select IOASID
-   select IOMMU_SVA_LIB
+   select IOMMU_SVA
help
  Shared Virtual Memory (SVM) provides a facility for devices
  to access DMA resources through process address space by
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 031155010ca8..95dc3ebc1928 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -17,7 +17,7 @@ struct device;
 struct iommu_fault;
 struct iopf_queue;
 
-#ifdef CONFIG_IOMMU_SVA_LIB
+#ifdef CONFIG_IOMMU_SVA
 int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
 
 int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
@@ -28,7 +28,7 @@ struct iopf_queue *iopf_queue_alloc(const char *name);
 void iopf_queue_free(struct iopf_queue *queue);
 int iopf_queue_discard_partial(struct iopf_queue *queue);
 
-#else /* CONFIG_IOMMU_SVA_LIB */
+#else /* CONFIG_IOMMU_SVA */
 static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
 {
return -ENODEV;
@@ -64,5 +64,5 @@ static inline int iopf_queue_discard_partial(struct 
iopf_queue *queue)
 {
return -ENODEV;
 }
-#endif /* CONFIG_IOMMU_SVA_LIB */
+#endif /* CONFIG_IOMMU_SVA */
 #endif /* _IOMMU_SVA_LIB_H */
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 10/11] tools/objtool: Check for use of the ENQCMD instruction in the kernel

2022-02-07 Thread Fenghua Yu
The ENQCMD implicitly accesses the PASID_MSR to fill in the pasid field
of the descriptor being submitted to an accelerator. But there is no
precise (and stable across kernel changes) point at which the PASID_MSR
is updated from the value for one task to the next.

Kernel code that uses accelerators must always use the ENQCMDS instruction
which does not access the PASID_MSR.

Check for use of the ENQCMD instruction in the kernel and warn on its
usage.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Acked-by: Josh Poimboeuf 
---
v3:
- Add Acked-by: Josh Poimboeuf 

v2:
- Simplify handling ENQCMD (PeterZ and Josh)

 tools/objtool/arch/x86/decode.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index c10ef78df050..479e769ca324 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -112,7 +112,7 @@ int arch_decode_instruction(struct objtool_file *file, 
const struct section *sec
const struct elf *elf = file->elf;
struct insn insn;
int x86_64, ret;
-   unsigned char op1, op2,
+   unsigned char op1, op2, op3,
  rex = 0, rex_b = 0, rex_r = 0, rex_w = 0, rex_x = 0,
  modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
  sib = 0, /* sib_scale = 0, */ sib_index = 0, sib_base = 0;
@@ -139,6 +139,7 @@ int arch_decode_instruction(struct objtool_file *file, 
const struct section *sec
 
op1 = insn.opcode.bytes[0];
op2 = insn.opcode.bytes[1];
+   op3 = insn.opcode.bytes[2];
 
if (insn.rex_prefix.nbytes) {
rex = insn.rex_prefix.bytes[0];
@@ -491,6 +492,14 @@ int arch_decode_instruction(struct objtool_file *file, 
const struct section *sec
/* nopl/nopw */
*type = INSN_NOP;
 
+   } else if (op2 == 0x38 && op3 == 0xf8) {
+   if (insn.prefixes.nbytes == 1 &&
+   insn.prefixes.bytes[0] == 0xf2) {
+   /* ENQCMD cannot be used in the kernel. */
+   WARN("ENQCMD instruction at %s:%lx", sec->name,
+offset);
+   }
+
} else if (op2 == 0xa0 || op2 == 0xa8) {
 
/* push fs/gs */
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 00/11] Re-enable ENQCMD and PASID MSR

2022-02-07 Thread Fenghua Yu
Problems in the old code to manage SVM (Shared Virtual Memory) devices
and the PASID (Process Address Space ID) led to that code being
disabled.

Subsequent discussions resulted in a far simpler approach:

1) PASID life cycle is from first allocation by a process until that
   process exits.
2) All tasks begin with PASID disabled
3) The #GP fault handler tries to fix faulting ENQCMD instructions very
   early (thus avoiding complexities of the XSAVE infrastructure)

Change Log:
v4:
- Update commit message in patch #4 (Thomas).
- Update commit message in patch #5 (Thomas).
- Add "Reviewed-by: Thomas Gleixner " in patch #1-#3
  and patch #6-#9 (Thomas).
- Rebased to 5.17-rc3.

v3 can be found at 
https://lore.kernel.org/lkml/20220128202905.2274672-7-fenghua...@intel.com/T/#m039e1a201e9894d99b117fa6005bc05724a5a4bb

v3:
- Rename mm_pasid_get() to mm_pasid_set() in patch #5 (Thomas).
- Remove ioasid_get() because it's not used any more when the IOASID
  is freed on mm exit in patch #5 (Thomas).
- Remove PASID's refcount exercise in ioasid_put() and rename
  ioasid_put() to ioasid_free() in patch #5 and #11 (Thomas).
- Add Acked-by: Josh Poimboeuf  in patch #10.

v2 can be found at 
https://lore.kernel.org/lkml/20211217220136.2762116-1-fenghua...@intel.com/

v2:
- Free PASID on mm exit instead of in exit(2) or unbind() (Thomas, AndyL,
  PeterZ)
- Directly write IA32_PASID MSR in fixup while local IRQ is still disabled
  (Thomas)
- Simplify handling ENQCMD in objtool (PeterZ and Josh)
- Define mm_pasid_get(), mm_pasid_drop(), and mm_pasid_init() in mm and
  call the functions from IOMMU (Dave Hansen).
- A few changes in the #GP fixup function (Dave Hansen, Tony Luck).
- Initial PASID value is changed to INVALID_PASID (Ashok Raj and
  Jacob Pan).
- Add mm_pasid_init(), mm_pasid_get(), and mm_pasid_drop() functions in mm.
  So the mm's PASID operations are generic for both X86 and ARM
  (Dave Hansen).
- Rename CONFIG_IOMMU_SVA_LIB to more useful and accurate
  CONFIG_IOMMU_SVA
- Use CONFIG_IOMMU_SVA for PASID processing condition (Jacob)
- The patch that cleans up old update_pasid() function is in upstream
  now (commit: 00ecd5401349 "iommu/vt-d: Clean up unused PASID updating
  functions") and therefore it's removed from this version.

v1 can be found at 
https://lore.kernel.org/lkml/20210920192349.2602141-1-fenghua...@intel.com/T/#md6d542091da1d1159eda0a44a16e57d0c0dfb209

Fenghua Yu (10):
  iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA
  mm: Change CONFIG option for mm->pasid field
  iommu/ioasid: Introduce a helper to check for valid PASIDs
  kernel/fork: Initialize mm's PASID
  iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm
exit
  x86/fpu: Clear PASID when copying fpstate
  x86/traps: Demand-populate PASID MSR via #GP
  x86/cpufeatures: Re-enable ENQCMD
  tools/objtool: Check for use of the ENQCMD instruction in the kernel
  docs: x86: Change documentation for SVA (Shared Virtual Addressing)

Peter Zijlstra (1):
  sched: Define and initialize a flag to identify valid PASID in the
task

 Documentation/x86/sva.rst | 53 ++
 arch/x86/include/asm/disabled-features.h  |  7 ++-
 arch/x86/kernel/fpu/core.c|  7 +++
 arch/x86/kernel/traps.c   | 55 +++
 drivers/iommu/Kconfig |  6 +-
 drivers/iommu/Makefile|  2 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 +-
 drivers/iommu/intel/Kconfig   |  2 +-
 drivers/iommu/intel/iommu.c   |  4 +-
 drivers/iommu/intel/svm.c |  9 ---
 drivers/iommu/ioasid.c| 39 ++---
 drivers/iommu/iommu-sva-lib.c | 39 -
 drivers/iommu/iommu-sva-lib.h |  7 +--
 include/linux/ioasid.h| 21 +++
 include/linux/mm_types.h  |  2 +-
 include/linux/sched.h |  3 +
 include/linux/sched/mm.h  | 26 +
 kernel/fork.c | 15 +++--
 mm/init-mm.c  |  4 ++
 tools/objtool/arch/x86/decode.c   | 11 +++-
 20 files changed, 197 insertions(+), 120 deletions(-)

-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 06/11] x86/fpu: Clear PASID when copying fpstate

2022-02-07 Thread Fenghua Yu
The kernel must allocate a Process Address Space ID (PASID) on behalf of
each process which will use ENQCMD and program it into the new MSR to
communicate the process identity to platform hardware. ENQCMD uses the
PASID stored in this MSR to tag requests from this process.

The PASID state must be cleared on fork() since fork creates a
new address space.

For clone(), it would be functionally OK to copy the PASID. However,
clearing it is _also_ functionally OK since any PASID use will trigger
the #GP handler to populate the MSR.

Copying the PASID state has two main downsides:
 * It requires differentiating fork() and clone() in the code,
   both in the FPU code and keeping tsk->pasid_activated consistent.
 * It guarantees that the PASID is out of its init state, which
   incurs small but non-zero cost on every XSAVE/XRSTOR.

The main downside of clearing the PASID at fpstate copy is the future,
one-time #GP for the thread.

Use the simplest approach: clear the PASID state both on clone() and
fork().  Rely on the #GP handler for MSR population in children.

Also, just clear the PASID bit from xfeatures if XSAVE is supported.
This will have no effect on systems that do not have PASID support.  It
is virtually zero overhead because 'dst_fpu' was just written and
the whole thing is cache hot.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Thomas Gleixner 
---
v4:
- Add "Reviewed-by: Thomas Gleixner " (Thomas).

v2:
- Rewrite changelog (Dave Hansen).
- Move xfeature tweaking into fpu_clone() and make it unconditional
  if XSAVE is supported (Dave Hansen).

 arch/x86/kernel/fpu/core.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 8dea01ffc5c1..19821f027cb3 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -612,6 +612,13 @@ int fpu_clone(struct task_struct *dst, unsigned long 
clone_flags)
fpu_inherit_perms(dst_fpu);
fpregs_unlock();
 
+   /*
+* Children never inherit PASID state.
+* Force it to have its init value:
+*/
+   if (use_xsave())
+   dst_fpu->fpstate->regs.xsave.header.xfeatures &= 
~XFEATURE_MASK_PASID;
+
trace_x86_fpu_copy_src(src_fpu);
trace_x86_fpu_copy_dst(dst_fpu);
 
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 11/11] docs: x86: Change documentation for SVA (Shared Virtual Addressing)

2022-02-07 Thread Fenghua Yu
Since allocating, freeing and fixing up PASID are changed, the
documentation is updated to reflect the changes.

Originally-by: Ashok Raj 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v3:
- Remove PASID refcount description (Thomas).

v2:
- Update life cycle info (Tony and Thomas).
- Update initial PASID value to INVALID_IOASID on fork().

 Documentation/x86/sva.rst | 53 ++-
 1 file changed, 41 insertions(+), 12 deletions(-)

diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
index 076efd51ef1f..1a22020735a3 100644
--- a/Documentation/x86/sva.rst
+++ b/Documentation/x86/sva.rst
@@ -104,18 +104,47 @@ The MSR must be configured on each logical CPU before any 
application
 thread can interact with a device. Threads that belong to the same
 process share the same page tables, thus the same MSR value.
 
-PASID is cleared when a process is created. The PASID allocation and MSR
-programming may occur long after a process and its threads have been created.
-One thread must call iommu_sva_bind_device() to allocate the PASID for the
-process. If a thread uses ENQCMD without the MSR first being populated, a #GP
-will be raised. The kernel will update the PASID MSR with the PASID for all
-threads in the process. A single process PASID can be used simultaneously
-with multiple devices since they all share the same address space.
-
-One thread can call iommu_sva_unbind_device() to free the allocated PASID.
-The kernel will clear the PASID MSR for all threads belonging to the process.
-
-New threads inherit the MSR value from the parent.
+PASID Life Cycle Management
+==
+
+PASID is initialized as INVALID_IOASID (-1) when a process is created.
+
+Only processes that access SVA-capable devices need to have a PASID
+allocated. This allocation happens when a process opens/binds an SVA-capable
+device but finds no PASID for this process. Subsequent binds of the same, or
+other devices will share the same PASID.
+
+Although the PASID is allocated to the process by opening a device,
+it is not active in any of the threads of that process. It's loaded to the
+IA32_PASID MSR lazily when a thread tries to submit a work descriptor
+to a device using the ENQCMD.
+
+That first access will trigger a #GP fault because the IA32_PASID MSR
+has not been initialized with the PASID value assigned to the process
+when the device was opened. The Linux #GP handler notes that a PASID has
+been allocated for the process, and so initializes the IA32_PASID MSR
+and returns so that the ENQCMD instruction is re-executed.
+
+On fork(2) or exec(2) the PASID is removed from the process as it no
+longer has the same address space that it had when the device was opened.
+
+On clone(2) the new task shares the same address space, so will be
+able to use the PASID allocated to the process. The IA32_PASID is not
+preemptively initialized as the PASID value might not be allocated yet or
+the kernel does not know whether this thread is going to access the device
+and the cleared IA32_PASID MSR reduces context switch overhead by xstate
+init optimization. Since #GP faults have to be handled on any threads that
+were created before the PASID was assigned to the mm of the process, newly
+created threads might as well be treated in a consistent way.
+
+Due to complexity of freeing the PASID and clearing all IA32_PASID MSRs in
+all threads in unbind, free the PASID lazily only on mm exit.
+
+If a process does a close(2) of the device file descriptor and munmap(2)
+of the device MMIO portal, then the driver will unbind the device. The
+PASID is still marked VALID in the PASID_MSR for any threads in the
+process that accessed the device. But this is harmless as without the
+MMIO portal they cannot submit new work to the device.
 
 Relationships
 =
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 09/11] x86/cpufeatures: Re-enable ENQCMD

2022-02-07 Thread Fenghua Yu
Since ENQCMD is handled by #GP fix up, it can be re-enabled.

The ENQCMD feature can only be used if CONFIG_INTEL_IOMMU_SVM is set. Add
X86_FEATURE_ENQCMD to the disabled features mask as appropriate so that
cpu_feature_enabled() can be used to check the feature.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Thomas Gleixner 
---
v4:
- Add "Reviewed-by: Thomas Gleixner " (Thomas).

v2:
- Update the commit message (Tony).

 arch/x86/include/asm/disabled-features.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/disabled-features.h 
b/arch/x86/include/asm/disabled-features.h
index 8f28fafa98b3..1231d63f836d 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -56,8 +56,11 @@
 # define DISABLE_PTI   (1 << (X86_FEATURE_PTI & 31))
 #endif
 
-/* Force disable because it's broken beyond repair */
-#define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31))
+#ifdef CONFIG_INTEL_IOMMU_SVM
+# define DISABLE_ENQCMD0
+#else
+# define DISABLE_ENQCMD(1 << (X86_FEATURE_ENQCMD & 31))
+#endif
 
 #ifdef CONFIG_X86_SGX
 # define DISABLE_SGX   0
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 03/11] iommu/ioasid: Introduce a helper to check for valid PASIDs

2022-02-07 Thread Fenghua Yu
pasid_valid() is defined to check if a given PASID is valid.

Suggested-by: Ashok Raj 
Suggested-by: Jacob Pan 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Thomas Gleixner 
---
v4:
- Add "Reviewed-by: Thomas Gleixner " (Thomas).

v2:
- Define a helper pasid_valid() (Ashok, Jacob, Thomas, Tony)

 include/linux/ioasid.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index e9dacd4b9f6b..2237f64dbaae 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -41,6 +41,10 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
 int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
 void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
 int ioasid_set_data(ioasid_t ioasid, void *data);
+static inline bool pasid_valid(ioasid_t ioasid)
+{
+   return ioasid != INVALID_IOASID;
+}
 
 #else /* !CONFIG_IOASID */
 static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
@@ -78,5 +82,10 @@ static inline int ioasid_set_data(ioasid_t ioasid, void 
*data)
return -ENOTSUPP;
 }
 
+static inline bool pasid_valid(ioasid_t ioasid)
+{
+   return false;
+}
+
 #endif /* CONFIG_IOASID */
 #endif /* __LINUX_IOASID_H */
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 04/11] kernel/fork: Initialize mm's PASID

2022-02-07 Thread Fenghua Yu
A new mm doesn't have a PASID yet when it's created. Initialize
the mm's PASID on fork() or for init_mm to INVALID_IOASID (-1).

INIT_PASID (0) is reserved for kernel legacy DMA PASID. It cannot be
allocated to a user process. Initializing the process's PASID to 0 may
cause confusion that why the process uses the reserved kernel legacy DMA
PASID. Initializing the PASID to INVALID_IOASID (-1) explicitly
tells the process doesn't have a valid PASID yet.

Even though the only user of mm_pasid_init() is in fork.c, define it
in  as the first of three mm/pasid life cycle
functions (init/set/drop) to keep these all together.

Suggested-by: Dave Hansen 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v4:
- Update the commit message to explain why init pasid to -1 and why
  define mm_pasid_init() (Thomas)

v2:
- Change condition to more accurate CONFIG_IOMMU_SVA (Jacob)

 include/linux/sched/mm.h | 10 ++
 kernel/fork.c| 10 ++
 mm/init-mm.c |  4 
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index aa5f09ca5bcf..c74d1edbac2f 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Routines for handling mm_structs
@@ -433,4 +434,13 @@ static inline void membarrier_update_current_mm(struct 
mm_struct *next_mm)
 }
 #endif
 
+#ifdef CONFIG_IOMMU_SVA
+static inline void mm_pasid_init(struct mm_struct *mm)
+{
+   mm->pasid = INVALID_IOASID;
+}
+#else
+static inline void mm_pasid_init(struct mm_struct *mm) {}
+#endif
+
 #endif /* _LINUX_SCHED_MM_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 6ee7551d3bd2..deacd2c17a7f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -97,6 +97,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1019,13 +1020,6 @@ static void mm_init_owner(struct mm_struct *mm, struct 
task_struct *p)
 #endif
 }
 
-static void mm_init_pasid(struct mm_struct *mm)
-{
-#ifdef CONFIG_IOMMU_SVA
-   mm->pasid = INIT_PASID;
-#endif
-}
-
 static void mm_init_uprobes_state(struct mm_struct *mm)
 {
 #ifdef CONFIG_UPROBES
@@ -1054,7 +1048,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
struct task_struct *p,
mm_init_cpumask(mm);
mm_init_aio(mm);
mm_init_owner(mm, p);
-   mm_init_pasid(mm);
+   mm_pasid_init(mm);
RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_subscriptions_init(mm);
init_tlb_flush_pending(mm);
diff --git a/mm/init-mm.c b/mm/init-mm.c
index b4a6f38fb51d..fbe7844d0912 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #ifndef INIT_MM_CONTEXT
@@ -38,6 +39,9 @@ struct mm_struct init_mm = {
.mmlist = LIST_HEAD_INIT(init_mm.mmlist),
.user_ns= _user_ns,
.cpu_bitmap = CPU_BITS_NONE,
+#ifdef CONFIG_IOMMU_SVA
+   .pasid  = INVALID_IOASID,
+#endif
INIT_MM_CONTEXT(init_mm)
 };
 
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 02/11] mm: Change CONFIG option for mm->pasid field

2022-02-07 Thread Fenghua Yu
This currently depends on CONFIG_IOMMU_SUPPORT. But it is only
needed when CONFIG_IOMMU_SVA option is enabled.

Change the CONFIG guards around definition and initialization
of mm->pasid field.

Suggested-by: Jacob Pan 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Thomas Gleixner 
---
v4:
- Add "Reviewed-by: Thomas Gleixner " (Thomas).

v2:
- Change condition to more accurate CONFIG_IOMMU_SVA (Jacob and Tony)

 include/linux/mm_types.h | 2 +-
 kernel/fork.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5140e5feb486..c5cbfd7915ad 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -631,7 +631,7 @@ struct mm_struct {
 #endif
struct work_struct async_put_work;
 
-#ifdef CONFIG_IOMMU_SUPPORT
+#ifdef CONFIG_IOMMU_SVA
u32 pasid;
 #endif
} __randomize_layout;
diff --git a/kernel/fork.c b/kernel/fork.c
index d75a528f7b21..6ee7551d3bd2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1021,7 +1021,7 @@ static void mm_init_owner(struct mm_struct *mm, struct 
task_struct *p)
 
 static void mm_init_pasid(struct mm_struct *mm)
 {
-#ifdef CONFIG_IOMMU_SUPPORT
+#ifdef CONFIG_IOMMU_SVA
mm->pasid = INIT_PASID;
 #endif
 }
-- 
2.35.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-04 Thread Fenghua Yu
Hi, Baolu,
On Sat, Feb 05, 2022 at 11:50:59AM +0800, Lu Baolu wrote:
> Hi Fenghua,
> 
> On 2022/1/29 4:28, Fenghua Yu wrote:
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 92fea3fbbb11..ef03b2176bbd 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -4781,7 +4781,7 @@ static int aux_domain_add_dev(struct dmar_domain 
> > *domain,
> >   link_failed:
> > spin_unlock_irqrestore(_domain_lock, flags);
> > if (list_empty(>subdevices) && domain->default_pasid > 0)
> > -   ioasid_put(domain->default_pasid);
> > +   ioasid_free(domain->default_pasid);
> > return ret;
> >   }
> > @@ -4811,7 +4811,7 @@ static void aux_domain_remove_dev(struct dmar_domain 
> > *domain,
> > spin_unlock_irqrestore(_domain_lock, flags);
> > if (list_empty(>subdevices) && domain->default_pasid > 0)
> > -   ioasid_put(domain->default_pasid);
> > +   ioasid_free(domain->default_pasid);
> >   }
> >   static int prepare_domain_attach_device(struct iommu_domain *domain,
> 
> The domain->default_pasid is not relevant to SVA and it's being cleaned
> up by another series. No need to take care of it in this series.

Because ioasid_put() is renamed to ioasid_free() in this patch, without
above changes, this series cannot be compiled.

Thomas and I discussed how to handle aux_domain while you will remove
the entire aux_domain code (https://lore.kernel.org/lkml/87zgnf29op.ffs@tglx/).
The above changes are minimal and temporary changes to compile this series.
The changes will be removed along with the entire aux_domain by your
removing aux_domain series later in 5.18.

Thanks.

-Fenghua

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-04 Thread Fenghua Yu
On Sat, Feb 05, 2022 at 12:56:00AM +0100, Thomas Gleixner wrote:
> On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote:
> > To avoid complexity of updating each thread's PASID status (e.g. sending
> > IPI to update IA32_PASID MSR) on allocating and freeing PASID, once
> > allocated and assigned to an mm, the PASID stays with the mm for the
> > rest of the mm's lifetime. A reference to the PASID is taken on
> > allocating the PASID. Binding/unbinding the PASID won't change refcount.
> > The reference is dropped on mm exit and thus the PASID is freed.
> >
> > Two helpers mm_pasid_set() and mm_pasid_drop() are defined in mm because
> > the PASID operations handle the pasid member in mm_struct and should be
> > part of mm operations. Because IOASID's reference count is not used any
> > more and removed, unused ioasid_get() and iommu_sva_free_pasid()
> > are deleted and ioasid_put() is renamed to ioasid_free().
> >
> > 20-bit PASID allows up to 1M processes bound to PASIDs at the same time.
> > With cgroups and other controls that might limit the number of process
> > creation, the limited number of PASIDs is not a realistic issue for
> > lazy PASID free.
> 
> Please take a step back and think hard about it whether that changelog
> makes sense to you a year from now.
> 
> Let me walk you through:
> 
> > To avoid complexity of updating each thread's PASID status (e.g. sending
> > IPI to update IA32_PASID MSR) on allocating and freeing PASID, once
> > allocated and assigned to an mm, the PASID stays with the mm for the
> > rest of the mm's lifetime.
> 
> You are missing the oportunity to tell a story about the history of this
> decision here:
> 
>   PASIDs are process wide. It was attempted to use refcounted PASIDs to
>   free them when the last thread drops the refcount. This turned out to
>   be complex and error prone. Given the fact that the PASID space is 20
>   bits, which allows up to 1M processes to have a PASID associated
>   concurrently, PASID resource exhaustion is not a realistic concern.
> 
>   Therefore it was decided to simplify the approach and stick with lazy
>   on demand PASID allocation, but drop the eager free approach and make
>   a allocated PASID lifetime bound to the life time of the process.
> 
> > A reference to the PASID is taken on allocating the
> > PASID. Binding/unbinding the PASID won't change refcount.  The
> > reference is dropped on mm exit and thus the PASID is freed.
> 
> There is no refcount in play anymore, right? So how does this part of
> the changelog make any sense?
> 
> This is followed by:
> 
> > Two helpers mm_pasid_set() and mm_pasid_drop() are defined in mm because
> > the PASID operations handle the pasid member in mm_struct and should be
> > part of mm operations. Because IOASID's reference count is not used any
> > more and removed, unused ioasid_get() and iommu_sva_free_pasid()
> > are deleted and ioasid_put() is renamed to ioasid_free().
> 
> which does not provide much rationale and just blurbs about _what_ the
> patch is doing and not about the why and the connection to the
> above. And the refcount removal section contradicts the stale text
> above.
> 
> So this paragraph should be something like this:
> 
>   Get rid of the refcounting mechanisms and replace/rename the
>   interfaces to reflect this new approach.
> 
> Hmm?

Sure. I will update the commit message with your comments.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 04/11] kernel/fork: Initialize mm's PASID

2022-02-04 Thread Fenghua Yu
Hi, Thomas,

On Sat, Feb 05, 2022 at 12:22:12AM +0100, Thomas Gleixner wrote:
> On Fri, Jan 28 2022 at 12:28, Fenghua Yu wrote:
> > A new mm doesn't have a PASID yet when it's created. Initialize
> > the mm's PASID on fork() or for init_mm to INVALID_IOASID (-1).
> 
> I must be missing something here.
> 
> > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > index aa5f09ca5bcf..c74d1edbac2f 100644
> > --- a/include/linux/sched/mm.h
> > +++ b/include/linux/sched/mm.h
> > @@ -8,6 +8,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  /*
> >   * Routines for handling mm_structs
> > @@ -433,4 +434,13 @@ static inline void membarrier_update_current_mm(struct 
> > mm_struct *next_mm)
> >  }
> >  #endif
> >  
> > +#ifdef CONFIG_IOMMU_SVA
> > +static inline void mm_pasid_init(struct mm_struct *mm)
> > +{
> > +   mm->pasid = INVALID_IOASID;
> > +}
> > +#else
> > +static inline void mm_pasid_init(struct mm_struct *mm) {}
> > +#endif
> > +
> >  #endif /* _LINUX_SCHED_MM_H */
> 
> So this adds mm_pasid_init() to linux/sched/mm.h which replaces:
> 
> > -static void mm_init_pasid(struct mm_struct *mm)
> > -{
> > -#ifdef CONFIG_IOMMU_SVA
> > -   mm->pasid = INIT_PASID;
> > -#endif
> > -}
> > -
> 
> I.e. already existing code which is initializing mm->pasid with
> INIT_PASID (0) while the replacement initializes it with INVALID_IOASID
> (-1).
> 
> The change log does not have any information about why INIT_PASID is the
> wrong initialization value and why this change is not having any side
> effects.

I should add the following info in the commit message to explain why
change INIT_PASID (0) to INVALID_IOASID (-1):

INIT_PASID (0) is reserved for kernel legacy DMA PASID. It cannot be
allocated to a user process. Initialize the process's PASID to 0 may
cause confusion that why the process uses reserved kernel legacy DMA
PASID. Initializing the PASID to INVALID_IOASID (-1) explicitly
tells the process doesn't have a valid PASID yet initially.

Is it OK for you?

> 
> It neither mentions why having this in a global available header makes
> sense when the only call site is in the C file from which the already
> existing function is removed.

This series defines three helpers mm_pasid_init(), mm_pasid_set(), and
mm_pasid_drop() in mm because they handle the pasid member in mm_struct
and should be part of mm operations. I explained why mm_pasid_set() and
mm_pasid_drop() are defined in mm, but I didn't explain why mm_pasid_init()
is define in mm.

Is it OK to add the following explanation on why mm_pasid_init() is defined?

mm_pasid_init() is defined in mm and replaces mm_init_pasid() because
the PASID init operation initializes the pasid member in mm_struct and
should be part of mm operations.

Thanks,

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 10/11] tools/objtool: Check for use of the ENQCMD instruction in the kernel

2022-01-28 Thread Fenghua Yu
The ENQCMD implicitly accesses the PASID_MSR to fill in the pasid field
of the descriptor being submitted to an accelerator. But there is no
precise (and stable across kernel changes) point at which the PASID_MSR
is updated from the value for one task to the next.

Kernel code that uses accelerators must always use the ENQCMDS instruction
which does not access the PASID_MSR.

Check for use of the ENQCMD instruction in the kernel and warn on its
usage.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Acked-by: Josh Poimboeuf 
---
v3:
- Add Acked-by: Josh Poimboeuf 

v2:
- Simplify handling ENQCMD (PeterZ and Josh)

 tools/objtool/arch/x86/decode.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index c10ef78df050..479e769ca324 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -112,7 +112,7 @@ int arch_decode_instruction(struct objtool_file *file, 
const struct section *sec
const struct elf *elf = file->elf;
struct insn insn;
int x86_64, ret;
-   unsigned char op1, op2,
+   unsigned char op1, op2, op3,
  rex = 0, rex_b = 0, rex_r = 0, rex_w = 0, rex_x = 0,
  modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
  sib = 0, /* sib_scale = 0, */ sib_index = 0, sib_base = 0;
@@ -139,6 +139,7 @@ int arch_decode_instruction(struct objtool_file *file, 
const struct section *sec
 
op1 = insn.opcode.bytes[0];
op2 = insn.opcode.bytes[1];
+   op3 = insn.opcode.bytes[2];
 
if (insn.rex_prefix.nbytes) {
rex = insn.rex_prefix.bytes[0];
@@ -491,6 +492,14 @@ int arch_decode_instruction(struct objtool_file *file, 
const struct section *sec
/* nopl/nopw */
*type = INSN_NOP;
 
+   } else if (op2 == 0x38 && op3 == 0xf8) {
+   if (insn.prefixes.nbytes == 1 &&
+   insn.prefixes.bytes[0] == 0xf2) {
+   /* ENQCMD cannot be used in the kernel. */
+   WARN("ENQCMD instruction at %s:%lx", sec->name,
+offset);
+   }
+
} else if (op2 == 0xa0 || op2 == 0xa8) {
 
/* push fs/gs */
-- 
2.35.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 03/11] iommu/ioasid: Introduce a helper to check for valid PASIDs

2022-01-28 Thread Fenghua Yu
pasid_valid() is defined to check if a given PASID is valid.

Suggested-by: Ashok Raj 
Suggested-by: Jacob Pan 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Define a helper pasid_valid() (Ashok, Jacob, Thomas, Tony)

 include/linux/ioasid.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index e9dacd4b9f6b..2237f64dbaae 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -41,6 +41,10 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
 int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
 void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
 int ioasid_set_data(ioasid_t ioasid, void *data);
+static inline bool pasid_valid(ioasid_t ioasid)
+{
+   return ioasid != INVALID_IOASID;
+}
 
 #else /* !CONFIG_IOASID */
 static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
@@ -78,5 +82,10 @@ static inline int ioasid_set_data(ioasid_t ioasid, void 
*data)
return -ENOTSUPP;
 }
 
+static inline bool pasid_valid(ioasid_t ioasid)
+{
+   return false;
+}
+
 #endif /* CONFIG_IOASID */
 #endif /* __LINUX_IOASID_H */
-- 
2.35.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 11/11] docs: x86: Change documentation for SVA (Shared Virtual Addressing)

2022-01-28 Thread Fenghua Yu
Since allocating, freeing and fixing up PASID are changed, the
documentation is updated to reflect the changes.

Originally-by: Ashok Raj 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v3:
- Remove PASID refcount description (Thomas).

v2:
- Update life cycle info (Tony and Thomas).
- Update initial PASID value to INVALID_IOASID on fork().

 Documentation/x86/sva.rst | 53 ++-
 1 file changed, 41 insertions(+), 12 deletions(-)

diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
index 076efd51ef1f..1a22020735a3 100644
--- a/Documentation/x86/sva.rst
+++ b/Documentation/x86/sva.rst
@@ -104,18 +104,47 @@ The MSR must be configured on each logical CPU before any 
application
 thread can interact with a device. Threads that belong to the same
 process share the same page tables, thus the same MSR value.
 
-PASID is cleared when a process is created. The PASID allocation and MSR
-programming may occur long after a process and its threads have been created.
-One thread must call iommu_sva_bind_device() to allocate the PASID for the
-process. If a thread uses ENQCMD without the MSR first being populated, a #GP
-will be raised. The kernel will update the PASID MSR with the PASID for all
-threads in the process. A single process PASID can be used simultaneously
-with multiple devices since they all share the same address space.
-
-One thread can call iommu_sva_unbind_device() to free the allocated PASID.
-The kernel will clear the PASID MSR for all threads belonging to the process.
-
-New threads inherit the MSR value from the parent.
+PASID Life Cycle Management
+==
+
+PASID is initialized as INVALID_IOASID (-1) when a process is created.
+
+Only processes that access SVA-capable devices need to have a PASID
+allocated. This allocation happens when a process opens/binds an SVA-capable
+device but finds no PASID for this process. Subsequent binds of the same, or
+other devices will share the same PASID.
+
+Although the PASID is allocated to the process by opening a device,
+it is not active in any of the threads of that process. It's loaded to the
+IA32_PASID MSR lazily when a thread tries to submit a work descriptor
+to a device using the ENQCMD.
+
+That first access will trigger a #GP fault because the IA32_PASID MSR
+has not been initialized with the PASID value assigned to the process
+when the device was opened. The Linux #GP handler notes that a PASID has
+been allocated for the process, and so initializes the IA32_PASID MSR
+and returns so that the ENQCMD instruction is re-executed.
+
+On fork(2) or exec(2) the PASID is removed from the process as it no
+longer has the same address space that it had when the device was opened.
+
+On clone(2) the new task shares the same address space, so will be
+able to use the PASID allocated to the process. The IA32_PASID is not
+preemptively initialized as the PASID value might not be allocated yet or
+the kernel does not know whether this thread is going to access the device
+and the cleared IA32_PASID MSR reduces context switch overhead by xstate
+init optimization. Since #GP faults have to be handled on any threads that
+were created before the PASID was assigned to the mm of the process, newly
+created threads might as well be treated in a consistent way.
+
+Due to complexity of freeing the PASID and clearing all IA32_PASID MSRs in
+all threads in unbind, free the PASID lazily only on mm exit.
+
+If a process does a close(2) of the device file descriptor and munmap(2)
+of the device MMIO portal, then the driver will unbind the device. The
+PASID is still marked VALID in the PASID_MSR for any threads in the
+process that accessed the device. But this is harmless as without the
+MMIO portal they cannot submit new work to the device.
 
 Relationships
 =
-- 
2.35.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-28 Thread Fenghua Yu
To avoid complexity of updating each thread's PASID status (e.g. sending
IPI to update IA32_PASID MSR) on allocating and freeing PASID, once
allocated and assigned to an mm, the PASID stays with the mm for the
rest of the mm's lifetime. A reference to the PASID is taken on
allocating the PASID. Binding/unbinding the PASID won't change refcount.
The reference is dropped on mm exit and thus the PASID is freed.

Two helpers mm_pasid_set() and mm_pasid_drop() are defined in mm because
the PASID operations handle the pasid member in mm_struct and should be
part of mm operations. Because IOASID's reference count is not used any
more and removed, unused ioasid_get() and iommu_sva_free_pasid()
are deleted and ioasid_put() is renamed to ioasid_free().

20-bit PASID allows up to 1M processes bound to PASIDs at the same time.
With cgroups and other controls that might limit the number of process
creation, the limited number of PASIDs is not a realistic issue for
lazy PASID free.

Suggested-by: Dave Hansen 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v3:
- Rename mm_pasid_get() to mm_pasid_set() (Thomas).
- Remove ioasid_get() because it's not used any more when the IOASID
  is freed on mm exit (Thomas).
- Remove PASID's refcount exercise in ioasid_put() and rename
  ioasid_put() to ioasid_free() (Thomas).

v2:
- Free PASID on mm exit instead of in exit(2) or unbind() (Thomas, AndyL,
  PeterZ)
- Add mm_pasid_init(), mm_pasid_get(), and mm_pasid_drop() functions in mm.
  So the mm's PASID operations are generic for both X86 and ARM
  (Dave Hansen)

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 +--
 drivers/iommu/intel/iommu.c   |  4 +-
 drivers/iommu/intel/svm.c |  9 -
 drivers/iommu/ioasid.c| 38 ++
 drivers/iommu/iommu-sva-lib.c | 39 ++-
 drivers/iommu/iommu-sva-lib.h |  1 -
 include/linux/ioasid.h| 12 +-
 include/linux/sched/mm.h  | 16 
 kernel/fork.c |  1 +
 9 files changed, 38 insertions(+), 87 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a737ba5f727e..22ddd05bbdcd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -340,14 +340,12 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
bond->smmu_mn = arm_smmu_mmu_notifier_get(smmu_domain, mm);
if (IS_ERR(bond->smmu_mn)) {
ret = PTR_ERR(bond->smmu_mn);
-   goto err_free_pasid;
+   goto err_free_bond;
}
 
list_add(>list, >bonds);
return >sva;
 
-err_free_pasid:
-   iommu_sva_free_pasid(mm);
 err_free_bond:
kfree(bond);
return ERR_PTR(ret);
@@ -377,7 +375,6 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle)
if (refcount_dec_and_test(>refs)) {
list_del(>list);
arm_smmu_mmu_notifier_put(bond->smmu_mn);
-   iommu_sva_free_pasid(bond->mm);
kfree(bond);
}
mutex_unlock(_lock);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 92fea3fbbb11..ef03b2176bbd 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4781,7 +4781,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 link_failed:
spin_unlock_irqrestore(_domain_lock, flags);
if (list_empty(>subdevices) && domain->default_pasid > 0)
-   ioasid_put(domain->default_pasid);
+   ioasid_free(domain->default_pasid);
 
return ret;
 }
@@ -4811,7 +4811,7 @@ static void aux_domain_remove_dev(struct dmar_domain 
*domain,
spin_unlock_irqrestore(_domain_lock, flags);
 
if (list_empty(>subdevices) && domain->default_pasid > 0)
-   ioasid_put(domain->default_pasid);
+   ioasid_free(domain->default_pasid);
 }
 
 static int prepare_domain_attach_device(struct iommu_domain *domain,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5b5d69b04fcc..51ac2096b3da 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -514,11 +514,6 @@ static int intel_svm_alloc_pasid(struct device *dev, 
struct mm_struct *mm,
return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
 }
 
-static void intel_svm_free_pasid(struct mm_struct *mm)
-{
-   iommu_sva_free_pasid(mm);
-}
-
 static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
   struct device *dev,
   struct mm_struct *mm,
@@ -662,8 +657,6 @@ static int intel_svm_unbind_mm(struct device *dev, u32 
pasid)
kfree(svm);
 

[PATCH v3 08/11] x86/traps: Demand-populate PASID MSR via #GP

2022-01-28 Thread Fenghua Yu
All tasks start with PASID state disabled. This means that the first
time they execute an ENQCMD instruction they will take a #GP fault.

Modify the #GP fault handler to check if the "mm" for the task has
already been allocated a PASID. If so, try to fix the #GP fault by
loading the IA32_PASID MSR.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Directly write IA32_PASID MSR in fixup while local IRQ is still disabled
  (Thomas)
- Move #ifdef over to CONFIG_IOMMU_SVA since it is what
  defines mm->pasid and ->pasid_activated (Dave Hansen).
- Rename try_fixup_pasid() -> try_fixup_enqcmd_gp(). This
  code really is highly specific to ENQCMD, not PASIDs (Dave Hansen).
- Add lockdep assert and comment about context (Dave Hansen).
- Re-flow the if() mess (Dave Hansen).

 arch/x86/kernel/traps.c | 55 +
 1 file changed, 55 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c9d566dcf89a..7ef00dee35be 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -559,6 +560,57 @@ static bool fixup_iopl_exception(struct pt_regs *regs)
return true;
 }
 
+/*
+ * The unprivileged ENQCMD instruction generates #GPs if the
+ * IA32_PASID MSR has not been populated.  If possible, populate
+ * the MSR from a PASID previously allocated to the mm.
+ */
+static bool try_fixup_enqcmd_gp(void)
+{
+#ifdef CONFIG_IOMMU_SVA
+   u32 pasid;
+
+   /*
+* MSR_IA32_PASID is managed using XSAVE.  Directly
+* writing to the MSR is only possible when fpregs
+* are valid and the fpstate is not.  This is
+* guaranteed when handling a userspace exception
+* in *before* interrupts are re-enabled.
+*/
+   lockdep_assert_irqs_disabled();
+
+   /*
+* Hardware without ENQCMD will not generate
+* #GPs that can be fixed up here.
+*/
+   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+   return false;
+
+   pasid = current->mm->pasid;
+
+   /*
+* If the mm has not been allocated a
+* PASID, the #GP can not be fixed up.
+*/
+   if (!pasid_valid(pasid))
+   return false;
+
+   /*
+* Did this thread already have its PASID activated?
+* If so, the #GP must be from something else.
+*/
+   if (current->pasid_activated)
+   return false;
+
+   wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
+   current->pasid_activated = 1;
+
+   return true;
+#else
+   return false;
+#endif
+}
+
 DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 {
char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
@@ -567,6 +619,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
unsigned long gp_addr;
int ret;
 
+   if (user_mode(regs) && try_fixup_enqcmd_gp())
+   return;
+
cond_local_irq_enable(regs);
 
if (static_cpu_has(X86_FEATURE_UMIP)) {
-- 
2.35.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 01/11] iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA

2022-01-28 Thread Fenghua Yu
This CONFIG option originally only referred to the Shared
Virtual Address (SVA) library. But it is now also used for
non-library portions of code.

Drop the "_LIB" suffix so that there is just one configuration
options for all code relating to SVA.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Add this patch for more meaningful name CONFIG_IOMMU_SVA

 drivers/iommu/Kconfig | 6 +++---
 drivers/iommu/Makefile| 2 +-
 drivers/iommu/intel/Kconfig   | 2 +-
 drivers/iommu/iommu-sva-lib.h | 6 +++---
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 3eb68fa1b8cc..c79a0df090c0 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -144,8 +144,8 @@ config IOMMU_DMA
select IRQ_MSI_IOMMU
select NEED_SG_DMA_LENGTH
 
-# Shared Virtual Addressing library
-config IOMMU_SVA_LIB
+# Shared Virtual Addressing
+config IOMMU_SVA
bool
select IOASID
 
@@ -379,7 +379,7 @@ config ARM_SMMU_V3
 config ARM_SMMU_V3_SVA
bool "Shared Virtual Addressing support for the ARM SMMUv3"
depends on ARM_SMMU_V3
-   select IOMMU_SVA_LIB
+   select IOMMU_SVA
select MMU_NOTIFIER
help
  Support for sharing process address spaces with devices using the
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index bc7f730edbb0..44475a9b3eea 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,6 +27,6 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
-obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva-lib.o io-pgfault.o
 obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
 obj-$(CONFIG_APPLE_DART) += apple-dart.o
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 247d0f2d5fdf..39a06d245f12 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -52,7 +52,7 @@ config INTEL_IOMMU_SVM
select PCI_PRI
select MMU_NOTIFIER
select IOASID
-   select IOMMU_SVA_LIB
+   select IOMMU_SVA
help
  Shared Virtual Memory (SVM) provides a facility for devices
  to access DMA resources through process address space by
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 031155010ca8..95dc3ebc1928 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -17,7 +17,7 @@ struct device;
 struct iommu_fault;
 struct iopf_queue;
 
-#ifdef CONFIG_IOMMU_SVA_LIB
+#ifdef CONFIG_IOMMU_SVA
 int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
 
 int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
@@ -28,7 +28,7 @@ struct iopf_queue *iopf_queue_alloc(const char *name);
 void iopf_queue_free(struct iopf_queue *queue);
 int iopf_queue_discard_partial(struct iopf_queue *queue);
 
-#else /* CONFIG_IOMMU_SVA_LIB */
+#else /* CONFIG_IOMMU_SVA */
 static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
 {
return -ENODEV;
@@ -64,5 +64,5 @@ static inline int iopf_queue_discard_partial(struct 
iopf_queue *queue)
 {
return -ENODEV;
 }
-#endif /* CONFIG_IOMMU_SVA_LIB */
+#endif /* CONFIG_IOMMU_SVA */
 #endif /* _IOMMU_SVA_LIB_H */
-- 
2.35.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 07/11] sched: Define and initialize a flag to identify valid PASID in the task

2022-01-28 Thread Fenghua Yu
From: Peter Zijlstra 

Add a new single bit field to the task structure to track whether this task
has initialized the IA32_PASID MSR to the mm's PASID.

Initialize the field to zero when creating a new task with fork/clone.

Signed-off-by: Peter Zijlstra 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change condition to more accurate CONFIG_IOMMU_SVA (Jacob)

 include/linux/sched.h | 3 +++
 kernel/fork.c | 4 
 2 files changed, 7 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f5b2be39a78c..812e40c5bde5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -938,6 +938,9 @@ struct task_struct {
/* Recursion prevention for eventfd_signal() */
unsignedin_eventfd_signal:1;
 #endif
+#ifdef CONFIG_IOMMU_SVA
+   unsignedpasid_activated:1;
+#endif
 
unsigned long   atomic_flags; /* Flags requiring atomic 
access. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index c03c6682464c..51fd1df994b7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -968,6 +968,10 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig, int node)
tsk->use_memdelay = 0;
 #endif
 
+#ifdef CONFIG_IOMMU_SVA
+   tsk->pasid_activated = 0;
+#endif
+
 #ifdef CONFIG_MEMCG
tsk->active_memcg = NULL;
 #endif
-- 
2.35.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 09/11] x86/cpufeatures: Re-enable ENQCMD

2022-01-28 Thread Fenghua Yu
Since ENQCMD is handled by #GP fix up, it can be re-enabled.

The ENQCMD feature can only be used if CONFIG_INTEL_IOMMU_SVM is set. Add
X86_FEATURE_ENQCMD to the disabled features mask as appropriate so that
cpu_feature_enabled() can be used to check the feature.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Update the commit message (Tony).

 arch/x86/include/asm/disabled-features.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/disabled-features.h 
b/arch/x86/include/asm/disabled-features.h
index 8f28fafa98b3..1231d63f836d 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -56,8 +56,11 @@
 # define DISABLE_PTI   (1 << (X86_FEATURE_PTI & 31))
 #endif
 
-/* Force disable because it's broken beyond repair */
-#define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31))
+#ifdef CONFIG_INTEL_IOMMU_SVM
+# define DISABLE_ENQCMD0
+#else
+# define DISABLE_ENQCMD(1 << (X86_FEATURE_ENQCMD & 31))
+#endif
 
 #ifdef CONFIG_X86_SGX
 # define DISABLE_SGX   0
-- 
2.35.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 04/11] kernel/fork: Initialize mm's PASID

2022-01-28 Thread Fenghua Yu
A new mm doesn't have a PASID yet when it's created. Initialize
the mm's PASID on fork() or for init_mm to INVALID_IOASID (-1).

Suggested-by: Dave Hansen 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change condition to more accurate CONFIG_IOMMU_SVA (Jacob)

 include/linux/sched/mm.h | 10 ++
 kernel/fork.c| 10 ++
 mm/init-mm.c |  4 
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index aa5f09ca5bcf..c74d1edbac2f 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Routines for handling mm_structs
@@ -433,4 +434,13 @@ static inline void membarrier_update_current_mm(struct 
mm_struct *next_mm)
 }
 #endif
 
+#ifdef CONFIG_IOMMU_SVA
+static inline void mm_pasid_init(struct mm_struct *mm)
+{
+   mm->pasid = INVALID_IOASID;
+}
+#else
+static inline void mm_pasid_init(struct mm_struct *mm) {}
+#endif
+
 #endif /* _LINUX_SCHED_MM_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 6ee7551d3bd2..deacd2c17a7f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -97,6 +97,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1019,13 +1020,6 @@ static void mm_init_owner(struct mm_struct *mm, struct 
task_struct *p)
 #endif
 }
 
-static void mm_init_pasid(struct mm_struct *mm)
-{
-#ifdef CONFIG_IOMMU_SVA
-   mm->pasid = INIT_PASID;
-#endif
-}
-
 static void mm_init_uprobes_state(struct mm_struct *mm)
 {
 #ifdef CONFIG_UPROBES
@@ -1054,7 +1048,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
struct task_struct *p,
mm_init_cpumask(mm);
mm_init_aio(mm);
mm_init_owner(mm, p);
-   mm_init_pasid(mm);
+   mm_pasid_init(mm);
RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_subscriptions_init(mm);
init_tlb_flush_pending(mm);
diff --git a/mm/init-mm.c b/mm/init-mm.c
index b4a6f38fb51d..fbe7844d0912 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #ifndef INIT_MM_CONTEXT
@@ -38,6 +39,9 @@ struct mm_struct init_mm = {
.mmlist = LIST_HEAD_INIT(init_mm.mmlist),
.user_ns= _user_ns,
.cpu_bitmap = CPU_BITS_NONE,
+#ifdef CONFIG_IOMMU_SVA
+   .pasid  = INVALID_IOASID,
+#endif
INIT_MM_CONTEXT(init_mm)
 };
 
-- 
2.35.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 06/11] x86/fpu: Clear PASID when copying fpstate

2022-01-28 Thread Fenghua Yu
The kernel must allocate a Process Address Space ID (PASID) on behalf of
each process which will use ENQCMD and program it into the new MSR to
communicate the process identity to platform hardware. ENQCMD uses the
PASID stored in this MSR to tag requests from this process.

The PASID state must be cleared on fork() since fork creates a
new address space.

For clone(), it would be functionally OK to copy the PASID. However,
clearing it is _also_ functionally OK since any PASID use will trigger
the #GP handler to populate the MSR.

Copying the PASID state has two main downsides:
 * It requires differentiating fork() and clone() in the code,
   both in the FPU code and keeping tsk->pasid_activated consistent.
 * It guarantees that the PASID is out of its init state, which
   incurs small but non-zero cost on every XSAVE/XRSTOR.

The main downside of clearing the PASID at fpstate copy is the future,
one-time #GP for the thread.

Use the simplest approach: clear the PASID state both on clone() and
fork().  Rely on the #GP handler for MSR population in children.

Also, just clear the PASID bit from xfeatures if XSAVE is supported.
This will have no effect on systems that do not have PASID support.  It
is virtually zero overhead because 'dst_fpu' was just written and
the whole thing is cache hot.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Rewrite changelog (Dave Hansen).
- Move xfeature tweaking into fpu_clone() and make it unconditional
  if XSAVE is supported (Dave Hansen).

 arch/x86/kernel/fpu/core.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 8dea01ffc5c1..19821f027cb3 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -612,6 +612,13 @@ int fpu_clone(struct task_struct *dst, unsigned long 
clone_flags)
fpu_inherit_perms(dst_fpu);
fpregs_unlock();
 
+   /*
+* Children never inherit PASID state.
+* Force it to have its init value:
+*/
+   if (use_xsave())
+   dst_fpu->fpstate->regs.xsave.header.xfeatures &= 
~XFEATURE_MASK_PASID;
+
trace_x86_fpu_copy_src(src_fpu);
trace_x86_fpu_copy_dst(dst_fpu);
 
-- 
2.35.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 00/11] Re-enable ENQCMD and PASID MSR

2022-01-28 Thread Fenghua Yu
Problems in the old code to manage SVM (Shared Virtual Memory) devices
and the PASID (Process Address Space ID) led to that code being
disabled.

Subsequent discussions resulted in a far simpler approach:

1) PASID life cycle is from first allocation by a process until that
   process exits.
2) All tasks begin with PASID disabled
3) The #GP fault handler tries to fix faulting ENQCMD instructions very
   early (thus avoiding complexities of the XSAVE infrastructure)

Change Log:
v3:
- Rename mm_pasid_get() to mm_pasid_set() in patch #5 (Thomas).
- Remove ioasid_get() because it's not used any more when the IOASID
  is freed on mm exit in patch #5 (Thomas).
- Remove PASID's refcount exercise in ioasid_put() and rename
  ioasid_put() to ioasid_free() in patch #5 and #11 (Thomas).
- Add Acked-by: Josh Poimboeuf  in patch #10.

v2 can be found at 
https://lore.kernel.org/lkml/20211217220136.2762116-1-fenghua...@intel.com/

v2:
- Free PASID on mm exit instead of in exit(2) or unbind() (Thomas, AndyL,
  PeterZ)
- Directly write IA32_PASID MSR in fixup while local IRQ is still disabled
  (Thomas)
- Simplify handling ENQCMD in objtool (PeterZ and Josh)
- Define mm_pasid_get(), mm_pasid_drop(), and mm_pasid_init() in mm and
  call the functions from IOMMU (Dave Hansen).
- A few changes in the #GP fixup function (Dave Hansen, Tony Luck).
- Initial PASID value is changed to INVALID_PASID (Ashok Raj and
  Jacob Pan).
- Add mm_pasid_init(), mm_pasid_get(), and mm_pasid_drop() functions in mm.
  So the mm's PASID operations are generic for both X86 and ARM
  (Dave Hansen).
- Rename CONFIG_IOMMU_SVA_LIB to more useful and accurate
  CONFIG_IOMMU_SVA
- Use CONFIG_IOMMU_SVA for PASID processing condition (Jacob)
- The patch that cleans up old update_pasid() function is in upstream
  now (commit: 00ecd5401349 "iommu/vt-d: Clean up unused PASID updating
  functions") and therefore it's removed from this version.

v1 can be found at 
https://lore.kernel.org/lkml/20210920192349.2602141-1-fenghua...@intel.com/T/#md6d542091da1d1159eda0a44a16e57d0c0dfb209

Fenghua Yu (10):
  iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA
  mm: Change CONFIG option for mm->pasid field
  iommu/ioasid: Introduce a helper to check for valid PASIDs
  kernel/fork: Initialize mm's PASID
  iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm
exit
  x86/fpu: Clear PASID when copying fpstate
  x86/traps: Demand-populate PASID MSR via #GP
  x86/cpufeatures: Re-enable ENQCMD
  tools/objtool: Check for use of the ENQCMD instruction in the kernel
  docs: x86: Change documentation for SVA (Shared Virtual Addressing)

Peter Zijlstra (1):
  sched: Define and initialize a flag to identify valid PASID in the
task

 Documentation/x86/sva.rst | 53 ++
 arch/x86/include/asm/disabled-features.h  |  7 ++-
 arch/x86/kernel/fpu/core.c|  7 +++
 arch/x86/kernel/traps.c   | 55 +++
 drivers/iommu/Kconfig |  6 +-
 drivers/iommu/Makefile|  2 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 +-
 drivers/iommu/intel/Kconfig   |  2 +-
 drivers/iommu/intel/iommu.c   |  4 +-
 drivers/iommu/intel/svm.c |  9 ---
 drivers/iommu/ioasid.c| 38 ++---
 drivers/iommu/iommu-sva-lib.c | 39 -
 drivers/iommu/iommu-sva-lib.h |  7 +--
 include/linux/ioasid.h| 21 +++
 include/linux/mm_types.h  |  2 +-
 include/linux/sched.h |  3 +
 include/linux/sched/mm.h  | 26 +
 kernel/fork.c | 15 +++--
 mm/init-mm.c  |  4 ++
 tools/objtool/arch/x86/decode.c   | 11 +++-
 20 files changed, 197 insertions(+), 119 deletions(-)

-- 
2.35.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 02/11] mm: Change CONFIG option for mm->pasid field

2022-01-28 Thread Fenghua Yu
This currently depends on CONFIG_IOMMU_SUPPORT. But it is only
needed when CONFIG_IOMMU_SVA option is enabled.

Change the CONFIG guards around definition and initialization
of mm->pasid field.

Suggested-by: Jacob Pan 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change condition to more accurate CONFIG_IOMMU_SVA (Jacob and Tony)

 include/linux/mm_types.h | 2 +-
 kernel/fork.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 9db36dc5d4cf..c959116abd95 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -630,7 +630,7 @@ struct mm_struct {
 #endif
struct work_struct async_put_work;
 
-#ifdef CONFIG_IOMMU_SUPPORT
+#ifdef CONFIG_IOMMU_SVA
u32 pasid;
 #endif
} __randomize_layout;
diff --git a/kernel/fork.c b/kernel/fork.c
index d75a528f7b21..6ee7551d3bd2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1021,7 +1021,7 @@ static void mm_init_owner(struct mm_struct *mm, struct 
task_struct *p)
 
 static void mm_init_pasid(struct mm_struct *mm)
 {
-#ifdef CONFIG_IOMMU_SUPPORT
+#ifdef CONFIG_IOMMU_SVA
mm->pasid = INIT_PASID;
 #endif
 }
-- 
2.35.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-27 Thread Fenghua Yu
Hi, Thomas,

On Wed, Jan 26, 2022 at 10:38:04PM +0100, Thomas Gleixner wrote:
> On Wed, Jan 26 2022 at 09:36, Fenghua Yu wrote:
> > On Wed, Jan 26, 2022 at 03:23:42PM +0100, Thomas Gleixner wrote:
> >> On Tue, Jan 25 2022 at 07:18, Fenghua Yu wrote:
> >> While looking at ioasid_put() usage I tripped over the following UAF
> >> issue:
> >> 
> >> --- a/drivers/iommu/intel/iommu.c
> >> +++ b/drivers/iommu/intel/iommu.c
> >> @@ -4817,8 +4817,10 @@ static int aux_domain_add_dev(struct dma
> >>auxiliary_unlink_device(domain, dev);
> >>  link_failed:
> >>spin_unlock_irqrestore(_domain_lock, flags);
> >> -  if (list_empty(>subdevices) && domain->default_pasid > 0)
> >> +  if (list_empty(>subdevices) && domain->default_pasid > 0) {
> >>ioasid_put(domain->default_pasid);
> >> +  domain->default_pasid = INVALID_IOASID;
> >> +  }
> >>  
> >>return ret;
> >>  }
> >> @@ -4847,8 +4849,10 @@ static void aux_domain_remove_dev(struct
> >>  
> >>spin_unlock_irqrestore(_domain_lock, flags);
> >>  
> >> -  if (list_empty(>subdevices) && domain->default_pasid > 0)
> >> +  if (list_empty(>subdevices) && domain->default_pasid > 0) {
> >>ioasid_put(domain->default_pasid);
> >> +  domain->default_pasid = INVALID_IOASID;
> >> +  }
> >>  }
> >>  
> >>  static int prepare_domain_attach_device(struct iommu_domain *domain,
> >
> > The above patch fixes an existing issue. I will put it in a separate patch,
> > right?
> 
> Correct.
> 
> > It cannot be applied cleanly to the upstream tree. Do you want me to base
> > the above patch (and the whole patch set) to the upstream tree or a specific
> > tip branch?
> 
> Against Linus tree please so that the bugfix applies.
> 
> > I will fold the following patch into patch #5. The patch #11 (the doc patch)
> > also needs to remove one paragraph talking about refcount.
> >
> > So I will send the whole patch set with the following changes:
> > 1. One new bug fix patch (the above patch)

When I study your above aux_domain bug fix path, I find more aux_domain bugs.
But then I find aux_domain will be completely removed because all aux_domain
related callbacks are not called and are dead code (no wonder there are
so many bugs in aux_domain). Please see this series: 
https://lore.kernel.org/linux-iommu/20220124071103.2097118-4-baolu...@linux.intel.com/
For the series, Baolu confirms that he is "pretty sure that should be part
of v5.18". And I don't find the series calls any IOASID function after
removing the aux_domain code.

So that means we don't need to fix those issues in the dead aux_domain
code any more because it will be completely removed in 5.18, right?

If you agree, I will not include the aux_domain fix patch or any other
aux_domain fix patches in the up-coming v3. Is that OK?

> > 2. Updated patch #5 (with the following patch folded)

I will still change ioasid_put() in the aux_domain callbacks to ioasid_free()
in patch #5. So compilation will pass. Baolu's series will remove
the entire aux_domain code in 5.18.

> > 3. Updated patch #11 (removing refcount description)
> 
> Looks good.
> 

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-26 Thread Fenghua Yu
Hi, Thomas,

On Wed, Jan 26, 2022 at 03:23:42PM +0100, Thomas Gleixner wrote:
> On Tue, Jan 25 2022 at 07:18, Fenghua Yu wrote:
> > On Mon, Jan 24, 2022 at 09:55:56PM +0100, Thomas Gleixner wrote:
> >  /**
> >   * ioasid_put - Release a reference to an ioasid
> >   * @ioasid: the ID to remove
> 
> which in turn makes ioasid_put() a misnomer and the whole refcounting of
> the ioasid a pointless exercise.
> 
> While looking at ioasid_put() usage I tripped over the following UAF
> issue:
> 
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4817,8 +4817,10 @@ static int aux_domain_add_dev(struct dma
>   auxiliary_unlink_device(domain, dev);
>  link_failed:
>   spin_unlock_irqrestore(_domain_lock, flags);
> - if (list_empty(>subdevices) && domain->default_pasid > 0)
> + if (list_empty(>subdevices) && domain->default_pasid > 0) {
>   ioasid_put(domain->default_pasid);
> + domain->default_pasid = INVALID_IOASID;
> + }
>  
>   return ret;
>  }
> @@ -4847,8 +4849,10 @@ static void aux_domain_remove_dev(struct
>  
>   spin_unlock_irqrestore(_domain_lock, flags);
>  
> - if (list_empty(>subdevices) && domain->default_pasid > 0)
> + if (list_empty(>subdevices) && domain->default_pasid > 0) {
>   ioasid_put(domain->default_pasid);
> + domain->default_pasid = INVALID_IOASID;
> + }
>  }
>  
>  static int prepare_domain_attach_device(struct iommu_domain *domain,

The above patch fixes an existing issue. I will put it in a separate patch,
right?

It cannot be applied cleanly to the upstream tree. Do you want me to base
the above patch (and the whole patch set) to the upstream tree or a specific
tip branch?

I will fold the following patch into patch #5. The patch #11 (the doc patch)
also needs to remove one paragraph talking about refcount.

So I will send the whole patch set with the following changes:
1. One new bug fix patch (the above patch)
2. Updated patch #5 (with the following patch folded)
3. Updated patch #11 (removing refcount description)

Are the changes OK to you?
 
> 
> Vs. ioasid_put() I think we should fold the following:
> 
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4818,7 +4818,7 @@ static int aux_domain_add_dev(struct dma
>  link_failed:
>   spin_unlock_irqrestore(_domain_lock, flags);
>   if (list_empty(>subdevices) && domain->default_pasid > 0) {
> - ioasid_put(domain->default_pasid);
> + ioasid_free(domain->default_pasid);
>   domain->default_pasid = INVALID_IOASID;
>   }
>  
> @@ -4850,7 +4850,7 @@ static void aux_domain_remove_dev(struct
>   spin_unlock_irqrestore(_domain_lock, flags);
>  
>   if (list_empty(>subdevices) && domain->default_pasid > 0) {
> - ioasid_put(domain->default_pasid);
> + ioasid_free(domain->default_pasid);
>   domain->default_pasid = INVALID_IOASID;
>   }
>  }
> --- a/drivers/iommu/ioasid.c
> +++ b/drivers/iommu/ioasid.c
> @@ -2,7 +2,7 @@
>  /*
>   * I/O Address Space ID allocator. There is one global IOASID space, split 
> into
>   * subsets. Users create a subset with DECLARE_IOASID_SET, then allocate and
> - * free IOASIDs with ioasid_alloc and ioasid_put.
> + * free IOASIDs with ioasid_alloc() and ioasid_free().
>   */
>  #include 
>  #include 
> @@ -15,7 +15,6 @@ struct ioasid_data {
>   struct ioasid_set *set;
>   void *private;
>   struct rcu_head rcu;
> - refcount_t refs;
>  };
>  
>  /*
> @@ -315,7 +314,6 @@ ioasid_t ioasid_alloc(struct ioasid_set
>  
>   data->set = set;
>   data->private = private;
> - refcount_set(>refs, 1);
>  
>   /*
>* Custom allocator needs allocator data to perform platform specific
> @@ -348,17 +346,11 @@ ioasid_t ioasid_alloc(struct ioasid_set
>  EXPORT_SYMBOL_GPL(ioasid_alloc);
>  
>  /**
> - * ioasid_put - Release a reference to an ioasid
> + * ioasid_free - Free an ioasid
>   * @ioasid: the ID to remove
> - *
> - * Put a reference to the IOASID, free it when the number of references 
> drops to
> - * zero.
> - *
> - * Return: %true if the IOASID was freed, %false otherwise.
>   */
> -bool ioasid_put(ioasid_t ioasid)
> +void ioasid_free(ioasid_t ioasid)
>  {
> - bool free = false;
>   struct ioasid_data *ioasid_data;
>  
>   spin_lock(_allocator_lock);
> @@ -368,10 +360,6 @@ bool ioasid_put(ioasid_t ioasid)
>   goto exit_unlock;
>   }
>  

Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-25 Thread Fenghua Yu
Hi, Thomas,

On Mon, Jan 24, 2022 at 09:55:56PM +0100, Thomas Gleixner wrote:
> On Mon, Jan 24 2022 at 12:52, Fenghua Yu wrote:
> > On Mon, Jan 24, 2022 at 09:36:00PM +0100, Thomas Gleixner wrote:
> >> On Mon, Jan 24 2022 at 21:21, Thomas Gleixner wrote:
> > Ah. This patch should remove ioasid_get(). So I will change this patch
> > as follows:
> >
> > 1. Remove ioasid_get() because it's not used any more when IOASID's
> >refcount is set to 1 once the IOASID is allocated and is freed on mm 
> > exit.
> > 2. Change mm_pasid_get() to mm_pasid_set().
> 
> Yes. Just resend this one. No need to post the full queue again.

Here is the updated patch #5. Thank you very much for your review!

>From 895f2c15385dc93acf39700014458fe275d996a2 Mon Sep 17 00:00:00 2001
From: Fenghua Yu 
Date: Thu, 18 Nov 2021 14:06:40 -0800
Subject: [PATCH v2.1 05/11] iommu/sva: Assign a PASID to mm on PASID
 allocation and free it on mm exit

To avoid complexity of updating each thread's PASID status (e.g. sending
IPI to update IA32_PASID MSR) on allocating and freeing PASID, once
allocated and assigned to an mm, the PASID stays with the mm for the
rest of the mm's lifetime. A reference to the PASID is taken on
allocating the PASID. Binding/unbinding the PASID won't change refcount.
The reference is dropped on mm exit and thus the PASID is freed.

Two helpers mm_pasid_set() and mm_pasid_drop() are defined in mm because
the PASID operations handle the pasid member in mm_struct and should be
part of mm operations. Unused ioasid_get() and iommu_sva_free_pasid()
are deleted.

20-bit PASID allows up to 1M processes bound to PASIDs at the same time.
With cgroups and other controls that might limit the number of process
creation, the limited number of PASIDs is not a realistic issue for
lazy PASID free.

Suggested-by: Dave Hansen 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2.1:
- Rename mm_pasid_get() to mm_pasid_set() (Thomas).
- Remove ioasid_get() because it's not used any more when IOASID's
  refcount is set to 1 once the IOASID is allocated and is freed
  on mm exit (Thomas).

v2:
- Free PASID on mm exit instead of in exit(2) or unbind() (Thomas, AndyL,
  PeterZ)
- Add mm_pasid_init(), mm_pasid_get(), and mm_pasid_drop() functions in mm.
  So the mm's PASID operations are generic for both X86 and ARM
  (Dave Hansen)

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 +--
 drivers/iommu/intel/svm.c |  9 -
 drivers/iommu/ioasid.c| 17 
 drivers/iommu/iommu-sva-lib.c | 39 ++-
 drivers/iommu/iommu-sva-lib.h |  1 -
 include/linux/ioasid.h|  5 ---
 include/linux/sched/mm.h  | 16 
 kernel/fork.c |  1 +
 8 files changed, 30 insertions(+), 63 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a737ba5f727e..22ddd05bbdcd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -340,14 +340,12 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
bond->smmu_mn = arm_smmu_mmu_notifier_get(smmu_domain, mm);
if (IS_ERR(bond->smmu_mn)) {
ret = PTR_ERR(bond->smmu_mn);
-   goto err_free_pasid;
+   goto err_free_bond;
}
 
list_add(>list, >bonds);
return >sva;
 
-err_free_pasid:
-   iommu_sva_free_pasid(mm);
 err_free_bond:
kfree(bond);
return ERR_PTR(ret);
@@ -377,7 +375,6 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle)
if (refcount_dec_and_test(>refs)) {
list_del(>list);
arm_smmu_mmu_notifier_put(bond->smmu_mn);
-   iommu_sva_free_pasid(bond->mm);
kfree(bond);
}
mutex_unlock(_lock);
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5b5d69b04fcc..51ac2096b3da 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -514,11 +514,6 @@ static int intel_svm_alloc_pasid(struct device *dev, 
struct mm_struct *mm,
return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
 }
 
-static void intel_svm_free_pasid(struct mm_struct *mm)
-{
-   iommu_sva_free_pasid(mm);
-}
-
 static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
   struct device *dev,
   struct mm_struct *mm,
@@ -662,8 +657,6 @@ static int intel_svm_unbind_mm(struct device *dev, u32 
pasid)
kfree(svm);
}
}
-   /* Drop a PASID reference and free it if no reference. */
-   intel_svm_free_pasid(mm);
}
 out:
retur

Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-24 Thread Fenghua Yu
Hi, Thomas,

On Mon, Jan 24, 2022 at 09:36:00PM +0100, Thomas Gleixner wrote:
> On Mon, Jan 24 2022 at 21:21, Thomas Gleixner wrote:
> >
> > Hrm. This is odd.
> >
> >> +/* Associate a PASID with an mm_struct: */
> >> +static inline void mm_pasid_get(struct mm_struct *mm, u32 pasid)
> >> +{
> >> +  mm->pasid = pasid;
> >> +}
> >
> > This does not get anything. It sets the allocated PASID in the mm. The
> > refcount on the PASID was already taken by the allocation. So this
> > should be mm_pasid_set() or mm_pasid_install(), right?
> 
> And as a result of all this ioasid_get() is now left without users...
> 
> Thanks,

Ah. This patch should remove ioasid_get(). So I will change this patch
as follows:

1. Remove ioasid_get() because it's not used any more when IOASID's
   refcount is set to 1 once the IOASID is allocated and is freed on mm exit.
2. Change mm_pasid_get() to mm_pasid_set().

Will that work?

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-01-24 Thread Fenghua Yu
Hi, Thomas,

On Mon, Jan 24, 2022 at 09:21:24PM +0100, Thomas Gleixner wrote:
> On Fri, Dec 17 2021 at 22:01, Fenghua Yu wrote:
> > diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
> > index bd41405d34e9..ee2294e02716 100644
> > --- a/drivers/iommu/iommu-sva-lib.c
> > +++ b/drivers/iommu/iommu-sva-lib.c
> > @@ -18,8 +18,7 @@ static DECLARE_IOASID_SET(iommu_sva_pasid);
> >   *
> >   * Try to allocate a PASID for this mm, or take a reference to the 
> > existing one
> >   * provided it fits within the [@min, @max] range. On success the PASID is
> > - * available in mm->pasid, and must be released with 
> > iommu_sva_free_pasid().
> > - * @min must be greater than 0, because 0 indicates an unused mm->pasid.
> > + * available in mm->pasid and will be available for the lifetime of the mm.
> >   *
> >   * Returns 0 on success and < 0 on error.
> >   */
> > @@ -33,38 +32,24 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, 
> > ioasid_t min, ioasid_t max)
> > return -EINVAL;
> >  
> > mutex_lock(_sva_lock);
> > -   if (mm->pasid) {
> > -   if (mm->pasid >= min && mm->pasid <= max)
> > -   ioasid_get(mm->pasid);
> > -   else
> > +   /* Is a PASID already associated with this mm? */
> > +   if (pasid_valid(mm->pasid)) {
> > +   if (mm->pasid < min || mm->pasid >= max)
> > ret = -EOVERFLOW;
> > -   } else {
> > -   pasid = ioasid_alloc(_sva_pasid, min, max, mm);
> > -   if (pasid == INVALID_IOASID)
> > -   ret = -ENOMEM;
> > -   else
> > -   mm->pasid = pasid;
> > +   goto out;
> > }
> > +
> > +   pasid = ioasid_alloc(_sva_pasid, min, max, mm);
> > +   if (!pasid_valid(pasid))
> > +   ret = -ENOMEM;
> > +   else
> > +   mm_pasid_get(mm, pasid);
> 
> Hrm. This is odd.
> 
> > +/* Associate a PASID with an mm_struct: */
> > +static inline void mm_pasid_get(struct mm_struct *mm, u32 pasid)
> > +{
> > +   mm->pasid = pasid;
> > +}
> 
> This does not get anything. It sets the allocated PASID in the mm. The
> refcount on the PASID was already taken by the allocation. So this
> should be mm_pasid_set() or mm_pasid_install(), right?

Ok. Will change mm_pasid_get() to mm_pasid_set().

Thank you very much for your review!

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 00/11] Re-enable ENQCMD and PASID MSR

2021-12-27 Thread Fenghua Yu
Hi, Dear Maintainers,

On Fri, Dec 17, 2021 at 10:01:25PM +, Fenghua Yu wrote:
> Problems in the old code to manage SVM (Shared Virtual Memory) devices
> and the PASID (Process Address Space ID) led to that code being
> disabled.
> 
> Subsequent discussions resulted in a far simpler approach:
> 
> 1) PASID life cycle is from first allocation by a process until that
>process exits.
> 2) All tasks begin with PASID disabled
> 3) The #GP fault handler tries to fix faulting ENQCMD instructions very
>early (thus avoiding complexities of the XSAVE infrastructure)

Any comments on this series?

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 10/11] tools/objtool: Check for use of the ENQCMD instruction in the kernel

2021-12-27 Thread Fenghua Yu
Hi, Josh,

On Fri, Dec 17, 2021 at 02:57:06PM -0800, Josh Poimboeuf wrote:
> On Fri, Dec 17, 2021 at 10:01:35PM +0000, Fenghua Yu wrote:
> > The ENQCMD implicitly accesses the PASID_MSR to fill in the pasid field
> > of the descriptor being submitted to an accelerator. But there is no
> > precise (and stable across kernel changes) point at which the PASID_MSR
> > is updated from the value for one task to the next.
> > 
> > Kernel code that uses accelerators must always use the ENQCMDS instruction
> > which does not access the PASID_MSR.
> > 
> > Check for use of the ENQCMD instruction in the kernel and warn on its
> > usage.
> > 
> > Signed-off-by: Fenghua Yu 
> > Reviewed-by: Tony Luck 
> > ---
> > v2:
> > - Simplify handling ENQCMD (PeterZ and Josh)
> 
> Acked-by: Josh Poimboeuf 

Thank you!

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 01/11] iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA

2021-12-17 Thread Fenghua Yu
This CONFIG option originally only referred to the Shared
Virtual Address (SVA) library. But it is now also used for
non-library portions of code.

Drop the "_LIB" suffix so that there is just one configuration
options for all code relating to SVA.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Add this patch for more meaningful name CONFIG_IOMMU_SVA

 drivers/iommu/Kconfig | 6 +++---
 drivers/iommu/Makefile| 2 +-
 drivers/iommu/intel/Kconfig   | 2 +-
 drivers/iommu/iommu-sva-lib.h | 6 +++---
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 3eb68fa1b8cc..c79a0df090c0 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -144,8 +144,8 @@ config IOMMU_DMA
select IRQ_MSI_IOMMU
select NEED_SG_DMA_LENGTH
 
-# Shared Virtual Addressing library
-config IOMMU_SVA_LIB
+# Shared Virtual Addressing
+config IOMMU_SVA
bool
select IOASID
 
@@ -379,7 +379,7 @@ config ARM_SMMU_V3
 config ARM_SMMU_V3_SVA
bool "Shared Virtual Addressing support for the ARM SMMUv3"
depends on ARM_SMMU_V3
-   select IOMMU_SVA_LIB
+   select IOMMU_SVA
select MMU_NOTIFIER
help
  Support for sharing process address spaces with devices using the
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index bc7f730edbb0..44475a9b3eea 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,6 +27,6 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
-obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o io-pgfault.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva-lib.o io-pgfault.o
 obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
 obj-$(CONFIG_APPLE_DART) += apple-dart.o
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 247d0f2d5fdf..39a06d245f12 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -52,7 +52,7 @@ config INTEL_IOMMU_SVM
select PCI_PRI
select MMU_NOTIFIER
select IOASID
-   select IOMMU_SVA_LIB
+   select IOMMU_SVA
help
  Shared Virtual Memory (SVM) provides a facility for devices
  to access DMA resources through process address space by
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 031155010ca8..95dc3ebc1928 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -17,7 +17,7 @@ struct device;
 struct iommu_fault;
 struct iopf_queue;
 
-#ifdef CONFIG_IOMMU_SVA_LIB
+#ifdef CONFIG_IOMMU_SVA
 int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
 
 int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
@@ -28,7 +28,7 @@ struct iopf_queue *iopf_queue_alloc(const char *name);
 void iopf_queue_free(struct iopf_queue *queue);
 int iopf_queue_discard_partial(struct iopf_queue *queue);
 
-#else /* CONFIG_IOMMU_SVA_LIB */
+#else /* CONFIG_IOMMU_SVA */
 static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
 {
return -ENODEV;
@@ -64,5 +64,5 @@ static inline int iopf_queue_discard_partial(struct 
iopf_queue *queue)
 {
return -ENODEV;
 }
-#endif /* CONFIG_IOMMU_SVA_LIB */
+#endif /* CONFIG_IOMMU_SVA */
 #endif /* _IOMMU_SVA_LIB_H */
-- 
2.34.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 08/11] x86/traps: Demand-populate PASID MSR via #GP

2021-12-17 Thread Fenghua Yu
All tasks start with PASID state disabled. This means that the first
time they execute an ENQCMD instruction they will take a #GP fault.

Modify the #GP fault handler to check if the "mm" for the task has
already been allocated a PASID. If so, try to fix the #GP fault by
loading the IA32_PASID MSR.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Directly write IA32_PASID MSR in fixup while local IRQ is still disabled
  (Thomas)
- Move #ifdef over to CONFIG_IOMMU_SVA since it is what
  defines mm->pasid and ->pasid_activated (Dave Hansen).
- Rename try_fixup_pasid() -> try_fixup_enqcmd_gp(). This
  code really is highly specific to ENQCMD, not PASIDs (Dave Hansen).
- Add lockdep assert and comment about context (Dave Hansen).
- Re-flow the if() mess (Dave Hansen).

 arch/x86/kernel/traps.c | 55 +
 1 file changed, 55 insertions(+)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c9d566dcf89a..7ef00dee35be 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -559,6 +560,57 @@ static bool fixup_iopl_exception(struct pt_regs *regs)
return true;
 }
 
+/*
+ * The unprivileged ENQCMD instruction generates #GPs if the
+ * IA32_PASID MSR has not been populated.  If possible, populate
+ * the MSR from a PASID previously allocated to the mm.
+ */
+static bool try_fixup_enqcmd_gp(void)
+{
+#ifdef CONFIG_IOMMU_SVA
+   u32 pasid;
+
+   /*
+* MSR_IA32_PASID is managed using XSAVE.  Directly
+* writing to the MSR is only possible when fpregs
+* are valid and the fpstate is not.  This is
+* guaranteed when handling a userspace exception
+* in *before* interrupts are re-enabled.
+*/
+   lockdep_assert_irqs_disabled();
+
+   /*
+* Hardware without ENQCMD will not generate
+* #GPs that can be fixed up here.
+*/
+   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+   return false;
+
+   pasid = current->mm->pasid;
+
+   /*
+* If the mm has not been allocated a
+* PASID, the #GP can not be fixed up.
+*/
+   if (!pasid_valid(pasid))
+   return false;
+
+   /*
+* Did this thread already have its PASID activated?
+* If so, the #GP must be from something else.
+*/
+   if (current->pasid_activated)
+   return false;
+
+   wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
+   current->pasid_activated = 1;
+
+   return true;
+#else
+   return false;
+#endif
+}
+
 DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 {
char desc[sizeof(GPFSTR) + 50 + 2*sizeof(unsigned long) + 1] = GPFSTR;
@@ -567,6 +619,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
unsigned long gp_addr;
int ret;
 
+   if (user_mode(regs) && try_fixup_enqcmd_gp())
+   return;
+
cond_local_irq_enable(regs);
 
if (static_cpu_has(X86_FEATURE_UMIP)) {
-- 
2.34.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 09/11] x86/cpufeatures: Re-enable ENQCMD

2021-12-17 Thread Fenghua Yu
Since ENQCMD is handled by #GP fix up, it can be re-enabled.

The ENQCMD feature can only be used if CONFIG_INTEL_IOMMU_SVM is set. Add
X86_FEATURE_ENQCMD to the disabled features mask as appropriate so that
cpu_feature_enabled() can be used to check the feature.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Update the commit message (Tony).

 arch/x86/include/asm/disabled-features.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/disabled-features.h 
b/arch/x86/include/asm/disabled-features.h
index 8f28fafa98b3..1231d63f836d 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -56,8 +56,11 @@
 # define DISABLE_PTI   (1 << (X86_FEATURE_PTI & 31))
 #endif
 
-/* Force disable because it's broken beyond repair */
-#define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31))
+#ifdef CONFIG_INTEL_IOMMU_SVM
+# define DISABLE_ENQCMD0
+#else
+# define DISABLE_ENQCMD(1 << (X86_FEATURE_ENQCMD & 31))
+#endif
 
 #ifdef CONFIG_X86_SGX
 # define DISABLE_SGX   0
-- 
2.34.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 10/11] tools/objtool: Check for use of the ENQCMD instruction in the kernel

2021-12-17 Thread Fenghua Yu
The ENQCMD implicitly accesses the PASID_MSR to fill in the pasid field
of the descriptor being submitted to an accelerator. But there is no
precise (and stable across kernel changes) point at which the PASID_MSR
is updated from the value for one task to the next.

Kernel code that uses accelerators must always use the ENQCMDS instruction
which does not access the PASID_MSR.

Check for use of the ENQCMD instruction in the kernel and warn on its
usage.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Simplify handling ENQCMD (PeterZ and Josh)

 tools/objtool/arch/x86/decode.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 4d6d7fc13255..11ffa0e53f84 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -112,7 +112,7 @@ int arch_decode_instruction(struct objtool_file *file, 
const struct section *sec
const struct elf *elf = file->elf;
struct insn insn;
int x86_64, ret;
-   unsigned char op1, op2,
+   unsigned char op1, op2, op3,
  rex = 0, rex_b = 0, rex_r = 0, rex_w = 0, rex_x = 0,
  modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
  sib = 0, /* sib_scale = 0, */ sib_index = 0, sib_base = 0;
@@ -139,6 +139,7 @@ int arch_decode_instruction(struct objtool_file *file, 
const struct section *sec
 
op1 = insn.opcode.bytes[0];
op2 = insn.opcode.bytes[1];
+   op3 = insn.opcode.bytes[2];
 
if (insn.rex_prefix.nbytes) {
rex = insn.rex_prefix.bytes[0];
@@ -491,6 +492,14 @@ int arch_decode_instruction(struct objtool_file *file, 
const struct section *sec
/* nopl/nopw */
*type = INSN_NOP;
 
+   } else if (op2 == 0x38 && op3 == 0xf8) {
+   if (insn.prefixes.nbytes == 1 &&
+   insn.prefixes.bytes[0] == 0xf2) {
+   /* ENQCMD cannot be used in the kernel. */
+   WARN("ENQCMD instruction at %s:%lx", sec->name,
+offset);
+   }
+
} else if (op2 == 0xa0 || op2 == 0xa8) {
 
/* push fs/gs */
-- 
2.34.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 07/11] sched: Define and initialize a flag to identify valid PASID in the task

2021-12-17 Thread Fenghua Yu
From: Peter Zijlstra 

Add a new single bit field to the task structure to track whether this task
has initialized the IA32_PASID MSR to the mm's PASID.

Initialize the field to zero when creating a new task with fork/clone.

Signed-off-by: Peter Zijlstra 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change condition to more accurate CONFIG_IOMMU_SVA (Jacob)

 include/linux/sched.h | 3 +++
 kernel/fork.c | 4 
 2 files changed, 7 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78c351e35fec..41a0b5703f94 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -933,6 +933,9 @@ struct task_struct {
/* Recursion prevention for eventfd_signal() */
unsignedin_eventfd_signal:1;
 #endif
+#ifdef CONFIG_IOMMU_SVA
+   unsignedpasid_activated:1;
+#endif
 
unsigned long   atomic_flags; /* Flags requiring atomic 
access. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 3adad225cc09..cd297926b6f2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -967,6 +967,10 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig, int node)
tsk->use_memdelay = 0;
 #endif
 
+#ifdef CONFIG_IOMMU_SVA
+   tsk->pasid_activated = 0;
+#endif
+
 #ifdef CONFIG_MEMCG
tsk->active_memcg = NULL;
 #endif
-- 
2.34.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 03/11] iommu/ioasid: Introduce a helper to check for valid PASIDs

2021-12-17 Thread Fenghua Yu
pasid_valid() is defined to check if a given PASID is valid.

Suggested-by: Ashok Raj 
Suggested-by: Jacob Pan 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Define a helper pasid_valid() (Ashok, Jacob, Thomas, Tony)

 include/linux/ioasid.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index e9dacd4b9f6b..2237f64dbaae 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -41,6 +41,10 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
 int ioasid_register_allocator(struct ioasid_allocator_ops *allocator);
 void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator);
 int ioasid_set_data(ioasid_t ioasid, void *data);
+static inline bool pasid_valid(ioasid_t ioasid)
+{
+   return ioasid != INVALID_IOASID;
+}
 
 #else /* !CONFIG_IOASID */
 static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
@@ -78,5 +82,10 @@ static inline int ioasid_set_data(ioasid_t ioasid, void 
*data)
return -ENOTSUPP;
 }
 
+static inline bool pasid_valid(ioasid_t ioasid)
+{
+   return false;
+}
+
 #endif /* CONFIG_IOASID */
 #endif /* __LINUX_IOASID_H */
-- 
2.34.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 06/11] x86/fpu: Clear PASID when copying fpstate

2021-12-17 Thread Fenghua Yu
The kernel must allocate a Process Address Space ID (PASID) on behalf of
each process which will use ENQCMD and program it into the new MSR to
communicate the process identity to platform hardware. ENQCMD uses the
PASID stored in this MSR to tag requests from this process.

The PASID state must be cleared on fork() since fork creates a
new address space.

For clone(), it would be functionally OK to copy the PASID. However,
clearing it is _also_ functionally OK since any PASID use will trigger
the #GP handler to populate the MSR.

Copying the PASID state has two main downsides:
 * It requires differentiating fork() and clone() in the code,
   both in the FPU code and keeping tsk->pasid_activated consistent.
 * It guarantees that the PASID is out of its init state, which
   incurs small but non-zero cost on every XSAVE/XRSTOR.

The main downside of clearing the PASID at fpstate copy is the future,
one-time #GP for the thread.

Use the simplest approach: clear the PASID state both on clone() and
fork().  Rely on the #GP handler for MSR population in children.

Also, just clear the PASID bit from xfeatures if XSAVE is supported.
This will have no effect on systems that do not have PASID support.  It
is virtually zero overhead because 'dst_fpu' was just written and
the whole thing is cache hot.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Rewrite changelog (Dave Hansen).
- Move xfeature tweaking into fpu_clone() and make it unconditional
  if XSAVE is supported (Dave Hansen).

 arch/x86/kernel/fpu/core.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 8ea306b1bf8e..13fc0ea52237 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -518,6 +518,13 @@ int fpu_clone(struct task_struct *dst, unsigned long 
clone_flags)
fpu_inherit_perms(dst_fpu);
fpregs_unlock();
 
+   /*
+* Children never inherit PASID state.
+* Force it to have its init value:
+*/
+   if (use_xsave())
+   dst_fpu->fpstate->regs.xsave.header.xfeatures &= 
~XFEATURE_MASK_PASID;
+
trace_x86_fpu_copy_src(src_fpu);
trace_x86_fpu_copy_dst(dst_fpu);
 
-- 
2.34.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2021-12-17 Thread Fenghua Yu
To avoid complexity of updating each thread's PASID status (e.g. sending
IPI to update IA32_PASID MSR) on allocating and freeing PASID, once
allocated and assigned to an mm, the PASID stays with the mm for the
rest of the mm's lifetime. A reference to the PASID is taken on
allocating the PASID. Binding/unbinding the PASID won't change refcount.
The reference is dropped on mm exit and thus the PASID is freed.

Two helpers mm_pasid_get() and mm_pasid_drop() are defined in mm because
the PASID operations handle the pasid member in mm_struct and should be
part of mm operations.

20-bit PASID allows up to 1M processes bound to PASIDs at the same time.
With cgroups and other controls that might limit the number of process
creation, the limited number of PASIDs is not a realistic issue for
lazy PASID free.

Suggested-by: Dave Hansen 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Free PASID on mm exit instead of in exit(2) or unbind() (Thomas, AndyL,
  PeterZ)
- Add mm_pasid_init(), mm_pasid_get(), and mm_pasid_drop() functions in mm.
  So the mm's PASID operations are generic for both X86 and ARM
  (Dave Hansen)

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 +--
 drivers/iommu/intel/svm.c |  9 -
 drivers/iommu/iommu-sva-lib.c | 39 ++-
 drivers/iommu/iommu-sva-lib.h |  1 -
 include/linux/sched/mm.h  | 16 
 kernel/fork.c |  1 +
 6 files changed, 30 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index ee66d1f4cb81..c153ffae5462 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -340,14 +340,12 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
bond->smmu_mn = arm_smmu_mmu_notifier_get(smmu_domain, mm);
if (IS_ERR(bond->smmu_mn)) {
ret = PTR_ERR(bond->smmu_mn);
-   goto err_free_pasid;
+   goto err_free_bond;
}
 
list_add(>list, >bonds);
return >sva;
 
-err_free_pasid:
-   iommu_sva_free_pasid(mm);
 err_free_bond:
kfree(bond);
return ERR_PTR(ret);
@@ -377,7 +375,6 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle)
if (refcount_dec_and_test(>refs)) {
list_del(>list);
arm_smmu_mmu_notifier_put(bond->smmu_mn);
-   iommu_sva_free_pasid(bond->mm);
kfree(bond);
}
mutex_unlock(_lock);
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5b5d69b04fcc..51ac2096b3da 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -514,11 +514,6 @@ static int intel_svm_alloc_pasid(struct device *dev, 
struct mm_struct *mm,
return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
 }
 
-static void intel_svm_free_pasid(struct mm_struct *mm)
-{
-   iommu_sva_free_pasid(mm);
-}
-
 static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
   struct device *dev,
   struct mm_struct *mm,
@@ -662,8 +657,6 @@ static int intel_svm_unbind_mm(struct device *dev, u32 
pasid)
kfree(svm);
}
}
-   /* Drop a PASID reference and free it if no reference. */
-   intel_svm_free_pasid(mm);
}
 out:
return ret;
@@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device *dev, 
struct mm_struct *mm, void
}
 
sva = intel_svm_bind_mm(iommu, dev, mm, flags);
-   if (IS_ERR_OR_NULL(sva))
-   intel_svm_free_pasid(mm);
mutex_unlock(_mutex);
 
return sva;
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index bd41405d34e9..ee2294e02716 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -18,8 +18,7 @@ static DECLARE_IOASID_SET(iommu_sva_pasid);
  *
  * Try to allocate a PASID for this mm, or take a reference to the existing one
  * provided it fits within the [@min, @max] range. On success the PASID is
- * available in mm->pasid, and must be released with iommu_sva_free_pasid().
- * @min must be greater than 0, because 0 indicates an unused mm->pasid.
+ * available in mm->pasid and will be available for the lifetime of the mm.
  *
  * Returns 0 on success and < 0 on error.
  */
@@ -33,38 +32,24 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t 
min, ioasid_t max)
return -EINVAL;
 
mutex_lock(_sva_lock);
-   if (mm->pasid) {
-   if (mm->pasid >= min && mm->pasid <= max)
-   ioasid_get(mm->pasid);
-   else
+   /* Is a PASID a

[PATCH v2 11/11] docs: x86: Change documentation for SVA (Shared Virtual Addressing)

2021-12-17 Thread Fenghua Yu
Since allocating, freeing and fixing up PASID are changed, the
documentation is updated to reflect the changes.

Originally-by: Ashok Raj 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Update life cycle info (Tony and Thomas).
- Update initial PASID value to INVALID_IOASID on fork().

 Documentation/x86/sva.rst | 58 +++
 1 file changed, 46 insertions(+), 12 deletions(-)

diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
index 076efd51ef1f..92341f26e525 100644
--- a/Documentation/x86/sva.rst
+++ b/Documentation/x86/sva.rst
@@ -104,18 +104,52 @@ The MSR must be configured on each logical CPU before any 
application
 thread can interact with a device. Threads that belong to the same
 process share the same page tables, thus the same MSR value.
 
-PASID is cleared when a process is created. The PASID allocation and MSR
-programming may occur long after a process and its threads have been created.
-One thread must call iommu_sva_bind_device() to allocate the PASID for the
-process. If a thread uses ENQCMD without the MSR first being populated, a #GP
-will be raised. The kernel will update the PASID MSR with the PASID for all
-threads in the process. A single process PASID can be used simultaneously
-with multiple devices since they all share the same address space.
-
-One thread can call iommu_sva_unbind_device() to free the allocated PASID.
-The kernel will clear the PASID MSR for all threads belonging to the process.
-
-New threads inherit the MSR value from the parent.
+PASID Life Cycle Management
+==
+
+PASID is initialized as INVALID_IOASID (-1) when a process is created.
+
+Only processes that access SVA-capable devices need to have a PASID
+allocated. This allocation happens when a process opens/binds an SVA-capable
+device but finds no PASID for this process. Subsequent binds of the same, or
+other devices will share the same PASID.
+
+Although the PASID is allocated to the process by opening a device,
+it is not active in any of the threads of that process. It's loaded to the
+IA32_PASID MSR lazily when a thread tries to submit a work descriptor
+to a device using the ENQCMD.
+
+That first access will trigger a #GP fault because the IA32_PASID MSR
+has not been initialized with the PASID value assigned to the process
+when the device was opened. The Linux #GP handler notes that a PASID has
+been allocated for the process, and so initializes the IA32_PASID MSR
+and returns so that the ENQCMD instruction is re-executed.
+
+On fork(2) or exec(2) the PASID is removed from the process as it no
+longer has the same address space that it had when the device was opened.
+
+On clone(2) the new task shares the same address space, so will be
+able to use the PASID allocated to the process. The IA32_PASID is not
+preemptively initialized as the PASID value might not be allocated yet or
+the kernel does not know whether this thread is going to access the device
+and the cleared IA32_PASID MSR reduces context switch overhead by xstate
+init optimization. Since #GP faults have to be handled on any threads that
+were created before the PASID was assigned to the mm of the process, newly
+created threads might as well be treated in a consistent way.
+
+Due to complexity of freeing the PASID and clearing all IA32_PASID MSRs in
+all threads in unbind, free the PASID lazily only on mm exit. Track the
+PASID's reference count in the following way:
+
+- Initialize the PASID's reference to 1 when the PASID is first allocated
+  to the mm. The reference is held for the rest life time of the mm until
+  it's dropped to 0 and the PASID is freed on mm exit.
+
+If a process does a close(2) of the device file descriptor and munmap(2)
+of the device MMIO portal, then the driver will unbind the device. The
+PASID is still marked VALID in the PASID_MSR for any threads in the
+process that accessed the device. But this is harmless as without the
+MMIO portal they cannot submit new work to the device.
 
 Relationships
 =
-- 
2.34.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 04/11] kernel/fork: Initialize mm's PASID

2021-12-17 Thread Fenghua Yu
A new mm doesn't have a PASID yet when it's created. Initialize
the mm's PASID on fork() or for init_mm to INVALID_IOASID (-1).

Suggested-by: Dave Hansen 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change condition to more accurate CONFIG_IOMMU_SVA (Jacob)

 include/linux/sched/mm.h | 10 ++
 kernel/fork.c| 10 ++
 mm/init-mm.c |  4 
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index aca874d33fe6..394c4359c4e1 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Routines for handling mm_structs
@@ -407,4 +408,13 @@ static inline void membarrier_update_current_mm(struct 
mm_struct *next_mm)
 }
 #endif
 
+#ifdef CONFIG_IOMMU_SVA
+static inline void mm_pasid_init(struct mm_struct *mm)
+{
+   mm->pasid = INVALID_IOASID;
+}
+#else
+static inline void mm_pasid_init(struct mm_struct *mm) {}
+#endif
+
 #endif /* _LINUX_SCHED_MM_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index b33c52021d4e..4e799c9b13bb 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -96,6 +96,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1018,13 +1019,6 @@ static void mm_init_owner(struct mm_struct *mm, struct 
task_struct *p)
 #endif
 }
 
-static void mm_init_pasid(struct mm_struct *mm)
-{
-#ifdef CONFIG_IOMMU_SVA
-   mm->pasid = INIT_PASID;
-#endif
-}
-
 static void mm_init_uprobes_state(struct mm_struct *mm)
 {
 #ifdef CONFIG_UPROBES
@@ -1053,7 +1047,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
struct task_struct *p,
mm_init_cpumask(mm);
mm_init_aio(mm);
mm_init_owner(mm, p);
-   mm_init_pasid(mm);
+   mm_pasid_init(mm);
RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_subscriptions_init(mm);
init_tlb_flush_pending(mm);
diff --git a/mm/init-mm.c b/mm/init-mm.c
index b4a6f38fb51d..fbe7844d0912 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #ifndef INIT_MM_CONTEXT
@@ -38,6 +39,9 @@ struct mm_struct init_mm = {
.mmlist = LIST_HEAD_INIT(init_mm.mmlist),
.user_ns= _user_ns,
.cpu_bitmap = CPU_BITS_NONE,
+#ifdef CONFIG_IOMMU_SVA
+   .pasid  = INVALID_IOASID,
+#endif
INIT_MM_CONTEXT(init_mm)
 };
 
-- 
2.34.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 00/11] Re-enable ENQCMD and PASID MSR

2021-12-17 Thread Fenghua Yu
Problems in the old code to manage SVM (Shared Virtual Memory) devices
and the PASID (Process Address Space ID) led to that code being
disabled.

Subsequent discussions resulted in a far simpler approach:

1) PASID life cycle is from first allocation by a process until that
   process exits.
2) All tasks begin with PASID disabled
3) The #GP fault handler tries to fix faulting ENQCMD instructions very
   early (thus avoiding complexities of the XSAVE infrastructure)

Change Log:
v2:
- Free PASID on mm exit instead of in exit(2) or unbind() (Thomas, AndyL,
  PeterZ)
- Directly write IA32_PASID MSR in fixup while local IRQ is still disabled
  (Thomas)
- Simplify handling ENQCMD in objtool (PeterZ and Josh)
- Define mm_pasid_get(), mm_pasid_drop(), and mm_pasid_init() in mm and
  call the functions from IOMMU (Dave Hansen).
- A few changes in the #GP fixup function (Dave Hansen, Tony Luck).
- Initial PASID value is changed to INVALID_PASID (Ashok Raj and
  Jacob Pan).
- Add mm_pasid_init(), mm_pasid_get(), and mm_pasid_drop() functions in mm.
  So the mm's PASID operations are generic for both X86 and ARM
  (Dave Hansen).
- Rename CONFIG_IOMMU_SVA_LIB to more useful and accurate
  CONFIG_IOMMU_SVA
- Use CONFIG_IOMMU_SVA for PASID processing condition (Jacob)
- The patch that cleans up old update_pasid() function is in upstream
  now (commit: 00ecd5401349 "iommu/vt-d: Clean up unused PASID updating
  functions") and therefore it's removed from this version.

v1 can be found at 
https://lore.kernel.org/lkml/20210920192349.2602141-1-fenghua...@intel.com/T/#md6d542091da1d1159eda0a44a16e57d0c0dfb209

Fenghua Yu (10):
  iommu/sva: Rename CONFIG_IOMMU_SVA_LIB to CONFIG_IOMMU_SVA
  mm: Change CONFIG option for mm->pasid field
  iommu/ioasid: Introduce a helper to check for valid PASIDs
  kernel/fork: Initialize mm's PASID
  iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm
exit
  x86/fpu: Clear PASID when copying fpstate
  x86/traps: Demand-populate PASID MSR via #GP
  x86/cpufeatures: Re-enable ENQCMD
  tools/objtool: Check for use of the ENQCMD instruction in the kernel
  docs: x86: Change documentation for SVA (Shared Virtual Addressing)

Peter Zijlstra (1):
  sched: Define and initialize a flag to identify valid PASID in the
task

 Documentation/x86/sva.rst | 58 +++
 arch/x86/include/asm/disabled-features.h  |  7 ++-
 arch/x86/kernel/fpu/core.c|  7 +++
 arch/x86/kernel/traps.c   | 55 ++
 drivers/iommu/Kconfig |  6 +-
 drivers/iommu/Makefile|  2 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  5 +-
 drivers/iommu/intel/Kconfig   |  2 +-
 drivers/iommu/intel/svm.c |  9 ---
 drivers/iommu/iommu-sva-lib.c | 39 -
 drivers/iommu/iommu-sva-lib.h |  7 +--
 include/linux/ioasid.h|  9 +++
 include/linux/mm_types.h  |  2 +-
 include/linux/sched.h |  3 +
 include/linux/sched/mm.h  | 26 +
 kernel/fork.c | 15 +++--
 mm/init-mm.c  |  4 ++
 tools/objtool/arch/x86/decode.c   | 11 +++-
 18 files changed, 194 insertions(+), 73 deletions(-)

-- 
2.34.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 02/11] mm: Change CONFIG option for mm->pasid field

2021-12-17 Thread Fenghua Yu
This currently depends on CONFIG_IOMMU_SUPPORT. But it is only
needed when CONFIG_IOMMU_SVA option is enabled.

Change the CONFIG guards around definition and initialization
of mm->pasid field.

Suggested-by: Jacob Pan 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change condition to more accurate CONFIG_IOMMU_SVA (Jacob and Tony)

 include/linux/mm_types.h | 2 +-
 kernel/fork.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c3a6e6209600..6d92121d5bd9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -643,7 +643,7 @@ struct mm_struct {
 #endif
struct work_struct async_put_work;
 
-#ifdef CONFIG_IOMMU_SUPPORT
+#ifdef CONFIG_IOMMU_SVA
u32 pasid;
 #endif
} __randomize_layout;
diff --git a/kernel/fork.c b/kernel/fork.c
index 3244cc56b697..b33c52021d4e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1020,7 +1020,7 @@ static void mm_init_owner(struct mm_struct *mm, struct 
task_struct *p)
 
 static void mm_init_pasid(struct mm_struct *mm)
 {
-#ifdef CONFIG_IOMMU_SUPPORT
+#ifdef CONFIG_IOMMU_SVA
mm->pasid = INIT_PASID;
 #endif
 }
-- 
2.34.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-29 Thread Fenghua Yu
Hi, Baolu,

On Thu, Sep 23, 2021 at 01:43:32PM +0800, Lu Baolu wrote:
> On 9/21/21 3:23 AM, Fenghua Yu wrote:
> > +void pasid_put(struct task_struct *tsk, struct mm_struct *mm)
> > +{
> > +   if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
> > +   return;
> > +
> > +   /*
> > +* Nothing to do if this task doesn't have a reference to the PASID.
> > +*/
> > +   if (tsk->has_valid_pasid) {
> > +   mutex_lock(_mutex);
> > +   /*
> > +* The PASID's reference was taken during fix up. Release it
> > +* now. If the reference count is 0, the PASID is freed.
> > +*/
> > +   iommu_sva_free_pasid(mm);
> > +   mutex_unlock(_mutex);
> > +   }
> > +}
> > 
> 
> It looks odd that both __fixup_pasid_exception() and pasid_put() are
> defined in the vendor IOMMU driver, but get called in the arch/x86
> code.
> 
> Is it feasible to move these two helpers to the files where they are
> called? The IA32_PASID MSR fixup and release are not part of the IOMMU
> implementation.

Sure. The functions will be removed in the next version. And new functions
will not be called in IOMMU driver.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/8] iommu/vt-d: Clean up unused PASID updating functions

2021-09-29 Thread Fenghua Yu
Hi, Baolu,

On Wed, Sep 29, 2021 at 03:34:51PM +0800, Lu Baolu wrote:
> On 2021/9/21 3:23, Fenghua Yu wrote:
> > update_pasid() and its call chain are currently unused in the tree because
> > Thomas disabled the ENQCMD feature. The feature will be re-enabled shortly
> > using a different approach and update_pasid() and its call chain will not
> > be used in the new approach.
> > 
> > Remove the useless functions.
> > 
> > Signed-off-by: Fenghua Yu 
> > Reviewed-by: Tony Luck 
> 
> Thanks for this cleanup. I have queued it for v5.16.

Thank you for pushing this patch to 5.16!

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-29 Thread Fenghua Yu
Hi, Tony,

On Wed, Sep 29, 2021 at 10:41:42AM -0700, Luck, Tony wrote:
> On Wed, Sep 29, 2021 at 07:15:53PM +0200, Thomas Gleixner wrote:
> > On Wed, Sep 29 2021 at 09:59, Andy Lutomirski wrote:
> > > On 9/29/21 05:28, Thomas Gleixner wrote:
> > >> Looking at that patch again, none of this muck in fpu__pasid_write() is
> > >> required at all. The whole exception fixup is:
> > >> 
> > >>  if (!user_mode(regs))
> > >>   return false;
> > >> 
> > >>  if (!current->mm->pasid)
> > >>   return false;
> > >> 
> > >>  if (current->pasid_activated)
> > >>   return false;
> > >
> > > <-- preemption or BH here: kaboom.
> > 
> > Sigh, this had obviously to run in the early portion of #GP, i.e. before
> > enabling interrupts.
> 
> Like this? Obviously with some comment about why this is being done.
> 
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index a58800973aed..a848a59291e7 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -536,6 +536,12 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
>   unsigned long gp_addr;
>   int ret;
>  

Add
+#ifdef CONFIG_IOMMU_SUPPORT
because mm->pasid and current-pasid_activated are defined only if
CONFIG_IOMMU_SUPPORT is defined.

> + if (user_mode(regs) && current->mm->pasid && !current->pasid_activated) 
> {

Maybe need to add "&& cpu_feature_enabled(X86_FEATURE_ENQCMD)" because
the IA32_PASID MSR is only used when ENQCMD is enabled?

> + current->pasid_activated = 1;
> + wrmsrl(MSR_IA32_PASID, current->mm->pasid | 
> MSR_IA32_PASID_VALID);
> + return;
> + }
> +

+endif

>   cond_local_irq_enable(regs);
>  
>   if (static_cpu_has(X86_FEATURE_UMIP)) {

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-29 Thread Fenghua Yu
Hi, Thomas,

On Wed, Sep 29, 2021 at 09:51:15AM -0700, Luck, Tony wrote:
> > There is zero requirement to look at TIF_NEED_FPU_LOAD or
> > fpregs_state_valid() simply because the #GP comes straight from user
> > space which means the FPU registers contain the current tasks user space
> > state.
> 
> Just to double confirm ... there is no point in the #GP handler up to this 
> point
> where pre-emption can occur?

Same question here. The fixup function is called after cond_local_irq_enable().
If an interrupt comes before fixup_pasid_exception(), the interrupt may
use FPU and call kernel_fpu_begin_mask()->set(TIF_NEED_FPU_LOAD)->
__cpu_invalidate_fpregs_state(). Then writing to the IA32_PASID MSR. When
exiting to user, the FPU states will be restored to the FPU regs including
the IA32_PASID MSR. So the MSR could be different from the value written in
fixup_pasid_execption(). Is it possible?

Or should fixup_pasid_exception() be called before cond_local_irq_enable()?

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-28 Thread Fenghua Yu
Hi, Tony,

On Tue, Sep 28, 2021 at 06:06:52PM -0700, Luck, Tony wrote:
> >>fpregs_lock();
> >
> > I'm afraid we may hit the same locking issue when we send IPI to notify 
> > another task to modify its
> > PASID state. Here the API is called to modify another running task's PASID 
> > state as well without a right lock.
> > fpregs_lock() is not enough to deal with this, I'm afraid.
> 
> We don't send IPI any more to change PASID state. The only place that the
> current patch series touches the PASID MSR is in the #GP fault handler.

It's safe for the helpers to handle the PASID case (modifying the current task's
PASID state in #GP).

But the helpers seem to be generic. They take "task" as a parameter and
handle the task as non-current case. So the helpers are not for PASID only.
They may be used by others to modify a running task's FPU state. But
It's not safe to do so.

At least need some comments/restriction for the helpers to be used on
a running task?

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-28 Thread Fenghua Yu
Hi, Tony,

On Tue, Sep 28, 2021 at 04:10:39PM -0700, Luck, Tony wrote:
> Moving beyond pseudo-code and into compiles-but-probably-broken-code.
> 
> 
> The intent of the functions below is that Fenghua should be able to
> do:
> 
> void fpu__pasid_write(u32 pasid)
> {
>   u64 msr_val = pasid | MSR_IA32_PASID_VALID;
>   struct ia32_pasid_state *addr;
> 
>   addr = begin_update_one_xsave_feature(current, XFEATURE_PASID, true);
>   addr->pasid = msr_val;
>   finish_update_one_xsave_feature(current);
> }
> 
> So here's the two new functions that would be added to
> arch/x86/kernel/fpu/xstate.c
> 
> 
> 
> void *begin_update_one_xsave_feature(struct task_struct *tsk,
>  enum xfeature xfeature, bool full)
> {
> struct xregs_state *xsave = >thread.fpu.state.xsave;
> struct xregs_state *xinit = _fpstate.xsave;
> u64 fmask = 1ull << xfeature;
> void *addr;
> 
> BUG_ON(!(xsave->header.xcomp_bv & fmask));
> 
> fpregs_lock();
> 
> addr = __raw_xsave_addr(xsave, xfeature);
> 
> if (full || tsk != current) {
> memcpy(addr, __raw_xsave_addr(xinit, xfeature), 
> xstate_sizes[xfeature]);
> goto out;
> }
> 
>   /* could optimize some cases where xsaves() isn't fastest option */
> if (!(xsave->header.xfeatures & fmask))
> xsaves(xsave, fmask);

If xfeatures's feature bit is 0, xsaves will not write its init value to the
memory due to init optimization. So the xsaves will do nothing and the
state is not initialized and may have random data.

> 
> out:
> xsave->header.xfeatures |= fmask;
> return addr;
> }
> 
> void finish_update_one_xsave_feature(struct task_struct *tsk)
> {
> set_ti_thread_flag(task_thread_info(tsk), TIF_NEED_FPU_LOAD);

Setting TIF_NEED_FPU_LOAD cannot guaranteed to execute XRSTORS on exiting
to user. In fpregs_restore_userregs():
if (!fpregs_state_valid(fpu, cpu)) {
...
__restore_fpregs_from_fpstate(>state, mask);
...
}

fpregs state should be invalid to get the XRSTROS executed.

So setting TIF_NEED_FPU_LOAD may get the FPU register unchanged on exiting
to user.


> fpregs_unlock();
> }

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-28 Thread Fenghua Yu
Hi, Thomas,

On Sun, Sep 26, 2021 at 01:13:50AM +0200, Thomas Gleixner wrote:
> Fenghua,
> 
> On Fri, Sep 24 2021 at 16:12, Fenghua Yu wrote:
> > On Fri, Sep 24, 2021 at 03:18:12PM +0200, Thomas Gleixner wrote:
> >> But OTOH why do you need a per task reference count on the PASID at all?
> >> 
> >> The PASID is fundamentaly tied to the mm and the mm can't go away before
> >> the threads have gone away unless this magically changed after I checked
> >> that ~20 years ago.
> >
> > There are up to 1M PASIDs because PASID is 20-bit. I think there are a few 
> > ways
> > to allocate and free PASID:
> >
> > 1. Statically allocate a PASID once a mm is created and free it in mm
> >exit. No PASID allocation/free during the mm's lifetime. Then
> >up to 1M processes can be created due to 1M PASIDs limitation.
> >We don't want this method because the 1M processes limitation.
> 
> I'm not so worried about the 1M limitation, but it obviously makes sense
> to avoid that because allocating stuff which is not used is pointless in
> general.
> 
> > 2. A PASID is allocated to the mm in open(dev)->bind(dev, mm). There
> >are three ways to free it:
> >(a) Actively free it in close(fd)->unbind(dev, mm) by sending
> >IPIs to tell all tasks using the PASID to clear the IA32_PASID
> >MSR. This has locking issues similar to the actively loading
> >IA32_PASID MSR which was force disabled in upstream. So won't work.
> 
> Exactly.
> 
> >(b) Passively free the PASID in destroy_context(mm) in mm exit. Once
> >the PASID is allocated, it stays with the process for the lifetime. 
> > It's
> >better than #1 because the PASID is allocated only on demand.
> 
> Which is simple and makes a lot of sense. See below.
> 
> >(c) Passively free the PASID in deactive_mm(mm) or unbind() whenever 
> > there
> >is no usage as implemented in this series. Tracking the PASID usage
> >per task provides a chance to free the PASID on task exit. The
> >PASID has a better chance to be freed earlier than mm exit in #(b).
> >
> > This series uses #2 and #(c) to allocate and free the PASID for a better
> > chance to ease the 1M PASIDs limitation pressure. For example, a thread
> > doing open(dev)->ENQCMD->close(fd)->exit(2) will not occupy a PASID while
> > its sibling threads are still running.
> 
> I'm not seeing that as a realistic problem. Applications which use this
> kind of devices are unlikely to behave exactly that way.
> 
> 2^20 PASIDs are really plenty and just adding code for the theoretical
> case of PASID pressure is a pointless exercise IMO. It just adds
> complexity for no reason.
> 
> IMO reality will be that either you have long lived processes with tons
> of threads which use such devices over and over or short lived forked
> processes which open the device, do the job, close and exit. Both
> scenarios are fine with allocate on first use and drop on process exit.
> 
> I think with your approach you create overhead for applications which
> use thread pools where the threads get work thrown at them and do open()
> -> do_stuff() -> close() and then go back to wait for the next job which
> will do exactly the same thing. So you add the overhead of refcounts in
> general and in the worst case if the refcount drops to zero then the
> next worker has to allocate a new PASID instead of just moving on.
> 
> So unless you have a really compelling real world usecase argument, I'm
> arguing that the PASID pressure problem is a purely academic exercise.
> 
> I think you are conflating two things here:
> 
>   1) PASID lifetime
>   2) PASID MSR overhead
> 
> Which is not correct: You still can and have to optimize the per thread
> behaviour vs. the PASID MSR: Track per thread whether it ever needed the
> PASID and act upon that.
> 
> If the thread just does EMQCMD once in it's lifetime, then so be
> it. That's not a realistic use case, really.
> 
> And if someone does this then this does not mean we have to optimize for
> that. Optimizing for possible stupid implementations is the wrong
> approach. There is no technial measure against stupidity. If that would
> exist the world would be a much better place.
> 
> You really have to think about the problem space you are working
> on. There are problems which need a 'get it right at the first shot'
> solution because they create user space ABI or otheer hard to fix
> dependencies.
> 
> That's absolutely not the case here.
> 
> Get the basic simple support correct and work from there. Trying to
&

Re: [PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-24 Thread Fenghua Yu
Hi, Thomas,

On Fri, Sep 24, 2021 at 03:18:12PM +0200, Thomas Gleixner wrote:
> On Thu, Sep 23 2021 at 19:48, Thomas Gleixner wrote:
> > On Thu, Sep 23 2021 at 09:40, Tony Luck wrote:
> >
> > fpu_write_task_pasid() can just grab the pasid from current->mm->pasid
> > and be done with it.
> >
> > The task exit code can just call iommu_pasid_put_task_ref() from the
> > generic code and not from within x86.
> 
> But OTOH why do you need a per task reference count on the PASID at all?
> 
> The PASID is fundamentaly tied to the mm and the mm can't go away before
> the threads have gone away unless this magically changed after I checked
> that ~20 years ago.

There are up to 1M PASIDs because PASID is 20-bit. I think there are a few ways
to allocate and free PASID:

1. Statically allocate a PASID once a mm is created and free it in mm
   exit. No PASID allocation/free during the mm's lifetime. Then
   up to 1M processes can be created due to 1M PASIDs limitation.
   We don't want this method because the 1M processes limitation.

2. A PASID is allocated to the mm in open(dev)->bind(dev, mm). There
   are three ways to free it:
   (a) Actively free it in close(fd)->unbind(dev, mm) by sending
   IPIs to tell all tasks using the PASID to clear the IA32_PASID
   MSR. This has locking issues similar to the actively loading
   IA32_PASID MSR which was force disabled in upstream. So won't work.
   (b) Passively free the PASID in destroy_context(mm) in mm exit. Once
   the PASID is allocated, it stays with the process for the lifetime. It's
   better than #1 because the PASID is allocated only on demand.
   (c) Passively free the PASID in deactive_mm(mm) or unbind() whenever there
   is no usage as implemented in this series. Tracking the PASID usage
   per task provides a chance to free the PASID on task exit. The
   PASID has a better chance to be freed earlier than mm exit in #(b).

This series uses #2 and #(c) to allocate and free the PASID for a better
chance to ease the 1M PASIDs limitation pressure. For example, a thread
doing open(dev)->ENQCMD->close(fd)->exit(2) will not occupy a PASID while
its sibling threads are still running.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-23 Thread Fenghua Yu
Hi, Andy,

On Thu, Sep 23, 2021 at 04:17:05PM -0700, Andy Lutomirski wrote:
> On Mon, Sep 20, 2021, at 12:23 PM, Fenghua Yu wrote:
> > ENQCMD requires the IA32_PASID MSR has a valid PASID value which was
> > allocated to the process during bind. The MSR could be populated eagerly
> > by an IPI after the PASID is allocated in bind. But the method was
> > disabled in commit 9bfecd058339 ("x86/cpufeatures: Force disable
> > X86_FEATURE_ENQCMD and remove update_pasid()")' due to locking and other
> > issues.
> >
> > Since the MSR was cleared in fork()/clone(), the first ENQCMD will
> > generate a #GP fault. The #GP fault handler will initialize the MSR
> > if a PASID has been allocated for this process.
> >
> > The lazy enabling of the PASID MSR in the #GP handler is not an elegant
> > solution. But it has the least complexity that fits with h/w behavior.
> >
> > Signed-off-by: Fenghua Yu 
> > Reviewed-by: Tony Luck 
> > ---
> >  arch/x86/include/asm/fpu/api.h |  6 
> >  arch/x86/include/asm/iommu.h   |  2 ++
> >  arch/x86/kernel/fpu/xstate.c   | 59 ++
> >  arch/x86/kernel/traps.c| 12 +++
> >  drivers/iommu/intel/svm.c  | 32 ++
> >  5 files changed, 111 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/fpu/api.h 
> > b/arch/x86/include/asm/fpu/api.h
> > index ca4d0dee1ecd..f146849e5c8c 100644
> > --- a/arch/x86/include/asm/fpu/api.h
> > +++ b/arch/x86/include/asm/fpu/api.h
> > @@ -106,4 +106,10 @@ extern int cpu_has_xfeatures(u64 xfeatures_mask, 
> > const char **feature_name);
> >   */
> >  #define PASID_DISABLED 0
> > 
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +void fpu__pasid_write(u32 pasid);
> > +#else
> > +static inline void fpu__pasid_write(u32 pasid) { }
> > +#endif
> > +
> >  #endif /* _ASM_X86_FPU_API_H */
> > diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
> > index bf1ed2ddc74b..9c4bf9b0702f 100644
> > --- a/arch/x86/include/asm/iommu.h
> > +++ b/arch/x86/include/asm/iommu.h
> > @@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory 
> > *rmrr)
> > return -EINVAL;
> >  }
> > 
> > +bool __fixup_pasid_exception(void);
> > +
> >  #endif /* _ASM_X86_IOMMU_H */
> > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> > index c8def1b7f8fb..8a89b2cecd77 100644
> > --- a/arch/x86/kernel/fpu/xstate.c
> > +++ b/arch/x86/kernel/fpu/xstate.c
> > @@ -1289,3 +1289,62 @@ int proc_pid_arch_status(struct seq_file *m, 
> > struct pid_namespace *ns,
> > return 0;
> >  }
> >  #endif /* CONFIG_PROC_PID_ARCH_STATUS */
> > +
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +/**
> > + * fpu__pasid_write - Write the current task's PASID state/MSR.
> > + * @pasid: the PASID
> > + *
> > + * The PASID is written to the IA32_PASID MSR directly if the MSR is 
> > active.
> > + * Otherwise it's written to the PASID. The IA32_PASID MSR should 
> > contain
> > + * the PASID after returning to the user.
> > + *
> > + * This is called only when ENQCMD is enabled.
> > + */
> > +void fpu__pasid_write(u32 pasid)
> > +{
> > +   struct xregs_state *xsave = >thread.fpu.state.xsave;
> > +   u64 msr_val = pasid | MSR_IA32_PASID_VALID;
> > +   struct fpu *fpu = >thread.fpu;
> > +
> > +   /*
> > +* ENQCMD always uses the compacted XSAVE format. Ensure the buffer
> > +* has space for the PASID.
> > +*/
> > +   BUG_ON(!(xsave->header.xcomp_bv & XFEATURE_MASK_PASID));
> > +
> > +   fpregs_lock();
> > +
> > +   /*
> > +* If the task's FPU doesn't need to be loaded or is valid, directly
> > +* write the IA32_PASID MSR. Otherwise, write the PASID state and
> > +* the MSR will be loaded from the PASID state before returning to
> > +* the user.
> > +*/
> > +   if (!test_thread_flag(TIF_NEED_FPU_LOAD) ||
> > +   fpregs_state_valid(fpu, smp_processor_id())) {
> > +   wrmsrl(MSR_IA32_PASID, msr_val);
> 
> Let me try to decode this.
> 
> If the current task's FPU state is live or if the state is in memory but the 
> CPU regs just happen to match the copy in memory, then write the MSR.  Else 
> write the value to memory.
> 
> This is wrong.  If !TIF_NEED_FPU_LOAD && fpregs_state_valid, you MUST NOT 
> MODIFY FPU STATE.

But the FPU state is not modified if !TIF_NEED_FPU_LOAD && fpregs_state_valid.

The FPU state is mo

Re: [PATCH 7/8] tools/objtool: Check for use of the ENQCMD instruction in the kernel

2021-09-23 Thread Fenghua Yu
Hi, Josh,

On Thu, Sep 23, 2021 at 05:55:40PM -0700, Josh Poimboeuf wrote:
> On Thu, Sep 23, 2021 at 03:26:14PM +0000, Fenghua Yu wrote:
> > > > +   } else if (op2 == 0x38 && op3 == 0xf8) {
> > > > +   if (insn.prefixes.nbytes == 1 &&
> > > > +   insn.prefixes.bytes[0] == 0xf2) {
> > > > +   /* ENQCMD cannot be used in the kernel. 
> > > > */
> > > > +   WARN("ENQCMD instruction at %s:%lx", 
> > > > sec->name,
> > > > +offset);
> > > > +
> > > > +   return -1;
> > > > +   }
> > > 
> > > The only concern here is if we want it to be fatal or not. But otherwise
> > > this seems to be all that's required.
> > 
> > objtool doesn't fail kernel build on this fatal warning.
> > 
> > Returning -1 here stops checking the rest of the file and won't report any
> > further warnings unless this ENQCMD warning is fixed. Not returning -1
> > continues checking the rest of the file and may report more warnings.
> > Seems that's the only difference b/w them.
> > 
> > Should I keep this "return -1" or not? Please advice.
> 
> I'd say remove the "return -1" since it's not a fatal-type analysis
> error and there's nothing to prevent objtool from analyzing the rest of
> the file.

Sure. It does make sense to remove "return -1". I will remove it.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 7/8] tools/objtool: Check for use of the ENQCMD instruction in the kernel

2021-09-23 Thread Fenghua Yu
Hi, Peter,

On Thu, Sep 23, 2021 at 09:17:01AM +0200, Peter Zijlstra wrote:
> On Wed, Sep 22, 2021 at 11:44:41PM +0000, Fenghua Yu wrote:
> 
> > > Since you're making it a fatal error, before doing much of anything
> > > else, you might at well fail decode and keep it all in the x86/decode.c
> > > file, no need to spread this 'knowledge' any further.
> 
> > Is the following updated patch a right one?
> 
> Yes, that's what I was thinking of.
> 
> > diff --git a/tools/objtool/arch/x86/decode.c 
> > b/tools/objtool/arch/x86/decode.c
> > index bc821056aba9..3e0f928e28a5 100644
> > --- a/tools/objtool/arch/x86/decode.c
> > +++ b/tools/objtool/arch/x86/decode.c
> > @@ -110,7 +110,7 @@ int arch_decode_instruction(const struct elf *elf, 
> > const struct section *sec,
> >  {
> > struct insn insn;
> > int x86_64, ret;
> > -   unsigned char op1, op2,
> > +   unsigned char op1, op2, op3,
> >   rex = 0, rex_b = 0, rex_r = 0, rex_w = 0, rex_x = 0,
> >   modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
> >   sib = 0, /* sib_scale = 0, */ sib_index = 0, sib_base = 0;
> > @@ -137,6 +137,7 @@ int arch_decode_instruction(const struct elf *elf, 
> > const struct section *sec,
> >  
> > op1 = insn.opcode.bytes[0];
> > op2 = insn.opcode.bytes[1];
> > +   op3 = insn.opcode.bytes[2];
> >  
> > if (insn.rex_prefix.nbytes) {
> > rex = insn.rex_prefix.bytes[0];
> > @@ -489,6 +490,16 @@ int arch_decode_instruction(const struct elf *elf, 
> > const struct section *sec,
> > /* nopl/nopw */
> > *type = INSN_NOP;
> >  
> > +   } else if (op2 == 0x38 && op3 == 0xf8) {
> > +   if (insn.prefixes.nbytes == 1 &&
> > +   insn.prefixes.bytes[0] == 0xf2) {
> > +   /* ENQCMD cannot be used in the kernel. */
> > +   WARN("ENQCMD instruction at %s:%lx", sec->name,
> > +offset);
> > +
> > +   return -1;
> > +   }
> 
> The only concern here is if we want it to be fatal or not. But otherwise
> this seems to be all that's required.

objtool doesn't fail kernel build on this fatal warning.

Returning -1 here stops checking the rest of the file and won't report any
further warnings unless this ENQCMD warning is fixed. Not returning -1
continues checking the rest of the file and may report more warnings.
Seems that's the only difference b/w them.

Should I keep this "return -1" or not? Please advice.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 7/8] tools/objtool: Check for use of the ENQCMD instruction in the kernel

2021-09-22 Thread Fenghua Yu
Hi, Peter,

On Wed, Sep 22, 2021 at 11:03:43PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 20, 2021 at 07:23:48PM +0000, Fenghua Yu wrote:
> > +   ret = validate_enqcmd(file);
> > +   if (ret < 0)
> > +   goto out;
> > +   warnings += ret;
> > +
> > if (vmlinux && !validate_dup) {
> > ret = validate_vmlinux_functions(file);
> > if (ret < 0)
> 
> Since you're making it a fatal error, before doing much of anything
> else, you might at well fail decode and keep it all in the x86/decode.c
> file, no need to spread this 'knowledge' any further.
> 
> There's no actual state associated with it, you just want to avoid the
> instruction being present.
> 
> Much simpler patch too.

Is the following updated patch a right one?

Thanks.

-Fenghua

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index bc821056aba9..3e0f928e28a5 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -110,7 +110,7 @@ int arch_decode_instruction(const struct elf *elf, const 
struct section *sec,
 {
struct insn insn;
int x86_64, ret;
-   unsigned char op1, op2,
+   unsigned char op1, op2, op3,
  rex = 0, rex_b = 0, rex_r = 0, rex_w = 0, rex_x = 0,
  modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
  sib = 0, /* sib_scale = 0, */ sib_index = 0, sib_base = 0;
@@ -137,6 +137,7 @@ int arch_decode_instruction(const struct elf *elf, const 
struct section *sec,
 
op1 = insn.opcode.bytes[0];
op2 = insn.opcode.bytes[1];
+   op3 = insn.opcode.bytes[2];
 
if (insn.rex_prefix.nbytes) {
rex = insn.rex_prefix.bytes[0];
@@ -489,6 +490,16 @@ int arch_decode_instruction(const struct elf *elf, const 
struct section *sec,
/* nopl/nopw */
*type = INSN_NOP;
 
+   } else if (op2 == 0x38 && op3 == 0xf8) {
+   if (insn.prefixes.nbytes == 1 &&
+   insn.prefixes.bytes[0] == 0xf2) {
+   /* ENQCMD cannot be used in the kernel. */
+   WARN("ENQCMD instruction at %s:%lx", sec->name,
+offset);
+
+   return -1;
+   }
+
} else if (op2 == 0xa0 || op2 == 0xa8) {
 
/* push fs/gs */
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-22 Thread Fenghua Yu
Hi, Peter,

On Wed, Sep 22, 2021 at 11:07:22PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 20, 2021 at 07:23:45PM +0000, Fenghua Yu wrote:
> >  
> > +   if (user_mode(regs) && fixup_pasid_exception())
> > +   goto exit;
> > +
> > if (static_cpu_has(X86_FEATURE_UMIP)) {
> > if (user_mode(regs) && fixup_umip_exception(regs))
> > goto exit;
> 
> So you're eating any random #GP that might or might not be PASID
> related. And all that witout a comment... Enlighten?

I will add a comment here.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-22 Thread Fenghua Yu
Hi, Peter,

On Wed, Sep 22, 2021 at 11:11:45PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 22, 2021 at 11:07:22PM +0200, Peter Zijlstra wrote:
> > On Mon, Sep 20, 2021 at 07:23:45PM +, Fenghua Yu wrote:
> > > +static bool fixup_pasid_exception(void)
> > > +{
> > > + if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
> > > + return false;
> > > +
> > > + return __fixup_pasid_exception();
> > > +}
> 
> That is, shouldn't the above at the very least decode the instruction
> causing the #GP and check it's this ENQCMD thing?

There were comments on a previous version when we used #GP fixup method:
https://lore.kernel.org/linux-iommu/f6d34d59-e6eb-ee9f-d247-8fb2f0e37...@intel.com/

There are three reasons for not decoding the instruction:

1. Parsing the instruction sets bad architectural precedent and is ugly.
2. The instruction could be modified (e.g. JVM) while decoding the
   instruction. It's.
3. Decoding is more complex than this patch and doesn't worth it.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 5/8] x86/mmu: Add mm-based PASID refcounting

2021-09-20 Thread Fenghua Yu
PASIDs are fundamentally hardware resources in a shared address space.
There is a limited number of them to use ENQCMD on shared workqueue.
They must be shared and managed. They can not, for instance, be
statically allocated to processes.

Free PASID eagerly by sending IPIs in unbind was disabled due to locking
and other issues in commit 9bfecd058339 ("x86/cpufeatures: Force disable
X86_FEATURE_ENQCMD and remove update_pasid()").

Lazy PASID free is implemented in order to re-enable the ENQCMD feature.
PASIDs are currently reference counted and are centered around device
usage. To support lazy PASID free, reference counts are tracked in the
following scenarios:

1. The PASID's reference count is initialized as 1 when the PASID is first
   allocated in bind. This is already implemented.
2. A reference is taken when a device is bound to the mm and dropped
   when the device is unbound from the mm. This reference tracks device
   usage of the PASID. This is already implemented.
3. A reference is taken when a task's IA32_PASID MSR is initialized in
   #GP fix up and dropped when the task exits. This reference tracks
   the task usage of the PASID. It is implemented here.

Once a PASID is allocated to an mm in bind, it's associated to the mm until
it's freed lazily when its reference count is dropped to zero in unbind or
exit(2).

ENQCMD requires a valid IA32_PASID MSR with the PASID value and a valid
PASID table entry for the PASID. Lazy PASID free may cause the process
still has the valid PASID but the PASID table entry is removed in unbind.
In this case, workqueue submitted by ENQCMD cannot find the PASID table
entry and will generate a DMAR fault.

Here is a more detailed explanation of the life cycle of a PASID:

All processes start out without a PASID allocated (because fork(2)
clears the PASID in the child).

A PASID is allocated on the first open of an accelerator device by
a call to:
iommu_sva_bind_device()
-> intel_svm_bind()
   -> intel_svm_alloc_pasid()
  -> iommu_sva_alloc_pasid()
-> ioasid_alloc()

At this point mm->pasid for the process is initialized, the reference
count on that PASID is 1, but as yet no tasks within the process have
set up their MSR_IA32_PASID to be able to execute the ENQCMD instruction.

When a task in the process does execute ENQCMD there is a #GP fault.
The Linux handler notes that the process has a PASID allocated, and
attempts to fix the #GP fault by initializing MSR_IA32_PASID for this
task. It also increments the reference count for the PASID.

Additional threads in the task may also execute ENQCMD, and each
will add to the reference count of the PASID.

Tasks within the process may open additional accelerator devices.
In this case the call to iommu_sva_bind_device() merely increments
the reference count for the PASID. Since all devices use the same
PASID (all are accessing the same address space).

So the reference count on a PASID is the sum of the number of open
accelerator devices plus the number of threads that have tried to
execute ENQCMD.

The reverse happens as a process gives up resources. Each call to
iommu_sva_unbind_device() will reduce the reference count on the
PASID. Each task in the process that had set up MSR_IA32_PASID will
reduce the reference count as it exits.

When the reference count is dropped to 0 in either task exit or
unbind, the PASID will be freed.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
 arch/x86/include/asm/iommu.h   |  6 +
 arch/x86/include/asm/mmu_context.h |  2 ++
 drivers/iommu/intel/svm.c  | 39 ++
 3 files changed, 47 insertions(+)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index 9c4bf9b0702f..d00f0a3f32fb 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -28,4 +28,10 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory 
*rmrr)
 
 bool __fixup_pasid_exception(void);
 
+#ifdef CONFIG_INTEL_IOMMU_SVM
+void pasid_put(struct task_struct *tsk, struct mm_struct *mm);
+#else
+static inline void pasid_put(struct task_struct *tsk, struct mm_struct *mm) { }
+#endif
+
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 27516046117a..3a2de87e98a9 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern atomic64_t last_mm_ctx_id;
 
@@ -146,6 +147,7 @@ do {\
 #else
 #define deactivate_mm(tsk, mm) \
 do {   \
+   pasid_put(tsk, mm); \
load_gs_index(0);   \
loadsegment(fs, 0); \
 } while (0)
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index ab65020019b6..8b6b8007ba2c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/

[PATCH 3/8] sched: Define and initialize a flag to identify valid PASID in the task

2021-09-20 Thread Fenghua Yu
From: Peter Zijlstra 

Add a new field to the task structure to track whether this task
has initialized the IA32_PASID MSR (and thus holds a reference
count on the PASID for this process).

Initialize the field to zero when creating a new task with fork/clone.

Signed-off-by: Peter Zijlstra 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
 include/linux/sched.h | 4 
 kernel/fork.c | 4 
 2 files changed, 8 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 39039ce8ac4c..21a8cff9155c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -936,6 +936,10 @@ struct task_struct {
unsignedin_eventfd_signal:1;
 #endif
 
+#ifdef CONFIG_IOMMU_SUPPORT
+   unsignedhas_valid_pasid:1;
+#endif
+
unsigned long   atomic_flags; /* Flags requiring atomic 
access. */
 
struct restart_blockrestart_block;
diff --git a/kernel/fork.c b/kernel/fork.c
index 38681ad44c76..e379f88260eb 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -967,6 +967,10 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig, int node)
tsk->use_memdelay = 0;
 #endif
 
+#ifdef CONFIG_IOMMU_SUPPORT
+   tsk->has_valid_pasid = 0;
+#endif
+
 #ifdef CONFIG_MEMCG
tsk->active_memcg = NULL;
 #endif
-- 
2.33.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/8] iommu/vt-d: Clean up unused PASID updating functions

2021-09-20 Thread Fenghua Yu
update_pasid() and its call chain are currently unused in the tree because
Thomas disabled the ENQCMD feature. The feature will be re-enabled shortly
using a different approach and update_pasid() and its call chain will not
be used in the new approach.

Remove the useless functions.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
 arch/x86/include/asm/fpu/api.h |  2 --
 drivers/iommu/intel/svm.c  | 24 +---
 2 files changed, 1 insertion(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 23bef08a8388..ca4d0dee1ecd 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -106,6 +106,4 @@ extern int cpu_has_xfeatures(u64 xfeatures_mask, const char 
**feature_name);
  */
 #define PASID_DISABLED 0
 
-static inline void update_pasid(void) { }
-
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 0c228787704f..5b5d69b04fcc 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -505,21 +505,6 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
return ret;
 }
 
-static void _load_pasid(void *unused)
-{
-   update_pasid();
-}
-
-static void load_pasid(struct mm_struct *mm, u32 pasid)
-{
-   mutex_lock(>context.lock);
-
-   /* Update PASID MSR on all CPUs running the mm's tasks. */
-   on_each_cpu_mask(mm_cpumask(mm), _load_pasid, NULL, true);
-
-   mutex_unlock(>context.lock);
-}
-
 static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
 unsigned int flags)
 {
@@ -614,10 +599,6 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
intel_iommu *iommu,
if (ret)
goto free_sdev;
 
-   /* The newly allocated pasid is loaded to the mm. */
-   if (!(flags & SVM_FLAG_SUPERVISOR_MODE) && list_empty(>devs))
-   load_pasid(mm, svm->pasid);
-
list_add_rcu(>list, >devs);
 success:
return >sva;
@@ -670,11 +651,8 @@ static int intel_svm_unbind_mm(struct device *dev, u32 
pasid)
kfree_rcu(sdev, rcu);
 
if (list_empty(>devs)) {
-   if (svm->notifier.ops) {
+   if (svm->notifier.ops)
mmu_notifier_unregister(>notifier, 
mm);
-   /* Clear mm's pasid. */
-   load_pasid(mm, PASID_DISABLED);
-   }
pasid_private_remove(svm->pasid);
/* We mandate that no page faults may be 
outstanding
 * for the PASID when intel_svm_unbind_mm() is 
called.
-- 
2.33.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 8/8] docs: x86: Change documentation for SVA (Shared Virtual Addressing)

2021-09-20 Thread Fenghua Yu
Since allocating, freeing and fixing up PASID are changed, the
documentation is updated to reflect the changes.

Originally-by: Ashok Raj 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
 Documentation/x86/sva.rst | 81 +++
 1 file changed, 74 insertions(+), 7 deletions(-)

diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
index 076efd51ef1f..868ed4b25002 100644
--- a/Documentation/x86/sva.rst
+++ b/Documentation/x86/sva.rst
@@ -106,16 +106,83 @@ process share the same page tables, thus the same MSR 
value.
 
 PASID is cleared when a process is created. The PASID allocation and MSR
 programming may occur long after a process and its threads have been created.
-One thread must call iommu_sva_bind_device() to allocate the PASID for the
-process. If a thread uses ENQCMD without the MSR first being populated, a #GP
-will be raised. The kernel will update the PASID MSR with the PASID for all
-threads in the process. A single process PASID can be used simultaneously
+One thread must call iommu_sva_bind(_device) to allocate the PASID for the 
process.
+If a thread uses ENQCMD without the MSR first being populated, it will cause 
#GP.
+The kernel will fix up the #GP by writing the process-wide PASID into the
+thread that took the #GP. A single process PASID can be used simultaneously
 with multiple devices since they all share the same address space.
 
-One thread can call iommu_sva_unbind_device() to free the allocated PASID.
-The kernel will clear the PASID MSR for all threads belonging to the process.
+The PASID MSR value is cleared at thread creation and is never inherited from a
+parent. This ensures consistent child behavior no matter whether the thread is
+created first or the PASID is allocated (and the MSR written).
 
-New threads inherit the MSR value from the parent.
+
+PASID Lifecycle Management
+==
+
+Only processes that access SVA-capable devices need to have a PASID
+allocated. This allocation happens when a process opens/binds an SVA-capable
+device but finds no PASID for this process. Subsequent binds of the same, or
+other devices will share the same PASID.
+
+Although the PASID is allocated to the process by opening a device,
+it is not active in any of the threads of that process. It's loaded to the
+IA32_PASID MSR lazily when a thread tries to submit a work descriptor
+to a device using the ENQCMD.
+
+That first access will trigger a #GP fault because the IA32_PASID MSR
+has not been initialized with the PASID value assigned to the process
+when the device was opened. The Linux #GP handler notes that a PASID has
+been allocated for the process, and so initializes the IA32_PASID MSR, takes
+a reference to the PASID and returns so that the ENQCMD instruction is
+re-executed.
+
+On fork(2) or exec(2) the PASID is removed from the process as it no
+longer has the same address space that it had when the device was opened.
+
+On clone(2) the new task shares the same address space, so will be
+able to use the PASID allocated to the process. The IA32_PASID is not
+preemptively initialized as the PASID value might not be allocated yet or
+the kernel does not know whether this thread is going to access the device
+and the cleared IA32_PASID MSR reduces context switch overhead by xstate
+init optimization. Since #GP faults have to be handled on any threads that
+were created before the PASID was assigned to the mm of the process, newly
+created threads might as well be treated in a consistent way.
+
+Due to complexity of freeing the PASID and clearing all IA32_PASID MSRs in
+all threads in unbind, free the PASID lazily when there is no PASID user.
+Track the PASID's reference count in the following way:
+
+- Track device usage of the PASID: The PASID's reference count is initialized
+  as 1 when the PASID is allocated in the first bind. Bind takes a reference
+  and unbind drops the reference.
+- Track task usage of the PASID: Fixing up the IA32_PASID MSR in #GP takes
+  reference and exit(2) drops the reference. Once the MSR is fixed up, the
+  PASID value stays in the MSR stays for the rest life of the task.
+
+The PASID is freed lazily in exit(2) or unbind when there is no reference
+to the PASID. After freed, the PASID can be allocated to any process.
+
+ENQCMD needs at least two requirements: a valid IA32_PASID MSR with the
+PASID value of the process and a valid PASID table entry for the PASID.
+To execute ENQCMD, the user must ensure the device is bound to the
+process so that the kernel can guarantee to meet the above two requirements.
+
+Lazy PASID free may cause the task still has the PASID value in IA32_PASID
+while there is no PASID table entry for the PASID. The workqueue submitted
+by ENQCMD in this scenario cannot find the PASID table entry and generates
+a DMAR fault. Currently DMAR fault handler just prints a fault reason.
+Future DMAR fault handler might notify the user the workqueue failure.
+Here

[PATCH 7/8] tools/objtool: Check for use of the ENQCMD instruction in the kernel

2021-09-20 Thread Fenghua Yu
The ENQCMD implicitly accesses the PASID_MSR to fill in the pasid field
of the descriptor being submitted to an accelerator. But there is no
precise (and stable across kernel changes) point at which the PASID_MSR
is updated from the value for one task to the next.

Kernel code that uses accelerators must always use the ENQCMDS instruction
which does not access the PASID_MSR.

Check for use of the ENQCMD instruction in the kernel and warn on its
usage.

Checking the invalid instruction is a relatively new use of objtool and
I'm open to feedback about the approach.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
 tools/objtool/arch/x86/decode.c  | 10 +-
 tools/objtool/check.c| 20 
 tools/objtool/include/objtool/arch.h |  1 +
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index bc821056aba9..e63b44ba3bd4 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -110,7 +110,7 @@ int arch_decode_instruction(const struct elf *elf, const 
struct section *sec,
 {
struct insn insn;
int x86_64, ret;
-   unsigned char op1, op2,
+   unsigned char op1, op2, op3,
  rex = 0, rex_b = 0, rex_r = 0, rex_w = 0, rex_x = 0,
  modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
  sib = 0, /* sib_scale = 0, */ sib_index = 0, sib_base = 0;
@@ -137,6 +137,7 @@ int arch_decode_instruction(const struct elf *elf, const 
struct section *sec,
 
op1 = insn.opcode.bytes[0];
op2 = insn.opcode.bytes[1];
+   op3 = insn.opcode.bytes[2];
 
if (insn.rex_prefix.nbytes) {
rex = insn.rex_prefix.bytes[0];
@@ -489,6 +490,13 @@ int arch_decode_instruction(const struct elf *elf, const 
struct section *sec,
/* nopl/nopw */
*type = INSN_NOP;
 
+   } else if (op2 == 0x38 && op3 == 0xf8) {
+   if (insn.prefixes.nbytes == 1 &&
+   insn.prefixes.bytes[0] == 0xf2) {
+   /* enqcmd */
+   *type = INSN_ENQCMD;
+   }
+
} else if (op2 == 0xa0 || op2 == 0xa8) {
 
/* push fs/gs */
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e5947fbb9e7a..91d13521d9d6 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3133,6 +3133,21 @@ static int validate_reachable_instructions(struct 
objtool_file *file)
return 0;
 }
 
+static int validate_enqcmd(struct objtool_file *file)
+{
+   struct instruction *insn;
+
+   for_each_insn(file, insn) {
+   if (insn->type == INSN_ENQCMD) {
+   WARN_FUNC("enqcmd instruction", insn->sec,
+ insn->offset);
+   return 1;
+   }
+   }
+
+   return 0;
+}
+
 int check(struct objtool_file *file)
 {
int ret, warnings = 0;
@@ -3147,6 +3162,11 @@ int check(struct objtool_file *file)
if (list_empty(>insn_list))
goto out;
 
+   ret = validate_enqcmd(file);
+   if (ret < 0)
+   goto out;
+   warnings += ret;
+
if (vmlinux && !validate_dup) {
ret = validate_vmlinux_functions(file);
if (ret < 0)
diff --git a/tools/objtool/include/objtool/arch.h 
b/tools/objtool/include/objtool/arch.h
index 062bb6e9b865..5147c0762b49 100644
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -26,6 +26,7 @@ enum insn_type {
INSN_CLAC,
INSN_STD,
INSN_CLD,
+   INSN_ENQCMD,
INSN_OTHER,
 };
 
-- 
2.33.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/8] x86/process: Clear PASID state for a newly forked/cloned thread

2021-09-20 Thread Fenghua Yu
The PASID state has to be cleared on forks, since the child has a
different address space. The PASID is also cleared for thread clone. While
it would be correct to inherit the PASID in this case, it is unknown
whether the new task will use ENQCMD. Giving it the PASID "just in case"
would have the downside of increased context switch overhead to setting
the PASID MSR and losing init optimization.

Since #GP faults have to be handled on any threads that were created before
the PASID was assigned to the mm of the process, newly created threads
might as well be treated in a consistent way.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
 arch/x86/kernel/process.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1d9463e3096b..c713986ef7d7 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -154,6 +154,14 @@ int copy_thread(unsigned long clone_flags, unsigned long 
sp, unsigned long arg,
frame->flags = X86_EFLAGS_FIXED;
 #endif
 
+   if (static_cpu_has(X86_FEATURE_ENQCMD)) {
+   /*
+* Clear the PASID bit in xfeatures so that the PASID MSR
+* will be initialized as init state (0).
+*/
+   p->thread.fpu.state.xsave.header.xfeatures &= 
~XFEATURE_MASK_PASID;
+   }
+
/* Kernel thread ? */
if (unlikely(p->flags & PF_KTHREAD)) {
p->thread.pkru = pkru_get_init_value();
-- 
2.33.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-20 Thread Fenghua Yu
ENQCMD requires the IA32_PASID MSR has a valid PASID value which was
allocated to the process during bind. The MSR could be populated eagerly
by an IPI after the PASID is allocated in bind. But the method was
disabled in commit 9bfecd058339 ("x86/cpufeatures: Force disable
X86_FEATURE_ENQCMD and remove update_pasid()")' due to locking and other
issues.

Since the MSR was cleared in fork()/clone(), the first ENQCMD will
generate a #GP fault. The #GP fault handler will initialize the MSR
if a PASID has been allocated for this process.

The lazy enabling of the PASID MSR in the #GP handler is not an elegant
solution. But it has the least complexity that fits with h/w behavior.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
 arch/x86/include/asm/fpu/api.h |  6 
 arch/x86/include/asm/iommu.h   |  2 ++
 arch/x86/kernel/fpu/xstate.c   | 59 ++
 arch/x86/kernel/traps.c| 12 +++
 drivers/iommu/intel/svm.c  | 32 ++
 5 files changed, 111 insertions(+)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index ca4d0dee1ecd..f146849e5c8c 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -106,4 +106,10 @@ extern int cpu_has_xfeatures(u64 xfeatures_mask, const 
char **feature_name);
  */
 #define PASID_DISABLED 0
 
+#ifdef CONFIG_INTEL_IOMMU_SVM
+void fpu__pasid_write(u32 pasid);
+#else
+static inline void fpu__pasid_write(u32 pasid) { }
+#endif
+
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index bf1ed2ddc74b..9c4bf9b0702f 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
return -EINVAL;
 }
 
+bool __fixup_pasid_exception(void);
+
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8def1b7f8fb..8a89b2cecd77 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1289,3 +1289,62 @@ int proc_pid_arch_status(struct seq_file *m, struct 
pid_namespace *ns,
return 0;
 }
 #endif /* CONFIG_PROC_PID_ARCH_STATUS */
+
+#ifdef CONFIG_INTEL_IOMMU_SVM
+/**
+ * fpu__pasid_write - Write the current task's PASID state/MSR.
+ * @pasid: the PASID
+ *
+ * The PASID is written to the IA32_PASID MSR directly if the MSR is active.
+ * Otherwise it's written to the PASID. The IA32_PASID MSR should contain
+ * the PASID after returning to the user.
+ *
+ * This is called only when ENQCMD is enabled.
+ */
+void fpu__pasid_write(u32 pasid)
+{
+   struct xregs_state *xsave = >thread.fpu.state.xsave;
+   u64 msr_val = pasid | MSR_IA32_PASID_VALID;
+   struct fpu *fpu = >thread.fpu;
+
+   /*
+* ENQCMD always uses the compacted XSAVE format. Ensure the buffer
+* has space for the PASID.
+*/
+   BUG_ON(!(xsave->header.xcomp_bv & XFEATURE_MASK_PASID));
+
+   fpregs_lock();
+
+   /*
+* If the task's FPU doesn't need to be loaded or is valid, directly
+* write the IA32_PASID MSR. Otherwise, write the PASID state and
+* the MSR will be loaded from the PASID state before returning to
+* the user.
+*/
+   if (!test_thread_flag(TIF_NEED_FPU_LOAD) ||
+   fpregs_state_valid(fpu, smp_processor_id())) {
+   wrmsrl(MSR_IA32_PASID, msr_val);
+   } else {
+   struct ia32_pasid_state *ppasid_state;
+   /*
+* Mark XFEATURE_PASID as non-init in the XSAVE buffer.
+* This ensures that a subsequent XRSTOR will see the new
+* value instead of writing the init value to the MSR.
+*/
+   xsave->header.xfeatures |= XFEATURE_MASK_PASID;
+   ppasid_state = get_xsave_addr(xsave, XFEATURE_PASID);
+   /*
+* ppasid_state shouldn't be NULL because XFEATURE_PASID
+* was set just now.
+*
+* Please note that the following operation is a "write only"
+* operation on the PASID state and it writes the *ENTIRE*
+* state component. Please don't blindly copy this code to
+* modify other XSAVE states.
+*/
+   ppasid_state->pasid = msr_val;
+   }
+
+   fpregs_unlock();
+}
+#endif /* CONFIG_INTEL_IOMMU_SVM */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a58800973aed..a25d738ae839 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_X86_64
 #include 
@@ -526,6 +527,14 @@ static enum kernel_gp_hint get_kernel_gp_address(struct 
pt_regs *regs,
return GP_CANONICAL;
 }
 
+static bool fixup_pasid_exception(void)
+{
+   if (!cpu_feature_enabled(X86_

[PATCH 0/8] Re-enable ENQCMD and PASID MSR

2021-09-20 Thread Fenghua Yu
Since updating PASID (Process Address Space ID) MSR through IPI has a few
issues that are beyond repair, Thomas disables ENQCMD [1].

Please check Documentation/x86/sva.rst for various concepts and terms
related to PASID, ENQCMD, SVM (Shared Virtual Memory), etc.

This series re-enables ENQCMD and IA32_PASID MSR by using a #GP fix up
method previously published in [2]. A PASID is allocated to a mm once
a SVM is bound to the mm via intel_svm_bind() API. The #GP fix up method
updates the PASID MSR from the mm's PASID in #GP handler when one thread
in a process executes ENQCMD for the first time and one reference is taken
to the PASID. Once the MSR is uploaded, the thread keeps and can use it
for the rest life time of the thread. In exit(2) or unbind, the PASID's
reference is dropped and the PASID is freed if there is no reference.

References:
1. ENQCMD was disabled in upstream due to serious issues:
https://lore.kernel.org/linux-iommu/87mtsd6gr9@nanos.tec.linutronix.de/

2. #GP fix up PASID MSR:
https://lore.kernel.org/linux-iommu/1594684087-61184-1-git-send-email-fenghua...@intel.com/

Fenghua Yu (7):
  iommu/vt-d: Clean up unused PASID updating functions
  x86/process: Clear PASID state for a newly forked/cloned thread
  x86/traps: Demand-populate PASID MSR via #GP
  x86/mmu: Add mm-based PASID refcounting
  x86/cpufeatures: Re-enable ENQCMD
  tools/objtool: Check for use of the ENQCMD instruction in the kernel
  docs: x86: Change documentation for SVA (Shared Virtual Addressing)

Peter Zijlstra (1):
  sched: Define and initialize a flag to identify valid PASID in the
task

 Documentation/x86/sva.rst| 81 ++--
 arch/x86/include/asm/disabled-features.h |  7 +-
 arch/x86/include/asm/fpu/api.h   |  6 +-
 arch/x86/include/asm/iommu.h |  8 ++
 arch/x86/include/asm/mmu_context.h   |  2 +
 arch/x86/kernel/fpu/xstate.c | 59 +++
 arch/x86/kernel/process.c|  8 ++
 arch/x86/kernel/traps.c  | 12 +++
 drivers/iommu/intel/svm.c| 95 ++--
 include/linux/sched.h|  4 +
 kernel/fork.c|  4 +
 tools/objtool/arch/x86/decode.c  | 10 ++-
 tools/objtool/check.c| 20 +
 tools/objtool/include/objtool/arch.h |  1 +
 14 files changed, 283 insertions(+), 34 deletions(-)

-- 
2.33.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 6/8] x86/cpufeatures: Re-enable ENQCMD

2021-09-20 Thread Fenghua Yu
Since ENQCMD is handled by #GP fix up, it can be re-enabled.

The ENQCMD feature cannot be used if CONFIG_INTEL_IOMMU_SVM
is not set. Add X86_FEATURE_ENQCMD to the disabled features mask as
appropriate and use cpu_feature_enabled() to check the feature.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
 arch/x86/include/asm/disabled-features.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/disabled-features.h 
b/arch/x86/include/asm/disabled-features.h
index 8f28fafa98b3..1231d63f836d 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -56,8 +56,11 @@
 # define DISABLE_PTI   (1 << (X86_FEATURE_PTI & 31))
 #endif
 
-/* Force disable because it's broken beyond repair */
-#define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31))
+#ifdef CONFIG_INTEL_IOMMU_SVM
+# define DISABLE_ENQCMD0
+#else
+# define DISABLE_ENQCMD(1 << (X86_FEATURE_ENQCMD & 31))
+#endif
 
 #ifdef CONFIG_X86_SGX
 # define DISABLE_SGX   0
-- 
2.33.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/2] iommu/vt-d: Fix a deadlock in SVM

2021-08-26 Thread Fenghua Yu
unbind_device+0x62/0x80
[  187.039955]  idxd_cdev_release+0x15a/0x200

pasid_mutex protects pasid and svm data mapping data. It's unnecessary
to hold pasid_mutex while flushing the workqueue. To fix the deadlock
issue, unlock pasid_pasid during flushing the workqueue to allow the works
to be handled.

Fixes: d5b9e4bfe0d8 ("iommu/vt-d: Report prq to io-pgfault framework")

Reported-and-tested-by: Dave Jiang 
Signed-off-by: Fenghua Yu 
---
 drivers/iommu/intel/svm.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index ceeca633a5f9..d575082567ca 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -793,7 +793,19 @@ static void intel_svm_drain_prq(struct device *dev, u32 
pasid)
goto prq_retry;
}
 
+   /*
+* A work in IO page fault workqueue may try to lock pasid_mutex now.
+* Holding pasid_mutex while waiting in iopf_queue_flush_dev() for
+* all works in the workqueue to finish may cause deadlock.
+*
+* It's unnecessary to hold pasid_mutex in iopf_queue_flush_dev().
+* Unlock it to allow the works to be handled while waiting for
+* them to finish.
+*/
+   lockdep_assert_held(_mutex);
+   mutex_unlock(_mutex);
iopf_queue_flush_dev(dev);
+   mutex_lock(_mutex);
 
/*
 * Perform steps described in VT-d spec CH7.10 to drain page
-- 
2.33.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] iommu/vt-d: Fix PASID leak

2021-08-26 Thread Fenghua Yu
mm->pasid will be used in intel_svm_free_pasid() after load_pasid()
during unbinding mm. Clearing it in load_pasid() will cause PASID cannot
be freed in intel_svm_free_pasid().

Additionally mm->pasid was updated already before load_pasid() during pasid
allocation. No need to update it again in load_pasid() during binding mm.

Don't update mm->pasid to avoid the issues in both binding mm and
unbinding mm.

Please note that load_pasid() and its called functions will be re-written
in upcoming re-enabling ENQCMD series. This patch tries to fix the issues
cleanly before the re-enabling ENQCMD series.

Fixes: 62ef907a045e ("iommu/vt-d: Fix PASID reference leak")

Reported-and-tested-by: Dave Jiang 
Co-developed-by: Jacob Pan 
Signed-off-by: Jacob Pan 
Signed-off-by: Fenghua Yu 
---
 drivers/iommu/intel/svm.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 4b9b3f35ba0e..ceeca633a5f9 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -516,9 +516,6 @@ static void load_pasid(struct mm_struct *mm, u32 pasid)
 {
mutex_lock(>context.lock);
 
-   /* Synchronize with READ_ONCE in update_pasid(). */
-   smp_store_release(>pasid, pasid);
-
/* Update PASID MSR on all CPUs running the mm's tasks. */
on_each_cpu_mask(mm_cpumask(mm), _load_pasid, NULL, true);
 
-- 
2.33.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/vt-d: Fix PASID reference leak

2021-08-13 Thread Fenghua Yu
A PASID reference is increased whenever a device is bound to an mm (and
its PASID) successfully (i.e. the device's sdev user count is increased).
But the reference is not dropped every time the device is unbound
successfully from the mm (i.e. the device's sdev user count is decreased).
The reference is dropped only once by calling intel_svm_free_pasid() when
there isn't any device bound to the mm. intel_svm_free_pasid() drops the
reference and only frees the PASID on zero reference.

Fix the issue by dropping the PASID reference and freeing the PASID when
no reference on successful unbinding the device by calling
intel_svm_free_pasid() .

Signed-off-by: Fenghua Yu 
---
 drivers/iommu/intel/svm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 9b0f22bc0514..4b9b3f35ba0e 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -675,7 +675,6 @@ static int intel_svm_unbind_mm(struct device *dev, u32 
pasid)
kfree_rcu(sdev, rcu);
 
if (list_empty(>devs)) {
-   intel_svm_free_pasid(mm);
if (svm->notifier.ops) {
mmu_notifier_unregister(>notifier, 
mm);
/* Clear mm's pasid. */
@@ -690,6 +689,8 @@ static int intel_svm_unbind_mm(struct device *dev, u32 
pasid)
kfree(svm);
}
}
+   /* Drop a PASID reference and free it if no reference. */
+   intel_svm_free_pasid(mm);
}
 out:
return ret;
-- 
2.32.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 08/12] fork: Clear PASID for new mm

2021-02-25 Thread Fenghua Yu
Hi, Jean,

On Wed, Feb 24, 2021 at 11:19:27AM +0100, Jean-Philippe Brucker wrote:
> Hi Fenghua,
> 
> [Trimmed the Cc list]
> 
> On Mon, Jul 13, 2020 at 04:48:03PM -0700, Fenghua Yu wrote:
> > When a new mm is created, its PASID should be cleared, i.e. the PASID is
> > initialized to its init state 0 on both ARM and X86.
> 
> I just noticed this patch was dropped in v7, and am wondering whether we
> could still upstream it. Does x86 need a child with a new address space
> (!CLONE_VM) to inherit the PASID of the parent?  That doesn't make much
> sense with regard to IOMMU structures - same PASID indexing multiple PGDs?

You are right: x86 should clear mm->pasid when a new mm is created.
This patch somehow is losted:(

> 
> Currently iommu_sva_alloc_pasid() assumes mm->pasid is always initialized
> to 0 and fails on forked tasks. I'm trying to figure out how to fix this.
> Could we clear the pasid on fork or does it break the x86 model?

x86 calls ioasid_alloc() instead of iommu_sva_alloc_pasid(). So
functionality is not a problem without this patch on x86. But I think
we do need to have this patch in the kernel because PASID is per addr
space and two addr spaces shouldn't have the same PASID.

Who will accept this patch?

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 01/13] mm: Define pasid in mm

2020-09-28 Thread Fenghua Yu
Hi, Will and Jean,

On Mon, Sep 28, 2020 at 11:22:51PM +0100, Will Deacon wrote:
> On Fri, Sep 18, 2020 at 12:18:41PM +0200, Jean-Philippe Brucker wrote:
> > From: Fenghua Yu 
> > 
> > PASID is shared by all threads in a process. So the logical place to keep
> > track of it is in the "mm". Both ARM and X86 need to use the PASID in the
> > "mm".
> > 
> > Suggested-by: Christoph Hellwig 
> > Signed-off-by: Fenghua Yu 
> > Reviewed-by: Tony Luck 
> > ---
> > https://lore.kernel.org/linux-iommu/1600187413-163670-8-git-send-email-fenghua...@intel.com/
> > ---
> >  include/linux/mm_types.h | 4 
> >  1 file changed, 4 insertions(+)
> 
> Acked-by: Will Deacon 

FYI. This patch is in x86 maintainers tree tip:x86/pasid now as part of
the x86 PASID MSR series.

Thanks.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 3/9] Documentation/x86: Add documentation for SVA (Shared Virtual Addressing)

2020-09-18 Thread Fenghua Yu
On Thu, Sep 17, 2020 at 07:30:41PM +0200, Borislav Petkov wrote:
> On Thu, Sep 17, 2020 at 10:22:39AM -0700, Raj, Ashok wrote:
> > s/translation again/translation
> 
> Ok, last one. Now stop looking at that text because you'll find more.
> 
> :-)))

Thank you very much for taking care of the series, Boris!

I tested the tip:x86/pasid branch and everything works fine by my tests.
I'll enable more tests for the branch.

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 0/9] x86: tag application address space for devices

2020-09-17 Thread Fenghua Yu
Hi, Joerg,

On Wed, Sep 16, 2020 at 10:06:02AM +0200, Joerg Roedel wrote:
> On Tue, Sep 15, 2020 at 09:30:04AM -0700, Fenghua Yu wrote:
> > Ashok Raj (1):
> >   Documentation/x86: Add documentation for SVA (Shared Virtual
> > Addressing)
> > 
> > Fenghua Yu (7):
> >   drm, iommu: Change type of pasid to u32
> >   iommu/vt-d: Change flags type to unsigned int in binding mm
> >   x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
> >   x86/msr-index: Define IA32_PASID MSR
> >   mm: Define pasid in mm
> >   x86/cpufeatures: Mark ENQCMD as disabled when configured out
> >   x86/mmu: Allocate/free PASID
> > 
> > Yu-cheng Yu (1):
> >   x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature
> 
> For the IOMMU bits:
> 
> Acked-by: Joerg Roedel 

Thank you!

-Fenghua
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 6/9] x86/msr-index: Define IA32_PASID MSR

2020-09-15 Thread Fenghua Yu
The IA32_PASID MSR (0xd93) contains the Process Address Space Identifier
(PASID), a 20-bit value. Bit 31 must be set to indicate the value
programmed in the MSR is valid. Hardware uses PASID to identify process
address space and direct responses to the right address space.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Change "identify process" to "identify process address space" in the
  commit message (Thomas)

 arch/x86/include/asm/msr-index.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 2859ee4f39a8..aaddc6a9e237 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -257,6 +257,9 @@
 #define MSR_IA32_LASTINTFROMIP 0x01dd
 #define MSR_IA32_LASTINTTOIP   0x01de
 
+#define MSR_IA32_PASID 0x0d93
+#define MSR_IA32_PASID_VALID   BIT_ULL(31)
+
 /* DEBUGCTLMSR bits (others vary by model): */
 #define DEBUGCTLMSR_LBR(1UL <<  0) /* last branch 
recording */
 #define DEBUGCTLMSR_BTF_SHIFT  1
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 2/9] iommu/vt-d: Change flags type to unsigned int in binding mm

2020-09-15 Thread Fenghua Yu
"flags" passed to intel_svm_bind_mm() is a bit mask and should be
defined as "unsigned int" instead of "int".

Change its type to "unsigned int".

Suggested-by: Thomas Gleixner 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
Reviewed-by: Lu Baolu 
---
v5:
- Reviewed by Lu Baolu

v2:
- Add this new patch per Thomas' comment.

 drivers/iommu/intel/svm.c   | 7 ---
 include/linux/intel-iommu.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index e78a74a9c1cf..fc90a079e228 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -446,7 +446,8 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
 
 /* Caller must hold pasid_mutex, mm reference */
 static int
-intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
+intel_svm_bind_mm(struct device *dev, unsigned int flags,
+ struct svm_dev_ops *ops,
  struct mm_struct *mm, struct intel_svm_dev **sd)
 {
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
@@ -1033,7 +1034,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 {
struct iommu_sva *sva = ERR_PTR(-EINVAL);
struct intel_svm_dev *sdev = NULL;
-   int flags = 0;
+   unsigned int flags = 0;
int ret;
 
/*
@@ -1042,7 +1043,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 * and intel_svm etc.
 */
if (drvdata)
-   flags = *(int *)drvdata;
+   flags = *(unsigned int *)drvdata;
mutex_lock(_mutex);
ret = intel_svm_bind_mm(dev, flags, NULL, mm, );
if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 7322073f62d0..9c3e8337442a 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -765,7 +765,7 @@ struct intel_svm {
struct mm_struct *mm;
 
struct intel_iommu *iommu;
-   int flags;
+   unsigned int flags;
u32 pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
struct list_head devs;
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 3/9] Documentation/x86: Add documentation for SVA (Shared Virtual Addressing)

2020-09-15 Thread Fenghua Yu
From: Ashok Raj 

ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
features are a complicated stack with lots of interconnected pieces.
This documentation provides a big picture overview for all of the
features.

Signed-off-by: Ashok Raj 
Co-developed-by: Fenghua Yu 
Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v8:
- Address all of comments from Boris and Randy.

v7:
- Change the doc for updating PASID by IPI and context switch (Andy).

v3:
- Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
- Fix a couple of typos (Baolu)

v2:
- Fix the doc format and add the doc in toctree (Thomas)
- Modify the doc for better description (Thomas, Tony, Dave)

 Documentation/x86/index.rst |   1 +
 Documentation/x86/sva.rst   | 256 
 2 files changed, 257 insertions(+)
 create mode 100644 Documentation/x86/sva.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..e5d5ff096685 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -30,3 +30,4 @@ x86-specific Documentation
usb-legacy-support
i386/index
x86_64/index
+   sva
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index ..a1f008ef7dad
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,256 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===
+Shared Virtual Addressing (SVA) with ENQCMD
+===
+
+Background
+==
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM).
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to the PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU also is required
+to support PCIe features ATS and PRI. ATS allows devices to cache
+translations for virtual addresses. The IOMMU driver uses the mmu_notifier()
+support to keep the device TLB cache and the CPU cache in sync. PRI allows
+the device to request paging the virtual address by using the CPU page tables
+before accessing the address.
+
+
+Shared Hardware Workqueues
+==
+
+Unlike Single Root I/O Virtualization (SR-IOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VM's). This allows better hardware utilization vs. hard
+partitioning resources that could result in under utilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware by SWQ interface, SIOV uses Process Address Space
+ID (PASID), which is a 20-bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+==
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or other device specific mechanisms to
+implement fairness or ensure forward progress should be provided.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permits hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=
+
+A new thread-scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA-capable device this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU-specific API
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+iommu_sva_bind_device(), which will do the following:
+
+- Allocate the PASID, and program the process page-table (%cr3 register) in the
+  PASID context entries.
+- Register for mmu_notifier() to track any page-table invalidations to keep
+  the device TLB in sync. For example, when a page-table entry is invalidated

[PATCH v8 4/9] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions

2020-09-15 Thread Fenghua Yu
Work submission instruction comes in two flavors. ENQCMD can be called
both in ring 3 and ring 0 and always uses the contents of PASID MSR when
shipping the command to the device. ENQCMDS allows a kernel driver to
submit commands on behalf of a user process. The driver supplies the
PASID value in ENQCMDS. There isn't any usage of ENQCMD in the kernel
as of now.

The CPU feature flag is shown as "enqcmd" in /proc/cpuinfo.

Signed-off-by: Fenghua Yu 
Reviewed-by: Tony Luck 
---
v2:
- Re-write commit message (Thomas)

 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 2901d5df4366..fea10d04d05f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -353,6 +353,7 @@
 #define X86_FEATURE_CLDEMOTE   (16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI(16*32+27) /* MOVDIRI instruction */
 #define X86_FEATURE_MOVDIR64B  (16*32+28) /* MOVDIR64B instruction */
+#define X86_FEATURE_ENQCMD (16*32+29) /* ENQCMD and ENQCMDS 
instructions */
 
 /* AMD-defined CPU features, CPUID level 0x8007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery 
support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 3cbe24ca80ab..3a02707c1f4d 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -69,6 +69,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_TOTAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_CQM_MBM_LOCAL,X86_FEATURE_CQM_LLC   },
{ X86_FEATURE_AVX512_BF16,  X86_FEATURE_AVX512VL  },
+   { X86_FEATURE_ENQCMD,   X86_FEATURE_XSAVES},
{}
 };
 
-- 
2.19.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


  1   2   3   >