Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-31 Thread Jacob Pan
Hi Jason,

On Tue, 31 May 2022 16:05:50 -0300, Jason Gunthorpe  wrote:

> On Tue, May 31, 2022 at 10:29:55AM -0700, Jacob Pan wrote:
> 
> > The reason why I store the PASID in the IOMMU domain is for IOTLB flushes
> > within the domain. The device driver is not aware of domain-level IOTLB
> > flushes. We also have an iova_cookie for each domain, which is essentially
> > for RID2PASID.
> 
> You need to make the PASID stuff work generically.
> 
> The domain needs to hold a list of all the places it needs to flush
> and that list needs to be maintained during attach/detach.
> 
> A single PASID on the domain is obviously never going to work
> generically.
> 
I agree. I did it this way intending it to be part of iommu_domain's
iommu_dma_cookie rather than something global, but for lack of common
storage between the identity domain and the DMA domain, I put it here as a
global field.

Then should we also extract RID2PASID into the generic API, i.e. set the
PASID, flush the IOTLB, etc.? RID2PASID is not in the group's pasid_array
today.
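
For illustration, a minimal sketch of the kind of per-domain flush list
described above (hypothetical names, not what is in the posted patches):
attach adds an entry, detach removes it, and a domain-wide invalidation
just walks the list.

#include <linux/device.h>
#include <linux/ioasid.h>
#include <linux/list.h>
#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* one entry per attachment that may need flushing (RID or RID+PASID) */
struct flush_target {
	struct list_head link;
	struct device *dev;
	ioasid_t pasid;			/* INVALID_IOASID for RID-only */
};

/* stand-in for whatever common structure ends up holding the list */
struct demo_domain {
	spinlock_t lock;		/* protects flush_targets; spin_lock_init() elsewhere */
	struct list_head flush_targets;
};

/* attach: record the new flush target; detach would list_del() + kfree() */
static int demo_domain_attach(struct demo_domain *dom, struct device *dev,
			      ioasid_t pasid)
{
	struct flush_target *t = kzalloc(sizeof(*t), GFP_KERNEL);

	if (!t)
		return -ENOMEM;
	t->dev = dev;
	t->pasid = pasid;

	spin_lock(&dom->lock);
	list_add(&t->link, &dom->flush_targets);
	spin_unlock(&dom->lock);
	return 0;
}

/* domain-wide invalidation: every attachment gets flushed, RID or PASID */
static void demo_domain_flush_all(struct demo_domain *dom)
{
	struct flush_target *t;

	spin_lock(&dom->lock);
	list_for_each_entry(t, &dom->flush_targets, link)
		pr_debug("flush %s pasid %u\n", dev_name(t->dev), t->pasid);
	spin_unlock(&dom->lock);
}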

Thanks,

Jacob


Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-31 Thread Jacob Pan
Hi Baolu,

On Tue, 31 May 2022 20:45:28 +0800, Baolu Lu 
wrote:

> On 2022/5/31 18:12, Tian, Kevin wrote:
>  +++ b/include/linux/iommu.h
>  @@ -105,6 +105,8 @@ struct iommu_domain {
>  	enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
>  						      void *data);
>  	void *fault_data;
>  +	ioasid_t pasid;	/* Used for DMA requests with PASID */
>  +	atomic_t pasid_users;
> >>> These are poorly named, this is really the DMA API global PASID and
> >>> shouldn't be used for other things.
> >>>
> >>>
> >>>
> >>> Perhaps I misunderstood, do you mind explaining more?  
> >> You still haven't really explained what this is for in this patch,
> >> maybe it just needs a better commit message, or maybe something is
> >> wrong.
> >>
> >> I keep saying the DMA API usage is not special, so why do we need to
> >> create a new global pasid and refcount? Realistically this is only
> >> going to be used by IDXD, why can't we just allocate a PASID and
> >> return it to the driver every time a driver asks for DMA API on PASID
> >> mode? Why does the core need to do anything special?
> >>  
The reason why I store the PASID in the IOMMU domain is for IOTLB flushes
within the domain. The device driver is not aware of domain-level IOTLB
flushes. We also have an iova_cookie for each domain, which is essentially
for RID2PASID.

> > Agree. I guess it was a mistake caused by treating ENQCMD as the
> > only user although the actual semantics of the invented interfaces
> > have already evolved to be quite general.
> > 
> > This is very similar to what we have been discussing for iommufd.
> > a PASID is just an additional routing info when attaching a device
> > to an I/O address space (DMA API in this context) and by default
> > it should be a per-device resource except when ENQCMD is
> > explicitly opt in.
> > 
> > Hence it's right time for us to develop common facility working
> > for both this DMA API usage and iommufd, i.e.:
> > 
> > for normal PASID attach to a domain, driver:
> > 
> > allocates a local pasid from device local space;
> > attaches the local pasid to a domain;
> > 
> > for PASID attach in particular for ENQCMD, driver:
> > 
> > allocates a global pasid in system-wide;
> > attaches the global pasid to a domain;
> > set the global pasid in PASID_MSR;
> > 
> > In both cases the pasid is stored in the attach data instead of the
> > domain.
> > 
So during IOTLB flush for the domain, do we loop through the attach data?

> > DMA API pasid is no special from above except it needs to allow
> > one device attached to the same domain twice (one with RID
> > and the other with RID+PASID).
> > 
> > for iommufd those operations are initiated by userspace via
> > iommufd uAPI.  
> 
> My understanding is that device driver owns its PASID policy. If ENQCMD
> is supported on the device, the PASIDs should be allocated through
> ioasid_alloc(). Otherwise, the whole PASID pool is managed by the device
> driver.
> 
It seems the changes we want for this patchset are:
1. move ioasid_alloc() from the core to the device driver (the allocation
scope will be based on whether ENQCMD is intended or not)
2. store the pasid in the attach data
3. use the same iommufd API to attach/set the pasid on its default domain
Am I summarizing correctly?

> For kernel DMA w/ PASID, after the driver has a PASID for this purpose,
> it can just set the default domain to the PASID on device. There's no
> need for enable/disable() interfaces.
> 
> Best regards,
> baolu


Thanks,

Jacob


Re: [PATCH v4 3/6] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-24 Thread Jacob Pan
Hi Jason,

On Tue, 24 May 2022 15:02:41 -0300, Jason Gunthorpe  wrote:

> On Tue, May 24, 2022 at 09:12:35AM -0700, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Tue, 24 May 2022 10:51:35 -0300, Jason Gunthorpe 
> > wrote: 
> > > On Wed, May 18, 2022 at 11:21:17AM -0700, Jacob Pan wrote:  
> > > > On VT-d platforms with scalable mode enabled, devices issue DMA
> > > > requests with PASID need to attach PASIDs to given IOMMU domains.
> > > > The attach operation involves the following:
> > > > - Programming the PASID into the device's PASID table
> > > > - Tracking device domain and the PASID relationship
> > > > - Managing IOTLB and device TLB invalidations
> > > > 
> > > > This patch add attach_dev_pasid functions to the default domain ops
> > > > which is used by DMA and identity domain types. It could be
> > > > extended to support other domain types whenever necessary.
> > > > 
> > > > Signed-off-by: Lu Baolu 
> > > > Signed-off-by: Jacob Pan 
> > > >  drivers/iommu/intel/iommu.c | 72
> > > > +++-- 1 file changed, 70
> > > > insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/iommu/intel/iommu.c
> > > > b/drivers/iommu/intel/iommu.c index 1c2c92b657c7..75615c105fdf
> > > > 100644 +++ b/drivers/iommu/intel/iommu.c
> > > > @@ -1556,12 +1556,18 @@ static void __iommu_flush_dev_iotlb(struct
> > > > device_domain_info *info, u64 addr, unsigned int mask)
> > > >  {
> > > > u16 sid, qdep;
> > > > +   ioasid_t pasid;
> > > >  
> > > > if (!info || !info->ats_enabled)
> > > > return;
> > > >  
> > > > sid = info->bus << 8 | info->devfn;
> > > > qdep = info->ats_qdep;
> > > > +   pasid = iommu_get_pasid_from_domain(info->dev,
> > > > &info->domain->domain);
> > > 
> > > No, a simple domain can be attached to multiple pasids, all need to
> > > be flushed.
> > >   
> > Here is device TLB flush, why would I want to flush PASIDs other than my
> > own device attached?  
> 
> Again, a domain can be attached to multiple PASID's *on the same
> device*
> 
> The idea that there is only one PASID per domain per device is not
> right.
> 
Got you. I was under the impression that there was no use case yet for
multiple PASIDs per device-domain pair, based on our earlier discussion:
https://lore.kernel.org/lkml/20220315142216.gv11...@nvidia.com/

Perhaps I misunderstood. I will make the API more future-proof and search
through the pasid_array xa for *all* domain-device matches. As you
suggested earlier, I may need to retrieve the xa in the first place and use
xas_for_each for a faster search.
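
Roughly the shape I have in mind, as a sketch (it assumes the core code
exposes some way to reach the group's pasid_array, which is still an open
question; the flush callback is just a stand-in):

/*
 * Sketch only: walk the group's pasid_array with the advanced xas API,
 * under the xa_lock, and report every PASID whose entry points at the
 * given domain.  The xa index is the PASID itself.
 */
static void for_each_pasid_of_domain(struct iommu_group *group,
				     struct iommu_domain *domain,
				     void (*flush)(ioasid_t pasid, void *data),
				     void *data)
{
	XA_STATE(xas, &group->pasid_array, 0);
	struct iommu_domain *entry;

	xa_lock(&group->pasid_array);
	xas_for_each(&xas, entry, ULONG_MAX) {
		if (entry == domain)
			flush(xas.xa_index, data);
	}
	xa_unlock(&group->pasid_array);
}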

Thanks,

Jacob


Re: [PATCH v4 3/6] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-24 Thread Jacob Pan
Hi Jason,

On Tue, 24 May 2022 10:51:35 -0300, Jason Gunthorpe  wrote:

> On Wed, May 18, 2022 at 11:21:17AM -0700, Jacob Pan wrote:
> > On VT-d platforms with scalable mode enabled, devices issue DMA requests
> > with PASID need to attach PASIDs to given IOMMU domains. The attach
> > operation involves the following:
> > - Programming the PASID into the device's PASID table
> > - Tracking device domain and the PASID relationship
> > - Managing IOTLB and device TLB invalidations
> > 
> > This patch add attach_dev_pasid functions to the default domain ops
> > which is used by DMA and identity domain types. It could be extended to
> > support other domain types whenever necessary.
> > 
> > Signed-off-by: Lu Baolu 
> > Signed-off-by: Jacob Pan 
> >  drivers/iommu/intel/iommu.c | 72 +++--
> >  1 file changed, 70 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 1c2c92b657c7..75615c105fdf 100644
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1556,12 +1556,18 @@ static void __iommu_flush_dev_iotlb(struct
> > device_domain_info *info, u64 addr, unsigned int mask)
> >  {
> > u16 sid, qdep;
> > +   ioasid_t pasid;
> >  
> > if (!info || !info->ats_enabled)
> > return;
> >  
> > sid = info->bus << 8 | info->devfn;
> > qdep = info->ats_qdep;
> > +   pasid = iommu_get_pasid_from_domain(info->dev,
> > &info->domain->domain);
> 
> No, a simple domain can be attached to multiple pasids, all need to
> be flushed.
> 
This is the device TLB flush; why would I want to flush PASIDs other than
the ones attached to my own device?

At one level up, we do have a list of devices to be flushed:
	list_for_each_entry(info, &domain->devices, link)
		__iommu_flush_dev_iotlb(info, addr, mask);


Note that RID2PASID is not in the pasid_array; its devTLB flush also needs
special handling in that the device is doing DMA without PASID and is thus
not aware of RID2PASID.
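
For illustration, the flush could end up shaped something like this (a
sketch, not the posted patch; for_each_attached_pasid() is a placeholder
for however the attached PASIDs get enumerated):

/*
 * Sketch: per-device devTLB flush.  The plain qi_flush_dev_iotlb() covers
 * DMA without PASID (what the hardware walks via RID2PASID), and each
 * additionally attached PASID gets its own qi_flush_dev_iotlb_pasid().
 */
static void demo_flush_dev_iotlb(struct device_domain_info *info,
				 u64 addr, unsigned int mask)
{
	u16 sid = info->bus << 8 | info->devfn;
	u16 qdep = info->ats_qdep;
	ioasid_t pasid;

	if (!info->ats_enabled)
		return;

	/* DMA requests without PASID (RID2PASID in the PASID table) */
	qi_flush_dev_iotlb(info->iommu, sid, info->pfsid, qdep, addr, mask);

	/* hypothetical iterator over the PASIDs attached for this device */
	for_each_attached_pasid(info->dev, pasid)
		qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
					 pasid, qdep, addr, mask);
}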


> This whole API isn't suitable.
> 
> Jason


Thanks,

Jacob


Re: [PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-24 Thread Jacob Pan
Hi Jason,

On Tue, 24 May 2022 10:50:34 -0300, Jason Gunthorpe  wrote:

> On Wed, May 18, 2022 at 11:21:15AM -0700, Jacob Pan wrote:
> > DMA requests tagged with PASID can target individual IOMMU domains.
> > Introduce a domain-wide PASID for DMA API, it will be used on the same
> > mapping as legacy DMA without PASID. Let it be IOVA or PA in case of
> > identity domain.  
> 
> Huh? I can't understand what this is trying to say or why this patch
> makes sense.
> 
> We really should not have pasid's like this attached to the domains..
> 
This is the same "DMA API global PASID" you reviewed in v3, I just
singled it out as a standalone patch and renamed it. Here is your previous
review comment.

> +++ b/include/linux/iommu.h
> @@ -105,6 +105,8 @@ struct iommu_domain {
>   enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
> void *data);
>   void *fault_data;
> + ioasid_t pasid; /* Used for DMA requests with PASID */
> + atomic_t pasid_users;  

These are poorly named, this is really the DMA API global PASID and
shouldn't be used for other things.



Perhaps I misunderstood, do you mind explaining more?


Thanks,

Jacob


Re: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain

2022-05-23 Thread Jacob Pan
Hi Kevin,

On Mon, 23 May 2022 09:14:04 +, "Tian, Kevin" 
wrote:

> > From: Tian, Kevin
> > Sent: Monday, May 23, 2022 3:55 PM
> >   
> > > From: Jacob Pan 
> > > +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct
> > > iommu_domain *domain)
> > > +{
> > > + struct iommu_domain *tdomain;
> > > + struct iommu_group *group;
> > > + unsigned long index;
> > > + ioasid_t pasid = INVALID_IOASID;
> > > +
> > > + group = iommu_group_get(dev);
> > > + if (!group)
> > > + return pasid;
> > > +
> > > + xa_for_each(&group->pasid_array, index, tdomain) {
> > > + if (domain == tdomain) {
> > > + pasid = index;
> > > + break;
> > > + }
> > > + }  
> > 
> > Don't we need to acquire the group lock here?
> > 
The pasid_array is accessed under the RCU read lock, so it is protected,
though it may contain stale data. It is also used in atomic context for TLB
flush, so we cannot take the group mutex. If the caller does
detach_dev_pasid while a TLB flush is in progress, that could result in an
extra flush, but it is harmless.

> > Btw the intention of this function is a bit confusing. Patch01 already
> > stores the pasid under domain hence it's redundant to get it
> > indirectly from xarray index. You could simply introduce a flag bit
> > (e.g. dma_pasid_enabled) in device_domain_info and then directly
> > use domain->dma_pasid once the flag is true.
> >   
> 
> Just saw your discussion with Jason about v3. While it makes sense
> to not specialize DMA domain in iommu driver, the use of this function
> should only be that when the call chain doesn't pass down a pasid
> value e.g. when doing cache invalidation for domain map/unmap. If
> the upper interface already carries a pasid e.g. in detach_dev_pasid()
> iommu driver can simply verify that the corresponding pasid xarray 
> entry points to the specified domain instead of using this function to
> loop xarray and then verify the returned pasid (as done in patch03/04).
Excellent point. I could just use xa_load(pasid) to compare the domain
instead of looping through the xa.
I will add another helper:

bool iommu_is_pasid_domain_attached(struct device *dev, struct iommu_domain *domain,
				    ioasid_t pasid)
{
	struct iommu_group *group;
	bool ret = false;

	group = iommu_group_get(dev);
	if (WARN_ON(!group))
		return false;

	if (domain == xa_load(&group->pasid_array, pasid))
		ret = true;

	iommu_group_put(group);

	return ret;
}
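
As a usage sketch (hypothetical caller, not part of the series), a detach
path could then do:

static int demo_detach_dev_pasid(struct iommu_domain *domain,
				 struct device *dev, ioasid_t pasid)
{
	/* verify the pasid really maps to this domain before tear-down */
	if (!iommu_is_pasid_domain_attached(dev, domain, pasid))
		return -EINVAL;

	/* ... tear down the PASID table entry and flush caches ... */
	return 0;
}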

Thanks,

Jacob


Re: [PATCH v4 4/6] iommu: Add PASID support for DMA mapping API users

2022-05-23 Thread Jacob Pan
Hi Kevin,

On Mon, 23 May 2022 08:25:33 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan 
> > Sent: Thursday, May 19, 2022 2:21 AM
> > 
> > DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per device/RID basis which is not PASID-aware.
> > 
> > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > required for certain work submissions. To allow such devices use DMA
> > mapping API, we need the following functionalities:
> > 1. Provide device a way to retrieve a PASID for work submission within
> > the kernel
> > 2. Enable the kernel PASID on the IOMMU for the device
> > 3. Attach the kernel PASID to the device's default DMA domain, let it
> > be IOVA or physical address in case of pass-through.
> > 
> > This patch introduces a driver facing API that enables DMA API
> > PASID usage. Once enabled, device drivers can continue to use DMA APIs
> > as is. There is no difference in dma_handle between without PASID and
> > with PASID.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/dma-iommu.c | 114
> > ++
> >  include/linux/dma-iommu.h |   3 +
> >  2 files changed, 117 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 1ca85d37eeab..6ad7ba619ef0 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -34,6 +34,8 @@ struct iommu_dma_msi_page {
> > phys_addr_t phys;
> >  };
> > 
> > +static DECLARE_IOASID_SET(iommu_dma_pasid);
> > +
> >  enum iommu_dma_cookie_type {
> > IOMMU_DMA_IOVA_COOKIE,
> > IOMMU_DMA_MSI_COOKIE,
> > @@ -370,6 +372,118 @@ void iommu_put_dma_cookie(struct
> > iommu_domain *domain)
> > domain->iova_cookie = NULL;
> >  }
> > 
> > +/* Protect iommu_domain DMA PASID data */
> > +static DEFINE_MUTEX(dma_pasid_lock);
> > +/**
> > + * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the
> > device's
> > + * DMA domain.
> > + * @dev: Device to be enabled
> > + * @pasid: The returned kernel PASID to be used for DMA
> > + *
> > + * DMA request with PASID will be mapped the same way as the legacy
> > DMA.
> > + * If the device is in pass-through, PASID will also pass-through. If
> > the
> > + * device is in IOVA, the PASID will point to the same IOVA page table.
> > + *
> > + * @return err code or 0 on success
> > + */
> > +int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid)  
> 
> iommu_attach_dma_domain_pasid? 'dma_pasid' is too broad from
> a API p.o.v.
> 
I agree that dma_pasid is too broad; technically it is a DMA API pasid, but
that name seems too long.
My concern with dma_domain_pasid is that the pasid can also be used for the
identity domain.

> > +{
> > +   struct iommu_domain *dom;
> > +   ioasid_t id, max;
> > +   int ret = 0;
> > +
> > +   dom = iommu_get_domain_for_dev(dev);
> > +   if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> > +   return -ENODEV;
> > +
> > +   /* Only support domain types that DMA API can be used */
> > +   if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
> > +   dom->type == IOMMU_DOMAIN_BLOCKED) {
> > +   dev_warn(dev, "Invalid domain type %d", dom->type);
> > +   return -EPERM;
> > +   }  
> 
> WARN_ON.
> 
> and probably we can just check whether domain is default domain here.
> 
good point, I will just use
struct iommu_domain *def_domain = iommu_get_dma_domain(dev);

> > +
> > +   mutex_lock(&dma_pasid_lock);
> > +   id = dom->dma_pasid;
> > +   if (!id) {
> > +   /*
> > +* First device to use PASID in its DMA domain,
> > allocate
> > +* a single PASID per DMA domain is all we need, it is
> > also
> > +* good for performance when it comes down to IOTLB
> > flush.
> > +*/
> > +   max = 1U << dev->iommu->pasid_bits;
> > +   if (!max) {
> > +   ret = -EINVAL;
> > +   goto done_unlock;
> > +   }
> > +
> > +   id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
> > +   if (id == INVALID_IOASID) {
> > +   ret = -ENOMEM;
> > +   goto done_unlock;
> > +   }
> > +
> > +   dom->dma_pasid = id;
> > +   atomic_set(&dom->dma_pasid_users, 1);
> 
> this is always a

Re: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain

2022-05-20 Thread Jacob Pan
Hi Christoph,

On Wed, 18 May 2022 23:48:44 -0700, Christoph Hellwig 
wrote:

> On Wed, May 18, 2022 at 11:21:16AM -0700, Jacob Pan wrote:
> > +ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct
> > iommu_domain *domain)  
> 
> Overly long line here.
will fix,

Thanks,

Jacob


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-19 Thread Jacob Pan
Hi Jason,

On Wed, 18 May 2022 15:52:05 -0300, Jason Gunthorpe  wrote:

> On Wed, May 18, 2022 at 11:42:04AM -0700, Jacob Pan wrote:
> 
> > > Yes.. It seems inefficient to iterate over that xarray multiple times
> > > on the flush hot path, but maybe there is little choice. Try to use
> > > use the xas iterators under the xa_lock spinlock..
> > >   
> > xas_for_each takes a max range, here we don't really have one. So I
> > posted v4 w/o using the xas advanced API. Please let me know if you have
> > suggestions.  
> 
> You are supposed to use ULONG_MAX in cases like that.
> 
got it.
> > xa_for_each takes RCU read lock, it should be fast for tlb flush,
> > right? The worst case maybe over flush when we have stale data but
> > should be very rare.  
> 
> Not really, xa_for_each walks the tree for every iteration, it is
> slower than a linked list walk in any cases where the xarray is
> multi-node. xas_for_each is able to retain a pointer where it is in
> the tree so each iteration is usually just a pointer increment.
> 
Thanks for explaining, yeah if we have to iterate multiple times
xas_for_each() is better.

> The downside is you cannot sleep while doing xas_for_each
> 
will do under RCU read lock

> > > The challenge will be accessing the group xa in the first place, but
> > > maybe the core code can gain a function call to return a pointer to
> > > that XA or something..  
>  
> > I added a helper function to find the matching DMA API PASID in v4.  
> 
> Again, why are we focused on DMA API? Nothing you build here should be
> DMA API beyond the fact that the iommu_domain being attached is the
> default domain.
The helper is not DMA API specific. Just a domain-PASID look up. Sorry for
the confusion.

Thanks,

Jacob


Re: [PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain

2022-05-19 Thread Jacob Pan
Hi Baolu,

On Thu, 19 May 2022 14:41:06 +0800, Baolu Lu 
wrote:

> > IOMMU group maintains a PASID array which stores the associated IOMMU
> > domains. This patch introduces a helper function to do domain to PASID
> > look up. It will be used by TLB flush and device-PASID attach
> > verification.  
> 
> Do you really need this?
> 
> The IOMMU driver has maintained a device tracking list for each domain.
> It has been used for cache invalidation when unmap() is called against
> dma domain.
Yes, I am aware of the device list. In v3, I stored the DMA API PASID in the
device list of device_domain_info. Since we already have a pasid_array,
Jason suggested sharing that storage instead of duplicating it. This helper
is needed to reverse-look-up the DMA PASID based on the attached domain.
Discussion here:
https://lore.kernel.org/lkml/20220511170025.gf49...@nvidia.com/t/#mf7cb7d54d89e6e732a020dc22435260da0a49580

Thanks,

Jacob


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-18 Thread Jacob Pan
Hi Jason,

On Wed, 11 May 2022 15:29:08 -0300, Jason Gunthorpe  wrote:

> On Wed, May 11, 2022 at 10:25:21AM -0700, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Wed, 11 May 2022 14:00:25 -0300, Jason Gunthorpe 
> > wrote: 
> > > On Wed, May 11, 2022 at 10:02:16AM -0700, Jacob Pan wrote:  
> > > > > > If not global, perhaps we could have a list of pasids (e.g.
> > > > > > xarray) attached to the device_domain_info. The TLB flush logic
> > > > > > would just go through the list w/o caring what the PASIDs are
> > > > > > for. Does it make sense to you?  
> > > > > 
> > > > > Sort of, but we shouldn't duplicate xarrays - the group already
> > > > > has this xarray - need to find some way to allow access to it
> > > > > from the driver.
> > > > > 
> > > > I am not following,  here are the PASIDs for devTLB flush which is
> > > > per device. Why group?
> > > 
> > > Because group is where the core code stores it.  
> > I see, with singleton group. I guess I can let dma-iommu code call
> > 
> > iommu_attach_dma_pasid {
> > iommu_attach_device_pasid();
> > Then the PASID will be stored in the group xa.  
> 
> Yes, again, the dma-iommu should not be any different from the normal
> unmanaged path. At this point there is no longer any difference, we
> should not invent new ones.
> 
> > The flush code can retrieve PASIDs from device_domain_info.device ->
> > group -> pasid_array.  Thanks for pointing it out, I missed the new
> > pasid_array.  
> 
> Yes.. It seems inefficient to iterate over that xarray multiple times
> on the flush hot path, but maybe there is little choice. Try to use
> use the xas iterators under the xa_lock spinlock..
> 
xas_for_each() takes a max range, and here we don't really have one, so I
posted v4 without using the advanced xas API. Please let me know if you have
suggestions.
xa_for_each() takes the RCU read lock, so it should be fast enough for TLB
flush, right? The worst case may be an over-flush when we have stale data,
but that should be very rare.

> The challenge will be accessing the group xa in the first place, but
> maybe the core code can gain a function call to return a pointer to
> that XA or something..
> 
I added a helper function to find the matching DMA API PASID in v4.


Thanks,

Jacob


[PATCH v4 6/6] iommu/vt-d: Delete unused SVM flag

2022-05-18 Thread Jacob Pan
Supervisor PASID for SVA/SVM is no longer supported, delete the unused
flag.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c |  2 +-
 include/linux/intel-svm.h | 13 -
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 44331db060e4..5b220d464218 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -750,7 +750,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 * to unbind the mm while any page faults are 
outstanding.
 */
svm = pasid_private_find(req->pasid);
-   if (IS_ERR_OR_NULL(svm) || (svm->flags & 
SVM_FLAG_SUPERVISOR_MODE))
+   if (IS_ERR_OR_NULL(svm))
goto bad_req;
}
 
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index b3b125b332aa..6835a665c195 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -13,17 +13,4 @@
 #define PRQ_RING_MASK  ((0x1000 << PRQ_ORDER) - 0x20)
 #define PRQ_DEPTH  ((0x1000 << PRQ_ORDER) >> 5)
 
-/*
- * The SVM_FLAG_SUPERVISOR_MODE flag requests a PASID which can be used only
- * for access to kernel addresses. No IOTLB flushes are automatically done
- * for kernel mappings; it is valid only for access to the kernel's static
- * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
- * A future API addition may permit the use of such ranges, by means of an
- * explicit IOTLB flush call (akin to the DMA API's unmap method).
- *
- * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
- * do such IOTLB flushes automatically.
- */
-#define SVM_FLAG_SUPERVISOR_MODE   BIT(0)
-
 #endif /* __INTEL_SVM_H__ */
-- 
2.25.1


[PATCH v4 2/6] iommu: Add a helper to do PASID lookup from domain

2022-05-18 Thread Jacob Pan
The IOMMU group maintains a PASID array which stores the associated IOMMU
domains. This patch introduces a helper function to do a domain-to-PASID
lookup. It will be used by TLB flush and device-PASID attach verification.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/iommu.c | 22 ++
 include/linux/iommu.h |  6 +-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 00d0262a1fe9..22f44833db64 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3199,3 +3199,25 @@ struct iommu_domain *iommu_get_domain_for_iopf(struct 
device *dev,
 
return domain;
 }
+
+ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct iommu_domain 
*domain)
+{
+   struct iommu_domain *tdomain;
+   struct iommu_group *group;
+   unsigned long index;
+   ioasid_t pasid = INVALID_IOASID;
+
+   group = iommu_group_get(dev);
+   if (!group)
+   return pasid;
+
+   xa_for_each(&group->pasid_array, index, tdomain) {
+   if (domain == tdomain) {
+   pasid = index;
+   break;
+   }
+   }
+   iommu_group_put(group);
+
+   return pasid;
+}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 36ad007084cc..c0440a4be699 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -694,7 +694,7 @@ void iommu_detach_device_pasid(struct iommu_domain *domain,
   struct device *dev, ioasid_t pasid);
 struct iommu_domain *
 iommu_get_domain_for_iopf(struct device *dev, ioasid_t pasid);
-
+ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct iommu_domain 
*domain);
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -1070,6 +1070,10 @@ iommu_get_domain_for_iopf(struct device *dev, ioasid_t 
pasid)
 {
return NULL;
 }
+static ioasid_t iommu_get_pasid_from_domain(struct device *dev, struct 
iommu_domain *domain)
+{
+   return INVALID_IOASID;
+}
 #endif /* CONFIG_IOMMU_API */
 
 #ifdef CONFIG_IOMMU_SVA
-- 
2.25.1



[PATCH v4 5/6] dmaengine: idxd: Use DMA API for in-kernel DMA with PASID

2022-05-18 Thread Jacob Pan
The current in-kernel supervisor PASID support is based on the SVM/SVA
machinery in SVA lib. The binding between a kernel PASID and kernel
mapping has many flaws. See discussions in the link below.

This patch enables in-kernel DMA by switching from SVA lib to the
standard DMA mapping APIs. Since both DMA requests with and without
PASIDs are mapped identically, there is no change to how DMA APIs are
used after the kernel PASID is enabled.

Link: https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/
Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/idxd.h  |  1 -
 drivers/dma/idxd/init.c  | 34 +-
 drivers/dma/idxd/sysfs.c |  7 ---
 3 files changed, 9 insertions(+), 33 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index ccbefd0be617..190b08bd7c08 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -277,7 +277,6 @@ struct idxd_device {
struct idxd_wq **wqs;
struct idxd_engine **engines;
 
-   struct iommu_sva *sva;
unsigned int pasid;
 
int num_groups;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index e1b5d1e4a949..e2e1c0eae6d6 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "../dmaengine.h"
@@ -466,36 +467,22 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_data *data)
 
 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-   int flags;
-   unsigned int pasid;
-   struct iommu_sva *sva;
+   u32 pasid;
+   int ret;
 
-   flags = SVM_FLAG_SUPERVISOR_MODE;
-
-   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
-   if (IS_ERR(sva)) {
-   dev_warn(&idxd->pdev->dev,
-            "iommu sva bind failed: %ld\n", PTR_ERR(sva));
-   return PTR_ERR(sva);
-   }
-
-   pasid = iommu_sva_get_pasid(sva);
-   if (pasid == IOMMU_PASID_INVALID) {
-   iommu_sva_unbind_device(sva);
-   return -ENODEV;
+   ret = iommu_attach_dma_pasid(&idxd->pdev->dev, &pasid);
+   if (ret) {
+   dev_err(&idxd->pdev->dev, "No DMA PASID %d\n", ret);
+   return ret;
}
-
-   idxd->sva = sva;
idxd->pasid = pasid;
-   dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
+
return 0;
 }
 
 static void idxd_disable_system_pasid(struct idxd_device *idxd)
 {
-
-   iommu_sva_unbind_device(idxd->sva);
-   idxd->sva = NULL;
+   iommu_detach_dma_pasid(&idxd->pdev->dev);
 }
 
 static int idxd_probe(struct idxd_device *idxd)
@@ -527,10 +514,7 @@ static int idxd_probe(struct idxd_device *idxd)
else
set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
}
-   } else if (!sva) {
-   dev_warn(dev, "User forced SVA off via module param.\n");
}
-
idxd_read_caps(idxd);
idxd_read_table_offsets(idxd);
 
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index dfd549685c46..a48928973bd4 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -839,13 +839,6 @@ static ssize_t wq_name_store(struct device *dev,
if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
return -EINVAL;
 
-   /*
-* This is temporarily placed here until we have SVM support for
-* dmaengine.
-*/
-   if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
-   return -EOPNOTSUPP;
-
memset(wq->name, 0, WQ_NAME_SIZE + 1);
strncpy(wq->name, buf, WQ_NAME_SIZE);
strreplace(wq->name, '\n', '\0');
-- 
2.25.1



[PATCH v4 4/6] iommu: Add PASID support for DMA mapping API users

2022-05-18 Thread Jacob Pan
DMA mapping API is the de facto standard for in-kernel DMA. It operates
on a per device/RID basis which is not PASID-aware.

For some modern devices, such as the Intel Data Streaming Accelerator, a
PASID is required for certain work submissions. To allow such devices to use
the DMA mapping API, we need the following functionality:
1. Provide the device a way to retrieve a PASID for work submission within
the kernel
2. Enable the kernel PASID on the IOMMU for the device
3. Attach the kernel PASID to the device's default DMA domain, be it IOVA
or physical address in the pass-through case.

This patch introduces a driver-facing API that enables DMA API PASID usage.
Once enabled, device drivers can continue to use the DMA APIs as is. There
is no difference in dma_handle between DMA without PASID and with PASID.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/dma-iommu.c | 114 ++
 include/linux/dma-iommu.h |   3 +
 2 files changed, 117 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 1ca85d37eeab..6ad7ba619ef0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -34,6 +34,8 @@ struct iommu_dma_msi_page {
phys_addr_t phys;
 };
 
+static DECLARE_IOASID_SET(iommu_dma_pasid);
+
 enum iommu_dma_cookie_type {
IOMMU_DMA_IOVA_COOKIE,
IOMMU_DMA_MSI_COOKIE,
@@ -370,6 +372,118 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
domain->iova_cookie = NULL;
 }
 
+/* Protect iommu_domain DMA PASID data */
+static DEFINE_MUTEX(dma_pasid_lock);
+/**
+ * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the device's
+ * DMA domain.
+ * @dev: Device to be enabled
+ * @pasid: The returned kernel PASID to be used for DMA
+ *
+ * DMA request with PASID will be mapped the same way as the legacy DMA.
+ * If the device is in pass-through, PASID will also pass-through. If the
+ * device is in IOVA, the PASID will point to the same IOVA page table.
+ *
+ * @return err code or 0 on success
+ */
+int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid)
+{
+   struct iommu_domain *dom;
+   ioasid_t id, max;
+   int ret = 0;
+
+   dom = iommu_get_domain_for_dev(dev);
+   if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
+   return -ENODEV;
+
+   /* Only support domain types that DMA API can be used */
+   if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
+   dom->type == IOMMU_DOMAIN_BLOCKED) {
+   dev_warn(dev, "Invalid domain type %d", dom->type);
+   return -EPERM;
+   }
+
+   mutex_lock(&dma_pasid_lock);
+   id = dom->dma_pasid;
+   if (!id) {
+   /*
+* First device to use PASID in its DMA domain, allocate
+* a single PASID per DMA domain is all we need, it is also
+* good for performance when it comes down to IOTLB flush.
+*/
+   max = 1U << dev->iommu->pasid_bits;
+   if (!max) {
+   ret = -EINVAL;
+   goto done_unlock;
+   }
+
+   id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
+   if (id == INVALID_IOASID) {
+   ret = -ENOMEM;
+   goto done_unlock;
+   }
+
+   dom->dma_pasid = id;
+   atomic_set(&dom->dma_pasid_users, 1);
+   }
+
+   ret = iommu_attach_device_pasid(dom, dev, id);
+   if (!ret) {
+   *pasid = id;
+   atomic_inc(&dom->dma_pasid_users);
+   goto done_unlock;
+   }
+
+   if (atomic_dec_and_test(&dom->dma_pasid_users)) {
+   ioasid_free(id);
+   dom->dma_pasid = 0;
+   }
+done_unlock:
+   mutex_unlock(&dma_pasid_lock);
+   return ret;
+}
+EXPORT_SYMBOL(iommu_attach_dma_pasid);
+
+/**
+ * iommu_detach_dma_pasid --Disable in-kernel DMA request with PASID
+ * @dev:   Device's PASID DMA to be disabled
+ *
+ * It is the device driver's responsibility to ensure no more incoming DMA
+ * requests with the kernel PASID before calling this function. IOMMU driver
+ * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and
+ * drained.
+ *
+ */
+void iommu_detach_dma_pasid(struct device *dev)
+{
+   struct iommu_domain *dom;
+   ioasid_t pasid;
+
+   dom = iommu_get_domain_for_dev(dev);
+   if (WARN_ON(!dom || !dom->ops || !dom->ops->detach_dev_pasid))
+   return;
+
+   /* Only support DMA API managed domain type */
+   if (WARN_ON(dom->type == IOMMU_DOMAIN_UNMANAGED ||
+   dom->type == IOMMU_DOMAIN_BLOCKED))
+   return;
+
+   mutex_lock(_pasid_lock);
+   pasid = iommu_get_pasid_from_domain(dev, dom);
+   if (!pasid || pasid == INVALID_IOASID) {
+   dev_err(dev, "No valid DMA PASI

[PATCH v4 0/6] Enable PASID for DMA API users

2022-05-18 Thread Jacob Pan
Some modern accelerators such as Intel's Data Streaming Accelerator (DSA)
require PASID in DMA requests to be operational. Specifically, the work
submissions with ENQCMD on shared work queues require PASIDs. The use cases
include both user DMA with shared virtual addressing (SVA) and in-kernel
DMA similar to legacy DMA w/o PASID. Here we address the latter.

The DMA mapping API is the de facto standard for in-kernel DMA. However, it
operates on a per-device or Requester ID (RID) basis and is not PASID-aware.
To let devices that rely on PASIDs leverage the DMA API, this patchset
introduces the following APIs:

1. A driver facing API that enables DMA API PASID usage:
iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid);
2. VT-d driver default domain op that allows attaching device-domain-PASID

Once the PASID is enabled for DMA and attached to the appropriate IOMMU
domain, device drivers can continue to use the DMA APIs as-is. There is no
difference in the dma_handle between mappings without PASID and with PASID.
The DMA mapping performed by the IOMMU will be identical for both kinds of
requests, be it IOVA or PA in the pass-through case.
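
For illustration, a minimal driver-side sketch of the intended flow (error
handling trimmed; how the PASID is carried in the device's work descriptors
is device specific and only hinted at here):

#include <linux/dma-iommu.h>
#include <linux/dma-mapping.h>

static int demo_setup_kernel_dma(struct device *dev, void *buf, size_t len,
				 dma_addr_t *handle, ioasid_t *pasid)
{
	int ret;

	/* 1. get a kernel PASID attached to the device's default domain */
	ret = iommu_attach_dma_pasid(dev, pasid);
	if (ret)
		return ret;

	/* 2. map as usual; the dma_handle is the same with or without PASID */
	*handle = dma_map_single(dev, buf, len, DMA_BIDIRECTIONAL);
	if (dma_mapping_error(dev, *handle)) {
		iommu_detach_dma_pasid(dev);
		return -ENOMEM;
	}

	/*
	 * 3. the driver programs *pasid into its own work descriptors
	 *    (device specific, e.g. the PASID used with ENQCMD).
	 */
	return 0;
}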

In addition, this set converts DSA driver in-kernel DMA with PASID from SVA
lib to DMA API. There have been security and functional issues with the
kernel SVA approach:
(https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
The highlights are as the following:
 - The lack of IOTLB synchronization upon kernel page table updates.
   (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
 - Other than slightly more protection, using kernel virtual addresses (KVA)
has little advantage over physical addresses. There are also no use cases yet
where DMA engines need kernel virtual addresses for in-kernel DMA.

Subsequently, cleanup is done around the usage of sva_bind_device() for
in-kernel DMA. Removing special casing code in VT-d driver and tightening
SVA lib API.

This work and the idea behind it are a collaboration with many people; many
thanks to Baolu Lu, Jason Gunthorpe, Dave Jiang, and others.


ChangeLog:
v4
- Rebased on "Baolu's SVA and IOPF refactoring" series v6.

(https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v6)
- Fixed locking for protecting iommu domain PASID data
- Use iommu_attach_device_pasid() API instead of calling domain
  ops directly. This will leverage the common pasid_array that
  replaces driver specific storage in device_domain_info.
- Added a helper function to do look up in pasid_array from
  domain

v3
- Rebased on "Baolu's SVA and IOPF refactoring" series v5.

(https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v5)
This version is significantly simplified by leveraging IOMMU domain
ops, attach_dev_pasid() op is implemented differently on a DMA domain
than on a SVA domain.
We currently have no need to support multiple PASIDs per DMA domain.
(https://lore.kernel.org/lkml/20220315142216.gv11...@nvidia.com/).
Removed PASID-device list from V2, a PASID field is introduced to
struct iommu_domain instead. It is intended for DMA requests with
PASID by all devices attached to the domain.

v2
- Do not reserve a special PASID for DMA API usage. Use IOASID
  allocation instead.
- Introduced a generic device-pasid-domain attachment IOMMU op.
  Replaced the DMA API only IOMMU op.
- Removed supervisor SVA support in VT-d
- Removed unused sva_bind_device parameters
- Use IOMMU specific data instead of struct device to store PASID
  info





Jacob Pan (6):
  iommu: Add a per domain PASID for DMA API
  iommu: Add a helper to do PASID lookup from domain
  iommu/vt-d: Implement domain ops for attach_dev_pasid
  iommu: Add PASID support for DMA mapping API users
  dmaengine: idxd: Use DMA API for in-kernel DMA with PASID
  iommu/vt-d: Delete unused SVM flag

 drivers/dma/idxd/idxd.h |   1 -
 drivers/dma/idxd/init.c |  34 +++
 drivers/dma/idxd/sysfs.c|   7 ---
 drivers/iommu/dma-iommu.c   | 114 
 drivers/iommu/intel/iommu.c |  72 ++-
 drivers/iommu/intel/svm.c   |   2 +-
 drivers/iommu/iommu.c   |  22 +++
 include/linux/dma-iommu.h   |   3 +
 include/linux/intel-svm.h   |  13 
 include/linux/iommu.h   |   8 ++-
 10 files changed, 226 insertions(+), 50 deletions(-)

-- 
2.25.1



[PATCH v4 1/6] iommu: Add a per domain PASID for DMA API

2022-05-18 Thread Jacob Pan
DMA requests tagged with PASID can target individual IOMMU domains.
Introduce a domain-wide PASID for DMA API, it will be used on the same
mapping as legacy DMA without PASID. Let it be IOVA or PA in case of
identity domain.

Signed-off-by: Jacob Pan 
---
 include/linux/iommu.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9405034e3013..36ad007084cc 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -106,6 +106,8 @@ struct iommu_domain {
enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
  void *data);
void *fault_data;
+   ioasid_t dma_pasid; /* Used for DMA requests with PASID */
+   atomic_t dma_pasid_users;
 };
 
 static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
-- 
2.25.1



[PATCH v4 3/6] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-18 Thread Jacob Pan
On VT-d platforms with scalable mode enabled, devices that issue DMA
requests with PASID need to attach those PASIDs to the given IOMMU domains.
The attach operation involves the following:
- Programming the PASID into the device's PASID table
- Tracking the device-domain and PASID relationship
- Managing IOTLB and device TLB invalidations

This patch adds attach_dev_pasid functions to the default domain ops, which
are used by the DMA and identity domain types. It could be extended to
support other domain types whenever necessary.

Signed-off-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 72 +++--
 1 file changed, 70 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 1c2c92b657c7..75615c105fdf 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1556,12 +1556,18 @@ static void __iommu_flush_dev_iotlb(struct 
device_domain_info *info,
u64 addr, unsigned int mask)
 {
u16 sid, qdep;
+   ioasid_t pasid;
 
if (!info || !info->ats_enabled)
return;
 
sid = info->bus << 8 | info->devfn;
qdep = info->ats_qdep;
+   pasid = iommu_get_pasid_from_domain(info->dev, &info->domain->domain);
+   if (pasid != INVALID_IOASID) {
+   qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
+pasid, qdep, addr, mask);
+   }
qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
   qdep, addr, mask);
 }
@@ -1591,6 +1597,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu 
*iommu,
unsigned int mask = ilog2(aligned_pages);
uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
u16 did = domain->iommu_did[iommu->seq_id];
+   struct iommu_domain *iommu_domain = &domain->domain;
 
BUG_ON(pages == 0);
 
@@ -1599,6 +1606,9 @@ static void iommu_flush_iotlb_psi(struct intel_iommu 
*iommu,
 
if (domain_use_first_level(domain)) {
qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, pages, ih);
+   /* flush additional kernel DMA PASIDs attached */
+   if (iommu_domain->dma_pasid)
+   qi_flush_piotlb(iommu, did, iommu_domain->dma_pasid, 
addr, pages, ih);
} else {
unsigned long bitmask = aligned_pages - 1;
 
@@ -4255,6 +4265,7 @@ static void __dmar_remove_one_dev_info(struct 
device_domain_info *info)
struct dmar_domain *domain;
struct intel_iommu *iommu;
unsigned long flags;
+   ioasid_t pasid;
 
assert_spin_locked(&device_domain_lock);
 
@@ -4265,10 +4276,15 @@ static void __dmar_remove_one_dev_info(struct 
device_domain_info *info)
domain = info->domain;
 
if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
-   if (dev_is_pci(info->dev) && sm_supported(iommu))
+   if (dev_is_pci(info->dev) && sm_supported(iommu)) {
intel_pasid_tear_down_entry(iommu, info->dev,
PASID_RID2PASID, false);
-
+   pasid = iommu_get_pasid_from_domain(info->dev,
+   &info->domain->domain);
+   if (pasid != INVALID_IOASID)
+   intel_pasid_tear_down_entry(iommu, info->dev,
+   pasid, false);
+   }
iommu_disable_dev_iotlb(info);
domain_context_clear(info);
intel_pasid_free_table(info->dev);
@@ -4904,6 +4920,56 @@ static void intel_iommu_iotlb_sync_map(struct 
iommu_domain *domain,
}
 }
 
+static int intel_iommu_attach_dev_pasid(struct iommu_domain *domain,
+   struct device *dev,
+   ioasid_t pasid)
+{
+   struct device_domain_info *info = dev_iommu_priv_get(dev);
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct intel_iommu *iommu = info->iommu;
+   unsigned long flags;
+   int ret = 0;
+
+   if (!sm_supported(iommu) || !info)
+   return -ENODEV;
+
+   if (WARN_ON(pasid == PASID_RID2PASID))
+   return -EINVAL;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   spin_lock(&iommu->lock);
+   if (hw_pass_through && domain_type_is_si(dmar_domain))
+   ret = intel_pasid_setup_pass_through(iommu, dmar_domain,
+dev, pasid);
+   else if (domain_use_first_level(dmar_domain))
+   ret = domain_setup_first_level(iommu, dmar_domain,
+  dev, pasid);
+   else
+   ret = intel_pasid_s

Re: [PATCH 6/7] x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMU

2022-05-16 Thread Jacob Pan
Hi Jason,

On Mon, 16 May 2022 15:06:28 -0300, Jason Gunthorpe  wrote:

> Unrelated, but when we are in the special secure IOMMU modes, do we
> force ATS off? Specifically does the IOMMU reject TLPs that are marked
> as translated?
Yes, the VT-d context entry has a Device-TLB Enable bit; if it is 0, it
means "Translation Requests (with or without PASID) and Translated Requests
received and processed through this scalable-mode context-entry are
blocked."

Thanks,

Jacob


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jacob Pan
Hi Jason,

On Wed, 11 May 2022 14:00:25 -0300, Jason Gunthorpe  wrote:

> On Wed, May 11, 2022 at 10:02:16AM -0700, Jacob Pan wrote:
> > > > If not global, perhaps we could have a list of pasids (e.g. xarray)
> > > > attached to the device_domain_info. The TLB flush logic would just
> > > > go through the list w/o caring what the PASIDs are for. Does it
> > > > make sense to you?
> > > 
> > > Sort of, but we shouldn't duplicate xarrays - the group already has
> > > this xarray - need to find some way to allow access to it from the
> > > driver.
> > >   
> > I am not following,  here are the PASIDs for devTLB flush which is per
> > device. Why group?  
> 
> Because group is where the core code stores it.
I see, with a singleton group. I guess I can let the dma-iommu code call:

	iommu_attach_dma_pasid() {
		...
		iommu_attach_device_pasid();
	}

Then the PASID will be stored in the group xa.
The flush code can retrieve PASIDs via device_domain_info.device -> group
-> pasid_array. Thanks for pointing it out, I missed the new pasid_array.
> 
> > We could retrieve PASIDs from the device PASID table but xa would be
> > more efficient.
> >   
> > > > > > Are you suggesting the dma-iommu API should be called
> > > > > > iommu_set_dma_pasid instead of iommu_attach_dma_pasid?  
> > > > > 
> > > > > No that API is Ok - the driver ops API should be 'set' not
> > > > > attach/detach   
> > > > Sounds good, this operation has little in common with
> > > > domain_ops.dev_attach_pasid() used by SVA domain. So I will add a
> > > > new domain_ops.dev_set_pasid()
> > > 
> > > What? No, their should only be one operation, 'dev_set_pasid' and it
> > > is exactly the same as the SVA operation. It configures things so that
> > > any existing translation on the PASID is removed and the PASID
> > > translates according to the given domain.
> > > 
> > > SVA given domain or UNMANAGED given domain doesn't matter to the
> > > higher level code. The driver should implement per-domain ops as
> > > required to get the different behaviors.  
> > Perhaps some code to clarify, we have
> > sva_domain_ops.dev_attach_pasid() = intel_svm_attach_dev_pasid;
> > default_domain_ops.dev_attach_pasid() = intel_iommu_attach_dev_pasid;  
> 
> Yes, keep that structure
>  
> > Consolidate pasid programming into dev_set_pasid() then called by both
> > intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right?
> >  
> 
> I was only suggesting that really dev_attach_pasid() op is misnamed,
> it should be called set_dev_pasid() and act like a set, not a paired
> attach/detach - same as the non-PASID ops.
> 
Got it. Perhaps another patch to rename, Baolu?
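
For reference, a sketch of the renamed op inside the domain ops -- same
signature as attach_dev_pasid in this series, just with "set" semantics:

/*
 * Sketch: same signature, "set" semantics -- replaces whatever translation
 * the PASID had before with the given domain.
 */
int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
		     ioasid_t pasid);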


Thanks,

Jacob


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jacob Pan
Hi Jason,

On Wed, 11 May 2022 13:12:37 -0300, Jason Gunthorpe  wrote:

> On Wed, May 11, 2022 at 08:35:18AM -0700, Jacob Pan wrote:
> 
> > > Huh? The intel driver shares the same ops between UNMANAGED and DMA -
> > > and in general I do not think we should be putting special knowledge
> > > about the DMA domains in the drivers. Drivers should continue to treat
> > > them identically to UNMANAGED.
> > >   
> > OK, other than SVA domain, the rest domain types share the same default
> > ops. I agree that the default ops should be the same for UNMANAGED,
> > IDENTITY, and DMA domain types. Minor detail is that we need to treat
> > IDENTITY domain slightly different when it comes down to PASID entry
> > programming.  
> 
> I would be happy if IDENTITY had its own ops, if that makes sense
> 
I have tried giving it its own ops, but there are complications around
checking whether a domain has ops. It would be a logical thing to clean up
next.

> > If not global, perhaps we could have a list of pasids (e.g. xarray)
> > attached to the device_domain_info. The TLB flush logic would just go
> > through the list w/o caring what the PASIDs are for. Does it make sense
> > to you?  
> 
> Sort of, but we shouldn't duplicate xarrays - the group already has
> this xarray - need to find some way to allow access to it from the
> driver.
> 
I am not following; these are the PASIDs for the devTLB flush, which is per
device. Why the group?
We could retrieve PASIDs from the device's PASID table, but an xa would be
more efficient.

> > > > Are you suggesting the dma-iommu API should be called
> > > > iommu_set_dma_pasid instead of iommu_attach_dma_pasid?
> > > 
> > > No that API is Ok - the driver ops API should be 'set' not
> > > attach/detach 
> > Sounds good, this operation has little in common with
> > domain_ops.dev_attach_pasid() used by SVA domain. So I will add a new
> > domain_ops.dev_set_pasid()  
> 
> What? No, their should only be one operation, 'dev_set_pasid' and it
> is exactly the same as the SVA operation. It configures things so that
> any existing translation on the PASID is removed and the PASID
> translates according to the given domain.
> 
> SVA given domain or UNMANAGED given domain doesn't matter to the
> higher level code. The driver should implement per-domain ops as
> required to get the different behaviors.
Perhaps some code to clarify, we have
sva_domain_ops.dev_attach_pasid() = intel_svm_attach_dev_pasid;
default_domain_ops.dev_attach_pasid() = intel_iommu_attach_dev_pasid;

Consolidate pasid programming into dev_set_pasid() then called by both
intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right?


Thanks,

Jacob


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jacob Pan
Hi Jason,

On Wed, 11 May 2022 08:54:27 -0300, Jason Gunthorpe  wrote:

> On Tue, May 10, 2022 at 05:23:09PM -0700, Jacob Pan wrote:
> 
> > > > diff --git a/include/linux/intel-iommu.h
> > > > b/include/linux/intel-iommu.h index 5af24befc9f1..55845a8c4f4d
> > > > 100644 +++ b/include/linux/intel-iommu.h
> > > > @@ -627,6 +627,7 @@ struct device_domain_info {
> > > > struct intel_iommu *iommu; /* IOMMU used by this device */
> > > > struct dmar_domain *domain; /* pointer to domain */
> > > > struct pasid_table *pasid_table; /* pasid table */
> > > > +   ioasid_t pasid; /* DMA request with PASID */
> > > 
> > > And this seems wrong - the DMA API is not the only user of
> > > attach_dev_pasid, so there should not be any global pasid for the
> > > device.
> > >   
> > True but the attach_dev_pasid() op is domain type specific. i.e. DMA API
> > has its own attach_dev_pasid which is different than sva domain
> > attach_dev_pasid().  
> 
> Huh? The intel driver shares the same ops between UNMANAGED and DMA -
> and in general I do not think we should be putting special knowledge
> about the DMA domains in the drivers. Drivers should continue to treat
> them identically to UNMANAGED.
> 
OK, other than the SVA domain, the rest of the domain types share the same
default ops. I agree that the default ops should be the same for the
UNMANAGED, IDENTITY, and DMA domain types. A minor detail is that we need to
treat the IDENTITY domain slightly differently when it comes down to PASID
entry programming.

If not global, perhaps we could have a list of pasids (e.g. xarray) attached
to the device_domain_info. The TLB flush logic would just go through the
list w/o caring what the PASIDs are for. Does it make sense to you?

> > device_domain_info is only used by DMA API.  
> 
> Huh?
My mistake, I meant that device_domain_info.pasid is only used by the DMA API.

>  
> > > I suspect this should be a counter of # of pasid domains attached so
> > > that the special flush logic triggers
> > >   
> > This field is only used for devTLB, so it is per domain-device. struct
> > device_domain_info is allocated per device-domain as well. Sorry, I
> > might have totally missed your point.  
> 
> You can't store a single pasid in the driver like this, since the only
> thing it does is trigger the flush logic just count how many pasids
> are used by the device-domain and trigger pasid flush if any pasids
> are attached
> 
Got it, will put the pasids in an xa as described above.

> > > And rely on the core code to worry about assigning only one domain per
> > > pasid - this should really be a 'set' function.  
> >
> > Yes, in this set the core code (in dma-iommu.c) only assign one PASID
> > per DMA domain type.
> > 
> > Are you suggesting the dma-iommu API should be called
> > iommu_set_dma_pasid instead of iommu_attach_dma_pasid?  
> 
> No that API is Ok - the driver ops API should be 'set' not attach/detach
> 
Sounds good, this operation has little in common with
domain_ops.dev_attach_pasid() used by SVA domain. So I will add a new
domain_ops.dev_set_pasid()


Thanks,

Jacob


Re: [PATCH v3 2/4] iommu: Add PASID support for DMA mapping API users

2022-05-10 Thread Jacob Pan
Hi Jason,

On Tue, 10 May 2022 20:28:04 -0300, Jason Gunthorpe  wrote:

> On Tue, May 10, 2022 at 02:07:02PM -0700, Jacob Pan wrote:
> > DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per device/RID basis which is not PASID-aware.
> > 
> > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > required for certain work submissions. To allow such devices use DMA
> > mapping API, we need the following functionalities:
> > 1. Provide device a way to retrieve a PASID for work submission within
> > the kernel
> > 2. Enable the kernel PASID on the IOMMU for the device
> > 3. Attach the kernel PASID to the device's default DMA domain, let it
> > be IOVA or physical address in case of pass-through.
> > 
> > This patch introduces a driver facing API that enables DMA API
> > PASID usage. Once enabled, device drivers can continue to use DMA APIs
> > as is. There is no difference in dma_handle between without PASID and
> > with PASID.
> > 
> > Signed-off-by: Jacob Pan 
> >  drivers/iommu/dma-iommu.c | 107 ++
> >  include/linux/dma-iommu.h |   3 ++
> >  include/linux/iommu.h |   2 +
> >  3 files changed, 112 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 1ca85d37eeab..5984f3129fa2 100644
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -34,6 +34,8 @@ struct iommu_dma_msi_page {
> > phys_addr_t phys;
> >  };
> >  
> > +static DECLARE_IOASID_SET(iommu_dma_pasid);
> > +
> >  enum iommu_dma_cookie_type {
> > IOMMU_DMA_IOVA_COOKIE,
> > IOMMU_DMA_MSI_COOKIE,
> > @@ -370,6 +372,111 @@ void iommu_put_dma_cookie(struct iommu_domain
> > *domain) domain->iova_cookie = NULL;
> >  }
> >  
> > +/**
> > + * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the
> > device's
> > + * DMA domain.
> > + * @dev: Device to be enabled
> > + * @pasid: The returned kernel PASID to be used for DMA
> > + *
> > + * DMA request with PASID will be mapped the same way as the legacy
> > DMA.
> > + * If the device is in pass-through, PASID will also pass-through. If
> > the
> > + * device is in IOVA, the PASID will point to the same IOVA page table.
> > + *
> > + * @return err code or 0 on success
> > + */
> > +int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid)
> > +{
> > +   struct iommu_domain *dom;
> > +   ioasid_t id, max;
> > +   int ret = 0;
> > +
> > +   dom = iommu_get_domain_for_dev(dev);
> > +   if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> > +   return -ENODEV;
> > +
> > +   /* Only support domain types that DMA API can be used */
> > +   if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
> > +   dom->type == IOMMU_DOMAIN_BLOCKED) {
> > +   dev_warn(dev, "Invalid domain type %d", dom->type);  
> 
> This should be a WARN_ON
> 
will do, thanks

> > +   return -EPERM;
> > +   }
> > +
> > +   id = dom->pasid;
> > +   if (!id) {
> > +   /*
> > +* First device to use PASID in its DMA domain,
> > allocate
> > +* a single PASID per DMA domain is all we need, it is
> > also
> > +* good for performance when it comes down to IOTLB
> > flush.
> > +*/
> > +   max = 1U << dev->iommu->pasid_bits;
> > +   if (!max)
> > +   return -EINVAL;
> > +
> > +   id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
> > +   if (id == INVALID_IOASID)
> > +   return -ENOMEM;
> > +
> > +   dom->pasid = id;
> > +   atomic_set(&dom->pasid_users, 1);
> 
> All of this needs proper locking.
> 
Good catch, I will add a mutex for domain updates; the detach path will take it as well.
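
Roughly what I have in mind (sketch only: a file-local mutex in dma-iommu.c,
with the detach path taking the same lock; the per-domain user count is
trimmed here and can become a plain counter under the mutex):

static DEFINE_MUTEX(iommu_dma_pasid_mutex);

int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid)
{
        struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
        int ret = 0;

        if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
                return -ENODEV;

        mutex_lock(&iommu_dma_pasid_mutex);
        if (!dom->pasid) {
                /* first user of this DMA domain allocates the shared PASID */
                ioasid_t id = ioasid_alloc(&iommu_dma_pasid, 1,
                                           1U << dev->iommu->pasid_bits, dev);

                if (id == INVALID_IOASID) {
                        ret = -ENOMEM;
                        goto out_unlock;
                }
                dom->pasid = id;
        }

        ret = dom->ops->attach_dev_pasid(dom, dev, dom->pasid);
        if (!ret)
                *pasid = dom->pasid;
out_unlock:
        mutex_unlock(&iommu_dma_pasid_mutex);
        return ret;
}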

> > +   }
> > +
> > +   ret = dom->ops->attach_dev_pasid(dom, dev, id);
> > +   if (!ret) {
> > +   *pasid = id;
> > +   atomic_inc(&dom->pasid_users);
> > +   return 0;
> > +   }
> > +
> > +   if (atomic_dec_and_test(&dom->pasid_users)) {
> > +   ioasid_free(id);
> > +   dom->pasid = 0;
> > +   }
> > +
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL(iommu_attach_dma_pasid);
> > +
> > +/**
> > + * iommu_detach_dma_pasid --Disable in-kernel DMA request with PASID
> > + * @dev:   Device's PASID DMA to be disa

Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-10 Thread Jacob Pan
Hi Jason,

On Tue, 10 May 2022 20:21:21 -0300, Jason Gunthorpe  wrote:

> On Tue, May 10, 2022 at 02:07:01PM -0700, Jacob Pan wrote:
> > +static int intel_iommu_attach_dev_pasid(struct iommu_domain *domain,
> > +   struct device *dev,
> > +   ioasid_t pasid)
> > +{
> > +   struct device_domain_info *info = dev_iommu_priv_get(dev);
> > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +   struct intel_iommu *iommu = info->iommu;
> > +   unsigned long flags;
> > +   int ret = 0;
> > +
> > +   if (!sm_supported(iommu) || !info)
> > +   return -ENODEV;
> > +
> > +   spin_lock_irqsave(&device_domain_lock, flags);
> > +   /*
> > +* If the same device already has a PASID attached, just
> > return.
> > +* DMA layer will return the PASID value to the caller.
> > +*/
> > +   if (pasid != PASID_RID2PASID && info->pasid) {  
> 
> Why check for PASID == 0 like this? Shouldn't pasid == 0 be rejected
> as an invalid argument?
Right, I was planning on reusing the attach function for RIDPASID as a
cleanup, but didn't include it here. Will fix.

> 
> > +   if (info->pasid == pasid)
> > +   ret = 0;  
> 
> Doesn't this need to check that the current domain is the requested
> domain as well? How can this happen anyhow - isn't it an error to
> double attach?
> 
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 5af24befc9f1..55845a8c4f4d 100644
> > +++ b/include/linux/intel-iommu.h
> > @@ -627,6 +627,7 @@ struct device_domain_info {
> > struct intel_iommu *iommu; /* IOMMU used by this device */
> > struct dmar_domain *domain; /* pointer to domain */
> > struct pasid_table *pasid_table; /* pasid table */
> > +   ioasid_t pasid; /* DMA request with PASID */  
> 
> And this seems wrong - the DMA API is not the only user of
> attach_dev_pasid, so there should not be any global pasid for the
> device.
> 
True, but the attach_dev_pasid() op is domain-type specific, i.e. the DMA
API has its own attach_dev_pasid() which is different from the SVA domain's
attach_dev_pasid().
device_domain_info is only used by DMA API.

> I suspect this should be a counter of # of pasid domains attached so
> that the special flush logic triggers
> 
This field is only used for devTLB, so it is per domain-device. struct
device_domain_info is allocated per device-domain as well. Sorry, I might
have totally missed your point.

> And rely on the core code to worry about assigning only one domain per
> pasid - this should really be a 'set' function.
> 
Yes, in this set the core code (in dma-iommu.c) only assigns one PASID per
DMA domain type.

Are you suggesting the dma-iommu API should be called
iommu_set_dma_pasid instead of iommu_attach_dma_pasid?

Thanks a lot for the quick review!

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 3/4] dmaengine: idxd: Use DMA API for in-kernel DMA with PASID

2022-05-10 Thread Jacob Pan
The current in-kernel supervisor PASID support is based on the SVM/SVA
machinery in SVA lib. The binding between a kernel PASID and kernel
mapping has many flaws. See discussions in the link below.

This patch enables in-kernel DMA by switching from SVA lib to the
standard DMA mapping APIs. Since both DMA requests with and without
PASIDs are mapped identically, there is no change to how DMA APIs are
used after the kernel PASID is enabled.

Link: https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/
Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/idxd.h  |  1 -
 drivers/dma/idxd/init.c  | 34 +-
 drivers/dma/idxd/sysfs.c |  7 ---
 3 files changed, 9 insertions(+), 33 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index ccbefd0be617..190b08bd7c08 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -277,7 +277,6 @@ struct idxd_device {
struct idxd_wq **wqs;
struct idxd_engine **engines;
 
-   struct iommu_sva *sva;
unsigned int pasid;
 
int num_groups;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index e1b5d1e4a949..e2e1c0eae6d6 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "../dmaengine.h"
@@ -466,36 +467,22 @@ static struct idxd_device *idxd_alloc(struct pci_dev 
*pdev, struct idxd_driver_d
 
 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-   int flags;
-   unsigned int pasid;
-   struct iommu_sva *sva;
+   u32 pasid;
+   int ret;
 
-   flags = SVM_FLAG_SUPERVISOR_MODE;
-
-   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
-   if (IS_ERR(sva)) {
-   dev_warn(&idxd->pdev->dev,
-"iommu sva bind failed: %ld\n", PTR_ERR(sva));
-   return PTR_ERR(sva);
-   }
-
-   pasid = iommu_sva_get_pasid(sva);
-   if (pasid == IOMMU_PASID_INVALID) {
-   iommu_sva_unbind_device(sva);
-   return -ENODEV;
+   ret = iommu_attach_dma_pasid(&idxd->pdev->dev, &pasid);
+   if (ret) {
+   dev_err(&idxd->pdev->dev, "No DMA PASID %d\n", ret);
+   return ret;
}
-
-   idxd->sva = sva;
idxd->pasid = pasid;
-   dev_dbg(>pdev->dev, "system pasid: %u\n", pasid);
+
return 0;
 }
 
 static void idxd_disable_system_pasid(struct idxd_device *idxd)
 {
-
-   iommu_sva_unbind_device(idxd->sva);
-   idxd->sva = NULL;
+   iommu_detach_dma_pasid(&idxd->pdev->dev);
 }
 
 static int idxd_probe(struct idxd_device *idxd)
@@ -527,10 +514,7 @@ static int idxd_probe(struct idxd_device *idxd)
else
set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
}
-   } else if (!sva) {
-   dev_warn(dev, "User forced SVA off via module param.\n");
}
-
idxd_read_caps(idxd);
idxd_read_table_offsets(idxd);
 
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index dfd549685c46..a48928973bd4 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -839,13 +839,6 @@ static ssize_t wq_name_store(struct device *dev,
if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
return -EINVAL;
 
-   /*
-* This is temporarily placed here until we have SVM support for
-* dmaengine.
-*/
-   if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
-   return -EOPNOTSUPP;
-
memset(wq->name, 0, WQ_NAME_SIZE + 1);
strncpy(wq->name, buf, WQ_NAME_SIZE);
strreplace(wq->name, '\n', '\0');
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 2/4] iommu: Add PASID support for DMA mapping API users

2022-05-10 Thread Jacob Pan
DMA mapping API is the de facto standard for in-kernel DMA. It operates
on a per device/RID basis which is not PASID-aware.

For some modern devices such as the Intel Data Streaming Accelerator, a
PASID is required for certain work submissions. To allow such devices to
use the DMA mapping API, we need the following functionalities:
1. Provide the device a way to retrieve a PASID for work submission within
the kernel
2. Enable the kernel PASID on the IOMMU for the device
3. Attach the kernel PASID to the device's default DMA domain, be it
IOVA or physical address in the case of pass-through.

This patch introduces a driver-facing API that enables DMA API
PASID usage. Once enabled, device drivers can continue to use the DMA APIs
as is. There is no difference in dma_handle between DMA without a PASID and
with a PASID.
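
For illustration, a driver-side usage sketch (the helper name below is made
up; the point is that the existing DMA mapping calls are untouched):

static int example_enable_pasid_dma(struct device *dev, void *buf,
                                    size_t size, dma_addr_t *dma)
{
        ioasid_t pasid;
        int ret;

        ret = iommu_attach_dma_pasid(dev, &pasid);
        if (ret)
                return ret;

        /* program @pasid into the device's work submission path here */

        *dma = dma_map_single(dev, buf, size, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, *dma)) {
                iommu_detach_dma_pasid(dev);
                return -ENOMEM;
        }

        return 0;
}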

Signed-off-by: Jacob Pan 
---
 drivers/iommu/dma-iommu.c | 107 ++
 include/linux/dma-iommu.h |   3 ++
 include/linux/iommu.h |   2 +
 3 files changed, 112 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 1ca85d37eeab..5984f3129fa2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -34,6 +34,8 @@ struct iommu_dma_msi_page {
phys_addr_t phys;
 };
 
+static DECLARE_IOASID_SET(iommu_dma_pasid);
+
 enum iommu_dma_cookie_type {
IOMMU_DMA_IOVA_COOKIE,
IOMMU_DMA_MSI_COOKIE,
@@ -370,6 +372,111 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
domain->iova_cookie = NULL;
 }
 
+/**
+ * iommu_attach_dma_pasid --Attach a PASID for in-kernel DMA. Use the device's
+ * DMA domain.
+ * @dev: Device to be enabled
+ * @pasid: The returned kernel PASID to be used for DMA
+ *
+ * DMA request with PASID will be mapped the same way as the legacy DMA.
+ * If the device is in pass-through, PASID will also pass-through. If the
+ * device is in IOVA, the PASID will point to the same IOVA page table.
+ *
+ * @return err code or 0 on success
+ */
+int iommu_attach_dma_pasid(struct device *dev, ioasid_t *pasid)
+{
+   struct iommu_domain *dom;
+   ioasid_t id, max;
+   int ret = 0;
+
+   dom = iommu_get_domain_for_dev(dev);
+   if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
+   return -ENODEV;
+
+   /* Only support domain types that DMA API can be used */
+   if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
+   dom->type == IOMMU_DOMAIN_BLOCKED) {
+   dev_warn(dev, "Invalid domain type %d", dom->type);
+   return -EPERM;
+   }
+
+   id = dom->pasid;
+   if (!id) {
+   /*
+* First device to use PASID in its DMA domain, allocate
+* a single PASID per DMA domain is all we need, it is also
+* good for performance when it comes down to IOTLB flush.
+*/
+   max = 1U << dev->iommu->pasid_bits;
+   if (!max)
+   return -EINVAL;
+
+   id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
+   if (id == INVALID_IOASID)
+   return -ENOMEM;
+
+   dom->pasid = id;
+   atomic_set(&dom->pasid_users, 1);
+   }
+
+   ret = dom->ops->attach_dev_pasid(dom, dev, id);
+   if (!ret) {
+   *pasid = id;
+   atomic_inc(&dom->pasid_users);
+   return 0;
+   }
+
+   if (atomic_dec_and_test(&dom->pasid_users)) {
+   ioasid_free(id);
+   dom->pasid = 0;
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL(iommu_attach_dma_pasid);
+
+/**
+ * iommu_detach_dma_pasid --Disable in-kernel DMA request with PASID
+ * @dev:   Device's PASID DMA to be disabled
+ *
+ * It is the device driver's responsibility to ensure no more incoming DMA
+ * requests with the kernel PASID before calling this function. IOMMU driver
+ * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and
+ * drained.
+ *
+ */
+void iommu_detach_dma_pasid(struct device *dev)
+{
+   struct iommu_domain *dom;
+   ioasid_t pasid;
+
+   dom = iommu_get_domain_for_dev(dev);
+   if (!dom || !dom->ops || !dom->ops->detach_dev_pasid) {
+   dev_warn(dev, "No ops for detaching PASID %u", pasid);
+   return;
+   }
+   /* Only support DMA API managed domain type */
+   if (dom->type == IOMMU_DOMAIN_UNMANAGED ||
+   dom->type == IOMMU_DOMAIN_BLOCKED) {
+   dev_err(dev, "Invalid domain type %d to detach DMA PASID %u\n",
+dom->type, pasid);
+   return;
+   }
+
+   pasid = dom->pasid;
+   if (!pasid) {
+   dev_err(dev, "No DMA PASID attached\n");
+   return;
+   }
+   dom->ops->detach_dev_pasid(dom, dev, pasid);
+   if (atomic_dec_and_test(&dom->pasid_u

[PATCH v3 4/4] iommu/vt-d: Delete unused SVM flag

2022-05-10 Thread Jacob Pan
Supervisor PASID for SVA/SVM is no longer supported, delete the unused
flag.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c |  2 +-
 include/linux/intel-svm.h | 13 -
 2 files changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 38c33cde177e..98ec77415770 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -750,7 +750,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 * to unbind the mm while any page faults are 
outstanding.
 */
svm = pasid_private_find(req->pasid);
-   if (IS_ERR_OR_NULL(svm) || (svm->flags & 
SVM_FLAG_SUPERVISOR_MODE))
+   if (IS_ERR_OR_NULL(svm))
goto bad_req;
}
 
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index b3b125b332aa..6835a665c195 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -13,17 +13,4 @@
 #define PRQ_RING_MASK  ((0x1000 << PRQ_ORDER) - 0x20)
 #define PRQ_DEPTH  ((0x1000 << PRQ_ORDER) >> 5)
 
-/*
- * The SVM_FLAG_SUPERVISOR_MODE flag requests a PASID which can be used only
- * for access to kernel addresses. No IOTLB flushes are automatically done
- * for kernel mappings; it is valid only for access to the kernel's static
- * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
- * A future API addition may permit the use of such ranges, by means of an
- * explicit IOTLB flush call (akin to the DMA API's unmap method).
- *
- * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
- * do such IOTLB flushes automatically.
- */
-#define SVM_FLAG_SUPERVISOR_MODE   BIT(0)
-
 #endif /* __INTEL_SVM_H__ */
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v3 0/4] Enable PASID for DMA API users

2022-05-10 Thread Jacob Pan
Some modern accelerators such as Intel's Data Streaming Accelerator (DSA)
require PASID in DMA requests to be operational. Specifically, the work
submissions with ENQCMD on shared work queues require PASIDs. The use cases
include both user DMA with shared virtual addressing (SVA) and in-kernel
DMA similar to legacy DMA w/o PASID. Here we address the latter.

DMA mapping API is the de facto standard for in-kernel DMA. However, it
operates on a per device or Requester ID(RID) basis which is not
PASID-aware. To leverage DMA API for devices relies on PASIDs, this
patchset introduces the following APIs

1. A driver-facing API that enables DMA API PASID usage:
iommu_enable_pasid_dma(struct device *dev, ioasid_t &pasid);

2. An IOMMU op that allows attaching device-domain-PASID generically (will
be used beyond DMA API PASID support)

Once PASID DMA is enabled and attached to the appropriate IOMMU domain,
device drivers can continue to use DMA APIs as-is. There is no difference
in the dma_handle mapping between requests with and without a PASID.
The DMA mapping performed by the IOMMU will be identical for both requests,
be it IOVA or PA in the case of pass-through.

In addition, this set converts DSA driver in-kernel DMA with PASID from SVA
lib to DMA API. There have been security and functional issues with the
kernel SVA approach:
(https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
The highlights are as follows:
 - The lack of IOTLB synchronization upon kernel page table updates.
   (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
 - Other than slightly more protection, using kernel virtual addresses (KVA)
has little advantage over physical addresses. There are also no use cases yet
where DMA engines need kernel virtual addresses for in-kernel DMA.

Subsequently, cleanup is done around the usage of sva_bind_device() for
in-kernel DMA: removing the special-casing code in the VT-d driver and
tightening the SVA lib API.

This work and idea behind it is a collaboration with many people, many
thanks to Baolu Lu, Jason Gunthorpe, Dave Jiang, and others.


ChangeLog:
v3
- Rebased on "Baolu's SVA and IOPF refactoring" series v5.

(https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v5)
This version is significantly simplified by leveraging IOMMU domain
ops; the attach_dev_pasid() op is implemented differently on a DMA domain
than on an SVA domain.
We currently have no need to support multiple PASIDs per DMA domain.
(https://lore.kernel.org/lkml/20220315142216.gv11...@nvidia.com/).
Removed the PASID-device list from v2; a PASID field is introduced in
struct iommu_domain instead. It is intended for DMA requests with
PASID issued by all devices attached to the domain.

v2
- Do not reserve a special PASID for DMA API usage. Use IOASID
  allocation instead.
- Introduced a generic device-pasid-domain attachment IOMMU op.
  Replaced the DMA API only IOMMU op.
- Removed supervisor SVA support in VT-d
- Removed unused sva_bind_device parameters
- Use IOMMU specific data instead of struct device to store PASID
      info


Jacob Pan (4):
  iommu/vt-d: Implement domain ops for attach_dev_pasid
  iommu: Add PASID support for DMA mapping API users
  dmaengine: idxd: Use DMA API for in-kernel DMA with PASID
  iommu/vt-d: Delete unused SVM flag

 drivers/dma/idxd/idxd.h |   1 -
 drivers/dma/idxd/init.c |  34 +++-
 drivers/dma/idxd/sysfs.c|   7 ---
 drivers/iommu/dma-iommu.c   | 107 
 drivers/iommu/intel/iommu.c |  81 ++-
 drivers/iommu/intel/svm.c   |   2 +-
 include/linux/dma-iommu.h   |   3 +
 include/linux/intel-iommu.h |   1 +
 include/linux/intel-svm.h   |  13 -
 include/linux/iommu.h   |   2 +
 10 files changed, 202 insertions(+), 49 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-10 Thread Jacob Pan
On VT-d platforms with scalable mode enabled, devices that issue DMA
requests with PASID need to attach those PASIDs to a given IOMMU domain.
The attach operation involves the following:
- Programming the PASID into the device's PASID table
- Tracking device domain and the PASID relationship
- Managing IOTLB and device TLB invalidations

This patch adds attach_dev_pasid functions to the default domain ops, which
are used by the DMA and identity domain types. It could be extended to
support other domain types whenever necessary.
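
The wiring into the default domain ops is roughly the following fragment
(illustrative only; it assumes a matching detach_dev_pasid counterpart):

        .default_domain_ops = &(const struct iommu_domain_ops) {
                .attach_dev_pasid       = intel_iommu_attach_dev_pasid,
                .detach_dev_pasid       = intel_iommu_detach_dev_pasid,
                /* existing .attach_dev, .map_pages, ... entries unchanged */
        },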

Signed-off-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 81 -
 include/linux/intel-iommu.h |  1 +
 2 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index a51b96fa9b3a..5408418f4f4b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1562,6 +1562,10 @@ static void __iommu_flush_dev_iotlb(struct 
device_domain_info *info,
 
sid = info->bus << 8 | info->devfn;
qdep = info->ats_qdep;
+   if (info->pasid) {
+   qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
+info->pasid, qdep, addr, mask);
+   }
qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
   qdep, addr, mask);
 }
@@ -1591,6 +1595,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu 
*iommu,
unsigned int mask = ilog2(aligned_pages);
uint64_t addr = (uint64_t)pfn << VTD_PAGE_SHIFT;
u16 did = domain->iommu_did[iommu->seq_id];
+   struct iommu_domain *iommu_domain = &domain->domain;
 
BUG_ON(pages == 0);
 
@@ -1599,6 +1604,9 @@ static void iommu_flush_iotlb_psi(struct intel_iommu 
*iommu,
 
if (domain_use_first_level(domain)) {
qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, pages, ih);
+   /* flush additional kernel DMA PASIDs attached */
+   if (iommu_domain->pasid)
+   qi_flush_piotlb(iommu, did, iommu_domain->pasid, addr, 
pages, ih);
} else {
unsigned long bitmask = aligned_pages - 1;
 
@@ -4265,10 +4273,13 @@ static void __dmar_remove_one_dev_info(struct 
device_domain_info *info)
domain = info->domain;
 
if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
-   if (dev_is_pci(info->dev) && sm_supported(iommu))
+   if (dev_is_pci(info->dev) && sm_supported(iommu)) {
intel_pasid_tear_down_entry(iommu, info->dev,
PASID_RID2PASID, false);
-
+   if (info->pasid)
+   intel_pasid_tear_down_entry(iommu, info->dev,
+   info->pasid, false);
+   }
iommu_disable_dev_iotlb(info);
domain_context_clear(info);
intel_pasid_free_table(info->dev);
@@ -4912,6 +4923,70 @@ static void intel_iommu_iotlb_sync_map(struct 
iommu_domain *domain,
}
 }
 
+static int intel_iommu_attach_dev_pasid(struct iommu_domain *domain,
+   struct device *dev,
+   ioasid_t pasid)
+{
+   struct device_domain_info *info = dev_iommu_priv_get(dev);
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   struct intel_iommu *iommu = info->iommu;
+   unsigned long flags;
+   int ret = 0;
+
+   if (!sm_supported(iommu) || !info)
+   return -ENODEV;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   /*
+* If the same device already has a PASID attached, just return.
+* DMA layer will return the PASID value to the caller.
+*/
+   if (pasid != PASID_RID2PASID && info->pasid) {
+   if (info->pasid == pasid)
+   ret = 0;
+   else {
+   dev_warn(dev, "Cannot attach PASID %u, %u already 
attached\n",
+pasid, info->pasid);
+   ret = -EBUSY;
+   }
+   goto out_unlock_domain;
+   }
+
+   spin_lock(&iommu->lock);
+   if (hw_pass_through && domain_type_is_si(dmar_domain))
+   ret = intel_pasid_setup_pass_through(iommu, dmar_domain,
+dev, pasid);
+   else if (domain_use_first_level(dmar_domain))
+   ret = domain_setup_first_level(iommu, dmar_domain,
+  dev, pasid);
+   else
+   ret = intel_pasid_setup_second_level(iommu, dmar_domain,
+dev, pasid);
+
+   spin_unlock(&iommu->lock);
+out_unlock_dom

Re: [PATCH 4/5] iommu/vt-d: Remove domain_update_iommu_snooping()

2022-05-02 Thread Jacob Pan
Hi BaoLu,

On Sun, 1 May 2022 19:24:33 +0800, Lu Baolu 
wrote:

> The IOMMU force snooping capability is not required to be consistent
> among all the IOMMUs anymore. Remove force snooping capability check
> in the IOMMU hot-add path and domain_update_iommu_snooping() becomes
> a dead code now.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/iommu.c | 34 +-
>  1 file changed, 1 insertion(+), 33 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 3c1c228f9031..d5808495eb64 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -533,33 +533,6 @@ static void domain_update_iommu_coherency(struct
> dmar_domain *domain) rcu_read_unlock();
>  }
>  
> -static bool domain_update_iommu_snooping(struct intel_iommu *skip)
> -{
> - struct dmar_drhd_unit *drhd;
> - struct intel_iommu *iommu;
> - bool ret = true;
> -
> - rcu_read_lock();
> - for_each_active_iommu(iommu, drhd) {
> - if (iommu != skip) {
> - /*
> -  * If the hardware is operating in the scalable
> mode,
> -  * the snooping control is always supported
> since we
> -  * always set PASID-table-entry.PGSNP bit if the
> domain
> -  * is managed outside (UNMANAGED).
> -  */
> - if (!sm_supported(iommu) &&
> - !ecap_sc_support(iommu->ecap)) {
> - ret = false;
> - break;
> - }
> - }
> - }
> - rcu_read_unlock();
> -
> - return ret;
> -}
> -
>  static int domain_update_iommu_superpage(struct dmar_domain *domain,
>struct intel_iommu *skip)
>  {
> @@ -3593,12 +3566,7 @@ static int intel_iommu_add(struct dmar_drhd_unit
> *dmaru) iommu->name);
>   return -ENXIO;
>   }
> - if (!ecap_sc_support(iommu->ecap) &&
> - domain_update_iommu_snooping(iommu)) {
> - pr_warn("%s: Doesn't support snooping.\n",
> - iommu->name);
> - return -ENXIO;
> - }
> +
Maybe I missed it in earlier patches, but can this bit also be deleted?

struct dmar_domain {
u8 iommu_snooping: 1;   /* indicate snooping control
feature */

>   sp = domain_update_iommu_superpage(NULL, iommu) - 1;
>   if (sp >= 0 && !(cap_super_page_val(iommu->cap) & (1 << sp))) {
>   pr_warn("%s: Doesn't support large page.\n",


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/5] iommu/vt-d: Check domain force_snooping against attached devices

2022-05-02 Thread Jacob Pan
Hi BaoLu,

On Sun, 1 May 2022 19:24:32 +0800, Lu Baolu 
wrote:

> As domain->force_snooping only impacts the devices attached with the
> domain, there's no need to check against all IOMMU units. At the same
> time, for a brand new domain (hasn't been attached to any device), the
> force_snooping field could be set, but the attach_dev callback will
> return failure if it wants to attach to a device which IOMMU has no
> snoop control capability.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/pasid.h |  2 ++
>  drivers/iommu/intel/iommu.c | 50 -
>  drivers/iommu/intel/pasid.c | 18 +
>  3 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
> index ab4408c824a5..583ea67fc783 100644
> --- a/drivers/iommu/intel/pasid.h
> +++ b/drivers/iommu/intel/pasid.h
> @@ -123,4 +123,6 @@ void intel_pasid_tear_down_entry(struct intel_iommu
> *iommu, bool fault_ignore);
>  int vcmd_alloc_pasid(struct intel_iommu *iommu, u32 *pasid);
>  void vcmd_free_pasid(struct intel_iommu *iommu, u32 pasid);
> +void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
> +   struct device *dev, u32 pasid);
>  #endif /* __INTEL_PASID_H */
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 98050943d863..3c1c228f9031 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4554,13 +4554,61 @@ static phys_addr_t
> intel_iommu_iova_to_phys(struct iommu_domain *domain, return phys;
>  }
>  
> +static bool domain_support_force_snooping(struct dmar_domain *domain)
> +{
> + struct device_domain_info *info;
> + unsigned long flags;
> + bool support = true;
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + if (list_empty(&domain->devices))
> + goto out;
> +
> + list_for_each_entry(info, &domain->devices, link) {
> + if (!ecap_sc_support(info->iommu->ecap)) {
> + support = false;
> + break;
> + }
> + }
Why not just check the dmar_domain->force_snooping flag? Devices wouldn't
be able to attach if !ecap_sc, right?

> +out:
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> + return support;
> +}
> +
> +static void domain_set_force_snooping(struct dmar_domain *domain)
> +{
> + struct device_domain_info *info;
> + unsigned long flags;
> +
> + /*
> +  * Second level page table supports per-PTE snoop control. The
> +  * iommu_map() interface will handle this by setting SNP bit.
> +  */
> + if (!domain_use_first_level(domain))
> + return;
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + if (list_empty(&domain->devices))
> + goto out_unlock;
> +
> + list_for_each_entry(info, &domain->devices, link)
> + intel_pasid_setup_page_snoop_control(info->iommu,
> info->dev,
> +  PASID_RID2PASID);
> +
I guess other DMA API PASIDs need to have sc bit set as well. I will keep
this in mind for my DMA API PASID patch.
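
Roughly, the loop above would grow something like this once the per-domain
DMA API PASID exists (fragment only; assumes the PASID lives in the embedded
struct iommu_domain as in my series):

        list_for_each_entry(info, &domain->devices, link) {
                intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
                                                     PASID_RID2PASID);
                /* also cover the kernel DMA PASID attached to this domain */
                if (domain->domain.pasid)
                        intel_pasid_setup_page_snoop_control(info->iommu,
                                                             info->dev,
                                                             domain->domain.pasid);
        }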

> +out_unlock:
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> +}
> +
>  static bool intel_iommu_enforce_cache_coherency(struct iommu_domain
> *domain) {
>   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>  
> - if (!domain_update_iommu_snooping(NULL))
> + if (!domain_support_force_snooping(dmar_domain))
>   return false;
> +
> + domain_set_force_snooping(dmar_domain);
>   dmar_domain->force_snooping = true;
> +
nit: spurious change
>   return true;
>  }
>  
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index f8d215d85695..815c744e6a34 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -762,3 +762,21 @@ int intel_pasid_setup_pass_through(struct
> intel_iommu *iommu, 
>   return 0;
>  }
> +
> +/*
> + * Set the page snoop control for a pasid entry which has been set up.
> + */
> +void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
> +   struct device *dev, u32 pasid)
> +{
> + struct pasid_entry *pte;
> + u16 did;
> +
> + pte = intel_pasid_get_entry(dev, pasid);
> + if (WARN_ON(!pte || !pasid_pte_is_present(pte)))
> + return;
> +
> + pasid_set_pgsnp(pte);
> + did = pasid_get_domain_id(pte);
> + pasid_flush_caches(iommu, pte, pasid, did);
> +}


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] Documentation: x86: rework IOMMU documentation

2022-04-26 Thread Jacob Pan
Hi Alex,

Thanks for doing this; it really helps to catch up with the current state.
Please see my comments inline.

On Fri, 22 Apr 2022 16:06:07 -0400, Alex Deucher
 wrote:

> Add preliminary documentation for AMD IOMMU and combine
> with the existing Intel IOMMU documentation and clean
> up and modernize some of the existing documentation to
> align with the current state of the kernel.
> 
> Signed-off-by: Alex Deucher 
> ---
> 
> V2: Incorporate feedback from Robin to clarify IOMMU vs DMA engine (e.g.,
> a device) and document proper DMA API.  Also correct the fact that
> the AMD IOMMU is not limited to managing PCI devices.
> v3: Fix spelling and rework text as suggested by Vasant
> v4: Combine Intel and AMD documents into a single document as suggested
> by Dave Hansen
> v5: Clarify that keywords are related to ACPI, grammatical fixes
> v6: Make more stuff common based on feedback from Robin
> 
>  Documentation/x86/index.rst   |   2 +-
>  Documentation/x86/intel-iommu.rst | 115 
>  Documentation/x86/iommu.rst   | 143 ++
>  3 files changed, 144 insertions(+), 116 deletions(-)
>  delete mode 100644 Documentation/x86/intel-iommu.rst
>  create mode 100644 Documentation/x86/iommu.rst
> 
> diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
> index f498f1d36cd3..6f8409fe0674 100644
> --- a/Documentation/x86/index.rst
> +++ b/Documentation/x86/index.rst
> @@ -21,7 +21,7 @@ x86-specific Documentation
> tlb
> mtrr
> pat
> -   intel-iommu
> +   iommu
> intel_txt
> amd-memory-encryption
> pti
> diff --git a/Documentation/x86/intel-iommu.rst
> b/Documentation/x86/intel-iommu.rst deleted file mode 100644
> index 099f13d51d5f..
> --- a/Documentation/x86/intel-iommu.rst
> +++ /dev/null
> @@ -1,115 +0,0 @@
> -===
> -Linux IOMMU Support
> -===
> -
> -The architecture spec can be obtained from the below location.
> -
> -http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
> -
> -This guide gives a quick cheat sheet for some basic understanding.
> -
> -Some Keywords
> -
> -- DMAR - DMA remapping
> -- DRHD - DMA Remapping Hardware Unit Definition
> -- RMRR - Reserved memory Region Reporting Structure
> -- ZLR  - Zero length reads from PCI devices
> -- IOVA - IO Virtual address.
> -
I feel this combined document only focuses on IOVA and the DMA APIs, which
are considered legacy DMA now that scalable mode has been introduced by
Intel to support DMA with PASID and shared virtual addressing (SVA).
Perhaps we can also combine ./Documentation/x86/sva.rst.

Scalable mode also affects boot messages, fault reporting, etc. I am not
saying no to this document, just suggesting. I don't know where AMD is at
in terms of PASID support, but there are lots of things in common between
VT-d and ARM's SMMU in terms of PASID/SVA. Should we broaden the purpose of
this document even further?

> -Basic stuff
> 
> -
> -ACPI enumerates and lists the different DMA engines in the platform, and
> -device scope relationships between PCI devices and which DMA engine
> controls -them.
> -
> -What is RMRR?
> --
> -
> -There are some devices the BIOS controls, for e.g USB devices to perform
> -PS2 emulation. The regions of memory used for these devices are marked
> -reserved in the e820 map. When we turn on DMA translation, DMA to those
> -regions will fail. Hence BIOS uses RMRR to specify these regions along
> with -devices that need to access these regions. OS is expected to setup
> -unity mappings for these regions for these devices to access these
> regions. -
> -How is IOVA generated?
> ---
> -
> -Well behaved drivers call pci_map_*() calls before sending command to
> device -that needs to perform DMA. Once DMA is completed and mapping is
> no longer -required, device performs a pci_unmap_*() calls to unmap the
> region. -
> -The Intel IOMMU driver allocates a virtual address per domain. Each PCIE
> -device has its own domain (hence protection). Devices under p2p bridges
> -share the virtual address with all devices under the p2p bridge due to
> -transaction id aliasing for p2p bridges.
> -
> -IOVA generation is pretty generic. We used the same technique as
> vmalloc() -but these are not global address spaces, but separate for each
> domain. -Different DMA engines may support different number of domains.
> -
> -We also allocate guard pages with each mapping, so we can attempt to
> catch -any overflow that might happen.
> -
> -
> -Graphics Problems?
> ---
> -If you encounter issues with graphics devices, you can try adding
> -option intel_iommu=igfx_off to turn off the integrated graphics engine.
> -If this fixes anything, please ensure you file a bug reporting the
> problem. -
> -Some exceptions to IOVA
> 
> -Interrupt ranges are not address translated, (0xfee0 - 0xfeef).

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Jacob Pan
Hi Jean-Philippe,

On Mon, 25 Apr 2022 17:13:02 +0100, Jean-Philippe Brucker
 wrote:

> Hi Jacob,
> 
> On Mon, Apr 25, 2022 at 08:34:44AM -0700, Jacob Pan wrote:
> > Hi Jean-Philippe,
> > 
> > On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker
> >  wrote:
> >   
> > > On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote:  
> > > > On 4/25/22 06:53, Jean-Philippe Brucker wrote:
> > > > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com
> > > > > wrote:
> > > > >>>> On 5.17
> > > > >>>> fops_release is called automatically, as well as
> > > > >>>> iommu_sva_unbind_device. On 5.18-rc1.
> > > > >>>> fops_release is not called, have to manually call close(fd)
> > > > >>> Right that's weird
> > > > >> Looks it is caused by the fix patch, via mmget, which may add
> > > > >> refcount of fd.
> > > > > Yes indirectly I think: when the process mmaps the queue,
> > > > > mmap_region() takes a reference to the uacce fd. That reference is
> > > > > released either by explicit close() or munmap(), or by exit_mmap()
> > > > > (which is triggered by mmput()). Since there is an mm->fd
> > > > > dependency, we cannot add a fd->mm dependency, so no
> > > > > mmget()/mmput() in bind()/unbind().
> > > > > 
> > > > > I guess we should go back to refcounted PASIDs instead, to avoid
> > > > > freeing them until unbind().
> > > > 
> > > > Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
> > > > nothing else simple we can do.
> > > > 
> > > > How does the IOMMU hardware know that all activity to a given PASID
> > > > is finished?  That activity should, today, be independent of an mm
> > > > or a fd's lifetime.
> > > 
> > > In the case of uacce, it's tied to the fd lifetime: opening an
> > > accelerator queue calls iommu_sva_bind_device(), which sets up the
> > > PASID context in the IOMMU. Closing the queue calls
> > > iommu_sva_unbind_device() which destroys the PASID context (after the
> > > device driver stopped all DMA for this PASID).
> > >   
> > For VT-d, it is essentially the same flow except managed by the
> > individual drivers such as DSA.
> > If free() happens before unbind(), we deactivate the PASIDs and suppress
> > faults from the device. When the unbind finally comes, we finalize the
> > PASID teardown. It seems we have a need for an intermediate state where
> > PASID is "pending free"?  
> 
> Yes we do have that state, though I'm not sure we need to make it explicit
> in the ioasid allocator.
> 
IMHO, making it explicit would let ioasid_get() fail on a "pending free"
PASID, making free a one-way trip and preventing further complications.
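
To make the idea concrete, something along these lines inside the allocator
(names are hypothetical, not existing ioasid code):

        enum ioasid_state {
                IOASID_STATE_ACTIVE,
                IOASID_STATE_FREE_PENDING,      /* free() called, refs held */
        };

        /* ioasid_get() would then refuse to hand out a dying PASID */
        static bool ioasid_tryget(struct ioasid_data *data)
        {
                if (data->state != IOASID_STATE_ACTIVE)
                        return false;
                refcount_inc(&data->refs);
                return true;
        }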

> Could we move mm_pasid_drop() to __mmdrop() instead of __mmput()?  For Arm
> we do need to hold the mm_count until unbind(), and mmgrab()/mmdrop() is
> also part of Lu's rework [1].
> 
Yes, I would agree. IIRC, Fenghua's early patch was doing pasid drop
in mmdrop. Maybe I missed something.

> Thanks,
> Jean
> 
> [1]
> https://lore.kernel.org/linux-iommu/20220421052121.3464100-9-baolu...@linux.intel.com/


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-25 Thread Jacob Pan
Hi Jean-Philippe,

On Mon, 25 Apr 2022 15:26:40 +0100, Jean-Philippe Brucker
 wrote:

> On Mon, Apr 25, 2022 at 07:18:36AM -0700, Dave Hansen wrote:
> > On 4/25/22 06:53, Jean-Philippe Brucker wrote:  
> > > On Sat, Apr 23, 2022 at 07:13:39PM +0800, zhangfei@foxmail.com
> > > wrote:  
> >  On 5.17
> >  fops_release is called automatically, as well as
> >  iommu_sva_unbind_device. On 5.18-rc1.
> >  fops_release is not called, have to manually call close(fd)  
> > >>> Right that's weird  
> > >> Looks it is caused by the fix patch, via mmget, which may add
> > >> refcount of fd.  
> > > Yes indirectly I think: when the process mmaps the queue,
> > > mmap_region() takes a reference to the uacce fd. That reference is
> > > released either by explicit close() or munmap(), or by exit_mmap()
> > > (which is triggered by mmput()). Since there is an mm->fd dependency,
> > > we cannot add a fd->mm dependency, so no mmget()/mmput() in
> > > bind()/unbind().
> > > 
> > > I guess we should go back to refcounted PASIDs instead, to avoid
> > > freeing them until unbind().  
> > 
> > Yeah, this is a bit gnarly for -rc4.  Let's just make sure there's
> > nothing else simple we can do.
> > 
> > How does the IOMMU hardware know that all activity to a given PASID is
> > finished?  That activity should, today, be independent of an mm or a
> > fd's lifetime.  
> 
> In the case of uacce, it's tied to the fd lifetime: opening an accelerator
> queue calls iommu_sva_bind_device(), which sets up the PASID context in
> the IOMMU. Closing the queue calls iommu_sva_unbind_device() which
> destroys the PASID context (after the device driver stopped all DMA for
> this PASID).
> 
For VT-d, it is essentially the same flow except managed by the individual
drivers such as DSA.
If free() happens before unbind(), we deactivate the PASIDs and suppress
faults from the device. When the unbind finally comes, we finalize the
PASID teardown. It seems we have a need for an intermediate state where
PASID is "pending free"?

> Thanks,
> Jean
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-18 Thread Jacob Pan
Hi zhangfei@foxmail.com,

On Sat, 16 Apr 2022 09:43:07 +0800, "zhangfei@foxmail.com"
 wrote:

> On 2022/4/16 上午5:00, Jacob Pan wrote:
> > Hi zhangfei@foxmail.com,
> >
> > On Fri, 15 Apr 2022 19:52:03 +0800, "zhangfei@foxmail.com"
> >  wrote:
> >  
> >>>>> A PASID might be still used even though it is freed on mm exit.
> >>>>>
> >>>>> process A:
> >>>>> sva_bind();
> >>>>> ioasid_alloc() = N; // Get PASID N for the mm
> >>>>> fork(): // spawn process B
> >>>>> exit();
> >>>>> ioasid_free(N);
> >>>>>
> >>>>> process B:
> >>>>> device uses PASID N -> failure
> >>>>> sva_unbind();
> >>>>>
> >>>>> Dave Hansen suggests to take a refcount on the mm whenever binding
> >>>>> the PASID to a device and drop the refcount on unbinding. The mm
> >>>>> won't be dropped if the PASID is still bound to it.
> >>>>>
> >>>>> Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID
> >>>>> allocation and free it on mm exit")
> >>>>>  
> > Is process A's mm intended to be used by process B? Or you really should
> > use PASID N on process B's mm? If the latter, it may work for a while
> > until B changes mapping.
> >
> > It seems you are just extending the life of a defunct mm?  
> 
>  From nginx code, the master process init resources, then fork daemon 
> process to take over,
> then master process exit by itself.
> 
> src/core/nginx.c
> main
> ngx_ssl_init(log);    -> openssl engine -> bind_fn -> sva_bind()
> ngx_daemon(cycle->log)
> 
> src/os/unix/ngx_daemon.c
> ngx_daemon(ngx_log_t *log)
> {
>   int  fd;
> 
>   switch (fork()) {
>   case -1:
>   ngx_log_error(NGX_LOG_EMERG, log, ngx_errno, "fork() failed");
>   return NGX_ERROR;
> 
>   case 0:
>      // the fork daemon process
>   break;
> 
Does this child process call sva_bind() again to get another PASID? Or
will it keep using the parent's PASID for DMA?

>   default:
>     // master process directly exit, and release mm as well as ioasid
>   exit(0);
>   }
> 
>    // only daemon process
> 
> Thanks
> 
> >
> > Thanks,
> >
> > Jacob  
> 


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-18 Thread Jacob Pan
Hi Kevin,

On Mon, 18 Apr 2022 06:34:19 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan 
> > Sent: Saturday, April 16, 2022 5:00 AM
> > 
> > Hi zhangfei@foxmail.com,
> > 
> > On Fri, 15 Apr 2022 19:52:03 +0800, "zhangfei@foxmail.com"
> >  wrote:
> >   
> > > >>> A PASID might be still used even though it is freed on mm exit.
> > > >>>
> > > >>> process A:
> > > >>>   sva_bind();
> > > >>>   ioasid_alloc() = N; // Get PASID N for the mm
> > > >>>   fork(): // spawn process B
> > > >>>   exit();
> > > >>>   ioasid_free(N);
> > > >>>
> > > >>> process B:
> > > >>>   device uses PASID N -> failure
> > > >>>   sva_unbind();
> > > >>>
> > > >>> Dave Hansen suggests to take a refcount on the mm whenever
> > > >>> binding  
> > the  
> > > >>> PASID to a device and drop the refcount on unbinding. The mm
> > > >>> won't  
> > be  
> > > >>> dropped if the PASID is still bound to it.
> > > >>>
> > > >>> Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID
> > > >>> allocation and free it on mm exit")
> > > >>>  
> > Is process A's mm intended to be used by process B? Or you really should
> > use PASID N on process B's mm? If the latter, it may work for a while
> > until B changes mapping.
> > 
> > It seems you are just extending the life of a defunct mm?
> >   
> 
> IMHO the intention is not to allow B to access A's mm.
> 
> The problem is that PASID N is released on exit() of A and then
> reallocated to B before iommu driver gets the chance to quiesce
> the device and clear the PASID entry. According to the discussion
> the quiesce operation must be done when driver calls unbind()
> instead of in mm exit. In this case a failure is reported when
> B tries to call bind() on PASID N due to an already-present entry.
> 
> Dave's patch extending the life of A's mm until unbind() is called.
> With it B either gets a different PASID before A's unbind() is 
> completed or same PASID N pointing to B's mm after A's unbind().
> 
As long as B gets a different PASID, that is fine. It seems PASID N has no
use then.

> Thanks
> Kevin


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-15 Thread Jacob Pan
Hi zhangfei@foxmail.com,

On Fri, 15 Apr 2022 19:52:03 +0800, "zhangfei@foxmail.com"
 wrote:

> >>> A PASID might be still used even though it is freed on mm exit.
> >>>
> >>> process A:
> >>>   sva_bind();
> >>>   ioasid_alloc() = N; // Get PASID N for the mm
> >>>   fork(): // spawn process B
> >>>   exit();
> >>>   ioasid_free(N);
> >>>
> >>> process B:
> >>>   device uses PASID N -> failure
> >>>   sva_unbind();
> >>>
> >>> Dave Hansen suggests to take a refcount on the mm whenever binding the
> >>> PASID to a device and drop the refcount on unbinding. The mm won't be
> >>> dropped if the PASID is still bound to it.
> >>>
> >>> Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID
> >>> allocation and free it on mm exit")
> >>>
Is process A's mm intended to be used by process B? Or should you really
use PASID N on process B's mm? If the latter, it may work for a while until
B changes its mapping.

It seems you are just extending the life of a defunct mm?

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC v2 08/11] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-03-31 Thread Jacob Pan
Hi Lu,

On Tue, 29 Mar 2022 13:37:57 +0800, Lu Baolu 
wrote:

> The existing iommu SVA interfaces are implemented by calling the SVA
> specific iommu ops provided by the IOMMU drivers. There's no need for
> any SVA specific ops in iommu_ops vector anymore as we can achieve
> this through the generic attach/detach_dev_pasid domain ops.
> 
> This refactors the IOMMU SVA interfaces implementation by using the
> attach/detach_pasid_dev ops and align them with the concept of the
> iommu domain. Put the new SVA code in the sva related file in order
> to make it self-contained.
> 
> Signed-off-by: Lu Baolu 
> ---
>  include/linux/iommu.h |  51 +---
>  drivers/iommu/iommu-sva-lib.c | 110 +-
>  drivers/iommu/iommu.c |  92 
>  3 files changed, 138 insertions(+), 115 deletions(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index a46285488a57..11c4d99e122d 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -629,7 +629,12 @@ struct iommu_fwspec {
>   * struct iommu_sva - handle to a device-mm bond
>   */
>  struct iommu_sva {
> - struct device   *dev;
> + struct device   *dev;
> + ioasid_tpasid;
> + struct iommu_domain *domain;
> + /* Link to sva domain's bonds list */
> + struct list_headnode;
> + refcount_t  users;
>  };
>  
>  int iommu_fwspec_init(struct device *dev, struct fwnode_handle
> *iommu_fwnode, @@ -672,12 +677,6 @@ int iommu_dev_enable_feature(struct
> device *dev, enum iommu_dev_features f); int
> iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);
> bool iommu_dev_feature_enabled(struct device *dev, enum
> iommu_dev_features f); -struct iommu_sva *iommu_sva_bind_device(struct
> device *dev,
> - struct mm_struct *mm,
> - void *drvdata);
> -void iommu_sva_unbind_device(struct iommu_sva *handle);
> -u32 iommu_sva_get_pasid(struct iommu_sva *handle);
> -
>  int iommu_device_use_default_domain(struct device *dev);
>  void iommu_device_unuse_default_domain(struct device *dev);
>  
> @@ -1018,21 +1017,6 @@ iommu_dev_disable_feature(struct device *dev, enum
> iommu_dev_features feat) return -ENODEV;
>  }
>  
> -static inline struct iommu_sva *
> -iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void
> *drvdata) -{
> - return NULL;
> -}
> -
> -static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
> -{
> -}
> -
> -static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> -{
> - return IOMMU_PASID_INVALID;
> -}
> -
>  static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device
> *dev) {
>   return NULL;
> @@ -1085,6 +1069,29 @@ iommu_put_domain_for_dev_pasid(struct iommu_domain
> *domain) }
>  #endif /* CONFIG_IOMMU_API */
>  
> +#ifdef CONFIG_IOMMU_SVA
> +struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> + struct mm_struct *mm,
> + void *drvdata);
> +void iommu_sva_unbind_device(struct iommu_sva *handle);
> +u32 iommu_sva_get_pasid(struct iommu_sva *handle);
> +#else /* CONFIG_IOMMU_SVA */
> +static inline struct iommu_sva *
> +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void
> *drvdata) +{
> + return NULL;
> +}
> +
> +static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
> +{
> +}
> +
> +static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> +{
> + return IOMMU_PASID_INVALID;
> +}
> +#endif /* CONFIG_IOMMU_SVA */
> +
>  /**
>   * iommu_map_sgtable - Map the given buffer to the IOMMU domain
>   * @domain:  The IOMMU domain to perform the mapping
> diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
> index 78820be23f15..1b45b7d01836 100644
> --- a/drivers/iommu/iommu-sva-lib.c
> +++ b/drivers/iommu/iommu-sva-lib.c
> @@ -17,6 +17,7 @@ struct iommu_sva_cookie {
>   struct mm_struct *mm;
>   ioasid_t pasid;
>   refcount_t users;
> + struct list_head bonds;
>  };
>  
>  /**
> @@ -101,6 +102,7 @@ iommu_sva_alloc_domain(struct device *dev, struct
> mm_struct *mm) cookie->mm = mm;
>   cookie->pasid = mm->pasid;
> + refcount_set(&cookie->users, 1);
> + INIT_LIST_HEAD(&cookie->bonds);
> + domain->type = IOMMU_DOMAIN_SVA;
> + domain->sva_cookie = cookie;
> + curr = xa_store(&sva_domain_array, mm->pasid, domain,
> GFP_KERNEL); @@ -118,6 +120,7 @@ iommu_sva_alloc_domain(struct device
> *dev, struct mm_struct *mm) static void iommu_sva_free_domain(struct
> iommu_domain *domain) {
> + xa_erase(&sva_domain_array, domain->sva_cookie->pasid);
> + WARN_ON(!list_empty(&domain->sva_cookie->bonds));
>   kfree(domain->sva_cookie);
>   domain->ops->free(domain);
>  }
> @@ -137,7 +140,7 @@ void iommu_sva_domain_put_user(struct iommu_domain
> *domain) iommu_sva_free_domain(domain);
>  

Re: [PATCH RFC v2 03/11] iommu/sva: Add iommu_domain type for SVA

2022-03-29 Thread Jacob Pan
Hi BaoLu,

On Tue, 29 Mar 2022 13:37:52 +0800, Lu Baolu 
wrote:

> Add a new iommu domain type IOMMU_DOMAIN_SVA to represent an I/O page
> table which is shared from CPU host VA. Add some helpers to get and
> put an SVA domain and implement SVA domain life cycle management.
> 
> Signed-off-by: Lu Baolu 
> ---
>  include/linux/iommu.h |  7 +++
>  drivers/iommu/iommu-sva-lib.h | 10 
>  drivers/iommu/iommu-sva-lib.c | 89 +++
>  3 files changed, 106 insertions(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 36f43af0af53..29c4c2edd706 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -42,6 +42,7 @@ struct notifier_block;
>  struct iommu_sva;
>  struct iommu_fault_event;
>  struct iommu_dma_cookie;
> +struct iommu_sva_cookie;
>  
>  /* iommu fault flags */
>  #define IOMMU_FAULT_READ 0x0
> @@ -64,6 +65,9 @@ struct iommu_domain_geometry {
>  #define __IOMMU_DOMAIN_PT(1U << 2)  /* Domain is identity mapped
>   */ #define __IOMMU_DOMAIN_DMA_FQ(1U << 3)  /* DMA-API uses
> flush queue*/ 
> +#define __IOMMU_DOMAIN_SHARED(1U << 4)  /* Page table shared
> from CPU  */ +#define __IOMMU_DOMAIN_HOST_VA  (1U << 5)  /* Host
> CPU virtual address */ +
>  /*
>   * This are the possible domain-types
>   *
> @@ -86,6 +90,8 @@ struct iommu_domain_geometry {
>  #define IOMMU_DOMAIN_DMA_FQ  (__IOMMU_DOMAIN_PAGING |\
>__IOMMU_DOMAIN_DMA_API |   \
>__IOMMU_DOMAIN_DMA_FQ)
> +#define IOMMU_DOMAIN_SVA (__IOMMU_DOMAIN_SHARED |\
> +  __IOMMU_DOMAIN_HOST_VA)
>  
>  struct iommu_domain {
>   unsigned type;
> @@ -95,6 +101,7 @@ struct iommu_domain {
>   void *handler_token;
>   struct iommu_domain_geometry geometry;
>   struct iommu_dma_cookie *iova_cookie;
> + struct iommu_sva_cookie *sva_cookie;
>  };
>  
>  static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
> diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
> index 8909ea1094e3..1a71218b07f5 100644
> --- a/drivers/iommu/iommu-sva-lib.h
> +++ b/drivers/iommu/iommu-sva-lib.h
> @@ -10,6 +10,7 @@
>  
>  int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t
> max); struct mm_struct *iommu_sva_find(ioasid_t pasid);
> +struct mm_struct *iommu_sva_domain_mm(struct iommu_domain *domain);
>  
>  /* I/O Page fault */
>  struct device;
> @@ -26,6 +27,8 @@ int iopf_queue_flush_dev(struct device *dev);
>  struct iopf_queue *iopf_queue_alloc(const char *name);
>  void iopf_queue_free(struct iopf_queue *queue);
>  int iopf_queue_discard_partial(struct iopf_queue *queue);
> +bool iommu_sva_domain_get_user(struct iommu_domain *domain);
> +void iommu_sva_domain_put_user(struct iommu_domain *domain);
>  
>  #else /* CONFIG_IOMMU_SVA */
>  static inline int iommu_queue_iopf(struct iommu_fault *fault, void
> *cookie) @@ -63,5 +66,12 @@ static inline int
> iopf_queue_discard_partial(struct iopf_queue *queue) {
>   return -ENODEV;
>  }
> +
> +static inline bool iommu_sva_domain_get_user(struct iommu_domain *domain)
> +{
> + return false;
> +}
> +
> +static inline void iommu_sva_domain_put_user(struct iommu_domain
> *domain) { } #endif /* CONFIG_IOMMU_SVA */
>  #endif /* _IOMMU_SVA_LIB_H */
> diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
> index 106506143896..78820be23f15 100644
> --- a/drivers/iommu/iommu-sva-lib.c
> +++ b/drivers/iommu/iommu-sva-lib.c
> @@ -3,12 +3,21 @@
>   * Helpers for IOMMU drivers implementing SVA
>   */
>  #include 
> +#include 
> +#include 
>  #include 
>  
>  #include "iommu-sva-lib.h"
>  
>  static DEFINE_MUTEX(iommu_sva_lock);
>  static DECLARE_IOASID_SET(iommu_sva_pasid);
> +static DEFINE_XARRAY_ALLOC(sva_domain_array);
> +
> +struct iommu_sva_cookie {
> + struct mm_struct *mm;
> + ioasid_t pasid;
> + refcount_t users;
> +};
>  
>  /**
>   * iommu_sva_alloc_pasid - Allocate a PASID for the mm
> @@ -69,3 +78,83 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid)
> + return ioasid_find(&iommu_sva_pasid, pasid, __mmget_not_zero);
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_find);
> +
> +static struct iommu_domain *
> +iommu_sva_alloc_domain(struct device *dev, struct mm_struct *mm)
> +{
> + struct bus_type *bus = dev->bus;
> + struct iommu_sva_cookie *cookie;
> + struct iommu_domain *domain;
> + void *curr;
> +
> + if (!bus || !bus->iommu_ops)
> + return NULL;
> +
> + cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
> + if (!cookie)
> + return NULL;
> +
> + domain = bus->iommu_ops->domain_alloc(IOMMU_DOMAIN_SVA);
> + if (!domain)
> + goto err_domain_alloc;
> +
> + cookie->mm = mm;
> + cookie->pasid = mm->pasid;
How do you manage the mm life cycle? Do you require the caller to take an
mm reference? Or should this be limited to the current mm?
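
e.g. (only to illustrate the question) something like:

        /* in iommu_sva_alloc_domain(): pin the mm for the cookie's lifetime */
        mmgrab(mm);
        cookie->mm = mm;

        /* in iommu_sva_free_domain(): drop it together with the cookie */
        mmdrop(domain->sva_cookie->mm);
        kfree(domain->sva_cookie);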

> + refcount_set(&cookie->users, 

Re: [PATCH RFC v2 01/11] iommu: Add pasid_bits field in struct dev_iommu

2022-03-29 Thread Jacob Pan
Hi BaoLu,

On Tue, 29 Mar 2022 13:37:50 +0800, Lu Baolu 
wrote:

> Use this field to save the pasid/ssid bits that a device is able to
> support with its IOMMU hardware. It is a generic attribute of a device
> and lifting it into the per-device dev_iommu struct makes it possible
> to allocate a PASID for device without calls into the IOMMU drivers.
> Any iommu driver which supports PASID related features should set this
> field before features are enabled on the devices.
> 
> Signed-off-by: Lu Baolu 
> ---
>  include/linux/iommu.h   | 1 +
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 ++
>  drivers/iommu/intel/iommu.c | 5 -
>  3 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 6ef2df258673..36f43af0af53 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -368,6 +368,7 @@ struct dev_iommu {
>   struct iommu_fwspec *fwspec;
>   struct iommu_device *iommu_dev;
>   void*priv;
> + unsigned intpasid_bits;
pasid_width?
The PCI spec uses "Max PASID Width".

>  };
>  
>  int iommu_device_register(struct iommu_device *iommu,
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index
> 627a3ed5ee8f..afc63fce6107 100644 ---
> a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2681,6 +2681,8 @@
> static struct iommu_device *arm_smmu_probe_device(struct device *dev)
> smmu->features & ARM_SMMU_FEAT_STALL_FORCE) master->stall_enabled = true;
>  
> + dev->iommu->pasid_bits = master->ssid_bits;
> +
>   return >iommu;
>  
>  err_free_master:
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 6f7485c44a4b..c1b91bce1530 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -4587,8 +4587,11 @@ static struct iommu_device
> *intel_iommu_probe_device(struct device *dev) if (pasid_supported(iommu))
> { int features = pci_pasid_features(pdev);
>  
> - if (features >= 0)
> + if (features >= 0) {
>   info->pasid_supported = features
> | 1;
> + dev->iommu->pasid_bits =
> +
> fls(pci_max_pasids(pdev)) - 1;
> + }
>   }
>  
>   if (info->ats_supported && ecap_prs(iommu->ecap)
> &&


Thanks,

Jacob


Re: [PATCH v2 7/8] iommu/vt-d: Delete supervisor/kernel SVA

2022-03-29 Thread Jacob Pan
Hi Kevin,

On Fri, 18 Mar 2022 06:16:58 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan 
> > Sent: Tuesday, March 15, 2022 1:07 PM
> > 
> > In-kernel DMA with PASID should use DMA API now, remove supervisor
> > PASID
> > SVA support. Remove special cases in bind mm and page request service.
> > 
> > Signed-off-by: Jacob Pan   
> 
> so you removed all the references to SVM_FLAG_SUPERVISOR_MODE
> but the definition is still kept in include/linux/intel-svm.h...
> 
Good catch, will remove.

> > ---
> >  drivers/iommu/intel/svm.c | 42 ---
> >  1 file changed, 8 insertions(+), 34 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> > index 2c53689da461..37d6218f173b 100644
> > --- a/drivers/iommu/intel/svm.c
> > +++ b/drivers/iommu/intel/svm.c
> > @@ -516,11 +516,10 @@ static void intel_svm_free_pasid(struct mm_struct
> > *mm)
> > 
> >  static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
> >struct device *dev,
> > -  struct mm_struct *mm,
> > -  unsigned int flags)
> > +  struct mm_struct *mm)
> >  {
> > struct device_domain_info *info = get_domain_info(dev);
> > -   unsigned long iflags, sflags;
> > +   unsigned long iflags, sflags = 0;
> > struct intel_svm_dev *sdev;
> > struct intel_svm *svm;
> > int ret = 0;
> > @@ -533,16 +532,13 @@ static struct iommu_sva
> > *intel_svm_bind_mm(struct intel_iommu *iommu,
> > 
> > svm->pasid = mm->pasid;
> > svm->mm = mm;
> > -   svm->flags = flags;
> > INIT_LIST_HEAD_RCU(&svm->devs);
> > 
> > -   if (!(flags & SVM_FLAG_SUPERVISOR_MODE)) {
> > -   svm->notifier.ops = &intel_mmuops;
> > -   ret = mmu_notifier_register(&svm->notifier,
> > mm);
> > -   if (ret) {
> > -   kfree(svm);
> > -   return ERR_PTR(ret);
> > -   }
> > +   svm->notifier.ops = &intel_mmuops;
> > +   ret = mmu_notifier_register(&svm->notifier, mm);
> > +   if (ret) {
> > +   kfree(svm);
> > +   return ERR_PTR(ret);
> > }
> > 
> > ret = pasid_private_add(svm->pasid, svm);
> > @@ -583,8 +579,6 @@ static struct iommu_sva *intel_svm_bind_mm(struct
> > intel_iommu *iommu,
> > }
> > 
> > /* Setup the pasid table: */
> > -   sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
> > -   PASID_FLAG_SUPERVISOR_MODE : 0;
> > sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ?
> > PASID_FLAG_FL5LP : 0;
> > spin_lock_irqsave(&iommu->lock, iflags);
> > ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm->pasid,
> > @@ -957,7 +951,7 @@ static irqreturn_t prq_event_thread(int irq, void
> > *d)
> >  * to unbind the mm while any page faults are
> > outstanding.
> >  */
> > svm = pasid_private_find(req->pasid);
> > -   if (IS_ERR_OR_NULL(svm) || (svm->flags &
> > SVM_FLAG_SUPERVISOR_MODE))
> > +   if (IS_ERR_OR_NULL(svm))
> > goto bad_req;
> > }
> > 
> > @@ -1011,29 +1005,9 @@ static irqreturn_t prq_event_thread(int irq, void
> > *d)
> >  struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct
> > *mm, void *drvdata)
> >  {
> > struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> > -   unsigned int flags = 0;
> > struct iommu_sva *sva;
> > int ret;
> > 
> > -   if (drvdata)
> > -   flags = *(unsigned int *)drvdata;
> > -
> > -   if (flags & SVM_FLAG_SUPERVISOR_MODE) {
> > -   if (!ecap_srs(iommu->ecap)) {
> > -   dev_err(dev, "%s: Supervisor PASID not
> > supported\n",
> > -   iommu->name);
> > -   return ERR_PTR(-EOPNOTSUPP);
> > -   }
> > -
> > -   if (mm) {
> > -   dev_err(dev, "%s: Supervisor PASID with user
> > provided mm\n",
> > -   iommu->name);
> > -   return ERR_PTR(-EINVAL);
> > -   }
> > -
> > -   mm = &init_mm;
> > -   }
> > -
> > mutex_lock(&pasid_mutex);
> > ret = intel_svm_alloc_pasid(dev, mm, flags);
> > if (ret) {
> > --
> > 2.25.1  
> 


Thanks,

Jacob


Re: [PATCH v2 6/8] dmaengine: idxd: Use DMA API for in-kernel DMA with PASID

2022-03-29 Thread Jacob Pan
Hi Kevin,

On Fri, 18 Mar 2022 06:10:40 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan 
> > Sent: Tuesday, March 15, 2022 1:07 PM
> > 
> > The current in-kernel supervisor PASID support is based on the SVM/SVA
> > machinery in SVA lib. The binding between a kernel PASID and kernel
> > mapping has many flaws. See discussions in the link below.
> > 
> > This patch enables in-kernel DMA by switching from SVA lib to the
> > standard DMA mapping APIs. Since both DMA requests with and without
> > PASIDs are mapped identically, there is no change to how DMA APIs are
> > used after the kernel PASID is enabled.
> > 
> > Link: https://lore.kernel.org/linux-
> > iommu/20210511194726.gp1002...@nvidia.com/
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/dma/idxd/idxd.h  |  1 -
> >  drivers/dma/idxd/init.c  | 34 +-
> >  drivers/dma/idxd/sysfs.c |  7 ---
> >  3 files changed, 9 insertions(+), 33 deletions(-)
> > 
> > diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
> > index da72eb15f610..a09ab4a6e1c1 100644
> > --- a/drivers/dma/idxd/idxd.h
> > +++ b/drivers/dma/idxd/idxd.h
> > @@ -276,7 +276,6 @@ struct idxd_device {
> > struct idxd_wq **wqs;
> > struct idxd_engine **engines;
> > 
> > -   struct iommu_sva *sva;
> > unsigned int pasid;
> > 
> > int num_groups;
> > diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> > index 08a5f4310188..5d1f8dd4abf6 100644
> > --- a/drivers/dma/idxd/init.c
> > +++ b/drivers/dma/idxd/init.c
> > @@ -16,6 +16,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include "../dmaengine.h"
> > @@ -466,36 +467,22 @@ static struct idxd_device *idxd_alloc(struct
> > pci_dev *pdev, struct idxd_driver_d
> > 
> >  static int idxd_enable_system_pasid(struct idxd_device *idxd)  
> 
> idxd_enable_pasid_dma() since system pasid is a confusing term now?
> Or just remove the idxd specific wrappers and have the caller to call
> iommu_enable_pasid_dma() directly given the simple logic here.
> 
agreed, will do.
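For instance, the probe path could call the core helper directly, something
like this (sketch only; the helper name is made up, iommu_enable_pasid_dma()
is the API added earlier in this series):

/* Sketch: what idxd_probe() could do without the idxd-specific wrapper. */
static int idxd_setup_dma_pasid(struct idxd_device *idxd)
{
	struct device *dev = &idxd->pdev->dev;
	int rc;

	rc = iommu_enable_pasid_dma(dev, &idxd->pasid);
	if (rc) {
		dev_warn(dev, "Unable to get a kernel DMA PASID: %d\n", rc);
		return rc;
	}
	set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
	return 0;
}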

> >  {
> > -   int flags;
> > -   unsigned int pasid;
> > -   struct iommu_sva *sva;
> > +   u32 pasid;
> > +   int ret;
> > 
> > -   flags = SVM_FLAG_SUPERVISOR_MODE;
> > -
> > -   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
> > -   if (IS_ERR(sva)) {
> > -   dev_warn(&idxd->pdev->dev,
> > -"iommu sva bind failed: %ld\n", PTR_ERR(sva));
> > -   return PTR_ERR(sva);
> > -   }
> > -
> > -   pasid = iommu_sva_get_pasid(sva);
> > -   if (pasid == IOMMU_PASID_INVALID) {
> > -   iommu_sva_unbind_device(sva);
> > -   return -ENODEV;
> > +   ret = iommu_enable_pasid_dma(&idxd->pdev->dev, &pasid);
> > +   if (ret) {
> > +   dev_err(&idxd->pdev->dev, "No DMA PASID %d\n", ret);
> > +   return ret;
> > }
> > -
> > -   idxd->sva = sva;
> > idxd->pasid = pasid;
> > -   dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
> > +
> > return 0;
> >  }
> > 
> >  static void idxd_disable_system_pasid(struct idxd_device *idxd)
> >  {
> > -
> > -   iommu_sva_unbind_device(idxd->sva);
> > -   idxd->sva = NULL;
> > +   iommu_disable_pasid_dma(&idxd->pdev->dev, idxd->pasid);
> >  }
> > 
> >  static int idxd_probe(struct idxd_device *idxd)
> > @@ -524,10 +511,7 @@ static int idxd_probe(struct idxd_device *idxd)
> > } else {
> > dev_warn(dev, "Unable to turn on SVA
> > feature.\n"); }
> > -   } else if (!sva) {
> > -   dev_warn(dev, "User forced SVA off via module
> > param.\n");  
> 
> why removing above 2 lines? they are related to a module param thus
> not affected by the logic in this series.
> 
This should be in a separate patch. I consulted with Dave; the sva module
param is not needed anymore.
Thanks for pointing it out.

> > }
> > -
> > idxd_read_caps(idxd);
> > idxd_read_table_offsets(idxd);
> > 
> > diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
> > index 7e19ab92b61a..fde6656695ba 100644
> > --- a/drivers/dma/idxd/sysfs.c
> > +++ b/drivers/dma/idxd/sysfs.c
> > @@ -839,13 +839,6 @@ static ssize_t wq_name_store(struct device *dev,
> > if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
> > return -EINVAL;
> > 
> > -   /*
> > -* This is temporarily placed here until we have SVM support
> > for
> > -* dmaengine.
> > -*/
> > -   if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
> > -   return -EOPNOTSUPP;
> > -
> > memset(wq->name, 0, WQ_NAME_SIZE + 1);
> > strncpy(wq->name, buf, WQ_NAME_SIZE);
> > strreplace(wq->name, '\n', '\0');
> > --
> > 2.25.1  
> 


Thanks,

Jacob


Re: [PATCH v2 5/8] iommu: Add PASID support for DMA mapping API users

2022-03-28 Thread Jacob Pan
Hi BaoLu,

On Fri, 18 Mar 2022 20:43:54 +0800, Lu Baolu 
wrote:

> On 2022/3/15 13:07, Jacob Pan wrote:
> > DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per device/RID basis which is not PASID-aware.
> > 
> > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > required for certain work submissions. To allow such devices use DMA
> > mapping API, we need the following functionalities:
> > 1. Provide device a way to retrieve a PASID for work submission within
> > the kernel
> > 2. Enable the kernel PASID on the IOMMU for the device
> > 3. Attach the kernel PASID to the device's default DMA domain, let it
> > be IOVA or physical address in case of pass-through.
> > 
> > This patch introduces a driver facing API that enables DMA API
> > PASID usage. Once enabled, device drivers can continue to use DMA APIs
> > as is. There is no difference in dma_handle between without PASID and
> > with PASID.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >   drivers/iommu/dma-iommu.c | 65 +++
> >   include/linux/dma-iommu.h |  7 +
> >   include/linux/iommu.h |  9 ++
> >   3 files changed, 81 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index b22034975301..d0ff1a34b1b6 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -39,6 +39,8 @@ enum iommu_dma_cookie_type {
> > IOMMU_DMA_MSI_COOKIE,
> >   };
> >   
> > +static DECLARE_IOASID_SET(iommu_dma_pasid);
> > +
> >   struct iommu_dma_cookie {
> > enum iommu_dma_cookie_type  type;
> > union {
> > @@ -370,6 +372,69 @@ void iommu_put_dma_cookie(struct iommu_domain
> > *domain) domain->iova_cookie = NULL;
> >   }
> >   
> > +/**
> > + * iommu_enable_pasid_dma --Enable in-kernel DMA request with PASID
> > + * @dev:   Device to be enabled
> > + *
> > + * DMA request with PASID will be mapped the same way as the legacy
> > DMA.
> > + * If the device is in pass-through, PASID will also pass-through. If
> > the
> > + * device is in IOVA map, the supervisor PASID will point to the same
> > IOVA
> > + * page table.
> > + *
> > + * @return the kernel PASID to be used for DMA or INVALID_IOASID on
> > failure  
> 
> The comment on the return value should be rephrased according to the
> real code.
> 
yes, will do.
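e.g. the kdoc could be reworded to match the code (wording suggestion only):

/**
 * iommu_enable_pasid_dma - Enable in-kernel DMA requests with PASID
 * @dev:   Device to be enabled
 * @pasid: On success, filled in with the kernel PASID to be used for DMA
 *
 * Return: 0 on success, or a negative errno on failure.
 */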

> > + */
> > +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
> > +{
> > +   struct iommu_domain *dom;
> > +   ioasid_t id, max;
> > +   int ret;
> > +
> > +   dom = iommu_get_domain_for_dev(dev);
> > +   if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> > +   return -ENODEV;
> > +   max = iommu_get_dev_pasid_max(dev);
> > +   if (!max)
> > +   return -EINVAL;
> > +
> > +   id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
> > +   if (id == INVALID_IOASID)
> > +   return -ENOMEM;
> > +
> > +   ret = dom->ops->attach_dev_pasid(dom, dev, id);
> > +   if (ret) {
> > +   ioasid_put(id);
> > +   return ret;
> > +   }
> > +   *pasid = id;
> > +
> > +   return ret;
> > +}
> > +EXPORT_SYMBOL(iommu_enable_pasid_dma);
> > +
> > +/**
> > + * iommu_disable_pasid_dma --Disable in-kernel DMA request with PASID
> > + * @dev:   Device's PASID DMA to be disabled
> > + *
> > + * It is the device driver's responsibility to ensure no more incoming
> > DMA
> > + * requests with the kernel PASID before calling this function. IOMMU
> > driver
> > + * ensures PASID cache, IOTLBs related to the kernel PASID are cleared
> > and
> > + * drained.
> > + *
> > + * @return 0 on success or error code on failure  
> 
> Ditto.
> 
same

> > + */
> > +void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid)
> > +{
> > +   struct iommu_domain *dom;
> > +
> > +   /* TODO: check the given PASID is within the ioasid_set */
> > +   dom = iommu_get_domain_for_dev(dev);
> > +   if (!dom->ops->detach_dev_pasid)
> > +   return;
> > +   dom->ops->detach_dev_pasid(dom, dev, pasid);
> > +   ioasid_put(pasid);
> > +}
> > +EXPORT_SYMBOL(iommu_disable_pasid_dma);
> > +
> >   /**
> >* iommu_dma_get_resv_regions - Reserved region driver helper
> >* @dev: Device from iommu_get_resv_regions()
> > diff --git a/include/linu

Re: [PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach ops

2022-03-28 Thread Jacob Pan
Hi Kevin,

On Fri, 18 Mar 2022 05:33:38 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan
> > Sent: Thursday, March 17, 2022 5:02 AM
> > 
> > Hi Kevin,
> > 
> > On Wed, 16 Mar 2022 07:41:34 +, "Tian, Kevin" 
> > wrote:
> >   
> > > > From: Jason Gunthorpe 
> > > > Sent: Tuesday, March 15, 2022 10:33 PM
> > > >
> > > > On Mon, Mar 14, 2022 at 10:07:07PM -0700, Jacob Pan wrote:  
> > > > > + /*
> > > > > +  * Each domain could have multiple devices attached with
> > > > > shared or  
> > > > per  
> > > > > +  * device PASIDs. At the domain level, we keep track of
> > > > > unique PASIDs  
> > > > and  
> > > > > +  * device user count.
> > > > > +  * E.g. If a domain has two devices attached, device A
> > > > > has PASID 0, 1;
> > > > > +  * device B has PASID 0, 2. Then the domain would have
> > > > > PASID 0, 1, 2.
> > > > > +  */  
> > > >
> > > > A 2d array of xarray's seems like a poor data structure for this
> > > > task.  
> > >  
> > Perhaps i mis-presented here, I am not using 2D array. It is an 1D
> > xarray for domain PASIDs only. Then I use the existing device list in
> > each domain, adding another xa to track per-device-domain PASIDs.  
> > > besides that it also doesn't work when we support per-device PASID
> > > allocation in the future. In that case merging device PASIDs together
> > > is conceptually wrong.
> > >  
> > Sorry, could you elaborate? If we do per-dev PASID allocation, we could
> > use the ioasid_set for each pdev, right?  
> 
> My point is simply about the comment above which says the domain
> will have PASID 0, 1, 2 when there is [devA, PASID0] and [devB, PASID0].
> You can maintain a single  PASID list only when it's globally allocated
> cross devices. otherwise this has to be a tuple including device and
> PASID.
> 
Got it, you are right, we don't want to limit this to a globally allocated
scheme.

Thanks,

Jacob


Re: [PATCH v2 4/8] iommu/vt-d: Use device_pasid attach op for RID2PASID

2022-03-17 Thread Jacob Pan
Hi Kevin,

On Wed, 16 Mar 2022 07:54:19 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan 
> > Sent: Tuesday, March 15, 2022 1:07 PM
> > 
> > With the availability of a generic device-PASID-domain attachment API,
> > there's no need to special case RID2PASID.  Use the API to replace
> > duplicated code.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/intel/iommu.c | 18 ++
> >  1 file changed, 2 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 9267194eaed3..f832b7599d21 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1683,9 +1683,6 @@ static void domain_flush_piotlb(struct
> > intel_iommu *iommu,
> > qi_flush_piotlb(iommu, did, domain->default_pasid,
> > addr, npages, ih);
> > 
> > -   if (!list_empty(&domain->devices))
> > -   qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr,
> > npages, ih);
> > -  
> 
> this should be rebased on top of Baolu's "iommu cleanup and refactoring"
> series which has removed the entire domain_flush_piotlb().
> 
Yes, I have been working with Baolu. Some of the refactoring patches were
withdrawn, so there are lots of moving targets. 

Thanks,

Jacob


Re: [PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach ops

2022-03-17 Thread Jacob Pan
Hi Jason,

On Thu, 17 Mar 2022 10:23:08 -0300, Jason Gunthorpe  wrote:

> On Wed, Mar 16, 2022 at 05:49:59PM -0700, Jacob Pan wrote:
> 
> > > I would expect real applications will try to use the same PASID for
> > > the same IOVA map to optimize IOTLB caching.
> > > 
> > > Is there a use case for that I'm missing?
> > >   
> > Yes. it would be more efficient for PASID selective domain TLB flush.
> > But on VT-d IOTLB is also tagged by domain ID, domain flush can use DID
> > if there are many PASIDs. Not sure about other archs. Agree that
> > optimizing PASIDs for TLB flush should be a common goal.  
> 
> If you sort the list of (device, pasid) tuples can something like VT-d
> collapse all the same devices and just issue one DID invalidation:
> 
>  list_for_each()
> if (itm->device == last_invalidated_device)
>   continue;
> invalidate(itm->device);
> last_invalidated_device = itm->device;
> 
I assume this is for the devTLB, since the IOMMU's IOTLB flush doesn't care
about the device. I think it works for device-wide invalidation.

> While something that was per-pasid could issue per-pasid invalidations
> from the same data structure?
> 
Yes, we can use the same data structure for PASID-selective devTLB, matching
on the PASID instead:
 list_for_each()
 if (itm->pasid == pasid_to_be_invalidated)
 invalidate(itm->device, pasid);

For the IOMMU's IOTLB, we also have two granularities:
1. domain-wide
2. pasid-wide
For #1, we just use the DID to invalidate, without traversing the list.
For #2, we just need to sanity-check that the pasid is indeed attached by
going through the list.

Seems to work!
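A rough sketch of those walks, assuming the domain keeps a flat list of
(device, pasid) tuples (struct and helper names below are invented for
illustration; the real invalidation calls are left as comments):

#include <linux/device.h>
#include <linux/ioasid.h>
#include <linux/list.h>

/* Hypothetical per-domain attachment record, one entry per (device, pasid). */
struct dev_pasid_tuple {
	struct list_head link;
	struct device *dev;
	ioasid_t pasid;
};

/* PASID-selective devTLB flush: walk the tuples and match on the PASID. */
static void flush_dev_tlb_for_pasid(struct list_head *attachments, ioasid_t pasid)
{
	struct dev_pasid_tuple *itm;

	list_for_each_entry(itm, attachments, link) {
		if (itm->pasid != pasid)
			continue;
		/* issue the PASID-based devTLB invalidation for itm->dev here */
	}
}

/* Device-wide devTLB flush: skip duplicate devices, as suggested above. */
static void flush_dev_tlb_all(struct list_head *attachments)
{
	struct dev_pasid_tuple *itm;
	struct device *last = NULL;

	list_for_each_entry(itm, attachments, link) {
		if (itm->dev == last)
			continue;	/* assumes the list is sorted by device */
		/* issue the device-wide devTLB invalidation for itm->dev here */
		last = itm->dev;
	}
}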

> > > Otherwise your explanation is what I was imagining as well.
> > > 
> > > I would also think about expanding your struct so that the device
> > > driver can track per-device per-domain data as well, that seems
> > > useful IIRC?
> > >   
> > yes, at least both VT-d and FSL drivers have struct device_domain_info.
> >   
> > > ie put a 'sizeof_iommu_dev_pasid_data' in the domain->ops and
> > > allocate that much memory so the driver can use the trailer space for
> > > its own purpose.
> > >   
> > That sounds great to have but not sure i understood correctly how to do
> > it.
> > 
> > Do you mean for each vendor driver's struct device_domain_info (or
> > equivalent), we carve out sizeof_iommu_dev_pasid_data as common data,
> > then the rest of the space is vendor specific? I don't feel I get your
> > point, could you elaborate?  
> 
> I've seen it done two ways..
> 
> With a flex array:
> 
>  struct iommu_device_data {
>  struct list_head list
>  ioasid_t pasid;
>  struct device *dev;
>  [..]
>  u64 device_data[];
>  }
> 
>  struct intel_device_data {
>   [..]
>  }
>  struct iommu_device_data *dev_data;
> >  struct intel_device_data *intel_data = (void *)&dev_data->device_data;
> 
> Or with container of:
> 
>  struct iommu_device_data {
>  struct list_head list
>  ioasid_t pasid;
>  struct device *dev;
>  [..]
>  }
> 
>  struct intel_device_data {
>  struct iommu_device_data iommu; // must be first
>  [...]
>  }
>  struct iommu_device_data *dev_data;
>  struct intel_device_data *intel_data = container_of(dev_data, struct
> intel_device_data, iommu);
> 
> In either case you'd add a size_t to the domain->ops specifying how
> much extra memory for the core code to allocate when it manages the
> datastructure. The first case allocates based on struct_size, the
> second case allocates what is specified.
> 
> Look at INIT_RDMA_OBJ_SIZE() for some more complicated example how the
> latter can work. I like it because it has the nice container_of
> pattern in drivers, the downside is it requires a BUILD_BUG_ON to
> check that the driver ordered its struct properly.
> 
> The point is to consolidate all the code for allocating and walking
> the data structure without having to force two allocations and extra
> pointer indirections on performance paths.
Makes sense, very neat. The vendor driver would not need to do allocations.
Let me give that a try. Option #2 seems to have better type safety.
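A minimal sketch of option #2, assuming the domain ops grow a size field (the
field and helper names here are invented) telling the core how much to
allocate in one go:

#include <linux/device.h>
#include <linux/ioasid.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/slab.h>

/* Core-owned common part; drivers embed it as their first member. */
struct iommu_dev_pasid_data {
	struct list_head link;
	struct device *dev;
	ioasid_t pasid;
};

/* Example vendor-private wrapper. */
struct intel_dev_pasid_data {
	struct iommu_dev_pasid_data common;	/* must stay first */
	u16 did;				/* vendor-specific field */
};

/* Core side: a single allocation sized by what the driver advertises in ops. */
static struct iommu_dev_pasid_data *iommu_alloc_dev_pasid_data(size_t drv_size)
{
	size_t size = max(sizeof(struct iommu_dev_pasid_data), drv_size);

	return kzalloc(size, GFP_KERNEL);
}

/* Driver side: container_of back to the private struct, with a layout check. */
static struct intel_dev_pasid_data *
to_intel_dev_pasid_data(struct iommu_dev_pasid_data *data)
{
	BUILD_BUG_ON(offsetof(struct intel_dev_pasid_data, common) != 0);
	return container_of(data, struct intel_dev_pasid_data, common);
}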

Thank you so much for the thorough explanation!

Jacob


Re: [PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach ops

2022-03-16 Thread Jacob Pan
Hi Jason,

On Wed, 16 Mar 2022 19:15:50 -0300, Jason Gunthorpe  wrote:

> On Wed, Mar 16, 2022 at 01:50:04PM -0700, Jacob Pan wrote:
> 
> > I guess a list of (device, pasid) tuples as you suggested could work
> > but it will have duplicated device entries since each device could have
> > multiple PASIDs. right?  
> 
> Is assigning the same iommu_domain to multiple PASIDs of the same
> device something worth optimizing for?
Probably not, the current use case has only two PASIDs at most (RID2PASID
+ a kernel PASID).

I was just thinking that for the generalized case, device TLB flush would be
more efficient if we don't go through the domain list and use a per-domain-dev
list instead. But it doesn't matter much for a DMA domain, which mostly has
one device.

> I would expect real applications will try to use the same PASID for
> the same IOVA map to optimize IOTLB caching.
> 
> Is there a use case for that I'm missing?
> 
Yes, it would be more efficient for a PASID-selective domain TLB flush. But
on VT-d the IOTLB is also tagged by domain ID, so a domain flush can use the
DID if there are many PASIDs. Not sure about other archs. Agree that
optimizing PASIDs for TLB flush should be a common goal.

> Otherwise your explanation is what I was imagining as well.
> 
> I would also think about expanding your struct so that the device
> driver can track per-device per-domain data as well, that seems
> useful IIRC?
> 
yes, at least both VT-d and FSL drivers have struct device_domain_info.

> ie put a 'sizeof_iommu_dev_pasid_data' in the domain->ops and
> allocate that much memory so the driver can use the trailer space for
> its own purpose.
> 
That sounds great to have, but I am not sure I understood correctly how to do it.

Do you mean for each vendor driver's struct device_domain_info (or
equivalent), we carve out sizeof_iommu_dev_pasid_data as common data, then
the rest of the space is vendor specific? I don't feel I get your point,
could you elaborate?


Thanks,

Jacob


Re: [PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach ops

2022-03-16 Thread Jacob Pan
Hi Kevin,

On Wed, 16 Mar 2022 07:41:34 +, "Tian, Kevin" 
wrote:

> > From: Jason Gunthorpe 
> > Sent: Tuesday, March 15, 2022 10:33 PM
> > 
> > On Mon, Mar 14, 2022 at 10:07:07PM -0700, Jacob Pan wrote:  
> > > + /*
> > > +  * Each domain could have multiple devices attached with
> > > shared or  
> > per  
> > > +  * device PASIDs. At the domain level, we keep track of
> > > unique PASIDs  
> > and  
> > > +  * device user count.
> > > +  * E.g. If a domain has two devices attached, device A has
> > > PASID 0, 1;
> > > +  * device B has PASID 0, 2. Then the domain would have PASID
> > > 0, 1, 2.
> > > +  */  
> > 
> > A 2d array of xarray's seems like a poor data structure for this task.  
> 
Perhaps I mis-presented here; I am not using a 2D array. It is a 1D xarray
for domain PASIDs only. Then I use the existing device list in each domain,
adding another xa to track per-device-domain PASIDs.
> besides that it also doesn't work when we support per-device PASID
> allocation in the future. In that case merging device PASIDs together is
> conceptually wrong.
> 
Sorry, could you elaborate? If we do per-dev PASID allocation, we could use
the ioasid_set for each pdev, right?

> > 
> > AFACIT this wants to store a list of (device, pasid) tuples, so a
> > simple linked list, 1d xarray vector or a red black tree seems more
> > appropriate..
> >   
> 
> this tuple can well serve per-device PASID. 
> 
I commented on the other email, but a simple list of tuples could have
duplicated devices since each dev could attach multiple PASIDs, right?
Should we still do two level then?

> Thanks
> Kevin


Thanks,

Jacob

Re: [PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach ops

2022-03-16 Thread Jacob Pan
Hi Kevin,

On Wed, 16 Mar 2022 07:39:09 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan 
> > Sent: Tuesday, March 15, 2022 1:07 PM
> > +static int intel_iommu_attach_dev_pasid(struct iommu_domain *domain,
> > +   struct device *dev, ioasid_t pasid)
> > +{
> > +   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +   struct device_domain_info *info = get_domain_info(dev);
> > +   struct intel_iommu *iommu = info->iommu;
> > +   struct pasid_info *pinfo;
> > +   unsigned long flags;
> > +   int ret = 0;
> > +   void *entry;
> > +
> > +   if (!info)
> > +   return -ENODEV;  
> 
> btw this interface only works in scalable mode. Lack of a check to
> return error on legacy mode here.
> 
Right, legacy mode has no PASIDs. Will add the check.
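i.e. something like this guard at the top of intel_iommu_attach_dev_pasid()
(sketch; sm_supported() is the driver's existing scalable-mode check):

	/* PASID-granular attach requires scalable mode; legacy mode has no PASIDs. */
	if (!sm_supported(iommu))
		return -EOPNOTSUPP;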

Thanks,

Jacob


Re: [PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach ops

2022-03-16 Thread Jacob Pan
Hi Jason,

On Tue, 15 Mar 2022 20:04:57 -0300, Jason Gunthorpe  wrote:

> On Tue, Mar 15, 2022 at 03:36:20PM -0700, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Tue, 15 Mar 2022 11:33:22 -0300, Jason Gunthorpe 
> > wrote: 
> > > On Mon, Mar 14, 2022 at 10:07:07PM -0700, Jacob Pan wrote:  
> > > > +   /*
> > > > +* Each domain could have multiple devices attached with
> > > > shared or per
> > > > +* device PASIDs. At the domain level, we keep track of
> > > > unique PASIDs and
> > > > +* device user count.
> > > > +* E.g. If a domain has two devices attached, device A has
> > > > PASID 0, 1;
> > > > +* device B has PASID 0, 2. Then the domain would have
> > > > PASID 0, 1, 2.
> > > > +*/
> > > 
> > > A 2d array of xarray's seems like a poor data structure for this task.
> > > 
> > > AFACIT this wants to store a list of (device, pasid) tuples, so a
> > > simple linked list, 1d xarray vector or a red black tree seems more
> > > appropriate..
> > >   
> > Agreed.
> > It might need some surgery for dmar_domain and device_domain_info, which
> > already has a simple device list. I am trying to leverage the existing
> > data struct, let me take a closer look.  
> 
> Maybe the core code should provide this data structure in the
> iommu_domain.
> 
> Figuring out what stuff is attached is something every driver has to
> do right?
Yeah, it seems we already have duplicated device lists in the vendor domain
structs, e.g. VT-d's dmar_domain and AMD's protection_domain, and similarly
for the device_domain_info equivalents.

If the core code provides domain-device-pasid tracking, we could do
device-pasid tracking in ioasid.c. When we support per-device PASID
allocation, each physical device could be an IOASID set, thus its own
namespace.

Perhaps we could do the following: add a device list to struct iommu_domain;
this would replace the vendor domain lists. The data would be something like:
struct iommu_dev_pasid_data {
	struct list_head list;      /* For iommu_domain->dev_list */
	struct ioasid_set *pasids;  /* For the PASIDs used by the device */
	struct device *dev;
};

I guess a list of (device, pasid) tuples as you suggested could work, but it
will have duplicated device entries since each device could have multiple
PASIDs, right?

Have to code this up to see.

Thanks for the pointers,

Jacob


Re: [PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach ops

2022-03-15 Thread Jacob Pan
Hi Jason,

On Tue, 15 Mar 2022 11:33:22 -0300, Jason Gunthorpe  wrote:

> On Mon, Mar 14, 2022 at 10:07:07PM -0700, Jacob Pan wrote:
> > +   /*
> > +* Each domain could have multiple devices attached with
> > shared or per
> > +* device PASIDs. At the domain level, we keep track of unique
> > PASIDs and
> > +* device user count.
> > +* E.g. If a domain has two devices attached, device A has
> > PASID 0, 1;
> > +* device B has PASID 0, 2. Then the domain would have PASID
> > 0, 1, 2.
> > +*/  
> 
> A 2d array of xarray's seems like a poor data structure for this task.
> 
> AFACIT this wants to store a list of (device, pasid) tuples, so a
> simple linked list, 1d xarray vector or a red black tree seems more
> appropriate..
> 
Agreed.
It might need some surgery for dmar_domain and device_domain_info, which
already has a simple device list. I am trying to leverage the existing data
struct, let me take a closer look.

> > +   if (entry) {
> > +   pinfo = entry;
> > +   } else {
> > +   pinfo = kzalloc(sizeof(*pinfo), GFP_ATOMIC);
> > +   if (!pinfo)
> > +   return -ENOMEM;
> > +   pinfo->pasid = pasid;
> > +   /* Store the new PASID info in the per domain array */
> > +   ret = xa_err(__xa_store(&dmar_domain->pasids, pasid,
> > pinfo,
> > +GFP_ATOMIC));
> > +   if (ret)
> > +   goto xa_store_err;
> > +   }
> > +   /* Store PASID in per device-domain array, this is for
> > tracking devTLB */
> > +   ret = xa_err(xa_store(&info->pasids, pasid, pinfo,
> > GFP_ATOMIC));
> > +   if (ret)
> > +   goto xa_store_err;
> > +
> > +   atomic_inc(&pinfo->users);
> > +   xa_unlock(&dmar_domain->pasids);
> > +
> > +   return 0;
> > +
> > +xa_store_err:
> > +   xa_unlock(&dmar_domain->pasids);
> > +   spin_lock_irqsave(&iommu->lock, flags);
> > +   intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> > +   spin_unlock_irqrestore(&iommu->lock, flags);
> > +
> > +   if (!atomic_read(&pinfo->users)) {
> > +   __xa_erase(&dmar_domain->pasids, pasid);
> 
> This isn't locked right
> 
Good catch! This needs to move under the xa_unlock.
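i.e. the error path keeps the erase inside the locked region, roughly
(sketch, not the final code):

xa_store_err:
	/* Drop the half-initialized entry while still holding the xarray lock. */
	if (!atomic_read(&pinfo->users))
		__xa_erase(&dmar_domain->pasids, pasid);
	xa_unlock(&dmar_domain->pasids);

	spin_lock_irqsave(&iommu->lock, flags);
	intel_pasid_tear_down_entry(iommu, dev, pasid, false);
	spin_unlock_irqrestore(&iommu->lock, flags);

	return ret;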

Thanks,

Jacob


Re: [PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach ops

2022-03-15 Thread Jacob Pan
Hi Kevin,

On Tue, 15 Mar 2022 10:33:08 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan 
> > Sent: Tuesday, March 15, 2022 1:07 PM
> > 
> > On VT-d platforms with scalable mode enabled, devices issue DMA requests
> > with PASID need to attach to the correct IOMMU domains.
> > The attach operation involves the following:
> > - programming the PASID into device's PASID table
> > - tracking device domain and the PASID relationship
> > - managing IOTLB and device TLB invalidations
> > 
> > This patch extends DMAR domain and device domain info with xarrays to
> > track per domain and per device PASIDs.  It provides the flexibility to
> > be used beyond DMA API PASID support.
> > 
> > Signed-off-by: Lu Baolu 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/intel/iommu.c | 194
> > +++-
> >  include/linux/intel-iommu.h |  12 ++-
> >  2 files changed, 202 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 881f8361eca2..9267194eaed3 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1622,20 +1622,48 @@ static void __iommu_flush_dev_iotlb(struct
> > device_domain_info *info,
> >qdep, addr, mask);
> >  }
> > 
> > +static void __iommu_flush_dev_piotlb(struct device_domain_info *info,  
> 
> piotlb is confusing, better be:
> 
>   __iommu_flush_dev_iotlb_pasid()
> 
Yeah, that is clearer.

> > +   u64 address,
> > +ioasid_t pasid, unsigned int mask)
> > +{
> > +   u16 sid, qdep;
> > +
> > +   if (!info || !info->ats_enabled)
> > +   return;
> > +
> > +   sid = info->bus << 8 | info->devfn;
> > +   qdep = info->ats_qdep;
> > +   qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
> > +pasid, qdep, address, mask);
> > +}
> > +
> >  static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
> >   u64 addr, unsigned mask)
> >  {
> > unsigned long flags;
> > struct device_domain_info *info;
> > struct subdev_domain_info *sinfo;
> > +   unsigned long pasid;
> > +   struct pasid_info *pinfo;
> > 
> > if (!domain->has_iotlb_device)
> > return;
> > 
> > spin_lock_irqsave(&device_domain_lock, flags);
> > -   list_for_each_entry(info, &domain->devices, link)
> > -   __iommu_flush_dev_iotlb(info, addr, mask);
> > -
> > +   list_for_each_entry(info, &domain->devices, link) {
> > +   /*
> > +* We cannot use PASID based devTLB invalidation on
> > RID2PASID
> > +* Device does not understand RID2PASID/0. This is
> > different  
> 
> Lack of a conjunction word between 'RID2PASID' and 'Device'.
> 
> and what is RID2PASID/0? It would be clearer to point out that RID2PASID
> is visible only within the iommu to mark out requests without PASID, 
> thus this PASID value should never be sent to the device side.
> 
Good point, will do.
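e.g. the comment could read (wording suggestion only):

	/*
	 * RID2PASID is only visible inside the IOMMU and marks requests
	 * that carry no PASID; the device knows nothing about it. Never
	 * send RID2PASID in a PASID-based devTLB invalidation, use the
	 * plain devTLB flush for those entries instead. This differs
	 * from the IOTLB, where RID2PASID is a real tag.
	 */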

> > +* than IOTLB invalidation where RID2PASID is also
> > used for
> > +* tagging.  
> 
> Then it would be obvious because IOTLB is iommu internal agent thus takes 
> use of RID2PASID for tagging.
> 
ditto

> > +*/
> > +   xa_for_each(&info->pasids, pasid, pinfo) {
> > +   if (!pasid)  
> 
> this should be compared to PASID_RID2PASID (though it's zero today).
> 
ditto

> > +   __iommu_flush_dev_iotlb(info, addr,
> > mask);
> > +   else
> > +   __iommu_flush_dev_piotlb(info, addr,
> > pasid, mask);
> > +   }
> > +   }
> > list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
> > info = get_domain_info(sinfo->pdev);
> > __iommu_flush_dev_iotlb(info, addr, mask);  
> 
> Thanks
> Kevin


Thanks,

Jacob


Re: [PATCH v2 5/8] iommu: Add PASID support for DMA mapping API users

2022-03-15 Thread Jacob Pan
Hi Jason,

On Tue, 15 Mar 2022 14:05:07 -0300, Jason Gunthorpe  wrote:

> On Tue, Mar 15, 2022 at 09:31:35AM -0700, Jacob Pan wrote:
> 
> > > IMHO it is a device mis-design of IDXD to require all DMA be PASID
> > > tagged. Devices should be able to do DMA on their RID when the PCI  
> 
> > IDXD can do DMA w/ RID, the PASID requirement is only for shared WQ
> > where ENQCMDS is used. ENQCMDS has the benefit of avoiding locking
> > where work submission is done from multiple CPUs.
> > Tony, Dave?  
> 
> This is what I mean, it has an operating mode you want to use from the
> kernel driver that cannot do RID DMA. It is a HW mis-design, IMHO.
> 
> Something like PASID0 in the ENQCMDS should have triggered RID DMA.
> 
That would simplify things a lot; it would be just a device change, I think.
From the IA point of view, only ENQCMD will #GP if PASID == 0. I will bring
this back to the HW team to consider for future generations.

> > > In any case I think we are better to wait for an actual user for multi
> > > DMA API iommu_domains to come forward before we try to build an API
> > > for it.  
> > 
> > What would you recommend in the interim?  
> 
> Oh, I mean this approach at a high level is fine - I was saying we
> shouldn't try to broaden it like Robin was suggesting without a driver
> that needs multiple iommu_domains for the DMA API.
> 
Got it. Thanks for the clarification.

> Jason


Thanks,

Jacob


Re: [PATCH v2 5/8] iommu: Add PASID support for DMA mapping API users

2022-03-15 Thread Jacob Pan
Hi Jason,

On Tue, 15 Mar 2022 11:35:35 -0300, Jason Gunthorpe  wrote:

> On Mon, Mar 14, 2022 at 10:07:09PM -0700, Jacob Pan wrote:
> > DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per device/RID basis which is not PASID-aware.
> > 
> > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > required for certain work submissions. To allow such devices use DMA
> > mapping API, we need the following functionalities:
> > 1. Provide device a way to retrieve a PASID for work submission within
> > the kernel
> > 2. Enable the kernel PASID on the IOMMU for the device
> > 3. Attach the kernel PASID to the device's default DMA domain, let it
> > be IOVA or physical address in case of pass-through.
> > 
> > This patch introduces a driver facing API that enables DMA API
> > PASID usage. Once enabled, device drivers can continue to use DMA APIs
> > as is. There is no difference in dma_handle between without PASID and
> > with PASID.
> > 
> > Signed-off-by: Jacob Pan 
> >  drivers/iommu/dma-iommu.c | 65 +++
> >  include/linux/dma-iommu.h |  7 +
> >  include/linux/iommu.h |  9 ++
> >  3 files changed, 81 insertions(+)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index b22034975301..d0ff1a34b1b6 100644
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -39,6 +39,8 @@ enum iommu_dma_cookie_type {
> > IOMMU_DMA_MSI_COOKIE,
> >  };
> >  
> > +static DECLARE_IOASID_SET(iommu_dma_pasid);
> > +
> >  struct iommu_dma_cookie {
> > enum iommu_dma_cookie_type  type;
> > union {
> > @@ -370,6 +372,69 @@ void iommu_put_dma_cookie(struct iommu_domain
> > *domain) domain->iova_cookie = NULL;
> >  }
> >  
> > +/**
> > + * iommu_enable_pasid_dma --Enable in-kernel DMA request with PASID
> > + * @dev:   Device to be enabled
> > + *
> > + * DMA request with PASID will be mapped the same way as the legacy
> > DMA.
> > + * If the device is in pass-through, PASID will also pass-through. If
> > the
> > + * device is in IOVA map, the supervisor PASID will point to the same
> > IOVA
> > + * page table.
> > + *
> > + * @return the kernel PASID to be used for DMA or INVALID_IOASID on
> > failure
> > + */
> > +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
> > +{
> > +   struct iommu_domain *dom;
> > +   ioasid_t id, max;
> > +   int ret;
> > +
> > +   dom = iommu_get_domain_for_dev(dev);
> > +   if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
> > +   return -ENODEV;  
> 
> Given the purpose of this API I think it should assert that the device
> has the DMA API in-use using the machinery from the other series.
> 
> ie this should not be used to clone non-DMA API iommu_domains..
> 
Let me try to confirm the specifics here. I should check the domain type and
reject all others except the IOMMU_DOMAIN_DMA type, right? We should also
allow IOMMU_DOMAIN_IDENTITY.

That makes sense.
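For instance, a check along these lines at the top of iommu_enable_pasid_dma()
(sketch; the helper name is invented, and IOMMU_DOMAIN_DMA_FQ may need the
same treatment if flush-queue domains should be included):

/* Restrict the kernel-PASID path to DMA API domains. */
static bool iommu_pasid_dma_domain_valid(struct iommu_domain *dom)
{
	if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
		return false;

	return dom->type == IOMMU_DOMAIN_DMA ||
	       dom->type == IOMMU_DOMAIN_IDENTITY;
}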

> > diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> > index 24607dc3c2ac..e6cb9b52a420 100644
> > +++ b/include/linux/dma-iommu.h
> > @@ -18,6 +18,13 @@ int iommu_get_dma_cookie(struct iommu_domain
> > *domain); int iommu_get_msi_cookie(struct iommu_domain *domain,
> > dma_addr_t base); void iommu_put_dma_cookie(struct iommu_domain
> > *domain); 
> > +/*
> > + * For devices that can do DMA request with PASID, setup a system
> > PASID.
> > + * Address modes (IOVA, PA) are selected by the platform code.
> > + */
> > +int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid);
> > +void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid);  
> 
> The functions already have a kdoc, don't need two..
> 
Good point!

Thanks,

Jacob


Re: [PATCH v2 5/8] iommu: Add PASID support for DMA mapping API users

2022-03-15 Thread Jacob Pan
Hi Jason,

On Tue, 15 Mar 2022 11:22:16 -0300, Jason Gunthorpe  wrote:

> On Tue, Mar 15, 2022 at 11:16:41AM +, Robin Murphy wrote:
> > On 2022-03-15 05:07, Jacob Pan wrote:  
> > > DMA mapping API is the de facto standard for in-kernel DMA. It
> > > operates on a per device/RID basis which is not PASID-aware.
> > > 
> > > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > > required for certain work submissions. To allow such devices use DMA
> > > mapping API, we need the following functionalities:
> > > 1. Provide device a way to retrieve a PASID for work submission within
> > > the kernel
> > > 2. Enable the kernel PASID on the IOMMU for the device
> > > 3. Attach the kernel PASID to the device's default DMA domain, let it
> > > be IOVA or physical address in case of pass-through.
> > > 
> > > This patch introduces a driver facing API that enables DMA API
> > > PASID usage. Once enabled, device drivers can continue to use DMA
> > > APIs as is. There is no difference in dma_handle between without
> > > PASID and with PASID.  
> > 
> > Surely the main point of PASIDs is to be able to use more than one
> > of them?  
> 
> IMHO, not for the DMA API.
> 
Right, but we really need two here: one for DMA requests without PASID
(PASID 0) and a kernel PASID for DMA requests tagged with PASID.
Since the DMA API is not per process, there is no need for more right now.

> I can't think of good reasons why a single in-kernel device should
> require more than one iommu_domain for use by the DMA API. Even with
> the SIOV cases we have been looking at we don't really see a use case
> for more than one DMA API iommu_domain on a single physical device.
> Do you know of something on the horizon?
> 
Not that I know of.

> From my view the main point of PASIDs is to assign iommu_domains that
> are not used by the DMA API.
> 
Right, the DMA API defaults to PASID 0. But the IDXD device cannot use
PASID 0 for ENQCMDS.

> IMHO it is a device mis-design of IDXD to require all DMA be PASID
> tagged. Devices should be able to do DMA on their RID when the PCI
IDXD can do DMA with the RID; the PASID requirement is only for the shared WQ
where ENQCMDS is used. ENQCMDS has the benefit of avoiding locking when work
submission is done from multiple CPUs.
Tony, Dave?

> function is controlled by a kernel driver. I see this driver facing
> API as addressing a device quirk by aliasing the DMA API of the RID
> into a PASID and that is really all it is good for.
> 
> In any case I think we are better to wait for an actual user for multi
> DMA API iommu_domains to come forward before we try to build an API
> for it.
> 
What would you recommend in the interim?

Shall we let the VT-d driver set up a special global PASID for the DMA API?
Then the IDXD driver could retrieve it somehow. But that still needs an API
similar to what I did in the previous version, where PASID #1 was used.

Thanks,

Jacob


Re: [PATCH v2 2/8] iommu: Add attach/detach_dev_pasid domain ops

2022-03-15 Thread Jacob Pan
Hi Kevin,

On Tue, 15 Mar 2022 11:49:57 +, "Tian, Kevin" 
wrote:

> > From: Jean-Philippe Brucker 
> > Sent: Tuesday, March 15, 2022 7:27 PM
> > 
> > On Mon, Mar 14, 2022 at 10:07:06PM -0700, Jacob Pan wrote:  
> > > From: Lu Baolu 
> > >
> > > An IOMMU domain represents an address space which can be attached by
> > > devices that perform DMA within a domain. However, for platforms with
> > > PASID capability the domain attachment needs be handled at
> > > device+PASID level. There can be multiple PASIDs within a device and
> > > multiple devices attached to a given domain.
> > > This patch introduces a new IOMMU op which support device, PASID, and
> > > IOMMU domain attachment. The immediate use case is for PASID capable
> > > devices to perform DMA under DMA APIs.
> > >
> > > Signed-off-by: Lu Baolu 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > >  include/linux/iommu.h | 6 ++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > index 369f05c2a4e2..fde5b933dbe3 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -227,6 +227,8 @@ struct iommu_iotlb_gather {
> > >   * @aux_get_pasid: get the pasid given an aux-domain
> > >   * @sva_bind: Bind process address space to device
> > >   * @sva_unbind: Unbind process address space from device
> > > + * @attach_dev_pasid: attach an iommu domain to a pasid of device
> > > + * @detach_dev_pasid: detach an iommu domain from a pasid of device  
> > 
> > Isn't that operation "assign a PASID to a domain" instead?  In patch 5,
> > the domain is already attached to the device, so set_domain_pasid()
> > might be clearer and to the point. If the IOMMU driver did the
> > allocation we could also avoid patch 1.
I agree, we could let the vendor driver do the allocation inside this op. On
the other hand, we could also keep the flexibility such that this op can be
used for guest PASID bind with an SVA domain.
> 
> iiuc this API can also work for future SIOV usage where each mdev attached
> to the domain has its own pasid. "assigning a PASID to a domain" sounds
> like going back to the previous aux domain approach which has one PASID
> per domain and that PASID is used on all devices attached to the aux
> domain...
> 
Yes, that is the intention. I plan to lift the requirement in patch 5 such
that device attachment will not be a prerequisite. That would be after mdev
adoption.

> > 
> > If I understand correctly this series is not about a generic PASID API
> > that allows drivers to manage multiple DMA address spaces, because there
> > still doesn't seem to be any interest in that. It's about the specific
> > IDXD use-case, so let's focus on that. We can introduce a specialized
> > call such as (iommu|dma)_set_device_pasid(), which will be easy to
> > consolidate later into a more generic "dma_enable_pasid()" API if that
> > ever seems useful.
> > 
Right, at the moment it is still a single address space, i.e. the current
domain of the device/group.

But this limitation is at the driver-facing API layer; it is not baked into
the IOMMU ops.

> > Thanks,
> > Jean
> >   
> > >   * @sva_get_pasid: Get PASID associated to a SVA handle
> > >   * @page_response: handle page request response
> > >   * @cache_invalidate: invalidate translation caches
> > > @@ -296,6 +298,10 @@ struct iommu_ops {
> > >   struct iommu_sva *(*sva_bind)(struct device *dev, struct
> > > mm_struct  
> > *mm,  
> > > void *drvdata);
> > >   void (*sva_unbind)(struct iommu_sva *handle);
> > > + int (*attach_dev_pasid)(struct iommu_domain *domain,
> > > + struct device *dev, ioasid_t id);
> > > + void (*detach_dev_pasid)(struct iommu_domain *domain,
> > > +  struct device *dev, ioasid_t id);
> > >   u32 (*sva_get_pasid)(struct iommu_sva *handle);
> > >
> > >   int (*page_response)(struct device *dev,
> > > --
> > > 2.25.1
> > >  


Thanks,

Jacob


Re: [PATCH v2 0/8] Enable PASID for DMA API users

2022-03-15 Thread Jacob Pan
Hi Kevin,

On Tue, 15 Mar 2022 08:16:59 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan 
> > Sent: Tuesday, March 15, 2022 1:07 PM
> > 
> > Some modern accelerators such as Intel's Data Streaming Accelerator
> > (DSA) require PASID in DMA requests to be operational. Specifically,
> > the work submissions with ENQCMD on shared work queues require PASIDs.
> > The use cases
> > include both user DMA with shared virtual addressing (SVA) and in-kernel
> > DMA similar to legacy DMA w/o PASID. Here we address the latter.
> > 
> > DMA mapping API is the de facto standard for in-kernel DMA. However, it
> > operates on a per device or Requester ID(RID) basis which is not
> > PASID-aware. To leverage DMA API for devices relies on PASIDs, this
> > patchset introduces the following APIs
> > 
> > 1. A driver facing API that enables DMA API PASID usage:
> > iommu_enable_pasid_dma(struct device *dev, ioasid_t );  
> 
> Should this be called dma_enable_pasid() since it's about DMA API? Doing
> so also avoids the driver to include iommu.h.
> 
PASID is still tied to the IOMMU; drivers that want to use this must
explicitly take a dependency on the IOMMU. So I prefer not to give the
illusion otherwise.

> Thanks
> Kevin


Thanks,

Jacob


[PATCH v2 1/8] iommu: Assign per device max PASID

2022-03-14 Thread Jacob Pan
From: Lu Baolu 

The PCIe spec defines a Max PASID Width per device. Since a PASID is only
used with the IOMMU enabled, this patch introduces a PASID max variable on
the per-device IOMMU data. It will be used for limiting PASID allocation,
since the PASID table is per-device.

Signed-off-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c |  4 +++-
 include/linux/iommu.h   | 13 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 50666d250b36..881f8361eca2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2602,8 +2602,10 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (sm_supported(iommu)) {
if (pasid_supported(iommu)) {
int features = pci_pasid_features(pdev);
-   if (features >= 0)
+   if (features >= 0) {
info->pasid_supported = features | 1;
+   iommu_set_dev_pasid_max(&pdev->dev, pci_max_pasids(pdev));
+   }
}
 
if (info->ats_supported && ecap_prs(iommu->ecap) &&
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index de0c57a567c8..369f05c2a4e2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -364,6 +364,7 @@ struct iommu_fault_param {
  * @fwspec: IOMMU fwspec data
  * @iommu_dev:  IOMMU device this device is linked to
  * @priv:   IOMMU Driver private data
+ * @pasid_max   Max PASID value supported by this device
  *
  * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
  * struct iommu_group  *iommu_group;
@@ -375,8 +376,20 @@ struct dev_iommu {
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
void*priv;
+   unsigned intpasid_max;
 };
 
+static inline void iommu_set_dev_pasid_max(struct device *dev,
+   unsigned int max)
+{
+   struct dev_iommu *param = dev->iommu;
+
+   if (WARN_ON(!param))
+   return;
+
+   param->pasid_max = max;
+}
+
 int iommu_device_register(struct iommu_device *iommu,
  const struct iommu_ops *ops,
  struct device *hwdev);
-- 
2.25.1



[PATCH v2 4/8] iommu/vt-d: Use device_pasid attach op for RID2PASID

2022-03-14 Thread Jacob Pan
With the availability of a generic device-PASID-domain attachment API,
there's no need to special case RID2PASID.  Use the API to replace
duplicated code.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 18 ++
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9267194eaed3..f832b7599d21 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1683,9 +1683,6 @@ static void domain_flush_piotlb(struct intel_iommu *iommu,
qi_flush_piotlb(iommu, did, domain->default_pasid,
addr, npages, ih);
 
-   if (!list_empty(&domain->devices))
-   qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
-
if (list_empty(&domain->devices) || xa_empty(&domain->pasids))
return;
 
@@ -2826,17 +2823,7 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
}
 
/* Setup the PASID entry for requests without PASID: */
-   spin_lock_irqsave(&iommu->lock, flags);
-   if (hw_pass_through && domain_type_is_si(domain))
-   ret = intel_pasid_setup_pass_through(iommu, domain,
-   dev, PASID_RID2PASID);
-   else if (domain_use_first_level(domain))
-   ret = domain_setup_first_level(iommu, domain, dev,
-   PASID_RID2PASID);
-   else
-   ret = intel_pasid_setup_second_level(iommu, domain,
-   dev, PASID_RID2PASID);
-   spin_unlock_irqrestore(&iommu->lock, flags);
+   ret = intel_iommu_attach_dev_pasid(&domain->domain, dev, 
PASID_RID2PASID);
if (ret) {
dev_err(dev, "Setup RID2PASID failed\n");
dmar_remove_one_dev_info(dev);
@@ -4618,8 +4605,7 @@ static void __dmar_remove_one_dev_info(struct 
device_domain_info *info)
 
if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
if (dev_is_pci(info->dev) && sm_supported(iommu))
-   intel_pasid_tear_down_entry(iommu, info->dev,
-   PASID_RID2PASID, false);
+   intel_iommu_detach_dev_pasid(&domain->domain, 
info->dev, PASID_RID2PASID);
 
iommu_disable_dev_iotlb(info);
domain_context_clear(info);
-- 
2.25.1



[PATCH v2 7/8] iommu/vt-d: Delete supervisor/kernel SVA

2022-03-14 Thread Jacob Pan
In-kernel DMA with PASID should use DMA API now, remove supervisor PASID
SVA support. Remove special cases in bind mm and page request service.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c | 42 ---
 1 file changed, 8 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 2c53689da461..37d6218f173b 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -516,11 +516,10 @@ static void intel_svm_free_pasid(struct mm_struct *mm)
 
 static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
   struct device *dev,
-  struct mm_struct *mm,
-  unsigned int flags)
+  struct mm_struct *mm)
 {
struct device_domain_info *info = get_domain_info(dev);
-   unsigned long iflags, sflags;
+   unsigned long iflags, sflags = 0;
struct intel_svm_dev *sdev;
struct intel_svm *svm;
int ret = 0;
@@ -533,16 +532,13 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
intel_iommu *iommu,
 
svm->pasid = mm->pasid;
svm->mm = mm;
-   svm->flags = flags;
INIT_LIST_HEAD_RCU(&svm->devs);
 
-   if (!(flags & SVM_FLAG_SUPERVISOR_MODE)) {
-   svm->notifier.ops = &intel_mmuops;
-   ret = mmu_notifier_register(&svm->notifier, mm);
-   if (ret) {
-   kfree(svm);
-   return ERR_PTR(ret);
-   }
+   svm->notifier.ops = &intel_mmuops;
+   ret = mmu_notifier_register(&svm->notifier, mm);
+   if (ret) {
+   kfree(svm);
+   return ERR_PTR(ret);
}
 
ret = pasid_private_add(svm->pasid, svm);
@@ -583,8 +579,6 @@ static struct iommu_sva *intel_svm_bind_mm(struct 
intel_iommu *iommu,
}
 
/* Setup the pasid table: */
-   sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
-   PASID_FLAG_SUPERVISOR_MODE : 0;
sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
spin_lock_irqsave(&iommu->lock, iflags);
ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm->pasid,
@@ -957,7 +951,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 * to unbind the mm while any page faults are 
outstanding.
 */
svm = pasid_private_find(req->pasid);
-   if (IS_ERR_OR_NULL(svm) || (svm->flags & 
SVM_FLAG_SUPERVISOR_MODE))
+   if (IS_ERR_OR_NULL(svm))
goto bad_req;
}
 
@@ -1011,29 +1005,9 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
 {
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
-   unsigned int flags = 0;
struct iommu_sva *sva;
int ret;
 
-   if (drvdata)
-   flags = *(unsigned int *)drvdata;
-
-   if (flags & SVM_FLAG_SUPERVISOR_MODE) {
-   if (!ecap_srs(iommu->ecap)) {
-   dev_err(dev, "%s: Supervisor PASID not supported\n",
-   iommu->name);
-   return ERR_PTR(-EOPNOTSUPP);
-   }
-
-   if (mm) {
-   dev_err(dev, "%s: Supervisor PASID with user provided 
mm\n",
-   iommu->name);
-   return ERR_PTR(-EINVAL);
-   }
-
-   mm = &init_mm;
-   }
-
mutex_lock(&pasid_mutex);
ret = intel_svm_alloc_pasid(dev, mm, flags);
if (ret) {
-- 
2.25.1



[PATCH v2 6/8] dmaengine: idxd: Use DMA API for in-kernel DMA with PASID

2022-03-14 Thread Jacob Pan
The current in-kernel supervisor PASID support is based on the SVM/SVA
machinery in SVA lib. The binding between a kernel PASID and kernel
mapping has many flaws. See discussions in the link below.

This patch enables in-kernel DMA by switching from SVA lib to the
standard DMA mapping APIs. Since both DMA requests with and without
PASIDs are mapped identically, there is no change to how DMA APIs are
used after the kernel PASID is enabled.

Link: https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/
Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/idxd.h  |  1 -
 drivers/dma/idxd/init.c  | 34 +-
 drivers/dma/idxd/sysfs.c |  7 ---
 3 files changed, 9 insertions(+), 33 deletions(-)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index da72eb15f610..a09ab4a6e1c1 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -276,7 +276,6 @@ struct idxd_device {
struct idxd_wq **wqs;
struct idxd_engine **engines;
 
-   struct iommu_sva *sva;
unsigned int pasid;
 
int num_groups;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 08a5f4310188..5d1f8dd4abf6 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "../dmaengine.h"
@@ -466,36 +467,22 @@ static struct idxd_device *idxd_alloc(struct pci_dev 
*pdev, struct idxd_driver_d
 
 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-   int flags;
-   unsigned int pasid;
-   struct iommu_sva *sva;
+   u32 pasid;
+   int ret;
 
-   flags = SVM_FLAG_SUPERVISOR_MODE;
-
-   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
-   if (IS_ERR(sva)) {
-   dev_warn(&idxd->pdev->dev,
-"iommu sva bind failed: %ld\n", PTR_ERR(sva));
-   return PTR_ERR(sva);
-   }
-
-   pasid = iommu_sva_get_pasid(sva);
-   if (pasid == IOMMU_PASID_INVALID) {
-   iommu_sva_unbind_device(sva);
-   return -ENODEV;
+   ret = iommu_enable_pasid_dma(&idxd->pdev->dev, &pasid);
+   if (ret) {
+   dev_err(&idxd->pdev->dev, "No DMA PASID %d\n", ret);
+   return ret;
}
-
-   idxd->sva = sva;
idxd->pasid = pasid;
-   dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
+
return 0;
 }
 
 static void idxd_disable_system_pasid(struct idxd_device *idxd)
 {
-
-   iommu_sva_unbind_device(idxd->sva);
-   idxd->sva = NULL;
+   iommu_disable_pasid_dma(&idxd->pdev->dev, idxd->pasid);
 }
 
 static int idxd_probe(struct idxd_device *idxd)
@@ -524,10 +511,7 @@ static int idxd_probe(struct idxd_device *idxd)
} else {
dev_warn(dev, "Unable to turn on SVA feature.\n");
}
-   } else if (!sva) {
-   dev_warn(dev, "User forced SVA off via module param.\n");
}
-
idxd_read_caps(idxd);
idxd_read_table_offsets(idxd);
 
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 7e19ab92b61a..fde6656695ba 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -839,13 +839,6 @@ static ssize_t wq_name_store(struct device *dev,
if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
return -EINVAL;
 
-   /*
-* This is temporarily placed here until we have SVM support for
-* dmaengine.
-*/
-   if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
-   return -EOPNOTSUPP;
-
memset(wq->name, 0, WQ_NAME_SIZE + 1);
strncpy(wq->name, buf, WQ_NAME_SIZE);
strreplace(wq->name, '\n', '\0');
-- 
2.25.1



[PATCH v2 9/9] dmaengine: idxd: separate user and kernel pasid enabling

2022-03-14 Thread Jacob Pan
From: Dave Jiang 

The idxd driver has always gated PASID enabling under a single knob, but
this assumption is incorrect. The PASID used for kernel operation can be
toggled independently and has no dependency on the user PASID (and vice
versa). Split the two so they are independent "enabled" flags.
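
A small sketch of the intended split; both helpers are added by the hunks
below, while the function name here is hypothetical:

/* Sketch: the two "enabled" states are queried independently. */
static void example_report_pasid_state(struct idxd_device *idxd)
{
	if (device_user_pasid_enabled(idxd))
		dev_dbg(&idxd->pdev->dev, "user PASID (SVA) enabled\n");
	if (device_pasid_enabled(idxd))
		dev_dbg(&idxd->pdev->dev, "kernel PASID (DMA API) enabled\n");
}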

Signed-off-by: Dave Jiang 
Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/cdev.c |  4 ++--
 drivers/dma/idxd/idxd.h |  6 ++
 drivers/dma/idxd/init.c | 30 ++
 3 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 312ec37ebf91..addaebca7683 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -99,7 +99,7 @@ static int idxd_cdev_open(struct inode *inode, struct file 
*filp)
ctx->wq = wq;
filp->private_data = ctx;
 
-   if (device_pasid_enabled(idxd)) {
+   if (device_user_pasid_enabled(idxd)) {
sva = iommu_sva_bind_device(dev, current->mm);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
@@ -152,7 +152,7 @@ static int idxd_cdev_release(struct inode *node, struct 
file *filep)
if (wq_shared(wq)) {
idxd_device_drain_pasid(idxd, ctx->pasid);
} else {
-   if (device_pasid_enabled(idxd)) {
+   if (device_user_pasid_enabled(idxd)) {
/* The wq disable in the disable pasid function will 
drain the wq */
rc = idxd_wq_disable_pasid(wq);
if (rc < 0)
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index a09ab4a6e1c1..190b08bd7c08 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -239,6 +239,7 @@ enum idxd_device_flag {
IDXD_FLAG_CONFIGURABLE = 0,
IDXD_FLAG_CMD_RUNNING,
IDXD_FLAG_PASID_ENABLED,
+   IDXD_FLAG_USER_PASID_ENABLED,
 };
 
 struct idxd_dma_dev {
@@ -468,6 +469,11 @@ static inline bool device_pasid_enabled(struct idxd_device 
*idxd)
	return test_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
 }
 
+static inline bool device_user_pasid_enabled(struct idxd_device *idxd)
+{
+   return test_bit(IDXD_FLAG_USER_PASID_ENABLED, &idxd->flags);
+}
+
 static inline bool device_swq_supported(struct idxd_device *idxd)
 {
return (support_enqcmd && device_pasid_enabled(idxd));
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 5d1f8dd4abf6..981150b7d09b 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -500,16 +500,19 @@ static int idxd_probe(struct idxd_device *idxd)
 
if (IS_ENABLED(CONFIG_INTEL_IDXD_SVM) && sva) {
rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
-   if (rc == 0) {
-   rc = idxd_enable_system_pasid(idxd);
-   if (rc < 0) {
-   iommu_dev_disable_feature(dev, 
IOMMU_DEV_FEAT_SVA);
-   dev_warn(dev, "Failed to enable PASID. No SVA 
support: %d\n", rc);
-   } else {
-   set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
-   }
-   } else {
+   if (rc) {
+   /*
+* Do not bail here since legacy DMA is still
+* supported, both user and in-kernel DMA with
+* PASID rely on SVA feature.
+*/
dev_warn(dev, "Unable to turn on SVA feature.\n");
+   } else {
+   set_bit(IDXD_FLAG_USER_PASID_ENABLED, &idxd->flags);
+   if (idxd_enable_system_pasid(idxd))
+   dev_warn(dev, "No in-kernel DMA with PASID.\n");
+   else
+   set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
}
}
idxd_read_caps(idxd);
@@ -545,7 +548,8 @@ static int idxd_probe(struct idxd_device *idxd)
  err:
if (device_pasid_enabled(idxd))
idxd_disable_system_pasid(idxd);
-   iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
+   if (device_user_pasid_enabled(idxd) || device_pasid_enabled(idxd))
+   iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
return rc;
 }
 
@@ -558,7 +562,8 @@ static void idxd_cleanup(struct idxd_device *idxd)
idxd_cleanup_internals(idxd);
if (device_pasid_enabled(idxd))
idxd_disable_system_pasid(idxd);
-   iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
+   if (device_user_pasid_enabled(idxd) || device_pasid_enabled(idxd))
+   iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
 }
 
 static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
@@ -677,7 +682,8 @@ static void idxd_remove(struct pci_dev *pdev)
free_i

[PATCH v2 3/8] iommu/vt-d: Implement device_pasid domain attach ops

2022-03-14 Thread Jacob Pan
On VT-d platforms with scalable mode enabled, devices that issue DMA requests
with PASID need to be attached to the correct IOMMU domain.
The attach operation involves the following:
- programming the PASID into device's PASID table
- tracking device domain and the PASID relationship
- managing IOTLB and device TLB invalidations

This patch extends DMAR domain and device domain info with xarrays to
track per domain and per device PASIDs.  It provides the flexibility to
be used beyond DMA API PASID support.
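
A rough sketch of the attach-side bookkeeping this describes; the helper
name and the pasid_info contents are assumptions for illustration, not the
exact patch code:

/*
 * Illustrative only: record a PASID in both the per-device and per-domain
 * xarrays at attach time so later TLB flushes can find it.
 */
static int example_track_pasid(struct dmar_domain *domain,
			       struct device_domain_info *info,
			       ioasid_t pasid)
{
	struct pasid_info *pinfo;
	int ret;

	pinfo = kzalloc(sizeof(*pinfo), GFP_KERNEL);
	if (!pinfo)
		return -ENOMEM;

	/* Per-device tracking drives PASID-based devTLB invalidation. */
	ret = xa_insert(&info->pasids, pasid, pinfo, GFP_KERNEL);
	if (ret)
		goto free;

	/* Per-domain tracking drives PASID-based IOTLB invalidation. */
	ret = xa_insert(&domain->pasids, pasid, pinfo, GFP_KERNEL);
	if (ret)
		goto erase;

	return 0;
erase:
	xa_erase(&info->pasids, pasid);
free:
	kfree(pinfo);
	return ret;
}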

Signed-off-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 194 +++-
 include/linux/intel-iommu.h |  12 ++-
 2 files changed, 202 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 881f8361eca2..9267194eaed3 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1622,20 +1622,48 @@ static void __iommu_flush_dev_iotlb(struct 
device_domain_info *info,
   qdep, addr, mask);
 }
 
+static void __iommu_flush_dev_piotlb(struct device_domain_info *info,
+   u64 address,
+ioasid_t pasid, unsigned int mask)
+{
+   u16 sid, qdep;
+
+   if (!info || !info->ats_enabled)
+   return;
+
+   sid = info->bus << 8 | info->devfn;
+   qdep = info->ats_qdep;
+   qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
+pasid, qdep, address, mask);
+}
+
 static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
  u64 addr, unsigned mask)
 {
unsigned long flags;
struct device_domain_info *info;
struct subdev_domain_info *sinfo;
+   unsigned long pasid;
+   struct pasid_info *pinfo;
 
if (!domain->has_iotlb_device)
return;
 
	spin_lock_irqsave(&device_domain_lock, flags);
-   list_for_each_entry(info, &domain->devices, link)
-   __iommu_flush_dev_iotlb(info, addr, mask);
-
+   list_for_each_entry(info, &domain->devices, link) {
+   /*
+* We cannot use PASID based devTLB invalidation on RID2PASID
+* Device does not understand RID2PASID/0. This is different
+* than IOTLB invalidation where RID2PASID is also used for
+* tagging.
+*/
+   xa_for_each(&info->pasids, pasid, pinfo) {
+   if (!pasid)
+   __iommu_flush_dev_iotlb(info, addr, mask);
+   else
+   __iommu_flush_dev_piotlb(info, addr, pasid, 
mask);
+   }
+   }
	list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
info = get_domain_info(sinfo->pdev);
__iommu_flush_dev_iotlb(info, addr, mask);
@@ -1648,6 +1676,8 @@ static void domain_flush_piotlb(struct intel_iommu *iommu,
u64 addr, unsigned long npages, bool ih)
 {
u16 did = domain->iommu_did[iommu->seq_id];
+   struct pasid_info *pinfo;
+   unsigned long pasid;
 
if (domain->default_pasid)
qi_flush_piotlb(iommu, did, domain->default_pasid,
@@ -1655,6 +1685,21 @@ static void domain_flush_piotlb(struct intel_iommu 
*iommu,
 
if (!list_empty(>devices))
qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
+
+   if (list_empty(&domain->devices) || xa_empty(&domain->pasids))
+   return;
+
+   /*
+* Flush IOTLBs for all the PASIDs attached to this domain, RID2PASID
+* included.
+* TODO: If there are many PASIDs, we may resort to flush with
+* domain ID which may have performance benefits due to fewer
+* invalidation descriptors. VM exits may be reduced when running on
+* vIOMMU. The current use cases utilize no more than 2 PASIDs per
+* device, i.e. RID2PASID and a kernel DMA API PASID.
+*/
+   xa_for_each(&domain->pasids, pasid, pinfo)
+   qi_flush_piotlb(iommu, did, pasid, addr, npages, ih);
 }
 
 static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
@@ -1902,6 +1947,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
domain->has_iotlb_device = false;
	INIT_LIST_HEAD(&domain->devices);
	INIT_LIST_HEAD(&domain->subdevices);
+   xa_init(&domain->pasids);
 
return domain;
 }
@@ -2556,6 +2602,144 @@ static bool dev_is_real_dma_subdevice(struct device 
*dev)
   pci_real_dma_dev(to_pci_dev(dev)) != to_pci_dev(dev);
 }
 
+
+static bool is_device_domain_attached(struct dmar_domain *dmar_domain,
+ struct device *dev)
+{
+   struct device_domain_info *info;
+
	list_for_each_entry(info, &dmar_domain->devices, link) {
+   if (info->dev == dev)
+  

[PATCH v2 8/8] iommu: Remove unused driver data in sva_bind_device

2022-03-14 Thread Jacob Pan
No one is using drvdata for sva_bind_device after kernel SVA support is
removed from VT-d driver. Remove the drvdata parameter as well.

Signed-off-by: Jacob Pan 
---
 drivers/dma/idxd/cdev.c | 2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 ++---
 drivers/iommu/intel/svm.c   | 9 -
 drivers/iommu/iommu.c   | 4 ++--
 drivers/misc/uacce/uacce.c  | 2 +-
 include/linux/intel-iommu.h | 3 +--
 include/linux/iommu.h   | 9 +++--
 8 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index b9b2b4a4124e..312ec37ebf91 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -100,7 +100,7 @@ static int idxd_cdev_open(struct inode *inode, struct file 
*filp)
filp->private_data = ctx;
 
if (device_pasid_enabled(idxd)) {
-   sva = iommu_sva_bind_device(dev, current->mm, NULL);
+   sva = iommu_sva_bind_device(dev, current->mm);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a737ba5f727e..eb2f5cb0701a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -354,7 +354,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
 }
 
 struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
 {
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..d2ba86470c42 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -754,8 +754,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master 
*master);
 int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
 int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
 bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master);
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
-   void *drvdata);
+struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
 void arm_smmu_sva_unbind(struct iommu_sva *handle);
 u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
 void arm_smmu_sva_notifier_synchronize(void);
@@ -791,7 +790,7 @@ static inline bool arm_smmu_master_iopf_supported(struct 
arm_smmu_master *master
 }
 
 static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
 {
return ERR_PTR(-ENODEV);
 }
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 37d6218f173b..94deb58375f5 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -500,8 +500,7 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
return ret;
 }
 
-static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
-unsigned int flags)
+static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm)
 {
ioasid_t max_pasid = dev_is_pci(dev) ?
pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
@@ -1002,20 +1001,20 @@ static irqreturn_t prq_event_thread(int irq, void *d)
return IRQ_RETVAL(handled);
 }
 
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, 
void *drvdata)
+struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm)
 {
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
struct iommu_sva *sva;
int ret;
 
	mutex_lock(&pasid_mutex);
-   ret = intel_svm_alloc_pasid(dev, mm, flags);
+   ret = intel_svm_alloc_pasid(dev, mm);
if (ret) {
mutex_unlock(_mutex);
return ERR_PTR(ret);
}
 
-   sva = intel_svm_bind_mm(iommu, dev, mm, flags);
+   sva = intel_svm_bind_mm(iommu, dev, mm);
if (IS_ERR_OR_NULL(sva))
intel_svm_free_pasid(mm);
	mutex_unlock(&pasid_mutex);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 107dcf5938d6..fef34879bc0c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3049,7 +3049,7 @@ EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
  * On error, returns an ERR_PTR value.
  */
 struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+iommu_sva_bind_device

[PATCH v2 2/8] iommu: Add attach/detach_dev_pasid domain ops

2022-03-14 Thread Jacob Pan
From: Lu Baolu 

An IOMMU domain represents an address space which can be attached by
devices that perform DMA within the domain. However, for platforms with
PASID capability, the domain attachment needs to be handled at the
device+PASID level. There can be multiple PASIDs within a device and
multiple devices attached to a given domain.
This patch introduces a new IOMMU op which supports device, PASID, and
IOMMU domain attachment. The immediate use case is for PASID-capable
devices to perform DMA under DMA APIs.
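
For reference, a minimal sketch of how a caller is expected to invoke the
new ops; the concrete consumer, iommu_enable_pasid_dma(), is added in a
later patch of this series, and the function name here is illustrative:

/* Illustrative caller of the new attach_dev_pasid op. */
static int example_attach_pasid(struct device *dev, ioasid_t pasid)
{
	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);

	if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
		return -ENODEV;

	return dom->ops->attach_dev_pasid(dom, dev, pasid);
}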

Signed-off-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 include/linux/iommu.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 369f05c2a4e2..fde5b933dbe3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -227,6 +227,8 @@ struct iommu_iotlb_gather {
  * @aux_get_pasid: get the pasid given an aux-domain
  * @sva_bind: Bind process address space to device
  * @sva_unbind: Unbind process address space from device
+ * @attach_dev_pasid: attach an iommu domain to a pasid of device
+ * @detach_dev_pasid: detach an iommu domain from a pasid of device
  * @sva_get_pasid: Get PASID associated to a SVA handle
  * @page_response: handle page request response
  * @cache_invalidate: invalidate translation caches
@@ -296,6 +298,10 @@ struct iommu_ops {
struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
  void *drvdata);
void (*sva_unbind)(struct iommu_sva *handle);
+   int (*attach_dev_pasid)(struct iommu_domain *domain,
+   struct device *dev, ioasid_t id);
+   void (*detach_dev_pasid)(struct iommu_domain *domain,
+struct device *dev, ioasid_t id);
u32 (*sva_get_pasid)(struct iommu_sva *handle);
 
int (*page_response)(struct device *dev,
-- 
2.25.1



[PATCH v2 5/8] iommu: Add PASID support for DMA mapping API users

2022-03-14 Thread Jacob Pan
The DMA mapping API is the de facto standard for in-kernel DMA. It operates
on a per-device/RID basis, which is not PASID-aware.

For some modern devices, such as the Intel Data Streaming Accelerator, a
PASID is required for certain work submissions. To allow such devices to
use the DMA mapping API, we need the following functionality:
1. Provide the device a way to retrieve a PASID for work submission within
the kernel
2. Enable the kernel PASID on the IOMMU for the device
3. Attach the kernel PASID to the device's default DMA domain, whether it
maps IOVA or physical addresses in the pass-through case.

This patch introduces a driver-facing API that enables DMA API PASID
usage. Once enabled, device drivers can continue to use the DMA APIs as
is. There is no difference in dma_handle between DMA with and without
PASID.
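
To illustrate the expected teardown order on the driver side (a sketch with
hypothetical names, matching the requirement documented in
iommu_disable_pasid_dma() below):

/*
 * Teardown sketch: all DMA tagged with the kernel PASID must be quiesced
 * before the PASID is disabled.
 */
static void my_teardown_kernel_dma(struct device *dev, ioasid_t pasid,
				   dma_addr_t handle, size_t len)
{
	/* 1. Device-specific: stop the engine and drain outstanding work. */
	/* 2. Release the mapping. */
	dma_unmap_single(dev, handle, len, DMA_BIDIRECTIONAL);
	/* 3. Only now detach and free the kernel PASID. */
	iommu_disable_pasid_dma(dev, pasid);
}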

Signed-off-by: Jacob Pan 
---
 drivers/iommu/dma-iommu.c | 65 +++
 include/linux/dma-iommu.h |  7 +
 include/linux/iommu.h |  9 ++
 3 files changed, 81 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b22034975301..d0ff1a34b1b6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -39,6 +39,8 @@ enum iommu_dma_cookie_type {
IOMMU_DMA_MSI_COOKIE,
 };
 
+static DECLARE_IOASID_SET(iommu_dma_pasid);
+
 struct iommu_dma_cookie {
enum iommu_dma_cookie_type  type;
union {
@@ -370,6 +372,69 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
domain->iova_cookie = NULL;
 }
 
+/**
+ * iommu_enable_pasid_dma - Enable in-kernel DMA request with PASID
+ * @dev:   Device to be enabled
+ *
+ * DMA request with PASID will be mapped the same way as the legacy DMA.
+ * If the device is in pass-through, PASID will also pass-through. If the
+ * device is in IOVA map, the supervisor PASID will point to the same IOVA
+ * page table.
+ *
+ * Return: 0 on success, with @pasid set to the kernel PASID to use for DMA,
+ * or an error code on failure
+ */
+int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid)
+{
+   struct iommu_domain *dom;
+   ioasid_t id, max;
+   int ret;
+
+   dom = iommu_get_domain_for_dev(dev);
+   if (!dom || !dom->ops || !dom->ops->attach_dev_pasid)
+   return -ENODEV;
+   max = iommu_get_dev_pasid_max(dev);
+   if (!max)
+   return -EINVAL;
+
+   id = ioasid_alloc(&iommu_dma_pasid, 1, max, dev);
+   if (id == INVALID_IOASID)
+   return -ENOMEM;
+
+   ret = dom->ops->attach_dev_pasid(dom, dev, id);
+   if (ret) {
+   ioasid_put(id);
+   return ret;
+   }
+   *pasid = id;
+
+   return ret;
+}
+EXPORT_SYMBOL(iommu_enable_pasid_dma);
+
+/**
+ * iommu_disable_pasid_dma - Disable in-kernel DMA request with PASID
+ * @dev:   Device's PASID DMA to be disabled
+ *
+ * It is the device driver's responsibility to ensure no more incoming DMA
+ * requests with the kernel PASID before calling this function. IOMMU driver
+ * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and
+ * drained.
+ */
+void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid)
+{
+   struct iommu_domain *dom;
+
+   /* TODO: check the given PASID is within the ioasid_set */
+   dom = iommu_get_domain_for_dev(dev);
+   if (!dom->ops->detach_dev_pasid)
+   return;
+   dom->ops->detach_dev_pasid(dom, dev, pasid);
+   ioasid_put(pasid);
+}
+EXPORT_SYMBOL(iommu_disable_pasid_dma);
+
 /**
  * iommu_dma_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 24607dc3c2ac..e6cb9b52a420 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -18,6 +18,13 @@ int iommu_get_dma_cookie(struct iommu_domain *domain);
 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
 void iommu_put_dma_cookie(struct iommu_domain *domain);
 
+/*
+ * For devices that can do DMA request with PASID, setup a system PASID.
+ * Address modes (IOVA, PA) are selected by the platform code.
+ */
+int iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid);
+void iommu_disable_pasid_dma(struct device *dev, ioasid_t pasid);
+
 /* Setup call for arch DMA mapping code */
 void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
 int iommu_dma_init_fq(struct iommu_domain *domain);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fde5b933dbe3..fb011722e4f8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -395,6 +395,15 @@ static inline void iommu_set_dev_pasid_max(struct device 
*dev,
 
param->pasid_max = max;
 }
+static inline ioasid_t iommu_get_dev_pasid_max(struct device *dev)
+{
+   struct dev_iommu *param = dev->iommu;
+
+   if (WARN_ON(!param))
+   return 0;
+
+   return param->pas

[PATCH v2 0/8] Enable PASID for DMA API users

2022-03-14 Thread Jacob Pan
Some modern accelerators such as Intel's Data Streaming Accelerator (DSA)
require PASID in DMA requests to be operational. Specifically, the work
submissions with ENQCMD on shared work queues require PASIDs. The use cases
include both user DMA with shared virtual addressing (SVA) and in-kernel
DMA similar to legacy DMA w/o PASID. Here we address the latter.

The DMA mapping API is the de facto standard for in-kernel DMA. However, it
operates on a per-device or Requester ID (RID) basis, which is not
PASID-aware. To let devices that rely on PASIDs leverage the DMA API, this
patchset introduces the following APIs

1. A driver facing API that enables DMA API PASID usage:
iommu_enable_pasid_dma(struct device *dev, ioasid_t *pasid);

2. An IOMMU op that allows attaching device-domain-PASID generically (will
be used beyond DMA API PASID support)

Once PASID DMA is enabled and attached to the appropriate IOMMU domain,
device drivers can continue to use DMA APIs as-is. There is no difference
in the dma_handle returned for DMA with or without PASID. The DMA mapping
performed by the IOMMU is identical for both kinds of requests, whether it
is IOVA or PA in the pass-through case.
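
As an illustration of that point (a hypothetical descriptor layout, not
DSA's actual format):

/* The same dma_handle is programmed either way; only the PASID field differs. */
struct my_desc {		/* hypothetical hardware descriptor */
	u64 src_addr;		/* takes the dma_handle directly */
	u32 pasid;		/* extra field for PASID-tagged DMA */
	u32 flags;
};

static void my_fill_desc(struct my_desc *d, dma_addr_t handle, ioasid_t pasid)
{
	d->src_addr = handle;	/* identical mapping for both cases */
	d->pasid = pasid;
}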

In addition, this set converts the DSA driver's in-kernel DMA with PASID from
the SVA lib to the DMA API. There have been security and functional issues
with the kernel SVA approach:
(https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
The highlights are as follows:
 - The lack of IOTLB synchronization upon kernel page table updates.
   (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
 - Other than slightly more protection, using kernel virtual addresses (KVA)
   has little advantage over physical addresses. There are also no use cases
   yet where DMA engines need kernel virtual addresses for in-kernel DMA.

Subsequently, cleanup is done around the usage of sva_bind_device() for
in-kernel DMA: removing the special-casing code in the VT-d driver and
tightening the SVA lib API.

This work and idea behind it is a collaboration with many people, many
thanks to Baolu Lu, Jason Gunthorpe, Dave Jiang, and others.


ChangeLog:
v2
- Do not reserve a special PASID for DMA API usage. Use IOASID
  allocation instead.
- Introduced a generic device-pasid-domain attachment IOMMU op.
  Replaced the DMA API only IOMMU op.
- Removed supervisor SVA support in VT-d
- Removed unused sva_bind_device parameters
- Use IOMMU specific data instead of struct device to store PASID
  info

Jacob Pan (6):
  iommu/vt-d: Implement device_pasid domain attach ops
  iommu/vt-d: Use device_pasid attach op for RID2PASID
  iommu: Add PASID support for DMA mapping API users
  dmaengine: idxd: Use DMA API for in-kernel DMA with PASID
  iommu/vt-d: Delete supervisor/kernel SVA
  iommu: Remove unused driver data in sva_bind_device

Lu Baolu (2):
  iommu: Assign per device max PASID
  iommu: Add attach/detach_dev_pasid domain ops

 drivers/dma/idxd/cdev.c   |   2 +-
 drivers/dma/idxd/idxd.h   |   1 -
 drivers/dma/idxd/init.c   |  34 +--
 drivers/dma/idxd/sysfs.c  |   7 -
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   5 +-
 drivers/iommu/dma-iommu.c |  65 ++
 drivers/iommu/intel/iommu.c   | 214 --
 drivers/iommu/intel/svm.c |  51 +
 drivers/iommu/iommu.c |   4 +-
 drivers/misc/uacce/uacce.c|   2 +-
 include/linux/dma-iommu.h |   7 +
 include/linux/intel-iommu.h   |  15 +-
 include/linux/iommu.h |  37 ++-
 14 files changed, 338 insertions(+), 108 deletions(-)

-- 
2.25.1



Re: [PATCH v1 09/10] iommu/vt-d: Refactor dmar_insert_one_dev_info()

2022-02-25 Thread Jacob Pan
Hi BaoLu,

On Mon,  7 Feb 2022 14:41:41 +0800, Lu Baolu 
wrote:

>  
> - if (dev && domain_context_mapping(domain, dev)) {
> - dev_err(dev, "Domain context map failed\n");
> - dmar_remove_one_dev_info(dev);
> - return NULL;
> - }
> + /* Setup the context entry for device: */
> + ret = domain_context_mapping(domain, dev);
> + if (ret)
> + goto setup_context_err;
>  
> - return domain;
> + info->domain = domain;
> + list_add_rcu(>link, >devices);
> +
There seems to be an ordering problem. We need to do list_add_rcu()
*before* domain_context_mapping(). Otherwise, while doing context mapping,
the search for dev IOTLB support in iommu_support_dev_iotlb() will fail.
Then an ATS-capable device context will not have the DTE bit set. The result
is a DMAR unrecoverable fault while doing DMA.
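
In other words, something along these lines (a sketch using the names from
the quoted hunk, error handling abbreviated):

	info->domain = domain;
	list_add_rcu(&info->link, &domain->devices);

	/* Now iommu_support_dev_iotlb() can find the device and set DTE. */
	ret = domain_context_mapping(domain, dev);
	if (ret) {
		list_del_rcu(&info->link);
		goto setup_context_err;
	}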


Thanks,

Jacob


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-10 Thread Jacob Pan


On Wed, 9 Feb 2022 19:16:14 -0800, Jacob Pan
 wrote:

> Hi Fenghua,
> 
> On Mon,  7 Feb 2022 15:02:48 -0800, Fenghua Yu 
> wrote:
> 
> > @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device
> > *dev, struct mm_struct *mm, void }
> >  
> > sva = intel_svm_bind_mm(iommu, dev, mm, flags);
> > -   if (IS_ERR_OR_NULL(sva))
> > -   intel_svm_free_pasid(mm);  
> If bind fails, the PASID has no IOMMU nor CPU context. It should be safe
> to free here.
> 
Actually, here we cannot tell if the bind is the first of the mm so I think
this is fine.

Reviewed-by: Jacob Pan 

> Thanks,
> 
> Jacob


Thanks,

Jacob


Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-02-09 Thread Jacob Pan
Hi Fenghua,

On Mon,  7 Feb 2022 15:02:48 -0800, Fenghua Yu  wrote:

> @@ -1047,8 +1040,6 @@ struct iommu_sva *intel_svm_bind(struct device
> *dev, struct mm_struct *mm, void }
>  
>   sva = intel_svm_bind_mm(iommu, dev, mm, flags);
> - if (IS_ERR_OR_NULL(sva))
> - intel_svm_free_pasid(mm);
If bind fails, the PASID has no IOMMU nor CPU context. It should be safe to
free here.

Thanks,

Jacob


Re: [PATCH 1/2] iommu/vt-d: Fix PCI bus rescan device hot add

2022-02-09 Thread Jacob Pan
Hi Joerg,

On Thu, 3 Feb 2022 09:25:58 +0100, Joerg Roedel  wrote:

> On Tue, Feb 01, 2022 at 11:19:18AM -0800, Jacob Pan wrote:
> > Do you mean having an IRQ remapping notifier regardless of whether the
> > IOMMU API is enabled?
> > That makes sense; I will give it a try.
> 
> Yeah, that would be best. I really don't like to use two notifiers just
> to work around ordering issues.
> 
Another option Ashok and I discussed is that we can make the DMAR cache
persist (which should be the case for explicitly listed devices in each DRHD)
across a PCI remove/rescan cycle; then we don't need the DMAR PCI bus
notifier at all.

Since this bug only impacts RCIEP device hotplug, which is not the most
common use case, we have room to look into a proper fix.

> Regards,
> 
>   Joerg


Thanks,

Jacob


Re: [PATCH v1 09/10] iommu/vt-d: Refactor dmar_insert_one_dev_info()

2022-02-07 Thread Jacob Pan
Hi BaoLu,

On Mon,  7 Feb 2022 14:41:41 +0800, Lu Baolu 
wrote:

>  static void intel_iommu_release_device(struct device *dev)
>  {
> - struct intel_iommu *iommu;
> -
> - iommu = device_to_iommu(dev, NULL, NULL);
> - if (!iommu)
> - return;
> -
> - dmar_remove_one_dev_info(dev);
> + struct device_domain_info *info = get_domain_info(dev);
> + unsigned long index = DEVI_IDX(info->segment, info->bus,
> info->devfn); 
> + xa_erase(_domain_array, index);
> + dev_iommu_priv_set(info->dev, NULL);
>   set_dma_ops(dev, NULL);
> + kfree(info);
Now that info and sinfo are under RCU, should we use kfree_rcu?

Thanks,

Jacob


Re: [PATCH 1/2] iommu/vt-d: Fix PCI bus rescan device hot add

2022-02-01 Thread Jacob Pan
Hi Joerg,

On Mon, 31 Jan 2022 16:52:10 +0100, Joerg Roedel  wrote:

> On Mon, Jan 31, 2022 at 01:53:06PM +, Robin Murphy wrote:
> > Indeed I very nearly asked whether we couldn't just call these from
> > intel_iommu_{probe,release}_device() directly, but it looks like they
> > also interact with the interrupt remapping stuff which can be built
> > independently of the IOMMU API :(  
> 
> Okay, but having two notifiers is still ugly. Can we only register a
> notifier when IRQ-remapping is used without IOMMU-API? In this case a
> single notifier be sufficient.
> 
Do you mean having an IRQ remapping notifier regardless of whether the
IOMMU API is enabled? That makes sense; I will give it a try.

> Regards,
> 
> Joerg


Thanks,

Jacob


Re: [PATCH 1/2] iommu/vt-d: Fix PCI bus rescan device hot add

2022-02-01 Thread Jacob Pan
Hi Joerg,

On Sun, 30 Jan 2022 08:43:19 +0100, Joerg Roedel  wrote:

> Hi Jacob, Baolu,
> 
> On Fri, Jan 28, 2022 at 11:10:01AM +0800, Lu Baolu wrote:
> > During PCI bus rescan, adding new devices involve two notifiers.
> > 1. dmar_pci_bus_notifier()
> > 2. iommu_bus_notifier()
> > The current code sets #1 as low priority (INT_MIN) which resulted in #2
> > being invoked first. The result is that struct device pointer cannot be
> > found in DRHD search for the new device's DMAR/IOMMU. Subsequently, the
> > device is put under the "catch-all" IOMMU instead of the correct one.  
> 
> There are actually iommu_ops pointers invoked from iommu_bus_notifier()
> into IOMMU driver code. Can those be used to enforce the ordering in a
> more reliable way?
> 
The race condition is between the PCI/ACPI and IOMMU subsystems; I don't see
how having IOMMU-internal ops can solve the external race. Perhaps we should
merge them into one notifier to have direct control of the ordering; is that
what you are suggesting? It seems to be a good, albeit larger, clean-up I
can look into.
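
Roughly along these lines (illustrative only, add path shown; locking and
error handling as in the existing notifier, and the combined callback name
is hypothetical):

/* Sketch of the "one notifier" idea: DMAR bookkeeping and IOMMU core probe
 * from a single callback so the ordering is explicit.
 */
static int combined_bus_notifier(struct notifier_block *nb,
				 unsigned long action, void *data)
{
	struct pci_dev *pdev = to_pci_dev(data);
	struct dmar_pci_notify_info *info;

	if (action == BUS_NOTIFY_ADD_DEVICE) {
		info = dmar_alloc_pci_notify_info(pdev, action);
		if (info) {
			down_write(&dmar_global_lock);
			dmar_pci_bus_add_dev(info);	/* DRHD scope first */
			up_write(&dmar_global_lock);
			dmar_free_pci_notify_info(info);
		}
		iommu_probe_device(&pdev->dev);		/* then IOMMU core */
	}
	return NOTIFY_OK;
}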

Thanks,

Jacob


Re: [PATCH v2] iommu/vt-d: Fix PCI bus rescan device hot add

2022-01-25 Thread Jacob Pan
Hi all,

Just wondering if there are any other comments? This fixes a
regression that can cause system hang.

On Fri, 14 Jan 2022 00:21:10 -0800, Jacob Pan
 wrote:

> During PCI bus rescan, adding new devices involve two notifiers.
> 1. dmar_pci_bus_notifier()
> 2. iommu_bus_notifier()
> The current code sets #1 as low priority (INT_MIN) which resulted in #2
> being invoked first. The result is that struct device pointer cannot be
> found in DRHD search for the new device's DMAR/IOMMU. Subsequently, the
> device is put under the "catch-all" IOMMU instead of the correct one.
> 
> This could cause system hang when device TLB invalidation is sent to the
> wrong IOMMU. Invalidation timeout error and hard lockup have been
> observed.
> 
> On the reverse direction for device removal, the order should be #2-#1
> such that DMAR cleanup is done after IOMMU.
> 
> This patch fixes the issue by setting proper priorities for
> dmar_pci_bus_notifier around IOMMU bus notifier. DRHD search for a new
> device will find the correct IOMMU. The order with this patch is the
> following:
> 1. dmar_pci_bus_add_dev()
> 2. iommu_probe_device()
> 3. iommu_release_device()
> 4. dmar_pci_bus_remove_dev()
> 
> Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
> Reported-by: Zhang, Bernice 
> Suggested-by: Lu Baolu 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel/dmar.c | 69 --
>  drivers/iommu/iommu.c  |  1 +
>  include/linux/iommu.h  |  1 +
>  3 files changed, 53 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 915bff76fe96..5f4751ba6bb1 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -340,15 +340,19 @@ static inline void vf_inherit_msi_domain(struct
> pci_dev *pdev) dev_set_msi_domain(>dev,
> dev_get_msi_domain(>dev)); }
>  
> -static int dmar_pci_bus_notifier(struct notifier_block *nb,
> +static int dmar_pci_bus_add_notifier(struct notifier_block *nb,
>unsigned long action, void *data)
>  {
>   struct pci_dev *pdev = to_pci_dev(data);
>   struct dmar_pci_notify_info *info;
>  
> - /* Only care about add/remove events for physical functions.
> + if (action != BUS_NOTIFY_ADD_DEVICE)
> + return NOTIFY_DONE;
> +
> + /*
>* For VFs we actually do the lookup based on the corresponding
> -  * PF in device_to_iommu() anyway. */
> +  * PF in device_to_iommu() anyway.
> +  */
>   if (pdev->is_virtfn) {
>   /*
>* Ensure that the VF device inherits the irq domain of
> the @@ -358,13 +362,34 @@ static int dmar_pci_bus_notifier(struct
> notifier_block *nb,
>* from the PF device, but that's yet another x86'sism to
>* inflict on everybody else.
>*/
> - if (action == BUS_NOTIFY_ADD_DEVICE)
> - vf_inherit_msi_domain(pdev);
> + vf_inherit_msi_domain(pdev);
>   return NOTIFY_DONE;
>   }
>  
> - if (action != BUS_NOTIFY_ADD_DEVICE &&
> - action != BUS_NOTIFY_REMOVED_DEVICE)
> + info = dmar_alloc_pci_notify_info(pdev, action);
> + if (!info)
> + return NOTIFY_DONE;
> +
> + down_write(_global_lock);
> + dmar_pci_bus_add_dev(info);
> + up_write(_global_lock);
> + dmar_free_pci_notify_info(info);
> +
> + return NOTIFY_OK;
> +}
> +
> +static struct notifier_block dmar_pci_bus_add_nb = {
> + .notifier_call = dmar_pci_bus_add_notifier,
> + .priority = IOMMU_BUS_NOTIFY_PRIORITY + 1,
> +};
> +
> +static int dmar_pci_bus_remove_notifier(struct notifier_block *nb,
> +  unsigned long action, void *data)
> +{
> + struct pci_dev *pdev = to_pci_dev(data);
> + struct dmar_pci_notify_info *info;
> +
> + if (pdev->is_virtfn || action != BUS_NOTIFY_REMOVED_DEVICE)
>   return NOTIFY_DONE;
>  
>   info = dmar_alloc_pci_notify_info(pdev, action);
> @@ -372,10 +397,7 @@ static int dmar_pci_bus_notifier(struct
> notifier_block *nb, return NOTIFY_DONE;
>  
>   down_write(_global_lock);
> - if (action == BUS_NOTIFY_ADD_DEVICE)
> - dmar_pci_bus_add_dev(info);
> - else if (action == BUS_NOTIFY_REMOVED_DEVICE)
> - dmar_pci_bus_del_dev(info);
> + dmar_pci_bus_del_dev(info);
>   up_write(_global_lock);
>  
>   dmar_free_pci_notify_info(info);
> @@ -383,11 +405,10 @@ static int dmar_pci_bus_notifier(struct
> notifier_block *nb, return NOTIFY_OK;
>  }
>

[PATCH v2] iommu/vt-d: Fix PCI bus rescan device hot add

2022-01-14 Thread Jacob Pan
During PCI bus rescan, adding new devices involves two notifiers.
1. dmar_pci_bus_notifier()
2. iommu_bus_notifier()
The current code sets #1 as low priority (INT_MIN) which resulted in #2
being invoked first. The result is that struct device pointer cannot be
found in DRHD search for the new device's DMAR/IOMMU. Subsequently, the
device is put under the "catch-all" IOMMU instead of the correct one.

This could cause system hang when device TLB invalidation is sent to the
wrong IOMMU. Invalidation timeout error and hard lockup have been observed.

In the reverse direction, for device removal, the order should be #2 then #1
so that the DMAR cleanup is done after the IOMMU cleanup.

This patch fixes the issue by setting proper priorities for
dmar_pci_bus_notifier around IOMMU bus notifier. DRHD search for a new
device will find the correct IOMMU. The order with this patch is the
following:
1. dmar_pci_bus_add_dev()
2. iommu_probe_device()
3. iommu_release_device()
4. dmar_pci_bus_remove_dev()

Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
Reported-by: Zhang, Bernice 
Suggested-by: Lu Baolu 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/dmar.c | 69 --
 drivers/iommu/iommu.c  |  1 +
 include/linux/iommu.h  |  1 +
 3 files changed, 53 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 915bff76fe96..5f4751ba6bb1 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -340,15 +340,19 @@ static inline void vf_inherit_msi_domain(struct pci_dev 
*pdev)
	dev_set_msi_domain(&pdev->dev, dev_get_msi_domain(&physfn->dev));
 }
 
-static int dmar_pci_bus_notifier(struct notifier_block *nb,
+static int dmar_pci_bus_add_notifier(struct notifier_block *nb,
 unsigned long action, void *data)
 {
struct pci_dev *pdev = to_pci_dev(data);
struct dmar_pci_notify_info *info;
 
-   /* Only care about add/remove events for physical functions.
+   if (action != BUS_NOTIFY_ADD_DEVICE)
+   return NOTIFY_DONE;
+
+   /*
 * For VFs we actually do the lookup based on the corresponding
-* PF in device_to_iommu() anyway. */
+* PF in device_to_iommu() anyway.
+*/
if (pdev->is_virtfn) {
/*
 * Ensure that the VF device inherits the irq domain of the
@@ -358,13 +362,34 @@ static int dmar_pci_bus_notifier(struct notifier_block 
*nb,
 * from the PF device, but that's yet another x86'sism to
 * inflict on everybody else.
 */
-   if (action == BUS_NOTIFY_ADD_DEVICE)
-   vf_inherit_msi_domain(pdev);
+   vf_inherit_msi_domain(pdev);
return NOTIFY_DONE;
}
 
-   if (action != BUS_NOTIFY_ADD_DEVICE &&
-   action != BUS_NOTIFY_REMOVED_DEVICE)
+   info = dmar_alloc_pci_notify_info(pdev, action);
+   if (!info)
+   return NOTIFY_DONE;
+
+   down_write(&dmar_global_lock);
+   dmar_pci_bus_add_dev(info);
+   up_write(&dmar_global_lock);
+   dmar_free_pci_notify_info(info);
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block dmar_pci_bus_add_nb = {
+   .notifier_call = dmar_pci_bus_add_notifier,
+   .priority = IOMMU_BUS_NOTIFY_PRIORITY + 1,
+};
+
+static int dmar_pci_bus_remove_notifier(struct notifier_block *nb,
+unsigned long action, void *data)
+{
+   struct pci_dev *pdev = to_pci_dev(data);
+   struct dmar_pci_notify_info *info;
+
+   if (pdev->is_virtfn || action != BUS_NOTIFY_REMOVED_DEVICE)
return NOTIFY_DONE;
 
info = dmar_alloc_pci_notify_info(pdev, action);
@@ -372,10 +397,7 @@ static int dmar_pci_bus_notifier(struct notifier_block *nb,
return NOTIFY_DONE;
 
	down_write(&dmar_global_lock);
-   if (action == BUS_NOTIFY_ADD_DEVICE)
-   dmar_pci_bus_add_dev(info);
-   else if (action == BUS_NOTIFY_REMOVED_DEVICE)
-   dmar_pci_bus_del_dev(info);
+   dmar_pci_bus_del_dev(info);
	up_write(&dmar_global_lock);
 
dmar_free_pci_notify_info(info);
@@ -383,11 +405,10 @@ static int dmar_pci_bus_notifier(struct notifier_block 
*nb,
return NOTIFY_OK;
 }
 
-static struct notifier_block dmar_pci_bus_nb = {
-   .notifier_call = dmar_pci_bus_notifier,
-   .priority = INT_MIN,
+static struct notifier_block dmar_pci_bus_remove_nb = {
+   .notifier_call = dmar_pci_bus_remove_notifier,
+   .priority = IOMMU_BUS_NOTIFY_PRIORITY - 1,
 };
-
 static struct dmar_drhd_unit *
 dmar_find_dmaru(struct acpi_dmar_hardware_unit *drhd)
 {
@@ -835,7 +856,17 @@ int __init dmar_dev_scope_init(void)
 
 void __init dmar_register_bus_notifier(void)
 {
-   bus_register_notifier(&pci_bus_type, &dmar_pci_bus_nb);
+   /*
+* We need two notifiers i

Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-01-14 Thread Jacob Pan
Hi Lu,

On Fri, 14 Jan 2022 11:12:45 +0800, Lu Baolu 
wrote:

> On 1/14/22 11:11 AM, Jacob Pan wrote:
> > On Fri, 14 Jan 2022 08:58:53 +0800, Lu Baolu
> > wrote:
> >   
> >> Hi Jacob,
> >>
> >> On 1/13/22 9:23 PM, Jacob Pan wrote:  
> >>> During PCI bus rescan, adding new devices involve two notifiers.
> >>> 1. dmar_pci_bus_notifier()
> >>> 2. iommu_bus_notifier()
> >>> The current code sets #1 as low priority (INT_MIN) which resulted in
> >>> #2 being invoked first. The result is that struct device pointer
> >>> cannot be found in DRHD search for the new device's DMAR/IOMMU.
> >>> Subsequently, the device is put under the "catch-all" IOMMU instead
> >>> of the correct one.
> >>>
> >>> This could cause system hang when device TLB invalidation is sent to
> >>> the wrong IOMMU. Invalidation timeout error or hard lockup can be
> >>> observed.
> >>>
> >>> This patch fixes the issue by setting a higher priority for
> >>> dmar_pci_bus_notifier. DRHD search for a new device will find the
> >>> correct IOMMU.
> >>>
> >>> Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
> >>> Reported-by: Zhang, Bernice
> >>> Signed-off-by: Jacob Pan
> >>> ---
> >>>drivers/iommu/intel/dmar.c | 2 +-
> >>>1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> >>> index 915bff76fe96..5d07e5b89c2e 100644
> >>> --- a/drivers/iommu/intel/dmar.c
> >>> +++ b/drivers/iommu/intel/dmar.c
> >>> @@ -385,7 +385,7 @@ static int dmar_pci_bus_notifier(struct
> >>> notifier_block *nb,
> >>>static struct notifier_block dmar_pci_bus_nb = {
> >>>   .notifier_call = dmar_pci_bus_notifier,
> >>> - .priority = INT_MIN,
> >>> + .priority = INT_MAX,
> >>>};
> >>>
> >>>static struct dmar_drhd_unit *
> >>>  
> >> Nice catch! dmar_pci_bus_add_dev() should take place*before*
> >> iommu_probe_device(). This change enforces this with a higher notifier
> >> priority for dmar callback.
> >>
> >> Comparably, dmar_pci_bus_del_dev() should take place*after*
> >> iommu_release_device(). Perhaps we can use two notifiers, one for
> >> ADD_DEVICE (with .priority=INT_MAX) and the other for REMOVE_DEVICE
> >> (with .priority=INT_MIN)?
> >>  
> > Since device_to_iommu() lookup in intel_iommu_release_device() only
> > checks if device is under "an" IOMMU, not "the" IOMMU. Then the remove
> > path order is not needed, right?
> > 
> > I know this is not robust, but having so many notifiers with implicit
> > priority is not clean either.
> > 
> > Perhaps, we should have explicit priority defined around iommu_bus
> > notifier? i.e.
> > 
> > @@ -1841,6 +1841,7 @@ static int iommu_bus_init(struct bus_type *bus,
> > const struct iommu_ops *ops) return -ENOMEM;
> >  nb->notifier_call = iommu_bus_notifier;
> > 
> > +   nb->priority = IOMMU_BUS_NOTIFY_PRIORITY;
> > 
> > 
> >   static struct notifier_block dmar_pci_bus_add_nb = {
> >  .notifier_call = dmar_pci_bus_notifier,
> > -   .priority = INT_MIN,
> > +   .priority = IOMMU_BUS_NOTIFY_PRIORITY + 1,
> >   };
> > 
> >   static struct notifier_block dmar_pci_bus_remove_nb = {
> >  .notifier_call = dmar_pci_bus_notifier,
> > -   .priority = INT_MIN,
> > +   .priority = IOMMU_BUS_NOTIFY_PRIORITY - 1,
> >   };  
> 
> IOMMU_BUS_NOTIFY_PRIORITY by default is 0. So you can simply use 1 and
> -1? Adding a comment around it will be helpful.
> 
Yeah, I will add comment.


Thanks,

Jacob


Re: [PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-01-13 Thread Jacob Pan
Hi BaoLu,

On Fri, 14 Jan 2022 08:58:53 +0800, Lu Baolu 
wrote:

> Hi Jacob,
> 
> On 1/13/22 9:23 PM, Jacob Pan wrote:
> > During PCI bus rescan, adding new devices involve two notifiers.
> > 1. dmar_pci_bus_notifier()
> > 2. iommu_bus_notifier()
> > The current code sets #1 as low priority (INT_MIN) which resulted in #2
> > being invoked first. The result is that struct device pointer cannot be
> > found in DRHD search for the new device's DMAR/IOMMU. Subsequently, the
> > device is put under the "catch-all" IOMMU instead of the correct one.
> > 
> > This could cause system hang when device TLB invalidation is sent to the
> > wrong IOMMU. Invalidation timeout error or hard lockup can be observed.
> > 
> > This patch fixes the issue by setting a higher priority for
> > dmar_pci_bus_notifier. DRHD search for a new device will find the
> > correct IOMMU.
> > 
> > Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
> > Reported-by: Zhang, Bernice 
> > Signed-off-by: Jacob Pan 
> > ---
> >   drivers/iommu/intel/dmar.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> > index 915bff76fe96..5d07e5b89c2e 100644
> > --- a/drivers/iommu/intel/dmar.c
> > +++ b/drivers/iommu/intel/dmar.c
> > @@ -385,7 +385,7 @@ static int dmar_pci_bus_notifier(struct
> > notifier_block *nb, 
> >   static struct notifier_block dmar_pci_bus_nb = {
> > .notifier_call = dmar_pci_bus_notifier,
> > -   .priority = INT_MIN,
> > +   .priority = INT_MAX,
> >   };
> >   
> >   static struct dmar_drhd_unit *
> >   
> 
> Nice catch! dmar_pci_bus_add_dev() should take place *before*
> iommu_probe_device(). This change enforces this with a higher notifier
> priority for dmar callback.
> 
> Comparably, dmar_pci_bus_del_dev() should take place *after*
> iommu_release_device(). Perhaps we can use two notifiers, one for
> ADD_DEVICE (with .priority=INT_MAX) and the other for REMOVE_DEVICE
> (with .priority=INT_MIN)?
> 

Since the device_to_iommu() lookup in intel_iommu_release_device() only
checks whether the device is under "an" IOMMU, not "the" IOMMU, the remove
path ordering is not needed, right?

I know this is not robust, but having so many notifiers with implicit
priority is not clean either.

Perhaps, we should have explicit priority defined around iommu_bus
notifier? i.e.

@@ -1841,6 +1841,7 @@ static int iommu_bus_init(struct bus_type *bus, const struct iommu_ops *ops)
		return -ENOMEM;

	nb->notifier_call = iommu_bus_notifier;
+	nb->priority = IOMMU_BUS_NOTIFY_PRIORITY;

 static struct notifier_block dmar_pci_bus_add_nb = {  
.notifier_call = dmar_pci_bus_notifier,
-   .priority = INT_MIN,   
+   .priority = IOMMU_BUS_NOTIFY_PRIORITY + 1,   
 };

 static struct notifier_block dmar_pci_bus_remove_nb = {  
.notifier_call = dmar_pci_bus_notifier,
-   .priority = INT_MIN,   
+   .priority = IOMMU_BUS_NOTIFY_PRIORITY - 1,   
 };   
   

> Best regards,
> baolu


Thanks,

Jacob


[PATCH] iommu/vt-d: Fix PCI bus rescan device hot add

2022-01-13 Thread Jacob Pan
During PCI bus rescan, adding new devices involves two notifiers.
1. dmar_pci_bus_notifier()
2. iommu_bus_notifier()
The current code sets #1 as low priority (INT_MIN) which resulted in #2
being invoked first. The result is that struct device pointer cannot be
found in DRHD search for the new device's DMAR/IOMMU. Subsequently, the
device is put under the "catch-all" IOMMU instead of the correct one.

This could cause system hang when device TLB invalidation is sent to the
wrong IOMMU. Invalidation timeout error or hard lockup can be observed.

This patch fixes the issue by setting a higher priority for
dmar_pci_bus_notifier. DRHD search for a new device will find the
correct IOMMU.

Fixes: 59ce0515cdaf ("iommu/vt-d: Update DRHD/RMRR/ATSR device scope")
Reported-by: Zhang, Bernice 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 915bff76fe96..5d07e5b89c2e 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -385,7 +385,7 @@ static int dmar_pci_bus_notifier(struct notifier_block *nb,
 
 static struct notifier_block dmar_pci_bus_nb = {
.notifier_call = dmar_pci_bus_notifier,
-   .priority = INT_MIN,
+   .priority = INT_MAX,
 };
 
 static struct dmar_drhd_unit *
-- 
2.25.1



Re: [PATCH 3/4] iommu/vt-d: Support PASID DMA for in-kernel usage

2021-12-10 Thread Jacob Pan
Hi Jason,

On Fri, 10 Dec 2021 13:48:48 -0400, Jason Gunthorpe  wrote:

> On Fri, Dec 10, 2021 at 09:50:25AM -0800, Jacob Pan wrote:
> 
> > > Tying pasid to an iommu_domain is not a good idea. An iommu_domain
> > > represents an I/O address translation table. It could be attached to a
> > > device or a PASID on the device.  
> > 
> > I don;t think we can avoid storing PASID at domain level or the group's
> > default domain. IOTLB flush is per domain. Default domain of DMA type
> > is already tying to PASID0, right?  
> 
> No, it is just wrong.
> 
> If the HW requires a list of everything that is connected to the
> iommu_domain then it's private iommu_domain should have that list.
> 
What I have in this patchset is in the private dmar_domain:

struct dmar_domain {
	...
	u32 kernel_pasid;		/* for in-kernel DMA w/ PASID */
	atomic_t kernel_pasid_user;	/* count of kernel_pasid users */
	struct iommu_domain domain;	/* generic domain data structure for
					   iommu core */
};

Perhaps I am missing the point. "private domain" is still "domain level" as
what I stated. Confused :(

> But it is a *list* not a single PASID.
> 
We could have a list when real use case comes.

> If one device has 10 PASID's pointing to this domain you must flush
> them all if that is what the HW requires.
> 
Yes. My point is that, other than PASID 0 which is a given, we must track the
10 PASIDs to avoid wasted flushes. It also depends on how TLBs are tagged and
what flush granularity is available. But at the API level, should we support
all the cases?

> Jason


Thanks,

Jacob


Re: [PATCH 1/4] ioasid: Reserve a global PASID for in-kernel DMA

2021-12-10 Thread Jacob Pan
Hi Jason,

On Fri, 10 Dec 2021 08:31:09 -0400, Jason Gunthorpe  wrote:

> On Fri, Dec 10, 2021 at 09:06:24AM +, Jean-Philippe Brucker wrote:
> > On Thu, Dec 09, 2021 at 10:14:04AM -0800, Jacob Pan wrote:  
> > > > This looks like we're just one step away from device drivers needing
> > > > multiple PASIDs for kernel DMA so I'm trying to figure out how to
> > > > evolve the API towards that. It's probably as simple as keeping a
> > > > kernel IOASID set at first, but then we'll probably want to
> > > > optimize by having multiple overlapping sets for each device driver
> > > > (all separate from the SVA set).  
> > > Sounds reasonable to start with a kernel set for in-kernel DMA once
> > > we need multiple ones. But I am not sure what *overlapping* sets mean
> > > here, could you explain?  
> > 
> > Given that each device uses a separate PASID table, we could allocate
> > the same set of PASID values for different device drivers. We just need
> > to make sure that those values are different from PASIDs allocated for
> > user SVA.  
> 
> Why does user SVA need global values anyhow?
> 
Currently, we have mm.pasid for user SVA. mm is global. We could have per
device PASID for dedicated devices (not shared across mm's), but that would
make things a lot more complex. I am thinking multiple PASIDs per mm is
needed, right?

For VT-d, the shared workqueue (SWQ) requires global PASIDs in that we
cannot have two processes use the same PASID to submit work on a workqueue
shared by the two processes. Each process's PASID must be unique to the
SWQ's PASID table.

> Jason


Thanks,

Jacob


Re: [PATCH 3/4] iommu/vt-d: Support PASID DMA for in-kernel usage

2021-12-10 Thread Jacob Pan
Hi Lu,

On Fri, 10 Dec 2021 14:46:32 +0800, Lu Baolu 
wrote:

> On 2021/12/10 7:21, Jacob Pan wrote:
> > On Thu, 9 Dec 2021 10:32:43 +0800, Lu Baolu
> > wrote:
> >   
> >> On 12/9/21 3:16 AM, Jacob Pan wrote:  
> >>> Hi Jason,
> >>>
> >>> On Wed, 8 Dec 2021 09:22:55 -0400, Jason Gunthorpe
> >>> wrote:  
> >>>> On Tue, Dec 07, 2021 at 05:47:13AM -0800, Jacob Pan wrote:  
> >>>>> Between DMA requests with and without PASID (legacy), DMA mapping
> >>>>> APIs are used indiscriminately on a device. Therefore, we should
> >>>>> always match the addressing mode of the legacy DMA when enabling
> >>>>> kernel PASID.
> >>>>>
> >>>>> This patch adds support for VT-d driver where the kernel PASID is
> >>>>> programmed to match RIDPASID. i.e. if the device is in pass-through,
> >>>>> the kernel PASID is also in pass-through; if the device is in IOVA
> >>>>> mode, the kernel PASID will also be using the same IOVA space.
> >>>>>
> >>>>> There is additional handling for IOTLB and device TLB flush w.r.t.
> >>>>> the kernel PASID. On VT-d, PASID-selective IOTLB flush is also on a
> >>>>> per-domain basis; whereas device TLB flush is per device. Note that
> >>>>> IOTLBs are used even when devices are in pass-through mode. ATS is
> >>>>> enabled device-wide, but the device drivers can choose to manage ATS
> >>>>> at per PASID level whenever control is available.
> >>>>>
> >>>>> Signed-off-by: Jacob Pan
> >>>>>drivers/iommu/intel/iommu.c | 105
> >>>>> +++- drivers/iommu/intel/pasid.c |
> >>>>> 7 +++ include/linux/intel-iommu.h |   3 +-
> >>>>>3 files changed, 113 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/iommu/intel/iommu.c
> >>>>> b/drivers/iommu/intel/iommu.c index 60253bc436bb..a2ef6b9e4bfc
> >>>>> 100644 +++ b/drivers/iommu/intel/iommu.c
> >>>>> @@ -1743,7 +1743,14 @@ static void domain_flush_piotlb(struct
> >>>>> intel_iommu *iommu, if (domain->default_pasid)
> >>>>> qi_flush_piotlb(iommu, did,
> >>>>> domain->default_pasid, addr, npages, ih);
> >>>>> -
> >>>>> +   if (domain->kernel_pasid && !domain_type_is_si(domain)) {
> >>>>> +   /*
> >>>>> +* REVISIT: we only do PASID IOTLB inval for FL, we
> >>>>> could have SL
> >>>>> +* for PASID in the future such as vIOMMU PT. this
> >>>>> doesn't get hit.
> >>>>> +*/
> >>>>> +   qi_flush_piotlb(iommu, did, domain->kernel_pasid,
> >>>>> +   addr, npages, ih);
> >>>>> +   }
> >>>>> if (!list_empty(>devices))
> >>>>> qi_flush_piotlb(iommu, did, PASID_RID2PASID,
> >>>>> addr, npages, ih); }
> >>>>> @@ -5695,6 +5702,100 @@ static void
> >>>>> intel_iommu_iotlb_sync_map(struct iommu_domain *domain, }
> >>>>>}
> >>>>>
> >>>>> +static int intel_enable_pasid_dma(struct device *dev, u32 pasid)
> >>>>> +{  
> >>>> This seems like completely the wrong kind of op.
> >>>>
> >>>> At the level of the iommu driver things should be iommu_domain
> >>>> centric
> >>>>
> >>>> The op should be
> >>>>
> >>>> int attach_dev_pasid(struct iommu_domain *domain, struct device *dev,
> >>>> ioasid_t pasid)
> >>>>
> >>>> Where 'dev' purpose is to provide the RID
> >>>>
> >>>> The iommu_domain passed in should be the 'default domain' ie the
> >>>> table used for on-demand mapping, or the passthrough page table.
> >>>> 
> >>> Makes sense. DMA API is device centric, iommu API is domain centric.
> >>> It should be the common IOMMU code to get the default domain then
> >>> pass to vendor drivers. Then we can enforce default domain behavior
> >>> across all vendor drivers.
> >>> i.e.  
> >>>   dom 

Re: [PATCH 3/4] iommu/vt-d: Support PASID DMA for in-kernel usage

2021-12-09 Thread Jacob Pan
Hi Lu,

On Thu, 9 Dec 2021 10:32:43 +0800, Lu Baolu 
wrote:

> On 12/9/21 3:16 AM, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Wed, 8 Dec 2021 09:22:55 -0400, Jason Gunthorpe 
> > wrote: 
> >> On Tue, Dec 07, 2021 at 05:47:13AM -0800, Jacob Pan wrote:  
> >>> Between DMA requests with and without PASID (legacy), DMA mapping APIs
> >>> are used indiscriminately on a device. Therefore, we should always
> >>> match the addressing mode of the legacy DMA when enabling kernel
> >>> PASID.
> >>>
> >>> This patch adds support for VT-d driver where the kernel PASID is
> >>> programmed to match RIDPASID. i.e. if the device is in pass-through,
> >>> the kernel PASID is also in pass-through; if the device is in IOVA
> >>> mode, the kernel PASID will also be using the same IOVA space.
> >>>
> >>> There is additional handling for IOTLB and device TLB flush w.r.t. the
> >>> kernel PASID. On VT-d, PASID-selective IOTLB flush is also on a
> >>> per-domain basis; whereas device TLB flush is per device. Note that
> >>> IOTLBs are used even when devices are in pass-through mode. ATS is
> >>> enabled device-wide, but the device drivers can choose to manage ATS
> >>> at per PASID level whenever control is available.
> >>>
> >>> Signed-off-by: Jacob Pan 
> >>>   drivers/iommu/intel/iommu.c | 105
> >>> +++- drivers/iommu/intel/pasid.c |
> >>> 7 +++ include/linux/intel-iommu.h |   3 +-
> >>>   3 files changed, 113 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> >>> index 60253bc436bb..a2ef6b9e4bfc 100644
> >>> +++ b/drivers/iommu/intel/iommu.c
> >>> @@ -1743,7 +1743,14 @@ static void domain_flush_piotlb(struct
> >>> intel_iommu *iommu, if (domain->default_pasid)
> >>>   qi_flush_piotlb(iommu, did, domain->default_pasid,
> >>>   addr, npages, ih);
> >>> -
> >>> + if (domain->kernel_pasid && !domain_type_is_si(domain)) {
> >>> + /*
> >>> +  * REVISIT: we only do PASID IOTLB inval for FL, we
> >>> could have SL
> >>> +  * for PASID in the future such as vIOMMU PT. this
> >>> doesn't get hit.
> >>> +  */
> >>> + qi_flush_piotlb(iommu, did, domain->kernel_pasid,
> >>> + addr, npages, ih);
> >>> + }
> >>>   if (!list_empty(&domain->devices))
> >>>   qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr,
> >>> npages, ih); }
> >>> @@ -5695,6 +5702,100 @@ static void intel_iommu_iotlb_sync_map(struct
> >>> iommu_domain *domain, }
> >>>   }
> >>>   
> >>> +static int intel_enable_pasid_dma(struct device *dev, u32 pasid)
> >>> +{  
> >>
> >> This seems like completely the wrong kind of op.
> >>
> >> At the level of the iommu driver things should be iommu_domain centric
> >>
> >> The op should be
> >>
> >> int attach_dev_pasid(struct iommu_domain *domain, struct device *dev,
> >> ioasid_t pasid)
> >>
> >> Where 'dev' purpose is to provide the RID
> >>
> >> The iommu_domain passed in should be the 'default domain' ie the table
> >> used for on-demand mapping, or the passthrough page table.
> >>  
> > Makes sense. DMA API is device centric, iommu API is domain centric. It
> > should be the common IOMMU code to get the default domain then pass to
> > vendor drivers. Then we can enforce default domain behavior across all
> > vendor drivers.
> > i.e.
> > dom = iommu_get_dma_domain(dev);
> > attach_dev_pasid(dom, dev, pasid);
> >   
> >>> + struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> >>> + struct device_domain_info *info;  
> >>
> >> I don't even want to know why an iommu driver is tracking its own
> >> per-device state. That seems like completely wrong layering.
> >>  
> > This is for IOTLB and deTLB flush. IOTLB is flushed at per domain level,
> > devTLB is per device.
> > 
> > For multi-device groups, this is a need to track how many devices are
> > using the kernel DMA PASID.
> > 
> > Are you suggesting we add the tracking info in the generic layer? i.e.
> &g

Re: [PATCH 4/4] dmaengine: idxd: Use DMA API for in-kernel DMA with PASID

2021-12-09 Thread Jacob Pan
Hi Kevin,

On Thu, 9 Dec 2021 01:48:09 +, "Tian, Kevin" 
wrote:

> > From: Jason Gunthorpe 
> > Sent: Thursday, December 9, 2021 1:51 AM
> >   
> > > > > + /*
> > > > > +  * Try to enable both in-kernel and user DMA request
> > > > > with PASID.
> > > > > +  * PASID is supported unless both user and kernel PASID
> > > > > are
> > > > > +  * supported. Do not fail probe here in that idxd can
> > > > > still be
> > > > > +  * used w/o PASID or IOMMU.
> > > > > +  */
> > > > > + if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) ||
> > > > > + idxd_enable_system_pasid(idxd)) {
> > > > > + dev_warn(dev, "Failed to enable PASID\n");
> > > > > + } else {
> > > > > + set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
> > > > >   }  
> > > > Huh? How can the driver keep going if PASID isn't supported? I
> > > > thought the whole point of this was because the device cannot do
> > > > DMA without PASID at all?  
> > >
> > > There are 2 types of WQ supported with the DSA devices. A dedicated
> > > WQ  
> > type  
> > > and a shared WQ type. The dedicated WQ type can support DMA with and  
> > without  
> > > PASID. The shared wq type must have a PASID to operate. The driver can
> > > support dedicated WQ only without PASID usage when there is no PASID
> > > support.  
> > 
> > Can you add to the cover letter why does the kernel require to use the
> > shared WQ?
> > 
> > Jason  
> 
> Two reasons:
> 
> On native the shared WQ is useful when the kernel wants to offload
> some memory operations (e.g. page-zeroing) to DSA. When #CPUs are
> more than #WQs, this allows per-cpu lock-less submissions using
> ENQCMD(PASID, payload) instruction.
> 
> In guest the virtual DSA HW may only contain a WQ in shared mode
> (unchangeable by the guest) when the host admin wants to share
> the limited WQ resource among many VMs. Then there is no choice
> in guest regardless whether it's for user or kernel controlled DMA.
I will add these to the next cover letter.


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] ioasid: Reserve a global PASID for in-kernel DMA

2021-12-09 Thread Jacob Pan
Hi Jean-Philippe,

On Thu, 9 Dec 2021 11:03:23 +, Jean-Philippe Brucker
 wrote:

> Hi Jacob,
> 
> On Tue, Dec 07, 2021 at 05:47:11AM -0800, Jacob Pan wrote:
> > In-kernel DMA is managed by DMA mapping APIs, which supports per device
> > addressing mode for legacy DMA requests. With the introduction of
> > Process Address Space ID (PASID), device DMA can now target at a finer
> > granularity per PASID + Requester ID (RID).
> > 
> > However, for in-kernel DMA there is no need to differentiate between
> > legacy DMA and DMA with PASID in terms of mapping. DMA address mapping
> > for RID+PASID can be made identical to the RID. The benefit for the
> > drivers is the continuation of DMA mapping APIs without change.
> > 
> > This patch reserves a special IOASID for devices that perform in-kernel
> > DMA requests with PASID. This global IOASID is excluded from the
> > IOASID allocator. The analogous case is PASID #0, a special PASID
> > reserved for DMA requests without PASID (legacy). We could have
> > different kernel PASIDs for individual devices, but for simplicity
> > reasons, a globally reserved one will fit the bill.
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
> >  drivers/iommu/intel/iommu.c | 4 ++--
> >  drivers/iommu/intel/pasid.h | 3 +--
> >  drivers/iommu/intel/svm.c   | 2 +-
> >  drivers/iommu/ioasid.c  | 2 ++
> >  include/linux/ioasid.h  | 4 
> >  6 files changed, 11 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index
> > ee66d1f4cb81..ac79a37ffe06 100644 ---
> > a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++
> > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -329,7 +329,7 @@
> > __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm) return
> > ERR_PTR(-ENOMEM); 
> > /* Allocate a PASID for this mm if necessary */
> > -   ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) -
> > 1);
> > +   ret = iommu_sva_alloc_pasid(mm, IOASID_ALLOC_BASE, (1U <<
> > master->ssid_bits) - 1);  
> 
> I'd rather keep hardware limits as parameters here. PASID#0 is reserved by
> the SMMUv3 hardware so we have to pass at least 1 here, but VT-d could
> change RID_PASID and pass 0. On the other hand IOASID_DMA_PASID depends on
> device drivers needs and is not needed on all systems, so I think could
> stay within the ioasid allocator. Could VT-d do an
> ioasid_alloc()/ioasid_get() to reserve this global PASID, storing it
> under the device_domain_lock?
> 
Yes, this works. We can delegate DMA PASID allocation to vendor drivers. My
proposal here is driven by simplicity.
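
In the VT-d driver that could look roughly like the below (sketch only, not
tested; dma_pasid_lock is a lock I made up for the illustration):

/*
 * Sketch: allocate the kernel DMA PASID on first use, refcount further
 * users with ioasid_get()/ioasid_put().
 */
static DEFINE_MUTEX(dma_pasid_lock);    /* made up for the sketch */
static ioasid_t dma_pasid = INVALID_IOASID;

static ioasid_t intel_iommu_get_dma_pasid(void)
{
        mutex_lock(&dma_pasid_lock);
        if (dma_pasid == INVALID_IOASID)
                dma_pasid = ioasid_alloc(NULL, PASID_MIN,
                                         intel_pasid_max_id - 1, NULL);
        else
                ioasid_get(dma_pasid);
        mutex_unlock(&dma_pasid_lock);

        return dma_pasid;
}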

> This looks like we're just one step away from device drivers needing
> multiple PASIDs for kernel DMA so I'm trying to figure out how to evolve
> the API towards that. It's probably as simple as keeping a kernel IOASID
> set at first, but then we'll probably want to optimize by having multiple
> overlapping sets for each device driver (all separate from the SVA set).
Starting with a kernel IOASID set for in-kernel DMA sounds reasonable once we
need multiple ones. But I am not sure what *overlapping* sets means here;
could you explain?

> 


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/4] iommu: Add PASID support for DMA mapping API users

2021-12-09 Thread Jacob Pan
Hi Ashok,

On Thu, 9 Dec 2021 08:57:15 -0800, "Raj, Ashok"  wrote:

> > Prefixes is disabled
> >  - Root Complexes may optionally support TLPs with PASID TLP Prefixes.
> > The mechanism used to detect whether a Root Complex supports the PASID
> > TLP Prefix is implementation specific  
> 
> Isn't implementation specific mechanism is IOMMU?
> 
I agree. In the case of VT-d it would be the ecap.pasid capability bit.
> > "
> > For all practical purposes, why would someone sets up PASID for DMA
> > just to be ignored? An IOMMU interface makes sense to me.
> >   
> > > Yes, exactly. Imagining in the VM guest environment, do we require a
> > > vIOMMU for this functionality? vIOMMU is not performance friendly if
> > > we put aside the security considerations.
> > >   
> > The primary use case for accelerators to use in-kernel DMA will be in
> > pass-through mode. vIOMMU should be able to do PT with good performance,
> > right? no nesting, IO page faults.  
> 
> But from an enabling perspective when PASID is in use we have to mandate
> either the presence of an IOMMU, or some hypercall that will do the
> required plumbing for PASID isn't it? 
So the point is that we need either a vIOMMU or a virtio-iommu to use PASID?
For the purpose of this discussion, i.e. deciding whether the iommu API or the
DMA API should be used, I am still convinced it should be the iommu API.

Unlike IOMMU on/off for the DMA API (which is transparent to the driver), using
PASID is not transparent. Besides enabling the PASID, the driver has to
program the PASID explicitly. There is no point in doing this dance knowing
the PASID might be ignored.
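
To illustrate the dance with the interface proposed in this set (sketch only;
pdev and desc are made up):

        /* The driver must fetch the PASID and program it into its own HW,
         * e.g. a work descriptor field; this part cannot be transparent. */
        ioasid_t pasid = iommu_enable_pasid_dma(&pdev->dev);

        if (pasid == INVALID_IOASID)
                return -ENODEV; /* fall back to legacy, PASID-less DMA */

        desc->pasid = pasid;    /* made-up device work descriptor */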

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/4] iommu: Add PASID support for DMA mapping API users

2021-12-09 Thread Jacob Pan
Hi Lu,

On Thu, 9 Dec 2021 10:21:38 +0800, Lu Baolu 
wrote:

> On 12/9/21 9:56 AM, Tian, Kevin wrote:
> >> From: Jacob Pan 
> >> Sent: Thursday, December 9, 2021 2:50 AM
> >>  
> >>> Can a device issue DMA requests with PASID even there's no system  
> >> IOMMU  
> >>> or the system IOMMU is disabled?
> >>>  
> >> Good point.
> >> If IOMMU is not enabled, device cannot issue DMA requests with PASID.
> >> This API will not be available. Forgot to add dummy functions to the
> >> header. 
> > 
> > PASID is a PCI thing, not defined by IOMMU.
> > 
> > I think the key is physically if IOMMU is disabled, how will root
> > complex handle a PCI memory request including a PASID TLP prefix? Does
> > it block such request due to no IOMMU to consume PASID or simply ignore
> > PASID and continue routing the request to the memory controller?
> > 
> > If block, then having an iommu interface makes sense.
> > 
> > If ignore, possibly a DMA API call makes more sense instead, implying
> > that this extension can be used even when iommu is disabled.
> > 
> > I think that is what Baolu wants to point out.  
> 
Thanks for clarifying, very good point.
Looking at the PCIe spec, I don't see specific rules for the RC to ignore or
block PASID TLPs when the prefix is not enabled:
"- A Root Complex that supports PASID TLP Prefixes must have a device
specific mechanism for enabling them. By default usage of PASID TLP
Prefixes is disabled
 - Root Complexes may optionally support TLPs with PASID TLP Prefixes. The
mechanism used to detect whether a Root Complex supports the PASID TLP
Prefix is implementation specific
"
For all practical purposes, why would someone set up PASID for DMA just to
have it ignored? An IOMMU interface makes sense to me.

> Yes, exactly. Imagining in the VM guest environment, do we require a
> vIOMMU for this functionality? vIOMMU is not performance friendly if we
> put aside the security considerations.
> 
The primary use case for accelerators doing in-kernel DMA will be
pass-through mode. A vIOMMU should be able to do PT with good performance,
right? No nesting, no IO page faults.

> Best regards,
> baolu


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/4] dmaengine: idxd: Use DMA API for in-kernel DMA with PASID

2021-12-08 Thread Jacob Pan
Hi Jason,

On Wed, 8 Dec 2021 16:30:22 -0400, Jason Gunthorpe  wrote:

> On Wed, Dec 08, 2021 at 11:55:16AM -0800, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Wed, 8 Dec 2021 09:13:58 -0400, Jason Gunthorpe 
> > wrote: 
> > > > This patch utilizes iommu_enable_pasid_dma() to enable DSA to
> > > > perform DMA requests with PASID under the same mapping managed by
> > > > DMA mapping API. In addition, SVA-related bits for kernel DMA are
> > > > removed. As a result, DSA users shall use DMA mapping API to obtain
> > > > DMA handles instead of using kernel virtual addresses.
> > > 
> > > Er, shouldn't this be adding dma_map/etc type calls?
> > > 
> > > You can't really say a driver is using the DMA API without actually
> > > calling the DMA API..  
> > The IDXD driver is not aware of addressing mode, it is up to the user of
> > dmaengine API to prepare the buffer mappings. Here we only set up the
> > PASID such that it can be picked up during DMA work submission. I
> > tested with /drivers/dma/dmatest.c which does dma_map_page(),
> > map_single etc. also tested with other pieces under development.  
> 
> Ignoring the work, doesn't IDXD prepare the DMA queues itself, don't
> those need the DMA API?
> 
Do you mean the WQ completion record address? It is already using the DMA API:
wq->compls = dma_alloc_coherent(dev, wq->compls_size,
                                &wq->compls_addr, GFP_KERNEL);
desc->compl_dma = wq->compls_addr + idxd->data->compl_size * i;

> I'm still very confused how this can radically change from using kSVA
> to DMA API and NOT introduce some more changes than this. They are not
I am guessing the confusion comes from the fact that the user of kSVA was
never merged. We were in the process of upstreaming it but then abandoned it.
Perhaps that is why you don't see the kSVA code being removed?

> the same thing, they do not use the same IOVA's. Did you test this
> with bypass mode off?
Yes, with dmatest. IOVA is the default; I separated out the SATC patch which
will put internal accelerators in bypass mode. It can also be verified from
the iommu debugfs dump that the DMA PASID (2) and PASID 0 (RIDPASID) point to
the same default domain, e.g.
PASID   PASID_table_entry
0   0x000119ed7004:0x0082:0x004d
1   0x0001:0x0081:0x010d
2   0x000119ed7004:0x0082:0x004d


> 
> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/4] dmaengine: idxd: Use DMA API for in-kernel DMA with PASID

2021-12-08 Thread Jacob Pan
Hi Jason,

On Wed, 8 Dec 2021 09:13:58 -0400, Jason Gunthorpe  wrote:

> > This patch utilizes iommu_enable_pasid_dma() to enable DSA to perform
> > DMA requests with PASID under the same mapping managed by DMA mapping
> > API. In addition, SVA-related bits for kernel DMA are removed. As a
> > result, DSA users shall use DMA mapping API to obtain DMA handles
> > instead of using kernel virtual addresses.  
> 
> Er, shouldn't this be adding dma_map/etc type calls?
> 
> You can't really say a driver is using the DMA API without actually
> calling the DMA API..
The IDXD driver is not aware of the addressing mode; it is up to the user of
the dmaengine API to prepare the buffer mappings. Here we only set up the
PASID such that it can be picked up during DMA work submission. I tested with
drivers/dma/dmatest.c, which does dma_map_page(), dma_map_single(), etc., and
also tested with other pieces under development.
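
i.e. the client side stays plain DMA API, roughly (shown only to illustrate
that nothing changes for the client):

        /* The same dma_addr_t works whether or not the device tags the
         * request with the kernel DMA PASID. */
        dma_addr_t src = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

        if (dma_mapping_error(dev, src))
                return -ENOMEM;

        /* ... build and submit the dmaengine descriptor using 'src' ... */

        dma_unmap_single(dev, src, len, DMA_TO_DEVICE);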

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/4] iommu/vt-d: Support PASID DMA for in-kernel usage

2021-12-08 Thread Jacob Pan
Hi Jason,

On Wed, 8 Dec 2021 09:22:55 -0400, Jason Gunthorpe  wrote:

> On Tue, Dec 07, 2021 at 05:47:13AM -0800, Jacob Pan wrote:
> > Between DMA requests with and without PASID (legacy), DMA mapping APIs
> > are used indiscriminately on a device. Therefore, we should always match
> > the addressing mode of the legacy DMA when enabling kernel PASID.
> > 
> > This patch adds support for VT-d driver where the kernel PASID is
> > programmed to match RIDPASID. i.e. if the device is in pass-through, the
> > kernel PASID is also in pass-through; if the device is in IOVA mode, the
> > kernel PASID will also be using the same IOVA space.
> > 
> > There is additional handling for IOTLB and device TLB flush w.r.t. the
> > kernel PASID. On VT-d, PASID-selective IOTLB flush is also on a
> > per-domain basis; whereas device TLB flush is per device. Note that
> > IOTLBs are used even when devices are in pass-through mode. ATS is
> > enabled device-wide, but the device drivers can choose to manage ATS at
> > per PASID level whenever control is available.
> > 
> > Signed-off-by: Jacob Pan 
> >  drivers/iommu/intel/iommu.c | 105 +++-
> >  drivers/iommu/intel/pasid.c |   7 +++
> >  include/linux/intel-iommu.h |   3 +-
> >  3 files changed, 113 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 60253bc436bb..a2ef6b9e4bfc 100644
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1743,7 +1743,14 @@ static void domain_flush_piotlb(struct
> > intel_iommu *iommu, if (domain->default_pasid)
> > qi_flush_piotlb(iommu, did, domain->default_pasid,
> > addr, npages, ih);
> > -
> > +   if (domain->kernel_pasid && !domain_type_is_si(domain)) {
> > +   /*
> > +* REVISIT: we only do PASID IOTLB inval for FL, we
> > could have SL
> > +* for PASID in the future such as vIOMMU PT. this
> > doesn't get hit.
> > +*/
> > +   qi_flush_piotlb(iommu, did, domain->kernel_pasid,
> > +   addr, npages, ih);
> > +   }
> > if (!list_empty(&domain->devices))
> > qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr,
> > npages, ih); }
> > @@ -5695,6 +5702,100 @@ static void intel_iommu_iotlb_sync_map(struct
> > iommu_domain *domain, }
> >  }
> >  
> > +static int intel_enable_pasid_dma(struct device *dev, u32 pasid)
> > +{  
> 
> This seems like completely the wrong kind of op.
> 
> At the level of the iommu driver things should be iommu_domain centric
> 
> The op should be
> 
> int attach_dev_pasid(struct iommu_domain *domain, struct device *dev,
> ioasid_t pasid)
> 
> Where 'dev' purpose is to provide the RID
> 
> The iommu_domain passed in should be the 'default domain' ie the table
> used for on-demand mapping, or the passthrough page table.
> 
Makes sense. The DMA API is device centric; the iommu API is domain centric.
The common IOMMU code should get the default domain and then pass it to the
vendor drivers. That way we can enforce default domain behavior across all
vendor drivers.
i.e.
dom = iommu_get_dma_domain(dev);
attach_dev_pasid(dom, dev, pasid);
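
Spelled out a bit more, the common helper could be something like (sketch
only; attach_dev_pasid is the op you suggested, not an existing one):

int iommu_attach_device_pasid(struct device *dev, ioasid_t pasid)
{
        struct iommu_domain *dom = iommu_get_dma_domain(dev);

        if (!dom || !dom->ops->attach_dev_pasid)
                return -ENODEV;

        return dom->ops->attach_dev_pasid(dom, dev, pasid);
}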

> > +   struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> > +   struct device_domain_info *info;  
> 
> I don't even want to know why an iommu driver is tracking its own
> per-device state. That seems like completely wrong layering.
> 
This is for IOTLB and devTLB flush. The IOTLB is flushed at the per-domain
level; the devTLB flush is per device.

For multi-device groups, there is a need to track how many devices are using
the kernel DMA PASID.

Are you suggesting we add the tracking info in the generic layer, i.e. the
iommu_group?

We could also have a generic device domain info to replace what is in the
VT-d and FSL IOMMU drivers, etc., as sketched below.
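
Purely as an illustration of the shape such a generic structure could take
(not existing code):

struct iommu_device_info {
        struct device           *dev;
        struct iommu_domain     *domain;
        ioasid_t                dma_pasid;      /* INVALID_IOASID when unused */
        bool                    ats_enabled;    /* needs per-device devTLB flush */
};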

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/4] iommu: Add PASID support for DMA mapping API users

2021-12-08 Thread Jacob Pan
Hi Lu,

On Wed, 8 Dec 2021 10:31:36 +0800, Lu Baolu 
wrote:

> Hi Jacob,
> 
> On 12/7/21 9:47 PM, Jacob Pan wrote:
> > DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per device/RID basis which is not PASID-aware.
> > 
> > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > required for certain work submissions. To allow such devices use DMA
> > mapping API, we need the following functionalities:
> > 1. Provide device a way to retrieve a kernel PASID for work submission
> > 2. Enable the kernel PASID on the IOMMU
> > 3. Establish address space for the kernel PASID that matches the default
> > domain. Let it be IOVA or physical address in case of pass-through.
> > 
> > This patch introduces a driver facing API that enables DMA API
> > PASID usage. Once enabled, device drivers can continue to use DMA APIs
> > as is. There is no difference in dma_handle between without PASID and
> > with PASID.  
> 
> Can a device issue DMA requests with PASID even there's no system IOMMU
> or the system IOMMU is disabled?
> 
Good point.
If the IOMMU is not enabled, the device cannot issue DMA requests with PASID,
so this API will not be available. I forgot to add the dummy functions to the
header.
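
Roughly, the missing stubs for the !CONFIG_IOMMU_DMA case would be:

static inline ioasid_t iommu_enable_pasid_dma(struct device *dev)
{
        return INVALID_IOASID;
}

static inline int iommu_disable_pasid_dma(struct device *dev)
{
        return -ENODEV;
}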

> Best regards,
> baolu


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/4] Enable PASID for DMA API users

2021-12-08 Thread Jacob Pan
Hi Jason,

Thanks for the quick review.

On Wed, 8 Dec 2021 09:10:38 -0400, Jason Gunthorpe  wrote:

> On Tue, Dec 07, 2021 at 05:47:10AM -0800, Jacob Pan wrote:
> > Modern accelerators such as Intel's Data Streaming Accelerator (DSA) can
> > perform DMA requests with PASID, which is a finer granularity than the
> > device's requester ID(RID). In fact, work submissions on DSA shared work
> > queues require PASID.  
> 
> Lets use plain langauge please:
> 
> DSA HW cannot do DMA from its RID, so always requires a PASID, even
> for kernel controlled DMA.
> 
> To allow it to use the DMA API we must associate a PASID with the
> iommu_domain that the DMA API is already using for the device's RID.
> 
> This way DMA tagged with the PASID will be treated exactly the same as
> DMA originating from the RID.
> 
Exactly, will incorporate in the next version.

> > DMA mapping API is the de facto standard for in-kernel DMA. However, it
> > operates on a per device/RID basis which is not PASID-aware.
> > 
> > This patch introduces the following driver facing API that enables DMA
> > API PASID usage: ioasid_t iommu_enable_pasid_dma(struct device *dev);  
> 
> This is the wrong API, IMHO
> 
> It should be more like
> 
> int iommu_get_dma_api_pasid(struct device *dev, ioasid_t *pasid);
This works. I had ioasid_t *pasid in my previous version but thought we
could simplify the interface since we have the reserved INVALID_IOASID to
report failure.
But it seems to me that _get_ does not convey that this API actually
enables/attaches the kernel DMA PASID. Perhaps call it
iommu_attach_dma_api_pasid(), as you suggested for the ops function?
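
i.e. roughly (just a sketch of the prototypes being discussed):

ioasid_t iommu_attach_dma_api_pasid(struct device *dev);
void iommu_destroy_dma_api_pasid(struct device *dev);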

> void iommu_destroy_dma_api_pasid(struct device *dev);
> 
Sounds good

> > A PASID field is added to struct device for the purposes of storing
> > kernel DMA PASID and flushing device IOTLBs. A separate use case in
> > interrupt  
> 
> And this really should not be touching the struct device at all.
> 
I was thinking that the RID is per device and this PASID == RID, so we could
put it in pci_dev, but there are non-PCI devices that use SSID/PASID.

> At worst the PASID should be stored in the iommu_group.
> 
This also makes sense; the default domain is stored per group. To support
multiple devices per group, we still need a per-device flag for devTLB
flush, right?

i.e. while doing an IOVA unmap, IOTLBs are flushed for all devices, but we
only need to flush the device TLBs for devices that have the kernel DMA PASID.
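
On VT-d the unmap path would then do roughly the below (sketch only;
'pasid_enabled' is the per-device flag being discussed, not existing code,
and addr/order come from the invalidation request):

        list_for_each_entry(info, &domain->devices, link) {
                if (!info->ats_enabled || !info->pasid_enabled)
                        continue;
                qi_flush_dev_iotlb_pasid(info->iommu, info->sid, info->pfsid,
                                         domain->kernel_pasid, info->ats_qdep,
                                         addr, order);
        }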

> > message store (IMS) also hinted adding a PASID field to struct device.
> > https://lore.kernel.org/all/87pmx73tfw@nanos.tec.linutronix.de/
> > IMS virtualization and DMA API does not overlap.  
> 
> This is under debate, I'm skeptical it will happen considering the new
> direction for this work.
> 
Good to know, thanks.

> > Once enabled, device drivers can continue to use DMA APIs as-is. There
> > is no difference in terms of mapping in dma_handle between without
> > PASID and with PASID.  The DMA mapping performed by IOMMU will be
> > identical for both requests with and without PASID (legacy), let it be
> > IOVA or PA in case of pass-through.  
> 
> In other words all this does is connect the PASID to the normal
> DMA-API owned iommu_domain.
> 
Exactly! will incorporate. 


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/4] dmaengine: idxd: Use DMA API for in-kernel DMA with PASID

2021-12-08 Thread Jacob Pan
Hi Vinod,

On Wed, 8 Dec 2021 10:26:22 +0530, Vinod Koul  wrote:

> Pls resend collecting acks. I dont have this in my queue
Will do. Sorry I missed the dmaengine list.

Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/4] iommu: Add PASID support for DMA mapping API users

2021-12-07 Thread Jacob Pan
The DMA mapping API is the de facto standard for in-kernel DMA. It operates
on a per-device/RID basis, which is not PASID-aware.

For some modern devices, such as the Intel Data Streaming Accelerator, PASID
is required for certain work submissions. To allow such devices to use the
DMA mapping API, we need the following functionalities:
1. Provide device a way to retrieve a kernel PASID for work submission
2. Enable the kernel PASID on the IOMMU
3. Establish address space for the kernel PASID that matches the default
   domain. Let it be IOVA or physical address in case of pass-through.

This patch introduces a driver-facing API that enables DMA API PASID usage.
Once enabled, device drivers can continue to use the DMA APIs as is. There is
no difference in the dma_handle between requests without PASID and with
PASID.

To manage device IOTLB flush at PASID level, this patch also introduces
a .pasid field to struct device. This also serves as a flag indicating
whether PASID is being used for the device to perform in-kernel DMA.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/dma-iommu.c | 71 +++
 include/linux/device.h|  1 +
 include/linux/dma-iommu.h |  7 
 include/linux/iommu.h |  4 +++
 4 files changed, 83 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b42e38a0dbe2..8855d5e99d8e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -167,6 +167,77 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
domain->iova_cookie = NULL;
 }
 
+/**
+ * iommu_enable_pasid_dma --Enable in-kernel DMA request with PASID
+ * @dev:   Device to be enabled
+ *
+ * DMA request with PASID will be mapped the same way as the legacy DMA.
+ * If the device is in pass-through, PASID will also pass-through. If the
+ * device is in IOVA map, the supervisor PASID will point to the same IOVA
+ * page table.
+ *
+ * @return the kernel PASID to be used for DMA or INVALID_IOASID on failure
+ */
+ioasid_t iommu_enable_pasid_dma(struct device *dev)
+{
+   struct iommu_domain *dom;
+
+   if (dev->pasid) {
+   dev_err(dev, "PASID DMA already enabled\n");
+   return IOASID_DMA_PASID;
+   }
+   dom = iommu_get_domain_for_dev(dev);
+
+   if (!dom) {
+   dev_err(dev, "No IOMMU domain\n");
+   return INVALID_IOASID;
+   }
+
+   /*
+* Use the reserved kernel PASID for all devices. For now,
+* there is no need to have different PASIDs for in-kernel use.
+*/
+   if (!dom->ops->enable_pasid_dma || dom->ops->enable_pasid_dma(dev, 
IOASID_DMA_PASID))
+   return INVALID_IOASID;
+   /* Used for device IOTLB flush */
+   dev->pasid = IOASID_DMA_PASID;
+
+   return IOASID_DMA_PASID;
+}
+EXPORT_SYMBOL(iommu_enable_pasid_dma);
+
+/**
+ * iommu_disable_pasid_dma --Disable in-kernel DMA request with PASID
+ * @dev:   Device's PASID DMA to be disabled
+ *
+ * It is the device driver's responsibility to ensure no more incoming DMA
+ * requests with the kernel PASID before calling this function. IOMMU driver
+ * ensures PASID cache, IOTLBs related to the kernel PASID are cleared and
+ * drained.
+ *
+ * @return 0 on success or error code on failure
+ */
+int iommu_disable_pasid_dma(struct device *dev)
+{
+   struct iommu_domain *dom;
+   int ret = 0;
+
+   if (!dev->pasid) {
+   dev_err(dev, "PASID DMA not enabled\n");
+   return -ENODEV;
+   }
+   dom = iommu_get_domain_for_dev(dev);
+   if (!dom->ops->disable_pasid_dma)
+   return -ENOTSUPP;
+
+   ret = dom->ops->disable_pasid_dma(dev);
+   if (!ret)
+   dev->pasid = 0;
+
+   return ret;
+}
+EXPORT_SYMBOL(iommu_disable_pasid_dma);
+
 /**
  * iommu_dma_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
diff --git a/include/linux/device.h b/include/linux/device.h
index e270cb740b9e..8afa033b8b0b 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -559,6 +559,7 @@ struct device {
void(*release)(struct device *dev);
struct iommu_group  *iommu_group;
struct dev_iommu*iommu;
+   u32 pasid;  /* For in-kernel DMA w/ PASID */
 
enum device_removable   removable;
 
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 24607dc3c2ac..298b31e3a007 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -18,6 +18,13 @@ int iommu_get_dma_cookie(struct iommu_domain *domain);
 int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
 void iommu_put_dma_cookie(struct iommu_domain *domain);
 
+/*
+ * For devices that can do DMA request with PASID, setup a system PASID.
+ * Address modes (IOVA, PA) are selected by the platform code.
+ */
+ioasid_t iommu_enable_pasid_dma(struct device *dev

[PATCH 1/4] ioasid: Reserve a global PASID for in-kernel DMA

2021-12-07 Thread Jacob Pan
In-kernel DMA is managed by the DMA mapping APIs, which support a per-device
addressing mode for legacy DMA requests. With the introduction of
Process Address Space ID (PASID), device DMA can now target at a finer
granularity per PASID + Requester ID (RID).

However, for in-kernel DMA there is no need to differentiate between
legacy DMA and DMA with PASID in terms of mapping. DMA address mapping
for RID+PASID can be made identical to the RID. The benefit for the
drivers is the continuation of DMA mapping APIs without change.

This patch reserves a special IOASID for devices that perform in-kernel
DMA requests with PASID. This global IOASID is excluded from the
IOASID allocator. The analogous case is PASID #0, a special PASID
reserved for DMA requests without PASID (legacy). We could have different
kernel PASIDs for individual devices, but for simplicity reasons, a
globally reserved one will fit the bill.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
 drivers/iommu/intel/iommu.c | 4 ++--
 drivers/iommu/intel/pasid.h | 3 +--
 drivers/iommu/intel/svm.c   | 2 +-
 drivers/iommu/ioasid.c  | 2 ++
 include/linux/ioasid.h  | 4 
 6 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index ee66d1f4cb81..ac79a37ffe06 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -329,7 +329,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct 
*mm)
return ERR_PTR(-ENOMEM);
 
/* Allocate a PASID for this mm if necessary */
-   ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
+   ret = iommu_sva_alloc_pasid(mm, IOASID_ALLOC_BASE, (1U << 
master->ssid_bits) - 1);
if (ret)
goto err_free_bond;
 
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6afb4d4e09ef..60253bc436bb 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3253,7 +3253,7 @@ static ioasid_t intel_vcmd_ioasid_alloc(ioasid_t min, 
ioasid_t max, void *data)
 * PASID range. Host can partition guest PASID range based on
 * policies but it is out of guest's control.
 */
-   if (min < PASID_MIN || max > intel_pasid_max_id)
+   if (min < IOASID_ALLOC_BASE || max > intel_pasid_max_id)
return INVALID_IOASID;
 
if (vcmd_alloc_pasid(iommu, &pasid))
@@ -4824,7 +4824,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
u32 pasid;
 
/* No private data needed for the default pasid */
-   pasid = ioasid_alloc(NULL, PASID_MIN,
+   pasid = ioasid_alloc(NULL, IOASID_ALLOC_BASE,
 pci_max_pasids(to_pci_dev(dev)) - 1,
 NULL);
if (pasid == INVALID_IOASID) {
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index d5552e2c160d..c3a714535c03 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -10,8 +10,7 @@
 #ifndef __INTEL_PASID_H
 #define __INTEL_PASID_H
 
-#define PASID_RID2PASID		0x0
-#define PASID_MIN		0x1
+#define PASID_RID2PASID		IOASID_DMA_NO_PASID
 #define PASID_MAX		0x100000
 #define PASID_PTE_MASK 0x3F
 #define PASID_PTE_PRESENT  1
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5b5d69b04fcc..95dcaf78c22c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -511,7 +511,7 @@ static int intel_svm_alloc_pasid(struct device *dev, struct 
mm_struct *mm,
ioasid_t max_pasid = dev_is_pci(dev) ?
pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
 
-   return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
+   return iommu_sva_alloc_pasid(mm, IOASID_ALLOC_BASE, max_pasid - 1);
 }
 
 static void intel_svm_free_pasid(struct mm_struct *mm)
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 50ee27bbd04e..89c6132bf1ec 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -317,6 +317,8 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, 
ioasid_t max,
data->private = private;
refcount_set(&data->refs, 1);
 
+   if (min < IOASID_ALLOC_BASE)
+   min = IOASID_ALLOC_BASE;
/*
 * Custom allocator needs allocator data to perform platform specific
 * operations.
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index e9dacd4b9f6b..4d435cbd48b8 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -6,6 +6,10 @@
 #include 
 
 #define INVALID_IOASID ((ioasid_

[PATCH 0/4] Enable PASID for DMA API users

2021-12-07 Thread Jacob Pan
Modern accelerators such as Intel's Data Streaming Accelerator (DSA) can
perform DMA requests with PASID, which is a finer granularity than the
device's requester ID(RID). In fact, work submissions on DSA shared work
queues require PASID.

DMA mapping API is the de facto standard for in-kernel DMA. However, it
operates on a per device/RID basis which is not PASID-aware.

This patch introduces the following driver facing API that enables DMA API
PASID usage: ioasid_t iommu_enable_pasid_dma(struct device *dev);

A PASID field is added to struct device for the purposes of storing kernel
DMA PASID and flushing device IOTLBs. A separate use case in interrupt
message store (IMS) also hinted adding a PASID field to struct device.
https://lore.kernel.org/all/87pmx73tfw@nanos.tec.linutronix.de/
IMS virtualization and DMA API does not overlap.

Once enabled, device drivers can continue to use the DMA APIs as-is. There is
no difference in the dma_handle mapping between requests without PASID and
with PASID. The DMA mapping performed by the IOMMU will be identical for
requests with and without PASID (legacy), be it IOVA or PA in the case of
pass-through.

In addition, this set converts the current support for in-kernel PASID DMA
from SVA lib to DMA API. There have been security and functional issues
with the kernel SVA approach:
(https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/)
The highlights are as the following:
 - The lack of IOTLB synchronization upon kernel page table updates.
   (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
 - Other than slightly more protection, using kernel virtual addresses (KVA)
 has little advantage over physical addresses.
There are also no use cases yet where DMA engines need kernel virtual
addresses for in-kernel DMA.

Once this set is accepted, more cleanup patches will follow. The plan is to
remove the usage of sva_bind_device() for in-kernel usages and to remove page
request handling and other special cases around kernel SVA in the VT-d driver.



Jacob Pan (4):
  ioasid: Reserve a global PASID for in-kernel DMA
  iommu: Add PASID support for DMA mapping API users
  iommu/vt-d: Support PASID DMA for in-kernel usage
  dmaengine: idxd: Use DMA API for in-kernel DMA with PASID

 .../admin-guide/kernel-parameters.txt |   6 -
 drivers/dma/Kconfig   |  10 --
 drivers/dma/idxd/idxd.h   |   1 -
 drivers/dma/idxd/init.c   |  59 +++---
 drivers/dma/idxd/sysfs.c  |   7 --
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
 drivers/iommu/dma-iommu.c |  71 
 drivers/iommu/intel/iommu.c   | 109 +-
 drivers/iommu/intel/pasid.c   |   7 ++
 drivers/iommu/intel/pasid.h   |   3 +-
 drivers/iommu/intel/svm.c |   2 +-
 drivers/iommu/ioasid.c|   2 +
 include/linux/device.h|   1 +
 include/linux/dma-iommu.h |   7 ++
 include/linux/intel-iommu.h   |   3 +-
 include/linux/ioasid.h|   4 +
 include/linux/iommu.h |   4 +
 17 files changed, 226 insertions(+), 72 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/4] iommu/vt-d: Support PASID DMA for in-kernel usage

2021-12-07 Thread Jacob Pan
Between DMA requests with and without PASID (legacy), DMA mapping APIs
are used indiscriminately on a device. Therefore, we should always match
the addressing mode of the legacy DMA when enabling kernel PASID.

This patch adds support for VT-d driver where the kernel PASID is
programmed to match RIDPASID. i.e. if the device is in pass-through, the
kernel PASID is also in pass-through; if the device is in IOVA mode, the
kernel PASID will also be using the same IOVA space.

There is additional handling for IOTLB and device TLB flush w.r.t. the
kernel PASID. On VT-d, PASID-selective IOTLB flush is also on a
per-domain basis; whereas device TLB flush is per device. Note that
IOTLBs are used even when devices are in pass-through mode. ATS is
enabled device-wide, but the device drivers can choose to manage ATS at
per PASID level whenever control is available.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c | 105 +++-
 drivers/iommu/intel/pasid.c |   7 +++
 include/linux/intel-iommu.h |   3 +-
 3 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 60253bc436bb..a2ef6b9e4bfc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1743,7 +1743,14 @@ static void domain_flush_piotlb(struct intel_iommu 
*iommu,
if (domain->default_pasid)
qi_flush_piotlb(iommu, did, domain->default_pasid,
addr, npages, ih);
-
+   if (domain->kernel_pasid && !domain_type_is_si(domain)) {
+   /*
+* REVISIT: we only do PASID IOTLB inval for FL, we could have 
SL
+* for PASID in the future such as vIOMMU PT. this doesn't get 
hit.
+*/
+   qi_flush_piotlb(iommu, did, domain->kernel_pasid,
+   addr, npages, ih);
+   }
	if (!list_empty(&domain->devices))
qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
 }
@@ -5695,6 +5702,100 @@ static void intel_iommu_iotlb_sync_map(struct 
iommu_domain *domain,
}
 }
 
+static int intel_enable_pasid_dma(struct device *dev, u32 pasid)
+{
+   struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
+   struct device_domain_info *info;
+   unsigned long flags;
+   int ret = 0;
+
+   info = get_domain_info(dev);
+   if (!info)
+   return -ENODEV;
+
+   if (!dev_is_pci(dev) || !sm_supported(info->iommu))
+   return -EINVAL;
+
+   if (intel_iommu_enable_pasid(info->iommu, dev))
+   return -ENODEV;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   spin_lock(&iommu->lock);
+   /*
+* Store PASID for IOTLB flush, but only needed for non-passthrough
+* unmap case. For passthrough, we only need to do IOTLB flush during
+* PASID teardown. Flush covers all devices in the same domain as the
+* domain ID is the same for the same SL.
+*/
+   info->domain->kernel_pasid = pasid;
+
+   /*
+* Tracks how many attached devices are using the kernel PASID. Clear
+* the domain kernel PASID when all users called disable_pasid_dma().
+*/
+   atomic_inc(&info->domain->kernel_pasid_user);
+
+   /*
+* Addressing modes (IOVA vs. PA) is a per device choice made by the
+* platform code. We must treat legacy DMA (request w/o PASID) and
+* DMA w/ PASID identially in terms of mapping. Here we just set up
+* the kernel PASID to match the mapping of RID2PASID/PASID0.
+*/
+   if (hw_pass_through && domain_type_is_si(info->domain)) {
+   ret = intel_pasid_setup_pass_through(info->iommu, info->domain,
+   dev, pasid);
+   if (ret)
+   dev_err(dev, "Failed kernel PASID %d in BYPASS", pasid);
+
+   } else if (domain_use_first_level(info->domain)) {
+   /* We are using FL for IOVA, this is the default option */
+   ret = domain_setup_first_level(info->iommu, info->domain, dev,
+  pasid);
+   if (ret)
+   dev_err(dev, "Failed kernel PASID %d IOVA FL", pasid);
+   } else {
+   ret = intel_pasid_setup_second_level(info->iommu, info->domain,
+dev, pasid);
+   if (ret)
+   dev_err(dev, "Failed kernel SPASID %d IOVA SL", pasid);
+   }
+
+   spin_unlock(&iommu->lock);
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+
+   return ret;
+}
+
+static int intel_disable_pasid_dma(struct device *dev)
+{
+   struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
+   struct device_domain_info *info;
+   unsigned long

[PATCH 4/4] dmaengine: idxd: Use DMA API for in-kernel DMA with PASID

2021-12-07 Thread Jacob Pan
In-kernel DMA should be managed by DMA mapping API. The existing kernel
PASID support is based on the SVA machinery in SVA lib that is intended
for user process SVA. The binding between a kernel PASID and kernel
mapping has many flaws. See discussions in the link below.

This patch utilizes iommu_enable_pasid_dma() to enable DSA to perform DMA
requests with PASID under the same mapping managed by DMA mapping API.
In addition, SVA-related bits for kernel DMA are removed. As a result,
DSA users shall use DMA mapping API to obtain DMA handles instead of
using kernel virtual addresses.

Link: https://lore.kernel.org/linux-iommu/20210511194726.gp1002...@nvidia.com/
Signed-off-by: Jacob Pan 
---
 .../admin-guide/kernel-parameters.txt |  6 --
 drivers/dma/Kconfig   | 10 
 drivers/dma/idxd/idxd.h   |  1 -
 drivers/dma/idxd/init.c   | 59 ++-
 drivers/dma/idxd/sysfs.c  |  7 ---
 5 files changed, 19 insertions(+), 64 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d4..fe73d02c62f3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1751,12 +1751,6 @@
In such case C2/C3 won't be used again.
idle=nomwait: Disable mwait for CPU C-states
 
-   idxd.sva=   [HW]
-   Format: 
-   Allow force disabling of Shared Virtual Memory (SVA)
-   support for the idxd driver. By default it is set to
-   true (1).
-
idxd.tc_override= [HW]
Format: 
Allow override of default traffic class configuration
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 6bcdb4e6a0d1..3b28bd720e7d 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -313,16 +313,6 @@ config INTEL_IDXD_COMPAT
 
  If unsure, say N.
 
-# Config symbol that collects all the dependencies that's necessary to
-# support shared virtual memory for the devices supported by idxd.
-config INTEL_IDXD_SVM
-   bool "Accelerator Shared Virtual Memory Support"
-   depends on INTEL_IDXD
-   depends on INTEL_IOMMU_SVM
-   depends on PCI_PRI
-   depends on PCI_PASID
-   depends on PCI_IOV
-
 config INTEL_IDXD_PERFMON
bool "Intel Data Accelerators performance monitor support"
depends on INTEL_IDXD
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 0cf8d3145870..3155e3a2d3ae 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -262,7 +262,6 @@ struct idxd_device {
struct idxd_wq **wqs;
struct idxd_engine **engines;
 
-   struct iommu_sva *sva;
unsigned int pasid;
 
int num_groups;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 7bf03f371ce1..44633f8113e2 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include "../dmaengine.h"
@@ -28,10 +29,6 @@ MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Intel Corporation");
 MODULE_IMPORT_NS(IDXD);
 
-static bool sva = true;
-module_param(sva, bool, 0644);
-MODULE_PARM_DESC(sva, "Toggle SVA support on/off");
-
 bool tc_override;
 module_param(tc_override, bool, 0644);
 MODULE_PARM_DESC(tc_override, "Override traffic class defaults");
@@ -530,36 +527,22 @@ static struct idxd_device *idxd_alloc(struct pci_dev 
*pdev, struct idxd_driver_d
 
 static int idxd_enable_system_pasid(struct idxd_device *idxd)
 {
-   int flags;
-   unsigned int pasid;
-   struct iommu_sva *sva;
-
-   flags = SVM_FLAG_SUPERVISOR_MODE;
-
-   sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
-   if (IS_ERR(sva)) {
-   dev_warn(&idxd->pdev->dev,
-"iommu sva bind failed: %ld\n", PTR_ERR(sva));
-   return PTR_ERR(sva);
-   }
+   u32 pasid;
 
-   pasid = iommu_sva_get_pasid(sva);
-   if (pasid == IOMMU_PASID_INVALID) {
-   iommu_sva_unbind_device(sva);
+   pasid = iommu_enable_pasid_dma(&idxd->pdev->dev);
+   if (pasid == INVALID_IOASID) {
+   dev_err(&idxd->pdev->dev, "No kernel DMA PASID\n");
return -ENODEV;
}
-
-   idxd->sva = sva;
idxd->pasid = pasid;
-   dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
+
return 0;
 }
 
 static void idxd_disable_system_pasid(struct idxd_device *idxd)
 {
-
-   iommu_sva_unbind_device(idxd->sva);
-   idxd->sva = NULL;
+   iommu_disable_pasid_dma(&idxd->pdev->dev);
+   idxd->pasid = 0;
 }
 
 static int idxd_probe(struct idxd_device

Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-19 Thread Jacob Pan
Hi Jason,

On Tue, 19 Oct 2021 13:57:47 -0300, Jason Gunthorpe  wrote:

> On Tue, Oct 19, 2021 at 09:57:34AM -0700, Jacob Pan wrote:
> > Hi Jason,
> > 
> > On Fri, 15 Oct 2021 08:18:07 -0300, Jason Gunthorpe 
> > wrote: 
> > > On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote:
> > >   
> > > > >   Acquire from the xarray is
> > > > >rcu_lock()
> > > > >ioas = xa_load()
> > > > >if (ioas)
> > > > >   if (down_read_trylock(&ioas->destroying_lock))
> > > > 
> > > > all good suggestions, will refine accordingly. Here destroying_lock
> > > > is a rw_semaphore. right? Since down_read_trylock() accepts a
> > > > rwsem.
> > > 
> > > Yes, you probably need a sleeping lock
> > >   
> > I am not following why we want a sleeping lock inside RCU protected
> > section?  
> 
> trylock is not sleeping
Of course, thanks for clarifying.

> > For ioas, do we really care about the stale data to choose rw_lock vs
> > RCU? Destroying can be kfree_rcu?  
> 
> It needs a hard fence so things don't continue to use the IOS once it
> is destroyed.
I guess RCU can do that as well, and perhaps scale better?
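
For reference, the pattern from the earlier mail as I read it (sketch; the
xarray and lock names are assumed):

        rcu_read_lock();
        ioas = xa_load(&ictx->ioasid_xa, ioasid);
        if (ioas && !down_read_trylock(&ioas->destroying_lock))
                ioas = NULL;
        rcu_read_unlock();

        if (!ioas)
                return -ENOENT;
        /* ... use the ioas ..., then up_read(&ioas->destroying_lock) */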

> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 01/20] iommu/iommufd: Add /dev/iommu core

2021-10-19 Thread Jacob Pan
Hi Jason,

On Fri, 15 Oct 2021 08:18:07 -0300, Jason Gunthorpe  wrote:

> On Fri, Oct 15, 2021 at 09:18:06AM +, Liu, Yi L wrote:
> 
> > >   Acquire from the xarray is
> > >rcu_lock()
> > >ioas = xa_load()
> > >if (ioas)
> > >   if (down_read_trylock(&ioas->destroying_lock))
> > 
> > all good suggestions, will refine accordingly. Here destroying_lock is a
> > rw_semaphore. right? Since down_read_trylock() accepts a rwsem.  
> 
> Yes, you probably need a sleeping lock
> 
I am not following why we would want a sleeping lock inside an RCU-protected
section.

For the ioas, do we really care about stale data when choosing rw_lock vs.
RCU? Destroying could be done with kfree_rcu()?
> Jason


Thanks,

Jacob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

