Re: [PATCH] iommu/dma: Fix iova map result check bug

2022-05-11 Thread Yong Wu via iommu
On Sat, 2022-05-07 at 16:52 +0800, yf.w...@mediatek.com wrote:
> From: Yunfei Wang 
> 
> The data type of the return value of iommu_map_sg_atomic()
> is ssize_t, but the data type of the iova size is size_t,
> i.e. one is signed while the other is unsigned.
> 
> When the iommu_map_sg_atomic() return value is compared with
> the iova size, the signed value is implicitly converted to
> unsigned. If the iova map fails and iommu_map_sg_atomic()
> returns a negative error code, then (ret < iova_len) is false,
> so the iova is never freed and the master can still
> successfully obtain the iova of the failed mapping, which is
> not expected.
> 
> Therefore, check the return value of iommu_map_sg_atomic() in
> two cases, according to whether it is less than 0.
> 
> Fixes: ad8f36e4b6b1 ("iommu: return full error code from
> iommu_map_sg[_atomic]()")
> Signed-off-by: Yunfei Wang 
> Cc:  # 5.15.*
> ---
>  drivers/iommu/dma-iommu.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 09f6e1c0f9c0..2932281e93fc 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -776,6 +776,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
>   unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
>   struct page **pages;
>   dma_addr_t iova;
> + ssize_t ret;
>  
>   if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
>   iommu_deferred_attach(dev, domain))
> @@ -813,8 +814,8 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
>   arch_dma_prep_coherent(sg_page(sg), sg->length);
>   }
>  
> - if (iommu_map_sg_atomic(domain, iova, sgt->sgl, sgt->orig_nents, ioprot)
> - < size)
> + ret = iommu_map_sg_atomic(domain, iova, sgt->sgl, sgt->orig_nents, ioprot);
> + if (ret < 0 || ret < size)

Would "if (IS_ERR_VALUE(ret) || ret < size)" be more readable?
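
For context, the signed/unsigned pitfall the patch fixes can be
reproduced in standalone C - a minimal sketch for illustration, not
part of the patch:

	#include <stdio.h>
	#include <sys/types.h>	/* ssize_t */

	int main(void)
	{
		ssize_t ret = -22;	/* e.g. -EINVAL from a failed map */
		size_t size = 4096;

		/* In (ret < size) the signed ret is converted to size_t,
		 * so -22 becomes a huge unsigned value and the check is
		 * false - the error is silently ignored. */
		if (ret < size)
			printf("error path taken\n");
		else
			printf("error silently ignored\n");

		/* Checking the sign first avoids the conversion issue. */
		if (ret < 0 || (size_t)ret < size)
			printf("error path taken with the fixed check\n");
		return 0;
	}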

>   goto out_free_sg;
>  
>   sgt->sgl->dma_address = iova;
> @@ -1209,7 +1210,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>    * implementation - it knows better than we do.
>    */
>   ret = iommu_map_sg_atomic(domain, iova, sg, nents, prot);
> - if (ret < iova_len)
> + if (ret < 0 || ret < iova_len)
>   goto out_free_iova;
>  
>   return __finalise_sg(dev, sg, nents, iova);



RE: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-11 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Thursday, May 12, 2022 1:17 PM
> 
> On 2022/5/12 13:01, Tian, Kevin wrote:
> >> From: Baolu Lu 
> >> Sent: Thursday, May 12, 2022 11:03 AM
> >>
> >> On 2022/5/11 22:53, Jason Gunthorpe wrote:
> > Also, given the current arrangement it might make sense to have a
> > struct iommu_domain_sva given that no driver is wrapping this in
> > something else.
>  Fair enough. How about the wrapper below?
> 
>  +struct iommu_sva_domain {
>  +   /*
>  +   * Common iommu domain header, *must* be put at the top
>  +* of the structure.
>  +*/
>  +   struct iommu_domain domain;
>  +   struct mm_struct *mm;
>  +   struct iommu_sva bond;
>  +}
> 
>  The refcount is wrapped in bond.
> >>> I'm still not sure that bond is necessary
> >>
> >> "bond" is the sva handle that the device drivers get through calling
> >> iommu_sva_bind().
> >>
> >
> > 'bond' was required before because we didn't have a domain to wrap
> > the page table at that time.
> >
> > Now we have a domain and it is 1:1 associated with the bond.
> > Probably it makes sense now to just return the domain as the sva
> > handle instead?
> 
> It also includes the information about the device to which the domain
> has been attached. So the sva_unbind() looks like this:
> 
> /**
>  * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
>  * @handle: the handle returned by iommu_sva_bind_device()
>  *
>  * Put reference to a bond between device and address space. The device should
>  * not be issuing any more transaction for this PASID. All outstanding page
>  * requests for this PASID must have been flushed to the IOMMU.
>  */
> void iommu_sva_unbind_device(struct iommu_sva *handle)
> 
> It's fine to replace iommu_sva with iommu_sva_domain as the sva handle,
> if we can include the device in the unbind() interface.

can we just have unbind(domain, device)?
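
i.e. something like this shape (a hypothetical sketch of the signature
being discussed, not code from any posted series):

	struct iommu_domain *iommu_sva_bind_device(struct device *dev,
						   struct mm_struct *mm);
	void iommu_sva_unbind_device(struct iommu_domain *domain,
				     struct device *dev);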

> 
> Anyway, I'd expect to achieve all these in two steps:
> 
> - sva and iopf refactoring, only iommu internal changes;
> - sva interface refactoring, only interface changes.
> 
> Does the above work?
> 
> Best regards,
> baolu


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-11 Thread Baolu Lu

On 2022/5/12 13:01, Tian, Kevin wrote:

From: Baolu Lu 
Sent: Thursday, May 12, 2022 11:03 AM

On 2022/5/11 22:53, Jason Gunthorpe wrote:

Also, given the current arrangement it might make sense to have a
struct iommu_domain_sva given that no driver is wrapping this in
something else.

Fair enough. How about the wrapper below?

+struct iommu_sva_domain {
+	/*
+	 * Common iommu domain header, *must* be put at the top
+	 * of the structure.
+	 */
+	struct iommu_domain domain;
+	struct mm_struct *mm;
+	struct iommu_sva bond;
+}

The refcount is wrapped in bond.
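
For reference, a wrapper like this is normally paired with a
container_of() accessor - an illustrative sketch, not from the posted
code:

	/* Valid because 'domain' is the first member of the wrapper. */
	static inline struct iommu_sva_domain *
	to_sva_domain(struct iommu_domain *domain)
	{
		return container_of(domain, struct iommu_sva_domain, domain);
	}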

I'm still not sure that bond is necessary


"bond" is the sva handle that the device drivers get through calling
iommu_sva_bind().



'bond' was required before because we didn't have a domain to wrap
the page table at that time.

Now we have a domain and it is 1:1 associated with the bond. Probably
it makes sense now to just return the domain as the sva handle
instead?


It also includes the information about the device to which the domain
has been attached. So the sva_unbind() looks like this:

/**
 * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
 * @handle: the handle returned by iommu_sva_bind_device()
 *
 * Put reference to a bond between device and address space. The device should
 * not be issuing any more transaction for this PASID. All outstanding page
 * requests for this PASID must have been flushed to the IOMMU.
 */
void iommu_sva_unbind_device(struct iommu_sva *handle)

It's fine to replace iommu_sva with iommu_sva_domain as the sva handle,
if we can include the device in the unbind() interface.

Anyway, I'd expect to achieve all these in two steps:

- sva and iopf refactoring, only iommu internal changes;
- sva interface refactoring, only interface changes.

Does the above work?

Best regards,
baolu


RE: [PATCH] vfio: Remove VFIO_TYPE1_NESTING_IOMMU

2022-05-11 Thread Tian, Kevin
> From: Jason Gunthorpe
> Sent: Wednesday, May 11, 2022 12:55 AM
> 
> This control causes the ARM SMMU drivers to choose a stage 2
> implementation for the IO pagetable (vs the stage 1 usual default),
> however this choice has no visible impact to the VFIO user. Further qemu
> never implemented this and no other userspace user is known.
> 
> The original description in commit f5c9ecebaf2a ("vfio/iommu_type1: add
> new VFIO_TYPE1_NESTING_IOMMU IOMMU type") suggested this was to "provide
> SMMU translation services to the guest operating system" however the rest
> of the API to set the guest table pointer for the stage 1 was never
> completed, or at least never upstreamed, rendering this part useless dead
> code.
> 
> Since the current patches to enable nested translation, aka userspace page
> tables, rely on iommufd and will not use the enable_nesting()
> iommu_domain_op, remove this infrastructure. However, don't cut too deep
> into the SMMU drivers for now expecting the iommufd work to pick it up -
> we still need to create S2 IO page tables.
> 
> Remove VFIO_TYPE1_NESTING_IOMMU and everything under it including the
> enable_nesting iommu_domain_op.
> 
> Just in case there is some userspace using this, continue to treat
> requesting it as a NOP, but do not advertise support any more.
> 
> Signed-off-by: Jason Gunthorpe 

Reviewed-by: Kevin Tian 

> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 16 
>  drivers/iommu/iommu.c   | 10 --
>  drivers/vfio/vfio_iommu_type1.c | 12 +---
>  include/linux/iommu.h   |  3 ---
>  include/uapi/linux/vfio.h   |  2 +-
>  6 files changed, 2 insertions(+), 57 deletions(-)
> 
> It would probably make sense for this to go through the VFIO tree with
> Robin's
> ack for the SMMU changes.
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 627a3ed5ee8fd1..b901e8973bb4ea 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2724,21 +2724,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
>   return group;
>  }
> 
> -static int arm_smmu_enable_nesting(struct iommu_domain *domain)
> -{
> - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> - int ret = 0;
> -
> - mutex_lock(&smmu_domain->init_mutex);
> - if (smmu_domain->smmu)
> - ret = -EPERM;
> - else
> - smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
> - mutex_unlock(&smmu_domain->init_mutex);
> -
> - return ret;
> -}
> -
>  static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
>  {
>   return iommu_fwspec_add_ids(dev, args->args, 1);
> @@ -2865,7 +2850,6 @@ static struct iommu_ops arm_smmu_ops = {
>   .flush_iotlb_all= arm_smmu_flush_iotlb_all,
>   .iotlb_sync = arm_smmu_iotlb_sync,
>   .iova_to_phys   = arm_smmu_iova_to_phys,
> - .enable_nesting = arm_smmu_enable_nesting,
>   .free   = arm_smmu_domain_free,
>   }
>  };
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index 568cce590ccc13..239e6f6585b48d 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1507,21 +1507,6 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
>   return group;
>  }
> 
> -static int arm_smmu_enable_nesting(struct iommu_domain *domain)
> -{
> - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> - int ret = 0;
> -
> - mutex_lock(&smmu_domain->init_mutex);
> - if (smmu_domain->smmu)
> - ret = -EPERM;
> - else
> - smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
> - mutex_unlock(&smmu_domain->init_mutex);
> -
> - return ret;
> -}
> -
>  static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain,
>   unsigned long quirks)
>  {
> @@ -1600,7 +1585,6 @@ static struct iommu_ops arm_smmu_ops = {
>   .flush_iotlb_all= arm_smmu_flush_iotlb_all,
>   .iotlb_sync = arm_smmu_iotlb_sync,
>   .iova_to_phys   = arm_smmu_iova_to_phys,
> - .enable_nesting = arm_smmu_enable_nesting,
>   .set_pgtable_quirks = arm_smmu_set_pgtable_quirks,
>   .free   = arm_smmu_domain_free,
>   }
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 857d4c2fd1a206..f33c0d569a5d03 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2561,16 +2561,6 @@ static int __init iommu_init(void)
>  }
>  core_initcall(iommu_init);
> 
> -int iommu_enable_nesting(struct iommu_domain *domain)
> -{
> - if 

RE: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-11 Thread Tian, Kevin
> From: Baolu Lu 
> Sent: Thursday, May 12, 2022 11:03 AM
> 
> On 2022/5/11 22:53, Jason Gunthorpe wrote:
> >>> Also, given the current arrangement it might make sense to have a
> >>> struct iommu_domain_sva given that no driver is wrapping this in
> >>> something else.
> >> Fair enough. How about the wrapper below?
> >>
> >> +struct iommu_sva_domain {
> >> +   /*
> >> +   * Common iommu domain header, *must* be put at the top
> >> +* of the structure.
> >> +*/
> >> +   struct iommu_domain domain;
> >> +   struct mm_struct *mm;
> >> +   struct iommu_sva bond;
> >> +}
> >>
> >> The refcount is wrapped in bond.
> > I'm still not sure that bond is necessary
> 
> "bond" is the sva handle that the device drivers get through calling
> iommu_sva_bind().
> 

'bond' was required before because we didn't have a domain to wrap
the page table at that time.

Now we have a domain and it is 1:1 associated with the bond. Probably
it makes sense now to just return the domain as the sva handle
instead?


[PATCH V2 6/6] media: platform: s5p-mfc: use DMA_ATTR_LOW_ADDRESS

2022-05-11 Thread Ajay Kumar
From: Marek Szyprowski 

The S5P-MFC driver relied on the way the ARM DMA-IOMMU glue code worked -
mainly on the fact that the allocator used a first-fit algorithm and that
the first allocated buffer was at DMA/IOVA address 0x0. This is not true
for the generic IOMMU-DMA glue code that will soon be used for the ARM
architecture, so limit the dma_mask to the size of the DMA window the
hardware can use and add the needed DMA attribute to force proper IOVA
allocation of the firmware buffer.

Signed-off-by: Marek Szyprowski 
Signed-off-by: Ajay Kumar 
---
 drivers/media/platform/samsung/s5p-mfc/s5p_mfc.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/samsung/s5p-mfc/s5p_mfc.c b/drivers/media/platform/samsung/s5p-mfc/s5p_mfc.c
index 761341934925..15c9c2273561 100644
--- a/drivers/media/platform/samsung/s5p-mfc/s5p_mfc.c
+++ b/drivers/media/platform/samsung/s5p-mfc/s5p_mfc.c
@@ -1196,8 +1196,12 @@ static int s5p_mfc_configure_common_memory(struct s5p_mfc_dev *mfc_dev)
if (!mfc_dev->mem_bitmap)
return -ENOMEM;
 
-   mfc_dev->mem_virt = dma_alloc_coherent(dev, mem_size,
-  &mfc_dev->mem_base, GFP_KERNEL);
+   /* MFC v5 can access memory only via the 256M window */
+   if (exynos_is_iommu_available(dev) && !IS_MFCV6_PLUS(mfc_dev))
+   dma_set_mask_and_coherent(dev, SZ_256M - 1);
+
+   mfc_dev->mem_virt = dma_alloc_attrs(dev, mem_size, &mfc_dev->mem_base,
+   GFP_KERNEL, DMA_ATTR_LOW_ADDRESS);
if (!mfc_dev->mem_virt) {
bitmap_free(mfc_dev->mem_bitmap);
dev_err(dev, "failed to preallocate %ld MiB for the firmware and context buffers\n",
-- 
2.17.1



[PATCH V2 5/6] iommu: dma-iommu: add support for DMA_ATTR_LOW_ADDRESS

2022-05-11 Thread Ajay Kumar
From: Marek Szyprowski 

Implement support for the DMA_ATTR_LOW_ADDRESS DMA attribute. If it has
been set, call alloc_iova_first_fit() instead of alloc_iova_fast() to
allocate the new IOVA from the beginning of the address space.
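
Condensed, the selection logic this patch adds at each allocation site
looks like this (shape taken from the diff below):

	if (unlikely(flags & DMA_ALLOC_IOVA_FIRST_FIT))
		iova = alloc_iova_first_fit(iovad, iova_len,
					    dma_limit >> shift);
	else
		iova = alloc_iova_fast(iovad, iova_len,
				       dma_limit >> shift, true);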

Signed-off-by: Marek Szyprowski 
Signed-off-by: Ajay Kumar 
---
 drivers/iommu/dma-iommu.c | 50 +--
 1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index cb235b40303c..553c5b863e19 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -601,6 +601,18 @@ static int dma_info_to_prot(enum dma_data_direction dir, 
bool coherent,
 }
 
#define DMA_ALLOC_IOVA_COHERENT	BIT(0)
+#define DMA_ALLOC_IOVA_FIRST_FIT   BIT(1)
+
+static unsigned int dma_attrs_to_alloc_flags(unsigned long attrs, bool coherent)
+{
+   unsigned int flags = 0;
+
+   if (coherent)
+   flags |= DMA_ALLOC_IOVA_COHERENT;
+   if (attrs & DMA_ATTR_LOW_ADDRESS)
+   flags |= DMA_ALLOC_IOVA_FIRST_FIT;
+   return flags;
+}
 
 static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
struct device *dev, size_t size, unsigned int flags)
@@ -625,13 +637,23 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
dma_limit = min(dma_limit, (u64)domain->geometry.aperture_end);
 
/* Try to get PCI devices a SAC address */
-   if (dma_limit > DMA_BIT_MASK(32) && !iommu_dma_forcedac && dev_is_pci(dev))
-   iova = alloc_iova_fast(iovad, iova_len,
-  DMA_BIT_MASK(32) >> shift, false);
+   if (dma_limit > DMA_BIT_MASK(32) && !iommu_dma_forcedac && dev_is_pci(dev)) {
+   if (unlikely(flags & DMA_ALLOC_IOVA_FIRST_FIT))
+   iova = alloc_iova_first_fit(iovad, iova_len,
+   DMA_BIT_MASK(32) >> shift);
+   else
+   iova = alloc_iova_fast(iovad, iova_len,
+ DMA_BIT_MASK(32) >> shift, false);
+   }
 
-   if (iova == IOVA_BAD_ADDR)
-   iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
-  true);
+   if (iova == IOVA_BAD_ADDR) {
+   if (unlikely(flags & DMA_ALLOC_IOVA_FIRST_FIT))
+   iova = alloc_iova_first_fit(iovad, iova_len,
+   dma_limit >> shift);
+   else
+   iova = alloc_iova_fast(iovad, iova_len,
+   dma_limit >> shift, true);
+   }
 
if (iova != IOVA_BAD_ADDR)
return (dma_addr_t)iova << shift;
@@ -779,6 +801,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
	struct iova_domain *iovad = &cookie->iovad;
bool coherent = dev_is_dma_coherent(dev);
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
+   unsigned int flags = dma_attrs_to_alloc_flags(attrs, true);
unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
struct page **pages;
dma_addr_t iova;
@@ -804,7 +827,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
return NULL;
 
size = iova_align(iovad, size);
-   iova = iommu_dma_alloc_iova(domain, dev, size, DMA_ALLOC_IOVA_COHERENT);
+   iova = iommu_dma_alloc_iova(domain, dev, size, flags);
if (iova == DMA_MAPPING_ERROR)
goto out_free_pages;
 
@@ -964,6 +987,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
phys_addr_t phys = page_to_phys(page) + offset;
bool coherent = dev_is_dma_coherent(dev);
int prot = dma_info_to_prot(dir, coherent, attrs);
+   unsigned int flags = dma_attrs_to_alloc_flags(attrs, false);
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
	struct iova_domain *iovad = &cookie->iovad;
@@ -1005,7 +1029,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
arch_sync_dma_for_device(phys, size, dir);
 
-   iova = __iommu_dma_map(dev, phys, size, prot, 0);
+   iova = __iommu_dma_map(dev, phys, size, prot, flags);
if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
return iova;
@@ -1152,6 +1176,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
	struct iova_domain *iovad = &cookie->iovad;
struct scatterlist *s, *prev = NULL;
int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+   unsigned int flags = dma_attrs_to_alloc_flags(attrs, 

[PATCH V2 4/6] iommu: dma-iommu: refactor iommu_dma_alloc_iova()

2022-05-11 Thread Ajay Kumar
From: Marek Szyprowski 

Change the parameters passed to iommu_dma_alloc_iova(): the dma_limit can
be easily extracted from the passed struct device, so replace it with a
flags parameter, which can later hold more information about the way the
IOVA allocator should do its job. While touching the parameter list, move
struct device to the second position to better match the convention of
the DMA-mapping related functions.
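
In short, call sites move from passing an explicit DMA mask to passing
allocation flags (shape taken from the diff below):

	/* before */
	iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev);
	/* after */
	iova = iommu_dma_alloc_iova(domain, dev, size, DMA_ALLOC_IOVA_COHERENT);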

Signed-off-by: Marek Szyprowski 
Signed-off-by: Ajay Kumar 
---
 drivers/iommu/dma-iommu.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 16218d6a0703..cb235b40303c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -600,12 +600,16 @@ static int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
}
 }
 
+#define DMA_ALLOC_IOVA_COHERENT	BIT(0)
+
 static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
-   size_t size, u64 dma_limit, struct device *dev)
+   struct device *dev, size_t size, unsigned int flags)
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
	struct iova_domain *iovad = &cookie->iovad;
unsigned long shift, iova_len, iova = IOVA_BAD_ADDR;
+   u64 dma_limit = (flags & DMA_ALLOC_IOVA_COHERENT) ?
+   dev->coherent_dma_mask : dma_get_mask(dev);
 
if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
cookie->msi_iova += size;
@@ -675,7 +679,7 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-   size_t size, int prot, u64 dma_mask)
+   size_t size, int prot, unsigned int flags)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
@@ -689,7 +693,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 
size = iova_align(iovad, size + iova_off);
 
-   iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);
+   iova = iommu_dma_alloc_iova(domain, dev, size, flags);
if (iova == DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
@@ -800,7 +804,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
return NULL;
 
size = iova_align(iovad, size);
-   iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev);
+   iova = iommu_dma_alloc_iova(domain, dev, size, DMA_ALLOC_IOVA_COHERENT);
if (iova == DMA_MAPPING_ERROR)
goto out_free_pages;
 
@@ -963,7 +967,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
	struct iova_domain *iovad = &cookie->iovad;
-   dma_addr_t iova, dma_mask = dma_get_mask(dev);
+   dma_addr_t iova;
 
/*
 * If both the physical buffer start address and size are
@@ -1001,7 +1005,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
arch_sync_dma_for_device(phys, size, dir);
 
-   iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
+   iova = __iommu_dma_map(dev, phys, size, prot, 0);
if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
return iova;
@@ -1205,7 +1209,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
prev = s;
}
 
-   iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
+   iova = iommu_dma_alloc_iova(domain, dev, iova_len, 0);
if (iova == DMA_MAPPING_ERROR) {
ret = -ENOMEM;
goto out_restore_sg;
@@ -1264,8 +1268,7 @@ static dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
return __iommu_dma_map(dev, phys, size,
-   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
-   dma_get_mask(dev));
+   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO, 0);
 }
 
 static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
@@ -1375,7 +1378,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
return NULL;
 
*handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot,
-   dev->coherent_dma_mask);
+   DMA_ALLOC_IOVA_COHERENT);
if (*handle == DMA_MAPPING_ERROR) {
__iommu_dma_free(dev, size, cpu_addr);
return NULL;
@@ -1517,7 +1520,7 @@ 

[PATCH V2 3/6] iommu: iova: add support for 'first-fit' algorithm

2022-05-11 Thread Ajay Kumar
From: Marek Szyprowski 

Add support for the 'first-fit' allocation algorithm. It will be used for
the special case of implementing DMA_ATTR_LOW_ADDRESS, so this path
doesn't use the IOVA cache.
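
A quick worked example of the alignment helper introduced below (the
numbers are for illustration only):

	/* __iova_get_aligned_start() rounds a candidate start up to the
	 * allocation's power-of-two alignment, e.g. for size = 0x30:
	 *   mask  = __roundup_pow_of_two(0x30) - 1 = 0x3f
	 *   start = 0x105 -> (0x105 + 0x3f) & ~0x3f = 0x140
	 */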

Signed-off-by: Marek Szyprowski 
Signed-off-by: Ajay Kumar 
---
 drivers/iommu/iova.c | 78 
 include/linux/iova.h |  2 ++
 2 files changed, 80 insertions(+)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index ae0fe0a6714e..89f9338f83a3 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -231,6 +231,59 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
return -ENOMEM;
 }
 
+static unsigned long
+__iova_get_aligned_start(unsigned long start, unsigned long size)
+{
+   unsigned long mask = __roundup_pow_of_two(size) - 1;
+
+   return (start + mask) & ~mask;
+}
+
+static int __alloc_and_insert_iova_range_forward(struct iova_domain *iovad,
+   unsigned long size, unsigned long limit_pfn,
+   struct iova *new)
+{
+   struct rb_node *curr;
+   unsigned long flags;
+   unsigned long start, limit;
+
+   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+
+   curr = rb_first(&iovad->rbroot);
+   limit = limit_pfn;
+   start = __iova_get_aligned_start(iovad->start_pfn, size);
+
+   while (curr) {
+   struct iova *curr_iova = rb_entry(curr, struct iova, node);
+   struct rb_node *next = rb_next(curr);
+
+   start = __iova_get_aligned_start(curr_iova->pfn_hi + 1, size);
+   if (next) {
+   struct iova *next_iova = rb_entry(next, struct iova, node);
+   limit = next_iova->pfn_lo - 1;
+   } else {
+   limit = limit_pfn;
+   }
+
+   if ((start + size) <= limit)
+   break;  /* found a free slot */
+   curr = next;
+   }
+
+   if (!curr && start + size > limit) {
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   return -ENOMEM;
+   }
+
+   new->pfn_lo = start;
+   new->pfn_hi = new->pfn_lo + size - 1;
+   iova_insert_rbtree(&iovad->rbroot, new, curr);
+
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+
+   return 0;
+}
+
 static struct kmem_cache *iova_cache;
 static unsigned int iova_cache_users;
 static DEFINE_MUTEX(iova_cache_mutex);
@@ -420,6 +473,31 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(free_iova);
 
+/**
+ * alloc_iova_first_fit - allocates an iova from the beginning of address space
+ * @iovad: - iova domain in question
+ * @size: - size of page frames to allocate
+ * @limit_pfn: - max limit address
+ * Returns a pfn the allocated iova starts at or IOVA_BAD_ADDR in the case
+ * of a failure.
+ */
+unsigned long
+alloc_iova_first_fit(struct iova_domain *iovad, unsigned long size,
+unsigned long limit_pfn)
+{
+   struct iova *new_iova = alloc_iova_mem();
+
+   if (!new_iova)
+   return IOVA_BAD_ADDR;
+
+   if (__alloc_and_insert_iova_range_forward(iovad, size, limit_pfn, new_iova)) {
+   free_iova_mem(new_iova);
+   return IOVA_BAD_ADDR;
+   }
+   return new_iova->pfn_lo;
+}
+EXPORT_SYMBOL_GPL(alloc_iova_first_fit);
+
 /**
  * alloc_iova_fast - allocates an iova from rcache
  * @iovad: - iova domain in question
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 46b5b10c532b..45ed6d41490a 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -89,6 +89,8 @@ void free_iova_fast(struct iova_domain *iovad, unsigned long pfn,
unsigned long size);
 unsigned long alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
  unsigned long limit_pfn, bool flush_rcache);
+unsigned long alloc_iova_first_fit(struct iova_domain *iovad, unsigned long size,
+  unsigned long limit_pfn);
 struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
unsigned long pfn_hi);
 void init_iova_domain(struct iova_domain *iovad, unsigned long granule,
-- 
2.17.1



[PATCH V2 2/6] iommu: iova: properly handle 0 as a valid IOVA address

2022-05-11 Thread Ajay Kumar
From: Marek Szyprowski 

Zero is a valid DMA and IOVA address on many architectures, so adjust the
IOVA management code to properly handle it. A new value IOVA_BAD_ADDR
(~0UL) is introduced as a generic value for the error case. Adjust all
callers of the alloc_iova_fast() function for the new return value.
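
A condensed view of the ambiguity being removed (sketch based on the
diff below):

	unsigned long pfn = alloc_iova_fast(iovad, iova_len, limit_pfn, true);

	/* before: pfn == 0 could mean either "allocation failed" or a
	 * perfectly valid IOVA starting at address 0 */
	if (!pfn)
		return DMA_MAPPING_ERROR;

	/* after: failure is the distinct value IOVA_BAD_ADDR (~0UL) */
	if (pfn == IOVA_BAD_ADDR)
		return DMA_MAPPING_ERROR;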

Signed-off-by: Marek Szyprowski 
Signed-off-by: Ajay Kumar 
---
 drivers/iommu/dma-iommu.c | 16 +---
 drivers/iommu/iova.c  | 13 +
 include/linux/iova.h  |  1 +
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 1ca85d37eeab..16218d6a0703 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -605,7 +605,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
	struct iova_domain *iovad = &cookie->iovad;
-   unsigned long shift, iova_len, iova = 0;
+   unsigned long shift, iova_len, iova = IOVA_BAD_ADDR;
 
if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
cookie->msi_iova += size;
@@ -625,11 +625,13 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
iova = alloc_iova_fast(iovad, iova_len,
   DMA_BIT_MASK(32) >> shift, false);
 
-   if (!iova)
+   if (iova == IOVA_BAD_ADDR)
iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
   true);
 
-   return (dma_addr_t)iova << shift;
+   if (iova != IOVA_BAD_ADDR)
+   return (dma_addr_t)iova << shift;
+   return DMA_MAPPING_ERROR;
 }
 
 static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
@@ -688,7 +690,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
size = iova_align(iovad, size + iova_off);
 
iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);
-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
 
if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) {
@@ -799,7 +801,7 @@ static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
 
size = iova_align(iovad, size);
iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev);
-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
goto out_free_pages;
 
if (sg_alloc_table_from_pages(sgt, pages, count, 0, size, GFP_KERNEL))
@@ -1204,7 +1206,7 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
}
 
iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
-   if (!iova) {
+   if (iova == DMA_MAPPING_ERROR) {
ret = -ENOMEM;
goto out_restore_sg;
}
@@ -1516,7 +1518,7 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
return NULL;
 
iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
goto out_free_page;
 
if (iommu_map(domain, iova, msi_addr, size, prot))
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145..ae0fe0a6714e 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -429,6 +429,8 @@ EXPORT_SYMBOL_GPL(free_iova);
  * This function tries to satisfy an iova allocation from the rcache,
  * and falls back to regular allocation on failure. If regular allocation
  * fails too and the flush_rcache flag is set then the rcache will be flushed.
+ * Returns a pfn the allocated iova starts at or IOVA_BAD_ADDR in the case
+ * of a failure.
 */
 unsigned long
 alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
@@ -447,7 +449,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
size = roundup_pow_of_two(size);
 
iova_pfn = iova_rcache_get(iovad, size, limit_pfn + 1);
-   if (iova_pfn)
+   if (iova_pfn != IOVA_BAD_ADDR)
return iova_pfn;
 
 retry:
@@ -456,7 +458,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
unsigned int cpu;
 
if (!flush_rcache)
-   return 0;
+   return IOVA_BAD_ADDR;
 
/* Try replenishing IOVAs by flushing rcache. */
flush_rcache = false;
@@ -831,7 +833,7 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
   unsigned long limit_pfn)
 {
struct iova_cpu_rcache *cpu_rcache;
-   unsigned long iova_pfn = 0;
+   unsigned long iova_pfn = IOVA_BAD_ADDR;
bool has_pfn = false;
unsigned long flags;
 
@@ -858,6 +860,9 @@ static unsigned long __iova_rcache_get(struct iova_rcache *rcache,
 
	spin_unlock_irqrestore(&cpu_rcache->lock, flags);
 
+   if (!iova_pfn)
+  

[PATCH V2 1/6] dma-mapping: add DMA_ATTR_LOW_ADDRESS attribute

2022-05-11 Thread Ajay Kumar
From: Marek Szyprowski 

Some devices require a special buffer (usually for the firmware) to be
allocated just at the beginning of the address space, to ensure that all
further allocations can be expressed as a positive offset from that
special buffer. When an IOMMU is used for managing the DMA address space,
such a requirement can be easily fulfilled, simply by enforcing the
'first-fit' IOVA allocation algorithm.

This patch adds a DMA attribute for such a case.
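
A hypothetical driver-side usage of the new attribute (patch 6/6 of this
series converts s5p-mfc to essentially this pattern):

	void *fw_virt;
	dma_addr_t fw_dma;

	fw_virt = dma_alloc_attrs(dev, fw_size, &fw_dma, GFP_KERNEL,
				  DMA_ATTR_LOW_ADDRESS);
	if (!fw_virt)
		return -ENOMEM;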

Signed-off-by: Marek Szyprowski 
Signed-off-by: Ajay Kumar 
---
 include/linux/dma-mapping.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..3cbdf857edd1 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -60,6 +60,12 @@
  * at least read-only at lesser-privileged levels).
  */
 #define DMA_ATTR_PRIVILEGED	(1UL << 9)
+/*
+ * DMA_ATTR_LOW_ADDRESS: used to indicate that the buffer should be allocated
+ * at the lowest possible DMA address, usually just at the beginning of the
+ * DMA/IOVA address space ('first-fit' allocation algorithm).
+ */
+#define DMA_ATTR_LOW_ADDRESS   (1UL << 10)
 
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.  It can
-- 
2.17.1



[PATCH V2 0/6] IOMMU-DMA - support DMA_ATTR_LOW_ADDRESS attribute

2022-05-11 Thread Ajay Kumar
This patchset is a rebase of original patches from Marek Szyprowski:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg2321261.html

The patches have been rebased on Joro's IOMMU tree "next" branch:
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git

This patchset is needed to address the IOVA address dependency issue between
firmware buffers and other buffers in the Samsung s5p-mfc driver.

There have been a few discussions in the past on how to find a generic
solution for this issue, ranging from adding an entirely new API to choose
the IOVA window [1], to adding a DMA attribute DMA_ATTR_LOW_ADDRESS which
handles buffer allocation from lower addresses [2].
This is a continuation of Marek's initial work on approach [2].
Patches have been tested with the latest version of the Samsung s5p-mfc
driver.

Changes since V1:
[PATCH V2 1/6]
- Rebase on latest tree.

[PATCH V2 2/6]
- Rebase on latest tree.
- Added a missing check for iova_pfn in __iova_rcache_get()
- Discarded changes from drivers/iommu/intel/iommu.c which are not necessary

[PATCH V2 3/6]
- Rebase on latest tree.

[PATCH V2 4/6]
- Rebase on latest tree

[PATCH V2 5/6]
- Rebase on latest tree

[PATCH V2 6/6]
- Rebase on latest tree.

Marek Szyprowski (6):
  dma-mapping: add DMA_ATTR_LOW_ADDRESS attribute
  iommu: iova: properly handle 0 as a valid IOVA address
  iommu: iova: add support for 'first-fit' algorithm
  iommu: dma-iommu: refactor iommu_dma_alloc_iova()
  iommu: dma-iommu: add support for DMA_ATTR_LOW_ADDRESS
  media: platform: s5p-mfc: use DMA_ATTR_LOW_ADDRESS

References:
[1]
https://lore.kernel.org/linux-iommu/20200811054912.ga...@infradead.org/

[2]
https://lore.kernel.org/linux-mm/bff57cbe-2247-05e1-9059-d9c66d64c...@arm.com

 drivers/iommu/dma-iommu.c | 77 +++-
 drivers/iommu/iova.c  | 91 ++-
 .../media/platform/samsung/s5p-mfc/s5p_mfc.c  |  8 +-
 include/linux/dma-mapping.h   |  6 ++
 include/linux/iova.h  |  3 +
 5 files changed, 156 insertions(+), 29 deletions(-)


base-commit: faf93cfaadfaaff2a5c35d6301b45aa2f6e4ddb2
-- 
2.17.1



Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-11 Thread Baolu Lu

On 2022/5/11 22:53, Jason Gunthorpe wrote:

Assuming we leave room for multi-device groups this logic should just
be

group = iommu_group_get(dev);
if (!group)
return -ENODEV;

mutex_lock(&group->mutex);
domain = xa_load(&group->pasid_array, mm->pasid);
if (!domain || domain->type != IOMMU_DOMAIN_SVA || domain->mm != mm)
domain = iommu_sva_alloc_domain(dev, mm);

?

Agreed. As a helper in iommu core, how about making it more generic like
below?

IDK, is there more users of this? AFAIK SVA is the only place that
will be auto-sharing?


The generic thing is that components, like SVA, want to fetch the
attached domain from the iommu core.




+   mutex_lock(&group->mutex);
+   domain = xa_load(&group->pasid_array, pasid);
+   if (domain && domain->type != type)
+   domain = NULL;
+   mutex_unlock(>mutex);
+   iommu_group_put(group);
+
+   return domain;

This is bad locking, group->pasid_array values cannot be taken outside
the lock.


It's not iommu core, but SVA (or other feature components) that manage
the life cycle of a domain. The iommu core only provides a place to
store the domain pointer. The feature components are free to fetch their
domain pointers from iommu core as long as they are sure that the domain
is alive during use.




And stick the refcount in the sva_domain

Also, given the current arrangement it might make sense to have a
struct iommu_domain_sva given that no driver is wrapping this in
something else.

Fair enough. How about the wrapper below?

+struct iommu_sva_domain {
+	/*
+	 * Common iommu domain header, *must* be put at the top
+	 * of the structure.
+	 */
+	struct iommu_domain domain;
+	struct mm_struct *mm;
+	struct iommu_sva bond;
+}

The refcount is wrapped in bond.

I'm still not sure that bond is necessary


"bond" is the sva handle that the device drivers get through calling
iommu_sva_bind().



But yes, something like that


Best regards,
baolu


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Baolu Lu

On 2022/5/12 01:25, Jacob Pan wrote:

Hi Jason,

On Wed, 11 May 2022 14:00:25 -0300, Jason Gunthorpe  wrote:


On Wed, May 11, 2022 at 10:02:16AM -0700, Jacob Pan wrote:

If not global, perhaps we could have a list of pasids (e.g. xarray)
attached to the device_domain_info. The TLB flush logic would just
go through the list w/o caring what the PASIDs are for. Does it
make sense to you?


Sort of, but we shouldn't duplicate xarrays - the group already has
this xarray - need to find some way to allow access to it from the
driver.
   

I am not following,  here are the PASIDs for devTLB flush which is per
device. Why group?


Because group is where the core code stores it.

I see, with singleton group. I guess I can let dma-iommu code call

iommu_attach_dma_pasid {
iommu_attach_device_pasid();
Then the PASID will be stored in the group xa.
The flush code can retrieve PASIDs from device_domain_info.device -> group
-> pasid_array.
Thanks for pointing it out, I missed the new pasid_array.



We could retrieve PASIDs from the device PASID table but xa would be
more efficient.
   

Are you suggesting the dma-iommu API should be called
iommu_set_dma_pasid instead of iommu_attach_dma_pasid?


No that API is Ok - the driver ops API should be 'set' not
attach/detach

Sounds good, this operation has little in common with
domain_ops.dev_attach_pasid() used by SVA domain. So I will add a
new domain_ops.dev_set_pasid()


What? No, their should only be one operation, 'dev_set_pasid' and it
is exactly the same as the SVA operation. It configures things so that
any existing translation on the PASID is removed and the PASID
translates according to the given domain.

SVA given domain or UNMANAGED given domain doesn't matter to the
higher level code. The driver should implement per-domain ops as
required to get the different behaviors.

Perhaps some code to clarify, we have
sva_domain_ops.dev_attach_pasid() = intel_svm_attach_dev_pasid;
default_domain_ops.dev_attach_pasid() = intel_iommu_attach_dev_pasid;


Yes, keep that structure
  

Consolidate pasid programming into dev_set_pasid() then called by both
intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right?
  


I was only suggesting that really dev_attach_pasid() op is misnamed,
it should be called set_dev_pasid() and act like a set, not a paired
attach/detach - same as the non-PASID ops.


Got it. Perhaps another patch to rename, Baolu?


Yes. I can rename it in my sva series if others are also happy with this
naming.

Best regards,
baolu


Re: [PATCH v6 03/29] x86/apic/msi: Set the delivery mode individually for each IRQ

2022-05-11 Thread Ricardo Neri
On Fri, May 06, 2022 at 10:05:34PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> > There are no restrictions in hardware to set  MSI messages with its
> > own delivery mode.
> 
> "messages with its own" ? Plural/singular confusion.

Yes, this is not correct. It should have read "messages with their own..."

> 
> > Use the mode specified in the provided IRQ hardware
> > configuration data. Since most of the IRQs are configured to use the
> > delivery mode of the APIC driver in use (set in all of them to
> > APIC_DELIVERY_MODE_FIXED), the only functional changes are where
> > IRQs are configured to use a specific delivery mode.
> 
> This does not parse. There are no functional changes due to this patch
> and there is no point talking about functional changes in subsequent
> patches here.

I will remove this.

> 
> > Changing the utility function __irq_msi_compose_msg() takes care of
> > implementing the change in the in the local APIC, PCI-MSI, and DMAR-MSI
> 
> in the in the

Sorry! This is not correct.

> 
> > irq_chips.
> >
> > The IO-APIC irq_chip configures the entries in the interrupt redirection
> > table using the delivery mode specified in the corresponding MSI message.
> > Since the MSI message is composed by a higher irq_chip in the hierarchy,
> > it does not need to be updated.
> 
> The point is that updating __irq_msi_compose_msg() covers _all_ MSI
> consumers including IO-APIC.
> 
> I had to read that changelog 3 times to make sense of it. Something like
> this perhaps:
> 
>   "x86/apic/msi: Use the delivery mode from irq_cfg for message composition
> 
>irq_cfg provides a delivery mode for each interrupt. Use it instead
>of the hardcoded APIC_DELIVERY_MODE_FIXED. This allows to compose
>messages for NMI delivery mode which is required to implement a HPET
>based NMI watchdog.
> 
>No functional change as the default delivery mode is set to
>APIC_DELIVERY_MODE_FIXED."

Thank you for your help on the changelog! I will take your suggestion.

BR,
Ricardo
> 
> Thanks,
> 
> tglx


Re: [PATCH v6 02/29] x86/apic: Add irq_cfg::delivery_mode

2022-05-11 Thread Ricardo Neri
On Fri, May 06, 2022 at 09:53:54PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> > Currently, the delivery mode of all interrupts is set to the mode of the
> > APIC driver in use. There are no restrictions in hardware to configure the
> > delivery mode of each interrupt individually. Also, certain IRQs need
> > to be
> 
> s/IRQ/interrupt/ Changelogs can do without acronyms.

Sure. I will sanitize all the changelogs to remove acronyms.

> 
> > configured with a specific delivery mode (e.g., NMI).
> >
> > Add a new member, delivery_mode, to struct irq_cfg. Subsequent changesets
> > will update every irq_domain to set the delivery mode of each IRQ to that
> > specified in its irq_cfg data.
> >
> > To keep the current behavior, when allocating an IRQ in the root
> > domain
> 
> The root domain does not allocate an interrupt. The root domain
> allocates a vector for an interrupt. There is a very clear and technical
> destinction. Can you please be more careful about the wording?

I will review the wording in the changelogs.

> 
> > --- a/arch/x86/kernel/apic/vector.c
> > +++ b/arch/x86/kernel/apic/vector.c
> > @@ -567,6 +567,7 @@ static int x86_vector_alloc_irqs(struct irq_domain 
> > *domain, unsigned int virq,
> > irqd->chip_data = apicd;
> > irqd->hwirq = virq + i;
> > irqd_set_single_target(irqd);
> > +
> 
> Stray newline.

Sorry! I will remove it.
> 
> > /*
> >  * Prevent that any of these interrupts is invoked in
> >  * non interrupt context via e.g. generic_handle_irq()
> > @@ -577,6 +578,14 @@ static int x86_vector_alloc_irqs(struct irq_domain 
> > *domain, unsigned int virq,
> > /* Don't invoke affinity setter on deactivated interrupts */
> > irqd_set_affinity_on_activate(irqd);
> >  
> > +   /*
> > +* Initialize the delivery mode of this irq to match the
> 
> s/irq/interrupt/

I will make this change.

Thanks and BR,
Ricardo

> 
> > +* default delivery mode of the APIC. Children irq domains
> > +* may take the delivery mode from the individual irq
> > +* configuration rather than from the APIC driver.
> > +*/
> > +   apicd->hw_irq_cfg.delivery_mode = apic->delivery_mode;
> > +
> > /*
> >  * Legacy vectors are already assigned when the IOAPIC
> >  * takes them over. They stay on the same vector. This is


Re: [PATCH v6 01/29] irq/matrix: Expose functions to allocate the best CPU for new vectors

2022-05-11 Thread Ricardo Neri
On Fri, May 06, 2022 at 09:48:28PM +0200, Thomas Gleixner wrote:
> Ricardo,

Thank you very much for your feedback Thomas! I am sorry for my late reply, I
had been out of office.

> 
> On Thu, May 05 2022 at 16:59, Ricardo Neri wrote:
> > Certain types of interrupts, such as NMI, do not have an associated vector.
> > They, however, target specific CPUs. Thus, when assigning the destination
> > CPU, it is beneficial to select the one with the lowest number of
> > vectors.
> 
> Why is that beneficial especially in the context of a NMI watchdog which
> then broadcasts the NMI to all other CPUs?

My intent was not the NMI watchdog specifically but potential use cases that do
not involve NMI broadcasts. If the NMI targets a single CPU, it is best to
select the CPU with the lowest vector allocation count.

> 
> That's wishful thinking perhaps, but I don't see any benefit at all.
> 
> > Prepend the functions matrix_find_best_cpu_managed() and
> > matrix_find_best_cpu_managed()
> 
> The same function prepended twice becomes two functions :)
> 

Sorry, I missed this.

> > with the irq_ prefix and expose them for
> > IRQ controllers to use when allocating and activating vector-less IRQs.
> 
> There is no such thing like a vectorless IRQ. NMIs have a vector. Can we
> please describe facts and not pulled out of thin air concepts which do
> not exist?

Thank you for the clarification. I see your point. I wrote this patch because
maskable interrupts and NMIs have different entry points. As you state,
however, they also have a vector.

I can drop this patch.

BR,
Ricardo

> 
> Thanks,
> 
> tglx


RE: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-11 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Thursday, May 12, 2022 12:32 AM
> 
> On Wed, May 11, 2022 at 03:15:22AM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Wednesday, May 11, 2022 3:00 AM
> > >
> > > On Tue, May 10, 2022 at 05:12:04PM +1000, David Gibson wrote:
> > > > Ok... here's a revised version of my proposal which I think addresses
> > > > your concerns and simplfies things.
> > > >
> > > > - No new operations, but IOAS_MAP gets some new flags (and
> IOAS_COPY
> > > >   will probably need matching changes)
> > > >
> > > > - By default the IOVA given to IOAS_MAP is a hint only, and the IOVA
> > > >   is chosen by the kernel within the aperture(s).  This is closer to
> > > >   how mmap() operates, and DPDK and similar shouldn't care about
> > > >   having specific IOVAs, even at the individual mapping level.
> > > >
> > > > - IOAS_MAP gets an IOMAP_FIXED flag, analagous to mmap()'s
> MAP_FIXED,
> > > >   for when you really do want to control the IOVA (qemu, maybe some
> > > >   special userspace driver cases)
> > >
> > > We already did both of these, the flag is called
> > > IOMMU_IOAS_MAP_FIXED_IOVA - if it is not specified then kernel will
> > > select the IOVA internally.
> > >
> > > > - ATTACH will fail if the new device would shrink the aperture to
> > > >   exclude any already established mappings (I assume this is already
> > > >   the case)
> > >
> > > Yes
> > >
> > > > - IOAS_MAP gets an IOMAP_RESERVE flag, which operates a bit like a
> > > >   PROT_NONE mmap().  It reserves that IOVA space, so other (non-
> FIXED)
> > > >   MAPs won't use it, but doesn't actually put anything into the IO
> > > >   pagetables.
> > > > - Like a regular mapping, ATTACHes that are incompatible with an
> > > >   IOMAP_RESERVEed region will fail
> > > > - An IOMAP_RESERVEed area can be overmapped with an
> IOMAP_FIXED
> > > >   mapping
> > >
> > > Yeah, this seems OK, I'm thinking a new API might make sense because
> > > you don't really want mmap replacement semantics but a permanent
> > > record of what IOVA must always be valid.
> > >
> > > IOMMU_IOA_REQUIRE_IOVA perhaps, similar signature to
> > > IOMMUFD_CMD_IOAS_IOVA_RANGES:
> > >
> > > struct iommu_ioas_require_iova {
> > > __u32 size;
> > > __u32 ioas_id;
> > > __u32 num_iovas;
> > > __u32 __reserved;
> > > struct iommu_required_iovas {
> > > __aligned_u64 start;
> > > __aligned_u64 last;
> > > } required_iovas[];
> > > };
> >
> > As a permanent record do we want to enforce that once the required
> > range list is set all FIXED and non-FIXED allocations must be within the
> > list of ranges?
> 
> No, I would just use this as a guarantee that going forward any
> get_ranges will always return ranges that cover the listed required
> ranges. I.e. any narrowing of the ranges will be refused.
> 
> map/unmap should only be restricted to the get_ranges output.
> 
> Wouldn't burn CPU cycles to nanny userspace here.

fair enough.

> 
> > If yes we can take the end of the last range as the max size of the iova
> > address space to optimize the page table layout.
> 
> I think this API should not interact with the driver. Its only job is
> to prevent devices from attaching that would narrow the ranges.
> 
> If we also use it to adjust the aperture of the created iommu_domain
> then it looses its usefullness as guard since something like qemu
> would have to leave room for hotplug as well.
> 
> I suppose optimizing the created iommu_domains should be some other
> API, with a different set of ranges and the '# of bytes of IOVA' hint
> as well.

make sense.

> 
> > > > For (unoptimized) qemu it would be:
> > > >
> > > > 1. Create IOAS
> > > > 2. IOAS_MAP(IOMAP_FIXED|IOMAP_RESERVE) the valid IOVA regions of
> > > the
> > > >guest platform
> > > > 3. ATTACH devices (this will fail if they're not compatible with the
> > > >reserved IOVA regions)
> > > > 4. Boot the guest
> >
> > I suppose above is only the sample flow for PPC vIOMMU. For non-PPC
> > vIOMMUs regular mappings are required before booting the guest and
> > reservation might be done but not mandatory (at least not what current
> > Qemu vfio can afford as it simply replays valid ranges in the CPU address
> > space).
> 
> I think qemu can always do it, it feels like it would simplify error
> cases around aperture mismatches.
> 

It could, but that requires more changes in Qemu to define required ranges
in platform logic and then convey them from the Qemu address space to VFIO.
I view it as an optimization, hence not necessarily to be done immediately.

Thanks
Kevin


Re: [PATCH] swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm

2022-05-11 Thread Stefano Stabellini
On Wed, 11 May 2022, Christoph Hellwig wrote:
> On Fri, Apr 29, 2022 at 04:15:38PM -0700, Stefano Stabellini wrote:
> > Great! Christoph you can go ahead and pick it up in your tree if you are
> > up for it.
> 
> The patch is in the dma-mapping for-next brancch now:
> 
> http://git.infradead.org/users/hch/dma-mapping.git/commitdiff/62cb1ca1654b57589c582efae2748159c74ee356
> 
> There were a few smaller merge conflicts with the swiotlb refactoring.
> I think everything is fine, but please take another look if possible.

Looks good to me


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 10:25:21AM -0700, Jacob Pan wrote:
> Hi Jason,
> 
> On Wed, 11 May 2022 14:00:25 -0300, Jason Gunthorpe  wrote:
> 
> > On Wed, May 11, 2022 at 10:02:16AM -0700, Jacob Pan wrote:
> > > > > If not global, perhaps we could have a list of pasids (e.g. xarray)
> > > > > attached to the device_domain_info. The TLB flush logic would just
> > > > > go through the list w/o caring what the PASIDs are for. Does it
> > > > > make sense to you?
> > > > 
> > > > Sort of, but we shouldn't duplicate xarrays - the group already has
> > > > this xarray - need to find some way to allow access to it from the
> > > > driver.
> > > >   
> > > I am not following,  here are the PASIDs for devTLB flush which is per
> > > device. Why group?  
> > 
> > Because group is where the core code stores it.
> I see, with singleton group. I guess I can let dma-iommu code call
> 
> iommu_attach_dma_pasid {
>   iommu_attach_device_pasid();
> Then the PASID will be stored in the group xa.

Yes, again, the dma-iommu should not be any different from the normal
unmanaged path. At this point there is no longer any difference, we
should not invent new ones.

> The flush code can retrieve PASIDs from device_domain_info.device ->
> group -> pasid_array.  Thanks for pointing it out, I missed the new
> pasid_array.

Yes.. It seems inefficient to iterate over that xarray multiple times
on the flush hot path, but maybe there is little choice. Try to use
the xas iterators under the xa_lock spinlock..
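
A rough sketch of that shape (hypothetical - as noted below, a helper
for reaching the group's xarray from the driver does not exist yet):

	XA_STATE(xas, &group->pasid_array, 0);
	struct iommu_domain *domain;

	xa_lock(&group->pasid_array);
	xas_for_each(&xas, domain, ULONG_MAX) {
		/* xas.xa_index is the PASID; issue its devTLB flush */
	}
	xa_unlock(&group->pasid_array);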

The challenge will be accessing the group xa in the first place, but
maybe the core code can gain a function call to return a pointer to
that XA or something..

Jason


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jacob Pan
Hi Jason,

On Wed, 11 May 2022 14:00:25 -0300, Jason Gunthorpe  wrote:

> On Wed, May 11, 2022 at 10:02:16AM -0700, Jacob Pan wrote:
> > > > If not global, perhaps we could have a list of pasids (e.g. xarray)
> > > > attached to the device_domain_info. The TLB flush logic would just
> > > > go through the list w/o caring what the PASIDs are for. Does it
> > > > make sense to you?
> > > 
> > > Sort of, but we shouldn't duplicate xarrays - the group already has
> > > this xarray - need to find some way to allow access to it from the
> > > driver.
> > >   
> > I am not following,  here are the PASIDs for devTLB flush which is per
> > device. Why group?  
> 
> Because group is where the core code stores it.
I see, with singleton group. I guess I can let dma-iommu code call

iommu_attach_dma_pasid {
iommu_attach_device_pasid();
Then the PASID will be stored in the group xa.
The flush code can retrieve PASIDs from device_domain_info.device -> group
-> pasid_array.
Thanks for pointing it out, I missed the new pasid_array.
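
A sketch of the wrapper described above (hypothetical; the name and
exact signature are illustrative only):

	static int iommu_attach_dma_pasid(struct iommu_domain *domain,
					  struct device *dev, ioasid_t pasid)
	{
		/* stores @domain in the group's pasid_array, keyed by @pasid */
		return iommu_attach_device_pasid(domain, dev, pasid);
	}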
> 
> > We could retrieve PASIDs from the device PASID table but xa would be
> > more efficient.
> >   
> > > > > > Are you suggesting the dma-iommu API should be called
> > > > > > iommu_set_dma_pasid instead of iommu_attach_dma_pasid?  
> > > > > 
> > > > > No that API is Ok - the driver ops API should be 'set' not
> > > > > attach/detach   
> > > > Sounds good, this operation has little in common with
> > > > domain_ops.dev_attach_pasid() used by SVA domain. So I will add a
> > > > new domain_ops.dev_set_pasid()
> > > 
> > > What? No, their should only be one operation, 'dev_set_pasid' and it
> > > is exactly the same as the SVA operation. It configures things so that
> > > any existing translation on the PASID is removed and the PASID
> > > translates according to the given domain.
> > > 
> > > SVA given domain or UNMANAGED given domain doesn't matter to the
> > > higher level code. The driver should implement per-domain ops as
> > > required to get the different behaviors.  
> > Perhaps some code to clarify, we have
> > sva_domain_ops.dev_attach_pasid() = intel_svm_attach_dev_pasid;
> > default_domain_ops.dev_attach_pasid() = intel_iommu_attach_dev_pasid;  
> 
> Yes, keep that structure
>  
> > Consolidate pasid programming into dev_set_pasid() then called by both
> > intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right?
> >  
> 
> I was only suggesting that really dev_attach_pasid() op is misnamed,
> it should be called set_dev_pasid() and act like a set, not a paired
> attach/detach - same as the non-PASID ops.
> 
Got it. Perhaps another patch to rename, Baolu?


Thanks,

Jacob


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 10:02:16AM -0700, Jacob Pan wrote:
> > > If not global, perhaps we could have a list of pasids (e.g. xarray)
> > > attached to the device_domain_info. The TLB flush logic would just go
> > > through the list w/o caring what the PASIDs are for. Does it make sense
> > > to you?  
> > 
> > Sort of, but we shouldn't duplicate xarrays - the group already has
> > this xarray - need to find some way to allow access to it from the
> > driver.
> > 
> I am not following; here are the PASIDs for devTLB flush, which is per
> device. Why group?

Because group is where the core code stores it.

> We could retrieve PASIDs from the device PASID table but xa would be more
> efficient.
> 
> > > > > Are you suggesting the dma-iommu API should be called
> > > > > iommu_set_dma_pasid instead of iommu_attach_dma_pasid?
> > > > 
> > > > No that API is Ok - the driver ops API should be 'set' not
> > > > attach/detach 
> > > Sounds good, this operation has little in common with
> > > domain_ops.dev_attach_pasid() used by SVA domain. So I will add a new
> > > domain_ops.dev_set_pasid()  
> > 
> > What? No, there should only be one operation, 'dev_set_pasid' and it
> > is exactly the same as the SVA operation. It configures things so that
> > any existing translation on the PASID is removed and the PASID
> > translates according to the given domain.
> > 
> > SVA given domain or UNMANAGED given domain doesn't matter to the
> > higher level code. The driver should implement per-domain ops as
> > required to get the different behaviors.
> Perhaps some code to clarify, we have
> sva_domain_ops.dev_attach_pasid() = intel_svm_attach_dev_pasid;
> default_domain_ops.dev_attach_pasid() = intel_iommu_attach_dev_pasid;

Yes, keep that structure
 
> Consolidate pasid programming into dev_set_pasid() then called by both
> intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right?

I was only suggesting that the dev_attach_pasid() op is really misnamed,
it should be called set_dev_pasid() and act like a set, not a paired
attach/detach - same as the non-PASID ops.

Jason


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jacob Pan
Hi Jason,

On Wed, 11 May 2022 13:12:37 -0300, Jason Gunthorpe  wrote:

> On Wed, May 11, 2022 at 08:35:18AM -0700, Jacob Pan wrote:
> 
> > > Huh? The intel driver shares the same ops between UNMANAGED and DMA -
> > > and in general I do not think we should be putting special knowledge
> > > about the DMA domains in the drivers. Drivers should continue to treat
> > > them identically to UNMANAGED.
> > >   
> > OK, other than the SVA domain, the rest of the domain types share the same
> > default ops. I agree that the default ops should be the same for UNMANAGED,
> > IDENTITY, and DMA domain types. A minor detail is that we need to treat the
> > IDENTITY domain slightly differently when it comes down to PASID entry
> > programming.  
> 
> I would be happy if IDENTITY had its own ops, if that makes sense
> 
I have tried giving it its own ops but there are complications around
checking if a domain has ops. It would be a logical thing to clean up next.

> > If not global, perhaps we could have a list of pasids (e.g. xarray)
> > attached to the device_domain_info. The TLB flush logic would just go
> > through the list w/o caring what the PASIDs are for. Does it make sense
> > to you?  
> 
> Sort of, but we shouldn't duplicate xarrays - the group already has
> this xarray - need to find some way to allow access to it from the
> driver.
> 
I am not following; here are the PASIDs for devTLB flush, which is per
device. Why group?
We could retrieve PASIDs from the device PASID table but xa would be more
efficient.

> > > > Are you suggesting the dma-iommu API should be called
> > > > iommu_set_dma_pasid instead of iommu_attach_dma_pasid?
> > > 
> > > No that API is Ok - the driver ops API should be 'set' not
> > > attach/detach 
> > Sounds good, this operation has little in common with
> > domain_ops.dev_attach_pasid() used by SVA domain. So I will add a new
> > domain_ops.dev_set_pasid()  
> 
> What? No, there should only be one operation, 'dev_set_pasid' and it
> is exactly the same as the SVA operation. It configures things so that
> any existing translation on the PASID is removed and the PASID
> translates according to the given domain.
> 
> SVA given domain or UNMANAGED given domain doesn't matter to the
> higher level code. The driver should implement per-domain ops as
> required to get the different behaviors.
Perhaps some code to clarify, we have
sva_domain_ops.dev_attach_pasid() = intel_svm_attach_dev_pasid;
default_domain_ops.dev_attach_pasid() = intel_iommu_attach_dev_pasid;

Consolidate pasid programming into dev_set_pasid() then called by both
intel_svm_attach_dev_pasid() and intel_iommu_attach_dev_pasid(), right?


Thanks,

Jacob


Re: [PATCH RFC 11/12] iommufd: vfio container FD ioctl compatibility

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 03:15:22AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, May 11, 2022 3:00 AM
> > 
> > On Tue, May 10, 2022 at 05:12:04PM +1000, David Gibson wrote:
> > > Ok... here's a revised version of my proposal which I think addresses
> > > your concerns and simplifies things.
> > >
> > > - No new operations, but IOAS_MAP gets some new flags (and IOAS_COPY
> > >   will probably need matching changes)
> > >
> > > - By default the IOVA given to IOAS_MAP is a hint only, and the IOVA
> > >   is chosen by the kernel within the aperture(s).  This is closer to
> > >   how mmap() operates, and DPDK and similar shouldn't care about
> > >   having specific IOVAs, even at the individual mapping level.
> > >
> > > - IOAS_MAP gets an IOMAP_FIXED flag, analogous to mmap()'s MAP_FIXED,
> > >   for when you really do want to control the IOVA (qemu, maybe some
> > >   special userspace driver cases)
> > 
> > We already did both of these, the flag is called
> > IOMMU_IOAS_MAP_FIXED_IOVA - if it is not specified then kernel will
> > select the IOVA internally.
> > 
> > > - ATTACH will fail if the new device would shrink the aperture to
> > >   exclude any already established mappings (I assume this is already
> > >   the case)
> > 
> > Yes
> > 
> > > - IOAS_MAP gets an IOMAP_RESERVE flag, which operates a bit like a
> > >   PROT_NONE mmap().  It reserves that IOVA space, so other (non-FIXED)
> > >   MAPs won't use it, but doesn't actually put anything into the IO
> > >   pagetables.
> > > - Like a regular mapping, ATTACHes that are incompatible with an
> > >   IOMAP_RESERVEed region will fail
> > > - An IOMAP_RESERVEed area can be overmapped with an IOMAP_FIXED
> > >   mapping
> > 
> > Yeah, this seems OK, I'm thinking a new API might make sense because
> > you don't really want mmap replacement semantics but a permanent
> > record of what IOVA must always be valid.
> > 
> > IOMMU_IOA_REQUIRE_IOVA perhaps, similar signature to
> > IOMMUFD_CMD_IOAS_IOVA_RANGES:
> > 
> > struct iommu_ioas_require_iova {
> > 	__u32 size;
> > 	__u32 ioas_id;
> > 	__u32 num_iovas;
> > 	__u32 __reserved;
> > 	struct iommu_required_iovas {
> > 		__aligned_u64 start;
> > 		__aligned_u64 last;
> > 	} required_iovas[];
> > };
> 
> As a permanent record do we want to enforce that once the required
> range list is set all FIXED and non-FIXED allocations must be within the
> list of ranges?

No, I would just use this as a guarantee that going forward any
get_ranges will always return ranges that cover the listed required
ranges. Ie any narrowing of the ranges will be refused.

map/unmap should only be restricted to the get_ranges output.

Wouldn't burn CPU cycles to nanny userspace here.

> If yes we can take the end of the last range as the max size of the iova
> address space to optimize the page table layout.

I think this API should not interact with the driver. Its only job is
to prevent devices from attaching that would narrow the ranges.

If we also use it to adjust the aperture of the created iommu_domain
then it loses its usefulness as a guard since something like qemu
would have to leave room for hotplug as well.

I suppose optimizing the created iommu_domains should be some other
API, with a different set of ranges and the '# of bytes of IOVA' hint
as well.

> > > For (unoptimized) qemu it would be:
> > >
> > > 1. Create IOAS
> > > 2. IOAS_MAP(IOMAP_FIXED|IOMAP_RESERVE) the valid IOVA regions of
> > the
> > >guest platform
> > > 3. ATTACH devices (this will fail if they're not compatible with the
> > >reserved IOVA regions)
> > > 4. Boot the guest
> 
> I suppose above is only the sample flow for PPC vIOMMU. For non-PPC
> vIOMMUs regular mappings are required before booting the guest and
> reservation might be done but not mandatory (at least not what current
> Qemu vfio can afford as it simply replays valid ranges in the CPU address
> space).

I think qemu can always do it, it feels like it would simplify error
cases around aperture mismatches.

Jason


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 08:35:18AM -0700, Jacob Pan wrote:

> > Huh? The intel driver shares the same ops between UNMANAGED and DMA -
> > and in general I do not think we should be putting special knowledge
> > about the DMA domains in the drivers. Drivers should continue to treat
> > them identically to UNMANAGED.
> > 
> OK, other than the SVA domain, the rest of the domain types share the same default ops.
> I agree that the default ops should be the same for UNMANAGED, IDENTITY, and
> DMA domain types. A minor detail is that we need to treat the IDENTITY domain
> slightly differently when it comes down to PASID entry programming.

I would be happy if IDENTITY had its own ops, if that makes sense

> If not global, perhaps we could have a list of pasids (e.g. xarray) attached
> to the device_domain_info. The TLB flush logic would just go through the
> list w/o caring what the PASIDs are for. Does it make sense to you?

Sort of, but we shouldn't duplicate xarrays - the group already has
this xarray - need to find some way to allow access to it from the
driver.

> > > Are you suggesting the dma-iommu API should be called
> > > iommu_set_dma_pasid instead of iommu_attach_dma_pasid?  
> > 
> > No that API is Ok - the driver ops API should be 'set' not attach/detach
> > 
> Sounds good, this operation has little in common with
> domain_ops.dev_attach_pasid() used by SVA domain. So I will add a new
> domain_ops.dev_set_pasid()

What? No, there should only be one operation, 'dev_set_pasid' and it
is exactly the same as the SVA operation. It configures things so that
any existing translation on the PASID is removed and the PASID
translates according to the given domain.

SVA given domain or UNMANAGED given domain doesn't matter to the
higher level code. The driver should implement per-domain ops as
required to get the different behaviors.
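
Concretely, the shape being suggested is roughly this (a sketch only; the
set_dev_pasid op name comes from the paragraph above, and the two intel_*
functions are hypothetical):

static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
				   struct device *dev, ioasid_t pasid);
static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
				     struct device *dev, ioasid_t pasid);

static const struct iommu_domain_ops intel_sva_domain_ops = {
	.set_dev_pasid	= intel_svm_set_dev_pasid,
};

static const struct iommu_domain_ops intel_default_domain_ops = {
	.set_dev_pasid	= intel_iommu_set_dev_pasid,
};

Both implementations can then funnel into one helper that programs the
PASID entry.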

Jason


Re: [PATCH 1/4] iommu/mediatek: Use dev_err_probe to mute probe_defer err log

2022-05-11 Thread Guenter Roeck via iommu
On Tue, May 10, 2022 at 11:49 PM Yong Wu  wrote:
>
> Mute the probe defer log:
>
> [2.654806] mtk-iommu 14018000.iommu: mm dts parse fail(-517).
> [2.656168] mtk-iommu 1c01f000.iommu: mm dts parse fail(-517).
>
> Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE")
> Signed-off-by: Yong Wu 

Reviewed-by: Guenter Roeck 

> ---
> The Fixes tag commit-id is from linux-next.
> ---
>  drivers/iommu/mtk_iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 71b2ace74cd6..0f6ec4a4d9d4 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -1198,7 +1198,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
> if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
> ret = mtk_iommu_mm_dts_parse(dev, , data);
> if (ret) {
> -   dev_err(dev, "mm dts parse fail(%d).", ret);
> +   dev_err_probe(dev, ret, "mm dts parse fail.");
> goto out_runtime_disable;
> }
> } else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) &&
> --
> 2.18.0
>


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jacob Pan
Hi Jason,

On Wed, 11 May 2022 08:54:27 -0300, Jason Gunthorpe  wrote:

> On Tue, May 10, 2022 at 05:23:09PM -0700, Jacob Pan wrote:
> 
> > > > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > > > index 5af24befc9f1..55845a8c4f4d 100644
> > > > +++ b/include/linux/intel-iommu.h
> > > > @@ -627,6 +627,7 @@ struct device_domain_info {
> > > > struct intel_iommu *iommu; /* IOMMU used by this device */
> > > > struct dmar_domain *domain; /* pointer to domain */
> > > > struct pasid_table *pasid_table; /* pasid table */
> > > > +   ioasid_t pasid; /* DMA request with PASID */
> > > 
> > > And this seems wrong - the DMA API is not the only user of
> > > attach_dev_pasid, so there should not be any global pasid for the
> > > device.
> > >   
> > True but the attach_dev_pasid() op is domain type specific. i.e. DMA API
> > has its own attach_dev_pasid which is different than sva domain
> > attach_dev_pasid().  
> 
> Huh? The intel driver shares the same ops between UNMANAGED and DMA -
> and in general I do not think we should be putting special knowledge
> about the DMA domains in the drivers. Drivers should continue to treat
> them identically to UNMANAGED.
> 
OK, other than the SVA domain, the rest of the domain types share the same default ops.
I agree that the default ops should be the same for UNMANAGED, IDENTITY, and
DMA domain types. A minor detail is that we need to treat the IDENTITY domain
slightly differently when it comes down to PASID entry programming.

If not global, perhaps we could have a list of pasids (e.g. xarray) attached
to the device_domain_info. The TLB flush logic would just go through the
list w/o caring what the PASIDs are for. Does it make sense to you?

> > device_domain_info is only used by DMA API.  
> 
> Huh?
My mistake, I meant that device_domain_info.pasid is only used by the DMA API

>  
> > > I suspect this should be a counter of # of pasid domains attached so
> > > that the special flush logic triggers
> > >   
> > This field is only used for devTLB, so it is per domain-device. struct
> > device_domain_info is allocated per device-domain as well. Sorry, I
> > might have totally missed your point.  
> 
> You can't store a single pasid in the driver like this; since the only
> thing it does is trigger the flush logic, just count how many pasids
> are used by the device-domain and trigger a pasid flush if any pasids
> are attached
> 
Got it, will put the pasids in an xa as described above.

> > > And rely on the core code to worry about assigning only one domain per
> > > pasid - this should really be a 'set' function.  
> >
> > Yes, in this set the core code (in dma-iommu.c) only assigns one PASID
> > per DMA domain type.
> > 
> > Are you suggesting the dma-iommu API should be called
> > iommu_set_dma_pasid instead of iommu_attach_dma_pasid?  
> 
> No that API is Ok - the driver ops API should be 'set' not attach/detach
> 
Sounds good, this operation has little in common with
domain_ops.dev_attach_pasid() used by SVA domain. So I will add a new
domain_ops.dev_set_pasid()


Thanks,

Jacob


Re: [PATCH 1/4] iommu/mediatek: Use dev_err_probe to mute probe_defer err log

2022-05-11 Thread AngeloGioacchino Del Regno

Il 11/05/22 08:49, Yong Wu ha scritto:

Mute the probe defer log:

[2.654806] mtk-iommu 14018000.iommu: mm dts parse fail(-517).
[2.656168] mtk-iommu 1c01f000.iommu: mm dts parse fail(-517).

Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE")
Signed-off-by: Yong Wu 


Reviewed-by: AngeloGioacchino Del Regno 




Re: [PATCH 3/4] iommu/mediatek: Validate number of phandles associated with "mediatek,larbs"

2022-05-11 Thread AngeloGioacchino Del Regno

Il 11/05/22 08:49, Yong Wu ha scritto:

From: Guenter Roeck 

Fix the smatch warnings:
drivers/iommu/mtk_iommu.c:878 mtk_iommu_mm_dts_parse() error: uninitialized
symbol 'larbnode'.

If someone abuses the dtsi node (doesn't follow the definition in the
dt-binding), for example if "mediatek,larbs" is provided as a boolean
property, the code may crash. To fix this problem and improve code
safety, add some checks for invalid input from the dtsi, e.g. checking
that larb_nr/larbid are in the valid range, and avoiding
"mediatek,larb-id" property conflicts in the smi-larb nodes.

Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE")
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 
Signed-off-by: Guenter Roeck 
Signed-off-by: Yong Wu 


Reviewed-by: AngeloGioacchino Del Regno 




Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 03:21:31PM +0800, Baolu Lu wrote:
> On 2022/5/10 23:23, Jason Gunthorpe wrote:
> > On Tue, May 10, 2022 at 02:17:34PM +0800, Lu Baolu wrote:
> > 
> > > +/**
> > > + * iommu_sva_bind_device() - Bind a process address space to a device
> > > + * @dev: the device
> > > + * @mm: the mm to bind, caller must hold a reference to mm_users
> > > + * @drvdata: opaque data pointer to pass to bind callback
> > > + *
> > > + * Create a bond between device and address space, allowing the device to access
> > > + * the mm using the returned PASID. If a bond already exists between @device and
> > > + * @mm, it is returned and an additional reference is taken. Caller must call
> > > + * iommu_sva_unbind_device() to release each reference.
> > > + *
> > > + * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
> > > + * initialize the required SVA features.
> > > + *
> > > + * On error, returns an ERR_PTR value.
> > > + */
> > > +struct iommu_sva *
> > > +iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
> > > +{
> > > + int ret = -EINVAL;
> > > + struct iommu_sva *handle;
> > > + struct iommu_domain *domain;
> > > +
> > > + /*
> > > +  * TODO: Remove the drvdata parameter after kernel PASID support is
> > > +  * enabled for the idxd driver.
> > > +  */
> > > + if (drvdata)
> > > + return ERR_PTR(-EOPNOTSUPP);
> > 
> > Why is this being left behind? Clean up the callers too please.
> 
> Okay, let me try to.
> 
> > 
> > > + /* Allocate mm->pasid if necessary. */
> > > + ret = iommu_sva_alloc_pasid(mm, 1, (1U << dev->iommu->pasid_bits) - 1);
> > > + if (ret)
> > > + return ERR_PTR(ret);
> > > +
> > > + mutex_lock(&iommu_sva_lock);
> > > + /* Search for an existing bond. */
> > > + handle = xa_load(&dev->iommu->sva_bonds, mm->pasid);
> > > + if (handle) {
> > > + refcount_inc(&handle->users);
> > > + goto out_success;
> > > + }
> > 
> > How can there be an existing bond?
> > 
> > dev->iommu is per-device
> > 
> > The device_group_immutable_singleton() insists on a single device
> > group
> > 
> > Basically 'sva_bonds' is the same thing as the group->pasid_array.
> 
> Yes, really.
> 
> > 
> > Assuming we leave room for multi-device groups this logic should just
> > be
> > 
> > group = iommu_group_get(dev);
> > if (!group)
> > return -ENODEV;
> > 
> > mutex_lock(&group->mutex);
> > domain = xa_load(&group->pasid_array, mm->pasid);
> > if (!domain || domain->type != IOMMU_DOMAIN_SVA || domain->mm != mm)
> > domain = iommu_sva_alloc_domain(dev, mm);
> > 
> > ?
> 
> Agreed. As a helper in iommu core, how about making it more generic like
> below?

IDK, are there more users of this? AFAIK SVA is the only place that
will be auto-sharing?

> +   mutex_lock(&group->mutex);
> +   domain = xa_load(&group->pasid_array, pasid);
> +   if (domain && domain->type != type)
> +   domain = NULL;
> +   mutex_unlock(&group->mutex);
> +   iommu_group_put(group);
> +
> +   return domain;

This is bad locking, group->pasid_array values cannot be taken outside
the lock.
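
A lock-correct variant would look roughly like this (a sketch;
iommu_sva_domain_get() is a hypothetical helper that pins the domain,
e.g. by bumping the refcount discussed below):

static void iommu_sva_domain_get(struct iommu_domain *domain);	/* hypothetical */

static struct iommu_domain *
iommu_group_find_pasid_domain(struct iommu_group *group, ioasid_t pasid,
			      unsigned int type)
{
	struct iommu_domain *domain;

	mutex_lock(&group->mutex);
	domain = xa_load(&group->pasid_array, pasid);
	if (domain && domain->type == type)
		iommu_sva_domain_get(domain);	/* reference taken under the lock */
	else
		domain = NULL;
	mutex_unlock(&group->mutex);

	return domain;
}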

> > And stick the refcount in the sva_domain
> > 
> > Also, given the current arrangement it might make sense to have a
> > struct iommu_domain_sva given that no driver is wrappering this in
> > something else.
> 
> Fair enough. How about below wrapper?
> 
> +struct iommu_sva_domain {
> +   /*
> +* Common iommu domain header, *must* be put at the top
> +* of the structure.
> +*/
> +   struct iommu_domain domain;
> +   struct mm_struct *mm;
> +   struct iommu_sva bond;
> +}
>
> The refcount is wrapped in bond.

I'm still not sure that bond is necessary

But yes, something like that
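
For illustration, the accessor such a wrapper implies (a sketch; because
'domain' is the first member this is effectively a cast, which is what the
"*must* be put at the top" comment is protecting):

static inline struct iommu_sva_domain *
to_sva_domain(struct iommu_domain *domain)
{
	return container_of(domain, struct iommu_sva_domain, domain);
}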

Jason


Re: [PATCH] swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm

2022-05-11 Thread Christoph Hellwig
On Fri, Apr 29, 2022 at 04:15:38PM -0700, Stefano Stabellini wrote:
> Great! Christoph you can go ahead and pick it up in your tree if you are
> up for it.

The patch is in the dma-mapping for-next branch now:

http://git.infradead.org/users/hch/dma-mapping.git/commitdiff/62cb1ca1654b57589c582efae2748159c74ee356

There were a few smaller merge conflicts with the swiotlb refactoring.
I think everything is fine, but please take another look if possible.


[PATCH 3/3] swiotlb: use the right nslabs-derived sizes in swiotlb_init_late

2022-05-11 Thread Christoph Hellwig
nslabs can shrink when allocations or the remap don't succeed, so make
sure to use it for all sizing.  For that remove the bytes value that
can get stale and replace it with local calculations and a boolean to
indicate if the originally requested size could not be allocated.

Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
Signed-off-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 113e1e8aaca37..d6e62a6a42ceb 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -297,9 +297,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 {
struct io_tlb_mem *mem = &io_tlb_default_mem;
unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
-   unsigned long bytes;
unsigned char *vstart = NULL;
unsigned int order;
+   bool retried = false;
int rc = 0;
 
if (swiotlb_force_disable)
@@ -308,7 +308,6 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 retry:
order = get_order(nslabs << IO_TLB_SHIFT);
nslabs = SLABS_PER_PAGE << order;
-   bytes = nslabs << IO_TLB_SHIFT;
 
while ((SLABS_PER_PAGE << order) > IO_TLB_MIN_SLABS) {
vstart = (void *)__get_free_pages(gfp_mask | __GFP_NOWARN,
@@ -316,16 +315,13 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
if (vstart)
break;
order--;
+   nslabs = SLABS_PER_PAGE << order;
+   retried = true;
}
 
if (!vstart)
return -ENOMEM;
 
-   if (order != get_order(bytes)) {
-   pr_warn("only able to allocate %ld MB\n",
-   (PAGE_SIZE << order) >> 20);
-   nslabs = SLABS_PER_PAGE << order;
-   }
if (remap)
rc = remap(vstart, nslabs);
if (rc) {
@@ -334,9 +330,15 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
if (nslabs < IO_TLB_MIN_SLABS)
return rc;
+   retried = true;
goto retry;
}
 
+   if (retried) {
+   pr_warn("only able to allocate %ld MB\n",
+   (PAGE_SIZE << order) >> 20);
+   }
+
mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
get_order(array_size(sizeof(*mem->slots), nslabs)));
if (!mem->slots) {
@@ -344,7 +346,8 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
return -ENOMEM;
}
 
-   set_memory_decrypted((unsigned long)vstart, bytes >> PAGE_SHIFT);
+   set_memory_decrypted((unsigned long)vstart,
+(nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
 
swiotlb_print_info();
-- 
2.30.2



[PATCH 2/3] swiotlb: use the right nslabs value in swiotlb_init_remap

2022-05-11 Thread Christoph Hellwig
default_nslabs should only be used to initialize nslabs; after that we
need to use the local variable that can shrink when allocations or the
remap don't succeed.

Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
Signed-off-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 3e992a308c8a1..113e1e8aaca37 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -234,7 +234,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
 {
struct io_tlb_mem *mem = &io_tlb_default_mem;
unsigned long nslabs = default_nslabs;
-   size_t alloc_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), nslabs));
+   size_t alloc_size;
size_t bytes;
void *tlb;
 
@@ -249,7 +249,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
 * memory encryption.
 */
 retry:
-   bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
+   bytes = PAGE_ALIGN(nslabs << IO_TLB_SHIFT);
if (flags & SWIOTLB_ANY)
tlb = memblock_alloc(bytes, PAGE_SIZE);
else
@@ -269,12 +269,13 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
goto retry;
}
 
+   alloc_size = PAGE_ALIGN(array_size(sizeof(*mem->slots), nslabs));
mem->slots = memblock_alloc(alloc_size, PAGE_SIZE);
if (!mem->slots)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
 
-   swiotlb_init_io_tlb_mem(mem, __pa(tlb), default_nslabs, false);
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
mem->force_bounce = flags & SWIOTLB_FORCE;
 
if (flags & SWIOTLB_VERBOSE)
-- 
2.30.2



[PATCH 1/3] swiotlb: don't panic when the swiotlb buffer can't be allocated

2022-05-11 Thread Christoph Hellwig
For historical reasons the swiotlb code panicked when the metadata could
not be allocated, but just printed a warning when the actual main
swiotlb buffer could not be allocated.  Restore this somewhat unexpected
behavior as changing it caused a boot failure on the Microchip RISC-V
PolarFire SoC Icicle kit.

Fixes: 6424e31b1c05 ("swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl")
Reported-by: Conor Dooley 
Signed-off-by: Christoph Hellwig 
Tested-by: Conor Dooley 
---
 kernel/dma/swiotlb.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e2ef0864eb1e5..3e992a308c8a1 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -254,8 +254,10 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
tlb = memblock_alloc(bytes, PAGE_SIZE);
else
tlb = memblock_alloc_low(bytes, PAGE_SIZE);
-   if (!tlb)
-   panic("%s: failed to allocate tlb structure\n", __func__);
+   if (!tlb) {
+   pr_warn("%s: failed to allocate tlb structure\n", __func__);
+   return;
+   }
 
if (remap && remap(tlb, nslabs) < 0) {
memblock_free(tlb, PAGE_ALIGN(bytes));
-- 
2.30.2



swiotlb regression fixes

2022-05-11 Thread Christoph Hellwig
Hi all,

attached are a bunch of fixes for regressions in the recent swiotlb
refactoring.  The first one was reported by Conor, and the other two
are things I found by code inspection while trying to fix what he
reported.


Re: [PATCH RFC 10/12] iommufd: Add kAPI toward external drivers

2022-05-11 Thread Yi Liu



On 2022/3/19 01:27, Jason Gunthorpe wrote:


+
+/**
+ * iommufd_device_attach - Connect a device to an iommu_domain
+ * @idev: device to attach
+ * @pt_id: Input a IOMMUFD_OBJ_IOAS, or IOMMUFD_OBJ_HW_PAGETABLE
+ * Output the IOMMUFD_OBJ_HW_PAGETABLE ID
+ * @flags: Optional flags
+ *
+ * This connects the device to an iommu_domain, either automatically or manually
+ * selected. Once this completes the device could do DMA.
+ *
+ * The caller should return the resulting pt_id back to userspace.
+ * This function is undone by calling iommufd_device_detach().
+ */
+int iommufd_device_attach(struct iommufd_device *idev, u32 *pt_id,
+ unsigned int flags)
+{
+   struct iommufd_hw_pagetable *hwpt;
+   int rc;
+
+   refcount_inc(&idev->obj.users);
+
+   hwpt = iommufd_hw_pagetable_from_id(idev->ictx, *pt_id, idev->dev);
+   if (IS_ERR(hwpt)) {
+   rc = PTR_ERR(hwpt);
+   goto out_users;
+   }
+
+   mutex_lock(&hwpt->devices_lock);
+   /* FIXME: Use a device-centric iommu api. For now check if the
+* hw_pagetable already has a device of the same group joined to tell if
+* we are the first and need to attach the group. */
+   if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
+   phys_addr_t sw_msi_start = 0;
+
+   rc = iommu_attach_group(hwpt->domain, idev->group);
+   if (rc)
+   goto out_unlock;
+
+   /*
+* hwpt is now the exclusive owner of the group so this is the
+* first time enforce is called for this group.
+*/
+   rc = iopt_table_enforce_group_resv_regions(
+   &hwpt->ioas->iopt, idev->group, &sw_msi_start);
+   if (rc)
+   goto out_detach;
+   rc = iommufd_device_setup_msi(idev, hwpt, sw_msi_start, flags);
+   if (rc)
+   goto out_iova;
+   }
+
+   idev->hwpt = hwpt;


could the below list_empty check be moved into the above "if branch"? If
the above "if branch" is false, that means there is already a group attached
to hwpt->domain, so hwpt->devices should be non-empty. Only if the above
"if branch" is true can hwpt->devices possibly be empty.

So moving it into the above "if branch" may be better? (a rough sketch
follows the quoted code below)


+   if (list_empty(&hwpt->devices)) {
+   rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
+   if (rc)
+   goto out_iova;
+   }
+   list_add(&idev->devices_item, &hwpt->devices);
+   mutex_unlock(&hwpt->devices_lock);
+
+   *pt_id = idev->hwpt->obj.id;
+   return 0;
+
+out_iova:
+   iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->group);
+out_detach:
+   iommu_detach_group(hwpt->domain, idev->group);
+out_unlock:
+   mutex_unlock(&hwpt->devices_lock);
+   iommufd_hw_pagetable_put(idev->ictx, hwpt);
+out_users:
+   refcount_dec(&idev->obj.users);
+   return rc;
+}
+EXPORT_SYMBOL_GPL(iommufd_device_attach);
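
Roughly, the rearrangement suggested above (a sketch over the quoted code,
with the unchanged attach/MSI part elided):

	if (!iommufd_hw_pagetable_has_group(hwpt, idev->group)) {
		/* ... iommu_attach_group(), resv regions, MSI setup ... */

		/* first group on this hwpt, only here can the list be empty */
		if (list_empty(&hwpt->devices)) {
			rc = iopt_table_add_domain(&hwpt->ioas->iopt,
						   hwpt->domain);
			if (rc)
				goto out_iova;
		}
	}

	idev->hwpt = hwpt;
	list_add(&idev->devices_item, &hwpt->devices);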


--
Regards,
Yi Liu


Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 08:54:39AM +0100, Jean-Philippe Brucker wrote:
> > > > Then 'detach pasid' is:
> > > >
> > > > iommu_ops->blocking_domain->ops->attach_dev_pasid(domain, dev, pasid);
> > > >
> > > > And we move away from the notion of 'detach' and in the direction that
> > > > everything continuously has a domain set. PASID would logically
> > > > default to blocking_domain, though we wouldn't track this anywhere.
> > > 
> > > I am not sure whether we still need to keep the blocking domain concept
> > > when we are entering the new PASID world. Please allow me to wait and
> > > listen to more opinions.
> > > 
> > 
> > I'm with Jason on this direction. In concept after a PASID is detached it's
> > essentially blocked. Implementation-wise it doesn't prevent the iommu
> > driver from marking the PASID entry as non-present as doing in this
> > series instead of actually pointing to the empty page table of the block
> > domain. But api-wise it does make the entire semantics more consistent.
> 
> This is all internal to IOMMU so I don't think we should be concerned
> about API consistency. I prefer a straighforward detach() operation
> because that way IOMMU drivers don't have to keep track of which domain is
> attached to which PASID. That code can be factored into the IOMMU core.

Why would a driver need to keep additional tracking?

> In addition to clearing contexts, detach() also needs to invalidate TLBs,
> and for that the SMMU driver needs to know the old ASID (!= PASID) that
> was used by the context descriptor. We can certainly work around a missing
> detach() to implement this, but it will be convoluted.

It is not "missing" it is just renamed to blocking_domain->ops->set_dev_pasid()

The implementation of that function would be identical to
detach_dev_pasid.
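
In that scheme, 'detach' would read roughly like this (a sketch;
ops->blocking_domain is the pointer proposed earlier in the thread and
set_dev_pasid is the renamed op):

static void iommu_detach_dev_pasid(struct device *dev, ioasid_t pasid)
{
	struct iommu_domain *blocked = dev_iommu_ops(dev)->blocking_domain;

	blocked->ops->set_dev_pasid(blocked, dev, pasid);
}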

Jason


Re: [PATCH v6 02/12] iommu: Add pasid_bits field in struct dev_iommu

2022-05-11 Thread Jason Gunthorpe via iommu
On Wed, May 11, 2022 at 09:00:50AM +0100, Jean-Philippe Brucker wrote:

> > /**
> >  * struct iommu_device - IOMMU core representation of one IOMMU hardware
> >  *   instance
> >  * @list: Used by the iommu-core to keep a list of registered iommus
> >  * @ops: iommu-ops for talking to this iommu
> >  * @dev: struct device for sysfs handling
> >  */
> > struct iommu_device {
> > struct list_head list;
> > const struct iommu_ops *ops;
> > struct fwnode_handle *fwnode;
> > struct device *dev;
> > };
> > 
> > I haven't checked ARM code yet, but it works for x86 as far as I can
> > see.
> 
> Arm also supports non-PCI PASID by reading a firmware property:
> 
> device_property_read_u32(dev, "pasid-num-bits", &master->ssid_bits);
> 
> should be the only difference

That is not "ARM" that is generic DT/ACPI for platform devices and
should be handled by the core code in the same place it does PCI
discovery.
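
Something like this in the core probe path would cover both buses (a
sketch; the helper name and placement are hypothetical):

static u32 iommu_probe_device_pasid_bits(struct device *dev)
{
	u32 bits = 0;

	if (dev_is_pci(dev)) {
		int max = pci_max_pasids(to_pci_dev(dev));

		if (max > 0)
			bits = fls(max) - 1;
	} else {
		/* generic DT/ACPI property, not ARM-specific */
		device_property_read_u32(dev, "pasid-num-bits", &bits);
	}

	return bits;
}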

Jason


Re: [PATCH v3 1/4] iommu/vt-d: Implement domain ops for attach_dev_pasid

2022-05-11 Thread Jason Gunthorpe via iommu
On Tue, May 10, 2022 at 05:23:09PM -0700, Jacob Pan wrote:

> > > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > > index 5af24befc9f1..55845a8c4f4d 100644
> > > +++ b/include/linux/intel-iommu.h
> > > @@ -627,6 +627,7 @@ struct device_domain_info {
> > >   struct intel_iommu *iommu; /* IOMMU used by this device */
> > >   struct dmar_domain *domain; /* pointer to domain */
> > >   struct pasid_table *pasid_table; /* pasid table */
> > > + ioasid_t pasid; /* DMA request with PASID */  
> > 
> > And this seems wrong - the DMA API is not the only user of
> > attach_dev_pasid, so there should not be any global pasid for the
> > device.
> > 
> True but the attach_dev_pasid() op is domain type specific. i.e. DMA API
> has its own attach_dev_pasid which is different than sva domain
> attach_dev_pasid().

Huh? The intel driver shares the same ops between UNMANAGED and DMA -
and in general I do not think we should be putting special knowledge
about the DMA domains in the drivers. Drivers should continue to treat
them identically to UNMANAGED.

> device_domain_info is only used by DMA API.

Huh?
 
> > I suspect this should be a counter of # of pasid domains attached so
> > that the special flush logic triggers
> > 
> This field is only used for devTLB, so it is per domain-device. struct
> device_domain_info is allocated per device-domain as well. Sorry, I might
> have totally missed your point.

You can't store a single pasid in the driver like this; since the only
thing it does is trigger the flush logic, just count how many pasids
are used by the device-domain and trigger a pasid flush if any pasids
are attached
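
A counting sketch (the field and the two flush helpers are hypothetical):

struct device_domain_info_sketch {
	unsigned int pasid_count;	/* bumped/dropped by set_dev_pasid() */
};

static void flush_plain_dev_iotlb(struct device_domain_info_sketch *info)
{
	/* the existing non-PASID devTLB flush */
}

static void flush_pasid_dev_iotlb(struct device_domain_info_sketch *info)
{
	/* flush the PASID-tagged devTLB entries */
}

static void flush_dev_iotlb(struct device_domain_info_sketch *info)
{
	flush_plain_dev_iotlb(info);
	if (info->pasid_count)		/* any PASIDs attached at all? */
		flush_pasid_dev_iotlb(info);
}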

> > And rely on the core code to worry about assigning only one domain per
> > pasid - this should really be a 'set' function.
>
> Yes, in this set the core code (in dma-iommu.c) only assigns one PASID per
> DMA domain type.
> 
> Are you suggesting the dma-iommu API should be called
> iommu_set_dma_pasid instead of iommu_attach_dma_pasid?

No that API is Ok - the driver ops API should be 'set' not attach/detach

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 1/7] iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity

2022-05-11 Thread John Garry via iommu

On 07/04/2022 13:58, Yicong Yang wrote:

The DMA operations of HiSilicon PTT device can only work properly with
identical mappings. So add a quirk for the device to force the domain


I'm not sure if you meant to write "identity mappings".


as passthrough.

Signed-off-by: Yicong Yang 


FWIW,

Reviewed-by: John Garry 


---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 
  1 file changed, 16 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 627a3ed5ee8f..5ec15ae2a9b1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2839,6 +2839,21 @@ static int arm_smmu_dev_disable_feature(struct device *dev,
}
  }
  
+#define IS_HISI_PTT_DEVICE(pdev)	((pdev)->vendor == PCI_VENDOR_ID_HUAWEI && \
+					 (pdev)->device == 0xa12e)
+
+static int arm_smmu_def_domain_type(struct device *dev)
+{
+   if (dev_is_pci(dev)) {
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   if (IS_HISI_PTT_DEVICE(pdev))
+   return IOMMU_DOMAIN_IDENTITY;
+   }
+
+   return 0;
+}
+
  static struct iommu_ops arm_smmu_ops = {
.capable= arm_smmu_capable,
.domain_alloc   = arm_smmu_domain_alloc,
@@ -2856,6 +2871,7 @@ static struct iommu_ops arm_smmu_ops = {
.sva_unbind = arm_smmu_sva_unbind,
.sva_get_pasid  = arm_smmu_sva_get_pasid,
.page_response  = arm_smmu_page_response,
+   .def_domain_type= arm_smmu_def_domain_type,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
.owner  = THIS_MODULE,
.default_domain_ops = &(const struct iommu_domain_ops) {




Re: [PATCH v2] drm/tegra: Stop using iommu_present()

2022-05-11 Thread Dmitry Osipenko
On 5/4/22 14:52, Robin Murphy wrote:
> On 2022-05-04 01:52, Dmitry Osipenko wrote:
>> On 4/11/22 16:46, Robin Murphy wrote:
>>> @@ -1092,6 +1092,19 @@ static bool host1x_drm_wants_iommu(struct
>>> host1x_device *dev)
>>>   struct host1x *host1x = dev_get_drvdata(dev->dev.parent);
>>>   struct iommu_domain *domain;
>>>   +    /* For starters, this is moot if no IOMMU is available */
>>> +    if (!device_iommu_mapped(&dev->dev))
>>> +    return false;
>>
>> Unfortunately this returns false on T30 with enabled IOMMU because we
>> don't use IOMMU for Host1x on T30 [1] to optimize performance. We can't
>> change it until we will update drivers to support Host1x-dedicated
>> buffers.
> 
> Huh, so is dev->dev here not the DRM device? If it is, and
> device_iommu_mapped() returns false, then the later iommu_attach_group()
> call is going to fail anyway, so there's not much point allocating a
> domain. If it's not, then what the heck is host1x_drm_wants_iommu()
> actually testing for?

The dev->dev is the host1x device and it's the DRM device.

The iommu_attach_group() is called for the DRM sub-devices (clients in
the Tegra driver), which are the devices sitting on the host1x bus.

There is no single GPU device on Tegra, instead it's composed of
independent GPU engines and display controllers that are connected to
the host1x bus.

Host1x also has channel DMA engines that are used by DRM driver. We
don't have dedicated devices for the host1x DMA, there is single host1x
driver that manages host1x bus and DMA.

-- 
Best regards,
Dmitry

Re: [PATCH v7 5/7] perf tool: Add support for HiSilicon PCIe Tune and Trace device driver

2022-05-11 Thread James Clark


On 11/05/2022 03:02, liuqi (BA) wrote:
> 
> Hi James,
> 
> On 2022/5/10 18:14, James Clark wrote:
>>
>>
>> On 07/04/2022 13:58, Yicong Yang wrote:
>>> From: Qi Liu 
>>>
> [...]
>>>   struct auxtrace_record
>>>   *auxtrace_record__init(struct evlist *evlist, int *err)
>>>   {
>>> @@ -57,8 +112,12 @@ struct auxtrace_record
>>>   struct evsel *evsel;
>>>   bool found_etm = false;
>>>   struct perf_pmu *found_spe = NULL;
>>> +    struct perf_pmu *found_ptt = NULL;
>>>   struct perf_pmu **arm_spe_pmus = NULL;
>>> +    struct perf_pmu **hisi_ptt_pmus = NULL;
>>> +
>>>   int nr_spes = 0;
>>> +    int nr_ptts = 0;
>>>   int i = 0;
>>>     if (!evlist)
>>> @@ -66,13 +125,14 @@ struct auxtrace_record
>>>     cs_etm_pmu = perf_pmu__find(CORESIGHT_ETM_PMU_NAME);
>>>   arm_spe_pmus = find_all_arm_spe_pmus(&nr_spes, err);
>>> +    hisi_ptt_pmus = find_all_hisi_ptt_pmus(&nr_ptts, err);
>>>     evlist__for_each_entry(evlist, evsel) {
>>>   if (cs_etm_pmu &&
>>>   evsel->core.attr.type == cs_etm_pmu->type)
>>>   found_etm = true;
>>>   -    if (!nr_spes || found_spe)
>>> +    if ((!nr_spes || found_spe) && (!nr_ptts || found_ptt))
>>>   continue;
>>>     for (i = 0; i < nr_spes; i++) {
>>> @@ -81,11 +141,18 @@ struct auxtrace_record
>>>   break;
>>>   }
>>>   }
>>> +
>>> +    for (i = 0; i < nr_ptts; i++) {
>>> +    if (evsel->core.attr.type == hisi_ptt_pmus[i]->type) {
>>> +    found_ptt = hisi_ptt_pmus[i];
>>> +    break;
>>> +    }
>>> +    }
>>>   }
>>>   free(arm_spe_pmus);
>>>   -    if (found_etm && found_spe) {
>>> -    pr_err("Concurrent ARM Coresight ETM and SPE operation not 
>>> currently supported\n");
>>> +    if (found_etm && found_spe && found_ptt) {
>>> +    pr_err("Concurrent ARM Coresight ETM ,SPE and HiSilicon PCIe Trace 
>>> operation not currently supported\n");
>>
>> Hi Yicong,
>>
>> Is that actually a limitation? I don't see why they couldn't work 
>> concurrently.
> 
> As Leo said, the logic here should be like this:
> 
>     int auxtrace_event_cnt = 0;
>     if (found_etm)
>     auxtrace_event_cnt++;
>     if (found_spe)
>     auxtrace_event_cnt++;
>     if (found_ptt)
>     auxtrace_event_cnt++;
> 
>     if (auxtrace_event_cnt > 1) {
>     pr_err("Concurrent AUX trace operation isn't supported: found 
> etm %d spe %d ptt %d\n",
>    found_etm, found_spe, found_ptt);
>     *err = -EOPNOTSUPP;
>     return NULL;
>     }
> 
> which means perf doesn't allow more than one auxtrace event recording at the 
> same time.

Oh I see that the limitation is actually in perf when decoding the data. I 
thought it meant
that it wasn't possible to open multiple aux events at the same time, which I 
think should
work in theory. Makes sense.

> 
>>
>>
>>>   *err = -EOPNOTSUPP;
>>>   return NULL;
>>>   }
>>> @@ -96,6 +163,9 @@ struct auxtrace_record
>>>   #if defined(__aarch64__)
>>>   if (found_spe)
>>>   return arm_spe_recording_init(err, found_spe);
>>> +
>>> +    if (found_ptt)
>>> +    return hisi_ptt_recording_init(err, found_ptt);
>>>   #endif
>>>     /*
> 
> [...]
>>> +
>>> +static int hisi_ptt_recording_options(struct auxtrace_record *itr,
>>> +  struct evlist *evlist,
>>> +  struct record_opts *opts)
>>> +{
>>> +    struct hisi_ptt_recording *pttr =
>>> +    container_of(itr, struct hisi_ptt_recording, itr);
>>> +    struct perf_pmu *hisi_ptt_pmu = pttr->hisi_ptt_pmu;
>>> +    struct perf_cpu_map *cpus = evlist->core.cpus;
>>> +    struct evsel *evsel, *hisi_ptt_evsel = NULL;
>>> +    struct evsel *tracking_evsel;
>>> +    int err;
>>> +
>>> +    pttr->evlist = evlist;
>>> +    evlist__for_each_entry(evlist, evsel) {
>>> +    if (evsel->core.attr.type == hisi_ptt_pmu->type) {
>>> +    if (hisi_ptt_evsel) {
>>> +    pr_err("There may be only one " HISI_PTT_PMU_NAME "x 
>>> event\n");
>>> +    return -EINVAL;
>>> +    }
>>> +    evsel->core.attr.freq = 0;
>>> +    evsel->core.attr.sample_period = 1;
>>> +    hisi_ptt_evsel = evsel;
>>> +    opts->full_auxtrace = true;
>>> +    }
>>> +    }
>>> +
>>> +    err = hisi_ptt_set_auxtrace_mmap_page(opts);
>>> +    if (err)
>>> +    return err;
>>> +    /*
>>> + * To obtain the auxtrace buffer file descriptor, the auxtrace event
>>> + * must come first.
>>> + */
>>> +    evlist__to_front(evlist, hisi_ptt_evsel);
>>> +
>>> +    if (!perf_cpu_map__empty(cpus)) {
>>> +    evsel__set_sample_bit(hisi_ptt_evsel, TIME);
>>> +    evsel__set_sample_bit(hisi_ptt_evsel, CPU);
>>> +    }
>>
>> Similar to Leo's comment: CPU isn't required if it's uncore,
>> and if TIME is useful then 

Re: [PATCH 1/4] iommu/mediatek: Use dev_err_probe to mute probe_defer err log

2022-05-11 Thread Dan Carpenter
On Wed, May 11, 2022 at 02:49:17PM +0800, Yong Wu wrote:
> Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE")
> Signed-off-by: Yong Wu 
> ---
> The Fixes tag commit-id is from linux-next.

This is fine.  The commit hash will not change unless the maintainer
rebases the tree.

When maintainers rebase their trees it's their responsibility to deal
with the Fixes tags.  Often they just fold the fix into the original
commit so the issue is moot.  Stephen Rothwell checks that Fixes tags
point to a valid commit and there are probably other people who have
checks for that as well.

regards,
dan carpenter



Re: [PATCH v6 02/12] iommu: Add pasid_bits field in struct dev_iommu

2022-05-11 Thread Jean-Philippe Brucker
On Wed, May 11, 2022 at 10:25:48AM +0800, Baolu Lu wrote:
> On 2022/5/10 22:34, Jason Gunthorpe wrote:
> > On Tue, May 10, 2022 at 02:17:28PM +0800, Lu Baolu wrote:
> > 
> > >   int iommu_device_register(struct iommu_device *iommu,
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index 627a3ed5ee8f..afc63fce6107 100644
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -2681,6 +2681,8 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
> > >   smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
> > >   master->stall_enabled = true;
> > > + dev->iommu->pasid_bits = master->ssid_bits;
> > >   return >iommu;
> > >   err_free_master:
> > > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > > index 2990f80c5e08..99643f897f26 100644
> > > +++ b/drivers/iommu/intel/iommu.c
> > > @@ -4624,8 +4624,11 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
> > >   if (pasid_supported(iommu)) {
> > >   int features = pci_pasid_features(pdev);
> > > - if (features >= 0)
> > > + if (features >= 0) {
> > >   info->pasid_supported = features | 1;
> > > + dev->iommu->pasid_bits =
> > > + fls(pci_max_pasids(pdev)) - 1;
> > > + }
> > 
> > It is not very nice that both the iommu drivers have to duplicate the
> > code to read the pasid capability out of the PCI device.
> > 
> > IMHO it would make more sense for the iommu layer to report the
> > capability of its own HW block only, and for the core code to figure
> > out the master's limitation using a bus-specific approach.
> 
> Fair enough. The iommu hardware capability could be reported in
> 
> /**
>  * struct iommu_device - IOMMU core representation of one IOMMU hardware
>  *   instance
>  * @list: Used by the iommu-core to keep a list of registered iommus
>  * @ops: iommu-ops for talking to this iommu
>  * @dev: struct device for sysfs handling
>  */
> struct iommu_device {
> struct list_head list;
> const struct iommu_ops *ops;
> struct fwnode_handle *fwnode;
> struct device *dev;
> };
> 
> I haven't checked ARM code yet, but it works for x86 as far as I can
> see.

Arm also supports non-PCI PASID by reading a firmware property:

device_property_read_u32(dev, "pasid-num-bits", &master->ssid_bits);

should be the only difference

Thanks,
Jean

> 
> > 
> > It is also unfortunate that the enable/disable pasid is inside the
> > iommu driver as well - ideally the PCI driver itself would do this
> > when it knows it wants to use PASIDs.
> > 
> > The ordering interaction with ATS makes this look quite annoying
> > though. :(
> > 
> > I'm also not convinced individual IOMMU drivers should be forcing ATS
> > on, there are performance and functional implications here. Using ATS
> > or not is possibly best left as an administrator policy controlled by
> > the core code. Again we seem to have some mess.
> 
> Agreed with you. This has already been in my task list. I will start to
> solve it after the iommufd tasks.
> 
> Best regards,
> baolu


Re: [PATCH v6 03/12] iommu: Add attach/detach_dev_pasid domain ops

2022-05-11 Thread Jean-Philippe Brucker
On Wed, May 11, 2022 at 04:09:14AM +, Tian, Kevin wrote:
> > From: Baolu Lu 
> > Sent: Wednesday, May 11, 2022 10:32 AM
> > 
> > On 2022/5/10 22:02, Jason Gunthorpe wrote:
> > > On Tue, May 10, 2022 at 02:17:29PM +0800, Lu Baolu wrote:
> > >
> > >> This adds a pair of common domain ops for this purpose and adds
> > helpers
> > >> to attach/detach a domain to/from a {device, PASID}.
> > >
> > > I wonder if this should not have a detach op - after discussing with
> > > Robin we can see that detach_dev is not used in updated
> > > drivers. Instead attach_dev acts as 'set_domain'
> > >
> > > So, it would be more symmetrical if attaching a blocking_domain to the
> > > PASID was the way to 'detach'.
> > >
> > > This could be made straightforward by following the sketch I showed to
> > > have a static, global blocing_domain and providing a pointer to it in
> > > struct iommu_ops
> > >
> > > Then 'detach pasid' is:
> > >
> > > iommu_ops->blocking_domain->ops->attach_dev_pasid(domain, dev, pasid);
> > >
> > > And we move away from the notion of 'detach' and in the direction that
> > > everything continuously has a domain set. PASID would logically
> > > default to blocking_domain, though we wouldn't track this anywhere.
> > 
> > I am not sure whether we still need to keep the blocking domain concept
> > when we are entering the new PASID world. Please allow me to wait and
> > listen to more opinions.
> > 
> 
> I'm with Jason on this direction. In concept after a PASID is detached it's
> essentially blocked. Implementation-wise it doesn't prevent the iommu
> driver from marking the PASID entry as non-present as doing in this
> series instead of actually pointing to the empty page table of the block
> domain. But api-wise it does make the entire semantics more consistent.

This is all internal to IOMMU so I don't think we should be concerned
about API consistency. I prefer a straighforward detach() operation
because that way IOMMU drivers don't have to keep track of which domain is
attached to which PASID. That code can be factored into the IOMMU core.

In addition to clearing contexts, detach() also needs to invalidate TLBs,
and for that the SMMU driver needs to know the old ASID (!= PASID) that
was used by the context descriptor. We can certainly work around a missing
detach() to implement this, but it will be convoluted.

Thanks,
Jean


[PATCH v3 35/35] iommu/amd: Update amd_iommu_fault structure to include PCI seg ID

2022-05-11 Thread Vasant Hegde via iommu
Rename 'device_id' to 'sbdf' and extend it to 32-bit so that we can
pass the PCI segment ID to ppr_notifier(). Also pass the PCI segment ID
to pci_get_domain_bus_and_slot() instead of the default value.

Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 2 +-
 drivers/iommu/amd/iommu.c   | 2 +-
 drivers/iommu/amd/iommu_v2.c| 9 +
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 7cf6bc353028..328572cf6fa5 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -482,7 +482,7 @@ extern struct kmem_cache *amd_iommu_irq_cache;
 struct amd_iommu_fault {
u64 address;/* IO virtual address of the fault*/
u32 pasid;  /* Address space identifier */
-   u16 device_id;  /* Originating PCI device id */
+   u32 sbdf;   /* Originating PCI device id */
u16 tag;/* PPR tag */
u16 flags;  /* Fault flags */
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 6320f2f97d88..c95c09c56b37 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -701,7 +701,7 @@ static void iommu_handle_ppr_entry(struct amd_iommu *iommu, u64 *raw)
 
fault.address   = raw[1];
fault.pasid = PPR_PASID(raw[0]);
-   fault.device_id = PPR_DEVID(raw[0]);
+   fault.sbdf  = PCI_SEG_DEVID_TO_SBDF(iommu->pci_seg->id, PPR_DEVID(raw[0]));
fault.tag   = PPR_TAG(raw[0]);
fault.flags = PPR_FLAGS(raw[0]);
 
diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index b186d6e0..28fecc6d0e53 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -518,15 +518,16 @@ static int ppr_notifier(struct notifier_block *nb, unsigned long e, void *data)
unsigned long flags;
struct fault *fault;
bool finish;
-   u16 tag, devid;
+   u16 tag, devid, seg_id;
int ret;
 
iommu_fault = data;
tag = iommu_fault->tag & 0x1ff;
finish  = (iommu_fault->tag >> 9) & 1;
 
-   devid = iommu_fault->device_id;
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   seg_id = PCI_SBDF_TO_SEGID(iommu_fault->sbdf);
+   devid = PCI_SBDF_TO_DEVID(iommu_fault->sbdf);
+   pdev = pci_get_domain_bus_and_slot(seg_id, PCI_BUS_NUM(devid),
   devid & 0xff);
if (!pdev)
return -ENODEV;
@@ -540,7 +541,7 @@ static int ppr_notifier(struct notifier_block *nb, unsigned long e, void *data)
goto out;
}
 
-   dev_state = get_device_state(iommu_fault->device_id);
+   dev_state = get_device_state(iommu_fault->sbdf);
if (dev_state == NULL)
goto out;
 
-- 
2.27.0



[PATCH v3 34/35] iommu/amd: Update device_state structure to include PCI seg ID

2022-05-11 Thread Vasant Hegde via iommu
Rename struct device_state.devid variable to struct device_state.sbdf
and extend it to 32-bit to include the 16-bit PCI segment ID via
the helper function get_pci_sbdf_id().

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu_v2.c | 58 +++-
 1 file changed, 24 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index e56b137ceabd..b186d6e0 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -51,7 +51,7 @@ struct pasid_state {
 
 struct device_state {
struct list_head list;
-   u16 devid;
+   u32 sbdf;
atomic_t count;
struct pci_dev *pdev;
struct pasid_state **states;
@@ -83,35 +83,25 @@ static struct workqueue_struct *iommu_wq;
 
 static void free_pasid_states(struct device_state *dev_state);
 
-static u16 device_id(struct pci_dev *pdev)
-{
-   u16 devid;
-
-   devid = pdev->bus->number;
-   devid = (devid << 8) | pdev->devfn;
-
-   return devid;
-}
-
-static struct device_state *__get_device_state(u16 devid)
+static struct device_state *__get_device_state(u32 sbdf)
 {
struct device_state *dev_state;
 
list_for_each_entry(dev_state, &state_list, list) {
-   if (dev_state->devid == devid)
+   if (dev_state->sbdf == sbdf)
return dev_state;
}
 
return NULL;
 }
 
-static struct device_state *get_device_state(u16 devid)
+static struct device_state *get_device_state(u32 sbdf)
 {
struct device_state *dev_state;
unsigned long flags;
 
spin_lock_irqsave(&state_lock, flags);
-   dev_state = __get_device_state(devid);
+   dev_state = __get_device_state(sbdf);
if (dev_state != NULL)
atomic_inc(&dev_state->count);
spin_unlock_irqrestore(&state_lock, flags);
@@ -609,7 +599,7 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, u32 pasid,
struct pasid_state *pasid_state;
struct device_state *dev_state;
struct mm_struct *mm;
-   u16 devid;
+   u32 sbdf;
int ret;
 
might_sleep();
@@ -617,8 +607,8 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, u32 pasid,
if (!amd_iommu_v2_supported())
return -ENODEV;
 
-   devid = device_id(pdev);
-   dev_state = get_device_state(devid);
+   sbdf  = get_pci_sbdf_id(pdev);
+   dev_state = get_device_state(sbdf);
 
if (dev_state == NULL)
return -EINVAL;
@@ -692,15 +682,15 @@ void amd_iommu_unbind_pasid(struct pci_dev *pdev, u32 
pasid)
 {
struct pasid_state *pasid_state;
struct device_state *dev_state;
-   u16 devid;
+   u32 sbdf;
 
might_sleep();
 
if (!amd_iommu_v2_supported())
return;
 
-   devid = device_id(pdev);
-   dev_state = get_device_state(devid);
+   sbdf = get_pci_sbdf_id(pdev);
+   dev_state = get_device_state(sbdf);
if (dev_state == NULL)
return;
 
@@ -742,7 +732,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
struct iommu_group *group;
unsigned long flags;
int ret, tmp;
-   u16 devid;
+   u32 sbdf;
 
might_sleep();
 
@@ -759,7 +749,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
if (pasids <= 0 || pasids > (PASID_MASK + 1))
return -EINVAL;
 
-   devid = device_id(pdev);
+   sbdf = get_pci_sbdf_id(pdev);
 
dev_state = kzalloc(sizeof(*dev_state), GFP_KERNEL);
if (dev_state == NULL)
@@ -768,7 +758,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
spin_lock_init(&dev_state->lock);
init_waitqueue_head(&dev_state->wq);
dev_state->pdev  = pdev;
-   dev_state->devid = devid;
+   dev_state->sbdf = sbdf;
 
tmp = pasids;
for (dev_state->pasid_levels = 0; (tmp - 1) & ~0x1ff; tmp >>= 9)
@@ -806,7 +796,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
 
spin_lock_irqsave(&state_lock, flags);
 
-   if (__get_device_state(devid) != NULL) {
+   if (__get_device_state(sbdf) != NULL) {
spin_unlock_irqrestore(&state_lock, flags);
ret = -EBUSY;
goto out_free_domain;
@@ -838,16 +828,16 @@ void amd_iommu_free_device(struct pci_dev *pdev)
 {
struct device_state *dev_state;
unsigned long flags;
-   u16 devid;
+   u32 sbdf;
 
if (!amd_iommu_v2_supported())
return;
 
-   devid = device_id(pdev);
+   sbdf = get_pci_sbdf_id(pdev);
 
spin_lock_irqsave(&state_lock, flags);
 
-   dev_state = __get_device_state(devid);
+   dev_state = __get_device_state(sbdf);
if (dev_state == NULL) {
spin_unlock_irqrestore(&state_lock, flags);
return;
@@ -867,18 +857,18 @@ int 

[PATCH v3 33/35] iommu/amd: Print PCI segment ID in error log messages

2022-05-11 Thread Vasant Hegde via iommu
Print the PCI segment ID along with the BDF. Useful for debugging.
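
As a standalone illustration (not driver code) of how the widened format
renders, using the same printf specifiers as the patch; PCI_BUS_NUM,
PCI_SLOT and PCI_FUNC are redefined locally with their usual kernel
semantics, and the values are made up:

    #include <stdio.h>

    #define PCI_BUS_NUM(x)  (((x) >> 8) & 0xff)
    #define PCI_SLOT(x)     (((x) >> 3) & 0x1f)
    #define PCI_FUNC(x)     ((x) & 0x07)

    int main(void)
    {
            int seg = 0x0001, devid = 0x00a0;       /* illustrative values */

            printf("old: device=%02x:%02x.%x\n",
                   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid));
            printf("new: device=%04x:%02x:%02x.%x\n",
                   seg, PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid));
            return 0;
    }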

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/init.c  | 10 +-
 drivers/iommu/amd/iommu.c | 36 ++--
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 969b496f7e74..8483d98a1775 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1853,11 +1853,11 @@ static int __init init_iommu_all(struct 
acpi_table_header *table)
h = (struct ivhd_header *)p;
if (*p == amd_iommu_target_ivhd_type) {
 
-   DUMP_printk("device: %02x:%02x.%01x cap: %04x "
-   "seg: %d flags: %01x info %04x\n",
-   PCI_BUS_NUM(h->devid), PCI_SLOT(h->devid),
-   PCI_FUNC(h->devid), h->cap_ptr,
-   h->pci_seg, h->flags, h->info);
+   DUMP_printk("device: %04x:%02x:%02x.%01x cap: %04x "
+   "flags: %01x info %04x\n",
+   h->pci_seg, PCI_BUS_NUM(h->devid),
+   PCI_SLOT(h->devid), PCI_FUNC(h->devid),
+   h->cap_ptr, h->flags, h->info);
DUMP_printk("   mmio-addr: %016llx\n",
h->mmio_phys);
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 536dbc1d26ad..6320f2f97d88 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -496,8 +496,8 @@ static void amd_iommu_report_rmp_hw_error(struct amd_iommu 
*iommu, volatile u32
vmg_tag, spa, flags);
}
} else {
-   pr_err_ratelimited("Event logged [RMP_HW_ERROR 
device=%02x:%02x.%x, vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   pr_err_ratelimited("Event logged [RMP_HW_ERROR 
device=%04x:%02x:%02x.%x, vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
vmg_tag, spa, flags);
}
 
@@ -529,8 +529,8 @@ static void amd_iommu_report_rmp_fault(struct amd_iommu 
*iommu, volatile u32 *ev
vmg_tag, gpa, flags_rmp, flags);
}
} else {
-   pr_err_ratelimited("Event logged [RMP_PAGE_FAULT 
device=%02x:%02x.%x, vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, 
flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   pr_err_ratelimited("Event logged [RMP_PAGE_FAULT 
device=%04x:%02x:%02x.%x, vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, 
flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
vmg_tag, gpa, flags_rmp, flags);
}
 
@@ -576,8 +576,8 @@ static void amd_iommu_report_page_fault(struct amd_iommu 
*iommu,
domain_id, address, flags);
}
} else {
-   pr_err_ratelimited("Event logged [IO_PAGE_FAULT 
device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   pr_err_ratelimited("Event logged [IO_PAGE_FAULT 
device=%04x:%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
domain_id, address, flags);
}
 
@@ -620,20 +620,20 @@ static void iommu_print_event(struct amd_iommu *iommu, 
void *__evt)
 
switch (type) {
case EVENT_TYPE_ILL_DEV:
-   dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY 
device=%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY 
device=%04x:%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
pasid, address, flags);
dump_dte_entry(iommu, devid);
break;
case EVENT_TYPE_DEV_TAB_ERR:
-   dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR 
device=%02x:%02x.%x "
+   dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR 
device=%04x:%02x:%02x.%x "
"address=0x%llx flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 

[PATCH v3 32/35] iommu/amd: Add PCI segment support for ivrs_[ioapic/hpet/acpihid] commands

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

By default, the PCI segment is zero and can be omitted. To support
systems with a non-zero PCI segment ID, modify the parsing functions
to accept an optional PCI segment ID.
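
A standalone sketch (not kernel code) of the two-stage sscanf() fallback
used by the updated parsers, and the packed ID it yields for
ivrs_ioapic[10]=0001:00:14.0; the macro body is copied from the patch
below:

    #include <stdio.h>

    #define IVRS_GET_SBDF_ID(seg, bus, dev, fn) \
            (((seg & 0xffff) << 16) | ((bus & 0xff) << 8) | \
             ((dev & 0x1f) << 3) | (fn & 0x7))

    int main(void)
    {
            const char *str = "[10]=0001:00:14.0";
            unsigned int seg = 0, bus = 0, dev = 0, fn = 0;
            int id, ret;

            ret = sscanf(str, "[%d]=%x:%x.%x", &id, &bus, &dev, &fn);
            if (ret != 4)   /* no match: retry with a leading segment field */
                    ret = sscanf(str, "[%d]=%x:%x:%x.%x",
                                 &id, &seg, &bus, &dev, &fn);

            printf("id=%d sbdf=0x%08x\n",
                   id, IVRS_GET_SBDF_ID(seg, bus, dev, fn));
            return 0;
    }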

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 .../admin-guide/kernel-parameters.txt | 34 ++
 drivers/iommu/amd/init.c  | 44 ---
 2 files changed, 52 insertions(+), 26 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index f5a27f067db9..cc8f0c82ff55 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2208,23 +2208,39 @@
 
ivrs_ioapic [HW,X86-64]
Provide an override to the IOAPIC-ID<->DEVICE-ID
-   mapping provided in the IVRS ACPI table. For
-   example, to map IOAPIC-ID decimal 10 to
-   PCI device 00:14.0 write the parameter as:
+   mapping provided in the IVRS ACPI table.
+   By default, PCI segment is 0, and can be omitted.
+   For example:
+   * To map IOAPIC-ID decimal 10 to PCI device 00:14.0
+ write the parameter as:
ivrs_ioapic[10]=00:14.0
+   * To map IOAPIC-ID decimal 10 to PCI segment 0x1 and
+ PCI device 00:14.0 write the parameter as:
+   ivrs_ioapic[10]=0001:00:14.0
 
ivrs_hpet   [HW,X86-64]
Provide an override to the HPET-ID<->DEVICE-ID
-   mapping provided in the IVRS ACPI table. For
-   example, to map HPET-ID decimal 0 to
-   PCI device 00:14.0 write the parameter as:
+   mapping provided in the IVRS ACPI table.
+   By default, PCI segment is 0, and can be omitted.
+   For example:
+   * To map HPET-ID decimal 0 to PCI device 00:14.0
+ write the parameter as:
ivrs_hpet[0]=00:14.0
+   * To map HPET-ID decimal 10 to PCI segment 0x1 and
+ PCI device 00:14.0 write the parameter as:
+   ivrs_hpet[10]=0001:00:14.0
 
ivrs_acpihid[HW,X86-64]
Provide an override to the ACPI-HID:UID<->DEVICE-ID
-   mapping provided in the IVRS ACPI table. For
-   example, to map UART-HID:UID AMD0020:0 to
-   PCI device 00:14.5 write the parameter as:
+   mapping provided in the IVRS ACPI table.
+
+   For example, to map UART-HID:UID AMD0020:0 to
+   PCI segment 0x1 and PCI device ID 00:14.5,
+   write the parameter as:
+   ivrs_acpihid[0001:00:14.5]=AMD0020:0
+
+   By default, PCI segment is 0, and can be omitted.
+   For example, for PCI device 00:14.5 write the parameter as:
ivrs_acpihid[00:14.5]=AMD0020:0
 
js= [HW,JOY] Analog joystick
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index ca79637560a3..969b496f7e74 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -85,6 +85,10 @@
 #define ACPI_DEVFLAG_ATSDIS 0x1000
 
 #define LOOP_TIMEOUT   10
+
+#define IVRS_GET_SBDF_ID(seg, bus, dev, fn)	(((seg & 0xffff) << 16) | ((bus & 0xff) << 8) \
+						 | ((dev & 0x1f) << 3) | (fn & 0x7))
+
 /*
  * ACPI table definitions
  *
@@ -3287,15 +3291,17 @@ static int __init parse_amd_iommu_options(char *str)
 
 static int __init parse_ivrs_ioapic(char *str)
 {
-   unsigned int bus, dev, fn;
+   u32 seg = 0, bus, dev, fn;
int ret, id, i;
-   u16 devid;
+   u32 devid;
 
ret = sscanf(str, "[%d]=%x:%x.%x", &id, &bus, &dev, &fn);
-
if (ret != 4) {
-   pr_err("Invalid command line: ivrs_ioapic%s\n", str);
-   return 1;
+   ret = sscanf(str, "[%d]=%x:%x:%x.%x", &id, &seg, &bus, &dev, &fn);
+   if (ret != 5) {
+   pr_err("Invalid command line: ivrs_ioapic%s\n", str);
+   return 1;
+   }
}
 
if (early_ioapic_map_size == EARLY_MAP_SIZE) {
@@ -3304,7 +3310,7 @@ static int __init parse_ivrs_ioapic(char *str)
return 1;
}
 
-   devid = ((bus & 0xff) << 8) | ((dev & 0x1f) << 3) | (fn & 0x7);
+   devid = IVRS_GET_SBDF_ID(seg, bus, dev, fn);
 
cmdline_maps= true;
i   = early_ioapic_map_size++;
@@ -3317,15 

[PATCH v3 31/35] iommu/amd: Specify PCI segment ID when getting pci device

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Upcoming AMD systems can have multiple PCI segments. Hence pass the PCI
segment ID to pci_get_domain_bus_and_slot() instead of a hard-coded '0'.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/init.c  |  6 --
 drivers/iommu/amd/iommu.c | 19 ++-
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index c746b71c0dbb..ca79637560a3 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1960,7 +1960,8 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
int cap_ptr = iommu->cap_ptr;
int ret;
 
-   iommu->dev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(iommu->devid),
+   iommu->dev = pci_get_domain_bus_and_slot(iommu->pci_seg->id,
+PCI_BUS_NUM(iommu->devid),
 iommu->devid & 0xff);
if (!iommu->dev)
return -ENODEV;
@@ -2023,7 +2024,8 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
int i, j;
 
iommu->root_pdev =
-   pci_get_domain_bus_and_slot(0, iommu->dev->bus->number,
+   pci_get_domain_bus_and_slot(iommu->pci_seg->id,
+   iommu->dev->bus->number,
PCI_DEVFN(0, 0));
 
/*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index d9b23f7820a9..536dbc1d26ad 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -473,7 +473,7 @@ static void dump_command(unsigned long phys_addr)
pr_err("CMD[%d]: %08x\n", i, cmd->data[i]);
 }
 
-static void amd_iommu_report_rmp_hw_error(volatile u32 *event)
+static void amd_iommu_report_rmp_hw_error(struct amd_iommu *iommu, volatile 
u32 *event)
 {
struct iommu_dev_data *dev_data = NULL;
int devid, vmg_tag, flags;
@@ -485,7 +485,7 @@ static void amd_iommu_report_rmp_hw_error(volatile u32 
*event)
flags   = (event[1] >> EVENT_FLAGS_SHIFT) & EVENT_FLAGS_MASK;
spa = ((u64)event[3] << 32) | (event[2] & 0xFFFFFFF8);
 
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id, 
PCI_BUS_NUM(devid),
   devid & 0xff);
if (pdev)
dev_data = dev_iommu_priv_get(&pdev->dev);
@@ -505,7 +505,7 @@ static void amd_iommu_report_rmp_hw_error(volatile u32 
*event)
pci_dev_put(pdev);
 }
 
-static void amd_iommu_report_rmp_fault(volatile u32 *event)
+static void amd_iommu_report_rmp_fault(struct amd_iommu *iommu, volatile u32 
*event)
 {
struct iommu_dev_data *dev_data = NULL;
int devid, flags_rmp, vmg_tag, flags;
@@ -518,7 +518,7 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
flags = (event[1] >> EVENT_FLAGS_SHIFT) & EVENT_FLAGS_MASK;
gpa   = ((u64)event[3] << 32) | event[2];
 
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id, 
PCI_BUS_NUM(devid),
   devid & 0xff);
if (pdev)
dev_data = dev_iommu_priv_get(&pdev->dev);
@@ -544,13 +544,14 @@ static void amd_iommu_report_rmp_fault(volatile u32 
*event)
 #define IS_WRITE_REQUEST(flags)\
((flags) & EVENT_FLAG_RW)
 
-static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
+static void amd_iommu_report_page_fault(struct amd_iommu *iommu,
+   u16 devid, u16 domain_id,
u64 address, int flags)
 {
struct iommu_dev_data *dev_data = NULL;
struct pci_dev *pdev;
 
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id, 
PCI_BUS_NUM(devid),
   devid & 0xff);
if (pdev)
dev_data = dev_iommu_priv_get(&pdev->dev);
@@ -613,7 +614,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void 
*__evt)
}
 
if (type == EVENT_TYPE_IO_FAULT) {
-   amd_iommu_report_page_fault(devid, pasid, address, flags);
+   amd_iommu_report_page_fault(iommu, devid, pasid, address, 
flags);
return;
}
 
@@ -654,10 +655,10 @@ static void iommu_print_event(struct amd_iommu *iommu, 
void *__evt)
pasid, address, flags);
break;
case EVENT_TYPE_RMP_FAULT:
-   amd_iommu_report_rmp_fault(event);
+   amd_iommu_report_rmp_fault(iommu, event);
break;
case EVENT_TYPE_RMP_HW_ERR:
-   amd_iommu_report_rmp_hw_error(event);
+   

[PATCH v3 30/35] iommu/amd: Include PCI segment ID when initialize IOMMU

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Extend the current device ID variables to 32-bit to include the 16-bit
segment ID when parsing device information from the IVRS table to
initialize each IOMMU.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  2 +-
 drivers/iommu/amd/amd_iommu_types.h |  6 ++--
 drivers/iommu/amd/init.c| 56 +++--
 drivers/iommu/amd/quirks.c  |  4 +--
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index e73bd48fc716..9b7092182ca7 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -125,7 +125,7 @@ static inline int get_pci_sbdf_id(struct pci_dev *pdev)
 
 extern bool translation_pre_enabled(struct amd_iommu *iommu);
 extern bool amd_iommu_is_attach_deferred(struct device *dev);
-extern int __init add_special_device(u8 type, u8 id, u16 *devid,
+extern int __init add_special_device(u8 type, u8 id, u32 *devid,
 bool cmd_line);
 
 #ifdef CONFIG_DMI
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 0d47aac685ee..7cf6bc353028 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -740,8 +740,8 @@ struct acpihid_map_entry {
struct list_head list;
u8 uid[ACPIHID_UID_LEN];
u8 hid[ACPIHID_HID_LEN];
-   u16 devid;
-   u16 root_devid;
+   u32 devid;
+   u32 root_devid;
bool cmd_line;
struct iommu_group *group;
 };
@@ -749,7 +749,7 @@ struct acpihid_map_entry {
 struct devid_map {
struct list_head list;
u8 id;
-   u16 devid;
+   u32 devid;
bool cmd_line;
 };
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 7c81e733a3ac..c746b71c0dbb 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1146,7 +1146,7 @@ static void __init set_dev_entry_from_acpi(struct 
amd_iommu *iommu,
amd_iommu_set_rlookup_table(iommu, devid);
 }
 
-int __init add_special_device(u8 type, u8 id, u16 *devid, bool cmd_line)
+int __init add_special_device(u8 type, u8 id, u32 *devid, bool cmd_line)
 {
struct devid_map *entry;
struct list_head *list;
@@ -1183,7 +1183,7 @@ int __init add_special_device(u8 type, u8 id, u16 *devid, 
bool cmd_line)
return 0;
 }
 
-static int __init add_acpi_hid_device(u8 *hid, u8 *uid, u16 *devid,
+static int __init add_acpi_hid_device(u8 *hid, u8 *uid, u32 *devid,
  bool cmd_line)
 {
struct acpihid_map_entry *entry;
@@ -1262,7 +1262,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
 {
u8 *p = (u8 *)h;
u8 *end = p, flags = 0;
-   u16 devid = 0, devid_start = 0, devid_to = 0;
+   u16 devid = 0, devid_start = 0, devid_to = 0, seg_id;
u32 dev_i, ext_flags = 0;
bool alias = false;
struct ivhd_entry *e;
@@ -1298,6 +1298,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
 
while (p < end) {
e = (struct ivhd_entry *)p;
+   seg_id = pci_seg->id;
+
switch (e->type) {
case IVHD_DEV_ALL:
 
@@ -1308,9 +1310,9 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
break;
case IVHD_DEV_SELECT:
 
-   DUMP_printk("  DEV_SELECT\t\t\t devid: %02x:%02x.%x "
+   DUMP_printk("  DEV_SELECT\t\t\t devid: 
%04x:%02x:%02x.%x "
"flags: %02x\n",
-   PCI_BUS_NUM(e->devid),
+   seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
e->flags);
@@ -1321,8 +1323,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
case IVHD_DEV_SELECT_RANGE_START:
 
DUMP_printk("  DEV_SELECT_RANGE_START\t "
-   "devid: %02x:%02x.%x flags: %02x\n",
-   PCI_BUS_NUM(e->devid),
+   "devid: %04x:%02x:%02x.%x flags: %02x\n",
+   seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
e->flags);
@@ -1334,9 +1336,9 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
break;
case IVHD_DEV_ALIAS:
 
-   DUMP_printk("  DEV_ALIAS\t\t\t devid: %02x:%02x.%x "
+   DUMP_printk("  DEV_ALIAS\t\t\t devid: %04x:%02x:%02x.%x 
"
"flags: 

[PATCH v3 29/35] iommu/amd: Introduce get_device_sbdf_id() helper function

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

The current get_device_id() only provides the 16-bit PCI device ID
(i.e. BDF). With multiple PCI segment support, we need to extend the
helper function to also include the PCI segment ID.

So, introduce a new helper function get_device_sbdf_id() to replace
the current get_pci_device_id().
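
A standalone round-trip through the packing macros (the macro bodies are
copied from the series; the segment and devid values are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    #define PCI_SBDF_TO_SEGID(sbdf)     (((sbdf) >> 16) & 0xffff)
    #define PCI_SBDF_TO_DEVID(sbdf)     ((sbdf) & 0xffff)
    #define PCI_SEG_DEVID_TO_SBDF(seg, devid) \
            ((((uint32_t)(seg) & 0xffff) << 16) | ((devid) & 0xffff))

    int main(void)
    {
            uint32_t sbdf = PCI_SEG_DEVID_TO_SBDF(0x0001, 0x00a0);

            /* for small segment numbers the packed value stays in the
             * positive int range, which is what lets get_device_sbdf_id()
             * keep returning negative values for errors */
            printf("sbdf=0x%08x seg=0x%04x devid=0x%04x\n", sbdf,
                   PCI_SBDF_TO_SEGID(sbdf), PCI_SBDF_TO_DEVID(sbdf));
            return 0;
    }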

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  7 
 drivers/iommu/amd/amd_iommu_types.h |  2 +
 drivers/iommu/amd/iommu.c   | 58 ++---
 3 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 64c954e168d7..e73bd48fc716 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -115,6 +115,13 @@ void amd_iommu_domain_clr_pt_root(struct protection_domain 
*domain)
amd_iommu_domain_set_pt_root(domain, 0);
 }
 
+static inline int get_pci_sbdf_id(struct pci_dev *pdev)
+{
+   int seg = pci_domain_nr(pdev->bus);
+   u16 devid = pci_dev_id(pdev);
+
+   return PCI_SEG_DEVID_TO_SBDF(seg, devid);
+}
 
 extern bool translation_pre_enabled(struct amd_iommu *iommu);
 extern bool amd_iommu_is_attach_deferred(struct device *dev);
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index dfb1f2055f0c..0d47aac685ee 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -452,6 +452,8 @@ extern struct kmem_cache *amd_iommu_irq_cache;
 
 #define PCI_SBDF_TO_SEGID(sbdf)	(((sbdf) >> 16) & 0xffff)
 #define PCI_SBDF_TO_DEVID(sbdf)	((sbdf) & 0xffff)
+#define PCI_SEG_DEVID_TO_SBDF(seg, devid)	((((u32)(seg) & 0xffff) << 16) | \
+						 ((devid) & 0xffff))
 
 /* Make iterating over all pci segment easier */
 #define for_each_pci_segment(pci_seg) \
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 68ebbccef5c4..d9b23f7820a9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -92,13 +92,6 @@ static void detach_device(struct device *dev);
  *
  /
 
-static inline u16 get_pci_device_id(struct device *dev)
-{
-   struct pci_dev *pdev = to_pci_dev(dev);
-
-   return pci_dev_id(pdev);
-}
-
 static inline int get_acpihid_device_id(struct device *dev,
struct acpihid_map_entry **entry)
 {
@@ -119,16 +112,16 @@ static inline int get_acpihid_device_id(struct device 
*dev,
return -EINVAL;
 }
 
-static inline int get_device_id(struct device *dev)
+static inline int get_device_sbdf_id(struct device *dev)
 {
-   int devid;
+   int sbdf;
 
if (dev_is_pci(dev))
-   devid = get_pci_device_id(dev);
+   sbdf = get_pci_sbdf_id(to_pci_dev(dev));
else
-   devid = get_acpihid_device_id(dev, NULL);
+   sbdf = get_acpihid_device_id(dev, NULL);
 
-   return devid;
+   return sbdf;
 }
 
 struct dev_table_entry *get_dev_table(struct amd_iommu *iommu)
@@ -182,9 +175,11 @@ static struct amd_iommu *__rlookup_amd_iommu(u16 seg, u16 
devid)
 static struct amd_iommu *rlookup_amd_iommu(struct device *dev)
 {
u16 seg = get_device_segment(dev);
-   u16 devid = get_device_id(dev);
+   int devid = get_device_sbdf_id(dev);
 
-   return __rlookup_amd_iommu(seg, devid);
+   if (devid < 0)
+   return NULL;
+   return __rlookup_amd_iommu(seg, PCI_SBDF_TO_DEVID(devid));
 }
 
 static struct protection_domain *to_pdomain(struct iommu_domain *dom)
@@ -360,14 +355,15 @@ static bool check_device(struct device *dev)
 {
struct amd_iommu_pci_seg *pci_seg;
struct amd_iommu *iommu;
-   int devid;
+   int devid, sbdf;
 
if (!dev)
return false;
 
-   devid = get_device_id(dev);
-   if (devid < 0)
+   sbdf = get_device_sbdf_id(dev);
+   if (sbdf < 0)
return false;
+   devid = PCI_SBDF_TO_DEVID(sbdf);
 
iommu = rlookup_amd_iommu(dev);
if (!iommu)
@@ -375,7 +371,7 @@ static bool check_device(struct device *dev)
 
/* Out of our scope? */
pci_seg = iommu->pci_seg;
if ((devid & 0xffff) > pci_seg->last_bdf)
+   if (devid > pci_seg->last_bdf)
return false;
 
return true;
@@ -384,15 +380,16 @@ static bool check_device(struct device *dev)
 static int iommu_init_device(struct amd_iommu *iommu, struct device *dev)
 {
struct iommu_dev_data *dev_data;
-   int devid;
+   int devid, sbdf;
 
if (dev_iommu_priv_get(dev))
return 0;
 
-   devid = get_device_id(dev);
-   if (devid < 0)
-   return devid;
+   sbdf = get_device_sbdf_id(dev);
+   if (sbdf < 0)
+   return sbdf;
 
+   devid = 

[PATCH v3 28/35] iommu/amd: Flush up to last_bdf only

2022-05-11 Thread Vasant Hegde via iommu
Fix amd_iommu_flush_dte_all(), amd_iommu_flush_tlb_all() and
amd_iommu_flush_irt_all() to flush up to last_bdf only, now that each
PCI segment tracks its own largest device ID.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c21346e48bcd..68ebbccef5c4 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1191,8 +1191,9 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 
devid)
 static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
 {
u32 devid;
+   u16 last_bdf = iommu->pci_seg->last_bdf;
 
-   for (devid = 0; devid <= 0xffff; ++devid)
+   for (devid = 0; devid <= last_bdf; ++devid)
iommu_flush_dte(iommu, devid);
 
iommu_completion_wait(iommu);
@@ -1205,8 +1206,9 @@ static void amd_iommu_flush_dte_all(struct amd_iommu 
*iommu)
 static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
 {
u32 dom_id;
+   u16 last_bdf = iommu->pci_seg->last_bdf;
 
-   for (dom_id = 0; dom_id <= 0xffff; ++dom_id) {
+   for (dom_id = 0; dom_id <= last_bdf; ++dom_id) {
struct iommu_cmd cmd;
build_inv_iommu_pages(&cmd, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
  dom_id, 1);
@@ -1249,8 +1251,9 @@ static void iommu_flush_irt(struct amd_iommu *iommu, u16 
devid)
 static void amd_iommu_flush_irt_all(struct amd_iommu *iommu)
 {
u32 devid;
+   u16 last_bdf = iommu->pci_seg->last_bdf;
 
-   for (devid = 0; devid <= MAX_DEV_TABLE_ENTRIES; devid++)
+   for (devid = 0; devid <= last_bdf; devid++)
iommu_flush_irt(iommu, devid);
 
iommu_completion_wait(iommu);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 27/35] iommu/amd: Remove global amd_iommu_[dev_table/alias_table/last_bdf]

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Replace them with their per PCI segment equivalents. Also remove the
dev_table_size, alias_table_size and amd_iommu_last_bdf variables.
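
A standalone sketch (not kernel code; PAGE_SHIFT and get_order() are
reimplemented here for illustration) of what the per-segment sizing buys:
tbl_size() now scales with each segment's last_bdf instead of always
assuming the full 16-bit BDF space:

    #include <stdio.h>

    #define PAGE_SHIFT 12

    static int get_order(unsigned long size)
    {
            int order = 0;

            size = (size - 1) >> PAGE_SHIFT;
            while (size) {
                    order++;
                    size >>= 1;
            }
            return order;
    }

    static unsigned long tbl_size(int entry_size, int last_bdf)
    {
            unsigned shift = PAGE_SHIFT + get_order((last_bdf + 1) * entry_size);

            return 1UL << shift;
    }

    int main(void)
    {
            /* 32-byte entries: full 16-bit BDF space vs a single-bus segment */
            printf("full:  %lu bytes\n", tbl_size(32, 0xffff));
            printf("small: %lu bytes\n", tbl_size(32, 0x00ff));
            return 0;
    }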

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h | 15 -
 drivers/iommu/amd/init.c| 89 +
 drivers/iommu/amd/iommu.c   | 18 --
 3 files changed, 27 insertions(+), 95 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index ddd606daa653..dfb1f2055f0c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -830,24 +830,9 @@ struct unity_map_entry {
  * Data structures for device handling
  */
 
-/*
- * Device table used by hardware. Read and write accesses by software are
- * locked with the amd_iommu_pd_table lock.
- */
-extern struct dev_table_entry *amd_iommu_dev_table;
-
-/*
- * Alias table to find requestor ids to device ids. Not locked because only
- * read on runtime.
- */
-extern u16 *amd_iommu_alias_table;
-
 /* size of the dma_ops aperture as power of 2 */
 extern unsigned amd_iommu_aperture_order;
 
-/* largest PCI device id we expect translation requests for */
-extern u16 amd_iommu_last_bdf;
-
 /* allocation bitmap for domain ids */
 extern unsigned long *amd_iommu_pd_alloc_bitmap;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 2cdce8a3b86e..7c81e733a3ac 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -161,9 +161,6 @@ static bool amd_iommu_disabled __initdata;
 static bool amd_iommu_force_enable __initdata;
 static int amd_iommu_target_ivhd_type;
 
-u16 amd_iommu_last_bdf;/* largest PCI device id we have
-  to handle */
-
 LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
   system */
@@ -185,30 +182,12 @@ static bool amd_iommu_pc_present __read_mostly;
 
 bool amd_iommu_force_isolation __read_mostly;
 
-/*
- * Pointer to the device table which is shared by all AMD IOMMUs
- * it is indexed by the PCI device id or the HT unit id and contains
- * information about the domain the device belongs to as well as the
- * page table root pointer.
- */
-struct dev_table_entry *amd_iommu_dev_table;
-
-/*
- * The alias table is a driver specific data structure which contains the
- * mappings of the PCI device ids to the actual requestor ids on the IOMMU.
- * More than one device can share the same requestor id.
- */
-u16 *amd_iommu_alias_table;
-
 /*
  * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
  * to know which ones are already in use.
  */
 unsigned long *amd_iommu_pd_alloc_bitmap;
 
-static u32 dev_table_size; /* size of the device table */
-static u32 alias_table_size;   /* size of the alias table */
-
 enum iommu_init_state {
IOMMU_START_STATE,
IOMMU_IVRS_DETECTED,
@@ -263,16 +242,10 @@ static void init_translation_status(struct amd_iommu 
*iommu)
iommu->flags |= AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
 }
 
-static inline void update_last_devid(u16 devid)
-{
-   if (devid > amd_iommu_last_bdf)
-   amd_iommu_last_bdf = devid;
-}
-
-static inline unsigned long tbl_size(int entry_size)
+static inline unsigned long tbl_size(int entry_size, int last_bdf)
 {
unsigned shift = PAGE_SHIFT +
-get_order(((int)amd_iommu_last_bdf + 1) * entry_size);
+get_order((last_bdf + 1) * entry_size);
 
return 1UL << shift;
 }
@@ -402,10 +375,11 @@ static void iommu_set_device_table(struct amd_iommu 
*iommu)
 {
u64 entry;
u32 dev_table_size = iommu->pci_seg->dev_table_size;
+   void *dev_table = (void *)get_dev_table(iommu);
 
BUG_ON(iommu->mmio_base == NULL);
 
-   entry = iommu_virt_to_phys(amd_iommu_dev_table);
+   entry = iommu_virt_to_phys(dev_table);
entry |= (dev_table_size >> 12) - 1;
memcpy_toio(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET,
&entry, sizeof(entry));
@@ -555,14 +529,12 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
switch (dev->type) {
case IVHD_DEV_ALL:
/* Use maximum BDF value for DEV_ALL */
-   update_last_devid(0xffff);
return 0xffff;
case IVHD_DEV_SELECT:
case IVHD_DEV_RANGE_END:
case IVHD_DEV_ALIAS:
case IVHD_DEV_EXT_SELECT:
/* all the above subfield types refer to device ids */
-   update_last_devid(dev->devid);
if (dev->devid > last_devid)
last_devid = dev->devid;
 

[PATCH v3 26/35] iommu/amd: Update set_dev_entry_bit() and get_dev_entry_bit()

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Update them to take a pointer to the per PCI segment device table.

Also pass struct amd_iommu as a function parameter to
amd_iommu_apply_erratum_63(), since it is needed when setting up the DTE.
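
A standalone sketch (not driver code; bit number 96 is illustrative, not a
real DEV_ENTRY_* value) of the indexing __set_dev_entry_bit() performs
across the four 64-bit words of a device table entry:

    #include <stdint.h>
    #include <stdio.h>

    struct dev_table_entry { uint64_t data[4]; };

    static void set_dte_bit(struct dev_table_entry *dte, int bit)
    {
            int i    = (bit >> 6) & 0x03;   /* which 64-bit word */
            int _bit = bit & 0x3f;          /* bit within that word */

            dte->data[i] |= 1ULL << _bit;
    }

    int main(void)
    {
            struct dev_table_entry dte = { { 0 } };

            set_dte_bit(&dte, 96);          /* lands in data[1], bit 32 */
            printf("data[1]=0x%016llx\n", (unsigned long long)dte.data[1]);
            return 0;
    }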

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h |  2 +-
 drivers/iommu/amd/init.c  | 59 +++
 drivers/iommu/amd/iommu.c |  2 +-
 3 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 2947239700ce..64c954e168d7 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -13,7 +13,7 @@
 
 extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
-extern void amd_iommu_apply_erratum_63(u16 devid);
+extern void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid);
 extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
 extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index ca9131ab745b..2cdce8a3b86e 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -987,22 +987,37 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
 }
 
 /* sets a specific bit in the device table entry. */
-static void set_dev_entry_bit(u16 devid, u8 bit)
+static void __set_dev_entry_bit(struct dev_table_entry *dev_table,
+   u16 devid, u8 bit)
 {
int i = (bit >> 6) & 0x03;
int _bit = bit & 0x3f;
 
-   amd_iommu_dev_table[devid].data[i] |= (1UL << _bit);
+   dev_table[devid].data[i] |= (1UL << _bit);
 }
 
-static int get_dev_entry_bit(u16 devid, u8 bit)
+static void set_dev_entry_bit(struct amd_iommu *iommu, u16 devid, u8 bit)
+{
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+   return __set_dev_entry_bit(dev_table, devid, bit);
+}
+
+static int __get_dev_entry_bit(struct dev_table_entry *dev_table,
+  u16 devid, u8 bit)
 {
int i = (bit >> 6) & 0x03;
int _bit = bit & 0x3f;
 
-   return (amd_iommu_dev_table[devid].data[i] & (1UL << _bit)) >> _bit;
+   return (dev_table[devid].data[i] & (1UL << _bit)) >> _bit;
 }
 
+static int get_dev_entry_bit(struct amd_iommu *iommu, u16 devid, u8 bit)
+{
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+   return __get_dev_entry_bit(dev_table, devid, bit);
+}
 
 static bool __copy_device_table(struct amd_iommu *iommu)
 {
@@ -1121,15 +1136,15 @@ static bool copy_device_table(void)
return true;
 }
 
-void amd_iommu_apply_erratum_63(u16 devid)
+void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid)
 {
int sysmgt;
 
-   sysmgt = get_dev_entry_bit(devid, DEV_ENTRY_SYSMGT1) |
-(get_dev_entry_bit(devid, DEV_ENTRY_SYSMGT2) << 1);
+   sysmgt = get_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT1) |
+(get_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT2) << 1);
 
if (sysmgt == 0x01)
-   set_dev_entry_bit(devid, DEV_ENTRY_IW);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_IW);
 }
 
 /* Writes the specific IOMMU for a device into the rlookup table */
@@ -1146,21 +1161,21 @@ static void __init set_dev_entry_from_acpi(struct 
amd_iommu *iommu,
   u16 devid, u32 flags, u32 ext_flags)
 {
if (flags & ACPI_DEVFLAG_INITPASS)
-   set_dev_entry_bit(devid, DEV_ENTRY_INIT_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_INIT_PASS);
if (flags & ACPI_DEVFLAG_EXTINT)
-   set_dev_entry_bit(devid, DEV_ENTRY_EINT_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_EINT_PASS);
if (flags & ACPI_DEVFLAG_NMI)
-   set_dev_entry_bit(devid, DEV_ENTRY_NMI_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_NMI_PASS);
if (flags & ACPI_DEVFLAG_SYSMGT1)
-   set_dev_entry_bit(devid, DEV_ENTRY_SYSMGT1);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT1);
if (flags & ACPI_DEVFLAG_SYSMGT2)
-   set_dev_entry_bit(devid, DEV_ENTRY_SYSMGT2);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT2);
if (flags & ACPI_DEVFLAG_LINT0)
-   set_dev_entry_bit(devid, DEV_ENTRY_LINT0_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT0_PASS);
if (flags & ACPI_DEVFLAG_LINT1)
-   set_dev_entry_bit(devid, DEV_ENTRY_LINT1_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT1_PASS);
 
-   amd_iommu_apply_erratum_63(devid);
+   amd_iommu_apply_erratum_63(iommu, devid);
 
set_iommu_for_device(iommu, devid);
 }
@@ -2518,8 +2533,8 @@ static void init_device_table_dma(struct 

[PATCH v3 25/35] iommu/amd: Update (un)init_device_table_dma()

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Include struct amd_iommu_pci_seg as a function parameter since
we need to access the per PCI segment device table.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/init.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 5e8106641c5c..ca9131ab745b 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -238,7 +238,7 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
 
 static int amd_iommu_enable_interrupts(void);
 static int __init iommu_go_to_state(enum iommu_init_state state);
-static void init_device_table_dma(void);
+static void init_device_table_dma(struct amd_iommu_pci_seg *pci_seg);
 
 static bool amd_iommu_pre_enabled = true;
 
@@ -2114,6 +2114,7 @@ static void print_iommu_info(void)
 static int __init amd_iommu_init_pci(void)
 {
struct amd_iommu *iommu;
+   struct amd_iommu_pci_seg *pci_seg;
int ret;
 
for_each_iommu(iommu) {
@@ -2144,7 +2145,8 @@ static int __init amd_iommu_init_pci(void)
goto out;
}
 
-   init_device_table_dma();
+   for_each_pci_segment(pci_seg)
+   init_device_table_dma(pci_seg);
 
for_each_iommu(iommu)
iommu_flush_all_caches(iommu);
@@ -2507,9 +2509,13 @@ static int __init init_memory_definitions(struct 
acpi_table_header *table)
 /*
  * Init the device table to not allow DMA access for devices
  */
-static void init_device_table_dma(void)
+static void init_device_table_dma(struct amd_iommu_pci_seg *pci_seg)
 {
u32 devid;
+   struct dev_table_entry *dev_table = pci_seg->dev_table;
+
+   if (dev_table == NULL)
+   return;
 
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
set_dev_entry_bit(devid, DEV_ENTRY_VALID);
@@ -2517,13 +2523,17 @@ static void init_device_table_dma(void)
}
 }
 
-static void __init uninit_device_table_dma(void)
+static void __init uninit_device_table_dma(struct amd_iommu_pci_seg *pci_seg)
 {
u32 devid;
+   struct dev_table_entry *dev_table = pci_seg->dev_table;
+
+   if (dev_table == NULL)
+   return;
 
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
-   amd_iommu_dev_table[devid].data[0] = 0ULL;
-   amd_iommu_dev_table[devid].data[1] = 0ULL;
+   dev_table[devid].data[0] = 0ULL;
+   dev_table[devid].data[1] = 0ULL;
}
 }
 
@@ -3116,8 +3126,11 @@ static int __init state_next(void)
free_iommu_resources();
} else {
struct amd_iommu *iommu;
+   struct amd_iommu_pci_seg *pci_seg;
+
+   for_each_pci_segment(pci_seg)
+   uninit_device_table_dma(pci_seg);
 
-   uninit_device_table_dma();
for_each_iommu(iommu)
iommu_flush_all_caches(iommu);
}
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 24/35] iommu/amd: Update set_dte_irq_entry

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using the per PCI segment device table instead of the
global one.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 8c99e2e161aa..ebae64711691 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2729,18 +2729,20 @@ EXPORT_SYMBOL(amd_iommu_device_info);
 static struct irq_chip amd_ir_chip;
 static DEFINE_SPINLOCK(iommu_table_lock);
 
-static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
+static void set_dte_irq_entry(struct amd_iommu *iommu, u16 devid,
+ struct irq_remap_table *table)
 {
u64 dte;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
 
-   dte = amd_iommu_dev_table[devid].data[2];
+   dte = dev_table[devid].data[2];
dte &= ~DTE_IRQ_PHYS_ADDR_MASK;
dte |= iommu_virt_to_phys(table->table);
dte |= DTE_IRQ_REMAP_INTCTL;
dte |= DTE_INTTABLEN;
dte |= DTE_IRQ_REMAP_ENABLE;
 
-   amd_iommu_dev_table[devid].data[2] = dte;
+   dev_table[devid].data[2] = dte;
 }
 
 static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 
devid)
@@ -2791,7 +2793,7 @@ static void set_remap_table_entry(struct amd_iommu 
*iommu, u16 devid,
struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
pci_seg->irq_lookup_table[devid] = table;
-   set_dte_irq_entry(devid, table);
+   set_dte_irq_entry(iommu, devid, table);
iommu_flush_dte(iommu, devid);
 }
 
@@ -2807,8 +2809,7 @@ static int set_remap_table_entry_alias(struct pci_dev 
*pdev, u16 alias,
 
pci_seg = iommu->pci_seg;
pci_seg->irq_lookup_table[alias] = table;
-   set_dte_irq_entry(alias, table);
-
+   set_dte_irq_entry(iommu, alias, table);
iommu_flush_dte(pci_seg->rlookup_table[alias], alias);
 
return 0;
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 23/35] iommu/amd: Update dump_dte_entry

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using the per PCI segment device table instead of the
global one.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 6223af4ccc22..8c99e2e161aa 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -451,13 +451,13 @@ static void amd_iommu_uninit_device(struct device *dev)
  *
  /
 
-static void dump_dte_entry(u16 devid)
+static void dump_dte_entry(struct amd_iommu *iommu, u16 devid)
 {
int i;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
 
for (i = 0; i < 4; ++i)
-   pr_err("DTE[%d]: %016llx\n", i,
-   amd_iommu_dev_table[devid].data[i]);
+   pr_err("DTE[%d]: %016llx\n", i, dev_table[devid].data[i]);
 }
 
 static void dump_command(unsigned long phys_addr)
@@ -618,7 +618,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void 
*__evt)
dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY 
device=%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
pasid, address, flags);
-   dump_dte_entry(devid);
+   dump_dte_entry(iommu, devid);
break;
case EVENT_TYPE_DEV_TAB_ERR:
dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR 
device=%02x:%02x.%x "
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 22/35] iommu/amd: Update iommu_ignore_device

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using the per PCI segment device table instead of the
global one.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 678eceb808e9..6223af4ccc22 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -413,15 +413,15 @@ static int iommu_init_device(struct amd_iommu *iommu, 
struct device *dev)
 static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
 {
struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
int devid;
 
-   devid = get_device_id(dev);
+   devid = (get_device_id(dev)) & 0xffff;
if (devid < 0)
return;
 
-
pci_seg->rlookup_table[devid] = NULL;
-   memset(&amd_iommu_dev_table[devid], 0, sizeof(struct dev_table_entry));
+   memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
 
setup_aliases(iommu, dev);
 }
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 21/35] iommu/amd: Update set_dte_entry and clear_dte_entry

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using the per PCI segment data structures instead of the global
ones.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 126832ae2997..678eceb808e9 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1537,6 +1537,7 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
u64 pte_root = 0;
u64 flags = 0;
u32 old_domid;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
 
if (domain->iop.mode != PAGE_MODE_NONE)
pte_root = iommu_virt_to_phys(domain->iop.root);
@@ -1545,7 +1546,7 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
<< DEV_ENTRY_MODE_SHIFT;
pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V | DTE_FLAG_TV;
 
-   flags = amd_iommu_dev_table[devid].data[1];
+   flags = dev_table[devid].data[1];
 
if (ats)
flags |= DTE_FLAG_IOTLB;
@@ -1584,9 +1585,9 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
flags &= ~DEV_DOMID_MASK;
flags |= domain->id;
 
-   old_domid = amd_iommu_dev_table[devid].data[1] & DEV_DOMID_MASK;
-   amd_iommu_dev_table[devid].data[1]  = flags;
-   amd_iommu_dev_table[devid].data[0]  = pte_root;
+   old_domid = dev_table[devid].data[1] & DEV_DOMID_MASK;
+   dev_table[devid].data[1]  = flags;
+   dev_table[devid].data[0]  = pte_root;
 
/*
 * A kdump kernel might be replacing a domain ID that was copied from
@@ -1598,11 +1599,13 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
}
 }
 
-static void clear_dte_entry(u16 devid)
+static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
 {
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
+
/* remove entry from the device table seen by the hardware */
-   amd_iommu_dev_table[devid].data[0]  = DTE_FLAG_V | DTE_FLAG_TV;
-   amd_iommu_dev_table[devid].data[1] &= DTE_FLAG_MASK;
+   dev_table[devid].data[0]  = DTE_FLAG_V | DTE_FLAG_TV;
+   dev_table[devid].data[1] &= DTE_FLAG_MASK;
 
amd_iommu_apply_erratum_63(devid);
 }
@@ -1646,7 +1649,7 @@ static void do_detach(struct iommu_dev_data *dev_data)
/* Update data structures */
dev_data->domain = NULL;
list_del(_data->list);
-   clear_dte_entry(dev_data->devid);
+   clear_dte_entry(iommu, dev_data->devid);
clone_aliases(iommu, dev_data->dev);
 
/* Flush the DTE entry */
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 20/35] iommu/amd: Convert to use per PCI segment rlookup_table

2022-05-11 Thread Vasant Hegde via iommu
Start using the per PCI segment rlookup_table, then remove the global
amd_iommu_rlookup_table and rlookup_table_size.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  5 -
 drivers/iommu/amd/init.c| 23 ++-
 drivers/iommu/amd/iommu.c   | 19 +--
 3 files changed, 11 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 553e9910e91d..ddd606daa653 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -842,11 +842,6 @@ extern struct dev_table_entry *amd_iommu_dev_table;
  */
 extern u16 *amd_iommu_alias_table;
 
-/*
- * Reverse lookup table to find the IOMMU which translates a specific device.
- */
-extern struct amd_iommu **amd_iommu_rlookup_table;
-
 /* size of the dma_ops aperture as power of 2 */
 extern unsigned amd_iommu_aperture_order;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index ed40c7cec879..5e8106641c5c 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -200,12 +200,6 @@ struct dev_table_entry *amd_iommu_dev_table;
  */
 u16 *amd_iommu_alias_table;
 
-/*
- * The rlookup table is used to find the IOMMU which is responsible
- * for a specific device. It is also indexed by the PCI device id.
- */
-struct amd_iommu **amd_iommu_rlookup_table;
-
 /*
  * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
  * to know which ones are already in use.
@@ -214,7 +208,6 @@ unsigned long *amd_iommu_pd_alloc_bitmap;
 
 static u32 dev_table_size; /* size of the device table */
 static u32 alias_table_size;   /* size of the alias table */
-static u32 rlookup_table_size; /* size if the rlookup table */
 
 enum iommu_init_state {
IOMMU_START_STATE,
@@ -1142,7 +1135,7 @@ void amd_iommu_apply_erratum_63(u16 devid)
 /* Writes the specific IOMMU for a device into the rlookup table */
 static void __init set_iommu_for_device(struct amd_iommu *iommu, u16 devid)
 {
-   amd_iommu_rlookup_table[devid] = iommu;
+   iommu->pci_seg->rlookup_table[devid] = iommu;
 }
 
 /*
@@ -1824,7 +1817,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, 
struct ivhd_header *h,
 * Make sure IOMMU is not considered to translate itself. The IVRS
 * table tells us so, but this is a lie!
 */
-   amd_iommu_rlookup_table[iommu->devid] = NULL;
+   pci_seg->rlookup_table[iommu->devid] = NULL;
 
return 0;
 }
@@ -2782,10 +2775,6 @@ static void __init free_iommu_resources(void)
kmem_cache_destroy(amd_iommu_irq_cache);
amd_iommu_irq_cache = NULL;
 
-   free_pages((unsigned long)amd_iommu_rlookup_table,
-  get_order(rlookup_table_size));
-   amd_iommu_rlookup_table = NULL;
-
free_pages((unsigned long)amd_iommu_alias_table,
   get_order(alias_table_size));
amd_iommu_alias_table = NULL;
@@ -2924,7 +2913,6 @@ static int __init early_amd_iommu_init(void)
 
dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
-   rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE);
 
/* Device table - directly used by all IOMMUs */
ret = -ENOMEM;
@@ -2943,13 +2931,6 @@ static int __init early_amd_iommu_init(void)
if (amd_iommu_alias_table == NULL)
goto out;
 
-   /* IOMMU rlookup table - find the IOMMU for a specific device */
-   amd_iommu_rlookup_table = (void *)__get_free_pages(
-   GFP_KERNEL | __GFP_ZERO,
-   get_order(rlookup_table_size));
-   if (amd_iommu_rlookup_table == NULL)
-   goto out;
-
amd_iommu_pd_alloc_bitmap = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
get_order(MAX_DOMAIN_ID/8));
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 34909faeef76..126832ae2997 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -287,10 +287,9 @@ static void setup_aliases(struct amd_iommu *iommu, struct 
device *dev)
clone_aliases(iommu, dev);
 }
 
-static struct iommu_dev_data *find_dev_data(u16 devid)
+static struct iommu_dev_data *find_dev_data(struct amd_iommu *iommu, u16 devid)
 {
struct iommu_dev_data *dev_data;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
dev_data = search_dev_data(iommu, devid);
 
@@ -388,7 +387,7 @@ static int iommu_init_device(struct amd_iommu *iommu, 
struct device *dev)
if (devid < 0)
return devid;
 
-   dev_data = find_dev_data(devid);
+   dev_data = find_dev_data(iommu, devid);
if (!dev_data)
return -ENOMEM;
 
@@ -403,9 +402,6 @@ static int iommu_init_device(struct amd_iommu *iommu, 
struct 

[PATCH v3 19/35] iommu/amd: Update alloc_irq_table and alloc_irq_index

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Pass the amd_iommu structure as a parameter to these functions,
as it is needed to retrieve the per PCI segment tables inside them.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/iommu.c | 26 +-
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2023bb7c2c3a..34909faeef76 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2812,21 +2812,17 @@ static int set_remap_table_entry_alias(struct pci_dev 
*pdev, u16 alias,
return 0;
 }
 
-static struct irq_remap_table *alloc_irq_table(u16 devid, struct pci_dev *pdev)
+static struct irq_remap_table *alloc_irq_table(struct amd_iommu *iommu,
+  u16 devid, struct pci_dev *pdev)
 {
struct irq_remap_table *table = NULL;
struct irq_remap_table *new_table = NULL;
struct amd_iommu_pci_seg *pci_seg;
-   struct amd_iommu *iommu;
unsigned long flags;
u16 alias;
 
spin_lock_irqsave(&iommu_table_lock, flags);
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (!iommu)
-   goto out_unlock;
-
pci_seg = iommu->pci_seg;
table = pci_seg->irq_lookup_table[devid];
if (table)
@@ -2882,18 +2878,14 @@ static struct irq_remap_table *alloc_irq_table(u16 
devid, struct pci_dev *pdev)
return table;
 }
 
-static int alloc_irq_index(u16 devid, int count, bool align,
-  struct pci_dev *pdev)
+static int alloc_irq_index(struct amd_iommu *iommu, u16 devid, int count,
+  bool align, struct pci_dev *pdev)
 {
struct irq_remap_table *table;
int index, c, alignment = 1;
unsigned long flags;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
-   if (!iommu)
-   return -ENODEV;
 
-   table = alloc_irq_table(devid, pdev);
+   table = alloc_irq_table(iommu, devid, pdev);
if (!table)
return -ENODEV;
 
@@ -3265,7 +3257,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
struct irq_remap_table *table;
 
-   table = alloc_irq_table(devid, NULL);
+   table = alloc_irq_table(iommu, devid, NULL);
if (table) {
if (!table->min_index) {
/*
@@ -3285,10 +3277,10 @@ static int irq_remapping_alloc(struct irq_domain 
*domain, unsigned int virq,
   info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX) {
bool align = (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI);
 
-   index = alloc_irq_index(devid, nr_irqs, align,
+   index = alloc_irq_index(iommu, devid, nr_irqs, align,
msi_desc_to_pci_dev(info->desc));
} else {
-   index = alloc_irq_index(devid, nr_irqs, false, NULL);
+   index = alloc_irq_index(iommu, devid, nr_irqs, false, NULL);
}
 
if (index < 0) {
@@ -3414,8 +3406,8 @@ static int irq_remapping_select(struct irq_domain *d, 
struct irq_fwspec *fwspec,
 
if (devid < 0)
return 0;
+   iommu = __rlookup_amd_iommu((devid >> 16), (devid & 0xffff));
 
-   iommu = amd_iommu_rlookup_table[devid];
return iommu && iommu->ir_domain == d;
 }
 
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 18/35] iommu/amd: Update amd_irte_ops functions

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Pass the amd_iommu structure as a parameter to the amd_irte_ops
functions, since it is needed when activating/deactivating the IRTEs.
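
The shape of the refactor as a standalone sketch (not driver code; names
simplified): the callbacks now receive the owning context explicitly
instead of deriving it from a global devid-indexed array:

    #include <stdio.h>

    struct iommu;   /* opaque context, stands in for struct amd_iommu */

    struct irte_ops {
            void (*activate)(struct iommu *iommu, void *entry,
                             int devid, int index);
    };

    static void irte_activate(struct iommu *iommu, void *entry,
                              int devid, int index)
    {
            /* the context arrives as a parameter, no global lookup needed */
            printf("activate devid=0x%x index=%d (ctx=%p)\n",
                   devid, index, (void *)iommu);
    }

    int main(void)
    {
            struct irte_ops ops = { .activate = irte_activate };

            ops.activate(NULL, NULL, 0xa0, 0);
            return 0;
    }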

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  6 ++--
 drivers/iommu/amd/iommu.c   | 51 -
 2 files changed, 24 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index aee923f8ef9e..553e9910e91d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -1003,9 +1003,9 @@ struct amd_ir_data {
 
 struct amd_irte_ops {
void (*prepare)(void *, u32, bool, u8, u32, int);
-   void (*activate)(void *, u16, u16);
-   void (*deactivate)(void *, u16, u16);
-   void (*set_affinity)(void *, u16, u16, u8, u32);
+   void (*activate)(struct amd_iommu *iommu, void *, u16, u16);
+   void (*deactivate)(struct amd_iommu *iommu, void *, u16, u16);
+   void (*set_affinity)(struct amd_iommu *iommu, void *, u16, u16, u8, 
u32);
void *(*get)(struct irq_remap_table *, int);
void (*set_allocated)(struct irq_remap_table *, int);
bool (*is_allocated)(struct irq_remap_table *, int);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 935e12fb6db4..2023bb7c2c3a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2932,19 +2932,14 @@ static int alloc_irq_index(u16 devid, int count, bool 
align,
return index;
 }
 
-static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
- struct amd_ir_data *data)
+static int modify_irte_ga(struct amd_iommu *iommu, u16 devid, int index,
+ struct irte_ga *irte, struct amd_ir_data *data)
 {
bool ret;
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
unsigned long flags;
struct irte_ga *entry;
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu == NULL)
-   return -EINVAL;
-
table = get_irq_table(iommu, devid);
if (!table)
return -ENOMEM;
@@ -2976,16 +2971,12 @@ static int modify_irte_ga(u16 devid, int index, struct 
irte_ga *irte,
return 0;
 }
 
-static int modify_irte(u16 devid, int index, union irte *irte)
+static int modify_irte(struct amd_iommu *iommu,
+  u16 devid, int index, union irte *irte)
 {
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
unsigned long flags;
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu == NULL)
-   return -EINVAL;
-
table = get_irq_table(iommu, devid);
if (!table)
return -ENOMEM;
@@ -3047,49 +3038,49 @@ static void irte_ga_prepare(void *entry,
irte->lo.fields_remap.valid   = 1;
 }
 
-static void irte_activate(void *entry, u16 devid, u16 index)
+static void irte_activate(struct amd_iommu *iommu, void *entry, u16 devid, u16 
index)
 {
union irte *irte = (union irte *) entry;
 
irte->fields.valid = 1;
-   modify_irte(devid, index, irte);
+   modify_irte(iommu, devid, index, irte);
 }
 
-static void irte_ga_activate(void *entry, u16 devid, u16 index)
+static void irte_ga_activate(struct amd_iommu *iommu, void *entry, u16 devid, 
u16 index)
 {
struct irte_ga *irte = (struct irte_ga *) entry;
 
irte->lo.fields_remap.valid = 1;
-   modify_irte_ga(devid, index, irte, NULL);
+   modify_irte_ga(iommu, devid, index, irte, NULL);
 }
 
-static void irte_deactivate(void *entry, u16 devid, u16 index)
+static void irte_deactivate(struct amd_iommu *iommu, void *entry, u16 devid, 
u16 index)
 {
union irte *irte = (union irte *) entry;
 
irte->fields.valid = 0;
-   modify_irte(devid, index, irte);
+   modify_irte(iommu, devid, index, irte);
 }
 
-static void irte_ga_deactivate(void *entry, u16 devid, u16 index)
+static void irte_ga_deactivate(struct amd_iommu *iommu, void *entry, u16 
devid, u16 index)
 {
struct irte_ga *irte = (struct irte_ga *) entry;
 
irte->lo.fields_remap.valid = 0;
-   modify_irte_ga(devid, index, irte, NULL);
+   modify_irte_ga(iommu, devid, index, irte, NULL);
 }
 
-static void irte_set_affinity(void *entry, u16 devid, u16 index,
+static void irte_set_affinity(struct amd_iommu *iommu, void *entry, u16 devid, 
u16 index,
  u8 vector, u32 dest_apicid)
 {
union irte *irte = (union irte *) entry;
 
irte->fields.vector = vector;
irte->fields.destination = dest_apicid;
-   modify_irte(devid, index, irte);
+   modify_irte(iommu, devid, index, irte);
 }
 
-static void irte_ga_set_affinity(void *entry, u16 devid, u16 index,
+static void irte_ga_set_affinity(struct amd_iommu *iommu, void *entry, u16 
devid, u16 index,
 u8 vector, u32 

[PATCH v3 17/35] iommu/amd: Introduce struct amd_ir_data.iommu

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Add a pointer to struct amd_iommu to the amd_ir_data structure, which
can be used to correlate interrupt remapping data with a per-PCI-segment
interrupt remapping table.
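
A standalone sketch (not driver code; names simplified) of the pattern:
cache the owning IOMMU in the per-IRQ data at allocation time so that
teardown paths no longer need a devid-indexed global lookup:

    #include <stdio.h>

    struct iommu { int id; };

    struct ir_data {
            struct iommu *iommu;    /* cached once at allocation time */
            int devid;
    };

    static void free_irte(struct ir_data *data)
    {
            /* was: iommu = amd_iommu_rlookup_table[devid] */
            printf("free devid=0x%x via iommu %d\n",
                   data->devid, data->iommu->id);
    }

    int main(void)
    {
            struct iommu iommu = { 0 };
            struct ir_data data = { &iommu, 0xa0 };

            free_irte(&data);
            return 0;
    }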

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/iommu.c   | 34 +
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 79c44f6033e0..aee923f8ef9e 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -985,6 +985,7 @@ struct irq_2_irte {
 
 struct amd_ir_data {
u32 cached_ga_tag;
+   struct amd_iommu *iommu;
struct irq_2_irte irq_2_irte;
struct msi_msg msi_entry;
void *entry;/* Pointer to union irte or struct irte_ga */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 502e66d09c61..935e12fb6db4 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3000,16 +3000,11 @@ static int modify_irte(u16 devid, int index, union irte *irte)
return 0;
 }
 
-static void free_irte(u16 devid, int index)
+static void free_irte(struct amd_iommu *iommu, u16 devid, int index)
 {
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
unsigned long flags;
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu == NULL)
-   return;
-
table = get_irq_table(iommu, devid);
if (!table)
return;
@@ -3193,7 +3188,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data *data,
   int devid, int index, int sub_handle)
 {
	struct irq_2_irte *irte_info = &data->irq_2_irte;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+   struct amd_iommu *iommu = data->iommu;
 
if (!iommu)
return;
@@ -3334,6 +3329,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
goto out_free_data;
}
 
+   data->iommu = iommu;
irq_data->hwirq = (devid << 16) + i;
irq_data->chip_data = data;
irq_data->chip = _ir_chip;
@@ -3350,7 +3346,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
kfree(irq_data->chip_data);
}
for (i = 0; i < nr_irqs; i++)
-   free_irte(devid, index + i);
+   free_irte(iommu, devid, index + i);
 out_free_parent:
irq_domain_free_irqs_common(domain, virq, nr_irqs);
return ret;
@@ -3369,7 +3365,7 @@ static void irq_remapping_free(struct irq_domain *domain, unsigned int virq,
if (irq_data && irq_data->chip_data) {
data = irq_data->chip_data;
		irte_info = &data->irq_2_irte;
-		free_irte(irte_info->devid, irte_info->index);
+		free_irte(data->iommu, irte_info->devid, irte_info->index);
kfree(data->entry);
kfree(data);
}
@@ -3387,7 +3383,7 @@ static int irq_remapping_activate(struct irq_domain *domain,
 {
struct amd_ir_data *data = irq_data->chip_data;
	struct irq_2_irte *irte_info = &data->irq_2_irte;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[irte_info->devid];
+   struct amd_iommu *iommu = data->iommu;
struct irq_cfg *cfg = irqd_cfg(irq_data);
 
if (!iommu)
@@ -3404,7 +3400,7 @@ static void irq_remapping_deactivate(struct irq_domain *domain,
 {
struct amd_ir_data *data = irq_data->chip_data;
	struct irq_2_irte *irte_info = &data->irq_2_irte;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[irte_info->devid];
+   struct amd_iommu *iommu = data->iommu;
 
if (iommu)
iommu->irte_ops->deactivate(data->entry, irte_info->devid,
@@ -3500,12 +3496,16 @@ EXPORT_SYMBOL(amd_iommu_deactivate_guest_mode);
 static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
 {
int ret;
-   struct amd_iommu *iommu;
struct amd_iommu_pi_data *pi_data = vcpu_info;
struct vcpu_data *vcpu_pi_info = pi_data->vcpu_data;
struct amd_ir_data *ir_data = data->chip_data;
	struct irq_2_irte *irte_info = &ir_data->irq_2_irte;
-	struct iommu_dev_data *dev_data = search_dev_data(NULL, irte_info->devid);
+   struct iommu_dev_data *dev_data;
+
+   if (ir_data->iommu == NULL)
+   return -EINVAL;
+
+   dev_data = search_dev_data(ir_data->iommu, irte_info->devid);
 
/* Note:
 * This device has never been set up for guest mode.
@@ -3527,10 +3527,6 @@ static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
pi_data->is_guest_mode = false;
 

[PATCH v3 16/35] iommu/amd: Update irq_remapping_alloc to use IOMMU lookup helper function

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Update irq_remapping_alloc() to allow IOMMU rlookup using both the PCI
segment and the device ID.
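
Roughly, get_devid() now returns a segment/devid pair packed into one
integer, which is split with the PCI_SBDF_TO_* macros (added earlier in
this series) before the per-segment rlookup. A compilable sketch of just
the unpacking, with hypothetical example values:

#include <stdio.h>

#define PCI_SBDF_TO_SEGID(sbdf)	(((sbdf) >> 16) & 0xffff)
#define PCI_SBDF_TO_DEVID(sbdf)	((sbdf) & 0xffff)

int main(void)
{
	int sbdf = (1 << 16) | 0xa8;	/* segment 1, devid 00:15.0 */

	printf("seg=%#x devid=%#x\n",
	       PCI_SBDF_TO_SEGID(sbdf), PCI_SBDF_TO_DEVID(sbdf));
	return 0;
}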

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/iommu.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1485e4d4fb52..502e66d09c61 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3244,8 +3244,9 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
struct irq_alloc_info *info = arg;
struct irq_data *irq_data;
struct amd_ir_data *data = NULL;
+   struct amd_iommu *iommu;
struct irq_cfg *cfg;
-   int i, ret, devid;
+   int i, ret, devid, seg, sbdf;
int index;
 
if (!info)
@@ -3261,8 +3262,14 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI)
info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
 
-   devid = get_devid(info);
-   if (devid < 0)
+   sbdf = get_devid(info);
+   if (sbdf < 0)
+   return -EINVAL;
+
+   seg = PCI_SBDF_TO_SEGID(sbdf);
+   devid = PCI_SBDF_TO_DEVID(sbdf);
+   iommu = __rlookup_amd_iommu(seg, devid);
+   if (!iommu)
return -EINVAL;
 
ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
@@ -3271,7 +3278,6 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
 
if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
 
table = alloc_irq_table(devid, NULL);
if (table) {
@@ -3281,7 +3287,6 @@ static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
 * interrupts.
 */
table->min_index = 32;
-   iommu = amd_iommu_rlookup_table[devid];
for (i = 0; i < 32; ++i)
				iommu->irte_ops->set_allocated(table, i);
}
-- 
2.27.0



[PATCH v3 15/35] iommu/amd: Convert to use rlookup_amd_iommu helper function

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Use the rlookup_amd_iommu() helper function, which returns the IOMMU
from the per PCI segment rlookup_table.
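
The conversion is mechanical but adds a failure path: every unchecked
amd_iommu_rlookup_table[devid] access becomes a helper call whose NULL
result must be handled. A toy, self-contained model of the caller-side
change (stand-in types, not the driver's):

#include <stdio.h>
#include <stddef.h>

struct amd_iommu { int index; };
struct device { struct amd_iommu *iommu; };	/* toy stand-in */

/* may return NULL, unlike the old global-array lookup */
static struct amd_iommu *rlookup_amd_iommu(struct device *dev)
{
	return dev->iommu;
}

static int device_flush_dte(struct device *dev)
{
	struct amd_iommu *iommu = rlookup_amd_iommu(dev);

	if (!iommu)		/* the new explicit failure path */
		return -22;	/* -EINVAL */
	printf("flush DTE via IOMMU %d\n", iommu->index);
	return 0;
}

int main(void)
{
	struct amd_iommu iommu0 = { .index = 0 };
	struct device good = { &iommu0 }, bad = { NULL };

	printf("%d %d\n", device_flush_dte(&good), device_flush_dte(&bad));
	return 0;
}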

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 64 +++
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 02fb244c3c7d..1485e4d4fb52 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -229,13 +229,17 @@ static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid
 
 static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
 {
+   struct amd_iommu *iommu;
u16 devid = pci_dev_id(pdev);
 
if (devid == alias)
return 0;
 
-   amd_iommu_rlookup_table[alias] =
-   amd_iommu_rlookup_table[devid];
+	iommu = rlookup_amd_iommu(&pdev->dev);
+   if (!iommu)
+   return 0;
+
+   amd_iommu_set_rlookup_table(iommu, alias);
memcpy(amd_iommu_dev_table[alias].data,
   amd_iommu_dev_table[devid].data,
   sizeof(amd_iommu_dev_table[alias].data));
@@ -366,7 +370,7 @@ static bool check_device(struct device *dev)
if (devid > amd_iommu_last_bdf)
return false;
 
-   if (amd_iommu_rlookup_table[devid] == NULL)
+   if (rlookup_amd_iommu(dev) == NULL)
return false;
 
return true;
@@ -1270,7 +1274,9 @@ static int device_flush_iotlb(struct iommu_dev_data *dev_data,
int qdep;
 
qdep = dev_data->ats.qdep;
-	iommu = amd_iommu_rlookup_table[dev_data->devid];
+	iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return -EINVAL;
 
	build_inv_iotlb_pages(&cmd, dev_data->devid, qdep, address, size);
 
@@ -1295,7 +1301,9 @@ static int device_flush_dte(struct iommu_dev_data *dev_data)
u16 alias;
int ret;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return -EINVAL;
 
if (dev_is_pci(dev_data->dev))
pdev = to_pci_dev(dev_data->dev);
@@ -1525,8 +1533,8 @@ static void free_gcr3_table(struct protection_domain *domain)
free_page((unsigned long)domain->gcr3_tbl);
 }
 
-static void set_dte_entry(u16 devid, struct protection_domain *domain,
- bool ats, bool ppr)
+static void set_dte_entry(struct amd_iommu *iommu, u16 devid,
+ struct protection_domain *domain, bool ats, bool ppr)
 {
u64 pte_root = 0;
u64 flags = 0;
@@ -1545,8 +1553,6 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain,
flags |= DTE_FLAG_IOTLB;
 
if (ppr) {
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
if (iommu_feature(iommu, FEATURE_EPHSUP))
pte_root |= 1ULL << DEV_ENTRY_PPR;
}
@@ -1590,8 +1596,6 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain,
 * entries for the old domain ID that is being overwritten
 */
if (old_domid) {
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
amd_iommu_flush_tlb_domid(iommu, old_domid);
}
 }
@@ -1611,7 +1615,9 @@ static void do_attach(struct iommu_dev_data *dev_data,
struct amd_iommu *iommu;
bool ats;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return;
ats   = dev_data->ats.enabled;
 
/* Update data structures */
@@ -1623,7 +1629,7 @@ static void do_attach(struct iommu_dev_data *dev_data,
domain->dev_cnt += 1;
 
/* Update device table */
-   set_dte_entry(dev_data->devid, domain,
+   set_dte_entry(iommu, dev_data->devid, domain,
  ats, dev_data->iommu_v2);
clone_aliases(iommu, dev_data->dev);
 
@@ -1635,7 +1641,9 @@ static void do_detach(struct iommu_dev_data *dev_data)
struct protection_domain *domain = dev_data->domain;
struct amd_iommu *iommu;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return;
 
/* Update data structures */
dev_data->domain = NULL;
@@ -1813,13 +1821,14 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
 {
struct iommu_device *iommu_dev;
struct amd_iommu *iommu;
-   int ret, devid;
+   int ret;
 
if (!check_device(dev))
return ERR_PTR(-ENODEV);
 
-   devid = get_device_id(dev);
-   iommu = amd_iommu_rlookup_table[devid];
+   iommu = rlookup_amd_iommu(dev);
+   if (!iommu)
+   return ERR_PTR(-ENODEV);
 
if 

[PATCH v3 14/35] iommu/amd: Convert to use per PCI segment irq_lookup_table

2022-05-11 Thread Vasant Hegde via iommu
Then, remove the global irq_lookup_table.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  2 --
 drivers/iommu/amd/init.c| 19 ---
 drivers/iommu/amd/iommu.c   | 36 ++---
 3 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index cf3157ba1225..79c44f6033e0 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -444,8 +444,6 @@ struct irq_remap_table {
u32 *table;
 };
 
-extern struct irq_remap_table **irq_lookup_table;
-
 /* Interrupt remapping feature used? */
 extern bool amd_iommu_irq_remap;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 018dfd0370c6..ed40c7cec879 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -206,12 +206,6 @@ u16 *amd_iommu_alias_table;
  */
 struct amd_iommu **amd_iommu_rlookup_table;
 
-/*
- * This table is used to find the irq remapping table for a given device id
- * quickly.
- */
-struct irq_remap_table **irq_lookup_table;
-
 /*
  * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
  * to know which ones are already in use.
@@ -2785,11 +2779,6 @@ static struct syscore_ops amd_iommu_syscore_ops = {
 
 static void __init free_iommu_resources(void)
 {
-   kmemleak_free(irq_lookup_table);
-   free_pages((unsigned long)irq_lookup_table,
-  get_order(rlookup_table_size));
-   irq_lookup_table = NULL;
-
kmem_cache_destroy(amd_iommu_irq_cache);
amd_iommu_irq_cache = NULL;
 
@@ -3010,14 +2999,6 @@ static int __init early_amd_iommu_init(void)
if (alloc_irq_lookup_table(pci_seg))
goto out;
}
-
-   irq_lookup_table = (void *)__get_free_pages(
-   GFP_KERNEL | __GFP_ZERO,
-   get_order(rlookup_table_size));
-   kmemleak_alloc(irq_lookup_table, rlookup_table_size,
-  1, GFP_KERNEL);
-   if (!irq_lookup_table)
-   goto out;
}
 
ret = init_memory_definitions(ivrs_base);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 5118ade206b8..02fb244c3c7d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2730,16 +2730,18 @@ static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
amd_iommu_dev_table[devid].data[2] = dte;
 }
 
-static struct irq_remap_table *get_irq_table(u16 devid)
+static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 devid)
 {
struct irq_remap_table *table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
if (WARN_ONCE(!amd_iommu_rlookup_table[devid],
  "%s: no iommu for devid %x\n", __func__, devid))
return NULL;
 
-   table = irq_lookup_table[devid];
-   if (WARN_ONCE(!table, "%s: no table for devid %x\n", __func__, devid))
+   table = pci_seg->irq_lookup_table[devid];
+   if (WARN_ONCE(!table, "%s: no table for devid %x:%x\n",
+ __func__, pci_seg->id, devid))
return NULL;
 
return table;
@@ -2772,7 +2774,9 @@ static struct irq_remap_table *__alloc_irq_table(void)
 static void set_remap_table_entry(struct amd_iommu *iommu, u16 devid,
  struct irq_remap_table *table)
 {
-   irq_lookup_table[devid] = table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   pci_seg->irq_lookup_table[devid] = table;
set_dte_irq_entry(devid, table);
iommu_flush_dte(iommu, devid);
 }
@@ -2781,8 +2785,14 @@ static int set_remap_table_entry_alias(struct pci_dev *pdev, u16 alias,
   void *data)
 {
struct irq_remap_table *table = data;
+   struct amd_iommu_pci_seg *pci_seg;
+	struct amd_iommu *iommu = rlookup_amd_iommu(&pdev->dev);
 
-   irq_lookup_table[alias] = table;
+   if (!iommu)
+   return -EINVAL;
+
+   pci_seg = iommu->pci_seg;
+   pci_seg->irq_lookup_table[alias] = table;
set_dte_irq_entry(alias, table);
 
iommu_flush_dte(amd_iommu_rlookup_table[alias], alias);
@@ -2806,12 +2816,12 @@ static struct irq_remap_table *alloc_irq_table(u16 devid, struct pci_dev *pdev)
goto out_unlock;
 
pci_seg = iommu->pci_seg;
-   table = irq_lookup_table[devid];
+   table = pci_seg->irq_lookup_table[devid];
if (table)
goto out_unlock;
 
alias = pci_seg->alias_table[devid];
-   table = irq_lookup_table[alias];
+   table = pci_seg->irq_lookup_table[alias];
if (table) {
set_remap_table_entry(iommu, devid, table);
  

[PATCH v3 13/35] iommu/amd: Introduce per PCI segment rlookup table size

2022-05-11 Thread Vasant Hegde via iommu
It will replace global "rlookup_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c| 11 ++-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 4912e1913b96..cf3157ba1225 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -557,6 +557,9 @@ struct amd_iommu_pci_seg {
/* Size of the alias table */
u32 alias_table_size;
 
+   /* Size of the rlookup table */
+   u32 rlookup_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 0d9126a92cff..018dfd0370c6 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -670,7 +670,7 @@ static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
pci_seg->rlookup_table = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
-   get_order(rlookup_table_size));
+					get_order(pci_seg->rlookup_table_size));
if (pci_seg->rlookup_table == NULL)
return -ENOMEM;
 
@@ -680,7 +680,7 @@ static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->rlookup_table,
-  get_order(rlookup_table_size));
+  get_order(pci_seg->rlookup_table_size));
pci_seg->rlookup_table = NULL;
 }
 
@@ -688,9 +688,9 @@ static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg *pci_se
 {
pci_seg->irq_lookup_table = (void *)__get_free_pages(
 GFP_KERNEL | __GFP_ZERO,
-get_order(rlookup_table_size));
+						 get_order(pci_seg->rlookup_table_size));
kmemleak_alloc(pci_seg->irq_lookup_table,
-  rlookup_table_size, 1, GFP_KERNEL);
+  pci_seg->rlookup_table_size, 1, GFP_KERNEL);
if (pci_seg->irq_lookup_table == NULL)
return -ENOMEM;
 
@@ -701,7 +701,7 @@ static inline void free_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
kmemleak_free(pci_seg->irq_lookup_table);
free_pages((unsigned long)pci_seg->irq_lookup_table,
-  get_order(rlookup_table_size));
+  get_order(pci_seg->rlookup_table_size));
pci_seg->irq_lookup_table = NULL;
 }
 
@@ -1582,6 +1582,7 @@ static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id,
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
+   pci_seg->rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
-- 
2.27.0



[PATCH v3 12/35] iommu/amd: Introduce per PCI segment alias table size

2022-05-11 Thread Vasant Hegde via iommu
It will replace global "alias_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 3 +++
 drivers/iommu/amd/init.c| 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 6d979c4efd54..4912e1913b96 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -554,6 +554,9 @@ struct amd_iommu_pci_seg {
/* Size of the device table */
u32 dev_table_size;
 
+   /* Size of the alias table */
+   u32 alias_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 721154c3bf4d..0d9126a92cff 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -710,7 +710,7 @@ static int __init alloc_alias_table(struct amd_iommu_pci_seg *pci_seg)
int i;
 
pci_seg->alias_table = (void *)__get_free_pages(GFP_KERNEL,
-					get_order(alias_table_size));
+   get_order(pci_seg->alias_table_size));
if (!pci_seg->alias_table)
return -ENOMEM;
 
@@ -726,7 +726,7 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
 static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->alias_table,
-  get_order(alias_table_size));
+  get_order(pci_seg->alias_table_size));
pci_seg->alias_table = NULL;
 }
 
@@ -1581,6 +1581,7 @@ static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id,
pci_seg->last_bdf = last_bdf;
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
+   pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
-- 
2.27.0



[PATCH v3 11/35] iommu/amd: Introduce per PCI segment device table size

2022-05-11 Thread Vasant Hegde via iommu
With multiple PCI segment support, the number of BDFs supported by each
segment may differ. Hence introduce a per-segment device table size
which depends on last_bdf. This will replace the global
"device_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c| 18 ++
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 8b8079fdf0d4..6d979c4efd54 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -551,6 +551,9 @@ struct amd_iommu_pci_seg {
/* Largest PCI device id we expect translation requests for */
u16 last_bdf;
 
+   /* Size of the device table */
+   u32 dev_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 67a92f453731..721154c3bf4d 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -414,6 +414,7 @@ static void iommu_set_cwwb_range(struct amd_iommu *iommu)
 static void iommu_set_device_table(struct amd_iommu *iommu)
 {
u64 entry;
+   u32 dev_table_size = iommu->pci_seg->dev_table_size;
 
BUG_ON(iommu->mmio_base == NULL);
 
@@ -650,7 +651,7 @@ static int __init find_last_devid_acpi(struct acpi_table_header *table, u16 pci_
 static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
	pci_seg->dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
-						      get_order(dev_table_size));
+						      get_order(pci_seg->dev_table_size));
if (!pci_seg->dev_table)
return -ENOMEM;
 
@@ -660,7 +661,7 @@ static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
 static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->dev_table,
-   get_order(dev_table_size));
+   get_order(pci_seg->dev_table_size));
pci_seg->dev_table = NULL;
 }
 
@@ -1033,7 +1034,7 @@ static bool __copy_device_table(struct amd_iommu *iommu)
entry = (((u64) hi) << 32) + lo;
 
old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
-   if (old_devtb_size != dev_table_size) {
+   if (old_devtb_size != pci_seg->dev_table_size) {
pr_err("The device table size of IOMMU:%d is not expected!\n",
iommu->index);
return false;
@@ -1052,15 +1053,15 @@ static bool __copy_device_table(struct amd_iommu *iommu)
}
	old_devtb = (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) && is_kdump_kernel())
		    ? (__force void *)ioremap_encrypted(old_devtb_phys,
-							dev_table_size)
-		    : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+							pci_seg->dev_table_size)
+		    : memremap(old_devtb_phys, pci_seg->dev_table_size, MEMREMAP_WB);
 
if (!old_devtb)
return false;
 
gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
pci_seg->old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
-   get_order(dev_table_size));
+					get_order(pci_seg->dev_table_size));
if (pci_seg->old_dev_tbl_cpy == NULL) {
pr_err("Failed to allocate memory for copying old device 
table!\n");
memunmap(old_devtb);
@@ -1579,6 +1580,7 @@ static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id,
 
pci_seg->last_bdf = last_bdf;
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
+   pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
@@ -2674,7 +2676,7 @@ static void early_enable_iommus(void)
for_each_pci_segment(pci_seg) {
if (pci_seg->old_dev_tbl_cpy != NULL) {
			free_pages((unsigned long)pci_seg->old_dev_tbl_cpy,
-				   get_order(dev_table_size));
+				   get_order(pci_seg->dev_table_size));
pci_seg->old_dev_tbl_cpy = NULL;
}
}
@@ -2688,7 +2690,7 @@ static void early_enable_iommus(void)
 
for_each_pci_segment(pci_seg) {
free_pages((unsigned long)pci_seg->dev_table,
-  get_order(dev_table_size));
+		   get_order(pci_seg->dev_table_size));

[PATCH v3 10/35] iommu/amd: Introduce per PCI segment last_bdf

2022-05-11 Thread Vasant Hegde via iommu
Current code uses the global "amd_iommu_last_bdf" to track the last bdf
supported by the system. This value is used for various memory
allocations, device data flushing, etc.

Introduce a per PCI segment last_bdf which will be used to track the
last bdf supported by the given PCI segment, and use this value for all
per-segment memory allocations. Eventually it will replace the global
"amd_iommu_last_bdf".

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 ++
 drivers/iommu/amd/init.c| 69 ++---
 2 files changed, 45 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 574f4f414f7d..8b8079fdf0d4 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -548,6 +548,9 @@ struct amd_iommu_pci_seg {
/* PCI segment number */
u16 id;
 
+   /* Largest PCI device id we expect translation requests for */
+   u16 last_bdf;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 5cb21d43bd6f..67a92f453731 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -550,6 +550,7 @@ static int __init find_last_devid_from_ivhd(struct ivhd_header *h)
 {
u8 *p = (void *)h, *end = (void *)h;
struct ivhd_entry *dev;
+   int last_devid = -EINVAL;
 
u32 ivhd_size = get_ivhd_header_size(h);
 
@@ -567,13 +568,15 @@ static int __init find_last_devid_from_ivhd(struct ivhd_header *h)
case IVHD_DEV_ALL:
/* Use maximum BDF value for DEV_ALL */
update_last_devid(0x);
-   break;
+   return 0x;
case IVHD_DEV_SELECT:
case IVHD_DEV_RANGE_END:
case IVHD_DEV_ALIAS:
case IVHD_DEV_EXT_SELECT:
/* all the above subfield types refer to device ids */
update_last_devid(dev->devid);
+   if (dev->devid > last_devid)
+   last_devid = dev->devid;
break;
default:
break;
@@ -583,7 +586,7 @@ static int __init find_last_devid_from_ivhd(struct ivhd_header *h)
 
WARN_ON(p != end);
 
-   return 0;
+   return last_devid;
 }
 
 static int __init check_ivrs_checksum(struct acpi_table_header *table)
@@ -607,27 +610,31 @@ static int __init check_ivrs_checksum(struct acpi_table_header *table)
  * id which we need to handle. This is the first of three functions which parse
  * the ACPI table. So we check the checksum here.
  */
-static int __init find_last_devid_acpi(struct acpi_table_header *table)
+static int __init find_last_devid_acpi(struct acpi_table_header *table, u16 pci_seg)
 {
u8 *p = (u8 *)table, *end = (u8 *)table;
struct ivhd_header *h;
+   int last_devid, last_bdf = 0;
 
p += IVRS_HEADER_LENGTH;
 
end += table->length;
while (p < end) {
h = (struct ivhd_header *)p;
-   if (h->type == amd_iommu_target_ivhd_type) {
-   int ret = find_last_devid_from_ivhd(h);
-
-   if (ret)
-   return ret;
+   if (h->pci_seg == pci_seg &&
+   h->type == amd_iommu_target_ivhd_type) {
+   last_devid = find_last_devid_from_ivhd(h);
+
+   if (last_devid < 0)
+   return -EINVAL;
+   if (last_devid > last_bdf)
+   last_bdf = last_devid;
}
p += h->length;
}
WARN_ON(p != end);
 
-   return 0;
+   return last_bdf;
 }
 
 /****************************************************************************
@@ -1551,14 +1558,28 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
 }
 
 /* Allocate PCI segment data structure */
-static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
+static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id,
+ struct acpi_table_header *ivrs_base)
 {
struct amd_iommu_pci_seg *pci_seg;
+   int last_bdf;
+
+   /*
+* First parse ACPI tables to find the largest Bus/Dev/Func we need to
+* handle in this PCI segment. Upon this information the shared data
+* structures for the PCI segments in the system will be allocated.
+*/
+   last_bdf = find_last_devid_acpi(ivrs_base, id);
+   if (last_bdf < 0)
+   return NULL;
 
pci_seg = kzalloc(sizeof(struct amd_iommu_pci_seg), GFP_KERNEL);
if (pci_seg == NULL)
return NULL;
 
+	pci_seg->last_bdf = last_bdf;

[PATCH v3 09/35] iommu/amd: Introduce per PCI segment unity map list

2022-05-11 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments. In order to support
multiple PCI segments, the IVMD table in the IVRS structure is enhanced
to include a PCI segment id. Update the ivmd_header structure to include
"pci_seg".

Also introduce a per PCI segment unity map list. It will replace the
global amd_iommu_unity_map list.

Note that we have used the "reserved" field in the IVMD table to carry
the "pci_seg id", which was previously set to zero. This takes care of
backward compatibility (a new kernel will work fine on older systems).
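
Because the new u16 pci_seg plus u8 resv[6] occupy exactly the bytes of
the old u64 resv, every other field keeps its offset, and a zero-filled
reserved field from old firmware naturally reads as segment 0. A small
layout check (field order assumed from the diff below):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct ivmd_header {
	uint8_t  type;
	uint8_t  flags;
	uint16_t length;
	uint16_t devid;
	uint16_t aux;
	uint16_t pci_seg;	/* was the first bytes of "u64 resv" */
	uint8_t  resv[6];
	uint64_t range_start;
	uint64_t range_length;
} __attribute__((packed));

int main(void)
{
	_Static_assert(offsetof(struct ivmd_header, range_start) == 16,
		       "range_start must stay at byte 16");
	printf("pci_seg at offset %zu\n", offsetof(struct ivmd_header, pci_seg));
	return 0;
}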

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 13 +++--
 drivers/iommu/amd/init.c| 30 +++--
 drivers/iommu/amd/iommu.c   |  8 +++-
 3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 9534064124f9..574f4f414f7d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -583,6 +583,13 @@ struct amd_iommu_pci_seg {
 * More than one device can share the same requestor id.
 */
u16 *alias_table;
+
+   /*
+* A list of required unity mappings we find in ACPI. It is not locked
+* because as runtime it is only read. It is created at ACPI table
+* parsing time.
+*/
+   struct list_head unity_map;
 };
 
 /*
@@ -809,12 +816,6 @@ struct unity_map_entry {
int prot;
 };
 
-/*
- * List of all unity mappings. It is not locked because as runtime it is only
- * read. It is created at ACPI table parsing time.
- */
-extern struct list_head amd_iommu_unity_map;
-
 /*
  * Data structures for device handling
  */
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9d57cb05878e..5cb21d43bd6f 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -142,7 +142,8 @@ struct ivmd_header {
u16 length;
u16 devid;
u16 aux;
-   u64 resv;
+   u16 pci_seg;
+   u8  resv[6];
u64 range_start;
u64 range_length;
 } __attribute__((packed));
@@ -162,8 +163,6 @@ static int amd_iommu_target_ivhd_type;
 
 u16 amd_iommu_last_bdf;	/* largest PCI device id we have
 				   to handle */
-LIST_HEAD(amd_iommu_unity_map);	/* a list of required unity mappings
-				   we find in ACPI */
 
 LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
@@ -1562,6 +1561,7 @@ static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
 
 	pci_seg->id = id;
 	init_llist_head(&pci_seg->dev_data_list);
+	INIT_LIST_HEAD(&pci_seg->unity_map);
 	list_add_tail(&pci_seg->list, &amd_iommu_pci_seg_list);
 
if (alloc_dev_table(pci_seg))
@@ -2397,10 +2397,13 @@ static int iommu_init_irq(struct amd_iommu *iommu)
 static void __init free_unity_maps(void)
 {
struct unity_map_entry *entry, *next;
+   struct amd_iommu_pci_seg *p, *pci_seg;
 
-	list_for_each_entry_safe(entry, next, &amd_iommu_unity_map, list) {
-		list_del(&entry->list);
-		kfree(entry);
+	for_each_pci_segment_safe(pci_seg, p) {
+		list_for_each_entry_safe(entry, next, &pci_seg->unity_map, list) {
+			list_del(&entry->list);
+			kfree(entry);
+		}
}
 }
 
@@ -2408,8 +2411,13 @@ static void __init free_unity_maps(void)
 static int __init init_unity_map_range(struct ivmd_header *m)
 {
struct unity_map_entry *e = NULL;
+   struct amd_iommu_pci_seg *pci_seg;
char *s;
 
+   pci_seg = get_pci_segment(m->pci_seg);
+   if (pci_seg == NULL)
+   return -ENOMEM;
+
e = kzalloc(sizeof(*e), GFP_KERNEL);
if (e == NULL)
return -ENOMEM;
@@ -2447,14 +2455,16 @@ static int __init init_unity_map_range(struct ivmd_header *m)
if (m->flags & IVMD_FLAG_EXCL_RANGE)
e->prot = (IVMD_FLAG_IW | IVMD_FLAG_IR) >> 1;
 
-   DUMP_printk("%s devid_start: %02x:%02x.%x devid_end: %02x:%02x.%x"
-   " range_start: %016llx range_end: %016llx flags: %x\n", s,
+   DUMP_printk("%s devid_start: %04x:%02x:%02x.%x devid_end: "
+   "%04x:%02x:%02x.%x range_start: %016llx range_end: %016llx"
+   " flags: %x\n", s, m->pci_seg,
PCI_BUS_NUM(e->devid_start), PCI_SLOT(e->devid_start),
-   PCI_FUNC(e->devid_start), PCI_BUS_NUM(e->devid_end),
+   PCI_FUNC(e->devid_start), m->pci_seg,
+   PCI_BUS_NUM(e->devid_end),
PCI_SLOT(e->devid_end), PCI_FUNC(e->devid_end),
e->address_start, e->address_end, m->flags);
 
-	list_add_tail(&e->list, &amd_iommu_unity_map);
+	list_add_tail(&e->list, &pci_seg->unity_map);

[PATCH v3 08/35] iommu/amd: Introduce per PCI segment alias_table

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

This will replace global alias table (amd_iommu_alias_table).

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  7 +
 drivers/iommu/amd/init.c| 41 ++---
 drivers/iommu/amd/iommu.c   | 41 ++---
 3 files changed, 64 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index d43ce65f8e21..9534064124f9 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -576,6 +576,13 @@ struct amd_iommu_pci_seg {
 * will be copied to. It's only be used in kdump kernel.
 */
struct dev_table_entry *old_dev_tbl_cpy;
+
+   /*
+	 * The alias table is a driver specific data structure which contains the
+	 * mappings of the PCI device ids to the actual requestor ids on the IOMMU.
+* More than one device can share the same requestor id.
+*/
+   u16 *alias_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 31b19a418ee8..9d57cb05878e 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -698,6 +698,31 @@ static inline void free_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
pci_seg->irq_lookup_table = NULL;
 }
 
+static int __init alloc_alias_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   int i;
+
+   pci_seg->alias_table = (void *)__get_free_pages(GFP_KERNEL,
+					get_order(alias_table_size));
+   if (!pci_seg->alias_table)
+   return -ENOMEM;
+
+   /*
+* let all alias entries point to itself
+*/
+   for (i = 0; i <= amd_iommu_last_bdf; ++i)
+   pci_seg->alias_table[i] = i;
+
+   return 0;
+}
+
+static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->alias_table,
+  get_order(alias_table_size));
+   pci_seg->alias_table = NULL;
+}
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1266,6 +1291,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
u32 dev_i, ext_flags = 0;
bool alias = false;
struct ivhd_entry *e;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
u32 ivhd_size;
int ret;
 
@@ -1347,7 +1373,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
devid_to = e->ext >> 8;
set_dev_entry_from_acpi(iommu, devid   , e->flags, 0);
set_dev_entry_from_acpi(iommu, devid_to, e->flags, 0);
-   amd_iommu_alias_table[devid] = devid_to;
+   pci_seg->alias_table[devid] = devid_to;
break;
case IVHD_DEV_ALIAS_RANGE:
 
@@ -1405,7 +1431,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
devid = e->devid;
for (dev_i = devid_start; dev_i <= devid; ++dev_i) {
if (alias) {
-   amd_iommu_alias_table[dev_i] = devid_to;
+   pci_seg->alias_table[dev_i] = devid_to;
set_dev_entry_from_acpi(iommu,
devid_to, flags, ext_flags);
}
@@ -1540,6 +1566,8 @@ static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
 
if (alloc_dev_table(pci_seg))
return NULL;
+   if (alloc_alias_table(pci_seg))
+   return NULL;
if (alloc_rlookup_table(pci_seg))
return NULL;
 
@@ -1566,6 +1594,7 @@ static void __init free_pci_segments(void)
list_del(_seg->list);
free_irq_lookup_table(pci_seg);
free_rlookup_table(pci_seg);
+   free_alias_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
}
@@ -2838,7 +2867,7 @@ static void __init ivinfo_init(void *ivrs)
 static int __init early_amd_iommu_init(void)
 {
struct acpi_table_header *ivrs_base;
-   int i, remap_cache_sz, ret;
+   int remap_cache_sz, ret;
acpi_status status;
 
if (!amd_iommu_detected)
@@ -2909,12 +2938,6 @@ static int __init early_amd_iommu_init(void)
if (amd_iommu_pd_alloc_bitmap == NULL)
goto out;
 
-   /*
-* let all alias entries point to itself
-*/
-   for (i = 0; i <= amd_iommu_last_bdf; ++i)
-   amd_iommu_alias_table[i] = i;
-
/*
 * never allocate domain 0 because its used as the non-allocated and
   

[PATCH v3 07/35] iommu/amd: Introduce per PCI segment old_dev_tbl_cpy

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

It will remove the global old_dev_tbl_cpy. Also update
copy_device_table() to copy the device table for all PCI segments.
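
The email is cut off before the new top-level wrapper, but given the
per-IOMMU __copy_device_table() below it presumably ends up shaped
roughly like this (a sketch, copying once per segment; not the exact
hunk):

static bool copy_device_table(void)
{
	struct amd_iommu *iommu;
	struct amd_iommu_pci_seg *pci_seg;

	if (!amd_iommu_pre_enabled)
		return false;

	pr_warn("Translation is already enabled - trying to copy translation structures\n");

	/* copy each segment's table via the first IOMMU on that segment */
	for_each_pci_segment(pci_seg) {
		for_each_iommu(iommu) {
			if (iommu->pci_seg != pci_seg)
				continue;
			if (!__copy_device_table(iommu))
				return false;
			break;
		}
	}

	return true;
}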

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |   6 ++
 drivers/iommu/amd/init.c| 109 
 2 files changed, 70 insertions(+), 45 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 7dac61226208..d43ce65f8e21 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -570,6 +570,12 @@ struct amd_iommu_pci_seg {
 * device id quickly.
 */
struct irq_remap_table **irq_lookup_table;
+
+   /*
+* Pointer to a device table which the content of old device table
+* will be copied to. It's only be used in kdump kernel.
+*/
+   struct dev_table_entry *old_dev_tbl_cpy;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 144835a5cf6d..31b19a418ee8 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -193,11 +193,6 @@ bool amd_iommu_force_isolation __read_mostly;
  * page table root pointer.
  */
 struct dev_table_entry *amd_iommu_dev_table;
-/*
- * Pointer to a device table which the content of old device table
- * will be copied to. It's only be used in kdump kernel.
- */
-static struct dev_table_entry *old_dev_tbl_cpy;
 
 /*
  * The alias table is a driver specific data structure which contains the
@@ -990,39 +985,27 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
 }
 
 
-static bool copy_device_table(void)
+static bool __copy_device_table(struct amd_iommu *iommu)
 {
-   u64 int_ctl, int_tab_len, entry = 0, last_entry = 0;
+   u64 int_ctl, int_tab_len, entry = 0;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
struct dev_table_entry *old_devtb = NULL;
u32 lo, hi, devid, old_devtb_size;
phys_addr_t old_devtb_phys;
-   struct amd_iommu *iommu;
u16 dom_id, dte_v, irq_v;
gfp_t gfp_flag;
u64 tmp;
 
-   if (!amd_iommu_pre_enabled)
-   return false;
-
-   pr_warn("Translation is already enabled - trying to copy translation 
structures\n");
-   for_each_iommu(iommu) {
-		/* All IOMMUs should use the same device table with the same size */
-   lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
-   hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
-   entry = (((u64) hi) << 32) + lo;
-   if (last_entry && last_entry != entry) {
-   pr_err("IOMMU:%d should use the same dev table as 
others!\n",
-   iommu->index);
-   return false;
-   }
-   last_entry = entry;
+   /* Each IOMMU use separate device table with the same size */
+   lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
+   hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
+   entry = (((u64) hi) << 32) + lo;
 
-   old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
-   if (old_devtb_size != dev_table_size) {
-   pr_err("The device table size of IOMMU:%d is not 
expected!\n",
-   iommu->index);
-   return false;
-   }
+   old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
+   if (old_devtb_size != dev_table_size) {
+   pr_err("The device table size of IOMMU:%d is not expected!\n",
+   iommu->index);
+   return false;
}
 
/*
@@ -1045,31 +1028,31 @@ static bool copy_device_table(void)
return false;
 
gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
-   old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
-   get_order(dev_table_size));
-   if (old_dev_tbl_cpy == NULL) {
+   pci_seg->old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
+   get_order(dev_table_size));
+   if (pci_seg->old_dev_tbl_cpy == NULL) {
pr_err("Failed to allocate memory for copying old device 
table!\n");
memunmap(old_devtb);
return false;
}
 
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
-   old_dev_tbl_cpy[devid] = old_devtb[devid];
+   pci_seg->old_dev_tbl_cpy[devid] = old_devtb[devid];
dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
 
if (dte_v && dom_id) {
-			old_dev_tbl_cpy[devid].data[0] = old_devtb[devid].data[0];
-			old_dev_tbl_cpy[devid].data[1] = old_devtb[devid].data[1];
+   

[PATCH v3 06/35] iommu/amd: Introduce per PCI segment dev_data_list

2022-05-11 Thread Vasant Hegde via iommu
This will replace global dev_data_list.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c|  1 +
 drivers/iommu/amd/iommu.c   | 21 ++---
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 55792edfcfbe..7dac61226208 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -542,6 +542,9 @@ struct amd_iommu_pci_seg {
/* List with all PCI segments in the system */
struct list_head list;
 
+   /* List of all available dev_data structures */
+   struct llist_head dev_data_list;
+
/* PCI segment number */
u16 id;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index f513591a0646..144835a5cf6d 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1525,6 +1525,7 @@ static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
return NULL;
 
pci_seg->id = id;
+	init_llist_head(&pci_seg->dev_data_list);
 	list_add_tail(&pci_seg->list, &amd_iommu_pci_seg_list);
 
if (alloc_dev_table(pci_seg))
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 1590270ac54a..f0764446dea5 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -62,9 +62,6 @@
 
 static DEFINE_SPINLOCK(pd_bitmap_lock);
 
-/* List of all available dev_data structures */
-static LLIST_HEAD(dev_data_list);
-
 LIST_HEAD(ioapic_map);
 LIST_HEAD(hpet_map);
 LIST_HEAD(acpihid_map);
@@ -195,9 +192,10 @@ static struct protection_domain *to_pdomain(struct iommu_domain *dom)
return container_of(dom, struct protection_domain, domain);
 }
 
-static struct iommu_dev_data *alloc_dev_data(u16 devid)
+static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 devid)
 {
struct iommu_dev_data *dev_data;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
dev_data = kzalloc(sizeof(*dev_data), GFP_KERNEL);
if (!dev_data)
@@ -207,19 +205,20 @@ static struct iommu_dev_data *alloc_dev_data(u16 devid)
dev_data->devid = devid;
	ratelimit_default_init(&dev_data->rs);
 
-	llist_add(&dev_data->dev_data_list, &dev_data_list);
+	llist_add(&dev_data->dev_data_list, &pci_seg->dev_data_list);
return dev_data;
 }
 
-static struct iommu_dev_data *search_dev_data(u16 devid)
+static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid)
 {
struct iommu_dev_data *dev_data;
struct llist_node *node;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
-	if (llist_empty(&dev_data_list))
+	if (llist_empty(&pci_seg->dev_data_list))
return NULL;
 
-   node = dev_data_list.first;
+   node = pci_seg->dev_data_list.first;
llist_for_each_entry(dev_data, node, dev_data_list) {
if (dev_data->devid == devid)
return dev_data;
@@ -288,10 +287,10 @@ static struct iommu_dev_data *find_dev_data(u16 devid)
struct iommu_dev_data *dev_data;
struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
-   dev_data = search_dev_data(devid);
+   dev_data = search_dev_data(iommu, devid);
 
if (dev_data == NULL) {
-   dev_data = alloc_dev_data(devid);
+   dev_data = alloc_dev_data(iommu, devid);
if (!dev_data)
return NULL;
 
@@ -3464,7 +3463,7 @@ static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
struct vcpu_data *vcpu_pi_info = pi_data->vcpu_data;
struct amd_ir_data *ir_data = data->chip_data;
 	struct irq_2_irte *irte_info = &ir_data->irq_2_irte;
-	struct iommu_dev_data *dev_data = search_dev_data(irte_info->devid);
+	struct iommu_dev_data *dev_data = search_dev_data(NULL, irte_info->devid);
 
/* Note:
 * This device has never been set up for guest mode.
-- 
2.27.0



[PATCH v3 05/35] iommu/amd: Introduce per PCI segment irq_lookup_table

2022-05-11 Thread Vasant Hegde via iommu
This will replace global irq lookup table (irq_lookup_table).

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  6 ++
 drivers/iommu/amd/init.c| 27 +++
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index bc38bf526735..55792edfcfbe 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -561,6 +561,12 @@ struct amd_iommu_pci_seg {
 * device id.
 */
struct amd_iommu **rlookup_table;
+
+   /*
+* This table is used to find the irq remapping table for a given
+* device id quickly.
+*/
+   struct irq_remap_table **irq_lookup_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index ccd5e79d64fb..f513591a0646 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -682,6 +682,26 @@ static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
pci_seg->rlookup_table = NULL;
 }
 
+static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   pci_seg->irq_lookup_table = (void *)__get_free_pages(
+GFP_KERNEL | __GFP_ZERO,
+get_order(rlookup_table_size));
+   kmemleak_alloc(pci_seg->irq_lookup_table,
+  rlookup_table_size, 1, GFP_KERNEL);
+   if (pci_seg->irq_lookup_table == NULL)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   kmemleak_free(pci_seg->irq_lookup_table);
+   free_pages((unsigned long)pci_seg->irq_lookup_table,
+  get_order(rlookup_table_size));
+   pci_seg->irq_lookup_table = NULL;
+}
 
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
@@ -1533,6 +1553,7 @@ static void __init free_pci_segments(void)
 
for_each_pci_segment_safe(pci_seg, next) {
		list_del(&pci_seg->list);
+   free_irq_lookup_table(pci_seg);
free_rlookup_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
@@ -2896,6 +2917,7 @@ static int __init early_amd_iommu_init(void)
amd_iommu_irq_remap = check_ioapic_information();
 
if (amd_iommu_irq_remap) {
+   struct amd_iommu_pci_seg *pci_seg;
/*
 * Interrupt remapping enabled, create kmem_cache for the
 * remapping tables.
@@ -2912,6 +2934,11 @@ static int __init early_amd_iommu_init(void)
if (!amd_iommu_irq_cache)
goto out;
 
+   for_each_pci_segment(pci_seg) {
+   if (alloc_irq_lookup_table(pci_seg))
+   goto out;
+   }
+
irq_lookup_table = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
get_order(rlookup_table_size));
-- 
2.27.0



[PATCH v3 04/35] iommu/amd: Introduce per PCI segment rlookup table

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

This will replace global rlookup table (amd_iommu_rlookup_table).
Add helper functions to set/get rlookup table for the given device.
Also add macros to get seg/devid from sbdf.
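
A compilable model of the two-step lookup these helpers implement (toy
types; the real code walks amd_iommu_pci_seg_list and indexes
rlookup_table by devid):

#include <stdio.h>
#include <stddef.h>

struct amd_iommu { int index; };

struct amd_iommu_pci_seg {
	unsigned short id;
	struct amd_iommu **rlookup_table;	/* indexed by devid */
	struct amd_iommu_pci_seg *next;
};

static struct amd_iommu_pci_seg *segments;

static struct amd_iommu *rlookup(unsigned short seg, unsigned short devid)
{
	struct amd_iommu_pci_seg *p;

	for (p = segments; p; p = p->next)	/* step 1: find the segment */
		if (p->id == seg)
			return p->rlookup_table[devid];	/* step 2: index by devid */
	return NULL;
}

int main(void)
{
	static struct amd_iommu iommu0 = { .index = 0 };
	static struct amd_iommu *table[0x10000];
	static struct amd_iommu_pci_seg seg0 = { .id = 0, .rlookup_table = table };

	segments = &seg0;
	table[0xa8] = &iommu0;	/* devid 00:15.0 handled by IOMMU 0 */

	printf("%s\n", rlookup(0, 0xa8) ? "found" : "not found");
	return 0;
}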

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  1 +
 drivers/iommu/amd/amd_iommu_types.h | 11 
 drivers/iommu/amd/init.c| 23 +++
 drivers/iommu/amd/iommu.c   | 44 +
 4 files changed, 79 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 885570cd0d77..2947239700ce 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -19,6 +19,7 @@ extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
 extern void amd_iommu_init_notifier(void);
 extern int amd_iommu_init_api(void);
+extern void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
 
 #ifdef CONFIG_AMD_IOMMU_DEBUGFS
 void amd_iommu_debugfs_setup(struct amd_iommu *iommu);
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index a850d69b2849..bc38bf526735 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -452,6 +452,9 @@ extern bool amd_iommu_irq_remap;
 /* kmem_cache to get tables with 128 byte alignement */
 extern struct kmem_cache *amd_iommu_irq_cache;
 
+#define PCI_SBDF_TO_SEGID(sbdf)	(((sbdf) >> 16) & 0xffff)
+#define PCI_SBDF_TO_DEVID(sbdf)	((sbdf) & 0xffff)
+
 /* Make iterating over all pci segment easier */
 #define for_each_pci_segment(pci_seg) \
list_for_each_entry((pci_seg), _iommu_pci_seg_list, list)
@@ -486,6 +489,7 @@ struct amd_iommu_fault {
 };
 
 
+struct amd_iommu;
 struct iommu_domain;
 struct irq_domain;
 struct amd_irte_ops;
@@ -550,6 +554,13 @@ struct amd_iommu_pci_seg {
 * page table root pointer.
 */
struct dev_table_entry *dev_table;
+
+   /*
+* The rlookup iommu table is used to find the IOMMU which is
+* responsible for a specific device. It is indexed by the PCI
+* device id.
+*/
+   struct amd_iommu **rlookup_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9618bec97141..ccd5e79d64fb 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -663,6 +663,26 @@ static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
pci_seg->dev_table = NULL;
 }
 
+/* Allocate per PCI segment IOMMU rlookup table. */
+static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   pci_seg->rlookup_table = (void *)__get_free_pages(
+   GFP_KERNEL | __GFP_ZERO,
+   get_order(rlookup_table_size));
+   if (pci_seg->rlookup_table == NULL)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->rlookup_table,
+  get_order(rlookup_table_size));
+   pci_seg->rlookup_table = NULL;
+}
+
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1489,6 +1509,8 @@ static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
 
if (alloc_dev_table(pci_seg))
return NULL;
+   if (alloc_rlookup_table(pci_seg))
+   return NULL;
 
return pci_seg;
 }
@@ -1511,6 +1533,7 @@ static void __init free_pci_segments(void)
 
for_each_pci_segment_safe(pci_seg, next) {
		list_del(&pci_seg->list);
+   free_rlookup_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
}
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 54b8eb764530..1590270ac54a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -146,6 +146,50 @@ struct dev_table_entry *get_dev_table(struct amd_iommu *iommu)
return dev_table;
 }
 
+static inline u16 get_device_segment(struct device *dev)
+{
+   u16 seg;
+
+   if (dev_is_pci(dev)) {
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   seg = pci_domain_nr(pdev->bus);
+   } else {
+   u32 devid = get_acpihid_device_id(dev, NULL);
+
+   seg = PCI_SBDF_TO_SEGID(devid);
+   }
+
+   return seg;
+}
+
+/* Writes the specific IOMMU for a device into the PCI segment rlookup table */
+void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid)
+{
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   pci_seg->rlookup_table[devid] = iommu;
+}
+
+static struct amd_iommu *__rlookup_amd_iommu(u16 seg, u16 devid)
+{
+   struct 

[PATCH v3 03/35] iommu/amd: Introduce per PCI segment device table

2022-05-11 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Introduce a per PCI segment device table. All IOMMUs within the segment
will share this device table. This will replace the global device
table, i.e., amd_iommu_dev_table.

Also introduce a helper function to get the device table for a given IOMMU.
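
A small model of the helper's contract: device-table accesses always go
through the IOMMU's own segment, so two segments can never clobber each
other's entries (toy types, assert() standing in for BUG_ON):

#include <assert.h>
#include <stdio.h>

struct dev_table_entry { unsigned long long data[4]; };
struct amd_iommu_pci_seg { struct dev_table_entry *dev_table; };
struct amd_iommu { struct amd_iommu_pci_seg *pci_seg; };

static struct dev_table_entry *get_dev_table(struct amd_iommu *iommu)
{
	assert(iommu->pci_seg && iommu->pci_seg->dev_table);
	return iommu->pci_seg->dev_table;
}

int main(void)
{
	static struct dev_table_entry tbl0[256], tbl1[256];
	struct amd_iommu_pci_seg seg0 = { tbl0 }, seg1 = { tbl1 };
	struct amd_iommu iommu0 = { &seg0 }, iommu1 = { &seg1 };

	get_dev_table(&iommu0)[0x10].data[0] = 1;	/* touches segment 0 only */
	printf("seg1 untouched: %llu\n", get_dev_table(&iommu1)[0x10].data[0]);
	return 0;
}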

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  1 +
 drivers/iommu/amd/amd_iommu_types.h | 10 ++
 drivers/iommu/amd/init.c| 26 --
 drivers/iommu/amd/iommu.c   | 12 
 4 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 1ab31074f5b3..885570cd0d77 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -128,4 +128,5 @@ static inline void amd_iommu_apply_ivrs_quirks(void) { }
 
 extern void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
 u64 *root, int mode);
+extern struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
 #endif
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 7ec032afc1b2..a850d69b2849 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -540,6 +540,16 @@ struct amd_iommu_pci_seg {
 
/* PCI segment number */
u16 id;
+
+   /*
+* device table virtual address
+*
+* Pointer to the per PCI segment device table.
+* It is indexed by the PCI device id or the HT unit id and contains
+* information about the domain the device belongs to as well as the
+* page table root pointer.
+*/
+   struct dev_table_entry *dev_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 71be7ee4aa8b..9618bec97141 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -640,11 +640,29 @@ static int __init find_last_devid_acpi(struct acpi_table_header *table)
  *
  * The following functions belong to the code path which parses the ACPI table
  * the second time. In this ACPI parsing iteration we allocate IOMMU specific
- * data structures, initialize the device/alias/rlookup table and also
- * basically initialize the hardware.
+ * data structures, initialize the per PCI segment device/alias/rlookup table
+ * and also basically initialize the hardware.
  *
  ****************************************************************************/
 
+/* Allocate per PCI segment device table */
+static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
+{
+	pci_seg->dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
+						      get_order(dev_table_size));
+   if (!pci_seg->dev_table)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->dev_table,
+   get_order(dev_table_size));
+   pci_seg->dev_table = NULL;
+}
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1469,6 +1487,9 @@ static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
pci_seg->id = id;
 	list_add_tail(&pci_seg->list, &amd_iommu_pci_seg_list);
 
+   if (alloc_dev_table(pci_seg))
+   return NULL;
+
return pci_seg;
 }
 
@@ -1490,6 +1511,7 @@ static void __init free_pci_segments(void)
 
for_each_pci_segment_safe(pci_seg, next) {
 		list_del(&pci_seg->list);
+   free_dev_table(pci_seg);
kfree(pci_seg);
}
 }
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index cf57ffcc8d54..54b8eb764530 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -134,6 +134,18 @@ static inline int get_device_id(struct device *dev)
return devid;
 }
 
+struct dev_table_entry *get_dev_table(struct amd_iommu *iommu)
+{
+   struct dev_table_entry *dev_table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   BUG_ON(pci_seg == NULL);
+   dev_table = pci_seg->dev_table;
+   BUG_ON(dev_table == NULL);
+
+   return dev_table;
+}
+
 static struct protection_domain *to_pdomain(struct iommu_domain *dom)
 {
return container_of(dom, struct protection_domain, domain);
-- 
2.27.0



[PATCH v3 02/35] iommu/amd: Introduce pci segment structure

2022-05-11 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments, where each segment
contains one or more IOMMU instances. However, an IOMMU instance can only
support a single PCI segment.

Current code assumes that the system contains only one PCI segment
(segment 0) and creates global data structures such as the device table,
rlookup table, etc.

Introduce a per PCI segment data structure which contains the
segment-specific data structures. This will eventually replace the
global data structures.

Also update the `amd_iommu->pci_seg` variable to point to the PCI segment
structure instead of the PCI segment ID.
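
The get-or-create idiom the series uses for segments, as a stand-alone
sketch (get_pci_segment() returns an existing segment or allocates and
links a new one; toy types, not the driver's):

#include <stdio.h>
#include <stdlib.h>

struct pci_seg {
	unsigned short id;
	struct pci_seg *next;
};

static struct pci_seg *seg_list;

static struct pci_seg *get_pci_segment(unsigned short id)
{
	struct pci_seg *p;

	for (p = seg_list; p; p = p->next)	/* reuse if already known */
		if (p->id == id)
			return p;

	p = calloc(1, sizeof(*p));		/* otherwise allocate and link */
	if (!p)
		return NULL;
	p->id = id;
	p->next = seg_list;
	seg_list = p;
	return p;
}

int main(void)
{
	/* second call must return the same segment object */
	printf("%d\n", get_pci_segment(0) == get_pci_segment(0));
	return 0;
}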

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 24 ++-
 drivers/iommu/amd/init.c| 46 -
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 06235b7cb13d..7ec032afc1b2 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -452,6 +452,11 @@ extern bool amd_iommu_irq_remap;
 /* kmem_cache to get tables with 128 byte alignement */
 extern struct kmem_cache *amd_iommu_irq_cache;
 
+/* Make iterating over all pci segment easier */
+#define for_each_pci_segment(pci_seg) \
+   list_for_each_entry((pci_seg), &amd_iommu_pci_seg_list, list)
+#define for_each_pci_segment_safe(pci_seg, next) \
+   list_for_each_entry_safe((pci_seg), (next), &amd_iommu_pci_seg_list, list)
 /*
  * Make iterating over all IOMMUs easier
  */
@@ -526,6 +531,17 @@ struct protection_domain {
unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
 };
 
+/*
+ * This structure contains information about one PCI segment in the system.
+ */
+struct amd_iommu_pci_seg {
+   /* List with all PCI segments in the system */
+   struct list_head list;
+
+   /* PCI segment number */
+   u16 id;
+};
+
 /*
  * Structure where we save information about one hardware AMD IOMMU in the
  * system.
@@ -577,7 +593,7 @@ struct amd_iommu {
u16 cap_ptr;
 
/* pci domain of this IOMMU */
-   u16 pci_seg;
+   struct amd_iommu_pci_seg *pci_seg;
 
/* start of exclusion range of that IOMMU */
u64 exclusion_start;
@@ -705,6 +721,12 @@ extern struct list_head ioapic_map;
 extern struct list_head hpet_map;
 extern struct list_head acpihid_map;
 
+/*
+ * List with all PCI segments in the system. This list is not locked because
+ * it is only written at driver initialization time
+ */
+extern struct list_head amd_iommu_pci_seg_list;
+
 /*
  * List with all IOMMUs in the system. This list is not locked because it is
  * only written and read at driver initialization or suspend time
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index b4a798c7b347..71be7ee4aa8b 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -165,6 +165,7 @@ u16 amd_iommu_last_bdf; /* largest PCI device id we have
 LIST_HEAD(amd_iommu_unity_map);/* a list of required unity mappings
   we find in ACPI */
 
+LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
   system */
 
@@ -1456,6 +1457,43 @@ static int __init init_iommu_from_acpi(struct amd_iommu *iommu,
return 0;
 }
 
+/* Allocate PCI segment data structure */
+static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
+{
+   struct amd_iommu_pci_seg *pci_seg;
+
+   pci_seg = kzalloc(sizeof(struct amd_iommu_pci_seg), GFP_KERNEL);
+   if (pci_seg == NULL)
+   return NULL;
+
+   pci_seg->id = id;
+   list_add_tail(&pci_seg->list, &amd_iommu_pci_seg_list);
+
+   return pci_seg;
+}
+
+static struct amd_iommu_pci_seg *__init get_pci_segment(u16 id)
+{
+   struct amd_iommu_pci_seg *pci_seg;
+
+   for_each_pci_segment(pci_seg) {
+   if (pci_seg->id == id)
+   return pci_seg;
+   }
+
+   return alloc_pci_segment(id);
+}
+
+static void __init free_pci_segments(void)
+{
+   struct amd_iommu_pci_seg *pci_seg, *next;
+
+   for_each_pci_segment_safe(pci_seg, next) {
+   list_del(&pci_seg->list);
+   kfree(pci_seg);
+   }
+}
+
 static void __init free_iommu_one(struct amd_iommu *iommu)
 {
free_cwwb_sem(iommu);
@@ -1542,8 +1580,14 @@ static void amd_iommu_ats_write_check_workaround(struct amd_iommu *iommu)
  */
 static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 {
+   struct amd_iommu_pci_seg *pci_seg;
int ret;
 
+   pci_seg = get_pci_segment(h->pci_seg);
+   if (pci_seg == NULL)
+   return -ENOMEM;
+   iommu->pci_seg = pci_seg;
+
raw_spin_lock_init(&iommu->lock);
iommu->cmd_sem_val = 0;
 
@@ 

[PATCH v3 01/35] iommu/amd: Update struct iommu_dev_data definition

2022-05-11 Thread Vasant Hegde via iommu
struct iommu_dev_data contains the member "pdev", which points to a
pci_dev. This is valid only for PCI devices; for other devices it is
NULL, forcing unnecessary "pdev != NULL" checks in various places.

Replace the "struct pci_dev" member with "struct device" and use
to_pci_dev() to get the PCI device reference where needed. Also adjust
the setup_aliases() and clone_aliases() functions.

No functional change intended.
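
The pattern the patch adopts is the standard dev_is_pci()/to_pci_dev()
check, condensed here into a hypothetical helper for illustration:

/* Illustrative only: PCI-specific work runs only for real PCI devices;
 * for e.g. ACPI HID devices the function is a no-op.
 */
static void example_clone_aliases(struct device *dev)
{
	struct pci_dev *pdev;

	if (!dev_is_pci(dev))
		return;

	pdev = to_pci_dev(dev);
	pci_for_each_dma_alias(pdev, clone_alias, NULL);
}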

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  2 +-
 drivers/iommu/amd/iommu.c   | 32 +
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 47108ed44fbb..06235b7cb13d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -685,7 +685,7 @@ struct iommu_dev_data {
struct list_head list;   /* For domain->dev_list */
struct llist_node dev_data_list;  /* For global dev_data_list */
struct protection_domain *domain; /* Domain the device is bound to */
-   struct pci_dev *pdev;
+   struct device *dev;
u16 devid;/* PCI Device ID */
bool iommu_v2;/* Device can make use of IOMMUv2 */
struct {
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a1ada7bff44e..cf57ffcc8d54 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -188,10 +188,13 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
return 0;
 }
 
-static void clone_aliases(struct pci_dev *pdev)
+static void clone_aliases(struct device *dev)
 {
-   if (!pdev)
+   struct pci_dev *pdev;
+
+   if (!dev_is_pci(dev))
return;
+   pdev = to_pci_dev(dev);
 
/*
 * The IVRS alias stored in the alias table may not be
@@ -203,14 +206,14 @@ static void clone_aliases(struct pci_dev *pdev)
pci_for_each_dma_alias(pdev, clone_alias, NULL);
 }
 
-static struct pci_dev *setup_aliases(struct device *dev)
+static void setup_aliases(struct device *dev)
 {
struct pci_dev *pdev = to_pci_dev(dev);
u16 ivrs_alias;
 
/* For ACPI HID devices, there are no aliases */
if (!dev_is_pci(dev))
-   return NULL;
+   return;
 
/*
 * Add the IVRS alias to the pci aliases if it is on the same
@@ -221,9 +224,7 @@ static struct pci_dev *setup_aliases(struct device *dev)
PCI_BUS_NUM(ivrs_alias) == pdev->bus->number)
pci_add_dma_alias(pdev, ivrs_alias & 0xff, 1);
 
-   clone_aliases(pdev);
-
-   return pdev;
+   clone_aliases(dev);
 }
 
 static struct iommu_dev_data *find_dev_data(u16 devid)
@@ -331,7 +332,8 @@ static int iommu_init_device(struct device *dev)
if (!dev_data)
return -ENOMEM;
 
-   dev_data->pdev = setup_aliases(dev);
+   dev_data->dev = dev;
+   setup_aliases(dev);
 
/*
 * By default we use passthrough mode for IOMMUv2 capable device.
@@ -1232,13 +1234,17 @@ static int device_flush_dte_alias(struct pci_dev *pdev, u16 alias, void *data)
 static int device_flush_dte(struct iommu_dev_data *dev_data)
 {
struct amd_iommu *iommu;
+   struct pci_dev *pdev = NULL;
u16 alias;
int ret;
 
iommu = amd_iommu_rlookup_table[dev_data->devid];
 
-   if (dev_data->pdev)
-   ret = pci_for_each_dma_alias(dev_data->pdev,
+   if (dev_is_pci(dev_data->dev))
+   pdev = to_pci_dev(dev_data->dev);
+
+   if (pdev)
+   ret = pci_for_each_dma_alias(pdev,
 device_flush_dte_alias, iommu);
else
ret = iommu_flush_dte(iommu, dev_data->devid);
@@ -1561,7 +1567,7 @@ static void do_attach(struct iommu_dev_data *dev_data,
/* Update device table */
set_dte_entry(dev_data->devid, domain,
  ats, dev_data->iommu_v2);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
 
device_flush_dte(dev_data);
 }
@@ -1577,7 +1583,7 @@ static void do_detach(struct iommu_dev_data *dev_data)
dev_data->domain = NULL;
list_del(&dev_data->list);
clear_dte_entry(dev_data->devid);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
 
/* Flush the DTE entry */
device_flush_dte(dev_data);
@@ -1818,7 +1824,7 @@ static void update_device_table(struct protection_domain *domain)
list_for_each_entry(dev_data, &domain->dev_list, list) {
set_dte_entry(dev_data->devid, domain,
  dev_data->ats.enabled, dev_data->iommu_v2);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
}
 }
 
-- 
2.27.0


[PATCH v3 00/35] iommu/amd: Add multiple PCI segments support

2022-05-11 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments, where each segment
contains one or more IOMMU instances. However, an IOMMU instance can only
support a single PCI segment.

Current code assumes a system contains only one PCI segment (segment 0)
and creates global data structures such as device table, rlookup table,
etc.

This series introduces a per-PCI-segment data structure, which contains
the device table, alias table, etc. All IOMMUs within a PCI segment
share the same data structure. The series also makes the necessary code
adjustments and logging enhancements, and finally removes the global
data structures such as the device table and alias table.

On a system with a single PCI segment (i.e. PCI segment ID zero), the
IOMMU driver allocates one PCI segment data structure, which is shared
by all IOMMUs.

Patch 1 updates the struct iommu_dev_data definition.

Patches 2 - 13 introduce the new PCI segment structure, allocate the
per-segment data structures, and introduce the amd_iommu.pci_seg pointer
to the corresponding pci_segment structure. They also introduce a helper
function, rlookup_amd_iommu(), to reverse-look-up the IOMMU serving a
particular device; see the sketch after this paragraph.
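
A hedged sketch of how such a reverse lookup is typically used; the
exact signature in the series may differ, and the helper below is
purely illustrative:

static int example_flush_dev(struct device *dev, u16 devid)
{
	struct amd_iommu *iommu = rlookup_amd_iommu(dev);

	if (!iommu)
		return -ENODEV;

	/* operate on the IOMMU that actually serves this device */
	return iommu_flush_dte(iommu, devid);
}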

Patches 14 - 27 convert the code to the per-PCI-segment data structures
and remove the global ones.

Patch 28 fixes the flushing logic to flush up to last_bdf.

Patches 29 - 35 convert usages of the 16-bit PCI device ID to include
the 16-bit segment ID.

Changes from v2 -> v3:
  - Addressed Joerg's review comments
- Fixed typo in patch 1 subject
- Fixed few minor things in patch 2
- Merged patch 27 - 29 into one patch
- Added new macros to get seg and devid from sbdf
  - Patch 32 : Extend devid to 32bit and added new macro.

v2 patchset : 
https://lore.kernel.org/linux-iommu/20220425113415.24087-1-vasant.he...@amd.com/T/#t

Changes from v1 -> v2:
  - Updated patch 1 to include dev_is_pci() check

v1 patchset : 
https://lore.kernel.org/linux-iommu/20220404100023.324645-1-vasant.he...@amd.com/T/#t

Changes from RFC -> v1:
  - Rebased patches on top of iommu/next tree.
  - Update struct iommu_dev_data definition
  - Updated few log message to print segment ID
  - Fix smatch warnings

RFC patchset : 
https://lore.kernel.org/linux-iommu/20220311094854.31595-1-vasant.he...@amd.com/T/#t


Regards,
Vasant

Suravee Suthikulpanit (20):
  iommu/amd: Introduce per PCI segment device table
  iommu/amd: Introduce per PCI segment rlookup table
  iommu/amd: Introduce per PCI segment old_dev_tbl_cpy
  iommu/amd: Introduce per PCI segment alias_table
  iommu/amd: Convert to use rlookup_amd_iommu helper function
  iommu/amd: Update irq_remapping_alloc to use IOMMU lookup helper function
  iommu/amd: Introduce struct amd_ir_data.iommu
  iommu/amd: Update amd_irte_ops functions
  iommu/amd: Update alloc_irq_table and alloc_irq_index
  iommu/amd: Update set_dte_entry and clear_dte_entry
  iommu/amd: Update iommu_ignore_device
  iommu/amd: Update dump_dte_entry
  iommu/amd: Update set_dte_irq_entry
  iommu/amd: Update (un)init_device_table_dma()
  iommu/amd: Update set_dev_entry_bit() and get_dev_entry_bit()
  iommu/amd: Remove global amd_iommu_[dev_table/alias_table/last_bdf]
  iommu/amd: Introduce get_device_sbdf_id() helper function
  iommu/amd: Include PCI segment ID when initialize IOMMU
  iommu/amd: Specify PCI segment ID when getting pci device
  iommu/amd: Add PCI segment support for ivrs_[ioapic/hpet/acpihid] commands

Vasant Hegde (15):
  iommu/amd: Update struct iommu_dev_data definition
  iommu/amd: Introduce pci segment structure
  iommu/amd: Introduce per PCI segment irq_lookup_table
  iommu/amd: Introduce per PCI segment dev_data_list
  iommu/amd: Introduce per PCI segment unity map list
  iommu/amd: Introduce per PCI segment last_bdf
  iommu/amd: Introduce per PCI segment device table size
  iommu/amd: Introduce per PCI segment alias table size
  iommu/amd: Introduce per PCI segment rlookup table size
  iommu/amd: Convert to use per PCI segment irq_lookup_table
  iommu/amd: Convert to use per PCI segment rlookup_table
  iommu/amd: Flush upto last_bdf only
  iommu/amd: Print PCI segment ID in error log messages
  iommu/amd: Update device_state structure to include PCI seg ID
  iommu/amd: Update amd_iommu_fault structure to include PCI seg ID

 .../admin-guide/kernel-parameters.txt |  34 +-
 drivers/iommu/amd/amd_iommu.h |  13 +-
 drivers/iommu/amd/amd_iommu_types.h   | 133 +++-
 drivers/iommu/amd/init.c  | 687 +++---
 drivers/iommu/amd/iommu.c | 563 --
 drivers/iommu/amd/iommu_v2.c  |  67 +-
 drivers/iommu/amd/quirks.c|   4 +-
 7 files changed, 904 insertions(+), 597 deletions(-)

-- 
2.27.0


Re: [PATCH v6 08/12] iommu/sva: Use attach/detach_pasid_dev in SVA interfaces

2022-05-11 Thread Baolu Lu

On 2022/5/10 23:23, Jason Gunthorpe wrote:

On Tue, May 10, 2022 at 02:17:34PM +0800, Lu Baolu wrote:


+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to mm_users
+ * @drvdata: opaque data pointer to pass to bind callback
+ *
+ * Create a bond between device and address space, allowing the device to access
+ * the mm using the returned PASID. If a bond already exists between @device and
+ * @mm, it is returned and an additional reference is taken. Caller must call
+ * iommu_sva_unbind_device() to release each reference.
+ *
+ * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
+ * initialize the required SVA features.
+ *
+ * On error, returns an ERR_PTR value.
+ */
+struct iommu_sva *
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+{
+   int ret = -EINVAL;
+   struct iommu_sva *handle;
+   struct iommu_domain *domain;
+
+   /*
+* TODO: Remove the drvdata parameter after kernel PASID support is
+* enabled for the idxd driver.
+*/
+   if (drvdata)
+   return ERR_PTR(-EOPNOTSUPP);


Why is this being left behind? Clean up the callers too please.


Okay, let me try to.




+   /* Allocate mm->pasid if necessary. */
+   ret = iommu_sva_alloc_pasid(mm, 1, (1U << dev->iommu->pasid_bits) - 1);
+   if (ret)
+   return ERR_PTR(ret);
+
+   mutex_lock(_sva_lock);
+   /* Search for an existing bond. */
+   handle = xa_load(&dev->iommu->sva_bonds, mm->pasid);
+   if (handle) {
+   refcount_inc(>users);
+   goto out_success;
+   }


How can there be an existing bond?

dev->iommu is per-device

The device_group_immutable_singleton() insists on a single device
group

Basically 'sva_bonds' is the same thing as the group->pasid_array.


Yes, really.



Assuming we leave room for multi-device groups this logic should just
be

group = iommu_group_get(dev);
if (!group)
return -ENODEV;

mutex_lock(&group->mutex);
domain = xa_load(&group->pasid_array, mm->pasid);
if (!domain || domain->type != IOMMU_DOMAIN_SVA || domain->mm != mm)
domain = iommu_sva_alloc_domain(dev, mm);

?


Agreed. As a helper in iommu core, how about making it more generic like
below?

+struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
+   ioasid_t pasid,
+   unsigned int type)
+{
+   struct iommu_domain *domain;
+   struct iommu_group *group;
+
+   if (!pasid_valid(pasid))
+   return NULL;
+
+   group = iommu_group_get(dev);
+   if (!group)
+   return NULL;
+
+   mutex_lock(&group->mutex);
+   domain = xa_load(&group->pasid_array, pasid);
+   if (domain && domain->type != type)
+   domain = NULL;
+   mutex_unlock(&group->mutex);
+   iommu_group_put(group);
+
+   return domain;
+}
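
For clarity, a hedged usage sketch of the helper above;
IOMMU_DOMAIN_SVA and iommu_sva_alloc_domain() are names proposed in
this series, not settled API:

	/* Inside iommu_sva_bind_device(): find-or-create the SVA domain. */
	domain = iommu_get_domain_for_dev_pasid(dev, mm->pasid, IOMMU_DOMAIN_SVA);
	if (!domain) {
		domain = iommu_sva_alloc_domain(dev, mm);
		if (IS_ERR(domain))
			return ERR_CAST(domain);
	}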



And stick the refcount in the sva_domain

Also, given the current arrangement it might make sense to have a
struct iommu_domain_sva given that no driver is wrappering this in
something else.


Fair enough. How about below wrapper?

+struct iommu_sva_domain {
+   /*
+* Common iommu domain header, *must* be put at the top
+* of the structure.
+*/
+   struct iommu_domain domain;
+   struct mm_struct *mm;
+   struct iommu_sva bond;
+}

The refcount is wrapped in bond.
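
With the header placed first, the usual container_of() accessor
applies; a sketch under that assumption:

/* Convert a generic domain pointer back to the SVA wrapper.
 * container_of() works for any member position; keeping 'domain'
 * first additionally makes a plain cast legal.
 */
static inline struct iommu_sva_domain *to_sva_domain(struct iommu_domain *dom)
{
	return container_of(dom, struct iommu_sva_domain, domain);
}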

Best regards,
baolu


[PATCH 4/4] iommu/mediatek: Improve safety for mediatek, smi property in larb nodes

2022-05-11 Thread Yong Wu via iommu
No functional change. Just improve safety against a malformed dts.

All the larbs that connect to one IOMMU must connect through the same
smi-common. This patch checks the mediatek,smi property of each larb;
if the mediatek,smi phandles differ, it returns an error. It also fails
when no available smi-larb node is found. The check reduces to the
sketch below.
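
Distilled, it works like this; the helper factoring is illustrative
only, the patch open-codes it in the larb loop:

/* Remember the first smi-common seen; later larbs must match it.
 * 'first' keeps the reference of the first node; every other
 * reference is dropped here.
 */
static int example_check_smicomm(struct device_node **first,
				 struct device_node *node)
{
	if (!*first) {
		*first = node;	/* keep this reference for later use */
		return 0;
	}
	of_node_put(node);	/* duplicate (or mismatching) reference */
	return *first == node ? 0 : -EINVAL;
}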

Suggested-by: Guenter Roeck 
Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 49 ++-
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 1ba92751e9df..75b9ede45a1a 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1038,7 +1038,7 @@ static const struct component_master_ops mtk_iommu_com_ops = {
 static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **match,
  struct mtk_iommu_data *data)
 {
-   struct device_node *larbnode, *smicomm_node, *smi_subcomm_node;
+   struct device_node *larbnode, *frst_avail_smicomm_node = NULL;
struct platform_device *plarbdev;
struct device_link *link;
int i, larb_nr, ret;
@@ -1050,6 +1050,7 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m
return -EINVAL;
 
for (i = 0; i < larb_nr; i++) {
+   struct device_node *smicomm_node, *smi_subcomm_node;
u32 id;
 
larbnode = of_parse_phandle(dev->of_node, "mediatek,larbs", i);
@@ -1085,27 +1086,43 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m
}
data->larb_imu[id].dev = &plarbdev->dev;
 
+   /* Get smi-(sub)-common dev from the last larb. */
+   smi_subcomm_node = of_parse_phandle(larbnode, "mediatek,smi", 0);
+   if (!smi_subcomm_node) {
+   ret = -EINVAL;
+   goto err_larbnode_put;
+   }
+
+   /*
+* It may have two level smi-common. the node is smi-sub-common if it
+* has a new mediatek,smi property. otherwise it is smi-commmon.
+*/
+   smicomm_node = of_parse_phandle(smi_subcomm_node, "mediatek,smi", 0);
+   if (smicomm_node)
+   of_node_put(smi_subcomm_node);
+   else
+   smicomm_node = smi_subcomm_node;
+
+   if (!frst_avail_smicomm_node) {
+   frst_avail_smicomm_node = smicomm_node;
+   } else if (frst_avail_smicomm_node != smicomm_node) {
+   dev_err(dev, "mediatek,smi is not right @larb%d.", id);
+   of_node_put(smicomm_node);
+   ret = -EINVAL;
+   goto err_larbnode_put;
+   } else {
+   of_node_put(smicomm_node);
+   }
+
component_match_add_release(dev, match, component_release_of,
component_compare_of, larbnode);
}
 
-   /* Get smi-(sub)-common dev from the last larb. */
-   smi_subcomm_node = of_parse_phandle(larbnode, "mediatek,smi", 0);
-   if (!smi_subcomm_node)
+   if (!frst_avail_smicomm_node)
return -EINVAL;
 
-   /*
-* It may have two level smi-common. the node is smi-sub-common if it
-* has a new mediatek,smi property. otherwise it is smi-commmon.
-*/
-   smicomm_node = of_parse_phandle(smi_subcomm_node, "mediatek,smi", 0);
-   if (smicomm_node)
-   of_node_put(smi_subcomm_node);
-   else
-   smicomm_node = smi_subcomm_node;
-
-   plarbdev = of_find_device_by_node(smicomm_node);
-   of_node_put(smicomm_node);
+   plarbdev = of_find_device_by_node(frst_avail_smicomm_node);
+   of_node_put(frst_avail_smicomm_node);
data->smicomm_dev = &plarbdev->dev;
 
link = device_link_add(data->smicomm_dev, dev,
-- 
2.18.0



[PATCH 3/4] iommu/mediatek: Validate number of phandles associated with "mediatek, larbs"

2022-05-11 Thread Yong Wu via iommu
From: Guenter Roeck 

Fix the smatch warnings:
drivers/iommu/mtk_iommu.c:878 mtk_iommu_mm_dts_parse() error: uninitialized
symbol 'larbnode'.

If someone abuses the dtsi node (not following the dt-binding
definition), for example by providing "mediatek,larbs" as a boolean
property, the code may crash. To fix this problem and improve code
safety, add some checks for invalid input from the dtsi, e.g. check
that larb_nr/larbid are in the valid range, and reject conflicting
"mediatek,larb-id" properties in the smi-larb nodes.

Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE")
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 
Signed-off-by: Guenter Roeck 
Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 523bf59264e1..1ba92751e9df 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1046,6 +1046,8 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m
larb_nr = of_count_phandle_with_args(dev->of_node, "mediatek,larbs", NULL);
if (larb_nr < 0)
return larb_nr;
+   if (larb_nr == 0 || larb_nr > MTK_LARB_NR_MAX)
+   return -EINVAL;
 
for (i = 0; i < larb_nr; i++) {
u32 id;
@@ -1062,6 +1064,10 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m
ret = of_property_read_u32(larbnode, "mediatek,larb-id", &id);
if (ret)   /* The id is consecutive if there is no this property */
id = i;
+   if (id >= MTK_LARB_NR_MAX) {
+   ret = -EINVAL;
+   goto err_larbnode_put;
+   }
 
plarbdev = of_find_device_by_node(larbnode);
if (!plarbdev) {
@@ -1072,6 +1078,11 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m
ret = -EPROBE_DEFER;
goto err_larbnode_put;
}
+
+   if (data->larb_imu[id].dev) {
+   ret = -EEXIST;
+   goto err_larbnode_put;
+   }
data->larb_imu[id].dev = >dev;
 
component_match_add_release(dev, match, component_release_of,
-- 
2.18.0



[PATCH 2/4] iommu/mediatek: Add error path for loop of mm_dts_parse

2022-05-11 Thread Yong Wu via iommu
mtk_iommu_mm_dts_parse() parses the smi larb nodes. If parsing larb i+1
fails (returns -EINVAL), we should call of_node_put() for larbs 0..i.
Note that the error path puts each node twice: once for the reference
taken by of_parse_phandle() in the cleanup loop itself, and once for
the reference taken during the original parse.

Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE")
Signed-off-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 0f6ec4a4d9d4..523bf59264e1 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1065,12 +1065,12 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m
 
plarbdev = of_find_device_by_node(larbnode);
if (!plarbdev) {
-   of_node_put(larbnode);
-   return -ENODEV;
+   ret = -ENODEV;
+   goto err_larbnode_put;
}
if (!plarbdev->dev.driver) {
-   of_node_put(larbnode);
-   return -EPROBE_DEFER;
+   ret = -EPROBE_DEFER;
+   goto err_larbnode_put;
}
data->larb_imu[id].dev = >dev;
 
@@ -1101,9 +1101,20 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m
   DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
if (!link) {
dev_err(dev, "Unable to link %s.\n", dev_name(data->smicomm_dev));
-   return -EINVAL;
+   ret = -EINVAL;
+   goto err_larbnode_put;
}
return 0;
+
+err_larbnode_put:
+   while (i--) {
+   larbnode = of_parse_phandle(dev->of_node, "mediatek,larbs", i);
+   if (larbnode && of_device_is_available(larbnode)) {
+   of_node_put(larbnode);
+   of_node_put(larbnode);
+   }
+   }
+   return ret;
 }
 
 static int mtk_iommu_probe(struct platform_device *pdev)
-- 
2.18.0



[PATCH 1/4] iommu/mediatek: Use dev_err_probe to mute probe_defer err log

2022-05-11 Thread Yong Wu via iommu
Mute the probe defer log:

[2.654806] mtk-iommu 14018000.iommu: mm dts parse fail(-517).
[2.656168] mtk-iommu 1c01f000.iommu: mm dts parse fail(-517).
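
dev_err_probe() stays silent for -EPROBE_DEFER (the deferral reason is
recorded in debugfs' devices_deferred instead) and logs like dev_err()
otherwise, so one call site covers both cases. A hedged sketch of the
idiom; the actual patch keeps its goto-based cleanup:

	ret = mtk_iommu_mm_dts_parse(dev, &match, data);
	if (ret)
		/* returns 'ret' unchanged, so it can fold into the return */
		return dev_err_probe(dev, ret, "mm dts parse fail\n");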

Fixes: d2e9a1102cfc ("iommu/mediatek: Contain MM IOMMU flow with the MM TYPE")
Signed-off-by: Yong Wu 
---
The Fixes tag commit-id is from linux-next.
---
 drivers/iommu/mtk_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 71b2ace74cd6..0f6ec4a4d9d4 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1198,7 +1198,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
ret = mtk_iommu_mm_dts_parse(dev, , data);
if (ret) {
-   dev_err(dev, "mm dts parse fail(%d).", ret);
+   dev_err_probe(dev, ret, "mm dts parse fail.");
goto out_runtime_disable;
}
} else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) &&
-- 
2.18.0



[PATCH 0/4] iommu/mediatek: Improve safety from dts

2022-05-11 Thread Yong Wu via iommu
This patchset contains several improvement patches:
[1/4] In mt8195 v7 I added an error log for dts parse failures, but it
does not skip the probe_defer case. (v6 did not have this error log.)
[2/4] Add an error path for the MM dts parsing loop.

[3/4][4/4] Improve safety against a malformed dts. Based on:
https://lore.kernel.org/linux-mediatek/20211210205704.1664928-1-li...@roeck-us.net/

Based on linux-next-20220510.

Guenter Roeck (1):
  iommu/mediatek: Validate number of phandles associated with
"mediatek,larbs"

Yong Wu (3):
  iommu/mediatek: Use dev_err_probe to mute probe_defer err log
  iommu/mediatek: Add error path for loop of mm_dts_parse
  iommu/mediatek: Improve safety for mediatek,smi property in larb nodes

 drivers/iommu/mtk_iommu.c | 83 ---
 1 file changed, 61 insertions(+), 22 deletions(-)

-- 
2.18.0




Re: [PATCH] swiotlb: Max mapping size takes min align mask into account

2022-05-11 Thread h...@lst.de
On Tue, May 10, 2022 at 06:26:55PM +, Michael Kelley (LINUX) wrote:
> > Hmm, this seems a bit pessimistic - the offset can vary per mapping, so
> > it feels to me like it should really be the caller's responsibility to
> > account for it if they're already involved enough to care about both
> > constraints. But I'm not sure how practical that would be.
> 
> Tianyu and I discussed this prior to his submitting the patch.
> Presumably dma_max_mapping_size() exists so that the higher
> level blk-mq code can limit the size of I/O requests to something
> that will "fit" in the swiotlb when bounce buffering is enabled.

Yes, having the upper level code not need to care was very much the
idea behind dma_max_mapping_size().

> As you mentioned, how else would a caller handle this situation?

Well, we could look at dma_get_min_align_mask in the caller and do
the calculation there, but I really don't think that is a good idea.

So this patch looks sensible to me.
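
For readers following along, a hedged sketch of the idea: the
advertised maximum must leave headroom for the in-slot offset that
swiotlb_find_slots() preserves to honour the device's min align mask.
Details are illustrative, not the literal patch:

/* Shrink the advertised max by the worst-case space lost to honouring
 * the device's min_align_mask inside the bounce buffer.
 */
static size_t sketch_swiotlb_max_mapping_size(struct device *dev)
{
	unsigned int min_align_mask = dma_get_min_align_mask(dev);
	size_t max = (size_t)IO_TLB_SIZE * IO_TLB_SEGSIZE;

	if (min_align_mask)
		max -= roundup(min_align_mask, IO_TLB_SIZE);

	return max;
}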