Re: [PATCH v2 2/2] dma-iommu: Check that swiotlb is active before trying to use it

2022-04-04 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2 1/2] iommu/amd: Enable swiotlb in all cases

2022-04-04 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH] iommu/vt-d: remove unneeded validity check on dev

2022-04-04 Thread Lu Baolu

On 2022/4/4 15:52, Muhammad Usama Anjum wrote:

Any thoughts?


It looks good to me. I will queue it for v5.19.

Best regards,
baolu



On 3/13/22 8:03 PM, Muhammad Usama Anjum wrote:

dev_iommu_priv_get() is used at the top of this function and dereferences
dev, so dev cannot be NULL after that point. Remove the validity
check on dev and simplify the code.
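
For reference, a sketch of why the check is dead code (illustrative, based
on the commit description rather than the exact source):
```
static struct dmar_domain *
dmar_insert_one_dev_info(struct intel_iommu *iommu, int bus, int devfn,
			 struct device *dev, struct dmar_domain *domain)
{
	/* dev is dereferenced unconditionally here ... */
	struct device_domain_info *info = dev_iommu_priv_get(dev);

	/* ... so a later "if (dev && ...)" guard can never see
	 * dev == NULL, making the removed check dead code. */
	...
}
```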

Signed-off-by: Muhammad Usama Anjum 
---
  drivers/iommu/intel/iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index df5c62ecf942b..f79edbbd651a4 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2502,7 +2502,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
}
}
  
-	if (dev && domain_context_mapping(domain, dev)) {
+	if (domain_context_mapping(domain, dev)) {
dev_err(dev, "Domain context map failed\n");
dmar_remove_one_dev_info(dev);
return NULL;





[PATCH v2 2/2] dma-iommu: Check that swiotlb is active before trying to use it

2022-04-04 Thread Mario Limonciello via iommu
If the IOMMU is in use and an untrusted device is connected to an
external-facing port, a DMA mapping request that isn't page aligned will
cause the kernel to attempt to use bounce buffers.

If for some reason the bounce buffers have not been allocated, this is a
problem that should be made apparent to the user.

Signed-off-by: Mario Limonciello 
---
v1->v2:
 * Move error message into the caller

 drivers/iommu/dma-iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..1ca85d37eeab 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -971,6 +971,11 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
void *padding_start;
size_t padding_size, aligned_size;
 
+   if (!is_swiotlb_active(dev)) {
+   dev_warn_once(dev, "DMA bounce buffers are inactive, unable to map unaligned transaction.\n");
+   return DMA_MAPPING_ERROR;
+   }
+
aligned_size = iova_align(iovad, size);
phys = swiotlb_tbl_map_single(dev, phys, size, aligned_size,
  iova_mask(iovad), dir, attrs);
-- 
2.34.1



[PATCH v2 0/2] Fix issues with untrusted devices and AMD IOMMU

2022-04-04 Thread Mario Limonciello via iommu
It's been observed that when a TBT3 NVMe device is plugged into a port
marked with ExternalFacingPort, some DMA transactions occur that are not a
full page, and so the DMA API attempts to use software bounce buffers
instead of relying upon the IOMMU translation.

This doesn't work and leads to messages like:

swiotlb buffer is full (sz: 4096 bytes), total 0 (slots), used 0 (slots)

The bounce buffers were originally set up, but torn down during
the boot process.
* This happens because, as part of IOMMU initialization,
  `amd_iommu_init_dma_ops` gets called and resets the global swiotlb flag to 0.
* When late_init gets called, `pci_swiotlb_late_init` calls `swiotlb_exit`
  and the buffers are torn down.
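
For context, a simplified sketch of that late-init path (an approximation
of the x86 pci_swiotlb_late_init of that era, not a verbatim copy):
```
static int __init pci_swiotlb_late_init(void)
{
	/* An IOMMU driver cleared the global 'swiotlb' flag, so the
	 * buffers allocated during early boot get torn down again. */
	if (!swiotlb)
		swiotlb_exit();
	return 0;
}
```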

This can be observed in the logs:
```
[0.407286] AMD-Vi: Extended features (0x246577efa2254afa): PPR NX GT [5] IA GA PC GA_vAPIC
[0.407291] AMD-Vi: Interrupt remapping enabled
[0.407292] AMD-Vi: Virtual APIC enabled
[0.407872] software IO TLB: tearing down default memory pool
```
This series fixes the behavior of AMD IOMMU to enable swiotlb so that
non-page aligned DMA goes through a bounce buffer.

It also adds a message to help with debugging similar problems in the
future.

Mario Limonciello (2):
  iommu/amd: Enable swiotlb in all cases
  dma-iommu: Check that swiotlb is active before trying to use it

 drivers/iommu/amd/iommu.c | 7 ---
 drivers/iommu/dma-iommu.c | 5 +
 2 files changed, 5 insertions(+), 7 deletions(-)

-- 
2.34.1



[PATCH v2 1/2] iommu/amd: Enable swiotlb in all cases

2022-04-04 Thread Mario Limonciello via iommu
Previously the AMD IOMMU would only enable SWIOTLB in certain
circumstances:
 * IOMMU in passthrough mode
 * SME enabled

This logic, however, doesn't work when an untrusted device that doesn't
do page-aligned DMA transactions is plugged in.  The expectation is
that a bounce buffer is used for those transactions.

This fails like this:

swiotlb buffer is full (sz: 4096 bytes), total 0 (slots), used 0 (slots)

That happens because the bounce buffers were allocated and then freed
during startup, but the bounce buffering code expects that all IOMMUs
have left it enabled.

Remove the criteria to set up bounce buffers on AMD systems to ensure
they're always available for supporting untrusted devices.

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Suggested-by: Christoph Hellwig 
Signed-off-by: Mario Limonciello 
---
v1->v2:
 * Enable swiotlb for AMD instead of ignoring it for inactive

 drivers/iommu/amd/iommu.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a1ada7bff44e..079694f894b8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1838,17 +1838,10 @@ void amd_iommu_domain_update(struct protection_domain *domain)
amd_iommu_domain_flush_complete(domain);
 }
 
-static void __init amd_iommu_init_dma_ops(void)
-{
-   swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
-}
-
 int __init amd_iommu_init_api(void)
 {
int err;
 
-   amd_iommu_init_dma_ops();
-
	err = bus_set_iommu(&pci_bus_type, &amd_iommu_ops);
if (err)
return err;
-- 
2.34.1



Re: [PATCH 0/2] Fix issues with untrusted devices and AMD IOMMU

2022-04-04 Thread Christoph Hellwig
On Mon, Apr 04, 2022 at 05:05:00PM +, Limonciello, Mario wrote:
> I do expect that solves it as well.  The reason I submitted the way I
> did is that there seemed to be a strong affinity for having swiotlb
> disabled when IOMMU is enabled on AMD IOMMU.  The original code that
> disabled SWIOTLB in AMD IOMMU dates all the way back to 2.6.33 (commit
> 75f1cdf1dda92cae037ec848ae63690d91913eac) and it has ping ponged around
> since then to add more criteria that it would be or wouldn't be
> disabled, but was never actually dropped until your suggestion.

Well, that was before we started bounce buffering for untrusted devices.
We can't just have a less secure path for them because some conditions
are not met.  Especially given that most AMD systems right now probably
don't have that swiotlb buffer if the IOMMU is enabled.  So not freeing
the buffer in this case is a bug fix that is needed to properly
support the bounce buffering for unaligned I/O to untrusted devices.

> I do think that my messaging patch (1/2) may still be useful for
> debugging in the future if for another reason SWIOTLB is disabled.

I think the warning is useful.  For dma-direct we have it in the caller
so I'd be tempted to do the same for dma-iommu.


Re: [PATCH RFC v2 02/11] iommu: Add iommu_group_singleton_lockdown()

2022-04-04 Thread Jason Gunthorpe via iommu
On Mon, Apr 04, 2022 at 01:43:49PM +0800, Lu Baolu wrote:
> On 2022/3/30 19:58, Jason Gunthorpe wrote:
> > > > Testing the group size is inherently the wrong test to make.
> > > What is your suggestion then?
> > Add a flag to the group that positively indicates the group can never
> > have more than one member, even after hot plug. eg because it is
> > impossible due to ACS, or lack of bridges, and so on.
> 
> The check method seems to be bus specific. For platform devices, perhaps
> this kind of information should be retrieved from firmware interfaces
> like ACPI or DT.
> 
> From this point of view, would it be simpler and more reasonable for the
> device driver to do such check? After all, it is the device driver that
> decides whether to provide SVA services to the application via uacce.

The check has to do with the interconnect, not the device - I don't
see how a device driver would know any better.

Why do you bring up uacce? Nothing should need uacce to access SVA.

Jason


RE: [PATCH 0/2] Fix issues with untrusted devices and AMD IOMMU

2022-04-04 Thread Limonciello, Mario via iommu
> On Mon, Apr 04, 2022 at 11:47:05AM -0500, Mario Limonciello wrote:
> > The bounce buffers were originally set up, but torn down during
> > the boot process.
> > * This happens because as part of IOMMU initialization
> >   `amd_iommu_init_dma_ops` gets called and resets the global swiotlb to 0.
> > * When late_init gets called `pci_swiotlb_late_init` `swiotlb_exit` is
> >   called and the buffers are torn down.
> 
> I think the proper thing is to do this:
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index a1ada7bff44e6..079694f894b85 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1838,17 +1838,10 @@ void amd_iommu_domain_update(struct protection_domain *domain)
>   amd_iommu_domain_flush_complete(domain);
>  }
> 
> -static void __init amd_iommu_init_dma_ops(void)
> -{
> - swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
> -}
> -
>  int __init amd_iommu_init_api(void)
>  {
>   int err;
> 
> - amd_iommu_init_dma_ops();
> -
>   err = bus_set_iommu(&pci_bus_type, &amd_iommu_ops);
>   if (err)
>   return err;

I do expect that solves it as well.  The reason I submitted the way I did is
that there seemed to be a strong affinity for having swiotlb disabled when
IOMMU is enabled on AMD IOMMU.  The original code that disabled SWIOTLB in
AMD IOMMU dates all the way back to 2.6.33 (commit
75f1cdf1dda92cae037ec848ae63690d91913eac) and it has ping-ponged around
since then to add more criteria for when it would or wouldn't be disabled,
but it was never actually dropped until your suggestion.

If the consensus is that it should be dropped, I'll validate it does help
and then send that out as a v2.  I do think that my messaging patch (1/2)
may still be useful for debugging in the future if SWIOTLB is disabled for
another reason.


RE: [PATCH 09/15] swiotlb: make the swiotlb_init interface more useful

2022-04-04 Thread Michael Kelley (LINUX) via iommu
From: Christoph Hellwig  Sent: Sunday, April 3, 2022 10:06 PM
> 
> Pass a bool to pass if swiotlb needs to be enabled based on the

Wording problems.  I'm not sure what you meant to say.

> addressing needs and replace the verbose argument with a set of
> flags, including one to force enable bounce buffering.
> 
> Note that this patch removes the possibility to force xen-swiotlb
> use using swiotlb=force on the command line on x86 (arm and arm64
> never supported that), but this interface will be restored shortly.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm/mm/init.c |  6 +
>  arch/arm64/mm/init.c   |  6 +
>  arch/ia64/mm/init.c|  4 +--
>  arch/mips/cavium-octeon/dma-octeon.c   |  2 +-
>  arch/mips/loongson64/dma.c |  2 +-
>  arch/mips/sibyte/common/dma.c  |  2 +-
>  arch/powerpc/mm/mem.c  |  3 ++-
>  arch/powerpc/platforms/pseries/setup.c |  3 ---
>  arch/riscv/mm/init.c   |  8 +-
>  arch/s390/mm/init.c|  3 +--
>  arch/x86/kernel/pci-dma.c  | 15 ++-
>  drivers/xen/swiotlb-xen.c  |  4 +--
>  include/linux/swiotlb.h| 15 ++-
>  include/trace/events/swiotlb.h | 29 -
>  kernel/dma/swiotlb.c   | 35 ++
>  15 files changed, 55 insertions(+), 82 deletions(-)
> 
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index fe249ea919083..ce64bdb55a16b 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -271,11 +271,7 @@ static void __init free_highpages(void)
>  void __init mem_init(void)
>  {
>  #ifdef CONFIG_ARM_LPAE
> - if (swiotlb_force == SWIOTLB_FORCE ||
> - max_pfn > arm_dma_pfn_limit)
> - swiotlb_init(1);
> - else
> - swiotlb_force = SWIOTLB_NO_FORCE;
> + swiotlb_init(max_pfn > arm_dma_pfn_limit, SWIOTLB_VERBOSE);
>  #endif
> 
>   set_max_mapnr(pfn_to_page(max_pfn) - mem_map);
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 8ac25f19084e8..7b6ea4d6733d6 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -398,11 +398,7 @@ void __init bootmem_init(void)
>   */
>  void __init mem_init(void)
>  {
> - if (swiotlb_force == SWIOTLB_FORCE ||
> - max_pfn > PFN_DOWN(arm64_dma_phys_limit))
> - swiotlb_init(1);
> - else if (!xen_swiotlb_detect())
> - swiotlb_force = SWIOTLB_NO_FORCE;
> + swiotlb_init(max_pfn > PFN_DOWN(arm64_dma_phys_limit), SWIOTLB_VERBOSE);
> 
>   /* this will put all unused low memory onto the freelists */
>   memblock_free_all();
> diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
> index 5d165607bf354..3c3e15b22608f 100644
> --- a/arch/ia64/mm/init.c
> +++ b/arch/ia64/mm/init.c
> @@ -437,9 +437,7 @@ mem_init (void)
>   if (iommu_detected)
>   break;
>  #endif
> -#ifdef CONFIG_SWIOTLB
> - swiotlb_init(1);
> -#endif
> + swiotlb_init(true, SWIOTLB_VERBOSE);
>   } while (0);
> 
>  #ifdef CONFIG_FLATMEM
> diff --git a/arch/mips/cavium-octeon/dma-octeon.c b/arch/mips/cavium-octeon/dma-octeon.c
> index fb7547e217263..9fbba6a8fa4c5 100644
> --- a/arch/mips/cavium-octeon/dma-octeon.c
> +++ b/arch/mips/cavium-octeon/dma-octeon.c
> @@ -235,5 +235,5 @@ void __init plat_swiotlb_setup(void)
>  #endif
> 
>   swiotlb_adjust_size(swiotlbsize);
> - swiotlb_init(1);
> + swiotlb_init(true, SWIOTLB_VERBOSE);
>  }
> diff --git a/arch/mips/loongson64/dma.c b/arch/mips/loongson64/dma.c
> index 364f2f27c8723..8220a1bc0db64 100644
> --- a/arch/mips/loongson64/dma.c
> +++ b/arch/mips/loongson64/dma.c
> @@ -24,5 +24,5 @@ phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
> 
>  void __init plat_swiotlb_setup(void)
>  {
> - swiotlb_init(1);
> + swiotlb_init(true, SWIOTLB_VERBOSE);
>  }
> diff --git a/arch/mips/sibyte/common/dma.c b/arch/mips/sibyte/common/dma.c
> index eb47a94f3583e..c5c2c782aff68 100644
> --- a/arch/mips/sibyte/common/dma.c
> +++ b/arch/mips/sibyte/common/dma.c
> @@ -10,5 +10,5 @@
> 
>  void __init plat_swiotlb_setup(void)
>  {
> - swiotlb_init(1);
> + swiotlb_init(true, SWIOTLB_VERBOSE);
>  }
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 8e301cd8925b2..e1519e2edc656 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -17,6 +17,7 @@
>  #include 
>  #include 
> 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -251,7 +252,7 @@ void __init mem_init(void)
>   if (is_secure_guest())
>   svm_swiotlb_init();
>   else
> - swiotlb_init(0);
> + swiotlb_init(ppc_swiotlb_enable, 0);
>  #endif
> 
>   high_memory = (void *) __va(max_low_pfn * PAGE_SIZE);
> diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
> index 069d7b3bb142e..c6e06d91b6602 

Re: [PATCH 0/2] Fix issues with untrusted devices and AMD IOMMU

2022-04-04 Thread Christoph Hellwig
On Mon, Apr 04, 2022 at 11:47:05AM -0500, Mario Limonciello wrote:
> The bounce buffers were originally set up, but torn down during
> the boot process.
> * This happens because as part of IOMMU initialization
>   `amd_iommu_init_dma_ops` gets called and resets the global swiotlb to 0.
> * When late_init gets called `pci_swiotlb_late_init` `swiotlb_exit` is
>   called and the buffers are torn down.

I think the proper thing is to do this:

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a1ada7bff44e6..079694f894b85 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1838,17 +1838,10 @@ void amd_iommu_domain_update(struct protection_domain *domain)
amd_iommu_domain_flush_complete(domain);
 }
 
-static void __init amd_iommu_init_dma_ops(void)
-{
-   swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
-}
-
 int __init amd_iommu_init_api(void)
 {
int err;
 
-   amd_iommu_init_dma_ops();
-
	err = bus_set_iommu(&pci_bus_type, &amd_iommu_ops);
if (err)
return err;


[PATCH 2/2] iommu: Don't use swiotlb unless it's active

2022-04-04 Thread Mario Limonciello via iommu
The helper function `dev_use_swiotlb` is used at various decision-making
points for how to handle DMA mapping requests.

If the kernel doesn't have any memory allocated for swiotlb to use, then
an untrusted device connected to the system may fail to initialize when a
mapping request is made.

To avoid this situation, don't report swiotlb as usable when it has not
been set up.

Signed-off-by: Mario Limonciello 
---
 drivers/iommu/dma-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..92ca136c8a12 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -504,7 +504,8 @@ static bool dev_is_untrusted(struct device *dev)
 
 static bool dev_use_swiotlb(struct device *dev)
 {
-   return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev);
+   return IS_ENABLED(CONFIG_SWIOTLB) && dev_is_untrusted(dev) &&
+  is_swiotlb_active(dev);
 }
 
 /**
-- 
2.34.1



[PATCH 1/2] swiotlb: Check that slabs have been allocated when requested

2022-04-04 Thread Mario Limonciello via iommu
If the IOMMU is in use and an untrusted device is connected to an
external-facing port, a DMA mapping request that isn't page aligned will
cause the kernel to attempt to use bounce buffers.

If the bounce buffers have not been allocated, however, this leads
to messages like this:

swiotlb buffer is full (sz: 4096 bytes), total 0 (slots), used 0 (slots)

Clarify the error message: the buffer isn't full, it doesn't exist!

Signed-off-by: Mario Limonciello 
---
 kernel/dma/swiotlb.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 73a41cec9e38..d2a20cedf0d2 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -591,6 +591,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
	if (!mem)
		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
 
+   if (!mem->nslabs) {
+   dev_warn_once(dev, "No slabs have been configured, unable to 
use SWIOTLB buffer");
+   return (phys_addr_t)DMA_MAPPING_ERROR;
+   }
+
if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
pr_warn_once("Memory encryption is active and system is using 
DMA bounce buffers\n");
 
-- 
2.34.1



[PATCH 0/2] Fix issues with untrusted devices and AMD IOMMU

2022-04-04 Thread Mario Limonciello via iommu
It's been observed that when a TBT3 NVMe device is plugged into a port
marked with ExternalFacingPort, some DMA transactions occur that are not a
full page, and so the DMA API attempts to use software bounce buffers
instead of relying upon the IOMMU translation.

This doesn't work and leads to messages like:

swiotlb buffer is full (sz: 4096 bytes), total 0 (slots), used 0 (slots)

The bounce buffers were originally set up, but torn down during
the boot process.
* This happens because, as part of IOMMU initialization,
  `amd_iommu_init_dma_ops` gets called and resets the global swiotlb flag to 0.
* When late_init gets called, `pci_swiotlb_late_init` calls `swiotlb_exit`
  and the buffers are torn down.

This can be observed in the logs:
```
[0.407286] AMD-Vi: Extended features (0x246577efa2254afa): PPR NX GT [5] IA GA PC GA_vAPIC
[0.407291] AMD-Vi: Interrupt remapping enabled
[0.407292] AMD-Vi: Virtual APIC enabled
[0.407872] software IO TLB: tearing down default memory pool
```

This series adds some better messaging in case something like this comes
up again and also adds checks that swiotlb really is active before
trying to use it.

Mario Limonciello (2):
  swiotlb: Check that slabs have been allocated when requested
  iommu: Don't use swiotlb unless it's active

 drivers/iommu/dma-iommu.c | 3 ++-
 kernel/dma/swiotlb.c  | 5 +
 2 files changed, 7 insertions(+), 1 deletion(-)

-- 
2.34.1



Re: [PATCH 12/15] swiotlb: provide swiotlb_init variants that remap the buffer

2022-04-04 Thread Alan Robinson
Hi Christoph,

On Mon, Apr 04, 2022 at 05:05:56AM +, Christoph Hellwig wrote:
> From: Christoph Hellwig 
> Subject: [PATCH 12/15] swiotlb: provide swiotlb_init variants that remap
>  the buffer
> 
> To share more code between swiotlb and xen-swiotlb, offer a
> swiotlb_init_remap interface and add a remap callback to
> swiotlb_init_late that will allow Xen to remap the buffer the

s/the buffer//

> buffer without duplicating much of the logic.

Alan

> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/pci/sta2x11-fixup.c |  2 +-
>  include/linux/swiotlb.h  |  5 -
>  kernel/dma/swiotlb.c | 36 +---
>  3 files changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/pci/sta2x11-fixup.c b/arch/x86/pci/sta2x11-fixup.c
> index c7e6faf59a861..7368afc039987 100644
> --- a/arch/x86/pci/sta2x11-fixup.c
> +++ b/arch/x86/pci/sta2x11-fixup.c
> @@ -57,7 +57,7 @@ static void sta2x11_new_instance(struct pci_dev *pdev)
>   int size = STA2X11_SWIOTLB_SIZE;
>   /* First instance: register your own swiotlb area */
>   dev_info(&pdev->dev, "Using SWIOTLB (size %i)\n", size);
> - if (swiotlb_init_late(size, GFP_DMA))
> + if (swiotlb_init_late(size, GFP_DMA, NULL))
>   dev_emerg(&pdev->dev, "init swiotlb failed\n");
>   }
>   list_add(&instance->list, &sta2x11_instance_list);
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index ee655f2e4d28b..7b50c82f84ce9 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -36,8 +36,11 @@ struct scatterlist;
>  
>  int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, unsigned int 
> flags);
>  unsigned long swiotlb_size_or_default(void);
> +void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
> + int (*remap)(void *tlb, unsigned long nslabs));
> +int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> + int (*remap)(void *tlb, unsigned long nslabs));
>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> -int swiotlb_init_late(size_t size, gfp_t gfp_mask);
>  extern void __init swiotlb_update_mem_attributes(void);
>  
>  phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 119187afc65ec..d5fe8f5e08300 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -256,9 +256,11 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs,
>   * Statically reserve bounce buffer space and initialize bounce buffer data
>   * structures for the software IO TLB used to implement the DMA API.
>   */
> -void __init swiotlb_init(bool addressing_limit, unsigned int flags)
> +void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
> + int (*remap)(void *tlb, unsigned long nslabs))
>  {
> - size_t bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
> + unsigned long nslabs = default_nslabs;
> + size_t bytes;
>   void *tlb;
>  
>   if (!addressing_limit && !swiotlb_force_bounce)
> @@ -271,12 +273,23 @@ void __init swiotlb_init(bool addressing_limit, unsigned int flags)
>* allow to pick a location everywhere for hypervisors with guest
>* memory encryption.
>*/
> +retry:
> + bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
>   if (flags & SWIOTLB_ANY)
>   tlb = memblock_alloc(bytes, PAGE_SIZE);
>   else
>   tlb = memblock_alloc_low(bytes, PAGE_SIZE);
>   if (!tlb)
>   goto fail;
> + if (remap && remap(tlb, nslabs) < 0) {
> + memblock_free(tlb, PAGE_ALIGN(bytes));
> +
> + nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
> + if (nslabs < IO_TLB_MIN_SLABS)
> + panic("%s: Failed to remap %zu bytes\n",
> +   __func__, bytes);
> + goto retry;
> + }
>   if (swiotlb_init_with_tbl(tlb, default_nslabs, flags))
>   goto fail_free_mem;
>   return;
> @@ -287,12 +300,18 @@ void __init swiotlb_init(bool addressing_limit, unsigned int flags)
>   pr_warn("Cannot allocate buffer");
>  }
>  
> +void __init swiotlb_init(bool addressing_limit, unsigned int flags)
> +{
> + return swiotlb_init_remap(addressing_limit, flags, NULL);
> +}
> +
>  /*
>   * Systems with larger DMA zones (those that don't support ISA) can
>   * initialize the swiotlb later using the slab allocator if needed.
>   * This should be just like above, but with some error catching.
>   */
> -int swiotlb_init_late(size_t size, gfp_t gfp_mask)
> +int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> + int (*remap)(void *tlb, unsigned long nslabs))
>  {
>   unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
>   unsigned long bytes;
> @@ -303,6 +322,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask)
>   if (swiotlb_force_disable)
>   

[PATCH v9 11/11] iommu/arm-smmu: Get associated RMR info and install bypass SMR

2022-04-04 Thread Shameer Kolothum via iommu
From: Jon Nettleton 

Check if there is any RMR info associated with the devices behind
the SMMU and, if so, install bypass SMRs for them. This keeps
any ongoing traffic associated with these devices alive
when we enable/reset the SMMU during probe().

Signed-off-by: Jon Nettleton 
Signed-off-by: Steven Price 
Signed-off-by: Shameer Kolothum 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 52 +++
 1 file changed, 52 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 9a5b785d28fd..d1d0473b8b88 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -2068,6 +2068,54 @@ err_reset_platform_ops: __maybe_unused;
return err;
 }
 
+static void arm_smmu_rmr_install_bypass_smr(struct arm_smmu_device *smmu)
+{
+   struct list_head rmr_list;
+   struct iommu_resv_region *e;
+   int idx, cnt = 0;
+   u32 reg;
+
+   INIT_LIST_HEAD(&rmr_list);
+   iort_get_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);
+
+   /*
+* Rather than trying to look at existing mappings that
+* are setup by the firmware and then invalidate the ones
+* that do not have matching RMR entries, just disable the
+* SMMU until it gets enabled again in the reset routine.
+*/
+   reg = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_sCR0);
+   reg |= ARM_SMMU_sCR0_CLIENTPD;
+   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_sCR0, reg);
+
+   list_for_each_entry(e, &rmr_list, list) {
+   const u32 *sids = e->fw_data.rmr.sids;
+   u32 num_sids = e->fw_data.rmr.num_sids;
+   int i;
+
+   for (i = 0; i < num_sids; i++) {
+   idx = arm_smmu_find_sme(smmu, sids[i], ~0);
+   if (idx < 0)
+   continue;
+
+   if (smmu->s2crs[idx].count == 0) {
+   smmu->smrs[idx].id = sids[i];
+   smmu->smrs[idx].mask = 0;
+   smmu->smrs[idx].valid = true;
+   }
+   smmu->s2crs[idx].count++;
+   smmu->s2crs[idx].type = S2CR_TYPE_BYPASS;
+   smmu->s2crs[idx].privcfg = S2CR_PRIVCFG_DEFAULT;
+
+   cnt++;
+   }
+   }
+
+   dev_notice(smmu->dev, "\tpreserved %d boot mapping%s\n", cnt,
+  cnt == 1 ? "" : "s");
+   iort_put_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);
+}
+
 static int arm_smmu_device_probe(struct platform_device *pdev)
 {
struct resource *res;
@@ -2189,6 +2237,10 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
}
 
platform_set_drvdata(pdev, smmu);
+
+   /* Check for RMRs and install bypass SMRs if any */
+   arm_smmu_rmr_install_bypass_smr(smmu);
+
arm_smmu_device_reset(smmu);
arm_smmu_test_smr_masks(smmu);
 
-- 
2.25.1



[PATCH v9 10/11] iommu/arm-smmu-v3: Get associated RMR info and install bypass STE

2022-04-04 Thread Shameer Kolothum via iommu
Check if there is any RMR info associated with the devices behind
the SMMUv3 and, if so, install bypass STEs for them. This keeps
any ongoing traffic associated with these devices alive
when we enable/reset the SMMUv3 during probe().

Signed-off-by: Shameer Kolothum 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 57f831c44155..627a2b498e78 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3754,6 +3754,36 @@ static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t start,
	return devm_ioremap_resource(dev, &res);
 }
 
+static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu)
+{
+   struct list_head rmr_list;
+   struct iommu_resv_region *e;
+
+   INIT_LIST_HEAD(&rmr_list);
+   iort_get_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);
+
+   list_for_each_entry(e, &rmr_list, list) {
+   __le64 *step;
+   const u32 *sids = e->fw_data.rmr.sids;
+   u32 num_sids = e->fw_data.rmr.num_sids;
+   int ret, i;
+
+   for (i = 0; i < num_sids; i++) {
+   ret = arm_smmu_init_sid_strtab(smmu, sids[i]);
+   if (ret) {
+   dev_err(smmu->dev, "RMR SID(0x%x) bypass 
failed\n",
+   sids[i]);
+   continue;
+   }
+
+   step = arm_smmu_get_step_for_sid(smmu, sids[i]);
+   arm_smmu_init_bypass_stes(step, 1, true);
+   }
+   }
+
+   iort_put_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);
+}
+
 static int arm_smmu_device_probe(struct platform_device *pdev)
 {
int irq, ret;
@@ -3835,6 +3865,9 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
/* Record our private device structure */
platform_set_drvdata(pdev, smmu);
 
+   /* Check for RMRs and install bypass STEs if any */
+   arm_smmu_rmr_install_bypass_ste(smmu);
+
/* Reset the device */
ret = arm_smmu_device_reset(smmu, bypass);
if (ret)
-- 
2.25.1



[PATCH v9 09/11] iommu/arm-smmu-v3: Refactor arm_smmu_init_bypass_stes() to force bypass

2022-04-04 Thread Shameer Kolothum via iommu
By default, the disable_bypass flag is set and any device without
an iommu domain installs an STE with CFG_ABORT during
arm_smmu_init_bypass_stes(). Introduce a "force" flag and
move the STE update logic into arm_smmu_init_bypass_stes()
so that we can force it to install a CFG_BYPASS STE for specific
SIDs.

This will be useful in a follow-up patch to install bypass
for IORT RMR SIDs.
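
A one-line usage sketch from that follow-up patch, forcing a bypass STE
for an RMR SID even when disable_bypass is set:
```
step = arm_smmu_get_step_for_sid(smmu, sids[i]);
arm_smmu_init_bypass_stes(step, 1, true);
```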

Signed-off-by: Shameer Kolothum 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 61558fdabbe3..57f831c44155 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1380,12 +1380,21 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
	arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
 }
 
-static void arm_smmu_init_bypass_stes(__le64 *strtab, unsigned int nent)
+static void arm_smmu_init_bypass_stes(__le64 *strtab, unsigned int nent, bool force)
 {
unsigned int i;
+   u64 val = STRTAB_STE_0_V;
+
+   if (disable_bypass && !force)
+   val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
+   else
+   val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
 
for (i = 0; i < nent; ++i) {
-   arm_smmu_write_strtab_ent(NULL, -1, strtab);
+   strtab[0] = cpu_to_le64(val);
+   strtab[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
+  STRTAB_STE_1_SHCFG_INCOMING));
+   strtab[2] = 0;
strtab += STRTAB_STE_DWORDS;
}
 }
@@ -1413,7 +1422,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
return -ENOMEM;
}
 
-   arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
+   arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT, false);
arm_smmu_write_strtab_l1_desc(strtab, desc);
return 0;
 }
@@ -3051,7 +3060,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
cfg->strtab_base_cfg = reg;
 
-   arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents);
+   arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents, false);
return 0;
 }
 
-- 
2.25.1



[PATCH v9 08/11] iommu/arm-smmu-v3: Introduce strtab init helper

2022-04-04 Thread Shameer Kolothum via iommu
Introduce a helper to check the SID range and to init the l2 strtab
entries (bypass). This will be useful when we have to initialize the
l2 strtab with bypass for RMR SIDs.
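
Usage sketch taken from the RMR patch later in this series:
```
ret = arm_smmu_init_sid_strtab(smmu, sids[i]);
if (ret) {
	dev_err(smmu->dev, "RMR SID(0x%x) bypass failed\n", sids[i]);
	continue;
}
```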

Signed-off-by: Shameer Kolothum 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 28 +++--
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index efa38b4411f3..61558fdabbe3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2537,6 +2537,19 @@ static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid)
return sid < limit;
 }
 
+static int arm_smmu_init_sid_strtab(struct arm_smmu_device *smmu, u32 sid)
+{
+   /* Check the SIDs are in range of the SMMU and our stream table */
+   if (!arm_smmu_sid_in_range(smmu, sid))
+   return -ERANGE;
+
+   /* Ensure l2 strtab is initialised */
+   if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB)
+   return arm_smmu_init_l2_strtab(smmu, sid);
+
+   return 0;
+}
+
 static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
  struct arm_smmu_master *master)
 {
@@ -2560,20 +2573,9 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
new_stream->id = sid;
new_stream->master = master;
 
-   /*
-* Check the SIDs are in range of the SMMU and our stream table
-*/
-   if (!arm_smmu_sid_in_range(smmu, sid)) {
-   ret = -ERANGE;
+   ret = arm_smmu_init_sid_strtab(smmu, sid);
+   if (ret)
break;
-   }
-
-   /* Ensure l2 strtab is initialised */
-   if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) {
-   ret = arm_smmu_init_l2_strtab(smmu, sid);
-   if (ret)
-   break;
-   }
 
/* Insert into SID tree */
new_node = &(smmu->streams.rb_node);
-- 
2.25.1



[PATCH v9 07/11] ACPI/IORT: Add a helper to retrieve RMR info directly

2022-04-04 Thread Shameer Kolothum via iommu
This will provide a way for SMMU drivers to retrieve StreamIDs
associated with IORT RMR nodes and use them to install bypass settings
for those IDs.
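
A minimal usage sketch, mirroring the SMMU driver patches in this series:
```
struct list_head rmr_list;
struct iommu_resv_region *e;

INIT_LIST_HEAD(&rmr_list);
iort_get_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);
list_for_each_entry(e, &rmr_list, list) {
	/* install bypass settings for e->fw_data.rmr.sids[] */
}
iort_put_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);
```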

Signed-off-by: Shameer Kolothum 
---
 drivers/acpi/arm64/iort.c | 29 +
 include/linux/acpi_iort.h |  8 
 2 files changed, 37 insertions(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 1147387cfddb..fb2b0163c27d 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1402,6 +1402,35 @@ int iort_dma_get_ranges(struct device *dev, u64 *size)
return nc_dma_get_range(dev, size);
 }
 
+/**
+ * iort_get_rmr_sids - Retrieve IORT RMR node reserved regions with
+ * associated StreamIDs information.
+ * @iommu_fwnode: fwnode associated with IOMMU
+ * @head: Reserved region list
+ */
+void iort_get_rmr_sids(struct fwnode_handle *iommu_fwnode,
+  struct list_head *head)
+{
+   iort_iommu_rmr_get_resv_regions(iommu_fwnode, NULL, head);
+}
+EXPORT_SYMBOL_GPL(iort_get_rmr_sids);
+
+/**
+ * iort_put_rmr_sids - Free all the memory allocated for RMR reserved regions.
+ * @iommu_fwnode: fwnode associated with IOMMU
+ * @head: Reserved region list
+ */
+void iort_put_rmr_sids(struct fwnode_handle *iommu_fwnode,
+  struct list_head *head)
+{
+   struct iommu_resv_region *entry, *next;
+
+   iort_iommu_put_resv_regions(NULL, head);
+   list_for_each_entry_safe(entry, next, head, list)
+   kfree(entry);
+}
+EXPORT_SYMBOL_GPL(iort_put_rmr_sids);
+
 static void __init acpi_iort_register_irq(int hwirq, const char *name,
  int trigger,
  struct resource *res)
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index eb3c28853110..774b8bc16573 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -33,6 +33,10 @@ struct irq_domain *iort_get_device_domain(struct device *dev, u32 id,
  enum irq_domain_bus_token bus_token);
 void acpi_configure_pmsi_domain(struct device *dev);
 int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
+void iort_get_rmr_sids(struct fwnode_handle *iommu_fwnode,
+  struct list_head *head);
+void iort_put_rmr_sids(struct fwnode_handle *iommu_fwnode,
+  struct list_head *head);
 /* IOMMU interface */
 int iort_dma_get_ranges(struct device *dev, u64 *size);
 int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
@@ -47,6 +51,10 @@ static inline struct irq_domain *iort_get_device_domain(
struct device *dev, u32 id, enum irq_domain_bus_token bus_token)
 { return NULL; }
 static inline void acpi_configure_pmsi_domain(struct device *dev) { }
+static inline
+void iort_get_rmr_sids(struct fwnode_handle *iommu_fwnode, struct list_head *head) { }
+static inline
+void iort_put_rmr_sids(struct fwnode_handle *iommu_fwnode, struct list_head *head) { }
 /* IOMMU interface */
 static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
 { return -ENODEV; }
-- 
2.25.1



[PATCH v9 06/11] ACPI/IORT: Add support to retrieve IORT RMR reserved regions

2022-04-04 Thread Shameer Kolothum via iommu
Parse through the IORT RMR nodes and populate the reserve region list
corresponding to a given IOMMU and device (optional). Also, go through
the ID mappings of the RMR node and retrieve all the SIDs associated
with it.

Now that we have this support, update iommu_dma_get/_put_resv_regions()
paths to include the RMR reserve regions.

Signed-off-by: Shameer Kolothum 
---
 drivers/acpi/arm64/iort.c | 275 ++
 drivers/iommu/dma-iommu.c |   3 +
 include/linux/acpi_iort.h |   4 +
 3 files changed, 282 insertions(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 63acc3c5b275..1147387cfddb 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -812,6 +812,259 @@ void acpi_configure_pmsi_domain(struct device *dev)
 }
 
 #ifdef CONFIG_IOMMU_API
+static void iort_rmr_desc_check_overlap(struct acpi_iort_rmr_desc *desc, u32 count)
+{
+   int i, j;
+
+   for (i = 0; i < count; i++) {
+   u64 end, start = desc[i].base_address, length = desc[i].length;
+
+   if (!length) {
+   pr_err(FW_BUG "RMR descriptor[0x%llx] with zero length, 
continue anyway\n",
+  start);
+   continue;
+   }
+
+   end = start + length - 1;
+
+   /* Check for address overlap */
+   for (j = i + 1; j < count; j++) {
+   u64 e_start = desc[j].base_address;
+   u64 e_end = e_start + desc[j].length - 1;
+
+   if (start <= e_end && end >= e_start)
+   pr_err(FW_BUG "RMR descriptor[0x%llx - 0x%llx] 
overlaps, continue anyway\n",
+  start, end);
+   }
+   }
+}
+
+/*
+ * Please note, we will keep the already allocated RMR reserve
+ * regions in case of a memory allocation failure.
+ */
+static void iort_get_rmrs(struct acpi_iort_node *node,
+ struct acpi_iort_node *smmu,
+ u32 *sids, u32 num_sids,
+ struct list_head *head)
+{
+   struct acpi_iort_rmr *rmr = (struct acpi_iort_rmr *)node->node_data;
+   struct acpi_iort_rmr_desc *rmr_desc;
+   int i;
+
+   rmr_desc = ACPI_ADD_PTR(struct acpi_iort_rmr_desc, node,
+   rmr->rmr_offset);
+
+   iort_rmr_desc_check_overlap(rmr_desc, rmr->rmr_count);
+
+   for (i = 0; i < rmr->rmr_count; i++, rmr_desc++) {
+   struct iommu_resv_region *region;
+   enum iommu_resv_type type;
+   u32  *sids_copy;
+   int prot = IOMMU_READ | IOMMU_WRITE;
+   u64 addr = rmr_desc->base_address, size = rmr_desc->length;
+
+   if (!IS_ALIGNED(addr, SZ_64K) || !IS_ALIGNED(size, SZ_64K)) {
+   /* PAGE align base addr and size */
+   addr &= PAGE_MASK;
+   size = PAGE_ALIGN(size + offset_in_page(rmr_desc->base_address));
+
+   pr_err(FW_BUG "RMR descriptor[0x%llx - 0x%llx] not aligned to 64K, continue with [0x%llx - 0x%llx]\n",
+  rmr_desc->base_address,
+  rmr_desc->base_address + rmr_desc->length - 1,
+  addr, addr + size - 1);
+   }
+
+   if (rmr->flags & ACPI_IORT_RMR_REMAP_PERMITTED)
+   type = IOMMU_RESV_DIRECT_RELAXABLE;
+   else
+   type = IOMMU_RESV_DIRECT;
+
+   if (rmr->flags & ACPI_IORT_RMR_ACCESS_PRIVILEGE)
+   prot |= IOMMU_PRIV;
+
+   /* Attributes 0x00 - 0x03 represents device memory */
+   if (ACPI_IORT_RMR_ACCESS_ATTRIBUTES(rmr->flags) <=
+   ACPI_IORT_RMR_ATTR_DEVICE_GRE)
+   prot |= IOMMU_MMIO;
+   else if (ACPI_IORT_RMR_ACCESS_ATTRIBUTES(rmr->flags) ==
+   ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB)
+   prot |= IOMMU_CACHE;
+
+   /* Create a copy of SIDs array to associate with this resv region */
+   sids_copy = kmemdup(sids, num_sids * sizeof(*sids), GFP_KERNEL);
+   if (!sids_copy)
+   return;
+
+   region = iommu_alloc_resv_region(addr, size, prot, type);
+   if (!region) {
+   kfree(sids_copy);
+   return;
+   }
+
+   region->fw_data.rmr.sids = sids_copy;
+   region->fw_data.rmr.num_sids = num_sids;
+   list_add_tail(&region->list, head);
+   }
+}
+
+static u32 *iort_rmr_alloc_sids(u32 *sids, u32 count, u32 id_start,
+   u32 new_count)
+{
+   u32 *new_sids;
+   u32 total_count = count + new_count;
+   int i;
+
+   new_sids = krealloc_array(sids, count + new_count,
+  

[PATCH v9 05/11] iommu/dma: Introduce a helper to remove reserved regions

2022-04-04 Thread Shameer Kolothum via iommu
Currently drivers use generic_iommu_put_resv_regions() to remove
reserved regions. Introduce a dma-iommu specific reserved region
removal helper (iommu_dma_put_resv_regions()). This will be useful
when we introduce reserved regions with firmware-specific memory
allocations (e.g. IORT RMR) that have to be freed. Also update current
users of iommu_dma_get_resv_regions() to use iommu_dma_put_resv_regions()
for removal.

Signed-off-by: Shameer Kolothum 
---
 drivers/iommu/apple-dart.c  | 2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 2 +-
 drivers/iommu/dma-iommu.c   | 6 ++
 drivers/iommu/virtio-iommu.c| 2 +-
 include/linux/dma-iommu.h   | 5 +
 6 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index decafb07ad08..6c198a08e50f 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -771,7 +771,7 @@ static const struct iommu_ops apple_dart_iommu_ops = {
.of_xlate = apple_dart_of_xlate,
.def_domain_type = apple_dart_def_domain_type,
.get_resv_regions = apple_dart_get_resv_regions,
-   .put_resv_regions = generic_iommu_put_resv_regions,
+   .put_resv_regions = iommu_dma_put_resv_regions,
.pgsize_bitmap = -1UL, /* Restricted during dart probe */
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = apple_dart_attach_dev,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 627a3ed5ee8f..efa38b4411f3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2847,7 +2847,7 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
-   .put_resv_regions   = generic_iommu_put_resv_regions,
+   .put_resv_regions   = iommu_dma_put_resv_regions,
.dev_has_feat   = arm_smmu_dev_has_feature,
.dev_feat_enabled   = arm_smmu_dev_feature_enabled,
.dev_enable_feat= arm_smmu_dev_enable_feature,
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 568cce590ccc..9a5b785d28fd 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1589,7 +1589,7 @@ static struct iommu_ops arm_smmu_ops = {
.device_group   = arm_smmu_device_group,
.of_xlate   = arm_smmu_of_xlate,
.get_resv_regions   = arm_smmu_get_resv_regions,
-   .put_resv_regions   = generic_iommu_put_resv_regions,
+   .put_resv_regions   = iommu_dma_put_resv_regions,
.def_domain_type= arm_smmu_def_domain_type,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
.owner  = THIS_MODULE,
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 93d76b666888..44e3f3feaab6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -389,6 +389,12 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 }
 EXPORT_SYMBOL(iommu_dma_get_resv_regions);
 
+void iommu_dma_put_resv_regions(struct device *dev, struct list_head *list)
+{
+   generic_iommu_put_resv_regions(dev, list);
+}
+EXPORT_SYMBOL(iommu_dma_put_resv_regions);
+
 static int cookie_init_hw_msi_region(struct iommu_dma_cookie *cookie,
phys_addr_t start, phys_addr_t end)
 {
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 25be4b822aa0..b8fea7576bbd 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -1013,7 +1013,7 @@ static struct iommu_ops viommu_ops = {
.release_device = viommu_release_device,
.device_group   = viommu_device_group,
.get_resv_regions   = viommu_get_resv_regions,
-   .put_resv_regions   = generic_iommu_put_resv_regions,
+   .put_resv_regions   = iommu_dma_put_resv_regions,
.of_xlate   = viommu_of_xlate,
.owner  = THIS_MODULE,
.default_domain_ops = &(const struct iommu_domain_ops) {
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 24607dc3c2ac..0628db1e3272 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -37,6 +37,7 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
   struct msi_msg *msg);
 
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
+void iommu_dma_put_resv_regions(struct device *dev, struct list_head *list);
 
 void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
struct iommu_domain *domain);
@@ -89,5 +90,9 

[PATCH v9 04/11] ACPI/IORT: Provide a generic helper to retrieve reserve regions

2022-04-04 Thread Shameer Kolothum via iommu
Currently IORT provides a helper to retrieve HW MSI reserve regions.
Change this to a generic helper to retrieve any IORT related reserve
regions. This will be useful when we add support for RMR nodes in
subsequent patches.

Signed-off-by: Shameer Kolothum 
---
 drivers/acpi/arm64/iort.c | 23 +++
 drivers/iommu/dma-iommu.c |  2 +-
 include/linux/acpi_iort.h |  4 ++--
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index c5ebb2be9a19..63acc3c5b275 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -830,16 +830,13 @@ static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev)
return NULL;
 }
 
-/**
- * iort_iommu_msi_get_resv_regions - Reserved region driver helper
- *   for HW MSI regions.
- * @dev: Device from iommu_get_resv_regions()
- * @head: Reserved region list from iommu_get_resv_regions()
- *
+/*
+ * Retrieve platform specific HW MSI reserve regions.
  * The ITS interrupt translation spaces (ITS_base + SZ_64K, SZ_64K)
  * associated with the device are the HW MSI reserved regions.
  */
-void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
+static void
+iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct acpi_iort_its_group *its;
@@ -888,6 +885,16 @@ void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
}
 }
 
+/**
+ * iort_iommu_get_resv_regions - Generic helper to retrieve reserved regions.
+ * @dev: Device from iommu_get_resv_regions()
+ * @head: Reserved region list from iommu_get_resv_regions()
+ */
+void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head)
+{
+   iort_iommu_msi_get_resv_regions(dev, head);
+}
+
 static inline bool iort_iommu_driver_enabled(u8 type)
 {
switch (type) {
@@ -1052,7 +1059,7 @@ int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
 }
 
 #else
-void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
+void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head)
 { }
 int iort_iommu_configure_id(struct device *dev, const u32 *input_id)
 { return -ENODEV; }
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..93d76b666888 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -384,7 +384,7 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
 {
 
if (!is_of_node(dev_iommu_fwspec_get(dev)->iommu_fwnode))
-   iort_iommu_msi_get_resv_regions(dev, list);
+   iort_iommu_get_resv_regions(dev, list);
 
 }
 EXPORT_SYMBOL(iommu_dma_get_resv_regions);
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index a8198b83753d..e5d2de9caf7f 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -36,7 +36,7 @@ int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 /* IOMMU interface */
 int iort_dma_get_ranges(struct device *dev, u64 *size);
 int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
-void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
+void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head);
 phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
 #else
 static inline void acpi_iort_init(void) { }
@@ -52,7 +52,7 @@ static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
 static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
 { return -ENODEV; }
 static inline
-void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
+void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head)
 { }
 
 static inline phys_addr_t acpi_iort_dma_get_max_cpu_address(void)
-- 
2.25.1


[PATCH v9 03/11] ACPI/IORT: Make iort_iommu_msi_get_resv_regions() return void

2022-04-04 Thread Shameer Kolothum via iommu
At present iort_iommu_msi_get_resv_regions() returns the number of
MSI reserved regions on success, and there are no users of this.
The reserved region list will get populated anyway for platforms
that require the HW MSI region reservation. Hence, change the
function to return void instead.

Signed-off-by: Shameer Kolothum 
---
 drivers/acpi/arm64/iort.c | 26 ++
 include/linux/acpi_iort.h |  6 +++---
 2 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index fd06cf43ba31..c5ebb2be9a19 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -832,25 +832,23 @@ static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev)
 
 /**
  * iort_iommu_msi_get_resv_regions - Reserved region driver helper
+ *   for HW MSI regions.
  * @dev: Device from iommu_get_resv_regions()
  * @head: Reserved region list from iommu_get_resv_regions()
  *
- * Returns: Number of msi reserved regions on success (0 if platform
- *  doesn't require the reservation or no associated msi regions),
- *  appropriate error value otherwise. The ITS interrupt translation
- *  spaces (ITS_base + SZ_64K, SZ_64K) associated with the device
- *  are the msi reserved regions.
+ * The ITS interrupt translation spaces (ITS_base + SZ_64K, SZ_64K)
+ * associated with the device are the HW MSI reserved regions.
  */
-int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
+void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct acpi_iort_its_group *its;
struct acpi_iort_node *iommu_node, *its_node = NULL;
-   int i, resv = 0;
+   int i;
 
iommu_node = iort_get_msi_resv_iommu(dev);
if (!iommu_node)
-   return 0;
+   return;
 
/*
 * Current logic to reserve ITS regions relies on HW topologies
@@ -870,7 +868,7 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
}
 
if (!its_node)
-   return 0;
+   return;
 
/* Move to ITS specific data */
its = (struct acpi_iort_its_group *)its_node->node_data;
@@ -884,14 +882,10 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 
region = iommu_alloc_resv_region(base + SZ_64K, SZ_64K,
 prot, IOMMU_RESV_MSI);
-   if (region) {
+   if (region)
		list_add_tail(&region->list, head);
-   resv++;
-   }
}
}
-
-   return (resv == its->its_count) ? resv : -ENODEV;
 }
 
 static inline bool iort_iommu_driver_enabled(u8 type)
@@ -1058,8 +1052,8 @@ int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
 }
 
 #else
-int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
-{ return 0; }
+void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
+{ }
 int iort_iommu_configure_id(struct device *dev, const u32 *input_id)
 { return -ENODEV; }
 #endif
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index f1f0842a2cb2..a8198b83753d 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -36,7 +36,7 @@ int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 /* IOMMU interface */
 int iort_dma_get_ranges(struct device *dev, u64 *size);
 int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
-int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
+void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
 phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
 #else
 static inline void acpi_iort_init(void) { }
@@ -52,8 +52,8 @@ static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
 static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
 { return -ENODEV; }
 static inline
-int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
-{ return 0; }
+void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
+{ }
 
 static inline phys_addr_t acpi_iort_dma_get_max_cpu_address(void)
 { return PHYS_ADDR_MAX; }
-- 
2.25.1



[PATCH v9 02/11] iommu: Introduce a union to struct iommu_resv_region

2022-04-04 Thread Shameer Kolothum via iommu
A union is introduced to struct iommu_resv_region to hold
any firmware specific data. This is in preparation for adding
support for IORT RMR reserved regions, and the union now holds
the RMR specific information.
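
Example of how the new union member gets populated, taken from the RMR
parsing patch in this series:
```
region->fw_data.rmr.sids = sids_copy;
region->fw_data.rmr.num_sids = num_sids;
list_add_tail(&region->list, head);
```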

Signed-off-by: Shameer Kolothum 
---
 include/linux/iommu.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9208eca4b0d1..733f46b14ac8 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -127,6 +127,11 @@ enum iommu_resv_type {
IOMMU_RESV_SW_MSI,
 };
 
+struct iommu_iort_rmr_data {
+   const u32 *sids;/* Stream IDs associated with IORT RMR entry */
+   u32 num_sids;
+};
+
 /**
  * struct iommu_resv_region - descriptor for a reserved memory region
  * @list: Linked list pointers
@@ -134,6 +139,7 @@ enum iommu_resv_type {
  * @length: Length of the region in bytes
  * @prot: IOMMU Protection flags (READ/WRITE/...)
  * @type: Type of the reserved region
+ * @fw_data: Firmware-specific data
  */
 struct iommu_resv_region {
struct list_headlist;
@@ -141,6 +147,9 @@ struct iommu_resv_region {
size_t  length;
int prot;
enum iommu_resv_typetype;
+   union {
+   struct iommu_iort_rmr_data rmr;
+   } fw_data;
 };
 
 /**
-- 
2.25.1



[PATCH v9 01/11] ACPI/IORT: Add temporary RMR node flag definitions

2022-04-04 Thread Shameer Kolothum via iommu
IORT rev E.d introduces more details into the RMR node Flags
field. Add temporary definitions to describe and access this
Flags field until the ACPICA header is updated to support E.d.

This patch can be reverted once include/acpi/actbl2.h has
all the relevant definitions.
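
As an aside, a consumer of these definitions could decode an RMR
node's Flags field as sketched below. "flags" is the raw Flags value
from the RMR node and "prot" is a hypothetical IOMMU protection value
being built up; the attribute-to-prot mapping shown is only one
plausible policy (IOMMU_CACHE/IOMMU_MMIO based on the region type):

	/* Sketch only: decoding the IORT rev E.d RMR Flags field. */
	u32 attr = ACPI_IORT_RMR_ACCESS_ATTRIBUTES(flags);
	int prot = IOMMU_READ | IOMMU_WRITE;

	if (flags & ACPI_IORT_RMR_ACCESS_PRIVILEGE)
		prot |= IOMMU_PRIV;

	if (attr == ACPI_IORT_RMR_ATTR_DEVICE_GRE)
		prot |= IOMMU_MMIO;
	else if (attr == ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB)
		prot |= IOMMU_CACHE;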

Signed-off-by: Shameer Kolothum 
---
Please find the ACPICA E.d related changes pull request here,
https://github.com/acpica/acpica/pull/765

This is now merged to acpica:master.

---
 drivers/acpi/arm64/iort.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index f2f8f05662de..fd06cf43ba31 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -25,6 +25,30 @@
 #define IORT_IOMMU_TYPE((1 << ACPI_IORT_NODE_SMMU) |   \
(1 << ACPI_IORT_NODE_SMMU_V3))
 
+/*
+ * The following RMR related definitions are temporary and
+ * can be removed once ACPICA headers support IORT rev E.d
+ */
+#ifndef ACPI_IORT_RMR_REMAP_PERMITTED
+#define ACPI_IORT_RMR_REMAP_PERMITTED  (1)
+#endif
+
+#ifndef ACPI_IORT_RMR_ACCESS_PRIVILEGE
+#define ACPI_IORT_RMR_ACCESS_PRIVILEGE (1 << 1)
+#endif
+
+#ifndef ACPI_IORT_RMR_ACCESS_ATTRIBUTES
+#define ACPI_IORT_RMR_ACCESS_ATTRIBUTES(flags) (((flags) >> 2) & 0xFF)
+#endif
+
+#ifndef ACPI_IORT_RMR_ATTR_DEVICE_GRE
+#define ACPI_IORT_RMR_ATTR_DEVICE_GRE  0x03
+#endif
+
+#ifndef ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB
+#define ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB  0x05
+#endif
+
 struct iort_its_msi_chip {
struct list_headlist;
struct fwnode_handle*fw_node;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v9 00/11] ACPI/IORT: Support for IORT RMR node

2022-04-04 Thread Shameer Kolothum via iommu
Hi

v8 --> v9
 - Addressed comments from Robin on interfaces as discussed here[0].
 - Addressed comments from Lorenzo.
 
Though functionally there aren't any major changes, the interfaces have
changed from v8 and for that reason I have not included the Tested-by
tags from Steve and Eric yet (many thanks for those). I'd appreciate it
if you could give this a spin and let me know.

(The revised ACPICA pull request for IORT E.d related changes is
here[1] and this is now merged to acpica:master.)

Please take a look and let me know your thoughts.

Thanks,
Shameer
[0] 
https://lore.kernel.org/linux-arm-kernel/c982f1d7-c565-769a-abae-79c962969...@arm.com/
[1] https://github.com/acpica/acpica/pull/765

From old:
We have faced issues with 3408iMR RAID controller cards which
fail to boot when SMMU is enabled. This is because these
controllers make use of host memory for various caching related
purposes and when SMMU is enabled the iMR firmware fails to
access these memory regions as there is no mapping for them.
IORT RMR provides a way for UEFI to describe and report these
memory regions so that the kernel can make a unity mapping for
these in SMMU.

Change History:

v7 --> v8
  - Patch #1 has temp definitions for RMR related changes till
the ACPICA header changes are part of kernel.
  - No early parsing of RMR node info and is only parsed at the
time of use.
  - Changes to the RMR get/put API format compared to the
previous version.
  - Support for RMR descriptor shared by multiple stream IDs.

v6 --> v7
 -fix pointed out by Steve to the SMMUv2 SMR bypass install in patch #8.

v5 --> v6
- Addressed comments from Robin & Lorenzo.
  : Moved iort_parse_rmr() to acpi_iort_init() from
iort_init_platform_devices().
  : Removed use of struct iort_rmr_entry during the initial
parse. Using struct iommu_resv_region instead.
  : Report RMR address alignment and overlap errors, but continue.
  : Reworked arm_smmu_init_bypass_stes() (patch # 6).
- Updated SMMUv2 bypass SMR code. Thanks to Jon N (patch #8).
- Set IOMMU protection flags(IOMMU_CACHE, IOMMU_MMIO) based
  on Type of RMR region. Suggested by Jon N.

v4 --> v5
 -Added a fw_data union to struct iommu_resv_region and removed
  struct iommu_rmr (Based on comments from Joerg/Robin).
 -Added iommu_put_rmrs() to release mem.
 -Thanks to Steve for verifying on SMMUv2, but not added the Tested-by
  yet because of the above changes.

v3 -->v4
-Included the SMMUv2 SMR bypass install changes suggested by
 Steve(patch #7)
-As per Robin's comments, RMR reserve implementation is now
 more generic  (patch #8) and dropped v3 patches 8 and 10.
-Rebase to 5.13-rc1

RFC v2 --> v3
 -Dropped RFC tag as the ACPICA header changes are now ready to be
  part of 5.13[0]. But this series still has a dependency on that patch.
 -Added IORT E.b related changes(node flags, _DSM function 5 checks for
  PCIe).
 -Changed RMR to stream id mapping from M:N to M:1 as per the spec and
  discussion here[1].
 -Last two patches add support for SMMUv2(Thanks to Jon Nettleton!)

Jon Nettleton (1):
  iommu/arm-smmu: Get associated RMR info and install bypass SMR

Shameer Kolothum (10):
  ACPI/IORT: Add temporary RMR node flag definitions
  iommu: Introduce a union to struct iommu_resv_region
  ACPI/IORT: Make iort_iommu_msi_get_resv_regions() return void
  ACPI/IORT: Provide a generic helper to retrieve reserve regions
  iommu/dma: Introduce a helper to remove reserved regions
  ACPI/IORT: Add support to retrieve IORT RMR reserved regions
  ACPI/IORT: Add a helper to retrieve RMR info directly
  iommu/arm-smmu-v3: Introduce strtab init helper
  iommu/arm-smmu-v3: Refactor arm_smmu_init_bypass_stes() to force
bypass
  iommu/arm-smmu-v3: Get associated RMR info and install bypass STE

 drivers/acpi/arm64/iort.c   | 369 ++--
 drivers/iommu/apple-dart.c  |   2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  80 -
 drivers/iommu/arm/arm-smmu/arm-smmu.c   |  54 ++-
 drivers/iommu/dma-iommu.c   |  11 +-
 drivers/iommu/virtio-iommu.c|   2 +-
 include/linux/acpi_iort.h   |  18 +-
 include/linux/dma-iommu.h   |   5 +
 include/linux/iommu.h   |   9 +
 9 files changed, 505 insertions(+), 45 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH RESEND v5 5/5] iova: Add iova_len argument to iova_domain_init_rcaches()

2022-04-04 Thread John Garry via iommu
Add an iova_len argument to iova_domain_init_rcaches() and use it to set
the rcache range.

Also fix up all users to pass this value (0 means use the default),
adding a wrapper for that case, iova_domain_init_rcaches_default().

For dma-iommu.c we derive the iova_len argument from the IOMMU group
max opt DMA size.
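
As a worked example (values assumed, not from the patch): with a 4K
IOVA granule (shift = 12) and a group max_opt_dma_size of 1 MiB, the
derivation in iommu_dma_init_domain() gives:

	size_t max_opt_dma_size = SZ_1M;	/* assumed group setting */
	unsigned long shift = 12;		/* 4K IOVA granule */
	unsigned long iova_len;

	iova_len = roundup_pow_of_two(max_opt_dma_size);	/* 0x100000 */
	iova_len >>= shift;					/* 256 pages */

iova_domain_init_rcaches() then sizes the rcaches so that IOVAs up to
256 pages (the full 1 MiB) stay cacheable.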

Signed-off-by: John Garry 
---
 drivers/iommu/dma-iommu.c| 15 ++-
 drivers/iommu/iova.c | 19 ---
 drivers/vdpa/vdpa_user/iova_domain.c |  4 ++--
 include/linux/iova.h |  3 ++-
 4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 42ca42ff1b5d..19f35624611c 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -525,6 +525,8 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
struct iova_domain *iovad;
+   size_t max_opt_dma_size;
+   unsigned long iova_len = 0;
int ret;
 
if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
@@ -560,7 +562,18 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
}
 
init_iova_domain(iovad, 1UL << order, base_pfn);
-   ret = iova_domain_init_rcaches(iovad);
+
+   max_opt_dma_size = iommu_group_get_max_opt_dma_size(dev->iommu_group);
+   if (max_opt_dma_size) {
+   unsigned long shift = __ffs(1UL << order);
+
+   iova_len = roundup_pow_of_two(max_opt_dma_size);
+   iova_len >>= shift;
+   if (!iova_len)
+   iova_len = 1;
+   }
+
+   ret = iova_domain_init_rcaches(iovad, iova_len);
if (ret)
return ret;
 
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 5c22b9187b79..d65e79e132ee 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -706,12 +706,20 @@ static void iova_magazine_push(struct iova_magazine *mag, 
unsigned long pfn)
mag->pfns[mag->size++] = pfn;
 }
 
-int iova_domain_init_rcaches(struct iova_domain *iovad)
+static unsigned long iova_len_to_rcache_max(unsigned long iova_len)
+{
+   return order_base_2(iova_len) + 1;
+}
+
+int iova_domain_init_rcaches(struct iova_domain *iovad, unsigned long iova_len)
 {
unsigned int cpu;
int i, ret;
 
-   iovad->rcache_max_size = 6; /* Arbitrarily high default */
+   if (iova_len)
+   iovad->rcache_max_size = iova_len_to_rcache_max(iova_len);
+   else
+   iovad->rcache_max_size = 6; /* Arbitrarily high default */
 
iovad->rcaches = kcalloc(iovad->rcache_max_size,
 sizeof(struct iova_rcache),
@@ -755,7 +763,12 @@ int iova_domain_init_rcaches(struct iova_domain *iovad)
free_iova_rcaches(iovad);
return ret;
 }
-EXPORT_SYMBOL_GPL(iova_domain_init_rcaches);
+
+int iova_domain_init_rcaches_default(struct iova_domain *iovad)
+{
+   return iova_domain_init_rcaches(iovad, 0);
+}
+EXPORT_SYMBOL_GPL(iova_domain_init_rcaches_default);
 
 /*
  * Try inserting IOVA range starting with 'iova_pfn' into 'rcache', and
diff --git a/drivers/vdpa/vdpa_user/iova_domain.c 
b/drivers/vdpa/vdpa_user/iova_domain.c
index 6daa3978d290..3a2acef98a4a 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -514,12 +514,12 @@ vduse_domain_create(unsigned long iova_limit, size_t 
bounce_size)
spin_lock_init(&domain->iotlb_lock);
init_iova_domain(&domain->stream_iovad,
PAGE_SIZE, IOVA_START_PFN);
-   ret = iova_domain_init_rcaches(&domain->stream_iovad);
+   ret = iova_domain_init_rcaches_default(&domain->stream_iovad);
if (ret)
goto err_iovad_stream;
init_iova_domain(&domain->consistent_iovad,
PAGE_SIZE, bounce_pfns);
-   ret = iova_domain_init_rcaches(&domain->consistent_iovad);
+   ret = iova_domain_init_rcaches_default(&domain->consistent_iovad);
if (ret)
goto err_iovad_consistent;
 
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 02f7222fa85a..56281434ce0c 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -95,7 +95,8 @@ struct iova *reserve_iova(struct iova_domain *iovad, unsigned 
long pfn_lo,
unsigned long pfn_hi);
 void init_iova_domain(struct iova_domain *iovad, unsigned long granule,
unsigned long start_pfn);
-int iova_domain_init_rcaches(struct iova_domain *iovad);
+int iova_domain_init_rcaches(struct iova_domain *iovad, unsigned long 
iova_len);
+int iova_domain_init_rcaches_default(struct iova_domain *iovad);
 struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn);
 void put_iova_domain(struct iova_domain *iovad);
 #else
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org

[PATCH RESEND v5 4/5] iommu: Allow max opt DMA len be set for a group via sysfs

2022-04-04 Thread John Garry via iommu
Add support to allow the maximum optimised DMA length to be set for an
IOMMU group via sysfs.

This works in much the same way as changing the default domain type
for a group.
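
A minimal userspace sketch of driving the new file is below; the group
number and size are illustrative, and as with the type attribute the
only device in the group must be unbound first:

	/* Sketch: request a 1 MiB max optimised DMA size for group 0. */
	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/sys/kernel/iommu_groups/0/max_opt_dma_size", "w");

		if (!f)
			return 1;
		fprintf(f, "%d\n", 1048576);
		return fclose(f) ? 1 : 0;
	}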

Signed-off-by: John Garry 
---
 .../ABI/testing/sysfs-kernel-iommu_groups | 16 +
 drivers/iommu/iommu.c | 59 ++-
 include/linux/iommu.h |  6 ++
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups 
b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
index b15af6a5bc08..ed6f72794f6c 100644
--- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
+++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
@@ -63,3 +63,19 @@ Description: /sys/kernel/iommu_groups//type shows 
the type of default
system could lead to catastrophic effects (the users might
need to reboot the machine to get it to normal state). So, it's
expected that the users understand what they're doing.
+
+What:  /sys/kernel/iommu_groups//max_opt_dma_size
+Date:  Feb 2022
+KernelVersion: v5.18
+Contact:   iommu@lists.linux-foundation.org
+Description:   /sys/kernel/iommu_groups//max_opt_dma_size shows the
+   max optimised DMA size for the default IOMMU domain associated
+   with the group.
+   Each IOMMU domain has an IOVA domain. The IOVA domain caches
+   IOVAs up to a certain size as a performance optimisation.
+   This sysfs file allows the range of the IOVA domain caching to
+   be set, such that larger-than-default IOVAs may be cached.
+   A value of 0 means that the default caching range is chosen.
+   A privileged user can request that the kernel change the range
+   by writing to this file. For this to happen, the same rules
+   and procedure apply as when changing the default domain type.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 10bb10c2a210..7c7258f19bed 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -48,6 +48,7 @@ struct iommu_group {
struct iommu_domain *default_domain;
struct iommu_domain *domain;
struct list_head entry;
+   size_t max_opt_dma_size;
 };
 
 struct group_device {
@@ -89,6 +90,9 @@ static int iommu_create_device_direct_mappings(struct 
iommu_group *group,
 static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
 static ssize_t iommu_group_store_type(struct iommu_group *group,
  const char *buf, size_t count);
+static ssize_t iommu_group_store_max_opt_dma_size(struct iommu_group *group,
+ const char *buf,
+ size_t count);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
@@ -571,6 +575,12 @@ static ssize_t iommu_group_show_type(struct iommu_group 
*group,
return strlen(type);
 }
 
+static ssize_t iommu_group_show_max_opt_dma_size(struct iommu_group *group,
+char *buf)
+{
+   return sprintf(buf, "%zu\n", group->max_opt_dma_size);
+}
+
 static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);
 
 static IOMMU_GROUP_ATTR(reserved_regions, 0444,
@@ -579,6 +589,9 @@ static IOMMU_GROUP_ATTR(reserved_regions, 0444,
 static IOMMU_GROUP_ATTR(type, 0644, iommu_group_show_type,
iommu_group_store_type);
 
+static IOMMU_GROUP_ATTR(max_opt_dma_size, 0644, 
iommu_group_show_max_opt_dma_size,
+   iommu_group_store_max_opt_dma_size);
+
 static void iommu_group_release(struct kobject *kobj)
 {
struct iommu_group *group = to_iommu_group(kobj);
@@ -665,6 +678,10 @@ struct iommu_group *iommu_group_alloc(void)
if (ret)
return ERR_PTR(ret);
 
+   ret = iommu_group_create_file(group, 
&iommu_group_attr_max_opt_dma_size);
+   if (ret)
+   return ERR_PTR(ret);
+
pr_debug("Allocated group %d\n", group->id);
 
return group;
@@ -2087,6 +2104,11 @@ struct iommu_domain *iommu_get_dma_domain(struct device 
*dev)
return dev->iommu_group->default_domain;
 }
 
+size_t iommu_group_get_max_opt_dma_size(struct iommu_group *group)
+{
+   return group->max_opt_dma_size;
+}
+
 /*
  * IOMMU groups are really the natural working unit of the IOMMU, but
  * the IOMMU API works on domains and devices.  Bridge that gap by
@@ -2871,12 +2893,14 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
  * @prev_dev: The device in the group (this is used to make sure that the 
device
  *  hasn't changed after the caller has called this function)
  * @type: The type of the new default domain that gets associated with the 
group
+ * @max_opt_dma_size: Set the IOMMU group max_opt_dma_size if non-zero
  *
  * 

[PATCH RESEND v5 3/5] iommu: Allow iommu_change_dev_def_domain() realloc same default domain type

2022-04-04 Thread John Garry via iommu
Allow iommu_change_dev_def_domain() to create a new default domain, keeping
the same type as the current one.

Also remove the comment about the function's purpose, which would otherwise
become stale.
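
For illustration, the expected call pattern (the caller shown is an
assumption; in this series it would come from the max_opt_dma_size
store path added in a later patch) is simply:

	/* Sketch: rebuild the default domain, keeping its current type. */
	ret = iommu_change_dev_def_domain(group, dev, __IOMMU_DOMAIN_SAME);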

Signed-off-by: John Garry 
---
 drivers/iommu/iommu.c | 49 ++-
 include/linux/iommu.h |  1 +
 2 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0dd766030baf..10bb10c2a210 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2863,6 +2863,7 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
 }
 EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
 
+
 /*
  * Changes the default domain of an iommu group that has *only* one device
  *
@@ -2873,10 +2874,6 @@ EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
  *
  * Returns 0 on success and error code on failure
  *
- * Note:
- * 1. Presently, this function is called only when user requests to change the
- *group's default domain type through 
/sys/kernel/iommu_groups//type
- *Please take a closer look if intended to use for other purposes.
  */
 static int iommu_change_dev_def_domain(struct iommu_group *group,
   struct device *prev_dev, int type)
@@ -2929,28 +2926,32 @@ static int iommu_change_dev_def_domain(struct 
iommu_group *group,
goto out;
}
 
-   dev_def_dom = iommu_get_def_domain_type(dev);
-   if (!type) {
+   if (type == __IOMMU_DOMAIN_SAME) {
+   type = prev_dom->type;
+   } else {
+   dev_def_dom = iommu_get_def_domain_type(dev);
+   if (!type) {
+   /*
+* If the user hasn't requested any specific type of 
domain and
+* if the device supports both the domains, then 
default to the
+* domain the device was booted with
+*/
+   type = dev_def_dom ? : iommu_def_domain_type;
+   } else if (dev_def_dom && type != dev_def_dom) {
+   dev_err_ratelimited(prev_dev, "Device cannot be in %s 
domain\n",
+   iommu_domain_type_str(type));
+   ret = -EINVAL;
+   goto out;
+   }
+
/*
-* If the user hasn't requested any specific type of domain and
-* if the device supports both the domains, then default to the
-* domain the device was booted with
+* Switch to a new domain only if the requested domain type is 
different
+* from the existing default domain type
 */
-   type = dev_def_dom ? : iommu_def_domain_type;
-   } else if (dev_def_dom && type != dev_def_dom) {
-   dev_err_ratelimited(prev_dev, "Device cannot be in %s domain\n",
-   iommu_domain_type_str(type));
-   ret = -EINVAL;
-   goto out;
-   }
-
-   /*
-* Switch to a new domain only if the requested domain type is different
-* from the existing default domain type
-*/
-   if (prev_dom->type == type) {
-   ret = 0;
-   goto out;
+   if (prev_dom->type == type) {
+   ret = 0;
+   goto out;
+   }
}
 
/* We can bring up a flush queue without tearing down the domain */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9208eca4b0d1..b141cf71c7af 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -63,6 +63,7 @@ struct iommu_domain_geometry {
  implementation  */
 #define __IOMMU_DOMAIN_PT  (1U << 2)  /* Domain is identity mapped   */
 #define __IOMMU_DOMAIN_DMA_FQ  (1U << 3)  /* DMA-API uses flush queue*/
+#define __IOMMU_DOMAIN_SAME(1U << 4)  /* Keep same type (internal)   */
 
 /*
  * This are the possible domain-types
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH RESEND v5 2/5] iova: Allow rcache range upper limit to be flexible

2022-04-04 Thread John Garry via iommu
Some low-level drivers may request DMA mappings whose IOVA length exceeds
that of the current rcache upper limit.

This means that allocations for those IOVAs will never be cached, and
always must be allocated and freed from the RB tree per DMA mapping cycle.
This has a significant effect on performance, more so since commit
4e89dce72521 ("iommu/iova: Retry from last rb tree node if iova search
fails"), as discussed at [0].

As a first step towards allowing the rcache range upper limit to be
configured, hold this value in the IOVA rcache structure, and allocate
the rcaches separately.

Delete macro IOVA_RANGE_CACHE_MAX_SIZE in case it's reused by mistake.

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/
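
For reference, the cacheability test itself is unchanged; it just
compares against the per-domain limit now. A hypothetical helper
spelling it out (with the default limit of 6, IOVAs up to 2^5 = 32
pages, i.e. 128K with 4K pages, are cacheable):

	/* Sketch only: is an allocation of "size" pages rcache-eligible? */
	static bool iova_is_cacheable(struct iova_domain *iovad,
				      unsigned long size)
	{
		return order_base_2(size) < iovad->rcache_max_size;
	}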

Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 20 ++--
 include/linux/iova.h |  3 +++
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145..5c22b9187b79 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -15,8 +15,6 @@
 /* The anchor node sits above the top of the usable address space */
 #define IOVA_ANCHOR~0UL
 
-#define IOVA_RANGE_CACHE_MAX_SIZE 6/* log of max cached IOVA range size 
(in pages) */
-
 static bool iova_rcache_insert(struct iova_domain *iovad,
   unsigned long pfn,
   unsigned long size);
@@ -443,7 +441,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long 
size,
 * rounding up anything cacheable to make sure that can't happen. The
 * order of the unadjusted size will still match upon freeing.
 */
-   if (size < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1)))
+   if (size < (1 << (iovad->rcache_max_size - 1)))
size = roundup_pow_of_two(size);
 
iova_pfn = iova_rcache_get(iovad, size, limit_pfn + 1);
@@ -713,13 +711,15 @@ int iova_domain_init_rcaches(struct iova_domain *iovad)
unsigned int cpu;
int i, ret;
 
-   iovad->rcaches = kcalloc(IOVA_RANGE_CACHE_MAX_SIZE,
+   iovad->rcache_max_size = 6; /* Arbitrarily high default */
+
+   iovad->rcaches = kcalloc(iovad->rcache_max_size,
 sizeof(struct iova_rcache),
 GFP_KERNEL);
if (!iovad->rcaches)
return -ENOMEM;
 
-   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+   for (i = 0; i < iovad->rcache_max_size; ++i) {
struct iova_cpu_rcache *cpu_rcache;
struct iova_rcache *rcache;
 
@@ -816,7 +816,7 @@ static bool iova_rcache_insert(struct iova_domain *iovad, 
unsigned long pfn,
 {
unsigned int log_size = order_base_2(size);
 
-   if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
+   if (log_size >= iovad->rcache_max_size)
return false;
 
return __iova_rcache_insert(iovad, &iovad->rcaches[log_size], pfn);
@@ -872,7 +872,7 @@ static unsigned long iova_rcache_get(struct iova_domain 
*iovad,
 {
unsigned int log_size = order_base_2(size);
 
-   if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE || !iovad->rcaches)
+   if (log_size >= iovad->rcache_max_size || !iovad->rcaches)
return 0;
 
return __iova_rcache_get(&iovad->rcaches[log_size], limit_pfn - size);
@@ -888,7 +888,7 @@ static void free_iova_rcaches(struct iova_domain *iovad)
unsigned int cpu;
int i, j;
 
-   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+   for (i = 0; i < iovad->rcache_max_size; ++i) {
rcache = &iovad->rcaches[i];
if (!rcache->cpu_rcaches)
break;
@@ -916,7 +916,7 @@ static void free_cpu_cached_iovas(unsigned int cpu, struct 
iova_domain *iovad)
unsigned long flags;
int i;
 
-   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+   for (i = 0; i < iovad->rcache_max_size; ++i) {
rcache = &iovad->rcaches[i];
cpu_rcache = per_cpu_ptr(rcache->cpu_rcaches, cpu);
spin_lock_irqsave(&cpu_rcache->lock, flags);
@@ -935,7 +935,7 @@ static void free_global_cached_iovas(struct iova_domain 
*iovad)
unsigned long flags;
int i, j;
 
-   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+   for (i = 0; i < iovad->rcache_max_size; ++i) {
rcache = &iovad->rcaches[i];
spin_lock_irqsave(&rcache->lock, flags);
for (j = 0; j < rcache->depot_size; ++j) {
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 320a70e40233..02f7222fa85a 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -38,6 +38,9 @@ struct iova_domain {
 
struct iova_rcache  *rcaches;
struct hlist_node   cpuhp_dead;
+
+   /* log of max cached IOVA range size (in pages) */
+   unsigned long   rcache_max_size;
 };
 
 static inline unsigned long iova_size(struct iova *iova)
-- 
2.26.2


[PATCH RESEND v5 0/5] iommu: Allow IOVA rcache range be configured

2022-04-04 Thread John Garry via iommu
For streaming DMA mappings involving an IOMMU, where the IOVA length
regularly exceeds the IOVA rcache upper limit (meaning that they are not
cached), performance can be reduced.

This may be much more pronounced from commit 4e89dce72521 ("iommu/iova:
Retry from last rb tree node if iova search fails"), as discussed at [0].

IOVAs which cannot be cached are highly involved in the IOVA ageing issue,
as discussed at [1].

This series allows the IOVA rcache range to be configured, so that we may
cache all IOVAs per domain, thus improving performance.

A new IOMMU group sysfs file is added - max_opt_dma_size - which is used
indirectly to configure the IOVA rcache range:
/sys/kernel/iommu_groups/X/max_opt_dma_size

This file is updated in the same way as the IOMMU group default domain
type, i.e. the only device in the group must be unbound first.

The inspiration here comes from block layer request queue sysfs
"optimal_io_size" file, in /sys/block/sdX/queue/optimal_io_size

Some old figures* for a storage scenario (when increasing the IOVA rcache
range to cover all DMA mapping sizes from the LLD):
v5.13-rc1 baseline: 1200K IOPS
With series:1800K IOPS

All above are for IOMMU strict mode. Non-strict mode gives ~1800K IOPS in
all scenarios.

Based on v5.18-rc1
* I lost my high data throughput test setup

Differences to v4:
https://lore.kernel.org/linux-iommu/1626259003-201303-1-git-send-email-john.ga...@huawei.com/
- Major rebase
- Change the "Refactor iommu_group_store_type()" to not use a callback
  and an op type enum instead
  - I didn't pick up Will's Ack as it has changed so much
- Use a domain feature flag to keep same default group type
- Add wrapper for default IOVA rcache range
- Combine last 2x patches

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/
[1] 
https://lore.kernel.org/linux-iommu/1607538189-237944-1-git-send-email-john.ga...@huawei.com/

John Garry (5):
  iommu: Refactor iommu_group_store_type()
  iova: Allow rcache range upper limit to be flexible
  iommu: Allow iommu_change_dev_def_domain() realloc same default domain
type
  iommu: Allow max opt DMA len be set for a group via sysfs
  iova: Add iova_len argument to iova_domain_init_rcaches()

 .../ABI/testing/sysfs-kernel-iommu_groups |  16 ++
 drivers/iommu/dma-iommu.c |  15 +-
 drivers/iommu/iommu.c | 202 +-
 drivers/iommu/iova.c  |  37 ++--
 drivers/vdpa/vdpa_user/iova_domain.c  |   4 +-
 include/linux/iommu.h |   7 +
 include/linux/iova.h  |   6 +-
 7 files changed, 212 insertions(+), 75 deletions(-)

-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH RESEND v5 1/5] iommu: Refactor iommu_group_store_type()

2022-04-04 Thread John Garry via iommu
Function iommu_group_store_type() supports changing the default domain
of an IOMMU group.

Many conditions need to be satisfied and steps taken for this action to be
successful.

Satisfying these conditions and steps will also be required for setting
other IOMMU group attributes, so factor the code into a common part and a
part specific to updating the given IOMMU group attribute.

No functional change intended.

Some code comments are tidied up also.
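
For illustration, the resulting shape is roughly as below; the thin
wrapper is an assumption based on the factoring described above, since
the sysfs "type" attribute still needs a store callback:

	/* Sketch: the type-specific store becomes a thin wrapper. */
	static ssize_t iommu_group_store_type(struct iommu_group *group,
					      const char *buf, size_t count)
	{
		return iommu_group_store_common(group, CHANGE_GROUP_TYPE,
						buf, count);
	}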

Signed-off-by: John Garry 
---
 drivers/iommu/iommu.c | 96 ---
 1 file changed, 62 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2c45b85b9fc..0dd766030baf 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3000,21 +3000,57 @@ static int iommu_change_dev_def_domain(struct 
iommu_group *group,
return ret;
 }
 
+enum iommu_group_op {
+   CHANGE_GROUP_TYPE,
+};
+
+static int __iommu_group_store_type(const char *buf, struct iommu_group *group,
+   struct device *dev)
+{
+   int type;
+
+   if (sysfs_streq(buf, "identity"))
+   type = IOMMU_DOMAIN_IDENTITY;
+   else if (sysfs_streq(buf, "DMA"))
+   type = IOMMU_DOMAIN_DMA;
+   else if (sysfs_streq(buf, "DMA-FQ"))
+   type = IOMMU_DOMAIN_DMA_FQ;
+   else if (sysfs_streq(buf, "auto"))
+   type = 0;
+   else
+   return -EINVAL;
+
+   /*
+* Check if the only device in the group still has a driver bound or
+* we're transitioning from DMA -> DMA-FQ
+*/
+   if (device_is_bound(dev) && !(type == IOMMU_DOMAIN_DMA_FQ &&
+   group->default_domain->type == IOMMU_DOMAIN_DMA)) {
+   pr_err_ratelimited("Device is still bound to driver\n");
+   return -EINVAL;
+   }
+
+   return iommu_change_dev_def_domain(group, dev, type);
+}
+
 /*
  * Changing the default domain through sysfs requires the users to unbind the
  * drivers from the devices in the iommu group, except for a DMA -> DMA-FQ
- * transition. Return failure if this isn't met.
+ * transition. Changing any other IOMMU group attribute still requires the
+ * user to unbind the drivers from the devices in the iommu group. Return
+ * failure if these conditions are not met.
  *
  * We need to consider the race between this and the device release path.
  * device_lock(dev) is used here to guarantee that the device release path
  * will not be entered at the same time.
  */
-static ssize_t iommu_group_store_type(struct iommu_group *group,
- const char *buf, size_t count)
+static ssize_t iommu_group_store_common(struct iommu_group *group,
+   enum iommu_group_op op,
+   const char *buf, size_t count)
 {
struct group_device *grp_dev;
struct device *dev;
-   int ret, req_type;
+   int ret;
 
if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
return -EACCES;
@@ -3022,27 +3058,16 @@ static ssize_t iommu_group_store_type(struct 
iommu_group *group,
if (WARN_ON(!group))
return -EINVAL;
 
-   if (sysfs_streq(buf, "identity"))
-   req_type = IOMMU_DOMAIN_IDENTITY;
-   else if (sysfs_streq(buf, "DMA"))
-   req_type = IOMMU_DOMAIN_DMA;
-   else if (sysfs_streq(buf, "DMA-FQ"))
-   req_type = IOMMU_DOMAIN_DMA_FQ;
-   else if (sysfs_streq(buf, "auto"))
-   req_type = 0;
-   else
-   return -EINVAL;
-
/*
 * Lock/Unlock the group mutex here before device lock to
-* 1. Make sure that the iommu group has only one device (this is a
+* 1. Make sure that the IOMMU group has only one device (this is a
 *prerequisite for step 2)
 * 2. Get struct *dev which is needed to lock device
 */
mutex_lock(&group->mutex);
if (iommu_group_device_count(group) != 1) {
mutex_unlock(&group->mutex);
-   pr_err_ratelimited("Cannot change default domain: Group has 
more than one device\n");
+   pr_err_ratelimited("Cannot change IOMMU group default domain 
attribute: Group has more than one device\n");
return -EINVAL;
}
 
@@ -3054,16 +3079,16 @@ static ssize_t iommu_group_store_type(struct 
iommu_group *group,
/*
 * Don't hold the group mutex because taking group mutex first and then
 * the device lock could potentially cause a deadlock as below. Assume
-* two threads T1 and T2. T1 is trying to change default domain of an
-* iommu group and T2 is trying to hot unplug a device or release [1] VF
-* of a PCIe device which is in the same iommu group. T1 takes group
-* mutex and before it could take device lock assume T2 has taken device
-* lock and is yet to take group mutex. Now, both the 

Re: [PATCH v1 00/37] iommu/amd: Add multiple PCI segments support

2022-04-04 Thread Vasant Hegde via iommu
On 4/4/2022 3:10 PM, Vasant Hegde via iommu wrote:
> Newer AMD systems can support multiple PCI segments, where each segment
> contains one or more IOMMU instances. However, an IOMMU instance can only
> support a single PCI segment.
> 

Hi,

Please ignore this series. Looks like I had a network glitch and git didn't
send the entire series.
I have resent the series.

-Vasant

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RESEND PATCH v1 37/37] iommu/amd: Update amd_iommu_fault structure to include PCI seg ID

2022-04-04 Thread Vasant Hegde via iommu
Rename 'device_id' to 'sbdf' and extend it to 32 bits so that we can
pass the PCI segment ID to ppr_notifier(). Also pass the PCI segment ID
to pci_get_domain_bus_and_slot() instead of the default value.
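
In short, the segment now lives in the upper 16 bits of the fault's
sbdf field; the unpacking below mirrors what the diff does in
ppr_notifier(), with "iommu_fault" being the struct amd_iommu_fault
pointer:

	/* Sketch: split the 32-bit sbdf back into segment and BDF. */
	u16 seg_id, devid;
	struct pci_dev *pdev;

	seg_id = (iommu_fault->sbdf >> 16) & 0xffff;
	devid  = iommu_fault->sbdf & 0xffff;
	pdev   = pci_get_domain_bus_and_slot(seg_id, PCI_BUS_NUM(devid),
					     devid & 0xff);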

Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 2 +-
 drivers/iommu/amd/iommu.c   | 2 +-
 drivers/iommu/amd/iommu_v2.c| 9 +
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index f2bbcb19e92c..a908f18a3632 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -477,7 +477,7 @@ extern struct kmem_cache *amd_iommu_irq_cache;
 struct amd_iommu_fault {
u64 address;/* IO virtual address of the fault*/
u32 pasid;  /* Address space identifier */
-   u16 device_id;  /* Originating PCI device id */
+   u32 sbdf;   /* Originating PCI device id */
u16 tag;/* PPR tag */
u16 flags;  /* Fault flags */
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c64a2382b4a0..e04d349f5f9e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -700,7 +700,7 @@ static void iommu_handle_ppr_entry(struct amd_iommu *iommu, 
u64 *raw)
 
fault.address   = raw[1];
fault.pasid = PPR_PASID(raw[0]);
-   fault.device_id = PPR_DEVID(raw[0]);
+   fault.sbdf  = (iommu->pci_seg->id << 16) | PPR_DEVID(raw[0]);
fault.tag   = PPR_TAG(raw[0]);
fault.flags = PPR_FLAGS(raw[0]);
 
diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index b186d6e0..631ded8168ff 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -518,15 +518,16 @@ static int ppr_notifier(struct notifier_block *nb, 
unsigned long e, void *data)
unsigned long flags;
struct fault *fault;
bool finish;
-   u16 tag, devid;
+   u16 tag, devid, seg_id;
int ret;
 
iommu_fault = data;
tag = iommu_fault->tag & 0x1ff;
finish  = (iommu_fault->tag >> 9) & 1;
 
-   devid = iommu_fault->device_id;
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   seg_id = (iommu_fault->sbdf >> 16) & 0xffff;
+   devid = iommu_fault->sbdf & 0xffff;
+   pdev = pci_get_domain_bus_and_slot(seg_id, PCI_BUS_NUM(devid),
   devid & 0xff);
if (!pdev)
return -ENODEV;
@@ -540,7 +541,7 @@ static int ppr_notifier(struct notifier_block *nb, unsigned 
long e, void *data)
goto out;
}
 
-   dev_state = get_device_state(iommu_fault->device_id);
+   dev_state = get_device_state(iommu_fault->sbdf);
if (dev_state == NULL)
goto out;
 
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RESEND PATCH v1 36/37] iommu/amd: Update device_state structure to include PCI seg ID

2022-04-04 Thread Vasant Hegde via iommu
Rename the struct device_state.devid variable to sbdf and extend it
to 32 bits to include the 16-bit PCI segment ID via the helper
function get_pci_sbdf_id().

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu_v2.c | 58 +++-
 1 file changed, 24 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index e56b137ceabd..b186d6e0 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -51,7 +51,7 @@ struct pasid_state {
 
 struct device_state {
struct list_head list;
-   u16 devid;
+   u32 sbdf;
atomic_t count;
struct pci_dev *pdev;
struct pasid_state **states;
@@ -83,35 +83,25 @@ static struct workqueue_struct *iommu_wq;
 
 static void free_pasid_states(struct device_state *dev_state);
 
-static u16 device_id(struct pci_dev *pdev)
-{
-   u16 devid;
-
-   devid = pdev->bus->number;
-   devid = (devid << 8) | pdev->devfn;
-
-   return devid;
-}
-
-static struct device_state *__get_device_state(u16 devid)
+static struct device_state *__get_device_state(u32 sbdf)
 {
struct device_state *dev_state;
 
list_for_each_entry(dev_state, &state_list, list) {
-   if (dev_state->devid == devid)
+   if (dev_state->sbdf == sbdf)
return dev_state;
}
 
return NULL;
 }
 
-static struct device_state *get_device_state(u16 devid)
+static struct device_state *get_device_state(u32 sbdf)
 {
struct device_state *dev_state;
unsigned long flags;
 
spin_lock_irqsave(&state_lock, flags);
-   dev_state = __get_device_state(devid);
+   dev_state = __get_device_state(sbdf);
if (dev_state != NULL)
atomic_inc(&dev_state->count);
spin_unlock_irqrestore(&state_lock, flags);
@@ -609,7 +599,7 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, u32 pasid,
struct pasid_state *pasid_state;
struct device_state *dev_state;
struct mm_struct *mm;
-   u16 devid;
+   u32 sbdf;
int ret;
 
might_sleep();
@@ -617,8 +607,8 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, u32 pasid,
if (!amd_iommu_v2_supported())
return -ENODEV;
 
-   devid = device_id(pdev);
-   dev_state = get_device_state(devid);
+   sbdf  = get_pci_sbdf_id(pdev);
+   dev_state = get_device_state(sbdf);
 
if (dev_state == NULL)
return -EINVAL;
@@ -692,15 +682,15 @@ void amd_iommu_unbind_pasid(struct pci_dev *pdev, u32 
pasid)
 {
struct pasid_state *pasid_state;
struct device_state *dev_state;
-   u16 devid;
+   u32 sbdf;
 
might_sleep();
 
if (!amd_iommu_v2_supported())
return;
 
-   devid = device_id(pdev);
-   dev_state = get_device_state(devid);
+   sbdf = get_pci_sbdf_id(pdev);
+   dev_state = get_device_state(sbdf);
if (dev_state == NULL)
return;
 
@@ -742,7 +732,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
struct iommu_group *group;
unsigned long flags;
int ret, tmp;
-   u16 devid;
+   u32 sbdf;
 
might_sleep();
 
@@ -759,7 +749,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
if (pasids <= 0 || pasids > (PASID_MASK + 1))
return -EINVAL;
 
-   devid = device_id(pdev);
+   sbdf = get_pci_sbdf_id(pdev);
 
dev_state = kzalloc(sizeof(*dev_state), GFP_KERNEL);
if (dev_state == NULL)
@@ -768,7 +758,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
spin_lock_init(&dev_state->lock);
init_waitqueue_head(&dev_state->wq);
dev_state->pdev  = pdev;
-   dev_state->devid = devid;
+   dev_state->sbdf = sbdf;
 
tmp = pasids;
for (dev_state->pasid_levels = 0; (tmp - 1) & ~0x1ff; tmp >>= 9)
@@ -806,7 +796,7 @@ int amd_iommu_init_device(struct pci_dev *pdev, int pasids)
 
spin_lock_irqsave(&state_lock, flags);
 
-   if (__get_device_state(devid) != NULL) {
+   if (__get_device_state(sbdf) != NULL) {
spin_unlock_irqrestore(&state_lock, flags);
ret = -EBUSY;
goto out_free_domain;
@@ -838,16 +828,16 @@ void amd_iommu_free_device(struct pci_dev *pdev)
 {
struct device_state *dev_state;
unsigned long flags;
-   u16 devid;
+   u32 sbdf;
 
if (!amd_iommu_v2_supported())
return;
 
-   devid = device_id(pdev);
+   sbdf = get_pci_sbdf_id(pdev);
 
spin_lock_irqsave(&state_lock, flags);
 
-   dev_state = __get_device_state(devid);
+   dev_state = __get_device_state(sbdf);
if (dev_state == NULL) {
spin_unlock_irqrestore(&state_lock, flags);
return;
@@ -867,18 +857,18 @@ int 

[RESEND PATCH v1 35/37] iommu/amd: Print PCI segment ID in error log messages

2022-04-04 Thread Vasant Hegde via iommu
Print the PCI segment ID along with the BDF. Useful for debugging.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/init.c  | 10 +-
 drivers/iommu/amd/iommu.c | 36 ++--
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index ba0ef8192a2f..24814ec3dca8 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1850,11 +1850,11 @@ static int __init init_iommu_all(struct 
acpi_table_header *table)
h = (struct ivhd_header *)p;
if (*p == amd_iommu_target_ivhd_type) {
 
-   DUMP_printk("device: %02x:%02x.%01x cap: %04x "
-   "seg: %d flags: %01x info %04x\n",
-   PCI_BUS_NUM(h->devid), PCI_SLOT(h->devid),
-   PCI_FUNC(h->devid), h->cap_ptr,
-   h->pci_seg, h->flags, h->info);
+   DUMP_printk("device: %04x:%02x:%02x.%01x cap: %04x "
+   "flags: %01x info %04x\n",
+   h->pci_seg, PCI_BUS_NUM(h->devid),
+   PCI_SLOT(h->devid), PCI_FUNC(h->devid),
+   h->cap_ptr, h->flags, h->info);
DUMP_printk("   mmio-addr: %016llx\n",
h->mmio_phys);
 
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 4dd9d4201ffd..c64a2382b4a0 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -495,8 +495,8 @@ static void amd_iommu_report_rmp_hw_error(struct amd_iommu 
*iommu, volatile u32
vmg_tag, spa, flags);
}
} else {
-   pr_err_ratelimited("Event logged [RMP_HW_ERROR 
device=%02x:%02x.%x, vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   pr_err_ratelimited("Event logged [RMP_HW_ERROR 
device=%04x:%02x:%02x.%x, vmg_tag=0x%04x, spa=0x%llx, flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
vmg_tag, spa, flags);
}
 
@@ -528,8 +528,8 @@ static void amd_iommu_report_rmp_fault(struct amd_iommu 
*iommu, volatile u32 *ev
vmg_tag, gpa, flags_rmp, flags);
}
} else {
-   pr_err_ratelimited("Event logged [RMP_PAGE_FAULT 
device=%02x:%02x.%x, vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, 
flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   pr_err_ratelimited("Event logged [RMP_PAGE_FAULT 
device=%04x:%02x:%02x.%x, vmg_tag=0x%04x, gpa=0x%llx, flags_rmp=0x%04x, 
flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
vmg_tag, gpa, flags_rmp, flags);
}
 
@@ -575,8 +575,8 @@ static void amd_iommu_report_page_fault(struct amd_iommu 
*iommu,
domain_id, address, flags);
}
} else {
-   pr_err_ratelimited("Event logged [IO_PAGE_FAULT 
device=%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   pr_err_ratelimited("Event logged [IO_PAGE_FAULT 
device=%04x:%02x:%02x.%x domain=0x%04x address=0x%llx flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
domain_id, address, flags);
}
 
@@ -619,20 +619,20 @@ static void iommu_print_event(struct amd_iommu *iommu, 
void *__evt)
 
switch (type) {
case EVENT_TYPE_ILL_DEV:
-   dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY 
device=%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY 
device=%04x:%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 
PCI_SLOT(devid), PCI_FUNC(devid),
pasid, address, flags);
dump_dte_entry(iommu, devid);
break;
case EVENT_TYPE_DEV_TAB_ERR:
-   dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR 
device=%02x:%02x.%x "
+   dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR 
device=%04x:%02x:%02x.%x "
"address=0x%llx flags=0x%04x]\n",
-   PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
+   iommu->pci_seg->id, PCI_BUS_NUM(devid), 

[RESEND PATCH v1 34/37] iommu/amd: Add PCI segment support for ivrs_ioapic, ivrs_hpet, ivrs_acpihid commands

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

By default, the PCI segment is zero and can be omitted. To support
systems with a non-zero PCI segment ID, modify the parsing functions
to allow a PCI segment ID.
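
As a worked example (values assumed): ivrs_ioapic[10]=0001:00:14.0
parses to seg=0x1, bus=0x00, dev=0x14, fn=0x0, and the 32-bit device
ID is then built as:

	/* Sketch of the encoding done in parse_ivrs_ioapic(). */
	devid = ((seg & 0xffff) << 16) | ((bus & 0xff) << 8) |
		((dev & 0x1f) << 3) | (fn & 0x7);
	/* = (0x0001 << 16) | (0x14 << 3) = 0x000100a0 */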

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 .../admin-guide/kernel-parameters.txt | 34 +++
 drivers/iommu/amd/init.c  | 41 ---
 2 files changed, 51 insertions(+), 24 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index f5a27f067db9..cc8f0c82ff55 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2208,23 +2208,39 @@
 
ivrs_ioapic [HW,X86-64]
Provide an override to the IOAPIC-ID<->DEVICE-ID
-   mapping provided in the IVRS ACPI table. For
-   example, to map IOAPIC-ID decimal 10 to
-   PCI device 00:14.0 write the parameter as:
+   mapping provided in the IVRS ACPI table.
+   By default, PCI segment is 0, and can be omitted.
+   For example:
+   * To map IOAPIC-ID decimal 10 to PCI device 00:14.0
+ write the parameter as:
ivrs_ioapic[10]=00:14.0
+   * To map IOAPIC-ID decimal 10 to PCI segment 0x1 and
+ PCI device 00:14.0 write the parameter as:
+   ivrs_ioapic[10]=0001:00:14.0
 
ivrs_hpet   [HW,X86-64]
Provide an override to the HPET-ID<->DEVICE-ID
-   mapping provided in the IVRS ACPI table. For
-   example, to map HPET-ID decimal 0 to
-   PCI device 00:14.0 write the parameter as:
+   mapping provided in the IVRS ACPI table.
+   By default, PCI segment is 0, and can be omitted.
+   For example:
+   * To map HPET-ID decimal 0 to PCI device 00:14.0
+ write the parameter as:
ivrs_hpet[0]=00:14.0
+   * To map HPET-ID decimal 10 to PCI segment 0x1 and
+ PCI device 00:14.0 write the parameter as:
+   ivrs_hpet[10]=0001:00:14.0
 
ivrs_acpihid[HW,X86-64]
Provide an override to the ACPI-HID:UID<->DEVICE-ID
-   mapping provided in the IVRS ACPI table. For
-   example, to map UART-HID:UID AMD0020:0 to
-   PCI device 00:14.5 write the parameter as:
+   mapping provided in the IVRS ACPI table.
+
+   For example, to map UART-HID:UID AMD0020:0 to
+   PCI segment 0x1 and PCI device ID 00:14.5,
+   write the parameter as:
+   ivrs_acpihid[0001:00:14.5]=AMD0020:0
+
+   By default, PCI segment is 0, and can be omitted.
+   For example, to map to PCI device 00:14.5 write the parameter as:
ivrs_acpihid[00:14.5]=AMD0020:0
 
js= [HW,JOY] Analog joystick
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index ccc0208d4b69..ba0ef8192a2f 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3288,15 +3288,17 @@ static int __init parse_amd_iommu_options(char *str)
 
 static int __init parse_ivrs_ioapic(char *str)
 {
-   unsigned int bus, dev, fn;
+   u32 seg = 0, bus, dev, fn;
int ret, id, i;
u16 devid;
 
ret = sscanf(str, "[%d]=%x:%x.%x", , , , );
-
if (ret != 4) {
-   pr_err("Invalid command line: ivrs_ioapic%s\n", str);
-   return 1;
+   ret = sscanf(str, "[%d]=%x:%x:%x.%x", , , , , 
);
+   if (ret != 5) {
+   pr_err("Invalid command line: ivrs_ioapic%s\n", str);
+   return 1;
+   }
}
 
if (early_ioapic_map_size == EARLY_MAP_SIZE) {
@@ -3305,7 +3307,8 @@ static int __init parse_ivrs_ioapic(char *str)
return 1;
}
 
-   devid = ((bus & 0xff) << 8) | ((dev & 0x1f) << 3) | (fn & 0x7);
+   devid = ((seg & 0xffff) << 16) | ((bus & 0xff) << 8) |
+   ((dev & 0x1f) << 3) | (fn & 0x7);
 
cmdline_maps= true;
i   = early_ioapic_map_size++;
@@ -3318,15 +3321,17 @@ static int __init parse_ivrs_ioapic(char *str)
 
 static int __init parse_ivrs_hpet(char *str)
 {
-   unsigned int bus, dev, fn;
+   u32 seg = 0, bus, dev, fn;
int ret, id, i;
u16 devid;
 
ret = sscanf(str, "[%d]=%x:%x.%x", , , , );
-
if (ret != 4) {
- 

[RESEND PATCH v1 33/37] iommu/amd: Specify PCI segment ID when getting pci device

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Upcoming AMD systems can have multiple PCI segments. Hence pass the PCI
segment ID to pci_get_domain_bus_and_slot() instead of '0'.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/init.c  |  6 --
 drivers/iommu/amd/iommu.c | 19 ++-
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 4a9f424eb4b4..ccc0208d4b69 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1961,7 +1961,8 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
int cap_ptr = iommu->cap_ptr;
int ret;
 
-   iommu->dev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(iommu->devid),
+   iommu->dev = pci_get_domain_bus_and_slot(iommu->pci_seg->id,
+PCI_BUS_NUM(iommu->devid),
 iommu->devid & 0xff);
if (!iommu->dev)
return -ENODEV;
@@ -2024,7 +2025,8 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
int i, j;
 
iommu->root_pdev =
-   pci_get_domain_bus_and_slot(0, iommu->dev->bus->number,
+   pci_get_domain_bus_and_slot(iommu->pci_seg->id,
+   iommu->dev->bus->number,
PCI_DEVFN(0, 0));
 
/*
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 4fe77f77dfa1..4dd9d4201ffd 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -472,7 +472,7 @@ static void dump_command(unsigned long phys_addr)
pr_err("CMD[%d]: %08x\n", i, cmd->data[i]);
 }
 
-static void amd_iommu_report_rmp_hw_error(volatile u32 *event)
+static void amd_iommu_report_rmp_hw_error(struct amd_iommu *iommu, volatile 
u32 *event)
 {
struct iommu_dev_data *dev_data = NULL;
int devid, vmg_tag, flags;
@@ -484,7 +484,7 @@ static void amd_iommu_report_rmp_hw_error(volatile u32 
*event)
flags   = (event[1] >> EVENT_FLAGS_SHIFT) & EVENT_FLAGS_MASK;
spa = ((u64)event[3] << 32) | (event[2] & 0xFFFFFFF8);
 
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id, 
PCI_BUS_NUM(devid),
   devid & 0xff);
if (pdev)
dev_data = dev_iommu_priv_get(&pdev->dev);
@@ -504,7 +504,7 @@ static void amd_iommu_report_rmp_hw_error(volatile u32 
*event)
pci_dev_put(pdev);
 }
 
-static void amd_iommu_report_rmp_fault(volatile u32 *event)
+static void amd_iommu_report_rmp_fault(struct amd_iommu *iommu, volatile u32 
*event)
 {
struct iommu_dev_data *dev_data = NULL;
int devid, flags_rmp, vmg_tag, flags;
@@ -517,7 +517,7 @@ static void amd_iommu_report_rmp_fault(volatile u32 *event)
flags = (event[1] >> EVENT_FLAGS_SHIFT) & EVENT_FLAGS_MASK;
gpa   = ((u64)event[3] << 32) | event[2];
 
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id, 
PCI_BUS_NUM(devid),
   devid & 0xff);
if (pdev)
dev_data = dev_iommu_priv_get(&pdev->dev);
@@ -543,13 +543,14 @@ static void amd_iommu_report_rmp_fault(volatile u32 
*event)
 #define IS_WRITE_REQUEST(flags)\
((flags) & EVENT_FLAG_RW)
 
-static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
+static void amd_iommu_report_page_fault(struct amd_iommu *iommu,
+   u16 devid, u16 domain_id,
u64 address, int flags)
 {
struct iommu_dev_data *dev_data = NULL;
struct pci_dev *pdev;
 
-   pdev = pci_get_domain_bus_and_slot(0, PCI_BUS_NUM(devid),
+   pdev = pci_get_domain_bus_and_slot(iommu->pci_seg->id, 
PCI_BUS_NUM(devid),
   devid & 0xff);
if (pdev)
dev_data = dev_iommu_priv_get(&pdev->dev);
@@ -612,7 +613,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void 
*__evt)
}
 
if (type == EVENT_TYPE_IO_FAULT) {
-   amd_iommu_report_page_fault(devid, pasid, address, flags);
+   amd_iommu_report_page_fault(iommu, devid, pasid, address, 
flags);
return;
}
 
@@ -653,10 +654,10 @@ static void iommu_print_event(struct amd_iommu *iommu, 
void *__evt)
pasid, address, flags);
break;
case EVENT_TYPE_RMP_FAULT:
-   amd_iommu_report_rmp_fault(event);
+   amd_iommu_report_rmp_fault(iommu, event);
break;
case EVENT_TYPE_RMP_HW_ERR:
-   amd_iommu_report_rmp_hw_error(event);
+   

[RESEND PATCH v1 32/37] iommu/amd: Include PCI segment ID when initializing IOMMU

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Extend the current device ID variables to 32-bit to include the 16-bit
segment ID when parsing device information from the IVRS table to
initialize each IOMMU.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  2 +-
 drivers/iommu/amd/amd_iommu_types.h |  6 ++--
 drivers/iommu/amd/init.c| 56 +++--
 drivers/iommu/amd/quirks.c  |  4 +--
 4 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 4dad1b442409..9be5ad746d47 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -125,7 +125,7 @@ static inline int get_pci_sbdf_id(struct pci_dev *pdev)
 
 extern bool translation_pre_enabled(struct amd_iommu *iommu);
 extern bool amd_iommu_is_attach_deferred(struct device *dev);
-extern int __init add_special_device(u8 type, u8 id, u16 *devid,
+extern int __init add_special_device(u8 type, u8 id, u32 *devid,
 bool cmd_line);
 
 #ifdef CONFIG_DMI
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 1109961e1042..f2bbcb19e92c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -734,8 +734,8 @@ struct acpihid_map_entry {
struct list_head list;
u8 uid[ACPIHID_UID_LEN];
u8 hid[ACPIHID_HID_LEN];
-   u16 devid;
-   u16 root_devid;
+   u32 devid;
+   u32 root_devid;
bool cmd_line;
struct iommu_group *group;
 };
@@ -743,7 +743,7 @@ struct acpihid_map_entry {
 struct devid_map {
struct list_head list;
u8 id;
-   u16 devid;
+   u32 devid;
bool cmd_line;
 };
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 093304d16c85..4a9f424eb4b4 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1147,7 +1147,7 @@ static void __init set_dev_entry_from_acpi(struct 
amd_iommu *iommu,
amd_iommu_set_rlookup_table(iommu, devid);
 }
 
-int __init add_special_device(u8 type, u8 id, u16 *devid, bool cmd_line)
+int __init add_special_device(u8 type, u8 id, u32 *devid, bool cmd_line)
 {
struct devid_map *entry;
struct list_head *list;
@@ -1184,7 +1184,7 @@ int __init add_special_device(u8 type, u8 id, u16 *devid, 
bool cmd_line)
return 0;
 }
 
-static int __init add_acpi_hid_device(u8 *hid, u8 *uid, u16 *devid,
+static int __init add_acpi_hid_device(u8 *hid, u8 *uid, u32 *devid,
  bool cmd_line)
 {
struct acpihid_map_entry *entry;
@@ -1263,7 +1263,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
 {
u8 *p = (u8 *)h;
u8 *end = p, flags = 0;
-   u16 devid = 0, devid_start = 0, devid_to = 0;
+   u16 devid = 0, devid_start = 0, devid_to = 0, seg_id;
u32 dev_i, ext_flags = 0;
bool alias = false;
struct ivhd_entry *e;
@@ -1299,6 +1299,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
 
while (p < end) {
e = (struct ivhd_entry *)p;
+   seg_id = pci_seg->id;
+
switch (e->type) {
case IVHD_DEV_ALL:
 
@@ -1309,9 +1311,9 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
break;
case IVHD_DEV_SELECT:
 
-   DUMP_printk("  DEV_SELECT\t\t\t devid: %02x:%02x.%x "
+   DUMP_printk("  DEV_SELECT\t\t\t devid: 
%04x:%02x:%02x.%x "
"flags: %02x\n",
-   PCI_BUS_NUM(e->devid),
+   seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
e->flags);
@@ -1322,8 +1324,8 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
case IVHD_DEV_SELECT_RANGE_START:
 
DUMP_printk("  DEV_SELECT_RANGE_START\t "
-   "devid: %02x:%02x.%x flags: %02x\n",
-   PCI_BUS_NUM(e->devid),
+   "devid: %04x:%02x:%02x.%x flags: %02x\n",
+   seg_id, PCI_BUS_NUM(e->devid),
PCI_SLOT(e->devid),
PCI_FUNC(e->devid),
e->flags);
@@ -1335,9 +1337,9 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
break;
case IVHD_DEV_ALIAS:
 
-   DUMP_printk("  DEV_ALIAS\t\t\t devid: %02x:%02x.%x "
+   DUMP_printk("  DEV_ALIAS\t\t\t devid: %04x:%02x:%02x.%x 
"
"flags: 

[RESEND PATCH v1 31/37] iommu/amd: Introduce get_device_sbdf_id() helper function

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

The current get_device_id() only provides the 16-bit PCI device ID
(i.e. BDF). With multiple PCI segment support, we need to extend the
helper function to include the PCI segment ID.

So, introduce a new helper function get_device_sbdf_id() to replace
the current get_pci_device_id().
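
For clarity, the returned value packs the segment in bits 31:16 and
the BDF in bits 15:0; a round trip with illustrative values (segment
1, device 00:14.5, i.e. devfn 0xa5) looks like:

	/* Sketch: encode/decode of the 32-bit sbdf value. */
	int sbdf  = (0x0001 << 16) | 0x00a5;	/* seg 1, BDF 00:14.5 */
	u16 seg   = sbdf >> 16;			/* 0x0001 */
	u16 devid = sbdf & 0xffff;		/* 0x00a5 */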

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu.h |  7 ++
 drivers/iommu/amd/iommu.c | 40 +--
 2 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 64c954e168d7..4dad1b442409 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -115,6 +115,13 @@ void amd_iommu_domain_clr_pt_root(struct protection_domain 
*domain)
amd_iommu_domain_set_pt_root(domain, 0);
 }
 
+static inline int get_pci_sbdf_id(struct pci_dev *pdev)
+{
+   int seg = pci_domain_nr(pdev->bus);
+   u16 devid = pci_dev_id(pdev);
+
+   return ((seg << 16) | (devid & 0xffff));
+}
 
 extern bool translation_pre_enabled(struct amd_iommu *iommu);
 extern bool amd_iommu_is_attach_deferred(struct device *dev);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 60fbb1abb15f..4fe77f77dfa1 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -92,13 +92,6 @@ static void detach_device(struct device *dev);
  *
  /
 
-static inline u16 get_pci_device_id(struct device *dev)
-{
-   struct pci_dev *pdev = to_pci_dev(dev);
-
-   return pci_dev_id(pdev);
-}
-
 static inline int get_acpihid_device_id(struct device *dev,
struct acpihid_map_entry **entry)
 {
@@ -119,16 +112,16 @@ static inline int get_acpihid_device_id(struct device 
*dev,
return -EINVAL;
 }
 
-static inline int get_device_id(struct device *dev)
+static inline int get_device_sbdf_id(struct device *dev)
 {
-   int devid;
+   int sbdf;
 
if (dev_is_pci(dev))
-   devid = get_pci_device_id(dev);
+   sbdf = get_pci_sbdf_id(to_pci_dev(dev));
else
-   devid = get_acpihid_device_id(dev, NULL);
+   sbdf = get_acpihid_device_id(dev, NULL);
 
-   return devid;
+   return sbdf;
 }
 
 struct dev_table_entry *get_dev_table(struct amd_iommu *iommu)
@@ -182,9 +175,11 @@ static struct amd_iommu *__rlookup_amd_iommu(u16 seg, u16 
devid)
 static struct amd_iommu *rlookup_amd_iommu(struct device *dev)
 {
u16 seg = get_device_segment(dev);
-   u16 devid = get_device_id(dev);
+   int devid = get_device_sbdf_id(dev);
 
-   return __rlookup_amd_iommu(seg, devid);
+   if (devid < 0)
+   return NULL;
+   return __rlookup_amd_iommu(seg, (devid & 0xffff));
 }
 
 static struct protection_domain *to_pdomain(struct iommu_domain *dom)
@@ -364,9 +359,10 @@ static bool check_device(struct device *dev)
if (!dev)
return false;
 
-   devid = get_device_id(dev);
+   devid = get_device_sbdf_id(dev);
if (devid < 0)
return false;
+   devid &= 0xffff;
 
iommu = rlookup_amd_iommu(dev);
if (!iommu)
@@ -374,7 +370,7 @@ static bool check_device(struct device *dev)
 
/* Out of our scope? */
pci_seg = iommu->pci_seg;
-   if ((devid & 0xffff) > pci_seg->last_bdf)
+   if (devid > pci_seg->last_bdf)
return false;
 
return true;
@@ -388,10 +384,11 @@ static int iommu_init_device(struct amd_iommu *iommu, 
struct device *dev)
if (dev_iommu_priv_get(dev))
return 0;
 
-   devid = get_device_id(dev);
+   devid = get_device_sbdf_id(dev);
if (devid < 0)
return devid;
 
+   devid &= 0xffff;
dev_data = find_dev_data(iommu, devid);
if (!dev_data)
return -ENOMEM;
@@ -421,10 +418,11 @@ static void iommu_ignore_device(struct amd_iommu *iommu, 
struct device *dev)
struct dev_table_entry *dev_table = get_dev_table(iommu);
int devid;
 
-   devid = (get_device_id(dev)) & 0xffff;
+   devid = get_device_sbdf_id(dev);
if (devid < 0)
return;
 
+   devid &= 0xffff;
pci_seg->rlookup_table[devid] = NULL;
memset(_table[devid], 0, sizeof(struct dev_table_entry));
 
@@ -2262,9 +2260,11 @@ static void amd_iommu_get_resv_regions(struct device 
*dev,
struct amd_iommu_pci_seg *pci_seg;
int devid;
 
-   devid = get_device_id(dev);
+   devid = get_device_sbdf_id(dev);
if (devid < 0)
return;
+   devid &= 0xffff;
+
iommu = rlookup_amd_iommu(dev);
if (!iommu)
return;
@@ -3151,7 +3151,7 @@ static int get_devid(struct irq_alloc_info *info)
return get_hpet_devid(info->devid);
case 
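
A quick aside on the encoding: the SBDF value returned by
get_pci_sbdf_id() is a plain int with the PCI segment in bits 31-16 and
the BDF in bits 15-0, which is why callers mask with 0xffff to recover
the device ID and treat a negative return as an error. A minimal
userspace sketch of the round trip -- the helper names below are made up
for illustration and are not part of the patch:

#include <assert.h>
#include <stdint.h>

static inline int sbdf_pack(uint16_t seg, uint16_t devid)
{
	return ((int)seg << 16) | (devid & 0xffff);
}

static inline uint16_t sbdf_seg(int sbdf)   { return sbdf >> 16; }
static inline uint16_t sbdf_devid(int sbdf) { return sbdf & 0xffff; }

int main(void)
{
	/* segment 1, device 00:14.3: devid = (bus << 8) | (dev << 3) | fn */
	uint16_t devid = (0x00 << 8) | (0x14 << 3) | 0x3;
	int sbdf = sbdf_pack(1, devid);

	assert(sbdf_seg(sbdf) == 1);
	assert(sbdf_devid(sbdf) == devid);
	return 0;
}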

[RESEND PATCH v1 30/37] iommu/amd: Flush upto last_bdf only

2022-04-04 Thread Vasant Hegde via iommu
Fix amd_iommu_flush_dte_all() and amd_iommu_flush_tlb_all() to flush
up to last_bdf only.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 69c88e6c9fde..60fbb1abb15f 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1190,8 +1190,9 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 
devid)
 static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
 {
u32 devid;
+   u16 last_bdf = iommu->pci_seg->last_bdf;
 
-   for (devid = 0; devid <= 0xffff; ++devid)
+   for (devid = 0; devid <= last_bdf; ++devid)
iommu_flush_dte(iommu, devid);
 
iommu_completion_wait(iommu);
@@ -1204,8 +1205,9 @@ static void amd_iommu_flush_dte_all(struct amd_iommu 
*iommu)
 static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
 {
u32 dom_id;
+   u16 last_bdf = iommu->pci_seg->last_bdf;
 
-   for (dom_id = 0; dom_id <= 0xffff; ++dom_id) {
+   for (dom_id = 0; dom_id <= last_bdf; ++dom_id) {
struct iommu_cmd cmd;
build_inv_iommu_pages(, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
  dom_id, 1);
@@ -1248,8 +1250,9 @@ static void iommu_flush_irt(struct amd_iommu *iommu, u16 
devid)
 static void amd_iommu_flush_irt_all(struct amd_iommu *iommu)
 {
u32 devid;
+   u16 last_bdf = iommu->pci_seg->last_bdf;
 
-   for (devid = 0; devid <= MAX_DEV_TABLE_ENTRIES; devid++)
+   for (devid = 0; devid <= last_bdf; devid++)
iommu_flush_irt(iommu, devid);
 
iommu_completion_wait(iommu);
-- 
2.27.0



[RESEND PATCH v1 29/37] iommu/amd: Remove global amd_iommu_last_bdf

2022-04-04 Thread Vasant Hegde via iommu
Replace it with a per-PCI-segment last_bdf variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 ---
 drivers/iommu/amd/init.c| 35 ++---
 drivers/iommu/amd/iommu.c   | 10 ++---
 3 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 0aa170014b85..1109961e1042 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -829,9 +829,6 @@ struct unity_map_entry {
 /* size of the dma_ops aperture as power of 2 */
 extern unsigned amd_iommu_aperture_order;
 
-/* largest PCI device id we expect translation requests for */
-extern u16 amd_iommu_last_bdf;
-
 /* allocation bitmap for domain ids */
 extern unsigned long *amd_iommu_pd_alloc_bitmap;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index b3905b1c4bc9..093304d16c85 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -161,9 +161,6 @@ static bool amd_iommu_disabled __initdata;
 static bool amd_iommu_force_enable __initdata;
 static int amd_iommu_target_ivhd_type;
 
-u16 amd_iommu_last_bdf;/* largest PCI device id we have
-  to handle */
-
 LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
   system */
@@ -245,16 +242,10 @@ static void init_translation_status(struct amd_iommu 
*iommu)
iommu->flags |= AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
 }
 
-static inline void update_last_devid(u16 devid)
-{
-   if (devid > amd_iommu_last_bdf)
-   amd_iommu_last_bdf = devid;
-}
-
-static inline unsigned long tbl_size(int entry_size)
+static inline unsigned long tbl_size(int entry_size, int last_bdf)
 {
unsigned shift = PAGE_SHIFT +
-get_order(((int)amd_iommu_last_bdf + 1) * entry_size);
+get_order((last_bdf + 1) * entry_size);
 
return 1UL << shift;
 }
@@ -538,7 +529,6 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
switch (dev->type) {
case IVHD_DEV_ALL:
/* Use maximum BDF value for DEV_ALL */
-   update_last_devid(0xffff);
			return 0xffff;
break;
case IVHD_DEV_SELECT:
@@ -546,7 +536,6 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
case IVHD_DEV_ALIAS:
case IVHD_DEV_EXT_SELECT:
/* all the above subfield types refer to device ids */
-   update_last_devid(dev->devid);
if (dev->devid > last_devid)
last_devid = dev->devid;
break;
@@ -688,7 +677,7 @@ static int __init alloc_alias_table(struct 
amd_iommu_pci_seg *pci_seg)
/*
 * let all alias entries point to itself
 */
-   for (i = 0; i <= amd_iommu_last_bdf; ++i)
+   for (i = 0; i <= pci_seg->last_bdf; ++i)
pci_seg->alias_table[i] = i;
 
return 0;
@@ -1054,7 +1043,7 @@ static bool __copy_device_table(struct amd_iommu *iommu)
return false;
}
 
-   for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
+   for (devid = 0; devid <= pci_seg->last_bdf; ++devid) {
pci_seg->old_dev_tbl_cpy[devid] = old_devtb[devid];
dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
@@ -1315,7 +1304,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
 
DUMP_printk("  DEV_ALL\t\t\tflags: %02x\n", e->flags);
 
-   for (dev_i = 0; dev_i <= amd_iommu_last_bdf; ++dev_i)
+   for (dev_i = 0; dev_i <= pci_seg->last_bdf; ++dev_i)
set_dev_entry_from_acpi(iommu, dev_i, e->flags, 
0);
break;
case IVHD_DEV_SELECT:
@@ -1560,9 +1549,9 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id,
 
pci_seg->last_bdf = last_bdf;
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
-   pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
-   pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
-   pci_seg->rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE);
+   pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE, last_bdf);
+   pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE, 
last_bdf);
+   pci_seg->rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE, 
last_bdf);
 
pci_seg->id = 
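
Worth spelling out what tbl_size() computes now that it takes last_bdf:
(last_bdf + 1) * entry_size, rounded up to a power-of-two number of
pages. A standalone sketch of the same arithmetic, reimplementing
PAGE_SHIFT and a simplified get_order() for illustration (4 KiB pages
and the 32-byte device table entry size assumed):

#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Minimal stand-in for the kernel's get_order(): the smallest order
 * such that (PAGE_SIZE << order) covers the requested size.
 */
static int get_order(unsigned long size)
{
	int order = 0;

	while ((PAGE_SIZE << order) < size)
		order++;
	return order;
}

static unsigned long tbl_size(int entry_size, int last_bdf)
{
	unsigned int shift = PAGE_SHIFT +
			     get_order((last_bdf + 1UL) * entry_size);

	return 1UL << shift;
}

int main(void)
{
	/* last_bdf = 0xffff keeps the full 2 MiB device table ... */
	printf("last_bdf=0xffff -> %lu bytes\n", tbl_size(32, 0xffff));
	/* ... while a segment with few devices gets a much smaller one */
	printf("last_bdf=0x00ff -> %lu bytes\n", tbl_size(32, 0x00ff));
	return 0;
}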

[RESEND PATCH v1 28/37] iommu/amd: Remove global amd_iommu_alias_table

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

This is replaced by the per-PCI-segment alias table.
Also remove the alias_table_size variable.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  6 --
 drivers/iommu/amd/init.c| 24 
 2 files changed, 30 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index dc76ee2c3ea5..0aa170014b85 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -826,12 +826,6 @@ struct unity_map_entry {
  * Data structures for device handling
  */
 
-/*
- * Alias table to find requestor ids to device ids. Not locked because only
- * read on runtime.
- */
-extern u16 *amd_iommu_alias_table;
-
 /* size of the dma_ops aperture as power of 2 */
 extern unsigned amd_iommu_aperture_order;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index dd667dfb4355..b3905b1c4bc9 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -185,21 +185,12 @@ static bool amd_iommu_pc_present __read_mostly;
 
 bool amd_iommu_force_isolation __read_mostly;
 
-/*
- * The alias table is a driver specific data structure which contains the
- * mappings of the PCI device ids to the actual requestor ids on the IOMMU.
- * More than one device can share the same requestor id.
- */
-u16 *amd_iommu_alias_table;
-
 /*
  * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
  * to know which ones are already in use.
  */
 unsigned long *amd_iommu_pd_alloc_bitmap;
 
-static u32 alias_table_size;   /* size of the alias table */
-
 enum iommu_init_state {
IOMMU_START_STATE,
IOMMU_IVRS_DETECTED,
@@ -2791,10 +2782,6 @@ static void __init free_iommu_resources(void)
kmem_cache_destroy(amd_iommu_irq_cache);
amd_iommu_irq_cache = NULL;
 
-   free_pages((unsigned long)amd_iommu_alias_table,
-  get_order(alias_table_size));
-   amd_iommu_alias_table = NULL;
-
free_iommu_all();
free_pci_segment();
 }
@@ -2923,20 +2910,9 @@ static int __init early_amd_iommu_init(void)
amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);
DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);
 
-   alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
-
/* Device table - directly used by all IOMMUs */
ret = -ENOMEM;
 
-   /*
-* Alias table - map PCI Bus/Dev/Func to Bus/Dev/Func the
-* IOMMU see for that device
-*/
-   amd_iommu_alias_table = (void *)__get_free_pages(GFP_KERNEL,
-   get_order(alias_table_size));
-   if (amd_iommu_alias_table == NULL)
-   goto out;
-
amd_iommu_pd_alloc_bitmap = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
get_order(MAX_DOMAIN_ID/8));
-- 
2.27.0



[RESEND PATCH v1 27/37] iommu/amd: Remove global amd_iommu_dev_table

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Replace the global amd_iommu_dev_table with the per-PCI-segment device table.
Also remove "dev_table_size".

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  6 --
 drivers/iommu/amd/init.c| 30 +++--
 drivers/iommu/amd/iommu.c   |  8 +---
 3 files changed, 8 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 334206381f84..dc76ee2c3ea5 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -826,12 +826,6 @@ struct unity_map_entry {
  * Data structures for device handling
  */
 
-/*
- * Device table used by hardware. Read and write accesses by software are
- * locked with the amd_iommu_pd_table lock.
- */
-extern struct dev_table_entry *amd_iommu_dev_table;
-
 /*
  * Alias table to find requestor ids to device ids. Not locked because only
  * read on runtime.
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index b2ddf407e967..dd667dfb4355 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -185,14 +185,6 @@ static bool amd_iommu_pc_present __read_mostly;
 
 bool amd_iommu_force_isolation __read_mostly;
 
-/*
- * Pointer to the device table which is shared by all AMD IOMMUs
- * it is indexed by the PCI device id or the HT unit id and contains
- * information about the domain the device belongs to as well as the
- * page table root pointer.
- */
-struct dev_table_entry *amd_iommu_dev_table;
-
 /*
  * The alias table is a driver specific data structure which contains the
  * mappings of the PCI device ids to the actual requestor ids on the IOMMU.
@@ -206,7 +198,6 @@ u16 *amd_iommu_alias_table;
  */
 unsigned long *amd_iommu_pd_alloc_bitmap;
 
-static u32 dev_table_size; /* size of the device table */
 static u32 alias_table_size;   /* size of the alias table */
 
 enum iommu_init_state {
@@ -402,10 +393,11 @@ static void iommu_set_device_table(struct amd_iommu 
*iommu)
 {
u64 entry;
u32 dev_table_size = iommu->pci_seg->dev_table_size;
+   void *dev_table = (void *)get_dev_table(iommu);
 
BUG_ON(iommu->mmio_base == NULL);
 
-   entry = iommu_virt_to_phys(amd_iommu_dev_table);
+   entry = iommu_virt_to_phys(dev_table);
entry |= (dev_table_size >> 12) - 1;
memcpy_toio(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET,
, sizeof(entry));
@@ -1148,12 +1140,6 @@ void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, 
u16 devid)
set_dev_entry_bit(iommu, devid, DEV_ENTRY_IW);
 }
 
-/* Writes the specific IOMMU for a device into the rlookup table */
-static void __init set_iommu_for_device(struct amd_iommu *iommu, u16 devid)
-{
-   iommu->pci_seg->rlookup_table[devid] = iommu;
-}
-
 /*
  * This function takes the device specific flags read from the ACPI
  * table and sets up the device table entry with that information
@@ -1178,7 +1164,7 @@ static void __init set_dev_entry_from_acpi(struct 
amd_iommu *iommu,
 
amd_iommu_apply_erratum_63(iommu, devid);
 
-   set_iommu_for_device(iommu, devid);
+   amd_iommu_set_rlookup_table(iommu, devid);
 }
 
 int __init add_special_device(u8 type, u8 id, u16 *devid, bool cmd_line)
@@ -2809,10 +2795,6 @@ static void __init free_iommu_resources(void)
   get_order(alias_table_size));
amd_iommu_alias_table = NULL;
 
-   free_pages((unsigned long)amd_iommu_dev_table,
-  get_order(dev_table_size));
-   amd_iommu_dev_table = NULL;
-
free_iommu_all();
free_pci_segment();
 }
@@ -2941,16 +2923,10 @@ static int __init early_amd_iommu_init(void)
amd_iommu_target_ivhd_type = get_highest_supported_ivhd_type(ivrs_base);
DUMP_printk("Using IVHD type %#x\n", amd_iommu_target_ivhd_type);
 
-   dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
 
/* Device table - directly used by all IOMMUs */
ret = -ENOMEM;
-   amd_iommu_dev_table = (void *)__get_free_pages(
- GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
- get_order(dev_table_size));
-   if (amd_iommu_dev_table == NULL)
-   goto out;
 
/*
 * Alias table - map PCI Bus/Dev/Func to Bus/Dev/Func the
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7fb9e9a3291b..13731fa241f2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -230,6 +230,7 @@ static struct iommu_dev_data *search_dev_data(struct 
amd_iommu *iommu, u16 devid
 static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
 {
struct amd_iommu *iommu;
+   struct dev_table_entry *dev_table;
u16 devid = pci_dev_id(pdev);
 
if (devid == 

[RESEND PATCH v1 26/37] iommu/amd: Update set_dev_entry_bit() and get_dev_entry_bit()

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

To include a pointer to the per-PCI-segment device table.

Also pass struct amd_iommu as a function parameter to
amd_iommu_apply_erratum_63(), since it is needed when setting up the DTE.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h |  2 +-
 drivers/iommu/amd/init.c  | 59 +++
 drivers/iommu/amd/iommu.c |  2 +-
 3 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 2947239700ce..64c954e168d7 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -13,7 +13,7 @@
 
 extern irqreturn_t amd_iommu_int_thread(int irq, void *data);
 extern irqreturn_t amd_iommu_int_handler(int irq, void *data);
-extern void amd_iommu_apply_erratum_63(u16 devid);
+extern void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid);
 extern void amd_iommu_restart_event_logging(struct amd_iommu *iommu);
 extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index dba1e03e0cd2..b2ddf407e967 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -988,22 +988,37 @@ static void iommu_enable_gt(struct amd_iommu *iommu)
 }
 
 /* sets a specific bit in the device table entry. */
-static void set_dev_entry_bit(u16 devid, u8 bit)
+static void __set_dev_entry_bit(struct dev_table_entry *dev_table,
+   u16 devid, u8 bit)
 {
int i = (bit >> 6) & 0x03;
int _bit = bit & 0x3f;
 
-   amd_iommu_dev_table[devid].data[i] |= (1UL << _bit);
+   dev_table[devid].data[i] |= (1UL << _bit);
 }
 
-static int get_dev_entry_bit(u16 devid, u8 bit)
+static void set_dev_entry_bit(struct amd_iommu *iommu, u16 devid, u8 bit)
+{
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+   return __set_dev_entry_bit(dev_table, devid, bit);
+}
+
+static int __get_dev_entry_bit(struct dev_table_entry *dev_table,
+  u16 devid, u8 bit)
 {
int i = (bit >> 6) & 0x03;
int _bit = bit & 0x3f;
 
-   return (amd_iommu_dev_table[devid].data[i] & (1UL << _bit)) >> _bit;
+   return (dev_table[devid].data[i] & (1UL << _bit)) >> _bit;
 }
 
+static int get_dev_entry_bit(struct amd_iommu *iommu, u16 devid, u8 bit)
+{
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
+
+   return __get_dev_entry_bit(dev_table, devid, bit);
+}
 
 static bool __copy_device_table(struct amd_iommu *iommu)
 {
@@ -1122,15 +1137,15 @@ static bool copy_device_table(void)
return true;
 }
 
-void amd_iommu_apply_erratum_63(u16 devid)
+void amd_iommu_apply_erratum_63(struct amd_iommu *iommu, u16 devid)
 {
int sysmgt;
 
-   sysmgt = get_dev_entry_bit(devid, DEV_ENTRY_SYSMGT1) |
-(get_dev_entry_bit(devid, DEV_ENTRY_SYSMGT2) << 1);
+   sysmgt = get_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT1) |
+(get_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT2) << 1);
 
if (sysmgt == 0x01)
-   set_dev_entry_bit(devid, DEV_ENTRY_IW);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_IW);
 }
 
 /* Writes the specific IOMMU for a device into the rlookup table */
@@ -1147,21 +1162,21 @@ static void __init set_dev_entry_from_acpi(struct 
amd_iommu *iommu,
   u16 devid, u32 flags, u32 ext_flags)
 {
if (flags & ACPI_DEVFLAG_INITPASS)
-   set_dev_entry_bit(devid, DEV_ENTRY_INIT_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_INIT_PASS);
if (flags & ACPI_DEVFLAG_EXTINT)
-   set_dev_entry_bit(devid, DEV_ENTRY_EINT_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_EINT_PASS);
if (flags & ACPI_DEVFLAG_NMI)
-   set_dev_entry_bit(devid, DEV_ENTRY_NMI_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_NMI_PASS);
if (flags & ACPI_DEVFLAG_SYSMGT1)
-   set_dev_entry_bit(devid, DEV_ENTRY_SYSMGT1);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT1);
if (flags & ACPI_DEVFLAG_SYSMGT2)
-   set_dev_entry_bit(devid, DEV_ENTRY_SYSMGT2);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_SYSMGT2);
if (flags & ACPI_DEVFLAG_LINT0)
-   set_dev_entry_bit(devid, DEV_ENTRY_LINT0_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT0_PASS);
if (flags & ACPI_DEVFLAG_LINT1)
-   set_dev_entry_bit(devid, DEV_ENTRY_LINT1_PASS);
+   set_dev_entry_bit(iommu, devid, DEV_ENTRY_LINT1_PASS);
 
-   amd_iommu_apply_erratum_63(devid);
+   amd_iommu_apply_erratum_63(iommu, devid);
 
set_iommu_for_device(iommu, devid);
 }
@@ -2519,8 +2534,8 @@ static void init_device_table_dma(struct 
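
The bit arithmetic in __set_dev_entry_bit()/__get_dev_entry_bit() treats
the 256-bit device table entry (four u64 words) as one flat bit array:
bits 7-6 of the bit index select the word, bits 5-0 select the bit
inside it. A self-contained sketch of just that indexing:

#include <assert.h>
#include <stdint.h>

/* A device table entry is 256 bits wide, stored as four 64-bit words. */
struct dte {
	uint64_t data[4];
};

static void set_entry_bit(struct dte *d, uint8_t bit)
{
	int i    = (bit >> 6) & 0x03;	/* which u64 word  */
	int _bit = bit & 0x3f;		/* bit within word */

	d->data[i] |= 1ULL << _bit;
}

static int get_entry_bit(const struct dte *d, uint8_t bit)
{
	int i    = (bit >> 6) & 0x03;
	int _bit = bit & 0x3f;

	return (d->data[i] >> _bit) & 1;
}

int main(void)
{
	struct dte d = { { 0 } };

	set_entry_bit(&d, 0);	/* e.g. the entry's valid bit is bit 0 */
	set_entry_bit(&d, 96);	/* lands in data[1], bit 96 - 64 = 32  */

	assert(get_entry_bit(&d, 0) == 1);
	assert(d.data[1] == 1ULL << 32);
	return 0;
}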

[RESEND PATCH v1 25/37] iommu/amd: Update (un)init_device_table_dma()

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Include struct amd_iommu_pci_seg as a function parameter, since
we need to access the per-PCI-segment device table.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/init.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 70eb6338b45d..dba1e03e0cd2 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -238,7 +238,7 @@ static enum iommu_init_state init_state = IOMMU_START_STATE;
 
 static int amd_iommu_enable_interrupts(void);
 static int __init iommu_go_to_state(enum iommu_init_state state);
-static void init_device_table_dma(void);
+static void init_device_table_dma(struct amd_iommu_pci_seg *pci_seg);
 
 static bool amd_iommu_pre_enabled = true;
 
@@ -2115,6 +2115,7 @@ static void print_iommu_info(void)
 static int __init amd_iommu_init_pci(void)
 {
struct amd_iommu *iommu;
+   struct amd_iommu_pci_seg *pci_seg;
int ret;
 
for_each_iommu(iommu) {
@@ -2145,7 +2146,8 @@ static int __init amd_iommu_init_pci(void)
goto out;
}
 
-   init_device_table_dma();
+   for_each_pci_segment(pci_seg)
+   init_device_table_dma(pci_seg);
 
for_each_iommu(iommu)
iommu_flush_all_caches(iommu);
@@ -2508,9 +2510,13 @@ static int __init init_memory_definitions(struct 
acpi_table_header *table)
 /*
  * Init the device table to not allow DMA access for devices
  */
-static void init_device_table_dma(void)
+static void init_device_table_dma(struct amd_iommu_pci_seg *pci_seg)
 {
u32 devid;
+   struct dev_table_entry *dev_table = pci_seg->dev_table;
+
+   if (dev_table == NULL)
+   return;
 
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
set_dev_entry_bit(devid, DEV_ENTRY_VALID);
@@ -2518,13 +2524,17 @@ static void init_device_table_dma(void)
}
 }
 
-static void __init uninit_device_table_dma(void)
+static void __init uninit_device_table_dma(struct amd_iommu_pci_seg *pci_seg)
 {
u32 devid;
+   struct dev_table_entry *dev_table = pci_seg->dev_table;
+
+   if (dev_table == NULL)
+   return;
 
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
-   amd_iommu_dev_table[devid].data[0] = 0ULL;
-   amd_iommu_dev_table[devid].data[1] = 0ULL;
+   dev_table[devid].data[0] = 0ULL;
+   dev_table[devid].data[1] = 0ULL;
}
 }
 
@@ -3117,8 +3127,11 @@ static int __init state_next(void)
free_iommu_resources();
} else {
struct amd_iommu *iommu;
+   struct amd_iommu_pci_seg *pci_seg;
+
+   for_each_pci_segment(pci_seg)
+   uninit_device_table_dma(pci_seg);
 
-   uninit_device_table_dma();
for_each_iommu(iommu)
iommu_flush_all_caches(iommu);
}
-- 
2.27.0



[RESEND PATCH v1 24/37] iommu/amd: Update set_dte_irq_entry

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using per PCI segment device table instead of global
device table.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index f2a9f7078b2a..dad84f76c1a0 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2726,18 +2726,20 @@ EXPORT_SYMBOL(amd_iommu_device_info);
 static struct irq_chip amd_ir_chip;
 static DEFINE_SPINLOCK(iommu_table_lock);
 
-static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
+static void set_dte_irq_entry(struct amd_iommu *iommu, u16 devid,
+ struct irq_remap_table *table)
 {
u64 dte;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
 
-   dte = amd_iommu_dev_table[devid].data[2];
+   dte = dev_table[devid].data[2];
dte &= ~DTE_IRQ_PHYS_ADDR_MASK;
dte |= iommu_virt_to_phys(table->table);
dte |= DTE_IRQ_REMAP_INTCTL;
dte |= DTE_INTTABLEN;
dte |= DTE_IRQ_REMAP_ENABLE;
 
-   amd_iommu_dev_table[devid].data[2] = dte;
+   dev_table[devid].data[2] = dte;
 }
 
 static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 
devid)
@@ -2788,7 +2790,7 @@ static void set_remap_table_entry(struct amd_iommu 
*iommu, u16 devid,
struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
pci_seg->irq_lookup_table[devid] = table;
-   set_dte_irq_entry(devid, table);
+   set_dte_irq_entry(iommu, devid, table);
iommu_flush_dte(iommu, devid);
 }
 
@@ -2804,8 +2806,7 @@ static int set_remap_table_entry_alias(struct pci_dev 
*pdev, u16 alias,
 
pci_seg = iommu->pci_seg;
pci_seg->irq_lookup_table[alias] = table;
-   set_dte_irq_entry(alias, table);
-
+   set_dte_irq_entry(iommu, alias, table);
iommu_flush_dte(pci_seg->rlookup_table[alias], alias);
 
return 0;
-- 
2.27.0



[RESEND PATCH v1 23/37] iommu/amd: Update dump_dte_entry

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using per PCI segment device table instead of global
device table.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 81a7d3b617be..f2a9f7078b2a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -450,13 +450,13 @@ static void amd_iommu_uninit_device(struct device *dev)
  *
  /
 
-static void dump_dte_entry(u16 devid)
+static void dump_dte_entry(struct amd_iommu *iommu, u16 devid)
 {
int i;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
 
for (i = 0; i < 4; ++i)
-   pr_err("DTE[%d]: %016llx\n", i,
-   amd_iommu_dev_table[devid].data[i]);
+   pr_err("DTE[%d]: %016llx\n", i, dev_table[devid].data[i]);
 }
 
 static void dump_command(unsigned long phys_addr)
@@ -617,7 +617,7 @@ static void iommu_print_event(struct amd_iommu *iommu, void 
*__evt)
dev_err(dev, "Event logged [ILLEGAL_DEV_TABLE_ENTRY 
device=%02x:%02x.%x pasid=0x%05x address=0x%llx flags=0x%04x]\n",
PCI_BUS_NUM(devid), PCI_SLOT(devid), PCI_FUNC(devid),
pasid, address, flags);
-   dump_dte_entry(devid);
+   dump_dte_entry(iommu, devid);
break;
case EVENT_TYPE_DEV_TAB_ERR:
dev_err(dev, "Event logged [DEV_TAB_HARDWARE_ERROR 
device=%02x:%02x.%x "
-- 
2.27.0



[RESEND PATCH v1 22/37] iommu/amd: Update iommu_ignore_device

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using per PCI segment device table instead of global
device table.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index adc1747234ff..81a7d3b617be 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -412,15 +412,15 @@ static int iommu_init_device(struct amd_iommu *iommu, 
struct device *dev)
 static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
 {
struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
int devid;
 
-   devid = get_device_id(dev);
+   devid = (get_device_id(dev)) & 0xffff;
if (devid < 0)
return;
 
-
pci_seg->rlookup_table[devid] = NULL;
-   memset(_iommu_dev_table[devid], 0, sizeof(struct dev_table_entry));
+   memset(_table[devid], 0, sizeof(struct dev_table_entry));
 
setup_aliases(iommu, dev);
 }
-- 
2.27.0



[RESEND PATCH v1 21/37] iommu/amd: Update set_dte_entry and clear_dte_entry

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Start using per PCI segment data structures instead of global data
structures.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index bae65b05e37b..adc1747234ff 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1534,6 +1534,7 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
u64 pte_root = 0;
u64 flags = 0;
u32 old_domid;
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
 
if (domain->iop.mode != PAGE_MODE_NONE)
pte_root = iommu_virt_to_phys(domain->iop.root);
@@ -1542,7 +1543,7 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
<< DEV_ENTRY_MODE_SHIFT;
pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V | DTE_FLAG_TV;
 
-   flags = amd_iommu_dev_table[devid].data[1];
+   flags = dev_table[devid].data[1];
 
if (ats)
flags |= DTE_FLAG_IOTLB;
@@ -1581,9 +1582,9 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
flags &= ~DEV_DOMID_MASK;
flags |= domain->id;
 
-   old_domid = amd_iommu_dev_table[devid].data[1] & DEV_DOMID_MASK;
-   amd_iommu_dev_table[devid].data[1]  = flags;
-   amd_iommu_dev_table[devid].data[0]  = pte_root;
+   old_domid = dev_table[devid].data[1] & DEV_DOMID_MASK;
+   dev_table[devid].data[1]  = flags;
+   dev_table[devid].data[0]  = pte_root;
 
/*
 * A kdump kernel might be replacing a domain ID that was copied from
@@ -1595,11 +1596,13 @@ static void set_dte_entry(struct amd_iommu *iommu, u16 
devid,
}
 }
 
-static void clear_dte_entry(u16 devid)
+static void clear_dte_entry(struct amd_iommu *iommu, u16 devid)
 {
+   struct dev_table_entry *dev_table = get_dev_table(iommu);
+
/* remove entry from the device table seen by the hardware */
-   amd_iommu_dev_table[devid].data[0]  = DTE_FLAG_V | DTE_FLAG_TV;
-   amd_iommu_dev_table[devid].data[1] &= DTE_FLAG_MASK;
+   dev_table[devid].data[0]  = DTE_FLAG_V | DTE_FLAG_TV;
+   dev_table[devid].data[1] &= DTE_FLAG_MASK;
 
amd_iommu_apply_erratum_63(devid);
 }
@@ -1643,7 +1646,7 @@ static void do_detach(struct iommu_dev_data *dev_data)
/* Update data structures */
dev_data->domain = NULL;
list_del(_data->list);
-   clear_dte_entry(dev_data->devid);
+   clear_dte_entry(iommu, dev_data->devid);
clone_aliases(iommu, dev_data->dev);
 
/* Flush the DTE entry */
-- 
2.27.0
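
One detail of set_dte_entry() worth highlighting: it captures the old
domain ID out of data[1] before overwriting the entry, so the caller can
flush translations that are still cached under that ID (the kdump case
the in-code comment refers to). A condensed, hypothetical sketch of that
read-modify-write flow -- the real DTE bit layout is richer than shown:

#include <stdint.h>

#define DEV_DOMID_MASK 0xffffULL  /* low 16 bits of data[1]: domain ID */

struct dte {
	uint64_t data[4];
};

/* Returns the previous domain ID; a non-zero result tells the caller
 * to flush TLB entries tagged with it.
 */
static uint16_t dte_set(struct dte *d, uint64_t pte_root,
			uint64_t flags, uint16_t domid)
{
	uint16_t old_domid = d->data[1] & DEV_DOMID_MASK;

	flags &= ~DEV_DOMID_MASK;
	flags |= domid;

	d->data[1] = flags;	/* flags + domain ID first ...          */
	d->data[0] = pte_root;	/* ... then root pointer and valid bits */

	return old_domid;
}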



[RESEND PATCH v1 20/37] iommu/amd: Convert to use per PCI segment rlookup_table

2022-04-04 Thread Vasant Hegde via iommu
Then, remove the global amd_iommu_rlookup_table and rlookup_table_size.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  5 -
 drivers/iommu/amd/init.c| 23 ++-
 drivers/iommu/amd/iommu.c   | 19 +--
 3 files changed, 11 insertions(+), 36 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 990272a470aa..334206381f84 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -838,11 +838,6 @@ extern struct dev_table_entry *amd_iommu_dev_table;
  */
 extern u16 *amd_iommu_alias_table;
 
-/*
- * Reverse lookup table to find the IOMMU which translates a specific device.
- */
-extern struct amd_iommu **amd_iommu_rlookup_table;
-
 /* size of the dma_ops aperture as power of 2 */
 extern unsigned amd_iommu_aperture_order;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 29ed687bc43f..70eb6338b45d 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -200,12 +200,6 @@ struct dev_table_entry *amd_iommu_dev_table;
  */
 u16 *amd_iommu_alias_table;
 
-/*
- * The rlookup table is used to find the IOMMU which is responsible
- * for a specific device. It is also indexed by the PCI device id.
- */
-struct amd_iommu **amd_iommu_rlookup_table;
-
 /*
  * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
  * to know which ones are already in use.
@@ -214,7 +208,6 @@ unsigned long *amd_iommu_pd_alloc_bitmap;
 
 static u32 dev_table_size; /* size of the device table */
 static u32 alias_table_size;   /* size of the alias table */
-static u32 rlookup_table_size; /* size if the rlookup table */
 
 enum iommu_init_state {
IOMMU_START_STATE,
@@ -1143,7 +1136,7 @@ void amd_iommu_apply_erratum_63(u16 devid)
 /* Writes the specific IOMMU for a device into the rlookup table */
 static void __init set_iommu_for_device(struct amd_iommu *iommu, u16 devid)
 {
-   amd_iommu_rlookup_table[devid] = iommu;
+   iommu->pci_seg->rlookup_table[devid] = iommu;
 }
 
 /*
@@ -1825,7 +1818,7 @@ static int __init init_iommu_one(struct amd_iommu *iommu, 
struct ivhd_header *h,
 * Make sure IOMMU is not considered to translate itself. The IVRS
 * table tells us so, but this is a lie!
 */
-   amd_iommu_rlookup_table[iommu->devid] = NULL;
+   pci_seg->rlookup_table[iommu->devid] = NULL;
 
return 0;
 }
@@ -2783,10 +2776,6 @@ static void __init free_iommu_resources(void)
kmem_cache_destroy(amd_iommu_irq_cache);
amd_iommu_irq_cache = NULL;
 
-   free_pages((unsigned long)amd_iommu_rlookup_table,
-  get_order(rlookup_table_size));
-   amd_iommu_rlookup_table = NULL;
-
free_pages((unsigned long)amd_iommu_alias_table,
   get_order(alias_table_size));
amd_iommu_alias_table = NULL;
@@ -2925,7 +2914,6 @@ static int __init early_amd_iommu_init(void)
 
dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
-   rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE);
 
/* Device table - directly used by all IOMMUs */
ret = -ENOMEM;
@@ -2944,13 +2932,6 @@ static int __init early_amd_iommu_init(void)
if (amd_iommu_alias_table == NULL)
goto out;
 
-   /* IOMMU rlookup table - find the IOMMU for a specific device */
-   amd_iommu_rlookup_table = (void *)__get_free_pages(
-   GFP_KERNEL | __GFP_ZERO,
-   get_order(rlookup_table_size));
-   if (amd_iommu_rlookup_table == NULL)
-   goto out;
-
amd_iommu_pd_alloc_bitmap = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
get_order(MAX_DOMAIN_ID/8));
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 814b5abe676a..bae65b05e37b 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -286,10 +286,9 @@ static void setup_aliases(struct amd_iommu *iommu, struct 
device *dev)
clone_aliases(iommu, dev);
 }
 
-static struct iommu_dev_data *find_dev_data(u16 devid)
+static struct iommu_dev_data *find_dev_data(struct amd_iommu *iommu, u16 devid)
 {
struct iommu_dev_data *dev_data;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
dev_data = search_dev_data(iommu, devid);
 
@@ -387,7 +386,7 @@ static int iommu_init_device(struct amd_iommu *iommu, 
struct device *dev)
if (devid < 0)
return devid;
 
-   dev_data = find_dev_data(devid);
+   dev_data = find_dev_data(iommu, devid);
if (!dev_data)
return -ENOMEM;
 
@@ -402,9 +401,6 @@ static int iommu_init_device(struct amd_iommu *iommu, 
struct 
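
After this conversion every reverse lookup is two-level: device -> PCI
segment -> per-segment rlookup table, instead of one flat global array.
A simplified sketch of the lookup shape (structures condensed from the
driver; a linked-list walk is assumed for segment resolution):

#include <stddef.h>
#include <stdint.h>

struct amd_iommu;			/* opaque here */

struct pci_seg {
	uint16_t id;
	uint16_t last_bdf;
	struct amd_iommu **rlookup_table;	/* indexed by devid */
	struct pci_seg *next;
};

static struct amd_iommu *rlookup(struct pci_seg *segs,
				 uint16_t seg, uint16_t devid)
{
	for (struct pci_seg *p = segs; p; p = p->next) {
		if (p->id != seg)
			continue;
		if (devid > p->last_bdf)
			return NULL;	/* outside this segment's range */
		return p->rlookup_table[devid];
	}
	return NULL;			/* unknown segment */
}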

[RESEND PATCH v1 19/37] iommu/amd: Update alloc_irq_table and alloc_irq_index

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Pass the amd_iommu structure as a parameter to these functions,
as it is needed to retrieve the per-segment tables inside them.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/iommu.c | 26 +-
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 33b69843b860..814b5abe676a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2809,21 +2809,17 @@ static int set_remap_table_entry_alias(struct pci_dev 
*pdev, u16 alias,
return 0;
 }
 
-static struct irq_remap_table *alloc_irq_table(u16 devid, struct pci_dev *pdev)
+static struct irq_remap_table *alloc_irq_table(struct amd_iommu *iommu,
+  u16 devid, struct pci_dev *pdev)
 {
struct irq_remap_table *table = NULL;
struct irq_remap_table *new_table = NULL;
struct amd_iommu_pci_seg *pci_seg;
-   struct amd_iommu *iommu;
unsigned long flags;
u16 alias;
 
spin_lock_irqsave(_table_lock, flags);
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (!iommu)
-   goto out_unlock;
-
pci_seg = iommu->pci_seg;
table = pci_seg->irq_lookup_table[devid];
if (table)
@@ -2879,18 +2875,14 @@ static struct irq_remap_table *alloc_irq_table(u16 
devid, struct pci_dev *pdev)
return table;
 }
 
-static int alloc_irq_index(u16 devid, int count, bool align,
-  struct pci_dev *pdev)
+static int alloc_irq_index(struct amd_iommu *iommu, u16 devid, int count,
+  bool align, struct pci_dev *pdev)
 {
struct irq_remap_table *table;
int index, c, alignment = 1;
unsigned long flags;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
-   if (!iommu)
-   return -ENODEV;
 
-   table = alloc_irq_table(devid, pdev);
+   table = alloc_irq_table(iommu, devid, pdev);
if (!table)
return -ENODEV;
 
@@ -3262,7 +3254,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
struct irq_remap_table *table;
 
-   table = alloc_irq_table(devid, NULL);
+   table = alloc_irq_table(iommu, devid, NULL);
if (table) {
if (!table->min_index) {
/*
@@ -3282,10 +3274,10 @@ static int irq_remapping_alloc(struct irq_domain 
*domain, unsigned int virq,
   info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX) {
bool align = (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI);
 
-   index = alloc_irq_index(devid, nr_irqs, align,
+   index = alloc_irq_index(iommu, devid, nr_irqs, align,
msi_desc_to_pci_dev(info->desc));
} else {
-   index = alloc_irq_index(devid, nr_irqs, false, NULL);
+   index = alloc_irq_index(iommu, devid, nr_irqs, false, NULL);
}
 
if (index < 0) {
@@ -3411,8 +3403,8 @@ static int irq_remapping_select(struct irq_domain *d, 
struct irq_fwspec *fwspec,
 
if (devid < 0)
return 0;
+   iommu = __rlookup_amd_iommu((devid >> 16), (devid & 0xffff));
 
-   iommu = amd_iommu_rlookup_table[devid];
return iommu && iommu->ir_domain == d;
 }
 
-- 
2.27.0



[RESEND PATCH v1 18/37] iommu/amd: Update amd_irte_ops functions

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Pass the amd_iommu structure as a parameter to the amd_irte_ops functions,
since it is needed to activate/deactivate the IRTEs.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  6 ++--
 drivers/iommu/amd/iommu.c   | 51 -
 2 files changed, 24 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 0ef9ecb8d3fc..990272a470aa 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -999,9 +999,9 @@ struct amd_ir_data {
 
 struct amd_irte_ops {
void (*prepare)(void *, u32, bool, u8, u32, int);
-   void (*activate)(void *, u16, u16);
-   void (*deactivate)(void *, u16, u16);
-   void (*set_affinity)(void *, u16, u16, u8, u32);
+   void (*activate)(struct amd_iommu *iommu, void *, u16, u16);
+   void (*deactivate)(struct amd_iommu *iommu, void *, u16, u16);
+   void (*set_affinity)(struct amd_iommu *iommu, void *, u16, u16, u8, 
u32);
void *(*get)(struct irq_remap_table *, int);
void (*set_allocated)(struct irq_remap_table *, int);
bool (*is_allocated)(struct irq_remap_table *, int);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 0800d69ef7b4..33b69843b860 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2929,19 +2929,14 @@ static int alloc_irq_index(u16 devid, int count, bool 
align,
return index;
 }
 
-static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
- struct amd_ir_data *data)
+static int modify_irte_ga(struct amd_iommu *iommu, u16 devid, int index,
+ struct irte_ga *irte, struct amd_ir_data *data)
 {
bool ret;
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
unsigned long flags;
struct irte_ga *entry;
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu == NULL)
-   return -EINVAL;
-
table = get_irq_table(iommu, devid);
if (!table)
return -ENOMEM;
@@ -2973,16 +2968,12 @@ static int modify_irte_ga(u16 devid, int index, struct 
irte_ga *irte,
return 0;
 }
 
-static int modify_irte(u16 devid, int index, union irte *irte)
+static int modify_irte(struct amd_iommu *iommu,
+  u16 devid, int index, union irte *irte)
 {
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
unsigned long flags;
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu == NULL)
-   return -EINVAL;
-
table = get_irq_table(iommu, devid);
if (!table)
return -ENOMEM;
@@ -3044,49 +3035,49 @@ static void irte_ga_prepare(void *entry,
irte->lo.fields_remap.valid   = 1;
 }
 
-static void irte_activate(void *entry, u16 devid, u16 index)
+static void irte_activate(struct amd_iommu *iommu, void *entry, u16 devid, u16 
index)
 {
union irte *irte = (union irte *) entry;
 
irte->fields.valid = 1;
-   modify_irte(devid, index, irte);
+   modify_irte(iommu, devid, index, irte);
 }
 
-static void irte_ga_activate(void *entry, u16 devid, u16 index)
+static void irte_ga_activate(struct amd_iommu *iommu, void *entry, u16 devid, 
u16 index)
 {
struct irte_ga *irte = (struct irte_ga *) entry;
 
irte->lo.fields_remap.valid = 1;
-   modify_irte_ga(devid, index, irte, NULL);
+   modify_irte_ga(iommu, devid, index, irte, NULL);
 }
 
-static void irte_deactivate(void *entry, u16 devid, u16 index)
+static void irte_deactivate(struct amd_iommu *iommu, void *entry, u16 devid, 
u16 index)
 {
union irte *irte = (union irte *) entry;
 
irte->fields.valid = 0;
-   modify_irte(devid, index, irte);
+   modify_irte(iommu, devid, index, irte);
 }
 
-static void irte_ga_deactivate(void *entry, u16 devid, u16 index)
+static void irte_ga_deactivate(struct amd_iommu *iommu, void *entry, u16 
devid, u16 index)
 {
struct irte_ga *irte = (struct irte_ga *) entry;
 
irte->lo.fields_remap.valid = 0;
-   modify_irte_ga(devid, index, irte, NULL);
+   modify_irte_ga(iommu, devid, index, irte, NULL);
 }
 
-static void irte_set_affinity(void *entry, u16 devid, u16 index,
+static void irte_set_affinity(struct amd_iommu *iommu, void *entry, u16 devid, 
u16 index,
  u8 vector, u32 dest_apicid)
 {
union irte *irte = (union irte *) entry;
 
irte->fields.vector = vector;
irte->fields.destination = dest_apicid;
-   modify_irte(devid, index, irte);
+   modify_irte(iommu, devid, index, irte);
 }
 
-static void irte_ga_set_affinity(void *entry, u16 devid, u16 index,
+static void irte_ga_set_affinity(struct amd_iommu *iommu, void *entry, u16 
devid, u16 index,
 u8 vector, u32 

[RESEND PATCH v1 17/37] iommu/amd: Introduce struct amd_ir_data.iommu

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Add a pointer to struct amd_iommu to amd_ir_data structure, which
can be used to correlate interrupt remapping data to a per-PCI-segment
interrupt remapping table.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  1 +
 drivers/iommu/amd/iommu.c   | 34 +
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index badf49d2371c..0ef9ecb8d3fc 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -981,6 +981,7 @@ struct irq_2_irte {
 
 struct amd_ir_data {
u32 cached_ga_tag;
+   struct amd_iommu *iommu;
struct irq_2_irte irq_2_irte;
struct msi_msg msi_entry;
void *entry;/* Pointer to union irte or struct irte_ga */
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c3941e342fb2..0800d69ef7b4 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2997,16 +2997,11 @@ static int modify_irte(u16 devid, int index, union irte 
*irte)
return 0;
 }
 
-static void free_irte(u16 devid, int index)
+static void free_irte(struct amd_iommu *iommu, u16 devid, int index)
 {
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
unsigned long flags;
 
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu == NULL)
-   return;
-
table = get_irq_table(iommu, devid);
if (!table)
return;
@@ -3190,7 +3185,7 @@ static void irq_remapping_prepare_irte(struct amd_ir_data 
*data,
   int devid, int index, int sub_handle)
 {
struct irq_2_irte *irte_info = >irq_2_irte;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+   struct amd_iommu *iommu = data->iommu;
 
if (!iommu)
return;
@@ -3331,6 +3326,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
goto out_free_data;
}
 
+   data->iommu = iommu;
irq_data->hwirq = (devid << 16) + i;
irq_data->chip_data = data;
irq_data->chip = _ir_chip;
@@ -3347,7 +3343,7 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
kfree(irq_data->chip_data);
}
for (i = 0; i < nr_irqs; i++)
-   free_irte(devid, index + i);
+   free_irte(iommu, devid, index + i);
 out_free_parent:
irq_domain_free_irqs_common(domain, virq, nr_irqs);
return ret;
@@ -3366,7 +3362,7 @@ static void irq_remapping_free(struct irq_domain *domain, 
unsigned int virq,
if (irq_data && irq_data->chip_data) {
data = irq_data->chip_data;
irte_info = >irq_2_irte;
-   free_irte(irte_info->devid, irte_info->index);
+   free_irte(data->iommu, irte_info->devid, 
irte_info->index);
kfree(data->entry);
kfree(data);
}
@@ -3384,7 +3380,7 @@ static int irq_remapping_activate(struct irq_domain 
*domain,
 {
struct amd_ir_data *data = irq_data->chip_data;
struct irq_2_irte *irte_info = >irq_2_irte;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[irte_info->devid];
+   struct amd_iommu *iommu = data->iommu;
struct irq_cfg *cfg = irqd_cfg(irq_data);
 
if (!iommu)
@@ -3401,7 +3397,7 @@ static void irq_remapping_deactivate(struct irq_domain 
*domain,
 {
struct amd_ir_data *data = irq_data->chip_data;
struct irq_2_irte *irte_info = >irq_2_irte;
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[irte_info->devid];
+   struct amd_iommu *iommu = data->iommu;
 
if (iommu)
iommu->irte_ops->deactivate(data->entry, irte_info->devid,
@@ -3497,12 +3493,16 @@ EXPORT_SYMBOL(amd_iommu_deactivate_guest_mode);
 static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
 {
int ret;
-   struct amd_iommu *iommu;
struct amd_iommu_pi_data *pi_data = vcpu_info;
struct vcpu_data *vcpu_pi_info = pi_data->vcpu_data;
struct amd_ir_data *ir_data = data->chip_data;
struct irq_2_irte *irte_info = _data->irq_2_irte;
-   struct iommu_dev_data *dev_data = search_dev_data(NULL, 
irte_info->devid);
+   struct iommu_dev_data *dev_data;
+
+   if (ir_data->iommu == NULL)
+   return -EINVAL;
+
+   dev_data = search_dev_data(ir_data->iommu, irte_info->devid);
 
/* Note:
 * This device has never been set up for guest mode.
@@ -3524,10 +3524,6 @@ static int amd_ir_set_vcpu_affinity(struct irq_data 
*data, void *vcpu_info)
pi_data->is_guest_mode = false;
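
The effect of the new field shows up in the hunks above: once
data->iommu is recorded at allocation time, the activate, deactivate and
free paths use the cached pointer instead of re-deriving the IOMMU from
a global per-devid table. Schematically (names condensed, hypothetical):

struct amd_iommu;

/* Per-interrupt state with a backpointer to the owning IOMMU, so later
 * operations need no devid -> IOMMU lookup at all.
 */
struct ir_data {
	struct amd_iommu *iommu;	/* set once, at alloc time */
	unsigned int devid;
	unsigned int index;
};

static void ir_free(struct ir_data *data)
{
	/* before: iommu = amd_iommu_rlookup_table[data->devid]; */
	struct amd_iommu *iommu = data->iommu;

	if (!iommu)
		return;
	/* ... tear down the IRTE through iommu ... */
}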
 

[RESEND PATCH v1 16/37] iommu/amd: Update irq_remapping_alloc to use IOMMU lookup helper function

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

To allow IOMMU rlookup using both PCI segment and device ID.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/iommu.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index cc200bfaa8c4..c3941e342fb2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3241,8 +3241,9 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
struct irq_alloc_info *info = arg;
struct irq_data *irq_data;
struct amd_ir_data *data = NULL;
+   struct amd_iommu *iommu;
struct irq_cfg *cfg;
-   int i, ret, devid;
+   int i, ret, devid, seg, sbdf;
int index;
 
if (!info)
@@ -3258,8 +3259,14 @@ static int irq_remapping_alloc(struct irq_domain 
*domain, unsigned int virq,
if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI)
info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
 
-   devid = get_devid(info);
-   if (devid < 0)
+   sbdf = get_devid(info);
+   if (sbdf < 0)
+   return -EINVAL;
+
+   seg = sbdf >> 16;
+   devid = sbdf & 0xffff;
+   iommu = __rlookup_amd_iommu(seg, devid);
+   if (!iommu)
return -EINVAL;
 
ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
@@ -3268,7 +3275,6 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
 
if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) {
struct irq_remap_table *table;
-   struct amd_iommu *iommu;
 
table = alloc_irq_table(devid, NULL);
if (table) {
@@ -3278,7 +3284,6 @@ static int irq_remapping_alloc(struct irq_domain *domain, 
unsigned int virq,
 * interrupts.
 */
table->min_index = 32;
-   iommu = amd_iommu_rlookup_table[devid];
for (i = 0; i < 32; ++i)
iommu->irte_ops->set_allocated(table, 
i);
}
-- 
2.27.0



[RESEND PATCH v1 15/37] iommu/amd: Convert to use rlookup_amd_iommu helper function

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Use the rlookup_amd_iommu() helper function, which resolves the IOMMU
through the per-PCI-segment rlookup_table.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 64 +++
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2c18f45fc13d..cc200bfaa8c4 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -229,13 +229,17 @@ static struct iommu_dev_data *search_dev_data(struct 
amd_iommu *iommu, u16 devid
 
 static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
 {
+   struct amd_iommu *iommu;
u16 devid = pci_dev_id(pdev);
 
if (devid == alias)
return 0;
 
-   amd_iommu_rlookup_table[alias] =
-   amd_iommu_rlookup_table[devid];
+   iommu = rlookup_amd_iommu(>dev);
+   if (!iommu)
+   return 0;
+
+   amd_iommu_set_rlookup_table(iommu, alias);
memcpy(amd_iommu_dev_table[alias].data,
   amd_iommu_dev_table[devid].data,
   sizeof(amd_iommu_dev_table[alias].data));
@@ -365,7 +369,7 @@ static bool check_device(struct device *dev)
if (devid > amd_iommu_last_bdf)
return false;
 
-   if (amd_iommu_rlookup_table[devid] == NULL)
+   if (rlookup_amd_iommu(dev) == NULL)
return false;
 
return true;
@@ -1269,7 +1273,9 @@ static int device_flush_iotlb(struct iommu_dev_data 
*dev_data,
int qdep;
 
qdep = dev_data->ats.qdep;
-   iommu= amd_iommu_rlookup_table[dev_data->devid];
+   iommu= rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return -EINVAL;
 
build_inv_iotlb_pages(, dev_data->devid, qdep, address, size);
 
@@ -1294,7 +1300,9 @@ static int device_flush_dte(struct iommu_dev_data 
*dev_data)
u16 alias;
int ret;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return -EINVAL;
 
pdev = to_pci_dev(dev_data->dev);
if (pdev)
@@ -1522,8 +1530,8 @@ static void free_gcr3_table(struct protection_domain 
*domain)
free_page((unsigned long)domain->gcr3_tbl);
 }
 
-static void set_dte_entry(u16 devid, struct protection_domain *domain,
- bool ats, bool ppr)
+static void set_dte_entry(struct amd_iommu *iommu, u16 devid,
+ struct protection_domain *domain, bool ats, bool ppr)
 {
u64 pte_root = 0;
u64 flags = 0;
@@ -1542,8 +1550,6 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain,
flags |= DTE_FLAG_IOTLB;
 
if (ppr) {
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
if (iommu_feature(iommu, FEATURE_EPHSUP))
pte_root |= 1ULL << DEV_ENTRY_PPR;
}
@@ -1587,8 +1593,6 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain,
 * entries for the old domain ID that is being overwritten
 */
if (old_domid) {
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
amd_iommu_flush_tlb_domid(iommu, old_domid);
}
 }
@@ -1608,7 +1612,9 @@ static void do_attach(struct iommu_dev_data *dev_data,
struct amd_iommu *iommu;
bool ats;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return;
ats   = dev_data->ats.enabled;
 
/* Update data structures */
@@ -1620,7 +1626,7 @@ static void do_attach(struct iommu_dev_data *dev_data,
domain->dev_cnt += 1;
 
/* Update device table */
-   set_dte_entry(dev_data->devid, domain,
+   set_dte_entry(iommu, dev_data->devid, domain,
  ats, dev_data->iommu_v2);
clone_aliases(iommu, dev_data->dev);
 
@@ -1632,7 +1638,9 @@ static void do_detach(struct iommu_dev_data *dev_data)
struct protection_domain *domain = dev_data->domain;
struct amd_iommu *iommu;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return;
 
/* Update data structures */
dev_data->domain = NULL;
@@ -1810,13 +1818,14 @@ static struct iommu_device 
*amd_iommu_probe_device(struct device *dev)
 {
struct iommu_device *iommu_dev;
struct amd_iommu *iommu;
-   int ret, devid;
+   int ret;
 
if (!check_device(dev))
return ERR_PTR(-ENODEV);
 
-   devid = get_device_id(dev);
-   iommu = amd_iommu_rlookup_table[devid];
+   iommu = rlookup_amd_iommu(dev);
+   if (!iommu)
+   return ERR_PTR(-ENODEV);
 
if (dev_iommu_priv_get(dev))

[RESEND PATCH v1 14/37] iommu/amd: Convert to use per PCI segment irq_lookup_table

2022-04-04 Thread Vasant Hegde via iommu
Then, remove the global irq_lookup_table.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  2 --
 drivers/iommu/amd/init.c| 19 ---
 drivers/iommu/amd/iommu.c   | 36 ++---
 3 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 6f1900fa86d2..badf49d2371c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -444,8 +444,6 @@ struct irq_remap_table {
u32 *table;
 };
 
-extern struct irq_remap_table **irq_lookup_table;
-
 /* Interrupt remapping feature used? */
 extern bool amd_iommu_irq_remap;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1688532dffb8..29ed687bc43f 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -206,12 +206,6 @@ u16 *amd_iommu_alias_table;
  */
 struct amd_iommu **amd_iommu_rlookup_table;
 
-/*
- * This table is used to find the irq remapping table for a given device id
- * quickly.
- */
-struct irq_remap_table **irq_lookup_table;
-
 /*
  * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
  * to know which ones are already in use.
@@ -2786,11 +2780,6 @@ static struct syscore_ops amd_iommu_syscore_ops = {
 
 static void __init free_iommu_resources(void)
 {
-   kmemleak_free(irq_lookup_table);
-   free_pages((unsigned long)irq_lookup_table,
-  get_order(rlookup_table_size));
-   irq_lookup_table = NULL;
-
kmem_cache_destroy(amd_iommu_irq_cache);
amd_iommu_irq_cache = NULL;
 
@@ -3011,14 +3000,6 @@ static int __init early_amd_iommu_init(void)
if (alloc_irq_lookup_table(pci_seg))
goto out;
}
-
-   irq_lookup_table = (void *)__get_free_pages(
-   GFP_KERNEL | __GFP_ZERO,
-   get_order(rlookup_table_size));
-   kmemleak_alloc(irq_lookup_table, rlookup_table_size,
-  1, GFP_KERNEL);
-   if (!irq_lookup_table)
-   goto out;
}
 
ret = init_memory_definitions(ivrs_base);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 97cae067cbb4..2c18f45fc13d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2727,16 +2727,18 @@ static void set_dte_irq_entry(u16 devid, struct 
irq_remap_table *table)
amd_iommu_dev_table[devid].data[2] = dte;
 }
 
-static struct irq_remap_table *get_irq_table(u16 devid)
+static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 
devid)
 {
struct irq_remap_table *table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
if (WARN_ONCE(!amd_iommu_rlookup_table[devid],
  "%s: no iommu for devid %x\n", __func__, devid))
return NULL;
 
-   table = irq_lookup_table[devid];
-   if (WARN_ONCE(!table, "%s: no table for devid %x\n", __func__, devid))
+   table = pci_seg->irq_lookup_table[devid];
+   if (WARN_ONCE(!table, "%s: no table for devid %x:%x\n",
+ __func__, pci_seg->id, devid))
return NULL;
 
return table;
@@ -2769,7 +2771,9 @@ static struct irq_remap_table *__alloc_irq_table(void)
 static void set_remap_table_entry(struct amd_iommu *iommu, u16 devid,
  struct irq_remap_table *table)
 {
-   irq_lookup_table[devid] = table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   pci_seg->irq_lookup_table[devid] = table;
set_dte_irq_entry(devid, table);
iommu_flush_dte(iommu, devid);
 }
@@ -2778,8 +2782,14 @@ static int set_remap_table_entry_alias(struct pci_dev 
*pdev, u16 alias,
   void *data)
 {
struct irq_remap_table *table = data;
+   struct amd_iommu_pci_seg *pci_seg;
+   struct amd_iommu *iommu = rlookup_amd_iommu(>dev);
 
-   irq_lookup_table[alias] = table;
+   if (!iommu)
+   return -EINVAL;
+
+   pci_seg = iommu->pci_seg;
+   pci_seg->irq_lookup_table[alias] = table;
set_dte_irq_entry(alias, table);
 
iommu_flush_dte(amd_iommu_rlookup_table[alias], alias);
@@ -2803,12 +2813,12 @@ static struct irq_remap_table *alloc_irq_table(u16 
devid, struct pci_dev *pdev)
goto out_unlock;
 
pci_seg = iommu->pci_seg;
-   table = irq_lookup_table[devid];
+   table = pci_seg->irq_lookup_table[devid];
if (table)
goto out_unlock;
 
alias = pci_seg->alias_table[devid];
-   table = irq_lookup_table[alias];
+   table = pci_seg->irq_lookup_table[alias];
if (table) {
set_remap_table_entry(iommu, devid, table);
  

[RESEND PATCH v1 13/37] iommu/amd: Introduce per PCI segment rlookup table size

2022-04-04 Thread Vasant Hegde via iommu
It will replace global "rlookup_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c| 11 ++-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 4bed64ad2068..6f1900fa86d2 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -553,6 +553,9 @@ struct amd_iommu_pci_seg {
/* Size of the alias table */
u32 alias_table_size;
 
+   /* Size of the rlookup table */
+   u32 rlookup_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d4e4f49066f8..1688532dffb8 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -671,7 +671,7 @@ static inline int __init alloc_rlookup_table(struct 
amd_iommu_pci_seg *pci_seg)
 {
pci_seg->rlookup_table = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
-   get_order(rlookup_table_size));
+   
get_order(pci_seg->rlookup_table_size));
if (pci_seg->rlookup_table == NULL)
return -ENOMEM;
 
@@ -681,7 +681,7 @@ static inline int __init alloc_rlookup_table(struct 
amd_iommu_pci_seg *pci_seg)
 static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->rlookup_table,
-  get_order(rlookup_table_size));
+  get_order(pci_seg->rlookup_table_size));
pci_seg->rlookup_table = NULL;
 }
 
@@ -689,9 +689,9 @@ static inline int __init alloc_irq_lookup_table(struct 
amd_iommu_pci_seg *pci_se
 {
pci_seg->irq_lookup_table = (void *)__get_free_pages(
 GFP_KERNEL | __GFP_ZERO,
-get_order(rlookup_table_size));
+
get_order(pci_seg->rlookup_table_size));
kmemleak_alloc(pci_seg->irq_lookup_table,
-  rlookup_table_size, 1, GFP_KERNEL);
+  pci_seg->rlookup_table_size, 1, GFP_KERNEL);
if (pci_seg->irq_lookup_table == NULL)
return -ENOMEM;
 
@@ -702,7 +702,7 @@ static inline void free_irq_lookup_table(struct 
amd_iommu_pci_seg *pci_seg)
 {
kmemleak_free(pci_seg->irq_lookup_table);
free_pages((unsigned long)pci_seg->irq_lookup_table,
-  get_order(rlookup_table_size));
+  get_order(pci_seg->rlookup_table_size));
pci_seg->irq_lookup_table = NULL;
 }
 
@@ -1583,6 +1583,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id,
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
+   pci_seg->rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RESEND PATCH v1 12/37] iommu/amd: Introduce per PCI segment alias table size

2022-04-04 Thread Vasant Hegde via iommu
It will replace global "alias_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 3 +++
 drivers/iommu/amd/init.c| 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index aa666d0723ba..4bed64ad2068 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -550,6 +550,9 @@ struct amd_iommu_pci_seg {
/* Size of the device table */
u32 dev_table_size;
 
+   /* Size of the alias table */
+   u32 alias_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index f8da686182b5..d4e4f49066f8 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -711,7 +711,7 @@ static int __init alloc_alias_table(struct 
amd_iommu_pci_seg *pci_seg)
int i;
 
pci_seg->alias_table = (void *)__get_free_pages(GFP_KERNEL,
-   
get_order(alias_table_size));
+   get_order(pci_seg->alias_table_size));
if (!pci_seg->alias_table)
return -ENOMEM;
 
@@ -727,7 +727,7 @@ static int __init alloc_alias_table(struct 
amd_iommu_pci_seg *pci_seg)
 static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->alias_table,
-  get_order(alias_table_size));
+  get_order(pci_seg->alias_table_size));
pci_seg->alias_table = NULL;
 }
 
@@ -1582,6 +1582,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id,
pci_seg->last_bdf = last_bdf;
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
+   pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RESEND PATCH v1 11/37] iommu/amd: Introduce per PCI segment device table size

2022-04-04 Thread Vasant Hegde via iommu
With multiple pci segment support, number of BDF supported by each
segment may differ. Hence introduce per segment device table size
which depends on last_bdf. This will replace global
"device_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c| 18 ++
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index e39e7db54e69..aa666d0723ba 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -547,6 +547,9 @@ struct amd_iommu_pci_seg {
/* Largest PCI device id we expect translation requests for */
u16 last_bdf;
 
+   /* Size of the device table */
+   u32 dev_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 71f39551a83a..f8da686182b5 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -414,6 +414,7 @@ static void iommu_set_cwwb_range(struct amd_iommu *iommu)
 static void iommu_set_device_table(struct amd_iommu *iommu)
 {
u64 entry;
+   u32 dev_table_size = iommu->pci_seg->dev_table_size;
 
BUG_ON(iommu->mmio_base == NULL);
 
@@ -651,7 +652,7 @@ static int __init find_last_devid_acpi(struct 
acpi_table_header *table, u16 pci_
 static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
pci_seg->dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | 
GFP_DMA32,
- 
get_order(dev_table_size));
+ 
get_order(pci_seg->dev_table_size));
if (!pci_seg->dev_table)
return -ENOMEM;
 
@@ -661,7 +662,7 @@ static inline int __init alloc_dev_table(struct 
amd_iommu_pci_seg *pci_seg)
 static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->dev_table,
-   get_order(dev_table_size));
+   get_order(pci_seg->dev_table_size));
pci_seg->dev_table = NULL;
 }
 
@@ -1034,7 +1035,7 @@ static bool __copy_device_table(struct amd_iommu *iommu)
entry = (((u64) hi) << 32) + lo;
 
old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
-   if (old_devtb_size != dev_table_size) {
+   if (old_devtb_size != pci_seg->dev_table_size) {
pr_err("The device table size of IOMMU:%d is not expected!\n",
iommu->index);
return false;
@@ -1053,15 +1054,15 @@ static bool __copy_device_table(struct amd_iommu *iommu)
}
old_devtb = (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) && 
is_kdump_kernel())
? (__force void *)ioremap_encrypted(old_devtb_phys,
-   dev_table_size)
-   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   pci_seg->dev_table_size)
+   : memremap(old_devtb_phys, pci_seg->dev_table_size, 
MEMREMAP_WB);
 
if (!old_devtb)
return false;
 
gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
pci_seg->old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
-   get_order(dev_table_size));
+   
get_order(pci_seg->dev_table_size));
if (pci_seg->old_dev_tbl_cpy == NULL) {
pr_err("Failed to allocate memory for copying old device 
table!\n");
memunmap(old_devtb);
@@ -1580,6 +1581,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id,
 
pci_seg->last_bdf = last_bdf;
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
+   pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
@@ -2675,7 +2677,7 @@ static void early_enable_iommus(void)
for_each_pci_segment(pci_seg) {
if (pci_seg->old_dev_tbl_cpy != NULL) {
free_pages((unsigned 
long)pci_seg->old_dev_tbl_cpy,
-   get_order(dev_table_size));
+   
get_order(pci_seg->dev_table_size));
pci_seg->old_dev_tbl_cpy = NULL;
}
}
@@ -2689,7 +2691,7 @@ static void early_enable_iommus(void)
 
for_each_pci_segment(pci_seg) {
free_pages((unsigned long)pci_seg->dev_table,
-  get_order(dev_table_size));
+  

[RESEND PATCH v1 10/37] iommu/amd: Introduce per PCI segment last_bdf

2022-04-04 Thread Vasant Hegde via iommu
Current code uses global "amd_iommu_last_bdf" to track the last bdf
supported by the system. This value is used for various memory
allocation, device data flushing, etc.

Introduce per PCI segment last_bdf which will be used to track last bdf
supported by the given PCI segment and use this value for all per
segment memory allocations. Eventually it will replace global
"amd_iommu_last_bdf".

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 ++
 drivers/iommu/amd/init.c| 68 ++---
 2 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index c4c9c35e2bf7..e39e7db54e69 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -544,6 +544,9 @@ struct amd_iommu_pci_seg {
/* PCI segment number */
u16 id;
 
+   /* Largest PCI device id we expect translation requests for */
+   u16 last_bdf;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d613e20ea013..71f39551a83a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -550,6 +550,7 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
 {
u8 *p = (void *)h, *end = (void *)h;
struct ivhd_entry *dev;
+   int last_devid = -EINVAL;
 
u32 ivhd_size = get_ivhd_header_size(h);
 
@@ -567,6 +568,7 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
case IVHD_DEV_ALL:
/* Use maximum BDF value for DEV_ALL */
update_last_devid(0x);
+   return 0x;
break;
case IVHD_DEV_SELECT:
case IVHD_DEV_RANGE_END:
@@ -574,6 +576,8 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
case IVHD_DEV_EXT_SELECT:
/* all the above subfield types refer to device ids */
update_last_devid(dev->devid);
+   if (dev->devid > last_devid)
+   last_devid = dev->devid;
break;
default:
break;
@@ -583,7 +587,7 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
 
WARN_ON(p != end);
 
-   return 0;
+   return last_devid;
 }
 
 static int __init check_ivrs_checksum(struct acpi_table_header *table)
@@ -607,27 +611,31 @@ static int __init check_ivrs_checksum(struct 
acpi_table_header *table)
  * id which we need to handle. This is the first of three functions which parse
  * the ACPI table. So we check the checksum here.
  */
-static int __init find_last_devid_acpi(struct acpi_table_header *table)
+static int __init find_last_devid_acpi(struct acpi_table_header *table, u16 
pci_seg)
 {
u8 *p = (u8 *)table, *end = (u8 *)table;
struct ivhd_header *h;
+   int last_devid, last_bdf = 0;
 
p += IVRS_HEADER_LENGTH;
 
end += table->length;
while (p < end) {
h = (struct ivhd_header *)p;
-   if (h->type == amd_iommu_target_ivhd_type) {
-   int ret = find_last_devid_from_ivhd(h);
-
-   if (ret)
-   return ret;
+   if (h->pci_seg == pci_seg &&
+   h->type == amd_iommu_target_ivhd_type) {
+   last_devid = find_last_devid_from_ivhd(h);
+
+   if (last_devid < 0)
+   return -EINVAL;
+   if (last_devid > last_bdf)
+   last_bdf = last_devid;
}
p += h->length;
}
WARN_ON(p != end);
 
-   return 0;
+   return last_bdf;
 }
 
 /
@@ -1551,14 +1559,28 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
 }
 
 /* Allocate PCI segment data structure */
-static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
+static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id,
+ struct acpi_table_header *ivrs_base)
 {
struct amd_iommu_pci_seg *pci_seg;
+   int last_bdf;
+
+   /*
+* First parse ACPI tables to find the largest Bus/Dev/Func we need to
+* handle in this PCI segment. Upon this information the shared data
+* structures for the PCI segments in the system will be allocated.
+*/
+   last_bdf = find_last_devid_acpi(ivrs_base, id);
+   if (last_bdf < 0)
+   return NULL;
 
pci_seg = kzalloc(sizeof(struct amd_iommu_pci_seg), GFP_KERNEL);
if (pci_seg == NULL)
 

[RESEND PATCH v1 09/37] iommu/amd: Introduce per PCI segment unity map list

2022-04-04 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments. In order to support
multiple PCI segments IVMD table in IVRS structure is enhanced to
include pci segment id. Update ivmd_header structure to include "pci_seg".

Also introduce per PCI segment unity map list. It will replace global
amd_iommu_unity_map list.

Note that we have used "reserved" field in IVMD table to include "pci_seg
id" which was set to zero. It will take care of backward compatibility
(new kernel will work fine on older systems).

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 13 +++--
 drivers/iommu/amd/init.c| 30 +++--
 drivers/iommu/amd/iommu.c   |  8 +++-
 3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index f9776f188e36..c4c9c35e2bf7 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -579,6 +579,13 @@ struct amd_iommu_pci_seg {
 * More than one device can share the same requestor id.
 */
u16 *alias_table;
+
+   /*
+* A list of required unity mappings we find in ACPI. It is not locked
+* because as runtime it is only read. It is created at ACPI table
+* parsing time.
+*/
+   struct list_head unity_map;
 };
 
 /*
@@ -805,12 +812,6 @@ struct unity_map_entry {
int prot;
 };
 
-/*
- * List of all unity mappings. It is not locked because as runtime it is only
- * read. It is created at ACPI table parsing time.
- */
-extern struct list_head amd_iommu_unity_map;
-
 /*
  * Data structures for device handling
  */
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index fe31de6e764c..d613e20ea013 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -142,7 +142,8 @@ struct ivmd_header {
u16 length;
u16 devid;
u16 aux;
-   u64 resv;
+   u16 pci_seg;
+   u8  resv[6];
u64 range_start;
u64 range_length;
 } __attribute__((packed));
@@ -162,8 +163,6 @@ static int amd_iommu_target_ivhd_type;
 
 u16 amd_iommu_last_bdf;/* largest PCI device id we have
   to handle */
-LIST_HEAD(amd_iommu_unity_map);/* a list of required unity 
mappings
-  we find in ACPI */
 
 LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
@@ -1562,6 +1561,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
+   INIT_LIST_HEAD(_seg->unity_map);
list_add_tail(_seg->list, _iommu_pci_seg_list);
 
if (alloc_dev_table(pci_seg))
@@ -2397,10 +2397,13 @@ static int iommu_init_irq(struct amd_iommu *iommu)
 static void __init free_unity_maps(void)
 {
struct unity_map_entry *entry, *next;
+   struct amd_iommu_pci_seg *p, *pci_seg;
 
-   list_for_each_entry_safe(entry, next, _iommu_unity_map, list) {
-   list_del(>list);
-   kfree(entry);
+   for_each_pci_segment_safe(pci_seg, p) {
+   list_for_each_entry_safe(entry, next, _seg->unity_map, 
list) {
+   list_del(>list);
+   kfree(entry);
+   }
}
 }
 
@@ -2408,8 +2411,13 @@ static void __init free_unity_maps(void)
 static int __init init_unity_map_range(struct ivmd_header *m)
 {
struct unity_map_entry *e = NULL;
+   struct amd_iommu_pci_seg *pci_seg;
char *s;
 
+   pci_seg = get_pci_segment(m->pci_seg);
+   if (pci_seg == NULL)
+   return -ENOMEM;
+
e = kzalloc(sizeof(*e), GFP_KERNEL);
if (e == NULL)
return -ENOMEM;
@@ -2447,14 +2455,16 @@ static int __init init_unity_map_range(struct 
ivmd_header *m)
if (m->flags & IVMD_FLAG_EXCL_RANGE)
e->prot = (IVMD_FLAG_IW | IVMD_FLAG_IR) >> 1;
 
-   DUMP_printk("%s devid_start: %02x:%02x.%x devid_end: %02x:%02x.%x"
-   " range_start: %016llx range_end: %016llx flags: %x\n", s,
+   DUMP_printk("%s devid_start: %04x:%02x:%02x.%x devid_end: "
+   "%04x:%02x:%02x.%x range_start: %016llx range_end: %016llx"
+   " flags: %x\n", s, m->pci_seg,
PCI_BUS_NUM(e->devid_start), PCI_SLOT(e->devid_start),
-   PCI_FUNC(e->devid_start), PCI_BUS_NUM(e->devid_end),
+   PCI_FUNC(e->devid_start), m->pci_seg,
+   PCI_BUS_NUM(e->devid_end),
PCI_SLOT(e->devid_end), PCI_FUNC(e->devid_end),
e->address_start, e->address_end, m->flags);
 
-   list_add_tail(>list, _iommu_unity_map);
+   

[RESEND PATCH v1 08/37] iommu/amd: Introduce per PCI segment alias_table

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

This will replace global alias table (amd_iommu_alias_table).

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  7 +
 drivers/iommu/amd/init.c| 41 ++---
 drivers/iommu/amd/iommu.c   | 41 ++---
 3 files changed, 64 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 330bb346207a..f9776f188e36 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -572,6 +572,13 @@ struct amd_iommu_pci_seg {
 * will be copied to. It's only be used in kdump kernel.
 */
struct dev_table_entry *old_dev_tbl_cpy;
+
+   /*
+* The alias table is a driver specific data structure which contains 
the
+* mappings of the PCI device ids to the actual requestor ids on the 
IOMMU.
+* More than one device can share the same requestor id.
+*/
+   u16 *alias_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index af413738da01..fe31de6e764c 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -698,6 +698,31 @@ static inline void free_irq_lookup_table(struct 
amd_iommu_pci_seg *pci_seg)
pci_seg->irq_lookup_table = NULL;
 }
 
+static int __init alloc_alias_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   int i;
+
+   pci_seg->alias_table = (void *)__get_free_pages(GFP_KERNEL,
+   
get_order(alias_table_size));
+   if (!pci_seg->alias_table)
+   return -ENOMEM;
+
+   /*
+* let all alias entries point to itself
+*/
+   for (i = 0; i <= amd_iommu_last_bdf; ++i)
+   pci_seg->alias_table[i] = i;
+
+   return 0;
+}
+
+static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->alias_table,
+  get_order(alias_table_size));
+   pci_seg->alias_table = NULL;
+}
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1266,6 +1291,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
u32 dev_i, ext_flags = 0;
bool alias = false;
struct ivhd_entry *e;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
u32 ivhd_size;
int ret;
 
@@ -1347,7 +1373,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
devid_to = e->ext >> 8;
set_dev_entry_from_acpi(iommu, devid   , e->flags, 0);
set_dev_entry_from_acpi(iommu, devid_to, e->flags, 0);
-   amd_iommu_alias_table[devid] = devid_to;
+   pci_seg->alias_table[devid] = devid_to;
break;
case IVHD_DEV_ALIAS_RANGE:
 
@@ -1405,7 +1431,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
devid = e->devid;
for (dev_i = devid_start; dev_i <= devid; ++dev_i) {
if (alias) {
-   amd_iommu_alias_table[dev_i] = devid_to;
+   pci_seg->alias_table[dev_i] = devid_to;
set_dev_entry_from_acpi(iommu,
devid_to, flags, ext_flags);
}
@@ -1540,6 +1566,8 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
 
if (alloc_dev_table(pci_seg))
return NULL;
+   if (alloc_alias_table(pci_seg))
+   return NULL;
if (alloc_rlookup_table(pci_seg))
return NULL;
 
@@ -1566,6 +1594,7 @@ static void __init free_pci_segment(void)
list_del(_seg->list);
free_irq_lookup_table(pci_seg);
free_rlookup_table(pci_seg);
+   free_alias_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
}
@@ -2838,7 +2867,7 @@ static void __init ivinfo_init(void *ivrs)
 static int __init early_amd_iommu_init(void)
 {
struct acpi_table_header *ivrs_base;
-   int i, remap_cache_sz, ret;
+   int remap_cache_sz, ret;
acpi_status status;
 
if (!amd_iommu_detected)
@@ -2909,12 +2938,6 @@ static int __init early_amd_iommu_init(void)
if (amd_iommu_pd_alloc_bitmap == NULL)
goto out;
 
-   /*
-* let all alias entries point to itself
-*/
-   for (i = 0; i <= amd_iommu_last_bdf; ++i)
-   amd_iommu_alias_table[i] = i;
-
/*
 * never allocate domain 0 because its used as the non-allocated and

[RESEND PATCH v1 07/37] iommu/amd: Introduce per PCI segment old_dev_tbl_cpy

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

It will remove global old_dev_tbl_cpy. Also update copy_device_table()
copy device table for all PCI segments.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |   6 ++
 drivers/iommu/amd/init.c| 109 
 2 files changed, 70 insertions(+), 45 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 7bf35e3a1ed6..330bb346207a 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -566,6 +566,12 @@ struct amd_iommu_pci_seg {
 * device id quickly.
 */
struct irq_remap_table **irq_lookup_table;
+
+   /*
+* Pointer to a device table which the content of old device table
+* will be copied to. It's only be used in kdump kernel.
+*/
+   struct dev_table_entry *old_dev_tbl_cpy;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 27785a558d9c..af413738da01 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -193,11 +193,6 @@ bool amd_iommu_force_isolation __read_mostly;
  * page table root pointer.
  */
 struct dev_table_entry *amd_iommu_dev_table;
-/*
- * Pointer to a device table which the content of old device table
- * will be copied to. It's only be used in kdump kernel.
- */
-static struct dev_table_entry *old_dev_tbl_cpy;
 
 /*
  * The alias table is a driver specific data structure which contains the
@@ -990,39 +985,27 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
 }
 
 
-static bool copy_device_table(void)
+static bool __copy_device_table(struct amd_iommu *iommu)
 {
-   u64 int_ctl, int_tab_len, entry = 0, last_entry = 0;
+   u64 int_ctl, int_tab_len, entry = 0;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
struct dev_table_entry *old_devtb = NULL;
u32 lo, hi, devid, old_devtb_size;
phys_addr_t old_devtb_phys;
-   struct amd_iommu *iommu;
u16 dom_id, dte_v, irq_v;
gfp_t gfp_flag;
u64 tmp;
 
-   if (!amd_iommu_pre_enabled)
-   return false;
-
-   pr_warn("Translation is already enabled - trying to copy translation 
structures\n");
-   for_each_iommu(iommu) {
-   /* All IOMMUs should use the same device table with the same 
size */
-   lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
-   hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
-   entry = (((u64) hi) << 32) + lo;
-   if (last_entry && last_entry != entry) {
-   pr_err("IOMMU:%d should use the same dev table as 
others!\n",
-   iommu->index);
-   return false;
-   }
-   last_entry = entry;
+   /* Each IOMMU use separate device table with the same size */
+   lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
+   hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
+   entry = (((u64) hi) << 32) + lo;
 
-   old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
-   if (old_devtb_size != dev_table_size) {
-   pr_err("The device table size of IOMMU:%d is not 
expected!\n",
-   iommu->index);
-   return false;
-   }
+   old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
+   if (old_devtb_size != dev_table_size) {
+   pr_err("The device table size of IOMMU:%d is not expected!\n",
+   iommu->index);
+   return false;
}
 
/*
@@ -1045,31 +1028,31 @@ static bool copy_device_table(void)
return false;
 
gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
-   old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
-   get_order(dev_table_size));
-   if (old_dev_tbl_cpy == NULL) {
+   pci_seg->old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
+   get_order(dev_table_size));
+   if (pci_seg->old_dev_tbl_cpy == NULL) {
pr_err("Failed to allocate memory for copying old device 
table!\n");
memunmap(old_devtb);
return false;
}
 
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
-   old_dev_tbl_cpy[devid] = old_devtb[devid];
+   pci_seg->old_dev_tbl_cpy[devid] = old_devtb[devid];
dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
 
if (dte_v && dom_id) {
-   old_dev_tbl_cpy[devid].data[0] = 
old_devtb[devid].data[0];
-   old_dev_tbl_cpy[devid].data[1] = 
old_devtb[devid].data[1];
+   

[RESEND PATCH v1 06/37] iommu/amd: Introduce per PCI segment dev_data_list

2022-04-04 Thread Vasant Hegde via iommu
This will replace global dev_data_list.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c|  1 +
 drivers/iommu/amd/iommu.c   | 21 ++---
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index d507c96598a7..7bf35e3a1ed6 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -538,6 +538,9 @@ struct protection_domain {
 struct amd_iommu_pci_seg {
struct list_head list;
 
+   /* List of all available dev_data structures */
+   struct llist_head dev_data_list;
+
/* PCI segment number */
u16 id;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 48db6c3164aa..27785a558d9c 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1525,6 +1525,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
return NULL;
 
pci_seg->id = id;
+   init_llist_head(_seg->dev_data_list);
list_add_tail(_seg->list, _iommu_pci_seg_list);
 
if (alloc_dev_table(pci_seg))
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a8baa64c8f9c..2bea72f388b2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -62,9 +62,6 @@
 
 static DEFINE_SPINLOCK(pd_bitmap_lock);
 
-/* List of all available dev_data structures */
-static LLIST_HEAD(dev_data_list);
-
 LIST_HEAD(ioapic_map);
 LIST_HEAD(hpet_map);
 LIST_HEAD(acpihid_map);
@@ -195,9 +192,10 @@ static struct protection_domain *to_pdomain(struct 
iommu_domain *dom)
return container_of(dom, struct protection_domain, domain);
 }
 
-static struct iommu_dev_data *alloc_dev_data(u16 devid)
+static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 
devid)
 {
struct iommu_dev_data *dev_data;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
dev_data = kzalloc(sizeof(*dev_data), GFP_KERNEL);
if (!dev_data)
@@ -207,19 +205,20 @@ static struct iommu_dev_data *alloc_dev_data(u16 devid)
dev_data->devid = devid;
ratelimit_default_init(_data->rs);
 
-   llist_add(_data->dev_data_list, _data_list);
+   llist_add(_data->dev_data_list, _seg->dev_data_list);
return dev_data;
 }
 
-static struct iommu_dev_data *search_dev_data(u16 devid)
+static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 
devid)
 {
struct iommu_dev_data *dev_data;
struct llist_node *node;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
-   if (llist_empty(_data_list))
+   if (llist_empty(_seg->dev_data_list))
return NULL;
 
-   node = dev_data_list.first;
+   node = pci_seg->dev_data_list.first;
llist_for_each_entry(dev_data, node, dev_data_list) {
if (dev_data->devid == devid)
return dev_data;
@@ -287,10 +286,10 @@ static struct iommu_dev_data *find_dev_data(u16 devid)
struct iommu_dev_data *dev_data;
struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
-   dev_data = search_dev_data(devid);
+   dev_data = search_dev_data(iommu, devid);
 
if (dev_data == NULL) {
-   dev_data = alloc_dev_data(devid);
+   dev_data = alloc_dev_data(iommu, devid);
if (!dev_data)
return NULL;
 
@@ -3461,7 +3460,7 @@ static int amd_ir_set_vcpu_affinity(struct irq_data 
*data, void *vcpu_info)
struct vcpu_data *vcpu_pi_info = pi_data->vcpu_data;
struct amd_ir_data *ir_data = data->chip_data;
struct irq_2_irte *irte_info = _data->irq_2_irte;
-   struct iommu_dev_data *dev_data = search_dev_data(irte_info->devid);
+   struct iommu_dev_data *dev_data = search_dev_data(NULL, 
irte_info->devid);
 
/* Note:
 * This device has never been set up for guest mode.
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RESEND PATCH v1 05/37] iommu/amd: Introduce per PCI segment irq_lookup_table

2022-04-04 Thread Vasant Hegde via iommu
This will replace global irq lookup table (irq_lookup_table).

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  6 ++
 drivers/iommu/amd/init.c| 27 +++
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 9c008662be1b..d507c96598a7 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -557,6 +557,12 @@ struct amd_iommu_pci_seg {
 * device id.
 */
struct amd_iommu **rlookup_table;
+
+   /*
+* This table is used to find the irq remapping table for a given
+* device id quickly.
+*/
+   struct irq_remap_table **irq_lookup_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index a2efc02ba80a..48db6c3164aa 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -682,6 +682,26 @@ static inline void free_rlookup_table(struct 
amd_iommu_pci_seg *pci_seg)
pci_seg->rlookup_table = NULL;
 }
 
+static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg 
*pci_seg)
+{
+   pci_seg->irq_lookup_table = (void *)__get_free_pages(
+GFP_KERNEL | __GFP_ZERO,
+get_order(rlookup_table_size));
+   kmemleak_alloc(pci_seg->irq_lookup_table,
+  rlookup_table_size, 1, GFP_KERNEL);
+   if (pci_seg->irq_lookup_table == NULL)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   kmemleak_free(pci_seg->irq_lookup_table);
+   free_pages((unsigned long)pci_seg->irq_lookup_table,
+  get_order(rlookup_table_size));
+   pci_seg->irq_lookup_table = NULL;
+}
 
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
@@ -1533,6 +1553,7 @@ static void __init free_pci_segment(void)
 
for_each_pci_segment_safe(pci_seg, next) {
list_del(_seg->list);
+   free_irq_lookup_table(pci_seg);
free_rlookup_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
@@ -2896,6 +2917,7 @@ static int __init early_amd_iommu_init(void)
amd_iommu_irq_remap = check_ioapic_information();
 
if (amd_iommu_irq_remap) {
+   struct amd_iommu_pci_seg *pci_seg;
/*
 * Interrupt remapping enabled, create kmem_cache for the
 * remapping tables.
@@ -2912,6 +2934,11 @@ static int __init early_amd_iommu_init(void)
if (!amd_iommu_irq_cache)
goto out;
 
+   for_each_pci_segment(pci_seg) {
+   if (alloc_irq_lookup_table(pci_seg))
+   goto out;
+   }
+
irq_lookup_table = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
get_order(rlookup_table_size));
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RESEND PATCH v1 04/37] iommu/amd: Introduce per PCI segment rlookup table

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

This will replace global rlookup table (amd_iommu_rlookup_table).
Also add helper functions to set/get rlookup table for the given device.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  8 ++
 drivers/iommu/amd/init.c| 23 +++
 drivers/iommu/amd/iommu.c   | 44 +
 4 files changed, 76 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 885570cd0d77..2947239700ce 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -19,6 +19,7 @@ extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
 extern void amd_iommu_init_notifier(void);
 extern int amd_iommu_init_api(void);
+extern void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
 
 #ifdef CONFIG_AMD_IOMMU_DEBUGFS
 void amd_iommu_debugfs_setup(struct amd_iommu *iommu);
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 404feb7995cc..9c008662be1b 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -486,6 +486,7 @@ struct amd_iommu_fault {
 };
 
 
+struct amd_iommu;
 struct iommu_domain;
 struct irq_domain;
 struct amd_irte_ops;
@@ -549,6 +550,13 @@ struct amd_iommu_pci_seg {
 * page table root pointer.
 */
struct dev_table_entry *dev_table;
+
+   /*
+* The rlookup iommu table is used to find the IOMMU which is
+* responsible for a specific device. It is indexed by the PCI
+* device id.
+*/
+   struct amd_iommu **rlookup_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 0fd1071bfc85..a2efc02ba80a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -663,6 +663,26 @@ static inline void free_dev_table(struct amd_iommu_pci_seg 
*pci_seg)
pci_seg->dev_table = NULL;
 }
 
+/* Allocate per PCI segment IOMMU rlookup table. */
+static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   pci_seg->rlookup_table = (void *)__get_free_pages(
+   GFP_KERNEL | __GFP_ZERO,
+   get_order(rlookup_table_size));
+   if (pci_seg->rlookup_table == NULL)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->rlookup_table,
+  get_order(rlookup_table_size));
+   pci_seg->rlookup_table = NULL;
+}
+
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1489,6 +1509,8 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
 
if (alloc_dev_table(pci_seg))
return NULL;
+   if (alloc_rlookup_table(pci_seg))
+   return NULL;
 
return pci_seg;
 }
@@ -1511,6 +1533,7 @@ static void __init free_pci_segment(void)
 
for_each_pci_segment_safe(pci_seg, next) {
list_del(_seg->list);
+   free_rlookup_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
}
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 52058c0d4f62..a8baa64c8f9c 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -146,6 +146,50 @@ struct dev_table_entry *get_dev_table(struct amd_iommu 
*iommu)
return dev_table;
 }
 
+static inline u16 get_device_segment(struct device *dev)
+{
+   u16 seg;
+
+   if (dev_is_pci(dev)) {
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   seg = pci_domain_nr(pdev->bus);
+   } else {
+   u32 devid = get_acpihid_device_id(dev, NULL);
+
+   seg = (devid >> 16) & 0x;
+   }
+
+   return seg;
+}
+
+/* Writes the specific IOMMU for a device into the PCI segment rlookup table */
+void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid)
+{
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   pci_seg->rlookup_table[devid] = iommu;
+}
+
+static struct amd_iommu *__rlookup_amd_iommu(u16 seg, u16 devid)
+{
+   struct amd_iommu_pci_seg *pci_seg;
+
+   for_each_pci_segment(pci_seg) {
+   if (pci_seg->id == seg)
+   return pci_seg->rlookup_table[devid];
+   }
+   return NULL;
+}
+
+static struct amd_iommu *rlookup_amd_iommu(struct device *dev)
+{
+   u16 seg = get_device_segment(dev);
+   u16 devid = get_device_id(dev);
+
+   return __rlookup_amd_iommu(seg, devid);
+}
+
 static struct protection_domain *to_pdomain(struct iommu_domain *dom)
 {
return 

[RESEND PATCH v1 03/37] iommu/amd: Introduce per PCI segment device table

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Introduce per PCI segment device table. All IOMMUs within the segment
will share this device table. This will replace global device
table i.e. amd_iommu_dev_table.

Also introduce helper function to get the device table for the given IOMMU.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu.h   |  1 +
 drivers/iommu/amd/amd_iommu_types.h | 10 ++
 drivers/iommu/amd/init.c| 26 --
 drivers/iommu/amd/iommu.c   | 12 
 4 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 1ab31074f5b3..885570cd0d77 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -128,4 +128,5 @@ static inline void amd_iommu_apply_ivrs_quirks(void) { }
 
 extern void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
 u64 *root, int mode);
+extern struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
 #endif
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 62442d88978f..404feb7995cc 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -539,6 +539,16 @@ struct amd_iommu_pci_seg {
 
/* PCI segment number */
u16 id;
+
+   /*
+* device table virtual address
+*
+* Pointer to the per PCI segment device table.
+* It is indexed by the PCI device id or the HT unit id and contains
+* information about the domain the device belongs to as well as the
+* page table root pointer.
+*/
+   struct dev_table_entry *dev_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index e01eae9dcbc1..0fd1071bfc85 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -640,11 +640,29 @@ static int __init find_last_devid_acpi(struct 
acpi_table_header *table)
  *
  * The following functions belong to the code path which parses the ACPI table
  * the second time. In this ACPI parsing iteration we allocate IOMMU specific
- * data structures, initialize the device/alias/rlookup table and also
- * basically initialize the hardware.
+ * data structures, initialize the per PCI segment device/alias/rlookup table
+ * and also basically initialize the hardware.
  *
  /
 
+/* Allocate per PCI segment device table */
+static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   pci_seg->dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | 
GFP_DMA32,
+ 
get_order(dev_table_size));
+   if (!pci_seg->dev_table)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->dev_table,
+   get_order(dev_table_size));
+   pci_seg->dev_table = NULL;
+}
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1469,6 +1487,9 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
pci_seg->id = id;
list_add_tail(_seg->list, _iommu_pci_seg_list);
 
+   if (alloc_dev_table(pci_seg))
+   return NULL;
+
return pci_seg;
 }
 
@@ -1490,6 +1511,7 @@ static void __init free_pci_segment(void)
 
for_each_pci_segment_safe(pci_seg, next) {
list_del(_seg->list);
+   free_dev_table(pci_seg);
kfree(pci_seg);
}
 }
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7a29e2645dc4..52058c0d4f62 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -134,6 +134,18 @@ static inline int get_device_id(struct device *dev)
return devid;
 }
 
+struct dev_table_entry *get_dev_table(struct amd_iommu *iommu)
+{
+   struct dev_table_entry *dev_table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   BUG_ON(pci_seg == NULL);
+   dev_table = pci_seg->dev_table;
+   BUG_ON(dev_table == NULL);
+
+   return dev_table;
+}
+
 static struct protection_domain *to_pdomain(struct iommu_domain *dom)
 {
return container_of(dom, struct protection_domain, domain);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RESEND PATCH v1 02/37] iommu/amd: Introduce pci segment structure

2022-04-04 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments, where each segment
contains one or more IOMMU instances. However, an IOMMU instance can only
support a single PCI segment.

Current code assumes that system contains only one pci segment (segment 0)
and creates global data structures such as device table, rlookup table,
etc.

Introducing per PCI segment data structure, which contains segment
specific data structures. This will eventually replace the global
data structures.

Also update `amd_iommu->pci_seg` variable to point to PCI segment
structure instead of PCI segment ID.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 23 ++-
 drivers/iommu/amd/init.c| 46 -
 2 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 06235b7cb13d..62442d88978f 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -452,6 +452,11 @@ extern bool amd_iommu_irq_remap;
 /* kmem_cache to get tables with 128 byte alignement */
 extern struct kmem_cache *amd_iommu_irq_cache;
 
+/* Make iterating over all pci segment easier */
+#define for_each_pci_segment(pci_seg) \
+   list_for_each_entry((pci_seg), _iommu_pci_seg_list, list)
+#define for_each_pci_segment_safe(pci_seg, next) \
+   list_for_each_entry_safe((pci_seg), (next), _iommu_pci_seg_list, 
list)
 /*
  * Make iterating over all IOMMUs easier
  */
@@ -526,6 +531,16 @@ struct protection_domain {
unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
 };
 
+/*
+ * This structure contains information about one PCI segment in the system.
+ */
+struct amd_iommu_pci_seg {
+   struct list_head list;
+
+   /* PCI segment number */
+   u16 id;
+};
+
 /*
  * Structure where we save information about one hardware AMD IOMMU in the
  * system.
@@ -577,7 +592,7 @@ struct amd_iommu {
u16 cap_ptr;
 
/* pci domain of this IOMMU */
-   u16 pci_seg;
+   struct amd_iommu_pci_seg *pci_seg;
 
/* start of exclusion range of that IOMMU */
u64 exclusion_start;
@@ -705,6 +720,12 @@ extern struct list_head ioapic_map;
 extern struct list_head hpet_map;
 extern struct list_head acpihid_map;
 
+/*
+ * List with all PCI segments in the system. This list is not locked because
+ * it is only written at driver initialization time
+ */
+extern struct list_head amd_iommu_pci_seg_list;
+
 /*
  * List with all IOMMUs in the system. This list is not locked because it is
  * only written and read at driver initialization or suspend time
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index b4a798c7b347..e01eae9dcbc1 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -165,6 +165,7 @@ u16 amd_iommu_last_bdf; /* largest PCI 
device id we have
 LIST_HEAD(amd_iommu_unity_map);/* a list of required unity 
mappings
   we find in ACPI */
 
+LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
   system */
 
@@ -1456,6 +1457,43 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
return 0;
 }
 
+/* Allocate PCI segment data structure */
+static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
+{
+   struct amd_iommu_pci_seg *pci_seg;
+
+   pci_seg = kzalloc(sizeof(struct amd_iommu_pci_seg), GFP_KERNEL);
+   if (pci_seg == NULL)
+   return NULL;
+
+   pci_seg->id = id;
+   list_add_tail(_seg->list, _iommu_pci_seg_list);
+
+   return pci_seg;
+}
+
+static struct amd_iommu_pci_seg *__init get_pci_segment(u16 id)
+{
+   struct amd_iommu_pci_seg *pci_seg;
+
+   for_each_pci_segment(pci_seg) {
+   if (pci_seg->id == id)
+   return pci_seg;
+   }
+
+   return alloc_pci_segment(id);
+}
+
+static void __init free_pci_segment(void)
+{
+   struct amd_iommu_pci_seg *pci_seg, *next;
+
+   for_each_pci_segment_safe(pci_seg, next) {
+   list_del(_seg->list);
+   kfree(pci_seg);
+   }
+}
+
 static void __init free_iommu_one(struct amd_iommu *iommu)
 {
free_cwwb_sem(iommu);
@@ -1542,8 +1580,14 @@ static void amd_iommu_ats_write_check_workaround(struct 
amd_iommu *iommu)
  */
 static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header 
*h)
 {
+   struct amd_iommu_pci_seg *pci_seg;
int ret;
 
+   pci_seg = get_pci_segment(h->pci_seg);
+   if (pci_seg == NULL)
+   return -ENOMEM;
+   iommu->pci_seg = pci_seg;
+
raw_spin_lock_init(>lock);
iommu->cmd_sem_val = 0;
 
@@ -1564,7 +1608,6 @@ static int __init 

[RESEND PATCH v1 01/37] iommu/amd: Update struct iommu_dev_data defination

2022-04-04 Thread Vasant Hegde via iommu
struct iommu_dev_data contains member "pdev" to point to pci_dev. This is
valid for only PCI devices and for other devices this will be NULL. This
causes unnecessary "pdev != NULL" check at various places.

Replace "struct pci_dev" member with "struct device" and use
to_pci_dev() to get pci device reference as needed. Also adjust
setup_aliases() and clone_aliases() function.

No functional change intended.

Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  2 +-
 drivers/iommu/amd/iommu.c   | 27 +++
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 47108ed44fbb..06235b7cb13d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -685,7 +685,7 @@ struct iommu_dev_data {
struct list_head list;/* For domain->dev_list */
struct llist_node dev_data_list;  /* For global dev_data_list */
struct protection_domain *domain; /* Domain the device is bound to */
-   struct pci_dev *pdev;
+   struct device *dev;
u16 devid;/* PCI Device ID */
bool iommu_v2;/* Device can make use of IOMMUv2 */
struct {
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a1ada7bff44e..7a29e2645dc4 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -188,8 +188,10 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, 
void *data)
return 0;
 }
 
-static void clone_aliases(struct pci_dev *pdev)
+static void clone_aliases(struct device *dev)
 {
+   struct pci_dev *pdev = to_pci_dev(dev);
+
if (!pdev)
return;
 
@@ -203,14 +205,14 @@ static void clone_aliases(struct pci_dev *pdev)
pci_for_each_dma_alias(pdev, clone_alias, NULL);
 }
 
-static struct pci_dev *setup_aliases(struct device *dev)
+static void setup_aliases(struct device *dev)
 {
struct pci_dev *pdev = to_pci_dev(dev);
u16 ivrs_alias;
 
/* For ACPI HID devices, there are no aliases */
if (!dev_is_pci(dev))
-   return NULL;
+   return;
 
/*
 * Add the IVRS alias to the pci aliases if it is on the same
@@ -221,9 +223,7 @@ static struct pci_dev *setup_aliases(struct device *dev)
PCI_BUS_NUM(ivrs_alias) == pdev->bus->number)
pci_add_dma_alias(pdev, ivrs_alias & 0xff, 1);
 
-   clone_aliases(pdev);
-
-   return pdev;
+   clone_aliases(dev);
 }
 
 static struct iommu_dev_data *find_dev_data(u16 devid)
@@ -331,7 +331,8 @@ static int iommu_init_device(struct device *dev)
if (!dev_data)
return -ENOMEM;
 
-   dev_data->pdev = setup_aliases(dev);
+   dev_data->dev = dev;
+   setup_aliases(dev);
 
/*
 * By default we use passthrough mode for IOMMUv2 capable device.
@@ -1232,13 +1233,15 @@ static int device_flush_dte_alias(struct pci_dev *pdev, 
u16 alias, void *data)
 static int device_flush_dte(struct iommu_dev_data *dev_data)
 {
struct amd_iommu *iommu;
+   struct pci_dev *pdev;
u16 alias;
int ret;
 
iommu = amd_iommu_rlookup_table[dev_data->devid];
 
-   if (dev_data->pdev)
-   ret = pci_for_each_dma_alias(dev_data->pdev,
+   pdev = to_pci_dev(dev_data->dev);
+   if (pdev)
+   ret = pci_for_each_dma_alias(pdev,
 device_flush_dte_alias, iommu);
else
ret = iommu_flush_dte(iommu, dev_data->devid);
@@ -1561,7 +1564,7 @@ static void do_attach(struct iommu_dev_data *dev_data,
/* Update device table */
set_dte_entry(dev_data->devid, domain,
  ats, dev_data->iommu_v2);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
 
device_flush_dte(dev_data);
 }
@@ -1577,7 +1580,7 @@ static void do_detach(struct iommu_dev_data *dev_data)
dev_data->domain = NULL;
list_del(_data->list);
clear_dte_entry(dev_data->devid);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
 
/* Flush the DTE entry */
device_flush_dte(dev_data);
@@ -1818,7 +1821,7 @@ static void update_device_table(struct protection_domain 
*domain)
list_for_each_entry(dev_data, >dev_list, list) {
set_dte_entry(dev_data->devid, domain,
  dev_data->ats.enabled, dev_data->iommu_v2);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
}
 }
 
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RESEND PATCH v1 00/37] iommu/amd: Add multiple PCI segments support

2022-04-04 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments, where each segment
contains one or more IOMMU instances. However, an IOMMU instance can only
support a single PCI segment.

Current code assumes a system contains only one PCI segment (segment 0)
and creates global data structures such as device table, rlookup table,
etc.

This series introduces per-PCI-segment data structure, which contains
device table, alias table, etc. For each PCI segment, all IOMMUs
share the same data structure. The series also makes necessary code
adjustment and logging enhancements. Finally it removes global data
structures like device table, alias table, etc.

In case of system w/ single PCI segment (e.g. PCI segment ID is zero),
IOMMU driver allocates one PCI segment data structure, which will
be shared by all IOMMUs.

Patch 1 Updates struct iommu_dev_data defination.

Patch 2 - 13 introduce  new PCI segment structure and allocate per
data structures, and introduce the amd_iommu.pci_seg pointer to point
to the corresponded pci_segment structure. Also, we have introduced
a helper function rlookup_amd_iommu() to reverse-lookup each iommu
for a particular device.

Patch 14 - 29 adopt to per PCI segment data structure and removes
global data structure.

Patch 30 fixes flushing logic to flush upto last_bdf.

Patch 31 - 37 convert usages of 16-bit PCI device ID to include
16-bit segment ID.


RFC patchset : 
https://lore.kernel.org/linux-iommu/20220311094854.31595-1-vasant.he...@amd.com/T/#t

Changes in RFC -> v1:
  - Rebased patches on top of iommu/next tree.
  - Update struct iommu_dev_data defination
  - Updated few log message to print segment ID
  - Fix smatch warnings


Regards,
Vasant


Suravee Suthikulpanit (21):
  iommu/amd: Introduce per PCI segment device table
  iommu/amd: Introduce per PCI segment rlookup table
  iommu/amd: Introduce per PCI segment old_dev_tbl_cpy
  iommu/amd: Introduce per PCI segment alias_table
  iommu/amd: Convert to use rlookup_amd_iommu helper function
  iommu/amd: Update irq_remapping_alloc to use IOMMU lookup helper function
  iommu/amd: Introduce struct amd_ir_data.iommu
  iommu/amd: Update amd_irte_ops functions
  iommu/amd: Update alloc_irq_table and alloc_irq_index
  iommu/amd: Update set_dte_entry and clear_dte_entry
  iommu/amd: Update iommu_ignore_device
  iommu/amd: Update dump_dte_entry
  iommu/amd: Update set_dte_irq_entry
  iommu/amd: Update (un)init_device_table_dma()
  iommu/amd: Update set_dev_entry_bit() and get_dev_entry_bit()
  iommu/amd: Remove global amd_iommu_dev_table
  iommu/amd: Remove global amd_iommu_alias_table
  iommu/amd: Introduce get_device_sbdf_id() helper function
  iommu/amd: Include PCI segment ID when initialize IOMMU
  iommu/amd: Specify PCI segment ID when getting pci device
  iommu/amd: Add PCI segment support for ivrs_ioapic, ivrs_hpet, ivrs_acpihid 
commands

Vasant Hegde (16):
  iommu/amd: Update struct iommu_dev_data defination
  iommu/amd: Introduce pci segment structure
  iommu/amd: Introduce per PCI segment irq_lookup_table
  iommu/amd: Introduce per PCI segment dev_data_list
  iommu/amd: Introduce per PCI segment unity map list
  iommu/amd: Introduce per PCI segment last_bdf
  iommu/amd: Introduce per PCI segment device table size
  iommu/amd: Introduce per PCI segment alias table size
  iommu/amd: Introduce per PCI segment rlookup table size
  iommu/amd: Convert to use per PCI segment irq_lookup_table
  iommu/amd: Convert to use per PCI segment rlookup_table
  iommu/amd: Remove global amd_iommu_last_bdf
  iommu/amd: Flush upto last_bdf only
  iommu/amd: Print PCI segment ID in error log messages
  iommu/amd: Update device_state structure to include PCI seg ID
  iommu/amd: Update amd_iommu_fault structure to include PCI seg ID

 .../admin-guide/kernel-parameters.txt |  34 +-
 drivers/iommu/amd/amd_iommu.h |  13 +-
 drivers/iommu/amd/amd_iommu_types.h   | 127 +++-
 drivers/iommu/amd/init.c  | 683 +++---
 drivers/iommu/amd/iommu.c | 540 --
 drivers/iommu/amd/iommu_v2.c  |  67 +-
 drivers/iommu/amd/quirks.c|   4 +-
 7 files changed, 884 insertions(+), 584 deletions(-)

-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v1 15/37] iommu/amd: Convert to use rlookup_amd_iommu helper function

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Use rlookup_amd_iommu() helper function which will give per PCI
segment rlookup_table.

Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/iommu.c | 64 +++
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 2c18f45fc13d..cc200bfaa8c4 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -229,13 +229,17 @@ static struct iommu_dev_data *search_dev_data(struct 
amd_iommu *iommu, u16 devid
 
 static int clone_alias(struct pci_dev *pdev, u16 alias, void *data)
 {
+   struct amd_iommu *iommu;
u16 devid = pci_dev_id(pdev);
 
if (devid == alias)
return 0;
 
-   amd_iommu_rlookup_table[alias] =
-   amd_iommu_rlookup_table[devid];
+   iommu = rlookup_amd_iommu(>dev);
+   if (!iommu)
+   return 0;
+
+   amd_iommu_set_rlookup_table(iommu, alias);
memcpy(amd_iommu_dev_table[alias].data,
   amd_iommu_dev_table[devid].data,
   sizeof(amd_iommu_dev_table[alias].data));
@@ -365,7 +369,7 @@ static bool check_device(struct device *dev)
if (devid > amd_iommu_last_bdf)
return false;
 
-   if (amd_iommu_rlookup_table[devid] == NULL)
+   if (rlookup_amd_iommu(dev) == NULL)
return false;
 
return true;
@@ -1269,7 +1273,9 @@ static int device_flush_iotlb(struct iommu_dev_data 
*dev_data,
int qdep;
 
qdep = dev_data->ats.qdep;
-   iommu= amd_iommu_rlookup_table[dev_data->devid];
+   iommu= rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return -EINVAL;
 
build_inv_iotlb_pages(&cmd, dev_data->devid, qdep, address, size);
 
@@ -1294,7 +1300,9 @@ static int device_flush_dte(struct iommu_dev_data 
*dev_data)
u16 alias;
int ret;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return -EINVAL;
 
pdev = to_pci_dev(dev_data->dev);
if (pdev)
@@ -1522,8 +1530,8 @@ static void free_gcr3_table(struct protection_domain 
*domain)
free_page((unsigned long)domain->gcr3_tbl);
 }
 
-static void set_dte_entry(u16 devid, struct protection_domain *domain,
- bool ats, bool ppr)
+static void set_dte_entry(struct amd_iommu *iommu, u16 devid,
+ struct protection_domain *domain, bool ats, bool ppr)
 {
u64 pte_root = 0;
u64 flags = 0;
@@ -1542,8 +1550,6 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain,
flags |= DTE_FLAG_IOTLB;
 
if (ppr) {
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
if (iommu_feature(iommu, FEATURE_EPHSUP))
pte_root |= 1ULL << DEV_ENTRY_PPR;
}
@@ -1587,8 +1593,6 @@ static void set_dte_entry(u16 devid, struct 
protection_domain *domain,
 * entries for the old domain ID that is being overwritten
 */
if (old_domid) {
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
amd_iommu_flush_tlb_domid(iommu, old_domid);
}
 }
@@ -1608,7 +1612,9 @@ static void do_attach(struct iommu_dev_data *dev_data,
struct amd_iommu *iommu;
bool ats;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return;
ats   = dev_data->ats.enabled;
 
/* Update data structures */
@@ -1620,7 +1626,7 @@ static void do_attach(struct iommu_dev_data *dev_data,
domain->dev_cnt += 1;
 
/* Update device table */
-   set_dte_entry(dev_data->devid, domain,
+   set_dte_entry(iommu, dev_data->devid, domain,
  ats, dev_data->iommu_v2);
clone_aliases(iommu, dev_data->dev);
 
@@ -1632,7 +1638,9 @@ static void do_detach(struct iommu_dev_data *dev_data)
struct protection_domain *domain = dev_data->domain;
struct amd_iommu *iommu;
 
-   iommu = amd_iommu_rlookup_table[dev_data->devid];
+   iommu = rlookup_amd_iommu(dev_data->dev);
+   if (!iommu)
+   return;
 
/* Update data structures */
dev_data->domain = NULL;
@@ -1810,13 +1818,14 @@ static struct iommu_device 
*amd_iommu_probe_device(struct device *dev)
 {
struct iommu_device *iommu_dev;
struct amd_iommu *iommu;
-   int ret, devid;
+   int ret;
 
if (!check_device(dev))
return ERR_PTR(-ENODEV);
 
-   devid = get_device_id(dev);
-   iommu = amd_iommu_rlookup_table[devid];
+   iommu = rlookup_amd_iommu(dev);
+   if (!iommu)
+   return ERR_PTR(-ENODEV);
 
if (dev_iommu_priv_get(dev))

[PATCH v1 14/37] iommu/amd: Convert to use per PCI segment irq_lookup_table

2022-04-04 Thread Vasant Hegde via iommu
Then, remove the global irq_lookup_table.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  2 --
 drivers/iommu/amd/init.c| 19 ---
 drivers/iommu/amd/iommu.c   | 36 ++---
 3 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 6f1900fa86d2..badf49d2371c 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -444,8 +444,6 @@ struct irq_remap_table {
u32 *table;
 };
 
-extern struct irq_remap_table **irq_lookup_table;
-
 /* Interrupt remapping feature used? */
 extern bool amd_iommu_irq_remap;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1688532dffb8..29ed687bc43f 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -206,12 +206,6 @@ u16 *amd_iommu_alias_table;
  */
 struct amd_iommu **amd_iommu_rlookup_table;
 
-/*
- * This table is used to find the irq remapping table for a given device id
- * quickly.
- */
-struct irq_remap_table **irq_lookup_table;
-
 /*
  * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
  * to know which ones are already in use.
@@ -2786,11 +2780,6 @@ static struct syscore_ops amd_iommu_syscore_ops = {
 
 static void __init free_iommu_resources(void)
 {
-   kmemleak_free(irq_lookup_table);
-   free_pages((unsigned long)irq_lookup_table,
-  get_order(rlookup_table_size));
-   irq_lookup_table = NULL;
-
kmem_cache_destroy(amd_iommu_irq_cache);
amd_iommu_irq_cache = NULL;
 
@@ -3011,14 +3000,6 @@ static int __init early_amd_iommu_init(void)
if (alloc_irq_lookup_table(pci_seg))
goto out;
}
-
-   irq_lookup_table = (void *)__get_free_pages(
-   GFP_KERNEL | __GFP_ZERO,
-   get_order(rlookup_table_size));
-   kmemleak_alloc(irq_lookup_table, rlookup_table_size,
-  1, GFP_KERNEL);
-   if (!irq_lookup_table)
-   goto out;
}
 
ret = init_memory_definitions(ivrs_base);
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 97cae067cbb4..2c18f45fc13d 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2727,16 +2727,18 @@ static void set_dte_irq_entry(u16 devid, struct 
irq_remap_table *table)
amd_iommu_dev_table[devid].data[2] = dte;
 }
 
-static struct irq_remap_table *get_irq_table(u16 devid)
+static struct irq_remap_table *get_irq_table(struct amd_iommu *iommu, u16 devid)
 {
struct irq_remap_table *table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
if (WARN_ONCE(!amd_iommu_rlookup_table[devid],
  "%s: no iommu for devid %x\n", __func__, devid))
return NULL;
 
-   table = irq_lookup_table[devid];
-   if (WARN_ONCE(!table, "%s: no table for devid %x\n", __func__, devid))
+   table = pci_seg->irq_lookup_table[devid];
+   if (WARN_ONCE(!table, "%s: no table for devid %x:%x\n",
+ __func__, pci_seg->id, devid))
return NULL;
 
return table;
@@ -2769,7 +2771,9 @@ static struct irq_remap_table *__alloc_irq_table(void)
 static void set_remap_table_entry(struct amd_iommu *iommu, u16 devid,
  struct irq_remap_table *table)
 {
-   irq_lookup_table[devid] = table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   pci_seg->irq_lookup_table[devid] = table;
set_dte_irq_entry(devid, table);
iommu_flush_dte(iommu, devid);
 }
@@ -2778,8 +2782,14 @@ static int set_remap_table_entry_alias(struct pci_dev 
*pdev, u16 alias,
   void *data)
 {
struct irq_remap_table *table = data;
+   struct amd_iommu_pci_seg *pci_seg;
+   struct amd_iommu *iommu = rlookup_amd_iommu(&pdev->dev);
 
-   irq_lookup_table[alias] = table;
+   if (!iommu)
+   return -EINVAL;
+
+   pci_seg = iommu->pci_seg;
+   pci_seg->irq_lookup_table[alias] = table;
set_dte_irq_entry(alias, table);
 
iommu_flush_dte(amd_iommu_rlookup_table[alias], alias);
@@ -2803,12 +2813,12 @@ static struct irq_remap_table *alloc_irq_table(u16 
devid, struct pci_dev *pdev)
goto out_unlock;
 
pci_seg = iommu->pci_seg;
-   table = irq_lookup_table[devid];
+   table = pci_seg->irq_lookup_table[devid];
if (table)
goto out_unlock;
 
alias = pci_seg->alias_table[devid];
-   table = irq_lookup_table[alias];
+   table = pci_seg->irq_lookup_table[alias];
if (table) {
set_remap_table_entry(iommu, devid, table);
  

[PATCH v1 13/37] iommu/amd: Introduce per PCI segment rlookup table size

2022-04-04 Thread Vasant Hegde via iommu
It will replace the global "rlookup_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c| 11 ++-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 4bed64ad2068..6f1900fa86d2 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -553,6 +553,9 @@ struct amd_iommu_pci_seg {
/* Size of the alias table */
u32 alias_table_size;
 
+   /* Size of the rlookup table */
+   u32 rlookup_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d4e4f49066f8..1688532dffb8 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -671,7 +671,7 @@ static inline int __init alloc_rlookup_table(struct 
amd_iommu_pci_seg *pci_seg)
 {
pci_seg->rlookup_table = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
-   get_order(rlookup_table_size));
+   get_order(pci_seg->rlookup_table_size));
if (pci_seg->rlookup_table == NULL)
return -ENOMEM;
 
@@ -681,7 +681,7 @@ static inline int __init alloc_rlookup_table(struct 
amd_iommu_pci_seg *pci_seg)
 static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->rlookup_table,
-  get_order(rlookup_table_size));
+  get_order(pci_seg->rlookup_table_size));
pci_seg->rlookup_table = NULL;
 }
 
@@ -689,9 +689,9 @@ static inline int __init alloc_irq_lookup_table(struct 
amd_iommu_pci_seg *pci_se
 {
pci_seg->irq_lookup_table = (void *)__get_free_pages(
 GFP_KERNEL | __GFP_ZERO,
-get_order(rlookup_table_size));
+get_order(pci_seg->rlookup_table_size));
kmemleak_alloc(pci_seg->irq_lookup_table,
-  rlookup_table_size, 1, GFP_KERNEL);
+  pci_seg->rlookup_table_size, 1, GFP_KERNEL);
if (pci_seg->irq_lookup_table == NULL)
return -ENOMEM;
 
@@ -702,7 +702,7 @@ static inline void free_irq_lookup_table(struct 
amd_iommu_pci_seg *pci_seg)
 {
kmemleak_free(pci_seg->irq_lookup_table);
free_pages((unsigned long)pci_seg->irq_lookup_table,
-  get_order(rlookup_table_size));
+  get_order(pci_seg->rlookup_table_size));
pci_seg->irq_lookup_table = NULL;
 }
 
@@ -1583,6 +1583,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id,
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
+   pci_seg->rlookup_table_size = tbl_size(RLOOKUP_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
-- 
2.27.0



[PATCH v1 12/37] iommu/amd: Introduce per PCI segment alias table size

2022-04-04 Thread Vasant Hegde via iommu
It will replace the global "alias_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h | 3 +++
 drivers/iommu/amd/init.c| 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index aa666d0723ba..4bed64ad2068 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -550,6 +550,9 @@ struct amd_iommu_pci_seg {
/* Size of the device table */
u32 dev_table_size;
 
+   /* Size of the alias table */
+   u32 alias_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index f8da686182b5..d4e4f49066f8 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -711,7 +711,7 @@ static int __init alloc_alias_table(struct 
amd_iommu_pci_seg *pci_seg)
int i;
 
pci_seg->alias_table = (void *)__get_free_pages(GFP_KERNEL,
-   get_order(alias_table_size));
+   get_order(pci_seg->alias_table_size));
if (!pci_seg->alias_table)
return -ENOMEM;
 
@@ -727,7 +727,7 @@ static int __init alloc_alias_table(struct 
amd_iommu_pci_seg *pci_seg)
 static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->alias_table,
-  get_order(alias_table_size));
+  get_order(pci_seg->alias_table_size));
pci_seg->alias_table = NULL;
 }
 
@@ -1582,6 +1582,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id,
pci_seg->last_bdf = last_bdf;
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
+   pci_seg->alias_table_size   = tbl_size(ALIAS_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
-- 
2.27.0



[PATCH v1 11/37] iommu/amd: Introduce per PCI segment device table size

2022-04-04 Thread Vasant Hegde via iommu
With multiple PCI segment support, the number of BDFs supported by each
segment may differ. Hence introduce a per segment device table size
which depends on last_bdf. This will replace the global
"device_table_size" variable.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
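For illustration (assuming the driver's usual 32-byte device table
entry): a segment whose IVHD reports last_bdf = 0xFFFF still needs the
full (0xFFFF + 1) * 32 = 2 MiB device table, while a segment that only
reports devices up to BDF 0x00FF can get away with (0x00FF + 1) * 32 =
8 KiB.
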
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c| 18 ++
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index e39e7db54e69..aa666d0723ba 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -547,6 +547,9 @@ struct amd_iommu_pci_seg {
/* Largest PCI device id we expect translation requests for */
u16 last_bdf;
 
+   /* Size of the device table */
+   u32 dev_table_size;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 71f39551a83a..f8da686182b5 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -414,6 +414,7 @@ static void iommu_set_cwwb_range(struct amd_iommu *iommu)
 static void iommu_set_device_table(struct amd_iommu *iommu)
 {
u64 entry;
+   u32 dev_table_size = iommu->pci_seg->dev_table_size;
 
BUG_ON(iommu->mmio_base == NULL);
 
@@ -651,7 +652,7 @@ static int __init find_last_devid_acpi(struct 
acpi_table_header *table, u16 pci_
 static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
pci_seg->dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
- get_order(dev_table_size));
+ get_order(pci_seg->dev_table_size));
if (!pci_seg->dev_table)
return -ENOMEM;
 
@@ -661,7 +662,7 @@ static inline int __init alloc_dev_table(struct 
amd_iommu_pci_seg *pci_seg)
 static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
 {
free_pages((unsigned long)pci_seg->dev_table,
-   get_order(dev_table_size));
+   get_order(pci_seg->dev_table_size));
pci_seg->dev_table = NULL;
 }
 
@@ -1034,7 +1035,7 @@ static bool __copy_device_table(struct amd_iommu *iommu)
entry = (((u64) hi) << 32) + lo;
 
old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
-   if (old_devtb_size != dev_table_size) {
+   if (old_devtb_size != pci_seg->dev_table_size) {
pr_err("The device table size of IOMMU:%d is not expected!\n",
iommu->index);
return false;
@@ -1053,15 +1054,15 @@ static bool __copy_device_table(struct amd_iommu *iommu)
}
old_devtb = (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT) && is_kdump_kernel())
? (__force void *)ioremap_encrypted(old_devtb_phys,
-   dev_table_size)
-   : memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+   pci_seg->dev_table_size)
+   : memremap(old_devtb_phys, pci_seg->dev_table_size, MEMREMAP_WB);
 
if (!old_devtb)
return false;
 
gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
pci_seg->old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
-   get_order(dev_table_size));
+   get_order(pci_seg->dev_table_size));
if (pci_seg->old_dev_tbl_cpy == NULL) {
pr_err("Failed to allocate memory for copying old device 
table!\n");
memunmap(old_devtb);
@@ -1580,6 +1581,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id,
 
pci_seg->last_bdf = last_bdf;
DUMP_printk("PCI segment : 0x%0x, last bdf : 0x%04x\n", id, last_bdf);
+   pci_seg->dev_table_size = tbl_size(DEV_TABLE_ENTRY_SIZE);
 
pci_seg->id = id;
init_llist_head(_seg->dev_data_list);
@@ -2675,7 +2677,7 @@ static void early_enable_iommus(void)
for_each_pci_segment(pci_seg) {
if (pci_seg->old_dev_tbl_cpy != NULL) {
free_pages((unsigned long)pci_seg->old_dev_tbl_cpy,
-   get_order(dev_table_size));
+   get_order(pci_seg->dev_table_size));
pci_seg->old_dev_tbl_cpy = NULL;
}
}
@@ -2689,7 +2691,7 @@ static void early_enable_iommus(void)
 
for_each_pci_segment(pci_seg) {
free_pages((unsigned long)pci_seg->dev_table,
-  get_order(dev_table_size));
+  

[PATCH v1 10/37] iommu/amd: Introduce per PCI segment last_bdf

2022-04-04 Thread Vasant Hegde via iommu
Current code uses the global "amd_iommu_last_bdf" to track the last bdf
supported by the system. This value is used for various memory
allocations, device data flushing, etc.

Introduce a per PCI segment last_bdf which will be used to track the
last bdf supported by the given PCI segment and use this value for all
per-segment memory allocations. Eventually it will replace the global
"amd_iommu_last_bdf".

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
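(Sketch, assuming the tbl_size()-style sizing used later in the series;
the exact helper may differ. The point is that each table is sized from
the segment's own last_bdf rather than the global one.)

	static inline u32 seg_tbl_size(struct amd_iommu_pci_seg *pci_seg,
				       int entry_size)
	{
		return PAGE_SIZE << get_order((pci_seg->last_bdf + 1) * entry_size);
	}
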
 drivers/iommu/amd/amd_iommu_types.h |  3 ++
 drivers/iommu/amd/init.c| 68 ++---
 2 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index c4c9c35e2bf7..e39e7db54e69 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -544,6 +544,9 @@ struct amd_iommu_pci_seg {
/* PCI segment number */
u16 id;
 
+   /* Largest PCI device id we expect translation requests for */
+   u16 last_bdf;
+
/*
 * device table virtual address
 *
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d613e20ea013..71f39551a83a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -550,6 +550,7 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
 {
u8 *p = (void *)h, *end = (void *)h;
struct ivhd_entry *dev;
+   int last_devid = -EINVAL;
 
u32 ivhd_size = get_ivhd_header_size(h);
 
@@ -567,6 +568,7 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
case IVHD_DEV_ALL:
/* Use maximum BDF value for DEV_ALL */
update_last_devid(0xffff);
+   return 0xffff;
break;
case IVHD_DEV_SELECT:
case IVHD_DEV_RANGE_END:
@@ -574,6 +576,8 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
case IVHD_DEV_EXT_SELECT:
/* all the above subfield types refer to device ids */
update_last_devid(dev->devid);
+   if (dev->devid > last_devid)
+   last_devid = dev->devid;
break;
default:
break;
@@ -583,7 +587,7 @@ static int __init find_last_devid_from_ivhd(struct 
ivhd_header *h)
 
WARN_ON(p != end);
 
-   return 0;
+   return last_devid;
 }
 
 static int __init check_ivrs_checksum(struct acpi_table_header *table)
@@ -607,27 +611,31 @@ static int __init check_ivrs_checksum(struct 
acpi_table_header *table)
  * id which we need to handle. This is the first of three functions which parse
  * the ACPI table. So we check the checksum here.
  */
-static int __init find_last_devid_acpi(struct acpi_table_header *table)
+static int __init find_last_devid_acpi(struct acpi_table_header *table, u16 pci_seg)
 {
u8 *p = (u8 *)table, *end = (u8 *)table;
struct ivhd_header *h;
+   int last_devid, last_bdf = 0;
 
p += IVRS_HEADER_LENGTH;
 
end += table->length;
while (p < end) {
h = (struct ivhd_header *)p;
-   if (h->type == amd_iommu_target_ivhd_type) {
-   int ret = find_last_devid_from_ivhd(h);
-
-   if (ret)
-   return ret;
+   if (h->pci_seg == pci_seg &&
+   h->type == amd_iommu_target_ivhd_type) {
+   last_devid = find_last_devid_from_ivhd(h);
+
+   if (last_devid < 0)
+   return -EINVAL;
+   if (last_devid > last_bdf)
+   last_bdf = last_devid;
}
p += h->length;
}
WARN_ON(p != end);
 
-   return 0;
+   return last_bdf;
 }
 
 /
@@ -1551,14 +1559,28 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
 }
 
 /* Allocate PCI segment data structure */
-static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
+static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id,
+ struct acpi_table_header *ivrs_base)
 {
struct amd_iommu_pci_seg *pci_seg;
+   int last_bdf;
+
+   /*
+* First parse ACPI tables to find the largest Bus/Dev/Func we need to
+* handle in this PCI segment. Upon this information the shared data
+* structures for the PCI segments in the system will be allocated.
+*/
+   last_bdf = find_last_devid_acpi(ivrs_base, id);
+   if (last_bdf < 0)
+   return NULL;
 
pci_seg = kzalloc(sizeof(struct amd_iommu_pci_seg), GFP_KERNEL);
if (pci_seg == NULL)
 

[PATCH v1 09/37] iommu/amd: Introduce per PCI segment unity map list

2022-04-04 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments. In order to support
multiple PCI segments, the IVMD table in the IVRS structure is enhanced
to include a PCI segment ID. Update the ivmd_header structure to include
"pci_seg".

Also introduce a per PCI segment unity map list. It will replace the
global amd_iommu_unity_map list.

Note that the previously-reserved field in the IVMD table, which firmware
always set to zero, is reused to carry the "pci_seg" ID. This takes care
of backward compatibility: a new kernel will work fine on older systems,
where the field reads as segment 0.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
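(Illustrative only - devid_in_unity_map() is a made-up consumer showing
how unity ranges are now found per segment rather than via the global
list:)

	static bool devid_in_unity_map(struct amd_iommu_pci_seg *pci_seg,
				       u16 devid)
	{
		struct unity_map_entry *entry;

		list_for_each_entry(entry, &pci_seg->unity_map, list) {
			if (devid >= entry->devid_start &&
			    devid <= entry->devid_end)
				return true;
		}
		return false;
	}
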
 drivers/iommu/amd/amd_iommu_types.h | 13 +++--
 drivers/iommu/amd/init.c| 30 +++--
 drivers/iommu/amd/iommu.c   |  8 +++-
 3 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index f9776f188e36..c4c9c35e2bf7 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -579,6 +579,13 @@ struct amd_iommu_pci_seg {
 * More than one device can share the same requestor id.
 */
u16 *alias_table;
+
+   /*
+* A list of required unity mappings we find in ACPI. It is not locked
+* because as runtime it is only read. It is created at ACPI table
+* parsing time.
+*/
+   struct list_head unity_map;
 };
 
 /*
@@ -805,12 +812,6 @@ struct unity_map_entry {
int prot;
 };
 
-/*
- * List of all unity mappings. It is not locked because as runtime it is only
- * read. It is created at ACPI table parsing time.
- */
-extern struct list_head amd_iommu_unity_map;
-
 /*
  * Data structures for device handling
  */
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index fe31de6e764c..d613e20ea013 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -142,7 +142,8 @@ struct ivmd_header {
u16 length;
u16 devid;
u16 aux;
-   u64 resv;
+   u16 pci_seg;
+   u8  resv[6];
u64 range_start;
u64 range_length;
 } __attribute__((packed));
@@ -162,8 +163,6 @@ static int amd_iommu_target_ivhd_type;
 
 u16 amd_iommu_last_bdf;/* largest PCI device id we have
   to handle */
-LIST_HEAD(amd_iommu_unity_map);/* a list of required unity mappings
-  we find in ACPI */
 
 LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
@@ -1562,6 +1561,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
 
pci_seg->id = id;
init_llist_head(&pci_seg->dev_data_list);
+   INIT_LIST_HEAD(&pci_seg->unity_map);
list_add_tail(&pci_seg->list, &amd_iommu_pci_seg_list);
 
if (alloc_dev_table(pci_seg))
@@ -2397,10 +2397,13 @@ static int iommu_init_irq(struct amd_iommu *iommu)
 static void __init free_unity_maps(void)
 {
struct unity_map_entry *entry, *next;
+   struct amd_iommu_pci_seg *p, *pci_seg;
 
-   list_for_each_entry_safe(entry, next, &amd_iommu_unity_map, list) {
-   list_del(&entry->list);
-   kfree(entry);
+   for_each_pci_segment_safe(pci_seg, p) {
+   list_for_each_entry_safe(entry, next, &pci_seg->unity_map, list) {
+   list_del(&entry->list);
+   kfree(entry);
+   }
}
 }
 
@@ -2408,8 +2411,13 @@ static void __init free_unity_maps(void)
 static int __init init_unity_map_range(struct ivmd_header *m)
 {
struct unity_map_entry *e = NULL;
+   struct amd_iommu_pci_seg *pci_seg;
char *s;
 
+   pci_seg = get_pci_segment(m->pci_seg);
+   if (pci_seg == NULL)
+   return -ENOMEM;
+
e = kzalloc(sizeof(*e), GFP_KERNEL);
if (e == NULL)
return -ENOMEM;
@@ -2447,14 +2455,16 @@ static int __init init_unity_map_range(struct 
ivmd_header *m)
if (m->flags & IVMD_FLAG_EXCL_RANGE)
e->prot = (IVMD_FLAG_IW | IVMD_FLAG_IR) >> 1;
 
-   DUMP_printk("%s devid_start: %02x:%02x.%x devid_end: %02x:%02x.%x"
-   " range_start: %016llx range_end: %016llx flags: %x\n", s,
+   DUMP_printk("%s devid_start: %04x:%02x:%02x.%x devid_end: "
+   "%04x:%02x:%02x.%x range_start: %016llx range_end: %016llx"
+   " flags: %x\n", s, m->pci_seg,
PCI_BUS_NUM(e->devid_start), PCI_SLOT(e->devid_start),
-   PCI_FUNC(e->devid_start), PCI_BUS_NUM(e->devid_end),
+   PCI_FUNC(e->devid_start), m->pci_seg,
+   PCI_BUS_NUM(e->devid_end),
PCI_SLOT(e->devid_end), PCI_FUNC(e->devid_end),
e->address_start, e->address_end, m->flags);
 
-   list_add_tail(&e->list, &amd_iommu_unity_map);
+   

[PATCH v1 08/37] iommu/amd: Introduce per PCI segment alias_table

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

This will replace the global alias table (amd_iommu_alias_table).

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h |  7 +
 drivers/iommu/amd/init.c| 41 ++---
 drivers/iommu/amd/iommu.c   | 41 ++---
 3 files changed, 64 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 330bb346207a..f9776f188e36 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -572,6 +572,13 @@ struct amd_iommu_pci_seg {
 * will be copied to. It's only be used in kdump kernel.
 */
struct dev_table_entry *old_dev_tbl_cpy;
+
+   /*
+* The alias table is a driver specific data structure which contains the
+* mappings of the PCI device ids to the actual requestor ids on the IOMMU.
+* More than one device can share the same requestor id.
+*/
+   u16 *alias_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index af413738da01..fe31de6e764c 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -698,6 +698,31 @@ static inline void free_irq_lookup_table(struct 
amd_iommu_pci_seg *pci_seg)
pci_seg->irq_lookup_table = NULL;
 }
 
+static int __init alloc_alias_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   int i;
+
+   pci_seg->alias_table = (void *)__get_free_pages(GFP_KERNEL,
+   get_order(alias_table_size));
+   if (!pci_seg->alias_table)
+   return -ENOMEM;
+
+   /*
+* let all alias entries point to itself
+*/
+   for (i = 0; i <= amd_iommu_last_bdf; ++i)
+   pci_seg->alias_table[i] = i;
+
+   return 0;
+}
+
+static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->alias_table,
+  get_order(alias_table_size));
+   pci_seg->alias_table = NULL;
+}
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1266,6 +1291,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
u32 dev_i, ext_flags = 0;
bool alias = false;
struct ivhd_entry *e;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
u32 ivhd_size;
int ret;
 
@@ -1347,7 +1373,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
devid_to = e->ext >> 8;
set_dev_entry_from_acpi(iommu, devid   , e->flags, 0);
set_dev_entry_from_acpi(iommu, devid_to, e->flags, 0);
-   amd_iommu_alias_table[devid] = devid_to;
+   pci_seg->alias_table[devid] = devid_to;
break;
case IVHD_DEV_ALIAS_RANGE:
 
@@ -1405,7 +1431,7 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
devid = e->devid;
for (dev_i = devid_start; dev_i <= devid; ++dev_i) {
if (alias) {
-   amd_iommu_alias_table[dev_i] = devid_to;
+   pci_seg->alias_table[dev_i] = devid_to;
set_dev_entry_from_acpi(iommu,
devid_to, flags, ext_flags);
}
@@ -1540,6 +1566,8 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
 
if (alloc_dev_table(pci_seg))
return NULL;
+   if (alloc_alias_table(pci_seg))
+   return NULL;
if (alloc_rlookup_table(pci_seg))
return NULL;
 
@@ -1566,6 +1594,7 @@ static void __init free_pci_segment(void)
list_del(_seg->list);
free_irq_lookup_table(pci_seg);
free_rlookup_table(pci_seg);
+   free_alias_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
}
@@ -2838,7 +2867,7 @@ static void __init ivinfo_init(void *ivrs)
 static int __init early_amd_iommu_init(void)
 {
struct acpi_table_header *ivrs_base;
-   int i, remap_cache_sz, ret;
+   int remap_cache_sz, ret;
acpi_status status;
 
if (!amd_iommu_detected)
@@ -2909,12 +2938,6 @@ static int __init early_amd_iommu_init(void)
if (amd_iommu_pd_alloc_bitmap == NULL)
goto out;
 
-   /*
-* let all alias entries point to itself
-*/
-   for (i = 0; i <= amd_iommu_last_bdf; ++i)
-   amd_iommu_alias_table[i] = i;
-
/*
 * never allocate domain 0 because its used as the non-allocated and

[PATCH v1 07/37] iommu/amd: Introduce per PCI segment old_dev_tbl_cpy

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

It will remove the global old_dev_tbl_cpy. Also update copy_device_table()
to copy the device table for all PCI segments.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
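(Sketch, simplified: the per-IOMMU body moves into __copy_device_table()
and the top level becomes roughly the loop below; the actual version in
this patch additionally walks the segments so a shared table is copied
only once per segment.)

	static bool copy_device_table(void)
	{
		struct amd_iommu *iommu;

		if (!amd_iommu_pre_enabled)
			return false;

		for_each_iommu(iommu) {
			if (!__copy_device_table(iommu))
				return false;
		}
		return true;
	}
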
 drivers/iommu/amd/amd_iommu_types.h |   6 ++
 drivers/iommu/amd/init.c| 109 
 2 files changed, 70 insertions(+), 45 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 7bf35e3a1ed6..330bb346207a 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -566,6 +566,12 @@ struct amd_iommu_pci_seg {
 * device id quickly.
 */
struct irq_remap_table **irq_lookup_table;
+
+   /*
+* Pointer to a device table which the content of old device table
+* will be copied to. It's only be used in kdump kernel.
+*/
+   struct dev_table_entry *old_dev_tbl_cpy;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 27785a558d9c..af413738da01 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -193,11 +193,6 @@ bool amd_iommu_force_isolation __read_mostly;
  * page table root pointer.
  */
 struct dev_table_entry *amd_iommu_dev_table;
-/*
- * Pointer to a device table which the content of old device table
- * will be copied to. It's only be used in kdump kernel.
- */
-static struct dev_table_entry *old_dev_tbl_cpy;
 
 /*
  * The alias table is a driver specific data structure which contains the
@@ -990,39 +985,27 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
 }
 
 
-static bool copy_device_table(void)
+static bool __copy_device_table(struct amd_iommu *iommu)
 {
-   u64 int_ctl, int_tab_len, entry = 0, last_entry = 0;
+   u64 int_ctl, int_tab_len, entry = 0;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
struct dev_table_entry *old_devtb = NULL;
u32 lo, hi, devid, old_devtb_size;
phys_addr_t old_devtb_phys;
-   struct amd_iommu *iommu;
u16 dom_id, dte_v, irq_v;
gfp_t gfp_flag;
u64 tmp;
 
-   if (!amd_iommu_pre_enabled)
-   return false;
-
-   pr_warn("Translation is already enabled - trying to copy translation 
structures\n");
-   for_each_iommu(iommu) {
-   /* All IOMMUs should use the same device table with the same size */
-   lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
-   hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
-   entry = (((u64) hi) << 32) + lo;
-   if (last_entry && last_entry != entry) {
-   pr_err("IOMMU:%d should use the same dev table as others!\n",
-   iommu->index);
-   return false;
-   }
-   last_entry = entry;
+   /* Each IOMMU use separate device table with the same size */
+   lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
+   hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
+   entry = (((u64) hi) << 32) + lo;
 
-   old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
-   if (old_devtb_size != dev_table_size) {
-   pr_err("The device table size of IOMMU:%d is not 
expected!\n",
-   iommu->index);
-   return false;
-   }
+   old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
+   if (old_devtb_size != dev_table_size) {
+   pr_err("The device table size of IOMMU:%d is not expected!\n",
+   iommu->index);
+   return false;
}
 
/*
@@ -1045,31 +1028,31 @@ static bool copy_device_table(void)
return false;
 
gfp_flag = GFP_KERNEL | __GFP_ZERO | GFP_DMA32;
-   old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
-   get_order(dev_table_size));
-   if (old_dev_tbl_cpy == NULL) {
+   pci_seg->old_dev_tbl_cpy = (void *)__get_free_pages(gfp_flag,
+   get_order(dev_table_size));
+   if (pci_seg->old_dev_tbl_cpy == NULL) {
pr_err("Failed to allocate memory for copying old device 
table!\n");
memunmap(old_devtb);
return false;
}
 
for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
-   old_dev_tbl_cpy[devid] = old_devtb[devid];
+   pci_seg->old_dev_tbl_cpy[devid] = old_devtb[devid];
dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
 
if (dte_v && dom_id) {
-   old_dev_tbl_cpy[devid].data[0] = old_devtb[devid].data[0];
-   old_dev_tbl_cpy[devid].data[1] = old_devtb[devid].data[1];
+   

[PATCH v1 06/37] iommu/amd: Introduce per PCI segment dev_data_list

2022-04-04 Thread Vasant Hegde via iommu
This will replace the global dev_data_list.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
 drivers/iommu/amd/amd_iommu_types.h |  3 +++
 drivers/iommu/amd/init.c|  1 +
 drivers/iommu/amd/iommu.c   | 21 ++---
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index d507c96598a7..7bf35e3a1ed6 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -538,6 +538,9 @@ struct protection_domain {
 struct amd_iommu_pci_seg {
struct list_head list;
 
+   /* List of all available dev_data structures */
+   struct llist_head dev_data_list;
+
/* PCI segment number */
u16 id;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 48db6c3164aa..27785a558d9c 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1525,6 +1525,7 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
return NULL;
 
pci_seg->id = id;
+   init_llist_head(&pci_seg->dev_data_list);
list_add_tail(&pci_seg->list, &amd_iommu_pci_seg_list);
 
if (alloc_dev_table(pci_seg))
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a8baa64c8f9c..2bea72f388b2 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -62,9 +62,6 @@
 
 static DEFINE_SPINLOCK(pd_bitmap_lock);
 
-/* List of all available dev_data structures */
-static LLIST_HEAD(dev_data_list);
-
 LIST_HEAD(ioapic_map);
 LIST_HEAD(hpet_map);
 LIST_HEAD(acpihid_map);
@@ -195,9 +192,10 @@ static struct protection_domain *to_pdomain(struct 
iommu_domain *dom)
return container_of(dom, struct protection_domain, domain);
 }
 
-static struct iommu_dev_data *alloc_dev_data(u16 devid)
+static struct iommu_dev_data *alloc_dev_data(struct amd_iommu *iommu, u16 devid)
 {
struct iommu_dev_data *dev_data;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
dev_data = kzalloc(sizeof(*dev_data), GFP_KERNEL);
if (!dev_data)
@@ -207,19 +205,20 @@ static struct iommu_dev_data *alloc_dev_data(u16 devid)
dev_data->devid = devid;
ratelimit_default_init(&dev_data->rs);
 
-   llist_add(&dev_data->dev_data_list, &dev_data_list);
+   llist_add(&dev_data->dev_data_list, &pci_seg->dev_data_list);
return dev_data;
 }
 
-static struct iommu_dev_data *search_dev_data(u16 devid)
+static struct iommu_dev_data *search_dev_data(struct amd_iommu *iommu, u16 devid)
 {
struct iommu_dev_data *dev_data;
struct llist_node *node;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
 
-   if (llist_empty(&dev_data_list))
+   if (llist_empty(&pci_seg->dev_data_list))
return NULL;
 
-   node = dev_data_list.first;
+   node = pci_seg->dev_data_list.first;
llist_for_each_entry(dev_data, node, dev_data_list) {
if (dev_data->devid == devid)
return dev_data;
@@ -287,10 +286,10 @@ static struct iommu_dev_data *find_dev_data(u16 devid)
struct iommu_dev_data *dev_data;
struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
-   dev_data = search_dev_data(devid);
+   dev_data = search_dev_data(iommu, devid);
 
if (dev_data == NULL) {
-   dev_data = alloc_dev_data(devid);
+   dev_data = alloc_dev_data(iommu, devid);
if (!dev_data)
return NULL;
 
@@ -3461,7 +3460,7 @@ static int amd_ir_set_vcpu_affinity(struct irq_data 
*data, void *vcpu_info)
struct vcpu_data *vcpu_pi_info = pi_data->vcpu_data;
struct amd_ir_data *ir_data = data->chip_data;
struct irq_2_irte *irte_info = &ir_data->irq_2_irte;
-   struct iommu_dev_data *dev_data = search_dev_data(irte_info->devid);
+   struct iommu_dev_data *dev_data = search_dev_data(NULL, irte_info->devid);
 
/* Note:
 * This device has never been set up for guest mode.
-- 
2.27.0



[PATCH v1 05/37] iommu/amd: Introduce per PCI segment irq_lookup_table

2022-04-04 Thread Vasant Hegde via iommu
This will replace the global irq lookup table (irq_lookup_table).

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
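(Sketch, illustrative only: with this patch the IRQ remap table for a
device hangs off its segment. The real lookup helper, including its
WARN_ONCE sanity checks, is reworked in patch 14. example_get_irq_table()
is a made-up name.)

	static struct irq_remap_table *
	example_get_irq_table(struct amd_iommu *iommu, u16 devid)
	{
		return iommu->pci_seg->irq_lookup_table[devid];
	}
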
 drivers/iommu/amd/amd_iommu_types.h |  6 ++
 drivers/iommu/amd/init.c| 27 +++
 2 files changed, 33 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 9c008662be1b..d507c96598a7 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -557,6 +557,12 @@ struct amd_iommu_pci_seg {
 * device id.
 */
struct amd_iommu **rlookup_table;
+
+   /*
+* This table is used to find the irq remapping table for a given
+* device id quickly.
+*/
+   struct irq_remap_table **irq_lookup_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index a2efc02ba80a..48db6c3164aa 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -682,6 +682,26 @@ static inline void free_rlookup_table(struct 
amd_iommu_pci_seg *pci_seg)
pci_seg->rlookup_table = NULL;
 }
 
+static inline int __init alloc_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   pci_seg->irq_lookup_table = (void *)__get_free_pages(
+GFP_KERNEL | __GFP_ZERO,
+get_order(rlookup_table_size));
+   kmemleak_alloc(pci_seg->irq_lookup_table,
+  rlookup_table_size, 1, GFP_KERNEL);
+   if (pci_seg->irq_lookup_table == NULL)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_irq_lookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   kmemleak_free(pci_seg->irq_lookup_table);
+   free_pages((unsigned long)pci_seg->irq_lookup_table,
+  get_order(rlookup_table_size));
+   pci_seg->irq_lookup_table = NULL;
+}
 
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
@@ -1533,6 +1553,7 @@ static void __init free_pci_segment(void)
 
for_each_pci_segment_safe(pci_seg, next) {
list_del(_seg->list);
+   free_irq_lookup_table(pci_seg);
free_rlookup_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
@@ -2896,6 +2917,7 @@ static int __init early_amd_iommu_init(void)
amd_iommu_irq_remap = check_ioapic_information();
 
if (amd_iommu_irq_remap) {
+   struct amd_iommu_pci_seg *pci_seg;
/*
 * Interrupt remapping enabled, create kmem_cache for the
 * remapping tables.
@@ -2912,6 +2934,11 @@ static int __init early_amd_iommu_init(void)
if (!amd_iommu_irq_cache)
goto out;
 
+   for_each_pci_segment(pci_seg) {
+   if (alloc_irq_lookup_table(pci_seg))
+   goto out;
+   }
+
irq_lookup_table = (void *)__get_free_pages(
GFP_KERNEL | __GFP_ZERO,
get_order(rlookup_table_size));
-- 
2.27.0



[PATCH v1 04/37] iommu/amd: Introduce per PCI segment rlookup table

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

This will replace the global rlookup table (amd_iommu_rlookup_table).
Also add helper functions to set/get the rlookup table entry for a given
device.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
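(Illustrative only - example_flush() is a made-up caller showing the
conversion pattern later patches apply at every former global-table
user: resolve the owning IOMMU from the device, then bail out early.)

	static int example_flush(struct iommu_dev_data *dev_data)
	{
		struct amd_iommu *iommu = rlookup_amd_iommu(dev_data->dev);

		if (!iommu)
			return -EINVAL;
		return iommu_flush_dte(iommu, dev_data->devid);
	}
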
 drivers/iommu/amd/amd_iommu.h   |  1 +
 drivers/iommu/amd/amd_iommu_types.h |  8 ++
 drivers/iommu/amd/init.c| 23 +++
 drivers/iommu/amd/iommu.c   | 44 +
 4 files changed, 76 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 885570cd0d77..2947239700ce 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -19,6 +19,7 @@ extern int amd_iommu_init_devices(void);
 extern void amd_iommu_uninit_devices(void);
 extern void amd_iommu_init_notifier(void);
 extern int amd_iommu_init_api(void);
+extern void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid);
 
 #ifdef CONFIG_AMD_IOMMU_DEBUGFS
 void amd_iommu_debugfs_setup(struct amd_iommu *iommu);
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 404feb7995cc..9c008662be1b 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -486,6 +486,7 @@ struct amd_iommu_fault {
 };
 
 
+struct amd_iommu;
 struct iommu_domain;
 struct irq_domain;
 struct amd_irte_ops;
@@ -549,6 +550,13 @@ struct amd_iommu_pci_seg {
 * page table root pointer.
 */
struct dev_table_entry *dev_table;
+
+   /*
+* The rlookup iommu table is used to find the IOMMU which is
+* responsible for a specific device. It is indexed by the PCI
+* device id.
+*/
+   struct amd_iommu **rlookup_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 0fd1071bfc85..a2efc02ba80a 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -663,6 +663,26 @@ static inline void free_dev_table(struct amd_iommu_pci_seg 
*pci_seg)
pci_seg->dev_table = NULL;
 }
 
+/* Allocate per PCI segment IOMMU rlookup table. */
+static inline int __init alloc_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   pci_seg->rlookup_table = (void *)__get_free_pages(
+   GFP_KERNEL | __GFP_ZERO,
+   get_order(rlookup_table_size));
+   if (pci_seg->rlookup_table == NULL)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_rlookup_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->rlookup_table,
+  get_order(rlookup_table_size));
+   pci_seg->rlookup_table = NULL;
+}
+
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1489,6 +1509,8 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
 
if (alloc_dev_table(pci_seg))
return NULL;
+   if (alloc_rlookup_table(pci_seg))
+   return NULL;
 
return pci_seg;
 }
@@ -1511,6 +1533,7 @@ static void __init free_pci_segment(void)
 
for_each_pci_segment_safe(pci_seg, next) {
list_del(_seg->list);
+   free_rlookup_table(pci_seg);
free_dev_table(pci_seg);
kfree(pci_seg);
}
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 52058c0d4f62..a8baa64c8f9c 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -146,6 +146,50 @@ struct dev_table_entry *get_dev_table(struct amd_iommu 
*iommu)
return dev_table;
 }
 
+static inline u16 get_device_segment(struct device *dev)
+{
+   u16 seg;
+
+   if (dev_is_pci(dev)) {
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   seg = pci_domain_nr(pdev->bus);
+   } else {
+   u32 devid = get_acpihid_device_id(dev, NULL);
+
+   seg = (devid >> 16) & 0x;
+   }
+
+   return seg;
+}
+
+/* Writes the specific IOMMU for a device into the PCI segment rlookup table */
+void amd_iommu_set_rlookup_table(struct amd_iommu *iommu, u16 devid)
+{
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   pci_seg->rlookup_table[devid] = iommu;
+}
+
+static struct amd_iommu *__rlookup_amd_iommu(u16 seg, u16 devid)
+{
+   struct amd_iommu_pci_seg *pci_seg;
+
+   for_each_pci_segment(pci_seg) {
+   if (pci_seg->id == seg)
+   return pci_seg->rlookup_table[devid];
+   }
+   return NULL;
+}
+
+static struct amd_iommu *rlookup_amd_iommu(struct device *dev)
+{
+   u16 seg = get_device_segment(dev);
+   u16 devid = get_device_id(dev);
+
+   return __rlookup_amd_iommu(seg, devid);
+}
+
 static struct protection_domain *to_pdomain(struct iommu_domain *dom)
 {
return 

[PATCH v1 03/37] iommu/amd: Introduce per PCI segment device table

2022-04-04 Thread Vasant Hegde via iommu
From: Suravee Suthikulpanit 

Introduce a per PCI segment device table. All IOMMUs within the segment
will share this device table. This will replace the global device
table, i.e. amd_iommu_dev_table.

Also introduce a helper function to get the device table for a given
IOMMU.

Co-developed-by: Vasant Hegde 
Signed-off-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
---
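(Illustrative only - read_dte_word0() is a made-up caller. It shows the
expected pattern: index the per-segment device table through the IOMMU
via get_dev_table() instead of the global amd_iommu_dev_table.)

	static u64 read_dte_word0(struct amd_iommu *iommu, u16 devid)
	{
		struct dev_table_entry *dev_table = get_dev_table(iommu);

		return dev_table[devid].data[0];
	}
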
 drivers/iommu/amd/amd_iommu.h   |  1 +
 drivers/iommu/amd/amd_iommu_types.h | 10 ++
 drivers/iommu/amd/init.c| 26 --
 drivers/iommu/amd/iommu.c   | 12 
 4 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 1ab31074f5b3..885570cd0d77 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -128,4 +128,5 @@ static inline void amd_iommu_apply_ivrs_quirks(void) { }
 
 extern void amd_iommu_domain_set_pgtable(struct protection_domain *domain,
 u64 *root, int mode);
+extern struct dev_table_entry *get_dev_table(struct amd_iommu *iommu);
 #endif
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 62442d88978f..404feb7995cc 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -539,6 +539,16 @@ struct amd_iommu_pci_seg {
 
/* PCI segment number */
u16 id;
+
+   /*
+* device table virtual address
+*
+* Pointer to the per PCI segment device table.
+* It is indexed by the PCI device id or the HT unit id and contains
+* information about the domain the device belongs to as well as the
+* page table root pointer.
+*/
+   struct dev_table_entry *dev_table;
 };
 
 /*
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index e01eae9dcbc1..0fd1071bfc85 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -640,11 +640,29 @@ static int __init find_last_devid_acpi(struct 
acpi_table_header *table)
  *
  * The following functions belong to the code path which parses the ACPI table
  * the second time. In this ACPI parsing iteration we allocate IOMMU specific
- * data structures, initialize the device/alias/rlookup table and also
- * basically initialize the hardware.
+ * data structures, initialize the per PCI segment device/alias/rlookup table
+ * and also basically initialize the hardware.
  *
  /
 
+/* Allocate per PCI segment device table */
+static inline int __init alloc_dev_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   pci_seg->dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
+ get_order(dev_table_size));
+   if (!pci_seg->dev_table)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static inline void free_dev_table(struct amd_iommu_pci_seg *pci_seg)
+{
+   free_pages((unsigned long)pci_seg->dev_table,
+   get_order(dev_table_size));
+   pci_seg->dev_table = NULL;
+}
+
 /*
  * Allocates the command buffer. This buffer is per AMD IOMMU. We can
  * write commands to that buffer later and the IOMMU will execute them
@@ -1469,6 +1487,9 @@ static struct amd_iommu_pci_seg *__init 
alloc_pci_segment(u16 id)
pci_seg->id = id;
list_add_tail(_seg->list, _iommu_pci_seg_list);
 
+   if (alloc_dev_table(pci_seg))
+   return NULL;
+
return pci_seg;
 }
 
@@ -1490,6 +1511,7 @@ static void __init free_pci_segment(void)
 
for_each_pci_segment_safe(pci_seg, next) {
list_del(_seg->list);
+   free_dev_table(pci_seg);
kfree(pci_seg);
}
 }
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 7a29e2645dc4..52058c0d4f62 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -134,6 +134,18 @@ static inline int get_device_id(struct device *dev)
return devid;
 }
 
+struct dev_table_entry *get_dev_table(struct amd_iommu *iommu)
+{
+   struct dev_table_entry *dev_table;
+   struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+
+   BUG_ON(pci_seg == NULL);
+   dev_table = pci_seg->dev_table;
+   BUG_ON(dev_table == NULL);
+
+   return dev_table;
+}
+
 static struct protection_domain *to_pdomain(struct iommu_domain *dom)
 {
return container_of(dom, struct protection_domain, domain);
-- 
2.27.0



[PATCH v1 02/37] iommu/amd: Introduce pci segment structure

2022-04-04 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments, where each segment
contains one or more IOMMU instances. However, an IOMMU instance can only
support a single PCI segment.

Current code assumes that the system contains only one PCI segment
(segment 0) and creates global data structures such as device table,
rlookup table, etc.

Introduce a per PCI segment data structure, which contains the segment
specific data structures. This will eventually replace the global
data structures.

Also update `amd_iommu->pci_seg` variable to point to PCI segment
structure instead of PCI segment ID.

Co-developed-by: Suravee Suthikulpanit 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Vasant Hegde 
---
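(Illustrative only, not part of the patch: how a segment structure is
looked up with the new iterator. This mirrors get_pci_segment() below,
minus the allocation fallback; find_pci_segment() is a made-up name.)

	static struct amd_iommu_pci_seg *find_pci_segment(u16 id)
	{
		struct amd_iommu_pci_seg *pci_seg;

		for_each_pci_segment(pci_seg) {
			if (pci_seg->id == id)
				return pci_seg;
		}
		return NULL;
	}
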
 drivers/iommu/amd/amd_iommu_types.h | 23 ++-
 drivers/iommu/amd/init.c| 46 -
 2 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 06235b7cb13d..62442d88978f 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -452,6 +452,11 @@ extern bool amd_iommu_irq_remap;
 /* kmem_cache to get tables with 128 byte alignement */
 extern struct kmem_cache *amd_iommu_irq_cache;
 
+/* Make iterating over all pci segment easier */
+#define for_each_pci_segment(pci_seg) \
+   list_for_each_entry((pci_seg), &amd_iommu_pci_seg_list, list)
+#define for_each_pci_segment_safe(pci_seg, next) \
+   list_for_each_entry_safe((pci_seg), (next), &amd_iommu_pci_seg_list, list)
 /*
  * Make iterating over all IOMMUs easier
  */
@@ -526,6 +531,16 @@ struct protection_domain {
unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */
 };
 
+/*
+ * This structure contains information about one PCI segment in the system.
+ */
+struct amd_iommu_pci_seg {
+   struct list_head list;
+
+   /* PCI segment number */
+   u16 id;
+};
+
 /*
  * Structure where we save information about one hardware AMD IOMMU in the
  * system.
@@ -577,7 +592,7 @@ struct amd_iommu {
u16 cap_ptr;
 
/* pci domain of this IOMMU */
-   u16 pci_seg;
+   struct amd_iommu_pci_seg *pci_seg;
 
/* start of exclusion range of that IOMMU */
u64 exclusion_start;
@@ -705,6 +720,12 @@ extern struct list_head ioapic_map;
 extern struct list_head hpet_map;
 extern struct list_head acpihid_map;
 
+/*
+ * List with all PCI segments in the system. This list is not locked because
+ * it is only written at driver initialization time
+ */
+extern struct list_head amd_iommu_pci_seg_list;
+
 /*
  * List with all IOMMUs in the system. This list is not locked because it is
  * only written and read at driver initialization or suspend time
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index b4a798c7b347..e01eae9dcbc1 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -165,6 +165,7 @@ u16 amd_iommu_last_bdf; /* largest PCI device id we have
 LIST_HEAD(amd_iommu_unity_map);/* a list of required unity mappings
   we find in ACPI */
 
+LIST_HEAD(amd_iommu_pci_seg_list); /* list of all PCI segments */
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
   system */
 
@@ -1456,6 +1457,43 @@ static int __init init_iommu_from_acpi(struct amd_iommu 
*iommu,
return 0;
 }
 
+/* Allocate PCI segment data structure */
+static struct amd_iommu_pci_seg *__init alloc_pci_segment(u16 id)
+{
+   struct amd_iommu_pci_seg *pci_seg;
+
+   pci_seg = kzalloc(sizeof(struct amd_iommu_pci_seg), GFP_KERNEL);
+   if (pci_seg == NULL)
+   return NULL;
+
+   pci_seg->id = id;
+   list_add_tail(&pci_seg->list, &amd_iommu_pci_seg_list);
+
+   return pci_seg;
+}
+
+static struct amd_iommu_pci_seg *__init get_pci_segment(u16 id)
+{
+   struct amd_iommu_pci_seg *pci_seg;
+
+   for_each_pci_segment(pci_seg) {
+   if (pci_seg->id == id)
+   return pci_seg;
+   }
+
+   return alloc_pci_segment(id);
+}
+
+static void __init free_pci_segment(void)
+{
+   struct amd_iommu_pci_seg *pci_seg, *next;
+
+   for_each_pci_segment_safe(pci_seg, next) {
+   list_del(&pci_seg->list);
+   kfree(pci_seg);
+   }
+}
+
 static void __init free_iommu_one(struct amd_iommu *iommu)
 {
free_cwwb_sem(iommu);
@@ -1542,8 +1580,14 @@ static void amd_iommu_ats_write_check_workaround(struct 
amd_iommu *iommu)
  */
 static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
 {
+   struct amd_iommu_pci_seg *pci_seg;
int ret;
 
+   pci_seg = get_pci_segment(h->pci_seg);
+   if (pci_seg == NULL)
+   return -ENOMEM;
+   iommu->pci_seg = pci_seg;
+
raw_spin_lock_init(&iommu->lock);
iommu->cmd_sem_val = 0;
 
@@ -1564,7 +1608,6 @@ static int __init 

[PATCH v1 01/37] iommu/amd: Update struct iommu_dev_data defination

2022-04-04 Thread Vasant Hegde via iommu
struct iommu_dev_data contains a member "pdev" pointing to the pci_dev.
This is valid only for PCI devices; for other devices it will be NULL,
which causes unnecessary "pdev != NULL" checks at various places.

Replace the "struct pci_dev" member with "struct device" and use
to_pci_dev() to get the pci device reference as needed. Also adjust
setup_aliases() and clone_aliases() accordingly.

No functional change intended.

Signed-off-by: Vasant Hegde 
---
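(Not part of the patch - an illustrative sketch of the guard pattern this
change relies on: to_pci_dev() is only meaningful after dev_is_pci() has
confirmed the device really is PCI. The helper name is made up.)

	static struct pci_dev *iommu_dev_to_pci_dev(struct device *dev)
	{
		if (!dev_is_pci(dev))
			return NULL;	/* e.g. ACPI HID devices have no pci_dev */
		return to_pci_dev(dev);
	}
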
 drivers/iommu/amd/amd_iommu_types.h |  2 +-
 drivers/iommu/amd/iommu.c   | 27 +++
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 47108ed44fbb..06235b7cb13d 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -685,7 +685,7 @@ struct iommu_dev_data {
struct list_head list;/* For domain->dev_list */
struct llist_node dev_data_list;  /* For global dev_data_list */
struct protection_domain *domain; /* Domain the device is bound to */
-   struct pci_dev *pdev;
+   struct device *dev;
u16 devid;/* PCI Device ID */
bool iommu_v2;/* Device can make use of IOMMUv2 */
struct {
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a1ada7bff44e..7a29e2645dc4 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -188,8 +188,10 @@ static int clone_alias(struct pci_dev *pdev, u16 alias, 
void *data)
return 0;
 }
 
-static void clone_aliases(struct pci_dev *pdev)
+static void clone_aliases(struct device *dev)
 {
+   struct pci_dev *pdev = to_pci_dev(dev);
+
if (!pdev)
return;
 
@@ -203,14 +205,14 @@ static void clone_aliases(struct pci_dev *pdev)
pci_for_each_dma_alias(pdev, clone_alias, NULL);
 }
 
-static struct pci_dev *setup_aliases(struct device *dev)
+static void setup_aliases(struct device *dev)
 {
struct pci_dev *pdev = to_pci_dev(dev);
u16 ivrs_alias;
 
/* For ACPI HID devices, there are no aliases */
if (!dev_is_pci(dev))
-   return NULL;
+   return;
 
/*
 * Add the IVRS alias to the pci aliases if it is on the same
@@ -221,9 +223,7 @@ static struct pci_dev *setup_aliases(struct device *dev)
PCI_BUS_NUM(ivrs_alias) == pdev->bus->number)
pci_add_dma_alias(pdev, ivrs_alias & 0xff, 1);
 
-   clone_aliases(pdev);
-
-   return pdev;
+   clone_aliases(dev);
 }
 
 static struct iommu_dev_data *find_dev_data(u16 devid)
@@ -331,7 +331,8 @@ static int iommu_init_device(struct device *dev)
if (!dev_data)
return -ENOMEM;
 
-   dev_data->pdev = setup_aliases(dev);
+   dev_data->dev = dev;
+   setup_aliases(dev);
 
/*
 * By default we use passthrough mode for IOMMUv2 capable device.
@@ -1232,13 +1233,15 @@ static int device_flush_dte_alias(struct pci_dev *pdev, 
u16 alias, void *data)
 static int device_flush_dte(struct iommu_dev_data *dev_data)
 {
struct amd_iommu *iommu;
+   struct pci_dev *pdev;
u16 alias;
int ret;
 
iommu = amd_iommu_rlookup_table[dev_data->devid];
 
-   if (dev_data->pdev)
-   ret = pci_for_each_dma_alias(dev_data->pdev,
+   pdev = to_pci_dev(dev_data->dev);
+   if (pdev)
+   ret = pci_for_each_dma_alias(pdev,
 device_flush_dte_alias, iommu);
else
ret = iommu_flush_dte(iommu, dev_data->devid);
@@ -1561,7 +1564,7 @@ static void do_attach(struct iommu_dev_data *dev_data,
/* Update device table */
set_dte_entry(dev_data->devid, domain,
  ats, dev_data->iommu_v2);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
 
device_flush_dte(dev_data);
 }
@@ -1577,7 +1580,7 @@ static void do_detach(struct iommu_dev_data *dev_data)
dev_data->domain = NULL;
list_del(&dev_data->list);
clear_dte_entry(dev_data->devid);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
 
/* Flush the DTE entry */
device_flush_dte(dev_data);
@@ -1818,7 +1821,7 @@ static void update_device_table(struct protection_domain *domain)
list_for_each_entry(dev_data, &domain->dev_list, list) {
set_dte_entry(dev_data->devid, domain,
  dev_data->ats.enabled, dev_data->iommu_v2);
-   clone_aliases(dev_data->pdev);
+   clone_aliases(dev_data->dev);
}
 }
 
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v1 00/37] iommu/amd: Add multiple PCI segments support

2022-04-04 Thread Vasant Hegde via iommu
Newer AMD systems can support multiple PCI segments, where each segment
contains one or more IOMMU instances. However, an IOMMU instance can only
support a single PCI segment.

Current code assumes a system contains only one PCI segment (segment 0)
and creates global data structures such as device table, rlookup table,
etc.

This series introduces a per-PCI-segment data structure that contains
the device table, alias table, etc. All IOMMUs within a PCI segment
share the same data structure. The series also makes the necessary
code adjustments and logging enhancements, and finally removes the
global data structures such as the device table and alias table.

On a system with a single PCI segment (i.e. PCI segment ID zero), the
IOMMU driver allocates one PCI segment data structure, which is shared
by all IOMMUs.

Patch 1 updates the struct iommu_dev_data definition.

Patches 2 - 13 introduce the new PCI segment structure, allocate the
per-PCI-segment data structures, and introduce the amd_iommu.pci_seg
pointer to point at the corresponding pci_segment structure. They also
introduce a helper function, rlookup_amd_iommu(), to reverse-look-up
the IOMMU for a particular device.

Patches 14 - 29 adopt the per-PCI-segment data structures and remove
the global data structures.

Patch 30 fixes the flushing logic to flush up to last_bdf.

Patches 31 - 37 convert usages of the 16-bit PCI device ID to also
include the 16-bit segment ID.
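
For orientation, a rough sketch of the per-segment structure implied by
the patch titles below; the struct name and exact fields are assumptions
rather than copies from the patches:

struct amd_iommu_pci_seg {
	struct list_head list;            /* node in the global segment list */
	u16 id;                           /* PCI segment number */
	u16 last_bdf;                     /* highest BDF found in the IVRS table */
	struct dev_table_entry *dev_table;
	u16 *alias_table;
	struct amd_iommu **rlookup_table;           /* devid -> IOMMU */
	struct irq_remap_table **irq_lookup_table;  /* devid -> IRQ remap table */
	struct list_head unity_map;       /* per-segment unity mappings */
};

/* Reverse-lookup helper named above (signature assumed): */
struct amd_iommu *rlookup_amd_iommu(struct device *dev);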


RFC patchset:
https://lore.kernel.org/linux-iommu/20220311094854.31595-1-vasant.he...@amd.com/T/#t

Changes in RFC -> v1:
  - Rebased patches on top of the iommu/next tree.
  - Updated the struct iommu_dev_data definition.
  - Updated a few log messages to print the segment ID.
  - Fixed smatch warnings.


Regards,
Vasant


Suravee Suthikulpanit (21):
  iommu/amd: Introduce per PCI segment device table
  iommu/amd: Introduce per PCI segment rlookup table
  iommu/amd: Introduce per PCI segment old_dev_tbl_cpy
  iommu/amd: Introduce per PCI segment alias_table
  iommu/amd: Convert to use rlookup_amd_iommu helper function
  iommu/amd: Update irq_remapping_alloc to use IOMMU lookup helper function
  iommu/amd: Introduce struct amd_ir_data.iommu
  iommu/amd: Update amd_irte_ops functions
  iommu/amd: Update alloc_irq_table and alloc_irq_index
  iommu/amd: Update set_dte_entry and clear_dte_entry
  iommu/amd: Update iommu_ignore_device
  iommu/amd: Update dump_dte_entry
  iommu/amd: Update set_dte_irq_entry
  iommu/amd: Update (un)init_device_table_dma()
  iommu/amd: Update set_dev_entry_bit() and get_dev_entry_bit()
  iommu/amd: Remove global amd_iommu_dev_table
  iommu/amd: Remove global amd_iommu_alias_table
  iommu/amd: Introduce get_device_sbdf_id() helper function
  iommu/amd: Include PCI segment ID when initialize IOMMU
  iommu/amd: Specify PCI segment ID when getting pci device
  iommu/amd: Add PCI segment support for ivrs_ioapic, ivrs_hpet, ivrs_acpihid commands

Vasant Hegde (16):
  iommu/amd: Update struct iommu_dev_data definition
  iommu/amd: Introduce pci segment structure
  iommu/amd: Introduce per PCI segment irq_lookup_table
  iommu/amd: Introduce per PCI segment dev_data_list
  iommu/amd: Introduce per PCI segment unity map list
  iommu/amd: Introduce per PCI segment last_bdf
  iommu/amd: Introduce per PCI segment device table size
  iommu/amd: Introduce per PCI segment alias table size
  iommu/amd: Introduce per PCI segment rlookup table size
  iommu/amd: Convert to use per PCI segment irq_lookup_table
  iommu/amd: Convert to use per PCI segment rlookup_table
  iommu/amd: Remove global amd_iommu_last_bdf
  iommu/amd: Flush up to last_bdf only
  iommu/amd: Print PCI segment ID in error log messages
  iommu/amd: Update device_state structure to include PCI seg ID
  iommu/amd: Update amd_iommu_fault structure to include PCI seg ID

 .../admin-guide/kernel-parameters.txt |  34 +-
 drivers/iommu/amd/amd_iommu.h |  13 +-
 drivers/iommu/amd/amd_iommu_types.h   | 127 +++-
 drivers/iommu/amd/init.c  | 683 +++---
 drivers/iommu/amd/iommu.c | 540 --
 drivers/iommu/amd/iommu_v2.c  |  67 +-
 drivers/iommu/amd/quirks.c|   4 +-
 7 files changed, 884 insertions(+), 584 deletions(-)

-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: remove unneeded validity check on dev

2022-04-04 Thread Muhammad Usama Anjum
Any thoughts?

On 3/13/22 8:03 PM, Muhammad Usama Anjum wrote:
> dev_iommu_priv_get() is used at the top of this function, and it
> dereferences dev, so dev cannot be NULL after that point. Remove the
> validity check on dev and simplify the code.
> 
> Signed-off-by: Muhammad Usama Anjum 
> ---
>  drivers/iommu/intel/iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index df5c62ecf942b..f79edbbd651a4 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2502,7 +2502,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>   }
>   }
>  
> - if (dev && domain_context_mapping(domain, dev)) {
> + if (domain_context_mapping(domain, dev)) {
>   dev_err(dev, "Domain context map failed\n");
>   dmar_remove_one_dev_info(dev);
>   return NULL;

-- 
Muhammad Usama Anjum
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 12/15] swiotlb: provide swiotlb_init variants that remap the buffer

2022-04-04 Thread Dongli Zhang



On 4/3/22 10:05 PM, Christoph Hellwig wrote:
> To share more code between swiotlb and xen-swiotlb, offer a
> swiotlb_init_remap interface and add a remap callback to
> swiotlb_init_late that will allow Xen to remap the buffer without
> duplicating much of the logic.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/pci/sta2x11-fixup.c |  2 +-
>  include/linux/swiotlb.h  |  5 -
>  kernel/dma/swiotlb.c | 36 +---
>  3 files changed, 38 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/pci/sta2x11-fixup.c b/arch/x86/pci/sta2x11-fixup.c
> index c7e6faf59a861..7368afc039987 100644
> --- a/arch/x86/pci/sta2x11-fixup.c
> +++ b/arch/x86/pci/sta2x11-fixup.c
> @@ -57,7 +57,7 @@ static void sta2x11_new_instance(struct pci_dev *pdev)
>   int size = STA2X11_SWIOTLB_SIZE;
>   /* First instance: register your own swiotlb area */
>   dev_info(&pdev->dev, "Using SWIOTLB (size %i)\n", size);
> - if (swiotlb_init_late(size, GFP_DMA))
> + if (swiotlb_init_late(size, GFP_DMA, NULL))
>   dev_emerg(&pdev->dev, "init swiotlb failed\n");
>   }
>   list_add(&instance->list, &sta2x11_instance_list);
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index ee655f2e4d28b..7b50c82f84ce9 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -36,8 +36,11 @@ struct scatterlist;
>  
>  int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, unsigned int flags);
>  unsigned long swiotlb_size_or_default(void);
> +void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
> + int (*remap)(void *tlb, unsigned long nslabs));
> +int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> + int (*remap)(void *tlb, unsigned long nslabs));
>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> -int swiotlb_init_late(size_t size, gfp_t gfp_mask);
>  extern void __init swiotlb_update_mem_attributes(void);
>  
>  phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 119187afc65ec..d5fe8f5e08300 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -256,9 +256,11 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs,
>   * Statically reserve bounce buffer space and initialize bounce buffer data
>   * structures for the software IO TLB used to implement the DMA API.
>   */
> -void __init swiotlb_init(bool addressing_limit, unsigned int flags)
> +void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
> + int (*remap)(void *tlb, unsigned long nslabs))
>  {
> - size_t bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
> + unsigned long nslabs = default_nslabs;
> + size_t bytes;
>   void *tlb;
>  
>   if (!addressing_limit && !swiotlb_force_bounce)
> @@ -271,12 +273,23 @@ void __init swiotlb_init(bool addressing_limit, unsigned int flags)
>* allow to pick a location everywhere for hypervisors with guest
>* memory encryption.
>*/
> +retry:
> + bytes = PAGE_ALIGN(default_nslabs << IO_TLB_SHIFT);
>   if (flags & SWIOTLB_ANY)
>   tlb = memblock_alloc(bytes, PAGE_SIZE);
>   else
>   tlb = memblock_alloc_low(bytes, PAGE_SIZE);
>   if (!tlb)
>   goto fail;
> + if (remap && remap(tlb, nslabs) < 0) {
> + memblock_free(tlb, PAGE_ALIGN(bytes));
> +
> + nslabs = ALIGN(nslabs >> 1, IO_TLB_SEGSIZE);
> + if (nslabs < IO_TLB_MIN_SLABS)
> + panic("%s: Failed to remap %zu bytes\n",
> +   __func__, bytes);
> + goto retry;
> + }
>   if (swiotlb_init_with_tbl(tlb, default_nslabs, flags))
>   goto fail_free_mem;
>   return;
> @@ -287,12 +300,18 @@ void __init swiotlb_init(bool addressing_limit, unsigned int flags)
>   pr_warn("Cannot allocate buffer");
>  }
>  
> +void __init swiotlb_init(bool addressing_limit, unsigned int flags)
> +{
> + return swiotlb_init_remap(addressing_limit, flags, NULL);
> +}
> +
>  /*
>   * Systems with larger DMA zones (those that don't support ISA) can
>   * initialize the swiotlb later using the slab allocator if needed.
>   * This should be just like above, but with some error catching.
>   */
> -int swiotlb_init_late(size_t size, gfp_t gfp_mask)
> +int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> + int (*remap)(void *tlb, unsigned long nslabs))
>  {
>   unsigned long nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
>   unsigned long bytes;
> @@ -303,6 +322,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask)
>   if (swiotlb_force_disable)
>   return 0;
>  
> +retry:
>   order = get_order(nslabs << IO_TLB_SHIFT);
>   nslabs = SLABS_PER_PAGE << order;
>   bytes = nslabs << IO_TLB_SHIFT;
> @@ -323,6 
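
For context, a hypothetical caller sketch built only from the signatures
quoted above; my_remap() and my_arch_mem_init() are illustrative names,
not from the patch. A negative return from the remap callback makes
swiotlb_init_remap() free the buffer and retry with half the slabs:

static int __init my_remap(void *tlb, unsigned long nslabs)
{
	/* e.g. remap or share the buffer with a hypervisor; return 0 on
	 * success, or a negative value to request a smaller buffer */
	return 0;
}

void __init my_arch_mem_init(void)
{
	/* SWIOTLB_ANY: the buffer may be placed anywhere in memory */
	swiotlb_init_remap(true, SWIOTLB_ANY, my_remap);
}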

Re: [PATCH RFC v2 06/11] iommu/vt-d: Add SVA domain support

2022-04-04 Thread Lu Baolu

On 2022/3/31 3:09, Jason Gunthorpe wrote:

On Tue, Mar 29, 2022 at 01:37:55PM +0800, Lu Baolu wrote:

Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops.

Signed-off-by: Lu Baolu 
  include/linux/intel-iommu.h |  1 +
  drivers/iommu/intel/iommu.c | 10 ++
  drivers/iommu/intel/svm.c   | 37 +
  3 files changed, 48 insertions(+)

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2f9891cb3d00..c14283137fb5 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -744,6 +744,7 @@ void intel_svm_unbind(struct iommu_sva *handle);
  u32 intel_svm_get_pasid(struct iommu_sva *handle);
  int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
struct iommu_page_response *msg);
+extern const struct iommu_domain_ops intel_svm_domain_ops;
  
  struct intel_svm_dev {

struct list_head list;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c1b91bce1530..5eae7cf9bee5 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4318,6 +4318,16 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
return domain;
case IOMMU_DOMAIN_IDENTITY:
return &si_domain->domain;
+#ifdef CONFIG_INTEL_IOMMU_SVM
+   case IOMMU_DOMAIN_SVA:
+   dmar_domain = alloc_domain(type);
+   if (!dmar_domain)
+   return NULL;
+   domain = &dmar_domain->domain;
+   domain->ops = &intel_svm_domain_ops;
+
+   return domain;
+#endif /* CONFIG_INTEL_IOMMU_SVM */


If this is the usual pattern for drivers I would prefer to see an
alloc_sva op instead of more and more types.

Multiplexing functions is often not a great idea...


Robin mentioned that the iommu domain alloc/free interfaces are being
reworked. These cleanups will need to wait until we see what the final
code looks like.
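
Purely as an illustration of the dedicated-op idea (the op name and
signature are assumptions, not from this series), the type multiplexing
could give way to something like:

/* In struct iommu_ops, instead of overloading domain_alloc(): */
struct iommu_domain *(*domain_alloc_sva)(struct device *dev,
					 struct mm_struct *mm);

/* Driver side, roughly mirroring the hunk quoted above: */
static struct iommu_domain *intel_domain_alloc_sva(struct device *dev,
						   struct mm_struct *mm)
{
	struct dmar_domain *dmar_domain = alloc_domain(IOMMU_DOMAIN_SVA);

	if (!dmar_domain)
		return NULL;
	dmar_domain->domain.ops = &intel_svm_domain_ops;

	return &dmar_domain->domain;
}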

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC v2 04/11] iommu: Add attach/detach_dev_pasid domain ops

2022-04-04 Thread Lu Baolu

Hi Jason,

On 2022/3/31 3:08, Jason Gunthorpe wrote:

On Tue, Mar 29, 2022 at 01:37:53PM +0800, Lu Baolu wrote:

Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but are not limited to):

  - SVA (Shared Virtual Address)
  - kernel DMA with PASID
  - hardware-assisted mediated device

This adds a pair of common domain ops for this purpose and adds some
common helpers to attach/detach a domain to/from a {device, PASID} and
get/put the domain attached to {device, PASID}.

Signed-off-by: Lu Baolu 
  include/linux/iommu.h | 36 ++
  drivers/iommu/iommu.c | 88 +++
  2 files changed, 124 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 29c4c2edd706..a46285488a57 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -269,6 +269,8 @@ struct iommu_ops {
   * struct iommu_domain_ops - domain specific operations
   * @attach_dev: attach an iommu domain to a device
   * @detach_dev: detach an iommu domain from a device
+ * @attach_dev_pasid: attach an iommu domain to a pasid of device
+ * @detach_dev_pasid: detach an iommu domain from a pasid of device
   * @map: map a physically contiguous memory region to an iommu domain
   * @map_pages: map a physically contiguous set of pages of the same size to
   * an iommu domain.
@@ -286,6 +288,10 @@ struct iommu_ops {
  struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+   int (*attach_dev_pasid)(struct iommu_domain *domain,
+   struct device *dev, ioasid_t id);
+   void (*detach_dev_pasid)(struct iommu_domain *domain,
+struct device *dev, ioasid_t id);


ID should be pasid for consistency


Sure.




+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+   struct iommu_group *group;
+   int ret = -EINVAL;
+   void *curr;
+
+   if (!domain->ops->attach_dev_pasid)
+   return -EINVAL;
+
+   group = iommu_group_get(dev);
+   if (!group)
+   return -ENODEV;
+
+   mutex_lock(&group->mutex);
+   /*
+    * To keep things simple, we currently don't support IOMMU groups
+    * with more than one device. Existing SVA-capable systems are not
+    * affected by the problems that required IOMMU groups (lack of ACS
+    * isolation, device ID aliasing and other hardware issues).
+    */
+   if (!iommu_group_singleton_lockdown(group))
+   goto out_unlock;
+
+   xa_lock(&group->pasid_array);
+   curr = __xa_cmpxchg(&group->pasid_array, pasid, NULL,
+   domain, GFP_KERNEL);
+   xa_unlock(&group->pasid_array);


Why the xa_lock/unlock? Just call the normal xa_cmpxchg?


I should use xa_cmpxchg() instead.
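
A minimal sketch of what that collapses to; xa_cmpxchg() takes the
xarray lock internally, and the error handling shown is one plausible
shape given the existing out_unlock label:

	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
	if (curr) {
		/* a non-NULL entry means busy; an xa_err() encodes a real error */
		ret = xa_err(curr) ? : -EBUSY;
		goto out_unlock;
	}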





+void iommu_detach_device_pasid(struct iommu_domain *domain,
+  struct device *dev, ioasid_t pasid)
+{
+   struct iommu_group *group;
+
+   group = iommu_group_get(dev);
+   if (WARN_ON(!group))
+   return;


This group_get stuff really needs some cleaning, this makes no sense
at all.

If the kref to group can go to zero within this function then the
initial access of the kref is already buggy:

if (group)
kobject_get(group->devices_kobj);

Because it will crash or WARN_ON.

We don't hit this because it is required that a group cannot be
destroyed while a struct device has a driver bound, and all these
paths are driver bound paths.

So none of these group_get/put patterns have any purpose at all, and
now we are adding impossible WARN_ONs to..


The original intention of this check was to ensure the helper is called
on the correct device. I agree that the WARN_ON() is unnecessary
because a NULL pointer dereference would be caught anyway.




+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+   struct iommu_domain *domain;
+   struct iommu_group *group;
+
+   group = iommu_group_get(dev);
+   if (!group)
+   return NULL;


And now we are doing useless things on a performance path!


Agreed.




+   mutex_lock(&group->mutex);
+   domain = xa_load(&group->pasid_array, pasid);
+   if (domain && domain->type == IOMMU_DOMAIN_SVA)
+   iommu_sva_domain_get_user(domain);
+   mutex_unlock(&group->mutex);
+   iommu_group_put(group);


Why do we need so much locking on a performance path? RCU out of the
xarray..

Not sure I see how this get_user refcounting can work ?


I should move the refcounting into iommu_domain and make the change
easier to review. Will improve this in the new version.
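
On the fast path side, the direction suggested above might look like the
sketch below; it assumes xa_load() on the pasid_array is done under RCU
and that any refcounting moves onto the domain itself, so the group
mutex and the group get/put drop out:

	struct iommu_domain *domain;

	rcu_read_lock();
	domain = xa_load(&group->pasid_array, pasid);
	/* a refcount-get on the domain, if needed, would go here,
	 * still under the RCU read lock */
	rcu_read_unlock();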



Jason


Best regards,
baolu

Re: [PATCH RFC v2 03/11] iommu/sva: Add iommu_domain type for SVA

2022-04-04 Thread Lu Baolu

Hi Jason and Kevin,

On 2022/4/3 7:32, Jason Gunthorpe wrote:

On Sat, Apr 02, 2022 at 08:43:16AM +0000, Tian, Kevin wrote:


This assumes any domain is interchangeable with any device, which is
not the iommu model. We need a domain op to check if a device is
compatible with the domain for vfio and iommufd; this should do the
same.


This suggests that mm_struct needs to include the format information
of the CPU page table so the format can be checked by the domain op?


No, Linux does not support multiple formats for CPU page tables,
AFAICT, and creating the SVA domain in the first place should check
this.


It means each mm can have a list of domains associated with it and a
new domain is auto-created if the device doesn't work with any of the
existing domains.


mm has only one page table and one format. If a device is incompatible
with an existing domain wrapping that page table, how come creating
another domain could make it compatible?


Because domains wrap more than just the IOPTE format, they have
additional data related to the IOMMU HW block itself. Imagine an SoC
with two IOMMU HW blocks that can both process the CPU IOPTE format,
but have different configurations.

So if device A uses IOMMU A it needs an iommu_domain from driver A, and
the same for another device B, even if both iommu_domains are thin
wrappers around the same mm_struct.


How about the data structure design below?

- [New] struct iommu_sva_ioas
 Represents the I/O address space shared with an application CPU address
 space. This structure has a 1:1 relationship with an mm_struct. It
 grabs a "mm->mm_count" refcount during creation and drops it on release.

struct iommu_sva_ioas {
struct mm_struct *mm;
ioasid_t pasid;

/* Counter of domains attached to this ioas. */
refcount_t users;

/* All bindings are linked here. */
struct list_head bonds;
};

- [Enhance existing] struct iommu_domain (IOMMU_DOMAIN_SVA type)
 Represents a hardware pagetable that the IOMMU hardware could use for
 SVA translation. Multiple iommu domains could be bound to an SVA ioas,
 each grabbing a refcount on the ioas, so that the ioas can only be
 freed after all domains have been unbound.

@@ -95,6 +101,7 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+   struct iommu_sva_ioas *sva_ioas;
 };


- [Enhance existing] struct iommu_sva
  Represents a bond relationship between an SVA ioas and an iommu domain.
  If a bond already exists, it is reused and a reference is taken.

/**
 * struct iommu_sva - handle to a device-mm bond
 */
struct iommu_sva {
struct device   *dev;
struct iommu_sva_ioas   *sva_ioas;
struct iommu_domain *domain;
/* Link to sva ioas's bonds list */
struct list_head        node;
refcount_t  users;
};
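
The lifetime rules implied by the proposal, with illustrative (assumed)
helper names:

/*
 * bind:    ioas = iommu_sva_ioas_get(mm);         // 1:1 with mm, mmgrab()
 *          domain = iommu_sva_domain_alloc(dev);  // refcount_inc(&ioas->users)
 *          the bond is linked on ioas->bonds, or reused if it already exists
 *
 * unbind:  dropping the last bond unbinds the domain, which in turn drops
 *          ioas->users; freeing the ioas does mmdrop(ioas->mm)
 */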

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu