RE: [PATCH v2 4/4] iommu/vt-d: Size Page Request Queue to avoid overflow condition

2022-04-21 Thread Tian, Kevin
> From: Lu Baolu
> Sent: Thursday, April 21, 2022 7:36 PM
> 
> PRQ overflow may cause I/O throughput congestion, resulting in unnecessary
> degradation of I/O performance. Appropriately increasing the length of PRQ
> can greatly reduce the occurrence of PRQ overflow. The maximum number of
> page requests that can be generated in parallel by a PCIe device is
> statically defined in the Outstanding Page Request Capacity field of the
> PCIe ATS configuration space.
> 
> The new length of PRQ is calculated by summing up the value of the
> Outstanding Page Request Capacity register across all devices where Page
> Requests are supported on the real PR-capable platform (Intel Sapphire
> Rapids). The result is rounded up to the nearest power of 2.
> 
> The PRQ length is also doubled because the VT-d IOMMU driver only updates
> the Page Request Queue Head Register (PQH_REG) after processing the
> entire queue.
> 
> Signed-off-by: Lu Baolu 

Reviewed-by: Kevin Tian 

> ---
>  include/linux/intel-svm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> index b3b125b332aa..207ef06ba3e1 100644
> --- a/include/linux/intel-svm.h
> +++ b/include/linux/intel-svm.h
> @@ -9,7 +9,7 @@
>  #define __INTEL_SVM_H__
> 
>  /* Page Request Queue depth */
> -#define PRQ_ORDER	2
> +#define PRQ_ORDER	4
>  #define PRQ_RING_MASK	((0x1000 << PRQ_ORDER) - 0x20)
>  #define PRQ_DEPTH	((0x1000 << PRQ_ORDER) >> 5)
> 
> --
> 2.25.1
> 
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
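
For reference, the arithmetic behind these macros (simple back-of-the-envelope
math, not part of the patch): each page request descriptor is 32 bytes (0x20)
and the queue occupies (0x1000 << PRQ_ORDER) bytes, so with the new order:

	/* PRQ_ORDER = 4 */
	queue size    = 0x1000 << 4 = 64 KiB
	PRQ_DEPTH     = 65536 >> 5  = 2048 outstanding page requests
	PRQ_RING_MASK = 65536 - 32  = offset of the last 32-byte slot

i.e. a 64 KiB ring holding up to 2048 requests, double the rounded-up sum of
the per-device Outstanding Page Request Capacity values described above.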


RE: [PATCH v2 3/4] iommu/vt-d: Drop stop marker messages

2022-04-21 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Thursday, April 21, 2022 7:36 PM
> 
> The page fault handling framework in the IOMMU core explicitly states
> that it doesn't handle PCI PASID Stop Marker messages and the IOMMU
> drivers must discard them before reporting faults. This handles Stop
> Marker messages in prq_event_thread() before reporting events to the core.
> 
> The VT-d driver explicitly drains the pending page requests when a CPU
> page table (represented by a mm struct) is unbound from a PASID according
> to the procedures defined in the VT-d spec. The Stop Marker messages do
> not need a response. Hence, it is safe to drop the Stop Marker messages
> silently if any of them is found in the page request queue.
> 
> Fixes: d5b9e4bfe0d88 ("iommu/vt-d: Report prq to io-pgfault framework")
> Signed-off-by: Lu Baolu 
> Reviewed-by: Jacob Pan 

Reviewed-by: Kevin Tian 

> ---
>  drivers/iommu/intel/svm.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index c720d1be992d..0741ec165673 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -760,6 +760,10 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>   goto bad_req;
>   }
> 
> + /* Drop Stop Marker message. No need for a response. */
> + if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
> + goto prq_advance;
> +
>   if (!svm || svm->pasid != req->pasid) {
>   /*
>* It can't go away, because the driver is not permitted
> --
> 2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
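
In descriptor terms, a Stop Marker is a page request with Last Page In Group
set and neither read nor write access requested, which is exactly what the
new test encodes. Restated as a helper (an illustrative sketch using the
field names of the driver's page request descriptor, not code from the patch):

	/* A Stop Marker carries lpig == 1 with no access bits set. */
	static bool is_stop_marker(const struct page_req_dsc *req)
	{
		return req->lpig && !req->rd_req && !req->wr_req;
	}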


RE: [PATCH v2 2/4] iommu/vt-d: Set PGSNP bit in pasid table entry for SVA binding

2022-04-21 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Thursday, April 21, 2022 7:36 PM
> 
> This field makes the requests snoop processor caches irrespective of
> other attributes in the request or other fields in paging structure
> entries used to translate the request.

I think you want to first point out the fact that SVA wants cache
snooping, instead of just talking about the effect of PGSNP.

But thinking more, I wonder why PGSNP is ever required. This is
similar to the DMA API case. x86 is already cache coherent for normal
DMA (if not setting PCI no-snoop), and if the driver knows no-snoop
is incompatible with the SVA API then it should avoid triggering
no-snoop traffic for SVA usage. In this case it is pointless for the
IOMMU driver to enable force-snooping. Even if in the future certain
platforms allow no-snoop usage w/ SVA (I'm not sure how that would
work), this again should be reflected by additional SVA APIs for the
driver to explicitly manage.

force-snoop should be enabled only in the device assignment case IMHO,
orthogonal to whether vSVA is actually used.

Did I misunderstand the motivation here?

> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/svm.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 23a38763c1d1..c720d1be992d 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -391,9 +391,12 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
>   }
> 
>   /* Setup the pasid table: */
> -	sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
> -			PASID_FLAG_SUPERVISOR_MODE : 0;
> -	sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
> +	sflags = PASID_FLAG_PAGE_SNOOP;
> +	if (flags & SVM_FLAG_SUPERVISOR_MODE)
> +		sflags |= PASID_FLAG_SUPERVISOR_MODE;
> +	if (cpu_feature_enabled(X86_FEATURE_LA57))
> +		sflags |= PASID_FLAG_FL5LP;
> +
>  	spin_lock_irqsave(&iommu->lock, iflags);
>  	ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm->pasid,
>  					    FLPT_DEFAULT_DID, sflags);
> --
> 2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 1/4] iommu/vt-d: Check before setting PGSNP bit in pasid table entry

2022-04-21 Thread Tian, Kevin
> From: Lu Baolu
> Sent: Thursday, April 21, 2022 7:36 PM
> 
> The latest VT-d specification states that the PGSNP field in the pasid
> table entry should be treated as Reserved(0) for implementations not
> supporting Snoop Control (SC=0 in the Extended Capability Register).
> This adds a check before setting the field.
> 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel/pasid.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index f8d215d85695..5cb2daa2b8cb 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -625,8 +625,14 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
>   }
>   }
> 
> - if (flags & PASID_FLAG_PAGE_SNOOP)
> - pasid_set_pgsnp(pte);
> + if (flags & PASID_FLAG_PAGE_SNOOP) {
> + if (ecap_sc_support(iommu->ecap)) {
> + pasid_set_pgsnp(pte);
> + } else {
> + pasid_clear_entry(pte);
> + return -EINVAL;
> + }
> + }
> 
>   pasid_set_domain_id(pte, did);
>   pasid_set_address_width(pte, iommu->agaw);
> @@ -710,7 +716,8 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>   pasid_set_fault_enable(pte);
>   pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> 
> - if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
> + if (ecap_sc_support(iommu->ecap) &&
> + domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
>   pasid_set_pgsnp(pte);
> 

This should be rebased on top of Jason's enforce coherency series
instead of blindly setting it. No matter whether it's legacy mode
where we set SNP in the PTE, or scalable mode where we set PGSNP
in the PASID entry for the entire page table, the trigger point
should be the same, i.e. when someone calls enforce_cache_coherency().

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
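
For context, the hook Kevin refers to has roughly this shape (a sketch
loosely based on the VT-d enforce_cache_coherency() implementation that was
merged later; helper and field names are approximate):

	/* Force-snoop is enabled on demand, when a consumer such as VFIO
	 * calls enforce_cache_coherency() on the domain, rather than
	 * unconditionally at domain or PASID setup time. */
	static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
	{
		struct dmar_domain *dmar_domain = to_dmar_domain(domain);

		if (!domain_support_force_snooping(dmar_domain))
			return false;	/* no Snoop Control -> cannot enforce */

		dmar_domain->force_snooping = true;	/* SNP/PGSNP set from here on */
		return true;
	}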


Re: [PATCH 03/25] dma-direct: take dma-ranges/offsets into account in resource mapping

2022-04-21 Thread Robin Murphy

On 2022-04-21 18:35, Serge Semin wrote:
> On Thu, Apr 21, 2022 at 04:45:36PM +0200, Christoph Hellwig wrote:
>> On Wed, Apr 20, 2022 at 11:55:38AM +0300, Serge Semin wrote:
>>> On Wed, Apr 20, 2022 at 10:47:46AM +0200, Christoph Hellwig wrote:
>>>> I can't really comment on the dma-ranges exclusion for P2P mappings,
>>>> as that predates my involvement, however:
>>>
>>> My example wasn't specific to the PCIe P2P transfers, but about PCIe
>>> devices reaching some platform devices over the system interconnect
>>> bus.
>>
>> So strike PCIe, but this is our definition of Peer to Peer accesses.
>>
>>> What if I get to have a physical address of a platform device and want
>>> to have that device accessed by a PCIe peripheral device? The
>>> dma_map_resource() seemed very much suitable for that. But considering
>>> what you say it isn't.
>>
>> dma_map_resource is the right thing for that.  But the physical address
>> of MMIO ranges in the platform device should not have struct pages
>> allocated for it, and thus the other dma_map_* APIs should not work on
>> it to start with.
> 
> The problem is that the dma_map_resource() won't work for that, but
> presumably the dma_map_sg()-like methods will (after some hacking with
> the phys address, but anyway). Consider the system diagram in my
> previous email. Here is what I would do to initialize a DMA
> transaction between a platform device and a PCIe peripheral device:
> 
> 1) struct resource *rsc = platform_get_resource(plat_dev, IORESOURCE_MEM, 0);
> 
> 2) dma_addr_t dar = dma_map_resource(&pcie_dev->dev, rsc->start,
>                                      rsc->end - rsc->start + 1,
>                                      DMA_FROM_DEVICE, 0);
> 
> 3) dma_addr_t sar;
>    void *tmp = dma_alloc_coherent(&pcie_dev->dev, PAGE_SIZE, &sar, GFP_KERNEL);
>    memset(tmp, 0xaa, PAGE_SIZE);
> 
> 4) PCIe device: DMA.DAR=dar, DMA.SAR=sar. RUN.
> 
> If there is no dma-ranges specified in the PCIe Host controller
> DT-node, the PCIe peripheral devices will see the rest of the system
> memory as is (no offsets and remappings). But if there is dma-ranges
> with some specific system settings it may affect the PCIe MRd/MWr TLPs
> address translation, including the addresses targeted to the MMIO
> space. In that case the mapping performed on step 2) will return a
> wrong DMA-address, since the corresponding dma_direct_map_resource()
> just returns the passed physical address, missing the
> 'pci_dev->dma_range_map'-based mapping performed in
> translate_phys_to_dma().
> 
> Note the mapping on step 3) works correctly because it calls the
> translate_phys_to_dma() of the direct DMA interface, thus taking the
> PCIe dma-ranges into account.
> 
> To sum up, as I see it, either restricting dma_map_resource() to map
> just the intra-bus addresses was wrong, or there must be some
> additional mapping infrastructure for the denoted systems. Though I
> don't see a way the dma_map_resource() could be fixed to be suitable
> for each of the considered cases.


FWIW the current semantics of dma_map_resource() are basically just to 
insert IOMMU awareness where dmaengine drivers were previously just 
casting phys_addr_t to dma_addr_t (or u32, or whatever else they put 
into their descriptor/register/etc.) IIRC there was a bit of a question 
whether it really belonged in the DMA API at all, since it's not really 
a "DMA" operation in the conventional sense, and convenience was the 
only real deciding argument. The relevant drivers at the time were not 
taking dma_pfn_offset into account when consuming physical addresses 
directly, so the new API didn't either.


That's just how things got to where they are today. Once again, I'm not 
saying that what we have now is necessarily right, or that your change 
is necessarily wrong, I just really want to understand specifically 
*why* you need to make it, so we can evaluate the risk of possible 
breakage either way. Theoretical "if"s aren't really enough.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
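
In code terms, the "before" Robin describes versus what dma_map_resource()
added is roughly this (illustrative only; dev stands for whichever device
the dmaengine driver maps on behalf of):

	/* what dmaengine drivers used to do: no IOMMU, no offsets */
	dma_addr_t dar = (dma_addr_t)phys_addr;

	/* with dma_map_resource(): an IOMMU mapping is at least created,
	 * but dma-ranges offsets are still not applied */
	dma_addr_t dar = dma_map_resource(dev, phys_addr, size,
					  DMA_FROM_DEVICE, 0);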


Re: [PATCH 12/13] iommu/virtio: Clean up bus_set_iommu()

2022-04-21 Thread Robin Murphy

On 2022-04-21 18:12, Jean-Philippe Brucker wrote:
> On Thu, Apr 14, 2022 at 01:42:41PM +0100, Robin Murphy wrote:
>> Stop calling bus_set_iommu() since it's now unnecessary, and simplify
>> the probe failure path accordingly.
>>
>> Signed-off-by: Robin Murphy 
>> ---
>>  drivers/iommu/virtio-iommu.c | 24 ------------------------
>>  1 file changed, 24 deletions(-)
>>
>> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
>> index 25be4b822aa0..371f8657c0ce 100644
>> --- a/drivers/iommu/virtio-iommu.c
>> +++ b/drivers/iommu/virtio-iommu.c
>> @@ -7,7 +7,6 @@
>>  
>>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>>  
>> -#include <linux/amba/bus.h>
>>  #include <linux/delay.h>
>>  #include <linux/dma-iommu.h>
>>  #include <linux/freezer.h>
> 
> <linux/platform_device.h> isn't needed anymore either. In any case it
> looks great, thanks

Ha, it totally passed me by that this one *isn't* a platform driver, derp :)

> Reviewed-by: Jean-Philippe Brucker 
> 
> and tested on QEMU (so only PCI for now)

Thanks!

Robin.

>> @@ -1146,26 +1145,6 @@ static int viommu_probe(struct virtio_device *vdev)
>>  
>>  	iommu_device_register(&viommu->iommu, &viommu_ops, parent_dev);
>>  
>> -#ifdef CONFIG_PCI
>> -	if (pci_bus_type.iommu_ops != &viommu_ops) {
>> -		ret = bus_set_iommu(&pci_bus_type, &viommu_ops);
>> -		if (ret)
>> -			goto err_unregister;
>> -	}
>> -#endif
>> -#ifdef CONFIG_ARM_AMBA
>> -	if (amba_bustype.iommu_ops != &viommu_ops) {
>> -		ret = bus_set_iommu(&amba_bustype, &viommu_ops);
>> -		if (ret)
>> -			goto err_unregister;
>> -	}
>> -#endif
>> -	if (platform_bus_type.iommu_ops != &viommu_ops) {
>> -		ret = bus_set_iommu(&platform_bus_type, &viommu_ops);
>> -		if (ret)
>> -			goto err_unregister;
>> -	}
>> -
>> 	vdev->priv = viommu;
>>  
>>  	dev_info(dev, "input address: %u bits\n",
>> @@ -1174,9 +1153,6 @@ static int viommu_probe(struct virtio_device *vdev)
>>  
>>  	return 0;
>>  
>> -err_unregister:
>> -	iommu_device_sysfs_remove(&viommu->iommu);
>> -	iommu_device_unregister(&viommu->iommu);
>>  err_free_vqs:
>> 	vdev->config->del_vqs(vdev);
>>  
>> --
>> 2.28.0.dirty


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 03/25] dma-direct: take dma-ranges/offsets into account in resource mapping

2022-04-21 Thread Serge Semin
On Thu, Apr 21, 2022 at 04:45:36PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 20, 2022 at 11:55:38AM +0300, Serge Semin wrote:
> > On Wed, Apr 20, 2022 at 10:47:46AM +0200, Christoph Hellwig wrote:
> > > I can't really comment on the dma-ranges exclusion for P2P mappings,
> > > as that predates my involvement, however:
> > 
> > My example wasn't specific to the PCIe P2P transfers, but about PCIe
> > devices reaching some platform devices over the system interconnect
> > bus.
> 
> So strike PCIe, but this is our definition of Peer to Peer accesses.
> 
> > What if I get to have a physical address of a platform device and want
> > to have that device accessed by a PCIe peripheral device? The
> > dma_map_resource() seemed very much suitable for that. But considering
> > what you say it isn't.
> 

> dma_map_resource is the right thing for that.  But the physical address
> of MMIO ranges in the platform device should not have struct pages
> allocated for it, and thus the other dma_map_* APIs should not work on
> it to start with.

The problem is that the dma_map_resource() won't work for that, but
presumably the dma_map_sg()-like methods will (after some hacking with
the phys address, but anyway). Consider the system diagram in my
previous email. Here is what I would do to initialize a DMA
transaction between a platform device and a PCIe peripheral device:

1) struct resource *rsc = platform_get_resource(plat_dev, IORESOURCE_MEM, 0);

2) dma_addr_t dar = dma_map_resource(&pcie_dev->dev, rsc->start,
                                     rsc->end - rsc->start + 1,
                                     DMA_FROM_DEVICE, 0);

3) dma_addr_t sar;
   void *tmp = dma_alloc_coherent(&pcie_dev->dev, PAGE_SIZE, &sar, GFP_KERNEL);
   memset(tmp, 0xaa, PAGE_SIZE);

4) PCIe device: DMA.DAR=dar, DMA.SAR=sar. RUN.

If there is no dma-ranges specified in the PCIe Host controller
DT-node, the PCIe peripheral devices will see the rest of the system
memory as is (no offsets and remappings). But if there is dma-ranges
with some specific system settings it may affect the PCIe MRd/MWr TLPs
address translation including the addresses targeted to the MMIO
space. In that case the mapping performed on step 2) will return a
wrong DMA-address, since the corresponding dma_direct_map_resource()
just returns the passed physical address, missing the
'pci_dev->dma_range_map'-based mapping performed in
translate_phys_to_dma().

Note the mapping on step 3) works correctly because it calls the
translate_phys_to_dma() of the direct DMA interface, thus taking the
PCIe dma-ranges into account.

To sum up, as I see it, either restricting dma_map_resource() to map
just the intra-bus addresses was wrong, or there must be some
additional mapping infrastructure for the denoted systems. Though I
don't see a way the dma_map_resource() could be fixed to be suitable
for each of the considered cases.

-Sergey
   

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
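
The fix being argued for is essentially to run resource mappings through the
same dma-ranges translation as everything else. A minimal sketch of what that
change to dma_direct_map_resource() amounts to (simplified; assuming the
translate_phys_to_dma() helper named above, not the code actually merged):

	static dma_addr_t dma_direct_map_resource(struct device *dev,
						  phys_addr_t paddr, size_t size,
						  enum dma_data_direction dir,
						  unsigned long attrs)
	{
		/* apply the device's dma-ranges offset instead of passing the
		 * physical address through untranslated */
		dma_addr_t dma_addr = translate_phys_to_dma(dev, paddr);

		if (!dma_capable(dev, dma_addr, size, false))
			return DMA_MAPPING_ERROR;

		return dma_addr;
	}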


Re: [PATCH 12/13] iommu/virtio: Clean up bus_set_iommu()

2022-04-21 Thread Jean-Philippe Brucker
On Thu, Apr 14, 2022 at 01:42:41PM +0100, Robin Murphy wrote:
> Stop calling bus_set_iommu() since it's now unnecessary, and simplify
> the probe failure path accordingly.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/virtio-iommu.c | 24 
>  1 file changed, 24 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 25be4b822aa0..371f8657c0ce 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -7,7 +7,6 @@
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
> -#include <linux/amba/bus.h>
>  #include <linux/delay.h>
>  #include <linux/dma-iommu.h>
>  #include <linux/freezer.h>

<linux/platform_device.h> isn't needed anymore either. In any case it
looks great, thanks

Reviewed-by: Jean-Philippe Brucker 

and tested on QEMU (so only PCI for now)


> @@ -1146,26 +1145,6 @@ static int viommu_probe(struct virtio_device *vdev)
>  
>   iommu_device_register(&viommu->iommu, &viommu_ops, parent_dev);
>  
> -#ifdef CONFIG_PCI
> -	if (pci_bus_type.iommu_ops != &viommu_ops) {
> -		ret = bus_set_iommu(&pci_bus_type, &viommu_ops);
> -		if (ret)
> -			goto err_unregister;
> -	}
> -#endif
> -#ifdef CONFIG_ARM_AMBA
> -	if (amba_bustype.iommu_ops != &viommu_ops) {
> -		ret = bus_set_iommu(&amba_bustype, &viommu_ops);
> -		if (ret)
> -			goto err_unregister;
> -	}
> -#endif
> -	if (platform_bus_type.iommu_ops != &viommu_ops) {
> -		ret = bus_set_iommu(&platform_bus_type, &viommu_ops);
> -		if (ret)
> -			goto err_unregister;
> -	}
> -
>   vdev->priv = viommu;
>  
>   dev_info(dev, "input address: %u bits\n",
> @@ -1174,9 +1153,6 @@ static int viommu_probe(struct virtio_device *vdev)
>  
>   return 0;
>  
> -err_unregister:
> -	iommu_device_sysfs_remove(&viommu->iommu);
> -	iommu_device_unregister(&viommu->iommu);
>  err_free_vqs:
>   vdev->config->del_vqs(vdev);
>  
> -- 
> 2.28.0.dirty
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [Patch v2] iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu

2022-04-21 Thread Krishna Reddy via iommu
> Tegra194 and Tegra234 SoCs have the erratum that causes walk cache entries
> to not be invalidated correctly. The problem is that the walk cache index
> generated for IOVA is not the same across translation and invalidation
> requests. This is leading to page faults when the PMD entry is released
> during unmap and populated with a new PTE table during a subsequent map
> request. Disabling large page mappings avoids the release of the PMD entry
> and avoids translations seeing a stale PMD entry in the walk cache.
> Fix this by limiting the page mappings to PAGE_SIZE for Tegra194 and
> Tegra234 devices. This is the recommended fix from the Tegra hardware
> design team.
> 
> Co-developed-by: Pritesh Raithatha 
> Signed-off-by: Pritesh Raithatha 
> Signed-off-by: Ashish Mhetre 
> ---
> Changes in v2:
> - Using init_context() to override pgsize_bitmap instead of new function
> 
>  drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
> index 01e9b50b10a1..87bf522b9d2e 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
> @@ -258,6 +258,34 @@ static void nvidia_smmu_probe_finalize(struct arm_smmu_device *smmu, struct devi
>   dev_name(dev), err);
>  }
> 
> +static int nvidia_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> +				    struct io_pgtable_cfg *pgtbl_cfg,
> +				    struct device *dev)
> +{
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
> + const struct device_node *np = smmu->dev->of_node;
> +
> +	/*
> +	 * Tegra194 and Tegra234 SoCs have the erratum that causes walk cache
> +	 * entries to not be invalidated correctly. The problem is that the walk
> +	 * cache index generated for IOVA is not same across translation and
> +	 * invalidation requests. This is leading to page faults when PMD entry
> +	 * is released during unmap and populated with new PTE table during
> +	 * subsequent map request. Disabling large page mappings avoids the
> +	 * release of PMD entry and avoid translations seeing stale PMD entry
> +	 * in walk cache.
> +	 * Fix this by limiting the page mappings to PAGE_SIZE on Tegra194 and
> +	 * Tegra234.
> +	 */
> + if (of_device_is_compatible(np, "nvidia,tegra234-smmu") ||
> + of_device_is_compatible(np, "nvidia,tegra194-smmu")) {
> + smmu->pgsize_bitmap = PAGE_SIZE;
> + pgtbl_cfg->pgsize_bitmap = smmu->pgsize_bitmap;
> + }
> +
> + return 0;
> +}
> +
>  static const struct arm_smmu_impl nvidia_smmu_impl = {
>   .read_reg = nvidia_smmu_read_reg,
>   .write_reg = nvidia_smmu_write_reg,
> @@ -268,10 +296,12 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
>   .global_fault = nvidia_smmu_global_fault,
>   .context_fault = nvidia_smmu_context_fault,
>   .probe_finalize = nvidia_smmu_probe_finalize,
> + .init_context = nvidia_smmu_init_context,
>  };
> 
>  static const struct arm_smmu_impl nvidia_smmu_single_impl = {
>   .probe_finalize = nvidia_smmu_probe_finalize,
> + .init_context = nvidia_smmu_init_context,
>  };
> 

Reviewed-by: Krishna Reddy 

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
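
Why clamping pgsize_bitmap works: the IOMMU core picks the largest page size
from the bitmap that fits an aligned request, so with only PAGE_SIZE set it
can never install a block (PMD-sized) entry, and the next-level table is
never freed on unmap. A minimal sketch of that selection, simplified from
the core's iommu_pgsize() (not the exact kernel code):

	static size_t pick_pgsize(unsigned long pgsize_bitmap, unsigned long iova,
				  phys_addr_t paddr, size_t size)
	{
		unsigned long addr_merge = iova | paddr;
		int bit;

		/* Try candidate sizes from largest to smallest; PAGE_SIZE always
		 * matches a page-aligned request, so when pgsize_bitmap ==
		 * PAGE_SIZE it is the only possible outcome. */
		for (bit = __fls(pgsize_bitmap); bit >= 0; bit--) {
			size_t pgsize = 1UL << bit;

			if (!(pgsize_bitmap & pgsize))
				continue;
			if (pgsize <= size && !(addr_merge & (pgsize - 1)))
				return pgsize;
		}

		return 0;	/* unreachable for sane, page-aligned inputs */
	}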


Re: [PATCH v10 0/9] ACPI/IORT: Support for IORT RMR node

2022-04-21 Thread Robin Murphy

On 2022-04-21 15:43, Shameerali Kolothum Thodi wrote:
>> -Original Message-
>> From: Steven Price [mailto:steven.pr...@arm.com]
>> Sent: 21 April 2022 13:59
>> To: Shameerali Kolothum Thodi ;
>> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
>> iommu@lists.linux-foundation.org
>> Cc: Linuxarm ; lorenzo.pieral...@arm.com;
>> j...@8bytes.org; robin.mur...@arm.com; w...@kernel.org; wanghuiqiang
>> ; Guohanjun (Hanjun Guo)
>> ; sami.muja...@arm.com; j...@solid-run.com;
>> eric.au...@redhat.com; laurentiu.tu...@nxp.com; h...@infradead.org
>> Subject: Re: [PATCH v10 0/9] ACPI/IORT: Support for IORT RMR node
>>
>> On 20/04/2022 17:48, Shameer Kolothum wrote:
>>> Hi
>>>
>>> v9 --> v10
>>>  - Dropped patch #1 ("Add temporary RMR node flag definitions") since
>>>    the ACPICA header updates patch is now in the mailing list[1]
>>>  - Based on the suggestion from Christoph, introduced a
>>>    resv_region_free_fw_data() callback in struct iommu_resv_region and
>>>    used that to free RMR specific memory allocations.
>>>
>>> Though there is a small change from v9 with respect to how we free up
>>> the FW specific data, I have taken the liberty to pick up the R-by and
>>> T-by tags from Lorenzo, Steve and Laurentiu. But please do take a look
>>> again and let me know.
>>
>> I've given this a go and it works fine on my Juno setup. So do keep my
>> T-by tag.
> 
> Many thanks for that.
> 
>> Sami has been kind enough to give me an updated firmware which also
>> fixes the RMR node in the IORT. Although as mentioned before the details
>> of the RMR node are currently being ignored so this doesn't change the
>> functionality but silences the warning.

Strictly they're not ignored, you just won't be getting past the point
where they stop being ignored. It'll appear to work because
arm_smmu_rmr_install_bypass_smr() just bypasses the whole stream until
the actual device turns up to join up to the StreamID, and the "real"
processing of RMRs happens via iommu_create_device_direct_mappings() -
if there's no actual HDLCD device described in the DSDT, that will never
happen, and even if there is, chances are that things will currently
happen in the wrong order such that we'd end up waiting to replay
iommu_probe_device() from acpi_iommu_configure_id() once a driver binds,
and *that* definitely can't happen without teaching the HDLCD driver
about ACPI.

>> My concern is that with the RMR region effectively ignored we may see
>> more broken firmware, and while a length of zero produces a warning, an
>> otherwise incorrect length will currently "silently work" but mean that
>> any future tightening would cause problems. For example if the SMMU
>> driver were to recreate the mappings to only cover the region specified
>> in the RMR it may not be large enough if the RMR base/length are not
>> correct.
> 
> Not sure how we can further validate the RMR if the firmware provides an
> incorrect one. I see your point of future tightening causing problems
> with broken firmware. But then it is indeed a "broken firmware"...
> 
>> It's up to the maintainers as to whether they see this as a
>> problem or not.
> 
> Hi Robin,
> 
> Any thoughts on this?

In general we can't second-guess firmware. Even a zero-length RMR should
have ample opportunity to blow up outside this one corner case where
Linux never gets to associate the StreamID with a corresponding device.

Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 4/9] ACPI/IORT: Add support to retrieve IORT RMR reserved regions

2022-04-21 Thread Shameerali Kolothum Thodi via iommu



> -Original Message-
> From: Christoph Hellwig [mailto:h...@infradead.org]
> Sent: 21 April 2022 07:49
> To: Shameerali Kolothum Thodi 
> Cc: linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> iommu@lists.linux-foundation.org; Linuxarm ;
> lorenzo.pieral...@arm.com; j...@8bytes.org; robin.mur...@arm.com;
> w...@kernel.org; wanghuiqiang ; Guohanjun
> (Hanjun Guo) ; steven.pr...@arm.com;
> sami.muja...@arm.com; j...@solid-run.com; eric.au...@redhat.com;
> laurentiu.tu...@nxp.com; h...@infradead.org
> Subject: Re: [PATCH v10 4/9] ACPI/IORT: Add support to retrieve IORT RMR
> reserved regions
> 
[...]

> >  void generic_iommu_put_resv_regions(struct device *dev, struct list_head *list)
> >  {
> > struct iommu_resv_region *entry, *next;
> >
> > -   list_for_each_entry_safe(entry, next, list, list)
> > +   list_for_each_entry_safe(entry, next, list, list) {
> > +   if (entry->resv_region_free_fw_data)
> > +   entry->resv_region_free_fw_data(dev, entry);
> > kfree(entry);
> 
> I'd move the kfree to the free callback if present.  This would also
> allow to hide the union from the common code entirely and use a
> container structure like:
> 
> struct iommu_iort_rmr_data {
>   struct iommu_resv_region rr;
> 
>   /* Stream IDs associated with IORT RMR entry */
>   const u32 *sids;
>   u32 num_sids;
> };

Ok. I will respin soon with the above changes.

Thanks,
Shameer 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
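
The shape this suggestion takes in code: the region's free callback recovers
the container via container_of() and frees everything, so generic code never
touches the IORT-specific fields. A minimal sketch under that assumption
(hypothetical function name, not taken from the series):

	static void iort_rmr_free(struct device *dev,
				  struct iommu_resv_region *region)
	{
		struct iommu_iort_rmr_data *rmr_data =
			container_of(region, struct iommu_iort_rmr_data, rr);

		kfree(rmr_data->sids);	/* stream ID array owned by this entry */
		kfree(rmr_data);	/* frees the embedded resv region too */
	}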


Re: [PATCH 03/25] dma-direct: take dma-ranges/offsets into account in resource mapping

2022-04-21 Thread Christoph Hellwig
On Wed, Apr 20, 2022 at 11:55:38AM +0300, Serge Semin wrote:
> On Wed, Apr 20, 2022 at 10:47:46AM +0200, Christoph Hellwig wrote:
> > I can't really comment on the dma-ranges exclusion for P2P mappings,
> > as that predates my involvement, however:
> 
> My example wasn't specific to the PCIe P2P transfers, but about PCIe
> devices reaching some platform devices over the system interconnect
> bus.

So strike PCIe, but this is our definition of Peer to Peer accesses.

> What if I get to have a physical address of a platform device and want
> to have that device accessed by a PCIe peripheral device? The
> dma_map_resource() seemed very much suitable for that. But considering
> what you say it isn't.

dma_map_resource is the right thing for that.  But the physical address
of MMIO ranges in the platform device should not have struct pages
allocated for it, and thus the other dma_map_* APIs should not work on
it to start with.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v2 1/2] iommu/amd: Enable swiotlb in all cases

2022-04-21 Thread Limonciello, Mario via iommu
[AMD Official Use Only]

> On Wed, Apr 06, 2022 at 05:04:52PM +, Limonciello, Mario wrote:
> > Considering before this fix effectively swiotlb was turned off on most AMD
> > systems, when this is picked up I think y'all should consider to add a:
> >
> > Cc: sta...@vger.kernel.org # 5.11+
> 
> Agreed.  I think this is for Joerg to pick up, and I'd love to see it
> picked up soon as I'll have to rebase my
> 
> "cleanup swiotlb initialization" series on top of it.

Joerg,

Just want to double check this wasn't lost over the last few weeks. I was
just checking iommu.git/next and still didn't see it there.

Thanks,
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v10 0/9] ACPI/IORT: Support for IORT RMR node

2022-04-21 Thread Shameerali Kolothum Thodi via iommu



> -Original Message-
> From: Steven Price [mailto:steven.pr...@arm.com]
> Sent: 21 April 2022 13:59
> To: Shameerali Kolothum Thodi ;
> linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org;
> iommu@lists.linux-foundation.org
> Cc: Linuxarm ; lorenzo.pieral...@arm.com;
> j...@8bytes.org; robin.mur...@arm.com; w...@kernel.org; wanghuiqiang
> ; Guohanjun (Hanjun Guo)
> ; sami.muja...@arm.com; j...@solid-run.com;
> eric.au...@redhat.com; laurentiu.tu...@nxp.com; h...@infradead.org
> Subject: Re: [PATCH v10 0/9] ACPI/IORT: Support for IORT RMR node
> 
> On 20/04/2022 17:48, Shameer Kolothum wrote:
> > Hi
> >
> > v9 --> v10
> >  - Dropped patch #1 ("Add temporary RMR node flag definitions") since
> >the ACPICA header updates patch is now in the mailing list[1]
> >  - Based on the suggestion from Christoph, introduced a
> >resv_region_free_fw_data() callback in struct iommu_resv_region and
> >used that to free RMR specific memory allocations.
> >
> > Though there is a small change from v9 with respect to how we free up
> > the FW specific data, I have taken the liberty to pick up the R-by and
> > T-by tags from Lorenzo, Steve and Laurentiu. But please do take a look
> > again and let me know.
> 
> I've given this a go and it works fine on my Juno setup. So do keep my
> T-by tag.

Many thanks for that.

> Sami has been kind enough to give me an updated firmware which also
> fixes the RMR node in the IORT. Although as mentioned before the details
> of the RMR node are currently being ignored so this doesn't change the
> functionality but silences the warning.
> 
> My concern is that with the RMR region effectively ignored we may see
> more broken firmware, and while a length of zero produces a warning, an
> otherwise incorrect length will currently "silently work" but mean that
> any future tightening would cause problems. For example if the SMMU
> driver were to recreate the mappings to only cover the region specified
> in the RMR it may not be large enough if the RMR base/length are not
> correct.

Not sure how we can further validate the RMR if the firmware provides an
incorrect one. I see your point of future tightening causing problems
with broken firmware. But then it is indeed a "broken firmware"...


> It's up to the maintainers as to whether they see this as a
> problem or not.

Hi Robin,

Any thoughts on this?

Thanks,
Shameer

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/3] More ARM DMA ops cleanup

2022-04-21 Thread Robin Murphy

On 2022-04-21 15:13, Christoph Hellwig wrote:
> On Thu, Apr 21, 2022 at 12:36:56PM +0100, Robin Murphy wrote:
>> Hi all,
>>
>> Thanks to Christoph's latest series, I'm reminded that, if we're going
>> to give the ARM DMA ops some cleanup this cycle, it's as good a time as
>> any to dust off these old patches and add them on top as well. I've
>> based these on the arm-dma-direct branch which I assume matches the
>> patches posted at [1].
> 
> All these do look sensible to me.  But weren't you working on replacing
> the ARM iommu dma_ops with dma-iommu anyway?

Yes, that's somewhat entangled with the IOMMU bus ops stuff, so I'll
probably get to the point of having to revisit it in a couple of months
or so. These patches are off the bottom of that stack from my first
attempt, where the aim was to make the current ops the same shape first
so that the switch is then easier to reason about (particularly in terms
of sounding out any issues with the hooking up of dev->dma_coherent,
although your series will now be taking most of the load off there).

Cheers,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 0/3] More ARM DMA ops cleanup

2022-04-21 Thread Christoph Hellwig
On Thu, Apr 21, 2022 at 12:36:56PM +0100, Robin Murphy wrote:
> Hi all,
> 
> Thanks to Christoph's latest series, I'm reminded that, if we're going
> to give the ARM DMA ops some cleanup this cycle, it's as good a time as
> any to dust off these old patches and add them on top as well. I've
> based these on the arm-dma-direct branch which I assume matches the
> patches posted at [1].

All these do look sensible to me.  But weren't you working on replacing
the ARM iommu dma_ops with dma-iommu anyway?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 6/7] ARM: use the common dma_to_phys/phys_to_dma implementation where possible

2022-04-21 Thread Christoph Hellwig
On Thu, Apr 21, 2022 at 10:05:11AM +0200, Arnd Bergmann wrote:
> > -unsigned long __pfn_to_bus(unsigned long pfn)
> > +#else
> > +static inline unsigned long fb_bus_sdram_offset(void)
> >  {
> > -   return __pfn_to_phys(pfn) + (fb_bus_sdram_offset() - PHYS_OFFSET);
> > +   return BUS_OFFSET;
> >  }
> > -EXPORT_SYMBOL(__pfn_to_bus);
> > +#endif /* CONFIG_FOOTBRIDGE_ADDIN */
> 
> I have an older patch to remove CONFIG_FOOTBRIDGE_ADDIN
> completely, as it does a couple of other nasty things and there are
> apparently no users. Would that help here?

For this series it doesn't really make much of a difference.  The addin
case actually is simpler than the host mode for DMA.  But overall
CONFIG_FOOTBRIDGE_ADDIN seems to be a collection of special cases, so
if it is unused and can be removed that would probably be a good thing.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/7] ARM: remove the unused virt_to_dma helper

2022-04-21 Thread Christoph Hellwig
On Thu, Apr 21, 2022 at 10:00:55AM +0200, Arnd Bergmann wrote:
> I think __virt_to_bus() is now unused as well and could be removed
> in the same step.

Yes.

> It looks like __bus_to_virt() is still used in the ISA DMA API, but
> as that is only used on footbridge and rpc, the generic version of
> that could be moved into rpc (footbridge already has a custom
> version).

That sounds like a useful cleanup, but isn't really in scope for
this series.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 0/9] ACPI/IORT: Support for IORT RMR node

2022-04-21 Thread Steven Price
On 20/04/2022 17:48, Shameer Kolothum wrote:
> Hi
> 
> v9 --> v10
>  - Dropped patch #1 ("Add temporary RMR node flag definitions") since
>the ACPICA header updates patch is now in the mailing list[1]
>  - Based on the suggestion from Christoph, introduced a 
>resv_region_free_fw_data() callback in struct iommu_resv_region and
>used that to free RMR specific memory allocations.
> 
> Though there is a small change from v9 with respect to how we free up
> the FW specific data, I have taken the liberty to pick up the R-by and
> T-by tags from Lorenzo, Steve and Laurentiu. But please do take a look
> again and let me know.

I've given this a go and it works fine on my Juno setup. So do keep my
T-by tag.

Sami has been kind enough to give me an updated firmware which also
fixes the RMR node in the IORT. Although as mentioned before the details
of the RMR node are currently being ignored so this doesn't change the
functionality but silences the warning.

My concern is that with the RMR region effectively ignored we may see
more broken firmware, and while a length of zero produces a warning, an
otherwise incorrect length will currently "silently work" but mean that
any future tightening would cause problems. For example if the SMMU
driver were to recreate the mappings to only cover the region specified
in the RMR it may not be large enough if the RMR base/length are not
correct. It's up to the maintainers as to whether they see this as a
problem or not.

Thanks,

Steve

> Thanks,
> Shameer
> [1] https://lore.kernel.org/all/44610361.fMDQidcC6G@kreacher/
> 
> From old:
> We have faced issues with 3408iMR RAID controller cards which
> fail to boot when SMMU is enabled. This is because these
> controllers make use of host memory for various caching related
> purposes and when SMMU is enabled the iMR firmware fails to
> access these memory regions as there is no mapping for them.
> IORT RMR provides a way for UEFI to describe and report these
> memory regions so that the kernel can make a unity mapping for
> these in SMMU.
> 
> Change History:
> 
> v8 --> v9
>  - Adressed comments from Robin on interfaces.
>  - Addressed comments from Lorenzo.
> 
> v7 --> v8
>   - Patch #1 has temp definitions for RMR related changes till
>     the ACPICA header changes are part of kernel.
>   - No early parsing of RMR node info and is only parsed at the
>     time of use.
>   - Changes to the RMR get/put API format compared to the
>     previous version.
>   - Support for RMR descriptor shared by multiple stream IDs.
> 
> v6 --> v7
>  -fix pointed out by Steve to the SMMUv2 SMR bypass install in patch #8.
> 
> v5 --> v6
> - Addressed comments from Robin & Lorenzo.
>   : Moved iort_parse_rmr() to acpi_iort_init() from
>     iort_init_platform_devices().
>   : Removed use of struct iort_rmr_entry during the initial
>     parse. Using struct iommu_resv_region instead.
>   : Report RMR address alignment and overlap errors, but continue.
>   : Reworked arm_smmu_init_bypass_stes() (patch # 6).
> - Updated SMMUv2 bypass SMR code. Thanks to Jon N (patch #8).
> - Set IOMMU protection flags(IOMMU_CACHE, IOMMU_MMIO) based
>   on Type of RMR region. Suggested by Jon N.
> 
> v4 --> v5
>  -Added a fw_data union to struct iommu_resv_region and removed
>   struct iommu_rmr (Based on comments from Joerg/Robin).
>  -Added iommu_put_rmrs() to release mem.
>  -Thanks to Steve for verifying on SMMUv2, but not added the Tested-by
>   yet because of the above changes.
> 
> v3 -->v4
> -Included the SMMUv2 SMR bypass install changes suggested by
>  Steve(patch #7)
> -As per Robin's comments, RMR reserve implementation is now
>  more generic  (patch #8) and dropped v3 patches 8 and 10.
> -Rebase to 5.13-rc1
> 
> RFC v2 --> v3
>  -Dropped RFC tag as the ACPICA header changes are now ready to be
>   part of 5.13[0]. But this series still has a dependency on that patch.
>  -Added IORT E.b related changes(node flags, _DSM function 5 checks for
>   PCIe).
>  -Changed RMR to stream id mapping from M:N to M:1 as per the spec and
>   discussion here[1].
>  -Last two patches add support for SMMUv2(Thanks to Jon Nettleton!)
> 
> Jon Nettleton (1):
>   iommu/arm-smmu: Get associated RMR info and install bypass SMR
> 
> Shameer Kolothum (8):
>   iommu: Introduce a union to struct iommu_resv_region
>   ACPI/IORT: Make iort_iommu_msi_get_resv_regions() return void
>   ACPI/IORT: Provide a generic helper to retrieve reserve regions
>   ACPI/IORT: Add support to retrieve IORT RMR reserved regions
>   ACPI/IORT: Add a helper to retrieve RMR info directly
>   iommu/arm-smmu-v3: Introduce strtab init helper
>   iommu/arm-smmu-v3: Refactor arm_smmu_init_bypass_stes() to force
> bypass
>   iommu/arm-smmu-v3: Get associated RMR info and install bypass STE
> 
>  drivers/acpi/arm64/iort.c   | 335 ++--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  78 -
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   |  52 +++
>  

Re: [Patch v2] iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu

2022-04-21 Thread Robin Murphy

On 2022-04-21 09:15, Ashish Mhetre wrote:
> Tegra194 and Tegra234 SoCs have the erratum that causes walk cache
> entries to not be invalidated correctly. The problem is that the walk
> cache index generated for IOVA is not same across translation and
> invalidation requests. This is leading to page faults when PMD entry is
> released during unmap and populated with new PTE table during subsequent
> map request. Disabling large page mappings avoids the release of PMD
> entry and avoid translations seeing stale PMD entry in walk cache.
> Fix this by limiting the page mappings to PAGE_SIZE for Tegra194 and
> Tegra234 devices. This is recommended fix from Tegra hardware design
> team.

Acked-by: Robin Murphy 

> Co-developed-by: Pritesh Raithatha 
> Signed-off-by: Pritesh Raithatha 
> Signed-off-by: Ashish Mhetre 
> ---
> Changes in v2:
> - Using init_context() to override pgsize_bitmap instead of new function
> 
>  drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
> index 01e9b50b10a1..87bf522b9d2e 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
> @@ -258,6 +258,34 @@ static void nvidia_smmu_probe_finalize(struct arm_smmu_device *smmu, struct devi
> 	dev_name(dev), err);
>  }
>  
> +static int nvidia_smmu_init_context(struct arm_smmu_domain *smmu_domain,
> +				    struct io_pgtable_cfg *pgtbl_cfg,
> +				    struct device *dev)
> +{
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	const struct device_node *np = smmu->dev->of_node;
> +
> +	/*
> +	 * Tegra194 and Tegra234 SoCs have the erratum that causes walk cache
> +	 * entries to not be invalidated correctly. The problem is that the walk
> +	 * cache index generated for IOVA is not same across translation and
> +	 * invalidation requests. This is leading to page faults when PMD entry
> +	 * is released during unmap and populated with new PTE table during
> +	 * subsequent map request. Disabling large page mappings avoids the
> +	 * release of PMD entry and avoid translations seeing stale PMD entry
> +	 * in walk cache.
> +	 * Fix this by limiting the page mappings to PAGE_SIZE on Tegra194 and
> +	 * Tegra234.
> +	 */
> +	if (of_device_is_compatible(np, "nvidia,tegra234-smmu") ||
> +	    of_device_is_compatible(np, "nvidia,tegra194-smmu")) {
> +		smmu->pgsize_bitmap = PAGE_SIZE;
> +		pgtbl_cfg->pgsize_bitmap = smmu->pgsize_bitmap;
> +	}
> +
> +	return 0;
> +}
> +
>  static const struct arm_smmu_impl nvidia_smmu_impl = {
> 	.read_reg = nvidia_smmu_read_reg,
> 	.write_reg = nvidia_smmu_write_reg,
> @@ -268,10 +296,12 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
> 	.global_fault = nvidia_smmu_global_fault,
> 	.context_fault = nvidia_smmu_context_fault,
> 	.probe_finalize = nvidia_smmu_probe_finalize,
> +	.init_context = nvidia_smmu_init_context,
>  };
>  
>  static const struct arm_smmu_impl nvidia_smmu_single_impl = {
> 	.probe_finalize = nvidia_smmu_probe_finalize,
> +	.init_context = nvidia_smmu_init_context,
>  };
>  
>  struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 4/4] iommu/vt-d: Size Page Request Queue to avoid overflow condition

2022-04-21 Thread Lu Baolu
PRQ overflow may cause I/O throughput congestion, resulting in unnecessary
degradation of I/O performance. Appropriately increasing the length of PRQ
can greatly reduce the occurrence of PRQ overflow. The maximum number of
page requests that can be generated in parallel by a PCIe device is
statically defined in the Outstanding Page Request Capacity field of the
PCIe ATS configuration space.

The new length of PRQ is calculated by summing up the value of the
Outstanding Page Request Capacity register across all devices where Page
Requests are supported on the real PR-capable platform (Intel Sapphire
Rapids). The result is rounded up to the nearest power of 2.

The PRQ length is also doubled because the VT-d IOMMU driver only updates
the Page Request Queue Head Register (PQH_REG) after processing the entire
queue.

Signed-off-by: Lu Baolu 
---
 include/linux/intel-svm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index b3b125b332aa..207ef06ba3e1 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -9,7 +9,7 @@
 #define __INTEL_SVM_H__
 
 /* Page Request Queue depth */
-#define PRQ_ORDER  2
+#define PRQ_ORDER  4
 #define PRQ_RING_MASK  ((0x1000 << PRQ_ORDER) - 0x20)
 #define PRQ_DEPTH  ((0x1000 << PRQ_ORDER) >> 5)
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 3/4] iommu/vt-d: Drop stop marker messages

2022-04-21 Thread Lu Baolu
The page fault handling framework in the IOMMU core explicitly states
that it doesn't handle PCI PASID Stop Marker messages and the IOMMU
drivers must discard them before reporting faults. This handles Stop
Marker messages in prq_event_thread() before reporting events to the core.

The VT-d driver explicitly drains the pending page requests when a CPU
page table (represented by a mm struct) is unbound from a PASID according
to the procedures defined in the VT-d spec. The Stop Marker messages do
not need a response. Hence, it is safe to drop the Stop Marker messages
silently if any of them is found in the page request queue.

Fixes: d5b9e4bfe0d88 ("iommu/vt-d: Report prq to io-pgfault framework")
Signed-off-by: Lu Baolu 
Reviewed-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index c720d1be992d..0741ec165673 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -760,6 +760,10 @@ static irqreturn_t prq_event_thread(int irq, void *d)
goto bad_req;
}
 
+   /* Drop Stop Marker message. No need for a response. */
+   if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
+   goto prq_advance;
+
if (!svm || svm->pasid != req->pasid) {
/*
 * It can't go away, because the driver is not permitted
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/4] iommu/vt-d: Set PGSNP bit in pasid table entry for SVA binding

2022-04-21 Thread Lu Baolu
This field makes the requests snoop processor caches irrespective of
other attributes in the request or other fields in paging structure
entries used to translate the request.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/svm.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 23a38763c1d1..c720d1be992d 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -391,9 +391,12 @@ static struct iommu_sva *intel_svm_bind_mm(struct intel_iommu *iommu,
}
 
/* Setup the pasid table: */
-   sflags = (flags & SVM_FLAG_SUPERVISOR_MODE) ?
-   PASID_FLAG_SUPERVISOR_MODE : 0;
-   sflags |= cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
+   sflags = PASID_FLAG_PAGE_SNOOP;
+   if (flags & SVM_FLAG_SUPERVISOR_MODE)
+   sflags |= PASID_FLAG_SUPERVISOR_MODE;
+   if (cpu_feature_enabled(X86_FEATURE_LA57))
+   sflags |= PASID_FLAG_FL5LP;
+
	spin_lock_irqsave(&iommu->lock, iflags);
ret = intel_pasid_setup_first_level(iommu, dev, mm->pgd, mm->pasid,
FLPT_DEFAULT_DID, sflags);
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/4] iommu/vt-d: Check before setting PGSNP bit in pasid table entry

2022-04-21 Thread Lu Baolu
The latest VT-d specification states that the PGSNP field in the pasid
table entry should be treated as Reserved(0) for implementations not
supporting Snoop Control (SC=0 in the Extended Capability Register).
This adds a check before setting the field.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/pasid.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index f8d215d85695..5cb2daa2b8cb 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -625,8 +625,14 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
}
}
 
-   if (flags & PASID_FLAG_PAGE_SNOOP)
-   pasid_set_pgsnp(pte);
+   if (flags & PASID_FLAG_PAGE_SNOOP) {
+   if (ecap_sc_support(iommu->ecap)) {
+   pasid_set_pgsnp(pte);
+   } else {
+   pasid_clear_entry(pte);
+   return -EINVAL;
+   }
+   }
 
pasid_set_domain_id(pte, did);
pasid_set_address_width(pte, iommu->agaw);
@@ -710,7 +716,8 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
pasid_set_fault_enable(pte);
pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
 
-   if (domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
+   if (ecap_sc_support(iommu->ecap) &&
+   domain->domain.type == IOMMU_DOMAIN_UNMANAGED)
pasid_set_pgsnp(pte);
 
/*
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/4] iommu/vt-d: Some fine tuning of SVA

2022-04-21 Thread Lu Baolu
Hi folks,

This includes several tunings of Intel SVA implementation. I plan to
target them for v5.19. Please help to review.

Best regards,
baolu

Change log:

v2:
 - Move the snoop control capability check into a separate patch.
 - Return an error if the caller opts in to setting PGSNP with a flag
   but the hardware doesn't support it.
 - Add a Fixes tag for "iommu/vt-d: Drop stop marker messages" as it is
   required by the iopf framework.

v1: initial post
 - 
https://lore.kernel.org/linux-iommu/20220416123049.879969-1-baolu...@linux.intel.com/
 

Lu Baolu (4):
  iommu/vt-d: Check before setting PGSNP bit in pasid table entry
  iommu/vt-d: Set PGSNP bit in pasid table entry for SVA binding
  iommu/vt-d: Drop stop marker messages
  iommu/vt-d: Size Page Request Queue to avoid overflow condition

 include/linux/intel-svm.h   |  2 +-
 drivers/iommu/intel/pasid.c | 13 ++---
 drivers/iommu/intel/svm.c   | 13 ++---
 3 files changed, 21 insertions(+), 7 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/3] ARM/dma-mapping: Consolidate IOMMU ops callbacks

2022-04-21 Thread Robin Murphy
Merge the coherent and non-coherent callbacks down to a single
implementation each, relying on the generic dev->dma_coherent
flag at the points where the difference matters.

Signed-off-by: Robin Murphy 
---
 arch/arm/mm/dma-mapping.c | 240 +-
 1 file changed, 56 insertions(+), 184 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b0095b84a58..10e5e5800d78 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1079,13 +1079,13 @@ static void __iommu_free_atomic(struct device *dev, void *cpu_addr,
__free_from_pool(cpu_addr, size);
 }
 
-static void *__arm_iommu_alloc_attrs(struct device *dev, size_t size,
-   dma_addr_t *handle, gfp_t gfp, unsigned long attrs,
-   int coherent_flag)
+static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
+   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
 {
pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
struct page **pages;
void *addr = NULL;
+   int coherent_flag = dev->dma_coherent ? COHERENT : NORMAL;
 
*handle = DMA_MAPPING_ERROR;
size = PAGE_ALIGN(size);
@@ -1128,19 +1128,7 @@ static void *__arm_iommu_alloc_attrs(struct device *dev, size_t size,
return NULL;
 }
 
-static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
-   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
-{
-   return __arm_iommu_alloc_attrs(dev, size, handle, gfp, attrs, NORMAL);
-}
-
-static void *arm_coherent_iommu_alloc_attrs(struct device *dev, size_t size,
-   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
-{
-   return __arm_iommu_alloc_attrs(dev, size, handle, gfp, attrs, COHERENT);
-}
-
-static int __arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
 {
@@ -1154,35 +1142,24 @@ static int __arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma
if (vma->vm_pgoff >= nr_pages)
return -ENXIO;
 
+   if (!dev->dma_coherent)
+   vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
+
err = vm_map_pages(vma, pages, nr_pages);
if (err)
pr_err("Remapping memory failed: %d\n", err);
 
return err;
 }
-static int arm_iommu_mmap_attrs(struct device *dev,
-   struct vm_area_struct *vma, void *cpu_addr,
-   dma_addr_t dma_addr, size_t size, unsigned long attrs)
-{
-   vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
-
-   return __arm_iommu_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
-
-static int arm_coherent_iommu_mmap_attrs(struct device *dev,
-   struct vm_area_struct *vma, void *cpu_addr,
-   dma_addr_t dma_addr, size_t size, unsigned long attrs)
-{
-   return __arm_iommu_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
 
 /*
  * free a page as defined by the above mapping.
  * Must not be called with IRQs disabled.
  */
-static void __arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-   dma_addr_t handle, unsigned long attrs, int coherent_flag)
+static void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+   dma_addr_t handle, unsigned long attrs)
 {
+   int coherent_flag = dev->dma_coherent ? COHERENT : NORMAL;
struct page **pages;
size = PAGE_ALIGN(size);
 
@@ -1204,19 +1181,6 @@ static void __arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_ad
__iommu_free_buffer(dev, pages, size, attrs);
 }
 
-static void arm_iommu_free_attrs(struct device *dev, size_t size,
-void *cpu_addr, dma_addr_t handle,
-unsigned long attrs)
-{
-   __arm_iommu_free_attrs(dev, size, cpu_addr, handle, attrs, NORMAL);
-}
-
-static void arm_coherent_iommu_free_attrs(struct device *dev, size_t size,
-   void *cpu_addr, dma_addr_t handle, unsigned long attrs)
-{
-   __arm_iommu_free_attrs(dev, size, cpu_addr, handle, attrs, COHERENT);
-}
-
 static int arm_iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
 void *cpu_addr, dma_addr_t dma_addr,
 size_t size, unsigned long attrs)
@@ -1236,8 +1200,7 @@ static int arm_iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
  */
 static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
  size_t size, dma_addr_t *handle,
- enum dma_data_direction dir, unsigned long attrs,
- bool is_coherent)
+ enum dma_data_direction dir, unsigned long attrs)
 {

[PATCH 1/3] ARM/dma-mapping: Drop .dma_supported for IOMMU ops

2022-04-21 Thread Robin Murphy
When an IOMMU is present, we trust that it should be capable
of remapping any physical memory, and since the device masks
represent the input (virtual) addresses to the IOMMU it makes
no sense to validate them against physical PFNs anyway.

Signed-off-by: Robin Murphy 
---
 arch/arm/mm/dma-mapping.c | 23 ---
 1 file changed, 23 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0f76222cbcbb..6b0095b84a58 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -104,25 +104,6 @@ static struct arm_dma_buffer *arm_dma_buffer_find(void *virt)
  *
  */
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-/*
- * Return whether the given device DMA address mask can be supported
- * properly.  For example, if your device can only drive the low 24-bits
- * during bus mastering, then you would pass 0x00ffffff as the mask
- * to this function.
- */
-static int arm_dma_supported(struct device *dev, u64 mask)
-{
-   unsigned long max_dma_pfn = min(max_pfn - 1, arm_dma_pfn_limit);
-
-   /*
-* Translate the device's DMA mask to a PFN limit.  This
-* PFN number includes the page which we can DMA to.
-*/
-   return PHYS_PFN(dma_to_phys(dev, mask)) >= max_dma_pfn;
-}
-#endif
-
 static void __dma_clear_buffer(struct page *page, size_t size, int coherent_flag)
 {
/*
@@ -1681,8 +1662,6 @@ static const struct dma_map_ops iommu_ops = {
 
.map_resource   = arm_iommu_map_resource,
.unmap_resource = arm_iommu_unmap_resource,
-
-   .dma_supported  = arm_dma_supported,
 };
 
 static const struct dma_map_ops iommu_coherent_ops = {
@@ -1699,8 +1678,6 @@ static const struct dma_map_ops iommu_coherent_ops = {
 
.map_resource   = arm_iommu_map_resource,
.unmap_resource = arm_iommu_unmap_resource,
-
-   .dma_supported  = arm_dma_supported,
 };
 
 /**
-- 
2.35.3.dirty
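
To make the "no sense to validate against physical PFNs" point concrete:
behind an IOMMU the mask only has to bound the bus-side (IOVA) addresses
handed to the device. A minimal illustrative sketch (iova_fits_mask() is
hypothetical, not an in-tree helper):

#include <linux/device.h>
#include <linux/dma-mapping.h>

/* With an IOMMU, any RAM page can back a mapping; the device mask only
 * bounds the IOVA the device emits, so this is the whole check. */
static bool iova_fits_mask(struct device *dev, dma_addr_t iova, size_t size)
{
	return size && iova + size - 1 <= dma_get_mask(dev);
}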



[PATCH 3/3] ARM/dma-mapping: Merge IOMMU ops

2022-04-21 Thread Robin Murphy
The dma_sync_* operations are now the only difference between the
coherent and non-coherent IOMMU ops. Some minor tweaks to make those
safe for coherent devices with minimal overhead, and we can condense
down to a single set of DMA ops.

Signed-off-by: Robin Murphy 
---
 arch/arm/mm/dma-mapping.c | 37 +
 1 file changed, 13 insertions(+), 24 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 10e5e5800d78..dd46cce61579 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1341,6 +1341,9 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev,
struct scatterlist *s;
int i;
 
+   if (dev->dma_coherent)
+   return;
+
for_each_sg(sg, s, nents, i)
__dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
 
@@ -1360,6 +1363,9 @@ static void arm_iommu_sync_sg_for_device(struct device *dev,
struct scatterlist *s;
int i;
 
+   if (dev->dma_coherent)
+   return;
+
for_each_sg(sg, s, nents, i)
__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
 }
@@ -1493,12 +1499,13 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
 {
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
dma_addr_t iova = handle & PAGE_MASK;
-   struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+   struct page *page;
unsigned int offset = handle & ~PAGE_MASK;
 
-   if (!iova)
+   if (dev->dma_coherent || !iova)
return;
 
+   page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
__dma_page_dev_to_cpu(page, offset, size, dir);
 }
 
@@ -1507,12 +1514,13 @@ static void arm_iommu_sync_single_for_device(struct device *dev,
 {
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
dma_addr_t iova = handle & PAGE_MASK;
-   struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+   struct page *page;
unsigned int offset = handle & ~PAGE_MASK;
 
-   if (!iova)
+   if (dev->dma_coherent || !iova)
return;
 
+   page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
__dma_page_cpu_to_dev(page, offset, size, dir);
 }
 
@@ -1536,22 +1544,6 @@ static const struct dma_map_ops iommu_ops = {
.unmap_resource = arm_iommu_unmap_resource,
 };
 
-static const struct dma_map_ops iommu_coherent_ops = {
-   .alloc  = arm_iommu_alloc_attrs,
-   .free   = arm_iommu_free_attrs,
-   .mmap   = arm_iommu_mmap_attrs,
-   .get_sgtable= arm_iommu_get_sgtable,
-
-   .map_page   = arm_iommu_map_page,
-   .unmap_page = arm_iommu_unmap_page,
-
-   .map_sg = arm_iommu_map_sg,
-   .unmap_sg   = arm_iommu_unmap_sg,
-
-   .map_resource   = arm_iommu_map_resource,
-   .unmap_resource = arm_iommu_unmap_resource,
-};
-
 /**
  * arm_iommu_create_mapping
  * @bus: pointer to the bus holding the client device (for IOMMU calls)
@@ -1750,10 +1742,7 @@ static void arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
return;
}
 
-   if (coherent)
-   set_dma_ops(dev, &iommu_coherent_ops);
-   else
-   set_dma_ops(dev, &iommu_ops);
+   set_dma_ops(dev, &iommu_ops);
 }
 
 static void arm_teardown_iommu_dma_ops(struct device *dev)
-- 
2.35.3.dirty
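
For reference, the driver-facing flow is untouched by the merge; a minimal
usage sketch (example_attach() and the base/size values are hypothetical,
while the arm_iommu_* calls are the existing arch/arm API):

#include <linux/err.h>
#include <linux/sizes.h>
#include <asm/dma-iommu.h>

static int example_attach(struct device *dev)
{
	struct dma_iommu_mapping *mapping;
	int ret;

	/* 128M of IOVA space starting at 0x10000000, chosen arbitrarily */
	mapping = arm_iommu_create_mapping(dev->bus, 0x10000000, SZ_128M);
	if (IS_ERR(mapping))
		return PTR_ERR(mapping);

	ret = arm_iommu_attach_device(dev, mapping);
	if (ret) {
		arm_iommu_release_mapping(mapping);
		return ret;
	}

	/* dma_map_*() on @dev now goes through the single iommu_ops;
	 * cache maintenance is skipped when dev->dma_coherent is set. */
	return 0;
}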



[PATCH 0/3] More ARM DMA ops cleanup

2022-04-21 Thread Robin Murphy
Hi all,

Thanks to Christoph's latest series, I'm reminded that, if we're going
to give the ARM DMA ops some cleanup this cycle, it's as good a time as
any to dust off these old patches and add them on top as well. I've
based these on the arm-dma-direct branch which I assume matches the
patches posted at [1].

Cheers,
Robin.

[1] https://lore.kernel.org/linux-arm-kernel/20220421074204.1284072-1-...@lst.de/


Robin Murphy (3):
  ARM/dma-mapping: Drop .dma_supported for IOMMU ops
  ARM/dma-mapping: Consolidate IOMMU ops callbacks
  ARM/dma-mapping: Merge IOMMU ops

 arch/arm/mm/dma-mapping.c | 286 +-
 1 file changed, 62 insertions(+), 224 deletions(-)

-- 
2.35.3.dirty



Re: fully convert arm to use dma-direct

2022-04-21 Thread Andre Przywara
On Thu, 21 Apr 2022 09:41:57 +0200
Christoph Hellwig  wrote:

Hi,

> arm is the last platform not using the dma-direct code for directly
> mapped DMA.  With the dmabounce removal from Arnd we can easily switch
> arm to always use dma-direct now (it already does for LPAE configs
> and nommu).  I'd love to merge this series through the dma-mapping tree
> as it gives us the opportunity for additional core dma-mapping
> improvements.
> 
> Diffstat:
>  arch/arm/common/dmabounce.c  |  582 ---
>  arch/arm/include/asm/dma-mapping.h   |  128 
>  b/arch/arm/Kconfig   |5 
>  b/arch/arm/common/Kconfig|6 
>  b/arch/arm/common/Makefile   |1 
>  b/arch/arm/common/sa.c   |   64 --
>  b/arch/arm/include/asm/device.h  |3 
>  b/arch/arm/include/asm/dma-direct.h  |   49 -
>  b/arch/arm/include/asm/memory.h  |2 
>  b/arch/arm/mach-footbridge/Kconfig   |1 
>  b/arch/arm/mach-footbridge/common.c  |   19 
>  b/arch/arm/mach-footbridge/include/mach/dma-direct.h |8 
>  b/arch/arm/mach-footbridge/include/mach/memory.h |4 
>  b/arch/arm/mach-highbank/highbank.c  |2 

FWIW, I applied this on top of 5.18-rc3 and pushed my Midway (the Highbank
successor) a bit with it (scp-ing GBs back and forth to a SATA SSD). Not a
really conclusive test, but so far it looks all fine.

So for the Highbank part:
Acked-by: Andre Przywara 

Cheers,
Andre


>  b/arch/arm/mach-mvebu/coherency.c|2 
>  b/arch/arm/mm/dma-mapping.c  |  381 
>  b/drivers/usb/core/hcd.c |   17 
>  b/drivers/usb/host/ohci-sa.c |   25 
>  18 files changed, 84 insertions(+), 1215 deletions(-)



Re: [PATCH 04/13] iommu/arm-smmu: Clean up bus_set_iommu()

2022-04-21 Thread Will Deacon
On Wed, Apr 20, 2022 at 05:05:03PM +0100, Robin Murphy wrote:
> On 2022-04-19 15:40, Will Deacon wrote:
> > On Thu, Apr 14, 2022 at 01:42:33PM +0100, Robin Murphy wrote:
> > > Stop calling bus_set_iommu() since it's now unnecessary. With device
> > > probes now replayed for every IOMMU instance registration, the whole
> > > sorry ordering workaround for legacy DT bindings goes too, hooray!
> > 
> > Ha, I hope you tested this!
> 
> Oh alright then, since it's you... :)
> 
> I've hacked up a Juno DT with the old bindings, and (after needing a while
> to remember that they're fundamentally incompatible with disable_bypass),
> can confirm that with my whole dev branch including this series applied, it
> boots and creates IOMMU groups as expected. I then made the mistake of
> trying without the branch to check whether the squawks from
> iommu_setup_dma_ops() were new or not, and... well... plain rc3 doesn't even
> boot on the same setup - it's somehow blowing up in the failure cleanup path
> of iommu_bus_init(), apparently calling iommu_release_device() on something
> where dev->iommu->iommu_dev is NULL, for reasons that are far from clear and
> I'm not sure I can really be bothered to debug further... :/

Great, so your series is a fix!

Will


[Patch v2] iommu: arm-smmu: disable large page mappings for Nvidia arm-smmu

2022-04-21 Thread Ashish Mhetre via iommu
Tegra194 and Tegra234 SoCs have an erratum that causes walk cache
entries to not be invalidated correctly. The problem is that the walk
cache index generated for an IOVA is not the same across translation and
invalidation requests. This leads to page faults when a PMD entry is
released during unmap and populated with a new PTE table during a
subsequent map request. Disabling large page mappings avoids the release
of the PMD entry and prevents translations from seeing a stale PMD entry
in the walk cache.
Fix this by limiting the page mappings to PAGE_SIZE for Tegra194 and
Tegra234 devices. This is the fix recommended by the Tegra hardware
design team.

Co-developed-by: Pritesh Raithatha 
Signed-off-by: Pritesh Raithatha 
Signed-off-by: Ashish Mhetre 
---
Changes in v2:
- Using init_context() to override pgsize_bitmap instead of new function

 drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c | 30 
 1 file changed, 30 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
index 01e9b50b10a1..87bf522b9d2e 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-nvidia.c
@@ -258,6 +258,34 @@ static void nvidia_smmu_probe_finalize(struct arm_smmu_device *smmu, struct device *dev)
dev_name(dev), err);
 }
 
+static int nvidia_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg,
+   struct device *dev)
+{
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
+   const struct device_node *np = smmu->dev->of_node;
+
+   /*
+* Tegra194 and Tegra234 SoCs have the erratum that causes walk cache
+* entries to not be invalidated correctly. The problem is that the walk
+* cache index generated for IOVA is not same across translation and
+* invalidation requests. This is leading to page faults when PMD entry
+* is released during unmap and populated with new PTE table during
+* subsequent map request. Disabling large page mappings avoids the
+* release of PMD entry and avoid translations seeing stale PMD entry in
+* walk cache.
+* Fix this by limiting the page mappings to PAGE_SIZE on Tegra194 and
+* Tegra234.
+*/
+   if (of_device_is_compatible(np, "nvidia,tegra234-smmu") ||
+   of_device_is_compatible(np, "nvidia,tegra194-smmu")) {
+   smmu->pgsize_bitmap = PAGE_SIZE;
+   pgtbl_cfg->pgsize_bitmap = smmu->pgsize_bitmap;
+   }
+
+   return 0;
+}
+
 static const struct arm_smmu_impl nvidia_smmu_impl = {
.read_reg = nvidia_smmu_read_reg,
.write_reg = nvidia_smmu_write_reg,
@@ -268,10 +296,12 @@ static const struct arm_smmu_impl nvidia_smmu_impl = {
.global_fault = nvidia_smmu_global_fault,
.context_fault = nvidia_smmu_context_fault,
.probe_finalize = nvidia_smmu_probe_finalize,
+   .init_context = nvidia_smmu_init_context,
 };
 
 static const struct arm_smmu_impl nvidia_smmu_single_impl = {
.probe_finalize = nvidia_smmu_probe_finalize,
+   .init_context = nvidia_smmu_init_context,
 };
 
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu)
-- 
2.17.1
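
To see why shrinking pgsize_bitmap is sufficient: the IOMMU core only maps in
sizes drawn from that bitmap, picking the largest bit that fits the current
alignment and remaining length, roughly as below (a simplified model of the
iommu_pgsize() logic in drivers/iommu/iommu.c; pick_pgsize() itself is
hypothetical). With pgsize_bitmap forced to PAGE_SIZE, only last-level PTEs
are ever created, so no PMD entry is released and repopulated for the walk
cache to trip over.

#include <linux/bitops.h>
#include <linux/types.h>

static size_t pick_pgsize(unsigned long pgsize_bitmap,
			  unsigned long iova, phys_addr_t paddr, size_t size)
{
	unsigned long addr_merge = iova | paddr;
	unsigned long candidates;

	/* page sizes no larger than the remaining length... */
	candidates = pgsize_bitmap & GENMASK(__fls(size), 0);

	/* ...and no larger than the alignment of both addresses */
	if (addr_merge)
		candidates &= GENMASK(__ffs(addr_merge), 0);

	return candidates ? BIT(__fls(candidates)) : 0;
}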



Re: fully convert arm to use dma-direct

2022-04-21 Thread Arnd Bergmann
On Thu, Apr 21, 2022 at 9:41 AM Christoph Hellwig  wrote:
>
> Hi all,
>
> arm is the last platform not using the dma-direct code for directly
> mapped DMA.  With the dmabounce removal from Arnd we can easily switch
> arm to always use dma-direct now (it already does for LPAE configs
> and nommu).  I'd love to merge this series through the dma-mapping tree
> as it gives us the opportunity for additional core dma-mapping
> improvements.

Thanks a lot for completing this, it looks all good to me, and I hope that
Russell can test my assabet patch to make sure this doesn't break
anything.

I saw one opportunity for an additional cleanup patch that I commented
on, but that does not stop the rest from getting merged.

I also made sure that this passes the basic kernelci tests across all
arm machines.

 Arnd


Re: [PATCH 7/7] ARM: use dma-direct unconditionally

2022-04-21 Thread Arnd Bergmann
On Thu, Apr 21, 2022 at 9:42 AM Christoph Hellwig  wrote:
>
> Use dma-direct unconditionally on arm.  It has already been used for
> some time for LPAE and nommu configurations.
>
> This mostly changes the streaming mapping implementation and the (simple)
> coherent allocator for devices that are DMA coherent.  The existing
> complex allocator for uncached mappings for non-coherent devices is still
> used as-is, via the arch_dma_alloc/arch_dma_free hooks.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm/Kconfig   |   4 +-
>  arch/arm/include/asm/dma-mapping.h |  24 --
>  arch/arm/mach-highbank/highbank.c  |   2 +-
>  arch/arm/mach-mvebu/coherency.c|   2 +-
>  arch/arm/mm/dma-mapping.c  | 365 ++---
>  5 files changed, 19 insertions(+), 378 deletions(-)
>  delete mode 100644 arch/arm/include/asm/dma-mapping.h

The diffstat looks really nice!

I can't really tell from looking at the code if this is an equivalent
conversion,
so I have to trust you on that. I did make sure this passes the basic tests
on kernelci.org, which tests a large number of machines, which is a good
sign.

Tested-by: Arnd Bergmann 


Re: [PATCH 6/7] ARM: use the common dma_to_phys/phys_to_dma implementation where possible

2022-04-21 Thread Arnd Bergmann
On Thu, Apr 21, 2022 at 9:42 AM Christoph Hellwig  wrote:
>
> Only the footbridge platforms provide their own DMA address translation
> helpers, so switch to the generic version for all other platforms, and
> consolidate the footbridge implementation to remove two levels of
> indirection.
>
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Arnd Bergmann 

> ---
> @@ -335,17 +336,19 @@ unsigned long __bus_to_virt(unsigned long res)
> return res;
>  }
>  EXPORT_SYMBOL(__bus_to_virt);
> -
> -unsigned long __pfn_to_bus(unsigned long pfn)
> +#else
> +static inline unsigned long fb_bus_sdram_offset(void)
>  {
> -   return __pfn_to_phys(pfn) + (fb_bus_sdram_offset() - PHYS_OFFSET);
> +   return BUS_OFFSET;
>  }
> -EXPORT_SYMBOL(__pfn_to_bus);
> +#endif /* CONFIG_FOOTBRIDGE_ADDIN */

I have an older patch to remove CONFIG_FOOTBRIDGE_ADDIN
completely, as it does a couple of other nasty things and there are
apparently no users. Would that help here?

   Arnd


Re: [PATCH 5/7] ARM: use dma_to_phys/phys_to_dma in the dma-mapping code

2022-04-21 Thread Arnd Bergmann
On Thu, Apr 21, 2022 at 9:42 AM Christoph Hellwig  wrote:
>
> Use the helpers as expected by the dma-direct code in the old arm
> dma-mapping code to ease a gradual switch to the common DMA code.
>
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Arnd Bergmann 


Re: [PATCH 4/7] ARM: remove the unused virt_to_dma helper

2022-04-21 Thread Arnd Bergmann
On Thu, Apr 21, 2022 at 9:42 AM Christoph Hellwig  wrote:
>
> Signed-off-by: Christoph Hellwig 

I generally prefer to have at least something useful in the description, e.g.
why it's now unused and what it was used for before.

> -static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
> -{
> -   if (dev)
> -   return pfn_to_dma(dev, virt_to_pfn(addr));
> -
> -   return (dma_addr_t)__virt_to_bus((unsigned long)(addr));
> -}

I think __virt_to_bus() is now unused as well and could be removed
in the same step.

It looks like __bus_to_virt() is still used in the ISA DMA API, but
as that is only used on footbridge and rpc, the generic version of
that could be moved into rpc (footbridge already has a custom
version).

Arnd


Re: [PATCH 3/7] ARM: mark various dma-mapping routines static in dma-mapping.c

2022-04-21 Thread Arnd Bergmann
On Thu, Apr 21, 2022 at 9:42 AM Christoph Hellwig  wrote:
>
> With the dmabounce removal these aren't used outside of dma-mapping.c,
> so mark them static.  Move the dma_map_ops declarations down a bit
> to avoid lots of forward declarations.
>
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Arnd Bergmann 


Re: [PATCH 2/7] ARM: remove dmabounce

2022-04-21 Thread Arnd Bergmann
On Thu, Apr 21, 2022 at 9:41 AM Christoph Hellwig  wrote:
>
> Remove the now unused dmabounce code.
>
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Arnd Bergmann 


[PATCH 7/7] ARM: use dma-direct unconditionally

2022-04-21 Thread Christoph Hellwig
Use dma-direct unconditionally on arm.  It has already been used for
some time for LPAE and nommu configurations.

This mostly changes the streaming mapping implementation and the (simple)
coherent allocator for devices that are DMA coherent.  The existing
complex allocator for uncached mappings for non-coherent devices is still
used as-is, via the arch_dma_alloc/arch_dma_free hooks.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/Kconfig   |   4 +-
 arch/arm/include/asm/dma-mapping.h |  24 --
 arch/arm/mach-highbank/highbank.c  |   2 +-
 arch/arm/mach-mvebu/coherency.c|   2 +-
 arch/arm/mm/dma-mapping.c  | 365 ++---
 5 files changed, 19 insertions(+), 378 deletions(-)
 delete mode 100644 arch/arm/include/asm/dma-mapping.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index ac2e3e4957d66..3258630beee6c 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -19,8 +19,8 @@ config ARM
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_STRICT_MODULE_RWX if MMU
-   select ARCH_HAS_SYNC_DMA_FOR_DEVICE if SWIOTLB || !MMU
-   select ARCH_HAS_SYNC_DMA_FOR_CPU if SWIOTLB || !MMU
+   select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+   select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_TEARDOWN_DMA_OPS if MMU
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_CUSTOM_GPIO_H
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
deleted file mode 100644
index 6427b934bd11c..0
--- a/arch/arm/include/asm/dma-mapping.h
+++ /dev/null
@@ -1,24 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef ASMARM_DMA_MAPPING_H
-#define ASMARM_DMA_MAPPING_H
-
-#ifdef __KERNEL__
-
-#include 
-#include 
-
-#include 
-#include 
-
-extern const struct dma_map_ops arm_dma_ops;
-extern const struct dma_map_ops arm_coherent_dma_ops;
-
-static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-{
-   if (IS_ENABLED(CONFIG_MMU) && !IS_ENABLED(CONFIG_ARM_LPAE))
-   return &arm_dma_ops;
-   return NULL;
-}
-
-#endif /* __KERNEL__ */
-#endif
diff --git a/arch/arm/mach-highbank/highbank.c b/arch/arm/mach-highbank/highbank.c
index db607955a7e45..5d4f977ac7d2a 100644
--- a/arch/arm/mach-highbank/highbank.c
+++ b/arch/arm/mach-highbank/highbank.c
@@ -98,7 +98,7 @@ static int highbank_platform_notifier(struct notifier_block *nb,
if (of_property_read_bool(dev->of_node, "dma-coherent")) {
val = readl(sregs_base + reg);
writel(val | 0xff01, sregs_base + reg);
-   set_dma_ops(dev, &arm_coherent_dma_ops);
+   dev->dma_coherent = true;
}
 
return NOTIFY_OK;
diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
index 49e3c8d20c2fa..865ac4bc060df 100644
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -98,7 +98,7 @@ static int mvebu_hwcc_notifier(struct notifier_block *nb,
 
if (event != BUS_NOTIFY_ADD_DEVICE)
return NOTIFY_DONE;
-   set_dma_ops(dev, &arm_coherent_dma_ops);
+   dev->dma_coherent = true;
 
return NOTIFY_OK;
 }
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e09d79a328fa1..0f76222cbcbb9 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -103,79 +103,8 @@ static struct arm_dma_buffer *arm_dma_buffer_find(void *virt)
  * before transfers and delay cache invalidation until transfer completion.
  *
  */
-static void __dma_page_cpu_to_dev(struct page *, unsigned long,
-   size_t, enum dma_data_direction);
-static void __dma_page_dev_to_cpu(struct page *, unsigned long,
-   size_t, enum dma_data_direction);
-
-/**
- * arm_dma_map_page - map a portion of a page for streaming DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @page: page that buffer resides in
- * @offset: offset into page for start of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_page().
- */
-static dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
-unsigned long offset, size_t size, enum dma_data_direction dir,
-unsigned long attrs)
-{
-   if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-   __dma_page_cpu_to_dev(page, offset, size, dir);
-   return phys_to_dma(dev, page_to_phys(page) + offset);
-}
-
-static dma_addr_t arm_coherent_dma_map_page(struct device *dev, struct page *page,
-unsigned long offset, size_t size, enum dma_data_direction dir,
-unsigned long attrs)
-{
-   return phys_to_dma(dev, page_to_phys(page) + offset);
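
The upshot of replacing the per-device dma_map_ops with dev->dma_coherent is
that the common dma-direct code decides per operation whether cache
maintenance is needed, along these lines (a simplified sketch of the idea
behind dma_direct_map_page(); the real code also handles swiotlb bouncing
and addressability checks):

#include <linux/dma-direct.h>
#include <linux/dma-map-ops.h>

static dma_addr_t sketch_direct_map_page(struct device *dev, struct page *page,
		unsigned long offset, size_t size, enum dma_data_direction dir)
{
	phys_addr_t phys = page_to_phys(page) + offset;

	/* non-coherent devices still get explicit cache maintenance */
	if (!dev_is_dma_coherent(dev))
		arch_sync_dma_for_device(phys, size, dir);
	return phys_to_dma(dev, phys);
}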

[PATCH 6/7] ARM: use the common dma_to_phys/phys_to_dma implementation where possible

2022-04-21 Thread Christoph Hellwig
Only the footbridge platforms provide their own DMA address translation
helpers, so switch to the generic version for all other platforms, and
consolidate the footbridge implementation to remove two levels of
indirection.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/Kconfig  |  1 -
 arch/arm/include/asm/dma-direct.h | 41 +--
 arch/arm/include/asm/memory.h |  2 -
 arch/arm/mach-footbridge/Kconfig  |  1 +
 arch/arm/mach-footbridge/common.c | 19 +
 .../mach-footbridge/include/mach/dma-direct.h |  8 
 .../arm/mach-footbridge/include/mach/memory.h |  4 --
 7 files changed, 21 insertions(+), 55 deletions(-)
 create mode 100644 arch/arm/mach-footbridge/include/mach/dma-direct.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2e8091e2d8a86..ac2e3e4957d66 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -15,7 +15,6 @@ config ARM
select ARCH_HAS_MEMBARRIER_SYNC_CORE
select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
-   select ARCH_HAS_PHYS_TO_DMA
select ARCH_HAS_SETUP_DMA_OPS
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
diff --git a/arch/arm/include/asm/dma-direct.h b/arch/arm/include/asm/dma-direct.h
index 6fd52713b5d12..4f7bcde03abb5 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -1,40 +1 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef ASM_ARM_DMA_DIRECT_H
-#define ASM_ARM_DMA_DIRECT_H 1
-
-#include 
-
-/*
- * dma_to_pfn/pfn_to_dma are architecture private
- * functions used internally by the DMA-mapping API to provide DMA
- * addresses. They must not be used by drivers.
- */
-static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
-{
-   if (dev && dev->dma_range_map)
-   pfn = PFN_DOWN(translate_phys_to_dma(dev, PFN_PHYS(pfn)));
-   return (dma_addr_t)__pfn_to_bus(pfn);
-}
-
-static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr)
-{
-   unsigned long pfn = __bus_to_pfn(addr);
-
-   if (dev && dev->dma_range_map)
-   pfn = PFN_DOWN(translate_dma_to_phys(dev, PFN_PHYS(pfn)));
-   return pfn;
-}
-
-static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
-{
-   unsigned int offset = paddr & ~PAGE_MASK;
-   return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset;
-}
-
-static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
-{
-   unsigned int offset = dev_addr & ~PAGE_MASK;
-   return __pfn_to_phys(dma_to_pfn(dev, dev_addr)) + offset;
-}
-
-#endif /* ASM_ARM_DMA_DIRECT_H */
+#include 
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index f673e13e0f942..a55a9038abc89 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -378,8 +378,6 @@ static inline unsigned long __virt_to_idmap(unsigned long x)
 #ifndef __virt_to_bus
 #define __virt_to_bus  __virt_to_phys
 #define __bus_to_virt  __phys_to_virt
-#define __pfn_to_bus(x)	__pfn_to_phys(x)
-#define __bus_to_pfn(x)	__phys_to_pfn(x)
 #endif
 
 /*
diff --git a/arch/arm/mach-footbridge/Kconfig b/arch/arm/mach-footbridge/Kconfig
index 728aff93fba9d..b5bbdcf2e4896 100644
--- a/arch/arm/mach-footbridge/Kconfig
+++ b/arch/arm/mach-footbridge/Kconfig
@@ -60,6 +60,7 @@ endmenu
 
 # Footbridge support
 config FOOTBRIDGE
+   select ARCH_HAS_PHYS_TO_DMA
bool
 
 # Footbridge in host mode
diff --git a/arch/arm/mach-footbridge/common.c b/arch/arm/mach-footbridge/common.c
index 322495df271d5..5020eb96b025d 100644
--- a/arch/arm/mach-footbridge/common.c
+++ b/arch/arm/mach-footbridge/common.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -335,17 +336,19 @@ unsigned long __bus_to_virt(unsigned long res)
return res;
 }
 EXPORT_SYMBOL(__bus_to_virt);
-
-unsigned long __pfn_to_bus(unsigned long pfn)
+#else
+static inline unsigned long fb_bus_sdram_offset(void)
 {
-   return __pfn_to_phys(pfn) + (fb_bus_sdram_offset() - PHYS_OFFSET);
+   return BUS_OFFSET;
 }
-EXPORT_SYMBOL(__pfn_to_bus);
+#endif /* CONFIG_FOOTBRIDGE_ADDIN */
 
-unsigned long __bus_to_pfn(unsigned long bus)
+dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
-   return __phys_to_pfn(bus - (fb_bus_sdram_offset() - PHYS_OFFSET));
+   return paddr + (fb_bus_sdram_offset() - PHYS_OFFSET);
 }
-EXPORT_SYMBOL(__bus_to_pfn);
 
-#endif
+phys_addr_t dma_to_phys(struct device *dev, dma_addr_t dev_addr)
+{
+   return dev_addr - (fb_bus_sdram_offset() - PHYS_OFFSET);
+}
diff --git a/arch/arm/mach-footbridge/include/mach/dma-direct.h b/arch/arm/mach-footbridge/include/mach/dma-direct.h
new file mode 100644
index 0..01f9e8367c009
--- /dev/null
+++ b/arch/arm/mach-footbridge/include/mach/dma-direct.h
@@ -0,0 +1,8 @@
+/* 

[PATCH 5/7] ARM: use dma_to_phys/phys_to_dma in the dma-mapping code

2022-04-21 Thread Christoph Hellwig
Use the helpers as expected by the dma-direct code in the old arm
dma-mapping code to ease a gradual switch to the common DMA code.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/mm/dma-mapping.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0ee5adbdd3f1d..e09d79a328fa1 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -128,14 +128,14 @@ static dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
 {
if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
__dma_page_cpu_to_dev(page, offset, size, dir);
-   return pfn_to_dma(dev, page_to_pfn(page)) + offset;
+   return phys_to_dma(dev, page_to_phys(page) + offset);
 }
 
 static dma_addr_t arm_coherent_dma_map_page(struct device *dev, struct page *page,
 unsigned long offset, size_t size, enum dma_data_direction dir,
 unsigned long attrs)
 {
-   return pfn_to_dma(dev, page_to_pfn(page)) + offset;
+   return phys_to_dma(dev, page_to_phys(page) + offset);
 }
 
 /**
@@ -156,7 +156,7 @@ static void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
-   __dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)),
+   __dma_page_dev_to_cpu(phys_to_page(dma_to_phys(dev, handle)),
  handle & ~PAGE_MASK, size, dir);
 }
 
@@ -164,7 +164,7 @@ static void arm_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
unsigned int offset = handle & (PAGE_SIZE - 1);
-   struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset));
+   struct page *page = phys_to_page(dma_to_phys(dev, handle-offset));
__dma_page_dev_to_cpu(page, offset, size, dir);
 }
 
@@ -172,7 +172,7 @@ static void arm_dma_sync_single_for_device(struct device *dev,
dma_addr_t handle, size_t size, enum dma_data_direction dir)
 {
unsigned int offset = handle & (PAGE_SIZE - 1);
-   struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset));
+   struct page *page = phys_to_page(dma_to_phys(dev, handle-offset));
__dma_page_cpu_to_dev(page, offset, size, dir);
 }
 
@@ -190,7 +190,7 @@ static int arm_dma_supported(struct device *dev, u64 mask)
 * Translate the device's DMA mask to a PFN limit.  This
 * PFN number includes the page which we can DMA to.
 */
-   return dma_to_pfn(dev, mask) >= max_dma_pfn;
+   return PHYS_PFN(dma_to_phys(dev, mask)) >= max_dma_pfn;
 }
 
 static void __dma_clear_buffer(struct page *page, size_t size, int coherent_flag)
@@ -681,7 +681,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
if (page) {
unsigned long flags;
 
-   *handle = pfn_to_dma(dev, page_to_pfn(page));
+   *handle = phys_to_dma(dev, page_to_phys(page));
buf->virt = args.want_vaddr ? addr : page;
 
	spin_lock_irqsave(&arm_dma_bufs_lock, flags);
@@ -721,7 +721,7 @@ static int __arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
int ret = -ENXIO;
unsigned long nr_vma_pages = vma_pages(vma);
unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   unsigned long pfn = dma_to_pfn(dev, dma_addr);
+   unsigned long pfn = PHYS_PFN(dma_to_phys(dev, dma_addr));
unsigned long off = vma->vm_pgoff;
 
	if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
@@ -762,7 +762,7 @@ static void __arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
   dma_addr_t handle, unsigned long attrs,
   bool is_coherent)
 {
-   struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
+   struct page *page = phys_to_page(dma_to_phys(dev, handle));
struct arm_dma_buffer *buf;
struct arm_dma_free_args args = {
.dev = dev,
@@ -796,15 +796,15 @@ static int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
 void *cpu_addr, dma_addr_t handle, size_t size,
 unsigned long attrs)
 {
-   unsigned long pfn = dma_to_pfn(dev, handle);
+   phys_addr_t paddr = dma_to_phys(dev, handle);
struct page *page;
int ret;
 
/* If the PFN is not valid, we do not have a struct page */
-   if (!pfn_valid(pfn))
+   if (!pfn_valid(PHYS_PFN(paddr)))
return -ENXIO;
 
-   page = pfn_to_page(pfn);
+   page = phys_to_page(paddr);
 
ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
if (unlikely(ret))
-- 
2.30.2
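
The conversion is purely mechanical: for any sub-page offset the old and new
spellings compute the same address, which a hypothetical self-check makes
explicit (check_map_page_conversion() is illustrative only, not part of the
patch):

#include <linux/dma-direct.h>
#include <linux/mm.h>
#include <linux/pfn.h>

static void check_map_page_conversion(struct device *dev, struct page *page,
				      unsigned long offset)
{
	/* old spelling: pfn_to_dma(dev, page_to_pfn(page)) + offset */
	dma_addr_t old_addr = phys_to_dma(dev, PFN_PHYS(page_to_pfn(page))) + offset;
	/* new spelling used by the patch */
	dma_addr_t new_addr = phys_to_dma(dev, page_to_phys(page) + offset);

	WARN_ON(old_addr != new_addr);	/* holds for a linear bus offset */
}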


[PATCH 2/7] ARM: remove dmabounce

2022-04-21 Thread Christoph Hellwig
Remove the now unused dmabounce code.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/common/Kconfig|   4 -
 arch/arm/common/Makefile   |   1 -
 arch/arm/common/dmabounce.c| 582 -
 arch/arm/include/asm/device.h  |   3 -
 arch/arm/include/asm/dma-mapping.h |  29 --
 5 files changed, 619 deletions(-)
 delete mode 100644 arch/arm/common/dmabounce.c

diff --git a/arch/arm/common/Kconfig b/arch/arm/common/Kconfig
index bc158fd227e12..d2fdb1796f488 100644
--- a/arch/arm/common/Kconfig
+++ b/arch/arm/common/Kconfig
@@ -3,10 +3,6 @@ config SA
bool
select ZONE_DMA if ARCH_SA1100
 
-config DMABOUNCE
-   bool
-   select ZONE_DMA
-
 config KRAIT_L2_ACCESSORS
bool
 
diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
index 8cd574be94cfe..7bae8cbaafe78 100644
--- a/arch/arm/common/Makefile
+++ b/arch/arm/common/Makefile
@@ -6,7 +6,6 @@
 obj-y  += firmware.o
 
 obj-$(CONFIG_SA)   += sa.o
-obj-$(CONFIG_DMABOUNCE)+= dmabounce.o
 obj-$(CONFIG_KRAIT_L2_ACCESSORS) += krait-l2-accessors.o
 obj-$(CONFIG_SHARP_LOCOMO) += locomo.o
 obj-$(CONFIG_SHARP_PARAM)  += sharpsl_param.o
diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
deleted file mode 100644
index 7996c04393d50..0
--- a/arch/arm/common/dmabounce.c
+++ /dev/null
@@ -1,582 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- *  arch/arm/common/dmabounce.c
- *
- *  Special dma_{map/unmap/dma_sync}_* routines for systems that have
- *  limited DMA windows. These functions utilize bounce buffers to
- *  copy data to/from buffers located outside the DMA region. This
- *  only works for systems in which DMA memory is at the bottom of
- *  RAM, the remainder of memory is at the top and the DMA memory
- *  can be marked as ZONE_DMA. Anything beyond that such as discontiguous
- *  DMA windows will require custom implementations that reserve memory
- *  areas at early bootup.
- *
- *  Original version by Brad Parker (b...@heeltoe.com)
- *  Re-written by Christopher Hoover 
- *  Made generic by Deepak Saxena 
- *
- *  Copyright (C) 2002 Hewlett Packard Company.
- *  Copyright (C) 2004 MontaVista Software, Inc.
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include 
-#include 
-
-#undef STATS
-
-#ifdef STATS
-#define DO_STATS(X) do { X ; } while (0)
-#else
-#define DO_STATS(X) do { } while (0)
-#endif
-
-/* ** */
-
-struct safe_buffer {
-   struct list_head node;
-
-   /* original request */
-   void*ptr;
-   size_t  size;
-   int direction;
-
-   /* safe buffer info */
-   struct dmabounce_pool *pool;
-   void*safe;
-   dma_addr_t  safe_dma_addr;
-};
-
-struct dmabounce_pool {
-   unsigned long   size;
-   struct dma_pool *pool;
-#ifdef STATS
-   unsigned long   allocs;
-#endif
-};
-
-struct dmabounce_device_info {
-   struct device *dev;
-   struct list_head safe_buffers;
-#ifdef STATS
-   unsigned long total_allocs;
-   unsigned long map_op_count;
-   unsigned long bounce_count;
-   int attr_res;
-#endif
-   struct dmabounce_pool   small;
-   struct dmabounce_pool   large;
-
-   rwlock_t lock;
-
-   int (*needs_bounce)(struct device *, dma_addr_t, size_t);
-};
-
-#ifdef STATS
-static ssize_t dmabounce_show(struct device *dev, struct device_attribute *attr,
- char *buf)
-{
-   struct dmabounce_device_info *device_info = dev->archdata.dmabounce;
-   return sprintf(buf, "%lu %lu %lu %lu %lu %lu\n",
-   device_info->small.allocs,
-   device_info->large.allocs,
-   device_info->total_allocs - device_info->small.allocs -
-   device_info->large.allocs,
-   device_info->total_allocs,
-   device_info->map_op_count,
-   device_info->bounce_count);
-}
-
-static DEVICE_ATTR(dmabounce_stats, 0400, dmabounce_show, NULL);
-#endif
-
-
-/* allocate a 'safe' buffer and keep track of it */
-static inline struct safe_buffer *
-alloc_safe_buffer(struct dmabounce_device_info *device_info, void *ptr,
- size_t size, enum dma_data_direction dir)
-{
-   struct safe_buffer *buf;
-   struct dmabounce_pool *pool;
-   struct device *dev = device_info->dev;
-   unsigned long flags;
-
-   dev_dbg(dev, "%s(ptr=%p, size=%d, dir=%d)\n",
-   __func__, ptr, size, dir);
-
-   if (size <= device_info->small.size) {
-   pool = &device_info->small;
-   } else if (size <= device_info->large.size) {
-   pool = &device_info->large;
-   } else {
-   pool = NULL;
-   }
-
-   buf = kmalloc(sizeof(struct safe_buffer), GFP_ATOMIC);
-   if (buf == 

[PATCH 4/7] ARM: remove the unused virt_to_dma helper

2022-04-21 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/dma-direct.h | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/dma-direct.h b/arch/arm/include/asm/dma-direct.h
index 77fcb7ee5ec90..6fd52713b5d12 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -5,7 +5,7 @@
 #include 
 
 /*
- * dma_to_pfn/pfn_to_dma/virt_to_dma are architecture private
+ * dma_to_pfn/pfn_to_dma are architecture private
  * functions used internally by the DMA-mapping API to provide DMA
  * addresses. They must not be used by drivers.
  */
@@ -25,14 +25,6 @@ static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr)
return pfn;
 }
 
-static inline dma_addr_t virt_to_dma(struct device *dev, void *addr)
-{
-   if (dev)
-   return pfn_to_dma(dev, virt_to_pfn(addr));
-
-   return (dma_addr_t)__virt_to_bus((unsigned long)(addr));
-}
-
 static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
unsigned int offset = paddr & ~PAGE_MASK;
-- 
2.30.2
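
For context, the removed helper (in its dev != NULL branch) was equivalent to
composing the helpers that remain; should anything need it again for a lowmem
kernel address, it reduces to a one-liner (hypothetical sketch, not proposed
for the tree):

#include <linux/dma-direct.h>
#include <asm/io.h>

static dma_addr_t example_virt_to_dma(struct device *dev, void *addr)
{
	return phys_to_dma(dev, virt_to_phys(addr));
}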



[PATCH 3/7] ARM: mark various dma-mapping routines static in dma-mapping.c

2022-04-21 Thread Christoph Hellwig
With the dmabounce removal these aren't used outside of dma-mapping.c,
so mark them static.  Move the dma_map_ops declarations down a bit
to avoid lots of forward declarations.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/dma-mapping.h |  75 --
 arch/arm/mm/dma-mapping.c  | 100 +
 2 files changed, 46 insertions(+), 129 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 1e015a7ad86aa..6427b934bd11c 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -20,80 +20,5 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
return NULL;
 }
 
-/**
- * arm_dma_alloc - allocate consistent memory for DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @size: required memory size
- * @handle: bus-specific DMA address
- * @attrs: optinal attributes that specific mapping properties
- *
- * Allocate some memory for a device for performing DMA.  This function
- * allocates pages, and will return the CPU-viewed address, and sets @handle
- * to be the device-viewed address.
- */
-extern void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
-  gfp_t gfp, unsigned long attrs);
-
-/**
- * arm_dma_free - free memory allocated by arm_dma_alloc
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @size: size of memory originally requested in dma_alloc_coherent
- * @cpu_addr: CPU-view address returned from dma_alloc_coherent
- * @handle: device-view address returned from dma_alloc_coherent
- * @attrs: optinal attributes that specific mapping properties
- *
- * Free (and unmap) a DMA buffer previously allocated by
- * arm_dma_alloc().
- *
- * References to memory and mappings associated with cpu_addr/handle
- * during and after this call executing are illegal.
- */
-extern void arm_dma_free(struct device *dev, size_t size, void *cpu_addr,
-dma_addr_t handle, unsigned long attrs);
-
-/**
- * arm_dma_mmap - map a coherent DMA allocation into user space
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @vma: vm_area_struct describing requested user mapping
- * @cpu_addr: kernel CPU-view address returned from dma_alloc_coherent
- * @handle: device-view address returned from dma_alloc_coherent
- * @size: size of memory originally requested in dma_alloc_coherent
- * @attrs: optinal attributes that specific mapping properties
- *
- * Map a coherent DMA buffer previously allocated by dma_alloc_coherent
- * into user space.  The coherent DMA buffer must not be freed by the
- * driver until the user space mapping has been released.
- */
-extern int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
-   void *cpu_addr, dma_addr_t dma_addr, size_t size,
-   unsigned long attrs);
-
-/*
- * For SA-, IXP425, and ADI systems  the dma-mapping functions are "magic"
- * and utilize bounce buffers as needed to work around limited DMA windows.
- *
- * On the SA-, a bug limits DMA to only certain regions of RAM.
- * On the IXP425, the PCI inbound window is 64MB (256MB total RAM)
- * On some ADI engineering systems, PCI inbound window is 32MB (12MB total RAM)
- *
- * The following are helper functions used by the dmabounce subystem
- *
- */
-
-/*
- * The scatter list versions of the above methods.
- */
-extern int arm_dma_map_sg(struct device *, struct scatterlist *, int,
-   enum dma_data_direction, unsigned long attrs);
-extern void arm_dma_unmap_sg(struct device *, struct scatterlist *, int,
-   enum dma_data_direction, unsigned long attrs);
-extern void arm_dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
-   enum dma_data_direction);
-extern void arm_dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
-   enum dma_data_direction);
-extern int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
-   void *cpu_addr, dma_addr_t dma_addr, size_t size,
-   unsigned long attrs);
-
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82ffac621854f..0ee5adbdd3f1d 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -193,50 +193,6 @@ static int arm_dma_supported(struct device *dev, u64 mask)
return dma_to_pfn(dev, mask) >= max_dma_pfn;
 }
 
-const struct dma_map_ops arm_dma_ops = {
-   .alloc  = arm_dma_alloc,
-   .free   = arm_dma_free,
-   .alloc_pages= dma_direct_alloc_pages,
-   .free_pages = dma_direct_free_pages,
-   .mmap   = arm_dma_mmap,
-   .get_sgtable= arm_dma_get_sgtable,
-   .map_page   = arm_dma_map_page,
-   

[PATCH 1/7] ARM: sa1100/assabet: move dmabounce hack to ohci driver

2022-04-21 Thread Christoph Hellwig
From: Arnd Bergmann 

The sa platform is one of the two remaining users of the old Arm
specific "dmabounce" code, which is an earlier implementation of the
generic swiotlb.

Linus Walleij submitted a patch that removes dmabounce support from
the ixp4xx, and I had a look at the other user, which is the sa
companion chip.

Looking at how dmabounce is used, I could narrow it down to one driver
on three machines:

 - dmabounce is only initialized on assabet/neponset, jornada720 and
   badge4, which are the platforms that have an sa and support
   DMA on it.

 - All three of these suffer from "erratum #7" that requires only
   doing DMA to half the memory sections based on one of the address
   lines, in addition, the neponset also can't DMA to the RAM that
   is connected to sa itself.

 - the pxa lubbock machine also has sa, but does not support DMA
   on it and does not set dmabounce.

 - only the OHCI and audio devices on sa support DMA, but as
   there is no audio driver for this hardware, only OHCI remains.

In the OHCI code, I noticed that two other platforms already have
a local bounce buffer support in the form of the "local_mem"
allocator. Specifically, TMIO and SM501 use this on a few other ARM
boards with 16KB or 128KB of local SRAM that can be accessed from the
OHCI and from the CPU.

While this is not the same problem as on sa, I could not find a
reason why we can't re-use the existing implementation but replace the
physical SRAM address mapping with a locally allocated DMA buffer.

There are two main downsides:

 - rather than using a dynamically sized pool, this buffer needs
   to be allocated at probe time using a fixed size. Without
   having any idea of what it should be, I picked a size of
   64KB, which is between what the other two OHCI front-ends use
   in their SRAM. If anyone has a better idea what that size
   is reasonable, this can be trivially changed.

 - Previously, only USB transfers to unaddressable memory needed
   to go through the bounce buffer, now all of them do, which may
   impact runtime performance for USB endpoints that do a lot of
   transfers.

On the upside, the local_mem support uses write-combining buffers,
which should be a bit faster for transfers to the device compared to
normal uncached coherent memory as used in dmabounce.

Cc: Linus Walleij 
Cc: Russell King 
Cc: Christoph Hellwig 
Cc: Laurentiu Tudor 
Cc: linux-...@vger.kernel.org
Signed-off-by: Arnd Bergmann 
Reviewed-by: Greg Kroah-Hartman 
Acked-by: Alan Stern 
Signed-off-by: Christoph Hellwig 
---
 arch/arm/common/Kconfig|  2 +-
 arch/arm/common/sa.c   | 64 --
 drivers/usb/core/hcd.c | 17 +++--
 drivers/usb/host/ohci-sa.c | 25 +
 4 files changed, 40 insertions(+), 68 deletions(-)

diff --git a/arch/arm/common/Kconfig b/arch/arm/common/Kconfig
index c8e198631d418..bc158fd227e12 100644
--- a/arch/arm/common/Kconfig
+++ b/arch/arm/common/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 config SA
bool
-   select DMABOUNCE if !ARCH_PXA
+   select ZONE_DMA if ARCH_SA1100
 
 config DMABOUNCE
bool
diff --git a/arch/arm/common/sa.c b/arch/arm/common/sa.c
index 5367f03beb468..eaf2b527b0673 100644
--- a/arch/arm/common/sa.c
+++ b/arch/arm/common/sa.c
@@ -1386,70 +1386,9 @@ void sa_driver_unregister(struct sa_driver *driver)
 }
 EXPORT_SYMBOL(sa_driver_unregister);
 
-#ifdef CONFIG_DMABOUNCE
-/*
- * According to the "Intel StrongARM SA- Microprocessor Companion
- * Chip Specification Update" (June 2000), erratum #7, there is a
- * significant bug in the SA SDRAM shared memory controller.  If
- * an access to a region of memory above 1MB relative to the bank base,
- * it is important that address bit 10 _NOT_ be asserted. Depending
- * on the configuration of the RAM, bit 10 may correspond to one
- * of several different (processor-relative) address bits.
- *
- * This routine only identifies whether or not a given DMA address
- * is susceptible to the bug.
- *
- * This should only get called for sa_device types due to the
- * way we configure our device dma_masks.
- */
-static int sa_needs_bounce(struct device *dev, dma_addr_t addr, size_t size)
-{
-   /*
-* Section 4.6 of the "Intel StrongARM SA- Development Module
-* User's Guide" mentions that jumpers R51 and R52 control the
-* target of SA- DMA (either SDRAM bank 0 on Assabet, or
-* SDRAM bank 1 on Neponset). The default configuration selects
-* Assabet, so any address in bank 1 is necessarily invalid.
-*/
-   return (machine_is_assabet() || machine_is_pfs168()) &&
-   (addr >= 0xc800 || (addr + size) >= 0xc800);
-}
-
-static int sa_notifier_call(struct notifier_block *n, unsigned long action,
-   void *data)
-{
-   struct sa_dev *dev = to_sa_device(data);
-
-   
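
The local_mem mechanism referred to above boils down to a gen_pool that the
HCD allocates all USB transfer buffers from. In outline it could be set up as
below (a hedged sketch only: example_setup_pool() is hypothetical; the real
wiring is usb_hcd_setup_local_mem() in drivers/usb/core/hcd.c, which likewise
records the DMA address in the pool's phys slot):

#include <linux/dma-mapping.h>
#include <linux/genalloc.h>
#include <linux/sizes.h>

static struct gen_pool *example_setup_pool(struct device *dev)
{
	struct gen_pool *pool;
	dma_addr_t dma;
	void *vaddr;

	pool = gen_pool_create(PAGE_SHIFT, dev_to_node(dev));
	if (!pool)
		return NULL;

	/* 64K of coherent memory backs every transfer buffer */
	vaddr = dma_alloc_coherent(dev, SZ_64K, &dma, GFP_KERNEL);
	if (!vaddr)
		goto err_pool;
	if (gen_pool_add_virt(pool, (unsigned long)vaddr, dma, SZ_64K,
			      dev_to_node(dev)))
		goto err_mem;

	/* buffers then come from gen_pool_dma_alloc(pool, len, &dma_handle) */
	return pool;

err_mem:
	dma_free_coherent(dev, SZ_64K, vaddr, dma);
err_pool:
	gen_pool_destroy(pool);
	return NULL;
}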

fully convert arm to use dma-direct

2022-04-21 Thread Christoph Hellwig
Hi all,

arm is the last platform not using the dma-direct code for directly
mapped DMA.  With the dmabounce removal from Arnd we can easily switch
arm to always use dma-direct now (it already does for LPAE configs
and nommu).  I'd love to merge this series through the dma-mapping tree
as it gives us the opportunity for additional core dma-mapping
improvements.

Diffstat:
 arch/arm/common/dmabounce.c  |  582 ---
 arch/arm/include/asm/dma-mapping.h   |  128 
 b/arch/arm/Kconfig   |5 
 b/arch/arm/common/Kconfig|6 
 b/arch/arm/common/Makefile   |1 
 b/arch/arm/common/sa.c   |   64 --
 b/arch/arm/include/asm/device.h  |3 
 b/arch/arm/include/asm/dma-direct.h  |   49 -
 b/arch/arm/include/asm/memory.h  |2 
 b/arch/arm/mach-footbridge/Kconfig   |1 
 b/arch/arm/mach-footbridge/common.c  |   19 
 b/arch/arm/mach-footbridge/include/mach/dma-direct.h |8 
 b/arch/arm/mach-footbridge/include/mach/memory.h |4 
 b/arch/arm/mach-highbank/highbank.c  |2 
 b/arch/arm/mach-mvebu/coherency.c|2 
 b/arch/arm/mm/dma-mapping.c  |  381 
 b/drivers/usb/core/hcd.c |   17 
 b/drivers/usb/host/ohci-sa.c |   25 
 18 files changed, 84 insertions(+), 1215 deletions(-)


Re: [PATCH v10 4/9] ACPI/IORT: Add support to retrieve IORT RMR reserved regions

2022-04-21 Thread Christoph Hellwig
On Wed, Apr 20, 2022 at 05:48:31PM +0100, Shameer Kolothum wrote:
> Parse through the IORT RMR nodes and populate the reserve region list
> corresponding to a given IOMMU and device (optional). Also, go through
> the ID mappings of the RMR node and retrieve all the SIDs associated
> with it.
> 
> Also make sure we update generic_iommu_put_resv_regions() with the
> resv_region_free_fw_data() callback to free any RMR-related memory
> allocations.
> 
> [Lorenzo: For ACPI IORT]
> Reviewed-by: Lorenzo Pieralisi 
> Tested-by: Steven Price 
> Tested-by: Laurentiu Tudor 
> Signed-off-by: Shameer Kolothum 
> ---
>  drivers/acpi/arm64/iort.c | 264 ++
>  drivers/iommu/iommu.c |  12 +-
>  2 files changed, 272 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index cd5d1d7823cb..8b189e9eca95 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -788,6 +788,267 @@ void acpi_configure_pmsi_domain(struct device *dev)
>  }
>  
>  #ifdef CONFIG_IOMMU_API
> +static void iort_rmr_free_fw_data(struct device *dev,
> +   struct iommu_resv_region *region)
> +{
> + kfree(region->fw_data.rmr.sids);
> +}
> +
> +static void iort_rmr_desc_check_overlap(struct acpi_iort_rmr_desc *desc,
> + u32 count)
> +{
> + int i, j;
> +
> + for (i = 0; i < count; i++) {
> + u64 end, start = desc[i].base_address, length = desc[i].length;
> +
> + if (!length) {
> + pr_err(FW_BUG "RMR descriptor[0x%llx] with zero length, continue anyway\n",
> +start);
> + continue;
> + }
> +
> + end = start + length - 1;
> +
> + /* Check for address overlap */
> + for (j = i + 1; j < count; j++) {
> + u64 e_start = desc[j].base_address;
> + u64 e_end = e_start + desc[j].length - 1;
> +
> + if (start <= e_end && end >= e_start)
> + pr_err(FW_BUG "RMR descriptor[0x%llx - 0x%llx] overlaps, continue anyway\n",
> +start, end);
> + }
> + }
> +}
> +
> +/*
> + * Please note, we will keep the already allocated RMR reserve
> + * regions in case of a memory allocation failure.
> + */
> +static void iort_get_rmrs(struct acpi_iort_node *node,
> +   struct acpi_iort_node *smmu,
> +   u32 *sids, u32 num_sids,
> +   struct list_head *head)
> +{
> + struct acpi_iort_rmr *rmr = (struct acpi_iort_rmr *)node->node_data;
> + struct acpi_iort_rmr_desc *rmr_desc;
> + int i;
> +
> + rmr_desc = ACPI_ADD_PTR(struct acpi_iort_rmr_desc, node,
> + rmr->rmr_offset);
> +
> + iort_rmr_desc_check_overlap(rmr_desc, rmr->rmr_count);
> +
> + for (i = 0; i < rmr->rmr_count; i++, rmr_desc++) {
> + struct iommu_resv_region *region;
> + enum iommu_resv_type type;
> + u32  *sids_copy;
> + int prot = IOMMU_READ | IOMMU_WRITE;
> + u64 addr = rmr_desc->base_address, size = rmr_desc->length;
> +
> + if (!IS_ALIGNED(addr, SZ_64K) || !IS_ALIGNED(size, SZ_64K)) {
> + /* PAGE align base addr and size */
> + addr &= PAGE_MASK;
> + size = PAGE_ALIGN(size + offset_in_page(rmr_desc->base_address));
> +
> + pr_err(FW_BUG "RMR descriptor[0x%llx - 0x%llx] not aligned to 64K, continue with [0x%llx - 0x%llx]\n",
> +rmr_desc->base_address,
> +rmr_desc->base_address + rmr_desc->length - 1,
> +addr, addr + size - 1);
> + }
> +
> + if (rmr->flags & ACPI_IORT_RMR_REMAP_PERMITTED)
> + type = IOMMU_RESV_DIRECT_RELAXABLE;
> + else
> + type = IOMMU_RESV_DIRECT;
> +
> + if (rmr->flags & ACPI_IORT_RMR_ACCESS_PRIVILEGE)
> + prot |= IOMMU_PRIV;
> +
> + /* Attributes 0x00 - 0x03 represents device memory */
> + if (ACPI_IORT_RMR_ACCESS_ATTRIBUTES(rmr->flags) <=
> + ACPI_IORT_RMR_ATTR_DEVICE_GRE)
> + prot |= IOMMU_MMIO;
> + else if (ACPI_IORT_RMR_ACCESS_ATTRIBUTES(rmr->flags) ==
> + ACPI_IORT_RMR_ATTR_NORMAL_IWB_OWB)
> + prot |= IOMMU_CACHE;
> +
> + /* Create a copy of SIDs array to associate with this resv region */
> + sids_copy = kmemdup(sids, num_sids * sizeof(*sids), GFP_KERNEL);
> + if (!sids_copy)
> + return;
> +
> + region = iommu_alloc_resv_region(addr, size, prot, type);
> + if (!region) {
> +

Re: [PATCH v4 05/11] iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit

2022-04-21 Thread zhangfei....@foxmail.com



On 2022/4/21 12:45 AM, Jean-Philippe Brucker wrote:

Hi,

On Fri, Apr 15, 2022 at 02:51:08AM -0700, Fenghua Yu wrote:

 From a6444e1e5bd8076f5e5c5e950d3192de327f0c9c Mon Sep 17 00:00:00 2001
From: Fenghua Yu 
Date: Fri, 15 Apr 2022 00:51:33 -0700
Subject: [RFC PATCH] iommu/sva: Fix PASID use-after-free issue

A PASID might still be in use even though it has been freed on mm exit.

process A:
    sva_bind();
    ioasid_alloc() = N; // Get PASID N for the mm
    fork(): // spawn process B
    exit();
    ioasid_free(N);

process B:
    device uses PASID N -> failure
    sva_unbind();

Dave Hansen suggests taking a refcount on the mm whenever binding the
PASID to a device and dropping the refcount on unbinding. The mm won't
be freed while the PASID is still bound to it.

Fixes: 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit")

Reported-by: Zhangfei Gao 
Suggested-by: Dave Hansen 
Signed-off-by: Fenghua Yu 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 6 ++
  drivers/iommu/intel/svm.c   | 4 
  2 files changed, 10 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 22ddd05bbdcd..3fcb842a0df0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -7,6 +7,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "arm-smmu-v3.h"

  #include "../../iommu-sva-lib.h"
@@ -363,6 +364,9 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
  
  	mutex_lock(&sva_lock);

handle = __arm_smmu_sva_bind(dev, mm);
+   /* Take an mm refcount on a successful bind. */
+   if (!IS_ERR(handle))
+   mmget(mm);
	mutex_unlock(&sva_lock);
return handle;
  }
@@ -372,6 +376,8 @@ void arm_smmu_sva_unbind(struct iommu_sva *handle)
struct arm_smmu_bond *bond = sva_to_bond(handle);
  
  	mutex_lock(&sva_lock);

+   /* Drop an mm refcount. */
+   mmput(bond->mm);

I do like the idea because it will simplify the driver. We can't call
mmput() here, though, because it may call the release() MMU notifier which
will try to grab sva_lock, already held.
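
(In outline, the ordering problem is the classic self-deadlock below; a
hedged sketch, with sva_lock standing in for the driver's mutex:)

#include <linux/mutex.h>
#include <linux/sched/mm.h>

static DEFINE_MUTEX(sva_lock);

/* runs synchronously from mmput() when the last mm_users reference drops */
static void example_release_notifier(struct mm_struct *mm)
{
	mutex_lock(&sva_lock);	/* deadlock: the unbind path already holds it */
	/* quiesce the context descriptor */
	mutex_unlock(&sva_lock);
}

static void example_unbind(struct mm_struct *mm)
{
	mutex_lock(&sva_lock);
	mmput(mm);		/* may invoke the release() path above */
	mutex_unlock(&sva_lock);
}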

I also found another use-after-free in arm_smmu_free_shared_cd(), where we
call arm64_mm_context_put() when the mm could already be freed. There used
to be an mmgrab() preventing this but it went away during a rewrite.

To fix both we could just move mmput() at the end of unbind() but I'd
rather do a proper cleanup removing the release() notifier right away.
Zhangfei, could you try the patch below?

Thanks,
Jean

--- 8< ---

 From 4e09c0d71dfb35fc90915bd1e36545027fbf8a03 Mon Sep 17 00:00:00 2001
From: Jean-Philippe Brucker 
Date: Wed, 20 Apr 2022 10:19:24 +0100
Subject: [PATCH] iommu/arm-smmu-v3-sva: Fix PASID and mm use-after-free issues

Commit 701fac40384f ("iommu/sva: Assign a PASID to mm on PASID
allocation and free it on mm exit") frees the PASID earlier than what
the SMMUv3 driver expects. At the moment the SMMU driver handles mm exit
in the release() MMU notifier by quiescing the context descriptor. The
context descriptor is only made invalid in unbind(), after the device
driver ensured the PASID is not used anymore. Releasing the PASID on mm
exit may cause it to be reallocated while it is still used by the
context descriptor.

There is another use-after-free, present since the beginning, where we
call arm64_mm_context_put() without a guarantee that mm_count is held.

Dave Hansen suggests grabbing mm_users whenever binding the mm to a
device and dropping it on unbinding. With that we can fix both issues
and simplify the driver by removing the release() notifier.

Fixes: 32784a9562fb ("iommu/arm-smmu-v3: Implement iommu_sva_bind/unbind()")
Reported-by: Zhangfei Gao 
Suggested-by: Dave Hansen 
Signed-off-by: Fenghua Yu 
Signed-off-by: Jean-Philippe Brucker 
---
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  1 -
  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 49 +--
  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 14 +-
  3 files changed, 15 insertions(+), 49 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..d50d215d946c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -735,7 +735,6 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
  
  extern struct xarray arm_smmu_asid_xa;

  extern struct mutex arm_smmu_asid_lock;
-extern struct arm_smmu_ctx_desc quiet_cd;
  
  int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,

struct arm_smmu_ctx_desc *cd);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 22ddd05bbdcd..f9dff0f6cdd4 100644
--- 

Re: [PATCH v10 1/9] iommu: Introduce a union to struct iommu_resv_region

2022-04-21 Thread Christoph Hellwig
On Wed, Apr 20, 2022 at 05:48:28PM +0100, Shameer Kolothum wrote:
> @@ -141,6 +149,11 @@ struct iommu_resv_region {
>   size_t  length;
>   int prot;
>   enum iommu_resv_typetype;
> + union {
> + struct iommu_iort_rmr_data rmr;
> + } fw_data;
> + void (*resv_region_free_fw_data)(struct device *dev,
> +  struct iommu_resv_region *region);

I'd shorten the name to just free.  Also any reason the call to this
method isn't also added in this patch as it logically belongs here?
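
i.e. something like the following shape (a sketch of the suggestion, not the
final patch):

struct iommu_resv_region {
	struct list_head	list;
	phys_addr_t		start;
	size_t			length;
	int			prot;
	enum iommu_resv_type	type;
	union {
		struct iommu_iort_rmr_data rmr;
	} fw_data;
	/* optional firmware-specific destructor */
	void (*free)(struct device *dev, struct iommu_resv_region *region);
};

/* ...with the put path invoking it in the same patch: */
void generic_iommu_put_resv_regions(struct device *dev, struct list_head *list)
{
	struct iommu_resv_region *entry, *next;

	list_for_each_entry_safe(entry, next, list, list) {
		if (entry->free)
			entry->free(dev, entry);
		kfree(entry);
	}
}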


Re: [PATCH v10 3/9] ACPI/IORT: Provide a generic helper to retrieve reserve regions

2022-04-21 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v10 2/9] ACPI/IORT: Make iort_iommu_msi_get_resv_regions() return void

2022-04-21 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 