Re: [PATCH v2] iommu/iova: silence warnings under memory pressure

2019-11-22 Thread Joe Perches
On Fri, 2019-11-22 at 11:46 -0500, Qian Cai wrote:
> On Fri, 2019-11-22 at 08:28 -0800, Joe Perches wrote:
> > On Fri, 2019-11-22 at 09:59 -0500, Qian Cai wrote:
> > > On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> > > > On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > > > > When running heavy memory pressure workloads, this 5+ year-old system
> > > > > is throwing endless warnings below because disk IO is too slow to
> > > > > recover from swapping. Since the volume from alloc_iova_fast() could
> > > > > be large, once it calls printk(), it will trigger disk IO (writing to
> > > > > the log files) and pending softirqs, which could cause an infinite
> > > > > loop and make no progress for days under the ongoing memory reclaim.
> > > > > This is the counterpart for Intel; the AMD part has already been
> > > > > merged, see commit 3d708895325b ("iommu/amd: Silence warnings under
> > > > > memory pressure"). Since the allocation failure will be reported in
> > > > > intel_alloc_iova(), just call printk_ratelimited() there and silence
> > > > > the one in alloc_iova_mem() to avoid the expensive warn_alloc().
> > > > 
> > > > []
> > > > > v2: use dev_err_ratelimited() and improve the commit messages.
> > > > 
> > > > []
> > > > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > > 
> > > > []
> > > > > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
> > > > >   iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> > > > >  IOVA_PFN(dma_mask), true);
> > > > >   if (unlikely(!iova_pfn)) {
> > > > > - dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> > > > > + dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> > > > > + nrpages);
> > > > 
> > > > Trivia:
> > > > 
> > > > This should really have a \n termination on the format string
> > > > 
> > > > dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
> > > > 
> > > > 
> > > 
> > > Why do you say so? It is right now printing with a newline added anyway.
> > > 
> > >  hpsa :03:00.0: DMAR: Allocating 1-page iova failed
> > 
> > If another process uses pr_cont at the same time,
> > it can be interleaved.
> 
> I lean towards fixing that in a separate patch if ever needed, as the
> original dev_err() has no "\n" enclosed either.

Your choice.

I wrote trivia:, but touching the same line multiple times
is relatively pointless.



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 00/13] virtio-iommu on non-devicetree platforms

2019-11-22 Thread Jacob Pan (Jun)
On Fri, 22 Nov 2019 11:49:47 +0100
Jean-Philippe Brucker  wrote:

> I'm seeking feedback on multi-platform support for virtio-iommu. At
> the moment only devicetree (DT) is supported and we don't have a
> pleasant solution for other platforms. Once we figure out the topology
> description, x86 support is trivial.
> 
> Since the IOMMU manages memory accesses from other devices, the guest
> kernel needs to initialize the IOMMU before endpoints start issuing
> DMA. It's a solved problem: firmware or hypervisor describes through
> DT or ACPI tables the device dependencies, and probe of endpoints is
> deferred until the IOMMU is probed. But:
> 
> (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and
> IORT for Arm). From my point of view IORT is easier to extend, since
> we just need to introduce a new node type. There are no dependencies
> to Arm in the Linux IORT driver, so it works well with CONFIG_X86.
> 
From my limited understanding, IORT and VIOT are meant to solve device
topology enumeration only? I am not sure how they can be expanded to cover
information beyond device topology, e.g. DMAR has NUMA information and
root port ATS; I guess those are not used today in the guest but might
be additions in the future.

> However, there are concerns about other OS vendors feeling
> obligated to implement this new node, so Arm proposed introducing
> another ACPI table, that can wrap any of DMAR, IVRS and IORT to
> extend it with new virtual nodes. A draft of this VIOT table
> specification is available at
> http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf
> 
> I'm afraid this could increase fragmentation as guests would need
> to implement or modify their support for all of DMAR, IVRS and IORT.
> If we end up doing VIOT, I suggest limiting it to IORT.
> 
> (2) In addition, there are some concerns about having virtio depend on
> ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool
> x86 [1]) don't currently implement those methods.
> 
> It was suggested to embed the topology description into the
> device. It can work, as demonstrated at the end of this RFC, with the
> following limitations:
> 
> - The topology description must be read before any endpoint
> managed by the IOMMU is probed, and even before the virtio module is
>   loaded. This RFC uses a PCI quirk to manually parse the virtio
>   configuration. It assumes that all endpoints managed by the
> IOMMU are under this same PCI host.
> 

> - I don't have a solution for the virtio-mmio transport at the
>   moment, because I haven't had time to modify a host to test it.
> I think it could either use a notifier on the platform bus, or
>   better, a new 'iommu' command-line argument to the virtio-mmio
>   driver. So the current prototype doesn't work for firecracker
> and microvm, which rely on virtio-mmio.
> 
> - For Arm, if the platform has an ITS, the hypervisor needs IORT
> or DT to describe it anyway. More generally, not using either ACPI or
>   DT might prevent supporting other features as well. I
> suspect the above users will have to implement a standard method
> sooner or later.
> 
> - Even when reusing as much existing code as possible, guest
> support is still going to be around a few hundred lines since we can't
>   rely on the normal virtio infrastructure to be loaded at that
>   point. As you can see below, the diffstat for the incomplete
>   topology implementation is already bigger than the exhaustive
> IORT support, even when jumping through the VIOT hoop.
> 
> So it's a lightweight solution for very specific use-cases, and we
> should still support ACPI for the general case. Multi-platform
> guests such as Linux will then need to support three topology
> descriptions instead of two.
> 
> In this RFC I present both solutions, but I'd rather not keep all of
> it. Please see the individual patches for details:
> 
> (1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT
> driver and patches 2, 11 add the VIOT glue.
> 
> (2) Patch 12 adds the built-in topology description to the
> virtio-iommu specification. Patch 13 is a partial implementation for
> the Linux virtio-iommu driver. It only supports PCI, not platform
> devices.
> 
> You can find Linux and QEMU code on my virtio-iommu/devel branches at
> http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu
> 
> 
> I split the diffstat since there are two independent features. The
> first one is for patches 1-11, and the second one for patch 13.
> 
> Jean-Philippe Brucker (11):
>   ACPI/IORT: Move IORT to the ACPI folder
>   ACPI: Add VIOT definitions
>   ACPI/IORT: Allow registration of external tables
>   ACPI/IORT: Add node categories
>   ACPI/IORT: Support VIOT virtio-mmio node
>   ACPI/IORT: Support VIOT virtio-pci node
>   ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
>   ACPI/IORT: Add callback to update a device's fwnode
>   iommu/virt

[PATCH v2 7/8] drm/msm/a6xx: Support split pagetables

2019-11-22 Thread Jordan Crouse
Attempt to enable split pagetables if the arm-smmu driver supports it.
This will move the default address space from the default region to
the address range assigned to TTBR1. The behavior should be transparent
to the driver for now but it gets the default buffers out of the way
when we want to start swapping TTBR0 for context-specific pagetables.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 46 ++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 5dc0b2c..96b3b28 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -811,6 +811,50 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space *
+a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
+   struct msm_gem_address_space *aspace;
+   struct msm_mmu *mmu;
+   u64 start, size;
+   u32 val = 1;
+   int ret;
+
+   if (!iommu)
+   return ERR_PTR(-ENOMEM);
+
+   /* Try to request split pagetables */
+   iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+
+   mmu = msm_iommu_new(&pdev->dev, iommu);
+   if (IS_ERR(mmu)) {
+   iommu_domain_free(iommu);
+   return ERR_CAST(mmu);
+   }
+
+   /* Check to see if split pagetables were successful */
+   ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+   if (!ret && val) {
+   /*
+* The aperture start will be at the beginning of the TTBR1
+* space so use that as a base
+*/
+   start = iommu->geometry.aperture_start;
+   size = 0x;
+   } else {
+   /* Otherwise use the legacy 32 bit region */
+   start = SZ_16M;
+   size = 0x - SZ_16M;
+   }
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
+   if (IS_ERR(aspace))
+   iommu_domain_free(iommu);
+
+   return aspace;
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -832,7 +876,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
-   .create_address_space = adreno_iommu_create_address_space,
+   .create_address_space = a6xx_create_address_space,
 #endif
},
.get_timestamp = a6xx_get_timestamp,
-- 
2.7.4



[PATCH v2 6/8] drm/msm: Refactor address space initialization

2019-11-22 Thread Jordan Crouse
Refactor how address space initialization works. Instead of having the
address space function create the MMU object (and thus require separate but
equal functions for gpummu and iommu) use a single function and pass the
MMU struct in. Make the generic code cleaner by using target specific
functions to create the address space so a2xx can do its own thing in its
own space.  For all the other targets use a generic helper to initialize
IOMMU but leave the door open for newer targets to use customization
if they need it.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 ++
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c|  1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c  | 23 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h  |  8 +
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  | 10 +++---
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 14 +
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c |  4 ---
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 11 +--
 drivers/gpu/drm/msm/msm_drv.h|  8 ++---
 drivers/gpu/drm/msm/msm_gem_vma.c| 52 +---
 drivers/gpu/drm/msm/msm_gpu.c| 40 ++--
 drivers/gpu/drm/msm/msm_gpu.h|  4 +--
 drivers/gpu/drm/msm/msm_iommu.c  |  3 ++
 16 files changed, 83 insertions(+), 114 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
index 1f83bc1..60f6472 100644
--- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c
@@ -401,6 +401,21 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct msm_gpu *gpu)
return state;
 }
 
+static struct msm_gem_address_space *
+a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+   struct msm_mmu *mmu = msm_gpummu_new(&pdev->dev, gpu);
+   struct msm_gem_address_space *aspace;
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
+   SZ_16M + 0xfff * SZ_64K);
+
+   if (IS_ERR(aspace) && !IS_ERR(mmu))
+   mmu->funcs->destroy(mmu);
+
+   return aspace;
+}
+
 /* Register offset defines for A2XX - copy of A3XX */
 static const unsigned int a2xx_register_offsets[REG_ADRENO_REGISTER_MAX] = {
REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE),
@@ -429,6 +444,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a2xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = a2xx_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index 7ad1493..41e51e0 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -441,6 +441,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a3xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
 };
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index b01388a..3655440 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -532,6 +532,7 @@ static const struct adreno_gpu_funcs funcs = {
 #endif
.gpu_state_get = a4xx_gpu_state_get,
.gpu_state_put = adreno_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a4xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index b02e204..0f5db72 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -1432,6 +1432,7 @@ static const struct adreno_gpu_funcs funcs = {
.gpu_busy = a5xx_gpu_busy,
.gpu_state_get = a5xx_gpu_state_get,
.gpu_state_put = a5xx_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
},
.get_timestamp = a5xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index dc8ec2c..5dc0b2c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -832,6 +832,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
+   .create_address_space = adreno_iommu_create_address_space,
 #endif
},
.get_timestamp = a6xx_get_timestamp,
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
inde

[PATCH v2 8/8] arm64: dts: qcom: sdm845: Update Adreno GPU SMMU compatible string

2019-11-22 Thread Jordan Crouse
Add "qcom,adreno-smmu-v2" compatible string for the Adreno GPU SMMU node
to enable split pagetable support.

Signed-off-by: Jordan Crouse 
---

 arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index ddb1f23..d90ba6eda 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -2869,7 +2869,7 @@
};
 
adreno_smmu: iommu@504 {
-   compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+   compatible = "qcom,adreno-smmu-v2", "qcom,smmu-v2";
reg = <0 0x504 0 0x1>;
#iommu-cells = <1>;
#global-interrupts = <2>;
-- 
2.7.4



[PATCH v2 0/8] iommu/arm-smmu: Split pagetable support for Adreno GPUs

2019-11-22 Thread Jordan Crouse


Another refresh to support split pagetables for Adreno GPUs as part of an
incremental process to enable per-context pagetables.

In order to support per-context pagetables the GPU needs to enable split tables
so that we can store global buffers in the TTBR1 space leaving the GPU free to
program the TTBR0 register with the address of a context specific pagetable.

This patchset adds split pagetable support for devices identified with the
compatible string qcom,adreno-smmu-v2. If the compatible string is present and
DOMAIN_ATTR_SPLIT_TABLES is non-zero at attach time, the implementation will
set up the TTBR0 and TTBR1 spaces with identical configurations and program
the domain pagetable into the TTBR1 register. The TTBR0 register will be
unused.

The driver can determine if split pagetables were programmed by querying
DOMAIN_ATTR_SPLIT_TABLES after attaching. The domain geometry will also be
updated to reflect the virtual address space for the TTBR1 range.

These patches are based on top of linux-next-20191120 with [1], [2], and [3]
from Robin on the iommu list.

The first four patches add the device tree bindings and implementation
specific support for arm-smmu and the rest of the patches add the drm/msm
implementation followed by the device tree update for sdm845.

[1] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039718.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039719.html
[3] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039720.html


Jordan Crouse (8):
  dt-bindings: arm-smmu: Add Adreno GPU variant
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context
  iommu/arm-smmu: Add split pagetables for Adreno IOMMU implementations
  drm/msm: Attach the IOMMU device during initialization
  drm/msm: Refactor address space initialization
  drm/msm/a6xx: Support split pagetables
  arm64: dts: qcom: sdm845: Update Adreno GPU SMMU compatible string

 .../devicetree/bindings/iommu/arm,smmu.yaml|  6 ++
 arch/arm64/boot/dts/qcom/sdm845.dtsi   |  2 +-
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c  | 16 
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c  |  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c  |  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c  |  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 45 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c| 23 --
 drivers/gpu/drm/msm/adreno/adreno_gpu.h|  8 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c| 18 ++--
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c   | 18 ++--
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c   |  4 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c   | 18 ++--
 drivers/gpu/drm/msm/msm_drv.h  |  8 +-
 drivers/gpu/drm/msm/msm_gem_vma.c  | 37 ++---
 drivers/gpu/drm/msm/msm_gpu.c  | 49 +--
 drivers/gpu/drm/msm/msm_gpu.h  |  4 +-
 drivers/gpu/drm/msm/msm_gpummu.c   |  6 --
 drivers/gpu/drm/msm/msm_iommu.c| 18 ++--
 drivers/gpu/drm/msm/msm_mmu.h  |  1 -
 drivers/iommu/arm-smmu-impl.c  |  6 +-
 drivers/iommu/arm-smmu-qcom.c  | 96 ++
 drivers/iommu/arm-smmu.c   | 52 +---
 drivers/iommu/arm-smmu.h   | 14 +++-
 include/linux/iommu.h  |  1 +
 25 files changed, 295 insertions(+), 158 deletions(-)

-- 
2.7.4



[PATCH v2 5/8] drm/msm: Attach the IOMMU device during initialization

2019-11-22 Thread Jordan Crouse
Everywhere an IOMMU object is created by msm_gpu_create_address_space
the IOMMU device is attached immediately after. Instead of carrying around
the infrastructure to do the attach from the device specific code do it
directly in the msm_iommu_init() function. This gets it out of the way for
more aggressive cleanups that follow.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c  |  8 
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c |  4 
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c |  7 ---
 drivers/gpu/drm/msm/msm_gem_vma.c| 23 +++
 drivers/gpu/drm/msm/msm_gpu.c| 11 +--
 drivers/gpu/drm/msm/msm_gpummu.c |  6 --
 drivers/gpu/drm/msm/msm_iommu.c  | 15 +++
 drivers/gpu/drm/msm/msm_mmu.h|  1 -
 8 files changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
index 6c92f0f..b082b23 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c
@@ -704,7 +704,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
 {
struct iommu_domain *domain;
struct msm_gem_address_space *aspace;
-   int ret;
 
domain = iommu_domain_alloc(&platform_bus_type);
if (!domain)
@@ -720,13 +719,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms)
return PTR_ERR(aspace);
}
 
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DPU_ERROR("failed to attach iommu %d\n", ret);
-   msm_gem_address_space_put(aspace);
-   return ret;
-   }
-
dpu_kms->base.aspace = aspace;
return 0;
 }
diff --git a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
index dda0543..9dba37c 100644
--- a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c
@@ -518,10 +518,6 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret)
-   goto fail;
} else {
DRM_DEV_INFO(dev->dev, "no iommu, fallback to phys "
"contig buffers for scanout\n");
diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
index e43ecd4..653dab2 100644
--- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c
@@ -736,13 +736,6 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
}
 
kms->aspace = aspace;
-
-   ret = aspace->mmu->funcs->attach(aspace->mmu);
-   if (ret) {
-   DRM_DEV_ERROR(&pdev->dev, "failed to attach iommu: %d\n",
-   ret);
-   goto fail;
-   }
} else {
DRM_DEV_INFO(&pdev->dev,
 "no iommu, fallback to phys contig buffers for 
scanout\n");
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c
index 1af5354..91d993a 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -131,8 +131,8 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
const char *name)
 {
struct msm_gem_address_space *aspace;
-   u64 size = domain->geometry.aperture_end -
-   domain->geometry.aperture_start;
+   u64 start = domain->geometry.aperture_start;
+   u64 size = domain->geometry.aperture_end - start;
 
aspace = kzalloc(sizeof(*aspace), GFP_KERNEL);
if (!aspace)
@@ -141,9 +141,18 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
spin_lock_init(&aspace->lock);
aspace->name = name;
aspace->mmu = msm_iommu_new(dev, domain);
+   if (IS_ERR(aspace->mmu)) {
+   int ret = PTR_ERR(aspace->mmu);
 
-   drm_mm_init(&aspace->mm, (domain->geometry.aperture_start >> PAGE_SHIFT),
-   size >> PAGE_SHIFT);
+   kfree(aspace);
+   return ERR_PTR(ret);
+   }
+
+   /*
+* Attaching the IOMMU device changes the aperture values so use the
+* cached values instead
+*/
+   drm_mm_init(&aspace->mm, start >> PAGE_SHIFT, size >> PAGE_SHIFT);
 
kref_init(&aspace->kref);
 
@@ -164,6 +173,12 @@ msm_gem_address_space_create_a2xx(struct device *dev, struct msm_gpu *gpu,
spin_lock_init(&aspace->lock);
aspace->name = name;
aspace->mmu = msm_gpummu_new(dev, gpu);
+   if (IS_ERR(aspace->mmu)) {
+   int ret = PTR_ERR(aspace->mmu);
+
+   kfree(aspace);
+   return ERR_PTR(ret);
+   }
 
drm_mm_init(&aspace->mm, (va_start 

[PATCH v2 4/8] iommu/arm-smmu: Add split pagetables for Adreno IOMMU implementations

2019-11-22 Thread Jordan Crouse
Add implementation specific support to enable split pagetables for
SMMU implementations attached to Adreno GPUs on Qualcomm targets.

To enable split pagetables the driver will set an attribute on the domain.
If conditions are correct, the implementation sets up the hardware to
support equally sized TTBR0 and TTBR1 regions and programs the domain
pagetable into TTBR1 to make it available for global buffers, while
allowing the GPU the chance to switch TTBR0 at runtime for per-context
pagetables.

After programming the context, the value of the domain attribute can be
queried to see if split pagetables were successfully programmed. The
domain geometry will be updated so that the caller can determine the
start of the region to generate correct virtual addresses.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-impl.c |  3 ++
 drivers/iommu/arm-smmu-qcom.c | 96 +++
 drivers/iommu/arm-smmu.c  | 41 ++
 drivers/iommu/arm-smmu.h  | 11 +
 4 files changed, 143 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 33ed682..1e91231 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -174,5 +174,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500"))
return qcom_smmu_impl_init(smmu);
 
+   if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu-v2"))
+   return adreno_smmu_impl_init(smmu);
+
return smmu;
 }
diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 24c071c..6591e49 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -11,6 +11,102 @@ struct qcom_smmu {
struct arm_smmu_device smmu;
 };
 
+#define TG0_4K  0
+#define TG0_64K 1
+#define TG0_16K 2
+
+#define TG1_16K 1
+#define TG1_4K  2
+#define TG1_64K 3
+
+/*
+ * Set up split pagetables for Adreno SMMUs that will keep a static TTBR1 for
+ * global buffers and dynamically switch TTBR0 from the GPU for context specific
+ * pagetables.
+ */
+static int adreno_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+   struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
+   u32 tcr, tg0;
+
+   /*
+* Return error if split pagetables are not enabled so that arm-smmu
+* can do the default configuration
+*/
+   if (!(pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1))
+   return -EINVAL;
+
+   /* Get the bank configuration from the pagetable config */
+   tcr = arm_smmu_lpae_tcr(pgtbl_cfg) & 0x;
+
+   /*
+* The TCR configuration for TTBR0 and TTBR1 is (almost) identical so
+* just duplicate the T0 configuration and shift it
+*/
+   cb->tcr[0] = (tcr << 16) | tcr;
+
+   /*
+* The (almost) above refers to the granule size field which is
+* different for TTBR0 and TTBR1. With the TTBR1 quirk enabled,
+* io-pgtable-arm will write the T1 appropriate granule size for tg.
+* Translate the configuration from the T1 field to get the right value
+* for T0
+*/
+   if (pgtbl_cfg->arm_lpae_s1_cfg.tcr.tg == TG1_4K)
+   tg0 = TG0_4K;
+   else if (pgtbl_cfg->arm_lpae_s1_cfg.tcr.tg == TG1_16K)
+   tg0 = TG0_16K;
+   else
+   tg0 = TG0_64K;
+
+   /* clear and set the correct value for TG0  */
+   cb->tcr[0] &= ~TCR_TG0;
+   cb->tcr[0] |= FIELD_PREP(TCR_TG0, tg0);
+
+   /*
+* arm_smmu_lpae_tcr2 sets SEP_UPSTREAM which is always the appropriate
+* SEP for Adreno IOMMU
+*/
+   cb->tcr[1] = arm_smmu_lpae_tcr2(pgtbl_cfg);
+   cb->tcr[1] |= TCR2_AS;
+
+   /* TTBRs */
+   cb->ttbr[0] = FIELD_PREP(TTBRn_ASID, cfg->asid);
+   cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+   cb->ttbr[1] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
+
+   /* MAIRs */
+   cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair;
+   cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair >> 32;
+
+   return 0;
+}
+
+static int adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
+{
+   /* Enable split pagetables if the flag is set and the format matches */
+   if (smmu_domain->split_pagetables)
+   if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
+   smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64)
+   pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+
+   return 0;
+}
+
+static const struct arm_smmu_impl adreno_smmu_impl = {
+   .init_context = adreno_smmu_init_context,
+   .init_context_bank = adreno_smmu_init_context_bank,
+};
+
+struct arm_smmu_device *adreno_smmu_i

[PATCH v2 3/8] iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context

2019-11-22 Thread Jordan Crouse
Pass the proposed io_pgtable_cfg to the implementation specific
init_context() function to give the implementation an opportunity to
modify it before it gets passed to io-pgtable.

Signed-off-by: Jordan Crouse 
---

 drivers/iommu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm-smmu.c  | 11 ++-
 drivers/iommu/arm-smmu.h  |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index b2fe72a..33ed682 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
return 0;
 }
 
-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg)
 {
struct cavium_smmu *cs = container_of(smmu_domain->smmu,
  struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c106406..5c7c32b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -775,11 +775,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
cfg->asid = cfg->cbndx;
 
smmu_domain->smmu = smmu;
-   if (smmu->impl && smmu->impl->init_context) {
-   ret = smmu->impl->init_context(smmu_domain);
-   if (ret)
-   goto out_unlock;
-   }
 
pgtbl_cfg = (struct io_pgtable_cfg) {
.pgsize_bitmap  = smmu->pgsize_bitmap,
@@ -790,6 +785,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
.iommu_dev  = smmu->dev,
};
 
+   if (smmu->impl && smmu->impl->init_context) {
+   ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg);
+   if (ret)
+   goto out_unlock;
+   }
+
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index afab9de..0eb498f 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -357,7 +357,8 @@ struct arm_smmu_impl {
u64 val);
int (*cfg_probe)(struct arm_smmu_device *smmu);
int (*reset)(struct arm_smmu_device *smmu);
-   int (*init_context)(struct arm_smmu_domain *smmu_domain);
+   int (*init_context)(struct arm_smmu_domain *smmu_domain,
+   struct io_pgtable_cfg *pgtbl_cfg);
void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 int status);
 };
-- 
2.7.4



[PATCH v2 1/8] dt-bindings: arm-smmu: Add Adreno GPU variant

2019-11-22 Thread Jordan Crouse
Add a compatible string to identify SMMUs that are attached
to Adreno GPU devices that wish to support split pagetables.

Signed-off-by: Jordan Crouse 
---

 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 6515dbe..db9f826 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -31,6 +31,12 @@ properties:
   - qcom,sdm845-smmu-v2
   - const: qcom,smmu-v2
 
+  - description: Qcom Adreno GPU SMMU implementing split pagetables
+items:
+  - enum:
+  - qcom,adreno-smmu-v2
+  - const: qcom,smmu-v2
+
   - description: Qcom SoCs implementing "arm,mmu-500"
 items:
   - enum:
-- 
2.7.4



[PATCH v2 2/8] iommu: Add DOMAIN_ATTR_SPLIT_TABLES

2019-11-22 Thread Jordan Crouse
Add a new attribute to enable and query the state of split pagetables
for the domain.

Signed-off-by: Jordan Crouse 
---

 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f2223cb..18c861e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -126,6 +126,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   DOMAIN_ATTR_SPLIT_TABLES,
DOMAIN_ATTR_MAX,
 };
 
-- 
2.7.4



Re: [PATCH v2 08/10] iommu/io-pgtable-arm: Rationalise TTBRn handling

2019-11-22 Thread Jordan Crouse
On Fri, Oct 25, 2019 at 07:08:37PM +0100, Robin Murphy wrote:
> TTBR1 values have so far been redundant since no users implement any
> support for split address spaces. Crucially, though, one of the main
> reasons for wanting to do so is to be able to manage each half entirely
> independently, e.g. context-switching one set of mappings without
> disturbing the other. Thus it seems unlikely that tying two tables
> together in a single io_pgtable_cfg would ever be particularly desirable
> or useful.
> 
> Streamline the configs to just a single conceptual TTBR value
> representing the allocated table. This paves the way for future users to
> support split address spaces by simply allocating a table and dealing
> with the detailed TTBRn logistics themselves.

Tested-by: Jordan Crouse 

> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/arm-smmu-v3.c|  2 +-
>  drivers/iommu/arm-smmu.c   |  9 -
>  drivers/iommu/io-pgtable-arm-v7s.c | 16 +++-
>  drivers/iommu/io-pgtable-arm.c |  5 ++---
>  drivers/iommu/ipmmu-vmsa.c |  2 +-
>  drivers/iommu/msm_iommu.c  |  4 ++--
>  drivers/iommu/mtk_iommu.c  |  4 ++--
>  drivers/iommu/qcom_iommu.c |  3 +--
>  include/linux/io-pgtable.h |  4 ++--
>  9 files changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 3f20e548f1ec..da31e607698f 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2170,7 +2170,7 @@ static int arm_smmu_domain_finalise_s1(struct 
> arm_smmu_domain *smmu_domain,
>   }
>  
>   cfg->cd.asid= (u16)asid;
> - cfg->cd.ttbr= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
> + cfg->cd.ttbr= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
>   cfg->cd.tcr = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>   cfg->cd.mair= pgtbl_cfg->arm_lpae_s1_cfg.mair;
>   return 0;
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 2bc3e93b11e6..a249e4e49ead 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -534,13 +534,12 @@ static void arm_smmu_init_context_bank(struct 
> arm_smmu_domain *smmu_domain,
>   /* TTBRs */
>   if (stage1) {
>   if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
> - cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr[0];
> - cb->ttbr[1] = pgtbl_cfg->arm_v7s_cfg.ttbr[1];
> + cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
> + cb->ttbr[1] = 0;
>   } else {
> - cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
> + cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
>   cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
> - cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[1];
> - cb->ttbr[1] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
> + cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
>   }
>   } else {
>   cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> b/drivers/iommu/io-pgtable-arm-v7s.c
> index 7c3bd2c3cdca..4d2c1e7f67c4 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -822,15 +822,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
> io_pgtable_cfg *cfg,
>   /* Ensure the empty pgd is visible before any actual TTBR write */
>   wmb();
>  
> - /* TTBRs */
> - cfg->arm_v7s_cfg.ttbr[0] = virt_to_phys(data->pgd) |
> -ARM_V7S_TTBR_S | ARM_V7S_TTBR_NOS |
> -(cfg->coherent_walk ?
> -(ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
> - ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
> -(ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
> - ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
> - cfg->arm_v7s_cfg.ttbr[1] = 0;
> + /* TTBR */
> + cfg->arm_v7s_cfg.ttbr = virt_to_phys(data->pgd) | ARM_V7S_TTBR_S |
> + (cfg->coherent_walk ? (ARM_V7S_TTBR_NOS |
> +   ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
> +   ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
> +  (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
> +   ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
>   return &data->iop;
>  
>  out_free_data:
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 1795df8f7a51..bc0841040ebe 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -872,9 +872,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, 
> void *cookie)
>   /* Ensure the empty pgd is visible before any actual TTBR write */
>  

Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling

2019-11-22 Thread Jordan Crouse
On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote:
> Although it's conceptually nice for the io_pgtable_cfg to provide a
> standard VMSA TCR value, the reality is that no VMSA-compliant IOMMU
> looks exactly like an Arm CPU, and they all have various other TCR
> controls which io-pgtable can't be expected to understand. Thus since
> there is an expectation that drivers will have to add to the given TCR
> value anyway, let's strip it down to just the essentials that are
> directly relevant to io-pgtable's inner workings - namely the various
> sizes and the walk attributes.

Tested-by: Jordan Crouse 

> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/arm-smmu-v3.c| 41 +++--
>  drivers/iommu/arm-smmu.c   |  7 ++-
>  drivers/iommu/arm-smmu.h   | 27 
>  drivers/iommu/io-pgtable-arm-v7s.c |  6 +-
>  drivers/iommu/io-pgtable-arm.c | 98 --
>  drivers/iommu/io-pgtable.c |  2 +-
>  drivers/iommu/qcom_iommu.c |  8 +--
>  include/linux/io-pgtable.h |  9 ++-
>  8 files changed, 94 insertions(+), 104 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index da31e607698f..ca72cd777955 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -261,27 +261,18 @@
>  /* Context descriptor (stage-1 only) */
>  #define CTXDESC_CD_DWORDS8
>  #define CTXDESC_CD_0_TCR_T0SZGENMASK_ULL(5, 0)
> -#define ARM64_TCR_T0SZ   GENMASK_ULL(5, 0)
>  #define CTXDESC_CD_0_TCR_TG0 GENMASK_ULL(7, 6)
> -#define ARM64_TCR_TG0GENMASK_ULL(15, 14)
>  #define CTXDESC_CD_0_TCR_IRGN0   GENMASK_ULL(9, 8)
> -#define ARM64_TCR_IRGN0  GENMASK_ULL(9, 8)
>  #define CTXDESC_CD_0_TCR_ORGN0   GENMASK_ULL(11, 10)
> -#define ARM64_TCR_ORGN0  GENMASK_ULL(11, 10)
>  #define CTXDESC_CD_0_TCR_SH0 GENMASK_ULL(13, 12)
> -#define ARM64_TCR_SH0GENMASK_ULL(13, 12)
>  #define CTXDESC_CD_0_TCR_EPD0(1ULL << 14)
> -#define ARM64_TCR_EPD0   (1ULL << 7)
>  #define CTXDESC_CD_0_TCR_EPD1(1ULL << 30)
> -#define ARM64_TCR_EPD1   (1ULL << 23)
>  
>  #define CTXDESC_CD_0_ENDI(1UL << 15)
>  #define CTXDESC_CD_0_V   (1UL << 31)
>  
>  #define CTXDESC_CD_0_TCR_IPS GENMASK_ULL(34, 32)
> -#define ARM64_TCR_IPSGENMASK_ULL(34, 32)
>  #define CTXDESC_CD_0_TCR_TBI0(1ULL << 38)
> -#define ARM64_TCR_TBI0   (1ULL << 37)
>  
>  #define CTXDESC_CD_0_AA64(1UL << 41)
>  #define CTXDESC_CD_0_S   (1UL << 44)
> @@ -292,10 +283,6 @@
>  
>  #define CTXDESC_CD_1_TTB0_MASK   GENMASK_ULL(51, 4)
>  
> -/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
> -#define ARM_SMMU_TCR2CD(tcr, fld)FIELD_PREP(CTXDESC_CD_0_TCR_##fld, \
> - FIELD_GET(ARM64_TCR_##fld, tcr))
> -
>  /* Command queue */
>  #define CMDQ_ENT_SZ_SHIFT4
>  #define CMDQ_ENT_DWORDS  ((1 << CMDQ_ENT_SZ_SHIFT) >> 3)
> @@ -1443,23 +1430,6 @@ static int arm_smmu_cmdq_issue_sync(struct 
> arm_smmu_device *smmu)
>  }
>  
>  /* Context descriptor manipulation functions */
> -static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
> -{
> - u64 val = 0;
> -
> - /* Repack the TCR. Just care about TTBR0 for now */
> - val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
> - val |= ARM_SMMU_TCR2CD(tcr, TG0);
> - val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
> - val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
> - val |= ARM_SMMU_TCR2CD(tcr, SH0);
> - val |= ARM_SMMU_TCR2CD(tcr, EPD0);
> - val |= ARM_SMMU_TCR2CD(tcr, EPD1);
> - val |= ARM_SMMU_TCR2CD(tcr, IPS);
> -
> - return val;
> -}
> -
>  static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
>   struct arm_smmu_s1_cfg *cfg)
>  {
> @@ -1469,7 +1439,7 @@ static void arm_smmu_write_ctx_desc(struct 
> arm_smmu_device *smmu,
>* We don't need to issue any invalidation here, as we'll invalidate
>* the STE when installing the new entry anyway.
>*/
> - val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
> + val = cfg->cd.tcr |
>  #ifdef __BIG_ENDIAN
> CTXDESC_CD_0_ENDI |
>  #endif
> @@ -2155,6 +2125,7 @@ static int arm_smmu_domain_finalise_s1(struct 
> arm_smmu_domain *smmu_domain,
>   int asid;
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
>   struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> + typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = 
> &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>  
>   asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
>   if (asid < 0)
> @@ -2171,7 +2142,13 @@ static int arm_smmu_domain_finalise_s1(struct 
> arm_smmu_domain *smmu_domain,
>  

Re: [PATCH v2 10/10] iommu/io-pgtable-arm: Prepare for TTBR1 usage

2019-11-22 Thread Jordan Crouse
On Fri, Oct 25, 2019 at 07:08:39PM +0100, Robin Murphy wrote:
> Now that we can correctly extract top-level indices without relying on
> the remaining upper bits being zero, the only remaining impediments to
> using a given table for TTBR1 are the address validation on map/unmap
> and the awkward TCR translation granule format. Add a quirk so that we
> can do the right thing at those points.

Tested-by: Jordan Crouse 

> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/io-pgtable-arm.c | 25 +++--
>  include/linux/io-pgtable.h |  4 
>  2 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 9b1912ede000..e53edff56e54 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -107,6 +107,10 @@
>  #define ARM_LPAE_TCR_TG0_64K 1
>  #define ARM_LPAE_TCR_TG0_16K 2
>  
> +#define ARM_LPAE_TCR_TG1_16K 1
> +#define ARM_LPAE_TCR_TG1_4K  2
> +#define ARM_LPAE_TCR_TG1_64K 3
> +
>  #define ARM_LPAE_TCR_SH0_SHIFT   12
>  #define ARM_LPAE_TCR_SH_NS   0
>  #define ARM_LPAE_TCR_SH_OS   2
> @@ -466,6 +470,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
> unsigned long iova,
>   arm_lpae_iopte *ptep = data->pgd;
>   int ret, lvl = data->start_level;
>   arm_lpae_iopte prot;
> + long iaext = (long)iova >> cfg->ias;
>  
>   /* If no access, then nothing to do */
>   if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
> @@ -474,7 +479,9 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
> unsigned long iova,
>   if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
>   return -EINVAL;
>  
> - if (WARN_ON(iova >> data->iop.cfg.ias || paddr >> data->iop.cfg.oas))
> + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> + iaext = ~iaext;
> + if (WARN_ON(iaext || paddr >> cfg->oas))
>   return -ERANGE;
>  
>   prot = arm_lpae_prot_to_pte(data, iommu_prot);
> @@ -640,11 +647,14 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops 
> *ops, unsigned long iova,
>   struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>   struct io_pgtable_cfg *cfg = &data->iop.cfg;
>   arm_lpae_iopte *ptep = data->pgd;
> + long iaext = (long)iova >> cfg->ias;
>  
>   if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
>   return 0;
>  
> - if (WARN_ON(iova >> data->iop.cfg.ias))
> + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> + iaext = ~iaext;
> + if (WARN_ON(iaext))
>   return 0;
>  
>   return __arm_lpae_unmap(data, gather, iova, size, data->start_level, 
> ptep);
> @@ -780,9 +790,11 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, 
> void *cookie)
>   u64 reg;
>   struct arm_lpae_io_pgtable *data;
>   typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr;
> + bool tg1;
>  
>   if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
> - IO_PGTABLE_QUIRK_NON_STRICT))
> + IO_PGTABLE_QUIRK_NON_STRICT |
> + IO_PGTABLE_QUIRK_ARM_TTBR1))
>   return NULL;
>  
>   data = arm_lpae_alloc_pgtable(cfg);
> @@ -800,15 +812,16 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg 
> *cfg, void *cookie)
>   tcr->orgn = ARM_LPAE_TCR_RGN_NC;
>   }
>  
> + tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1;
>   switch (ARM_LPAE_GRANULE(data)) {
>   case SZ_4K:
> - tcr->tg = ARM_LPAE_TCR_TG0_4K;
> + tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_4K : ARM_LPAE_TCR_TG0_4K;
>   break;
>   case SZ_16K:
> - tcr->tg = ARM_LPAE_TCR_TG0_16K;
> + tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_16K : ARM_LPAE_TCR_TG0_16K;
>   break;
>   case SZ_64K:
> - tcr->tg = ARM_LPAE_TCR_TG0_64K;
> + tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_64K : ARM_LPAE_TCR_TG0_64K;
>   break;
>   }
>  
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 6ae104cedfd7..d7c5cb685e50 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -83,12 +83,16 @@ struct io_pgtable_cfg {
>* IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
>*  on unmap, for DMA domains using the flush queue mechanism for
>*  delayed invalidation.
> +  *
> +  * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table
> +  *  for use in the upper half of a split address space.
>*/
>   #define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
>   #define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
>   #define IO_PGTABLE_QUIRK_TLBI_ON_MAPBIT(2)
>   #define IO_PGTABLE_QUIRK_ARM_MTK_EXTBIT(3)
>   #define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
> + #define IO_PGTABLE_QUIRK_ARM_TTB

Re: [PATCH] of: property: Add device link support for "iommu-map"

2019-11-22 Thread Rob Herring
On Fri, Nov 22, 2019 at 10:13 AM Ard Biesheuvel
 wrote:
>
> On Fri, 22 Nov 2019 at 17:01, Rob Herring  wrote:
> >
> > On Fri, Nov 22, 2019 at 8:55 AM Will Deacon  wrote:
> > >
> > > [+Ard]
> > >
> > > Hi Rob,
> > >
> > > On Fri, Nov 22, 2019 at 08:47:46AM -0600, Rob Herring wrote:
> > > > On Wed, Nov 20, 2019 at 1:00 PM Will Deacon  wrote:
> > > > >
> > > > > Commit 8e12257dead7 ("of: property: Add device link support for 
> > > > > iommus,
> > > > > mboxes and io-channels") added device link support for IOMMU linkages
> > > > > described using the "iommus" property. For PCI devices, this property
> > > > > is not present and instead the "iommu-map" property is used on the 
> > > > > host
> > > > > bridge node to map the endpoint RequesterIDs to their corresponding
> > > > > IOMMU instance.
> > > > >
> > > > > Add support for "iommu-map" to the device link supplier bindings so 
> > > > > that
> > > > > probing of PCI devices can be deferred until after the IOMMU is
> > > > > available.
> > > > >
> > > > > Cc: Greg Kroah-Hartman 
> > > > > Cc: Rob Herring 
> > > > > Cc: Saravana Kannan 
> > > > > Cc: Robin Murphy 
> > > > > Signed-off-by: Will Deacon 
> > > > > ---
> > > > >
> > > > > Applies against driver-core/driver-core-next.
> > > > > Tested on AMD Seattle (arm64).
> > > >
> > > > Guess that answers my question whether anyone uses Seattle with DT.
> > > > Seattle uses the old SMMU binding, and there's not even an IOMMU
> > > > associated with the PCI host. I raise this mainly because the dts
> > > > files for Seattle either need some love or perhaps should be removed.
> > >
> > > I'm using the new DT bindings on my Seattle, thanks to the firmware fairy
> > > (Ard) visiting my flat with a dediprog. The patches I've posted to enable
> > > modular builds of the arm-smmu driver require that the old binding is
> > > disabled [1].
> >
> > Going to post those dts changes?
> >
>
> Last time I tried upstreaming seattle DT changes I got zero response,
> so I didn't bother since.

I leave most dts reviews up to sub-arch maintainers and I'm pretty
sure AMD doesn't care about it anymore, so we need a new maintainer or
just send a pull request to Arnd/Olof.

Rob


[PATCH v4] iommu/iova: silence warnings under memory pressure

2019-11-22 Thread Qian Cai
When running heavy memory pressure workloads, this 5+ year-old system is
throwing endless warnings below because disk IO is too slow to recover
from swapping. Since the volume from alloc_iova_fast() could be large,
once it calls printk(), it will trigger disk IO (writing to the log
files) and pending softirqs, which could cause an infinite loop and make
no progress for days under the ongoing memory reclaim. This is the
counterpart for Intel where the AMD part has already been merged. See
commit 3d708895325b ("iommu/amd: Silence warnings under memory
pressure"). Since the allocation failure will be reported in
intel_alloc_iova(), just call dev_err_once() there because even the
"ratelimited" variant is too much, and silence the one in
alloc_iova_mem() to avoid the expensive warn_alloc().

 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 slab_out_of_memory: 66 callbacks suppressed
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
   node 0: slabs: 1822, objs: 16398, free: 0
   node 1: slabs: 2051, objs: 18459, free: 31
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
   node 0: slabs: 1822, objs: 16398, free: 0
   node 1: slabs: 2051, objs: 18459, free: 31
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 0: slabs: 1822, objs: 16398, free: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 1: slabs: 2051, objs: 18459, free: 31
   node 0: slabs: 697, objs: 4182, free: 0
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   node 1: slabs: 381, objs: 2286, free: 27
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 1: slabs: 381, objs: 2286, free: 27
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 warn_alloc: 96 callbacks suppressed
 kworker/11:1H: page allocation failure: order:0,
mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1
 CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: GB
 Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19
12/27/2015
 Workqueue: kblockd blk_mq_run_work_fn
 Call Trace:
  dump_stack+0xa0/0xea
  warn_alloc.cold.94+0x8a/0x12d
  __alloc_pages_slowpath+0x1750/0x1870
  __alloc_pages_nodemask+0x58a/0x710
  alloc_pages_current+0x9c/0x110
  alloc_slab_page+0xc9/0x760
  allocate_slab+0x48f/0x5d0
  new_slab+0x46/0x70
  ___slab_alloc+0x4ab/0x7b0
  __slab_alloc+0x43/0x70
  kmem_cache_alloc+0x2dd/0x450
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
  alloc_iova+0x33/0x210
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
  alloc_iova_fast+0x62/0x3d1
   node 1: slabs: 381, objs: 2286, free: 27
  intel_alloc_iova+0xce/0xe0
  intel_map_sg+0xed/0x410
  scsi_dma_map+0xd7/0x160
  scsi_queue_rq+0xbf7/0x1310
  blk_mq_dispatch_rq_list+0x4d9/0xbc0
  blk_mq_sched_dispatch_requests+0x24a/0x300
  __blk_mq_run_hw_queue+0x156/0x230
  blk_mq_run_work_fn+0x3b/0x40
  process_one_work+0x579/0xb90
  worker_thread+0x63/0x5b0
  kthread+0x1e6/0x210
  ret_from_fork+0x3a/0x50
 Mem-Info:
 active_anon:2422723 inactive_anon:361971 isolated_anon:34403
  active_file:

[PATCH] iommu/arm-smmu: support SMMU module probing from the IORT

2019-11-22 Thread Ard Biesheuvel
Add support for SMMU drivers built as modules to the ACPI/IORT device
probing path, by deferring the probe of the master if the SMMU driver is
known to exist but has not been loaded yet. Given that the IORT code
registers a platform device for each SMMU that it discovers, we can
easily trigger the udev based autoloading of the SMMU drivers by making
the platform device identifier part of the module alias.

Signed-off-by: Ard Biesheuvel 
---
 drivers/acpi/arm64/iort.c   | 4 ++--
 drivers/iommu/arm-smmu-v3.c | 1 +
 drivers/iommu/arm-smmu.c| 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 5a7551d060f2..a696457a9b11 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -850,9 +850,9 @@ static inline bool iort_iommu_driver_enabled(u8 type)
 {
switch (type) {
case ACPI_IORT_NODE_SMMU_V3:
-   return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
+   return IS_ENABLED(CONFIG_ARM_SMMU_V3);
case ACPI_IORT_NODE_SMMU:
-   return IS_BUILTIN(CONFIG_ARM_SMMU);
+   return IS_ENABLED(CONFIG_ARM_SMMU);
default:
pr_warn("IORT node type %u does not describe an SMMU\n", type);
return false;
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 7669beafc493..bf6a1e8eb9b0 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3733,4 +3733,5 @@ module_platform_driver(arm_smmu_driver);
 
 MODULE_DESCRIPTION("IOMMU API for ARM architected SMMUv3 implementations");
 MODULE_AUTHOR("Will Deacon ");
+MODULE_ALIAS("platform:arm-smmu-v3");
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d55acc48aee3..db5106b0955b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -2292,4 +2292,5 @@ module_platform_driver(arm_smmu_driver);
 
 MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations");
 MODULE_AUTHOR("Will Deacon ");
+MODULE_ALIAS("platform:arm-smmu");
 MODULE_LICENSE("GPL v2");
-- 
2.20.1



[PATCH v3] iommu/iova: silence warnings under memory pressure

2019-11-22 Thread Qian Cai
When running heavy memory pressure workloads, this 5+ year-old system is
throwing endless warnings below because disk IO is too slow to recover
from swapping. Since the volume from alloc_iova_fast() could be large,
once it calls printk(), it will trigger disk IO (writing to the log
files) and pending softirqs, which could cause an infinite loop and make
no progress for days under the ongoing memory reclaim. This is the
counterpart for Intel where the AMD part has already been merged. See
commit 3d708895325b ("iommu/amd: Silence warnings under memory
pressure"). Since the allocation failure will be reported in
intel_alloc_iova(), just call printk_ratelimited() there and silence
the one in alloc_iova_mem() to avoid the expensive warn_alloc().

 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 slab_out_of_memory: 66 callbacks suppressed
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
   node 0: slabs: 1822, objs: 16398, free: 0
   node 1: slabs: 2051, objs: 18459, free: 31
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
   node 0: slabs: 1822, objs: 16398, free: 0
   node 1: slabs: 2051, objs: 18459, free: 31
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: iommu_iova, object size: 40, buffer size: 448, default order:
0, min order: 0
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 1: slabs: 381, objs: 2286, free: 27
   node 0: slabs: 1822, objs: 16398, free: 0
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 1: slabs: 2051, objs: 18459, free: 31
   node 0: slabs: 697, objs: 4182, free: 0
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
   node 1: slabs: 381, objs: 2286, free: 27
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
   node 1: slabs: 381, objs: 2286, free: 27
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 warn_alloc: 96 callbacks suppressed
 kworker/11:1H: page allocation failure: order:0,
mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1
 CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: GB
 Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19
12/27/2015
 Workqueue: kblockd blk_mq_run_work_fn
 Call Trace:
  dump_stack+0xa0/0xea
  warn_alloc.cold.94+0x8a/0x12d
  __alloc_pages_slowpath+0x1750/0x1870
  __alloc_pages_nodemask+0x58a/0x710
  alloc_pages_current+0x9c/0x110
  alloc_slab_page+0xc9/0x760
  allocate_slab+0x48f/0x5d0
  new_slab+0x46/0x70
  ___slab_alloc+0x4ab/0x7b0
  __slab_alloc+0x43/0x70
  kmem_cache_alloc+0x2dd/0x450
 SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
  alloc_iova+0x33/0x210
   cache: skbuff_head_cache, object size: 208, buffer size: 640, default
order: 0, min order: 0
   node 0: slabs: 697, objs: 4182, free: 0
  alloc_iova_fast+0x62/0x3d1
   node 1: slabs: 381, objs: 2286, free: 27
  intel_alloc_iova+0xce/0xe0
  intel_map_sg+0xed/0x410
  scsi_dma_map+0xd7/0x160
  scsi_queue_rq+0xbf7/0x1310
  blk_mq_dispatch_rq_list+0x4d9/0xbc0
  blk_mq_sched_dispatch_requests+0x24a/0x300
  __blk_mq_run_hw_queue+0x156/0x230
  blk_mq_run_work_fn+0x3b/0x40
  process_one_work+0x579/0xb90
  worker_thread+0x63/0x5b0
  kthread+0x1e6/0x210
  ret_from_fork+0x3a/0x50
 Mem-Info:
 active_anon:2422723 inactive_anon:361971 isolated_anon:34403
  active_file:2285 inactive_file:1838 isolated_file:0

Re: [PATCH v2] iommu/iova: silence warnings under memory pressure

2019-11-22 Thread Qian Cai
On Fri, 2019-11-22 at 08:28 -0800, Joe Perches wrote:
> On Fri, 2019-11-22 at 09:59 -0500, Qian Cai wrote:
> > On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> > > On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > > > When running heavy memory pressure workloads, this 5+ year-old system is
> > > > throwing endless warnings below because disk IO is too slow to recover
> > > > from swapping. Since the volume from alloc_iova_fast() could be large,
> > > > once it calls printk(), it will trigger disk IO (writing to the log
> > > > files) and pending softirqs, which could cause an infinite loop and make
> > > > no progress for days under the ongoing memory reclaim. This is the
> > > > counterpart for Intel where the AMD part has already been merged. See
> > > > commit 3d708895325b ("iommu/amd: Silence warnings under memory
> > > > pressure"). Since the allocation failure will be reported in
> > > > intel_alloc_iova(), just call printk_ratelimited() there and silence
> > > > the one in alloc_iova_mem() to avoid the expensive warn_alloc().
> > > 
> > > []
> > > > v2: use dev_err_ratelimited() and improve the commit messages.
> > > 
> > > []
> > > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > 
> > > []
> > > > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct 
> > > > device *dev,
> > > > iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> > > >IOVA_PFN(dma_mask), true);
> > > > if (unlikely(!iova_pfn)) {
> > > > -   dev_err(dev, "Allocating %ld-page iova failed", 
> > > > nrpages);
> > > > +   dev_err_ratelimited(dev, "Allocating %ld-page iova 
> > > > failed",
> > > > +   nrpages);
> > > 
> > > Trivia:
> > > 
> > > This should really have a \n termination on the format string
> > > 
> > >   dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
> > > 
> > > 
> > 
> > Why do you say so? It is right now printing with a newline added anyway.
> > 
> >  hpsa :03:00.0: DMAR: Allocating 1-page iova failed
> 
> If another process uses pr_cont at the same time,
> it can be interleaved.

I lean towards fixing that in a separate patch if ever needed, as the
original dev_err() has no "\n" enclosed either.


Re: [PATCH v2] iommu/iova: silence warnings under memory pressure

2019-11-22 Thread Joe Perches
On Fri, 2019-11-22 at 09:59 -0500, Qian Cai wrote:
> On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> > On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > > When running heavy memory pressure workloads, this 5+ year-old system is
> > > throwing endless warnings below because disk IO is too slow to recover
> > > from swapping. Since the volume from alloc_iova_fast() could be large,
> > > once it calls printk(), it will trigger disk IO (writing to the log
> > > files) and pending softirqs, which could cause an infinite loop and make
> > > no progress for days under the ongoing memory reclaim. This is the
> > > counterpart for Intel where the AMD part has already been merged. See
> > > commit 3d708895325b ("iommu/amd: Silence warnings under memory
> > > pressure"). Since the allocation failure will be reported in
> > > intel_alloc_iova(), just call printk_ratelimited() there and silence
> > > the one in alloc_iova_mem() to avoid the expensive warn_alloc().
> > 
> > []
> > > v2: use dev_err_ratelimited() and improve the commit messages.
> > 
> > []
> > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > 
> > []
> > > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
> > >   iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> > >  IOVA_PFN(dma_mask), true);
> > >   if (unlikely(!iova_pfn)) {
> > > - dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> > > + dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> > > + nrpages);
> > 
> > Trivia:
> > 
> > This should really have a \n termination on the format string
> > 
> > dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
> > 
> > 
> 
> Why do you say so? It is right now printing with a newline added anyway.
> 
>  hpsa :03:00.0: DMAR: Allocating 1-page iova failed

If another process uses pr_cont at the same time,
it can be interleaved.




[PATCH v9 1/4] uacce: Add documents for uacce

2019-11-22 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) is
a kernel module that aims to provide Shared Virtual Addressing (SVA)
between the accelerator and the process.

This patch adds a document to explain how it works.

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Zhangfei Gao 
---
 Documentation/misc-devices/uacce.rst | 176 +++
 1 file changed, 176 insertions(+)
 create mode 100644 Documentation/misc-devices/uacce.rst

diff --git a/Documentation/misc-devices/uacce.rst b/Documentation/misc-devices/uacce.rst
new file mode 100644
index 000..1db412e
--- /dev/null
+++ b/Documentation/misc-devices/uacce.rst
@@ -0,0 +1,176 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Introduction of Uacce
+---------------------
+
+Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
+provide Shared Virtual Addressing (SVA) between accelerators and processes,
+so an accelerator can access any data structure of the main CPU.
+This differs from data sharing between the CPU and an I/O device, which
+shares only data content rather than addresses.
+Because of the unified address space, the hardware and the user space of a
+process can share the same virtual addresses in their communication.
+Uacce treats the hardware accelerator as a heterogeneous processor, whose
+IOMMU shares the same CPU page tables and, as a result, the same
+translation from VA to PA.
+
+::
+
+ __________________________        ___________________________
+|                          |      |                           |
+|  User application (CPU)  |      |   Hardware Accelerator    |
+|__________________________|      |___________________________|
+             |                                  |
+             | va                               | va
+             v                                  v
+         __________                        __________
+        |          |                      |          |
+        |   MMU    |                      |  IOMMU   |
+        |__________|                      |__________|
+             |                                  |
+             | pa                               | pa
+             v                                  v
+ _____________________________________________________________
+|                                                             |
+|                           Memory                            |
+|_____________________________________________________________|
+
+
+
+Architecture
+============
+
+Uacce is the kernel module in charge of the IOMMU and of address sharing.
+The user-space drivers and libraries are called WarpDrive.
+
+The uacce device, built around the IOMMU SVA API, can access multiple
+address spaces, including the one without PASID.
+
+A virtual concept, the queue, is used for communication. It provides a
+FIFO-like interface and maintains a unified address space between the
+application and all involved hardware.
+
+::
+
+      ___________________                 ____________________
+     |                   |    user API   |                    |
+     | WarpDrive library | ------------> |    user driver     |
+     |___________________|               |____________________|
+              |                                    |
+              | queue fd                           |
+              |                                    |
+              v                                    |
+      ___________________       _________          |
+     |                   |     |         |         | mmap memory
+     | Other framework   |     |  uacce  |         | r/w interface
+     | crypto/nic/others |     |_________|         |
+     |___________________|          |              |
+              |                     |              |
+              | register            | register     |
+              v                     v              |
+       ________________        __________          |
+      |                |      |          |         |
+      |  Device Driver |      |  IOMMU   |         |
+      |________________|      |__________|

Re: [PATCH] of: property: Add device link support for "iommu-map"

2019-11-22 Thread Ard Biesheuvel
On Fri, 22 Nov 2019 at 17:01, Rob Herring  wrote:
>
> On Fri, Nov 22, 2019 at 8:55 AM Will Deacon  wrote:
> >
> > [+Ard]
> >
> > Hi Rob,
> >
> > On Fri, Nov 22, 2019 at 08:47:46AM -0600, Rob Herring wrote:
> > > On Wed, Nov 20, 2019 at 1:00 PM Will Deacon  wrote:
> > > >
> > > > Commit 8e12257dead7 ("of: property: Add device link support for iommus,
> > > > mboxes and io-channels") added device link support for IOMMU linkages
> > > > described using the "iommus" property. For PCI devices, this property
> > > > is not present and instead the "iommu-map" property is used on the host
> > > > bridge node to map the endpoint RequesterIDs to their corresponding
> > > > IOMMU instance.
> > > >
> > > > Add support for "iommu-map" to the device link supplier bindings so that
> > > > probing of PCI devices can be deferred until after the IOMMU is
> > > > available.
> > > >
> > > > Cc: Greg Kroah-Hartman 
> > > > Cc: Rob Herring 
> > > > Cc: Saravana Kannan 
> > > > Cc: Robin Murphy 
> > > > Signed-off-by: Will Deacon 
> > > > ---
> > > >
> > > > Applies against driver-core/driver-core-next.
> > > > Tested on AMD Seattle (arm64).
> > >
> > > Guess that answers my question whether anyone uses Seattle with DT.
> > > Seattle uses the old SMMU binding, and there's not even an IOMMU
> > > associated with the PCI host. I raise this mainly because the dts
> > > files for Seattle either need some love or perhaps should be removed.
> >
> > I'm using the new DT bindings on my Seattle, thanks to the firmware fairy
> > (Ard) visiting my flat with a dediprog. The patches I've posted to enable
> > modular builds of the arm-smmu driver require that the old binding is
> > disabled [1].
>
> Going to post those dts changes?
>

Last time I tried upstreaming Seattle DT changes I got zero response,
so I haven't bothered since.


> > > No issues with the patch itself though. I'll queue it after rc1.
> >
> > Thanks, although I think Greg has already queued it [2] due to the
> > dependencies on other patches in his tree.
>
> Okay, forgot to check my spam from Greg folder and missed that.
>
> Rob


Re: [PATCH] of: property: Add device link support for "iommu-map"

2019-11-22 Thread Rob Herring
On Fri, Nov 22, 2019 at 8:55 AM Will Deacon  wrote:
>
> [+Ard]
>
> Hi Rob,
>
> On Fri, Nov 22, 2019 at 08:47:46AM -0600, Rob Herring wrote:
> > On Wed, Nov 20, 2019 at 1:00 PM Will Deacon  wrote:
> > >
> > > Commit 8e12257dead7 ("of: property: Add device link support for iommus,
> > > mboxes and io-channels") added device link support for IOMMU linkages
> > > described using the "iommus" property. For PCI devices, this property
> > > is not present and instead the "iommu-map" property is used on the host
> > > bridge node to map the endpoint RequesterIDs to their corresponding
> > > IOMMU instance.
> > >
> > > Add support for "iommu-map" to the device link supplier bindings so that
> > > probing of PCI devices can be deferred until after the IOMMU is
> > > available.
> > >
> > > Cc: Greg Kroah-Hartman 
> > > Cc: Rob Herring 
> > > Cc: Saravana Kannan 
> > > Cc: Robin Murphy 
> > > Signed-off-by: Will Deacon 
> > > ---
> > >
> > > Applies against driver-core/driver-core-next.
> > > Tested on AMD Seattle (arm64).
> >
> > Guess that answers my question whether anyone uses Seattle with DT.
> > Seattle uses the old SMMU binding, and there's not even an IOMMU
> > associated with the PCI host. I raise this mainly because the dts
> > files for Seattle either need some love or perhaps should be removed.
>
> I'm using the new DT bindings on my Seattle, thanks to the firmware fairy
> (Ard) visiting my flat with a dediprog. The patches I've posted to enable
> modular builds of the arm-smmu driver require that the old binding is
> disabled [1].

Going to post those dts changes?

> > No issues with the patch itself though. I'll queue it after rc1.
>
> Thanks, although I think Greg has already queued it [2] due to the
> dependencies on other patches in his tree.

Okay, forgot to check my spam from Greg folder and missed that.

Rob


Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling

2019-11-22 Thread Robin Murphy

On 20/11/2019 3:11 pm, Will Deacon wrote:

On Mon, Nov 04, 2019 at 04:27:56PM -0700, Jordan Crouse wrote:

On Mon, Nov 04, 2019 at 07:14:45PM +, Will Deacon wrote:

On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote:

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 9a57eb6c253c..059be7e21030 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
FIELD_PREP(TTBRn_ASID, ctx->asid));
-   iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
-   FIELD_PREP(TTBRn_ASID, ctx->asid));
+   iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);


Are you sure it's safe to drop the ASID here? Just want to make sure there
wasn't some "quirk" this was helping with.


I was reminded of this recently. Some of our SMMU guys told me that a 0x0 in
TTBR1 could cause an S2 fault if a faulty transaction caused a TTBR1 lookup,
so the "quirk" was writing the ASID so that the register wasn't zero. I'm not
sure whether this is a vendor-specific blip or not.


You should be able to set EPD1 to prevent walks via TTBR1 in that case,
though. Sticking the ASID in there is still dodgy if EPD1 is clear and
TTBR1 points at junk (or even physical address 0x0).

That's probably something which should be folded into this patch.


Note that EPD1 was being set by io-pgtable-arm before this patch, and 
remains set by virtue of arm_smmu_lpae_tcr() afterwards, so presumably 
the brokenness might run a bit deeper than that. Either way, though, I'm 
somewhat dubious since the ASID could well be 0 anyway :/


Robin.


[PATCH v9 3/4] crypto: hisilicon - Remove module_param uacce_mode

2019-11-22 Thread Zhangfei Gao
Remove the module_param uacce_mode, which is not used currently.

Signed-off-by: Zhangfei Gao 
Signed-off-by: Zhou Wang 
---
 drivers/crypto/hisilicon/zip/zip_main.c | 31 ++-
 1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/drivers/crypto/hisilicon/zip/zip_main.c b/drivers/crypto/hisilicon/zip/zip_main.c
index 1b2ee96..3de9412 100644
--- a/drivers/crypto/hisilicon/zip/zip_main.c
+++ b/drivers/crypto/hisilicon/zip/zip_main.c
@@ -264,9 +264,6 @@ static u32 pf_q_num = HZIP_PF_DEF_Q_NUM;
 module_param_cb(pf_q_num, &pf_q_num_ops, &pf_q_num, 0444);
 MODULE_PARM_DESC(pf_q_num, "Number of queues in PF(v1 1-4096, v2 1-1024)");
 
-static int uacce_mode;
-module_param(uacce_mode, int, 0);
-
 static const struct pci_device_id hisi_zip_dev_ids[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_ZIP_PF) },
{ PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_ZIP_VF) },
@@ -669,6 +666,7 @@ static int hisi_zip_probe(struct pci_dev *pdev, const struct pci_device_id *id)
pci_set_drvdata(pdev, hisi_zip);
 
qm = &hisi_zip->qm;
+   qm->use_dma_api = true;
qm->pdev = pdev;
qm->ver = rev_id;
 
@@ -676,20 +674,6 @@ static int hisi_zip_probe(struct pci_dev *pdev, const struct pci_device_id *id)
qm->dev_name = hisi_zip_name;
qm->fun_type = (pdev->device == PCI_DEVICE_ID_ZIP_PF) ? QM_HW_PF :
QM_HW_VF;
-   switch (uacce_mode) {
-   case 0:
-   qm->use_dma_api = true;
-   break;
-   case 1:
-   qm->use_dma_api = false;
-   break;
-   case 2:
-   qm->use_dma_api = true;
-   break;
-   default:
-   return -EINVAL;
-   }
-
ret = hisi_qm_init(qm);
if (ret) {
dev_err(&pdev->dev, "Failed to init qm!\n");
@@ -976,12 +960,10 @@ static int __init hisi_zip_init(void)
goto err_pci;
}
 
-   if (uacce_mode == 0 || uacce_mode == 2) {
-   ret = hisi_zip_register_to_crypto();
-   if (ret < 0) {
-   pr_err("Failed to register driver to crypto.\n");
-   goto err_crypto;
-   }
+   ret = hisi_zip_register_to_crypto();
+   if (ret < 0) {
+   pr_err("Failed to register driver to crypto.\n");
+   goto err_crypto;
}
 
return 0;
@@ -996,8 +978,7 @@ static int __init hisi_zip_init(void)
 
 static void __exit hisi_zip_exit(void)
 {
-   if (uacce_mode == 0 || uacce_mode == 2)
-   hisi_zip_unregister_from_crypto();
+   hisi_zip_unregister_from_crypto();
pci_unregister_driver(&hisi_zip_pci_driver);
hisi_zip_unregister_debugfs();
 }
-- 
2.7.4



[PATCH v9 4/4] crypto: hisilicon - register zip engine to uacce

2019-11-22 Thread Zhangfei Gao
Register qm to uacce framework for user crypto driver

Signed-off-by: Zhangfei Gao 
Signed-off-by: Zhou Wang 
---
 drivers/crypto/hisilicon/qm.c   | 234 +++-
 drivers/crypto/hisilicon/qm.h   |  11 ++
 drivers/crypto/hisilicon/zip/zip_main.c |  16 ++-
 include/uapi/misc/uacce/hisi_qm.h   |  23 
 4 files changed, 277 insertions(+), 7 deletions(-)
 create mode 100644 include/uapi/misc/uacce/hisi_qm.h

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index a8ed6990..7d23daa 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -9,6 +9,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 #include "qm.h"
 
 /* eq/aeq irq enable */
@@ -467,6 +470,11 @@ static void qm_poll_qp(struct hisi_qp *qp, struct hisi_qm *qm)
 {
struct qm_cqe *cqe = qp->cqe + qp->qp_status.cq_head;
 
+   if (qp->event_cb) {
+   qp->event_cb(qp);
+   return;
+   }
+
if (qp->req_cb) {
while (QM_CQE_PHASE(cqe) == qp->qp_status.cqc_phase) {
dma_rmb();
@@ -1271,7 +1279,7 @@ static int qm_qp_ctx_cfg(struct hisi_qp *qp, int qp_id, int pasid)
  * @qp: The qp we want to start to run.
  * @arg: Accelerator specific argument.
  *
- * After this function, qp can receive request from user. Return qp_id if
+ * After this function, qp can receive request from user. Return 0 if
  * successful, Return -EBUSY if failed.
  */
 int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg)
@@ -1316,7 +1324,7 @@ int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg)
 
dev_dbg(dev, "queue %d started\n", qp_id);
 
-   return qp_id;
+   return 0;
 }
 EXPORT_SYMBOL_GPL(hisi_qm_start_qp);
 
@@ -1397,6 +1405,213 @@ static void hisi_qm_cache_wb(struct hisi_qm *qm)
}
 }
 
+static void qm_qp_event_notifier(struct hisi_qp *qp)
+{
+   wake_up_interruptible(&qp->uacce_q->wait);
+}
+
+static int hisi_qm_get_available_instances(struct uacce_device *uacce)
+{
+   int i, ret;
+   struct hisi_qm *qm = uacce->priv;
+
+   read_lock(&qm->qps_lock);
+   for (i = 0, ret = 0; i < qm->qp_num; i++)
+   if (!qm->qp_array[i])
+   ret++;
+   read_unlock(&qm->qps_lock);
+
+   return ret;
+}
+
+static int hisi_qm_uacce_get_queue(struct uacce_device *uacce,
+  unsigned long arg,
+  struct uacce_queue *q)
+{
+   struct hisi_qm *qm = uacce->priv;
+   struct hisi_qp *qp;
+   u8 alg_type = 0;
+
+   qp = hisi_qm_create_qp(qm, alg_type);
+   if (IS_ERR(qp))
+   return PTR_ERR(qp);
+
+   q->priv = qp;
+   q->uacce = uacce;
+   qp->uacce_q = q;
+   qp->event_cb = qm_qp_event_notifier;
+   qp->pasid = arg;
+
+   return 0;
+}
+
+static void hisi_qm_uacce_put_queue(struct uacce_queue *q)
+{
+   struct hisi_qp *qp = q->priv;
+
+   hisi_qm_cache_wb(qp->qm);
+   hisi_qm_release_qp(qp);
+}
+
+/* map sq/cq/doorbell to user space */
+static int hisi_qm_uacce_mmap(struct uacce_queue *q,
+ struct vm_area_struct *vma,
+ struct uacce_qfile_region *qfr)
+{
+   struct hisi_qp *qp = q->priv;
+   struct hisi_qm *qm = qp->qm;
+   size_t sz = vma->vm_end - vma->vm_start;
+   struct pci_dev *pdev = qm->pdev;
+   struct device *dev = &pdev->dev;
+   unsigned long vm_pgoff;
+   int ret;
+
+   switch (qfr->type) {
+   case UACCE_QFRT_MMIO:
+   if (qm->ver == QM_HW_V2) {
+   if (sz > PAGE_SIZE * (QM_DOORBELL_PAGE_NR +
+   QM_DOORBELL_SQ_CQ_BASE_V2 / PAGE_SIZE))
+   return -EINVAL;
+   } else {
+   if (sz > PAGE_SIZE * QM_DOORBELL_PAGE_NR)
+   return -EINVAL;
+   }
+
+   vma->vm_flags |= VM_IO;
+
+   return remap_pfn_range(vma, vma->vm_start,
+  qm->phys_base >> PAGE_SHIFT,
+  sz, pgprot_noncached(vma->vm_page_prot));
+   case UACCE_QFRT_DUS:
+   if (sz != qp->qdma.size)
+   return -EINVAL;
+
+   /*
+* dma_mmap_coherent() requires vm_pgoff as 0
+* restore vm_pfoff to initial value for mmap()
+*/
+   vm_pgoff = vma->vm_pgoff;
+   vma->vm_pgoff = 0;
+   ret = dma_mmap_coherent(dev, vma, qp->qdma.va,
+   qp->qdma.dma, sz);
+   vma->vm_pgoff = vm_pgoff;
+   return ret;
+
+   default:
+   return -EINVAL;
+   }
+}
+
+static int hisi_qm_uacce_start_queue(struct uacce_queue *q)
+{
+   struct hisi_qp *qp = q->priv;
+
+   return hisi_qm_start_qp(qp, qp->pa

Re: [PATCH v2 6/8] iommu/arm-smmu-v3: Add second level of context descriptor table

2019-11-22 Thread Jean-Philippe Brucker
On Mon, Nov 11, 2019 at 03:50:07PM +, Jonathan Cameron wrote:
> > +   cfg->l1ptr = dmam_alloc_coherent(smmu->dev, size,
> > +&cfg->l1ptr_dma,
> > +GFP_KERNEL | __GFP_ZERO);
> 
> As before.  Fairly sure __GFP_ZERO doesn't give you anything extra.

Indeed

> > +   if (!cfg->l1ptr) {
> > +   dev_warn(smmu->dev, "failed to allocate L1 context table\n");
> > +   return -ENOMEM;
> > +   }
> > +   }
> > +
> > +   cfg->tables = devm_kzalloc(smmu->dev, sizeof(struct arm_smmu_cd_table) *
> > +  cfg->num_tables, GFP_KERNEL);
> > +   if (!cfg->tables) {
> > +   ret = -ENOMEM;
> > +   goto err_free_l1;
> > +   }
> > +
> > +   /* With two levels, leaf tables are allocated lazily */
> This comment is a kind of odd one.  It is actually talking about what
> 'doesn't' happen here I think..
> 
> Perhaps /*
>  * Only allocate a leaf table for linear case.
>  * With two levels, the leaf tables are allocated lazily.
>*/

Yes, that's clearer

> > +   if (!cfg->l1ptr) {
> > +   ret = arm_smmu_alloc_cd_leaf_table(smmu, &cfg->tables[0],
> > +  max_contexts);
> > +   if (ret)
> > +   goto err_free_tables;
> > +   }
> > +
> > +   return 0;
> > +
> > +err_free_tables:
> > +   devm_kfree(smmu->dev, cfg->tables);
> > +err_free_l1:
> > +   if (cfg->l1ptr)
> > +   dmam_free_coherent(smmu->dev, size, cfg->l1ptr, cfg->l1ptr_dma);
> 
> This cleanup only occurs if we have had an error.
> Is there potential for this to rerun at some point later?  If so we should
> be careful to also reset relevant pointers - e.g. cfg->l1ptr = NULL as
> they are used to control the flow above.

Yes we should definitely clear l1ptr. The domain may be managed by a
device driver, and if attach_dev() fails they will call domain_free(),
which checks this pointer. Plus nothing prevents them from calling
attach_dev() again with the same domain.

> If there is no chance of a rerun, why bother cleaning them up at all?
> Something has gone horribly wrong, so let the eventual smmu cleanup deal
> with them.

The domain is much shorter-lived than the SMMU device, so we need this
cleanup.

> > +   return ret;
> >  }
> >  
> >  static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain)
> >  {
> > +   int i;
> > struct arm_smmu_device *smmu = smmu_domain->smmu;
> > struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> > +   size_t num_leaf_entries = 1 << cfg->s1cdmax;
> > +   struct arm_smmu_cd_table *table = cfg->tables;
> >  
> > -   arm_smmu_free_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax);
> > +   if (cfg->l1ptr) {
> > +   size_t size = cfg->num_tables * (CTXDESC_L1_DESC_DWORDS << 3);
> > +
> > +   dmam_free_coherent(smmu->dev, size, cfg->l1ptr, cfg->l1ptr_dma);
> 
>   As above, if we can call this in a fashion that makes sense
>   other than in eventual smmu teardown, then we need to be
>   careful to reset the pointers. If not, then why are we clearing
>   managed resources by hand anyway?

Yes, we call this on the error cleanup path (not only domain_free()), so
it needs to leave the domain in a usable state.

Thanks,
Jean


Re: [PATCH v2 8/8] iommu/arm-smmu-v3: Add support for PCI PASID

2019-11-22 Thread Jean-Philippe Brucker
Hi Jonathan,

On Mon, Nov 11, 2019 at 04:05:29PM +, Jonathan Cameron wrote:
> On Fri, 8 Nov 2019 16:25:08 +0100
> Jean-Philippe Brucker  wrote:
> 
> > Enable PASID for PCI devices that support it. Since the SSID tables are
> > allocated by arm_smmu_attach_dev(), PASID has to be enabled early enough.
> > arm_smmu_dev_feature_enable() would be too late, since by that time the
> > main DMA domain has already been attached. Do it in add_device() instead.
> > 
> > Signed-off-by: Jean-Philippe Brucker 
> Seems straightforward.
> 
> Reviewed-by: Jonathan Cameron 
> 
> Thanks for working on this stuff.  I hope we an move to get the rest of the
> SVA elements lined up behind it so everything moves quickly in the next
> cycle (or two).

Thanks a lot for the thorough review. I'm aiming for v5.6 for the PASID
series, and then realistically v5.7 for the rest of SVA, but I'll try to
send it sooner.

Thanks,
Jean



Re: [PATCH v2 4/8] iommu/arm-smmu-v3: Prepare for SSID support

2019-11-22 Thread Jean-Philippe Brucker
On Mon, Nov 11, 2019 at 02:38:11PM +, Jonathan Cameron wrote:
> Hmm. There are several different refactors in here alongside a few new
> bits.  Would be nice to break it up more to make life even easier for
> reviewers.   It's not 'so' complex that it's really a problem though
> so could leave it as is if you really want to.

Sure, I'll see if I can split it more in next version.

> > +   table->ptr = dmam_alloc_coherent(smmu->dev, size, &table->ptr_dma,
> > +GFP_KERNEL | __GFP_ZERO);
> 
> We dropped dma_zalloc_coherent because we now zero in dma_alloc_coherent
> anyway.  Hence I'm fairly sure that __GFP_ZERO should have no effect.
> 
> https://lore.kernel.org/patchwork/patch/1031536/
> 
> Am I missing some special corner case here?

Here I just copied the GFP flags already in use. But removing all
__GFP_ZERO from the driver would make a good cleanup patch.

> > -   if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
> > -   arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg);
> > -
> 
> Whilst it seems fine, perhaps a note on the 'why' of moving this into
> finalise_s1 would be good in the patch description.

Ok. Since it's only to simplify the handling of allocation failure in a
subsequent patch, I think I'll move that part over there.

Thanks,
Jean


[PATCH v9 2/4] uacce: add uacce driver

2019-11-22 Thread Zhangfei Gao
From: Kenneth Lee 

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between the CPU and an I/O device, which
shares only data content rather than addresses.
Because of the unified address space, the hardware and the user space of a
process can share the same virtual addresses in their communication.

Uacce creates a chrdev for every registration; a queue is allocated to
the process when the chrdev is opened. The process can then access the
hardware resources by interacting with the queue file. By mmap-ing the
queue file space into user space, the process can put requests directly
to the hardware without syscalls into kernel space.

The IOMMU core only tracks mm<->device bonds at the moment, because it
only needs to handle IOTLB invalidation and PASID table entries. However
uacce needs a finer granularity since multiple queues from the same
device can be bound to an mm. When the mm exits, all bound queues must
be stopped so that the IOMMU can safely clear the PASID table entry and
reallocate the PASID.

An intermediate struct uacce_mm links uacce devices and queues.
Note that an mm may be bound to multiple devices but an uacce_mm
structure only ever belongs to a single device, because we don't need
anything more complex (if multiple devices are bound to one mm, then
we'll create one uacce_mm for each bond).

uacce_device --+-- uacce_mm --+-- uacce_queue
               |              '-- uacce_queue
               |
               '-- uacce_mm --+-- uacce_queue
                              +-- uacce_queue
                              '-- uacce_queue

Signed-off-by: Kenneth Lee 
Signed-off-by: Zaibo Xu 
Signed-off-by: Zhou Wang 
Signed-off-by: Jean-Philippe Brucker 
Signed-off-by: Zhangfei Gao 
---
 Documentation/ABI/testing/sysfs-driver-uacce |  37 ++
 drivers/misc/Kconfig |   1 +
 drivers/misc/Makefile|   1 +
 drivers/misc/uacce/Kconfig   |  13 +
 drivers/misc/uacce/Makefile  |   2 +
 drivers/misc/uacce/uacce.c   | 627 +++
 include/linux/uacce.h| 161 +++
 include/uapi/misc/uacce/uacce.h  |  38 ++
 8 files changed, 880 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce/uacce.h

diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
new file mode 100644
index 000..0fc6c957
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -0,0 +1,37 @@
+What:   /sys/class/uacce//api
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:API of the device; no format requirement.
+Applications use the API string to match the correct driver.
+
+What:   /sys/class/uacce//flags
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
+
+What:   /sys/class/uacce//available_instances
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Available instances left of the device
+Return -ENODEV if uacce_ops get_available_instances is not provided
+
+What:   /sys/class/uacce//algorithms
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Algorithms supported by this accelerator, separated by newlines.
+
+What:   /sys/class/uacce//region_mmio_size
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Size (bytes) of mmio region queue file
+
+What:   /sys/class/uacce//region_dus_size
+Date:   Nov 2019
+KernelVersion:  5.5
+Contact:linux-accelerat...@lists.ozlabs.org
+Description:Size (bytes) of dus region queue file
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index c55b637..929feb0 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -481,4 +481,5 @@ source "drivers/misc/cxl/Kconfig"
 source "drivers/misc/ocxl/Kconfig"
 source "drivers/misc/cardreader/Kconfig"
 source "drivers/misc/habanalabs/Kconfig"
+source "drivers/misc/uacce/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index c1860d3..9abf292 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -56,4 +56,5 @@ obj-$(CONFIG_OCXL)+= ocxl/
 obj-y   

[PATCH v9 0/4] Add uacce module for Accelerator

2019-11-22 Thread Zhangfei Gao
Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between the CPU and an I/O device, which
shares data content rather than addresses.
Because of the unified address space, the hardware and the user space of a
process can share the same virtual addresses in their communication.

Uacce is intended to be used with Jean-Philippe Brucker's SVA
patchset [1], which enables I/O-side page faults and PASID support.
We keep verifying against Jean's sva/current branch [2],
and also against Eric's SMMUv3 nested-stage patches [3].

This series and related zip & qm driver
https://github.com/Linaro/linux-kernel-warpdrive/tree/5.4-rc4-uacce-v9

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-upstream-v9

References:
[1] http://jpbrucker.net/sva/
[2] 
http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:
v9:
Suggested by Jonathan
1. Remove sysfs: numa_distance, node_id, id.
2. Split the api to solve the potential race
struct uacce_device *uacce_alloc(struct device *parent,
 struct uacce_interface *interface)
int uacce_register(struct uacce_device *uacce)
void uacce_remove(struct uacce_device *uacce)
3. Split clean up patch 03

v8:
Address some comments from Jonathan
Merge Jean's patch, using uacce_mm instead of pid for sva_exit

v7:
As suggested by Jean and Jerome
Only consider sva case and remove unused dma apis for the first patch.
Also add mm_exit for sva and vm_ops.close etc


v6: https://lkml.org/lkml/2019/10/16/231
Change sys qfrs_size to different file, suggested by Jonathan
Fix crypto daily build issue and based on crypto code base, also 5.4-rc1.

v5: https://lkml.org/lkml/2019/10/14/74
Add an example patch using the uacce interface, suggested by Greg
0003-crypto-hisilicon-register-zip-engine-to-uacce.patch

v4: https://lkml.org/lkml/2019/9/17/116
Based on 5.4-rc1
Considering other drivers integrating uacce:
if uacce is not compiled, uacce_register returns an error and
uacce_unregister is empty.
Simplify uacce flag: UACCE_DEV_SVA.
Address Greg's comments: 
Fix state machine, remove potential syslog triggered from user space etc.

v3: https://lkml.org/lkml/2019/9/2/990
Recommended by Greg: use struct uacce_device instead of struct uacce,
and use struct cdev * in struct uacce_device; as a result,
the cdev can be released by itself when its refcount drops to 0.
So the two structures are decoupled and each maintains itself.
Also add dev.release for put_device.

v2: https://lkml.org/lkml/2019/8/28/565
Address comments from Greg and Jonathan
Modify interface uacce_register
Drop noiommu mode first

v1: https://lkml.org/lkml/2019/8/14/277
1. Rebase to 5.3-rc1
2. Build on iommu interface
3. Verifying with Jean's sva and Eric's nested mode iommu.
4. User library has developed a lot: support zlib, openssl etc.
5. Move to misc first

RFC3:
https://lkml.org/lkml/2018/11/12/1951

RFC2:
https://lwn.net/Articles/763990/


Background of why Uacce:
A Von Neumann processor is not good at general data manipulation.
It is designed for control-bound rather than data-bound applications.
The latter need less control-path facility and more (or more specific) ALUs.
So more and more heterogeneous processors, such as
encryption/decryption accelerators, TPUs, or
EDGE (Explicit Data Graph Execution) processors, are being introduced to gain
better performance or power efficiency for particular applications
these days.

There are generally two ways to make use of these heterogeneous processors:

The first is to make them co-processors, just like the FPU.
This is good for some applications, but it has its own cons:
It changes the ISA set permanently.
You must save all state elements when the process is switched out.
But most data-bound processors have a huge set of state elements.
It makes the kernel scheduler more complex.

The second is the Accelerator.
It is taken as an IO device from the CPU's point of view
(but it need not be physically one). The process, running on the CPU,
holds a context of the accelerator and sends instructions to it as if
it were calling a function or a thread running with the FPU.
The context is bound to the processor itself.
So the state elements remain in the hardware context until
the context is released.

We believe this is the core feature of an "Accelerator" vs. Co-processor
or other heterogeneous processors.

The intention of Uacce is to provide the basic facility to back
this scenario. Its first step is to make sure the accelerator and the process
can share the same address space, so the accelerator ISA can directly
address any data structure of the main CPU.
This differs from the data sharing between CPU and IO device,
which shares data content rather than addresses.
So it is different from other DMA libraries.

In th

"Revisit iommu_insert_resv_region() implementation" causes use-after-free

2019-11-22 Thread Qian Cai
Reading files under /sys/kernel/iommu_groups/ triggers a use-after-free.
Reverting commit 4dbd258ff63e ("iommu: Revisit iommu_insert_resv_region()
implementation") fixes the issue.

	/* no merge needed on elements of different types than @nr */
	if (iter->type != nr->type) {
		list_move_tail(&iter->list, &stack);
		continue;
	}
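The crash below comes from a classic iteration hazard: freeing (or moving) the current list element inside the loop, then touching it again to advance. A toy model of the hazard and of the safe idiom (cache the next pointer before the body runs, as the kernel's list_for_each_entry_safe() does); this is a hypothetical list, not the actual iommu.c code:

```c
#include <stdlib.h>

struct node { int type; struct node *next; };

/* Remove all nodes of a given type; returns the remaining count.
 * Caching `next` before any free makes dropping `cur` safe. */
static int purge_type(struct node **head, int type)
{
	struct node **pp = head;
	int remaining = 0;

	while (*pp) {
		struct node *cur = *pp;
		struct node *next = cur->next;	/* cache BEFORE any free */

		if (cur->type == type) {
			*pp = next;
			free(cur);	/* cur is dead; never deref it again */
		} else {
			remaining++;
			pp = &cur->next;
		}
	}
	return remaining;
}
```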

[  160.156964][ T3100] BUG: KASAN: use-after-free in
iommu_insert_resv_region+0x34b/0x520
[  160.197758][ T3100] Read of size 4 at addr 8887aba78464 by task cat/3100
[  160.230645][ T3100] 
[  160.240907][ T3100] CPU: 14 PID: 3100 Comm: cat Not tainted 5.4.0-rc8-next-
20191122+ #11
[  160.278671][ T3100] Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420
Gen9, BIOS U19 12/27/2015
[  160.320589][ T3100] Call Trace:
[  160.335229][ T3100]  dump_stack+0xa0/0xea
[  160.354011][ T3100]  print_address_description.constprop.5.cold.7+0x9/0x384
[  160.386569][ T3100]  __kasan_report.cold.8+0x7a/0xc0
[  160.409811][ T3100]  ? iommu_insert_resv_region+0x34b/0x520
[  160.435668][ T3100]  kasan_report+0x12/0x20
[  160.455387][ T3100]  __asan_load4+0x95/0xa0
[  160.474808][ T3100]  iommu_insert_resv_region+0x34b/0x520
[  160.500228][ T3100]  ? iommu_bus_notifier+0xe0/0xe0
[  160.522904][ T3100]  ? intel_iommu_get_resv_regions+0x348/0x400
[  160.550461][ T3100]  iommu_get_group_resv_regions+0x16d/0x2f0
[  160.577611][ T3100]  ? iommu_insert_resv_region+0x520/0x520
[  160.603756][ T3100]  ? register_lock_class+0x940/0x940
[  160.628265][ T3100]  iommu_group_show_resv_regions+0x8d/0x1f0
[  160.655370][ T3100]  ? iommu_get_group_resv_regions+0x2f0/0x2f0
[  160.684168][ T3100]  iommu_group_attr_show+0x34/0x50
[  160.708395][ T3100]  sysfs_kf_seq_show+0x11c/0x220
[  160.731758][ T3100]  ? iommu_default_passthrough+0x20/0x20
[  160.756898][ T3100]  kernfs_seq_show+0xa4/0xb0
[  160.777097][ T3100]  seq_read+0x27e/0x710
[  160.795195][ T3100]  kernfs_fop_read+0x7d/0x2c0
[  160.815349][ T3100]  __vfs_read+0x50/0xa0
[  160.834154][ T3100]  vfs_read+0xcb/0x1e0
[  160.852332][ T3100]  ksys_read+0xc6/0x160
[  160.871028][ T3100]  ? kernel_write+0xc0/0xc0
[  160.891307][ T3100]  ? do_syscall_64+0x79/0xaec
[  160.912446][ T3100]  ? do_syscall_64+0x79/0xaec
[  160.933640][ T3100]  __x64_sys_read+0x43/0x50
[  160.953957][ T3100]  do_syscall_64+0xcc/0xaec
[  160.974322][ T3100]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  160.999130][ T3100]  ? syscall_return_slowpath+0x580/0x580
[  161.024753][ T3100]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
[  161.052416][ T3100]  ? trace_hardirqs_off_caller+0x3a/0x150
[  161.078400][ T3100]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  161.103711][ T3100]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  161.130793][ T3100] RIP: 0033:0x7f33e0d89d75
[  161.150732][ T3100] Code: fe ff ff 50 48 8d 3d 4a dc 09 00 e8 25 0e 02 00 0f
1f 44 00 00 f3 0f 1e fa 48 8d 05 a5 59 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48>
3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89
[  161.245503][ T3100] RSP: 002b:7fff88f0db88 EFLAGS: 0246 ORIG_RAX:

[  161.284547][ T3100] RAX: ffda RBX: 0002 RCX:
7f33e0d89d75
[  161.321123][ T3100] RDX: 0002 RSI: 7f33e1201000 RDI:
0003
[  161.357617][ T3100] RBP: 7f33e1201000 R08:  R09:

[  161.394173][ T3100] R10: 0022 R11: 0246 R12:
7f33e1201000
[  161.430736][ T3100] R13: 0003 R14: 0fff R15:
0002
[  161.467337][ T3100] 
[  161.477529][ T3100] Allocated by task 3100:
[  161.497133][ T3100]  save_stack+0x21/0x90
[  161.515777][ T3100]  __kasan_kmalloc.constprop.13+0xc1/0xd0
[  161.541743][ T3100]  kasan_kmalloc+0x9/0x10
[  161.561330][ T3100]  kmem_cache_alloc_trace+0x1f8/0x470
[  161.585949][ T3100]  iommu_insert_resv_region+0xeb/0x520
[  161.610876][ T3100]  iommu_get_group_resv_regions+0x16d/0x2f0
[  161.638318][ T3100]  iommu_group_show_resv_regions+0x8d/0x1f0
[  161.665322][ T3100]  iommu_group_attr_show+0x34/0x50
[  161.688526][ T3100]  sysfs_kf_seq_show+0x11c/0x220
[  161.711992][ T3100]  kernfs_seq_show+0xa4/0xb0
[  161.734252][ T3100]  seq_read+0x27e/0x710
[  161.754412][ T3100]  kernfs_fop_read+0x7d/0x2c0
[  161.775493][ T3100]  __vfs_read+0x50/0xa0
[  161.794328][ T3100]  vfs_read+0xcb/0x1e0
[  161.812559][ T3100]  ksys_read+0xc6/0x160
[  161.831554][ T3100]  __x64_sys_read+0x43/0x50
[  161.851772][ T3100]  do_syscall_64+0xcc/0xaec
[  161.872098][ T3100]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  161.898919][ T3100] 
[  161.909113][ T3100] Freed by task 3100:
[  161.927070][ T3100]  save_stack+0x21/0x90
[  161.945711][ T3100]  __kasan_slab_free+0x11c/0x170
[  161.968112][ T3100]  kasan_slab_free+0xe/0x10
[  161.988601][ T3100]  slab_free_freelist_hook+0x5f/0x1d0
[  162.012918][ T3100]  kfree+0xe9/0x410
[  162.029454][ T3100]  iommu_insert_resv_region+0x47d/0x520
[  162.053701][ T3100]  iommu_get_group_resv_regions+

RE: [RFC PATCH] usb: gadget: f_tcm: Added DMA32 flag while allocation of command buffer

2019-11-22 Thread Jayshri Dajiram Pawar
> > +Konrad
> 
> You can run Linux with CONFIG_DMA_API_DEBUG and use
> debug_dma_dump_mappings() to dump and figure things out. See
> http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/dump/dump_dma.c;h=2ba251a2f8c36c24c68762b3e4c9f76ea178238f;hb=HEAD
> 
> >
> > Jayshri,
> >
> > On 15/11/2019 12:14, Jayshri Dajiram Pawar wrote:
> > > > > > There is a problem when a function driver allocates memory for a
> > > > > > buffer used for DMA from outside the dma_mask space.
> > > > > > It appeared during testing of the f_tcm driver with the cdns3
> > > > > > controller. As a result, the cdns3 driver was not able to map the
> > > > > > virtual buffer for DMA.
> > > > > > This fix should be improved depending on the dma_mask associated
> > > > > > with the device.
> > > > > > Adding the GFP_DMA32 flag while allocating the command data buffer,
> > > > > > only for 32-bit controllers.
> > > > >
> > > > > Hi Jayshri,
> > > > >
> > > > > This issue should be fixed by setting DMA_MASK correctly for the
> > > > > controller; you can't limit the user's memory region. At
> > > > > usb_ep_queue, the UDC driver will call the DMA map API; for Cadence,
> > > > > it is usb_gadget_map_request_by_dev.
> > > > > For a system without an SMMU (IOMMU), it will use swiotlb to
> > > > > make sure the data buffer used for the DMA transfer is within the
> > > > > DMA mask of the controller. There is a reserved low-memory region
> > > > > for the bounce buffer in the swiotlb use case.
> > > > >
> > > >
> > > > /**
> > > >* struct usb_request - describes one i/o request
> > > >* @buf: Buffer used for data.  Always provide this; some controllers
> > > >*only use PIO, or don't use DMA for some endpoints.
> > > >* @dma: DMA address corresponding to 'buf'.  If you don't set this
> > > >*field, and the usb controller needs one, it is responsible
> > > >*for mapping and unmapping the buffer.
> > > > 
> > > >*/
> > > >
> > > > So if dma is not set in the usb_request, then the controller driver is
> > > > responsible for dma_mapping the buffer pointed to by 'buf' before it
> > > > attempts to do DMA.
> > > > This should take care of DMA mask and swiotlb.
> > > >
> > > > This patch is not correct.
> > > >
> > > Hi Roger,
> > >
> > > We have scatter-gather disabled.
> > > We are getting the below error while allocating a cmd data buffer with
> > > length 524288 or greater, while writing large files to the device.
> > > This error occurred on an x86 platform.
> > > Because of this we have added the DMA flag while allocating the
> > > buffer.
> > >
> > > [ 1602.977532] swiotlb_tbl_map_single: 26 callbacks suppressed [
> > > 1602.977536] cdns-usb3 cdns-usb3.1: swiotlb buffer is full (sz:
> > > 524288 bytes), total 32768 (slots), used 0 (slots)
> >
Hi Roger,

> > Why is swiotlb buffer getting full? How much is it on your system?

On our system the swiotlb max mapping size is 256KB.
The UASP receive-data state tries to queue and map a buffer of length 524288
(512KB), which is greater than 256KB; that's why the swiotlb buffer is getting full.

> > Are you sure that dma_unmap is happening on requests that complete?
> else we'll just keep hogging the swiotlb buffer.

Yes, dma_unmap is happening on requests that complete.

I could map a buffer of length 512KB with IO_TLB_SEGSIZE changed to 256.
With this, the max mapping size is increased to 256*2048 = 512KB.

+++ b/include/linux/swiotlb.h
@@ -21,7 +21,7 @@ enum swiotlb_force {
  * must be a power of 2.  What is the appropriate value ?
  * The complexity of {map,unmap}_single is linearly dependent on this value.
  */
-#define IO_TLB_SEGSIZE 128
+#define IO_TLB_SEGSIZE 256
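The 256KB and 512KB figures quoted in this thread follow directly from swiotlb's slot size (IO_TLB_SHIFT = 11, i.e. 2048-byte slots, with at most IO_TLB_SEGSIZE contiguous slots per mapping). A minimal sketch of the arithmetic:

```c
#include <stddef.h>

#define IO_TLB_SHIFT 11	/* each swiotlb slot is 1 << 11 = 2048 bytes */

/* Largest buffer swiotlb can bounce in a single dma_map call:
 * IO_TLB_SEGSIZE contiguous slots of 2KB each. */
static size_t swiotlb_max_mapping_bytes(unsigned int seg_size)
{
	return (size_t)seg_size << IO_TLB_SHIFT;
}
```

With the default IO_TLB_SEGSIZE of 128 this gives 256KB, which is why the 512KB UASP buffer overflows it; doubling the segment size to 256 raises the limit to 512KB, matching the observation above.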

Regards,
Jayshri

> >
> > cheers,
> > -roger



> >
> > > [ 1602.977542] cdns-usb3 cdns-usb3.1: overflow
> > > 0x0007eee0+524288 of DMA mask  bus mask 0 [
> > > 1602.977555] WARNING: CPU: 6 PID: 285 at kernel/dma/direct.c:43
> > > report_addr+0x37/0x60 [ 1602.977556] Modules linked in:
> target_core_user uio target_core_pscsi target_core_file target_core_iblock
> usb_f_tcm(OE) target_core_mod cdns3(OE) cdns3_pci_wrap(OE) roles(E)
> libcomposite(OE) udc_core(OE) xt_multiport iptable_filter bpfilter
> snd_hda_codec_hdmi nls_iso8859_1 i915 intel_rapl x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel kvm snd_hda_codec_realtek
> snd_hda_codec_generic ledtrig_audio snd_hda_intel irqbypass
> snd_hda_codec snd_hda_core snd_hwdep snd_pcm drm_kms_helper
> snd_seq_midi snd_seq_midi_event crct10dif_pclmul snd_rawmidi
> crc32_pclmul drm snd_seq ghash_clmulni_intel snd_seq_device aesni_intel
> snd_timer mei_me i2c_algo_bit aes_x86_64 crypto_simd cryptd fb_sys_fops
> glue_helper snd mei input_leds syscopyarea intel_cstate sysfillrect
> intel_rapl_perf sysimgblt hp

Re: [PATCH v2] iommu/iova: silence warnings under memory pressure

2019-11-22 Thread Qian Cai
On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > When running heavy memory pressure workloads, this 5+ year-old system is
> > throwing the endless warnings below because disk IO is too slow to recover
> > from swapping. Since the volume from alloc_iova_fast() could be large,
> > once it calls printk(), it will trigger disk IO (writing to the log
> > files) and pending softirqs, which can cause an infinite loop and make
> > no progress for days in the ongoing memory reclaim. This is the
> > counterpart for Intel; the AMD part has already been merged. See
> > commit 3d708895325b ("iommu/amd: Silence warnings under memory
> > pressure"). Since the allocation failure will be reported in
> > intel_alloc_iova(), just call printk_ratelimited() there and silence
> > the one in alloc_iova_mem() to avoid the expensive warn_alloc().
> 
> []
> > v2: use dev_err_ratelimited() and improve the commit messages.
> 
> []
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> 
> []
> > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device 
> > *dev,
> > iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> >IOVA_PFN(dma_mask), true);
> > if (unlikely(!iova_pfn)) {
> > -   dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> > +   dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> > +   nrpages);
> 
> Trivia:
> 
> This should really have a \n termination on the format string
> 
>   dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
> 
> 

Why do you say so? It already prints with a newline appended anyway.

 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
 hpsa :03:00.0: DMAR: Allocating 1-page iova failed
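For reference, the policy behind printk_ratelimited()/dev_err_ratelimited() is roughly "at most a burst of messages per interval" (in Linux, 10 messages per 5 seconds by default). A simplified userspace model of that logic, not the kernel's ___ratelimit() itself, using arbitrary ticks instead of jiffies:

```c
#include <stdbool.h>

/* Allow at most `burst` messages per `interval` ticks, then suppress
 * until the window rolls over. */
struct ratelimit {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
};

static bool ratelimit_allow(struct ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;	/* open a new window */
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;	/* caller may print */
	}
	return false;		/* suppressed */
}
```

Under a flood like the hpsa log above, this caps the output at the burst size per window instead of letting every failure hit the console.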

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] of: property: Add device link support for "iommu-map"

2019-11-22 Thread Will Deacon
[+Ard]

Hi Rob,

On Fri, Nov 22, 2019 at 08:47:46AM -0600, Rob Herring wrote:
> On Wed, Nov 20, 2019 at 1:00 PM Will Deacon  wrote:
> >
> > Commit 8e12257dead7 ("of: property: Add device link support for iommus,
> > mboxes and io-channels") added device link support for IOMMU linkages
> > described using the "iommus" property. For PCI devices, this property
> > is not present and instead the "iommu-map" property is used on the host
> > bridge node to map the endpoint RequesterIDs to their corresponding
> > IOMMU instance.
> >
> > Add support for "iommu-map" to the device link supplier bindings so that
> > probing of PCI devices can be deferred until after the IOMMU is
> > available.
> >
> > Cc: Greg Kroah-Hartman 
> > Cc: Rob Herring 
> > Cc: Saravana Kannan 
> > Cc: Robin Murphy 
> > Signed-off-by: Will Deacon 
> > ---
> >
> > Applies against driver-core/driver-core-next.
> > Tested on AMD Seattle (arm64).
> 
> Guess that answers my question whether anyone uses Seattle with DT.
> Seattle uses the old SMMU binding, and there's not even an IOMMU
> associated with the PCI host. I raise this mainly because the dts
> files for Seattle either need some love or perhaps should be removed.

I'm using the new DT bindings on my Seattle, thanks to the firmware fairy
(Ard) visiting my flat with a dediprog. The patches I've posted to enable
modular builds of the arm-smmu driver require that the old binding is
disabled [1].

> No issues with the patch itself though. I'll queue it after rc1.

Thanks, although I think Greg has already queued it [2] due to the
dependencies on other patches in his tree.

Will

[1] https://lore.kernel.org/lkml/20191121114918.2293-14-w...@kernel.org/
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/commit/?h=driver-core-next&id=e149573b2f84d0517648dafc0db625afa681ed54


Re: [PATCH] of: property: Add device link support for "iommu-map"

2019-11-22 Thread Rob Herring
On Wed, Nov 20, 2019 at 1:00 PM Will Deacon  wrote:
>
> Commit 8e12257dead7 ("of: property: Add device link support for iommus,
> mboxes and io-channels") added device link support for IOMMU linkages
> described using the "iommus" property. For PCI devices, this property
> is not present and instead the "iommu-map" property is used on the host
> bridge node to map the endpoint RequesterIDs to their corresponding
> IOMMU instance.
>
> Add support for "iommu-map" to the device link supplier bindings so that
> probing of PCI devices can be deferred until after the IOMMU is
> available.
>
> Cc: Greg Kroah-Hartman 
> Cc: Rob Herring 
> Cc: Saravana Kannan 
> Cc: Robin Murphy 
> Signed-off-by: Will Deacon 
> ---
>
> Applies against driver-core/driver-core-next.
> Tested on AMD Seattle (arm64).

Guess that answers my question whether anyone uses Seattle with DT.
Seattle uses the old SMMU binding, and there's not even an IOMMU
associated with the PCI host. I raise this mainly because the dts
files for Seattle either need some love or perhaps should be removed.

No issues with the patch itself though. I'll queue it after rc1.

Rob


Re: [RFC 00/13] virtio-iommu on non-devicetree platforms

2019-11-22 Thread Michael S. Tsirkin
On Fri, Nov 22, 2019 at 11:49:47AM +0100, Jean-Philippe Brucker wrote:
> I'm seeking feedback on multi-platform support for virtio-iommu. At the
> moment only devicetree (DT) is supported and we don't have a pleasant
> solution for other platforms. Once we figure out the topology
> description, x86 support is trivial.
> 
> Since the IOMMU manages memory accesses from other devices, the guest
> kernel needs to initialize the IOMMU before endpoints start issuing DMA.
> It's a solved problem: firmware or hypervisor describes through DT or
> ACPI tables the device dependencies, and probe of endpoints is deferred
> until the IOMMU is probed. But:
> 
> (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and IORT
> for Arm). From my point of view IORT is easier to extend, since we
> just need to introduce a new node type. There are no dependencies to
> Arm in the Linux IORT driver, so it works well with CONFIG_X86.
> 
> However, there are concerns about other OS vendors feeling obligated
> to implement this new node, so Arm proposed introducing another ACPI
> table, that can wrap any of DMAR, IVRS and IORT to extend it with
> new virtual nodes. A draft of this VIOT table specification is
> available at http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf
> 
> I'm afraid this could increase fragmentation as guests would need to
> implement or modify their support for all of DMAR, IVRS and IORT. If
> we end up doing VIOT, I suggest limiting it to IORT.
> 
> (2) In addition, there are some concerns about having virtio depend on
> ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86
> [1])

power?

> don't currently implement those methods.
> 
> It was suggested to embed the topology description into the device.
> It can work, as demonstrated at the end of this RFC, with the
> following limitations:
> 
> - The topology description must be read before any endpoint managed
>   by the IOMMU is probed, and even before the virtio module is
>   loaded. This RFC uses a PCI quirk to manually parse the virtio
>   configuration. It assumes that all endpoints managed by the IOMMU
>   are under this same PCI host.
> 
> - I don't have a solution for the virtio-mmio transport at the
>   moment, because I haven't had time to modify a host to test it. I
>   think it could either use a notifier on the platform bus, or
>   better, a new 'iommu' command-line argument to the virtio-mmio
>   driver.

A notifier seems easier for users. What are the disadvantages of
that?

>   So the current prototype doesn't work for firecracker and
>   microvm, which rely on virtio-mmio.
> 
> - For Arm, if the platform has an ITS, the hypervisor needs IORT or
>   DT to describe it anyway. More generally, not using either ACPI or
>   DT might prevent from supporting other features as well. I suspect
>   the above users will have to implement a standard method sooner or
>   later.
> 
> - Even when reusing as much existing code as possible, guest support
>   is still going to be around a few hundred lines since we can't
>   rely on the normal virtio infrastructure to be loaded at that
>   point. As you can see below, the diffstat for the incomplete
>   topology implementation is already bigger than the exhaustive IORT
>   support, even when jumping through the VIOT hoop.
> 
> So it's a lightweight solution for very specific use-cases, and we
> should still support ACPI for the general case. Multi-platform
> guests such as Linux will then need to support three topology
> descriptions instead of two.
> 
> In this RFC I present both solutions, but I'd rather not keep all of it.
> Please see the individual patches for details:
> 
> (1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT
> driver and patches 2, 11 add the VIOT glue.
> 
> (2) Patch 12 adds the built-in topology description to the virtio-iommu
> specification. Patch 13 is a partial implementation for the Linux
> virtio-iommu driver. It only supports PCI, not platform devices.
> 
> You can find Linux and QEMU code on my virtio-iommu/devel branches at
> http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu
> 
> 
> I split the diffstat since there are two independent features. The first
> one is for patches 1-11, and the second one for patch 13.
> 
> Jean-Philippe Brucker (11):
>   ACPI/IORT: Move IORT to the ACPI folder
>   ACPI: Add VIOT definitions
>   ACPI/IORT: Allow registration of external tables
>   ACPI/IORT: Add node categories
>   ACPI/IORT: Support VIOT virtio-mmio node
>   ACPI/IORT: Support VIOT virtio-pci node
>   ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
>   ACPI/IORT: Add callback to update a device's fwnode
>   iommu/virtio: Create fwnode if necessary
>   iommu/virtio: Update IORT fwnode
>   ACPI: Add VIOT table

Re: [RFC 13/13] iommu/virtio: Add topology description to

2019-11-22 Thread Michael S. Tsirkin
On Fri, Nov 22, 2019 at 11:50:00AM +0100, Jean-Philippe Brucker wrote:
> Some hypervisors don't implement either device-tree or ACPI, but still
> need a method to describe the IOMMU topology. Read the virtio-iommu
> config early and parse the topology description. Hook into the
> dma_setup() callbacks to initialize the IOMMU before probing endpoints.
> 
> If the virtio-iommu uses the virtio-pci transport, this will only work
> if the PCI root complex is the first device probed. We don't currently
> support virtio-mmio.
> 
> Initially I tried to generate a fake IORT table and feed it to the IORT
> driver, in order to avoid rewriting the whole DMA code, but it wouldn't
> work with platform endpoints, which are references to items in the ACPI
> table on IORT.
> 
> Signed-off-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 

Overall this looks good to me. The only point is that
I think the way the interface is designed makes writing
the driver a bit too difficult. Idea: if instead we just
have a length field and then an array of records
(preferably unions so we don't need to work hard),
we can shadow that into memory, then iterate over
the unions.

Maybe add a uniform record length + number of records field.
Then just skip types you do not know how to handle.
This will also help make sure it's within bounds.

What do you think?
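A sketch of the record-array layout suggested here: a header carrying a uniform record size and count, followed by fixed-size records the parser can shadow once, bounds-check, and iterate, skipping types it does not know. All struct and field names are hypothetical, not part of the virtio-iommu spec:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct topo_header {
	uint16_t rec_size;	/* uniform size of each record */
	uint16_t rec_count;	/* number of records that follow */
};

struct topo_record {
	uint8_t  type;		/* e.g. 1 = PCI range, 2 = MMIO endpoint */
	uint8_t  reserved[3];
	uint32_t payload;	/* a union of per-type fields in a real layout */
};

/* Count the records whose type the parser knows, skipping the rest.
 * rec_size from the header is honoured even if it is larger than the
 * struct we know about, which is what makes the format
 * forward-compatible. */
static int parse_known(const uint8_t *buf, size_t len)
{
	struct topo_header h;
	int known = 0;

	if (len < sizeof(h))
		return -1;
	memcpy(&h, buf, sizeof(h));
	if (h.rec_size < sizeof(struct topo_record) ||
	    len < sizeof(h) + (size_t)h.rec_size * h.rec_count)
		return -1;	/* bounds check against the shadowed length */

	for (uint16_t i = 0; i < h.rec_count; i++) {
		struct topo_record r;

		memcpy(&r, buf + sizeof(h) + (size_t)i * h.rec_size, sizeof(r));
		if (r.type == 1 || r.type == 2)	/* skip unknown types */
			known++;
	}
	return known;
}
```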


You will need to do something to address the TODO I think.

> ---
> Note that we only call virt_dma_configure() if the host didn't provide
> either DT or ACPI method. If you want to test this with QEMU, you'll
> need to manually disable the acpi_dma_configure() part in pci-driver.c
> ---
>  drivers/base/platform.c   |   3 +
>  drivers/iommu/Kconfig |   9 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/virtio-iommu-topology.c | 410 ++
>  drivers/iommu/virtio-iommu.c  |   3 +
>  drivers/pci/pci-driver.c  |   3 +
>  include/linux/virtio_iommu.h  |  18 ++
>  include/uapi/linux/virtio_iommu.h |  26 ++
>  8 files changed, 473 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu-topology.c
>  create mode 100644 include/linux/virtio_iommu.h
> 
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index b230beb6ccb4..70b12c8ef2fb 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "base.h"
>  #include "power/power.h"
> @@ -1257,6 +1258,8 @@ int platform_dma_configure(struct device *dev)
>   } else if (has_acpi_companion(dev)) {
>   attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
>   ret = acpi_dma_configure(dev, attr);
> + } else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) {
> + ret = virt_dma_configure(dev);
>   }
>  
>   return ret;
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index e6eb4f238d1a..d02c0d36019d 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -486,4 +486,13 @@ config VIRTIO_IOMMU
>  
> Say Y here if you intend to run this kernel as a guest.
>  
> +config VIRTIO_IOMMU_TOPOLOGY
> + bool "Topology properties for the virtio-iommu"
> + depends on VIRTIO_IOMMU
> + help
> +   Enable early probing of the virtio-iommu device, to detect the
> +   topology description.
> +
> +   Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 4f405f926e73..6b51c4186ebc 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -35,3 +35,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o
> diff --git a/drivers/iommu/virtio-iommu-topology.c 
> b/drivers/iommu/virtio-iommu-topology.c
> new file mode 100644
> index ..ec22510ace3d
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu-topology.c
> @@ -0,0 +1,410 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct viommu_cap_config {
> + u8 pos; /* PCI capability position */
> + u8 bar;
> + u32 length; /* structure size */
> + u32 offset; /* structure offset within the bar */
> +};
> +
> +struct viommu_spec {
> + struct device   *dev; /* transport device */
> + struct fwnode_handle*fwnode;
> + struct iommu_ops*ops;
> + struct list_headtopology;
> + struct list_headlist;
> +};
> +
> +struct viommu_topology {
> + union {
> + struct virtio_iommu_topo_head head;
> + struct virtio_iommu_topo_pci_range pci;
> + st

[RFC 05/13] ACPI/IORT: Support VIOT virtio-mmio node

2019-11-22 Thread Jean-Philippe Brucker
Add a new type of node to the IORT driver, that describes a virtio-iommu
device based on the virtio-mmio transport. The node is only available
when the IORT is a sub-table of the VIOT.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/iort.c | 66 ++---
 1 file changed, 62 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index 1d43fbc0001f..adc5953fffa5 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -43,7 +43,8 @@ static bool iort_type_matches(u8 type, enum 
iort_node_category category)
switch (category) {
case IORT_IOMMU_TYPE:
return type == ACPI_IORT_NODE_SMMU ||
-  type == ACPI_IORT_NODE_SMMU_V3;
+  type == ACPI_IORT_NODE_SMMU_V3 ||
+  type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU;
case IORT_MSI_TYPE:
return type == ACPI_IORT_NODE_ITS_GROUP;
default:
@@ -868,8 +869,10 @@ static inline bool iort_iommu_driver_enabled(u8 type)
return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
case ACPI_IORT_NODE_SMMU:
return IS_BUILTIN(CONFIG_ARM_SMMU);
+   case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+   return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
default:
-   pr_warn("IORT node type %u does not describe an SMMU\n", type);
+   pr_warn("IORT node type %u does not describe an IOMMU\n", type);
return false;
}
 }
@@ -1408,6 +1411,46 @@ static int __init arm_smmu_v3_pmcg_add_platdata(struct 
platform_device *pdev)
return platform_device_add_data(pdev, &model, sizeof(model));
 }
 
+static int __init viommu_mmio_count_resources(struct acpi_iort_node *node)
+{
+   /* Mem + IRQ */
+   return 2;
+}
+
+static void __init viommu_mmio_init_resources(struct resource *res,
+  struct acpi_iort_node *node)
+{
+   int hw_irq, trigger;
+   struct acpi_viot_iort_virtio_mmio_iommu *viommu;
+
+   viommu = (struct acpi_viot_iort_virtio_mmio_iommu *)node->node_data;
+
+   res[0].start = viommu->base_address;
+   res[0].end = viommu->base_address + viommu->span - 1;
+   res[0].flags = IORESOURCE_MEM;
+
+   hw_irq = IORT_IRQ_MASK(viommu->interrupt);
+   trigger = IORT_IRQ_TRIGGER_MASK(viommu->interrupt);
+   acpi_iort_register_irq(hw_irq, "viommu", trigger, res + 1);
+}
+
+static void __init viommu_mmio_dma_configure(struct device *dev,
+ struct acpi_iort_node *node)
+{
+   enum dev_dma_attr attr;
+   struct acpi_viot_iort_virtio_mmio_iommu *viommu;
+
+   viommu = (struct acpi_viot_iort_virtio_mmio_iommu *)node->node_data;
+
+   attr = (viommu->flags & 
ACPI_VIOT_IORT_VIRTIO_MMIO_IOMMU_CACHE_COHERENT) ?
+   DEV_DMA_COHERENT : DEV_DMA_NON_COHERENT;
+
+   dev->dma_mask = &dev->coherent_dma_mask;
+
+   /* Configure DMA for the page table walker */
+   acpi_dma_configure(dev, attr);
+}
+
 struct iort_dev_config {
const char *name;
int (*dev_init)(struct acpi_iort_node *node);
@@ -1443,6 +1486,14 @@ static const struct iort_dev_config 
iort_arm_smmu_v3_pmcg_cfg __initconst = {
.dev_add_platdata = arm_smmu_v3_pmcg_add_platdata,
 };
 
+static const struct iort_dev_config iort_viommu_mmio_cfg __initconst = {
+   /* Probe with the generic virtio-mmio driver */
+   .name = "virtio-mmio",
+   .dev_dma_configure = viommu_mmio_dma_configure,
+   .dev_count_resources = viommu_mmio_count_resources,
+   .dev_init_resources = viommu_mmio_init_resources,
+};
+
 static __init const struct iort_dev_config *iort_get_dev_cfg(
struct acpi_iort_node *node)
 {
@@ -1453,9 +1504,16 @@ static __init const struct iort_dev_config 
*iort_get_dev_cfg(
return &iort_arm_smmu_cfg;
case ACPI_IORT_NODE_PMCG:
return &iort_arm_smmu_v3_pmcg_cfg;
-   default:
-   return NULL;
}
+
+   if (iort_table_source == IORT_SOURCE_VIOT) {
+   switch (node->type) {
+   case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+   return &iort_viommu_mmio_cfg;
+   }
+   }
+
+   return NULL;
 }
 
 /**
-- 
2.24.0
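The IORT_IRQ_MASK()/IORT_IRQ_TRIGGER_MASK() pair used in viommu_mmio_init_resources() above decodes a single 64-bit field that packs both the interrupt number and its trigger flags, as the macros in drivers/acpi/iort.c do; a standalone sketch of that decode:

```c
#include <stdint.h>

/* Low 32 bits: the hardware interrupt number (GSIV).
 * High 32 bits: the trigger-mode flags. */
#define IORT_IRQ_MASK(irq)		((uint32_t)((irq) & 0xffffffffULL))
#define IORT_IRQ_TRIGGER_MASK(irq)	((uint32_t)(((irq) >> 32) & 0xffffffffULL))
```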



[RFC 07/13] ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode

2019-11-22 Thread Jean-Philippe Brucker
When the IOMMU is PCI-based, IORT doesn't know the fwnode until the
driver has had a chance to register it. In addition to deferring the
probe until the IOMMU ops are set, also defer the probe until the fwspec
is available.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/iort.c | 54 ++---
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index b517aa4e83ba..f08f72d8af78 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -61,6 +61,22 @@ static bool iort_type_matches(u8 type, enum 
iort_node_category category)
}
 }
 
+static inline bool iort_iommu_driver_enabled(u8 type)
+{
+   switch (type) {
+   case ACPI_IORT_NODE_SMMU_V3:
+   return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
+   case ACPI_IORT_NODE_SMMU:
+   return IS_BUILTIN(CONFIG_ARM_SMMU);
+   case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+   case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU:
+   return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
+   default:
+   pr_warn("IORT node type %u does not describe an IOMMU\n", type);
+   return false;
+   }
+}
+
 /**
  * iort_set_fwnode() - Create iort_fwnode and use it to register
  *iommu data in the iort_fwnode_list
@@ -102,9 +118,9 @@ static inline int iort_set_fwnode(struct acpi_iort_node 
*iort_node,
  *
  * Returns: fwnode_handle pointer on success, NULL on failure
  */
-static inline struct fwnode_handle *iort_get_fwnode(
-   struct acpi_iort_node *node)
+static inline struct fwnode_handle *iort_get_fwnode(struct acpi_iort_node 
*node)
 {
+   int err = -ENODEV;
struct iort_fwnode *curr;
struct fwnode_handle *fwnode = NULL;
 
@@ -112,12 +128,20 @@ static inline struct fwnode_handle *iort_get_fwnode(
list_for_each_entry(curr, &iort_fwnode_list, list) {
if (curr->iort_node == node) {
fwnode = curr->fwnode;
+   if (!fwnode && curr->pci_devid) {
+   /*
+* Postpone probe until virtio-iommu has
+* registered its fwnode.
+*/
+   err = iort_iommu_driver_enabled(node->type) ?
+   -EPROBE_DEFER : -ENODEV;
+   }
break;
}
}
spin_unlock(&iort_fwnode_lock);
 
-   return fwnode;
+   return fwnode ?: ERR_PTR(err);
 }
 
 /**
@@ -874,22 +898,6 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, 
struct list_head *head)
return (resv == its->its_count) ? resv : -ENODEV;
 }
 
-static inline bool iort_iommu_driver_enabled(u8 type)
-{
-   switch (type) {
-   case ACPI_IORT_NODE_SMMU_V3:
-   return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
-   case ACPI_IORT_NODE_SMMU:
-   return IS_BUILTIN(CONFIG_ARM_SMMU);
-   case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
-   case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU:
-   return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
-   default:
-   pr_warn("IORT node type %u does not describe an IOMMU\n", type);
-   return false;
-   }
-}
-
 static int arm_smmu_iort_xlate(struct device *dev, u32 streamid,
   struct fwnode_handle *fwnode,
   const struct iommu_ops *ops)
@@ -920,8 +928,8 @@ static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node,
return -ENODEV;
 
iort_fwnode = iort_get_fwnode(node);
-   if (!iort_fwnode)
-   return -ENODEV;
+   if (IS_ERR(iort_fwnode))
+   return PTR_ERR(iort_fwnode);
 
/*
 * If the ops look-up fails, this means that either
@@ -1618,8 +1626,8 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
 
fwnode = iort_get_fwnode(node);
 
-   if (!fwnode) {
-   ret = -ENODEV;
+   if (IS_ERR(fwnode)) {
+   ret = PTR_ERR(fwnode);
goto dev_put;
}
 
-- 
2.24.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
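[Editor's note] The contract introduced by iort_get_fwnode() in the patch above (a valid fwnode pointer, or an ERR_PTR-encoded -EPROBE_DEFER/-ENODEV) can be sketched outside the kernel. This is a minimal userspace stand-in for the helpers from include/linux/err.h, for illustration only; the errno values match the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO    4095
#define ENODEV       19
#define EPROBE_DEFER 517

/* Userspace stand-ins for the kernel's ERR_PTR helpers (include/linux/err.h). */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Sketch of the lookup contract: a registered fwnode is returned directly;
 * an entry still waiting for virtio-iommu to register its fwnode yields
 * -EPROBE_DEFER; an unknown node yields -ENODEV. */
static void *lookup_fwnode(void *fwnode, int awaiting_viommu)
{
	int err = awaiting_viommu ? -EPROBE_DEFER : -ENODEV;

	return fwnode ? fwnode : ERR_PTR(err);
}
```

Callers then branch on IS_ERR()/PTR_ERR() instead of a bare NULL check, which is exactly the change made to iort_iommu_xlate() and iort_add_platform_device().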


[RFC 10/13] iommu/virtio: Update IORT fwnode

2019-11-22 Thread Jean-Philippe Brucker
When the virtio-iommu uses the PCI transport and the topology is
described with IORT, register the PCI fwnode with IORT.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 8efa368134c0..9847552faecc 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -7,6 +7,7 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -989,6 +990,8 @@ static int viommu_set_fwnode(struct viommu_dev *viommu)
set_primary_fwnode(dev, fwnode);
}
 
+   /* Tell IORT about a PCI device's fwnode */
+   iort_iommu_update_fwnode(dev, dev->fwnode);
iommu_device_set_fwnode(&viommu->iommu, dev->fwnode);
return 0;
 }
@@ -1000,6 +1003,8 @@ static void viommu_clear_fwnode(struct viommu_dev *viommu)
if (!dev->fwnode)
return;
 
+   iort_iommu_update_fwnode(dev, NULL);
+
if (is_software_node(dev->fwnode)) {
struct fwnode_handle *fwnode = dev->fwnode;
 
-- 
2.24.0



[RFC 04/13] ACPI/IORT: Add node categories

2019-11-22 Thread Jean-Philippe Brucker
The current node filtering won't work when introducing node types
greater than 63 (such as the virtio-iommu nodes). Add
node_type_matches() to filter nodes by category.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/iort.c | 34 --
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index 9c6c91e06f8f..1d43fbc0001f 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -18,10 +18,10 @@
 #include 
 #include 
 
-#define IORT_TYPE_MASK(type)   (1 << (type))
-#define IORT_MSI_TYPE  (1 << ACPI_IORT_NODE_ITS_GROUP)
-#define IORT_IOMMU_TYPE((1 << ACPI_IORT_NODE_SMMU) |   \
-   (1 << ACPI_IORT_NODE_SMMU_V3))
+enum iort_node_category {
+   IORT_MSI_TYPE,
+   IORT_IOMMU_TYPE,
+};
 
 struct iort_its_msi_chip {
struct list_headlist;
@@ -38,6 +38,20 @@ struct iort_fwnode {
 static LIST_HEAD(iort_fwnode_list);
 static DEFINE_SPINLOCK(iort_fwnode_lock);
 
+static bool iort_type_matches(u8 type, enum iort_node_category category)
+{
+   switch (category) {
+   case IORT_IOMMU_TYPE:
+   return type == ACPI_IORT_NODE_SMMU ||
+  type == ACPI_IORT_NODE_SMMU_V3;
+   case IORT_MSI_TYPE:
+   return type == ACPI_IORT_NODE_ITS_GROUP;
+   default:
+   WARN_ON(1);
+   return false;
+   }
+}
+
 /**
  * iort_set_fwnode() - Create iort_fwnode and use it to register
  *iommu data in the iort_fwnode_list
@@ -397,7 +411,7 @@ static int iort_get_id_mapping_index(struct acpi_iort_node *node)
 
 static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
   u32 id_in, u32 *id_out,
-  u8 type_mask)
+  enum iort_node_category category)
 {
u32 id = id_in;
 
@@ -406,7 +420,7 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
struct acpi_iort_id_mapping *map;
int i, index;
 
-   if (IORT_TYPE_MASK(node->type) & type_mask) {
+   if (iort_type_matches(node->type, category)) {
if (id_out)
*id_out = id;
return node;
@@ -458,8 +472,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
 }
 
 static struct acpi_iort_node *iort_node_map_platform_id(
-   struct acpi_iort_node *node, u32 *id_out, u8 type_mask,
-   int index)
+   struct acpi_iort_node *node, u32 *id_out,
+   enum iort_node_category category, int index)
 {
struct acpi_iort_node *parent;
u32 id;
@@ -475,8 +489,8 @@ static struct acpi_iort_node *iort_node_map_platform_id(
 * as NC (named component) -> SMMU -> ITS. If the type is matched,
 * return the initial dev id and its parent pointer directly.
 */
-   if (!(IORT_TYPE_MASK(parent->type) & type_mask))
-   parent = iort_node_map_id(parent, id, id_out, type_mask);
+   if (!iort_type_matches(parent->type, category))
+   parent = iort_node_map_id(parent, id, id_out, category);
else
if (id_out)
*id_out = id;
-- 
2.24.0



[RFC 09/13] iommu/virtio: Create fwnode if necessary

2019-11-22 Thread Jean-Philippe Brucker
The presence of a fwnode on a PCI device depends on the platform. QEMU
q35, for example, creates an ACPI description for each PCI slot, but
QEMU virt (aarch64) doesn't. Since the IOMMU subsystem relies heavily on
fwnode to discover the DMA topology, create a fwnode for the
virtio-iommu if necessary, using the software_node framework.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 56 
 1 file changed, 51 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 3ea9d7682999..8efa368134c0 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -966,6 +966,48 @@ static struct iommu_ops viommu_ops = {
.of_xlate   = viommu_of_xlate,
 };
 
+static int viommu_set_fwnode(struct viommu_dev *viommu)
+{
+   /*
+* viommu->dev is the virtio device, its parent is the associated
+* transport device.
+*/
+   struct device *dev = viommu->dev->parent;
+
+   /*
+* With device tree a fwnode is always present. With ACPI, on some
+* platforms a PCI device has a DSDT node describing the slot. On other
+* platforms, no fwnode is created and we have to do it ourselves.
+*/
+   if (!dev->fwnode) {
+   struct fwnode_handle *fwnode;
+
+   fwnode = fwnode_create_software_node(NULL, NULL);
+   if (IS_ERR(fwnode))
+   return PTR_ERR(fwnode);
+
+   set_primary_fwnode(dev, fwnode);
+   }
+
+   iommu_device_set_fwnode(&viommu->iommu, dev->fwnode);
+   return 0;
+}
+
+static void viommu_clear_fwnode(struct viommu_dev *viommu)
+{
+   struct device *dev = viommu->dev->parent;
+
+   if (!dev->fwnode)
+   return;
+
+   if (is_software_node(dev->fwnode)) {
+   struct fwnode_handle *fwnode = dev->fwnode;
+
+   set_primary_fwnode(dev, NULL);
+   fwnode_remove_software_node(fwnode);
+   }
+}
+
 static int viommu_init_vqs(struct viommu_dev *viommu)
 {
struct virtio_device *vdev = dev_to_virtio(viommu->dev);
@@ -1004,7 +1046,6 @@ static int viommu_fill_evtq(struct viommu_dev *viommu)
 
 static int viommu_probe(struct virtio_device *vdev)
 {
-   struct device *parent_dev = vdev->dev.parent;
struct viommu_dev *viommu = NULL;
struct device *dev = &vdev->dev;
u64 input_start = 0;
@@ -1084,9 +1125,11 @@ static int viommu_probe(struct virtio_device *vdev)
if (ret)
goto err_free_vqs;
 
-   iommu_device_set_ops(&viommu->iommu, &viommu_ops);
-   iommu_device_set_fwnode(&viommu->iommu, parent_dev->fwnode);
+   ret = viommu_set_fwnode(viommu);
+   if (ret)
+   goto err_sysfs_remove;
 
+   iommu_device_set_ops(&viommu->iommu, &viommu_ops);
iommu_device_register(&viommu->iommu);
 
 #ifdef CONFIG_PCI
@@ -1119,8 +1162,10 @@ static int viommu_probe(struct virtio_device *vdev)
return 0;
 
 err_unregister:
-   iommu_device_sysfs_remove(&viommu->iommu);
iommu_device_unregister(&viommu->iommu);
+   viommu_clear_fwnode(viommu);
+err_sysfs_remove:
+   iommu_device_sysfs_remove(&viommu->iommu);
 err_free_vqs:
vdev->config->del_vqs(vdev);
 
@@ -1131,8 +1176,9 @@ static void viommu_remove(struct virtio_device *vdev)
 {
struct viommu_dev *viommu = vdev->priv;
 
-   iommu_device_sysfs_remove(&viommu->iommu);
iommu_device_unregister(&viommu->iommu);
+   viommu_clear_fwnode(viommu);
+   iommu_device_sysfs_remove(&viommu->iommu);
 
/* Stop all virtqueues */
vdev->config->reset(vdev);
-- 
2.24.0



[RFC 01/13] ACPI/IORT: Move IORT to the ACPI folder

2019-11-22 Thread Jean-Philippe Brucker
IORT can be used (by QEMU) to describe a virtual topology containing an
architecture-agnostic paravirtualized device.

In order to build IORT for x86 systems, the driver has to be moved outside
of arm64/. Since there is nothing specific to arm64 in the driver, it
simply requires moving Makefile and Kconfig entries.

Signed-off-by: Jean-Philippe Brucker 
---
 MAINTAINERS | 9 +
 drivers/acpi/Kconfig| 3 +++
 drivers/acpi/Makefile   | 1 +
 drivers/acpi/arm64/Kconfig  | 3 ---
 drivers/acpi/arm64/Makefile | 1 -
 drivers/acpi/{arm64 => }/iort.c | 0
 6 files changed, 13 insertions(+), 4 deletions(-)
 rename drivers/acpi/{arm64 => }/iort.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index eb19fad370d7..9153d278f67e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -377,6 +377,15 @@ L: platform-driver-...@vger.kernel.org
 S: Maintained
 F: drivers/platform/x86/i2c-multi-instantiate.c
 
+ACPI IORT DRIVER
+M: Lorenzo Pieralisi 
+M: Hanjun Guo 
+M: Sudeep Holla 
+L: linux-a...@vger.kernel.org
+L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
+S: Maintained
+F: drivers/acpi/iort.c
+
 ACPI PMIC DRIVERS
 M: "Rafael J. Wysocki" 
 M: Len Brown 
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index ebe1e9e5fd81..548976c8b2b0 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -576,6 +576,9 @@ config TPS68470_PMIC_OPREGION
  region, which must be available before any of the devices
  using this, are probed.
 
+config ACPI_IORT
+   bool
+
 endif  # ACPI
 
 config X86_PM_TIMER
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 5d361e4e3405..9d1792165713 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -123,3 +123,4 @@ video-objs  += acpi_video.o video_detect.o
 obj-y  += dptf/
 
 obj-$(CONFIG_ARM64)+= arm64/
+obj-$(CONFIG_ACPI_IORT)+= iort.o
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index 6dba187f4f2e..d0902c85d46e 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -3,8 +3,5 @@
 # ACPI Configuration for ARM64
 #
 
-config ACPI_IORT
-   bool
-
 config ACPI_GTDT
bool
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 6ff50f4ed947..38771a816caf 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -1,3 +1,2 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_ACPI_IORT)+= iort.o
 obj-$(CONFIG_ACPI_GTDT)+= gtdt.o
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/iort.c
similarity index 100%
rename from drivers/acpi/arm64/iort.c
rename to drivers/acpi/iort.c
-- 
2.24.0



[RFC 00/13] virtio-iommu on non-devicetree platforms

2019-11-22 Thread Jean-Philippe Brucker
I'm seeking feedback on multi-platform support for virtio-iommu. At the
moment only devicetree (DT) is supported and we don't have a pleasant
solution for other platforms. Once we figure out the topology
description, x86 support is trivial.

Since the IOMMU manages memory accesses from other devices, the guest
kernel needs to initialize the IOMMU before endpoints start issuing DMA.
It's a solved problem: firmware or the hypervisor describes the device
dependencies through DT or ACPI tables, and probing of endpoints is
deferred until the IOMMU has been probed. But:

(1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and IORT
for Arm). From my point of view IORT is easier to extend, since we
just need to introduce a new node type. There are no dependencies on
Arm in the Linux IORT driver, so it works well with CONFIG_X86.

However, there are concerns about other OS vendors feeling obligated
to implement this new node, so Arm proposed introducing another ACPI
table that can wrap any of DMAR, IVRS and IORT to extend it with
new virtual nodes. A draft of this VIOT table specification is
available at http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf

I'm afraid this could increase fragmentation as guests would need to
implement or modify their support for all of DMAR, IVRS and IORT. If
we end up doing VIOT, I suggest limiting it to IORT.

(2) In addition, there are some concerns about having virtio depend on
ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86
[1]) don't currently implement those methods.

It was suggested to embed the topology description into the device.
It can work, as demonstrated at the end of this RFC, with the
following limitations:

- The topology description must be read before any endpoint managed
  by the IOMMU is probed, and even before the virtio module is
  loaded. This RFC uses a PCI quirk to manually parse the virtio
  configuration. It assumes that all endpoints managed by the IOMMU
  are under this same PCI host.

- I don't have a solution for the virtio-mmio transport at the
  moment, because I haven't had time to modify a host to test it. I
  think it could either use a notifier on the platform bus, or
  better, a new 'iommu' command-line argument to the virtio-mmio
  driver. So the current prototype doesn't work for firecracker and
  microvm, which rely on virtio-mmio.

- For Arm, if the platform has an ITS, the hypervisor needs IORT or
  DT to describe it anyway. More generally, not using either ACPI or
  DT might prevent from supporting other features as well. I suspect
  the above users will have to implement a standard method sooner or
  later.

- Even when reusing as much existing code as possible, guest support
  is still going to be around a few hundred lines since we can't
  rely on the normal virtio infrastructure to be loaded at that
  point. As you can see below, the diffstat for the incomplete
  topology implementation is already bigger than the exhaustive IORT
  support, even when jumping through the VIOT hoop.

So it's a lightweight solution for very specific use-cases, and we
should still support ACPI for the general case. Multi-platform
guests such as Linux will then need to support three topology
descriptions instead of two.

In this RFC I present both solutions, but I'd rather not keep all of it.
Please see the individual patches for details:

(1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT
driver and patches 2, 11 add the VIOT glue.

(2) Patch 12 adds the built-in topology description to the virtio-iommu
specification. Patch 13 is a partial implementation for the Linux
virtio-iommu driver. It only supports PCI, not platform devices.

You can find Linux and QEMU code on my virtio-iommu/devel branches at
http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu


I split the diffstat since there are two independent features. The first
one is for patches 1-11, and the second one for patch 13.

Jean-Philippe Brucker (11):
  ACPI/IORT: Move IORT to the ACPI folder
  ACPI: Add VIOT definitions
  ACPI/IORT: Allow registration of external tables
  ACPI/IORT: Add node categories
  ACPI/IORT: Support VIOT virtio-mmio node
  ACPI/IORT: Support VIOT virtio-pci node
  ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
  ACPI/IORT: Add callback to update a device's fwnode
  iommu/virtio: Create fwnode if necessary
  iommu/virtio: Update IORT fwnode
  ACPI: Add VIOT table

 MAINTAINERS |   9 +
 drivers/acpi/Kconfig|   7 +
 drivers/acpi/Makefile   |   2 +
 drivers/acpi/arm64/Kconfig  |   3 -
 drivers/acpi/arm64/Makefile |   1 -
 drivers/acpi/bus.c  |   2 +
 drivers/acpi/{arm64 => }/iort.c | 317 ++--
 drivers/acpi/tables.c   |   2 +-
 driv

[RFC 03/13] ACPI/IORT: Allow registration of external tables

2019-11-22 Thread Jean-Philippe Brucker
Add a function to register an IORT table from an external source.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/iort.c   | 22 --
 include/linux/acpi_iort.h | 10 ++
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index d62a9ea26fae..9c6c91e06f8f 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -144,6 +144,7 @@ typedef acpi_status (*iort_find_node_callback)
 
 /* Root pointer to the mapped IORT table */
 static struct acpi_table_header *iort_table;
+static enum iort_table_source iort_table_source;
 
 static LIST_HEAD(iort_msi_chip_list);
 static DEFINE_SPINLOCK(iort_msi_chip_lock);
@@ -1617,11 +1618,28 @@ static void __init iort_init_platform_devices(void)
}
 }
 
+void __init acpi_iort_register_table(struct acpi_table_header *table,
+enum iort_table_source source)
+{
+   /*
+* Firmware or hypervisor should know better than to give us two IORT
+* tables.
+*/
+   if (WARN_ON(iort_table))
+   return;
+
+   iort_table = table;
+   iort_table_source = source;
+
+   iort_init_platform_devices();
+}
+
 void __init acpi_iort_init(void)
 {
acpi_status status;
+   static struct acpi_table_header *table;
 
-   status = acpi_get_table(ACPI_SIG_IORT, 0, &iort_table);
+   status = acpi_get_table(ACPI_SIG_IORT, 0, &table);
if (ACPI_FAILURE(status)) {
if (status != AE_NOT_FOUND) {
const char *msg = acpi_format_exception(status);
@@ -1632,5 +1650,5 @@ void __init acpi_iort_init(void)
return;
}
 
-   iort_init_platform_devices();
+   acpi_iort_register_table(table, IORT_SOURCE_IORT);
 }
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index 8e7e2ec37f1b..f4db5fff07cf 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -11,6 +11,11 @@
 #include 
 #include 
 
+enum iort_table_source {
+   IORT_SOURCE_IORT,   /* The Real Thing */
+   IORT_SOURCE_VIOT,   /* Paravirtual extensions */
+};
+
 #define IORT_IRQ_MASK(irq) (irq & 0xULL)
 #define IORT_IRQ_TRIGGER_MASK(irq) ((irq >> 32) & 0xULL)
 
@@ -27,6 +32,8 @@ int iort_register_domain_token(int trans_id, phys_addr_t base,
 void iort_deregister_domain_token(int trans_id);
 struct fwnode_handle *iort_find_domain_token(int trans_id);
 #ifdef CONFIG_ACPI_IORT
+void acpi_iort_register_table(struct acpi_table_header *table,
+ enum iort_table_source source);
 void acpi_iort_init(void);
 u32 iort_msi_map_rid(struct device *dev, u32 req_id);
 struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id);
@@ -37,6 +44,9 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size);
 const struct iommu_ops *iort_iommu_configure(struct device *dev);
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
 #else
+static inline void acpi_iort_register_table(struct acpi_table_header *table,
+   enum iort_table_source source)
+{ }
 static inline void acpi_iort_init(void) { }
 static inline u32 iort_msi_map_rid(struct device *dev, u32 req_id)
 { return req_id; }
-- 
2.24.0



[RFC virtio 12/13] virtio-iommu: Add built-in topology description

2019-11-22 Thread Jean-Philippe Brucker
Add a lightweight method to describe the IOMMU topology in the config
space, guarded by a new feature bit. A list of capabilities in the
config space describes the devices managed by the IOMMU and their
endpoint IDs.

Signed-off-by: Jean-Philippe Brucker 
---
 virtio-iommu.tex | 88 
 1 file changed, 88 insertions(+)

diff --git a/virtio-iommu.tex b/virtio-iommu.tex
index 28c562b..2b29873 100644
--- a/virtio-iommu.tex
+++ b/virtio-iommu.tex
@@ -67,6 +67,9 @@ \subsection{Feature bits}\label{sec:Device Types / IOMMU 
Device / Feature bits}
 
 \item[VIRTIO_IOMMU_F_MMIO (5)]
   The VIRTIO_IOMMU_MAP_F_MMIO flag is available.
+
+\item[VIRTIO_IOMMU_F_TOPOLOGY (6)]
+  Topology description is available at \field{topo_offset}.
 \end{description}
 
 \drivernormative{\subsubsection}{Feature bits}{Device Types / IOMMU Device / Feature bits}
@@ -97,6 +100,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / IOMMU Device /
 le32 end;
   } domain_range;
   le32 probe_size;
+  le16 topo_offset;
 };
 \end{lstlisting}
 
@@ -141,6 +145,90 @@ \subsection{Device initialization}\label{sec:Device Types / IOMMU Device / Devic
 If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
 device SHOULD NOT let endpoints access the guest-physical address space.
 
+\subsubsection{Built-in topology description}\label{sec:Device Types / IOMMU Device / Device initialization / topology}
+
+The device manages memory accesses from endpoints, identified by endpoint
+IDs. The driver can discover which endpoint ID corresponds to an endpoint
+using several methods, depending on the platform. Platforms described
+with device tree use the \texttt{iommus} and \texttt{iommu-map} properties
+embedded into device nodes for this purpose. Platforms described with
+ACPI use a table such as the Virtual I/O Table. Platforms that do not
+support either device tree or ACPI may embed a minimalistic description
+in the device configuration space.
+
+An important disadvantage of describing the topology from within the
+device is the lack of initialization ordering information. Out-of-band
+descriptions such as device tree and ACPI let the operating system know
+about device dependencies so that it can initialize supplier devices
+(IOMMUs) before their consumers (endpoints). Platforms using the
+VIRTIO_IOMMU_F_TOPOLOGY feature have to communicate the device dependency
+in another way.
+
+If the VIRTIO_IOMMU_F_TOPOLOGY feature is negotiated, \field{topo_offset}
+is the offset between the beginning of the device-specific configuration
+space (virtio_iommu_config) and the first topology structure header. A
+topology structure defines the endpoint ID of one or more endpoints
+managed by the virtio-iommu device.
+
+\begin{lstlisting}
+struct virtio_iommu_topo_head {
+  le16 type;
+  le16 next;
+};
+\end{lstlisting}
+
+\field{next} is the offset between the beginning of the device-specific
+configuration space and the next topology structure header. When
+\field{next} is zero, this is the last structure.
+
+\field{type} describes the type of structure:
+\begin{description}
+  \item[VIRTIO_IOMMU_TOPO_PCI_RANGE (0)] struct virtio_iommu_topo_pci_range
+  \item[VIRTIO_IOMMU_TOPO_ENDPOINT (1)] struct virtio_iommu_topo_endpoint
+\end{description}
+
+\paragraph{PCI range}\label{sec:Device Types / IOMMU Device / Device initialization / topology / PCI range}
+
+\begin{lstlisting}
+struct virtio_iommu_topo_pci_range {
+  struct virtio_iommu_topo_head head;
+  le32 endpoint_start;
+  le16 hierarchy;
+  le16 requester_start;
+  le16 requester_end;
+  le16 reserved;
+};
+\end{lstlisting}
+
+The PCI range structure describes the endpoint IDs of a series of PCI
+devices.
+
+\begin{description}
+  \item[\field{hierarchy}] Identifier of the PCI hierarchy. Sometimes
+called PCI segment or domain number.
+  \item[\field{requester_start}] First requester ID in the range.
+  \item[\field{requester_end}] Last requester ID in the range.
+  \item[\field{endpoint_start}] First endpoint ID.
+\end{description}
+
+The correspondence between a PCI requester ID in the range
+[ requester_start; requester_end ] and its endpoint IDs is a linear
+transformation: endpoint_id = requester_id - requester_start +
+endpoint_start.
+
+\paragraph{Single endpoint}\label{sec:Device Types / IOMMU Device / Device initialization / topology / Single endpoint}
+
+\begin{lstlisting}
+struct virtio_iommu_topo_endpoint {
+  struct virtio_iommu_topo_head head;
+  le32 endpoint;
+  le64 address;
+};
+\end{lstlisting}
+
+\field{endpoint} is the ID of a single endpoint, identified by its first
+MMIO address in the physical address space.
+
 \subsection{Device operations}\label{sec:Device Types / IOMMU Device / Device operations}
 
 Driver send requests on the request virtqueue, notifies the device and
-- 
2.24.0
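[Editor's note] The linear requester-to-endpoint mapping defined above for struct virtio_iommu_topo_pci_range can be sketched in plain C. Field layout follows the structure in the patch (the head field is omitted for brevity); the range values in the test are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Follows struct virtio_iommu_topo_pci_range from the spec draft above,
 * with the header omitted. */
struct topo_pci_range {
	uint32_t endpoint_start;
	uint16_t hierarchy;
	uint16_t requester_start;
	uint16_t requester_end;
};

/* endpoint_id = requester_id - requester_start + endpoint_start, valid for
 * requester_start <= requester_id <= requester_end. */
static uint32_t endpoint_id(const struct topo_pci_range *r, uint16_t rid)
{
	assert(rid >= r->requester_start && rid <= r->requester_end);
	return (uint32_t)rid - r->requester_start + r->endpoint_start;
}
```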


[RFC 02/13] ACPI: Add VIOT definitions

2019-11-22 Thread Jean-Philippe Brucker
This is temporary, until the VIOT table is published and these
definitions added to ACPICA.

Signed-off-by: Jean-Philippe Brucker 
---
 include/acpi/actbl2.h | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index e45ced27f4c3..99c1d747e9d8 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -25,6 +25,7 @@
  * the wrong signature.
  */
 #define ACPI_SIG_IORT   "IORT" /* IO Remapping Table */
+#define ACPI_SIG_VIOT   "VIOT" /* Virtual I/O Table */
 #define ACPI_SIG_IVRS   "IVRS" /* I/O Virtualization Reporting Structure */
 #define ACPI_SIG_LPIT   "LPIT" /* Low Power Idle Table */
 #define ACPI_SIG_MADT   "APIC" /* Multiple APIC Description Table */
@@ -412,6 +413,36 @@ struct acpi_ivrs_memory {
u64 memory_length;
 };
 
+/***
+ *
+ * VIOT - Virtual I/O Table
+ *Version 1
+ *
+ **/
+
+struct acpi_table_viot {
+   struct acpi_table_header header;
+   u8 reserved[12];
+   struct acpi_table_header base_table;
+};
+
+#define ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU0x80
+#define ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU   0x81
+
+struct acpi_viot_iort_virtio_pci_iommu {
+   u32 devid;
+};
+
+struct acpi_viot_iort_virtio_mmio_iommu {
+   u64 base_address;
+   u64 span;
+   u64 flags;
+   u64 interrupt;
+};
+
+/* FIXME: rename this monstrosity. */
+#define ACPI_VIOT_IORT_VIRTIO_MMIO_IOMMU_CACHE_COHERENT (1<<0)
+
 
/***
  *
  * LPIT - Low Power Idle Table
-- 
2.24.0



[RFC 06/13] ACPI/IORT: Support VIOT virtio-pci node

2019-11-22 Thread Jean-Philippe Brucker
When virtio-iommu uses the PCI transport, IORT doesn't instantiate the
device and doesn't create a fwnode. They will be created later by the
PCI subsystem. Store the information needed to identify the IOMMU in
iort_fwnode_list.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/iort.c | 117 +++-
 1 file changed, 93 insertions(+), 24 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index adc5953fffa5..b517aa4e83ba 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -30,10 +30,17 @@ struct iort_its_msi_chip {
u32 translation_id;
 };
 
+struct iort_pci_devid {
+   u16 segment;
+   u8 bus;
+   u8 devfn;
+};
+
 struct iort_fwnode {
struct list_head list;
struct acpi_iort_node *iort_node;
struct fwnode_handle *fwnode;
+   struct iort_pci_devid *pci_devid;
 };
 static LIST_HEAD(iort_fwnode_list);
 static DEFINE_SPINLOCK(iort_fwnode_lock);
@@ -44,7 +51,8 @@ static bool iort_type_matches(u8 type, enum iort_node_category category)
case IORT_IOMMU_TYPE:
return type == ACPI_IORT_NODE_SMMU ||
   type == ACPI_IORT_NODE_SMMU_V3 ||
-  type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU;
+  type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU ||
+  type == ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU;
case IORT_MSI_TYPE:
return type == ACPI_IORT_NODE_ITS_GROUP;
default:
@@ -59,12 +67,14 @@ static bool iort_type_matches(u8 type, enum iort_node_category category)
  *
  * @node: IORT table node associated with the IOMMU
  * @fwnode: fwnode associated with the IORT node
+ * @pci_devid: pci device ID associated with the IORT node, may be NULL
  *
  * Returns: 0 on success
  *  <0 on failure
  */
 static inline int iort_set_fwnode(struct acpi_iort_node *iort_node,
- struct fwnode_handle *fwnode)
+ struct fwnode_handle *fwnode,
+ struct iort_pci_devid *pci_devid)
 {
struct iort_fwnode *np;
 
@@ -76,6 +86,7 @@ static inline int iort_set_fwnode(struct acpi_iort_node *iort_node,
INIT_LIST_HEAD(&np->list);
np->iort_node = iort_node;
np->fwnode = fwnode;
+   np->pci_devid = pci_devid;
 
spin_lock(&iort_fwnode_lock);
list_add_tail(&np->list, &iort_fwnode_list);
@@ -121,6 +132,7 @@ static inline void iort_delete_fwnode(struct acpi_iort_node *node)
spin_lock(&iort_fwnode_lock);
list_for_each_entry_safe(curr, tmp, &iort_fwnode_list, list) {
if (curr->iort_node == node) {
+   kfree(curr->pci_devid);
list_del(&curr->list);
kfree(curr);
break;
@@ -870,6 +882,7 @@ static inline bool iort_iommu_driver_enabled(u8 type)
case ACPI_IORT_NODE_SMMU:
return IS_BUILTIN(CONFIG_ARM_SMMU);
case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+   case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU:
return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
default:
pr_warn("IORT node type %u does not describe an IOMMU\n", type);
@@ -1451,6 +1464,28 @@ static void __init viommu_mmio_dma_configure(struct device *dev,
acpi_dma_configure(dev, attr);
 }
 
+static __init struct iort_pci_devid *
+viommu_pci_get_devid(struct acpi_iort_node *node)
+{
+   unsigned int val;
+   struct iort_pci_devid *devid;
+   struct acpi_viot_iort_virtio_pci_iommu *viommu;
+
+   viommu = (struct acpi_viot_iort_virtio_pci_iommu *)node->node_data;
+
+   val = le32_to_cpu(viommu->devid);
+
+   devid = kzalloc(sizeof(*devid), GFP_KERNEL);
+   if (!devid)
+   return ERR_PTR(-ENOMEM);
+
+   devid->segment = val >> 16;
+   devid->bus = PCI_BUS_NUM(val);
+   devid->devfn = val & 0xff;
+
+   return devid;
+}
+
 struct iort_dev_config {
const char *name;
int (*dev_init)(struct acpi_iort_node *node);
@@ -1462,6 +1497,7 @@ struct iort_dev_config {
int (*dev_set_proximity)(struct device *dev,
struct acpi_iort_node *node);
int (*dev_add_platdata)(struct platform_device *pdev);
+   struct iort_pci_devid *(*dev_get_pci_devid)(struct acpi_iort_node *node);
 };
 
 static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = {
@@ -1494,6 +1530,10 @@ static const struct iort_dev_config iort_viommu_mmio_cfg __initconst = {
.dev_init_resources = viommu_mmio_init_resources,
 };
 
+static const struct iort_dev_config iort_viommu_pci_cfg __initconst = {
+   .dev_get_pci_devid = viommu_pci_get_devid,
+};
+
 static __init const struct iort_dev_config *iort_get_dev_cfg(
struct acpi_iort_node *node)
 {
@@ -1510,6 +1550,8 @@ static __init const struct iort_de
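[Editor's note] The devid unpacking done by viommu_pci_get_devid() above can be sketched in userspace C: segment in bits 31:16, bus in bits 15:8 (what the kernel's PCI_BUS_NUM() macro extracts), devfn in bits 7:0. The test value is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct iort_pci_devid {
	uint16_t segment;
	uint8_t  bus;
	uint8_t  devfn;
};

/* Mirror of the decode in viommu_pci_get_devid(): segment in bits 31:16,
 * bus in bits 15:8, devfn in bits 7:0 of the 32-bit VIOT device ID. */
static struct iort_pci_devid decode_devid(uint32_t val)
{
	struct iort_pci_devid id = {
		.segment = (uint16_t)(val >> 16),
		.bus     = (uint8_t)((val >> 8) & 0xff),
		.devfn   = (uint8_t)(val & 0xff),
	};
	return id;
}
```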

[RFC 13/13] iommu/virtio: Add topology description to

2019-11-22 Thread Jean-Philippe Brucker
Some hypervisors don't implement either device-tree or ACPI, but still
need a method to describe the IOMMU topology. Read the virtio-iommu
config early and parse the topology description. Hook into the
dma_setup() callbacks to initialize the IOMMU before probing endpoints.

If the virtio-iommu uses the virtio-pci transport, this will only work
if the PCI root complex is the first device probed. We don't currently
support virtio-mmio.

Initially I tried to generate a fake IORT table and feed it to the IORT
driver, in order to avoid rewriting the whole DMA code, but it wouldn't
work with platform endpoints, which are references to items in the ACPI
table on IORT.

Signed-off-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 

---
Note that we only call virt_dma_configure() if the host didn't provide
either DT or ACPI method. If you want to test this with QEMU, you'll
need to manually disable the acpi_dma_configure() part in pci-driver.c
---
 drivers/base/platform.c   |   3 +
 drivers/iommu/Kconfig |   9 +
 drivers/iommu/Makefile|   1 +
 drivers/iommu/virtio-iommu-topology.c | 410 ++
 drivers/iommu/virtio-iommu.c  |   3 +
 drivers/pci/pci-driver.c  |   3 +
 include/linux/virtio_iommu.h  |  18 ++
 include/uapi/linux/virtio_iommu.h |  26 ++
 8 files changed, 473 insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu-topology.c
 create mode 100644 include/linux/virtio_iommu.h

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index b230beb6ccb4..70b12c8ef2fb 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "base.h"
 #include "power/power.h"
@@ -1257,6 +1258,8 @@ int platform_dma_configure(struct device *dev)
} else if (has_acpi_companion(dev)) {
attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
ret = acpi_dma_configure(dev, attr);
+   } else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) {
+   ret = virt_dma_configure(dev);
}
 
return ret;
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e6eb4f238d1a..d02c0d36019d 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -486,4 +486,13 @@ config VIRTIO_IOMMU
 
  Say Y here if you intend to run this kernel as a guest.
 
+config VIRTIO_IOMMU_TOPOLOGY
+   bool "Topology properties for the virtio-iommu"
+   depends on VIRTIO_IOMMU
+   help
+ Enable early probing of the virtio-iommu device, to detect the
+ topology description.
+
+ Say Y here if you intend to run this kernel as a guest.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 4f405f926e73..6b51c4186ebc 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -35,3 +35,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
 obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o
diff --git a/drivers/iommu/virtio-iommu-topology.c b/drivers/iommu/virtio-iommu-topology.c
new file mode 100644
index ..ec22510ace3d
--- /dev/null
+++ b/drivers/iommu/virtio-iommu-topology.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct viommu_cap_config {
+   u8 pos; /* PCI capability position */
+   u8 bar;
+   u32 length; /* structure size */
+   u32 offset; /* structure offset within the bar */
+};
+
+struct viommu_spec {
+   struct device   *dev; /* transport device */
+   struct fwnode_handle*fwnode;
+   struct iommu_ops*ops;
+   struct list_headtopology;
+   struct list_headlist;
+};
+
+struct viommu_topology {
+   union {
+   struct virtio_iommu_topo_head head;
+   struct virtio_iommu_topo_pci_range pci;
+   struct virtio_iommu_topo_endpoint ep;
+   };
+   /* Index into viommu_spec->topology */
+   struct list_head list;
+};
+
+static LIST_HEAD(viommus);
+static DEFINE_MUTEX(viommus_lock);
+
+#define VPCI_FIELD(field) offsetof(struct virtio_pci_cap, field)
+
+static inline int viommu_find_capability(struct pci_dev *dev, u8 cfg_type,
+struct viommu_cap_config *cap)
+{
+   int pos;
+   u8 bar;
+
+   for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
+pos > 0;
+pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
+   u8 type;
+
+   pci_read_config_byte(dev, pos + VPCI_FIELD(cfg_type), &type);
+   if (type != cfg_type)
+   continue;
+
+   pci_read_config_byte(dev, pos 

[RFC 08/13] ACPI/IORT: Add callback to update a device's fwnode

2019-11-22 Thread Jean-Philippe Brucker
For a PCI-based IOMMU, IORT isn't in charge of allocating a fwnode. Let
the IOMMU driver update the fwnode associated to an IORT node when
available.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/iort.c   | 38 ++
 include/linux/acpi_iort.h |  4 
 2 files changed, 42 insertions(+)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index f08f72d8af78..8263ab275b2b 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -1038,11 +1038,49 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev)
 
return ops;
 }
+
+/**
+ * iort_iommu_update_fwnode - update fwnode of a PCI IOMMU
+ * @dev: the IOMMU device
+ * @fwnode: the fwnode, or NULL to remove an existing fwnode
+ *
+ * A PCI device isn't instantiated by the IORT driver. The IOMMU driver sets or
+ * removes its fwnode using this function.
+ */
+void iort_iommu_update_fwnode(struct device *dev, struct fwnode_handle *fwnode)
+{
+   struct pci_dev *pdev;
+   struct iort_fwnode *curr;
+   struct iort_pci_devid *devid;
+
+   if (!dev_is_pci(dev))
+   return;
+
+   pdev = to_pci_dev(dev);
+
+   spin_lock(&iort_fwnode_lock);
+   list_for_each_entry(curr, &iort_fwnode_list, list) {
+   devid = curr->pci_devid;
+   if (devid &&
+   pci_domain_nr(pdev->bus) == devid->segment &&
+   pdev->bus->number == devid->bus &&
+   pdev->devfn == devid->devfn) {
+   WARN_ON(fwnode && curr->fwnode);
+   curr->fwnode = fwnode;
+   break;
+   }
+   }
+   spin_unlock(&iort_fwnode_lock);
+}
+EXPORT_SYMBOL_GPL(iort_iommu_update_fwnode);
 #else
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 { return 0; }
 const struct iommu_ops *iort_iommu_configure(struct device *dev)
 { return NULL; }
+void iort_iommu_update_fwnode(struct device *dev,
+			      struct fwnode_handle *fwnode)
+{ }
 #endif
 
 static int nc_dma_get_range(struct device *dev, u64 *size)
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index f4db5fff07cf..840635e40d9d 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -43,6 +43,7 @@ int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size);
 const struct iommu_ops *iort_iommu_configure(struct device *dev);
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
+void iort_iommu_update_fwnode(struct device *dev, struct fwnode_handle *fwnode);
 #else
 static void acpi_iort_register_table(struct acpi_table_header *table,
 enum iort_table_source source)
@@ -63,6 +64,9 @@ static inline const struct iommu_ops *iort_iommu_configure(
 static inline
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 { return 0; }
+static inline void iort_iommu_update_fwnode(struct device *dev,
+					    struct fwnode_handle *fwnode)
+{ }
 #endif
 
 #endif /* __ACPI_IORT_H__ */
-- 
2.24.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC 11/13] ACPI: Add VIOT table

2019-11-22 Thread Jean-Philippe Brucker
Add support for a new ACPI table that embeds other tables describing a
platform's IOMMU topology. Currently the only supported base table is
IORT. The VIOT contains an IORT with additional node types, that
describe a virtio-iommu.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/Kconfig  |  4 
 drivers/acpi/Makefile |  1 +
 drivers/acpi/bus.c|  2 ++
 drivers/acpi/tables.c |  2 +-
 drivers/acpi/viot.c   | 44 +++
 drivers/iommu/Kconfig |  1 +
 include/linux/acpi_viot.h | 20 ++
 7 files changed, 73 insertions(+), 1 deletion(-)
 create mode 100644 drivers/acpi/viot.c
 create mode 100644 include/linux/acpi_viot.h

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 548976c8b2b0..513a5e4d3526 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -579,6 +579,10 @@ config TPS68470_PMIC_OPREGION
 config ACPI_IORT
bool
 
+config ACPI_VIOT
+   bool
+   select ACPI_IORT
+
 endif  # ACPI
 
 config X86_PM_TIMER
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 9d1792165713..6abdc6cc32c7 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -124,3 +124,4 @@ obj-y   += dptf/
 
 obj-$(CONFIG_ARM64)+= arm64/
 obj-$(CONFIG_ACPI_IORT)+= iort.o
+obj-$(CONFIG_ACPI_VIOT)+= viot.o
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 48bc96d45bab..6f364e0c9240 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -25,6 +25,7 @@
 #include 
 #endif
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1246,6 +1247,7 @@ static int __init acpi_init(void)
 
pci_mmcfg_late_init();
acpi_iort_init();
+   acpi_viot_init();
acpi_scan_init();
acpi_ec_init();
acpi_debugfs_init();
diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
index 180ac4329763..9662ea5e1064 100644
--- a/drivers/acpi/tables.c
+++ b/drivers/acpi/tables.c
@@ -501,7 +501,7 @@ static const char * const table_sigs[] = {
ACPI_SIG_WDDT, ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT,
ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT,
ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT,
-   NULL };
+   ACPI_SIG_VIOT, NULL };
 
 #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header)
 
diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
new file mode 100644
index ..ab9a6e43ad9b
--- /dev/null
+++ b/drivers/acpi/viot.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2019 Linaro
+ *
+ * Virtual IOMMU table
+ */
+#define pr_fmt(fmt)	"ACPI: VIOT: " fmt
+
+#include 
+#include 
+#include 
+
+int __init acpi_viot_init(void)
+{
+   struct acpi_table_viot *viot;
+   struct acpi_table_header *acpi_header;
+   acpi_status status;
+
+   status = acpi_get_table(ACPI_SIG_VIOT, 0, &acpi_header);
+   if (ACPI_FAILURE(status)) {
+   if (status != AE_NOT_FOUND) {
+   const char *msg = acpi_format_exception(status);
+
+   pr_err("Failed to get table, %s\n", msg);
+   return -EINVAL;
+   }
+
+   return 0;
+   }
+
+   if (acpi_header->length < sizeof(*viot)) {
+   pr_err("VIOT table overflow, bad table!\n");
+   return -EINVAL;
+   }
+
+   viot = (struct acpi_table_viot *)acpi_header;
+   if (ACPI_COMPARE_NAMESEG(viot->base_table.signature, ACPI_SIG_IORT)) {
+   acpi_iort_register_table(&viot->base_table, IORT_SOURCE_VIOT);
+   return 0;
+   }
+
+   pr_err("Unknown base table header\n");
+   return -EINVAL;
+}
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e3842eabcfdd..e6eb4f238d1a 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -480,6 +480,7 @@ config VIRTIO_IOMMU
depends on ARM64
select IOMMU_API
select INTERVAL_TREE
+   select ACPI_VIOT if ACPI
help
  Para-virtualised IOMMU driver with virtio.
 
diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
new file mode 100644
index ..6c282d5eb793
--- /dev/null
+++ b/include/linux/acpi_viot.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Linaro
+ */
+
+#ifndef __ACPI_VIOT_H__
+#define __ACPI_VIOT_H__
+
+#ifdef CONFIG_ACPI_VIOT
+
+int acpi_viot_init(void);
+
+#else /* !CONFIG_ACPI_VIOT */
+
+static inline int acpi_viot_init(void)
+{ return 0; }
+
+#endif /* !CONFIG_ACPI_VIOT */
+
+#endif /* __ACPI_VIOT_H__ */
-- 
2.24.0
