Re: [PATCH v2] iommu/iova: silence warnings under memory pressure
On Fri, 2019-11-22 at 11:46 -0500, Qian Cai wrote:
> On Fri, 2019-11-22 at 08:28 -0800, Joe Perches wrote:
> > On Fri, 2019-11-22 at 09:59 -0500, Qian Cai wrote:
> > > On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> > > > On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > > > > When running heavy memory pressure workloads, this 5+-year-old
> > > > > system is throwing endless warnings below because disk IO is too
> > > > > slow to recover from swapping. Since the volume from
> > > > > alloc_iova_fast() could be large, once it calls printk(), it will
> > > > > trigger disk IO (writing to the log files) and pending softirqs,
> > > > > which could cause an infinite loop and make no progress for days
> > > > > under the ongoing memory reclaim. This is the counterpart for
> > > > > Intel; the AMD part has already been merged, see commit
> > > > > 3d708895325b ("iommu/amd: Silence warnings under memory
> > > > > pressure"). Since the allocation failure will be reported in
> > > > > intel_alloc_iova(), just call printk_ratelimited() there and
> > > > > silence the one in alloc_iova_mem() to avoid the expensive
> > > > > warn_alloc().
> > > >
> > > > []
> > > > > v2: use dev_err_ratelimited() and improve the commit messages.
> > > >
> > > > []
> > > > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > > []
> > > > > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
> > > > >  	iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> > > > >  				   IOVA_PFN(dma_mask), true);
> > > > >  	if (unlikely(!iova_pfn)) {
> > > > > -		dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> > > > > +		dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> > > > > +				    nrpages);
> > > >
> > > > Trivia:
> > > >
> > > > This should really have a \n termination on the format string
> > > >
> > > > 	dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
> > >
> > > Why do you say so? It is right now printing with a newline added anyway.
> > >
> > > hpsa :03:00.0: DMAR: Allocating 1-page iova failed
> >
> > If another process uses pr_cont at the same time,
> > it can be interleaved.
>
> I lean towards fixing that in a separate patch if ever needed, as the
> original dev_err() has no "\n" enclosed either.

Your choice.  I wrote trivia:, but touching the same line
multiple times is relatively pointless.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC 00/13] virtio-iommu on non-devicetree platforms
On Fri, 22 Nov 2019 11:49:47 +0100, Jean-Philippe Brucker wrote:

> I'm seeking feedback on multi-platform support for virtio-iommu. At
> the moment only devicetree (DT) is supported and we don't have a
> pleasant solution for other platforms. Once we figure out the topology
> description, x86 support is trivial.
>
> Since the IOMMU manages memory accesses from other devices, the guest
> kernel needs to initialize the IOMMU before endpoints start issuing
> DMA. It's a solved problem: firmware or hypervisor describes through
> DT or ACPI tables the device dependencies, and probe of endpoints is
> deferred until the IOMMU is probed. But:
>
> (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and
> IORT for Arm). From my point of view IORT is easier to extend, since
> we just need to introduce a new node type. There are no dependencies
> to Arm in the Linux IORT driver, so it works well with CONFIG_X86.
>
From my limited understanding, IORT and VIOT are meant to solve device
topology enumeration only? I am not sure how they could be expanded to
cover information beyond device topology, e.g. DMAR has NUMA information
and root port ATS. I guess those are not used today in the guest but
might be additions in the future.

> However, there are concerns about other OS vendors feeling
> obligated to implement this new node, so Arm proposed introducing
> another ACPI table, that can wrap any of DMAR, IVRS and IORT to
> extend it with new virtual nodes. A draft of this VIOT table
> specification is available at
> http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf
>
> I'm afraid this could increase fragmentation as guests would need
> to implement or modify their support for all of DMAR, IVRS and IORT.
> If we end up doing VIOT, I suggest limiting it to IORT.
>
> (2) In addition, there are some concerns about having virtio depend on
> ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool
> x86 [1]) don't currently implement those methods.
>
> It was suggested to embed the topology description into the
> device. It can work, as demonstrated at the end of this RFC, with the
> following limitations:
>
> - The topology description must be read before any endpoint
>   managed by the IOMMU is probed, and even before the virtio module is
>   loaded. This RFC uses a PCI quirk to manually parse the virtio
>   configuration. It assumes that all endpoints managed by the
>   IOMMU are under this same PCI host.
>
> - I don't have a solution for the virtio-mmio transport at the
>   moment, because I haven't had time to modify a host to test it.
>   I think it could either use a notifier on the platform bus, or
>   better, a new 'iommu' command-line argument to the virtio-mmio
>   driver. So the current prototype doesn't work for firecracker
>   and microvm, which rely on virtio-mmio.
>
> - For Arm, if the platform has an ITS, the hypervisor needs IORT
>   or DT to describe it anyway. More generally, not using either ACPI or
>   DT might prevent from supporting other features as well. I
>   suspect the above users will have to implement a standard method
>   sooner or later.
>
> - Even when reusing as much existing code as possible, guest
>   support is still going to be around a few hundred lines since we can't
>   rely on the normal virtio infrastructure to be loaded at that
>   point. As you can see below, the diffstat for the incomplete
>   topology implementation is already bigger than the exhaustive
>   IORT support, even when jumping through the VIOT hoop.
>
> So it's a lightweight solution for very specific use-cases, and we
> should still support ACPI for the general case. Multi-platform
> guests such as Linux will then need to support three topology
> descriptions instead of two.
>
> In this RFC I present both solutions, but I'd rather not keep all of
> it. Please see the individual patches for details:
>
> (1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT
>     driver and patches 2, 11 add the VIOT glue.
>
> (2) Patch 12 adds the built-in topology description to the
>     virtio-iommu specification. Patch 13 is a partial implementation for
>     the Linux virtio-iommu driver. It only supports PCI, not platform
>     devices.
>
> You can find Linux and QEMU code on my virtio-iommu/devel branches at
> http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu
>
> I split the diffstat since there are two independent features. The
> first one is for patches 1-11, and the second one for patch 13.
>
> Jean-Philippe Brucker (11):
>   ACPI/IORT: Move IORT to the ACPI folder
>   ACPI: Add VIOT definitions
>   ACPI/IORT: Allow registration of external tables
>   ACPI/IORT: Add node categories
>   ACPI/IORT: Support VIOT virtio-mmio node
>   ACPI/IORT: Support VIOT virtio-pci node
>   ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
>   ACPI/IORT: Add callback to update a device's fwnode
>   iommu/virt
[PATCH v2 7/8] drm/msm/a6xx: Support split pagetables
Attempt to enable split pagetables if the arm-smmu driver supports it.
This will move the default address space from the default region to the
address range assigned to TTBR1. The behavior should be transparent to
the driver for now but it gets the default buffers out of the way when
we want to start swapping TTBR0 for context-specific pagetables.

Signed-off-by: Jordan Crouse
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 46 ++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 5dc0b2c..96b3b28 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -811,6 +811,50 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
 	return (unsigned long)busy_time;
 }

+static struct msm_gem_address_space *
+a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
+{
+	struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
+	struct msm_gem_address_space *aspace;
+	struct msm_mmu *mmu;
+	u64 start, size;
+	u32 val = 1;
+	int ret;
+
+	if (!iommu)
+		return ERR_PTR(-ENOMEM);
+
+	/* Try to request split pagetables */
+	iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+
+	mmu = msm_iommu_new(&pdev->dev, iommu);
+	if (IS_ERR(mmu)) {
+		iommu_domain_free(iommu);
+		return ERR_CAST(mmu);
+	}
+
+	/* Check to see if split pagetables were successful */
+	ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+	if (!ret && val) {
+		/*
+		 * The aperture start will be at the beginning of the TTBR1
+		 * space so use that as a base
+		 */
+		start = iommu->geometry.aperture_start;
+		size = 0x;
+	} else {
+		/* Otherwise use the legacy 32 bit region */
+		start = SZ_16M;
+		size = 0x - SZ_16M;
+	}
+
+	aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
+	if (IS_ERR(aspace))
+		iommu_domain_free(iommu);
+
+	return aspace;
+}
+
 static const struct adreno_gpu_funcs funcs = {
 	.base = {
 		.get_param = adreno_get_param,
@@ -832,7 +876,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
 		.gpu_state_get = a6xx_gpu_state_get,
 		.gpu_state_put = a6xx_gpu_state_put,
-		.create_address_space = adreno_iommu_create_address_space,
+		.create_address_space = a6xx_create_address_space,
 #endif
 	},
 	.get_timestamp = a6xx_get_timestamp,
--
2.7.4
[PATCH v2 6/8] drm/msm: Refactor address space initialization
Refactor how address space initialization works. Instead of having the address space function create the MMU object (and thus require separate but equal functions for gpummu and iommu) use a single function and pass the MMU struct in. Make the generic code cleaner by using target specific functions to create the address space so a2xx can do its own thing in its own space. For all the other targets use a generic helper to initialize IOMMU but leave the door open for newer targets to use customization if they need it. Signed-off-by: Jordan Crouse --- drivers/gpu/drm/msm/adreno/a2xx_gpu.c| 16 ++ drivers/gpu/drm/msm/adreno/a3xx_gpu.c| 1 + drivers/gpu/drm/msm/adreno/a4xx_gpu.c| 1 + drivers/gpu/drm/msm/adreno/a5xx_gpu.c| 1 + drivers/gpu/drm/msm/adreno/a6xx_gpu.c| 1 + drivers/gpu/drm/msm/adreno/adreno_gpu.c | 23 ++ drivers/gpu/drm/msm/adreno/adreno_gpu.h | 8 + drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 10 +++--- drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 14 + drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 4 --- drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 11 +-- drivers/gpu/drm/msm/msm_drv.h| 8 ++--- drivers/gpu/drm/msm/msm_gem_vma.c| 52 +--- drivers/gpu/drm/msm/msm_gpu.c| 40 ++-- drivers/gpu/drm/msm/msm_gpu.h| 4 +-- drivers/gpu/drm/msm/msm_iommu.c | 3 ++ 16 files changed, 83 insertions(+), 114 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c index 1f83bc1..60f6472 100644 --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c @@ -401,6 +401,21 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct msm_gpu *gpu) return state; } +static struct msm_gem_address_space * +a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev) +{ + struct msm_mmu *mmu = msm_gpummu_new(&pdev->dev, gpu); + struct msm_gem_address_space *aspace; + + aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M, + SZ_16M + 0xfff * SZ_64K); + + if (IS_ERR(aspace) && !IS_ERR(mmu)) + 
mmu->funcs->destroy(mmu); + + return aspace; +} + /* Register offset defines for A2XX - copy of A3XX */ static const unsigned int a2xx_register_offsets[REG_ADRENO_REGISTER_MAX] = { REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE), @@ -429,6 +444,7 @@ static const struct adreno_gpu_funcs funcs = { #endif .gpu_state_get = a2xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, + .create_address_space = a2xx_create_address_space, }, }; diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c index 7ad1493..41e51e0 100644 --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c @@ -441,6 +441,7 @@ static const struct adreno_gpu_funcs funcs = { #endif .gpu_state_get = a3xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, + .create_address_space = adreno_iommu_create_address_space, }, }; diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c index b01388a..3655440 100644 --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c @@ -532,6 +532,7 @@ static const struct adreno_gpu_funcs funcs = { #endif .gpu_state_get = a4xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, + .create_address_space = adreno_iommu_create_address_space, }, .get_timestamp = a4xx_get_timestamp, }; diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c index b02e204..0f5db72 100644 --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c @@ -1432,6 +1432,7 @@ static const struct adreno_gpu_funcs funcs = { .gpu_busy = a5xx_gpu_busy, .gpu_state_get = a5xx_gpu_state_get, .gpu_state_put = a5xx_gpu_state_put, + .create_address_space = adreno_iommu_create_address_space, }, .get_timestamp = a5xx_get_timestamp, }; diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index dc8ec2c..5dc0b2c 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ 
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -832,6 +832,7 @@ static const struct adreno_gpu_funcs funcs = { #if defined(CONFIG_DRM_MSM_GPU_STATE) .gpu_state_get = a6xx_gpu_state_get, .gpu_state_put = a6xx_gpu_state_put, + .create_address_space = adreno_iommu_create_address_space, #endif }, .get_timestamp = a6xx_get_timestamp, diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c inde
[PATCH v2 8/8] arm64: dts: qcom: sdm845: Update Adreno GPU SMMU compatible string
Add "qcom,adreno-smmu-v2" compatible string for the Adreno GPU SMMU node
to enable split pagetable support.

Signed-off-by: Jordan Crouse
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index ddb1f23..d90ba6eda 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -2869,7 +2869,7 @@
 	};

 	adreno_smmu: iommu@504 {
-		compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+		compatible = "qcom,adreno-smmu-v2", "qcom,smmu-v2";
 		reg = <0 0x504 0 0x1>;
 		#iommu-cells = <1>;
 		#global-interrupts = <2>;
--
2.7.4
[PATCH v2 0/8] iommu/arm-smmu: Split pagetable support for Adreno GPUs
Another refresh to support split pagetables for Adreno GPUs as part of
an incremental process to enable per-context pagetables.

In order to support per-context pagetables the GPU needs to enable split
tables so that we can store global buffers in the TTBR1 space, leaving
the GPU free to program the TTBR0 register with the address of a
context-specific pagetable.

This patchset adds split pagetable support for devices identified with
the compatible string qcom,adreno-smmu-v2. If the compatible string is
enabled and DOMAIN_ATTR_SPLIT_TABLES is non-zero at attach time, the
implementation will set up the TTBR0 and TTBR1 spaces with identical
configurations and program the domain pagetable into the TTBR1 register.
The TTBR0 register will be unused. The driver can determine if split
pagetables were programmed by querying DOMAIN_ATTR_SPLIT_TABLES after
attaching. The domain geometry will also be updated to reflect the
virtual address space for the TTBR1 range.

These patches are based on top of linux-next-20191120 with [1], [2], and
[3] from Robin on the iommu list.

The first four patches add the device tree bindings and implementation
specific support for arm-smmu, and the rest of the patches add the
drm/msm implementation followed by the device tree update for sdm845.
[1] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039718.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039719.html
[3] https://lists.linuxfoundation.org/pipermail/iommu/2019-October/039720.html

Jordan Crouse (8):
  dt-bindings: arm-smmu: Add Adreno GPU variant
  iommu: Add DOMAIN_ATTR_SPLIT_TABLES
  iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context
  iommu/arm-smmu: Add split pagetables for Adreno IOMMU implementations
  drm/msm: Attach the IOMMU device during initialization
  drm/msm: Refactor address space initialization
  drm/msm/a6xx: Support split pagetables
  arm64: dts: qcom: sdm845: Update Adreno GPU SMMU compatible string

 .../devicetree/bindings/iommu/arm,smmu.yaml |  6 ++
 arch/arm64/boot/dts/qcom/sdm845.dtsi        |  2 +-
 drivers/gpu/drm/msm/adreno/a2xx_gpu.c       | 16
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c       |  1 +
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c       |  1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c       |  1 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c       | 45 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c     | 23 --
 drivers/gpu/drm/msm/adreno/adreno_gpu.h     |  8 ++
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c     | 18 ++--
 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c    | 18 ++--
 drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c    |  4 -
 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c    | 18 ++--
 drivers/gpu/drm/msm/msm_drv.h               |  8 +-
 drivers/gpu/drm/msm/msm_gem_vma.c           | 37 ++---
 drivers/gpu/drm/msm/msm_gpu.c               | 49 +--
 drivers/gpu/drm/msm/msm_gpu.h               |  4 +-
 drivers/gpu/drm/msm/msm_gpummu.c            |  6 --
 drivers/gpu/drm/msm/msm_iommu.c             | 18 ++--
 drivers/gpu/drm/msm/msm_mmu.h               |  1 -
 drivers/iommu/arm-smmu-impl.c               |  6 +-
 drivers/iommu/arm-smmu-qcom.c               | 96 ++
 drivers/iommu/arm-smmu.c                    | 52 +---
 drivers/iommu/arm-smmu.h                    | 14 +++-
 include/linux/iommu.h                       |  1 +
 25 files changed, 295 insertions(+), 158 deletions(-)

--
2.7.4
[PATCH v2 5/8] drm/msm: Attach the IOMMU device during initialization
Everywhere an IOMMU object is created by msm_gpu_create_address_space the IOMMU device is attached immediately after. Instead of carrying around the infrastructure to do the attach from the device specific code do it directly in the msm_iommu_init() function. This gets it out of the way for more aggressive cleanups that follow. Signed-off-by: Jordan Crouse --- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 8 drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 4 drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 7 --- drivers/gpu/drm/msm/msm_gem_vma.c| 23 +++ drivers/gpu/drm/msm/msm_gpu.c| 11 +-- drivers/gpu/drm/msm/msm_gpummu.c | 6 -- drivers/gpu/drm/msm/msm_iommu.c | 15 +++ drivers/gpu/drm/msm/msm_mmu.h| 1 - 8 files changed, 27 insertions(+), 48 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c index 6c92f0f..b082b23 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c @@ -704,7 +704,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms) { struct iommu_domain *domain; struct msm_gem_address_space *aspace; - int ret; domain = iommu_domain_alloc(&platform_bus_type); if (!domain) @@ -720,13 +719,6 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms) return PTR_ERR(aspace); } - ret = aspace->mmu->funcs->attach(aspace->mmu); - if (ret) { - DPU_ERROR("failed to attach iommu %d\n", ret); - msm_gem_address_space_put(aspace); - return ret; - } - dpu_kms->base.aspace = aspace; return 0; } diff --git a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c index dda0543..9dba37c 100644 --- a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c +++ b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c @@ -518,10 +518,6 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev) } kms->aspace = aspace; - - ret = aspace->mmu->funcs->attach(aspace->mmu); - if (ret) - goto fail; } else { DRM_DEV_INFO(dev->dev, "no iommu, fallback to phys " "contig buffers for scanout\n"); diff --git 
a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c index e43ecd4..653dab2 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c @@ -736,13 +736,6 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev) } kms->aspace = aspace; - - ret = aspace->mmu->funcs->attach(aspace->mmu); - if (ret) { - DRM_DEV_ERROR(&pdev->dev, "failed to attach iommu: %d\n", - ret); - goto fail; - } } else { DRM_DEV_INFO(&pdev->dev, "no iommu, fallback to phys contig buffers for scanout\n"); diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c index 1af5354..91d993a 100644 --- a/drivers/gpu/drm/msm/msm_gem_vma.c +++ b/drivers/gpu/drm/msm/msm_gem_vma.c @@ -131,8 +131,8 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain, const char *name) { struct msm_gem_address_space *aspace; - u64 size = domain->geometry.aperture_end - - domain->geometry.aperture_start; + u64 start = domain->geometry.aperture_start; + u64 size = domain->geometry.aperture_end - start; aspace = kzalloc(sizeof(*aspace), GFP_KERNEL); if (!aspace) @@ -141,9 +141,18 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain, spin_lock_init(&aspace->lock); aspace->name = name; aspace->mmu = msm_iommu_new(dev, domain); + if (IS_ERR(aspace->mmu)) { + int ret = PTR_ERR(aspace->mmu); - drm_mm_init(&aspace->mm, (domain->geometry.aperture_start >> PAGE_SHIFT), - size >> PAGE_SHIFT); + kfree(aspace); + return ERR_PTR(ret); + } + + /* +* Attaching the IOMMU device changes the aperture values so use the +* cached values instead +*/ + drm_mm_init(&aspace->mm, start >> PAGE_SHIFT, size >> PAGE_SHIFT); kref_init(&aspace->kref); @@ -164,6 +173,12 @@ msm_gem_address_space_create_a2xx(struct device *dev, struct msm_gpu *gpu, spin_lock_init(&aspace->lock); aspace->name = name; aspace->mmu = msm_gpummu_new(dev, gpu); + if (IS_ERR(aspace->mmu)) { + int ret = PTR_ERR(aspace->mmu); + + 
kfree(aspace); + return ERR_PTR(ret); + } drm_mm_init(&aspace->mm, (va_start
[PATCH v2 4/8] iommu/arm-smmu: Add split pagetables for Adreno IOMMU implementations
Add implementation specific support to enable split pagetables for SMMU
implementations attached to Adreno GPUs on Qualcomm targets.

To enable split pagetables the driver will set an attribute on the
domain. If conditions are correct, the implementation will set up the
hardware to support equally sized TTBR0 and TTBR1 regions and program
the domain pagetable into TTBR1 to make it available for global buffers,
while allowing the GPU the chance to switch the TTBR0 at runtime for
per-context pagetables.

After programming the context, the value of the domain attribute can be
queried to see if split pagetables were successfully programmed. The
domain geometry will be updated so that the caller can determine the
start of the region to generate correct virtual addresses.

Signed-off-by: Jordan Crouse
---
 drivers/iommu/arm-smmu-impl.c |  3 ++
 drivers/iommu/arm-smmu-qcom.c | 96 +++
 drivers/iommu/arm-smmu.c      | 41 ++
 drivers/iommu/arm-smmu.h      | 11 +
 4 files changed, 143 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index 33ed682..1e91231 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -174,5 +174,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 	if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500"))
 		return qcom_smmu_impl_init(smmu);

+	if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu-v2"))
+		return adreno_smmu_impl_init(smmu);
+
 	return smmu;
 }
diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index 24c071c..6591e49 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -11,6 +11,102 @@ struct qcom_smmu {
 	struct arm_smmu_device smmu;
 };

+#define TG0_4K	0
+#define TG0_64K	1
+#define TG0_16K	2
+
+#define TG1_16K	1
+#define TG1_4K	2
+#define TG1_64K	3
+
+/*
+ * Set up split pagetables for Adreno SMMUs that will keep a static TTBR1 for
+ * global buffers and dynamically switch TTBR0 from the GPU for context
+ * specific pagetables.
+ */
+static int adreno_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
+		struct io_pgtable_cfg *pgtbl_cfg)
+{
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
+	u32 tcr, tg0;
+
+	/*
+	 * Return an error if split pagetables are not enabled so that arm-smmu
+	 * does the default configuration
+	 */
+	if (!(pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1))
+		return -EINVAL;
+
+	/* Get the bank configuration from the pagetable config */
+	tcr = arm_smmu_lpae_tcr(pgtbl_cfg) & 0x;
+
+	/*
+	 * The TCR configuration for TTBR0 and TTBR1 is (almost) identical so
+	 * just duplicate the T0 configuration and shift it
+	 */
+	cb->tcr[0] = (tcr << 16) | tcr;
+
+	/*
+	 * The (almost) above refers to the granule size field which is
+	 * different for TTBR0 and TTBR1. With the TTBR1 quirk enabled,
+	 * io-pgtable-arm will write the T1 appropriate granule size for tg.
+	 * Translate the configuration from the T1 field to get the right value
+	 * for T0
+	 */
+	if (pgtbl_cfg->arm_lpae_s1_cfg.tcr.tg == TG1_4K)
+		tg0 = TG0_4K;
+	else if (pgtbl_cfg->arm_lpae_s1_cfg.tcr.tg == TG1_16K)
+		tg0 = TG0_16K;
+	else
+		tg0 = TG0_64K;
+
+	/* Clear and set the correct value for TG0 */
+	cb->tcr[0] &= ~TCR_TG0;
+	cb->tcr[0] |= FIELD_PREP(TCR_TG0, tg0);
+
+	/*
+	 * arm_smmu_lpae_tcr2 sets SEP_UPSTREAM which is always the appropriate
+	 * SEP for Adreno IOMMU
+	 */
+	cb->tcr[1] = arm_smmu_lpae_tcr2(pgtbl_cfg);
+	cb->tcr[1] |= TCR2_AS;
+
+	/* TTBRs */
+	cb->ttbr[0] = FIELD_PREP(TTBRn_ASID, cfg->asid);
+	cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+	cb->ttbr[1] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
+
+	/* MAIRs */
+	cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair;
+	cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair >> 32;
+
+	return 0;
+}
+
+static int adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+		struct io_pgtable_cfg *pgtbl_cfg)
+{
+	/* Enable split pagetables if the flag is set and the format matches */
+	if (smmu_domain->split_pagetables)
+		if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 &&
+		    smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64)
+			pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+
+	return 0;
+}
+
+static const struct arm_smmu_impl adreno_smmu_impl = {
+	.init_context = adreno_smmu_init_context,
+	.init_context_bank = adreno_smmu_init_context_bank,
+};
+
+struct arm_smmu_device *adreno_smmu_i
[PATCH v2 3/8] iommu/arm-smmu: Pass io_pgtable_cfg to impl specific init_context
Pass the proposed io_pgtable_cfg to the implementation specific
init_context() function to give the implementation an opportunity to
modify it before it gets passed to io-pgtable.

Signed-off-by: Jordan Crouse
---
 drivers/iommu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm-smmu.c      | 11 ++-
 drivers/iommu/arm-smmu.h      |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index b2fe72a..33ed682 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
 	return 0;
 }

-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+		struct io_pgtable_cfg *pgtbl_cfg)
 {
 	struct cavium_smmu *cs = container_of(smmu_domain->smmu,
 			struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c106406..5c7c32b 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -775,11 +775,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	cfg->asid = cfg->cbndx;

 	smmu_domain->smmu = smmu;
-	if (smmu->impl && smmu->impl->init_context) {
-		ret = smmu->impl->init_context(smmu_domain);
-		if (ret)
-			goto out_unlock;
-	}

 	pgtbl_cfg = (struct io_pgtable_cfg) {
 		.pgsize_bitmap	= smmu->pgsize_bitmap,
@@ -790,6 +785,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		.iommu_dev	= smmu->dev,
 	};

+	if (smmu->impl && smmu->impl->init_context) {
+		ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg);
+		if (ret)
+			goto out_unlock;
+	}
+
 	if (smmu_domain->non_strict)
 		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;

diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index afab9de..0eb498f 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -357,7 +357,8 @@ struct arm_smmu_impl {
 			    u64 val);
 	int (*cfg_probe)(struct arm_smmu_device *smmu);
 	int (*reset)(struct arm_smmu_device *smmu);
-	int (*init_context)(struct arm_smmu_domain *smmu_domain);
+	int (*init_context)(struct arm_smmu_domain *smmu_domain,
+			struct io_pgtable_cfg *pgtbl_cfg);
 	void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 			 int status);
 };
--
2.7.4
[PATCH v2 1/8] dt-bindings: arm-smmu: Add Adreno GPU variant
Add a compatible string to identify SMMUs that are attached to Adreno
GPU devices that wish to support split pagetables.

Signed-off-by: Jordan Crouse
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 6515dbe..db9f826 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -31,6 +31,12 @@ properties:
           - qcom,sdm845-smmu-v2
           - const: qcom,smmu-v2

+      - description: Qcom Adreno GPU SMMU implementing split pagetables
+        items:
+          - enum:
+              - qcom,adreno-smmu-v2
+          - const: qcom,smmu-v2
+
       - description: Qcom SoCs implementing "arm,mmu-500"
         items:
           - enum:
--
2.7.4
[PATCH v2 2/8] iommu: Add DOMAIN_ATTR_SPLIT_TABLES
Add a new attribute to enable and query the state of split pagetables
for the domain.

Signed-off-by: Jordan Crouse
---
 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f2223cb..18c861e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -126,6 +126,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
 	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+	DOMAIN_ATTR_SPLIT_TABLES,
 	DOMAIN_ATTR_MAX,
 };
--
2.7.4
Re: [PATCH v2 08/10] iommu/io-pgtable-arm: Rationalise TTBRn handling
On Fri, Oct 25, 2019 at 07:08:37PM +0100, Robin Murphy wrote: > TTBR1 values have so far been redundant since no users implement any > support for split address spaces. Crucially, though, one of the main > reasons for wanting to do so is to be able to manage each half entirely > independently, e.g. context-switching one set of mappings without > disturbing the other. Thus it seems unlikely that tying two tables > together in a single io_pgtable_cfg would ever be particularly desirable > or useful. > > Streamline the configs to just a single conceptual TTBR value > representing the allocated table. This paves the way for future users to > support split address spaces by simply allocating a table and dealing > with the detailed TTBRn logistics themselves. Tested-by: Jordan Crouse > Signed-off-by: Robin Murphy > --- > drivers/iommu/arm-smmu-v3.c| 2 +- > drivers/iommu/arm-smmu.c | 9 - > drivers/iommu/io-pgtable-arm-v7s.c | 16 +++- > drivers/iommu/io-pgtable-arm.c | 5 ++--- > drivers/iommu/ipmmu-vmsa.c | 2 +- > drivers/iommu/msm_iommu.c | 4 ++-- > drivers/iommu/mtk_iommu.c | 4 ++-- > drivers/iommu/qcom_iommu.c | 3 +-- > include/linux/io-pgtable.h | 4 ++-- > 9 files changed, 22 insertions(+), 27 deletions(-) > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c > index 3f20e548f1ec..da31e607698f 100644 > --- a/drivers/iommu/arm-smmu-v3.c > +++ b/drivers/iommu/arm-smmu-v3.c > @@ -2170,7 +2170,7 @@ static int arm_smmu_domain_finalise_s1(struct > arm_smmu_domain *smmu_domain, > } > > cfg->cd.asid= (u16)asid; > - cfg->cd.ttbr= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0]; > + cfg->cd.ttbr= pgtbl_cfg->arm_lpae_s1_cfg.ttbr; > cfg->cd.tcr = pgtbl_cfg->arm_lpae_s1_cfg.tcr; > cfg->cd.mair= pgtbl_cfg->arm_lpae_s1_cfg.mair; > return 0; > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > index 2bc3e93b11e6..a249e4e49ead 100644 > --- a/drivers/iommu/arm-smmu.c > +++ b/drivers/iommu/arm-smmu.c > @@ -534,13 +534,12 @@ static void 
arm_smmu_init_context_bank(struct > arm_smmu_domain *smmu_domain, > /* TTBRs */ > if (stage1) { > if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) { > - cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr[0]; > - cb->ttbr[1] = pgtbl_cfg->arm_v7s_cfg.ttbr[1]; > + cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr; > + cb->ttbr[1] = 0; > } else { > - cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0]; > + cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; > cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid); > - cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[1]; > - cb->ttbr[1] |= FIELD_PREP(TTBRn_ASID, cfg->asid); > + cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid); > } > } else { > cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr; > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c > b/drivers/iommu/io-pgtable-arm-v7s.c > index 7c3bd2c3cdca..4d2c1e7f67c4 100644 > --- a/drivers/iommu/io-pgtable-arm-v7s.c > +++ b/drivers/iommu/io-pgtable-arm-v7s.c > @@ -822,15 +822,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct > io_pgtable_cfg *cfg, > /* Ensure the empty pgd is visible before any actual TTBR write */ > wmb(); > > - /* TTBRs */ > - cfg->arm_v7s_cfg.ttbr[0] = virt_to_phys(data->pgd) | > -ARM_V7S_TTBR_S | ARM_V7S_TTBR_NOS | > -(cfg->coherent_walk ? > -(ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) | > - ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) : > -(ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) | > - ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC))); > - cfg->arm_v7s_cfg.ttbr[1] = 0; > + /* TTBR */ > + cfg->arm_v7s_cfg.ttbr = virt_to_phys(data->pgd) | ARM_V7S_TTBR_S | > + (cfg->coherent_walk ? 
(ARM_V7S_TTBR_NOS | > + ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) | > + ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) : > + (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) | > + ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC))); > return &data->iop; > > out_free_data: > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > index 1795df8f7a51..bc0841040ebe 100644 > --- a/drivers/iommu/io-pgtable-arm.c > +++ b/drivers/iommu/io-pgtable-arm.c > @@ -872,9 +872,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, > void *cookie) > /* Ensure the empty pgd is visible before any actual TTBR write */ >
Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote: > Although it's conceptually nice for the io_pgtable_cfg to provide a > standard VMSA TCR value, the reality is that no VMSA-compliant IOMMU > looks exactly like an Arm CPU, and they all have various other TCR > controls which io-pgtable can't be expected to understand. Thus since > there is an expectation that drivers will have to add to the given TCR > value anyway, let's strip it down to just the essentials that are > directly relevant to io-pgatble's inner workings - namely the various > sizes and the walk attributes. Tested-by: Jordan Crouse > Signed-off-by: Robin Murphy > --- > drivers/iommu/arm-smmu-v3.c| 41 +++-- > drivers/iommu/arm-smmu.c | 7 ++- > drivers/iommu/arm-smmu.h | 27 > drivers/iommu/io-pgtable-arm-v7s.c | 6 +- > drivers/iommu/io-pgtable-arm.c | 98 -- > drivers/iommu/io-pgtable.c | 2 +- > drivers/iommu/qcom_iommu.c | 8 +-- > include/linux/io-pgtable.h | 9 ++- > 8 files changed, 94 insertions(+), 104 deletions(-) > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c > index da31e607698f..ca72cd777955 100644 > --- a/drivers/iommu/arm-smmu-v3.c > +++ b/drivers/iommu/arm-smmu-v3.c > @@ -261,27 +261,18 @@ > /* Context descriptor (stage-1 only) */ > #define CTXDESC_CD_DWORDS8 > #define CTXDESC_CD_0_TCR_T0SZGENMASK_ULL(5, 0) > -#define ARM64_TCR_T0SZ GENMASK_ULL(5, 0) > #define CTXDESC_CD_0_TCR_TG0 GENMASK_ULL(7, 6) > -#define ARM64_TCR_TG0GENMASK_ULL(15, 14) > #define CTXDESC_CD_0_TCR_IRGN0 GENMASK_ULL(9, 8) > -#define ARM64_TCR_IRGN0 GENMASK_ULL(9, 8) > #define CTXDESC_CD_0_TCR_ORGN0 GENMASK_ULL(11, 10) > -#define ARM64_TCR_ORGN0 GENMASK_ULL(11, 10) > #define CTXDESC_CD_0_TCR_SH0 GENMASK_ULL(13, 12) > -#define ARM64_TCR_SH0GENMASK_ULL(13, 12) > #define CTXDESC_CD_0_TCR_EPD0(1ULL << 14) > -#define ARM64_TCR_EPD0 (1ULL << 7) > #define CTXDESC_CD_0_TCR_EPD1(1ULL << 30) > -#define ARM64_TCR_EPD1 (1ULL << 23) > > #define CTXDESC_CD_0_ENDI(1UL << 15) > #define CTXDESC_CD_0_V 
(1UL << 31) > > #define CTXDESC_CD_0_TCR_IPS GENMASK_ULL(34, 32) > -#define ARM64_TCR_IPSGENMASK_ULL(34, 32) > #define CTXDESC_CD_0_TCR_TBI0(1ULL << 38) > -#define ARM64_TCR_TBI0 (1ULL << 37) > > #define CTXDESC_CD_0_AA64(1UL << 41) > #define CTXDESC_CD_0_S (1UL << 44) > @@ -292,10 +283,6 @@ > > #define CTXDESC_CD_1_TTB0_MASK GENMASK_ULL(51, 4) > > -/* Convert between AArch64 (CPU) TCR format and SMMU CD format */ > -#define ARM_SMMU_TCR2CD(tcr, fld)FIELD_PREP(CTXDESC_CD_0_TCR_##fld, \ > - FIELD_GET(ARM64_TCR_##fld, tcr)) > - > /* Command queue */ > #define CMDQ_ENT_SZ_SHIFT4 > #define CMDQ_ENT_DWORDS ((1 << CMDQ_ENT_SZ_SHIFT) >> 3) > @@ -1443,23 +1430,6 @@ static int arm_smmu_cmdq_issue_sync(struct > arm_smmu_device *smmu) > } > > /* Context descriptor manipulation functions */ > -static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr) > -{ > - u64 val = 0; > - > - /* Repack the TCR. Just care about TTBR0 for now */ > - val |= ARM_SMMU_TCR2CD(tcr, T0SZ); > - val |= ARM_SMMU_TCR2CD(tcr, TG0); > - val |= ARM_SMMU_TCR2CD(tcr, IRGN0); > - val |= ARM_SMMU_TCR2CD(tcr, ORGN0); > - val |= ARM_SMMU_TCR2CD(tcr, SH0); > - val |= ARM_SMMU_TCR2CD(tcr, EPD0); > - val |= ARM_SMMU_TCR2CD(tcr, EPD1); > - val |= ARM_SMMU_TCR2CD(tcr, IPS); > - > - return val; > -} > - > static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu, > struct arm_smmu_s1_cfg *cfg) > { > @@ -1469,7 +1439,7 @@ static void arm_smmu_write_ctx_desc(struct > arm_smmu_device *smmu, >* We don't need to issue any invalidation here, as we'll invalidate >* the STE when installing the new entry anyway. 
>*/ > - val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) | > + val = cfg->cd.tcr | > #ifdef __BIG_ENDIAN > CTXDESC_CD_0_ENDI | > #endif > @@ -2155,6 +2125,7 @@ static int arm_smmu_domain_finalise_s1(struct > arm_smmu_domain *smmu_domain, > int asid; > struct arm_smmu_device *smmu = smmu_domain->smmu; > struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg; > + typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = > &pgtbl_cfg->arm_lpae_s1_cfg.tcr; > > asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits); > if (asid < 0) > @@ -2171,7 +2142,13 @@ static int arm_smmu_domain_finalise_s1(struct > arm_smmu_domain *smmu_domain, >
Re: [PATCH v2 10/10] iommu/io-pgtable-arm: Prepare for TTBR1 usage
On Fri, Oct 25, 2019 at 07:08:39PM +0100, Robin Murphy wrote: > Now that we can correctly extract top-level indices without relying on > the remaining upper bits being zero, the only remaining impediments to > using a given table for TTBR1 are the address validation on map/unmap > and the awkward TCR translation granule format. Add a quirk so that we > can do the right thing at those points. Tested-by: Jordan Crouse > Signed-off-by: Robin Murphy > --- > drivers/iommu/io-pgtable-arm.c | 25 +++-- > include/linux/io-pgtable.h | 4 > 2 files changed, 23 insertions(+), 6 deletions(-) > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > index 9b1912ede000..e53edff56e54 100644 > --- a/drivers/iommu/io-pgtable-arm.c > +++ b/drivers/iommu/io-pgtable-arm.c > @@ -107,6 +107,10 @@ > #define ARM_LPAE_TCR_TG0_64K 1 > #define ARM_LPAE_TCR_TG0_16K 2 > > +#define ARM_LPAE_TCR_TG1_16K 1 > +#define ARM_LPAE_TCR_TG1_4K 2 > +#define ARM_LPAE_TCR_TG1_64K 3 > + > #define ARM_LPAE_TCR_SH0_SHIFT 12 > #define ARM_LPAE_TCR_SH_NS 0 > #define ARM_LPAE_TCR_SH_OS 2 > @@ -466,6 +470,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, > unsigned long iova, > arm_lpae_iopte *ptep = data->pgd; > int ret, lvl = data->start_level; > arm_lpae_iopte prot; > + long iaext = (long)iova >> cfg->ias; > > /* If no access, then nothing to do */ > if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE))) > @@ -474,7 +479,9 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, > unsigned long iova, > if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size)) > return -EINVAL; > > - if (WARN_ON(iova >> data->iop.cfg.ias || paddr >> data->iop.cfg.oas)) > + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) > + iaext = ~iaext; > + if (WARN_ON(iaext || paddr >> cfg->oas)) > return -ERANGE; > > prot = arm_lpae_prot_to_pte(data, iommu_prot); > @@ -640,11 +647,14 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops > *ops, unsigned long iova, > struct arm_lpae_io_pgtable *data = 
io_pgtable_ops_to_data(ops); > struct io_pgtable_cfg *cfg = &data->iop.cfg; > arm_lpae_iopte *ptep = data->pgd; > + long iaext = (long)iova >> cfg->ias; > > if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size)) > return 0; > > - if (WARN_ON(iova >> data->iop.cfg.ias)) > + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) > + iaext = ~iaext; > + if (WARN_ON(iaext)) > return 0; > > return __arm_lpae_unmap(data, gather, iova, size, data->start_level, > ptep); > @@ -780,9 +790,11 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, > void *cookie) > u64 reg; > struct arm_lpae_io_pgtable *data; > typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr; > + bool tg1; > > if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | > - IO_PGTABLE_QUIRK_NON_STRICT)) > + IO_PGTABLE_QUIRK_NON_STRICT | > + IO_PGTABLE_QUIRK_ARM_TTBR1)) > return NULL; > > data = arm_lpae_alloc_pgtable(cfg); > @@ -800,15 +812,16 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg > *cfg, void *cookie) > tcr->orgn = ARM_LPAE_TCR_RGN_NC; > } > > + tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1; > switch (ARM_LPAE_GRANULE(data)) { > case SZ_4K: > - tcr->tg = ARM_LPAE_TCR_TG0_4K; > + tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_4K : ARM_LPAE_TCR_TG0_4K; > break; > case SZ_16K: > - tcr->tg = ARM_LPAE_TCR_TG0_16K; > + tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_16K : ARM_LPAE_TCR_TG0_16K; > break; > case SZ_64K: > - tcr->tg = ARM_LPAE_TCR_TG0_64K; > + tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_64K : ARM_LPAE_TCR_TG0_64K; > break; > } > > diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h > index 6ae104cedfd7..d7c5cb685e50 100644 > --- a/include/linux/io-pgtable.h > +++ b/include/linux/io-pgtable.h > @@ -83,12 +83,16 @@ struct io_pgtable_cfg { >* IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs >* on unmap, for DMA domains using the flush queue mechanism for >* delayed invalidation. 
> + * > + * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table > + * for use in the upper half of a split address space. >*/ > #define IO_PGTABLE_QUIRK_ARM_NS BIT(0) > #define IO_PGTABLE_QUIRK_NO_PERMS BIT(1) > #define IO_PGTABLE_QUIRK_TLBI_ON_MAPBIT(2) > #define IO_PGTABLE_QUIRK_ARM_MTK_EXTBIT(3) > #define IO_PGTABLE_QUIRK_NON_STRICT BIT(4) > + #define IO_PGTABLE_QUIRK_ARM_TTB
Re: [PATCH] of: property: Add device link support for "iommu-map"
On Fri, Nov 22, 2019 at 10:13 AM Ard Biesheuvel wrote: > > On Fri, 22 Nov 2019 at 17:01, Rob Herring wrote: > > > > On Fri, Nov 22, 2019 at 8:55 AM Will Deacon wrote: > > > > > > [+Ard] > > > > > > Hi Rob, > > > > > > On Fri, Nov 22, 2019 at 08:47:46AM -0600, Rob Herring wrote: > > > > On Wed, Nov 20, 2019 at 1:00 PM Will Deacon wrote: > > > > > > > > > > Commit 8e12257dead7 ("of: property: Add device link support for > > > > > iommus, > > > > > mboxes and io-channels") added device link support for IOMMU linkages > > > > > described using the "iommus" property. For PCI devices, this property > > > > > is not present and instead the "iommu-map" property is used on the > > > > > host > > > > > bridge node to map the endpoint RequesterIDs to their corresponding > > > > > IOMMU instance. > > > > > > > > > > Add support for "iommu-map" to the device link supplier bindings so > > > > > that > > > > > probing of PCI devices can be deferred until after the IOMMU is > > > > > available. > > > > > > > > > > Cc: Greg Kroah-Hartman > > > > > Cc: Rob Herring > > > > > Cc: Saravana Kannan > > > > > Cc: Robin Murphy > > > > > Signed-off-by: Will Deacon > > > > > --- > > > > > > > > > > Applies against driver-core/driver-core-next. > > > > > Tested on AMD Seattle (arm64). > > > > > > > > Guess that answers my question whether anyone uses Seattle with DT. > > > > Seattle uses the old SMMU binding, and there's not even an IOMMU > > > > associated with the PCI host. I raise this mainly because the dts > > > > files for Seattle either need some love or perhaps should be removed. > > > > > > I'm using the new DT bindings on my Seattle, thanks to the firmware fairy > > > (Ard) visiting my flat with a dediprog. The patches I've posted to enable > > > modular builds of the arm-smmu driver require that the old binding is > > > disabled [1]. > > > > Going to post those dts changes? > > > > Last time I tried upstreaming seattle DT changes I got zero response, > so I didn't bother since. 
I leave most dts reviews up to sub-arch maintainers and I'm pretty sure AMD doesn't care about it anymore, so we need a new maintainer or just send a pull request to Arnd/Olof. Rob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4] iommu/iova: silence warnings under memory pressure
When running heavy memory pressure workloads, this 5+ year-old system is throwing the endless warnings below because disk IO is too slow to recover from swapping. Since the volume from alloc_iova_fast() could be large, once it calls printk(), it will trigger disk IO (writing to the log files) and pending softirqs, which could cause an infinite loop and make no progress for days under the ongoing memory reclaim. This is the counterpart for Intel; the AMD part has already been merged, see commit 3d708895325b ("iommu/amd: Silence warnings under memory pressure"). Since the allocation failure will be reported in intel_alloc_iova(), just call dev_err_once() there, because even "ratelimited" is too much, and silence the one in alloc_iova_mem() to avoid the expensive warn_alloc().
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
slab_out_of_memory: 66 callbacks suppressed
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0
node 0: slabs: 1822, objs: 16398, free: 0
node 1: slabs: 2051, objs: 18459, free: 31
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0
node 0: slabs: 1822, objs: 16398, free: 0
node 1: slabs: 2051, objs: 18459, free: 31
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
node 0: slabs: 697, objs: 4182, free: 0
node 0: slabs: 697, objs: 4182, free: 0
node 0: slabs: 697, objs: 4182, free: 0
node 0: slabs: 697, objs: 4182, free: 0
node 1: slabs: 381, objs: 2286, free: 27
node 1: slabs: 381, objs: 2286, free: 27
node 1: slabs: 381, objs: 2286, free: 27
node 1: slabs: 381, objs: 2286, free: 27
node 0: slabs: 1822, objs: 16398, free: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
node 1: slabs: 2051, objs: 18459, free: 31
node 0: slabs: 697, objs: 4182, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
node 1: slabs: 381, objs: 2286, free: 27
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
node 0: slabs: 697, objs: 4182, free: 0
node 1: slabs: 381, objs: 2286, free: 27
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
warn_alloc: 96 callbacks suppressed
kworker/11:1H: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1
CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: GB
Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015
Workqueue: kblockd blk_mq_run_work_fn
Call Trace:
dump_stack+0xa0/0xea
warn_alloc.cold.94+0x8a/0x12d
__alloc_pages_slowpath+0x1750/0x1870
__alloc_pages_nodemask+0x58a/0x710
alloc_pages_current+0x9c/0x110
alloc_slab_page+0xc9/0x760
allocate_slab+0x48f/0x5d0
new_slab+0x46/0x70
___slab_alloc+0x4ab/0x7b0
__slab_alloc+0x43/0x70
kmem_cache_alloc+0x2dd/0x450
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
alloc_iova+0x33/0x210
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
node 0: slabs: 697, objs: 4182, free: 0
alloc_iova_fast+0x62/0x3d1
node 1: slabs: 381, objs: 2286, free: 27
intel_alloc_iova+0xce/0xe0
intel_map_sg+0xed/0x410
scsi_dma_map+0xd7/0x160
scsi_queue_rq+0xbf7/0x1310
blk_mq_dispatch_rq_list+0x4d9/0xbc0
blk_mq_sched_dispatch_requests+0x24a/0x300
__blk_mq_run_hw_queue+0x156/0x230
blk_mq_run_work_fn+0x3b/0x40
process_one_work+0x579/0xb90
worker_thread+0x63/0x5b0
kthread+0x1e6/0x210
ret_from_fork+0x3a/0x50
Mem-Info:
active_anon:2422723 inactive_anon:361971 isolated_anon:34403
active_file:
[PATCH] iommu/arm-smmu: support SMMU module probing from the IORT
Add support for SMMU drivers built as modules to the ACPI/IORT device probing path, by deferring the probe of the master if the SMMU driver is known to exist but has not been loaded yet. Given that the IORT code registers a platform device for each SMMU that it discovers, we can easily trigger the udev based autoloading of the SMMU drivers by making the platform device identifier part of the module alias. Signed-off-by: Ard Biesheuvel --- drivers/acpi/arm64/iort.c | 4 ++-- drivers/iommu/arm-smmu-v3.c | 1 + drivers/iommu/arm-smmu.c| 1 + 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 5a7551d060f2..a696457a9b11 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -850,9 +850,9 @@ static inline bool iort_iommu_driver_enabled(u8 type) { switch (type) { case ACPI_IORT_NODE_SMMU_V3: - return IS_BUILTIN(CONFIG_ARM_SMMU_V3); + return IS_ENABLED(CONFIG_ARM_SMMU_V3); case ACPI_IORT_NODE_SMMU: - return IS_BUILTIN(CONFIG_ARM_SMMU); + return IS_ENABLED(CONFIG_ARM_SMMU); default: pr_warn("IORT node type %u does not describe an SMMU\n", type); return false; diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 7669beafc493..bf6a1e8eb9b0 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -3733,4 +3733,5 @@ module_platform_driver(arm_smmu_driver); MODULE_DESCRIPTION("IOMMU API for ARM architected SMMUv3 implementations"); MODULE_AUTHOR("Will Deacon "); +MODULE_ALIAS("platform:arm-smmu-v3"); MODULE_LICENSE("GPL v2"); diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index d55acc48aee3..db5106b0955b 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -2292,4 +2292,5 @@ module_platform_driver(arm_smmu_driver); MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations"); MODULE_AUTHOR("Will Deacon "); +MODULE_ALIAS("platform:arm-smmu"); MODULE_LICENSE("GPL v2"); -- 2.20.1
[PATCH v3] iommu/iova: silence warnings under memory pressure
When running heavy memory pressure workloads, this 5+ year-old system is throwing the endless warnings below because disk IO is too slow to recover from swapping. Since the volume from alloc_iova_fast() could be large, once it calls printk(), it will trigger disk IO (writing to the log files) and pending softirqs, which could cause an infinite loop and make no progress for days under the ongoing memory reclaim. This is the counterpart for Intel; the AMD part has already been merged, see commit 3d708895325b ("iommu/amd: Silence warnings under memory pressure"). Since the allocation failure will be reported in intel_alloc_iova(), just call printk_ratelimited() there and silence the one in alloc_iova_mem() to avoid the expensive warn_alloc().
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
slab_out_of_memory: 66 callbacks suppressed
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0
node 0: slabs: 1822, objs: 16398, free: 0
node 1: slabs: 2051, objs: 18459, free: 31
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0
node 0: slabs: 1822, objs: 16398, free: 0
node 1: slabs: 2051, objs: 18459, free: 31
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
node 0: slabs: 697, objs: 4182, free: 0
node 0: slabs: 697, objs: 4182, free: 0
node 0: slabs: 697, objs: 4182, free: 0
node 0: slabs: 697, objs: 4182, free: 0
node 1: slabs: 381, objs: 2286, free: 27
node 1: slabs: 381, objs: 2286, free: 27
node 1: slabs: 381, objs: 2286, free: 27
node 1: slabs: 381, objs: 2286, free: 27
node 0: slabs: 1822, objs: 16398, free: 0
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
node 1: slabs: 2051, objs: 18459, free: 31
node 0: slabs: 697, objs: 4182, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
node 1: slabs: 381, objs: 2286, free: 27
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
node 0: slabs: 697, objs: 4182, free: 0
node 1: slabs: 381, objs: 2286, free: 27
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
warn_alloc: 96 callbacks suppressed
kworker/11:1H: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1
CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: GB
Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015
Workqueue: kblockd blk_mq_run_work_fn
Call Trace:
dump_stack+0xa0/0xea
warn_alloc.cold.94+0x8a/0x12d
__alloc_pages_slowpath+0x1750/0x1870
__alloc_pages_nodemask+0x58a/0x710
alloc_pages_current+0x9c/0x110
alloc_slab_page+0xc9/0x760
allocate_slab+0x48f/0x5d0
new_slab+0x46/0x70
___slab_alloc+0x4ab/0x7b0
__slab_alloc+0x43/0x70
kmem_cache_alloc+0x2dd/0x450
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
alloc_iova+0x33/0x210
cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
node 0: slabs: 697, objs: 4182, free: 0
alloc_iova_fast+0x62/0x3d1
node 1: slabs: 381, objs: 2286, free: 27
intel_alloc_iova+0xce/0xe0
intel_map_sg+0xed/0x410
scsi_dma_map+0xd7/0x160
scsi_queue_rq+0xbf7/0x1310
blk_mq_dispatch_rq_list+0x4d9/0xbc0
blk_mq_sched_dispatch_requests+0x24a/0x300
__blk_mq_run_hw_queue+0x156/0x230
blk_mq_run_work_fn+0x3b/0x40
process_one_work+0x579/0xb90
worker_thread+0x63/0x5b0
kthread+0x1e6/0x210
ret_from_fork+0x3a/0x50
Mem-Info:
active_anon:2422723 inactive_anon:361971 isolated_anon:34403
active_file:2285 inactive_file:1838 isolated_file:0
Re: [PATCH v2] iommu/iova: silence warnings under memory pressure
On Fri, 2019-11-22 at 08:28 -0800, Joe Perches wrote: > On Fri, 2019-11-22 at 09:59 -0500, Qian Cai wrote: > > On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote: > > > On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote: > > > > When running heavy memory pressure workloads, this 5+ old system is > > > > throwing endless warnings below because disk IO is too slow to recover > > > > from swapping. Since the volume from alloc_iova_fast() could be large, > > > > once it calls printk(), it will trigger disk IO (writing to the log > > > > files) and pending softirqs which could cause an infinite loop and make > > > > no progress for days by the ongoimng memory reclaim. This is the counter > > > > part for Intel where the AMD part has already been merged. See the > > > > commit 3d708895325b ("iommu/amd: Silence warnings under memory > > > > pressure"). Since the allocation failure will be reported in > > > > intel_alloc_iova(), so just call printk_ratelimted() there and silence > > > > the one in alloc_iova_mem() to avoid the expensive warn_alloc(). > > > > > > [] > > > > v2: use dev_err_ratelimited() and improve the commit messages. > > > > > > [] > > > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > > > > > > [] > > > > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct > > > > device *dev, > > > > iova_pfn = alloc_iova_fast(&domain->iovad, nrpages, > > > >IOVA_PFN(dma_mask), true); > > > > if (unlikely(!iova_pfn)) { > > > > - dev_err(dev, "Allocating %ld-page iova failed", > > > > nrpages); > > > > + dev_err_ratelimited(dev, "Allocating %ld-page iova > > > > failed", > > > > + nrpages); > > > > > > Trivia: > > > > > > This should really have a \n termination on the format string > > > > > > dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n", > > > > > > > > > > Why do you say so? It is right now printing with a newline added anyway. 
> > > > hpsa :03:00.0: DMAR: Allocating 1-page iova failed > > If another process uses pr_cont at the same time, > it can be interleaved. I lean towards fixing that in a separate patch if ever needed, as the original dev_err() has no "\n" enclosed either.
Re: [PATCH v2] iommu/iova: silence warnings under memory pressure
On Fri, 2019-11-22 at 09:59 -0500, Qian Cai wrote: > On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote: > > On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote: > > > When running heavy memory pressure workloads, this 5+ old system is > > > throwing endless warnings below because disk IO is too slow to recover > > > from swapping. Since the volume from alloc_iova_fast() could be large, > > > once it calls printk(), it will trigger disk IO (writing to the log > > > files) and pending softirqs which could cause an infinite loop and make > > > no progress for days by the ongoimng memory reclaim. This is the counter > > > part for Intel where the AMD part has already been merged. See the > > > commit 3d708895325b ("iommu/amd: Silence warnings under memory > > > pressure"). Since the allocation failure will be reported in > > > intel_alloc_iova(), so just call printk_ratelimted() there and silence > > > the one in alloc_iova_mem() to avoid the expensive warn_alloc(). > > > > [] > > > v2: use dev_err_ratelimited() and improve the commit messages. > > > > [] > > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > > > > [] > > > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device > > > *dev, > > > iova_pfn = alloc_iova_fast(&domain->iovad, nrpages, > > > IOVA_PFN(dma_mask), true); > > > if (unlikely(!iova_pfn)) { > > > - dev_err(dev, "Allocating %ld-page iova failed", nrpages); > > > + dev_err_ratelimited(dev, "Allocating %ld-page iova failed", > > > + nrpages); > > > > Trivia: > > > > This should really have a \n termination on the format string > > > > dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n", > > > > > > Why do you say so? It is right now printing with a newline added anyway. > > hpsa :03:00.0: DMAR: Allocating 1-page iova failed If another process uses pr_cont at the same time, it can be interleaved. 
[PATCH v9 1/4] uacce: Add documents for uacce
From: Kenneth Lee Uacce (Unified/User-space-access-intended Accelerator Framework) is a kernel module targets to provide Shared Virtual Addressing (SVA) between the accelerator and process. This patch add document to explain how it works. Signed-off-by: Kenneth Lee Signed-off-by: Zaibo Xu Signed-off-by: Zhou Wang Signed-off-by: Zhangfei Gao --- Documentation/misc-devices/uacce.rst | 176 +++ 1 file changed, 176 insertions(+) create mode 100644 Documentation/misc-devices/uacce.rst diff --git a/Documentation/misc-devices/uacce.rst b/Documentation/misc-devices/uacce.rst new file mode 100644 index 000..1db412e --- /dev/null +++ b/Documentation/misc-devices/uacce.rst @@ -0,0 +1,176 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Introduction of Uacce +- + +Uacce (Unified/User-space-access-intended Accelerator Framework) targets to +provide Shared Virtual Addressing (SVA) between accelerators and processes. +So accelerator can access any data structure of the main cpu. +This differs from the data sharing between cpu and io device, which share +only data content rather than address. +Because of the unified address, hardware and user space of process can +share the same virtual address in the communication. +Uacce takes the hardware accelerator as a heterogeneous processor, while +IOMMU share the same CPU page tables and as a result the same translation +from va to pa. + +:: + + __ __ +| | | | +| User application (CPU) | | Hardware Accelerator | +|__| |__| + + | | + | va | va + V V + ____ +| | | | +| MMU| | IOMMU | +|__| |__| + | | + | | + V pa V pa + ___ +| | +| Memory | +|___| + + + +Architecture + + +Uacce is the kernel module, taking charge of iommu and address sharing. +The user drivers and libraries are called WarpDrive. + +The uacce device, built around the IOMMU SVA API, can access multiple +address spaces, including the one without PASID. + +A virtual concept, queue, is used for the communication. It provides a +FIFO-like interface. 
And it maintains a unified address space between the +application and all involved hardware. + +:: + + ___ +| | user API | | +| WarpDrive library | > | user driver | +|___| || + || + || + | queue fd | + || + || + v| + ___ _| +| | | | | mmap memory +| Other framework | | uacce | | r/w interface +| crypto/nic/others | |_| | +|___| | + | || + | register | register | + | || + | || + |_ __ | + | | | | | | + - | Device Driver | | IOMMU | | + |_| |__| | + || +
Re: [PATCH] of: property: Add device link support for "iommu-map"
On Fri, 22 Nov 2019 at 17:01, Rob Herring wrote: > > On Fri, Nov 22, 2019 at 8:55 AM Will Deacon wrote: > > > > [+Ard] > > > > Hi Rob, > > > > On Fri, Nov 22, 2019 at 08:47:46AM -0600, Rob Herring wrote: > > > On Wed, Nov 20, 2019 at 1:00 PM Will Deacon wrote: > > > > > > > > Commit 8e12257dead7 ("of: property: Add device link support for iommus, > > > > mboxes and io-channels") added device link support for IOMMU linkages > > > > described using the "iommus" property. For PCI devices, this property > > > > is not present and instead the "iommu-map" property is used on the host > > > > bridge node to map the endpoint RequesterIDs to their corresponding > > > > IOMMU instance. > > > > > > > > Add support for "iommu-map" to the device link supplier bindings so that > > > > probing of PCI devices can be deferred until after the IOMMU is > > > > available. > > > > > > > > Cc: Greg Kroah-Hartman > > > > Cc: Rob Herring > > > > Cc: Saravana Kannan > > > > Cc: Robin Murphy > > > > Signed-off-by: Will Deacon > > > > --- > > > > > > > > Applies against driver-core/driver-core-next. > > > > Tested on AMD Seattle (arm64). > > > > > > Guess that answers my question whether anyone uses Seattle with DT. > > > Seattle uses the old SMMU binding, and there's not even an IOMMU > > > associated with the PCI host. I raise this mainly because the dts > > > files for Seattle either need some love or perhaps should be removed. > > > > I'm using the new DT bindings on my Seattle, thanks to the firmware fairy > > (Ard) visiting my flat with a dediprog. The patches I've posted to enable > > modular builds of the arm-smmu driver require that the old binding is > > disabled [1]. > > Going to post those dts changes? > Last time I tried upstreaming seattle DT changes I got zero response, so I didn't bother since. > > > No issues with the patch itself though. I'll queue it after rc1. 
> > > > Thanks, although I think Greg has already queued it [2] due to the > > dependencies on other patches in his tree. > > Okay, forgot to check my spam from Greg folder and missed that. > > Rob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
On 20/11/2019 3:11 pm, Will Deacon wrote:
> On Mon, Nov 04, 2019 at 04:27:56PM -0700, Jordan Crouse wrote:
>> On Mon, Nov 04, 2019 at 07:14:45PM +0000, Will Deacon wrote:
>>> On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote:
>>>> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
>>>> index 9a57eb6c253c..059be7e21030 100644
>>>> --- a/drivers/iommu/qcom_iommu.c
>>>> +++ b/drivers/iommu/qcom_iommu.c
>>>> @@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>>>>  	iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
>>>>  		     pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
>>>>  		     FIELD_PREP(TTBRn_ASID, ctx->asid));
>>>> -	iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
>>>> -		     FIELD_PREP(TTBRn_ASID, ctx->asid));
>>>> +	iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);
>>>
>>> Are you sure it's safe to drop the ASID here? Just want to make sure there
>>> wasn't some "quirk" this was helping with.
>>
>> I was reminded of this recently. Some of our SMMU guys told me that a 0x0 in
>> TTBR1 could cause a S2 fault if a faulty transaction caused a ttbr1 lookup,
>> so the "quirk" was writing the ASID so the register wasn't zero. I'm not
>> sure if this is a vendor-specific blip or not.
>
> You should be able to set EPD1 to prevent walks via TTBR1 in that case,
> though. Sticking the ASID in there is still dodgy if EPD1 is clear and TTBR1
> points at junk (or even physical address 0x0). That's probably something
> which should be folded into this patch.

Note that EPD1 was being set by io-pgtable-arm before this patch, and remains set by virtue of arm_smmu_lpae_tcr() afterwards, so presumably the brokenness might run a bit deeper than that. Either way, though, I'm somewhat dubious, since the ASID could well be 0 anyway :/

Robin.

___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v9 3/4] crypto: hisilicon - Remove module_param uacce_mode
Remove the module_param uacce_mode, which is not used currently. Signed-off-by: Zhangfei Gao Signed-off-by: Zhou Wang --- drivers/crypto/hisilicon/zip/zip_main.c | 31 ++- 1 file changed, 6 insertions(+), 25 deletions(-) diff --git a/drivers/crypto/hisilicon/zip/zip_main.c b/drivers/crypto/hisilicon/zip/zip_main.c index 1b2ee96..3de9412 100644 --- a/drivers/crypto/hisilicon/zip/zip_main.c +++ b/drivers/crypto/hisilicon/zip/zip_main.c @@ -264,9 +264,6 @@ static u32 pf_q_num = HZIP_PF_DEF_Q_NUM; module_param_cb(pf_q_num, &pf_q_num_ops, &pf_q_num, 0444); MODULE_PARM_DESC(pf_q_num, "Number of queues in PF(v1 1-4096, v2 1-1024)"); -static int uacce_mode; -module_param(uacce_mode, int, 0); - static const struct pci_device_id hisi_zip_dev_ids[] = { { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_ZIP_PF) }, { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_ZIP_VF) }, @@ -669,6 +666,7 @@ static int hisi_zip_probe(struct pci_dev *pdev, const struct pci_device_id *id) pci_set_drvdata(pdev, hisi_zip); qm = &hisi_zip->qm; + qm->use_dma_api = true; qm->pdev = pdev; qm->ver = rev_id; @@ -676,20 +674,6 @@ static int hisi_zip_probe(struct pci_dev *pdev, const struct pci_device_id *id) qm->dev_name = hisi_zip_name; qm->fun_type = (pdev->device == PCI_DEVICE_ID_ZIP_PF) ? 
QM_HW_PF : QM_HW_VF; - switch (uacce_mode) { - case 0: - qm->use_dma_api = true; - break; - case 1: - qm->use_dma_api = false; - break; - case 2: - qm->use_dma_api = true; - break; - default: - return -EINVAL; - } - ret = hisi_qm_init(qm); if (ret) { dev_err(&pdev->dev, "Failed to init qm!\n"); @@ -976,12 +960,10 @@ static int __init hisi_zip_init(void) goto err_pci; } - if (uacce_mode == 0 || uacce_mode == 2) { - ret = hisi_zip_register_to_crypto(); - if (ret < 0) { - pr_err("Failed to register driver to crypto.\n"); - goto err_crypto; - } + ret = hisi_zip_register_to_crypto(); + if (ret < 0) { + pr_err("Failed to register driver to crypto.\n"); + goto err_crypto; } return 0; @@ -996,8 +978,7 @@ static int __init hisi_zip_init(void) static void __exit hisi_zip_exit(void) { - if (uacce_mode == 0 || uacce_mode == 2) - hisi_zip_unregister_from_crypto(); + hisi_zip_unregister_from_crypto(); pci_unregister_driver(&hisi_zip_pci_driver); hisi_zip_unregister_debugfs(); } -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v9 4/4] crypto: hisilicon - register zip engine to uacce
Register qm to uacce framework for user crypto driver Signed-off-by: Zhangfei Gao Signed-off-by: Zhou Wang --- drivers/crypto/hisilicon/qm.c | 234 +++- drivers/crypto/hisilicon/qm.h | 11 ++ drivers/crypto/hisilicon/zip/zip_main.c | 16 ++- include/uapi/misc/uacce/hisi_qm.h | 23 4 files changed, 277 insertions(+), 7 deletions(-) create mode 100644 include/uapi/misc/uacce/hisi_qm.h diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c index a8ed6990..7d23daa 100644 --- a/drivers/crypto/hisilicon/qm.c +++ b/drivers/crypto/hisilicon/qm.c @@ -9,6 +9,9 @@ #include #include #include +#include +#include +#include #include "qm.h" /* eq/aeq irq enable */ @@ -467,6 +470,11 @@ static void qm_poll_qp(struct hisi_qp *qp, struct hisi_qm *qm) { struct qm_cqe *cqe = qp->cqe + qp->qp_status.cq_head; + if (qp->event_cb) { + qp->event_cb(qp); + return; + } + if (qp->req_cb) { while (QM_CQE_PHASE(cqe) == qp->qp_status.cqc_phase) { dma_rmb(); @@ -1271,7 +1279,7 @@ static int qm_qp_ctx_cfg(struct hisi_qp *qp, int qp_id, int pasid) * @qp: The qp we want to start to run. * @arg: Accelerator specific argument. * - * After this function, qp can receive request from user. Return qp_id if + * After this function, qp can receive request from user. Return 0 if * successful, Return -EBUSY if failed. 
*/ int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg) @@ -1316,7 +1324,7 @@ int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg) dev_dbg(dev, "queue %d started\n", qp_id); - return qp_id; + return 0; } EXPORT_SYMBOL_GPL(hisi_qm_start_qp); @@ -1397,6 +1405,213 @@ static void hisi_qm_cache_wb(struct hisi_qm *qm) } } +static void qm_qp_event_notifier(struct hisi_qp *qp) +{ + wake_up_interruptible(&qp->uacce_q->wait); +} + +static int hisi_qm_get_available_instances(struct uacce_device *uacce) +{ + int i, ret; + struct hisi_qm *qm = uacce->priv; + + read_lock(&qm->qps_lock); + for (i = 0, ret = 0; i < qm->qp_num; i++) + if (!qm->qp_array[i]) + ret++; + read_unlock(&qm->qps_lock); + + return ret; +} + +static int hisi_qm_uacce_get_queue(struct uacce_device *uacce, + unsigned long arg, + struct uacce_queue *q) +{ + struct hisi_qm *qm = uacce->priv; + struct hisi_qp *qp; + u8 alg_type = 0; + + qp = hisi_qm_create_qp(qm, alg_type); + if (IS_ERR(qp)) + return PTR_ERR(qp); + + q->priv = qp; + q->uacce = uacce; + qp->uacce_q = q; + qp->event_cb = qm_qp_event_notifier; + qp->pasid = arg; + + return 0; +} + +static void hisi_qm_uacce_put_queue(struct uacce_queue *q) +{ + struct hisi_qp *qp = q->priv; + + hisi_qm_cache_wb(qp->qm); + hisi_qm_release_qp(qp); +} + +/* map sq/cq/doorbell to user space */ +static int hisi_qm_uacce_mmap(struct uacce_queue *q, + struct vm_area_struct *vma, + struct uacce_qfile_region *qfr) +{ + struct hisi_qp *qp = q->priv; + struct hisi_qm *qm = qp->qm; + size_t sz = vma->vm_end - vma->vm_start; + struct pci_dev *pdev = qm->pdev; + struct device *dev = &pdev->dev; + unsigned long vm_pgoff; + int ret; + + switch (qfr->type) { + case UACCE_QFRT_MMIO: + if (qm->ver == QM_HW_V2) { + if (sz > PAGE_SIZE * (QM_DOORBELL_PAGE_NR + + QM_DOORBELL_SQ_CQ_BASE_V2 / PAGE_SIZE)) + return -EINVAL; + } else { + if (sz > PAGE_SIZE * QM_DOORBELL_PAGE_NR) + return -EINVAL; + } + + vma->vm_flags |= VM_IO; + + return remap_pfn_range(vma, vma->vm_start, + 
qm->phys_base >> PAGE_SHIFT, + sz, pgprot_noncached(vma->vm_page_prot)); + case UACCE_QFRT_DUS: + if (sz != qp->qdma.size) + return -EINVAL; + + /* +* dma_mmap_coherent() requires vm_pgoff as 0 +* restore vm_pfoff to initial value for mmap() +*/ + vm_pgoff = vma->vm_pgoff; + vma->vm_pgoff = 0; + ret = dma_mmap_coherent(dev, vma, qp->qdma.va, + qp->qdma.dma, sz); + vma->vm_pgoff = vm_pgoff; + return ret; + + default: + return -EINVAL; + } +} + +static int hisi_qm_uacce_start_queue(struct uacce_queue *q) +{ + struct hisi_qp *qp = q->priv; + + return hisi_qm_start_qp(qp, qp->pa
Re: [PATCH v2 6/8] iommu/arm-smmu-v3: Add second level of context descriptor table
On Mon, Nov 11, 2019 at 03:50:07PM +, Jonathan Cameron wrote: > > + cfg->l1ptr = dmam_alloc_coherent(smmu->dev, size, > > +&cfg->l1ptr_dma, > > +GFP_KERNEL | __GFP_ZERO); > > As before. Fairly sure __GFP_ZERO doesn't give you anything extra. Indeed > > + if (!cfg->l1ptr) { > > + dev_warn(smmu->dev, "failed to allocate L1 context > > table\n"); > > + return -ENOMEM; > > + } > > + } > > + > > + cfg->tables = devm_kzalloc(smmu->dev, sizeof(struct arm_smmu_cd_table) * > > + cfg->num_tables, GFP_KERNEL); > > + if (!cfg->tables) { > > + ret = -ENOMEM; > > + goto err_free_l1; > > + } > > + > > + /* With two levels, leaf tables are allocated lazily */ > This comment is a kind of odd one. It is actually talking about what > 'doesn't' happen here I think.. > > Perhaps /* > * Only allocate a leaf table for linear case. > * With two levels, the leaf tables are allocated lazily. >*/ Yes, that's clearer > > + if (!cfg->l1ptr) { > > + ret = arm_smmu_alloc_cd_leaf_table(smmu, &cfg->tables[0], > > + max_contexts); > > + if (ret) > > + goto err_free_tables; > > + } > > + > > + return 0; > > + > > +err_free_tables: > > + devm_kfree(smmu->dev, cfg->tables); > > +err_free_l1: > > + if (cfg->l1ptr) > > + dmam_free_coherent(smmu->dev, size, cfg->l1ptr, cfg->l1ptr_dma); > > This cleanup only occurs if we have had an error. > Is there potential for this to rerun at some point later? If so we should > be careful to also reset relevant pointers - e.g. cfg->l1ptr = NULL as > they are used to control the flow above. Yes we should definitely clear l1ptr. The domain may be managed by a device driver, and if attach_dev() fails they will call domain_free(), which checks this pointer. Plus nothing prevents them from calling attach_dev() again with the same domain. > If there is no chance of a rerun why bother cleaning them up at all? > Something > has gone horribly wrong so let the eventual smmu cleanup deal with them. The domain is much shorter-lived than the SMMU device, so we need this cleanup. 
> > + return ret; > > } > > > > static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain) > > { > > + int i; > > struct arm_smmu_device *smmu = smmu_domain->smmu; > > struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg; > > + size_t num_leaf_entries = 1 << cfg->s1cdmax; > > + struct arm_smmu_cd_table *table = cfg->tables; > > > > - arm_smmu_free_cd_leaf_table(smmu, &cfg->table, 1 << cfg->s1cdmax); > > + if (cfg->l1ptr) { > > + size_t size = cfg->num_tables * (CTXDESC_L1_DESC_DWORDS << 3); > > + > > + dmam_free_coherent(smmu->dev, size, cfg->l1ptr, cfg->l1ptr_dma); > > As above, if we can call this in a fashion that makes sense > other than in eventual smmu tear down, then we need to be > careful to reset the pointers. If not, then why are we > clearing > managed resourced by hand anyway? Yes, we call this on the error cleanup path (not only domain_free()), so it needs to leave the domain in a usable state. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 8/8] iommu/arm-smmu-v3: Add support for PCI PASID
Hi Jonathan, On Mon, Nov 11, 2019 at 04:05:29PM +, Jonathan Cameron wrote: > On Fri, 8 Nov 2019 16:25:08 +0100 > Jean-Philippe Brucker wrote: > > > Enable PASID for PCI devices that support it. Since the SSID tables are > > allocated by arm_smmu_attach_dev(), PASID has to be enabled early enough. > > arm_smmu_dev_feature_enable() would be too late, since by that time the > > main DMA domain has already been attached. Do it in add_device() instead. > > > > Signed-off-by: Jean-Philippe Brucker > Seems straightforward. > > Reviewed-by: Jonathan Cameron > > Thanks for working on this stuff. I hope we an move to get the rest of the > SVA elements lined up behind it so everything moves quickly in the next > cycle (or two). Thanks a lot for the thorough review. I'm aiming for v5.6 for the PASID series, and then realistically v5.7 for the rest of SVA, but I'll try to send it sooner. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 4/8] iommu/arm-smmu-v3: Prepare for SSID support
On Mon, Nov 11, 2019 at 02:38:11PM +, Jonathan Cameron wrote: > Hmm. There are several different refactors in here alongside a few new > bits. Would be nice to break it up more to make life even easier for > reviewers. It's not 'so' complex that it's really a problem though > so could leave it as is if you really want to. Sure, I'll see if I can split it more in next version. > > + table->ptr = dmam_alloc_coherent(smmu->dev, size, &table->ptr_dma, > > +GFP_KERNEL | __GFP_ZERO); > > We dropped dma_zalloc_coherent because we now zero in dma_alloc_coherent > anyway. Hence I'm fairly sure that __GFP_ZERO should have no effect. > > https://lore.kernel.org/patchwork/patch/1031536/ > > Am I missing some special corner case here? Here I just copied the GFP flags already in use. But removing all __GFP_ZERO from the driver would make a good cleanup patch. > > - if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) > > - arm_smmu_write_ctx_desc(smmu, &smmu_domain->s1_cfg); > > - > > Whilst it seems fine, perhaps a note on the 'why' of moving this into > finalise_s1 would be good in the patch description. Ok. Since it's only to simplify the handling of allocation failure in a subsequent patch, I think I'll move that part over there. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v9 2/4] uacce: add uacce driver
From: Kenneth Lee

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to provide Shared Virtual Addressing (SVA) between accelerators and processes, so an accelerator can access any data structure of the main CPU. This differs from data sharing between the CPU and an IO device, which shares only data content rather than addresses. Thanks to the unified address space, hardware and the process's user space can share the same virtual addresses for communication.

Uacce creates a chrdev for every registration; a queue is allocated to the process when the chrdev is opened. The process can then access the hardware resources by interacting with the queue file. By mmapping the queue file space into user space, the process can put requests directly to the hardware without a syscall into kernel space.

The IOMMU core only tracks mm<->device bonds at the moment, because it only needs to handle IOTLB invalidation and PASID table entries. However uacce needs a finer granularity, since multiple queues from the same device can be bound to an mm. When the mm exits, all bound queues must be stopped so that the IOMMU can safely clear the PASID table entry and reallocate the PASID. An intermediate struct uacce_mm links uacce devices and queues. Note that an mm may be bound to multiple devices, but a uacce_mm structure only ever belongs to a single device, because we don't need anything more complex (if multiple devices are bound to one mm, then we'll create one uacce_mm for each bond).
uacce_device --+-- uacce_mm --+-- uacce_queue
               |              '-- uacce_queue
               |
               '-- uacce_mm --+-- uacce_queue
                              +-- uacce_queue
                              '-- uacce_queue

Signed-off-by: Kenneth Lee
Signed-off-by: Zaibo Xu
Signed-off-by: Zhou Wang
Signed-off-by: Jean-Philippe Brucker
Signed-off-by: Zhangfei Gao
---
 Documentation/ABI/testing/sysfs-driver-uacce |  37 ++
 drivers/misc/Kconfig                         |   1 +
 drivers/misc/Makefile                        |   1 +
 drivers/misc/uacce/Kconfig                   |  13 +
 drivers/misc/uacce/Makefile                  |   2 +
 drivers/misc/uacce/uacce.c                   | 627 +++
 include/linux/uacce.h                        | 161 +++
 include/uapi/misc/uacce/uacce.h              |  38 ++
 8 files changed, 880 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-uacce
 create mode 100644 drivers/misc/uacce/Kconfig
 create mode 100644 drivers/misc/uacce/Makefile
 create mode 100644 drivers/misc/uacce/uacce.c
 create mode 100644 include/linux/uacce.h
 create mode 100644 include/uapi/misc/uacce/uacce.h

diff --git a/Documentation/ABI/testing/sysfs-driver-uacce b/Documentation/ABI/testing/sysfs-driver-uacce
new file mode 100644
index 000..0fc6c957
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-uacce
@@ -0,0 +1,37 @@
+What:           /sys/class/uacce//api
+Date:           Nov 2019
+KernelVersion:  5.5
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Api of the device, no requirement of the format
+                Application use the api to match the correct driver
+
+What:           /sys/class/uacce//flags
+Date:           Nov 2019
+KernelVersion:  5.5
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Attributes of the device, see UACCE_DEV_xxx flag defined in uacce.h
+
+What:           /sys/class/uacce//available_instances
+Date:           Nov 2019
+KernelVersion:  5.5
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Available instances left of the device
+                Return -ENODEV if uacce_ops get_available_instances is not provided
+
+What:           /sys/class/uacce//algorithms
+Date:           Nov 2019
+KernelVersion:  5.5
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Algorithms supported by this accelerator, separated by new line.
+
+What:           /sys/class/uacce//region_mmio_size
+Date:           Nov 2019
+KernelVersion:  5.5
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Size (bytes) of mmio region queue file
+
+What:           /sys/class/uacce//region_dus_size
+Date:           Nov 2019
+KernelVersion:  5.5
+Contact:        linux-accelerat...@lists.ozlabs.org
+Description:    Size (bytes) of dus region queue file
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index c55b637..929feb0 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -481,4 +481,5 @@ source "drivers/misc/cxl/Kconfig"
 source "drivers/misc/ocxl/Kconfig"
 source "drivers/misc/cardreader/Kconfig"
 source "drivers/misc/habanalabs/Kconfig"
+source "drivers/misc/uacce/Kconfig"
 endmenu
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index c1860d3..9abf292 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -56,4 +56,5 @@ obj-$(CONFIG_OCXL)	+= ocxl/
obj-y
[PATCH v9 0/4] Add uacce module for Accelerator
Uacce (Unified/User-space-access-intended Accelerator Framework) aims to provide Shared Virtual Addressing (SVA) between accelerators and processes, so an accelerator can access any data structure of the main CPU. This differs from data sharing between the CPU and an IO device, which shares data content rather than addresses. Because of the unified address space, hardware and the process's user space can share the same virtual addresses for communication.

Uacce is intended to be used with Jean-Philippe Brucker's SVA patchset [1], which enables IO-side page faults and PASID support. We have kept verifying with Jean's sva/current branch [2]. We also keep verifying with Eric's SMMUv3 nested-stage patches [3].

This series and the related zip & qm driver:
https://github.com/Linaro/linux-kernel-warpdrive/tree/5.4-rc4-uacce-v9

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-upstream-v9

References:
[1] http://jpbrucker.net/sva/
[2] http://www.linux-arm.org/git?p=linux-jpb.git;a=shortlog;h=refs/heads/sva/current
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:

v9: Suggested by Jonathan:
    1. Remove sysfs: numa_distance, node_id, id.
    2. Split the API to solve the potential race:
         struct uacce_device *uacce_alloc(struct device *parent, struct uacce_interface *interface)
         int uacce_register(struct uacce_device *uacce)
         void uacce_remove(struct uacce_device *uacce)
    3. Split cleanup patch 03.

v8: Address some comments from Jonathan.
    Merge Jean's patch, using uacce_mm instead of pid for sva_exit.

v7: As suggested by Jean and Jerome:
    Only consider the SVA case and remove unused DMA APIs for the first patch.
    Also add mm_exit for SVA and vm_ops.close etc.

v6: https://lkml.org/lkml/2019/10/16/231
    Change sysfs qfrs_size to a different file, suggested by Jonathan.
    Fix the crypto daily build issue and rebase onto the crypto code base, also 5.4-rc1.
v5: https://lkml.org/lkml/2019/10/14/74
    Add an example patch using the uacce interface, suggested by Greg:
    0003-crypto-hisilicon-register-zip-engine-to-uacce.patch

v4: https://lkml.org/lkml/2019/9/17/116
    Based on 5.4-rc1.
    Considering other drivers integrating uacce: if uacce is not compiled, uacce_register returns an error and uacce_unregister is empty.
    Simplify the uacce flag: UACCE_DEV_SVA.
    Address Greg's comments: fix the state machine, remove potential syslog triggered from user space, etc.

v3: https://lkml.org/lkml/2019/9/2/990
    Recommended by Greg: use struct uacce_device instead of struct uacce, and use struct *cdev in struct uacce_device; as a result, the cdev can be released by itself when its refcount drops to 0. So the two structures are decoupled and self-maintained by themselves. Also add dev.release for put_device.

v2: https://lkml.org/lkml/2019/8/28/565
    Address comments from Greg and Jonathan.
    Modify interface uacce_register.
    Drop noiommu mode first.

v1: https://lkml.org/lkml/2019/8/14/277
    1. Rebase to 5.3-rc1.
    2. Build on the iommu interface.
    3. Verify with Jean's SVA and Eric's nested-mode iommu.
    4. The user library has developed a lot: support for zlib, openssl, etc.
    5. Move to misc first.

RFC3: https://lkml.org/lkml/2018/11/12/1951
RFC2: https://lwn.net/Articles/763990/

Background of why Uacce: the Von Neumann processor is not good at general data manipulation. It is designed for control-bound rather than data-bound applications. The latter need less control-path facility and more/specific ALUs. So more and more heterogeneous processors, such as encryption/decryption accelerators, TPUs, or EDGE (Explicit Data Graph Execution) processors, are being introduced to gain better performance or power efficiency for particular applications these days.

There are generally two ways to make use of these heterogeneous processors: The first is to make them co-processors, just like the FPU. This is good for some applications, but it has its own cons: it changes the ISA set permanently.
You must save all state elements when the process is switched out, but most data-bound processors have a huge set of state elements. It makes the kernel scheduler more complex.

The second is the Accelerator: it is treated as an IO device from the CPU's point of view (though it need not be physically one). The process running on the CPU holds a context of the accelerator and sends instructions to it as if calling a function or running a thread that uses the FPU. The context is bound with the processor itself, so the state elements remain in the hardware context until the context is released. We believe this is the core feature of an "Accelerator" vs. a co-processor or other heterogeneous processors.

The intention of Uacce is to provide the basic facility to back this scenario. Its first step is to make sure the accelerator and process can share the same address space, so the accelerator ISA can directly address any data structure of the main CPU. This differs from data sharing between the CPU and an IO device, which share data content rather than addresses. So it is different from the other DMA libraries. In th
"Revisit iommu_insert_resv_region() implementation" causes use-after-free
Reading files under /sys/kernel/iommu_groups/ triggers a use-after-free. Reverting commit 4dbd258ff63e ("iommu: Revisit iommu_insert_resv_region() implementation") fixes the issue.

		/* no merge needed on elements of different types than @nr */
		if (iter->type != nr->type) {
			list_move_tail(&iter->list, &stack);
			continue;

[  160.156964][ T3100] BUG: KASAN: use-after-free in iommu_insert_resv_region+0x34b/0x520
[  160.197758][ T3100] Read of size 4 at addr 8887aba78464 by task cat/3100
[  160.230645][ T3100]
[  160.240907][ T3100] CPU: 14 PID: 3100 Comm: cat Not tainted 5.4.0-rc8-next-20191122+ #11
[  160.278671][ T3100] Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015
[  160.320589][ T3100] Call Trace:
[  160.335229][ T3100]  dump_stack+0xa0/0xea
[  160.354011][ T3100]  print_address_description.constprop.5.cold.7+0x9/0x384
[  160.386569][ T3100]  __kasan_report.cold.8+0x7a/0xc0
[  160.409811][ T3100]  ? iommu_insert_resv_region+0x34b/0x520
[  160.435668][ T3100]  kasan_report+0x12/0x20
[  160.455387][ T3100]  __asan_load4+0x95/0xa0
[  160.474808][ T3100]  iommu_insert_resv_region+0x34b/0x520
[  160.500228][ T3100]  ? iommu_bus_notifier+0xe0/0xe0
[  160.522904][ T3100]  ? intel_iommu_get_resv_regions+0x348/0x400
[  160.550461][ T3100]  iommu_get_group_resv_regions+0x16d/0x2f0
[  160.577611][ T3100]  ? iommu_insert_resv_region+0x520/0x520
[  160.603756][ T3100]  ? register_lock_class+0x940/0x940
[  160.628265][ T3100]  iommu_group_show_resv_regions+0x8d/0x1f0
[  160.655370][ T3100]  ? iommu_get_group_resv_regions+0x2f0/0x2f0
[  160.684168][ T3100]  iommu_group_attr_show+0x34/0x50
[  160.708395][ T3100]  sysfs_kf_seq_show+0x11c/0x220
[  160.731758][ T3100]  ? iommu_default_passthrough+0x20/0x20
[  160.756898][ T3100]  kernfs_seq_show+0xa4/0xb0
[  160.777097][ T3100]  seq_read+0x27e/0x710
[  160.795195][ T3100]  kernfs_fop_read+0x7d/0x2c0
[  160.815349][ T3100]  __vfs_read+0x50/0xa0
[  160.834154][ T3100]  vfs_read+0xcb/0x1e0
[  160.852332][ T3100]  ksys_read+0xc6/0x160
[  160.871028][ T3100]  ? kernel_write+0xc0/0xc0
[  160.891307][ T3100]  ? do_syscall_64+0x79/0xaec
[  160.912446][ T3100]  ? do_syscall_64+0x79/0xaec
[  160.933640][ T3100]  __x64_sys_read+0x43/0x50
[  160.953957][ T3100]  do_syscall_64+0xcc/0xaec
[  160.974322][ T3100]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  160.999130][ T3100]  ? syscall_return_slowpath+0x580/0x580
[  161.024753][ T3100]  ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
[  161.052416][ T3100]  ? trace_hardirqs_off_caller+0x3a/0x150
[  161.078400][ T3100]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[  161.103711][ T3100]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  161.130793][ T3100] RIP: 0033:0x7f33e0d89d75
[  161.150732][ T3100] Code: fe ff ff 50 48 8d 3d 4a dc 09 00 e8 25 0e 02 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 a5 59 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89
[  161.245503][ T3100] RSP: 002b:7fff88f0db88 EFLAGS: 0246 ORIG_RAX:
[  161.284547][ T3100] RAX: ffda RBX: 0002 RCX: 7f33e0d89d75
[  161.321123][ T3100] RDX: 0002 RSI: 7f33e1201000 RDI: 0003
[  161.357617][ T3100] RBP: 7f33e1201000 R08:  R09:
[  161.394173][ T3100] R10: 0022 R11: 0246 R12: 7f33e1201000
[  161.430736][ T3100] R13: 0003 R14: 0fff R15: 0002
[  161.467337][ T3100]
[  161.477529][ T3100] Allocated by task 3100:
[  161.497133][ T3100]  save_stack+0x21/0x90
[  161.515777][ T3100]  __kasan_kmalloc.constprop.13+0xc1/0xd0
[  161.541743][ T3100]  kasan_kmalloc+0x9/0x10
[  161.561330][ T3100]  kmem_cache_alloc_trace+0x1f8/0x470
[  161.585949][ T3100]  iommu_insert_resv_region+0xeb/0x520
[  161.610876][ T3100]  iommu_get_group_resv_regions+0x16d/0x2f0
[  161.638318][ T3100]  iommu_group_show_resv_regions+0x8d/0x1f0
[  161.665322][ T3100]  iommu_group_attr_show+0x34/0x50
[  161.688526][ T3100]  sysfs_kf_seq_show+0x11c/0x220
[  161.711992][ T3100]  kernfs_seq_show+0xa4/0xb0
[  161.734252][ T3100]  seq_read+0x27e/0x710
[  161.754412][ T3100]  kernfs_fop_read+0x7d/0x2c0
[  161.775493][ T3100]  __vfs_read+0x50/0xa0
[  161.794328][ T3100]  vfs_read+0xcb/0x1e0
[  161.812559][ T3100]  ksys_read+0xc6/0x160
[  161.831554][ T3100]  __x64_sys_read+0x43/0x50
[  161.851772][ T3100]  do_syscall_64+0xcc/0xaec
[  161.872098][ T3100]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  161.898919][ T3100]
[  161.909113][ T3100] Freed by task 3100:
[  161.927070][ T3100]  save_stack+0x21/0x90
[  161.945711][ T3100]  __kasan_slab_free+0x11c/0x170
[  161.968112][ T3100]  kasan_slab_free+0xe/0x10
[  161.988601][ T3100]  slab_free_freelist_hook+0x5f/0x1d0
[  162.012918][ T3100]  kfree+0xe9/0x410
[  162.029454][ T3100]  iommu_insert_resv_region+0x47d/0x520
[  162.053701][ T3100]  iommu_get_group_resv_regions+
RE: [RFC PATCH] usb: gadget: f_tcm: Added DMA32 flag while allocation of command buffer
> > +Konrad
> >
> You can run Linux with CONFIG_DMA_API_DEBUG and use
> debug_dma_dump_mappings() to dump and figure things out. See
> http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/dump/dump_dma.c;h=2ba251a2f8c36c24c68762b3e4c9f76ea178238f;hb=HEAD
>
> > Jayshri,
> >
> > On 15/11/2019 12:14, Jayshri Dajiram Pawar wrote:
> > > > > There is a problem when the function driver allocates memory for a
> > > > > buffer used by DMA from outside the dma_mask space.
> > > > > It appears during testing of the f_tcm driver with the cdns3 controller.
> > > > > As a result, the cdns3 driver was not able to map the virtual buffer for DMA.
> > > > > This fix should be improved depending on the dma_mask associated with the device.
> > > > > Adding the GFP_DMA32 flag while allocating the command data buffer only
> > > > > for 32-bit controllers.
> > > >
> > > > Hi Jayshri,
> > > >
> > > > This issue should be fixed by setting DMA_MASK correctly for the
> > > > controller; you can't limit the user's memory region. At
> > > > usb_ep_queue, the UDC driver will call the DMA map API; for Cadence,
> > > > it is usb_gadget_map_request_by_dev.
> > > > For a system without an SMMU (IO-MMU), it will use swiotlb to
> > > > make sure the data buffer used for the DMA transfer is within the DMA
> > > > mask for the controller. There is a reserved low memory region for the
> > > > bounce buffer in the swiotlb use case.
> > >
> > > /**
> > >  * struct usb_request - describes one i/o request
> > >  * @buf: Buffer used for data. Always provide this; some controllers
> > >  *       only use PIO, or don't use DMA for some endpoints.
> > > >  * @dma: DMA address corresponding to 'buf'. If you don't set this
> > > >  *       field, and the usb controller needs one, it is responsible
> > > >  *       for mapping and unmapping the buffer.
> > > >  */
> > > >
> > > > So if dma is not set in the usb_request, then the controller driver is
> > > > responsible for doing a dma_map of the buffer pointed to by 'buf' before it
> > > > attempts to do DMA.
> > > > This should take care of the DMA mask and swiotlb.
> > > >
> > > > This patch is not correct.
> > >
> > > Hi Roger,
> > >
> > > We have scatter-gather disabled.
> > > We are getting the below error during allocation of a command data buffer of
> > > length 524288 or greater, while writing large files to the device.
> > > This error occurred on an x86 platform.
> > > For this reason we added the DMA flag while allocating the buffer.
> > >
> > > [ 1602.977532] swiotlb_tbl_map_single: 26 callbacks suppressed
> > > [ 1602.977536] cdns-usb3 cdns-usb3.1: swiotlb buffer is full (sz:
> > > 524288 bytes), total 32768 (slots), used 0 (slots)
> >
> > Hi Roger,
> >
> > Why is the swiotlb buffer getting full? How much is it on your system?

On our system the swiotlb max mapping size is 256KB. The UASP receive-data state tries to queue and map a buffer of length 524288 (512KB), which is greater than 256KB; that's why the swiotlb buffer is getting full.

> > Are you sure that dma_unmap is happening on requests that complete?
> > else we'll just keep hogging the swiotlb buffer.

Yes, dma_unmap is happening on requests that complete.

I could map a buffer of length 512KB with IO_TLB_SEGSIZE changed to 256. With this, the max mapping size is increased to 256*2048 = 512KB.

+++ b/include/linux/swiotlb.h
@@ -21,7 +21,7 @@ enum swiotlb_force {
  * must be a power of 2. What is the appropriate value ?
  * The complexity of {map,unmap}_single is linearly dependent on this value.
*/ -#define IO_TLB_SEGSIZE 128 +#define IO_TLB_SEGSIZE 256 Regards, Jayshri > > > > cheers, > > -roger > > > > > [ 1602.977542] cdns-usb3 cdns-usb3.1: overflow > > > 0x0007eee0+524288 of DMA mask bus mask 0 [ > > > 1602.977555] WARNING: CPU: 6 PID: 285 at kernel/dma/direct.c:43 > > > report_addr+0x37/0x60 [ 1602.977556] Modules linked in: > target_core_user uio target_core_pscsi target_core_file target_core_iblock > usb_f_tcm(OE) target_core_mod cdns3(OE) cdns3_pci_wrap(OE) roles(E) > libcomposite(OE) udc_core(OE) xt_multiport iptable_filter bpfilter > snd_hda_codec_hdmi nls_iso8859_1 i915 intel_rapl x86_pkg_temp_thermal > intel_powerclamp coretemp kvm_intel kvm snd_hda_codec_realtek > snd_hda_codec_generic ledtrig_audio snd_hda_intel irqbypass > snd_hda_codec snd_hda_core snd_hwdep snd_pcm drm_kms_helper > snd_seq_midi snd_seq_midi_event crct10dif_pclmul snd_rawmidi > crc32_pclmul drm snd_seq ghash_clmulni_intel snd_seq_device aesni_intel > snd_timer mei_me i2c_algo_bit aes_x86_64 crypto_simd cryptd fb_sys_fops > glue_helper snd mei input_leds syscopyarea intel_cstate sysfillrect > intel_rapl_perf sysimgblt hp
Re: [PATCH v2] iommu/iova: silence warnings under memory pressure
On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > When running heavy memory pressure workloads, this 5+-year-old system is
> > throwing endless warnings below because disk IO is too slow to recover
> > from swapping. Since the volume from alloc_iova_fast() could be large,
> > once it calls printk(), it will trigger disk IO (writing to the log
> > files) and pending softirqs which could cause an infinite loop and make
> > no progress for days under the ongoing memory reclaim. This is the
> > counterpart for Intel, where the AMD part has already been merged. See
> > commit 3d708895325b ("iommu/amd: Silence warnings under memory
> > pressure"). Since the allocation failure will be reported in
> > intel_alloc_iova(), just call printk_ratelimited() there and silence
> > the one in alloc_iova_mem() to avoid the expensive warn_alloc().
> []
> > v2: use dev_err_ratelimited() and improve the commit messages.
> []
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> []
> > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
> >  	iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> >  				   IOVA_PFN(dma_mask), true);
> >  	if (unlikely(!iova_pfn)) {
> > -		dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> > +		dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> > +				    nrpages);
>
> Trivia:
>
> This should really have a \n termination on the format string
>
> 	dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",

Why do you say so? It is right now printing with a newline added anyway.
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] of: property: Add device link support for "iommu-map"
[+Ard] Hi Rob, On Fri, Nov 22, 2019 at 08:47:46AM -0600, Rob Herring wrote: > On Wed, Nov 20, 2019 at 1:00 PM Will Deacon wrote: > > > > Commit 8e12257dead7 ("of: property: Add device link support for iommus, > > mboxes and io-channels") added device link support for IOMMU linkages > > described using the "iommus" property. For PCI devices, this property > > is not present and instead the "iommu-map" property is used on the host > > bridge node to map the endpoint RequesterIDs to their corresponding > > IOMMU instance. > > > > Add support for "iommu-map" to the device link supplier bindings so that > > probing of PCI devices can be deferred until after the IOMMU is > > available. > > > > Cc: Greg Kroah-Hartman > > Cc: Rob Herring > > Cc: Saravana Kannan > > Cc: Robin Murphy > > Signed-off-by: Will Deacon > > --- > > > > Applies against driver-core/driver-core-next. > > Tested on AMD Seattle (arm64). > > Guess that answers my question whether anyone uses Seattle with DT. > Seattle uses the old SMMU binding, and there's not even an IOMMU > associated with the PCI host. I raise this mainly because the dts > files for Seattle either need some love or perhaps should be removed. I'm using the new DT bindings on my Seattle, thanks to the firmware fairy (Ard) visiting my flat with a dediprog. The patches I've posted to enable modular builds of the arm-smmu driver require that the old binding is disabled [1]. > No issues with the patch itself though. I'll queue it after rc1. Thanks, although I think Greg has already queued it [2] due to the dependencies on other patches in his tree. Will [1] https://lore.kernel.org/lkml/20191121114918.2293-14-w...@kernel.org/ [2] https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/commit/?h=driver-core-next&id=e149573b2f84d0517648dafc0db625afa681ed54 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] of: property: Add device link support for "iommu-map"
On Wed, Nov 20, 2019 at 1:00 PM Will Deacon wrote: > > Commit 8e12257dead7 ("of: property: Add device link support for iommus, > mboxes and io-channels") added device link support for IOMMU linkages > described using the "iommus" property. For PCI devices, this property > is not present and instead the "iommu-map" property is used on the host > bridge node to map the endpoint RequesterIDs to their corresponding > IOMMU instance. > > Add support for "iommu-map" to the device link supplier bindings so that > probing of PCI devices can be deferred until after the IOMMU is > available. > > Cc: Greg Kroah-Hartman > Cc: Rob Herring > Cc: Saravana Kannan > Cc: Robin Murphy > Signed-off-by: Will Deacon > --- > > Applies against driver-core/driver-core-next. > Tested on AMD Seattle (arm64). Guess that answers my question whether anyone uses Seattle with DT. Seattle uses the old SMMU binding, and there's not even an IOMMU associated with the PCI host. I raise this mainly because the dts files for Seattle either need some love or perhaps should be removed. No issues with the patch itself though. I'll queue it after rc1. Rob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
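For reference, the binding the patch adds device-link support for looks roughly like the following. This is a hypothetical host-bridge node (names, addresses, and the 1:1 RID-to-stream-ID mapping are made up for illustration); the `iommu-map` entries are `<rid-base &iommu iommu-base length>` tuples on the host bridge rather than a per-device `iommus` property:

```dts
/* Hypothetical example of the "iommu-map" property on a PCI host bridge.
 * All node names and addresses are illustrative only. */
pcie@40000000 {
	compatible = "pci-host-ecam-generic";
	/* ... reg, ranges, etc. ... */

	/* Map RIDs 0x0-0xffff 1:1 onto SMMU stream IDs */
	iommu-map = <0x0 &smmu 0x0 0x10000>;
	iommu-map-mask = <0xffff>;
};

smmu: iommu@50000000 {
	compatible = "arm,smmu-v3";
	#iommu-cells = <1>;
	/* ... */
};
```

The device-link code walks `iommu-map` on the host bridge to find the supplier (`&smmu` here), so PCI endpoint probing can be deferred until that IOMMU has probed.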
Re: [RFC 00/13] virtio-iommu on non-devicetree platforms
On Fri, Nov 22, 2019 at 11:49:47AM +0100, Jean-Philippe Brucker wrote: > I'm seeking feedback on multi-platform support for virtio-iommu. At the > moment only devicetree (DT) is supported and we don't have a pleasant > solution for other platforms. Once we figure out the topology > description, x86 support is trivial. > > Since the IOMMU manages memory accesses from other devices, the guest > kernel needs to initialize the IOMMU before endpoints start issuing DMA. > It's a solved problem: firmware or hypervisor describes through DT or > ACPI tables the device dependencies, and probe of endpoints is deferred > until the IOMMU is probed. But: > > (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and IORT > for Arm). From my point of view IORT is easier to extend, since we > just need to introduce a new node type. There are no dependencies to > Arm in the Linux IORT driver, so it works well with CONFIG_X86. > > However, there are concerns about other OS vendors feeling obligated > to implement this new node, so Arm proposed introducing another ACPI > table, that can wrap any of DMAR, IVRS and IORT to extend it with > new virtual nodes. A draft of this VIOT table specification is > available at http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf > > I'm afraid this could increase fragmentation as guests would need to > implement or modify their support for all of DMAR, IVRS and IORT. If > we end up doing VIOT, I suggest limiting it to IORT. > > (2) In addition, there are some concerns about having virtio depend on > ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86 > [1]) power? > don't currently implement those methods. > > It was suggested to embed the topology description into the device. > It can work, as demonstrated at the end of this RFC, with the > following limitations: > > - The topology description must be read before any endpoint managed > by the IOMMU is probed, and even before the virtio module is > loaded. 
This RFC uses a PCI quirk to manually parse the virtio > configuration. It assumes that all endpoints managed by the IOMMU > are under this same PCI host. > > - I don't have a solution for the virtio-mmio transport at the > moment, because I haven't had time to modify a host to test it. I > think it could either use a notifier on the platform bus, or > better, a new 'iommu' command-line argument to the virtio-mmio > driver. A notifier seems easier for users. What are the disadvantages of that? > So the current prototype doesn't work for firecracker and > microvm, which rely on virtio-mmio. > > - For Arm, if the platform has an ITS, the hypervisor needs IORT or > DT to describe it anyway. More generally, not using either ACPI or > DT might prevent from supporting other features as well. I suspect > the above users will have to implement a standard method sooner or > later. > > - Even when reusing as much existing code as possible, guest support > is still going to be around a few hundred lines since we can't > rely on the normal virtio infrastructure to be loaded at that > point. As you can see below, the diffstat for the incomplete > topology implementation is already bigger than the exhaustive IORT > support, even when jumping through the VIOT hoop. > > So it's a lightweight solution for very specific use-cases, and we > should still support ACPI for the general case. Multi-platform > guests such as Linux will then need to support three topology > descriptions instead of two. > > In this RFC I present both solutions, but I'd rather not keep all of it. > Please see the individual patches for details: > > (1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT > driver and patches 2, 11 add the VIOT glue. > > (2) Patch 12 adds the built-in topology description to the virtio-iommu > specification. Patch 13 is a partial implementation for the Linux > virtio-iommu driver. It only supports PCI, not platform devices. 
> > You can find Linux and QEMU code on my virtio-iommu/devel branches at > http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu > > > I split the diffstat since there are two independent features. The first > one is for patches 1-11, and the second one for patch 13. > > Jean-Philippe Brucker (11): > ACPI/IORT: Move IORT to the ACPI folder > ACPI: Add VIOT definitions > ACPI/IORT: Allow registration of external tables > ACPI/IORT: Add node categories > ACPI/IORT: Support VIOT virtio-mmio node > ACPI/IORT: Support VIOT virtio-pci node > ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode > ACPI/IORT: Add callback to update a device's fwnode > iommu/virtio: Create fwnode if necessary > iommu/virtio: Update IORT fwnode > ACPI: Add VIOT table
Re: [RFC 13/13] iommu/virtio: Add topology description to
On Fri, Nov 22, 2019 at 11:50:00AM +0100, Jean-Philippe Brucker wrote: > Some hypervisors don't implement either device-tree or ACPI, but still > need a method to describe the IOMMU topology. Read the virtio-iommu > config early and parse the topology description. Hook into the > dma_setup() callbacks to initialize the IOMMU before probing endpoints. > > If the virtio-iommu uses the virtio-pci transport, this will only work > if the PCI root complex is the first device probed. We don't currently > support virtio-mmio. > > Initially I tried to generate a fake IORT table and feed it to the IORT > driver, in order to avoid rewriting the whole DMA code, but it wouldn't > work with platform endpoints, which are references to items in the ACPI > table on IORT. > > Signed-off-by: Eric Auger > Signed-off-by: Jean-Philippe Brucker Overall this looks good to me. The only point is that I think the way the interface is designed makes writing the driver a bit too difficult. Idea: if instead we just have a length field and then an array of records (preferably unions so we don't need to work hard), we can shadow that into memory, then iterate over the unions. Maybe add a uniform record length + number of records field. Then just skip types you do not know how to handle. This will also help make sure it's within bounds. What do you think? You will need to do something to address the TODO I think. > --- > Note that we only call virt_dma_configure() if the host didn't provide > either DT or ACPI method. 
If you want to test this with QEMU, you'll > need to manually disable the acpi_dma_configure() part in pci-driver.c > --- > drivers/base/platform.c | 3 + > drivers/iommu/Kconfig | 9 + > drivers/iommu/Makefile| 1 + > drivers/iommu/virtio-iommu-topology.c | 410 ++ > drivers/iommu/virtio-iommu.c | 3 + > drivers/pci/pci-driver.c | 3 + > include/linux/virtio_iommu.h | 18 ++ > include/uapi/linux/virtio_iommu.h | 26 ++ > 8 files changed, 473 insertions(+) > create mode 100644 drivers/iommu/virtio-iommu-topology.c > create mode 100644 include/linux/virtio_iommu.h > > diff --git a/drivers/base/platform.c b/drivers/base/platform.c > index b230beb6ccb4..70b12c8ef2fb 100644 > --- a/drivers/base/platform.c > +++ b/drivers/base/platform.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include > > #include "base.h" > #include "power/power.h" > @@ -1257,6 +1258,8 @@ int platform_dma_configure(struct device *dev) > } else if (has_acpi_companion(dev)) { > attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode)); > ret = acpi_dma_configure(dev, attr); > + } else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) { > + ret = virt_dma_configure(dev); > } > > return ret; > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index e6eb4f238d1a..d02c0d36019d 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -486,4 +486,13 @@ config VIRTIO_IOMMU > > Say Y here if you intend to run this kernel as a guest. > > +config VIRTIO_IOMMU_TOPOLOGY > + bool "Topology properties for the virtio-iommu" > + depends on VIRTIO_IOMMU > + help > + Enable early probing of the virtio-iommu device, to detect the > + topology description. > + > + Say Y here if you intend to run this kernel as a guest. 
> + > endif # IOMMU_SUPPORT > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > index 4f405f926e73..6b51c4186ebc 100644 > --- a/drivers/iommu/Makefile > +++ b/drivers/iommu/Makefile > @@ -35,3 +35,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o > obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o > obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o > obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o > +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o > diff --git a/drivers/iommu/virtio-iommu-topology.c > b/drivers/iommu/virtio-iommu-topology.c > new file mode 100644 > index ..ec22510ace3d > --- /dev/null > +++ b/drivers/iommu/virtio-iommu-topology.c > @@ -0,0 +1,410 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct viommu_cap_config { > + u8 pos; /* PCI capability position */ > + u8 bar; > + u32 length; /* structure size */ > + u32 offset; /* structure offset within the bar */ > +}; > + > +struct viommu_spec { > + struct device *dev; /* transport device */ > + struct fwnode_handle*fwnode; > + struct iommu_ops*ops; > + struct list_headtopology; > + struct list_headlist; > +}; > + > +struct viommu_topology { > + union { > + struct virtio_iommu_topo_head head; > + struct virtio_iommu_topo_pci_range pci; > + st
[RFC 05/13] ACPI/IORT: Support VIOT virtio-mmio node
Add a new type of node to the IORT driver, that describes a virtio-iommu device based on the virtio-mmio transport. The node is only available when the IORT is a sub-table of the VIOT. Signed-off-by: Jean-Philippe Brucker --- drivers/acpi/iort.c | 66 ++--- 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c index 1d43fbc0001f..adc5953fffa5 100644 --- a/drivers/acpi/iort.c +++ b/drivers/acpi/iort.c @@ -43,7 +43,8 @@ static bool iort_type_matches(u8 type, enum iort_node_category category) switch (category) { case IORT_IOMMU_TYPE: return type == ACPI_IORT_NODE_SMMU || - type == ACPI_IORT_NODE_SMMU_V3; + type == ACPI_IORT_NODE_SMMU_V3 || + type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU; case IORT_MSI_TYPE: return type == ACPI_IORT_NODE_ITS_GROUP; default: @@ -868,8 +869,10 @@ static inline bool iort_iommu_driver_enabled(u8 type) return IS_BUILTIN(CONFIG_ARM_SMMU_V3); case ACPI_IORT_NODE_SMMU: return IS_BUILTIN(CONFIG_ARM_SMMU); + case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU: + return IS_ENABLED(CONFIG_VIRTIO_IOMMU); default: - pr_warn("IORT node type %u does not describe an SMMU\n", type); + pr_warn("IORT node type %u does not describe an IOMMU\n", type); return false; } } @@ -1408,6 +1411,46 @@ static int __init arm_smmu_v3_pmcg_add_platdata(struct platform_device *pdev) return platform_device_add_data(pdev, &model, sizeof(model)); } +static int __init viommu_mmio_count_resources(struct acpi_iort_node *node) +{ + /* Mem + IRQ */ + return 2; +} + +static void __init viommu_mmio_init_resources(struct resource *res, + struct acpi_iort_node *node) +{ + int hw_irq, trigger; + struct acpi_viot_iort_virtio_mmio_iommu *viommu; + + viommu = (struct acpi_viot_iort_virtio_mmio_iommu *)node->node_data; + + res[0].start = viommu->base_address; + res[0].end = viommu->base_address + viommu->span - 1; + res[0].flags = IORESOURCE_MEM; + + hw_irq = IORT_IRQ_MASK(viommu->interrupt); + trigger = IORT_IRQ_TRIGGER_MASK(viommu->interrupt); + 
acpi_iort_register_irq(hw_irq, "viommu", trigger, res + 1); +} + +static void __init viommu_mmio_dma_configure(struct device *dev, + struct acpi_iort_node *node) +{ + enum dev_dma_attr attr; + struct acpi_viot_iort_virtio_mmio_iommu *viommu; + + viommu = (struct acpi_viot_iort_virtio_mmio_iommu *)node->node_data; + + attr = (viommu->flags & ACPI_VIOT_IORT_VIRTIO_MMIO_IOMMU_CACHE_COHERENT) ? + DEV_DMA_COHERENT : DEV_DMA_NON_COHERENT; + + dev->dma_mask = &dev->coherent_dma_mask; + + /* Configure DMA for the page table walker */ + acpi_dma_configure(dev, attr); +} + struct iort_dev_config { const char *name; int (*dev_init)(struct acpi_iort_node *node); @@ -1443,6 +1486,14 @@ static const struct iort_dev_config iort_arm_smmu_v3_pmcg_cfg __initconst = { .dev_add_platdata = arm_smmu_v3_pmcg_add_platdata, }; +static const struct iort_dev_config iort_viommu_mmio_cfg __initconst = { + /* Probe with the generic virtio-mmio driver */ + .name = "virtio-mmio", + .dev_dma_configure = viommu_mmio_dma_configure, + .dev_count_resources = viommu_mmio_count_resources, + .dev_init_resources = viommu_mmio_init_resources, +}; + static __init const struct iort_dev_config *iort_get_dev_cfg( struct acpi_iort_node *node) { @@ -1453,9 +1504,16 @@ static __init const struct iort_dev_config *iort_get_dev_cfg( return &iort_arm_smmu_cfg; case ACPI_IORT_NODE_PMCG: return &iort_arm_smmu_v3_pmcg_cfg; - default: - return NULL; } + + if (iort_table_source == IORT_SOURCE_VIOT) { + switch (node->type) { + case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU: + return &iort_viommu_mmio_cfg; + } + } + + return NULL; } /** -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 07/13] ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
When the IOMMU is PCI-based, IORT doesn't know the fwnode until the driver has had a chance to register it. In addition to deferring the probe until the IOMMU ops are set, also defer the probe until the fwspec is available. Signed-off-by: Jean-Philippe Brucker --- drivers/acpi/iort.c | 54 ++--- 1 file changed, 31 insertions(+), 23 deletions(-) diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c index b517aa4e83ba..f08f72d8af78 100644 --- a/drivers/acpi/iort.c +++ b/drivers/acpi/iort.c @@ -61,6 +61,22 @@ static bool iort_type_matches(u8 type, enum iort_node_category category) } } +static inline bool iort_iommu_driver_enabled(u8 type) +{ + switch (type) { + case ACPI_IORT_NODE_SMMU_V3: + return IS_BUILTIN(CONFIG_ARM_SMMU_V3); + case ACPI_IORT_NODE_SMMU: + return IS_BUILTIN(CONFIG_ARM_SMMU); + case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU: + case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU: + return IS_ENABLED(CONFIG_VIRTIO_IOMMU); + default: + pr_warn("IORT node type %u does not describe an IOMMU\n", type); + return false; + } +} + /** * iort_set_fwnode() - Create iort_fwnode and use it to register *iommu data in the iort_fwnode_list @@ -102,9 +118,9 @@ static inline int iort_set_fwnode(struct acpi_iort_node *iort_node, * * Returns: fwnode_handle pointer on success, NULL on failure */ -static inline struct fwnode_handle *iort_get_fwnode( - struct acpi_iort_node *node) +static inline struct fwnode_handle *iort_get_fwnode(struct acpi_iort_node *node) { + int err = -ENODEV; struct iort_fwnode *curr; struct fwnode_handle *fwnode = NULL; @@ -112,12 +128,20 @@ static inline struct fwnode_handle *iort_get_fwnode( list_for_each_entry(curr, &iort_fwnode_list, list) { if (curr->iort_node == node) { fwnode = curr->fwnode; + if (!fwnode && curr->pci_devid) { + /* +* Postpone probe until virtio-iommu has +* registered its fwnode. +*/ + err = iort_iommu_driver_enabled(node->type) ? 
+ -EPROBE_DEFER : -ENODEV; + } break; } } spin_unlock(&iort_fwnode_lock); - return fwnode; + return fwnode ?: ERR_PTR(err); } /** @@ -874,22 +898,6 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) return (resv == its->its_count) ? resv : -ENODEV; } -static inline bool iort_iommu_driver_enabled(u8 type) -{ - switch (type) { - case ACPI_IORT_NODE_SMMU_V3: - return IS_BUILTIN(CONFIG_ARM_SMMU_V3); - case ACPI_IORT_NODE_SMMU: - return IS_BUILTIN(CONFIG_ARM_SMMU); - case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU: - case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU: - return IS_ENABLED(CONFIG_VIRTIO_IOMMU); - default: - pr_warn("IORT node type %u does not describe an IOMMU\n", type); - return false; - } -} - static int arm_smmu_iort_xlate(struct device *dev, u32 streamid, struct fwnode_handle *fwnode, const struct iommu_ops *ops) @@ -920,8 +928,8 @@ static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node, return -ENODEV; iort_fwnode = iort_get_fwnode(node); - if (!iort_fwnode) - return -ENODEV; + if (IS_ERR(iort_fwnode)) + return PTR_ERR(iort_fwnode); /* * If the ops look-up fails, this means that either @@ -1618,8 +1626,8 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node, fwnode = iort_get_fwnode(node); - if (!fwnode) { - ret = -ENODEV; + if (IS_ERR(fwnode)) { + ret = PTR_ERR(fwnode); goto dev_put; } -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 10/13] iommu/virtio: Update IORT fwnode
When the virtio-iommu uses the PCI transport and the topology is described with IORT, register the PCI fwnode with IORT. Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/virtio-iommu.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 8efa368134c0..9847552faecc 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -7,6 +7,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include #include #include #include @@ -989,6 +990,8 @@ static int viommu_set_fwnode(struct viommu_dev *viommu) set_primary_fwnode(dev, fwnode); } + /* Tell IORT about a PCI device's fwnode */ + iort_iommu_update_fwnode(dev, dev->fwnode); iommu_device_set_fwnode(&viommu->iommu, dev->fwnode); return 0; } @@ -1000,6 +1003,8 @@ static void viommu_clear_fwnode(struct viommu_dev *viommu) if (!dev->fwnode) return; + iort_iommu_update_fwnode(dev, NULL); + if (is_software_node(dev->fwnode)) { struct fwnode_handle *fwnode = dev->fwnode; -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 04/13] ACPI/IORT: Add node categories
The current node filtering won't work when introducing node types greater than 63 (such as the virtio-iommu nodes). Add node_type_matches() to filter nodes by category. Signed-off-by: Jean-Philippe Brucker --- drivers/acpi/iort.c | 34 -- 1 file changed, 24 insertions(+), 10 deletions(-) diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c index 9c6c91e06f8f..1d43fbc0001f 100644 --- a/drivers/acpi/iort.c +++ b/drivers/acpi/iort.c @@ -18,10 +18,10 @@ #include #include -#define IORT_TYPE_MASK(type) (1 << (type)) -#define IORT_MSI_TYPE (1 << ACPI_IORT_NODE_ITS_GROUP) -#define IORT_IOMMU_TYPE((1 << ACPI_IORT_NODE_SMMU) | \ - (1 << ACPI_IORT_NODE_SMMU_V3)) +enum iort_node_category { + IORT_MSI_TYPE, + IORT_IOMMU_TYPE, +}; struct iort_its_msi_chip { struct list_headlist; @@ -38,6 +38,20 @@ struct iort_fwnode { static LIST_HEAD(iort_fwnode_list); static DEFINE_SPINLOCK(iort_fwnode_lock); +static bool iort_type_matches(u8 type, enum iort_node_category category) +{ + switch (category) { + case IORT_IOMMU_TYPE: + return type == ACPI_IORT_NODE_SMMU || + type == ACPI_IORT_NODE_SMMU_V3; + case IORT_MSI_TYPE: + return type == ACPI_IORT_NODE_ITS_GROUP; + default: + WARN_ON(1); + return false; + } +} + /** * iort_set_fwnode() - Create iort_fwnode and use it to register *iommu data in the iort_fwnode_list @@ -397,7 +411,7 @@ static int iort_get_id_mapping_index(struct acpi_iort_node *node) static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, u32 id_in, u32 *id_out, - u8 type_mask) + enum iort_node_category category) { u32 id = id_in; @@ -406,7 +420,7 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, struct acpi_iort_id_mapping *map; int i, index; - if (IORT_TYPE_MASK(node->type) & type_mask) { + if (iort_type_matches(node->type, category)) { if (id_out) *id_out = id; return node; @@ -458,8 +472,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node, } static struct acpi_iort_node 
*iort_node_map_platform_id( - struct acpi_iort_node *node, u32 *id_out, u8 type_mask, - int index) + struct acpi_iort_node *node, u32 *id_out, + enum iort_node_category category, int index) { struct acpi_iort_node *parent; u32 id; @@ -475,8 +489,8 @@ static struct acpi_iort_node *iort_node_map_platform_id( * as NC (named component) -> SMMU -> ITS. If the type is matched, * return the initial dev id and its parent pointer directly. */ - if (!(IORT_TYPE_MASK(parent->type) & type_mask)) - parent = iort_node_map_id(parent, id, id_out, type_mask); + if (!iort_type_matches(parent->type, category)) + parent = iort_node_map_id(parent, id, id_out, category); else if (id_out) *id_out = id; -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 09/13] iommu/virtio: Create fwnode if necessary
The presence of a fwnode on a PCI device depends on the platform. QEMU q35, for example, creates an ACPI description for each PCI slot, but QEMU virt (aarch64) doesn't. Since the IOMMU subsystem relies heavily on fwnode to discover the DMA topology, create a fwnode for the virtio-iommu if necessary, using the software_node framework. Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/virtio-iommu.c | 56 1 file changed, 51 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 3ea9d7682999..8efa368134c0 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -966,6 +966,48 @@ static struct iommu_ops viommu_ops = { .of_xlate = viommu_of_xlate, }; +static int viommu_set_fwnode(struct viommu_dev *viommu) +{ + /* +* viommu->dev is the virtio device, its parent is the associated +* transport device. +*/ + struct device *dev = viommu->dev->parent; + + /* +* With device tree a fwnode is always present. With ACPI, on some +* platforms a PCI device has a DSDT node describing the slot. On other +* platforms, no fwnode is created and we have to do it ourselves. 
+*/ + if (!dev->fwnode) { + struct fwnode_handle *fwnode; + + fwnode = fwnode_create_software_node(NULL, NULL); + if (IS_ERR(fwnode)) + return PTR_ERR(fwnode); + + set_primary_fwnode(dev, fwnode); + } + + iommu_device_set_fwnode(&viommu->iommu, dev->fwnode); + return 0; +} + +static void viommu_clear_fwnode(struct viommu_dev *viommu) +{ + struct device *dev = viommu->dev->parent; + + if (!dev->fwnode) + return; + + if (is_software_node(dev->fwnode)) { + struct fwnode_handle *fwnode = dev->fwnode; + + set_primary_fwnode(dev, NULL); + fwnode_remove_software_node(fwnode); + } +} + static int viommu_init_vqs(struct viommu_dev *viommu) { struct virtio_device *vdev = dev_to_virtio(viommu->dev); @@ -1004,7 +1046,6 @@ static int viommu_fill_evtq(struct viommu_dev *viommu) static int viommu_probe(struct virtio_device *vdev) { - struct device *parent_dev = vdev->dev.parent; struct viommu_dev *viommu = NULL; struct device *dev = &vdev->dev; u64 input_start = 0; @@ -1084,9 +1125,11 @@ static int viommu_probe(struct virtio_device *vdev) if (ret) goto err_free_vqs; - iommu_device_set_ops(&viommu->iommu, &viommu_ops); - iommu_device_set_fwnode(&viommu->iommu, parent_dev->fwnode); + ret = viommu_set_fwnode(viommu); + if (ret) + goto err_sysfs_remove; + iommu_device_set_ops(&viommu->iommu, &viommu_ops); iommu_device_register(&viommu->iommu); #ifdef CONFIG_PCI @@ -1119,8 +1162,10 @@ static int viommu_probe(struct virtio_device *vdev) return 0; err_unregister: - iommu_device_sysfs_remove(&viommu->iommu); iommu_device_unregister(&viommu->iommu); + viommu_clear_fwnode(viommu); +err_sysfs_remove: + iommu_device_sysfs_remove(&viommu->iommu); err_free_vqs: vdev->config->del_vqs(vdev); @@ -1131,8 +1176,9 @@ static void viommu_remove(struct virtio_device *vdev) { struct viommu_dev *viommu = vdev->priv; - iommu_device_sysfs_remove(&viommu->iommu); iommu_device_unregister(&viommu->iommu); + viommu_clear_fwnode(viommu); + iommu_device_sysfs_remove(&viommu->iommu); /* Stop all virtqueues */ 
vdev->config->reset(vdev); -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 01/13] ACPI/IORT: Move IORT to the ACPI folder
IORT can be used (by QEMU) to describe a virtual topology containing an architecture-agnostic paravirtualized device. In order to build IORT for x86 systems, the driver has to be moved outside of arm64/. Since there is nothing specific to arm64 in the driver, it simply requires moving Makefile and Kconfig entries. Signed-off-by: Jean-Philippe Brucker --- MAINTAINERS | 9 + drivers/acpi/Kconfig| 3 +++ drivers/acpi/Makefile | 1 + drivers/acpi/arm64/Kconfig | 3 --- drivers/acpi/arm64/Makefile | 1 - drivers/acpi/{arm64 => }/iort.c | 0 6 files changed, 13 insertions(+), 4 deletions(-) rename drivers/acpi/{arm64 => }/iort.c (100%) diff --git a/MAINTAINERS b/MAINTAINERS index eb19fad370d7..9153d278f67e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -377,6 +377,15 @@ L: platform-driver-...@vger.kernel.org S: Maintained F: drivers/platform/x86/i2c-multi-instantiate.c +ACPI IORT DRIVER +M: Lorenzo Pieralisi +M: Hanjun Guo +M: Sudeep Holla +L: linux-a...@vger.kernel.org +L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers) +S: Maintained +F: drivers/acpi/iort.c + ACPI PMIC DRIVERS M: "Rafael J. Wysocki" M: Len Brown diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index ebe1e9e5fd81..548976c8b2b0 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -576,6 +576,9 @@ config TPS68470_PMIC_OPREGION region, which must be available before any of the devices using this, are probed. 
+config ACPI_IORT + bool + endif # ACPI config X86_PM_TIMER diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 5d361e4e3405..9d1792165713 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -123,3 +123,4 @@ video-objs += acpi_video.o video_detect.o obj-y += dptf/ obj-$(CONFIG_ARM64)+= arm64/ +obj-$(CONFIG_ACPI_IORT)+= iort.o diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig index 6dba187f4f2e..d0902c85d46e 100644 --- a/drivers/acpi/arm64/Kconfig +++ b/drivers/acpi/arm64/Kconfig @@ -3,8 +3,5 @@ # ACPI Configuration for ARM64 # -config ACPI_IORT - bool - config ACPI_GTDT bool diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile index 6ff50f4ed947..38771a816caf 100644 --- a/drivers/acpi/arm64/Makefile +++ b/drivers/acpi/arm64/Makefile @@ -1,3 +1,2 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-$(CONFIG_ACPI_IORT)+= iort.o obj-$(CONFIG_ACPI_GTDT)+= gtdt.o diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/iort.c similarity index 100% rename from drivers/acpi/arm64/iort.c rename to drivers/acpi/iort.c -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 00/13] virtio-iommu on non-devicetree platforms
I'm seeking feedback on multi-platform support for virtio-iommu. At the
moment only devicetree (DT) is supported and we don't have a pleasant
solution for other platforms. Once we figure out the topology description,
x86 support is trivial.

Since the IOMMU manages memory accesses from other devices, the guest
kernel needs to initialize the IOMMU before endpoints start issuing DMA.
It's a solved problem: firmware or hypervisor describes the device
dependencies through DT or ACPI tables, and probe of endpoints is deferred
until the IOMMU is probed. But:

(1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and IORT
    for Arm). From my point of view IORT is easier to extend, since we just
    need to introduce a new node type. There are no dependencies on Arm in
    the Linux IORT driver, so it works well with CONFIG_X86. However, there
    are concerns about other OS vendors feeling obligated to implement this
    new node, so Arm proposed introducing another ACPI table that can wrap
    any of DMAR, IVRS and IORT to extend it with new virtual nodes. A draft
    of this VIOT table specification is available at
    http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf

    I'm afraid this could increase fragmentation as guests would need to
    implement or modify their support for all of DMAR, IVRS and IORT. If we
    end up doing VIOT, I suggest limiting it to IORT.

(2) In addition, there are some concerns about having virtio depend on ACPI
    or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86 [1])
    don't currently implement those methods. It was suggested to embed the
    topology description into the device. It can work, as demonstrated at
    the end of this RFC, with the following limitations:

- The topology description must be read before any endpoint managed by the
  IOMMU is probed, and even before the virtio module is loaded. This RFC
  uses a PCI quirk to manually parse the virtio configuration. It assumes
  that all endpoints managed by the IOMMU are under this same PCI host.
- I don't have a solution for the virtio-mmio transport at the moment,
  because I haven't had time to modify a host to test it. I think it could
  either use a notifier on the platform bus, or better, a new 'iommu'
  command-line argument to the virtio-mmio driver. So the current prototype
  doesn't work for firecracker and microvm, which rely on virtio-mmio.

- For Arm, if the platform has an ITS, the hypervisor needs IORT or DT to
  describe it anyway. More generally, not using either ACPI or DT might
  prevent us from supporting other features as well. I suspect the above
  users will have to implement a standard method sooner or later.

- Even when reusing as much existing code as possible, guest support is
  still going to be around a few hundred lines since we can't rely on the
  normal virtio infrastructure to be loaded at that point. As you can see
  below, the diffstat for the incomplete topology implementation is already
  bigger than the exhaustive IORT support, even when jumping through the
  VIOT hoop.

So it's a lightweight solution for very specific use-cases, and we should
still support ACPI for the general case. Multi-platform guests such as
Linux will then need to support three topology descriptions instead of two.

In this RFC I present both solutions, but I'd rather not keep all of it.
Please see the individual patches for details:

(1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT driver
    and patches 2, 11 add the VIOT glue.

(2) Patch 12 adds the built-in topology description to the virtio-iommu
    specification. Patch 13 is a partial implementation for the Linux
    virtio-iommu driver. It only supports PCI, not platform devices.

You can find Linux and QEMU code on my virtio-iommu/devel branches at
http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu

I split the diffstat since there are two independent features. The first
one is for patches 1-11, and the second one for patch 13.
Jean-Philippe Brucker (11): ACPI/IORT: Move IORT to the ACPI folder ACPI: Add VIOT definitions ACPI/IORT: Allow registration of external tables ACPI/IORT: Add node categories ACPI/IORT: Support VIOT virtio-mmio node ACPI/IORT: Support VIOT virtio-pci node ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode ACPI/IORT: Add callback to update a device's fwnode iommu/virtio: Create fwnode if necessary iommu/virtio: Update IORT fwnode ACPI: Add VIOT table MAINTAINERS | 9 + drivers/acpi/Kconfig| 7 + drivers/acpi/Makefile | 2 + drivers/acpi/arm64/Kconfig | 3 - drivers/acpi/arm64/Makefile | 1 - drivers/acpi/bus.c | 2 + drivers/acpi/{arm64 => }/iort.c | 317 ++-- drivers/acpi/tables.c | 2 +- driv
[RFC 03/13] ACPI/IORT: Allow registration of external tables
Add a function to register an IORT table from an external source. Signed-off-by: Jean-Philippe Brucker --- drivers/acpi/iort.c | 22 -- include/linux/acpi_iort.h | 10 ++ 2 files changed, 30 insertions(+), 2 deletions(-) diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c index d62a9ea26fae..9c6c91e06f8f 100644 --- a/drivers/acpi/iort.c +++ b/drivers/acpi/iort.c @@ -144,6 +144,7 @@ typedef acpi_status (*iort_find_node_callback) /* Root pointer to the mapped IORT table */ static struct acpi_table_header *iort_table; +static enum iort_table_source iort_table_source; static LIST_HEAD(iort_msi_chip_list); static DEFINE_SPINLOCK(iort_msi_chip_lock); @@ -1617,11 +1618,28 @@ static void __init iort_init_platform_devices(void) } } +void __init acpi_iort_register_table(struct acpi_table_header *table, +enum iort_table_source source) +{ + /* +* Firmware or hypervisor should know better than give us two IORT +* tables. +*/ + if (WARN_ON(iort_table)) + return; + + iort_table = table; + iort_table_source = source; + + iort_init_platform_devices(); +} + void __init acpi_iort_init(void) { acpi_status status; + static struct acpi_table_header *table; - status = acpi_get_table(ACPI_SIG_IORT, 0, &iort_table); + status = acpi_get_table(ACPI_SIG_IORT, 0, &table); if (ACPI_FAILURE(status)) { if (status != AE_NOT_FOUND) { const char *msg = acpi_format_exception(status); @@ -1632,5 +1650,5 @@ void __init acpi_iort_init(void) return; } - iort_init_platform_devices(); + acpi_iort_register_table(table, IORT_SOURCE_IORT); } diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index 8e7e2ec37f1b..f4db5fff07cf 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -11,6 +11,11 @@ #include #include +enum iort_table_source { + IORT_SOURCE_IORT, /* The Real Thing */ + IORT_SOURCE_VIOT, /* Paravirtual extensions */ +}; + #define IORT_IRQ_MASK(irq) (irq & 0xULL) #define IORT_IRQ_TRIGGER_MASK(irq) ((irq >> 32) & 0xULL) @@ -27,6 +32,8 @@ int 
iort_register_domain_token(int trans_id, phys_addr_t base, void iort_deregister_domain_token(int trans_id); struct fwnode_handle *iort_find_domain_token(int trans_id); #ifdef CONFIG_ACPI_IORT +void acpi_iort_register_table(struct acpi_table_header *table, + enum iort_table_source source); void acpi_iort_init(void); u32 iort_msi_map_rid(struct device *dev, u32 req_id); struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id); @@ -37,6 +44,9 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size); const struct iommu_ops *iort_iommu_configure(struct device *dev); int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); #else +static void acpi_iort_register_table(struct acpi_table_header *table, +enum iort_table_source source) +{ } static inline void acpi_iort_init(void) { } static inline u32 iort_msi_map_rid(struct device *dev, u32 req_id) { return req_id; } -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC virtio 12/13] virtio-iommu: Add built-in topology description
Add a lightweight method to describe the IOMMU topology in the config space, guarded by a new feature bit. A list of capabilities in the config space describes the devices managed by the IOMMU and their endpoint IDs. Signed-off-by: Jean-Philippe Brucker --- virtio-iommu.tex | 88 1 file changed, 88 insertions(+) diff --git a/virtio-iommu.tex b/virtio-iommu.tex index 28c562b..2b29873 100644 --- a/virtio-iommu.tex +++ b/virtio-iommu.tex @@ -67,6 +67,9 @@ \subsection{Feature bits}\label{sec:Device Types / IOMMU Device / Feature bits} \item[VIRTIO_IOMMU_F_MMIO (5)] The VIRTIO_IOMMU_MAP_F_MMIO flag is available. + +\item[VIRTIO_IOMMU_F_TOPOLOGY (6)] + Topology description is available at \field{topo_offset}. \end{description} \drivernormative{\subsubsection}{Feature bits}{Device Types / IOMMU Device / Feature bits} @@ -97,6 +100,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / IOMMU Device / le32 end; } domain_range; le32 probe_size; + le16 topo_offset; }; \end{lstlisting} @@ -141,6 +145,90 @@ \subsection{Device initialization}\label{sec:Device Types / IOMMU Device / Devic If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the device SHOULD NOT let endpoints access the guest-physical address space. +\subsubsection{Built-in topology description}\label{sec:Device Types / IOMMU Device / Device initialization / topology} + +The device manages memory accesses from endpoints, identified by endpoint +IDs. The driver can discover which endpoint ID corresponds to an endpoint +using several methods, depending on the platform. Platforms described +with device tree use the \texttt{iommus} and \texttt{iommu-map} properties +embedded into device nodes for this purpose. Platforms described with +ACPI use a table such as the Virtual I/O Table. Platforms that do not +support either device tree or ACPI may embed a minimalistic description +in the device configuration space. 
+
+An important disadvantage of describing the topology from within the
+device is the lack of initialization ordering information. Out-of-band
+descriptions such as device tree and ACPI let the operating system know
+about device dependencies so that it can initialize supplier devices
+(IOMMUs) before their consumers (endpoints). Platforms using the
+VIRTIO_IOMMU_F_TOPOLOGY feature have to communicate the device dependency
+in another way.
+
+If the VIRTIO_IOMMU_F_TOPOLOGY feature is negotiated, \field{topo_offset}
+is the offset between the beginning of the device-specific configuration
+space (virtio_iommu_config) and the first topology structure header. A
+topology structure defines the endpoint ID of one or more endpoints
+managed by the virtio-iommu device.
+
+\begin{lstlisting}
+struct virtio_iommu_topo_head {
+  le16 type;
+  le16 next;
+};
+\end{lstlisting}
+
+\field{next} is the offset between the beginning of the device-specific
+configuration space and the next topology structure header. When
+\field{next} is zero, this is the last structure.
+
+\field{type} describes the type of structure:
+\begin{description}
+  \item[VIRTIO_IOMMU_TOPO_PCI_RANGE (0)] struct virtio_iommu_topo_pci_range
+  \item[VIRTIO_IOMMU_TOPO_ENDPOINT (1)] struct virtio_iommu_topo_endpoint
+\end{description}
+
+\paragraph{PCI range}\label{sec:Device Types / IOMMU Device / Device initialization / topology / PCI range}
+
+\begin{lstlisting}
+struct virtio_iommu_topo_pci_range {
+  struct virtio_iommu_topo_head head;
+  le32 endpoint_start;
+  le16 hierarchy;
+  le16 requester_start;
+  le16 requester_end;
+  le16 reserved;
+};
+\end{lstlisting}
+
+The PCI range structure describes the endpoint IDs of a series of PCI
+devices.
+
+\begin{description}
+  \item[\field{hierarchy}] Identifier of the PCI hierarchy. Sometimes
+called PCI segment or domain number.
+  \item[\field{requester_start}] First requester ID in the range.
+  \item[\field{requester_end}] Last requester ID in the range.
+ \item[\field{endpoint_start}] First endpoint ID. +\end{description} + +The correspondence between a PCI requester ID in the range +[ requester_start; requester_end ] and its endpoint IDs is a linear +transformation: endpoint_id = requester_id - requester_start + +endpoint_start. + +\paragraph{Single endpoint}\label{sec:Device Types / IOMMU Device / Device initialization / topology / Single endpoint} + +\begin{lstlisting} +struct virtio_iommu_topo_endpoint { + struct virtio_iommu_topo_head head; + le32 endpoint; + le64 address; +}; +\end{lstlisting} + +\field{endpoint} is the ID of a single endpoint, identified by its first +MMIO address in the physical address space. + \subsection{Device operations}\label{sec:Device Types / IOMMU Device / Device operations} Driver send requests on the request virtqueue, notifies the device and -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfou
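The chained structure layout and the requester-to-endpoint mapping defined in this patch are simple enough to model in a few lines of C. The sketch below is a hypothetical illustration, not the driver code from this series: the device-specific config space is modelled as a flat byte buffer, and the le16/le32 fields are read native-endian for brevity (a real parser must convert from little-endian and guard against looping `next` offsets).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOPO_PCI_RANGE 0  /* VIRTIO_IOMMU_TOPO_PCI_RANGE */
#define TOPO_ENDPOINT  1  /* VIRTIO_IOMMU_TOPO_ENDPOINT */

struct topo_head {
    uint16_t type;
    uint16_t next;  /* offset of the next header; zero terminates */
};

struct topo_pci_range {
    struct topo_head head;
    uint32_t endpoint_start;
    uint16_t hierarchy;
    uint16_t requester_start;
    uint16_t requester_end;
    uint16_t reserved;
};

/* Walk the chain starting at topo_offset, counting structures of a type. */
static int topo_count(const uint8_t *cfg, size_t cfg_len,
                      uint16_t topo_offset, uint16_t type)
{
    int count = 0;
    uint16_t off = topo_offset;

    while (off && off + sizeof(struct topo_head) <= cfg_len) {
        struct topo_head head;

        memcpy(&head, cfg + off, sizeof(head));
        if (head.type == type)
            count++;
        off = head.next;
    }
    return count;
}

/* The spec's linear transformation:
 * endpoint_id = requester_id - requester_start + endpoint_start */
static bool pci_range_map(const struct topo_pci_range *r,
                          uint16_t requester_id, uint32_t *endpoint_id)
{
    if (requester_id < r->requester_start || requester_id > r->requester_end)
        return false;
    *endpoint_id = requester_id - r->requester_start + r->endpoint_start;
    return true;
}
```

For example, a PCI range with endpoint_start 8 covering requester IDs 0x10-0x1f maps requester 0x1f to endpoint 23.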
[RFC 02/13] ACPI: Add VIOT definitions
This is temporary, until the VIOT table is published and these definitions added to ACPICA. Signed-off-by: Jean-Philippe Brucker --- include/acpi/actbl2.h | 31 +++ 1 file changed, 31 insertions(+) diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h index e45ced27f4c3..99c1d747e9d8 100644 --- a/include/acpi/actbl2.h +++ b/include/acpi/actbl2.h @@ -25,6 +25,7 @@ * the wrong signature. */ #define ACPI_SIG_IORT "IORT" /* IO Remapping Table */ +#define ACPI_SIG_VIOT "VIOT" /* Virtual I/O Table */ #define ACPI_SIG_IVRS "IVRS" /* I/O Virtualization Reporting Structure */ #define ACPI_SIG_LPIT "LPIT" /* Low Power Idle Table */ #define ACPI_SIG_MADT "APIC" /* Multiple APIC Description Table */ @@ -412,6 +413,36 @@ struct acpi_ivrs_memory { u64 memory_length; }; +/*** + * + * VIOT - Virtual I/O Table + *Version 1 + * + **/ + +struct acpi_table_viot { + struct acpi_table_header header; + u8 reserved[12]; + struct acpi_table_header base_table; +}; + +#define ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU0x80 +#define ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU 0x81 + +struct acpi_viot_iort_virtio_pci_iommu { + u32 devid; +}; + +struct acpi_viot_iort_virtio_mmio_iommu { + u64 base_address; + u64 span; + u64 flags; + u64 interrupt; +}; + +/* FIXME: rename this monstrosity. */ +#define ACPI_VIOT_IORT_VIRTIO_MMIO_IOMMU_CACHE_COHERENT (1<<0) + /*** * * LPIT - Low Power Idle Table -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 06/13] ACPI/IORT: Support VIOT virtio-pci node
When virtio-iommu uses the PCI transport, IORT doesn't instantiate the device and doesn't create a fwnode. They will be created later by the PCI subsystem. Store the information needed to identify the IOMMU in iort_fwnode_list. Signed-off-by: Jean-Philippe Brucker --- drivers/acpi/iort.c | 117 +++- 1 file changed, 93 insertions(+), 24 deletions(-) diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c index adc5953fffa5..b517aa4e83ba 100644 --- a/drivers/acpi/iort.c +++ b/drivers/acpi/iort.c @@ -30,10 +30,17 @@ struct iort_its_msi_chip { u32 translation_id; }; +struct iort_pci_devid { + u16 segment; + u8 bus; + u8 devfn; +}; + struct iort_fwnode { struct list_head list; struct acpi_iort_node *iort_node; struct fwnode_handle *fwnode; + struct iort_pci_devid *pci_devid; }; static LIST_HEAD(iort_fwnode_list); static DEFINE_SPINLOCK(iort_fwnode_lock); @@ -44,7 +51,8 @@ static bool iort_type_matches(u8 type, enum iort_node_category category) case IORT_IOMMU_TYPE: return type == ACPI_IORT_NODE_SMMU || type == ACPI_IORT_NODE_SMMU_V3 || - type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU; + type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU || + type == ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU; case IORT_MSI_TYPE: return type == ACPI_IORT_NODE_ITS_GROUP; default: @@ -59,12 +67,14 @@ static bool iort_type_matches(u8 type, enum iort_node_category category) * * @node: IORT table node associated with the IOMMU * @fwnode: fwnode associated with the IORT node + * @pci_devid: pci device ID associated with the IORT node, may be NULL * * Returns: 0 on success * <0 on failure */ static inline int iort_set_fwnode(struct acpi_iort_node *iort_node, - struct fwnode_handle *fwnode) + struct fwnode_handle *fwnode, + struct iort_pci_devid *pci_devid) { struct iort_fwnode *np; @@ -76,6 +86,7 @@ static inline int iort_set_fwnode(struct acpi_iort_node *iort_node, INIT_LIST_HEAD(&np->list); np->iort_node = iort_node; np->fwnode = fwnode; + np->pci_devid = pci_devid; spin_lock(&iort_fwnode_lock); 
list_add_tail(&np->list, &iort_fwnode_list); @@ -121,6 +132,7 @@ static inline void iort_delete_fwnode(struct acpi_iort_node *node) spin_lock(&iort_fwnode_lock); list_for_each_entry_safe(curr, tmp, &iort_fwnode_list, list) { if (curr->iort_node == node) { + kfree(curr->pci_devid); list_del(&curr->list); kfree(curr); break; @@ -870,6 +882,7 @@ static inline bool iort_iommu_driver_enabled(u8 type) case ACPI_IORT_NODE_SMMU: return IS_BUILTIN(CONFIG_ARM_SMMU); case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU: + case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU: return IS_ENABLED(CONFIG_VIRTIO_IOMMU); default: pr_warn("IORT node type %u does not describe an IOMMU\n", type); @@ -1451,6 +1464,28 @@ static void __init viommu_mmio_dma_configure(struct device *dev, acpi_dma_configure(dev, attr); } +static __init struct iort_pci_devid * +viommu_pci_get_devid(struct acpi_iort_node *node) +{ + unsigned int val; + struct iort_pci_devid *devid; + struct acpi_viot_iort_virtio_pci_iommu *viommu; + + viommu = (struct acpi_viot_iort_virtio_pci_iommu *)node->node_data; + + val = le32_to_cpu(viommu->devid); + + devid = kzalloc(sizeof(*devid), GFP_KERNEL); + if (!devid) + return ERR_PTR(-ENOMEM); + + devid->segment = val >> 16; + devid->bus = PCI_BUS_NUM(val); + devid->devfn = val & 0xff; + + return devid; +} + struct iort_dev_config { const char *name; int (*dev_init)(struct acpi_iort_node *node); @@ -1462,6 +1497,7 @@ struct iort_dev_config { int (*dev_set_proximity)(struct device *dev, struct acpi_iort_node *node); int (*dev_add_platdata)(struct platform_device *pdev); + struct iort_pci_devid *(*dev_get_pci_devid)(struct acpi_iort_node *node); }; static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = { @@ -1494,6 +1530,10 @@ static const struct iort_dev_config iort_viommu_mmio_cfg __initconst = { .dev_init_resources = viommu_mmio_init_resources, }; +static const struct iort_dev_config iort_viommu_pci_cfg __initconst = { + .dev_get_pci_devid = viommu_pci_get_devid, +}; + static 
__init const struct iort_dev_config *iort_get_dev_cfg( struct acpi_iort_node *node) { @@ -1510,6 +1550,8 @@ static __init const struct iort_de
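For reference, the packing that viommu_pci_get_devid() undoes in the patch above is the usual PCI requester-ID encoding: segment in bits 31:16, bus in bits 15:8 (what the kernel's PCI_BUS_NUM() extracts) and devfn in bits 7:0. A standalone sketch of the same unpacking, with a hypothetical struct in place of the kernel's iort_pci_devid:

```c
#include <stdint.h>

/* Mirrors the field extraction in viommu_pci_get_devid(): segment in
 * bits 31:16, bus in bits 15:8, devfn in bits 7:0. */
struct pci_devid {
    uint16_t segment;
    uint8_t bus;
    uint8_t devfn;
};

static struct pci_devid devid_unpack(uint32_t val)
{
    struct pci_devid id = {
        .segment = val >> 16,
        .bus = (val >> 8) & 0xff,   /* PCI_BUS_NUM(val) in the kernel */
        .devfn = val & 0xff,
    };
    return id;
}
```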
[RFC 13/13] iommu/virtio: Add topology description to
Some hypervisors don't implement either device-tree or ACPI, but still need a method to describe the IOMMU topology. Read the virtio-iommu config early and parse the topology description. Hook into the dma_setup() callbacks to initialize the IOMMU before probing endpoints. If the virtio-iommu uses the virtio-pci transport, this will only work if the PCI root complex is the first device probed. We don't currently support virtio-mmio. Initially I tried to generate a fake IORT table and feed it to the IORT driver, in order to avoid rewriting the whole DMA code, but it wouldn't work with platform endpoints, which are references to items in the ACPI table on IORT. Signed-off-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- Note that we only call virt_dma_configure() if the host didn't provide either DT or ACPI method. If you want to test this with QEMU, you'll need to manually disable the acpi_dma_configure() part in pci-driver.c --- drivers/base/platform.c | 3 + drivers/iommu/Kconfig | 9 + drivers/iommu/Makefile| 1 + drivers/iommu/virtio-iommu-topology.c | 410 ++ drivers/iommu/virtio-iommu.c | 3 + drivers/pci/pci-driver.c | 3 + include/linux/virtio_iommu.h | 18 ++ include/uapi/linux/virtio_iommu.h | 26 ++ 8 files changed, 473 insertions(+) create mode 100644 drivers/iommu/virtio-iommu-topology.c create mode 100644 include/linux/virtio_iommu.h diff --git a/drivers/base/platform.c b/drivers/base/platform.c index b230beb6ccb4..70b12c8ef2fb 100644 --- a/drivers/base/platform.c +++ b/drivers/base/platform.c @@ -27,6 +27,7 @@ #include #include #include +#include #include "base.h" #include "power/power.h" @@ -1257,6 +1258,8 @@ int platform_dma_configure(struct device *dev) } else if (has_acpi_companion(dev)) { attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode)); ret = acpi_dma_configure(dev, attr); + } else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) { + ret = virt_dma_configure(dev); } return ret; diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig 
index e6eb4f238d1a..d02c0d36019d 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -486,4 +486,13 @@ config VIRTIO_IOMMU Say Y here if you intend to run this kernel as a guest. +config VIRTIO_IOMMU_TOPOLOGY + bool "Topology properties for the virtio-iommu" + depends on VIRTIO_IOMMU + help + Enable early probing of the virtio-iommu device, to detect the + topology description. + + Say Y here if you intend to run this kernel as a guest. + endif # IOMMU_SUPPORT diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 4f405f926e73..6b51c4186ebc 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -35,3 +35,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o diff --git a/drivers/iommu/virtio-iommu-topology.c b/drivers/iommu/virtio-iommu-topology.c new file mode 100644 index ..ec22510ace3d --- /dev/null +++ b/drivers/iommu/virtio-iommu-topology.c @@ -0,0 +1,410 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct viommu_cap_config { + u8 pos; /* PCI capability position */ + u8 bar; + u32 length; /* structure size */ + u32 offset; /* structure offset within the bar */ +}; + +struct viommu_spec { + struct device *dev; /* transport device */ + struct fwnode_handle*fwnode; + struct iommu_ops*ops; + struct list_headtopology; + struct list_headlist; +}; + +struct viommu_topology { + union { + struct virtio_iommu_topo_head head; + struct virtio_iommu_topo_pci_range pci; + struct virtio_iommu_topo_endpoint ep; + }; + /* Index into viommu_spec->topology */ + struct list_head list; +}; + +static LIST_HEAD(viommus); +static DEFINE_MUTEX(viommus_lock); + +#define VPCI_FIELD(field) offsetof(struct virtio_pci_cap, field) + +static inline int 
viommu_find_capability(struct pci_dev *dev, u8 cfg_type, +struct viommu_cap_config *cap) +{ + int pos; + u8 bar; + + for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR); +pos > 0; +pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) { + u8 type; + + pci_read_config_byte(dev, pos + VPCI_FIELD(cfg_type), &type); + if (type != cfg_type) + continue; + + pci_read_config_byte(dev, pos
[RFC 08/13] ACPI/IORT: Add callback to update a device's fwnode
For a PCI-based IOMMU, IORT isn't in charge of allocating a fwnode. Let the IOMMU driver update the fwnode associated to an IORT node when available. Signed-off-by: Jean-Philippe Brucker --- drivers/acpi/iort.c | 38 ++ include/linux/acpi_iort.h | 4 2 files changed, 42 insertions(+) diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c index f08f72d8af78..8263ab275b2b 100644 --- a/drivers/acpi/iort.c +++ b/drivers/acpi/iort.c @@ -1038,11 +1038,49 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev) return ops; } + +/** + * iort_iommu_update_fwnode - update fwnode of a PCI IOMMU + * @dev: the IOMMU device + * @fwnode: the fwnode, or NULL to remove an existing fwnode + * + * A PCI device isn't instantiated by the IORT driver. The IOMMU driver sets or + * removes its fwnode using this function. + */ +void iort_iommu_update_fwnode(struct device *dev, struct fwnode_handle *fwnode) +{ + struct pci_dev *pdev; + struct iort_fwnode *curr; + struct iort_pci_devid *devid; + + if (!dev_is_pci(dev)) + return; + + pdev = to_pci_dev(dev); + + spin_lock(&iort_fwnode_lock); + list_for_each_entry(curr, &iort_fwnode_list, list) { + devid = curr->pci_devid; + if (devid && + pci_domain_nr(pdev->bus) == devid->segment && + pdev->bus->number == devid->bus && + pdev->devfn == devid->devfn) { + WARN_ON(fwnode && curr->fwnode); + curr->fwnode = fwnode; + break; + } + } + spin_unlock(&iort_fwnode_lock); +} +EXPORT_SYMBOL_GPL(iort_iommu_update_fwnode); #else int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) { return 0; } const struct iommu_ops *iort_iommu_configure(struct device *dev) { return NULL; } +static void iort_iommu_update_fwnode(struct device *dev, +struct fwnode_handle *fwnode) +{ } #endif static int nc_dma_get_range(struct device *dev, u64 *size) diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index f4db5fff07cf..840635e40d9d 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -43,6 +43,7 @@ int 
iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size); const struct iommu_ops *iort_iommu_configure(struct device *dev); int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); +void iort_iommu_update_fwnode(struct device *dev, struct fwnode_handle *fwnode); #else static void acpi_iort_register_table(struct acpi_table_header *table, enum iort_table_source source) @@ -63,6 +64,9 @@ static inline const struct iommu_ops *iort_iommu_configure( static inline int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) { return 0; } +static void iort_iommu_update_fwnode(struct device *dev, +struct fwnode_handle *fwnode) +{ } #endif #endif /* __ACPI_IORT_H__ */ -- 2.24.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[RFC 11/13] ACPI: Add VIOT table
Add support for a new ACPI table that embeds other tables describing a platform's IOMMU topology. Currently the only supported base table is IORT. The VIOT contains an IORT with additional node types, that describe a virtio-iommu. Signed-off-by: Jean-Philippe Brucker --- drivers/acpi/Kconfig | 4 drivers/acpi/Makefile | 1 + drivers/acpi/bus.c| 2 ++ drivers/acpi/tables.c | 2 +- drivers/acpi/viot.c | 44 +++ drivers/iommu/Kconfig | 1 + include/linux/acpi_viot.h | 20 ++ 7 files changed, 73 insertions(+), 1 deletion(-) create mode 100644 drivers/acpi/viot.c create mode 100644 include/linux/acpi_viot.h diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 548976c8b2b0..513a5e4d3526 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -579,6 +579,10 @@ config TPS68470_PMIC_OPREGION config ACPI_IORT bool +config ACPI_VIOT + bool + select ACPI_IORT + endif # ACPI config X86_PM_TIMER diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 9d1792165713..6abdc6cc32c7 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -124,3 +124,4 @@ obj-y += dptf/ obj-$(CONFIG_ARM64)+= arm64/ obj-$(CONFIG_ACPI_IORT)+= iort.o +obj-$(CONFIG_ACPI_VIOT)+= viot.o diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c index 48bc96d45bab..6f364e0c9240 100644 --- a/drivers/acpi/bus.c +++ b/drivers/acpi/bus.c @@ -25,6 +25,7 @@ #include #endif #include +#include #include #include #include @@ -1246,6 +1247,7 @@ static int __init acpi_init(void) pci_mmcfg_late_init(); acpi_iort_init(); + acpi_viot_init(); acpi_scan_init(); acpi_ec_init(); acpi_debugfs_init(); diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c index 180ac4329763..9662ea5e1064 100644 --- a/drivers/acpi/tables.c +++ b/drivers/acpi/tables.c @@ -501,7 +501,7 @@ static const char * const table_sigs[] = { ACPI_SIG_WDDT, ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT, - NULL }; + 
ACPI_SIG_VIOT, NULL }; #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header) diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c new file mode 100644 index ..ab9a6e43ad9b --- /dev/null +++ b/drivers/acpi/viot.c @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2019 Linaro + * + * Virtual IOMMU table + */ +#define pr_fmt(fmt)"ACPI: VIOT: " fmt + +#include +#include +#include + +int __init acpi_viot_init(void) +{ + struct acpi_table_viot *viot; + struct acpi_table_header *acpi_header; + acpi_status status; + + status = acpi_get_table(ACPI_SIG_VIOT, 0, &acpi_header); + if (ACPI_FAILURE(status)) { + if (status != AE_NOT_FOUND) { + const char *msg = acpi_format_exception(status); + + pr_err("Failed to get table, %s\n", msg); + return -EINVAL; + } + + return 0; + } + + if (acpi_header->length < sizeof(*viot)) { + pr_err("VIOT table overflow, bad table!\n"); + return -EINVAL; + } + + viot = (struct acpi_table_viot *)acpi_header; + if (ACPI_COMPARE_NAMESEG(viot->base_table.signature, ACPI_SIG_IORT)) { + acpi_iort_register_table(&viot->base_table, IORT_SOURCE_VIOT); + return 0; + } + + pr_err("Unknown base table header\n"); + return -EINVAL; +} diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index e3842eabcfdd..e6eb4f238d1a 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -480,6 +480,7 @@ config VIRTIO_IOMMU depends on ARM64 select IOMMU_API select INTERVAL_TREE + select ACPI_VIOT if ACPI help Para-virtualised IOMMU driver with virtio. 
diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
new file mode 100644
index ..6c282d5eb793
--- /dev/null
+++ b/include/linux/acpi_viot.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Linaro
+ */
+
+#ifndef __ACPI_VIOT_H__
+#define __ACPI_VIOT_H__
+
+#ifdef CONFIG_ACPI_VIOT
+
+int acpi_viot_init(void);
+
+#else /* !CONFIG_ACPI_VIOT */
+
+static inline int acpi_viot_init(void)
+{ return 0; }
+
+#endif /* !CONFIG_ACPI_VIOT */
+
+#endif /* __ACPI_VIOT_H__ */
--
2.24.0
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
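The validation in acpi_viot_init() boils down to two checks: the table must be large enough to hold the VIOT header plus an embedded base table, and the base table's signature selects the parser (only IORT is recognized). A rough userspace model of that logic, using a hypothetical 36-byte stand-in for struct acpi_table_header:

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for struct acpi_table_header: only signature and length are
 * read by the check; the remaining 28 bytes pad it to the real 36. */
struct acpi_hdr {
    char signature[4];
    uint32_t length;
    uint8_t rest[28];
};

/* Layout of struct acpi_table_viot from patch 02: header, 12 reserved
 * bytes, then the full base table. */
struct viot_tbl {
    struct acpi_hdr header;
    uint8_t reserved[12];
    struct acpi_hdr base_table;
};

/* Returns 0 if the table is large enough and wraps an IORT, -1 otherwise. */
static int viot_check(const struct viot_tbl *viot)
{
    if (viot->header.length < sizeof(*viot))
        return -1;  /* "VIOT table overflow, bad table!" */
    if (memcmp(viot->base_table.signature, "IORT", 4) != 0)
        return -1;  /* "Unknown base table header" */
    return 0;
}
```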