Re: [PATCH 09/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
On 5/22/2020 12:42 PM, Robin Murphy wrote:
> On 2020-05-22 00:10, Rob Herring wrote:
>> On Thu, May 21, 2020 at 7:00 AM Lorenzo Pieralisi wrote:
>>>
>>> From: Laurentiu Tudor
>>>
>>> The existing bindings cannot be used to specify the relationship
>>> between fsl-mc devices and GIC ITSes.
>>>
>>> Add a generic binding for mapping fsl-mc devices to GIC ITSes, using
>>> msi-map property.
>>>
>>> Signed-off-by: Laurentiu Tudor
>>> Cc: Rob Herring
>>> ---
>>>  .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 30 +--
>>>  1 file changed, 27 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
>>> b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
>>> index 9134e9bcca56..b0813b2d0493 100644
>>> --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
>>> +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt
>>> @@ -18,9 +18,9 @@ same hardware "isolation context" and a 10-bit value called an ICID
>>>  the requester.
>>>
>>>  The generic 'iommus' property is insufficient to describe the relationship
>>> -between ICIDs and IOMMUs, so an iommu-map property is used to define
>>> -the set of possible ICIDs under a root DPRC and how they map to
>>> -an IOMMU.
>>> +between ICIDs and IOMMUs, so the iommu-map and msi-map properties are used
>>> +to define the set of possible ICIDs under a root DPRC and how they map to
>>> +an IOMMU and a GIC ITS respectively.
>>>
>>>  For generic IOMMU bindings, see
>>>  Documentation/devicetree/bindings/iommu/iommu.txt.
>>> @@ -28,6 +28,9 @@ Documentation/devicetree/bindings/iommu/iommu.txt.
>>>  For arm-smmu binding, see:
>>>  Documentation/devicetree/bindings/iommu/arm,smmu.yaml.
>>>
>>> +For GICv3 and GIC ITS bindings, see:
>>> +Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml.
>>> +
>>>  Required properties:
>>>
>>>  - compatible
>>> @@ -119,6 +122,15 @@ Optional properties:
>>>    associated with the listed IOMMU, with the iommu-specifier
>>>    (i - icid-base + iommu-base).
>>>
>>> +- msi-map: Maps an ICID to a GIC ITS and associated iommu-specifier
>>> +  data.
>>> +
>>> +  The property is an arbitrary number of tuples of
>>> +  (icid-base,iommu,iommu-base,length).
>>
>> I'm confused because the example has a GIC ITS phandle, not an IOMMU.
>>
>> What is an iommu-base?
>
> Right, I was already halfway through writing a reply to say that all
> the copy-pasted "iommu" references here should be using the terminology
> from the pci-msi.txt binding instead.

Right, will change it.

>>> +
>>> +  Any ICID in the interval [icid-base, icid-base + length) is
>>> +  associated with the listed GIC ITS, with the iommu-specifier
>>> +  (i - icid-base + iommu-base).
>>>
>>>  Example:
>>>
>>>          smmu: iommu@500 {
>>> @@ -128,6 +140,16 @@ Example:
>>>                  ...
>>>          };
>>>
>>> +        gic: interrupt-controller@600 {
>>> +                compatible = "arm,gic-v3";
>>> +                ...
>>> +                its: gic-its@602 {
>>> +                        compatible = "arm,gic-v3-its";
>>> +                        msi-controller;
>>> +                        ...
>>> +                };
>>> +        };
>>> +
>>>          fsl_mc: fsl-mc@80c00 {
>>>                  compatible = "fsl,qoriq-mc";
>>>                  reg = <0x0008 0x0c00 0 0x40>,    /* MC portal base */
>>> @@ -135,6 +157,8 @@ Example:
>>>                  msi-parent = <&its>;
>
> Side note: is it right to keep msi-parent here? It rather implies that
> the MC itself has a 'native' Device ID rather than an ICID, which I
> believe is not strictly true. Plus it's extra-confusing that it doesn't
> specify an ID either way, since that makes it look like the legacy PCI
> case that gets treated implicitly as an identity msi-map, which makes
> no sense at all to combine with an actual msi-map.

Before adding msi-map, the fsl-mc code assumed that ICID and streamID
are equal and used msi-parent just to get the reference to the ITS node.
Removing msi-parent will break the backward compatibility of the already
existing systems. Maybe we should mention that this is legacy and not to
be used for newer device trees.

>>>                  /* define map for ICIDs 23-64 */
>>>                  iommu-map = <23 &smmu 23 41>;
>>> +                /* define msi map for ICIDs 23-64 */
>>> +                msi-map = <23 &its 23 41>;
>>
>> Seeing 23 twice is odd. The numbers to the right of 'its' should be an
>> ITS number space.
>
> On about 99% of systems the values in the SMMU Stream ID and ITS Device
> ID spaces are going to be the same. Nobody's going to bother carrying
> *two* sets of sideband data across the interconnect if they don't have
> to ;)
>
> Robin.

Diana

>>>                  #address-cells = <3>;
>>>                  #size-cells = <1>;
>>> --
>>> 2.26.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
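For reference, the lookup that both iommu-map and msi-map describe can be
sketched in a few lines: given tuples of (id-base, phandle, out-base, length),
an input ID i in range maps to output ID (i - id-base + out-base) at the
referenced node. This is an illustrative Python model of that arithmetic (not
the kernel's actual implementation):

```python
def map_id(i, ranges):
    """Translate an input ID (e.g. an ICID) through a list of
    (id_base, phandle, out_base, length) tuples, as iommu-map/msi-map do."""
    for id_base, phandle, out_base, length in ranges:
        if id_base <= i < id_base + length:
            # Output specifier is (i - id_base + out_base) at 'phandle'.
            return phandle, i - id_base + out_base
    return None  # no entry covers this ID


# The example from the patch: msi-map = <23 &its 23 41>;
msi_map = [(23, "&its", 23, 41)]
print(map_id(23, msi_map))  # identity: ('&its', 23)
print(map_id(64, msi_map))  # out of the covered interval: None
```

This also makes Rob's point concrete: with `<23 &its 23 41>` the map is an
identity over ICIDs 23..63, whereas a genuinely remapping entry such as
`<23 &its 100 41>` would send ICID 30 to Device ID 107.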
RE: [PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation
> For the record: I don't think we should apply these because we don't
> have a good way of testing them. We currently have three problems that
> prevent us from enabling SMMU on Tegra194:

Out of the three issues pointed out here, I see that only issue 2) is a
real blocker for enabling the SMMU hardware by default in upstream.

> That said, I have tested earlier versions of this patchset on top of
> my local branch with fixes for the above and they do seem to work as
> expected. So I'll leave it up to the IOMMU maintainers whether they're
> willing to merge the driver patches as is.
> But I want to clarify that I won't be applying the DTS patches until
> we've solved all of the above issues and therefore it should be clear
> that these won't be runtime tested until then.

The SMMU driver patches as such are complete and can be used by NVIDIA
with a local config change (CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n)
to disable bypass, and that protects the driver patches against kernel
changes. This config option has already been tested by Nicolin Chen and
me.

Robin/Will, can you comment on whether the SMMU driver patches alone
(patches 1, 2 and 3 out of 5) can be merged without the DT enable
patches? Is it reasonable to merge the driver patches alone?

> 1) If we enable SMMU support, then the DMA API will automatically try
>    to use SMMU domains for allocations. This means that translations
>    will happen as soon as a device's IOMMU operations are initialized
>    and that is typically a long time (in kernel time at least) before
>    a driver is bound and has a chance of configuring the device.
>    This causes problems for non-quiesced devices like display
>    controllers that the bootloader might have set up to scan out a
>    boot splash.
>
>    What we're missing here is a way to:
>      a) advertise reserved memory regions for boot splash framebuffers
>      b) map reserved memory regions early during SMMU setup
>    Patches have been floating on the public mailing lists for b) but
>    a) requires changes to the bootloader (both proprietary ones and
>    U-Boot for SoCs prior to Tegra194).

This happens only if SMMU translation is enabled for display before the
reserved memory regions issue is fixed. This issue is not a real blocker
for enabling the SMMU.

> 2) Even if we don't enable SMMU for a given device (by not hooking up
>    the iommus property), with a default kernel configuration we get a
>    bunch of faults during boot because the ARM SMMU driver faults by
>    default (rather than bypass) for masters which aren't hooked up to
>    the SMMU.
>    We could work around that by changing the default configuration or
>    overriding it on the command-line, but that's not really an option
>    because it decreases security and means that Tegra194 won't work
>    out-of-the-box.

This is the real issue that blocks enabling the SMMU. The USF faults for
devices that don't have SMMU translations enabled should be fixed or
worked around before the SMMU can be enabled. We should look at keeping
the SID as 0x7F for the devices that can't have SMMU enabled yet; SID
0x7F bypasses the SMMU externally.

> 3) We don't properly describe the DMA hierarchy, which causes the DMA
>    masks to be improperly set. As a bit of background: Tegra194 has a
>    special address bit (bit 39) that causes some swizzling to happen
>    within the memory controller. As a result, any I/O virtual address
>    that has bit 39 set will cause this swizzling to happen on access.
>    The DMA/IOMMU allocator always starts allocating from the top of
>    the IOVA space, which means that the first couple of gigabytes of
>    allocations will cause most devices to fail because of the
>    undesired swizzling that occurs.
>
>    We had an initial patch for SDHCI merged that hard-codes the DMA
>    mask to DMA_BIT_MASK(39) on Tegra194 to work around that. However,
>    the devices all do support addressing 40 bits and the restriction
>    on bit 39 is really a property of the bus rather than a capability
>    of the device. This means that we would have to work around this
>    for every device driver by adding similar hacks. A better option is
>    to properly describe the DMA hierarchy (using dma-ranges) because
>    that will then automatically be applied as a constraint on each
>    device's DMA mask.
>    I have been working on patches to address this, but they are fairly
>    involved because they require device tree bindings changes and so
>    on.

The DMA mask issue is again outside the SMMU driver, and as long as the
clients with the DMA mask issue don't have SMMU enabled, it would be
fine. SDHCI can have SMMU enabled in upstream as soon as issue 2) is
taken care of.

> So before we solve all of the above issues we can't really enable SMMU
> on Tegra194 and hence won't be able to test it. As such we don't know
> if these patches even work, nor can we validate that they continue to
> work. As such, I don't think there's any use in applying these patches
> upstream since they
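The bit-39 constraint quoted above is easy to see with a little arithmetic.
A toy model (not Tegra code; the allocator behaviour is simplified to
"hand out pages downwards from the top of a 40-bit IOVA space"):

```python
def dma_bit_mask(n):
    # Equivalent of the kernel's DMA_BIT_MASK(n) macro.
    return (1 << n) - 1


def hits_swizzle(iova):
    # On Tegra194, any I/O virtual address with bit 39 set is swizzled
    # by the memory controller.
    return bool(iova & (1 << 39))


# First page a top-down allocator hands out in a 40-bit IOVA space.
top_down_iova = (1 << 40) - 0x1000

print(hex(dma_bit_mask(39)))                 # 0x7fffffffff
print(hits_swizzle(top_down_iova))           # True: top-of-space IOVAs set bit 39
print(hits_swizzle(top_down_iova & dma_bit_mask(39)))  # False once capped to 39 bits
```

This is exactly why capping every device's DMA mask at 39 bits "works" but is
a per-driver hack, while a dma-ranges description of the bus would impose the
same constraint centrally.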
Re: [RFC] Use SMMU HTTU for DMA dirty page tracking
Hi,

On Tue, May 19, 2020 at 05:42:55PM +0800, Xiang Zheng wrote:
> Hi all,
>
> Is there any plan for enabling SMMU HTTU?

Not outside of SVA, as far as I know.

> I have seen the patch locates in the SVA series patch, which adds
> support for HTTU:
> https://www.spinics.net/lists/arm-kernel/msg798694.html
>
> HTTU reduces the number of access faults on the SMMU fault queue
> (permission faults also benefit from it).
>
> Besides reducing the faults, HTTU also helps to track dirty pages for
> device DMA. Is it feasible to utilize HTTU to get dirty pages on
> device DMA during VFIO live migration?

As you know there is a VFIO interface for this under discussion:
https://lore.kernel.org/kvm/1589781397-28368-1-git-send-email-kwankh...@nvidia.com/
It doesn't implement an internal API to communicate with the IOMMU
driver about dirty pages.

> If SMMU can track dirty pages, devices are not required to implement
> additional dirty pages tracking to support VFIO live migration.

It seems feasible, though tracking it in the device might be more
efficient. I might have misunderstood, but I think for live migration of
the Intel NIC they trap guest accesses to the device and introspect its
state to figure out which pages it is accessing.

With HTTU I suppose (without much knowledge about live migration) that
you'd need several new interfaces to the IOMMU drivers:

* A way for VFIO to query HTTU support in the SMMU. There are some
  discussions about communicating more IOMMU capabilities through VFIO
  but no implementation yet. When HTTU isn't supported the DIRTY_PAGES
  bitmap would report all pages as they do now.

* VFIO_IOMMU_DIRTY_PAGES_FLAG_START/STOP would clear the dirty bit for
  all VFIO mappings (which is going to take some time). There is a
  walker in io-pgtable for iova_to_phys() which could be extended. I
  suppose it's also possible to atomically switch the HA and HD bits in
  context descriptors.

* VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP would query the dirty bit for
  all VFIO mappings.

Thanks,
Jean
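To make the START/GET_BITMAP flow discussed above concrete, here is a toy
model in Python. It is purely illustrative: the class, its methods, and the
per-page boolean "dirty bit" are invented stand-ins for the SMMU's
hardware-updated dirty state and the proposed VFIO ioctls, not any real API:

```python
class ToyIoPageTable:
    """Hypothetical I/O page table where hardware marks written pages dirty."""

    def __init__(self, num_pages):
        self.dirty = [False] * num_pages

    def hw_write(self, page):
        # Models HTTU: a DMA write sets the dirty bit, with no fault taken.
        self.dirty[page] = True

    def start_tracking(self):
        # Models VFIO_IOMMU_DIRTY_PAGES_FLAG_START: clear all dirty bits so
        # that only writes from this point on are reported.
        self.dirty = [False] * len(self.dirty)

    def get_bitmap(self):
        # Models VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP: report the pages
        # written since tracking started.
        return [i for i, d in enumerate(self.dirty) if d]


pt = ToyIoPageTable(8)
pt.hw_write(1)           # writes before tracking starts...
pt.start_tracking()      # ...are discarded here
pt.hw_write(3)
pt.hw_write(5)
print(pt.get_bitmap())   # [3, 5]
```

The cost Jean mentions is visible even in the model: START has to touch every
mapping's dirty bit, which is why a page-table walker (or flipping HA/HD in
the context descriptors) is suggested for the real implementation.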
Re: [PATCH 09/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
On 5/22/2020 5:02 PM, Rob Herring wrote:
> On Fri, May 22, 2020 at 3:42 AM Robin Murphy wrote:
>>
>> On 2020-05-22 00:10, Rob Herring wrote:
>>> On Thu, May 21, 2020 at 7:00 AM Lorenzo Pieralisi wrote:
>>>>
>>>> From: Laurentiu Tudor
>>>>
>>>> The existing bindings cannot be used to specify the relationship
>>>> between fsl-mc devices and GIC ITSes.
>>>>
>>>> Add a generic binding for mapping fsl-mc devices to GIC ITSes, using
>>>> msi-map property.
>>>>
>>>> [...]
>>>>
>>>> +- msi-map: Maps an ICID to a GIC ITS and associated iommu-specifier
>>>> +  data.
>>>> +
>>>> +  The property is an arbitrary number of tuples of
>>>> +  (icid-base,iommu,iommu-base,length).
>>>
>>> I'm confused because the example has a GIC ITS phandle, not an IOMMU.
>>>
>>> What is an iommu-base?
>>
>> Right, I was already halfway through writing a reply to say that all
>> the copy-pasted "iommu" references here should be using the
>> terminology from the pci-msi.txt binding instead.
>>
>>>> [...]
>>>>
>>>>                  msi-parent = <&its>;
>>
>> Side note: is it right to keep msi-parent here? It rather implies that
>> the MC itself has a 'native' Device ID rather than an ICID, which I
>> believe is not strictly true. Plus it's extra-confusing that it
>> doesn't specify an ID either way, since that makes it look like the
>> legacy PCI case that gets treated implicitly as an identity msi-map,
>> which makes no sense at all to combine with an actual msi-map.
>
> No, it doesn't make sense from a binding perspective.
>
>>>>                  /* define map for ICIDs 23-64 */
>>>>                  iommu-map = <23 &smmu 23 41>;
>>>> +                /* define msi map for ICIDs 23-64 */
>>>> +                msi-map = <23 &its 23 41>;
>>>
>>> Seeing 23 twice is odd. The numbers to the right of 'its' should be
>>> an ITS number space.
>>
>> On about 99% of systems the values in the SMMU Stream ID and ITS
>> Device ID spaces are going to be the same. Nobody's going to bother
>> carrying *two* sets of sideband data across the interconnect if they
>> don't have to ;)
>
> I'm referring to the 23 on the left and right, not between the msi and
> iommu. If the left and right are the same, then what are we remapping
> exactly?

I also insisted a lot on keeping things simple and don't do any kind of
translation but Robin convinced me that this is not such a great idea.
The truth is that the hardware
Re: [PATCH v5 0/5] Nvidia Arm SMMUv2 Implementation
On Thu, May 21, 2020 at 04:31:02PM -0700, Krishna Reddy wrote:
> Changes in v5:
> Rebased on top of
> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
>
> v4 - https://lkml.org/lkml/2019/10/30/1054
> v3 - https://lkml.org/lkml/2019/10/18/1601
> v2 - https://lkml.org/lkml/2019/9/2/980
> v1 - https://lkml.org/lkml/2019/8/29/1588
>
> Krishna Reddy (5):
>   iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage
>   dt-bindings: arm-smmu: Add binding for Tegra194 SMMU
>   iommu/arm-smmu: Add global/context fault implementation hooks

For the record: I don't think we should apply these because we don't
have a good way of testing them. We currently have three problems that
prevent us from enabling SMMU on Tegra194:

  1) If we enable SMMU support, then the DMA API will automatically try
     to use SMMU domains for allocations. This means that translations
     will happen as soon as a device's IOMMU operations are initialized
     and that is typically a long time (in kernel time at least) before
     a driver is bound and has a chance of configuring the device.
     This causes problems for non-quiesced devices like display
     controllers that the bootloader might have set up to scan out a
     boot splash.

     What we're missing here is a way to:

       a) advertise reserved memory regions for boot splash framebuffers
       b) map reserved memory regions early during SMMU setup

     Patches have been floating on the public mailing lists for b) but
     a) requires changes to the bootloader (both proprietary ones and
     U-Boot for SoCs prior to Tegra194).

  2) Even if we don't enable SMMU for a given device (by not hooking up
     the iommus property), with a default kernel configuration we get a
     bunch of faults during boot because the ARM SMMU driver faults by
     default (rather than bypass) for masters which aren't hooked up to
     the SMMU.

     We could work around that by changing the default configuration or
     overriding it on the command-line, but that's not really an option
     because it decreases security and means that Tegra194 won't work
     out-of-the-box.

  3) We don't properly describe the DMA hierarchy, which causes the DMA
     masks to be improperly set. As a bit of background: Tegra194 has a
     special address bit (bit 39) that causes some swizzling to happen
     within the memory controller. As a result, any I/O virtual address
     that has bit 39 set will cause this swizzling to happen on access.
     The DMA/IOMMU allocator always starts allocating from the top of
     the IOVA space, which means that the first couple of gigabytes of
     allocations will cause most devices to fail because of the
     undesired swizzling that occurs.

     We had an initial patch for SDHCI merged that hard-codes the DMA
     mask to DMA_BIT_MASK(39) on Tegra194 to work around that. However,
     the devices all do support addressing 40 bits and the restriction
     on bit 39 is really a property of the bus rather than a capability
     of the device. This means that we would have to work around this
     for every device driver by adding similar hacks. A better option is
     to properly describe the DMA hierarchy (using dma-ranges) because
     that will then automatically be applied as a constraint on each
     device's DMA mask.

     I have been working on patches to address this, but they are fairly
     involved because they require device tree bindings changes and so
     on.

So before we solve all of the above issues we can't really enable SMMU
on Tegra194 and hence won't be able to test it. As such we don't know if
these patches even work, nor can we validate that they continue to work.
As such, I don't think there's any use in applying these patches
upstream since they will be effectively dead code until all of the above
issues are resolved.

>   arm64: tegra: Add DT node for T194 SMMU
>   arm64: tegra: enable SMMU for SDHCI and EQOS on T194

This one is going to cause EQOS to break because of 3) above. It might
work for SDHCI because of the workaround we currently have in that
driver. However, I do have a local patch that reverts the workaround and
replaces it with the proper fix, which uses dma-ranges as mentioned
above.

That said, I have tested earlier versions of this patchset on top of my
local branch with fixes for the above and they do seem to work as
expected. So I'll leave it up to the IOMMU maintainers whether they're
willing to merge the driver patches as is. But I want to clarify that I
won't be applying the DTS patches until we've solved all of the above
issues and therefore it should be clear that these won't be runtime
tested until then.

I expect it will take at least until v5.9-rc1 before we have all the
changes merged that would allow us to enable SMMU support.

Thierry

>  .../devicetree/bindings/iommu/arm,smmu.yaml | 5 +
>  MAINTAINERS | 2 +
>  arch/arm64/boot/dts/nvidia/teg
Re: arm-smmu-v3 high cpu usage for NVMe
On 20/03/2020 10:41, John Garry wrote:

+ Barry, Alexandru

>    PerfTop: 85864 irqs/sec  kernel:89.6%  exact: 0.0% lost: 0/34434 drop: 0/40116 [4000Hz cycles], (all, 96 CPUs)
> --
>    27.43% [kernel] [k] arm_smmu_cmdq_issue_cmdlist
>    11.71% [kernel] [k] _raw_spin_unlock_irqrestore
>     6.35% [kernel] [k] _raw_spin_unlock_irq
>     2.65% [kernel] [k] get_user_pages_fast
>     2.03% [kernel] [k] __slab_free
>     1.55% [kernel] [k] tick_nohz_idle_exit
>     1.47% [kernel] [k] arm_lpae_map
>     1.39% [kernel] [k] __fget
>     1.14% [kernel] [k] __lock_text_start
>     1.09% [kernel] [k] _raw_spin_lock
>     1.08% [kernel] [k] bio_release_pages.part.42
>     1.03% [kernel] [k] __sbitmap_get_word
>     0.97% [kernel] [k] arm_smmu_atc_inv_domain.constprop.42
>     0.91% [kernel] [k] fput_many
>     0.88% [kernel] [k] __arm_lpae_map

Hi Will, Robin,

I'm just getting around to look at this topic again. Here's the current
picture for my NVMe test, from perf top -C 0 [*]:

    Samples: 808 of event 'cycles:ppp', Event count (approx.): 469909024
    Overhead  Shared Object  Symbol
      75.91%  [kernel]  [k] arm_smmu_cmdq_issue_cmdlist
       3.28%  [kernel]  [k] arm_smmu_tlb_inv_range
       2.42%  [kernel]  [k] arm_smmu_atc_inv_domain.constprop.49
       2.35%  [kernel]  [k] _raw_spin_unlock_irqrestore
       1.32%  [kernel]  [k] __arm_smmu_cmdq_poll_set_valid_map.isra.41
       1.20%  [kernel]  [k] aio_complete_rw
       0.96%  [kernel]  [k] enqueue_task_fair
       0.93%  [kernel]  [k] gic_handle_irq
       0.86%  [kernel]  [k] _raw_spin_lock_irqsave
       0.72%  [kernel]  [k] put_reqs_available
       0.72%  [kernel]  [k] sbitmap_queue_clear

[*] only certain CPUs run the DMA unmap for my scenario, cpu0 being one
of them. Colleague Barry has similar findings for some other scenarios.

So we tried the latest perf NMI support wip patches, and noticed a few
hotspots when running some NVMe traffic (see
https://raw.githubusercontent.com/hisilicon/kernel-dev/fee69c8ca3784b9dd3912703cfcd4985a00f6bbb/perf%20annotate
and
https://raw.githubusercontent.com/hisilicon/kernel-dev/fee69c8ca3784b9dd3912703cfcd4985a00f6bbb/report.txt):

- the initial cmpxchg to get a place in the queue; when more CPUs get
  involved, we start failing at an exponential rate:

     0.00 :   8000107a3500: cas   x4, x2, [x27]
    26.52 :   8000107a3504: mov   x0, x4
          : arm_smmu_cmdq_issue_cmdlist():

- the queue locking
- polling cmd_sync

Some ideas to optimise:

a. The initial cmpxchg.

   This cmpxchg could be considered unfair. In addition, with all the
   contention on arm_smmu_cmdq.q, that cacheline would be constantly
   pinged around the system. Maybe we can implement something similar to
   the idea of queued/ticketed spinlocks, making a CPU spin on its own
   copy of arm_smmu_cmdq.q after the initial cmpxchg fails, released by
   its leader, and releasing subsequent followers.

b. Drop the queue_full checking in certain circumstances.

   If we cannot theoretically fill the queue, then stop the checking for
   queue full or similar. This should also help the current problem of
   a.: the less time between cmpxchg attempts, the less chance of
   failing (as we check queue available space between cmpxchg attempts).

   So if cmdq depth > nr_available_cpus * (max batch size + 1) AND we
   always issue a cmd_sync for a batch (regardless of whether
   requested), then we should never fill the queue (I think).

c. Don't do queue locking in certain circumstances.

   If we implement (and support) b. and support MSI polling, then I
   don't think that this is required.

d. More minor ideas: move forward when the "owner" stops gathering to
   reduce the time spent advancing the prod, hopefully reducing cmd_sync
   polling time; and use a smaller word size for the valid bitmap
   operations, as maybe 32b atomic operations are overall more efficient
   than 64b ones - the valid range checked is mostly < 16 bits from my
   observation.

Let me know your thoughts or any other ideas.

Thanks,
John
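Idea b. above rests on a capacity argument that is easy to sanity-check: if
every CPU can hold at most (max batch size + 1) slots at once (its batch plus
one CMD_SYNC), worst-case occupancy is nr_cpus * (batch + 1), so the queue can
never fill once its depth exceeds that. A tiny Python check of the arithmetic
(the depth, CPU count, and batch size below are illustrative numbers, not
taken from any particular system):

```python
def worst_case_occupancy(nr_cpus, max_batch):
    # Every CPU simultaneously holds a full batch plus one CMD_SYNC.
    return nr_cpus * (max_batch + 1)


def queue_can_never_fill(depth, nr_cpus, max_batch):
    # The full check can be dropped when even the worst case fits.
    return depth > worst_case_occupancy(nr_cpus, max_batch)


depth, nr_cpus, max_batch = 1 << 16, 96, 64
print(worst_case_occupancy(nr_cpus, max_batch))        # 6240
print(queue_can_never_fill(depth, nr_cpus, max_batch)) # True: 6240 < 65536
```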
Re: [PATCH 09/15] device core: Add ability to handle multiple dma offsets
Hi Nicolas,

On Wed, May 20, 2020 at 7:28 AM Nicolas Saenz Julienne wrote:
>
> Hi Jim,
> thanks for having a go at this! My two cents.
>
> On Tue, 2020-05-19 at 16:34 -0400, Jim Quinlan wrote:
> > The device variable 'dma_pfn_offset' is used to do a single
> > linear map between cpu addrs and dma addrs. The variable
> > 'dma_map' is added to struct device to point to an array
> > of multiple offsets which is required for some devices.
> >
> > Signed-off-by: Jim Quinlan
> > ---
>
> [...]
>
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -493,6 +493,8 @@ struct dev_links_info {
> >   * @bus_dma_limit: Limit of an upstream bridge or bus which imposes a smaller
> >   *   DMA limit than the device itself supports.
> >   * @dma_pfn_offset: offset of DMA memory range relatively of RAM
> > + * @dma_map: Like dma_pfn_offset but used when there are multiple
> > + *   pfn offsets for multiple dma-ranges.
> >   * @dma_parms: A low level driver may set these to teach IOMMU code about
> >   *   segment limitations.
> >   * @dma_pools: Dma pools (if dma'ble device).
> > @@ -578,7 +580,12 @@ struct device {
> >              allocations such descriptors. */
> >     u64 bus_dma_limit; /* upstream dma constraint */
> >     unsigned long dma_pfn_offset;
> > -
> > +#ifdef CONFIG_DMA_PFN_OFFSET_MAP
> > +   const void *dma_offset_map; /* Like dma_pfn_offset, but for
> > +                                * the unlikely case of multiple
> > +                                * offsets. If non-null, dma_pfn_offset
> > +                                * will be 0. */
>
> I get a bad feeling about separating the DMA offset handling into two
> distinct variables. Albeit generally frowned upon, there is a fair
> amount of trickery around dev->dma_pfn_offset all over the kernel.
> usb_alloc_dev() comes to mind for example. And this obviously doesn't
> play well with it.

The trickery should only be present when CONFIG_DMA_PFN_OFFSET_MAP=y**.
Otherwise it does no harm. Also, I feel that if dev->dma_pfn_offset is
valid then so is dev->dma_pfn_offset_map -- they both use the same
mechanism in the same places. I am merely extending something that has
been in Linux for a long time. Further, I could have had
dma_pfn_offset_map subsume dma_pfn_offset, but I wanted to leave it
alone since folks would complain that it would go from an addition to an
if-clause and an inline function. But if I did go that way there would
only be one mechanism covering both cases.

> I feel a potential solution to multiple DMA ranges should completely
> integrate with the current device DMA handling code, without special
> cases, on top of that, be transparent to the user.

Having dma_pfn_offset_map subsume dma_pfn_offset would integrate with
the current code too. And I am not sure what you mean by being
"transparent to the user" -- the writer of the PCIe endpoint driver is
going to do some DMA calls and they have no idea if this mechanism is in
play or not.

> In more concrete terms, I'd repackage dev->bus_dma_limit and
> dev->dma_pfn_offset into a list/array of DMA range structures

This is sort of what I am doing, except I defined my own structure.
Using the of_range structure would require one to do the same extra
calculations over and over for a DMA call; this is why I defined my
structure that has all of the needed precomputed variables.

> and adapt/create the relevant getter/setter functions so as for DMA
> users not to have to worry about the specifics of a device's DMA
> constraints.

I'm not sure I understand where these getter/setter functions would
exist or what they would do.

> editing dev->dma_pfn_offset, you'd be passing a DMA range structure to
> the device core, and let it take the relevant decisions on how to
> handle it

How and where would the device core operate for these getters/setters?
In how many places in the code? The way I see it, any solution has to
adjust the value when doing dma2phys and phys2dma conversions, and the
most efficient place to do that is in the two DMA header files (the
second one is for ARM).

> internally (overwrite, add a new entry, merge them, etc...).

I'm concerned that this would be overkill; I am just trying to get a
driver upstream for some baroque PCIe RC HW. I'm not sure if we should
set up something elaborate when the demand is not there.

I'll be posting a v2. Christoph H. has sent me some personal feedback
which I am incorporating; so feel free to discuss your ideas with him as
well, because I really want consensus on any large changes in direction.

Thanks,
Jim

** CONFIG_DMA_OF_PFN_OFFSET_MAP=y only occurs when building for
ARCH_BRCMSTB. However, ARCH_BRCMSTB is set by the ARM64 defconfig and
the ARM multi_v7_defconfig, so it would be activated for those
defconfigs. This may (a) get us kicked out of those defconfigs or (b) we
may have to keep
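The "list/array of DMA range structures" being debated above can be sketched
as a phys-to-dma lookup over multiple dma-ranges-style entries. This is an
illustrative Python model only; the structure, its field names, and the
example ranges are invented for the sketch and are not the proposed kernel
API:

```python
from collections import namedtuple

# One entry per dma-ranges tuple: CPU (physical) base, device (DMA) base, size.
DmaRange = namedtuple("DmaRange", ["cpu_start", "dma_start", "size"])


def phys_to_dma(ranges, paddr):
    """Translate a CPU physical address to a bus/DMA address, picking the
    per-range offset instead of applying one global dma_pfn_offset."""
    for r in ranges:
        if r.cpu_start <= paddr < r.cpu_start + r.size:
            return paddr - r.cpu_start + r.dma_start
    raise ValueError("address not covered by any dma-ranges entry")


ranges = [
    DmaRange(cpu_start=0x0_0000_0000, dma_start=0x0000_0000, size=0x4000_0000),
    DmaRange(cpu_start=0x1_0000_0000, dma_start=0x4000_0000, size=0x4000_0000),
]
print(hex(phys_to_dma(ranges, 0x1_0000_1000)))  # 0x40001000
```

The point of contention is where this loop lives: in the dma2phys/phys2dma
header helpers (Jim's position) or behind getter/setter functions in the
device core (Nicolas's suggestion).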
Re: [PATCH 09/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
On 2020-05-22 15:08, Rob Herring wrote:
> On Fri, May 22, 2020 at 3:57 AM Diana Craciun OSS wrote:
>> On 5/22/2020 12:42 PM, Robin Murphy wrote:
>>> On 2020-05-22 00:10, Rob Herring wrote:
>>>> On Thu, May 21, 2020 at 7:00 AM Lorenzo Pieralisi wrote:
>>>>>
>>>>> From: Laurentiu Tudor
>>>>>
>>>>> The existing bindings cannot be used to specify the relationship
>>>>> between fsl-mc devices and GIC ITSes.
>>>>>
>>>>> Add a generic binding for mapping fsl-mc devices to GIC ITSes,
>>>>> using msi-map property.
>>>>>
>>>>> [...]
>>>>>
>>>>>                  msi-parent = <&its>;
>>>
>>> Side note: is it right to keep msi-parent here? It rather implies
>>> that the MC itself has a 'native' Device ID rather than an ICID,
>>> which I believe is not strictly true. Plus it's extra-confusing that
>>> it doesn't specify an ID either way, since that makes it look like
>>> the legacy PCI case that gets treated implicitly as an identity
>>> msi-map, which makes no sense at all to combine with an actual
>>> msi-map.
>>
>> Before adding msi-map, the fsl-mc code assumed that ICID and streamID
>> are equal and used msi-parent just to get the reference to the ITS
>> node. Removing msi-parent will break the backward compatibility of
>> the already existing systems. Maybe we should mention that this is
>> legacy and not to be used for newer device trees.
>
> If ids are 1:1, then the DT should use msi-parent. If there is
> remapping, then use msi-map. A given system should use one or the
> other. I suppose if some ids are 1:1 and the msi-map was added to add
> additional support for ids not 1:1, then you could end up with both.
> That's fine in dts files, but examples should reflect the 'right' way.

Is that defined anywhere? The generic MSI binding just has some weaselly
wording about buses:

"When #msi-cells is non-zero, busses with an msi-parent will require
additional properties to describe the relationship between devices on
the bus and the set of MSIs they can potentially generate."

which appears at odds with its own definition of msi-parent as including
an msi-specifier (or at best very unclear about what value that
specifier should take in this case). The PCI MSI binding goes even
further and specifically reserves msi-parent for cases where there is no
sideband data. As far as I'm aware, the fact that the IT
Re: [PATCH 09/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
On Fri, May 22, 2020 at 3:57 AM Diana Craciun OSS wrote: > > On 5/22/2020 12:42 PM, Robin Murphy wrote: > > On 2020-05-22 00:10, Rob Herring wrote: > >> On Thu, May 21, 2020 at 7:00 AM Lorenzo Pieralisi > >> wrote: > >>> > >>> From: Laurentiu Tudor > >>> > >>> The existing bindings cannot be used to specify the relationship > >>> between fsl-mc devices and GIC ITSes. > >>> > >>> Add a generic binding for mapping fsl-mc devices to GIC ITSes, using > >>> msi-map property. > >>> > >>> Signed-off-by: Laurentiu Tudor > >>> Cc: Rob Herring > >>> --- > >>> .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 30 > >>> +-- > >>> 1 file changed, 27 insertions(+), 3 deletions(-) > >>> > >>> diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt > >>> b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt > >>> index 9134e9bcca56..b0813b2d0493 100644 > >>> --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt > >>> +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt > >>> @@ -18,9 +18,9 @@ same hardware "isolation context" and a 10-bit > >>> value called an ICID > >>> the requester. > >>> > >>> The generic 'iommus' property is insufficient to describe the > >>> relationship > >>> -between ICIDs and IOMMUs, so an iommu-map property is used to define > >>> -the set of possible ICIDs under a root DPRC and how they map to > >>> -an IOMMU. > >>> +between ICIDs and IOMMUs, so the iommu-map and msi-map properties > >>> are used > >>> +to define the set of possible ICIDs under a root DPRC and how they > >>> map to > >>> +an IOMMU and a GIC ITS respectively. > >>> > >>> For generic IOMMU bindings, see > >>> Documentation/devicetree/bindings/iommu/iommu.txt. > >>> @@ -28,6 +28,9 @@ Documentation/devicetree/bindings/iommu/iommu.txt. > >>> For arm-smmu binding, see: > >>> Documentation/devicetree/bindings/iommu/arm,smmu.yaml. > >>> > >>> +For GICv3 and GIC ITS bindings, see: > >>> +Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml. 
> >>> > >>> + > >>> Required properties: > >>> > >>> - compatible > >>> @@ -119,6 +122,15 @@ Optional properties: > >>> associated with the listed IOMMU, with the iommu-specifier > >>> (i - icid-base + iommu-base). > >>> > >>> +- msi-map: Maps an ICID to a GIC ITS and associated iommu-specifier > >>> + data. > >>> + > >>> + The property is an arbitrary number of tuples of > >>> + (icid-base,iommu,iommu-base,length). > >> > >> I'm confused because the example has GIC ITS phandle, not an IOMMU. > >> > >> What is an iommu-base? > > > > Right, I was already halfway through writing a reply to say that all > > the copy-pasted "iommu" references here should be using the > > terminology from the pci-msi.txt binding instead. > > Right, will change it. > > > > >>> + > >>> + Any ICID in the interval [icid-base, icid-base + length) is > >>> + associated with the listed GIC ITS, with the iommu-specifier > >>> + (i - icid-base + iommu-base). > >>> Example: > >>> > >>> smmu: iommu@500 { > >>> @@ -128,6 +140,16 @@ Example: > >>> ... > >>> }; > >>> > >>> + gic: interrupt-controller@600 { > >>> + compatible = "arm,gic-v3"; > >>> + ... > >>> + its: gic-its@602 { > >>> + compatible = "arm,gic-v3-its"; > >>> + msi-controller; > >>> + ... > >>> + }; > >>> + }; > >>> + > >>> fsl_mc: fsl-mc@80c00 { > >>> compatible = "fsl,qoriq-mc"; > >>> reg = <0x0008 0x0c00 0 0x40>,/* MC > >>> portal base */ > >>> @@ -135,6 +157,8 @@ Example: > >>> msi-parent = <&its>; > > > > Side note: is it right to keep msi-parent here? It rather implies that > > the MC itself has a 'native' Device ID rather than an ICID, which I > > believe is not strictly true. Plus it's extra-confusing that it > > doesn't specify an ID either way, since that makes it look like the > > legacy PCI case that gets treated implicitly as an identity msi-map, > > which makes no sense at all to combine with an actual msi-map. 
> > Before adding msi-map, the fsl-mc code assumed that ICID and streamID > are equal and used msi-parent just to get the reference to the ITS node. > Removing msi-parent will break the backward compatibility of the already > existing systems. Maybe we should mention that this is legacy and not to > be used for newer device trees. If ids are 1:1, then the DT should use msi-parent. If there is remapping, then use msi-map. A given system should use one or the other. I suppose if some ids are 1:1 and the msi-map was added to add additional support for ids not 1:1, then you could end up with both. That's fine in dts files, but examples should reflect the 'right' way. Rob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linu
Re: [PATCH 09/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
On Fri, May 22, 2020 at 3:42 AM Robin Murphy wrote: > > On 2020-05-22 00:10, Rob Herring wrote: > > On Thu, May 21, 2020 at 7:00 AM Lorenzo Pieralisi > > wrote: > >> > >> From: Laurentiu Tudor > >> > >> The existing bindings cannot be used to specify the relationship > >> between fsl-mc devices and GIC ITSes. > >> > >> Add a generic binding for mapping fsl-mc devices to GIC ITSes, using > >> msi-map property. > >> > >> Signed-off-by: Laurentiu Tudor > >> Cc: Rob Herring > >> --- > >> .../devicetree/bindings/misc/fsl,qoriq-mc.txt | 30 +-- > >> 1 file changed, 27 insertions(+), 3 deletions(-) > >> > >> diff --git a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt > >> b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt > >> index 9134e9bcca56..b0813b2d0493 100644 > >> --- a/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt > >> +++ b/Documentation/devicetree/bindings/misc/fsl,qoriq-mc.txt > >> @@ -18,9 +18,9 @@ same hardware "isolation context" and a 10-bit value > >> called an ICID > >> the requester. > >> > >> The generic 'iommus' property is insufficient to describe the > >> relationship > >> -between ICIDs and IOMMUs, so an iommu-map property is used to define > >> -the set of possible ICIDs under a root DPRC and how they map to > >> -an IOMMU. > >> +between ICIDs and IOMMUs, so the iommu-map and msi-map properties are used > >> +to define the set of possible ICIDs under a root DPRC and how they map to > >> +an IOMMU and a GIC ITS respectively. > >> > >> For generic IOMMU bindings, see > >> Documentation/devicetree/bindings/iommu/iommu.txt. > >> @@ -28,6 +28,9 @@ Documentation/devicetree/bindings/iommu/iommu.txt. > >> For arm-smmu binding, see: > >> Documentation/devicetree/bindings/iommu/arm,smmu.yaml. > >> > >> +For GICv3 and GIC ITS bindings, see: > >> +Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml. 
> >> + > >> Required properties: > >> > >> - compatible > >> @@ -119,6 +122,15 @@ Optional properties: > >> associated with the listed IOMMU, with the iommu-specifier > >> (i - icid-base + iommu-base). > >> > >> +- msi-map: Maps an ICID to a GIC ITS and associated iommu-specifier > >> + data. > >> + > >> + The property is an arbitrary number of tuples of > >> + (icid-base,iommu,iommu-base,length). > > > > I'm confused because the example has GIC ITS phandle, not an IOMMU. > > > > What is an iommu-base? > > Right, I was already halfway through writing a reply to say that all the > copy-pasted "iommu" references here should be using the terminology from > the pci-msi.txt binding instead. > > >> + > >> + Any ICID in the interval [icid-base, icid-base + length) is > >> + associated with the listed GIC ITS, with the iommu-specifier > >> + (i - icid-base + iommu-base). > >> Example: > >> > >> smmu: iommu@500 { > >> @@ -128,6 +140,16 @@ Example: > >> ... > >> }; > >> > >> + gic: interrupt-controller@600 { > >> + compatible = "arm,gic-v3"; > >> + ... > >> + its: gic-its@602 { > >> + compatible = "arm,gic-v3-its"; > >> + msi-controller; > >> + ... > >> + }; > >> + }; > >> + > >> fsl_mc: fsl-mc@80c00 { > >> compatible = "fsl,qoriq-mc"; > >> reg = <0x0008 0x0c00 0 0x40>,/* MC portal > >> base */ > >> @@ -135,6 +157,8 @@ Example: > >> msi-parent = <&its>; > > Side note: is it right to keep msi-parent here? It rather implies that > the MC itself has a 'native' Device ID rather than an ICID, which I > believe is not strictly true. Plus it's extra-confusing that it doesn't > specify an ID either way, since that makes it look like the legacy PCI > case that gets treated implicitly as an identity msi-map, which makes no > sense at all to combine with an actual msi-map. No, it doesn't make sense from a binding perspective. 
>
> >> /* define map for ICIDs 23-64 */
> >> iommu-map = <23 &smmu 23 41>;
> >> +/* define msi map for ICIDs 23-64 */
> >> +msi-map = <23 &its 23 41>;
> >
> > Seeing 23 twice is odd. The numbers to the right of 'its' should be an
> > ITS number space.
>
> On about 99% of systems the values in the SMMU Stream ID and ITS Device
> ID spaces are going to be the same. Nobody's going to bother carrying
> *two* sets of sideband data across the interconnect if they don't have to ;)

I'm referring to the 23 on the left and right, not between the msi and iommu.
If the left and right are the same, then what are we remapping exactly?

Rob
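For contrast, a non-identity msi-map would make the remapping visible. The fragment below is purely illustrative — the ICID range and the output Device ID base (128) are invented values, not taken from any real board:

```dts
fsl_mc: fsl-mc@80c00 {
        ...
        /* ICID i maps to SMMU stream ID i (identity) */
        iommu-map = <23 &smmu 23 41>;
        /* ICID i maps to ITS Device ID (i - 23 + 128), i.e. 128-168 */
        msi-map = <23 &its 128 41>;
};
```

Here the left-hand 23 is in the ICID (input) space and the 128 is in the ITS Device ID (output) space, so the two number spaces are clearly distinct.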
[PATCH] iommu: Fix group refcount in iommu_alloc_default_domain()
Since the change to move default domain allocation to probe, there is a refcount decrement missing for the group in iommu_alloc_default_domain(). Because of this missing decrement, the device is never released from the group, as the devices_kobj refcount never reaches 0 in iommu_group_remove_device(), leading to a lot of issues.

One such case: this leads to a different group allocation on every reload of a module that configures the IOMMU, such as the ath10k module, which then finally fails to attach the device to the SMMU with an -ENOSPC error in __arm_smmu_alloc_bitmap() once the number of module reloads crosses the number of context banks. This then leads to a NULL pointer dereference on the next reload of the module.

Add the missing refcount decrement (iommu_group_put()) in iommu_alloc_default_domain() to fix this issue.

Call trace:
...
platform wifi-firmware.0: Adding to iommu group 82
ath10k_snoc 1880.wifi: could not attach device: -28
platform wifi-firmware.0: Removing from iommu group 82
ath10k_snoc 1880.wifi: failed to initialize firmware: -28
ath10k_snoc: probe of 1880.wifi failed with error -28
platform wifi-firmware.0: Adding to iommu group 83
Unable to handle kernel NULL pointer dereference at virtual address
Mem abort info:
  ESR = 0x9606
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x0006
  CM = 0, WnR = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=000177a53000
[] pgd=0001e74f5003, pud=0001e74f5003, pmd=
Internal error: Oops: 9606 [#1] PREEMPT SMP
pstate: 6049 (nZCv daif +PAN -UAO)
 arm_smmu_flush_iotlb_all+0x20/0x6c
 iommu_create_device_direct_mappings+0x17c/0x1d8
 iommu_probe_device+0xc0/0x100
 of_iommu_configure+0x108/0x240
 of_dma_configure+0x130/0x1d0
 ath10k_fw_init+0xc4/0x1c4 [ath10k_snoc]
 ath10k_snoc_probe+0x5cc/0x678 [ath10k_snoc]
 platform_drv_probe+0x90/0xb0
 really_probe+0x134/0x2ec
 driver_probe_device+0x64/0xfc
 device_driver_attach+0x4c/0x6c
 __driver_attach+0xac/0xc0
 bus_for_each_dev+0x8c/0xd4
 driver_attach+0x2c/0x38
 bus_add_driver+0xfc/0x1d0
 driver_register+0x64/0xf8
 __platform_driver_register+0x4c/0x58
 init_module+0x20/0x1000 [ath10k_snoc]
 do_one_initcall+0x13c/0x2d0
 do_init_module+0x58/0x1dc
 load_module+0xde0/0xf10
 __arm64_sys_finit_module+0xb0/0xe0
 el0_svc_common+0xa4/0x154
 el0_svc_compat_handler+0x2c/0x38
 el0_svc_compat+0x8/0x10
Code: d503201f f85b8268 b4000248 f8560e74 (f9400280)
---[ end trace e5c1470a584952a0 ]---
Kernel panic - not syncing: Fatal exception

Signed-off-by: Sai Prakash Ranjan
---
 drivers/iommu/iommu.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index a4c2f122eb8b..05f7b77c432f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1491,6 +1491,7 @@ static int iommu_alloc_default_domain(struct device *dev)
 {
 	struct iommu_group *group;
 	unsigned int type;
+	int ret;
 
 	group = iommu_group_get(dev);
 	if (!group)
@@ -1501,7 +1502,11 @@ static int iommu_alloc_default_domain(struct device *dev)
 
 	type = iommu_get_def_domain_type(dev);
 
-	return iommu_group_alloc_default_domain(dev->bus, group, type);
+	ret = iommu_group_alloc_default_domain(dev->bus, group, type);
+
+	iommu_group_put(group);
+
+	return ret;
 }
 
 /**
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v5 39/38] drm: xen: fix common struct sg_table related issues
Documentation/DMA-API-HOWTO.txt states that the dma_map_sg() function returns the number of entries created in the DMA address space. However, the subsequent calls to dma_sync_sg_for_{device,cpu}() and dma_unmap_sg() must be made with the original number of entries passed to dma_map_sg().

struct sg_table is a common structure used for describing a non-contiguous memory buffer, used commonly in the DRM and graphics subsystems. It consists of a scatterlist with memory pages and DMA addresses (the sgl entry), as well as two scatterlist entry counts: CPU pages (the orig_nents entry) and DMA-mapped pages (the nents entry). It turned out to be a common mistake to misuse the nents and orig_nents entries, calling DMA-mapping functions with the wrong number of entries or ignoring the number of mapped entries returned by dma_map_sg().

Fix the code to refer to the proper nents or orig_nents entries. This driver reports the number of pages in the imported scatterlist, so it should refer to sg_table->orig_nents.

Signed-off-by: Marek Szyprowski
Acked-by: Oleksandr Andrushchenko
---
For more information, see the '[PATCH v5 00/38] DRM: fix struct sg_table nents vs. orig_nents misuse' thread: https://lore.kernel.org/linux-iommu/20200513132114.6046-1-m.szyprow...@samsung.com/T/

This patch has been resurrected at Oleksandr Andrushchenko's request. It was part of the v2 patchset, but I then dropped it as it only fixes the debug output, so I didn't consider it critical.
---
 drivers/gpu/drm/xen/xen_drm_front_gem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c b/drivers/gpu/drm/xen/xen_drm_front_gem.c
index f0b85e0..ba4bdc5 100644
--- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
+++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
@@ -215,7 +215,7 @@ struct drm_gem_object *
 		return ERR_PTR(ret);
 
 	DRM_DEBUG("Imported buffer of size %zu with nents %u\n",
-		  size, sgt->nents);
+		  size, sgt->orig_nents);
 
 	return &xen_obj->base;
 }
--
1.9.1
Re: [PATCH v7 13/24] iommu/arm-smmu-v3: Enable broadcast TLB maintenance
[+Eric] On Thu, May 21, 2020 at 03:38:35PM +0100, Marc Zyngier wrote: > On 2020-05-21 15:17, Will Deacon wrote: > > [+Marc] > > > > On Tue, May 19, 2020 at 07:54:51PM +0200, Jean-Philippe Brucker wrote: > > > The SMMUv3 can handle invalidation targeted at TLB entries with shared > > > ASIDs. If the implementation supports broadcast TLB maintenance, > > > enable it > > > and keep track of it in a feature bit. The SMMU will then be > > > affected by > > > inner-shareable TLB invalidations from other agents. > > > > > > A major side-effect of this change is that stage-2 translation > > > contexts > > > are now affected by all invalidations by VMID. VMIDs are all shared > > > and > > > the only ways to prevent over-invalidation, since the stage-2 page > > > tables > > > are not shared between CPU and SMMU, are to either disable BTM or > > > allocate > > > different VMIDs. This patch does not address the problem. > > > > This sounds like a potential performance issue, particularly as we > > expose > > stage-2 contexts via VFIO directly. Yes it's certainly going to affect SMMU performance, though I haven't measured it. QEMU and kvmtool currently use stage-1 translations instead of stage-2, so it won't be a problem until they start using nested translation (and unless the SMMU only supports stage-2). In the coming month I'd like to have a look at coordinating VMID allocation between KVM and SMMU, for guest SVA. If the guest wants to share page tables with the SMMU, the SMMU has to use the same VMIDs as the VM to receive broadcast TLBI. Similarly to patch 06 ("arm64: mm: Pin down ASIDs for sharing mm with devices") the SMMU would request a VMID allocated by KVM, when setting up a nesting VFIO container. One major downside is that the VMID is pinned and cannot be recycled on rollover while it's being used for DMA. I wonder if we could use this even when page tables aren't shared between CPU and SMMU, to avoid splitting the VMID space. 
> > Maybe we could reserve some portion of VMID space for the SMMU? Marc,
> > what do you reckon?
>
> Certainly doable when we have 16-bit VMIDs. With smaller VMID spaces
> (like on v8.0), this is a bit more difficult (we do have pretty large
> v8.0 systems around).

It's only an issue if those systems have an SMMUv3 supporting DVM. With any
luck that doesn't exist?

> How many VMID bits are we talking about?

That's anyone's guess... One passed-through device per VM would halve the
VMID space. But the SMMU allocates one VMID for each device assigned to a
guest, not one per VM (well, one per domain, or VFIO container, but I think
it boils down to one per device with QEMU). So with SR-IOV, for example, it
should be pretty easy to reach 256 VMIDs in the SMMU.

Thanks,
Jean
Re: [PATCH 12/12] bus: fsl-mc: Add ACPI support for fsl-mc
On Fri, May 22, 2020 at 05:32:02AM +, Makarand Pawagi wrote: [...] > > Subject: Re: [PATCH 12/12] bus: fsl-mc: Add ACPI support for fsl-mc > > > > Hi Lorenzo, > > > > On 5/21/2020 4:00 PM, Lorenzo Pieralisi wrote: > > > From: Diana Craciun > > > > > > Add ACPI support in the fsl-mc driver. Driver parses MC DSDT table to > > > extract memory and other resources. > > > > > > Interrupt (GIC ITS) information is extracted from the MADT table by > > > drivers/irqchip/irq-gic-v3-its-fsl-mc-msi.c. > > > > > > IORT table is parsed to configure DMA. > > > > > > Signed-off-by: Makarand Pawagi > > > Signed-off-by: Diana Craciun > > > Signed-off-by: Laurentiu Tudor > > > --- > > > > The author of this patch should be Makarand. I think I accidentaly broke it > > when > > we exchanged the patches. Very sorry about it. > > > > Will you be able to correct this or should I post another patch? I will update it. Lorenzo ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/dma: limit iova free size to unmmaped iova
On 2020-05-22 07:25, gup...@codeaurora.org wrote:

On 2020-05-22 01:46, Robin Murphy wrote:

On 2020-05-21 12:30, Prakash Gupta wrote:

Limit the iova size while freeing, based on the unmapped size. In the absence of this, even on unmap failure an invalid iova is pushed to the iova rcache and can subsequently cause a panic when the rcache magazine is freed.

Can you elaborate on that panic?

We have seen a couple of stability issues around this. Below is one such example:

kernel BUG at kernel/msm-4.19/drivers/iommu/iova.c:904!
 iova_magazine_free_pfns
 iova_rcache_insert
 free_iova_fast
 __iommu_unmap_page
 iommu_dma_unmap_page

OK, so it's not some NULL dereference or anything completely unexpected, that's good.

It turned out an iova pfn 0 got into the iova_rcache. One possibility I see is a client calling unmap with an invalid dma_addr. The unmap call will fail, warn, and still try to free the iova. This causes the invalid pfn to be inserted into the rcache. When the magazine holding the invalid pfn is eventually freed, private_find_iova() returns NULL for the invalid iova and hits the BUG condition.

That would indeed be a bug in whatever driver made the offending dma_unmap call.

Signed-off-by: Prakash Gupta

:100644 100644 4959f5df21bd 098f7d377e04 M	drivers/iommu/dma-iommu.c

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..098f7d377e04 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -472,7 +472,8 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
 
 	if (!cookie->fq_domain)
 		iommu_tlb_sync(domain, &iotlb_gather);
-	iommu_dma_free_iova(cookie, dma_addr, size);
+	if (unmapped)
+		iommu_dma_free_iova(cookie, dma_addr, unmapped);

Frankly, if any part of the unmap fails then things have gone catastrophically wrong already, but either way this isn't right.
The IOVA API doesn't support partial freeing - an IOVA *must* be freed with its original size, or not freed at all, otherwise it will corrupt the state of the rcaches and risk a cascade of further misbehaviour for future callers.

I agree, we shouldn't be freeing a partial iova. Instead, just making sure the unmap was successful before freeing the iova should be sufficient. So the change could instead be something like this:

-	iommu_dma_free_iova(cookie, dma_addr, size);
+	if (unmapped)
+		iommu_dma_free_iova(cookie, dma_addr, size);

TBH my gut feeling here is that you're really just trying to treat a symptom of another bug elsewhere, namely some driver calling dma_unmap_* or dma_free_* with the wrong address or size in the first place.

This condition would arise only if a driver calls dma_unmap/free_* with a 0 iova_pfn. That is flagged with a warning during unmap, but triggers a panic later on while doing an unrelated dma_map/unmap_*. If the unmap has already failed for an invalid iova, there is no reason we should consider it a valid iova and free it. This part should be fixed.

I disagree. In general, if drivers call the DMA API incorrectly it is liable to lead to data loss, memory corruption, and various other unpleasant misbehaviour - it is not the DMA layer's job to attempt to paper over driver bugs.

There *is* an argument for downgrading the BUG_ON() in iova_magazine_free_pfns() to a WARN_ON(), since frankly it isn't a sufficiently serious condition to justify killing the whole machine immediately, but NAK to bodging the iommu-dma mid-layer to "fix" that. A serious bug already happened elsewhere, so trying to hide the fallout really doesn't help anyone.

Robin.

On 2020-05-22 00:19, Andrew Morton wrote:

I think we need a cc:stable here?

Added now.

Thanks,
Prakash