Re: [PATCH 2/2] dt-bindings: iommu: ipmmu-vmsa: Add r8a774a1 support
On Fri, 17 Aug 2018 15:31:05 +0100, Fabrizio Castro wrote:
> Document RZ/G2M (R8A774A1) SoC bindings.
>
> Signed-off-by: Fabrizio Castro
> Reviewed-by: Biju Das
> ---
> This patch applies on top of next-20180817
>
>  Documentation/devicetree/bindings/iommu/renesas,ipmmu-vmsa.txt | 1 +
>  1 file changed, 1 insertion(+)
>

Reviewed-by: Rob Herring
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 06/19] dt-bindings: memory: tegra: Squash tegra20-gart into tegra20-mc
On 20.08.2018 22:27, Dmitry Osipenko wrote:
> On 20.08.2018 22:12, Rob Herring wrote:
>> On Sat, Aug 18, 2018 at 06:54:17PM +0300, Dmitry Osipenko wrote:
>>> Splitting GART and Memory Controller wasn't a good decision that was made
>>> back in the day. Given that the GART driver hasn't ever been used by
>>> anything in the kernel, we decided that it will be better to correct the
>>> mistakes of the past and merge two bindings into a single one. In a result
>>
>> As a result...
>>
>>> there is a DT ABI change for the Memory Controller that allows not to
>>> break newer kernels using older DT by introducing a new required property,
>>> the memory clock. Adding the new clock property also puts the tegra20-mc
>>> binding in line with the bindings of the later Tegra generations.
>>
>> I don't understand this part. It looks to me like you are breaking
>> compatibility. The driver failing to probe with an old DT is okay?
>
> Yes, DT compatibility is broken. New driver won't probe/load with the old DT,
> that's what we want.
>
>> OS's like OpenSUSE use new DTs with older kernel versions, so you should
>> consider how to not break them as well. I guess if all this is optional
>> or has been unused, then there shouldn't be a problem.
>
> That's interesting.. Memory Controller isn't optional, I guess we could change
> compatible to "nvidia,tegra20-mc-gart".

* I meant it's not optional in the sense that it's enabled in the kernel
config by default and the driver is functional, but it's okay if the MC
driver stops probing with older kernels, as it is used only for reporting
memory errors.
Re: [PATCH v3 06/19] dt-bindings: memory: tegra: Squash tegra20-gart into tegra20-mc
On 20.08.2018 22:12, Rob Herring wrote:
> On Sat, Aug 18, 2018 at 06:54:17PM +0300, Dmitry Osipenko wrote:
>> Splitting GART and Memory Controller wasn't a good decision that was made
>> back in the day. Given that the GART driver hasn't ever been used by
>> anything in the kernel, we decided that it will be better to correct the
>> mistakes of the past and merge two bindings into a single one. In a result
>
> As a result...
>
>> there is a DT ABI change for the Memory Controller that allows not to
>> break newer kernels using older DT by introducing a new required property,
>> the memory clock. Adding the new clock property also puts the tegra20-mc
>> binding in line with the bindings of the later Tegra generations.
>
> I don't understand this part. It looks to me like you are breaking
> compatibility. The driver failing to probe with an old DT is okay?

Yes, DT compatibility is broken. New driver won't probe/load with the old DT,
that's what we want.

> OS's like OpenSUSE use new DTs with older kernel versions, so you should
> consider how to not break them as well. I guess if all this is optional
> or has been unused, then there shouldn't be a problem.

That's interesting.. Memory Controller isn't optional, I guess we could change
compatible to "nvidia,tegra20-mc-gart".

Thierry, do you have any other suggestions?
Re: [PATCH v3 06/19] dt-bindings: memory: tegra: Squash tegra20-gart into tegra20-mc
On Sat, Aug 18, 2018 at 06:54:17PM +0300, Dmitry Osipenko wrote:
> Splitting GART and Memory Controller wasn't a good decision that was made
> back in the day. Given that the GART driver hasn't ever been used by
> anything in the kernel, we decided that it will be better to correct the
> mistakes of the past and merge two bindings into a single one. In a result

As a result...

> there is a DT ABI change for the Memory Controller that allows not to
> break newer kernels using older DT by introducing a new required property,
> the memory clock. Adding the new clock property also puts the tegra20-mc
> binding in line with the bindings of the later Tegra generations.

I don't understand this part. It looks to me like you are breaking
compatibility. The driver failing to probe with an old DT is okay?

OS's like OpenSUSE use new DTs with older kernel versions, so you should
consider how to not break them as well. I guess if all this is optional
or has been unused, then there shouldn't be a problem.

> Signed-off-by: Dmitry Osipenko
> ---
>  .../bindings/iommu/nvidia,tegra20-gart.txt| 14 ---
>  .../memory-controllers/nvidia,tegra20-mc.txt | 23 ++-
>  2 files changed, 17 insertions(+), 20 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/iommu/nvidia,tegra20-gart.txt
>
> diff --git a/Documentation/devicetree/bindings/iommu/nvidia,tegra20-gart.txt b/Documentation/devicetree/bindings/iommu/nvidia,tegra20-gart.txt
> deleted file mode 100644
> index 099d9362ebc1..
> --- a/Documentation/devicetree/bindings/iommu/nvidia,tegra20-gart.txt
> +++ /dev/null
> @@ -1,14 +0,0 @@
> -NVIDIA Tegra 20 GART
> -
> -Required properties:
> -- compatible: "nvidia,tegra20-gart"
> -- reg: Two pairs of cells specifying the physical address and size of
> -  the memory controller registers and the GART aperture respectively.
> -
> -Example:
> -
> -    gart {
> -        compatible = "nvidia,tegra20-gart";
> -        reg = <0x7000f024 0x0018 /* controller registers */
> -               0x5800 0x0200>; /* GART aperture */
> -    };
> diff --git a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-mc.txt b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-mc.txt
> index 7d60a50a4fa1..1564df89392a 100644
> --- a/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-mc.txt
> +++ b/Documentation/devicetree/bindings/memory-controllers/nvidia,tegra20-mc.txt
> @@ -2,25 +2,36 @@ NVIDIA Tegra20 MC(Memory Controller)
>
>  Required properties:
>  - compatible : "nvidia,tegra20-mc"
> -- reg : Should contain 2 register ranges(address and length); see the
> -  example below. Note that the MC registers are interleaved with the
> -  GART registers, and hence must be represented as multiple ranges.
> +- reg : Should contain 2 register ranges: physical base address and length of
> +  the controller's registers and the GART aperture respectively.
> +- clocks: Must contain an entry for each entry in clock-names.
> +  See ../clocks/clock-bindings.txt for details.
> +- clock-names: Must include the following entries:
> +  - mc: the module's clock input
>  - interrupts : Should contain MC General interrupt.
>  - #reset-cells : Should be 1. This cell represents memory client module ID.
>    The assignments may be found in header file
>
>    or in the TRM documentation.
> +- #iommu-cells: Should be 0. This cell represents the number of cells in an
> +  IOMMU specifier needed to encode an address. GART supports only a single
> +  address space that is shared by all devices, therefore no additional
> +  information needed for the address encoding.
>
>  Example:
>      mc: memory-controller@7000f000 {
>          compatible = "nvidia,tegra20-mc";
> -        reg = <0x7000f000 0x024
> -               0x7000f03c 0x3c4>;
> -        interrupts = <0 77 0x04>;
> +        reg = <0x7000f000 0x400 /* controller registers */
> +               0x5800 0x0200>; /* GART aperture */
> +        clocks = <&tegra_car TEGRA20_CLK_MC>;
> +        clock-names = "mc";
> +        interrupts = ;
>          #reset-cells = <1>;
> +        #iommu-cells = <0>;
>      };
>
>      video-codec@6001a000 {
>          compatible = "nvidia,tegra20-vde";
>          ...
>          resets = <&mc TEGRA20_MC_RESET_VDE>;
> +        iommus = <&mc>;
>      };
> --
> 2.18.0
>
Re: [PATCH 3/3] arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops
On 14/08/18 14:04, Robin Murphy wrote:
> Whilst the symmetry of deferring to the existing sync callback in
> __iommu_map_page() is nice, taking a round-trip through
> iommu_iova_to_phys() is a pretty heavyweight way to get an address we
> can trivially compute from the page we already have. Tweaking it to just
> perform the cache maintenance directly when appropriate doesn't really
> make the code any more complicated, and the runtime efficiency gain can
> only be a benefit.
>
> Furthermore, the sync operations themselves know they can only be invoked
> on a managed DMA ops domain, so can use the fast specific domain lookup
> to avoid excessive manipulation of the group refcount (particularly in
> the scatterlist cases).
>
> Signed-off-by: Robin Murphy
> ---
>  arch/arm64/mm/dma-mapping.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 61e93f0b5482..5d4144093c20 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -712,7 +712,7 @@ static void __iommu_sync_single_for_cpu(struct device *dev,
>      if (is_device_dma_coherent(dev))
>          return;
>
> -    phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
> +    phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
>      __dma_unmap_area(phys_to_virt(phys), size, dir);
>  }
>
> @@ -725,7 +725,7 @@ static void __iommu_sync_single_for_device(struct device *dev,
>      if (is_device_dma_coherent(dev))
>          return;
>
> -    phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
> +    phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dev_addr);
>      __dma_map_area(phys_to_virt(phys), size, dir);
>  }
>
> @@ -738,9 +738,9 @@ static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
>      int prot = dma_info_to_prot(dir, coherent, attrs);
>      dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
>
> -    if (!iommu_dma_mapping_error(dev, dev_addr) &&
> -        (attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0)
> -        __iommu_sync_single_for_device(dev, dev_addr, size, dir);
> +    if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> +        !iommu_dma_mapping_error(dev, dev_addr))
> +        __dma_map_area(page_address(page), size, dir);

+ offset

Sigh... And there I was assuming that if I'd bothered to write up a proper
commit message for the 15-month-old change I was cherry-picking, I must have
actually tested it at the time.

Robin.

>
>      return dev_addr;
>  }
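
The "+ offset" remark matters because the buffer being mapped need not start at the beginning of its page, so cache maintenance has to begin at the page's CPU address plus the offset. A minimal userspace sketch of that address arithmetic follows. It is not kernel code: `demo_page`, `demo_page_address()` and `demo_buffer_address()` are hypothetical stand-ins for `struct page`, `page_address()` and the corrected expression.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: records where the page's data lives. */
struct demo_page {
    unsigned char *vaddr;
};

/* Hypothetical stand-in for page_address(): CPU address of the page start. */
static void *demo_page_address(const struct demo_page *page)
{
    return page->vaddr;
}

/*
 * CPU address of a buffer that starts `offset` bytes into the page.
 * Dropping the "+ offset" would return the page start instead, so any
 * maintenance over `size` bytes would cover the wrong range.
 */
static void *demo_buffer_address(const struct demo_page *page, size_t offset)
{
    return (unsigned char *)demo_page_address(page) + offset;
}
```

With a zero offset the two helpers agree, which is one way a bug of this kind can go unnoticed in light testing.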
Re: dmaengine for sh7760 (was Re: use the generic dma-noncoherent code for sh V2)
On Sun, Aug 19, 2018 at 7:38 AM Rob Landley wrote:
>
> On 08/17/2018 03:23 PM, Arnd Bergmann wrote:
> > On Fri, Aug 17, 2018 at 7:04 PM Rob Landley wrote:
> >> On 07/31/2018 07:56 AM, Arnd Bergmann wrote:
> >>> On Fri, Jul 27, 2018 at 6:20 PM, Rob Landley wrote:
> On 07/24/2018 03:21 PM, Christoph Hellwig wrote:
> > On Tue, Jul 24, 2018 at 02:01:42PM +0200, Christoph Hellwig wrote:
> >> Hi all,
> >>> If you hack on it, please convert the dmaengine platform data to use
> >>> a dma_slave_map array to pass the data into the dmaengine driver,
> >>
> >> The dmatest module didn't need it? I don't see why the ethernet driver
> >> would?
> >> (Isn't the point of an allocator to allocate from a request?)
> >
> > I guess you have hit two of the special cases here:
> >
> > - dmatest uses the memory-to-memory DMA engine interface, not the slave
> > API, so you don't have to configure a slave at all
>
> I've read through
> https://www.kernel.org/doc/Documentation/driver-api/dmaengine/client.rst twice
> and am still very unclear on the slave API.
>
> > - smc91x (and its smc911x.c relative) are apparently special in that they
> > use the DMA slave API
>
> Only sort of. In 4.14 at least it's under #ifdef ARCH_PXA and full of PXA
> constants (PXAD_PRIO_LOWEST and such).
>
> > but (AFAICT) require programming
> > the dmaengine hardware into a memory-to-memory transfer with no
> > DMA slave request signal and completely synchronous operation
> > (the IRQ handler polls for the DMA descriptor to be complete),
> > see also https://lkml.org/lkml/2018/4/3/464 for the discussion about
> > the recent rework of that driver's implementation.
>
> Bookmarked, thanks.
>
> (Being able to just upgrade to a 4.19 kernel or something and have DMA work in
> this driver if I've got dmaengine set up for the platform would be lovely.)

I wouldn't expect too much even with the newer kernel, I think it still relies
on a special case in the pxa DMA engine driver, possibly even in their
hardware implementation.

> >>> mapping the settings from a (pdev-name, channel-id) tuple to a pointer
> >>> that describes the channel configuration rather than having the
> >>> mapping from an numerical slave_id to a struct sh_dmae_slave_config
> >>> in the setup files. It should be a fairly mechanical conversion.
> >>
> >> I think all 8 channels are generic. Drivers should be able to grab them and
> >> release them at will, why does it need a table?
> >>
> >> (I say this not having made the smc91x.c driver use this yet, its "conversion"
> >> to device tree left it full of PXA #ifdefs and constants, and I've tried the
> >
> > Another point about smc91x is that it only uses DMA on the PXA platform,
> > which is not part of the "multiplatform" ARM setup. It's likely that no
> > other platform actually has a DMA engine that can talk to this device in
> > the absence of a request signal, or that on more modern CPU cores,
> > a readsl() is actually just as fast, but it avoids the setup cost of talking
> > to the dma engine. Possibly both of the above.
>
> The sh7760 has the CPU pegged at 100% trying to keep up with ethernet traffic.
> Being able to use DMA on this would be very nice.

This is probably for the most part due to the rather slow bus interface
of the smc91x, especially if you can't use the 32-bit mode or an optimized
readsl() implementation. Using DMA won't let you do the transfer in the
background either, as it would on any other ethernet hardware, it just means
the CPU is blocked for a little less time if the DMA engine can access the
bus faster than the readsl() implementation can on your CPU.

> >> last half-dozen kernel releases and qemu releases and have yet to find an arm
> >> mainstone board under qemu that _doesn't_ time out trying to use DMA with this
> >> card. But that's another post...)
> >
> > Is smc91x the only driver that you want to make use of the DMA engine?
>
> This driver's the low-hanging fruit, yeah. Copying NOR flash jffs2 data into
> page cache would be nice but there's a decompression step so I'm not sure that's
> a win.

Right, that would be even harder. The devices that are actually designed
for interacting with the DMA engine are likely MMC, USB and audio on that
chip. Those should be easier to do than the smc91x.

> > I suspect that every other one currently relies on passing a slave ID
> > shdma_chan_filter into dma_request_slave_channel_compat() or
> > dma_request_channel(), which are some of the interfaces we want to
> > remove in the future, to make everything work the same across
> > all platforms.
>
> What are "all platforms" in this context? I tried to find an x86 variant that
> uses DMAEngine but came up empty. Can I use DMAEngine on a raspberry pi perhaps?
> Is there a QEMU target I can play with DMAEngine under?

Most ARM SoCs these days have a DMA engine that only uses the new style
interface with dma_request_chan() or dma_request_slave_channel().
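
The table Arnd asks for, mapping a (pdev-name, channel-id) tuple to a pointer describing the channel configuration, can be pictured with a small userspace sketch. This is not kernel code and not the sh7760 setup: `demo_slave_map`, `demo_channel_cfg`, `demo_find_channel()` and every device or channel name below are hypothetical, loosely modelled on the devname/slave/param layout of the kernel's `struct dma_slave_map`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-channel settings; a real driver would carry hardware details. */
struct demo_channel_cfg {
    int request_line;
};

/* One table row: (device name, channel name) -> configuration pointer. */
struct demo_slave_map {
    const char *devname;
    const char *slave;
    const struct demo_channel_cfg *param;
};

static const struct demo_channel_cfg demo_mmc_rx = { .request_line = 3 };
static const struct demo_channel_cfg demo_mmc_tx = { .request_line = 4 };

/* Illustrative entries only; the names are made up for this sketch. */
static const struct demo_slave_map demo_map[] = {
    { "demo-mmc.0", "rx", &demo_mmc_rx },
    { "demo-mmc.0", "tx", &demo_mmc_tx },
};

/* Return the configuration for (devname, slave), or NULL if no row matches. */
static const struct demo_channel_cfg *
demo_find_channel(const char *devname, const char *slave)
{
    size_t i;

    for (i = 0; i < sizeof(demo_map) / sizeof(demo_map[0]); i++) {
        if (strcmp(demo_map[i].devname, devname) == 0 &&
            strcmp(demo_map[i].slave, slave) == 0)
            return demo_map[i].param;
    }
    return NULL;
}
```

The point of keying on names rather than a numerical slave ID is that a client only needs to know its own device name and a channel label; which request line that resolves to stays in the table.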
Re: Testing the RK3288 VPU with static data on mainline kernels (Re: VPU tests)
On 18/08/18 07:15, Miouyouyou (Myy) wrote:
> I'm adding the Linux Rockchip LKML and Linux IOMMU LKML since
> mimicking old 4.4 code leads me to other issues.
>
> ayaka wrote:
> > Have you tried my new driver? MPP_service?
>
> I'd like to but since the 4.4 Rockchip branch is being a bit
> difficult to recompile these days, I have to make do with the old
> prepackaged Rockchip-linux specific 4.4 kernel and its "vpu-service"
> driver. I could try to port the code, but then I'll have other
> issues, as stated below.
>
> > I don't see the configure to the iommu and the iommu is not set to
> > bypass either.
>
> Well, trying to do a simple iommu_get_dma_cookie triggered an ENODEV
> error. Which leads me to an old issue with RK3288 systems and mainline
> kernels: CONFIG_IOMMU_DMA is not set up by default when you select
> the Rockchip IOMMU driver. It's only enabled if you also enable the
> MediaTek IOMMU driver. So, I guess that it's only enabled when using
> global configuration files that target many boards at once.
>
> I'm adding the Rockchip LKML, since I'd like to know why
> CONFIG_IOMMU_DMA is not enabled, nor tested, by default when
> selecting the Rockchip IOMMU driver? The old 4.4 drivers seem to
> heavily rely on it, making the whole porting process more difficult.

Because the 32-bit Arm code has its own implementation and doesn't use
IOMMU_DMA. The Arm DMA ops rely on a domain explicitly created by
arm_setup_iommu_dma_ops() and wrapped in the dma_iommu_mapping address
allocator.

Arm will *eventually* get converted over to IOMMU_DMA, there's just a fair
few fiddly bits still to resolve.

> Now, forcing CONFIG_IOMMU_DMA on mainline kernels breaks the Video
> Output MMU initialization, which leads to a lot of BUG_ON from the
> DRM drivers. Unplugging the screen before the system starts allows me
> to boot the system correctly (but without screen) and SSH into it.

From a mainline perspective, enabling IOMMU_DMA on 32-bit Arm is still
pretty much an untested and unsupported configuration. Without any
corresponding dma_map_ops to complete the glue layer, there's not an awful
lot of point.

> That said, enabling this option doesn't solve my issues with my VPU
> driver. Meaning that the VPU starts, triggers the IRQ, stops and
> nothing is written in the output... And now I also have no useable
> screen.
>
> I tried adding the good old dance:
>
>     iommu_domain_alloc(vpu_dev);
>     iommu_get_dma_cookie(driver_data->iommu_domain);
>     iommu_group_get(vpu_dev);
>     iommu_dma_init_domain(driver_data->iommu_domain, 0x1000, SZ_2G, vpu_dev);
>     iommu_group_put(group);
>
> But that doesn't change anything. The output DMA buffer is still
> untouched and my custom IOMMU Fault handler is not triggered.
> I'll give the DMA-Debug API a try.

FWIW you're not actually attaching the group to the new domain in that
sequence, but as above that still wouldn't make anything magically succeed
because the Arm DMA ops won't understand an IOMMU_DMA domain anyway.

> Meanwhile, I'm also adding the Linux IOMMU LKML, since I'd like to
> know what's the recommended way to initialize a device to perform DMA
> operations, when there's an IOMMU, on mainline kernels?
> I see a lot of legacy code (from 4.4 kernels) that tends to use the
> IOMMU and DMA API in ways that have been removed, or seem rather
> unused (grep or bootlin doesn't show much use).
>
> For example, do I still need to do iommu_get_dma_cookie?
> rk_iommu_domain_alloc seems to perform the operation automatically,
> and the domain allocation is also done automatically with
> iommu_get_domain_for_dev.
>
> Should I still call iommu_dma_init_domain?
>
> Also, does calling dma_set_max_seg_size make sense for a device
> driver? That function seems to be reserved for DMA drivers, yet I saw
> it on multiple implementations of the VPU driver, in the 4.4 kernels:
> https://github.com/rockchip-linux/kernel/blob/release-4.4/drivers/video/rockchip/vpu/vpu_iommu_drm.c#L139
> https://github.com/rockchip-linux/kernel/blob/release-4.4/drivers/media/platform/rockchip-vpu/rockchip_vpu_hw.c#L179
>
> Do you still need to attach the device you're using, using
> iommu_attach_device, if the attached IOMMU device is declared in its
> DTS node?

As far as I'm aware, unless you want to explicitly manage the IOMMU address
space within your driver (which is beyond the scope of everything above) you
shouldn't need to do anything - since the Rockchip IOMMU uses the generic
"iommus" DT binding it should get picked up by dma_configure() and
configured with appropriate IOMMU ops by arch_setup_dma_ops(). In general
this is all designed to be transparently handled by the arch code, so
touching any of it in a device driver is a sign of doing something wrong.

Given that it apparently works fine for the VOP MMUs, I can't see any
obvious reason why the VPU MMU would behave differently.

Robin.

> On 08/18/2018 09:41 AM, Miouyouyou (Myy) wrote:
> > Greetings,
> >
> > I'm currently testing the RK3288 VPU driver on mainline kernels 4.18+
> > (soon 4.19-rc1).
> >
> > The boards I'm using to perform the tests are:
> > * A Tinkerboa
Re: [PATCH] swiotlb: Fix uninitialized pointer on DMA ops
On 18/08/18 20:04, Esteban Zamora wrote:
> The mmap function pointer on swiotlb_dma_ops struct is uninitialized,
> which causes a random crash when calling the dma_mmap_coherent function
> on platforms where no DMA address translation hardware is available.

Can you share any kernel logs with details of those crashes? As Konrad
mentions, the rules for partial structure initialisation in C are
well-defined, even with designated initialisers[1], and if this commit
message were true then half of the subsystems in the kernel would be
crashing left right and centre.

Robin.

[1] https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html#Designated-Inits

> Set this pointer to NULL in order to fix the issue.
>
> Signed-off-by: Esteban Zamora
> ---
>  kernel/dma/swiotlb.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 4f8a6db..9a7718c 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -1082,5 +1082,6 @@ const struct dma_map_ops swiotlb_dma_ops = {
>      .map_page = swiotlb_map_page,
>      .unmap_page = swiotlb_unmap_page,
>      .dma_supported = dma_direct_supported,
> +    .mmap = NULL,
>  };
>  EXPORT_SYMBOL(swiotlb_dma_ops);
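
Robin's point about partial structure initialisation can be checked outside the kernel. In C, any member left out of a designated initialiser is initialised as if it had static storage duration, which for a pointer means a null pointer. The sketch below is a userspace illustration only: `demo_ops` and its helpers are hypothetical and are not the real `struct dma_map_ops`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table with three callbacks, standing in for a kernel ops struct. */
struct demo_ops {
    int (*map_page)(int page);
    int (*unmap_page)(int page);
    int (*mmap)(int vma);
};

static int demo_map_page(int page)
{
    return page + 1;
}

static int demo_unmap_page(int page)
{
    return page - 1;
}

/* Only two members are named; .mmap is deliberately left out. */
static const struct demo_ops demo_ops = {
    .map_page = demo_map_page,
    .unmap_page = demo_unmap_page,
};

/* Returns 1 if the omitted member was implicitly initialised to NULL. */
static int demo_mmap_is_null(void)
{
    return demo_ops.mmap == NULL;
}
```

Under those rules an explicit `.mmap = NULL` line changes nothing about the resulting object, which is why the reply asks for crash logs rather than accepting the stated cause.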
Re: [PATCH v14 0/4] iommu/arm-smmu: Add runtime pm/sleep support
Hi Robin,

On Fri, Jul 27, 2018 at 4:02 PM Vivek Gautam wrote:
>
> This series provides the support for turning on the arm-smmu's
> clocks/power domains using runtime pm. This is done using
> device links between smmu and client devices. The device link
> framework keeps the two devices in correct order for power-cycling
> across runtime PM or across system-wide PM.
>
> With addition of a new device link flag DL_FLAG_AUTOREMOVE_SUPPLIER [8]
> (available in linux-next of Rafael's linux-pm tree [9]), the device links
> created between arm-smmu and its clients will be automatically purged
> when arm-smmu driver unbinds from its device.
>
> As not all implementations support clock/power gating, we are checking
> for a valid 'smmu->dev's pm_domain' to conditionally enable the runtime
> power management for such smmu implementations that can support it.
> Otherwise, the clocks are turned to be always on in .probe until .remove.
> With conditional runtime pm now, we avoid touching dev->power.lock
> in fastpaths for smmu implementations that don't need to do anything
> useful with pm_runtime.
> This lets us to use the much-argued pm_runtime_get_sync/put_sync()
> calls in map/unmap callbacks so that the clients do not have to
> worry about handling any of the arm-smmu's power.
>
> This series also adds support for Qcom's arm-smmu-v2 variant that
> has different clocks and power requirements.
>
> Previous version of this patch series is @ [2].
>
> Tested this series on msm8996, and sdm845 after pulling in Rafael's linux-pm
> linux-next[9] and Joerg's iommu next[10] branches, and related changes for
> device tree, etc.
>
> Hi Robin, Will,
> I have addressed the comments for v13. If there's still a chance
> can you please consider pulling this for v4.19.
> Thanks.
>
> [v14]
>    * Moved arm_smmu_device_reset() from arm_smmu_pm_resume() to
>      arm_smmu_runtime_resume() so that the pm_resume callback calls
>      only runtime_resume to resume the device.
>      This should take care of restoring the state of smmu in systems
>      in which smmu lose register state on power-domain collapse.

It's been a while since this series was posted, and there seem to be no
outstanding comments. Would you have some time to take another look?
Thanks.

Best regards,
Tomasz