Re: [PATCH v3 79/79] media: hantro: document the usage of pm_runtime_get_sync()
On 2021-04-27 11:27, Mauro Carvalho Chehab wrote:

Unlike other *_get()/*_put() functions, where the usage count is incremented only if there are no errors, pm_runtime_get_sync() has a different behavior, incrementing the counter *even* on errors. That's error-prone behavior, as people often forget to decrement the usage counter. However, the hantro driver depends on this behavior, as it will decrement the usage_count unconditionally at m2m job finish time, which makes sense. So, instead of using pm_runtime_resume_and_get(), which would decrement the counter on error, keep the current API, but add documentation explaining the rationale for keeping pm_runtime_get_sync().

Signed-off-by: Mauro Carvalho Chehab
---
 drivers/staging/media/hantro/hantro_drv.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c
index 595e82a82728..96f940c1c85c 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -155,6 +155,13 @@ static void device_run(void *priv)
 	ret = clk_bulk_enable(ctx->dev->variant->num_clocks, ctx->dev->clocks);
 	if (ret)
 		goto err_cancel_job;

...except this can also cause the same pm_runtime_put_autosuspend() call without even reaching the "matching" get below, so rather than some kind of cleverness it seems more like it's just broken :/

Robin.

+
+	/*
+	 * The pm_runtime_get_sync() will increment dev->power.usage_count,
+	 * even on errors. That's the expected behavior here, since the
+	 * hantro_job_finish() function at the error handling code
+	 * will internally call pm_runtime_put_autosuspend().
+	 */
 	ret = pm_runtime_get_sync(ctx->dev->dev);
 	if (ret < 0)
 		goto err_cancel_job;

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
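To make the behavioural difference concrete, here is a tiny userspace model of the two calls' usage-counter semantics. This is toy code with invented names, not the real PM core: `toy_get_sync()` stands in for pm_runtime_get_sync() and `toy_resume_and_get()` for pm_runtime_resume_and_get().

```c
#include <assert.h>

/* Toy stand-in for struct device's runtime-PM state (illustration only). */
struct toy_dev {
	int usage_count;	/* models dev->power.usage_count */
	int resume_err;		/* <0 simulates a resume failure */
};

/* Models pm_runtime_get_sync(): the counter is bumped even when resume
 * fails, so the caller still owns a reference on the error path. */
static int toy_get_sync(struct toy_dev *d)
{
	d->usage_count++;		/* incremented unconditionally */
	return d->resume_err;		/* negative on resume failure */
}

/* Models pm_runtime_resume_and_get(): on failure the reference is
 * dropped again before returning, so the caller owns nothing. */
static int toy_resume_and_get(struct toy_dev *d)
{
	int ret = toy_get_sync(d);

	if (ret < 0)
		d->usage_count--;	/* error path does the put for you */
	return ret;
}
```

Under this model, a driver like hantro that unconditionally does a put at job-finish time must pair it with the get_sync() variant: resume_and_get() would already have dropped the reference on failure, and the later put would underflow the counter.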
Re: use of dma_direct_set_offset in (allwinner) drivers
On 2020-11-04 08:14, Maxime Ripard wrote:

Hi Christoph,

On Tue, Nov 03, 2020 at 10:55:38AM +0100, Christoph Hellwig wrote:

Linux 5.10-rc1 switched from having a single dma offset in struct device to a set of DMA ranges, and introduced a new helper to set them, dma_direct_set_offset. This in fact surfaced a bunch of drivers that violate our layering and set the offset from driver code, which meant we had to reluctantly export the symbol to set up the DMA range. The drivers are:

drivers/gpu/drm/sun4i/sun4i_backend.c

This just uses dma_direct_set_offset as a fallback. Is there any good reason to not just kill off the fallback?

drivers/media/platform/sunxi/sun4i-csi/sun4i_csi.c

Same as above.

So, the history of this is:

- We initially introduced the support for those two controllers assuming that there was a direct mapping between the physical and DMA addresses. It turns out there wasn't, and the DMA accesses were going through a secondary, dedicated bus that didn't have the same mapping of the RAM as the CPU. 4690803b09c6 ("drm/sun4i: backend: Offset layer buffer address by DRAM starting address")

- This dedicated bus is undocumented and barely used in the vendor kernel, so this was overlooked, and it's fairly hard to get info on it for all the SoCs we support. We did add the DT support for it on some SoCs where we had enough info to do so: c43a4469402f ("dt-bindings: interconnect: Add a dma interconnect name") 22f88e311399 ("ARM: dts: sun5i: Add the MBUS controller") This explains the check on the interconnect property.

- However, due to the stable DT rule, we still need to operate without regressions on older DTs that wouldn't have that property (and for SoCs we haven't figured out). Hence the fallback.

How about having something in the platform code that keys off the top-level SoC compatible and uses a bus notifier to create offsets for the relevant devices if an MBUS description is missing?
At least that way the workaround could be confined to a single dedicated place and look somewhat similar to other special cases like sta2x11, rather than being duplicated all over the place.

Robin.

drivers/media/platform/sunxi/sun6i-csi/sun6i_csi.c

This driver unconditionally sets the offset. Why can't we do this in the device tree?

drivers/staging/media/sunxi/cedrus/cedrus_hw.c

Same as above.

We should make those two match the previous ones, but we'll have the same issue here eventually. Most likely they were never run on an SoC for which we have the MBUS figured out.

Maxime

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Re: [PATCH v5 8/9] arm64: dts: rockchip: add isp0 node for rk3399
On 2020-10-14 17:27, Helen Koike wrote: Hi Tomasz, On 9/26/20 10:00 AM, Tomasz Figa wrote: Hi Helen, On Wed, Jul 22, 2020 at 12:55:32PM -0300, Helen Koike wrote: From: Shunqian Zheng RK3399 has two ISPs, but only isp0 was tested. Add isp0 node in rk3399 dtsi Verified with: make ARCH=arm64 dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/media/rockchip-isp1.yaml Signed-off-by: Shunqian Zheng Signed-off-by: Jacob Chen Signed-off-by: Helen Koike --- V4: - update clock names V3: - clean up clocks V2: - re-order power-domains property V1: This patch was originally part of this patchset: https://patchwork.kernel.org/patch/10267431/ The only difference is: - add phy properties - add ports --- arch/arm64/boot/dts/rockchip/rk3399.dtsi | 25 1 file changed, 25 insertions(+) diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi index dba9641947a3a..ed8ba75dbbce8 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi @@ -1721,6 +1721,31 @@ vopb_mmu: iommu@ff903f00 { status = "disabled"; }; + isp0: isp0@ff91 { + compatible = "rockchip,rk3399-cif-isp"; + reg = <0x0 0xff91 0x0 0x4000>; + interrupts = ; + clocks = <&cru SCLK_ISP0>, +<&cru ACLK_ISP0_WRAPPER>, +<&cru HCLK_ISP0_WRAPPER>; + clock-names = "isp", "aclk", "hclk"; + iommus = <&isp0_mmu>; + phys = <&mipi_dphy_rx0>; + phy-names = "dphy"; + power-domains = <&power RK3399_PD_ISP0>; Should this have status = "disabled" too? The mipi_dphy_rx0 node is disabled by default too, so in the default configuration the driver would always fail to probe. I'm thinking what is the overall guideline here. Since isp and mipi_dphy are always present in the rk3399, shouldn't they always be enabled? Or since they are only useful if a sensor is present, we should let the dts of the board to enable it? 
Yes, the usual pattern is that anything which needs additional hardware outside the SoC to be useful is disabled by default in the SoC DTSI, and enabled in individual board DTSs as appropriate. See USB, HDMI, etc. for instance.

There's probably a further debate about how much the board itself should enable if it only breaks out a connector for the user to add their own camera module, but hey, one step at a time ;)

Robin.
Re: [PATCH 00/16] IOMMU driver for Kirin 960/970
On 2020-08-19 11:28, Mauro Carvalho Chehab wrote: Em Tue, 18 Aug 2020 15:02:54 -0700 John Stultz escreveu: On Tue, Aug 18, 2020 at 9:26 AM Robin Murphy wrote: On 2020-08-18 16:29, Mauro Carvalho Chehab wrote: Em Tue, 18 Aug 2020 15:47:55 +0100 Basically, the DT binding has this, for IOMMU: smmu_lpae { compatible = "hisilicon,smmu-lpae"; }; ... dpe: dpe@e860 { compatible = "hisilicon,kirin970-dpe"; memory-region = <&drm_dma_reserved>; ... iommu_info { start-addr = <0x8000>; size = <0xbfff8000>; }; } This is used by kirin9xx_drm_dss.c in order to enable and use the iommu: static int dss_enable_iommu(struct platform_device *pdev, struct dss_hw_ctx *ctx) { struct device *dev = NULL; dev = &pdev->dev; /* create iommu domain */ ctx->mmu_domain = iommu_domain_alloc(dev->bus); if (!ctx->mmu_domain) { pr_err("iommu_domain_alloc failed!\n"); return -EINVAL; } iommu_attach_device(ctx->mmu_domain, dev); return 0; } The only place where the IOMMU domain is used is on this part of the code(error part simplified here) [1]: void hisi_dss_smmu_on(struct dss_hw_ctx *ctx) { uint64_t fama_phy_pgd_base; uint32_t phy_pgd_base; ... fama_phy_pgd_base = iommu_iova_to_phys(ctx->mmu_domain, 0); phy_pgd_base = (uint32_t)fama_phy_pgd_base; if (WARN_ON(!phy_pgd_base)) return; set_reg(smmu_base + SMMU_CB_TTBR0, phy_pgd_base, 32, 0); } [1] https://github.com/mchehab/linux/commit/36da105e719b47bbe9d6cb7e5619b30c7f3eb1bd In other words, the driver needs to get the physical address of the frame buffer (mapped via iommu) in order to set some DRM-specific register. Yeah, the above code is somewhat hackish. I would love to replace this part by a more standard approach. OK, so from a quick look at that, my impression is that your display controller has its own MMU and you don't need to pretend to use the IOMMU API at all. Just have the DRM driver use io-pgtable directly to run its own set of ARM_32_LPAE_S1 pagetables - see Panfrost for an example (but try to ignore the wacky "Mali LPAE" format). Yea. 
For the HiKey960, there was originally a similar patch series but it was refactored out and the (still out of tree) DRM driver I'm carrying doesn't seem to need it (though looking we still have the iommu_info subnode in the dts that maybe needs to be cleaned up).

Funny... while the Hikey 970 DRM driver has such IOMMU code, it doesn't actually use it! The driver has a function called hisi_dss_smmu_config() which sets the registers in a different way in order to use the IOMMU or not, at the hisi_fb_pan_display() function. It can also use a mode called "afbcd". Well, this function sets both to false:

	bool afbcd = false;
	bool mmu_enable = false;

I ended up commenting out the code which depends on the iommu driver and everything is working as before. So, I'll just forget about this iommu driver, as we can live without it. For now, I'll keep the mmu code there commented out, as it could be useful on a future port for it to use io-pgtable.

Robin, can the Panfrost driver use io-pgtable while the KMS driver won't be using it? Or would this cause it to not work? My end goal here is to be able to test the Panfrost driver ;-)

Yup, the GPU has its own independent MMU, so Panfrost can import display buffers regardless of whether they're physically contiguous or not. Since Mesa master has recently landed AFBC support, there's probably more immediate benefit in getting that AFBC decoder working before the display MMU (although ultimately things are likely to work better under memory pressure if you don't have to rely on CMA, so it should still be worth coming back to at some point).

Robin.
Re: [PATCH 00/16] IOMMU driver for Kirin 960/970
On 2020-08-18 23:02, John Stultz wrote: On Tue, Aug 18, 2020 at 9:26 AM Robin Murphy wrote: On 2020-08-18 16:29, Mauro Carvalho Chehab wrote: Em Tue, 18 Aug 2020 15:47:55 +0100 Basically, the DT binding has this, for IOMMU: smmu_lpae { compatible = "hisilicon,smmu-lpae"; }; ... dpe: dpe@e860 { compatible = "hisilicon,kirin970-dpe"; memory-region = <&drm_dma_reserved>; ... iommu_info { start-addr = <0x8000>; size = <0xbfff8000>; }; } This is used by kirin9xx_drm_dss.c in order to enable and use the iommu: static int dss_enable_iommu(struct platform_device *pdev, struct dss_hw_ctx *ctx) { struct device *dev = NULL; dev = &pdev->dev; /* create iommu domain */ ctx->mmu_domain = iommu_domain_alloc(dev->bus); if (!ctx->mmu_domain) { pr_err("iommu_domain_alloc failed!\n"); return -EINVAL; } iommu_attach_device(ctx->mmu_domain, dev); return 0; } The only place where the IOMMU domain is used is on this part of the code(error part simplified here) [1]: void hisi_dss_smmu_on(struct dss_hw_ctx *ctx) { uint64_t fama_phy_pgd_base; uint32_t phy_pgd_base; ... fama_phy_pgd_base = iommu_iova_to_phys(ctx->mmu_domain, 0); phy_pgd_base = (uint32_t)fama_phy_pgd_base; if (WARN_ON(!phy_pgd_base)) return; set_reg(smmu_base + SMMU_CB_TTBR0, phy_pgd_base, 32, 0); } [1] https://github.com/mchehab/linux/commit/36da105e719b47bbe9d6cb7e5619b30c7f3eb1bd In other words, the driver needs to get the physical address of the frame buffer (mapped via iommu) in order to set some DRM-specific register. Yeah, the above code is somewhat hackish. I would love to replace this part by a more standard approach. OK, so from a quick look at that, my impression is that your display controller has its own MMU and you don't need to pretend to use the IOMMU API at all. Just have the DRM driver use io-pgtable directly to run its own set of ARM_32_LPAE_S1 pagetables - see Panfrost for an example (but try to ignore the wacky "Mali LPAE" format). Yea. 
For the HiKey960, there was originally a similar patch series but it was refactored out and the (still out of tree) DRM driver I'm carrying doesn't seem to need it (though looking we still have the iommu_info subnode in the dts that maybe needs to be cleaned up).

Indeed, I'd assume it's possible to leave the MMU off and just use CMA buffers instead, but wiring it up properly without the downstream mis-design should be pretty clean, so maybe that could ultimately be shared with 960 too (assuming the hardware isn't wildly dissimilar).

I notice there's already a whole load of MMU configuration hard-coded into the DRM driver - does iommu_info even need to be in the DT, or could that also be decided directly by the driver? (Most other MMU-aware DRM drivers seem to hard-code their drm_mm dimensions.) I can't imagine the *virtual* address space limits need to vary on a per-board basis, and they could easily be tied to the compatible if they legitimately differ across SoCs and a simple lowest-common-denominator approach wouldn't suffice for whatever reason.

Robin.
Re: [PATCH 00/16] IOMMU driver for Kirin 960/970
On 2020-08-18 16:29, Mauro Carvalho Chehab wrote: Hi Robin, Em Tue, 18 Aug 2020 15:47:55 +0100 Robin Murphy escreveu: On 2020-08-17 08:49, Mauro Carvalho Chehab wrote: Add a driver for the Kirin 960/970 iommu. As on the past series, this starts from the original 4.9 driver from the 96boards tree: https://github.com/96boards-hikey/linux/tree/hikey970-v4.9 The remaining patches add SPDX headers and make it build and run with the upstream Kernel. Chenfeng (1): iommu: add support for HiSilicon Kirin 960/970 iommu Mauro Carvalho Chehab (15): iommu: hisilicon: remove default iommu_map_sg handler iommu: hisilicon: map and unmap ops gained new arguments iommu: hisi_smmu_lpae: rebase it to work with upstream iommu: hisi_smmu: remove linux/hisi/hisi-iommu.h iommu: hisilicon: cleanup its code style iommu: hisi_smmu_lpae: get rid of IOMMU_SEC and IOMMU_DEVICE iommu: get rid of map/unmap tile functions iommu: hisi_smmu_lpae: use the right code to get domain-priv data iommu: hisi_smmu_lpae: convert it to probe_device iommu: add Hisilicon Kirin970 iommu at the building system iommu: hisi_smmu_lpae: cleanup printk macros iommu: hisi_smmu_lpae: make OF compatible more standard Echoing the other comments about none of the driver patches being CC'd to the IOMMU list... Still, I dug the series up on lore and frankly I'm not sure what to make of it - AFAICS the "driver" is just yet another implementation of Arm LPAE pagetable code, with no obvious indication of how those pagetables ever get handed off to IOMMU hardware (and indeed no indication of IOMMU hardware at all). Can you explain how it's supposed to work? And as a pre-emptive strike, we really don't need any more LPAE implementations - that's what the io-pgtable library is all about (which incidentally has been around since 4.0...). 
I think that should make the issue of preserving authorship largely moot since there's no need to preserve most of the code anyway ;) I didn't know about that, since I got a Hikey 970 board for the first time about one month ago, and that's the first time I looked into iommu code. My end goal with this is to make the DRM/KMS driver to work with upstream Kernels. The full patch series are at: https://github.com/mchehab/linux/commits/hikey970/to_upstream-2.0-v1.1 (I need to put a new version there, after some changes due to recent upstream discussions at the regulator's part of the code) Basically, the DT binding has this, for IOMMU: smmu_lpae { compatible = "hisilicon,smmu-lpae"; }; ... dpe: dpe@e860 { compatible = "hisilicon,kirin970-dpe"; memory-region = <&drm_dma_reserved>; ... iommu_info { start-addr = <0x8000>; size = <0xbfff8000>; }; } This is used by kirin9xx_drm_dss.c in order to enable and use the iommu: static int dss_enable_iommu(struct platform_device *pdev, struct dss_hw_ctx *ctx) { struct device *dev = NULL; dev = &pdev->dev; /* create iommu domain */ ctx->mmu_domain = iommu_domain_alloc(dev->bus); if (!ctx->mmu_domain) { pr_err("iommu_domain_alloc failed!\n"); return -EINVAL; } iommu_attach_device(ctx->mmu_domain, dev); return 0; } The only place where the IOMMU domain is used is on this part of the code(error part simplified here) [1]: void hisi_dss_smmu_on(struct dss_hw_ctx *ctx) { uint64_t fama_phy_pgd_base; uint32_t phy_pgd_base; ... fama_phy_pgd_base = iommu_iova_to_phys(ctx->mmu_domain, 0); phy_pgd_base = (uint32_t)fama_phy_pgd_base; if (WARN_ON(!phy_pgd_base)) return; set_reg(smmu_base + SMMU_CB_TTBR0, phy_pgd_base, 32, 0); } [1] https://github.com/mchehab/linux/commit/36da105e719b47bbe9d6cb7e5619b30c7f3eb1bd In other words, the driver needs to get the physical address of the frame buffer (mapped via iommu) in order to set some DRM-specific register. Yeah, the above code is somewhat hackish. 
I would love to replace this part by a more standard approach.

OK, so from a quick look at that, my impression is that your display controller has its own MMU and you don't need to pretend to use the IOMMU API at all. Just have the DRM driver use io-pgtable directly to run its own set of ARM_32_LPAE_S1 pagetables - see Panfrost for an example (but try to ignore the wacky "Mali LPAE" format).

Robin.
Re: [PATCH 00/16] IOMMU driver for Kirin 960/970
On 2020-08-17 08:49, Mauro Carvalho Chehab wrote: Add a driver for the Kirin 960/970 iommu. As on the past series, this starts from the original 4.9 driver from the 96boards tree: https://github.com/96boards-hikey/linux/tree/hikey970-v4.9 The remaining patches add SPDX headers and make it build and run with the upstream Kernel. Chenfeng (1): iommu: add support for HiSilicon Kirin 960/970 iommu Mauro Carvalho Chehab (15): iommu: hisilicon: remove default iommu_map_sg handler iommu: hisilicon: map and unmap ops gained new arguments iommu: hisi_smmu_lpae: rebase it to work with upstream iommu: hisi_smmu: remove linux/hisi/hisi-iommu.h iommu: hisilicon: cleanup its code style iommu: hisi_smmu_lpae: get rid of IOMMU_SEC and IOMMU_DEVICE iommu: get rid of map/unmap tile functions iommu: hisi_smmu_lpae: use the right code to get domain-priv data iommu: hisi_smmu_lpae: convert it to probe_device iommu: add Hisilicon Kirin970 iommu at the building system iommu: hisi_smmu_lpae: cleanup printk macros iommu: hisi_smmu_lpae: make OF compatible more standard Echoing the other comments about none of the driver patches being CC'd to the IOMMU list... Still, I dug the series up on lore and frankly I'm not sure what to make of it - AFAICS the "driver" is just yet another implementation of Arm LPAE pagetable code, with no obvious indication of how those pagetables ever get handed off to IOMMU hardware (and indeed no indication of IOMMU hardware at all). Can you explain how it's supposed to work? And as a pre-emptive strike, we really don't need any more LPAE implementations - that's what the io-pgtable library is all about (which incidentally has been around since 4.0...). I think that should make the issue of preserving authorship largely moot since there's no need to preserve most of the code anyway ;) Robin. 
dt: add an spec for the Kirin36x0 SMMU
dt: hi3670-hikey970.dts: load the SMMU driver on Hikey970
staging: hikey9xx: add an item about the iommu driver

 .../iommu/hisilicon,kirin36x0-smmu.yaml    |  55 ++
 .../boot/dts/hisilicon/hi3670-hikey970.dts |   3 +
 drivers/staging/hikey9xx/Kconfig           |   9 +
 drivers/staging/hikey9xx/Makefile          |   1 +
 drivers/staging/hikey9xx/TODO              |   1 +
 drivers/staging/hikey9xx/hisi_smmu.h       | 196 ++
 drivers/staging/hikey9xx/hisi_smmu_lpae.c  | 648 ++
 7 files changed, 913 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/hisilicon,kirin36x0-smmu.yaml
 create mode 100644 drivers/staging/hikey9xx/hisi_smmu.h
 create mode 100644 drivers/staging/hikey9xx/hisi_smmu_lpae.c
Re: [PATCH v5 08/12] device core: Introduce multiple dma pfn offsets
Hi Jim, Thanks for taking this on! On 2020-06-16 21:55, Jim Quinlan wrote: The new field in struct device 'dma_pfn_offset_map' is used to facilitate the use of single or multiple pfn offsets between cpu addrs and dma addrs. It subsumes the role of dev->dma_pfn_offset -- a uniform offset. This isn't just about offsets - it should (eventually) subsume bus_dma_limit as well, so I'd be inclined to call it something like "dma_ranges"/"dma_range_map"/"dma_regions"/etc. The function of_dma_get_range() has been modified to take two additional arguments: the "map", which is an array that holds the information regarding the pfn offset regions, and map_size, which is the size in bytes of the map array. of_dma_configure() is the typical manner to set pfn offsets but there are a number of ad hoc assignments to dev->dma_pfn_offset in the kernel driver code. These cases now invoke the function dma_attach_uniform_pfn_offset(dev, pfn_offset). I'm also not convinced that sticking to the PFN paradigm is necessarily the right way to go - when there's only a single nicely-aligned offset to consider then an unsigned long that's immune to PAE/LPAE/etc. disruption is indeed the cheapest and easiest option from core code's PoV. However it already means that all the users have to do some degree of conversion back and forth between PFNs and usable addresses; once the core code itself also has to start bouncing back and forth between addresses and PFNs internally then we end up effectively just doing work to cancel out other work, and the whole lot would end up simpler and more efficient if the API worked purely in terms of addresses. [...] 
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 8eea3f6e29a4..767fa3b492c8 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -918,12 +918,48 @@ void __iomem *of_io_request_and_map(struct device_node *np, int index,
 }
 EXPORT_SYMBOL(of_io_request_and_map);

+static int dma_attach_pfn_offset_map(struct device_node *node, int num_ranges,
+				     struct bus_dma_region **map, size_t *map_size)
+{
+	struct of_range_parser parser;
+	struct of_range range;
+	struct bus_dma_region *r;
+
+	*map_size = (num_ranges + 1) * sizeof(**map);
+	r = kcalloc(num_ranges + 1, sizeof(**map), GFP_KERNEL);
+	if (!r)
+		return -ENOMEM;
+	*map = r;
+
+	of_dma_range_parser_init(&parser, node);
+	/*
+	 * Record all info for DMA ranges array. We could
+	 * just use the of_range struct, but if we did that it

Not making the entire DMA API depend on OF is a far better justification for having its own dedicated structure.

+	 * would require more calculations for phys_to_dma and
+	 * dma_to_phys conversions.
+	 */

However that part is pretty much nonsense. Consider your "efficient" operation for looking up and consuming a DMA offset:

API		caller
1. load cpu_start
2. compare addr >= cpu_start
3. load cpu_end
4. compare addr <= cpu_end
5. load pfn_offset
6. shift pfn_offset << PAGE_SHIFT
		7. add "offset" + addr
		8. [use the result]

versus the "more calculations" approach (once the PFN cruft is peeled away):

API		caller
1. load cpu_addr
2. compare addr >= cpu_addr
3. subtract addr - cpu_addr
4. load size
5. compare "addr_offset" < size
6. load dma_start
		7. add dma_start + "addr_offset"
		8. [use the result]

Oh look, it's the exact same number of memory accesses and ALU operations, but with a smaller code footprint (assuming, reasonably, more than one caller) and less storage overhead ;) Basically, having this degree of redundancy is somewhere between silly and actively harmful (what if pfn_offset gets out of sync with cpu_start/dma_start? What if cpu_end/dma_end don't represent equivalent lengths?)
+ for_each_of_range(&parser, &range) { + r->cpu_start = range.cpu_addr; + r->cpu_end = r->cpu_start + range.size - 1; + r->dma_start = range.bus_addr; + r->dma_end = r->dma_start + range.size - 1; + r->pfn_offset = PFN_DOWN(range.cpu_addr) - PFN_DOWN(range.bus_addr); + r++; + } + return 0; +} + /** * of_dma_get_range - Get DMA range info * @np: device node to get DMA range info * @dma_addr: pointer to store initial DMA address of DMA range * @paddr:pointer to store initial CPU address of DMA range * @size: pointer to store size of DMA range + * @map: pointer to a pointer of an array of structs. This is updated + * to point to NULL (no offsets needed) or kmalloc'd array of + * structs. In the latter case, it is the caller's obligation to + * kfree the array in the case
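The two lookup sequences Robin compares can be written out as plain C. This is a standalone sketch, not kernel code: the `toy_` names and the fixed 4K page size are invented for illustration, while the field names follow the patch under discussion. Both forms reach the same DMA address; the address/size form just skips the PFN round trip.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assume 4K pages for the model */

/* Illustrative region descriptor carrying both representations, so the
 * two lookup styles can be compared side by side. */
struct toy_dma_region {
	uint64_t cpu_start;
	uint64_t dma_start;
	uint64_t size;
	uint64_t pfn_offset;	/* redundant with the fields above */
};

/* PFN-offset style translation (the approach being critiqued). */
static uint64_t toy_phys_to_dma_pfn(const struct toy_dma_region *r,
				    uint64_t addr)
{
	if (addr >= r->cpu_start && addr < r->cpu_start + r->size)
		return addr - (r->pfn_offset << TOY_PAGE_SHIFT);
	return addr;	/* outside the region: identity mapping */
}

/* Address/size style translation: same result, no PFN conversions. */
static uint64_t toy_phys_to_dma_addr(const struct toy_dma_region *r,
				     uint64_t addr)
{
	uint64_t off = addr - r->cpu_start;

	if (addr >= r->cpu_start && off < r->size)
		return r->dma_start + off;
	return addr;	/* outside the region: identity mapping */
}
```

With a region mapping CPU 0x80000000 to DMA 0x0 (so pfn_offset = 0x80000), both helpers translate 0x80001000 to 0x1000 — the redundancy buys nothing, which is Robin's point.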
Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled
On 24/07/2019 15:09, Dmitry Osipenko wrote: 24.07.2019 17:03, Yuehaibing пишет: On 2019/7/24 21:49, Robin Murphy wrote: On 24/07/2019 11:30, Sakari Ailus wrote: Hi Yue, On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote: If IOMMU_SUPPORT is not set, ipu3 driver may select IOMMU_IOVA to m. But for many drivers, they use "select IOMMU_IOVA if IOMMU_SUPPORT" in the Kconfig, for example, CONFIG_TEGRA_VDE is set to y but IOMMU_IOVA is m, then the building fails like this: drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map': iommu.c:(.text+0x41): undefined reference to `alloc_iova' iommu.c:(.text+0x56): undefined reference to `__free_iova' Reported-by: Hulk Robot Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level pci device driver") Signed-off-by: YueHaibing --- drivers/staging/media/ipu3/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/media/ipu3/Kconfig b/drivers/staging/media/ipu3/Kconfig index 4b51c67..b7df18f 100644 --- a/drivers/staging/media/ipu3/Kconfig +++ b/drivers/staging/media/ipu3/Kconfig @@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU depends on PCI && VIDEO_V4L2 depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API depends on X86 -select IOMMU_IOVA +select IOMMU_IOVA if IOMMU_SUPPORT This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA independently of IOMMU_SUPPORT. Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's not declared in its Kconfig entry. I wonder if adding that would be the right way to fix this. Cc'ing the IOMMU list. IOMMU_SUPPORT is optional for the Tegra-VDE driver. Right, I also had the impression that we'd made the IOVA library completely standalone. And what does the IPU3 driver's Kconfig have to do with some *other* driver failing to link anyway? I can see it failing if IPU3 is compiled as a loadable module, while Tegra-VDE is a built-in driver. 
Hence the IOVA lib would also be a kernel module, and thus the IOVA symbols will be missing during linkage of the VDE driver.

Oh, I misunderstood that IOMMU_IOVA depends on IOMMU_SUPPORT, thank you for the clarification. I will try to fix this in tegra-vde.

Probably IOVA could be selected independently of IOMMU_SUPPORT, but the IOVA library isn't needed for the VDE driver if IOMMU_SUPPORT is disabled.

Oh, I think I get the problem now - tegra-vde/iommu.c is built unconditionally and relies on the static inline stubs for IOMMU and IOVA calls if !IOMMU_SUPPORT, but in a compile-test config where IOVA=m for other reasons, it then picks up the real declarations from linux/iova.h instead of the stubs, and things go downhill from there. So there is a real issue, but indeed it's Tegra-VDE which needs to be restructured to cope with such configurations, and not IPU3's (or anyone else who may select IOVA=m in future) job to work around it.

Robin.
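The failure mode being described — a driver leaning on static inline stubs that disappear once another config option turns the real declarations back on — can be sketched in one standalone file. CONFIG_TOY_IOVA and the `toy_` names are invented stand-ins here, not the actual linux/iova.h interface.

```c
#include <assert.h>

/* Sketch of the header pattern at issue: real declarations when the
 * subsystem is enabled, static inline stubs when it is not. */
#ifdef CONFIG_TOY_IOVA
int toy_alloc_iova(void);		/* real symbol, lives in iova.o */
#else
static inline int toy_alloc_iova(void)
{
	return 0;			/* stub: no IOVA support built */
}
#endif

/* A driver like tegra-vde calls this unconditionally.  With the option
 * off it quietly links against the stub; but if some *other* driver
 * flips IOVA on as a module, the real extern declaration is seen
 * instead, and a built-in caller fails at link time with "undefined
 * reference to `alloc_iova'" -- the exact breakage reported above. */
static int toy_driver_map(void)
{
	return toy_alloc_iova();
}
```

Compiled without CONFIG_TOY_IOVA defined (the !IOMMU_SUPPORT case), the stub is used and the call trivially succeeds; the link error only appears in the mixed y/m configuration, which is why the fix belongs in the driver's own Kconfig, not IPU3's.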
Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled
On 24/07/2019 11:30, Sakari Ailus wrote:

Hi Yue,

On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:

If IOMMU_SUPPORT is not set, the ipu3 driver may select IOMMU_IOVA to m. But many drivers use "select IOMMU_IOVA if IOMMU_SUPPORT" in their Kconfig; for example, if CONFIG_TEGRA_VDE is set to y but IOMMU_IOVA is m, then the build fails like this:

drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
iommu.c:(.text+0x41): undefined reference to `alloc_iova'
iommu.c:(.text+0x56): undefined reference to `__free_iova'

Reported-by: Hulk Robot
Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level pci device driver")
Signed-off-by: YueHaibing
---
 drivers/staging/media/ipu3/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/ipu3/Kconfig b/drivers/staging/media/ipu3/Kconfig
index 4b51c67..b7df18f 100644
--- a/drivers/staging/media/ipu3/Kconfig
+++ b/drivers/staging/media/ipu3/Kconfig
@@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
 	depends on PCI && VIDEO_V4L2
 	depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
 	depends on X86
-	select IOMMU_IOVA
+	select IOMMU_IOVA if IOMMU_SUPPORT

This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA independently of IOMMU_SUPPORT. Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's not declared in its Kconfig entry. I wonder if adding that would be the right way to fix this. Cc'ing the IOMMU list.

Right, I also had the impression that we'd made the IOVA library completely standalone. And what does the IPU3 driver's Kconfig have to do with some *other* driver failing to link anyway?

Robin.

 	select VIDEOBUF2_DMA_SG
 	help
 	  This is the Video4Linux2 driver for Intel IPU3 image processing unit,
Re: [PATCH 16/16] dma-mapping: use exact allocation in dma_alloc_contiguous
On 14/06/2019 15:50, 'Christoph Hellwig' wrote:

On Fri, Jun 14, 2019 at 02:15:44PM +, David Laight wrote:

Does this still guarantee that requests for 16k will not cross a 16k boundary? It looks like you are losing the alignment parameter.

The DMA API never gave you alignment guarantees to start with, and you can get not-naturally-aligned memory from many of our current implementations.

Well, apart from the bit in DMA-API-HOWTO which has said this since forever (well, before Git history, at least):

"The CPU virtual address and the DMA address are both guaranteed to be aligned to the smallest PAGE_SIZE order which is greater than or equal to the requested size. This invariant exists (for example) to guarantee that if you allocate a chunk which is smaller than or equal to 64 kilobytes, the extent of the buffer you receive will not cross a 64K boundary."

That said, I don't believe this particular patch should make any appreciable difference - alloc_pages_exact() is still going to give back the same base address as the rounded-up over-allocation would, and PAGE_ALIGN()ing the size passed to get_order() already seemed to be pointless.

Robin.
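The guarantee Robin quotes from DMA-API-HOWTO is easy to model in isolation: align the base to the smallest page order covering the size, and the buffer cannot cross a boundary of that block size. This is toy arithmetic with invented `toy_` names and an assumed 4K page, not the kernel's get_order().

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	(1ULL << TOY_PAGE_SHIFT)

/* Smallest page order whose block (2^order pages) covers 'size'
 * (models what get_order() computes). */
static unsigned int toy_get_order(uint64_t size)
{
	unsigned int order = 0;

	while ((TOY_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Does [base, base + size) straddle a multiple of 'boundary'? */
static int toy_crosses_boundary(uint64_t base, uint64_t size,
				uint64_t boundary)
{
	return (base / boundary) != ((base + size - 1) / boundary);
}
```

So a 16K (order-2) request whose base is 16K-aligned never crosses a 16K boundary, while the same size at an unaligned base can; the HOWTO's 64K example is the order-4 instance of the same invariant.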
Re: [PATCH v3 2/4] drivers/staging/fsl-mc: Fix DPIO error path issues
Hi Roy, On 26/03/18 20:05, Roy Pledge wrote: The error path in the dpaa2_dpio_probe() function was not properly unmapping the QBMan device memory on the error path. This was also missing from the dpaa2_dpio_release() function. Also addresses a memory leak of the device private data structure. Signed-off-by: Roy Pledge --- drivers/staging/fsl-mc/bus/dpio/dpio-driver.c | 49 +++ 1 file changed, 34 insertions(+), 15 deletions(-) diff --git a/drivers/staging/fsl-mc/bus/dpio/dpio-driver.c b/drivers/staging/fsl-mc/bus/dpio/dpio-driver.c index e00f473..e7a0009 100644 --- a/drivers/staging/fsl-mc/bus/dpio/dpio-driver.c +++ b/drivers/staging/fsl-mc/bus/dpio/dpio-driver.c @@ -28,6 +28,7 @@ MODULE_DESCRIPTION("DPIO Driver"); struct dpio_priv { struct dpaa2_io *io; + struct dpaa2_io_desc desc; }; static irqreturn_t dpio_irq_handler(int irq_num, void *arg) @@ -85,7 +86,6 @@ static int register_dpio_irq_handlers(struct fsl_mc_device *dpio_dev, int cpu) static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev) { struct dpio_attr dpio_attrs; - struct dpaa2_io_desc desc; struct dpio_priv *priv; int err = -ENOMEM; struct device *dev = &dpio_dev->dev; @@ -117,7 +117,7 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev) dev_err(dev, "dpio_get_attributes() failed %d\n", err); goto err_get_attr; } - desc.qman_version = dpio_attrs.qbman_version; + priv->desc.qman_version = dpio_attrs.qbman_version; err = dpio_enable(dpio_dev->mc_io, 0, dpio_dev->mc_handle); if (err) { @@ -126,9 +126,9 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev) } /* initialize DPIO descriptor */ - desc.receives_notifications = dpio_attrs.num_priorities ? 1 : 0; - desc.has_8prio = dpio_attrs.num_priorities == 8 ? 1 : 0; - desc.dpio_id = dpio_dev->obj_desc.id; + priv->desc.receives_notifications = dpio_attrs.num_priorities ? 1 : 0; + priv->desc.has_8prio = dpio_attrs.num_priorities == 8 ? 
1 : 0; + priv->desc.dpio_id = dpio_dev->obj_desc.id; /* get the cpu to use for the affinity hint */ if (next_cpu == -1) @@ -139,19 +139,28 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev) if (!cpu_possible(next_cpu)) { dev_err(dev, "probe failed. Number of DPIOs exceeds NR_CPUS.\n"); err = -ERANGE; - goto err_allocate_irqs; + goto err_too_many_cpu; } - desc.cpu = next_cpu; + priv->desc.cpu = next_cpu; /* * Set the CENA regs to be the cache inhibited area of the portal to * avoid coherency issues if a user migrates to another core. */ - desc.regs_cena = memremap(dpio_dev->regions[1].start, - resource_size(&dpio_dev->regions[1]), - MEMREMAP_WC); - desc.regs_cinh = ioremap(dpio_dev->regions[1].start, -resource_size(&dpio_dev->regions[1])); + priv->desc.regs_cena = memremap(dpio_dev->regions[1].start, + resource_size(&dpio_dev->regions[1]), + MEMREMAP_WC); Since you already have some devres-managed resources in this driver, maybe use devm_memremap? (and perhaps convert the existing ioremap too) + if (!priv->desc.regs_cena) { + dev_err(dev, "memremap failed\n"); + goto err_too_many_cpu; + } + + priv->desc.regs_cinh = ioremap(dpio_dev->regions[1].start, + resource_size(&dpio_dev->regions[1])); + if (!priv->desc.regs_cinh) { + dev_err(dev, "ioremap failed\n"); + goto err_ioremap_failed; + } err = fsl_mc_allocate_irqs(dpio_dev); if (err) { @@ -159,11 +168,11 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev) goto err_allocate_irqs; } - err = register_dpio_irq_handlers(dpio_dev, desc.cpu); + err = register_dpio_irq_handlers(dpio_dev, priv->desc.cpu); if (err) goto err_register_dpio_irq; - priv->io = dpaa2_io_create(&desc); + priv->io = dpaa2_io_create(&priv->desc); if (!priv->io) { dev_err(dev, "dpaa2_io_create failed\n"); goto err_dpaa2_io_create; @@ -171,7 +180,7 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev) dev_info(dev, "probed\n"); dev_dbg(dev, " receives_notifications = %d\n", - desc.receives_notifications); + 
priv->desc.receives_notifications); dpio_close(dpio_dev->mc_io, 0, dpio_dev->mc_handle); fsl_mc_portal_free(dpio_dev->mc_io); @@ -182,6 +191,10 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev) err_register_dpio_irq: fsl_mc_fr
Re: [PATCH 6/7] staging: fsl-mc: rewrite mc command send/receive to work on 32-bits
On 17/07/17 14:26, laurentiu.tu...@nxp.com wrote: > From: Laurentiu Tudor > > Split the 64-bit accesses in 32-bit accesses because there's no real > constrain in MC to do only atomic 64-bit. There's only one place > where ordering is important: when writing the MC command header the > first 32-bit part of the header must be written last. > We do this switch in order to allow compiling the driver on 32-bit. #include Then you can just use writeq()/writeq_relaxed()/readq_relaxed() on 32-bit platforms as well. Robin. > Signed-off-by: Laurentiu Tudor > --- > drivers/staging/fsl-mc/bus/mc-sys.c | 31 --- > 1 file changed, 12 insertions(+), 19 deletions(-) > > diff --git a/drivers/staging/fsl-mc/bus/mc-sys.c > b/drivers/staging/fsl-mc/bus/mc-sys.c > index 195d9f3..dd2828e 100644 > --- a/drivers/staging/fsl-mc/bus/mc-sys.c > +++ b/drivers/staging/fsl-mc/bus/mc-sys.c > @@ -124,14 +124,15 @@ static inline void mc_write_command(struct mc_command > __iomem *portal, > { > int i; > > - /* copy command parameters into the portal */ > - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) > - __raw_writeq(cmd->params[i], &portal->params[i]); > - /* ensure command params are committed before submitting it */ > - wmb(); > - > - /* submit the command by writing the header */ > - __raw_writeq(cmd->header, &portal->header); > + /* > + * copy command parameters into the portal. Final write > + * triggers the submission of the command. 
> + */ > + for (i = sizeof(struct mc_command) / sizeof(u32) - 1; i >= 0; i--) { > + __raw_writel(((u32 *)cmd)[i], &((u32 *)portal)[i]); > + /* ensure command params are committed before submitting it */ > + wmb(); > + } > } > > /** > @@ -148,19 +149,11 @@ static inline enum mc_cmd_status > mc_read_response(struct mc_command __iomem * > struct mc_command *resp) > { > int i; > - enum mc_cmd_status status; > - > - /* Copy command response header from MC portal: */ > - resp->header = __raw_readq(&portal->header); > - status = mc_cmd_hdr_read_status(resp); > - if (status != MC_CMD_STATUS_OK) > - return status; > > - /* Copy command response data from MC portal: */ > - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) > - resp->params[i] = __raw_readq(&portal->params[i]); > + for (i = 0; i < sizeof(struct mc_command) / sizeof(u32); i++) > + ((u32 *)resp)[i] = __raw_readl(&((u32 *)portal)[i]); > > - return status; > + return mc_cmd_hdr_read_status(resp); > } > > /** >
Re: [PATCH 2/7] staging: fsl-mc: use generic memory barriers
On 17/07/17 14:26, laurentiu.tu...@nxp.com wrote: > From: Laurentiu Tudor > > No need to use arch-specific memory barriers; switch to using generic > ones. The rmb()s were useless so drop them. > > Signed-off-by: Laurentiu Tudor > --- > drivers/staging/fsl-mc/bus/mc-sys.c | 6 ++ > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/drivers/staging/fsl-mc/bus/mc-sys.c > b/drivers/staging/fsl-mc/bus/mc-sys.c > index a1704c3..012abd5 100644 > --- a/drivers/staging/fsl-mc/bus/mc-sys.c > +++ b/drivers/staging/fsl-mc/bus/mc-sys.c > @@ -127,7 +127,8 @@ static inline void mc_write_command(struct mc_command > __iomem *portal, > /* copy command parameters into the portal */ > for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) > __raw_writeq(cmd->params[i], &portal->params[i]); > - __iowmb(); > + /* ensure command params are committed before submitting it */ > + wmb(); > > /* submit the command by writing the header */ > __raw_writeq(cmd->header, &portal->header); AFAICS, just using writeq() here should ensure sufficient order against the previous iomem accessors, without the need for explicit barriers. Also, note that the __raw_*() variants aren't endian-safe, so consider updating things to *_relaxed() where ordering doesn't matter. Robin. 
> @@ -150,9 +151,7 @@ static inline enum mc_cmd_status mc_read_response(struct > mc_command __iomem * > enum mc_cmd_status status; > > /* Copy command response header from MC portal: */ > - __iormb(); > resp->header = __raw_readq(&portal->header); > - __iormb(); > status = mc_cmd_hdr_read_status(resp); > if (status != MC_CMD_STATUS_OK) > return status; > @@ -160,7 +159,6 @@ static inline enum mc_cmd_status mc_read_response(struct > mc_command __iomem * > /* Copy command response data from MC portal: */ > for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++) > resp->params[i] = __raw_readq(&portal->params[i]); > - __iormb(); > > return status; > } >
Re: [RFC PATCH 00/12] Ion cleanup in preparation for moving out of staging
On 10/03/17 10:31, Brian Starkey wrote: > Hi, > > On Thu, Mar 09, 2017 at 09:38:49AM -0800, Laura Abbott wrote: >> On 03/09/2017 02:00 AM, Benjamin Gaignard wrote: > > [snip] > >>> >>> For me those patches are going in the right direction. >>> >>> I still have few questions: >>> - since alignment management has been remove from ion-core, should it >>> be also removed from ioctl structure ? >> >> Yes, I think I'm going to go with the suggestion to fixup the ABI >> so we don't need the compat layer and as part of that I'm also >> dropping the align argument. >> > > Is the only motivation for removing the alignment parameter that > no-one got around to using it for something useful yet? > The original comment was true - different devices do have different > alignment requirements. > > Better alignment can help SMMUs use larger blocks when mapping, > reducing TLB pressure and the chance of a page table walk causing > display underruns. For that use-case, though, alignment alone doesn't necessarily help - you need the whole allocation granularity to match your block size (i.e. given a 1MB block size, asking for 17KB and getting back 17KB starting at a 1MB boundary doesn't help much - that whole 1MB needs to be allocated and everyone needs to know it to ensure that the whole lot can be mapped safely). Now, whether it's down to the callers or the heap implementations to decide and enforce that granularity is another question, but provided allocations are at least naturally aligned to whatever the granularity is (which is a reasonable assumption to bake in) then it's all good. Robin. > > -Brian
Re: [PATCH] staging: vc04_services: setup DMA and coherent mask
Hi Michael, On 31/10/16 19:53, Michael Zoran wrote: > On Mon, 2016-10-31 at 11:40 -0700, Michael Zoran wrote: >> On Mon, 2016-10-31 at 11:36 -0700, Eric Anholt wrote: >>> Michael Zoran writes: >>> Setting the DMA mask is optional on 32 bit but is mandatory on 64 bit. Set the DMA mask and coherent to force all DMA to be in the 32 bit address space. This is considered a "good practice" and most drivers already do this. Signed-off-by: Michael Zoran --- [...] + /* + * Setting the DMA mask is necessary in the 64 bit environment. + * It isn't necessary in a 32 bit environment but is considered + * a good practice. + */ + err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)); >>> >>> I think a better comment here would be simply: >>> >>> /* VCHI messages between the CPU and firmware use 32-bit bus >>> addresses. */ >>> >>> explaining why the value is chosen (once you know that the 32 bit >>> restriction exists, reporting it is obviously needed). I'm >>> curious, >>> though: what failed when you didn't set it? >>> >> >> The comment is easy to change. >> >> I don't have the log available ATM, but if I remember the DMA API's >> bugcheck the first time they are used. >> >> I think this was a policy decision or something because the >> information >> should be available in the dma-ranges. >> >> If it's important, I can setup a test again without the change and e- >> mail the logs. >> >> If you look at the DWC2 driver you will see that it also sets this >> mask. > > OK, I'm beginning to understand this. It appears the architecture > specific paths are very different. > > In arm the mask and coherent is set to DMA_BIT_MASK(32) in mm/dma- > mapping.c the first time the dma APIs are used. On arm64, it appears > this variable is uninitialized and will contain random crud. > > Like I said, I don't know if this is a policy decision or if it just > slipped through the cracks. [...]
> arch/arm64/mm/dma-mapping.c (Note no call to get_coherent_dma_mask) > > static void *__dma_alloc(struct device *dev, size_t size, >dma_addr_t *dma_handle, gfp_t flags, >unsigned long attrs) > { > struct page *page; > void *ptr, *coherent_ptr; > bool coherent = is_device_dma_coherent(dev); > pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, false); > > size = PAGE_ALIGN(size); > > if (!coherent && !gfpflags_allow_blocking(flags)) { > struct page *page = NULL; > void *addr = __alloc_from_pool(size, &page, flags); > > if (addr) > *dma_handle = phys_to_dma(dev, > page_to_phys(page)); > > return addr; > } > > ptr = __dma_alloc_coherent(dev, size, dma_handle, flags, > attrs); In the general case, the device's coherent DMA mask is checked inside that helper function, and then again further on in the SWIOTLB code if necessary. For the atomic pool case beforehand, the pool is already allocated below 4GB, and on arm64 we don't really expect devices to have DMA masks *smaller* than 32 bits. Devices created from DT get 32-bit DMA masks by default, although the dma-ranges property may override that - see of_dma_configure() - so if you somehow have a device with an uninitialised mask then I can only assume there's some platform driver shenanigans going on. Adding the dma_set_mask() call to the driver is no bad thing, but the rationale in the commit message is bogus. Robin.
Re: [PATCHv2] staging: android: ion: use the managed version of DMA memory allocation
On 03/02/16 06:49, Gujulan Elango, Hari Prasath (H.) wrote: From: Hari Prasath Gujulan Elango Use the managed version of the dma_alloc_coherent() i.e. the dmam_alloc_coherent() & accordingly clean up the error handling part. Also, remove the references to dma_free_coherent. That last aspect looks a bit off to me - the heaps don't seem to be something that exist for the lifetime of the ION "device", given that these are specific runtime alloc and free calls, rather than the probe and remove routines. I don't know if CMA heaps are among those which ION creates and destroys frequently (enough that it apparently kicks off a whole background thread to manage the freeing), but this looks like a recipe for leaks. If the free call doesn't actually free the buffer, it's going to remain hanging around consuming precious CMA area until the ION device itself is torn down, which is likely never. I wouldn't say it's necessarily inappropriate to use managed DMA resources here to cover unexpected failure cases for the ION device itself (I see the comment in ion_device_remove()), but that means still using dmam_free_coherent() when naturally releasing allocations for other reasons (i.e. both cases here). Think Java finalisers, rather than C++ destructors. Robin. Signed-off-by: Hari Prasath Gujulan Elango --- v2: kbuild test robot reported warnings on unused variables. Those warnings are fixed.
--- drivers/staging/android/ion/ion_cma_heap.c | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c index a3446da..8cd720b 100644 --- a/drivers/staging/android/ion/ion_cma_heap.c +++ b/drivers/staging/android/ion/ion_cma_heap.c @@ -61,7 +61,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer, if (!info) return ION_CMA_ALLOCATE_FAILED; - info->cpu_addr = dma_alloc_coherent(dev, len, &(info->handle), + info->cpu_addr = dmam_alloc_coherent(dev, len, &(info->handle), GFP_HIGHUSER | __GFP_ZERO); if (!info->cpu_addr) { @@ -71,7 +71,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer, info->table = kmalloc(sizeof(struct sg_table), GFP_KERNEL); if (!info->table) - goto free_mem; + goto err; if (dma_get_sgtable(dev, info->table, info->cpu_addr, info->handle, len)) @@ -83,8 +83,6 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer, free_table: kfree(info->table); -free_mem: - dma_free_coherent(dev, len, info->cpu_addr, info->handle); err: kfree(info); return ION_CMA_ALLOCATE_FAILED; @@ -92,13 +90,8 @@ err: static void ion_cma_free(struct ion_buffer *buffer) { - struct ion_cma_heap *cma_heap = to_cma_heap(buffer->heap); - struct device *dev = cma_heap->dev; struct ion_cma_buffer_info *info = buffer->priv_virt; - dev_dbg(dev, "Release buffer %p\n", buffer); - /* release memory */ - dma_free_coherent(dev, buffer->size, info->cpu_addr, info->handle); /* release sg table */ sg_free_table(info->table); kfree(info->table);
Re: [PATCH] staging: ion: ion_cma_heap: Don't directly use dma_common_get_sgtable
Hi Laura, On 17/07/15 17:50, Laura Abbott wrote: On 07/17/2015 08:21 AM, Robin Murphy wrote: Hi Tixy, On 17/07/15 12:01, Jon Medhurst (Tixy) wrote: Use dma_get_sgtable rather than dma_common_get_sgtable so a device's dma_ops aren't bypassed. This is essential in situations where a device uses an IOMMU and the physical memory is not contiguous (as the common function assumes). Signed-off-by: Jon Medhurst The lack of obvious users of this code makes it hard to tell if "dev" here is always the right, real, device pointer and never null or some dummy device with the wrong dma_ops, but the rest of the calls in this file are to the proper DMA API interface so at least this patch definitely makes things less wrong in that respect. Ion currently lacks any standard way to set up heaps and associate a device with a heap. This means it's basically a free for all for what devices get associated (getting something mainlined might help...). I agree that using the proper DMA APIs is a step in the right direction. I suspected as much, thanks for the confirmation. Reviewed-by: Robin Murphy --- This also begs the question as to what happens if the memory region _is_ contiguous but is in highmem or an ioremapped region. Should a device always provide dma_ops for that case? Because I believe the current implementation of dma_common_get_sgtable won't work for those as it uses virt_to_page. I see that this point has been raised before [1] by Zeng Tao, and I myself have been given a different fix to apply to a Linaro kernel tree. However, both solutions looked wrong to me as they treat a dma_addr_t as a physical address, so should at least be using dma_to_phys. So, should we fix dma_common_get_sgtable or mandate that the device has dma_ops? The latter seems to be implied by the commit message which introduced the function: This patch provides a generic implementation based on virt_to_page() call.
Architectures which require more sophisticated translation might provide their own get_sgtable() methods. Given that we're largely here due to having poked this on arm64 systems, I'm inclined to think that implementing our own get_sgtable as per arch/arm is the right course of action. Since a lot of architectures using dma_common_get_sgtable don't even implement dma_to_phys, I don't think it would be right to try complicating the common code for a case that seems to be all but common. I can spin an arm64 patch if you like. This would be hit on any system that has non-coherent DMA or highmem. I'm not sure I agree this isn't a common case. How many of the other architectures are actually using the dma_get_sgtable and would have the potential to find a problem? This appears to be pretty much exclusively a graphics/video thing. Surveying in-tree callers (other than Ion) gives DRM, V4L, and a couple of specific ARM SoC drivers - my hunch is that none of those see much action on the likes of Blackfin and 68k. That said, going through the git logs, the primary purpose of dma_common_get_sgtable would appear to be not breaking allmodconfig builds on architectures other than ARM. Thus I'm not really sure which is the least worst option - having "common" code which doesn't actually represent the common use case, or adding bogus dma_to_phys definitions to loads of architectures that don't even have proper DMA mapping implementations for the sake of some code they don't even use... Robin. Thanks, Laura
Re: [PATCH] staging: ion: ion_cma_heap: Don't directly use dma_common_get_sgtable
Hi Tixy, On 17/07/15 12:01, Jon Medhurst (Tixy) wrote: Use dma_get_sgtable rather than dma_common_get_sgtable so a device's dma_ops aren't bypassed. This is essential in situations where a device uses an IOMMU and the physical memory is not contiguous (as the common function assumes). Signed-off-by: Jon Medhurst The lack of obvious users of this code makes it hard to tell if "dev" here is always the right, real, device pointer and never null or some dummy device with the wrong dma_ops, but the rest of the calls in this file are to the proper DMA API interface so at least this patch definitely makes things less wrong in that respect. Reviewed-by: Robin Murphy --- This also begs the question as to what happens if the memory region _is_ contiguous but is in highmem or an ioremapped region. Should a device always provide dma_ops for that case? Because I believe the current implementation of dma_common_get_sgtable won't work for those as it uses virt_to_page. I see that this point has been raised before [1] by Zeng Tao, and I myself have been given a different fix to apply to a Linaro kernel tree. However, both solutions looked wrong to me as they treat a dma_addr_t as a physical address, so should at least be using dma_to_phys. So, should we fix dma_common_get_sgtable or mandate that the device has dma_ops? The latter seems to be implied by the commit message which introduced the function: This patch provides a generic implementation based on virt_to_page() call. Architectures which require more sophisticated translation might provide their own get_sgtable() methods. Given that we're largely here due to having poked this on arm64 systems, I'm inclined to think that implementing our own get_sgtable as per arch/arm is the right course of action. Since a lot of architectures using dma_common_get_sgtable don't even implement dma_to_phys, I don't think it would be right to try complicating the common code for a case that seems to be all but common. 
I can spin an arm64 patch if you like. Robin. Note, I don't have a system where any of this code is used to test things, and have never looked at this area before yesterday, so I may have misunderstood what’s going on in the code. [1] https://lkml.org/lkml/2014/12/1/584 drivers/staging/android/ion/ion_cma_heap.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c index f4211f1..86b91fd 100644 --- a/drivers/staging/android/ion/ion_cma_heap.c +++ b/drivers/staging/android/ion/ion_cma_heap.c @@ -73,8 +73,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer, if (!info->table) goto free_mem; - if (dma_common_get_sgtable - (dev, info->table, info->cpu_addr, info->handle, len)) + if (dma_get_sgtable(dev, info->table, info->cpu_addr, info->handle, len)) goto free_table; /* keep this for memory release */ buffer->priv_virt = info;