Re: [PATCH v3 79/79] media: hantro: document the usage of pm_runtime_get_sync()

2021-04-27 Thread Robin Murphy

On 2021-04-27 11:27, Mauro Carvalho Chehab wrote:

Unlike other *_get()/*_put() functions, where the usage count is
incremented only when there are no errors, pm_runtime_get_sync()
behaves differently, incrementing the counter *even* on
errors.

That's error-prone behavior, as people often forget to
decrement the usage counter.

However, the hantro driver depends on this behavior, as it
will decrement the usage_count unconditionally at the m2m
job finish time, which makes sense.

So, instead of using pm_runtime_resume_and_get(), which
would decrement the counter on error, keep the current
API, but add documentation explaining the rationale for
continuing to use pm_runtime_get_sync().

Signed-off-by: Mauro Carvalho Chehab 
---
  drivers/staging/media/hantro/hantro_drv.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/staging/media/hantro/hantro_drv.c 
b/drivers/staging/media/hantro/hantro_drv.c
index 595e82a82728..96f940c1c85c 100644
--- a/drivers/staging/media/hantro/hantro_drv.c
+++ b/drivers/staging/media/hantro/hantro_drv.c
@@ -155,6 +155,13 @@ static void device_run(void *priv)
ret = clk_bulk_enable(ctx->dev->variant->num_clocks, ctx->dev->clocks);
if (ret)
goto err_cancel_job;


...except this can also cause the same pm_runtime_put_autosuspend() call 
without even reaching the "matching" get below, so rather than some kind 
of cleverness it seems more like it's just broken :/


Robin.


+
+   /*
+* The pm_runtime_get_sync() will increment dev->power.usage_count,
+* even on errors. That's the expected behavior here, since the
+* hantro_job_finish() function at the error handling code
+* will internally call pm_runtime_put_autosuspend().
+*/
ret = pm_runtime_get_sync(ctx->dev->dev);
if (ret < 0)
goto err_cancel_job;


___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: use of dma_direct_set_offset in (allwinner) drivers

2020-11-04 Thread Robin Murphy

On 2020-11-04 08:14, Maxime Ripard wrote:

Hi Christoph,

On Tue, Nov 03, 2020 at 10:55:38AM +0100, Christoph Hellwig wrote:

Linux 5.10-rc1 switched from having a single dma offset in struct device
to a set of DMA ranges, and introduced a new helper to set them,
dma_direct_set_offset.

This in fact surfaced that a bunch of drivers that violate our layering
and set the offset from drivers, which meant we had to reluctantly
export the symbol to set up the DMA range.

The drivers are:

drivers/gpu/drm/sun4i/sun4i_backend.c

   This just use dma_direct_set_offset as a fallback.  Is there any good
   reason to not just kill off the fallback?

drivers/media/platform/sunxi/sun4i-csi/sun4i_csi.c

   Same as above.


So, the history of this is:

   - We initially introduced the support for those two controllers
 assuming that there was a direct mapping between the physical and
 DMA addresses. It turned out there wasn't, and the DMA accesses were
 going through a secondary, dedicated bus that didn't have the same
 mapping of the RAM as the CPU.

 4690803b09c6 ("drm/sun4i: backend: Offset layer buffer address by DRAM starting 
address")

   - This dedicated bus is undocumented and barely used in the vendor
 kernel, so this was overlooked, and it's fairly hard to get info on
 it for all the SoCs we support. We added DT support for it on the
 SoCs where we had enough info to do so:

 c43a4469402f ("dt-bindings: interconnect: Add a dma interconnect name")
 22f88e311399 ("ARM: dts: sun5i: Add the MBUS controller")

 This explains the check on the interconnect property

   - However, due to the stable DT rule, we still need to operate without
 regressions on older DTs that wouldn't have that property (and for
 SoCs we haven't figured out). Hence the fallback.


How about having something in the platform code that keys off the 
top-level SoC compatible and uses a bus notifier to create offsets for 
the relevant devices if an MBUS description is missing? At least that 
way the workaround could be confined to a single dedicated place and 
look somewhat similar to other special cases like sta2x11, rather than 
being duplicated all over the place.


Robin.


drivers/media/platform/sunxi/sun6i-csi/sun6i_csi.c

   This driver unconditionally sets the offset.  Why can't we do this
   in the device tree?

drivers/staging/media/sunxi/cedrus/cedrus_hw.c

   Same as above.



We should make those two match the previous ones, but we'll have the
same issue here eventually. Most likely they were never run on an SoC
for which we have the MBUS figured out.

Maxime


___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel




Re: [PATCH v5 8/9] arm64: dts: rockchip: add isp0 node for rk3399

2020-10-14 Thread Robin Murphy

On 2020-10-14 17:27, Helen Koike wrote:

Hi Tomasz,

On 9/26/20 10:00 AM, Tomasz Figa wrote:

Hi Helen,

On Wed, Jul 22, 2020 at 12:55:32PM -0300, Helen Koike wrote:

From: Shunqian Zheng 

RK3399 has two ISPs, but only isp0 was tested.
Add the isp0 node to the rk3399 dtsi.

Verified with:
make ARCH=arm64 dtbs_check 
DT_SCHEMA_FILES=Documentation/devicetree/bindings/media/rockchip-isp1.yaml

Signed-off-by: Shunqian Zheng 
Signed-off-by: Jacob Chen 
Signed-off-by: Helen Koike 

---

V4:
- update clock names

V3:
- clean up clocks

V2:
- re-order power-domains property

V1:
This patch was originally part of this patchset:

 https://patchwork.kernel.org/patch/10267431/

The only difference is:
- add phy properties
- add ports
---
  arch/arm64/boot/dts/rockchip/rk3399.dtsi | 25 
  1 file changed, 25 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi 
b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index dba9641947a3a..ed8ba75dbbce8 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -1721,6 +1721,31 @@ vopb_mmu: iommu@ff903f00 {
status = "disabled";
};
  
+	isp0: isp0@ff91 {

+   compatible = "rockchip,rk3399-cif-isp";
+   reg = <0x0 0xff91 0x0 0x4000>;
+   interrupts = ;
+   clocks = <&cru SCLK_ISP0>,
+<&cru ACLK_ISP0_WRAPPER>,
+<&cru HCLK_ISP0_WRAPPER>;
+   clock-names = "isp", "aclk", "hclk";
+   iommus = <&isp0_mmu>;
+   phys = <&mipi_dphy_rx0>;
+   phy-names = "dphy";
+   power-domains = <&power RK3399_PD_ISP0>;


Should this have status = "disabled" too? The mipi_dphy_rx0 node is
disabled by default too, so in the default configuration the driver
would always fail to probe.


I'm wondering what the overall guideline is here.
Since the isp and mipi_dphy are always present in the rk3399, shouldn't they
always be enabled?
Or, since they are only useful if a sensor is present, should we let the dts
of the board enable them?


Yes, the usual pattern is that anything which needs additional hardware 
outside the SoC to be useful is disabled by default in the SoC DTSI, and 
enabled in individual board DTSs as appropriate. See USB, HDMI, etc. for 
instance.


There's probably a further debate about how much the board itself should 
enable if it only breaks out a connector for the user to add their own 
camera module, but hey, one step at a time ;)
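Concretely, the convention Robin describes would look something like this in a board .dts (a sketch; the `isp0` and `mipi_dphy_rx0` labels are from the patch and thread above, with `status = "disabled"` assumed in the SoC dtsi):

```dts
/* board .dts: enable the ISP and its D-PHY only where a sensor is wired */
&mipi_dphy_rx0 {
	status = "okay";
};

&isp0 {
	status = "okay";
};
```

The SoC dtsi keeps the nodes disabled so boards without a camera never probe drivers that can't work.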


Robin.
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 00/16] IOMMU driver for Kirin 960/970

2020-08-19 Thread Robin Murphy

On 2020-08-19 11:28, Mauro Carvalho Chehab wrote:

Em Tue, 18 Aug 2020 15:02:54 -0700
John Stultz  escreveu:


On Tue, Aug 18, 2020 at 9:26 AM Robin Murphy  wrote:

On 2020-08-18 16:29, Mauro Carvalho Chehab wrote:

Em Tue, 18 Aug 2020 15:47:55 +0100
Basically, the DT binding has this, for IOMMU:


   smmu_lpae {
   compatible = "hisilicon,smmu-lpae";
   };

...
   dpe: dpe@e860 {
   compatible = "hisilicon,kirin970-dpe";
   memory-region = <&drm_dma_reserved>;
...
   iommu_info {
   start-addr = <0x8000>;
   size = <0xbfff8000>;
   };
   }

This is used by kirin9xx_drm_dss.c in order to enable and use
the iommu:


   static int dss_enable_iommu(struct platform_device *pdev, struct 
dss_hw_ctx *ctx)
   {
   struct device *dev = NULL;

   dev = &pdev->dev;

   /* create iommu domain */
   ctx->mmu_domain = iommu_domain_alloc(dev->bus);
   if (!ctx->mmu_domain) {
   pr_err("iommu_domain_alloc failed!\n");
   return -EINVAL;
   }

   iommu_attach_device(ctx->mmu_domain, dev);

   return 0;
   }

The only place where the IOMMU domain is used is on this part of the
code(error part simplified here) [1]:

   void hisi_dss_smmu_on(struct dss_hw_ctx *ctx)
   {
   uint64_t fama_phy_pgd_base;
   uint32_t phy_pgd_base;
...
   fama_phy_pgd_base = iommu_iova_to_phys(ctx->mmu_domain, 0);
   phy_pgd_base = (uint32_t)fama_phy_pgd_base;
   if (WARN_ON(!phy_pgd_base))
   return;

   set_reg(smmu_base + SMMU_CB_TTBR0, phy_pgd_base, 32, 0);
   }

[1] 
https://github.com/mchehab/linux/commit/36da105e719b47bbe9d6cb7e5619b30c7f3eb1bd

In other words, the driver needs to get the physical address of the frame
buffer (mapped via iommu) in order to set some DRM-specific register.

Yeah, the above code is somewhat hackish. I would love to replace
this part by a more standard approach.


OK, so from a quick look at that, my impression is that your display
controller has its own MMU and you don't need to pretend to use the
IOMMU API at all. Just have the DRM driver use io-pgtable directly to
run its own set of ARM_32_LPAE_S1 pagetables - see Panfrost for an
example (but try to ignore the wacky "Mali LPAE" format).


Yea. For the HiKey960, there was originally a similar patch series but
it was refactored out and the (still out of tree) DRM driver I'm
carrying doesn't seem to need it (though looking we still have the
iommu_info subnode in the dts that maybe needs to be cleaned up).


Funny... while the Hikey 970 DRM driver has such IOMMU code, it
doesn't actually use it!

The driver has a function called hisi_dss_smmu_config(), called
from hisi_fb_pan_display(), which sets the registers in a
different way depending on whether the IOMMU is used. It can also
use a mode called "afbcd".

Well, this function sets both to false:

bool afbcd = false;
bool mmu_enable = false;

I ended up commenting out the code which depends on the iommu
driver, and everything is working as before.

So, I'll just forget about this iommu driver, as we can live
without it.

For now, I'll keep the mmu code there commented out, as
it could be useful for a future port to io-pgtable.

-

Robin,

Can the Panfrost driver use io-pgtable while the KMS driver
won't be using it? Or this would cause it to not work?

My end goal here is to be able to test the Panfrost driver ;-)


Yup, the GPU has its own independent MMU, so Panfrost can import display 
buffers regardless of whether they're physically contiguous or not. 
Since Mesa master has recently landed AFBC support, there's probably 
more immediate benefit in getting that AFBC decoder working before the 
display MMU (although ultimately things are likely to work better under 
memory pressure if you don't have to rely on CMA, so it should still be 
worth coming back to at some point).


Robin.
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 00/16] IOMMU driver for Kirin 960/970

2020-08-19 Thread Robin Murphy

On 2020-08-18 23:02, John Stultz wrote:

On Tue, Aug 18, 2020 at 9:26 AM Robin Murphy  wrote:

On 2020-08-18 16:29, Mauro Carvalho Chehab wrote:

Em Tue, 18 Aug 2020 15:47:55 +0100
Basically, the DT binding has this, for IOMMU:


   smmu_lpae {
   compatible = "hisilicon,smmu-lpae";
   };

...
   dpe: dpe@e860 {
   compatible = "hisilicon,kirin970-dpe";
   memory-region = <&drm_dma_reserved>;
...
   iommu_info {
   start-addr = <0x8000>;
   size = <0xbfff8000>;
   };
   }

This is used by kirin9xx_drm_dss.c in order to enable and use
the iommu:


   static int dss_enable_iommu(struct platform_device *pdev, struct 
dss_hw_ctx *ctx)
   {
   struct device *dev = NULL;

   dev = &pdev->dev;

   /* create iommu domain */
   ctx->mmu_domain = iommu_domain_alloc(dev->bus);
   if (!ctx->mmu_domain) {
   pr_err("iommu_domain_alloc failed!\n");
   return -EINVAL;
   }

   iommu_attach_device(ctx->mmu_domain, dev);

   return 0;
   }

The only place where the IOMMU domain is used is on this part of the
code(error part simplified here) [1]:

   void hisi_dss_smmu_on(struct dss_hw_ctx *ctx)
   {
   uint64_t fama_phy_pgd_base;
   uint32_t phy_pgd_base;
...
   fama_phy_pgd_base = iommu_iova_to_phys(ctx->mmu_domain, 0);
   phy_pgd_base = (uint32_t)fama_phy_pgd_base;
   if (WARN_ON(!phy_pgd_base))
   return;

   set_reg(smmu_base + SMMU_CB_TTBR0, phy_pgd_base, 32, 0);
   }

[1] 
https://github.com/mchehab/linux/commit/36da105e719b47bbe9d6cb7e5619b30c7f3eb1bd

In other words, the driver needs to get the physical address of the frame
buffer (mapped via iommu) in order to set some DRM-specific register.

Yeah, the above code is somewhat hackish. I would love to replace
this part by a more standard approach.


OK, so from a quick look at that, my impression is that your display
controller has its own MMU and you don't need to pretend to use the
IOMMU API at all. Just have the DRM driver use io-pgtable directly to
run its own set of ARM_32_LPAE_S1 pagetables - see Panfrost for an
example (but try to ignore the wacky "Mali LPAE" format).


Yea. For the HiKey960, there was originally a similar patch series but
it was refactored out and the (still out of tree) DRM driver I'm
carrying doesn't seem to need it (though looking we still have the
iommu_info subnode in the dts that maybe needs to be cleaned up).


Indeed, I'd assume it's possible to leave the MMU off and just use CMA 
buffers instead, but wiring it up properly without the downstream 
mis-design should be pretty clean, so maybe that could ultimately be 
shared with 960 too (assuming the hardware isn't wildly dissimilar).


I notice there's already a whole load of MMU configuration hard-coded 
into the DRM driver - does iommu_info even need to be in the DT, or 
could that also be decided directly by the driver? (Most other MMU-aware 
DRM drivers seem to hard-code their drm_mm dimensions.) I can't imagine 
the *virtual* address space limits need to vary on a per-board basis, 
and they could easily be tied to the compatible if they legitimately 
differ across SoCs and a simple lowest-common-denominator approach 
wouldn't suffice for whatever reason.


Robin.
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 00/16] IOMMU driver for Kirin 960/970

2020-08-18 Thread Robin Murphy

On 2020-08-18 16:29, Mauro Carvalho Chehab wrote:

Hi Robin,

Em Tue, 18 Aug 2020 15:47:55 +0100
Robin Murphy  escreveu:


On 2020-08-17 08:49, Mauro Carvalho Chehab wrote:

Add a driver for the Kirin 960/970 iommu.

As on the past series, this starts from the original 4.9 driver from
the 96boards tree:

https://github.com/96boards-hikey/linux/tree/hikey970-v4.9

The remaining patches add SPDX headers and make it build and run with
the upstream Kernel.

Chenfeng (1):
iommu: add support for HiSilicon Kirin 960/970 iommu

Mauro Carvalho Chehab (15):
iommu: hisilicon: remove default iommu_map_sg handler
iommu: hisilicon: map and unmap ops gained new arguments
iommu: hisi_smmu_lpae: rebase it to work with upstream
iommu: hisi_smmu: remove linux/hisi/hisi-iommu.h
iommu: hisilicon: cleanup its code style
iommu: hisi_smmu_lpae: get rid of IOMMU_SEC and IOMMU_DEVICE
iommu: get rid of map/unmap tile functions
iommu: hisi_smmu_lpae: use the right code to get domain-priv data
iommu: hisi_smmu_lpae: convert it to probe_device
iommu: add Hisilicon Kirin970 iommu at the building system
iommu: hisi_smmu_lpae: cleanup printk macros
iommu: hisi_smmu_lpae: make OF compatible more standard


Echoing the other comments about none of the driver patches being CC'd
to the IOMMU list...

Still, I dug the series up on lore and frankly I'm not sure what to make
of it - AFAICS the "driver" is just yet another implementation of Arm
LPAE pagetable code, with no obvious indication of how those pagetables
ever get handed off to IOMMU hardware (and indeed no indication of IOMMU
hardware at all). Can you explain how it's supposed to work?

And as a pre-emptive strike, we really don't need any more LPAE
implementations - that's what the io-pgtable library is all about (which
incidentally has been around since 4.0...). I think that should make the
issue of preserving authorship largely moot since there's no need to
preserve most of the code anyway ;)


I didn't know about that, since I got a Hikey 970 board for the first time
about one month ago, and that's the first time I looked into iommu code.

My end goal with this is to make the DRM/KMS driver work with upstream
kernels.

The full patch series is at:

https://github.com/mchehab/linux/commits/hikey970/to_upstream-2.0-v1.1

(I need to put a new version there, after some changes due to recent
upstream discussions on the regulator part of the code)

Basically, the DT binding has this, for IOMMU:


smmu_lpae {
compatible = "hisilicon,smmu-lpae";
};

...
dpe: dpe@e860 {
compatible = "hisilicon,kirin970-dpe";
memory-region = <&drm_dma_reserved>;
...
iommu_info {
start-addr = <0x8000>;
size = <0xbfff8000>;
};
}

This is used by kirin9xx_drm_dss.c in order to enable and use
the iommu:


static int dss_enable_iommu(struct platform_device *pdev, struct 
dss_hw_ctx *ctx)
{
struct device *dev = NULL;

dev = &pdev->dev;

/* create iommu domain */
ctx->mmu_domain = iommu_domain_alloc(dev->bus);
if (!ctx->mmu_domain) {
pr_err("iommu_domain_alloc failed!\n");
return -EINVAL;
}

iommu_attach_device(ctx->mmu_domain, dev);

return 0;
}

The only place where the IOMMU domain is used is on this part of the
code(error part simplified here) [1]:

void hisi_dss_smmu_on(struct dss_hw_ctx *ctx)
{
uint64_t fama_phy_pgd_base;
uint32_t phy_pgd_base;
...
fama_phy_pgd_base = iommu_iova_to_phys(ctx->mmu_domain, 0);
phy_pgd_base = (uint32_t)fama_phy_pgd_base;
if (WARN_ON(!phy_pgd_base))
return;

set_reg(smmu_base + SMMU_CB_TTBR0, phy_pgd_base, 32, 0);
}

[1] 
https://github.com/mchehab/linux/commit/36da105e719b47bbe9d6cb7e5619b30c7f3eb1bd

In other words, the driver needs to get the physical address of the frame
buffer (mapped via iommu) in order to set some DRM-specific register.

Yeah, the above code is somewhat hackish. I would love to replace
this part by a more standard approach.


OK, so from a quick look at that, my impression is that your display 
controller has its own MMU and you don't need to pretend to use the 
IOMMU API at all. Just have the DRM driver use io-pgtable directly to 
run its own set of ARM_32_LPAE_S1 pagetables - see Panfrost for an 
example (but try to ignore the wacky "Mali LPAE" format).


Robin.
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH 00/16] IOMMU driver for Kirin 960/970

2020-08-18 Thread Robin Murphy

On 2020-08-17 08:49, Mauro Carvalho Chehab wrote:

Add a driver for the Kirin 960/970 iommu.

As on the past series, this starts from the original 4.9 driver from
the 96boards tree:

https://github.com/96boards-hikey/linux/tree/hikey970-v4.9

The remaining patches add SPDX headers and make it build and run with
the upstream Kernel.

Chenfeng (1):
   iommu: add support for HiSilicon Kirin 960/970 iommu

Mauro Carvalho Chehab (15):
   iommu: hisilicon: remove default iommu_map_sg handler
   iommu: hisilicon: map and unmap ops gained new arguments
   iommu: hisi_smmu_lpae: rebase it to work with upstream
   iommu: hisi_smmu: remove linux/hisi/hisi-iommu.h
   iommu: hisilicon: cleanup its code style
   iommu: hisi_smmu_lpae: get rid of IOMMU_SEC and IOMMU_DEVICE
   iommu: get rid of map/unmap tile functions
   iommu: hisi_smmu_lpae: use the right code to get domain-priv data
   iommu: hisi_smmu_lpae: convert it to probe_device
   iommu: add Hisilicon Kirin970 iommu at the building system
   iommu: hisi_smmu_lpae: cleanup printk macros
   iommu: hisi_smmu_lpae: make OF compatible more standard


Echoing the other comments about none of the driver patches being CC'd 
to the IOMMU list...


Still, I dug the series up on lore and frankly I'm not sure what to make 
of it - AFAICS the "driver" is just yet another implementation of Arm 
LPAE pagetable code, with no obvious indication of how those pagetables 
ever get handed off to IOMMU hardware (and indeed no indication of IOMMU 
hardware at all). Can you explain how it's supposed to work?


And as a pre-emptive strike, we really don't need any more LPAE 
implementations - that's what the io-pgtable library is all about (which 
incidentally has been around since 4.0...). I think that should make the 
issue of preserving authorship largely moot since there's no need to 
preserve most of the code anyway ;)


Robin.


   dt: add an spec for the Kirin36x0 SMMU
   dt: hi3670-hikey970.dts: load the SMMU driver on Hikey970
   staging: hikey9xx: add an item about the iommu driver

  .../iommu/hisilicon,kirin36x0-smmu.yaml   |  55 ++
  .../boot/dts/hisilicon/hi3670-hikey970.dts|   3 +
  drivers/staging/hikey9xx/Kconfig  |   9 +
  drivers/staging/hikey9xx/Makefile |   1 +
  drivers/staging/hikey9xx/TODO |   1 +
  drivers/staging/hikey9xx/hisi_smmu.h  | 196 ++
  drivers/staging/hikey9xx/hisi_smmu_lpae.c | 648 ++
  7 files changed, 913 insertions(+)
  create mode 100644 
Documentation/devicetree/bindings/iommu/hisilicon,kirin36x0-smmu.yaml
  create mode 100644 drivers/staging/hikey9xx/hisi_smmu.h
  create mode 100644 drivers/staging/hikey9xx/hisi_smmu_lpae.c




Re: [PATCH v5 08/12] device core: Introduce multiple dma pfn offsets

2020-06-17 Thread Robin Murphy

Hi Jim,

Thanks for taking this on!

On 2020-06-16 21:55, Jim Quinlan wrote:

The new field in struct device 'dma_pfn_offset_map' is used to facilitate
the use of single or multiple pfn offsets between cpu addrs and dma addrs.
It subsumes the role of dev->dma_pfn_offset -- a uniform offset.


This isn't just about offsets - it should (eventually) subsume 
bus_dma_limit as well, so I'd be inclined to call it something like 
"dma_ranges"/"dma_range_map"/"dma_regions"/etc.



The function of_dma_get_range() has been modified to take two additional
arguments: the "map", which is an array that holds the information
regarding the pfn offset regions, and map_size, which is the size in bytes
of the map array.

of_dma_configure() is the typical manner to set pfn offsets but there are a
number of ad hoc assignments to dev->dma_pfn_offset in the kernel driver
code.  These cases now invoke the function
dma_attach_uniform_pfn_offset(dev, pfn_offset).


I'm also not convinced that sticking to the PFN paradigm is necessarily 
the right way to go - when there's only a single nicely-aligned offset 
to consider then an unsigned long that's immune to PAE/LPAE/etc. 
disruption is indeed the cheapest and easiest option from core code's 
PoV. However it already means that all the users have to do some degree 
of conversion back and forth between PFNs and usable addresses; once the 
core code itself also has to start bouncing back and forth between 
addresses and PFNs internally then we end up effectively just doing work 
to cancel out other work, and the whole lot would end up simpler and 
more efficient if the API worked purely in terms of addresses.


[...]

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 8eea3f6e29a4..767fa3b492c8 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -918,12 +918,48 @@ void __iomem *of_io_request_and_map(struct device_node 
*np, int index,
  }
  EXPORT_SYMBOL(of_io_request_and_map);
  
+static int dma_attach_pfn_offset_map(struct device_node *node, int num_ranges,

+struct bus_dma_region **map, size_t 
*map_size)
+{
+   struct of_range_parser parser;
+   struct of_range range;
+   struct bus_dma_region *r;
+
+   *map_size = (num_ranges + 1) * sizeof(**map);
+   r = kcalloc(num_ranges + 1, sizeof(**map), GFP_KERNEL);
+   if (!r)
+   return -ENOMEM;
+   *map = r;
+
+   of_dma_range_parser_init(&parser, node);
+   /*
+* Record all info for DMA ranges array.  We could
+* just use the of_range struct, but if we did that it


Not making the entire DMA API depend on OF is a far better justification 
for having its own dedicated structure.



+* would require more calculations for phys_to_dma and
+* dma_to_phys conversions.
+*/


However that part is pretty much nonsense. Consider your "efficient" 
operation for looking up and consuming a DMA offset:


API caller
1. load cpu_start
2. compare addr >= cpu_start
3. load cpu_end
4. compare addr <= cpu_end
5. load pfn_offset
6.  shift pfn_offset << PAGE_SHIFT
7.  add "offset" + addr
8.  [use the result]

versus the "more calculations" approach (once the PFN cruft is peeled away):

API caller
1. load cpu_addr
2. compare addr >= cpu_addr
3. subtract addr - cpu_addr
4. load size
5. compare "addr_offset" < size
6. load dma_start
7. add dma_start + "addr_offset"
8.  [use the result]

Oh look, it's the exact same number of memory accesses and ALU 
operations, but with a smaller code footprint (assuming, reasonably, 
more than one caller) and less storage overhead ;)


Basically, having this degree of redundancy is somewhere between silly 
and actively harmful (what if pfn_offset gets out of sync with 
cpu_start/dma_start? What if cpu_end/dma_end don't represent equivalent 
lengths?)



+   for_each_of_range(&parser, &range) {
+   r->cpu_start = range.cpu_addr;
+   r->cpu_end = r->cpu_start + range.size - 1;
+   r->dma_start = range.bus_addr;
+   r->dma_end = r->dma_start + range.size - 1;
+   r->pfn_offset = PFN_DOWN(range.cpu_addr) - 
PFN_DOWN(range.bus_addr);
+   r++;
+   }
+   return 0;
+}
+
  /**
   * of_dma_get_range - Get DMA range info
   * @np:   device node to get DMA range info
   * @dma_addr: pointer to store initial DMA address of DMA range
   * @paddr:pointer to store initial CPU address of DMA range
   * @size: pointer to store size of DMA range
+ * @map:   pointer to a pointer of an array of structs.  This is updated
+ * to point to NULL (no offsets needed) or kmalloc'd array of
+ * structs.  In the latter case, it is the caller's obligation to
+ * kfree the array in the case 

Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled

2019-07-24 Thread Robin Murphy

On 24/07/2019 15:09, Dmitry Osipenko wrote:

On 24.07.2019 17:03, Yuehaibing wrote:

On 2019/7/24 21:49, Robin Murphy wrote:

On 24/07/2019 11:30, Sakari Ailus wrote:

Hi Yue,

On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:

If IOMMU_SUPPORT is not set, the ipu3 driver may select IOMMU_IOVA to m.
But many drivers use "select IOMMU_IOVA if IOMMU_SUPPORT"
in their Kconfig; for example, if CONFIG_TEGRA_VDE is set to y but IOMMU_IOVA
is m, then the build fails like this:

drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
iommu.c:(.text+0x41): undefined reference to `alloc_iova'
iommu.c:(.text+0x56): undefined reference to `__free_iova'

Reported-by: Hulk Robot 
Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level pci device 
driver")
Signed-off-by: YueHaibing 
---
   drivers/staging/media/ipu3/Kconfig | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/ipu3/Kconfig 
b/drivers/staging/media/ipu3/Kconfig
index 4b51c67..b7df18f 100644
--- a/drivers/staging/media/ipu3/Kconfig
+++ b/drivers/staging/media/ipu3/Kconfig
@@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
   depends on PCI && VIDEO_V4L2
   depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
   depends on X86
-select IOMMU_IOVA
+select IOMMU_IOVA if IOMMU_SUPPORT


This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA
independently of IOMMU_SUPPORT.

Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's not
declared in its Kconfig entry. I wonder if adding that would be the right
way to fix this.

Cc'ing the IOMMU list.

IOMMU_SUPPORT is optional for the Tegra-VDE driver.


Right, I also had the impression that we'd made the IOVA library completely 
standalone. And what does the IPU3 driver's Kconfig have to do with some 
*other* driver failing to link anyway?


I can see it failing if IPU3 is compiled as a loadable module, while
Tegra-VDE is a built-in driver. Hence the IOVA lib will also be a kernel
module, and thus the IOVA symbols will be missing during linkage of
the VDE driver.


Oh, I misunderstood that IOMMU_IOVA depends on IOMMU_SUPPORT, thank you for
the clarification.

I will try to fix this in tegra-vde.


Probably IOVA could be selected independently of IOMMU_SUPPORT, but IOVA
library isn't needed for the VDE driver if IOMMU_SUPPORT is disabled.


Oh, I think I get the problem now - tegra-vde/iommu.c is built 
unconditionally and relies on the static inline stubs for IOMMU and IOVA 
calls if !IOMMU_SUPPORT, but in a compile-test config where IOVA=m for 
other reasons, it then picks up the real declarations from linux/iova.h 
instead of the stubs, and things go downhill from there. So there is a 
real issue, but indeed it's Tegra-VDE which needs to be restructured to 
cope with such configurations, and not IPU3's (or anyone else who may 
select IOVA=m in future) job to work around it.
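For reference, one common Kconfig idiom for this kind of situation (a sketch only, not necessarily the fix that actually landed for tegra-vde) is to force the dependent driver's tristate to follow the library's:

```
# drivers/staging/media/tegra-vde/Kconfig (hypothetical sketch)
config TEGRA_VDE
	tristate "NVIDIA Tegra Video Decoder Engine driver"
	# If IOMMU_IOVA=m, this limits TEGRA_VDE to m or n, so a built-in
	# driver can never end up linking against the modular IOVA library.
	depends on IOMMU_IOVA || !IOMMU_IOVA
```

The `FOO || !FOO` construct is a no-op when FOO is y or n, but pins the driver to at most m when FOO=m, which is exactly the broken configuration described above.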


Robin.
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [PATCH] media: staging: ipu3: Enable IOVA API only when IOMMU support is enabled

2019-07-24 Thread Robin Murphy

On 24/07/2019 11:30, Sakari Ailus wrote:

Hi Yue,

On Mon, Jul 22, 2019 at 09:47:49PM +0800, YueHaibing wrote:

If IOMMU_SUPPORT is not set, the ipu3 driver may select IOMMU_IOVA to m.
But many drivers use "select IOMMU_IOVA if IOMMU_SUPPORT"
in their Kconfig; for example, if CONFIG_TEGRA_VDE is set to y but IOMMU_IOVA
is m, then the build fails like this:

drivers/staging/media/tegra-vde/iommu.o: In function `tegra_vde_iommu_map':
iommu.c:(.text+0x41): undefined reference to `alloc_iova'
iommu.c:(.text+0x56): undefined reference to `__free_iova'

Reported-by: Hulk Robot 
Fixes: 7fc7af649ca7 ("media: staging/intel-ipu3: Add imgu top level pci device 
driver")
Signed-off-by: YueHaibing 
---
  drivers/staging/media/ipu3/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/ipu3/Kconfig 
b/drivers/staging/media/ipu3/Kconfig
index 4b51c67..b7df18f 100644
--- a/drivers/staging/media/ipu3/Kconfig
+++ b/drivers/staging/media/ipu3/Kconfig
@@ -4,7 +4,7 @@ config VIDEO_IPU3_IMGU
depends on PCI && VIDEO_V4L2
depends on MEDIA_CONTROLLER && VIDEO_V4L2_SUBDEV_API
depends on X86
-   select IOMMU_IOVA
+   select IOMMU_IOVA if IOMMU_SUPPORT


This doesn't seem right: the ipu3-cio2 driver needs IOMMU_IOVA
independently of IOMMU_SUPPORT.

Looking at tegra-vde, it seems to depend on IOMMU_SUPPORT but that's not
declared in its Kconfig entry. I wonder if adding that would be the right
way to fix this.

Cc'ing the IOMMU list.


Right, I also had the impression that we'd made the IOVA library 
completely standalone. And what does the IPU3 driver's Kconfig have to 
do with some *other* driver failing to link anyway?


Robin.




select VIDEOBUF2_DMA_SG
help
  This is the Video4Linux2 driver for Intel IPU3 image processing unit,





Re: [PATCH 16/16] dma-mapping: use exact allocation in dma_alloc_contiguous

2019-06-14 Thread Robin Murphy

On 14/06/2019 15:50, 'Christoph Hellwig' wrote:

On Fri, Jun 14, 2019 at 02:15:44PM +, David Laight wrote:

Does this still guarantee that requests for 16k will not cross a 16k boundary?
It looks like you are losing the alignment parameter.


The DMA API never gave you alignment guarantees to start with,
and you can get not naturally aligned memory from many of our
current implementations.


Well, apart from the bit in DMA-API-HOWTO which has said this since 
forever (well, before Git history, at least):


"The CPU virtual address and the DMA address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size.  This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary."

That said, I don't believe this particular patch should make any 
appreciable difference - alloc_pages_exact() is still going to give back 
the same base address as the rounded up over-allocation would, and 
PAGE_ALIGN()ing the size passed to get_order() already seemed to be 
pointless.


Robin.


Re: [PATCH v3 2/4] drivers/staging/fsl-mc: Fix DPIO error path issues

2018-03-27 Thread Robin Murphy

Hi Roy,

On 26/03/18 20:05, Roy Pledge wrote:

The error path in the dpaa2_dpio_probe() function was not properly
unmapping the QBMan device memory on the error path. This was also
missing from the dpaa2_dpio_release() function.

Also addresses a memory leak of the device private data structure.

Signed-off-by: Roy Pledge 
---
  drivers/staging/fsl-mc/bus/dpio/dpio-driver.c | 49 +++
  1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/fsl-mc/bus/dpio/dpio-driver.c b/drivers/staging/fsl-mc/bus/dpio/dpio-driver.c
index e00f473..e7a0009 100644
--- a/drivers/staging/fsl-mc/bus/dpio/dpio-driver.c
+++ b/drivers/staging/fsl-mc/bus/dpio/dpio-driver.c
@@ -28,6 +28,7 @@ MODULE_DESCRIPTION("DPIO Driver");
  
  struct dpio_priv {

struct dpaa2_io *io;
+   struct dpaa2_io_desc desc;
  };
  
  static irqreturn_t dpio_irq_handler(int irq_num, void *arg)

@@ -85,7 +86,6 @@ static int register_dpio_irq_handlers(struct fsl_mc_device *dpio_dev, int cpu)
  static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev)
  {
struct dpio_attr dpio_attrs;
-   struct dpaa2_io_desc desc;
struct dpio_priv *priv;
int err = -ENOMEM;
struct device *dev = &dpio_dev->dev;
@@ -117,7 +117,7 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev)
dev_err(dev, "dpio_get_attributes() failed %d\n", err);
goto err_get_attr;
}
-   desc.qman_version = dpio_attrs.qbman_version;
+   priv->desc.qman_version = dpio_attrs.qbman_version;
  
  	err = dpio_enable(dpio_dev->mc_io, 0, dpio_dev->mc_handle);

if (err) {
@@ -126,9 +126,9 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev)
}
  
  	/* initialize DPIO descriptor */

-   desc.receives_notifications = dpio_attrs.num_priorities ? 1 : 0;
-   desc.has_8prio = dpio_attrs.num_priorities == 8 ? 1 : 0;
-   desc.dpio_id = dpio_dev->obj_desc.id;
+   priv->desc.receives_notifications = dpio_attrs.num_priorities ? 1 : 0;
+   priv->desc.has_8prio = dpio_attrs.num_priorities == 8 ? 1 : 0;
+   priv->desc.dpio_id = dpio_dev->obj_desc.id;
  
  	/* get the cpu to use for the affinity hint */

if (next_cpu == -1)
@@ -139,19 +139,28 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev)
if (!cpu_possible(next_cpu)) {
		dev_err(dev, "probe failed. Number of DPIOs exceeds NR_CPUS.\n");
err = -ERANGE;
-   goto err_allocate_irqs;
+   goto err_too_many_cpu;
}
-   desc.cpu = next_cpu;
+   priv->desc.cpu = next_cpu;
  
  	/*

 * Set the CENA regs to be the cache inhibited area of the portal to
 * avoid coherency issues if a user migrates to another core.
 */
-   desc.regs_cena = memremap(dpio_dev->regions[1].start,
- resource_size(&dpio_dev->regions[1]),
- MEMREMAP_WC);
-   desc.regs_cinh = ioremap(dpio_dev->regions[1].start,
-resource_size(&dpio_dev->regions[1]));
+   priv->desc.regs_cena = memremap(dpio_dev->regions[1].start,
+   resource_size(&dpio_dev->regions[1]),
+   MEMREMAP_WC);


Since you already have some devres-managed resources in this driver, 
maybe use devm_memremap? (and perhaps convert the existing ioremap too)
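A sketch of that suggestion (untested; error unwinding of the earlier dpio_enable()/portal setup is elided, and hypothetical in that it simply swaps in the devres variants):

```c
/* devres-managed mappings: the driver core unmaps on detach, so neither
 * the error path nor dpaa2_dpio_release() needs explicit unmap calls.
 */
priv->desc.regs_cena = devm_memremap(dev, dpio_dev->regions[1].start,
				     resource_size(&dpio_dev->regions[1]),
				     MEMREMAP_WC);
if (IS_ERR(priv->desc.regs_cena))
	return PTR_ERR(priv->desc.regs_cena);	/* devm_memremap() returns ERR_PTR */

priv->desc.regs_cinh = devm_ioremap(dev, dpio_dev->regions[1].start,
				    resource_size(&dpio_dev->regions[1]));
if (!priv->desc.regs_cinh)
	return -ENOMEM;				/* devm_ioremap() returns NULL */
```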



+   if (!priv->desc.regs_cena) {
+   dev_err(dev, "memremap failed\n");
+   goto err_too_many_cpu;
+   }
+
+   priv->desc.regs_cinh = ioremap(dpio_dev->regions[1].start,
+  resource_size(&dpio_dev->regions[1]));
+   if (!priv->desc.regs_cinh) {
+   dev_err(dev, "ioremap failed\n");
+   goto err_ioremap_failed;
+   }
  
  	err = fsl_mc_allocate_irqs(dpio_dev);

if (err) {
@@ -159,11 +168,11 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev)
goto err_allocate_irqs;
}
  
-	err = register_dpio_irq_handlers(dpio_dev, desc.cpu);

+   err = register_dpio_irq_handlers(dpio_dev, priv->desc.cpu);
if (err)
goto err_register_dpio_irq;
  
-	priv->io = dpaa2_io_create(&desc);

+   priv->io = dpaa2_io_create(&priv->desc);
if (!priv->io) {
dev_err(dev, "dpaa2_io_create failed\n");
goto err_dpaa2_io_create;
@@ -171,7 +180,7 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev)
  
  	dev_info(dev, "probed\n");

dev_dbg(dev, "   receives_notifications = %d\n",
-   desc.receives_notifications);
+   priv->desc.receives_notifications);
dpio_close(dpio_dev->mc_io, 0, dpio_dev->mc_handle);
fsl_mc_portal_free(dpio_dev->mc_io);
  
@@ -182,6 +191,10 @@ static int dpaa2_dpio_probe(struct fsl_mc_device *dpio_dev)

  err_register_dpio_irq:
fsl_mc_fr

Re: [PATCH 6/7] staging: fsl-mc: rewrite mc command send/receive to work on 32-bits

2017-07-17 Thread Robin Murphy
On 17/07/17 14:26, laurentiu.tu...@nxp.com wrote:
> From: Laurentiu Tudor 
> 
> Split the 64-bit accesses in 32-bit accesses because there's no real
> constrain in MC to do only atomic 64-bit. There's only one place
> where ordering is important: when writing the MC command header the
> first 32-bit part of the header must be written last.
> We do this switch in order to allow compiling the driver on 32-bit.

#include 

Then you can just use writeq()/writeq_relaxed()/readq_relaxed() on
32-bit platforms as well.

Robin.
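(The archive has stripped the header name from the "#include" line above. A sketch of the conversion, under the assumption the suggested header was linux/io-64-nonatomic-hi-lo.h, whose 32-bit writeq() writes the high word first so the low 32-bit half of the header lands last, matching the stated ordering requirement; untested:)

```c
/* Generic 64-bit fallbacks: on 32-bit platforms these define writeq()/
 * writeq_relaxed()/readq_relaxed() as pairs of 32-bit accesses.
 */
#include <linux/io-64-nonatomic-hi-lo.h>

static inline void mc_write_command(struct mc_command __iomem *portal,
				    struct mc_command *cmd)
{
	int i;

	/* copy command parameters into the portal */
	for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++)
		writeq_relaxed(cmd->params[i], &portal->params[i]);

	/* submit the command by writing the header; writeq() orders after
	 * the preceding relaxed writes, so no explicit wmb() is needed
	 */
	writeq(cmd->header, &portal->header);
}
```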

> Signed-off-by: Laurentiu Tudor 
> ---
>  drivers/staging/fsl-mc/bus/mc-sys.c | 31 ---
>  1 file changed, 12 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/staging/fsl-mc/bus/mc-sys.c b/drivers/staging/fsl-mc/bus/mc-sys.c
> index 195d9f3..dd2828e 100644
> --- a/drivers/staging/fsl-mc/bus/mc-sys.c
> +++ b/drivers/staging/fsl-mc/bus/mc-sys.c
> @@ -124,14 +124,15 @@ static inline void mc_write_command(struct mc_command 
> __iomem *portal,
>  {
>   int i;
>  
> - /* copy command parameters into the portal */
> - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++)
> - __raw_writeq(cmd->params[i], &portal->params[i]);
> - /* ensure command params are committed before submitting it */
> - wmb();
> -
> - /* submit the command by writing the header */
> - __raw_writeq(cmd->header, &portal->header);
> + /*
> +  * copy command parameters into the portal. Final write
> +  * triggers the submission of the command.
> +  */
> + for (i = sizeof(struct mc_command) / sizeof(u32) - 1; i >= 0; i--) {
> + __raw_writel(((u32 *)cmd)[i], &((u32 *)portal)[i]);
> + /* ensure command params are committed before submitting it */
> + wmb();
> + }
>  }
>  
>  /**
> @@ -148,19 +149,11 @@ static inline enum mc_cmd_status mc_read_response(struct mc_command __iomem *
> struct mc_command *resp)
>  {
>   int i;
> - enum mc_cmd_status status;
> -
> - /* Copy command response header from MC portal: */
> - resp->header = __raw_readq(&portal->header);
> - status = mc_cmd_hdr_read_status(resp);
> - if (status != MC_CMD_STATUS_OK)
> - return status;
>  
> - /* Copy command response data from MC portal: */
> - for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++)
> - resp->params[i] = __raw_readq(&portal->params[i]);
> + for (i = 0; i < sizeof(struct mc_command) / sizeof(u32); i++)
> + ((u32 *)resp)[i] = __raw_readl(&((u32 *)portal)[i]);
>  
> - return status;
> + return mc_cmd_hdr_read_status(resp);
>  }
>  
>  /**
> 



Re: [PATCH 2/7] staging: fsl-mc: use generic memory barriers

2017-07-17 Thread Robin Murphy
On 17/07/17 14:26, laurentiu.tu...@nxp.com wrote:
> From: Laurentiu Tudor 
> 
> No need to use arch-specific memory barriers; switch to using generic
> ones. The rmb()s were useless so drop them.
> 
> Signed-off-by: Laurentiu Tudor 
> ---
>  drivers/staging/fsl-mc/bus/mc-sys.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/staging/fsl-mc/bus/mc-sys.c b/drivers/staging/fsl-mc/bus/mc-sys.c
> index a1704c3..012abd5 100644
> --- a/drivers/staging/fsl-mc/bus/mc-sys.c
> +++ b/drivers/staging/fsl-mc/bus/mc-sys.c
> @@ -127,7 +127,8 @@ static inline void mc_write_command(struct mc_command __iomem *portal,
>   /* copy command parameters into the portal */
>   for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++)
>   __raw_writeq(cmd->params[i], &portal->params[i]);
> - __iowmb();
> + /* ensure command params are committed before submitting it */
> + wmb();
>  
>   /* submit the command by writing the header */
>   __raw_writeq(cmd->header, &portal->header);

AFAICS, just using writeq() here should ensure sufficient order against
the previous iomem accessors, without the need for explicit barriers.

Also, note that the __raw_*() variants aren't endian-safe, so consider
updating things to *_relaxed() where ordering doesn't matter.

Robin.

> @@ -150,9 +151,7 @@ static inline enum mc_cmd_status mc_read_response(struct mc_command __iomem *
>   enum mc_cmd_status status;
>  
>   /* Copy command response header from MC portal: */
> - __iormb();
>   resp->header = __raw_readq(&portal->header);
> - __iormb();
>   status = mc_cmd_hdr_read_status(resp);
>   if (status != MC_CMD_STATUS_OK)
>   return status;
> @@ -160,7 +159,6 @@ static inline enum mc_cmd_status mc_read_response(struct mc_command __iomem *
>   /* Copy command response data from MC portal: */
>   for (i = 0; i < MC_CMD_NUM_OF_PARAMS; i++)
>   resp->params[i] = __raw_readq(&portal->params[i]);
> - __iormb();
>  
>   return status;
>  }
> 



Re: [RFC PATCH 00/12] Ion cleanup in preparation for moving out of staging

2017-03-10 Thread Robin Murphy
On 10/03/17 10:31, Brian Starkey wrote:
> Hi,
> 
> On Thu, Mar 09, 2017 at 09:38:49AM -0800, Laura Abbott wrote:
>> On 03/09/2017 02:00 AM, Benjamin Gaignard wrote:
> 
> [snip]
> 
>>>
>>> For me those patches are going in the right direction.
>>>
>>> I still have few questions:
>>> - since alignment management has been remove from ion-core, should it
>>> be also removed from ioctl structure ?
>>
>> Yes, I think I'm going to go with the suggestion to fixup the ABI
>> so we don't need the compat layer and as part of that I'm also
>> dropping the align argument.
>>
> 
> Is the only motivation for removing the alignment parameter that
> no-one got around to using it for something useful yet?
> The original comment was true - different devices do have different
> alignment requirements.
> 
> Better alignment can help SMMUs use larger blocks when mapping,
> reducing TLB pressure and the chance of a page table walk causing
> display underruns.

For that use-case, though, alignment alone doesn't necessarily help -
you need the whole allocation granularity to match your block size (i.e.
given a 1MB block size, asking for 17KB and getting back 17KB starting
at a 1MB boundary doesn't help much - that whole 1MB needs to be
allocated and everyone needs to know it to ensure that the whole lot can
be mapped safely). Now, whether it's down to the callers or the heap
implementations to decide and enforce that granularity is another
question, but provided allocations are at least naturally aligned to
whatever the granularity is (which is a reasonable assumption to bake
in) then it's all good.

Robin.

> 
> -Brian
> 



Re: [PATCH] staging: vc04_services: setup DMA and coherent mask

2016-11-01 Thread Robin Murphy
Hi Michael,

On 31/10/16 19:53, Michael Zoran wrote:
> On Mon, 2016-10-31 at 11:40 -0700, Michael Zoran wrote:
>> On Mon, 2016-10-31 at 11:36 -0700, Eric Anholt wrote:
>>> Michael Zoran  writes:
>>>
 Setting the DMA mask is optional on 32 bit but
 is mandatory on 64 bit.  Set the DMA mask and coherent
 to force all DMA to be in the 32 bit address space.

 This is considered a "good practice" and most drivers
 already do this.

 Signed-off-by: Michael Zoran 
 ---

[...]

 +  /*
 +   * Setting the DMA mask is necessary in the 64 bit
 environment.
 +   * It isn't necessary in a 32 bit environment but is
 considered
 +   * a good practice.
 +   */
 +  err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>>>
>>> I think a better comment here would be simply:
>>>
>>> /* VCHI messages between the CPU and firmware use 32-bit bus
>>> addresses. */
>>>
>>> explaining why the value is chosen (once you know that the 32 bit
>>> restriction exists, reporting it is obviously needed).  I'm
>>> curious,
>>> though: what failed when you didn't set it?
>>>
>>
>> The comment is easy to change.
>>
>> I don't have the log available ATM, but if I remember the DMA API's
>> bugcheck the first time that are used.  
>>
>> I think this was a policy decision or something because the
>> information
>> should be available in the dma-ranges.
>>
>> If it's important, I can setup a test again without the change and e-
>> mail the logs.
>>
>> If you look at the DWC2 driver you will see that it also sets this
>> mask.
> 
> OK, I'm beginning to understand this.  It appears the architecture
> specific paths are very different.
> 
> In arm the mask and coherent is set to DMA_BIT_MASK(32) in mm/dma-
> mapping.c the first time the dma APIs are used.  On arm64, it appears
> this variable is uninitialized and will contain random crud.
> 
> Like I said, I don't know if this is a policy decision or if it just
> slipped through the cracks.

[...]

> arch/arm64/mm/dma-mapping.c(Note no call to get_coherent_dma_mask)
> 
> static void *__dma_alloc(struct device *dev, size_t size,
>dma_addr_t *dma_handle, gfp_t flags,
>unsigned long attrs)
> {
>   struct page *page;
>   void *ptr, *coherent_ptr;
>   bool coherent = is_device_dma_coherent(dev);
>   pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, false);
> 
>   size = PAGE_ALIGN(size);
> 
>   if (!coherent && !gfpflags_allow_blocking(flags)) {
>   struct page *page = NULL;
>   void *addr = __alloc_from_pool(size, &page, flags);
> 
>   if (addr)
>   *dma_handle = phys_to_dma(dev,
> page_to_phys(page));
> 
>   return addr;
>   }
> 
>   ptr = __dma_alloc_coherent(dev, size, dma_handle, flags,
> attrs);

In the general case, the device's coherent DMA mask is checked inside
that helper function, and then again further on in the SWIOTLB code if
necessary. For the atomic pool case beforehand, the pool is already
allocated below 4GB, and on arm64 we don't really expect devices to have
DMA masks *smaller* than 32 bits.

Devices created from DT get 32-bit DMA masks by default, although the
dma-ranges property may override that - see of_dma_configure() - so if
you somehow have a device with an uninitialised mask then I can only
assume there's some platform driver shenanigans going on. Adding the
dma_set_mask() call to the driver is no bad thing, but the rationale in
the commit message is bogus.

Robin.


Re: [PATCHv2] staging: android: ion: use the manged version of DMA memory allocation

2016-02-03 Thread Robin Murphy

On 03/02/16 06:49, Gujulan Elango, Hari Prasath (H.) wrote:

From: Hari Prasath Gujulan Elango 

Use the managed version of dma_alloc_coherent(), i.e.
dmam_alloc_coherent(), and accordingly clean up the error handling
part. Also, remove the references to dma_free_coherent().


That last aspect looks a bit off to me - the heaps don't seem to be 
something that exist for the lifetime of the ION "device", given that 
these are specific runtime alloc and free calls, rather than the probe 
and remove routines. I don't know if CMA heaps are among those which ION 
creates and destroys frequently (enough that it apparently kicks off a 
whole background thread to manage the freeing), but this looks like a 
recipe for leaks. If the free call doesn't actually free the buffer, 
it's going to remain hanging around consuming precious CMA area until 
the ION device itself is torn down, which is likely never.


I wouldn't say it's necessarily inappropriate to use managed DMA 
resources here to cover unexpected failure cases for the ION device 
itself (I see the comment in ion_device_remove()), but that means still 
using dmam_free_coherent() when naturally releasing allocations for 
other reasons (i.e. both cases here). Think Java finalisers, rather than 
C++ destructors.


Robin.
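A sketch of what that would mean for ion_cma_free() (untested, based on the driver code quoted below; the point is that a natural free must release the devres entry too, not just rely on device teardown):

```c
/* If allocation switches to dmam_alloc_coherent(), freeing a buffer
 * before the ION device goes away must use dmam_free_coherent(), so the
 * devres bookkeeping is dropped and the CMA memory doesn't linger until
 * the (likely never-reached) device teardown.
 */
static void ion_cma_free(struct ion_buffer *buffer)
{
	struct ion_cma_heap *cma_heap = to_cma_heap(buffer->heap);
	struct ion_cma_buffer_info *info = buffer->priv_virt;

	/* release memory and its devres entry */
	dmam_free_coherent(cma_heap->dev, buffer->size, info->cpu_addr,
			   info->handle);
	/* release sg table */
	sg_free_table(info->table);
	kfree(info->table);
	kfree(info);
}
```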


Signed-off-by: Hari Prasath Gujulan Elango 
---
v2: kbuild test robot reported warnings on unused
    variables. Those warnings are fixed.
---
  drivers/staging/android/ion/ion_cma_heap.c | 11 ++-
  1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c
index a3446da..8cd720b 100644
--- a/drivers/staging/android/ion/ion_cma_heap.c
+++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -61,7 +61,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer,
if (!info)
return ION_CMA_ALLOCATE_FAILED;

-   info->cpu_addr = dma_alloc_coherent(dev, len, &(info->handle),
+   info->cpu_addr = dmam_alloc_coherent(dev, len, &(info->handle),
GFP_HIGHUSER | __GFP_ZERO);

if (!info->cpu_addr) {
@@ -71,7 +71,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer,

info->table = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
if (!info->table)
-   goto free_mem;
+   goto err;

if (dma_get_sgtable(dev, info->table, info->cpu_addr, info->handle,
len))
@@ -83,8 +83,6 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer,

  free_table:
kfree(info->table);
-free_mem:
-   dma_free_coherent(dev, len, info->cpu_addr, info->handle);
  err:
kfree(info);
return ION_CMA_ALLOCATE_FAILED;
@@ -92,13 +90,8 @@ err:

  static void ion_cma_free(struct ion_buffer *buffer)
  {
-   struct ion_cma_heap *cma_heap = to_cma_heap(buffer->heap);
-   struct device *dev = cma_heap->dev;
struct ion_cma_buffer_info *info = buffer->priv_virt;

-   dev_dbg(dev, "Release buffer %p\n", buffer);
-   /* release memory */
-   dma_free_coherent(dev, buffer->size, info->cpu_addr, info->handle);
/* release sg table */
sg_free_table(info->table);
kfree(info->table);





Re: [PATCH] staging: ion: ion_cma_heap: Don't directly use dma_common_get_sgtable

2015-07-20 Thread Robin Murphy

Hi Laura,

On 17/07/15 17:50, Laura Abbott wrote:

On 07/17/2015 08:21 AM, Robin Murphy wrote:

Hi Tixy,

On 17/07/15 12:01, Jon Medhurst (Tixy) wrote:

Use dma_get_sgtable rather than dma_common_get_sgtable so a device's
dma_ops aren't bypassed. This is essential in situations where a device
uses an IOMMU and the physical memory is not contiguous (as the common
function assumes).

Signed-off-by: Jon Medhurst 


The lack of obvious users of this code makes it hard to tell if "dev"
here is always the right, real, device pointer and never null or some
dummy device with the wrong dma_ops, but the rest of the calls in this
file are to the proper DMA API interface so at least this patch definitely
makes things less wrong in that respect.




Ion currently lacks any standard way to set up heaps and associate a device
with a heap. This means it's basically a free for all for what devices get
associated (getting something mainlined might help...). I agree that using
the proper DMA APIs is a step in the right direction.


I suspected as much, thanks for the confirmation.




Reviewed-by: Robin Murphy 


---

This also begs the question as to what happens if the memory region _is_
contiguous but is in highmem or an ioremapped region. Should a device
always provide dma_ops for that case? Because I believe the current
implementation of dma_common_get_sgtable won't work for those as it uses
virt_to_page.

I see that this point has been raised before [1] by Zeng Tao, and I
myself have been given a different fix to apply to a Linaro kernel tree.
However, both solutions looked wrong to me as they treat a dma_addr_t as
a physical address, so should at least be using dma_to_phys.
So, should we fix dma_common_get_sgtable or mandate that the device
has dma_ops? The latter seems to be implied by the commit message which
introduced the function:

  This patch provides a generic implementation based on
  virt_to_page() call. Architectures which require more
  sophisticated translation might provide their own get_sgtable()
  methods.


Given that we're largely here due to having poked this on arm64 systems,
I'm inclined to think that implementing our own get_sgtable as per arch/arm
is the right course of action. Since a lot of architectures using
dma_common_get_sgtable don't even implement dma_to_phys, I don't think it
would be right to try complicating the common code for a case that seems to
be all but common. I can spin an arm64 patch if you like.



This would be hit on any system that has non-coherent DMA or highmem. I'm
not sure I agree this isn't a common case. How many of the other
architectures are actually using the dma_get_sgtable and would have the
potential to find a problem?


This appears to be pretty much exclusively a graphics/video thing. 
Surveying in-tree callers (other than Ion) gives DRM, V4L, and a couple 
of specific ARM SoC drivers - my hunch is that none of those see much 
action on the likes of Blackfin and 68k.


That said, going through the git logs, the primary purpose of 
dma_common_get_sgtable would appear to be not breaking allmodconfig 
builds on architectures other than ARM. Thus I'm not really sure which 
is the least worst option - having "common" code which doesn't actually 
represent the common use case, or adding bogus dma_to_phys definitions 
to loads of architectures that don't even have proper DMA mapping 
implementations for the sake of some code they don't even use...


Robin.



Thanks,
Laura





Re: [PATCH] staging: ion: ion_cma_heap: Don't directly use dma_common_get_sgtable

2015-07-17 Thread Robin Murphy

Hi Tixy,

On 17/07/15 12:01, Jon Medhurst (Tixy) wrote:

Use dma_get_sgtable rather than dma_common_get_sgtable so a device's
dma_ops aren't bypassed. This is essential in situations where a device
uses an IOMMU and the physical memory is not contiguous (as the common
function assumes).

Signed-off-by: Jon Medhurst 


The lack of obvious users of this code makes it hard to tell if "dev" 
here is always the right, real, device pointer and never null or some 
dummy device with the wrong dma_ops, but the rest of the calls in this 
file are to the proper DMA API interface so at least this patch 
definitely makes things less wrong in that respect.


Reviewed-by: Robin Murphy 


---

This also begs the question as to what happens if the memory region _is_
contiguous but is in highmem or an ioremapped region. Should a device
always provide dma_ops for that case? Because I believe the current
implementation of dma_common_get_sgtable won't work for those as it uses
virt_to_page.

I see that this point has been raised before [1] by Zeng Tao, and I
myself have been given a different fix to apply to a Linaro kernel tree.
However, both solutions looked wrong to me as they treat a dma_addr_t as
a physical address, so should at least be using dma_to_phys.
So, should we fix dma_common_get_sgtable or mandate that the device
has dma_ops? The latter seems to be implied by the commit message which
introduced the function:

 This patch provides a generic implementation based on
 virt_to_page() call. Architectures which require more
 sophisticated translation might provide their own get_sgtable()
 methods.


Given that we're largely here due to having poked this on arm64 systems, 
I'm inclined to think that implementing our own get_sgtable as per 
arch/arm is the right course of action. Since a lot of architectures 
using dma_common_get_sgtable don't even implement dma_to_phys, I don't 
think it would be right to try complicating the common code for a case 
that seems to be all but common. I can spin an arm64 patch if you like.


Robin.


Note, I don't have a system where any of this code is used to test
things, and have never looked at this area before yesterday, so I may
have misunderstood what’s going on in the code.

[1] https://lkml.org/lkml/2014/12/1/584

  drivers/staging/android/ion/ion_cma_heap.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c
index f4211f1..86b91fd 100644
--- a/drivers/staging/android/ion/ion_cma_heap.c
+++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -73,8 +73,7 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer,
if (!info->table)
goto free_mem;

-   if (dma_common_get_sgtable
-   (dev, info->table, info->cpu_addr, info->handle, len))
+   if (dma_get_sgtable(dev, info->table, info->cpu_addr, info->handle, len))
goto free_table;
/* keep this for memory release */
buffer->priv_virt = info;


