from:"Nicolas Saenz Julienne"

Re: [PATCH 09/15] device core: Add ability to handle multiple dma offsets

2020-05-20 Thread Nicolas Saenz Julienne

Hi Jim,
thanks for having a go at this! My two cents.

On Tue, 2020-05-19 at 16:34 -0400, Jim Quinlan wrote:
> The device variable 'dma_pfn_offset' is used to do a single
> linear map between cpu addrs and dma addrs.  The variable
> 'dma_map' is added to struct device to point to an array
> of multiple offsets which is required for some devices.
> 
> Signed-off-by: Jim Quinlan 
> ---

[...]

> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -493,6 +493,8 @@ struct dev_links_info {
>   * @bus_dma_limit: Limit of an upstream bridge or bus which imposes a smaller
>   *   DMA limit than the device itself supports.
>   * @dma_pfn_offset: offset of DMA memory range relatively of RAM
> + * @dma_map: Like dma_pfn_offset but used when there are multiple
> + *   pfn offsets for multiple dma-ranges.
>   * @dma_parms:   A low level driver may set these to teach IOMMU code
> about
>   *   segment limitations.
>   * @dma_pools:   Dma pools (if dma'ble device).
> @@ -578,7 +580,12 @@ struct device {
>allocations such descriptors. */
>   u64 bus_dma_limit;  /* upstream dma constraint */
>   unsigned long   dma_pfn_offset;
> -
> +#ifdef CONFIG_DMA_PFN_OFFSET_MAP
> + const void *dma_offset_map; /* Like dma_pfn_offset, but for
> +  * the unlikely case of multiple
> +  * offsets. If non-null, dma_pfn_offset
> +  * will be 0. */

I get a bad feeling about separating the DMA offset handling into two distinct
variables. Albeit generally frowned upon, there is a fair amount of trickery
around dev->dma_pfn_offset all over the kernel. usb_alloc_dev() comes to mind
for example. And this obviously doesn't play well with it. I feel a potential
solution to multiple DMA ranges should completely integrate with the current
device DMA handling code, without special cases, on top of that, be transparent
to the user.

In more concrete terms, I'd repackage dev->bus_dma_limit and
dev->dma_pfn_offset into a list/array of DMA range structures and adapt/create
the relevant getter/setter functions so as for DMA users not to have to worry
about the specifics of a device's DMA constraints. For example, instead of
editing dev->dma_pfn_offset, you'd be passing a DMA range structure to the
device core, and let it take the relevant decisions on how to handle it
internally (overwrite, add a new entry, merge them, etc...).

Easier said than done. :)

Regards,
Nicolas

signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 09/14] device core: Add ability to handle multiple dma offsets

2020-05-27 Thread Nicolas Saenz Julienne

Hi Jim,
one thing comes to mind, there is a small test suite in drivers/of/unittest.c
(specifically of_unittest_pci_dma_ranges()) you could extend it to include your
use cases.

On Tue, 2020-05-26 at 15:12 -0400, Jim Quinlan wrote:
> The new field in struct device 'dma_pfn_offset_map' is used to facilitate
> the use of multiple pfn offsets between cpu addrs and dma addrs.  It is
> similar to 'dma_pfn_offset' except that the offset chosen depends on the
> cpu or dma address involved.
> 
> Signed-off-by: Jim Quinlan 
> ---
>  drivers/of/address.c| 65 +++--
>  drivers/usb/core/message.c  |  3 ++
>  drivers/usb/core/usb.c  |  3 ++
>  include/linux/device.h  | 10 +-
>  include/linux/dma-direct.h  | 10 --
>  include/linux/dma-mapping.h | 46 ++
>  kernel/dma/Kconfig  | 13 
>  7 files changed, 144 insertions(+), 6 deletions(-)
> 

[...]

> @@ -977,10 +1020,19 @@ int of_dma_get_range(struct device *dev, struct
> device_node *np, u64 *dma_addr,
>   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
>range.bus_addr, range.cpu_addr, range.size);
>  
> + num_ranges++;
>   if (dma_offset && range.cpu_addr - range.bus_addr != dma_offset)
> {
> - pr_warn("Can't handle multiple dma-ranges with different
> offsets on node(%pOF)\n", node);
> - /* Don't error out as we'd break some existing DTs */
> - continue;
> + if (!IS_ENABLED(CONFIG_DMA_PFN_OFFSET_MAP)) {
> + pr_warn("Can't handle multiple dma-ranges with
> different offsets on node(%pOF)\n", node);
> + pr_warn("Perhaps set DMA_PFN_OFFSET_MAP=y?\n");
> + /*
> +  * Don't error out as we'd break some existing
> +  * DTs that are using configs w/o
> +  * CONFIG_DMA_PFN_OFFSET_MAP set.
> +  */
> + continue;

dev->bus_dma_limit is set in of_dma_configure(), this function's caller, based
on dma_start's value (set after this continue). So you'd be effectively setting
the dev->bus_dma_limit to whatever we get from the first dma-range.

This can be troublesome depending on how the dma-ranges are setup, for example
if the first dma-range doesn't include the CMA area, in arm64 generally set as
high as possible in ZONE_DMA32, that would render it useless for
dma/{direct/swiotlb}. Again depending on the bus_dma_limit value, if smaller
than ZONE_DMA you'd be unable to allocate any DMA memory.

IMO, a solution to this calls for a revamp of dma-direct's dma_capable(): match
the target DMA memory area with each dma-range we have to see if it fits.

> + }
> + dma_multi_pfn_offset = true;
>   }
>   dma_offset = range.cpu_addr - range.bus_addr;
>  
> @@ -991,6 +1043,13 @@ int of_dma_get_range(struct device *dev, struct
> device_node *np, u64 *dma_addr,
>   dma_end = range.bus_addr + range.size;
>   }
>  
> + if (dma_multi_pfn_offset) {
> + dma_offset = 0;
> + ret = attach_dma_pfn_offset_map(dev, node, num_ranges);
> + if (ret)
> + return ret;
> + }
> +
>   if (dma_start >= dma_end) {
>   ret = -EINVAL;
>   pr_debug("Invalid DMA ranges configuration on node(%pOF)\n",
> diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
> index 6197938dcc2d..aaa3e58f5eb4 100644
> --- a/drivers/usb/core/message.c
> +++ b/drivers/usb/core/message.c
> @@ -1960,6 +1960,9 @@ int usb_set_configuration(struct usb_device *dev, int
> configuration)
>*/
>   intf->dev.dma_mask = dev->dev.dma_mask;
>   intf->dev.dma_pfn_offset = dev->dev.dma_pfn_offset;
> +#ifdef CONFIG_DMA_PFN_OFFSET_MAP
> + intf->dev.dma_pfn_offset_map = dev->dev.dma_pfn_offset_map;
> +#endif

Thanks for looking at this, that said, I see more instances of drivers changing
dma_pfn_offset outside of the core code. Why not doing this there too?

Also, are we 100% sure that dev->dev.dma_pfn_offset isn't going to be freed
before we're done using intf->dev? Maybe it's safer to copy the ranges?

>   INIT_WORK(&intf->reset_ws, __usb_queue_reset_device);
>   intf->minor = -1;
>   device_initialize(&intf->dev);
> diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
> index f16c26dc079d..d2ed4d90e56e 100644
> --- a/drivers/usb/core/usb.c
> +++ b/drivers/usb/core/usb.c
> @@ -612,6 +612,9 @@ struct usb_device *usb_alloc_dev(struct usb_device
> *parent,
>*/
>   dev->dev.dma_mask = bus->sysdev->dma_mask;
>   dev->dev.dma_pfn_offset = bus->sysdev->dma_pfn_offset;
> +#ifdef CONFIG_DMA_PFN_OFFSET_MAP
> + de

Re: [PATCH v2 09/14] device core: Add ability to handle multiple dma offsets

2020-05-27 Thread Nicolas Saenz Julienne

Hi Jim,

On Wed, 2020-05-27 at 11:43 -0400, Jim Quinlan wrote:
> Hi Nicolas,
> 
> On Wed, May 27, 2020 at 11:00 AM Nicolas Saenz Julienne
>  wrote:
> > Hi Jim,
> > one thing comes to mind, there is a small test suite in
> > drivers/of/unittest.c
> > (specifically of_unittest_pci_dma_ranges()) you could extend it to include
> > your
> > use cases.
> Sure, will check out.
> > On Tue, 2020-05-26 at 15:12 -0400, Jim Quinlan wrote:
> > > The new field in struct device 'dma_pfn_offset_map' is used to facilitate
> > > the use of multiple pfn offsets between cpu addrs and dma addrs.  It is
> > > similar to 'dma_pfn_offset' except that the offset chosen depends on the
> > > cpu or dma address involved.
> > > 
> > > Signed-off-by: Jim Quinlan 
> > > ---
> > >  drivers/of/address.c| 65 +++--
> > >  drivers/usb/core/message.c  |  3 ++
> > >  drivers/usb/core/usb.c  |  3 ++
> > >  include/linux/device.h  | 10 +-
> > >  include/linux/dma-direct.h  | 10 --
> > >  include/linux/dma-mapping.h | 46 ++
> > >  kernel/dma/Kconfig  | 13 
> > >  7 files changed, 144 insertions(+), 6 deletions(-)
> > > 
> > 
> > [...]
> > 
> > > @@ -977,10 +1020,19 @@ int of_dma_get_range(struct device *dev, struct
> > > device_node *np, u64 *dma_addr,
> > >   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
> > >range.bus_addr, range.cpu_addr, range.size);
> > > 
> > > + num_ranges++;
> > >   if (dma_offset && range.cpu_addr - range.bus_addr !=
> > > dma_offset)
> > > {
> > > - pr_warn("Can't handle multiple dma-ranges with
> > > different
> > > offsets on node(%pOF)\n", node);
> > > - /* Don't error out as we'd break some existing DTs
> > > */
> > > - continue;
> > > + if (!IS_ENABLED(CONFIG_DMA_PFN_OFFSET_MAP)) {
> > > + pr_warn("Can't handle multiple dma-ranges
> > > with
> > > different offsets on node(%pOF)\n", node);
> > > + pr_warn("Perhaps set
> > > DMA_PFN_OFFSET_MAP=y?\n");
> > > + /*
> > > +  * Don't error out as we'd break some
> > > existing
> > > +  * DTs that are using configs w/o
> > > +  * CONFIG_DMA_PFN_OFFSET_MAP set.
> > > +  */
> > > + continue;
> > 
> > dev->bus_dma_limit is set in of_dma_configure(), this function's caller,
> > based
> > on dma_start's value (set after this continue). So you'd be effectively
> > setting
> > the dev->bus_dma_limit to whatever we get from the first dma-range.
> I'm not seeing that at all.  On the  evaluation of each dma-range,
> dma_start and dma_end are re-evaluated to be the lowest and highest
> bus values of the  dma-ranges seen so far.  After all dma-ranges are
> examined,  dev->bus_dma_limit being set to the highest.  In fact, the
> current code -- ie before my commits -- already does this for multiple
> dma-ranges as long as the cpu-bus offset is the same in the
> dma-ranges.

Sorry I got carried away, you're right.

So I understand there is an underlaying assumption that the non DMAble memory
space will always sit on top of the bus memory space, as intertwined as it
might be, so as to every phys_to_dma() call on non DMAble memory to fall above
bus_dma_limit.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 09/13] device core: Introduce multiple dma pfn offsets

2020-06-04 Thread Nicolas Saenz Julienne

Hi Jim,

On Wed, 2020-06-03 at 15:20 -0400, Jim Quinlan wrote:
> The new field in struct device 'dma_pfn_offset_map' is used to facilitate
> the use of multiple pfn offsets between cpu addrs and dma addrs.  It
> subsumes the role of dev->dma_pfn_offset -- a uniform offset -- and
> designates the single offset a special case.
> 
> of_dma_configure() is the typical manner to set pfn offsets but there
> are a number of ad hoc assignments to dev->dma_pfn_offset in the
> kernel code.  These cases now invoke the function
> attach_uniform_dma_pfn_offset(dev, pfn_offset).
> 
> Signed-off-by: Jim Quinlan 
> ---
>  arch/arm/include/asm/dma-mapping.h|  9 +-
>  arch/arm/mach-keystone/keystone.c |  9 +-
>  arch/sh/drivers/pci/pcie-sh7786.c |  3 +-
>  arch/sh/kernel/dma-coherent.c | 17 ++--
>  arch/x86/pci/sta2x11-fixup.c  |  7 +-
>  drivers/acpi/arm64/iort.c |  5 +-
>  drivers/gpu/drm/sun4i/sun4i_backend.c |  7 +-
>  drivers/iommu/io-pgtable-arm.c|  2 +-
>  .../platform/sunxi/sun4i-csi/sun4i_csi.c  |  5 +-
>  .../platform/sunxi/sun6i-csi/sun6i_csi.c  |  5 +-
>  drivers/of/address.c  | 93 +--
>  drivers/of/device.c   |  8 +-
>  drivers/remoteproc/remoteproc_core.c  |  2 +-
>  .../staging/media/sunxi/cedrus/cedrus_hw.c|  7 +-
>  drivers/usb/core/message.c|  4 +-
>  drivers/usb/core/usb.c|  2 +-
>  include/linux/device.h|  4 +-
>  include/linux/dma-direct.h| 16 +++-
>  include/linux/dma-mapping.h   | 45 +
>  kernel/dma/coherent.c | 11 ++-
>  20 files changed, 210 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-
> mapping.h
> index bdd80ddbca34..f1e72f99468b 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -35,8 +35,9 @@ static inline const struct dma_map_ops
> *get_arch_dma_ops(struct bus_type *bus)
>  #ifndef __arch_pfn_to_dma
>  static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
>  {
> - if (dev)
> - pfn -= dev->dma_pfn_offset;
> + if (dev && dev->dma_pfn_offset_map)

Would it make sense to move the dev->dma_pfn_offset_map check into
dma_pfn_offset_from_phys_addr() and return 0 if not available? Same for the
opposite variant of the function. I think it'd make the code a little simpler on
some of the use cases, and overall less error prone if anyone starts using the
function elsewhere.

> + pfn -= dma_pfn_offset_from_phys_addr(dev, PFN_PHYS(pfn));
> +
>   return (dma_addr_t)__pfn_to_bus(pfn);
>  }
>  
> @@ -44,8 +45,8 @@ static inline unsigned long dma_to_pfn(struct device *dev,
> dma_addr_t addr)
>  {
>   unsigned long pfn = __bus_to_pfn(addr);
>  
> - if (dev)
> - pfn += dev->dma_pfn_offset;
> + if (dev && dev->dma_pfn_offset_map)
> + pfn += dma_pfn_offset_from_dma_addr(dev, addr);
>  
>   return pfn;
>  }
> diff --git a/arch/arm/mach-keystone/keystone.c b/arch/arm/mach-
> keystone/keystone.c
> index 638808c4e122..e7d3ee6e9cb5 100644
> --- a/arch/arm/mach-keystone/keystone.c
> +++ b/arch/arm/mach-keystone/keystone.c
> @@ -8,6 +8,7 @@
>   */
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -38,9 +39,11 @@ static int keystone_platform_notifier(struct notifier_block
> *nb,
>   return NOTIFY_BAD;
>  
>   if (!dev->of_node) {
> - dev->dma_pfn_offset = keystone_dma_pfn_offset;
> - dev_err(dev, "set dma_pfn_offset%08lx\n",
> - dev->dma_pfn_offset);
> + int ret = attach_uniform_dma_pfn_offset
> + (dev, keystone_dma_pfn_offset);
> +
> + dev_err(dev, "set dma_pfn_offset%08lx%s\n",
> + dev->dma_pfn_offset, ret ? " failed" : "");
>   }
>   return NOTIFY_OK;
>  }
> diff --git a/arch/sh/drivers/pci/pcie-sh7786.c b/arch/sh/drivers/pci/pcie-
> sh7786.c
> index e0b568aaa701..2e832a5c58c1 100644
> --- a/arch/sh/drivers/pci/pcie-sh7786.c
> +++ b/arch/sh/drivers/pci/pcie-sh7786.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -487,7 +488,7 @@ int pcibios_map_platform_irq(const struct pci_dev *pdev,
> u8 slot, u8 pin)
>  
>  void pcibios_bus_add_device(struct pci_dev *pdev)
>  {
> - pdev->dev.dma_pfn_offset = dma_pfn_offset;
> + attach_uniform_dma_pfn_offset(&pdev->dev, dma_pfn_offset);
>  }
>  
>  static int __init sh7786_pcie_core_init(void)
> diff --git a/arch/sh/kernel/dma-coherent.c b/arch/sh/kernel/dma-coherent.c
> index d4811691b93c..5fc9e358b6c7 100644
> --- a/arch/sh/kernel/dma-coherent.c
> +++ b/arch/sh/kernel/dma-coherent.c
> @@ -14,6 +14,8 @@ void

Re: [PATCH v3 09/13] device core: Introduce multiple dma pfn offsets

2020-06-04 Thread Nicolas Saenz Julienne

Hi Jim,

On Thu, 2020-06-04 at 10:35 -0400, Jim Quinlan wrote:

[...]

> > > --- a/arch/sh/kernel/dma-coherent.c
> > > +++ b/arch/sh/kernel/dma-coherent.c
> > > @@ -14,6 +14,8 @@ void *arch_dma_alloc(struct device *dev, size_t size,
> > > dma_addr_t *dma_handle,
> > >  {
> > >   void *ret, *ret_nocache;
> > >   int order = get_order(size);
> > > + unsigned long pfn;
> > > + phys_addr_t phys;
> > > 
> > >   gfp |= __GFP_ZERO;
> > > 
> > > @@ -34,11 +36,14 @@ void *arch_dma_alloc(struct device *dev, size_t size,
> > > dma_addr_t *dma_handle,
> > >   return NULL;
> > >   }
> > > 
> > > - split_page(pfn_to_page(virt_to_phys(ret) >> PAGE_SHIFT), order);
> > > + phys = virt_to_phys(ret);
> > > + pfn =  phys >> PAGE_SHIFT;
> > 
> > nit: not sure it really pays off to have a pfn variable here.
> Did it for readability; the compiler's optimization should take care
> of any extra variables.  But I can switch if you insist.

No need.

[...]

> > > diff --git a/drivers/media/platform/sunxi/sun6i-csi/sun6i_csi.c
> > > b/drivers/media/platform/sunxi/sun6i-csi/sun6i_csi.c
> > > index 055eb0b8e396..2d66d415b6c3 100644
> > > --- a/drivers/media/platform/sunxi/sun6i-csi/sun6i_csi.c
> > > +++ b/drivers/media/platform/sunxi/sun6i-csi/sun6i_csi.c
> > > @@ -898,7 +898,10 @@ static int sun6i_csi_probe(struct platform_device
> > > *pdev)
> > > 
> > >   sdev->dev = &pdev->dev;
> > >   /* The DMA bus has the memory mapped at 0 */
> > > - sdev->dev->dma_pfn_offset = PHYS_OFFSET >> PAGE_SHIFT;
> > > + ret = attach_uniform_dma_pfn_offset(sdev->dev,
> > > + PHYS_OFFSET >> PAGE_SHIFT);
> > > + if (ret)
> > > + return ret;
> > > 
> > >   ret = sun6i_csi_resource_request(sdev, pdev);
> > >   if (ret)
> > > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > > index 96d8cfb14a60..c89333b0a5fb 100644
> > > --- a/drivers/of/address.c
> > > +++ b/drivers/of/address.c
> > > @@ -918,6 +918,70 @@ void __iomem *of_io_request_and_map(struct
> > > device_node
> > > *np, int index,
> > >  }
> > >  EXPORT_SYMBOL(of_io_request_and_map);
> > > 
> > > +static int attach_dma_pfn_offset_map(struct device *dev,
> > > +  struct device_node *node, int
> > > num_ranges)
> > 
> > As with the previous review, please take this comment with a grain of salt.
> > 
> > I think there should be a clear split between what belongs to OF and what
> > belongs to the core device infrastructure.
> > 
> > OF's job should be to parse DT and provide a list/array of ranges, whereas
> > the
> > core device infrastructure should provide an API to assign a list of
> > ranges/offset to a device.
> > 
> > As a concrete example, you're forcing devices like the sta2x11 to build with
> > OF
> > support, which, being an Intel device, it's pretty odd. But I'm also
> > thinking
> > of how will all this fit once an ACPI device wants to use it.
> To fix this I only have to move attach_uniform_dma_pfn_offset() from
> of/address.c to say include/linux/dma-mapping.h.  It has no
> dependencies on OF.  Do you agree?

Yes that seems nicer. In case you didn't had it in mind already, I'd change the
function name to match the naming scheme they use there.

On the other hand, I'd also move the non OF parts of the non unifom dma_offset
version of the function there.

> > Expanding on this idea, once you have a split between the OF's and device
> > core
> > roles, it transpires that of_dma_get_range()'s job should only be to provide
> > the ranges in a device understandable structure and of_dma_configre()'s to
> > actually assign the device's parameters. This would obsolete patch #7.
> 
> I think you mean patch #8.

Yes, my bad.

> I agree with you.  The reason I needed a "struct device *"  in the call is
> because I wanted to make sure the memory that is alloc'd belongs to the
> device that needs it.  If I do a regular kzalloc(), this memory  will become
> a leak once someone starts unbinding/binding their device.  Also, in  all
> uses of of_dma_rtange() -- there is only one --  a dev is required as one
> can't attach an offset map to NULL.
> 
> I do see that there are a number of functions in drivers/of/*.c that
> take 'struct device *dev' as an argument so there is precedent for
> something like this.  Regardless, I need an owner to the memory I
> alloc().

I understand the need for dev to be around, devm_*() is key. But also it's
important to keep the functions on purpose. And if of_dma_get_range() starts
setting ranges it calls, for the very least, for a function rename. Although
I'd rather split the parsing and setting of ranges as mentioned earlier. That
said, I get that's a more drastic move.

Talking about drastic moves. How about getting rid of the concept of
dma_pfn_offset for drivers altogether. Let them provide dma_pfn_offset_regions
(even when there is only one). I feel it's conceptually nicer, as you'd be
dealing only in

Re: [PATCH v3 09/13] device core: Introduce multiple dma pfn offsets

2020-06-05 Thread Nicolas Saenz Julienne

Hi Christoph,
a question arouse, is there a real value to dealing with PFNs (as opposed to
real addresses) in the core DMA code/structures? I see that in some cases it
eases interacting with mm, but the overwhelming usage of say,
dev->dma_pfn_offset, involves shifting it.

Hi Jim,
On Thu, 2020-06-04 at 14:01 -0400, Jim Quinlan wrote:
> Hi Nicolas,

[...]

> > I understand the need for dev to be around, devm_*() is key. But also it's
> > important to keep the functions on purpose. And if of_dma_get_range() starts
> > setting ranges it calls, for the very least, for a function rename. Although
> > I'd rather split the parsing and setting of ranges as mentioned earlier.
> > That
> > said, I get that's a more drastic move.
> 
> I agree with you.  I could do this from device.c:
> 
> of_dma_get_num_ranges(..., &num_ranges); /* new function */
> r = devm_kcalloc(dev, num_ranges + 1, sizeof(*r), GFP_KERNEL);
> of_dma_get_range(np, &dma_addr, &paddr, &size, r, num_ranges);
> 
> The problem here is that there could be four ranges, all with
> offset=0.  My current code would optimize this case out but the above
> would have us holding useless memory and looping through the four
> ranges on every dma <=> phys conversion only to add 0.

Point taken. Ultimately it's setting the device's dma ranges in
of_dma_get_range() that was really bothering me, so if we have to pass the
device pointer for allocations, be it.

> > Talking about drastic moves. How about getting rid of the concept of
> > dma_pfn_offset for drivers altogether. Let them provide
> > dma_pfn_offset_regions
> > (even when there is only one). I feel it's conceptually nicer, as you'd be
> > dealing only in one currency, so to speak, and you'd centralize the bus DMA
> > ranges setter function which is always easier to maintain.
> Do you agree that we have to somehow hang this info on the struct
> device structure?  Because in the dma2phys() and phys2dma() all you
> have is the dev parameter.  I don't see how this  can be done w/o
> involving dev.

Sorry I didn't make myself clear here. What bothers me is having two functions
setting the same device parameter trough different means, I'd be happy to get
rid of attach_uniform_dma_pfn_offset(), and always use the same function to set
a device's bus dma regions. Something the likes of this comes to mind:

dma_attach_pfn_offset_region(struct device *dev, struct dma_pfn_offset_regions 
*r)

We could maybe use some helper macros for the linear case. But that's the gist
of it.

Also, it goes hand in hand with the comment below. Why having a special case
for non sparse DMA offsets in struct dma_pfn_offset_regions? The way I see it,
in this case, code simplicity is more interesting than a small optimization.

> > I'd go as far as not creating a special case for uniform offsets. Let just
> > set
> > cpu_end and dma_end to -1 so we always get a match. It's slightly more
> > compute
> > heavy, but I don't think it's worth the optimization.
> Well, there are two subcases here.  One where we do know the bounds
> and one where we do not.  I suppose for the latter I could have the
> drivers calling it with begin=0 and end=~(dma_addr_t)0.  Let me give
> this some thought...
> 
> > Just my two cents :)
> 
> Worth much more than $0.02 IMO :-)

BTW, would you consider renaming the DMA offset struct to something simpler
like, struct bus_dma_region? It complements 'dev->bus_dma_limit' better IMO.

> BTW, I tried putting the "if (dev->dma_pfn_offset_map)" clause inside
> the inline functions but the problem is that it slows the fastpath;
> consider the following code from dma-direct.h
> 
> if (dev->dma_pfn_offset_map) {
> unsigned long dma_pfn_offset =
dma_pfn_offset_from_phys_addr(dev, paddr);
> 
> dev_addr -= ((dma_addr_t)dma_pfn_offset << PAGE_SHIFT);
> }
> return dev_addr;
> 
> becomes
> 
> unsigned long dma_pfn_offset = dma_pfn_offset_from_phys_addr(dev,
paddr);
> 
> dev_addr -= ((dma_addr_t)dma_pfn_offset << PAGE_SHIFT);
> return dev_addr;
> 
> So those configurations that  have no dma_pfn_offsets are doing an
> unnecessary shift and add.

Fair enough. Still not a huge difference, but I see the value being the most
common case.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 03/11] of/fdt: add of_fdt_machine_is_compatible function

2019-08-20 Thread Nicolas Saenz Julienne

Provides the same functionality as of_machine_is_compatible.

Signed-off-by: Nicolas Saenz Julienne 
---

Changes in v2: None

 drivers/of/fdt.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 9cdf14b9aaab..06ffbd39d9af 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -802,6 +802,13 @@ const char * __init of_flat_dt_get_machine_name(void)
return name;
 }
 
+static const int __init of_fdt_machine_is_compatible(char *name)
+{
+   unsigned long dt_root = of_get_flat_dt_root();
+
+   return of_flat_dt_is_compatible(dt_root, name);
+}
+
 /**
  * of_flat_dt_match_machine - Iterate match tables to find matching machine.
  *
-- 
2.22.0

[PATCH v2 01/11] asm-generic: add dma_zone_size

2019-08-20 Thread Nicolas Saenz Julienne

Some architectures have platform specific DMA addressing limitations.
This will allow for hardware description code to provide the constraints
in a generic manner, so as for arch code to properly setup it's memory
zones and DMA mask.

Signed-off-by: Nicolas Saenz Julienne 
---

Changes in v2: None

 include/asm-generic/dma.h | 8 +++-
 mm/page_alloc.c   | 3 +++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/dma.h b/include/asm-generic/dma.h
index 43d0c8af8058..c2f39cdb64f6 100644
--- a/include/asm-generic/dma.h
+++ b/include/asm-generic/dma.h
@@ -8,7 +8,13 @@
  *
  * Some code relies on seeing MAX_DMA_ADDRESS though.
  */
-#define MAX_DMA_ADDRESS PAGE_OFFSET
+#define MAX_DMA_ADDRESS (PAGE_OFFSET + dma_zone_size)
+
+/*
+ * Some architectures may have platform specific DMA addressing constraints.
+ * Firmware can use this to fine tune the device's DMA memory zone.
+ */
+extern u64 dma_zone_size __ro_after_init;
 
 extern int request_dma(unsigned int dmanr, const char *device_id);
 extern void free_dma(unsigned int dmanr);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 272c6de1bf4e..b514afee5451 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -133,6 +133,9 @@ EXPORT_SYMBOL(_totalram_pages);
 unsigned long totalreserve_pages __read_mostly;
 unsigned long totalcma_pages __read_mostly;
 
+u64 dma_zone_size __ro_after_init;
+EXPORT_SYMBOL(dma_zone_size);
+
 int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
 #ifdef CONFIG_INIT_ON_ALLOC_DEFAULT_ON
-- 
2.22.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 06/11] arm64: rename variables used to calculate ZONE_DMA32's size

2019-08-20 Thread Nicolas Saenz Julienne

Let the name indicate that they are used to calculate ZONE_DMA32's size
as opposed to ZONE_DMA.

Signed-off-by: Nicolas Saenz Julienne 
---

Changes in v2: None

 arch/arm64/mm/init.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6112d6c90fa8..8956c22634dd 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -50,7 +50,7 @@
 s64 memstart_addr __ro_after_init = -1;
 EXPORT_SYMBOL(memstart_addr);
 
-phys_addr_t arm64_dma_phys_limit __ro_after_init;
+phys_addr_t arm64_dma32_phys_limit __ro_after_init;
 
 #ifdef CONFIG_KEXEC_CORE
 /*
@@ -168,7 +168,7 @@ static void __init reserve_elfcorehdr(void)
  * currently assumes that for memory starting above 4G, 32-bit devices will
  * use a DMA offset.
  */
-static phys_addr_t __init max_zone_dma_phys(void)
+static phys_addr_t __init max_zone_dma32_phys(void)
 {
phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, 32);
return min(offset + (1ULL << 32), memblock_end_of_DRAM());
@@ -181,7 +181,7 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA32
-   max_zone_pfns[ZONE_DMA32] = PFN_DOWN(arm64_dma_phys_limit);
+   max_zone_pfns[ZONE_DMA32] = PFN_DOWN(arm64_dma32_phys_limit);
 #endif
max_zone_pfns[ZONE_NORMAL] = max;
 
@@ -194,16 +194,16 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
 {
struct memblock_region *reg;
unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
-   unsigned long max_dma = min;
+   unsigned long max_dma32 = min;
 
memset(zone_size, 0, sizeof(zone_size));
 
/* 4GB maximum for 32-bit only capable devices */
 #ifdef CONFIG_ZONE_DMA32
-   max_dma = PFN_DOWN(arm64_dma_phys_limit);
-   zone_size[ZONE_DMA32] = max_dma - min;
+   max_dma32 = PFN_DOWN(arm64_dma32_phys_limit);
+   zone_size[ZONE_DMA32] = max_dma32 - min;
 #endif
-   zone_size[ZONE_NORMAL] = max - max_dma;
+   zone_size[ZONE_NORMAL] = max - max_dma32;
 
memcpy(zhole_size, zone_size, sizeof(zhole_size));
 
@@ -215,14 +215,14 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
continue;
 
 #ifdef CONFIG_ZONE_DMA32
-   if (start < max_dma) {
-   unsigned long dma_end = min(end, max_dma);
+   if (start < max_dma32) {
+   unsigned long dma_end = min(end, max_dma32);
zhole_size[ZONE_DMA32] -= dma_end - start;
}
 #endif
-   if (end > max_dma) {
+   if (end > max_dma32) {
unsigned long normal_end = min(end, max);
-   unsigned long normal_start = max(start, max_dma);
+   unsigned long normal_start = max(start, max_dma32);
zhole_size[ZONE_NORMAL] -= normal_end - normal_start;
}
}
@@ -407,9 +407,9 @@ void __init arm64_memblock_init(void)
 
/* 4GB maximum for 32-bit only capable devices */
if (IS_ENABLED(CONFIG_ZONE_DMA32))
-   arm64_dma_phys_limit = max_zone_dma_phys();
+   arm64_dma32_phys_limit = max_zone_dma32_phys();
else
-   arm64_dma_phys_limit = PHYS_MASK + 1;
+   arm64_dma32_phys_limit = PHYS_MASK + 1;
 
reserve_crashkernel();
 
@@ -417,7 +417,7 @@ void __init arm64_memblock_init(void)
 
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
 
-   dma_contiguous_reserve(arm64_dma_phys_limit);
+   dma_contiguous_reserve(arm64_dma32_phys_limit);
 }
 
 void __init bootmem_init(void)
@@ -521,7 +521,7 @@ static void __init free_unused_memmap(void)
 void __init mem_init(void)
 {
if (swiotlb_force == SWIOTLB_FORCE ||
-   max_pfn > (arm64_dma_phys_limit >> PAGE_SHIFT))
+   max_pfn > (arm64_dma32_phys_limit >> PAGE_SHIFT))
swiotlb_init(1);
else
swiotlb_force = SWIOTLB_NO_FORCE;
-- 
2.22.0

[PATCH v2 00/11] Raspberry Pi 4 DMA addressing support

2019-08-20 Thread Nicolas Saenz Julienne

Hi all,
this series attempts to address some issues we found while bringing up
the new Raspberry Pi 4 in arm64 and it's intended to serve as a follow
up of these discussions:
v1: https://lkml.org/lkml/2019/7/31/922
RFC: https://lkml.org/lkml/2019/7/17/476

The new Raspberry Pi 4 has up to 4GB of memory but most peripherals can
only address the first GB: their DMA address range is
0xc000-0xfc00 which is aliased to the first GB of physical
memory 0x-0x3c00. Note that only some peripherals have these
limitations: the PCIe, V3D, GENET, and 40-bit DMA channels have a wider
view of the address space by virtue of being hooked up trough a second
interconnect.

Part of this is solved in arm32 by setting up the machine specific
'.dma_zone_size = SZ_1G', which takes care of reserving the coherent
memory area at the right spot. That said no buffer bouncing (needed for
dma streaming) is available at the moment, but that's a story for
another series.

Unfortunately there is no such thing as 'dma_zone_size' in arm64. Only
ZONE_DMA32 is created which is interpreted by dma-direct and the arm64
arch code as if all peripherals where be able to address the first 4GB
of memory.

In the light of this, the series implements the following changes:

- Create generic 'dma_zone_size' in order for hardware description code
  to set it up when needed.

- Add a function in early_init_dt_scan() to setup 'dma_zone_size' for
  the RPi4.

- Create both DMA zones in arm64, ZONE_DMA will contain the area
  addressable by all peripherals and ZONE_DMA32 the rest of the 32 bit
  addressable memory. ZONE_DMA32 might be left empty.

- Reserve the CMA area in a place suitable for all peripherals.

- Inform dma-direct of the new runtime calculated min_mask.

This series has been tested on multiple devices both by checking the
zones setup matches the expectations and by double-checking physical
addresses on pages allocated on the three relevant areas GFP_DMA,
GFP_DMA32, GFP_KERNEL:

- On an RPi4 with variations on the ram memory size. But also forcing
  the situation where all three memory zones are nonempty by setting a 3G
  ZONE_DMA32 ceiling on a 4G setup. Both with and without NUMA support.

- On a Synquacer box[1] with 32G of memory.

- On an ACPI based Huawei TaiShan server[2] with 256G of memory.

- On a QEMU virtual machine running arm64's OpenSUSE Tumbleweed.

That's all.

Regards,
Nicolas

[1] https://www.96boards.org/product/developerbox/
[2] 
https://e.huawei.com/en/products/cloud-computing-dc/servers/taishan-server/taishan-2280-v2

---

Changes in v2:
- More in depth testing.
- Create new global 'dma_zone_size'.
- New approach to getting the dma_zone_size, instead of parsing the dts
  we hardcode it conditionally to the machine compatible name.
- Fix ZONE_DMA and ZONE_DMA32 split, now ZONE_DMA32 remains empty if
  ZONE_DMA fits the whole 32 bit addressable space.
- Take into account devices with DMA offset.
- Rename new dma-direct variable to zone_dma_bits.
- Try new approach by merging both ZONE_DMA and ZONE_DMA32 comments
  in mmzone.h, add new up to date examples.

Nicolas Saenz Julienne (11):
  asm-generic: add dma_zone_size
  arm: use generic dma_zone_size
  of/fdt: add of_fdt_machine_is_compatible function
  of/fdt: add early_init_dt_get_dma_zone_size()
  arm64: mm: use arm64_dma_phys_limit instead of calling
max_zone_dma_phys()
  arm64: rename variables used to calculate ZONE_DMA32's size
  arm64: re-introduce max_zone_dma_phys()
  arm64: use both ZONE_DMA and ZONE_DMA32
  dma-direct: turn ARCH_ZONE_DMA_BITS into a variable
  arm64: edit zone_dma_bits to fine tune dma-direct min mask
  mm: refresh ZONE_DMA and ZONE_DMA32 comments in 'enum zone_type'

 arch/arm/include/asm/dma.h  |  8 ++--
 arch/arm/mm/init.c  | 12 ++
 arch/arm64/Kconfig  |  4 ++
 arch/arm64/mm/init.c| 73 +
 arch/powerpc/include/asm/page.h |  9 
 arch/powerpc/mm/mem.c   | 16 +---
 arch/s390/include/asm/page.h|  2 -
 arch/s390/mm/init.c |  1 +
 drivers/of/fdt.c| 15 +++
 include/asm-generic/dma.h   |  8 +++-
 include/linux/dma-direct.h  |  2 +
 include/linux/mmzone.h  | 46 -
 kernel/dma/direct.c | 13 +++---
 mm/page_alloc.c |  3 ++
 14 files changed, 140 insertions(+), 72 deletions(-)

-- 
2.22.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 02/11] arm: use generic dma_zone_size

2019-08-20 Thread Nicolas Saenz Julienne

'dma_zone_size' was created as a generic replacement to
'arm_dma_zone_size'. Use it accordingly.

Signed-off-by: Nicolas Saenz Julienne 
---

Changes in v2: None

 arch/arm/include/asm/dma.h |  8 +---
 arch/arm/mm/init.c | 12 
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/dma.h b/arch/arm/include/asm/dma.h
index a81dda65c576..52d19ffd92b4 100644
--- a/arch/arm/include/asm/dma.h
+++ b/arch/arm/include/asm/dma.h
@@ -2,16 +2,18 @@
 #ifndef __ASM_ARM_DMA_H
 #define __ASM_ARM_DMA_H
 
+#include 
+
 /*
  * This is the maximum virtual address which can be DMA'd from.
  */
+#undef MAX_DMA_ADDRESS
 #ifndef CONFIG_ZONE_DMA
 #define MAX_DMA_ADDRESS0xUL
 #else
 #define MAX_DMA_ADDRESS({ \
-   extern phys_addr_t arm_dma_zone_size; \
-   arm_dma_zone_size && arm_dma_zone_size < (0x1000 - PAGE_OFFSET) ? \
-   (PAGE_OFFSET + arm_dma_zone_size) : 0xUL; })
+   dma_zone_size && dma_zone_size < (0x1000 - PAGE_OFFSET) ? \
+   (PAGE_OFFSET + dma_zone_size) : 0xUL; })
 #endif
 
 #ifdef CONFIG_ISA_DMA_API
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 16d373d587c4..95680bad245a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -79,10 +79,6 @@ static void __init find_limits(unsigned long *min, unsigned 
long *max_low,
 }
 
 #ifdef CONFIG_ZONE_DMA
-
-phys_addr_t arm_dma_zone_size __read_mostly;
-EXPORT_SYMBOL(arm_dma_zone_size);
-
 /*
  * The DMA mask corresponding to the maximum bus address allocatable
  * using GFP_DMA.  The default here places no restriction on DMA
@@ -109,8 +105,8 @@ void __init setup_dma_zone(const struct machine_desc *mdesc)
 {
 #ifdef CONFIG_ZONE_DMA
if (mdesc->dma_zone_size) {
-   arm_dma_zone_size = mdesc->dma_zone_size;
-   arm_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
+   dma_zone_size = mdesc->dma_zone_size;
+   arm_dma_limit = PHYS_OFFSET + dma_zone_size - 1;
} else
arm_dma_limit = 0x;
arm_dma_pfn_limit = arm_dma_limit >> PAGE_SHIFT;
@@ -164,9 +160,9 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max_low,
 * Adjust the sizes according to any special requirements for
 * this machine type.
 */
-   if (arm_dma_zone_size)
+   if (dma_zone_size)
arm_adjust_dma_zone(zone_size, zhole_size,
-   arm_dma_zone_size >> PAGE_SHIFT);
+   dma_zone_size >> PAGE_SHIFT);
 #endif
 
free_area_init_node(0, zone_size, min, zhole_size);
-- 
2.22.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 10/11] arm64: edit zone_dma_bits to fine tune dma-direct min mask

2019-08-20 Thread Nicolas Saenz Julienne

With the introduction of ZONE_DMA in arm64 devices are not forced to
support 32 bit DMA masks. We have to inform dma-direct of this
limitation whenever it happens.

Signed-off-by: Nicolas Saenz Julienne 
---

Changes in v2:
- Make sure to filter the higher part of arm64_dma_phys_limit
- Rename variable to zone_dma_bits

 arch/arm64/mm/init.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index c51ce79b692b..c5e619f21ad8 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -437,8 +438,10 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
-   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
arm64_dma_phys_limit = max_zone_dma_phys();
+   zone_dma_bits = ilog2((arm64_dma_phys_limit - 1) & 
GENMASK_ULL(31, 0)) + 1;
+   }
 
if (IS_ENABLED(CONFIG_ZONE_DMA32))
arm64_dma32_phys_limit = max_zone_dma32_phys();
-- 
2.22.0

[PATCH v2 04/11] of/fdt: add early_init_dt_get_dma_zone_size()

2019-08-20 Thread Nicolas Saenz Julienne

Some devices might have weird DMA addressing limitations that only apply
to a subset of the available peripherals. For example the Raspberry Pi 4
has two interconnects, one able to address the whole lower 4G memory
area and another one limited to the lower 1G.

Being an uncommon situation we simply hardcode the device wide DMA
addressable memory size conditionally to the machine compatible name and
set 'dma_zone_size' accordingly.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes in v2:
- New approach to getting dma_zone_size, instead of parsing the dts we
  hardcode it conditionally to the machine compatible name.

 drivers/of/fdt.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 06ffbd39d9af..f756e8c05a77 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -27,6 +27,7 @@
 
 #include   /* for COMMAND_LINE_SIZE */
 #include 
+#include/* for dma_zone_size */
 
 #include "of_private.h"
 
@@ -1195,6 +1196,12 @@ void __init early_init_dt_scan_nodes(void)
of_scan_flat_dt(early_init_dt_scan_memory, NULL);
 }
 
+void __init early_init_dt_get_dma_zone_size(void)
+{
+   if (of_fdt_machine_is_compatible("brcm,bcm2711"))
+   dma_zone_size = 0x3c00;
+}
+
 bool __init early_init_dt_scan(void *params)
 {
bool status;
@@ -1204,6 +1211,7 @@ bool __init early_init_dt_scan(void *params)
return false;
 
early_init_dt_scan_nodes();
+   early_init_dt_get_dma_zone_size();
return true;
 }
 
-- 
2.22.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 09/11] dma-direct: turn ARCH_ZONE_DMA_BITS into a variable

2019-08-20 Thread Nicolas Saenz Julienne

Some architectures, notably arm64, are interested in tweaking this
depending on their runtime dma addressing limitations.

Signed-off-by: Nicolas Saenz Julienne 
---

Changes in v2:
- Rename new variable to zone_dma_bits
- Update comment with Christoph's suggestion
- Remove old powerpc comment

 arch/powerpc/include/asm/page.h |  9 -
 arch/powerpc/mm/mem.c   | 16 +++-
 arch/s390/include/asm/page.h|  2 --
 arch/s390/mm/init.c |  1 +
 include/linux/dma-direct.h  |  2 ++
 kernel/dma/direct.c | 13 ++---
 6 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 0d52f57fca04..73668a21ae78 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -319,13 +319,4 @@ struct vm_area_struct;
 #endif /* __ASSEMBLY__ */
 #include 
 
-/*
- * Allow 30-bit DMA for very limited Broadcom wifi chips on many powerbooks.
- */
-#ifdef CONFIG_PPC32
-#define ARCH_ZONE_DMA_BITS 30
-#else
-#define ARCH_ZONE_DMA_BITS 31
-#endif
-
 #endif /* _ASM_POWERPC_PAGE_H */
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 9191a66b3bc5..2a69f87585df 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -201,10 +202,10 @@ static int __init mark_nonram_nosave(void)
  * everything else. GFP_DMA32 page allocations automatically fall back to
  * ZONE_DMA.
  *
- * By using 31-bit unconditionally, we can exploit ARCH_ZONE_DMA_BITS to
- * inform the generic DMA mapping code.  32-bit only devices (if not handled
- * by an IOMMU anyway) will take a first dip into ZONE_NORMAL and get
- * otherwise served by ZONE_DMA.
+ * By using 31-bit unconditionally, we can exploit zone_dma_bits to inform the
+ * generic DMA mapping code.  32-bit only devices (if not handled by an IOMMU
+ * anyway) will take a first dip into ZONE_NORMAL and get otherwise served by
+ * ZONE_DMA.
  */
 static unsigned long max_zone_pfns[MAX_NR_ZONES];
 
@@ -237,9 +238,14 @@ void __init paging_init(void)
printk(KERN_DEBUG "Memory hole size: %ldMB\n",
   (long int)((top_of_ram - total_ram) >> 20));
 
+   if (IS_ENABLED(CONFIG_PPC32))
+   zone_dma_bits = 30;
+   else
+   zone_dma_bits = 31;
+
 #ifdef CONFIG_ZONE_DMA
max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
- 1UL << (ARCH_ZONE_DMA_BITS - PAGE_SHIFT));
+ 1UL << (zone_dma_bits - PAGE_SHIFT));
 #endif
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
 #ifdef CONFIG_HIGHMEM
diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 823578c6b9e2..a4d38092530a 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -177,8 +177,6 @@ static inline int devmem_is_allowed(unsigned long pfn)
 #define VM_DATA_DEFAULT_FLAGS  (VM_READ | VM_WRITE | \
 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
 
-#define ARCH_ZONE_DMA_BITS 31
-
 #include 
 #include 
 
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 20340a03ad90..bd98465b8b9f 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -118,6 +118,7 @@ void __init paging_init(void)
 
sparse_memory_present_with_active_regions(MAX_NUMNODES);
sparse_init();
+   zone_dma_bits = 31;
memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
max_zone_pfns[ZONE_DMA] = PFN_DOWN(MAX_DMA_ADDRESS);
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index adf993a3bd58..d03af3605460 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -5,6 +5,8 @@
 #include 
 #include 
 
+extern unsigned int zone_dma_bits;
+
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include 
 #else
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 795c9b095d75..b23cd65f26e0 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -16,12 +16,11 @@
 #include 
 
 /*
- * Most architectures use ZONE_DMA for the first 16 Megabytes, but
- * some use it for entirely different regions:
+ * Most architectures use ZONE_DMA for the first 16 Megabytes, but some use it
+ * it for entirely different regions. In that case the arch code needs to
+ * override the variable below for dma-direct to work properly.
  */
-#ifndef ARCH_ZONE_DMA_BITS
-#define ARCH_ZONE_DMA_BITS 24
-#endif
+unsigned int zone_dma_bits __ro_after_init = 24;
 
 static void report_addr(struct device *dev, dma_addr_t dma_addr, size_t size)
 {
@@ -69,7 +68,7 @@ static gfp_t __dma_direct_optimal_gfp_mask(struct device 
*dev, u64 dma_mask,
 * Note that GFP_DMA32 and GFP_DMA are no ops without the corresponding
 * zones.
 */
-   if (*phys_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
+   if (*phy

[PATCH v2 08/11] arm64: use both ZONE_DMA and ZONE_DMA32

2019-08-20 Thread Nicolas Saenz Julienne

So far all arm64 devices have supported 32 bit DMA masks for their
peripherals. This is not true anymore for the Raspberry Pi 4 as most of
it's peripherals can only address the first GB or memory of a total of
up to 4 GB.

This goes against ZONE_DMA32's original intent, and breaks other
subsystems as it's expected for ZONE_DMA32 to be addressable with a 32
bit mask. So it was decided to use ZONE_DMA for this specific case.

ZONE_DMA will contain the memory addressable by all the SoC's
peripherals and ZONE_DMA32 the rest of the 32 bit addressable memory. If
all peripherals where able to address the whole 32 bit addressable space
ZONE_DMA32 will be left empty.

Signed-off-by: Nicolas Saenz Julienne 
---

Changes in v2:
- ZONE_DMA will never be left empty
- Update comment to reflect new zones split

 arch/arm64/Kconfig   |  4 
 arch/arm64/mm/init.c | 39 +++
 2 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3adcec05b1f6..a9fd71d3bc8e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -266,6 +266,10 @@ config GENERIC_CSUM
 config GENERIC_CALIBRATE_DELAY
def_bool y
 
+config ZONE_DMA
+   bool "Support DMA zone" if EXPERT
+   default y
+
 config ZONE_DMA32
bool "Support DMA32 zone" if EXPERT
default y
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index bc7999020c71..c51ce79b692b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -50,6 +50,14 @@
 s64 memstart_addr __ro_after_init = -1;
 EXPORT_SYMBOL(memstart_addr);
 
+/*
+ * We create both ZONE_DMA and ZONE_DMA32. ZONE_DMA's size is decided based on
+ * whether all the device's peripherals are able to address the first naturally
+ * aligned 4G of memory. If not, ZONE_DMA covers the area common to all them
+ * and ZONE_DMA32 the rest. If ZONE_DMA fits the whole 4G area, ZONE_DMA32 is
+ * left empty.
+ */
+phys_addr_t arm64_dma_phys_limit __ro_after_init;
 phys_addr_t arm64_dma32_phys_limit __ro_after_init;
 
 #ifdef CONFIG_KEXEC_CORE
@@ -191,6 +199,9 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
 {
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
+#ifdef CONFIG_ZONE_DMA
+   max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
+#endif
 #ifdef CONFIG_ZONE_DMA32
max_zone_pfns[ZONE_DMA32] = PFN_DOWN(arm64_dma32_phys_limit);
 #endif
@@ -206,13 +217,17 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
struct memblock_region *reg;
unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
unsigned long max_dma32 = min;
+   unsigned long max_dma = min;
 
memset(zone_size, 0, sizeof(zone_size));
 
-   /* 4GB maximum for 32-bit only capable devices */
+#ifdef CONFIG_ZONE_DMA
+   max_dma = PFN_DOWN(arm64_dma_phys_limit);
+   zone_size[ZONE_DMA] = max_dma - min;
+#endif
 #ifdef CONFIG_ZONE_DMA32
max_dma32 = PFN_DOWN(arm64_dma32_phys_limit);
-   zone_size[ZONE_DMA32] = max_dma32 - min;
+   zone_size[ZONE_DMA32] = max_dma32 - max_dma;
 #endif
zone_size[ZONE_NORMAL] = max - max_dma32;
 
@@ -224,11 +239,17 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
 
if (start >= max)
continue;
-
+#ifdef CONFIG_ZONE_DMA
+   if (start < max_dma) {
+   unsigned long dma_end = min_not_zero(end, max_dma);
+   zhole_size[ZONE_DMA] -= dma_end - start;
+   }
+#endif
 #ifdef CONFIG_ZONE_DMA32
if (start < max_dma32) {
-   unsigned long dma_end = min(end, max_dma32);
-   zhole_size[ZONE_DMA32] -= dma_end - start;
+   unsigned long dma32_end = min(end, max_dma32);
+   unsigned long dma32_start = max(start, max_dma);
+   zhole_size[ZONE_DMA32] -= dma32_end - dma32_start;
}
 #endif
if (end > max_dma32) {
@@ -416,7 +437,9 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
-   /* 4GB maximum for 32-bit only capable devices */
+   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   arm64_dma_phys_limit = max_zone_dma_phys();
+
if (IS_ENABLED(CONFIG_ZONE_DMA32))
arm64_dma32_phys_limit = max_zone_dma32_phys();
else
@@ -428,7 +451,7 @@ void __init arm64_memblock_init(void)
 
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
 
-   dma_contiguous_reserve(arm64_dma32_phys_limit);
+   dma_contiguous_reserve(arm64_dma_phys_limit ? : arm64_dma32_phys_limit);
 }
 
 void __init bootmem_init(void)
@@ -531,7 +554,7 @@ static void __init free_unused_memmap(void)
  */
 void __init mem_init(void)
 {
-   if (swiotlb_force == SWIOTLB_FORCE ||
+

[PATCH v2 11/11] mm: refresh ZONE_DMA and ZONE_DMA32 comments in 'enum zone_type'

2019-08-20 Thread Nicolas Saenz Julienne

These zones usage has evolved with time and the comments were outdated.
This joins both ZONE_DMA and ZONE_DMA32 explanation and gives up to date
examples on how they are used on different architectures.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes in v2:
- Try another approach merging both zones explanations into one
- Address Christoph's comments
- If this approach doesn't get much traction I'll just drop the patch
  from the series as it's not really essential

 include/linux/mmzone.h | 46 +-
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d77d717c620c..9c150223d41f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -356,33 +356,41 @@ struct per_cpu_nodestat {
 #endif /* !__GENERATING_BOUNDS.H */
 
 enum zone_type {
-#ifdef CONFIG_ZONE_DMA
/*
-* ZONE_DMA is used when there are devices that are not able
-* to do DMA to all of addressable memory (ZONE_NORMAL). Then we
-* carve out the portion of memory that is needed for these devices.
-* The range is arch specific.
+* ZONE_DMA and ZONE_DMA32 are used when there are peripherals not able
+* to DMA to all of the addressable memory (ZONE_NORMAL).
+* On architectures where this area covers the whole 32 bit address
+* space ZONE_DMA32 is used. ZONE_DMA is left for the ones with smaller
+* DMA addressing constraints. This distinction is important as a 32bit
+* DMA mask is assumed when ZONE_DMA32 is defined. Some 64-bit
+* platforms may need both zones as they support peripherals with
+* different DMA addressing limitations.
+*
+* Some examples:
+*
+*  - i386 and x86_64 have a fixed 16M ZONE_DMA and ZONE_DMA32 for the
+*rest of the lower 4G.
+*
+*  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
+*the specific device.
+*
+*  - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
+*depending on the specific device.
 *
-* Some examples
+*  - s390 uses ZONE_DMA fixed to the lower 2G.
 *
-* Architecture Limit
-* ---
-* parisc, ia64, sparc  <4G
-* s390, powerpc<2G
-* arm  Various
-* alphaUnlimited or 0-16MB.
+*  - arm64 uses ZONE_DMA to mark the area addresable by all
+*peripherals on the device, and ZONE_DMA32 for the rest of the
+*lower 4G. ZONE_DMA32 might be left empty.
 *
-* i386, x86_64 and multiple other arches
-*  <16M.
+*  - ia64 and riscv only use ZONE_DMA32.
+*
+*  - parisc uses neither.
 */
+#ifdef CONFIG_ZONE_DMA
ZONE_DMA,
 #endif
 #ifdef CONFIG_ZONE_DMA32
-   /*
-* x86_64 needs two ZONE_DMAs because it supports devices that are
-* only able to do DMA to the lower 16M but also 32 bit devices that
-* can only do DMA areas below 4G.
-*/
ZONE_DMA32,
 #endif
/*
-- 
2.22.0

[PATCH v2 07/11] arm64: re-introduce max_zone_dma_phys()

2019-08-20 Thread Nicolas Saenz Julienne

Some devices might have multiple interconnects with different DMA
addressing limitations. This function provides the higher physical
address accessible by all peripherals on the SoC. If such limitation
doesn't exist it'll return the maximum physical address of the 32 bit
addressable area.

Signed-off-by: Nicolas Saenz Julienne 
---

Changes in v2:
- Update function's behavior to fit new dma zones split
- Use dma_zone_size
- Take into account devices with a hardcoded DMA offset

 arch/arm64/mm/init.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 8956c22634dd..bc7999020c71 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -174,6 +174,17 @@ static phys_addr_t __init max_zone_dma32_phys(void)
return min(offset + (1ULL << 32), memblock_end_of_DRAM());
 }
 
+static phys_addr_t __init max_zone_dma_phys(void)
+
+{
+   phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, 32);
+
+   if (dma_zone_size)
+   return min(offset + dma_zone_size, memblock_end_of_DRAM());
+   else
+   return max_zone_dma32_phys();
+}
+
 #ifdef CONFIG_NUMA
 
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
-- 
2.22.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 05/11] arm64: mm: use arm64_dma_phys_limit instead of calling max_zone_dma_phys()

2019-08-20 Thread Nicolas Saenz Julienne

By the time we call zones_sizes_init() arm64_dma_phys_limit already
contains the result of max_zone_dma_phys(). We use the variable instead
of calling the function directly to save some precious cpu time.

Signed-off-by: Nicolas Saenz Julienne 
---

Changes in v2: None

 arch/arm64/mm/init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index f3c795278def..6112d6c90fa8 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -181,7 +181,7 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA32
-   max_zone_pfns[ZONE_DMA32] = PFN_DOWN(max_zone_dma_phys());
+   max_zone_pfns[ZONE_DMA32] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
max_zone_pfns[ZONE_NORMAL] = max;
 
-- 
2.22.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 03/11] of/fdt: add of_fdt_machine_is_compatible function

2019-08-20 Thread Nicolas Saenz Julienne

Hi Rob,
thanks for the rewiew.

On Tue, 2019-08-20 at 12:16 -0500, Rob Herring wrote:
> On Tue, Aug 20, 2019 at 9:58 AM Nicolas Saenz Julienne
>  wrote:
> > Provides the same functionality as of_machine_is_compatible.
> > 
> > Signed-off-by: Nicolas Saenz Julienne 
> > ---
> > 
> > Changes in v2: None
> > 
> >  drivers/of/fdt.c | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index 9cdf14b9aaab..06ffbd39d9af 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -802,6 +802,13 @@ const char * __init of_flat_dt_get_machine_name(void)
> > return name;
> >  }
> > 
> > +static const int __init of_fdt_machine_is_compatible(char *name)
> 
> No point in const return (though name could possibly be const), and
> the return could be bool instead.

Sorry, I completely missed that const, shouldn't have been there to begin with.

I'll add a const to the name argument and return a bool on the next version.

Regards,
Nicolas




signature.asc
Description: This is a digitally signed message part

Re: [PATCH v2 10/11] arm64: edit zone_dma_bits to fine tune dma-direct min mask

2019-08-26 Thread Nicolas Saenz Julienne

On Mon, 2019-08-26 at 09:06 +0200, Christoph Hellwig wrote:
> On Tue, Aug 20, 2019 at 04:58:18PM +0200, Nicolas Saenz Julienne wrote:
> > -   if (IS_ENABLED(CONFIG_ZONE_DMA))
> > +   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
> > arm64_dma_phys_limit = max_zone_dma_phys();
> > +   zone_dma_bits = ilog2((arm64_dma_phys_limit - 1) &
> > GENMASK_ULL(31, 0)) + 1;
>
Hi Christoph,
thanks for the rewiews.

> This adds a way too long line.

I know, I couldn't find a way to split the operation without making it even
harder to read. I'll find a solution.

> I also find the use of GENMASK_ULL
> horribly obsfucating, but I know that opinion is't shared by everyone.

Don't have any preference so I'll happily change it. Any suggestions? Using the
explicit 0xULL seems hard to read, how about SZ_4GB - 1?



signature.asc
Description: This is a digitally signed message part

Re: [PATCH v2 01/11] asm-generic: add dma_zone_size

2019-08-26 Thread Nicolas Saenz Julienne

On Mon, 2019-08-26 at 09:09 +0200, Christoph Hellwig wrote:
> On Tue, Aug 20, 2019 at 04:58:09PM +0200, Nicolas Saenz Julienne wrote:
> > Some architectures have platform specific DMA addressing limitations.
> > This will allow for hardware description code to provide the constraints
> > in a generic manner, so as for arch code to properly setup it's memory
> > zones and DMA mask.
> 
> I know this just spreads the arm code, but I still kinda hate it.

Rob's main concern was finding a way to pass the constraint from HW definition
to arch without widening fdt's architecture specific function surface. I'd say
it's fair to argue that having a generic mechanism makes sense as it'll now
traverse multiple archs and subsystems.

I get adding globals like this is not very appealing, yet I went with it as it
was the easier to integrate with arm's code. Any alternative suggestions?

> MAX_DMA_ADDRESS is such an oddly defined concepts.  We have the mm
> code that uses it to start allocating after the dma zones, but
> I think that would better be done using a function returning
> 1 << max(zone_dma_bits, 32) or so.  Then we have about a handful
> of drivers using it that all seem rather bogus, and one of which
> I think are usable on arm64.

Is it safe to assume DMA limitations will always be a power of 2? I ask as RPi4
kinda isn't: ZONE_DMA is 0x3c00 bytes big, I'm approximating the zone mask
to 30 as [0x3c00 0x3fff] isn't defined as memory so it's unlikely that
we´ll encounter buffers there. But I don't know how it could affect mm
initialization code.

This also rules out 'zone_dma_bits' as a mechanism to pass ZONE_DMA's size from
HW definition code to arch's.

signature.asc
Description: This is a digitally signed message part

Re: [PATCH v2 01/11] asm-generic: add dma_zone_size

2019-08-28 Thread Nicolas Saenz Julienne

On Mon, 2019-08-26 at 15:46 +0200, Nicolas Saenz Julienne wrote:
> On Mon, 2019-08-26 at 09:09 +0200, Christoph Hellwig wrote:
> > On Tue, Aug 20, 2019 at 04:58:09PM +0200, Nicolas Saenz Julienne wrote:
> > > Some architectures have platform specific DMA addressing limitations.
> > > This will allow for hardware description code to provide the constraints
> > > in a generic manner, so as for arch code to properly setup it's memory
> > > zones and DMA mask.
> > 
> > I know this just spreads the arm code, but I still kinda hate it.
> 
> Rob's main concern was finding a way to pass the constraint from HW definition
> to arch without widening fdt's architecture specific function surface. I'd say
> it's fair to argue that having a generic mechanism makes sense as it'll now
> traverse multiple archs and subsystems.
> 
> I get adding globals like this is not very appealing, yet I went with it as it
> was the easier to integrate with arm's code. Any alternative suggestions?
> 
> > MAX_DMA_ADDRESS is such an oddly defined concepts.  We have the mm
> > code that uses it to start allocating after the dma zones, but
> > I think that would better be done using a function returning
> > 1 << max(zone_dma_bits, 32) or so.  Then we have about a handful
> > of drivers using it that all seem rather bogus, and one of which
> > I think are usable on arm64.
> 
> Is it safe to assume DMA limitations will always be a power of 2? I ask as
> RPi4
> kinda isn't: ZONE_DMA is 0x3c00 bytes big, I'm approximating the zone mask
> to 30 as [0x3c00 0x3fff] isn't defined as memory so it's unlikely that
> we´ll encounter buffers there. But I don't know how it could affect mm
> initialization code.
> 
> This also rules out 'zone_dma_bits' as a mechanism to pass ZONE_DMA's size
> from
> HW definition code to arch's.

Hi Christoph,
I gave it a thought and think this whole MAX_DMA_ADDRESS topic falls out of the
scope of the series. I agree it's something that we should get rid of, but
fixing it isn't going to affect the overall enhancement intended here.  I'd
rather focus on how are we going to pass the DMA zone data into the arch code
and fix MAX_DMA_ADDRESS on another series.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 01/11] asm-generic: add dma_zone_size

2019-08-30 Thread Nicolas Saenz Julienne

On Fri, 2019-08-30 at 15:45 +0100, Catalin Marinas wrote:
> On Mon, Aug 26, 2019 at 03:46:52PM +0200, Nicolas Saenz Julienne wrote:
> > On Mon, 2019-08-26 at 09:09 +0200, Christoph Hellwig wrote:
> > > On Tue, Aug 20, 2019 at 04:58:09PM +0200, Nicolas Saenz Julienne wrote:
> > > > Some architectures have platform specific DMA addressing limitations.
> > > > This will allow for hardware description code to provide the constraints
> > > > in a generic manner, so as for arch code to properly setup it's memory
> > > > zones and DMA mask.
> > > 
> > > I know this just spreads the arm code, but I still kinda hate it.
> > 
> > Rob's main concern was finding a way to pass the constraint from HW
> > definition
> > to arch without widening fdt's architecture specific function surface. I'd
> > say
> > it's fair to argue that having a generic mechanism makes sense as it'll now
> > traverse multiple archs and subsystems.
> > 
> > I get adding globals like this is not very appealing, yet I went with it as
> > it
> > was the easier to integrate with arm's code. Any alternative suggestions?
> 
> In some discussion with Robin, since it's just RPi4 that we are aware of
> having such requirement on arm64, he suggested that we have a permanent
> ZONE_DMA on arm64 with a default size of 1GB. It should cover all arm64
> SoCs we know of without breaking the single Image binary. The arch/arm
> can use its current mach-* support.
> 
> I may like this more than the proposed early_init_dt_get_dma_zone_size()
> here which checks for specific SoCs (my preferred way was to build the
> mask from all buses described in DT but I hadn't realised the
> complications).

Hi Catalin, thanks for giving it a thought.

I'll be happy to implement it that way. I agree it's a good compromise.

@Christoph, do you still want the patch where I create 'zone_dma_bits'? With a
hardcoded ZONE_DMA it's not absolutely necessary. Though I remember you said it
was a first step towards being able to initialize dma-direct's min_mask in
meminit.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part

PCIe DMA addressing issues on Raspberry Pi 4

2019-09-03 Thread Nicolas Saenz Julienne

Hi all, sorry for the long read, I kept it as short as possible.

So, the wrapper around the PCIe block available on the Raspberry Pi 4 has a bug
preventing it from accessing anything beyond the first 3G of ram [1]. I'm
trying to figure out the best way to integrate this upstream.

Note that the only thing behind the PCIe bus is an USB3 chip. The bus is not
exposed to users directly.

I see two options:

- Isolate the PCIe block on it's own interconnect. IMO this could be acceptable
  as it's arguable that the bug is not really in the PCIe block.  I set the
  interconnect's dma-range size to 2GB instead of 3GB as dma masks don't play
  well with non power of two DMA constraints:

  buggy-scb {
compatible = "simple-bus";
dma-ranges = <0x0 0x 0x0 0x 0x8>;

pcie {
compatible = "brcm,bcm2711-pcie";
dma-ranges = <0x0200 0x0 0x 0x0 0x
  0x1 0x>;
[...]

};
  };

  scb {
compatible = "simple-bus";
dma-ranges = <0x0 0x 0x0 0x 0xfc000>;

eth0 {
[...]
};

[...]
  };

  With this setup the PCIe devices should behave correctly as long as they
  don't play with their own DMA masks.

- Configure PCIe's inbound view of memory taking into account the buggy
  interconnect:

  scb {
compatible = "simple-bus";
dma-ranges = <0x0 0x 0x0 0x 0xfc00>;

pcie {
compatible = "brcm,bcm2711-pcie";
dma-ranges = <0x0200 0x0 0x 0x0 0x
  0x0 0x8000>;
[...]

};

eth0 {
[...]
};

[...]
  };

  The downside of this is that of_dma_configure() doesn't play well with PCI
  devices. of_dma_configure() expects a device's OF node, yet the PCI core
  passes the bridge's OF node, as the device has none. The result is
  of_dma_configure() ignores PCI's dma-ranges. Solving this is not trivial.
  IMO the simplest solution would be to create a distinct OF node on PCI
  bridges, child of the actual PCI root complex.  FYI this was already an issue
  some years ago [2].

  This solution also suffers from devices setting their own DMA masks.

If you're curious abot the downstream kernel, they use their custom buffer
bouncing code, which AFAIK is something we're trying to get rid of.

Any comments? Alternative solutions?

Thanks,
Nicolas

[1] https://www.spinics.net/lists/arm-kernel/msg740693.html
[2] https://patchwork.kernel.org/patch/9650345/#20294961



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH RFC 0/5] ARM: Raspberry Pi 4 DMA support

2019-10-14 Thread Nicolas Saenz Julienne

Hi all,
the Raspberry Pi 4 offers up to 4GB of memory, of which only the first
is DMA capable device wide. This forces us to use of bounce buffers,
which are currently not very well supported by ARM's custom DMA ops.
Among other things the current mechanism (see dmabounce.c) isn't
suitable for high memory. Instead of fixing it, this series introduces a
way of selecting dma-direct as the default DMA ops provider which allows
for the Raspberry Pi to make use of swiotlb.

Regards,
Nicolas

---

Nicolas Saenz Julienne (5):
  dma/direct: turn ARCH_ZONE_DMA_BITS into a variable
  ARM: introduce arm_dma_direct
  ARM: let machines select dma-direct over arch's DMA implementation
  dma/direct: check for overflows in ARM's dma_capable()
  ARM: bcm2711: use dma-direct

 arch/arm/include/asm/dma-direct.h  |  6 ++
 arch/arm/include/asm/dma-mapping.h |  3 ++-
 arch/arm/include/asm/dma.h |  2 ++
 arch/arm/include/asm/mach/arch.h   |  1 +
 arch/arm/mach-bcm/Kconfig  |  1 +
 arch/arm/mach-bcm/bcm2711.c|  1 +
 arch/arm/mm/dma-mapping.c  | 10 ++
 arch/arm/mm/init.c | 21 -
 arch/arm64/include/asm/page.h  |  2 --
 arch/arm64/mm/init.c   |  9 +++--
 arch/powerpc/include/asm/page.h|  9 -
 arch/powerpc/mm/mem.c  | 20 +++-
 arch/s390/include/asm/page.h   |  2 --
 arch/s390/mm/init.c|  1 +
 include/linux/dma-direct.h |  2 ++
 kernel/dma/direct.c| 13 ++---
 16 files changed, 66 insertions(+), 37 deletions(-)

-- 
2.23.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH RFC 1/5] dma/direct: turn ARCH_ZONE_DMA_BITS into a variable

2019-10-14 Thread Nicolas Saenz Julienne

Some architectures, notably ARM, are interested in tweaking this
depending on their runtime DMA addressing limitations.

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/include/asm/page.h   |  2 --
 arch/arm64/mm/init.c|  9 +++--
 arch/powerpc/include/asm/page.h |  9 -
 arch/powerpc/mm/mem.c   | 20 +++-
 arch/s390/include/asm/page.h|  2 --
 arch/s390/mm/init.c |  1 +
 include/linux/dma-direct.h  |  2 ++
 kernel/dma/direct.c | 13 ++---
 8 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 7b8c98830101..d39ddb258a04 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -38,6 +38,4 @@ extern int pfn_valid(unsigned long);
 
 #include 
 
-#define ARCH_ZONE_DMA_BITS 30
-
 #endif
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 44f07fdf7a59..ddd6a6ce158e 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -41,6 +42,8 @@
 #include 
 #include 
 
+#define ARM64_ZONE_DMA_BITS30
+
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -440,8 +443,10 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
-   if (IS_ENABLED(CONFIG_ZONE_DMA))
-   arm64_dma_phys_limit = max_zone_phys(ARCH_ZONE_DMA_BITS);
+   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
+   }
 
if (IS_ENABLED(CONFIG_ZONE_DMA32))
arm64_dma32_phys_limit = max_zone_phys(32);
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index c8bb14ff4713..f6c562acc3f8 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -329,13 +329,4 @@ struct vm_area_struct;
 #endif /* __ASSEMBLY__ */
 #include 
 
-/*
- * Allow 30-bit DMA for very limited Broadcom wifi chips on many powerbooks.
- */
-#ifdef CONFIG_PPC32
-#define ARCH_ZONE_DMA_BITS 30
-#else
-#define ARCH_ZONE_DMA_BITS 31
-#endif
-
 #endif /* _ASM_POWERPC_PAGE_H */
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 97e5922cb52e..8bab4e8b6bae 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -200,10 +201,10 @@ static int __init mark_nonram_nosave(void)
  * everything else. GFP_DMA32 page allocations automatically fall back to
  * ZONE_DMA.
  *
- * By using 31-bit unconditionally, we can exploit ARCH_ZONE_DMA_BITS to
- * inform the generic DMA mapping code.  32-bit only devices (if not handled
- * by an IOMMU anyway) will take a first dip into ZONE_NORMAL and get
- * otherwise served by ZONE_DMA.
+ * By using 31-bit unconditionally, we can exploit zone_dma_bits to inform the
+ * generic DMA mapping code.  32-bit only devices (if not handled by an IOMMU
+ * anyway) will take a first dip into ZONE_NORMAL and get otherwise served by
+ * ZONE_DMA.
  */
 static unsigned long max_zone_pfns[MAX_NR_ZONES];
 
@@ -236,9 +237,18 @@ void __init paging_init(void)
printk(KERN_DEBUG "Memory hole size: %ldMB\n",
   (long int)((top_of_ram - total_ram) >> 20));
 
+   /*
+* Allow 30-bit DMA for very limited Broadcom wifi chips on many
+* powerbooks.
+*/
+   if (IS_ENABLED(CONFIG_PPC32))
+   zone_dma_bits = 30;
+   else
+   zone_dma_bits = 31;
+
 #ifdef CONFIG_ZONE_DMA
max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
- 1UL << (ARCH_ZONE_DMA_BITS - PAGE_SHIFT));
+ 1UL << (zone_dma_bits - PAGE_SHIFT));
 #endif
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
 #ifdef CONFIG_HIGHMEM
diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 823578c6b9e2..a4d38092530a 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -177,8 +177,6 @@ static inline int devmem_is_allowed(unsigned long pfn)
 #define VM_DATA_DEFAULT_FLAGS  (VM_READ | VM_WRITE | \
 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
 
-#define ARCH_ZONE_DMA_BITS 31
-
 #include 
 #include 
 
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index c1d96e588152..ac44bd76db4b 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -118,6 +118,7 @@ void __init paging_init(void)
 
sparse_memory_present_with_active_regions(MAX_NUMNODES);
sparse_init();
+   zone_dma_bits = 31;
memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
max_zone_pfns[ZONE_DMA] = PFN_DOWN(MAX_DMA_ADDRESS);
max_zone_pfns[ZONE

[PATCH RFC 5/5] ARM: bcm2711: use dma-direct

2019-10-14 Thread Nicolas Saenz Julienne

The Raspberry Pi 4 supports up to 4GB of memory yet most of its devices
are only able to address the fist GB. Enable dma-direct for that board
in order to benefit from swiotlb's bounce buffering mechanism.

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm/mach-bcm/Kconfig   | 1 +
 arch/arm/mach-bcm/bcm2711.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/arm/mach-bcm/Kconfig b/arch/arm/mach-bcm/Kconfig
index e4e25f287ad7..fd7d725d596c 100644
--- a/arch/arm/mach-bcm/Kconfig
+++ b/arch/arm/mach-bcm/Kconfig
@@ -163,6 +163,7 @@ config ARCH_BCM2835
select ARM_ERRATA_411920 if ARCH_MULTI_V6
select ARM_GIC if ARCH_MULTI_V7
select ZONE_DMA if ARCH_MULTI_V7
+   select SWIOTLB if ARCH_MULTI_V7
select ARM_TIMER_SP804
select HAVE_ARM_ARCH_TIMER if ARCH_MULTI_V7
select TIMER_OF
diff --git a/arch/arm/mach-bcm/bcm2711.c b/arch/arm/mach-bcm/bcm2711.c
index dbe296798647..67d98cb0533f 100644
--- a/arch/arm/mach-bcm/bcm2711.c
+++ b/arch/arm/mach-bcm/bcm2711.c
@@ -19,6 +19,7 @@ DT_MACHINE_START(BCM2711, "BCM2711")
 #ifdef CONFIG_ZONE_DMA
.dma_zone_size  = SZ_1G,
 #endif
+   .dma_direct = true,
.dt_compat = bcm2711_compat,
.smp = smp_ops(bcm2836_smp_ops),
 MACHINE_END
-- 
2.23.0

[PATCH RFC 2/5] ARM: introduce arm_dma_direct

2019-10-14 Thread Nicolas Saenz Julienne

ARM devices might use the arch's custom dma-mapping implementation or
dma-direct/swiotlb depending on how the kernel is built. This is not
good enough as we need to be able to control the device's DMA ops based
on the specific machine configuration.

Centralise control over DMA ops with arm_dma_direct, a global variable
which will be set accordingly during init.

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm/include/asm/dma-mapping.h |  3 ++-
 arch/arm/include/asm/dma.h |  2 ++
 arch/arm/mm/dma-mapping.c  | 10 ++
 arch/arm/mm/init.c | 13 +
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index bdd80ddbca34..b19af5c55bee 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 #include 
@@ -18,7 +19,7 @@ extern const struct dma_map_ops arm_coherent_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-   if (IS_ENABLED(CONFIG_MMU) && !IS_ENABLED(CONFIG_ARM_LPAE))
+   if (IS_ENABLED(CONFIG_MMU) && !arm_dma_direct)
return &arm_dma_ops;
return NULL;
 }
diff --git a/arch/arm/include/asm/dma.h b/arch/arm/include/asm/dma.h
index a81dda65c576..d386719c53cd 100644
--- a/arch/arm/include/asm/dma.h
+++ b/arch/arm/include/asm/dma.h
@@ -14,6 +14,8 @@
(PAGE_OFFSET + arm_dma_zone_size) : 0xUL; })
 #endif
 
+extern bool arm_dma_direct __ro_after_init;
+
 #ifdef CONFIG_ISA_DMA_API
 /*
  * This is used to support drivers written for the x86 ISA DMA API.
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 13ef9f131975..172eea707cf7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -1100,14 +1101,7 @@ int arm_dma_supported(struct device *dev, u64 mask)
 
 static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 {
-   /*
-* When CONFIG_ARM_LPAE is set, physical address can extend above
-* 32-bits, which then can't be addressed by devices that only support
-* 32-bit DMA.
-* Use the generic dma-direct / swiotlb ops code in that case, as that
-* handles bounce buffering for us.
-*/
-   if (IS_ENABLED(CONFIG_ARM_LPAE))
+   if (arm_dma_direct)
return NULL;
return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
 }
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index b4be3baa83d4..0a63379a4d1a 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -105,8 +105,21 @@ static void __init arm_adjust_dma_zone(unsigned long 
*size, unsigned long *hole,
 }
 #endif
 
+bool arm_dma_direct __ro_after_init;
+EXPORT_SYMBOL(arm_dma_direct);
+
 void __init setup_dma_zone(const struct machine_desc *mdesc)
 {
+   /*
+* When CONFIG_ARM_LPAE is set, physical address can extend above
+* 32-bits, which then can't be addressed by devices that only support
+* 32-bit DMA.
+* Use the generic dma-direct / swiotlb ops code in that case, as that
+* handles bounce buffering for us.
+*/
+   if (IS_ENABLED(CONFIG_ARM_LPAE))
+   arm_dma_direct = true;
+
 #ifdef CONFIG_ZONE_DMA
if (mdesc->dma_zone_size) {
arm_dma_zone_size = mdesc->dma_zone_size;
-- 
2.23.0

[PATCH RFC 3/5] ARM: let machines select dma-direct over arch's DMA implementation

2019-10-14 Thread Nicolas Saenz Julienne

A bounce buffering feature is already available in ARM, dmabounce.c, yet
it doesn't support high memory which some devices need. Instead of
fixing it, provide a means for devices to enable dma-direct, which is the
preferred way of doing DMA now days.

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm/include/asm/mach/arch.h | 1 +
 arch/arm/mm/init.c   | 8 +++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index e7df5a822cab..3542bf502573 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -33,6 +33,7 @@ struct machine_desc {
 #ifdef CONFIG_ZONE_DMA
phys_addr_t dma_zone_size;  /* size of DMA-able area */
 #endif
+   booldma_direct;
 
unsigned intvideo_start;/* start of video RAM   */
unsigned intvideo_end;  /* end of video RAM */
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 0a63379a4d1a..556f70665353 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -119,6 +120,8 @@ void __init setup_dma_zone(const struct machine_desc *mdesc)
 */
if (IS_ENABLED(CONFIG_ARM_LPAE))
arm_dma_direct = true;
+   else
+   arm_dma_direct = mdesc->dma_direct;
 
 #ifdef CONFIG_ZONE_DMA
if (mdesc->dma_zone_size) {
@@ -126,7 +129,10 @@ void __init setup_dma_zone(const struct machine_desc 
*mdesc)
arm_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
} else
arm_dma_limit = 0x;
+
arm_dma_pfn_limit = arm_dma_limit >> PAGE_SHIFT;
+
+   zone_dma_bits = ilog2(arm_dma_limit) + 1;
 #endif
 }
 
@@ -482,7 +488,7 @@ static void __init free_highpages(void)
  */
 void __init mem_init(void)
 {
-#ifdef CONFIG_ARM_LPAE
+#ifdef CONFIG_SWIOTLB
swiotlb_init(1);
 #endif
 
-- 
2.23.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH RFC 4/5] dma/direct: check for overflows in ARM's dma_capable()

2019-10-14 Thread Nicolas Saenz Julienne

The Raspberry Pi 4 has a 1GB ZONE_DMA area starting at address
0x and a mapping between physical and DMA memory offset by
0xc000.  It transpires that, on non LPAE systems, any attempt to
translate physical addresses outside of ZONE_DMA will result in an
overflow. The resulting DMA addresses will not be detected by arm's
dma_capable() as they still fit in the device's DMA mask.

Fix this by failing to validate a DMA address smaller than the lowest
possible DMA address.

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm/include/asm/dma-direct.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/include/asm/dma-direct.h 
b/arch/arm/include/asm/dma-direct.h
index b67e5fc1fe43..ee8ad47a14e3 100644
--- a/arch/arm/include/asm/dma-direct.h
+++ b/arch/arm/include/asm/dma-direct.h
@@ -2,6 +2,8 @@
 #ifndef ASM_ARM_DMA_DIRECT_H
 #define ASM_ARM_DMA_DIRECT_H 1
 
+#include 
+
 static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
 {
unsigned int offset = paddr & ~PAGE_MASK;
@@ -21,6 +23,10 @@ static inline bool dma_capable(struct device *dev, 
dma_addr_t addr, size_t size)
if (!dev->dma_mask)
return 0;
 
+   /* Check if address overflowed */
+   if (addr < __phys_to_dma(dev, PFN_UP(min_low_pfn)))
+   return 0;
+
mask = *dev->dma_mask;
 
limit = (mask + 1) & ~mask;
-- 
2.23.0

Re: [PATCH RFC 0/5] ARM: Raspberry Pi 4 DMA support

2019-10-15 Thread Nicolas Saenz Julienne

On Mon, 2019-10-14 at 21:59 +0100, Catalin Marinas wrote:
> On Mon, Oct 14, 2019 at 08:31:02PM +0200, Nicolas Saenz Julienne wrote:
> > the Raspberry Pi 4 offers up to 4GB of memory, of which only the first
> > is DMA capable device wide. This forces us to use of bounce buffers,
> > which are currently not very well supported by ARM's custom DMA ops.
> > Among other things the current mechanism (see dmabounce.c) isn't
> > suitable for high memory. Instead of fixing it, this series introduces a
> > way of selecting dma-direct as the default DMA ops provider which allows
> > for the Raspberry Pi to make use of swiotlb.
> 
> I presume these patches go on top of this series:
> 
> http://lkml.kernel.org/r/20190911182546.17094-1-nsaenzjulie...@suse.de

Yes, forgot to mention it. It's relevant for the first patch.

> 
> which I queued here:
> 
> 
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/zone-dma

Thanks!

A little off topic but I was wondering if you have a preferred way to refer to
the arm architecture in a way that it unambiguously excludes arm64 (for example
arm32 would work).

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part

Re: [PATCH RFC 4/5] dma/direct: check for overflows in ARM's dma_capable()

2019-10-15 Thread Nicolas Saenz Julienne

On Tue, 2019-10-15 at 03:23 -0700, Christoph Hellwig wrote:
> On Mon, Oct 14, 2019 at 08:31:06PM +0200, Nicolas Saenz Julienne wrote:
> > The Raspberry Pi 4 has a 1GB ZONE_DMA area starting at address
> > 0x and a mapping between physical and DMA memory offset by
> > 0xc000.  It transpires that, on non LPAE systems, any attempt to
> > translate physical addresses outside of ZONE_DMA will result in an
> > overflow. The resulting DMA addresses will not be detected by arm's
> > dma_capable() as they still fit in the device's DMA mask.
> > 
> > Fix this by failing to validate a DMA address smaller than the lowest
> > possible DMA address.
> 
> I think the main problem here is that arm doesn't respect the
> bus_dma_mask.  If you replace the arm version of dma_capable with
> the generic one, does that fi the issue for you as well?

Yeah, that was my fist thought too, but it doesn't help.

So with RPi4's DMA mapping:

soc {
dma-ranges = <0xc000  0x0 0x  0x3c00>;
[...]
};

You'll get a 32bit bus dma map (log2i(0xc000 + 0x3c00) + 1 = 32).

Trouble is, taking into account arm's multi_v7_defconfig uses 32bit addresses,
most phys_to_dma() translations are likely to overflow. For example phys
0x6000 will be translated to DMA 0x2000, which is no good.

No mask is going to catch this, and both dma_capable() implementations will
fail.

> We need to untangle the various macros arm uses for the direct mapping
> and eventually we should be able to use the linux/dma-direct.h helpers
> directly.  Here is a branch with some simple preps I had.  Freshly
> rebased, not actually tested:
> 
> 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/arm-generic-dma-preps

Noted, looks good to me.

Actually, an alternative to this patch would be to kill all custom
dma_capable() implementations, which are mostly redundant, and add these extra
checks conditional to the DMA address size in the generic version. I'll try to
respin this if I manage to understand what's going on with x86/sta211-fixup.c.

signature.asc
Description: This is a digitally signed message part

[PATCH v2 1/2] dma-direct: check for overflows on 32 bit DMA addresses

2019-10-18 Thread Nicolas Saenz Julienne

As seen on the new Raspberry Pi 4 and sta2x11's DMA implementation it is
possible for a device configured with 32 bit DMA addresses and a partial
DMA mapping located at the end of the address space to overflow. It
happens when a higher physical address, not DMAable, is translated to
it's DMA counterpart.

For example the Raspberry Pi 4, configurable up to 4 GB of memory, has
an interconnect capable of addressing the lower 1 GB of physical memory
with a DMA offset of 0xc000. It transpires that, any attempt to
translate physical addresses higher than the first GB will result in an
overflow which dma_capable() can't detect as it only checks for
addresses bigger then the maximum allowed DMA address.

Fix this by verifying in dma_capable() if the DMA address range provided
is at any point lower than the minimum possible DMA address on the bus.

Signed-off-by: Nicolas Saenz Julienne 
---
 include/linux/dma-direct.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index adf993a3bd58..af4bb9c81ccc 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -3,6 +3,7 @@
 #define _LINUX_DMA_DIRECT_H 1
 
 #include 
+#include  /* for min_low_pfn */
 #include 
 
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
@@ -27,6 +28,13 @@ static inline bool dma_capable(struct device *dev, 
dma_addr_t addr, size_t size)
if (!dev->dma_mask)
return false;
 
+#ifndef CONFIG_ARCH_DMA_ADDR_T_64BIT
+   /* Check if DMA address overflowed */
+   if (min(addr, addr + size - 1) <
+   __phys_to_dma(dev, (phys_addr_t)(min_low_pfn << PAGE_SHIFT)))
+   return false;
+#endif
+
return addr + size - 1 <=
min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
 }
-- 
2.23.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 0/2] x86: Get rid of custom DMA functions

2019-10-18 Thread Nicolas Saenz Julienne

sta2x11 is the only x86 device that depends custom DMA direct functions.
It turns out it can be made standard by carefully setting the device's
DMA masks and offset.

Originally only patch #2 was sent but I realised patch #1 is also
needed, which is a good addition as it's also a prerequisite to get
proper DMA support on the Raspberry Pi 4[1].

[1] https://lkml.org/lkml/2019/10/15/523

---

Changes since v1:
  - Small cleanups in sta2x11-fixup.x
  - add patch checking DMA addresses lower bounds

Nicolas Saenz Julienne (2):
  dma-direct: check for overflows on 32 bit DMA addresses
  x86/PCI: sta2x11: use default DMA address translation ops

 arch/x86/Kconfig  |   1 -
 arch/x86/include/asm/device.h |   3 -
 arch/x86/include/asm/dma-direct.h |   9 --
 arch/x86/pci/sta2x11-fixup.c  | 135 ++
 include/linux/dma-direct.h|   8 ++
 5 files changed, 34 insertions(+), 122 deletions(-)
 delete mode 100644 arch/x86/include/asm/dma-direct.h

-- 
2.23.0

Re: [PATCH RFC 1/5] dma/direct: turn ARCH_ZONE_DMA_BITS into a variable

2019-10-31 Thread Nicolas Saenz Julienne

On Wed, 2019-10-30 at 14:49 -0700, Christoph Hellwig wrote:
> On Mon, Oct 14, 2019 at 08:31:03PM +0200, Nicolas Saenz Julienne wrote:
> > Some architectures, notably ARM, are interested in tweaking this
> > depending on their runtime DMA addressing limitations.
> > 
> > Signed-off-by: Nicolas Saenz Julienne 
> 
> Do you want me to pick this up for the 5.5 dma-mapping tree, or do you
> want me to wait for the rest to settle?

I'd say take it, this will be ultimately needed once we push forward with ARM.



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 1/2] dma-direct: check for overflows on 32 bit DMA addresses

2019-10-31 Thread Nicolas Saenz Julienne

On Wed, 2019-10-30 at 14:41 -0700, Christoph Hellwig wrote:
> On Fri, Oct 18, 2019 at 01:00:43PM +0200, Nicolas Saenz Julienne wrote:
> > +#ifndef CONFIG_ARCH_DMA_ADDR_T_64BIT
> > +   /* Check if DMA address overflowed */
> > +   if (min(addr, addr + size - 1) <
> > +   __phys_to_dma(dev, (phys_addr_t)(min_low_pfn << PAGE_SHIFT)))
> > +   return false;
> > +#endif
> 
> Would be nice to use IS_ENABLED and PFN_PHYS here, and I also think we
> need to use phys_to_dma to take care of the encryption bit.  If you then
> also introduce an end variable we can make the whole thing actually look
> nice:
> 
> static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t
> size)
> {
>   dma_addr_t end = addr + size - 1;
> 
> if (!dev->dma_mask)
> return false;
> 
>   if (!IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
>   min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
>   return false;
> 
> return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
> }
> 
> Otherwise this looks sensible to me.

Thanks, noted.



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH] dma/direct: turn ARCH_ZONE_DMA_BITS into a variable

2019-10-31 Thread Nicolas Saenz Julienne

Some architectures, notably ARM, are interested in tweaking this
depending on their runtime DMA addressing limitations.

Signed-off-by: Nicolas Saenz Julienne 
---

Changes since RFC:
 - Rebased to v5.4-rc6, fixed arm64 code.

NOTE: This will only apply to linux-next, where 
 arch/arm64/mm/init.c|  5 +
 arch/powerpc/include/asm/page.h |  9 -
 arch/powerpc/mm/mem.c   | 20 +++-
 arch/s390/include/asm/page.h|  2 --
 arch/s390/mm/init.c |  1 +
 include/linux/dma-direct.h  |  2 ++
 kernel/dma/direct.c | 13 ++---
 7 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 45c00a54909c..f716ea634804 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -41,6 +42,8 @@
 #include 
 #include 
 
+#define ARM64_ZONE_DMA_BITS30
+
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -424,6 +427,8 @@ void __init arm64_memblock_init(void)
else
arm64_dma_phys_limit = PHYS_MASK + 1;
 
+   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+
reserve_crashkernel();
 
reserve_elfcorehdr();
diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index c8bb14ff4713..f6c562acc3f8 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -329,13 +329,4 @@ struct vm_area_struct;
 #endif /* __ASSEMBLY__ */
 #include 
 
-/*
- * Allow 30-bit DMA for very limited Broadcom wifi chips on many powerbooks.
- */
-#ifdef CONFIG_PPC32
-#define ARCH_ZONE_DMA_BITS 30
-#else
-#define ARCH_ZONE_DMA_BITS 31
-#endif
-
 #endif /* _ASM_POWERPC_PAGE_H */
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index be941d382c8d..c95b7fe9f298 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -201,10 +202,10 @@ static int __init mark_nonram_nosave(void)
  * everything else. GFP_DMA32 page allocations automatically fall back to
  * ZONE_DMA.
  *
- * By using 31-bit unconditionally, we can exploit ARCH_ZONE_DMA_BITS to
- * inform the generic DMA mapping code.  32-bit only devices (if not handled
- * by an IOMMU anyway) will take a first dip into ZONE_NORMAL and get
- * otherwise served by ZONE_DMA.
+ * By using 31-bit unconditionally, we can exploit zone_dma_bits to inform the
+ * generic DMA mapping code.  32-bit only devices (if not handled by an IOMMU
+ * anyway) will take a first dip into ZONE_NORMAL and get otherwise served by
+ * ZONE_DMA.
  */
 static unsigned long max_zone_pfns[MAX_NR_ZONES];
 
@@ -237,9 +238,18 @@ void __init paging_init(void)
printk(KERN_DEBUG "Memory hole size: %ldMB\n",
   (long int)((top_of_ram - total_ram) >> 20));
 
+   /*
+* Allow 30-bit DMA for very limited Broadcom wifi chips on many
+* powerbooks.
+*/
+   if (IS_ENABLED(CONFIG_PPC32))
+   zone_dma_bits = 30;
+   else
+   zone_dma_bits = 31;
+
 #ifdef CONFIG_ZONE_DMA
max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
- 1UL << (ARCH_ZONE_DMA_BITS - PAGE_SHIFT));
+ 1UL << (zone_dma_bits - PAGE_SHIFT));
 #endif
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
 #ifdef CONFIG_HIGHMEM
diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 823578c6b9e2..a4d38092530a 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -177,8 +177,6 @@ static inline int devmem_is_allowed(unsigned long pfn)
 #define VM_DATA_DEFAULT_FLAGS  (VM_READ | VM_WRITE | \
 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
 
-#define ARCH_ZONE_DMA_BITS 31
-
 #include 
 #include 
 
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index a124f19f7b3c..f0ce0565 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -118,6 +118,7 @@ void __init paging_init(void)
 
sparse_memory_present_with_active_regions(MAX_NUMNODES);
sparse_init();
+   zone_dma_bits = 31;
memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
max_zone_pfns[ZONE_DMA] = PFN_DOWN(MAX_DMA_ADDRESS);
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index adf993a3bd58..d03af3605460 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -5,6 +5,8 @@
 #include 
 #include 
 
+extern unsigned int zone_dma_bits;
+
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include 
 #else
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8402b29c280f..0b67c04e531b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -16,12 +16,11

Re: [PATCH] dma/direct: turn ARCH_ZONE_DMA_BITS into a variable

2019-10-31 Thread Nicolas Saenz Julienne

On Thu, 2019-10-31 at 16:47 +0100, Christoph Hellwig wrote:
> On Thu, Oct 31, 2019 at 04:28:37PM +0100, Nicolas Saenz Julienne wrote:
> > Some architectures, notably ARM, are interested in tweaking this
> > depending on their runtime DMA addressing limitations.
> > 
> > Signed-off-by: Nicolas Saenz Julienne 
> > ---
> > 
> > Changes since RFC:
> >  - Rebased to v5.4-rc6, fixed arm64 code.
> > 
> > NOTE: This will only apply to linux-next, where 
> 
> missing end of the sentence.  But only applying to linux-next isn't
> going to help anyone..

Arrgh, excuse me, I meant to delete that line.

> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 45c00a54909c..f716ea634804 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -20,6 +20,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -41,6 +42,8 @@
> >  #include 
> >  #include 
> >  
> > +#define ARM64_ZONE_DMA_BITS30
> > +
> >  /*
> >   * We need to be able to catch inadvertent references to memstart_addr
> >   * that occur (potentially in generic code) before arm64_memblock_init()
> > @@ -424,6 +427,8 @@ void __init arm64_memblock_init(void)
> > else
> > arm64_dma_phys_limit = PHYS_MASK + 1;
> >  
> > +   zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > +
> > reserve_crashkernel();
> 
> This actually adds a new limit, as there wasn't one before for arm64.

Well, as zone_dma_bits is only relevant in dma/direct when ZONE_DMA is defined
I figured it doesn't matter if the variable is set conditionally to ZONE_DMA or
not.



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] dma/direct: turn ARCH_ZONE_DMA_BITS into a variable

2019-10-31 Thread Nicolas Saenz Julienne

On Thu, 2019-10-31 at 16:57 +0100, Christoph Hellwig wrote:
> On Thu, Oct 31, 2019 at 04:53:13PM +0100, Nicolas Saenz Julienne wrote:
> > > > +#define ARM64_ZONE_DMA_BITS30
> > > > +
> > > >  /*
> > > >   * We need to be able to catch inadvertent references to memstart_addr
> > > >   * that occur (potentially in generic code) before
> > > > arm64_memblock_init()
> > > > @@ -424,6 +427,8 @@ void __init arm64_memblock_init(void)
> > > > else
> > > > arm64_dma_phys_limit = PHYS_MASK + 1;
> > > >  
> > > > +   zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > > > +
> > > > reserve_crashkernel();
> > > 
> > > This actually adds a new limit, as there wasn't one before for arm64.
> > 
> > Well, as zone_dma_bits is only relevant in dma/direct when ZONE_DMA is
> > defined
> > I figured it doesn't matter if the variable is set conditionally to ZONE_DMA
> > or
> > not.
> 
> I'd much prefer that to do separately.

OK, I see what you mean now. It's wrong indeed.

The trouble is the ZONE_DMA series[1] in arm64, also due for v5.5, will be
affected by this patch. I don't know the right way to approach this problem
since depending on the merge order, this patch should be updated or the arm64
ZONE_DMA series fixed.

Maybe it's easier to just wait for v5.6.

Regards,
Nicolas

[1] https://lkml.org/lkml/2019/9/11/734



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH RFC 1/5] dma/direct: turn ARCH_ZONE_DMA_BITS into a variable

2019-10-31 Thread Nicolas Saenz Julienne

On Thu, 2019-10-31 at 06:38 -0700, Christoph Hellwig wrote:
> On Thu, Oct 31, 2019 at 11:30:36AM +0100, Nicolas Saenz Julienne wrote:
> > On Wed, 2019-10-30 at 14:49 -0700, Christoph Hellwig wrote:
> > > On Mon, Oct 14, 2019 at 08:31:03PM +0200, Nicolas Saenz Julienne wrote:
> > > > Some architectures, notably ARM, are interested in tweaking this
> > > > depending on their runtime DMA addressing limitations.
> > > > 
> > > > Signed-off-by: Nicolas Saenz Julienne 
> > > 
> > > Do you want me to pick this up for the 5.5 dma-mapping tree, or do you
> > > want me to wait for the rest to settle?
> > 
> > I'd say take it, this will be ultimately needed once we push forward with
> > ARM.
> 
> Can you resend a version that applies against 5.4-rc?  The current one
> has conflicts in the arm64 code.

Hi Catalin,
would you mind taking this patch on top of the arm64 ZONE_DMA series?

I tried to go though Christoph's tree but it ultimately clashes with the
ZONE_DMA series. It's simpler to apply it on top of your tree.

Regrads,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v3 1/2] dma-direct: check for overflows on 32 bit DMA addresses

2019-11-07 Thread Nicolas Saenz Julienne

As seen on the new Raspberry Pi 4 and sta2x11's DMA implementation it is
possible for a device configured with 32 bit DMA addresses and a partial
DMA mapping located at the end of the address space to overflow. It
happens when a higher physical address, not DMAable, is translated to
it's DMA counterpart.

For example the Raspberry Pi 4, configurable up to 4 GB of memory, has
an interconnect capable of addressing the lower 1 GB of physical memory
with a DMA offset of 0xc000. It transpires that, any attempt to
translate physical addresses higher than the first GB will result in an
overflow which dma_capable() can't detect as it only checks for
addresses bigger then the maximum allowed DMA address.

Fix this by verifying in dma_capable() if the DMA address range provided
is at any point lower than the minimum possible DMA address on the bus.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v2:
  - Cleanup code: use IS_ENABLED, a tmp variable for end, and
use phys_to_dma().

 include/linux/dma-direct.h | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index adf993a3bd58..6a18a97b76a8 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -3,8 +3,11 @@
 #define _LINUX_DMA_DIRECT_H 1
 
 #include 
+#include  /* for min_low_pfn */
 #include 
 
+static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr);
+
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include 
 #else
@@ -24,11 +27,16 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, 
dma_addr_t dev_addr)
 
 static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t 
size)
 {
+   dma_addr_t end = addr + size - 1;
+
if (!dev->dma_mask)
return false;
 
-   return addr + size - 1 <=
-   min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
+   if (!IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
+   min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
+   return false;
+
+   return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_mask);
 }
 #endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */
 
-- 
2.23.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v3 0/2] x86: Get rid of custom DMA functions

2019-11-07 Thread Nicolas Saenz Julienne

sta2x11 is the only x86 device that depends custom DMA direct functions.
It turns out it can be made standard by carefully setting the device's
DMA masks and offset.

Originally only patch #2 was sent but I realised patch #1 is also
needed, which is a good addition as it's also a prerequisite to get
proper DMA support on the Raspberry Pi 4[1].

[1] https://lkml.org/lkml/2019/10/15/523

---

Changes since v2:
  - cleanup dma-direct.h

Changes since v1:
  - Small cleanups in sta2x11-fixup.x
  - add patch checking DMA addresses lower bounds

Nicolas Saenz Julienne (2):
  dma-direct: check for overflows on 32 bit DMA addresses
  x86/PCI: sta2x11: use default DMA address translation

 arch/x86/Kconfig  |   1 -
 arch/x86/include/asm/device.h |   3 -
 arch/x86/include/asm/dma-direct.h |   9 --
 arch/x86/pci/sta2x11-fixup.c  | 135 ++
 include/linux/dma-direct.h|  12 ++-
 5 files changed, 36 insertions(+), 124 deletions(-)
 delete mode 100644 arch/x86/include/asm/dma-direct.h

-- 
2.23.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 1/6] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-12 Thread Nicolas Saenz Julienne

Some users need to make sure their rounding function accepts and returns
64bit long variables regardless of the architecture. Sadly
roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
new generic 64bit variant of the function and cleanup rougue custom
implementations.

Signed-off-by: Nicolas Saenz Julienne 
---
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |  3 +-
 drivers/pci/controller/pcie-cadence-ep.c  |  7 +--
 drivers/pci/controller/pcie-cadence.c |  7 +--
 drivers/pci/controller/pcie-rockchip-ep.c |  9 ++--
 include/linux/log2.h  | 52 +++
 kernel/dma/direct.c   |  3 +-
 6 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 024788549c25..027bd72505e2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -33,6 +33,7 @@
 
 #include 
 #include 
+#include 
 
 #include "mlx4_en.h"
 
@@ -252,7 +253,7 @@ static u32 freq_to_shift(u16 freq)
 {
u32 freq_khz = freq * 1000;
u64 max_val_cycles = freq_khz * 1000 * MLX4_EN_WRAP_AROUND_SEC;
-   u64 max_val_cycles_rounded = 1ULL << fls64(max_val_cycles - 1);
+   u64 max_val_cycles_rounded = roundup_pow_of_two64(max_val_cycles);
/* calculate max possible multiplier in order to fit in 64bit */
u64 max_mul = div64_u64(ULLONG_MAX, max_val_cycles_rounded);
 
diff --git a/drivers/pci/controller/pcie-cadence-ep.c 
b/drivers/pci/controller/pcie-cadence-ep.c
index def7820cb824..26ff424b16f5 100644
--- a/drivers/pci/controller/pcie-cadence-ep.c
+++ b/drivers/pci/controller/pcie-cadence-ep.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pcie-cadence.h"
 
@@ -90,11 +91,7 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
 
/* BAR size is 2^(aperture + 7) */
sz = max_t(size_t, epf_bar->size, CDNS_PCIE_EP_MIN_APERTURE);
-   /*
-* roundup_pow_of_two() returns an unsigned long, which is not suited
-* for 64bit values.
-*/
-   sz = 1ULL << fls64(sz - 1);
+   sz = roundup_pow_of_two64(sz);
aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
 
if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
diff --git a/drivers/pci/controller/pcie-cadence.c 
b/drivers/pci/controller/pcie-cadence.c
index cd795f6fc1e2..b2278e6b955c 100644
--- a/drivers/pci/controller/pcie-cadence.c
+++ b/drivers/pci/controller/pcie-cadence.c
@@ -4,6 +4,7 @@
 // Author: Cyrille Pitchen 
 
 #include 
+#include 
 
 #include "pcie-cadence.h"
 
@@ -11,11 +12,7 @@ void cdns_pcie_set_outbound_region(struct cdns_pcie *pcie, 
u8 fn,
   u32 r, bool is_io,
   u64 cpu_addr, u64 pci_addr, size_t size)
 {
-   /*
-* roundup_pow_of_two() returns an unsigned long, which is not suited
-* for 64bit values.
-*/
-   u64 sz = 1ULL << fls64(size - 1);
+   u64 sz = roundup_pow_of_two64(size);
int nbits = ilog2(sz);
u32 addr0, addr1, desc0, desc1;
 
diff --git a/drivers/pci/controller/pcie-rockchip-ep.c 
b/drivers/pci/controller/pcie-rockchip-ep.c
index d743b0a48988..ed50aaf27784 100644
--- a/drivers/pci/controller/pcie-rockchip-ep.c
+++ b/drivers/pci/controller/pcie-rockchip-ep.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pcie-rockchip.h"
 
@@ -70,7 +71,7 @@ static void rockchip_pcie_prog_ep_ob_atu(struct rockchip_pcie 
*rockchip, u8 fn,
 u32 r, u32 type, u64 cpu_addr,
 u64 pci_addr, size_t size)
 {
-   u64 sz = 1ULL << fls64(size - 1);
+   u64 sz = roundup_pow_of_two64(size);
int num_pass_bits = ilog2(sz);
u32 addr0, addr1, desc0, desc1;
bool is_nor_msg = (type == AXI_WRAPPER_NOR_MSG);
@@ -172,11 +173,7 @@ static int rockchip_pcie_ep_set_bar(struct pci_epc *epc, 
u8 fn,
/* BAR size is 2^(aperture + 7) */
sz = max_t(size_t, epf_bar->size, MIN_EP_APERTURE);
 
-   /*
-* roundup_pow_of_two() returns an unsigned long, which is not suited
-* for 64bit values.
-*/
-   sz = 1ULL << fls64(sz - 1);
+   sz = roundup_pow_of_two64(sz);
aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
 
if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
diff --git a/include/linux/log2.h b/include/linux/log2.h
index 83a4a3ca3e8a..db12d92ab6eb 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -67,6 +67,24 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
return 1UL << (fls_long(n) - 1);
 }
 
+/**
+ * __roundup_pow_of_two64() - rou

[PATCH v2 0/6] Raspberry Pi 4 PCIe support

2019-11-12 Thread Nicolas Saenz Julienne

This series aims at providing support for Raspberry Pi 4's PCIe
controller, which is also shared with the Broadcom STB family of
devices.

There was a previous attempt to upstream this some years ago[1] but was
blocked as most STB PCIe integrations have a sparse DMA mapping[2] which
is something currently not supported by the kernel.  Luckily this is not
the case for the Raspberry Pi 4.

Note that the driver code is to be based on top of Rob Herring's series
simplifying inbound and outbound range parsing.

[1] https://patchwork.kernel.org/cover/10605933/
[2] https://patchwork.kernel.org/patch/10605957/

---

Changes since v1:
  - add generic rounddown/roundup_pow_two64() patch
  - Add MAINTAINERS patch
  - Fix Kconfig
  - Cleanup probe, use up to date APIs, exit on MSI failure
  - Get rid of linux,pci-domain and other unused constructs
  - Use edge triggered setup for MSI
  - Cleanup MSI implementation
  - Fix multiple cosmetic issues
  - Remove supend/resume code

Jim Quinlan (3):
  dt-bindings: PCI: Add bindings for brcmstb's PCIe device
  PCI: brcmstb: add Broadcom STB PCIe host controller driver
  PCI: brcmstb: add MSI capability

Nicolas Saenz Julienne (3):
  linux/log2.h: Add roundup/rounddown_pow_two64() family of functions
  ARM: dts: bcm2711: Enable PCIe controller
  MAINTAINERS: Add brcmstb PCIe controller

 .../bindings/pci/brcm,stb-pcie.yaml   |  110 ++
 MAINTAINERS   |4 +
 arch/arm/boot/dts/bcm2711.dtsi|   46 +
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |3 +-
 drivers/pci/controller/Kconfig|9 +
 drivers/pci/controller/Makefile   |1 +
 drivers/pci/controller/pcie-brcmstb.c | 1179 +
 drivers/pci/controller/pcie-cadence-ep.c  |7 +-
 drivers/pci/controller/pcie-cadence.c |7 +-
 drivers/pci/controller/pcie-rockchip-ep.c |9 +-
 include/linux/log2.h  |   52 +
 kernel/dma/direct.c   |3 +-
 12 files changed, 1412 insertions(+), 18 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
 create mode 100644 drivers/pci/controller/pcie-brcmstb.c

-- 
2.24.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: unify the dma_capable definition

2019-11-13 Thread Nicolas Saenz Julienne

On Wed, 2019-11-13 at 08:35 +0100, Christoph Hellwig wrote:
> Hi all,
> 
> there is no good reason to have different version of dma_capable.
> This series removes the arch overrides and also adds a few cleanups
> in the same area.

Reviewed-by: Nicolas Saenz Julienne 

for the whole series,
Thanks!



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH] dma-mapping: treat dev->bus_dma_mask as a DMA limit

2019-11-13 Thread Nicolas Saenz Julienne

Using a mask to represent bus DMA constraints has a set of limitations.
The biggest one being it can only hold a power of two (minus one). The
DMA mapping code is already aware of this and treats dev->bus_dma_mask
as a limit. This quirk is already used by some architectures although
still rare.

With the introduction of the Raspberry Pi 4 we've found a new contender
for the use of bus DMA limits, as its PCIe bus can only address the
lower 3GB of memory (of a total of 4GB). This is impossible to represent
with a mask. To make things worse the device-tree code rounds non power
of two bus DMA limits to the next power of two, which is unacceptable in
this case.

In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
over the tree and treat it as such. Note that dev->bus_dma_limit is
meant to contain the higher accesible DMA address.

Signed-off-by: Nicolas Saenz Julienne 

---

Note this is rebased on top of Christoph's latest DMA series:
https://www.spinics.net/lists/arm-kernel/msg768600.html

 arch/mips/pci/fixup-sb1250.c  | 16 
 arch/powerpc/sysdev/fsl_pci.c |  6 +++---
 arch/x86/kernel/pci-dma.c |  2 +-
 arch/x86/mm/mem_encrypt.c |  2 +-
 arch/x86/pci/sta2x11-fixup.c  |  2 +-
 drivers/acpi/arm64/iort.c |  2 +-
 drivers/ata/ahci.c|  2 +-
 drivers/iommu/dma-iommu.c |  3 +--
 drivers/of/device.c   |  9 +
 include/linux/device.h|  6 +++---
 include/linux/dma-direct.h|  2 +-
 include/linux/dma-mapping.h   |  2 +-
 kernel/dma/direct.c   | 27 +--
 13 files changed, 40 insertions(+), 41 deletions(-)

diff --git a/arch/mips/pci/fixup-sb1250.c b/arch/mips/pci/fixup-sb1250.c
index 8a41b359cf90..40efc990cdce 100644
--- a/arch/mips/pci/fixup-sb1250.c
+++ b/arch/mips/pci/fixup-sb1250.c
@@ -21,22 +21,22 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SIBYTE, 
PCI_DEVICE_ID_BCM1250_PCI,
 
 /*
  * The BCM1250, etc. PCI host bridge does not support DAC on its 32-bit
- * bus, so we set the bus's DMA mask accordingly.  However the HT link
+ * bus, so we set the bus's DMA limit accordingly.  However the HT link
  * down the artificial PCI-HT bridge supports 40-bit addressing and the
  * SP1011 HT-PCI bridge downstream supports both DAC and a 64-bit bus
  * width, so we record the PCI-HT bridge's secondary and subordinate bus
- * numbers and do not set the mask for devices present in the inclusive
+ * numbers and do not set the limit for devices present in the inclusive
  * range of those.
  */
-struct sb1250_bus_dma_mask_exclude {
+struct sb1250_bus_dma_limit_exclude {
bool set;
unsigned char start;
unsigned char end;
 };
 
-static int sb1250_bus_dma_mask(struct pci_dev *dev, void *data)
+static int sb1250_bus_dma_limit(struct pci_dev *dev, void *data)
 {
-   struct sb1250_bus_dma_mask_exclude *exclude = data;
+   struct sb1250_bus_dma_limit_exclude *exclude = data;
bool exclude_this;
bool ht_bridge;
 
@@ -55,7 +55,7 @@ static int sb1250_bus_dma_mask(struct pci_dev *dev, void 
*data)
exclude->start, exclude->end);
} else {
dev_dbg(&dev->dev, "disabling DAC for device");
-   dev->dev.bus_dma_mask = DMA_BIT_MASK(32);
+   dev->dev.bus_dma_limit = DMA_BIT_MASK(32);
}
 
return 0;
@@ -63,9 +63,9 @@ static int sb1250_bus_dma_mask(struct pci_dev *dev, void 
*data)
 
 static void quirk_sb1250_pci_dac(struct pci_dev *dev)
 {
-   struct sb1250_bus_dma_mask_exclude exclude = { .set = false };
+   struct sb1250_bus_dma_limit_exclude exclude = { .set = false };
 
-   pci_walk_bus(dev->bus, sb1250_bus_dma_mask, &exclude);
+   pci_walk_bus(dev->bus, sb1250_bus_dma_limit, &exclude);
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SIBYTE, PCI_DEVICE_ID_BCM1250_PCI,
quirk_sb1250_pci_dac);
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index ff0e2b156cb5..617a443d673d 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -115,8 +115,8 @@ static void pci_dma_dev_setup_swiotlb(struct pci_dev *pdev)
 {
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 
-   pdev->dev.bus_dma_mask =
-   hose->dma_window_base_cur + hose->dma_window_size;
+   pdev->dev.bus_dma_limit =
+   hose->dma_window_base_cur + hose->dma_window_size - 1;
 }
 
 static void setup_swiotlb_ops(struct pci_controller *hose)
@@ -135,7 +135,7 @@ static void fsl_pci_dma_set_mask(struct device *dev, u64 
dma_mask)
 * mapping that allows addressing any RAM address from across PCI.
 */
if (dev_is_pci(dev) && dma_mask >= pci64_dma_offset * 2 - 1) {
-   dev->bus_dma_mask = 0;
+   dev->bus_dma_limit = 0;
dev->archdata.dma_offs

Re: [PATCH] dma-mapping: treat dev->bus_dma_mask as a DMA limit

2019-11-14 Thread Nicolas Saenz Julienne

On Wed, 2019-11-13 at 20:34 +, Robin Murphy wrote:
> On 13/11/2019 4:13 pm, Nicolas Saenz Julienne wrote:
> > Using a mask to represent bus DMA constraints has a set of limitations.
> > The biggest one being it can only hold a power of two (minus one). The
> > DMA mapping code is already aware of this and treats dev->bus_dma_mask
> > as a limit. This quirk is already used by some architectures although
> > still rare.
> > 
> > With the introduction of the Raspberry Pi 4 we've found a new contender
> > for the use of bus DMA limits, as its PCIe bus can only address the
> > lower 3GB of memory (of a total of 4GB). This is impossible to represent
> > with a mask. To make things worse the device-tree code rounds non power
> > of two bus DMA limits to the next power of two, which is unacceptable in
> > this case.
> > 
> > In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
> > over the tree and treat it as such. Note that dev->bus_dma_limit is
> > meant to contain the higher accesible DMA address.
> 
> Neat, you win a "why didn't I do it that way in the first place?" :)

:)

> Looking at it without all the history of previous attempts, this looks 
> entirely reasonable, and definitely a step in the right direction.
> 
> [...]
> > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> > index 5a7551d060f2..f18827cf96df 100644
> > --- a/drivers/acpi/arm64/iort.c
> > +++ b/drivers/acpi/arm64/iort.c
> > @@ -1097,7 +1097,7 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr,
> > u64 *dma_size)
> >  * Limit coherent and dma mask based on size
> >  * retrieved from firmware.
> >  */
> > -   dev->bus_dma_mask = mask;
> > +   dev->bus_dma_limit = mask;
> 
> Although this preserves the existing behaviour, as in of_dma_configure() 
> we can do better here since we have the original address range to hand. 
> I think it's worth keeping the ACPI and OF paths in sync for minor 
> tweaks like this, rather than letting them diverge unnecessarily.

I figure you mean something like this:

@@ -1085,19 +1085,15 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr,
u64 *dma_size)
}

if (!ret) {
-   msb = fls64(dmaaddr + size - 1);
-   /*
-* Round-up to the power-of-two mask or set
-* the mask to the whole 64-bit address space
-* in case the DMA region covers the full
-* memory window.
-*/
-   mask = msb == 64 ? U64_MAX : (1ULL << msb) - 1;
+   /* Round-up to the power-of-two */
+   end = dmaddr + size - 1;
+   mask = DMA_BIT_MASK(ilog2(end) + 1);
+
/*
 * Limit coherent and dma mask based on size
 * retrieved from firmware.
 */
-   dev->bus_dma_limit = mask;
+   dev->bus_dma_limit = end;
dev->coherent_dma_mask = mask;
*dev->dma_mask = mask;
}

> Otherwise, the rest looks OK to me - in principle we could store it as 
> an exclusive limit such that we could then streamline the min_not_zero() 
> tests to just min(mask, limit - 1), but that's probably too clever for 
> its own good.

Yes, that was my first intuition and in a perfect world I'd prefer it like
that. But as you say, it's probably going to cause more trouble than anything.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] dma-mapping: treat dev->bus_dma_mask as a DMA limit

2019-11-21 Thread Nicolas Saenz Julienne

On Thu, 2019-11-21 at 08:31 +0100, Christoph Hellwig wrote:
> On Tue, Nov 19, 2019 at 05:17:03PM +, Robin Murphy wrote:
> > TBH I can't see it being a massive problem even if the DMA patch, driver 
> > and DTS patch went entirely separately via the respective DMA, PCI, and 
> > arm-soc trees in the same cycle. Bisecting over a merge window is a big 
> > enough pain in the bum as it is, and if the worst case is that someone 
> > trying to do that on a Pi4 has a wonky PCI controller appear for a couple 
> > of commits, they may as well just disable that driver for their bisection, 
> > because it wasn't there at the start so can't possibly be the thing they're 
> > looking for regressions in ;)
> 
> Agreed.
> 
> Nicolas, can you send a respin?  That way I can still queue it up
> for 5.5.

Oh, I thought it was too late for that already. I'll send it in a minute.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2] dma-mapping: treat dev->bus_dma_mask as a DMA limit

2019-11-21 Thread Nicolas Saenz Julienne

Using a mask to represent bus DMA constraints has a set of limitations.
The biggest one being it can only hold a power of two (minus one). The
DMA mapping code is already aware of this and treats dev->bus_dma_mask
as a limit. This quirk is already used by some architectures although
still rare.

With the introduction of the Raspberry Pi 4 we've found a new contender
for the use of bus DMA limits, as its PCIe bus can only address the
lower 3GB of memory (of a total of 4GB). This is impossible to represent
with a mask. To make things worse the device-tree code rounds non power
of two bus DMA limits to the next power of two, which is unacceptable in
this case.

In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
over the tree and treat it as such. Note that dev->bus_dma_limit should
contain the higher accesible DMA address.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v1:
  - rework ACPI code to avoid divergence with OF's

 arch/mips/pci/fixup-sb1250.c  | 16 
 arch/powerpc/sysdev/fsl_pci.c |  6 +++---
 arch/x86/kernel/pci-dma.c |  2 +-
 arch/x86/mm/mem_encrypt.c |  2 +-
 arch/x86/pci/sta2x11-fixup.c  |  2 +-
 drivers/acpi/arm64/iort.c | 20 +++-
 drivers/ata/ahci.c|  2 +-
 drivers/iommu/dma-iommu.c |  3 +--
 drivers/of/device.c   |  9 +
 include/linux/device.h|  6 +++---
 include/linux/dma-direct.h|  2 +-
 include/linux/dma-mapping.h   |  2 +-
 kernel/dma/direct.c   | 27 +--
 13 files changed, 46 insertions(+), 53 deletions(-)

diff --git a/arch/mips/pci/fixup-sb1250.c b/arch/mips/pci/fixup-sb1250.c
index 8a41b359cf90..40efc990cdce 100644
--- a/arch/mips/pci/fixup-sb1250.c
+++ b/arch/mips/pci/fixup-sb1250.c
@@ -21,22 +21,22 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SIBYTE, 
PCI_DEVICE_ID_BCM1250_PCI,
 
 /*
  * The BCM1250, etc. PCI host bridge does not support DAC on its 32-bit
- * bus, so we set the bus's DMA mask accordingly.  However the HT link
+ * bus, so we set the bus's DMA limit accordingly.  However the HT link
  * down the artificial PCI-HT bridge supports 40-bit addressing and the
  * SP1011 HT-PCI bridge downstream supports both DAC and a 64-bit bus
  * width, so we record the PCI-HT bridge's secondary and subordinate bus
- * numbers and do not set the mask for devices present in the inclusive
+ * numbers and do not set the limit for devices present in the inclusive
  * range of those.
  */
-struct sb1250_bus_dma_mask_exclude {
+struct sb1250_bus_dma_limit_exclude {
bool set;
unsigned char start;
unsigned char end;
 };
 
-static int sb1250_bus_dma_mask(struct pci_dev *dev, void *data)
+static int sb1250_bus_dma_limit(struct pci_dev *dev, void *data)
 {
-   struct sb1250_bus_dma_mask_exclude *exclude = data;
+   struct sb1250_bus_dma_limit_exclude *exclude = data;
bool exclude_this;
bool ht_bridge;
 
@@ -55,7 +55,7 @@ static int sb1250_bus_dma_mask(struct pci_dev *dev, void 
*data)
exclude->start, exclude->end);
} else {
dev_dbg(&dev->dev, "disabling DAC for device");
-   dev->dev.bus_dma_mask = DMA_BIT_MASK(32);
+   dev->dev.bus_dma_limit = DMA_BIT_MASK(32);
}
 
return 0;
@@ -63,9 +63,9 @@ static int sb1250_bus_dma_mask(struct pci_dev *dev, void 
*data)
 
 static void quirk_sb1250_pci_dac(struct pci_dev *dev)
 {
-   struct sb1250_bus_dma_mask_exclude exclude = { .set = false };
+   struct sb1250_bus_dma_limit_exclude exclude = { .set = false };
 
-   pci_walk_bus(dev->bus, sb1250_bus_dma_mask, &exclude);
+   pci_walk_bus(dev->bus, sb1250_bus_dma_limit, &exclude);
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SIBYTE, PCI_DEVICE_ID_BCM1250_PCI,
quirk_sb1250_pci_dac);
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index ff0e2b156cb5..617a443d673d 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -115,8 +115,8 @@ static void pci_dma_dev_setup_swiotlb(struct pci_dev *pdev)
 {
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
 
-   pdev->dev.bus_dma_mask =
-   hose->dma_window_base_cur + hose->dma_window_size;
+   pdev->dev.bus_dma_limit =
+   hose->dma_window_base_cur + hose->dma_window_size - 1;
 }
 
 static void setup_swiotlb_ops(struct pci_controller *hose)
@@ -135,7 +135,7 @@ static void fsl_pci_dma_set_mask(struct device *dev, u64 
dma_mask)
 * mapping that allows addressing any RAM address from across PCI.
 */
if (dev_is_pci(dev) && dma_mask >= pci64_dma_offset * 2 - 1) {
-   dev->bus_dma_mask = 0;
+   dev->bus_dma_limit = 0;
dev->archdata.dma_offset = pci64_dma_offset;
}
 }

Re: [PATCH v2] dma-mapping: treat dev->bus_dma_mask as a DMA limit

2019-11-21 Thread Nicolas Saenz Julienne

On Thu, 2019-11-21 at 16:24 +0100, Christoph Hellwig wrote:
> On Thu, Nov 21, 2019 at 10:26:44AM +0100, Nicolas Saenz Julienne wrote:
> > Using a mask to represent bus DMA constraints has a set of limitations.
> > The biggest one being it can only hold a power of two (minus one). The
> > DMA mapping code is already aware of this and treats dev->bus_dma_mask
> > as a limit. This quirk is already used by some architectures although
> > still rare.
> > 
> > With the introduction of the Raspberry Pi 4 we've found a new contender
> > for the use of bus DMA limits, as its PCIe bus can only address the
> > lower 3GB of memory (of a total of 4GB). This is impossible to represent
> > with a mask. To make things worse the device-tree code rounds non power
> > of two bus DMA limits to the next power of two, which is unacceptable in
> > this case.
> > 
> > In the light of this, rename dev->bus_dma_mask to dev->bus_dma_limit all
> > over the tree and treat it as such. Note that dev->bus_dma_limit should
> > contain the higher accesible DMA address.
> > 
> > Signed-off-by: Nicolas Saenz Julienne 
> 
> I've tentatively added this patch to the dma-mapping tree based on
> Robins principal approval of the last version.  That way tomorrows
> linux-next run should still pick it up.

Thanks!



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v3 0/7] Raspberry Pi 4 PCIe support

2019-11-26 Thread Nicolas Saenz Julienne

This series aims at providing support for Raspberry Pi 4's PCIe
controller, which is also shared with the Broadcom STB family of
devices.

There was a previous attempt to upstream this some years ago[1] but was
blocked as most STB PCIe integrations have a sparse DMA mapping[2] which
is something currently not supported by the kernel.  Luckily this is not
the case for the Raspberry Pi 4.

Note that the driver code is to be based on top of Rob Herring's series
simplifying inbound and outbound range parsing.

[1] https://patchwork.kernel.org/cover/10605933/
[2] https://patchwork.kernel.org/patch/10605957/

---

Changes since v2:
  - Redo register access in driver avoiding indirection while keeping
the naming intact
  - Add patch editing ARM64's config
  - Last MSI cleanups, notably removing MSIX flag
  - Got rid of all _RB writes
  - Got rid of all of_data
  - Overall churn removal
  - Address the rest of Andrew's comments

Changes since v1:
  - add generic rounddown/roundup_pow_two64() patch
  - Add MAINTAINERS patch
  - Fix Kconfig
  - Cleanup probe, use up to date APIs, exit on MSI failure
  - Get rid of linux,pci-domain and other unused constructs
  - Use edge triggered setup for MSI
  - Cleanup MSI implementation
  - Fix multiple cosmetic issues
  - Remove supend/resume code

Jim Quinlan (3):
  dt-bindings: PCI: Add bindings for brcmstb's PCIe device
  PCI: brcmstb: add Broadcom STB PCIe host controller driver
  PCI: brcmstb: add MSI capability

Nicolas Saenz Julienne (4):
  linux/log2.h: Add roundup/rounddown_pow_two64() family of functions
  ARM: dts: bcm2711: Enable PCIe controller
  MAINTAINERS: Add brcmstb PCIe controller
  arm64: defconfig: Enable Broadcom's STB PCIe controller

 .../bindings/pci/brcm,stb-pcie.yaml   |   97 ++
 MAINTAINERS   |4 +
 arch/arm/boot/dts/bcm2711.dtsi|   41 +
 arch/arm64/configs/defconfig  |1 +
 drivers/acpi/arm64/iort.c |2 +-
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |3 +-
 drivers/of/device.c   |2 +-
 drivers/pci/controller/Kconfig|9 +
 drivers/pci/controller/Makefile   |1 +
 .../pci/controller/cadence/pcie-cadence-ep.c  |7 +-
 drivers/pci/controller/cadence/pcie-cadence.c |7 +-
 drivers/pci/controller/pcie-brcmstb.c | 1012 +
 drivers/pci/controller/pcie-rockchip-ep.c |9 +-
 include/linux/log2.h  |   52 +
 kernel/dma/direct.c   |3 +-
 15 files changed, 1230 insertions(+), 20 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
 create mode 100644 drivers/pci/controller/pcie-brcmstb.c

-- 
2.24.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-26 Thread Nicolas Saenz Julienne

Some users need to make sure their rounding function accepts and returns
64bit long variables regardless of the architecture. Sadly
roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
new generic 64bit variant of the function and cleanup rougue custom
implementations.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v2:
  - Use u64
  - Rename function to roundup/down_pow_two_u64()
  - Use 1ULL instead of 1UL
  - Include function usage in of/device.c and acpi/arm64/iort.c

 drivers/acpi/arm64/iort.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |  3 +-
 drivers/of/device.c   |  2 +-
 .../pci/controller/cadence/pcie-cadence-ep.c  |  7 +--
 drivers/pci/controller/cadence/pcie-cadence.c |  7 +--
 drivers/pci/controller/pcie-rockchip-ep.c |  9 ++--
 include/linux/log2.h  | 52 +++
 kernel/dma/direct.c   |  3 +-
 8 files changed, 65 insertions(+), 20 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 33f71983e001..4809d3757d71 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1090,7 +1090,7 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, 
u64 *dma_size)
 * firmware.
 */
end = dmaaddr + size - 1;
-   mask = DMA_BIT_MASK(ilog2(end) + 1);
+   mask = roundup_pow_of_two_u64(end) - 1;
dev->bus_dma_limit = end;
dev->coherent_dma_mask = mask;
*dev->dma_mask = mask;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 024788549c25..ba5368e221f3 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -33,6 +33,7 @@
 
 #include 
 #include 
+#include 
 
 #include "mlx4_en.h"
 
@@ -252,7 +253,7 @@ static u32 freq_to_shift(u16 freq)
 {
u32 freq_khz = freq * 1000;
u64 max_val_cycles = freq_khz * 1000 * MLX4_EN_WRAP_AROUND_SEC;
-   u64 max_val_cycles_rounded = 1ULL << fls64(max_val_cycles - 1);
+   u64 max_val_cycles_rounded = roundup_pow_of_two_u64(max_val_cycles);
/* calculate max possible multiplier in order to fit in 64bit */
u64 max_mul = div64_u64(ULLONG_MAX, max_val_cycles_rounded);
 
diff --git a/drivers/of/device.c b/drivers/of/device.c
index e9127db7b067..418d7d014af1 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -149,7 +149,7 @@ int of_dma_configure(struct device *dev, struct device_node 
*np, bool force_dma)
 * set by the driver.
 */
end = dma_addr + size - 1;
-   mask = DMA_BIT_MASK(ilog2(end) + 1);
+   mask = roundup_pow_of_two_u64(end) - 1;
dev->coherent_dma_mask &= mask;
*dev->dma_mask &= mask;
/* ...but only set bus limit if we found valid dma-ranges earlier */
diff --git a/drivers/pci/controller/cadence/pcie-cadence-ep.c 
b/drivers/pci/controller/cadence/pcie-cadence-ep.c
index 1c173dad67d1..f6e37eb67c68 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pcie-cadence.h"
 
@@ -61,11 +62,7 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
 
/* BAR size is 2^(aperture + 7) */
sz = max_t(size_t, epf_bar->size, CDNS_PCIE_EP_MIN_APERTURE);
-   /*
-* roundup_pow_of_two() returns an unsigned long, which is not suited
-* for 64bit values.
-*/
-   sz = 1ULL << fls64(sz - 1);
+   sz = roundup_pow_of_two_u64(sz);
aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
 
if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
diff --git a/drivers/pci/controller/cadence/pcie-cadence.c 
b/drivers/pci/controller/cadence/pcie-cadence.c
index cd795f6fc1e2..2ddda5ac60c4 100644
--- a/drivers/pci/controller/cadence/pcie-cadence.c
+++ b/drivers/pci/controller/cadence/pcie-cadence.c
@@ -4,6 +4,7 @@
 // Author: Cyrille Pitchen 
 
 #include 
+#include 
 
 #include "pcie-cadence.h"
 
@@ -11,11 +12,7 @@ void cdns_pcie_set_outbound_region(struct cdns_pcie *pcie, 
u8 fn,
   u32 r, bool is_io,
   u64 cpu_addr, u64 pci_addr, size_t size)
 {
-   /*
-* roundup_pow_of_two() returns an unsigned long, which is not suited
-* for 64bit values.
-*/
-   u64 sz = 1ULL << fls64(size - 1);
+   u64 sz = roundup_pow_of_two_u64(size);
int nbits = ilog2(sz);
u32 addr0, addr1, desc0, desc1;
 
diff --git a/drivers/pci/controller/pcie-rockchip-ep.c 
b/drivers/pci/controller/pcie-rockchip-ep.c
index d743b0a48988..343f9683b540 100644
--- a/drivers/pci/contr

Re: [PATCH v2] dma-mapping: treat dev->bus_dma_mask as a DMA limit

2019-11-26 Thread Nicolas Saenz Julienne

On Mon, 2019-11-25 at 16:33 +, Robin Murphy wrote:
> On 25/11/2019 7:44 am, Christoph Hellwig wrote:
> > On Sat, Nov 23, 2019 at 09:51:08AM -0700, Nathan Chancellor wrote:
> > > Just as an FYI, this introduces a warning on arm32 allyesconfig for me:
> > 
> > I think the dma_limit argument to iommu_dma_alloc_iova should be a u64
> > and/or we need to use min_t and open code the zero exception.
> > 
> > Robin, Nicolas - any opinions?
> 
> Yeah, given that it's always held a mask I'm not entirely sure why it 
> was ever a dma_addr_t rather than a u64. Unless anyone else is desperate 
> to do it I'll get a cleanup patch ready for rc1.

Sounds good to me too

Robin, since I started the mess, I'll be happy to do it if it helps offloading
some work from you.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 0/7] Raspberry Pi 4 PCIe support

2019-11-27 Thread Nicolas Saenz Julienne

Hi Bjorn,

On Tue, 2019-11-26 at 15:50 -0600, Bjorn Helgaas wrote:
> On Tue, Nov 26, 2019 at 10:19:38AM +0100, Nicolas Saenz Julienne wrote:
> > This series aims at providing support for Raspberry Pi 4's PCIe
> > controller, which is also shared with the Broadcom STB family of
> > devices.
> > Jim Quinlan (3):
> >   dt-bindings: PCI: Add bindings for brcmstb's PCIe device
> >   PCI: brcmstb: add Broadcom STB PCIe host controller driver
> >   PCI: brcmstb: add MSI capability
> 
> Please update these subjects to match the others, i.e., capitalize
> "Add".  Also, I think "Add MSI capability" really means "Add support
> for MSI ..."; in PCIe terms the "MSI Capability" is a structure in
> config space and it's there whether the OS supports it or not.
> 
> No need to repost just for this.

Noted, I'll update them.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-27 Thread Nicolas Saenz Julienne

On Tue, 2019-11-26 at 10:19 +0100, Nicolas Saenz Julienne wrote:
> Some users need to make sure their rounding function accepts and returns
> 64bit long variables regardless of the architecture. Sadly
> roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
> new generic 64bit variant of the function and cleanup rougue custom
> implementations.
> 
> Signed-off-by: Nicolas Saenz Julienne 

Small Nit: I corrected the patch subject for next version.

linux/log2.h: Add roundup/rounddown_pow_two_u64() family of functions

Note the change here:  

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-27 Thread Nicolas Saenz Julienne

On Wed, 2019-11-27 at 18:06 +, Robin Murphy wrote:
> On 26/11/2019 12:51 pm, Leon Romanovsky wrote:
> > On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote:
> > > Some users need to make sure their rounding function accepts and returns
> > > 64bit long variables regardless of the architecture. Sadly
> > > roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
> > > new generic 64bit variant of the function and cleanup rougue custom
> > > implementations.
> > 
> > Is it possible to create general roundup/rounddown_pow_two() which will
> > work correctly for any type of variables, instead of creating special
> > variant for every type?
> 
> In fact, that is sort of the case already - roundup_pow_of_two() itself 
> wraps ilog2() such that the constant case *is* type-independent. And 
> since ilog2() handles non-constant values anyway, might it be reasonable 
> to just take the strongly-typed __roundup_pow_of_two() helper out of the 
> loop as below?
> 
> Robin
> 

That looks way better that's for sure. Some questions.

> ->8-
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index 83a4a3ca3e8a..e825f8a6e8b5 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>*/
>   #define roundup_pow_of_two(n)   \
>   (   \
> - __builtin_constant_p(n) ? ( \
> - (n == 1) ? 1 :  \
> - (1UL << (ilog2((n) - 1) + 1))   \
> -) :  \
> - __roundup_pow_of_two(n) \
> + (__builtin_constant_p(n) && (n == 1)) ? \
> + 1 : (1UL << (ilog2((n) - 1) + 1))   \

Then here you'd have to use ULL instead of UL, right? I want my 64bit value
everywhere regardless of the CPU arch. The downside is that would affect
performance to some extent (i.e. returning a 64bit value where you used to have
a 32bit one)?

Also, what about callers to this function on platforms with 32bit 'unsigned
longs' that happen to input a 64bit value into this. IIUC we'd have a change of
behaviour.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 1/7] linux/log2.h: Add roundup/rounddown_pow_two64() family of functions

2019-11-27 Thread Nicolas Saenz Julienne

On Wed, 2019-11-27 at 19:06 +, Robin Murphy wrote:
> On 27/11/2019 6:24 pm, Nicolas Saenz Julienne wrote:
> > On Wed, 2019-11-27 at 18:06 +, Robin Murphy wrote:
> > > On 26/11/2019 12:51 pm, Leon Romanovsky wrote:
> > > > On Tue, Nov 26, 2019 at 10:19:39AM +0100, Nicolas Saenz Julienne wrote:
> > > > > Some users need to make sure their rounding function accepts and
> > > > > returns
> > > > > 64bit long variables regardless of the architecture. Sadly
> > > > > roundup/rounddown_pow_two() takes and returns unsigned longs. Create a
> > > > > new generic 64bit variant of the function and cleanup rougue custom
> > > > > implementations.
> > > > 
> > > > Is it possible to create general roundup/rounddown_pow_two() which will
> > > > work correctly for any type of variables, instead of creating special
> > > > variant for every type?
> > > 
> > > In fact, that is sort of the case already - roundup_pow_of_two() itself
> > > wraps ilog2() such that the constant case *is* type-independent. And
> > > since ilog2() handles non-constant values anyway, might it be reasonable
> > > to just take the strongly-typed __roundup_pow_of_two() helper out of the
> > > loop as below?
> > > 
> > > Robin
> > > 
> > 
> > That looks way better that's for sure. Some questions.
> > 
> > > ->8-
> > > diff --git a/include/linux/log2.h b/include/linux/log2.h
> > > index 83a4a3ca3e8a..e825f8a6e8b5 100644
> > > --- a/include/linux/log2.h
> > > +++ b/include/linux/log2.h
> > > @@ -172,11 +172,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
> > > */
> > >#define roundup_pow_of_two(n)  \
> > >(  \
> > > - __builtin_constant_p(n) ? ( \
> > > - (n == 1) ? 1 :  \
> > > - (1UL << (ilog2((n) - 1) + 1))   \
> > > -) :  \
> > > - __roundup_pow_of_two(n) \
> > > + (__builtin_constant_p(n) && (n == 1)) ? \
> > > + 1 : (1UL << (ilog2((n) - 1) + 1))   \
> > 
> > Then here you'd have to use ULL instead of UL, right? I want my 64bit value
> > everywhere regardless of the CPU arch. The downside is that would affect
> > performance to some extent (i.e. returning a 64bit value where you used to
> > have
> > a 32bit one)?
> 
> True, although it's possible that 1ULL might result in the same codegen 
> if the compiler can see that the result is immediately truncated back to 
> long anyway. Or at worst, I suppose "(typeof(n))1" could suffice, 
> however ugly. Either way, this diff was only an illustration rather than 
> a concrete proposal, but it might be an interesting diversion to 
> investigate.
> 
> On that note, though, you should probably be using ULL in your current 
> patch too.

I actually meant to, the fix got lost. Thanks for pointing it out.

As I see Leon also likes this, I'll try out this implementation and come back
with some results.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations

2019-12-03 Thread Nicolas Saenz Julienne

The function now is safe to use while expecting a 64bit value. Use it
where relevant.

Signed-off-by: Nicolas Saenz Julienne 
---
 drivers/acpi/arm64/iort.c| 2 +-
 drivers/net/ethernet/mellanox/mlx4/en_clock.c| 3 ++-
 drivers/of/device.c  | 3 ++-
 drivers/pci/controller/cadence/pcie-cadence-ep.c | 3 ++-
 drivers/pci/controller/cadence/pcie-cadence.c| 3 ++-
 drivers/pci/controller/pcie-brcmstb.c| 3 ++-
 drivers/pci/controller/pcie-rockchip-ep.c| 5 +++--
 kernel/dma/direct.c  | 2 +-
 8 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 33f71983e001..9950c9757092 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1090,7 +1090,7 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, 
u64 *dma_size)
 * firmware.
 */
end = dmaaddr + size - 1;
-   mask = DMA_BIT_MASK(ilog2(end) + 1);
+   mask = roundup_pow_of_two(end) - 1;
dev->bus_dma_limit = end;
dev->coherent_dma_mask = mask;
*dev->dma_mask = mask;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
index 024788549c25..23dcb18224d4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_clock.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_clock.c
@@ -33,6 +33,7 @@
 
 #include 
 #include 
+#include 
 
 #include "mlx4_en.h"
 
@@ -252,7 +253,7 @@ static u32 freq_to_shift(u16 freq)
 {
u32 freq_khz = freq * 1000;
u64 max_val_cycles = freq_khz * 1000 * MLX4_EN_WRAP_AROUND_SEC;
-   u64 max_val_cycles_rounded = 1ULL << fls64(max_val_cycles - 1);
+   u64 max_val_cycles_rounded = roundup_pow_of_two(max_val_cycles);
/* calculate max possible multiplier in order to fit in 64bit */
u64 max_mul = div64_u64(ULLONG_MAX, max_val_cycles_rounded);
 
diff --git a/drivers/of/device.c b/drivers/of/device.c
index e9127db7b067..7259922d2078 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "of_private.h"
@@ -149,7 +150,7 @@ int of_dma_configure(struct device *dev, struct device_node 
*np, bool force_dma)
 * set by the driver.
 */
end = dma_addr + size - 1;
-   mask = DMA_BIT_MASK(ilog2(end) + 1);
+   mask = roundup_pow_of_two(end) - 1;
dev->coherent_dma_mask &= mask;
*dev->dma_mask &= mask;
/* ...but only set bus limit if we found valid dma-ranges earlier */
diff --git a/drivers/pci/controller/cadence/pcie-cadence-ep.c 
b/drivers/pci/controller/cadence/pcie-cadence-ep.c
index 1c173dad67d1..72eda0b2f939 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "pcie-cadence.h"
 
@@ -65,7 +66,7 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
 * roundup_pow_of_two() returns an unsigned long, which is not suited
 * for 64bit values.
 */
-   sz = 1ULL << fls64(sz - 1);
+   sz = roundup_pow_of_two(sz);
aperture = ilog2(sz) - 7; /* 128B -> 0, 256B -> 1, 512B -> 2, ... */
 
if ((flags & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO) {
diff --git a/drivers/pci/controller/cadence/pcie-cadence.c 
b/drivers/pci/controller/cadence/pcie-cadence.c
index cd795f6fc1e2..b1689f725b41 100644
--- a/drivers/pci/controller/cadence/pcie-cadence.c
+++ b/drivers/pci/controller/cadence/pcie-cadence.c
@@ -4,6 +4,7 @@
 // Author: Cyrille Pitchen 
 
 #include 
+#include 
 
 #include "pcie-cadence.h"
 
@@ -15,7 +16,7 @@ void cdns_pcie_set_outbound_region(struct cdns_pcie *pcie, u8 
fn,
 * roundup_pow_of_two() returns an unsigned long, which is not suited
 * for 64bit values.
 */
-   u64 sz = 1ULL << fls64(size - 1);
+   u64 sz = roundup_pow_of_two(size);
int nbits = ilog2(sz);
u32 addr0, addr1, desc0, desc1;
 
diff --git a/drivers/pci/controller/pcie-brcmstb.c 
b/drivers/pci/controller/pcie-brcmstb.c
index 7ba06a0e1a71..e705d9d73030 100644
--- a/drivers/pci/controller/pcie-brcmstb.c
+++ b/drivers/pci/controller/pcie-brcmstb.c
@@ -627,7 +627,8 @@ static inline int 
brcm_pcie_get_rc_bar2_size_and_offset(struct brcm_pcie *pcie,
return -ENODEV;
 
*rc_bar2_offset = -entry->offset;
-   *rc_bar2_size = 1ULL << fls64(entry->res->end - entry->res->start);
+   *rc_bar2_size = roundup_pow_of_two(entry->res->end -
+  entry->res->start + 1);
 
/*
 * We validate the inbound memory view even though we shou

[PATCH v4 0/8] Raspberry Pi 4 PCIe support

2019-12-03 Thread Nicolas Saenz Julienne

This series aims at providing support for Raspberry Pi 4's PCIe
controller, which is also shared with the Broadcom STB family of
devices.

There was a previous attempt to upstream this some years ago[1] but was
blocked as most STB PCIe integrations have a sparse DMA mapping[2] which
is something currently not supported by the kernel.  Luckily this is not
the case for the Raspberry Pi 4.

Note that the driver code is to be based on top of Rob Herring's series
simplifying inbound and outbound range parsing.

[1] https://patchwork.kernel.org/cover/10605933/
[2] https://patchwork.kernel.org/patch/10605957/

---

Changes since v3:
  - Moved all the log2.h related changes at the end of the series, as I
presume they will be contentious and I don't want the PCIe patches
to depend on them. Ultimately I think I'll respin them on their own
series but wanted to keep them in for this submission just for the
sake of continuity.
  - Addressed small nits here and there.

Changes since v2:
  - Redo register access in driver avoiding indirection while keeping
the naming intact
  - Add patch editing ARM64's config
  - Last MSI cleanups, notably removing MSIX flag
  - Got rid of all _RB writes
  - Got rid of all of_data
  - Overall churn removal
  - Address the rest of Andrew's comments

Changes since v1:
  - add generic rounddown/roundup_pow_two64() patch
  - Add MAINTAINERS patch
  - Fix Kconfig
  - Cleanup probe, use up to date APIs, exit on MSI failure
  - Get rid of linux,pci-domain and other unused constructs
  - Use edge triggered setup for MSI
  - Cleanup MSI implementation
  - Fix multiple cosmetic issues
  - Remove supend/resume code

Jim Quinlan (3):
  dt-bindings: PCI: Add bindings for brcmstb's PCIe device
  PCI: brcmstb: Add Broadcom STB PCIe host controller driver
  PCI: brcmstb: Add MSI support

Nicolas Saenz Julienne (5):
  ARM: dts: bcm2711: Enable PCIe controller
  MAINTAINERS: Add brcmstb PCIe controller
  arm64: defconfig: Enable Broadcom's STB PCIe controller
  linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()
  linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations

 .../bindings/pci/brcm,stb-pcie.yaml   |   97 ++
 MAINTAINERS   |4 +
 arch/arm/boot/dts/bcm2711.dtsi|   37 +
 arch/arm64/configs/defconfig  |1 +
 drivers/acpi/arm64/iort.c |2 +-
 drivers/clk/clk-divider.c |8 +-
 drivers/clk/sunxi/clk-sunxi.c |2 +-
 drivers/infiniband/hw/hfi1/chip.c |4 +-
 drivers/infiniband/hw/hfi1/init.c |4 +-
 drivers/infiniband/hw/mlx4/srq.c  |2 +-
 drivers/infiniband/hw/mthca/mthca_srq.c   |2 +-
 drivers/infiniband/sw/rxe/rxe_qp.c|4 +-
 drivers/iommu/intel-iommu.c   |4 +-
 drivers/iommu/intel-svm.c |4 +-
 drivers/iommu/intel_irq_remapping.c   |2 +-
 drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c  |4 +-
 drivers/net/ethernet/marvell/sky2.c   |2 +-
 drivers/net/ethernet/mellanox/mlx4/en_clock.c |3 +-
 drivers/net/ethernet/rocker/rocker_hw.h   |4 +-
 drivers/net/ethernet/sfc/ef10.c   |2 +-
 drivers/net/ethernet/sfc/efx.h|2 +-
 drivers/net/ethernet/sfc/falcon/efx.h |2 +-
 drivers/of/device.c   |3 +-
 drivers/pci/controller/Kconfig|9 +
 drivers/pci/controller/Makefile   |1 +
 .../pci/controller/cadence/pcie-cadence-ep.c  |3 +-
 drivers/pci/controller/cadence/pcie-cadence.c |3 +-
 drivers/pci/controller/pcie-brcmstb.c | 1008 +
 drivers/pci/controller/pcie-rockchip-ep.c |5 +-
 drivers/pci/msi.c |2 +-
 include/linux/log2.h  |   44 +-
 kernel/dma/direct.c   |2 +-
 kernel/kexec_core.c   |3 +-
 lib/rhashtable.c  |2 +-
 net/sunrpc/xprtrdma/verbs.c   |2 +-
 35 files changed, 1211 insertions(+), 72 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
 create mode 100644 drivers/pci/controller/pcie-brcmstb.c

-- 
2.24.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()

2019-12-03 Thread Nicolas Saenz Julienne

Some users need to make sure their rounding function accepts and returns
64bit long variables regardless of the architecture. Sadly
roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
out ilog2() already handles 32/64bit calculations properly, and being
the building block to the round functions we can rework them as a
wrapper around it.

Suggested-by: Robin Murphy 
Signed-off-by: Nicolas Saenz Julienne 
---
 drivers/clk/clk-divider.c|  8 ++--
 drivers/clk/sunxi/clk-sunxi.c|  2 +-
 drivers/infiniband/hw/hfi1/chip.c|  4 +-
 drivers/infiniband/hw/hfi1/init.c|  4 +-
 drivers/infiniband/hw/mlx4/srq.c |  2 +-
 drivers/infiniband/hw/mthca/mthca_srq.c  |  2 +-
 drivers/infiniband/sw/rxe/rxe_qp.c   |  4 +-
 drivers/iommu/intel-iommu.c  |  4 +-
 drivers/iommu/intel-svm.c|  4 +-
 drivers/iommu/intel_irq_remapping.c  |  2 +-
 drivers/net/ethernet/amd/xgbe/xgbe-ethtool.c |  4 +-
 drivers/net/ethernet/marvell/sky2.c  |  2 +-
 drivers/net/ethernet/rocker/rocker_hw.h  |  4 +-
 drivers/net/ethernet/sfc/ef10.c  |  2 +-
 drivers/net/ethernet/sfc/efx.h   |  2 +-
 drivers/net/ethernet/sfc/falcon/efx.h|  2 +-
 drivers/pci/msi.c|  2 +-
 include/linux/log2.h | 44 +---
 kernel/kexec_core.c  |  3 +-
 lib/rhashtable.c |  2 +-
 net/sunrpc/xprtrdma/verbs.c  |  2 +-
 21 files changed, 41 insertions(+), 64 deletions(-)

diff --git a/drivers/clk/clk-divider.c b/drivers/clk/clk-divider.c
index 098b2b01f0af..ba947e4c8193 100644
--- a/drivers/clk/clk-divider.c
+++ b/drivers/clk/clk-divider.c
@@ -222,7 +222,7 @@ static int _div_round_up(const struct clk_div_table *table,
int div = DIV_ROUND_UP_ULL((u64)parent_rate, rate);
 
if (flags & CLK_DIVIDER_POWER_OF_TWO)
-   div = __roundup_pow_of_two(div);
+   div = roundup_pow_of_two(div);
if (table)
div = _round_up_table(table, div);
 
@@ -240,8 +240,8 @@ static int _div_round_closest(const struct clk_div_table 
*table,
down = parent_rate / rate;
 
if (flags & CLK_DIVIDER_POWER_OF_TWO) {
-   up = __roundup_pow_of_two(up);
-   down = __rounddown_pow_of_two(down);
+   up = roundup_pow_of_two(up);
+   down = rounddown_pow_of_two(down);
} else if (table) {
up = _round_up_table(table, up);
down = _round_down_table(table, down);
@@ -278,7 +278,7 @@ static int _next_div(const struct clk_div_table *table, int 
div,
div++;
 
if (flags & CLK_DIVIDER_POWER_OF_TWO)
-   return __roundup_pow_of_two(div);
+   return roundup_pow_of_two(div);
if (table)
return _round_up_table(table, div);
 
diff --git a/drivers/clk/sunxi/clk-sunxi.c b/drivers/clk/sunxi/clk-sunxi.c
index 27201fd26e44..faec99dc09c0 100644
--- a/drivers/clk/sunxi/clk-sunxi.c
+++ b/drivers/clk/sunxi/clk-sunxi.c
@@ -311,7 +311,7 @@ static void sun6i_get_ahb1_factors(struct factors_request 
*req)
 
calcm = DIV_ROUND_UP(div, 1 << calcp);
} else {
-   calcp = __roundup_pow_of_two(div);
+   calcp = roundup_pow_of_two(div);
calcp = calcp > 3 ? 3 : calcp;
}
 
diff --git a/drivers/infiniband/hw/hfi1/chip.c 
b/drivers/infiniband/hw/hfi1/chip.c
index 9b1fb84a3d45..96b1d343c32f 100644
--- a/drivers/infiniband/hw/hfi1/chip.c
+++ b/drivers/infiniband/hw/hfi1/chip.c
@@ -14199,10 +14199,10 @@ static int qos_rmt_entries(struct hfi1_devdata *dd, 
unsigned int *mp,
max_by_vl = krcvqs[i];
if (max_by_vl > 32)
goto no_qos;
-   m = ilog2(__roundup_pow_of_two(max_by_vl));
+   m = ilog2(roundup_pow_of_two(max_by_vl));
 
/* determine bits for vl */
-   n = ilog2(__roundup_pow_of_two(num_vls));
+   n = ilog2(roundup_pow_of_two(num_vls));
 
/* reject if too much is used */
if ((m + n) > 7)
diff --git a/drivers/infiniband/hw/hfi1/init.c 
b/drivers/infiniband/hw/hfi1/init.c
index 26b792bb1027..838c789c7cce 100644
--- a/drivers/infiniband/hw/hfi1/init.c
+++ b/drivers/infiniband/hw/hfi1/init.c
@@ -467,7 +467,7 @@ int hfi1_create_ctxtdata(struct hfi1_pportdata *ppd, int 
numa,
 * MTU supported.
 */
if (rcd->egrbufs.size < hfi1_max_mtu) {
-   rcd->egrbufs.size = __roundup_pow_of_two(hfi1_max_mtu);
+   rcd->egrbufs.size = roundup_pow_of_two(hfi1_max_mtu);
hfi1_cdbg(PROC,
  "ctxt%u: eager bufs size too small. Adjusting 
to %u\n",
rcd->

Re: [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations

2019-12-03 Thread Nicolas Saenz Julienne

Hi Rob,

On Tue, 2019-12-03 at 09:53 -0600, Rob Herring wrote:
> On Tue, Dec 3, 2019 at 5:48 AM Nicolas Saenz Julienne
>  wrote:
> > The function now is safe to use while expecting a 64bit value. Use it
> > where relevant.
> 
> What was wrong with the existing code? This is missing some context.

You're right, I'll update it.

For most of files changed the benefit here is factoring out a common pattern
using the standard function roundup/down_pow_two() which now provides correct
64bit results.

As for of/device.c and arm64/iort.c it's more of a readability enhancement. I
consider it's easier to understand than the current calculation as it abstracts
the math.

> > Signed-off-by: Nicolas Saenz Julienne 
> > ---
> >  drivers/acpi/arm64/iort.c| 2 +-
> >  drivers/net/ethernet/mellanox/mlx4/en_clock.c| 3 ++-
> >  drivers/of/device.c  | 3 ++-
> 
> In any case,
> 
> Acked-by: Rob Herring 
> 

Thanks!

Regards,
Nicolas

signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2] iommu/dma: Rationalise types for DMA masks

2019-12-11 Thread Nicolas Saenz Julienne

On Wed, 2019-12-11 at 18:33 +, Robin Murphy wrote:
> Since iommu_dma_alloc_iova() combines incoming masks with the u64 bus
> limit, it makes more sense to pass them around in their native u64
> rather than converting to dma_addr_t early. Do that, and resolve the
> remaining type discrepancy against the domain geometry with a cheeky
> cast to keep things simple.
> 
> Signed-off-by: Robin Murphy 

Thanks!

Reviewed-by: Nicolas Saenz Julienne 



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()

2019-12-12 Thread Nicolas Saenz Julienne

Hi Robin,

On Thu, 2019-12-05 at 17:48 +, Robin Murphy wrote:
> On 03/12/2019 11:47 am, Nicolas Saenz Julienne wrote:
> > Some users need to make sure their rounding function accepts and returns
> > 64bit long variables regardless of the architecture. Sadly
> > roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
> > out ilog2() already handles 32/64bit calculations properly, and being
> > the building block to the round functions we can rework them as a
> > wrapper around it.
> 
> Neat! Although all the additional ULL casts this introduces seem 
> somewhat unwelcome - I suppose the (1ULL << (ilog2(n))) makes it 
> effectively always return unsigned long long now. Might it make sense to 
> cast the return value to typeof(n) to avoid this slightly non-obvious 
> behaviour (and the associated churn)?

It might alleviate some of the churn alright but I don't think a cast is really
going to make the behaviour more obvious. Say your expression is a big mess,
you'll have to analyze it to infer the output type, keeping in mind things like
integer promotion. See this example, 'params->nelem_hint' and
'params->min_size' are u16:

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index bdb7e4cadf05..70908678c7a8 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -950,7 +950,7 @@ static size_t rounded_hashtable_size(const struct 
rhashtable_params *params)

if (params->nelem_hint)
retsize = max(roundup_pow_of_two(params->nelem_hint * 4 
/ 3),
- (unsigned long)params->min_size);
+ (unsigned long long)params->min_size);
else
retsize = max(HASH_DEFAULT_SIZE,
  (unsigned long)params->min_size);

With a cast the patch will look like this:

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index bdb7e4cadf05..70908678c7a8 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -950,7 +950,7 @@ static size_t rounded_hashtable_size(const struct 
rhashtable_params *params)

if (params->nelem_hint)
retsize = max(roundup_pow_of_two(params->nelem_hint * 4 
/ 3),
- (unsigned long)params->min_size);
+ (int)params->min_size);
else
retsize = max(HASH_DEFAULT_SIZE,
  (unsigned long)params->min_size);

To me it's even less obvious than with a fixed ULL.

My intuition tells me to keep it as similar as the old behaviour, at the
expense of the extra churn (which is not that different from the current status
quo anyway). That said, I'll be happy to change it.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 7/8] linux/log2.h: Fix 64bit calculations in roundup/down_pow_two()

2019-12-12 Thread Nicolas Saenz Julienne

On Thu, 2019-12-05 at 16:30 -0600, Bjorn Helgaas wrote:
> You got the "n" on "down" in the subject, but still missing "of" ;)

Yes, sorry about that, I tend to re-read what I meant to say instead of what
it's actually written.

> On Tue, Dec 03, 2019 at 12:47:40PM +0100, Nicolas Saenz Julienne wrote:
> > Some users need to make sure their rounding function accepts and returns
> > 64bit long variables regardless of the architecture. Sadly
> > roundup/rounddown_pow_two() takes and returns unsigned longs. It turns
> > out ilog2() already handles 32/64bit calculations properly, and being
> > the building block to the round functions we can rework them as a
> > wrapper around it.
> 
> Missing "of" in the function names here.
> s/a wrapper/wrappers/

Noted

> IIUC the point of this is that roundup_pow_of_two() returned
> "unsigned long", which can be either 32 or 64 bits (worth pointing
> out, I think), and many callers need something that returns
> "unsigned long long" (always 64 bits).

I'll update the commit message to be a more explicit.

> It's a nice simplification to remove the "__" variants.  Just as a
> casual reader of this commit message, I'd like to know why we had both
> the roundup and the __roundup versions in the first place, and why we
> no longer need both.

So, the commit that introduced it (312a0c170945b) meant to use the '__' variant
as a helper, but, due to the fact this is a header file, some found it and made
use of it. I went over some if the commits introducing '__' usages and none of
them seem to acknowledge its use as opposed to the macro version. I think it's
fair to say it's a case of cargo-culting.

> > -#define roundup_pow_of_two(n)  \
> > -(  \
> > -   __builtin_constant_p(n) ? ( \
> > -   (n == 1) ? 1 :  \
> > -   (1UL << (ilog2((n) - 1) + 1))   \
> > -  ) :  \
> > -   __roundup_pow_of_two(n) \
> > - )
> > +#define roundup_pow_of_two(n)\
> > +(\
> > +   (__builtin_constant_p(n) && ((n) == 1)) ? \
> > +   1 : (1ULL << (ilog2((n) - 1) + 1))\
> > +)
> 
> Should the resulting type of this expression always be a ULL, even
> when n==1, i.e., should it be this?
> 
>   1ULL : (1ULL << (ilog2((n) - 1) + 1))\
> 
> Or maybe there's no case where that makes a difference?

It should be 1ULL on either case.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 8/8] linux/log2.h: Use roundup/dow_pow_two() on 64bit calculations

2019-12-12 Thread Nicolas Saenz Julienne

On Thu, 2019-12-05 at 14:38 -0600, Bjorn Helgaas wrote:
> The subject contains a couple typos: it's missing "of" and it's
> missing the "n" on "down".

Noted > 
> On Tue, Dec 03, 2019 at 12:47:41PM +0100, Nicolas Saenz Julienne wrote:
> > The function now is safe to use while expecting a 64bit value. Use it
> > where relevant.
> 
> Please include the function names ("roundup_pow_of_two()",
> "rounddown_pow_of_two()") in the changelog so it is self-contained and
> doesn't depend on the subject.

Noted

> > Signed-off-by: Nicolas Saenz Julienne 
> 
> With the nits above and below addressed,
> 
> Acked-by: Bjorn Helgaas  # drivers/pci

Thanks!

> > ---
> >  drivers/acpi/arm64/iort.c| 2 +-
> >  drivers/net/ethernet/mellanox/mlx4/en_clock.c| 3 ++-
> >  drivers/of/device.c  | 3 ++-
> >  drivers/pci/controller/cadence/pcie-cadence-ep.c | 3 ++-
> >  drivers/pci/controller/cadence/pcie-cadence.c| 3 ++-
> >  drivers/pci/controller/pcie-brcmstb.c| 3 ++-
> >  drivers/pci/controller/pcie-rockchip-ep.c| 5 +++--
> >  kernel/dma/direct.c  | 2 +-
> >  8 files changed, 15 insertions(+), 9 deletions(-)
> > --- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
> > +++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
> > @@ -10,6 +10,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include "pcie-cadence.h"
> >  
> > @@ -65,7 +66,7 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8
> > fn,
> >  * roundup_pow_of_two() returns an unsigned long, which is not suited
> >  * for 64bit values.
> >  */
> 
> Please remove the comment above since it no longer applies.

Noted

[...]

> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 6af7ae83c4ad..056886c4efec 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -53,7 +53,7 @@ u64 dma_direct_get_required_mask(struct device *dev)
> >  {
> > u64 max_dma = phys_to_dma_direct(dev, (max_pfn - 1) << PAGE_SHIFT);
> >  
> > -   return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
> > +   return rounddown_pow_of_two(max_dma) * 2 - 1;
> 
> Personally I would probably make this one a separate patch since it's
> qualitatively different than the others and it would avoid the slight
> awkwardness of the non-greppable "roundup/down_pow_of_two()"
> construction in the commit subject.
> 
> But it's fine either way.

I'll split it into two parts, as RobH made a similar complaint.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH] dma-contiguous: CMA: give precedence to cmdline

2020-01-10 Thread Nicolas Saenz Julienne

Although the device tree might contain a reserved-memory DT node
dedicated as the default CMA pool, users might want to change CMA's
parameters using the kernel command line for debugging purposes and
whatnot. Honor this by bypassing the reserved memory CMA setup, which
will ultimately end up freeing the memblock and allow the command line
CMA configuration routine to run.

Signed-off-by: Nicolas Saenz Julienne 
---

NOTE: Tested this on arm and arm64 with the Raspberry Pi 4.

 kernel/dma/contiguous.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index daa4e6eefdde..8bc6f2d670f9 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -302,9 +302,16 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
phys_addr_t align = PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order);
phys_addr_t mask = align - 1;
unsigned long node = rmem->fdt_node;
+   bool default_cma = of_get_flat_dt_prop(node, "linux,cma-default", NULL);
struct cma *cma;
int err;
 
+   if (size_cmdline != -1 && default_cma) {
+   pr_info("Reserved memory: bypass %s node, using cmdline CMA 
params instead\n",
+   rmem->name);
+   return -EBUSY;
+   }
+
if (!of_get_flat_dt_prop(node, "reusable", NULL) ||
of_get_flat_dt_prop(node, "no-map", NULL))
return -EINVAL;
@@ -322,7 +329,7 @@ static int __init rmem_cma_setup(struct reserved_mem *rmem)
/* Architecture specific contiguous memory fixup. */
dma_contiguous_early_fixup(rmem->base, rmem->size);
 
-   if (of_get_flat_dt_prop(node, "linux,cma-default", NULL))
+   if (default_cma)
dma_contiguous_set_default(cma);
 
rmem->ops = &rmem_cma_ops;
-- 
2.24.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] dma-contiguous: CMA: give precedence to cmdline

2020-01-30 Thread Nicolas Saenz Julienne

On Thu Jan 30, 2020 at 2:42 PM, Christoph Hellwig wrote:
> I've picked this up for Linux 5.6, sorry for the delay.

Thanks!
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[RFC 3/4] dma-direct: add dma_direct_min_mask

2019-07-17 Thread Nicolas Saenz Julienne

Historically devices with ZONE_DMA32 have been assumed to be able to
address at least the lower 4GB of ram for DMA. This is still the defualt
behavior yet the Raspberry Pi 4 is limited to the first GB of memory.
This has been observed to trigger failures in dma_direct_supported() as
the 'min_mask' isn't properly set.

We create 'dma_direct_min_mask' in order for the arch init code to be
able to fine-tune dma direct's 'min_dma' mask.

Signed-off-by: Nicolas Saenz Julienne 
---
 kernel/dma/direct.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index b90e1aede743..3c8cd730648b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -23,6 +23,8 @@
 #define ARCH_ZONE_DMA_BITS 24
 #endif
 
+u64 dma_direct_min_mask __ro_after_init = DMA_BIT_MASK(32);
+
 /*
  * For AMD SEV all DMA must be to unencrypted addresses.
  */
@@ -393,7 +395,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
if (IS_ENABLED(CONFIG_ZONE_DMA))
min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
else
-   min_mask = DMA_BIT_MASK(32);
+   min_mask = dma_direct_min_mask;
 
min_mask = min_t(u64, min_mask, (max_pfn - 1) << PAGE_SHIFT);
 
-- 
2.22.0

[RFC 0/4] Raspberry Pi 4 DMA addressing support

2019-07-17 Thread Nicolas Saenz Julienne

Hi all,
this series attempts to address some issues we found while bringing up the new
Raspberry Pi 4 in arm64 and it's intended to serve as a follow up to this:
https://www.spinics.net/lists/arm-kernel/msg740650.html

The new Raspberry Pi 4 has up to 4GB of ram but most devices can only address
the first GB of ram: the DMA address range is 0xc000-0xfc00 which is
aliased to the first GB of memory 0x-0x3c00. Note that only some
devices have this limitations, the ARM cores, PCIe, GENET, and 40-bit DMA
channels have a wider view of the address space.

This is solved in arm32 by setting up the correct '.dma_zone_size = SZ_1G'
which takes care of the allocating the coherent memory area at the right spot
and also is taken into account in the arch specific 'dma_map_ops'.

Unfortunately there is no such thing as '.dma_zone_size' in arm64, to make
things worse it's assumed that all devices will be able to adress the first 4GB
of memory.

This raises two issues: the coherent memory reserves are located in an area not
accessible by most devices, and DMA streaming's dma_supported(), which fails
for most devices since it's min_mask isn't properly set. Note that the rest if
DMA streaming works fine thanks to the help of swiotlb.

On one hand I've implemented a function that parses the 'dma-range' on all
interconnects and tries to select a location for the coherent memory reserves
that'll fit all devices. I made the algorithm as simple as possible, based on
the existing devices limitations.

On the other I've added a new variable in dma-direct that allows modifying the
min_mask during the init process and taken care of setting it accordingly in
the arm64's init code.

Regards,
Nicolas

---

Nicolas Saenz Julienne (4):
  arm64: mm: use arm64_dma_phys_limit instead of calling max_zone_dma_phys()
  arm64: mm: parse dma-ranges in order to better estimate arm64_dma_phys_limit
  dma-direct: add dma_direct_min_mask
  arm64: mm: set direct_dma_min_mask according to dma-ranges

 arch/arm64/mm/init.c | 69 
 kernel/dma/direct.c  |  4 ++-
 2 files changed, 67 insertions(+), 6 deletions(-)

-- 
2.22.0

Re: [RFC 3/4] dma-direct: add dma_direct_min_mask

2019-07-18 Thread Nicolas Saenz Julienne

On Thu, 2019-07-18 at 11:15 +0200, Christoph Hellwig wrote:
> On Wed, Jul 17, 2019 at 05:31:34PM +0200, Nicolas Saenz Julienne wrote:
> > Historically devices with ZONE_DMA32 have been assumed to be able to
> > address at least the lower 4GB of ram for DMA. This is still the defualt
> > behavior yet the Raspberry Pi 4 is limited to the first GB of memory.
> > This has been observed to trigger failures in dma_direct_supported() as
> > the 'min_mask' isn't properly set.
> > 
> > We create 'dma_direct_min_mask' in order for the arch init code to be
> > able to fine-tune dma direct's 'min_dma' mask.
> 
> Normally we use ZONE_DMA for that case.

Fair enough, I didn't think of that possibility.

So would the arm64 maintainers be happy with something like this:

- ZONE_DMA: Follows standard definition, 16MB in size. ARCH_ZONE_DMA_BITS is
left as is.
- ZONE_DMA32: Will honor the most constraining 'dma-ranges'. Which so far for
  most devices is 4G, except for RPi4.
- ZONE_NORMAL: The rest of the memory.



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC 3/4] dma-direct: add dma_direct_min_mask

2019-07-19 Thread Nicolas Saenz Julienne

On Thu, 2019-07-18 at 13:18 +0200, Nicolas Saenz Julienne wrote:
> On Thu, 2019-07-18 at 11:15 +0200, Christoph Hellwig wrote:
> > On Wed, Jul 17, 2019 at 05:31:34PM +0200, Nicolas Saenz Julienne wrote:
> > > Historically devices with ZONE_DMA32 have been assumed to be able to
> > > address at least the lower 4GB of ram for DMA. This is still the defualt
> > > behavior yet the Raspberry Pi 4 is limited to the first GB of memory.
> > > This has been observed to trigger failures in dma_direct_supported() as
> > > the 'min_mask' isn't properly set.
> > > 
> > > We create 'dma_direct_min_mask' in order for the arch init code to be
> > > able to fine-tune dma direct's 'min_dma' mask.
> > 
> > Normally we use ZONE_DMA for that case.
> 
> Fair enough, I didn't think of that possibility.
> 
> So would the arm64 maintainers be happy with something like this:
> 
> - ZONE_DMA: Follows standard definition, 16MB in size. ARCH_ZONE_DMA_BITS is
>   left as is.
> - ZONE_DMA32: Will honor the most constraining 'dma-ranges'. Which so far for
> most devices is 4G, except for RPi4.
> - ZONE_NORMAL: The rest of the memory.

Never mind this suggestion, I don't think it makes any sense. If anything arm64
seems to fit the ZONE_DMA usage pattern of arm and powerpc: where ZONE_DMA's
size is decided based on ram size and/or board configuration. It was actually
set-up like this until Christoph's ad67f5a6545f7 ("arm64: replace ZONE_DMA with
ZONE_DMA32").

So the easy solution would be to simply revert that commit. On one hand I feel
it would be a step backwards as most 64 bit architectures have been moving to
use ZONE_DMA32. On the other, current ZONE_DMA32 usage seems to be heavily
rooted on having a 32 bit DMA mask*, which will no longer be the case on arm64
if we want to support the RPi 4.

So the way I see it and lacking a better solution, the argument is stronger on
moving back arm64 to using ZONE_DMA. Any comments/opinions?

Note that I've been looking at all the DMA/CMA/swiotlb code to see if this
would break anything or change behaviors and couldn't find anything obvious. I
also tested the revert on my RPi4 and nothing seems to fail.

* A good example is dma-direct's implementation.

signature.asc
Description: This is a digitally signed message part

Re: [RFC 3/4] dma-direct: add dma_direct_min_mask

2019-07-24 Thread Nicolas Saenz Julienne

On Wed, 2019-07-24 at 15:56 +0200, Christoph Hellwig wrote:
> On Wed, Jul 24, 2019 at 02:51:24PM +0100, Catalin Marinas wrote:
> > I think it may be better if we have both ZONE_DMA and ZONE_DMA32 on
> > arm64. ZONE_DMA would be based on the smallest dma-ranges as described
> > in the DT while DMA32 covers the first naturally aligned 4GB of RAM
> > (unchanged). When a smaller ZONE_DMA is not needed, it could be expanded
> > to cover what would normally be ZONE_DMA32 (or could we have ZONE_DMA as
> > 0-bytes? I don't think GFP_DMA can still allocate memory in this case).
> > 
> > We'd probably have to define ARCH_ZONE_DMA_BITS for arm64 to something
> > smaller than 32-bit but sufficient to cover the known platforms like
> > RPi4 (the current 24 is too small, so maybe 30). AFAICT,
> > __dma_direct_optimal_gfp_mask() figures out whether GFP_DMA or GFP_DMA32
> > should be passed.
> 
> ARCH_ZONE_DMA_BITS should probably become a variable.  That way we can
> just initialize it to the default 24 bits in kernel/dma/direct.c and
> allow architectures to override it in their early boot code.

Thanks both for your feedback. I'll start preparing a proper series.



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 2/2] arm: use swiotlb for bounce buffer on LPAE configs

2019-07-24 Thread Nicolas Saenz Julienne

On Tue, 2019-07-09 at 07:20 -0700, Christoph Hellwig wrote:
> The DMA API requires that 32-bit DMA masks are always supported, but on
> arm LPAE configs they do not currently work when memory is present
> above 4GB.  Wire up the swiotlb code like for all other architectures
> to provide the bounce buffering in that case.
> 
> Fixes: 21e07dba9fb11 ("scsi: reduce use of block bounce buffers").
> Reported-by: Roger Quadros 
> Signed-off-by: Christoph Hellwig 
> ---

Hi Chistoph,
Out of curiosity, what is the reason stopping us from using dma-direct/swiotlb
instead of arm_dma_ops altogether?

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 4/8] arm64: re-introduce max_zone_dma_phys()

2019-07-31 Thread Nicolas Saenz Julienne

Some devices might have multiple interconnects with different DMA
addressing limitations. This function provides the higher physical
address accessible by all peripherals on the SoC. If such limitation
doesn't exist it'll return 0.

Signed-off-by: Nicolas Saenz Julienne 
---

 arch/arm64/mm/init.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 8956c22634dd..1c4ffabbe1cb 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -174,6 +174,19 @@ static phys_addr_t __init max_zone_dma32_phys(void)
return min(offset + (1ULL << 32), memblock_end_of_DRAM());
 }
 
+static phys_addr_t __init max_zone_dma_phys(void)
+
+{
+   u64 memory_size = memblock_end_of_DRAM() - memblock_start_of_DRAM();
+   u64 zone_dma_size;
+
+   of_scan_flat_dt(early_init_dt_dma_zone_size, &zone_dma_size);
+   if (zone_dma_size && zone_dma_size < min(memory_size, SZ_4G))
+   return memblock_start_of_DRAM() + zone_dma_size;
+
+   return 0;
+}
+
 #ifdef CONFIG_NUMA
 
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
-- 
2.22.0

[PATCH 5/8] arm64: use ZONE_DMA on DMA addressing limited devices

2019-07-31 Thread Nicolas Saenz Julienne

So far all arm64 devices have supported 32 bit DMA masks for their
peripherals. This is not true anymore for the Raspberry Pi 4. Most of
it's peripherals can only address the first GB or memory of a total of
up to 4 GB.

This goes against ZONE_DMA32's original intent, and breaks other
subsystems as it's expected for ZONE_DMA32 to be addressable with a 32
bit mask. So it was decided to use ZONE_DMA for this specific case.

Devices with with 32 bit DMA addressing support will still bypass
ZONE_DMA but those who don't will create both zones. ZONE_DMA will
contain the memory addressable by all the SoC's devices and ZONE_DMA32
the rest of the 32 bit addressable memory.

Signed-off-by: Nicolas Saenz Julienne 
---

 arch/arm64/Kconfig   |  4 
 arch/arm64/mm/init.c | 38 --
 2 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3adcec05b1f6..a9fd71d3bc8e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -266,6 +266,10 @@ config GENERIC_CSUM
 config GENERIC_CALIBRATE_DELAY
def_bool y
 
+config ZONE_DMA
+   bool "Support DMA zone" if EXPERT
+   default y
+
 config ZONE_DMA32
bool "Support DMA32 zone" if EXPERT
default y
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 1c4ffabbe1cb..f5279ef85756 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -50,6 +50,13 @@
 s64 memstart_addr __ro_after_init = -1;
 EXPORT_SYMBOL(memstart_addr);
 
+/*
+ * We might create both a ZONE_DMA and ZONE_DMA32. ZONE_DMA is needed if there
+ * are periferals unable to address the first naturally aligned 4GB of ram.
+ * ZONE_DMA32 will be expanded to cover the rest of that memory. If such
+ * limitations doesn't exist only ZONE_DMA32 is created.
+ */
+phys_addr_t arm64_dma_phys_limit __ro_after_init;
 phys_addr_t arm64_dma32_phys_limit __ro_after_init;
 
 #ifdef CONFIG_KEXEC_CORE
@@ -193,6 +200,9 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
 {
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
+#ifdef CONFIG_ZONE_DMA
+   max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
+#endif
 #ifdef CONFIG_ZONE_DMA32
max_zone_pfns[ZONE_DMA32] = PFN_DOWN(arm64_dma32_phys_limit);
 #endif
@@ -207,14 +217,19 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
 {
struct memblock_region *reg;
unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
+   unsigned long max_dma = PFN_DOWN(arm64_dma_phys_limit);
unsigned long max_dma32 = min;
 
memset(zone_size, 0, sizeof(zone_size));
 
+#ifdef CONFIG_ZONE_DMA
+   if (max_dma)
+   zone_size[ZONE_DMA] = max_dma - min;
+#endif
/* 4GB maximum for 32-bit only capable devices */
 #ifdef CONFIG_ZONE_DMA32
max_dma32 = PFN_DOWN(arm64_dma32_phys_limit);
-   zone_size[ZONE_DMA32] = max_dma32 - min;
+   zone_size[ZONE_DMA32] = max_dma32 - max_dma - min;
 #endif
zone_size[ZONE_NORMAL] = max - max_dma32;
 
@@ -226,11 +241,17 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
 
if (start >= max)
continue;
-
+#ifdef CONFIG_ZONE_DMA
+   if (start < max_dma) {
+   unsigned long dma_end = min_not_zero(end, max_dma);
+   zhole_size[ZONE_DMA] -= dma_end - start;
+   }
+#endif
 #ifdef CONFIG_ZONE_DMA32
if (start < max_dma32) {
-   unsigned long dma_end = min(end, max_dma32);
-   zhole_size[ZONE_DMA32] -= dma_end - start;
+   unsigned long dma32_end = min(end, max_dma32);
+   unsigned long dma32_start = max(start, max_dma);
+   zhole_size[ZONE_DMA32] -= dma32_end - dma32_start;
}
 #endif
if (end > max_dma32) {
@@ -418,6 +439,11 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
+   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   arm64_dma_phys_limit = max_zone_dma_phys();
+   else
+   arm64_dma_phys_limit = 0;
+
/* 4GB maximum for 32-bit only capable devices */
if (IS_ENABLED(CONFIG_ZONE_DMA32))
arm64_dma32_phys_limit = max_zone_dma32_phys();
@@ -430,7 +456,7 @@ void __init arm64_memblock_init(void)
 
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
 
-   dma_contiguous_reserve(arm64_dma32_phys_limit);
+   dma_contiguous_reserve(arm64_dma_phys_limit ? : arm64_dma32_phys_limit);
 }
 
 void __init bootmem_init(void)
@@ -533,7 +559,7 @@ static void __init free_unused_memmap(void)
  */
 void __init mem_init(void)
 {
-   if (swiotlb_force == SWIOTLB_FORCE ||
+   if (swiotlb_force == SWIOTLB_FORCE || arm64_dma_phys_limit ||
max_pfn > (arm6

[PATCH 2/8] arm64: rename variables used to calculate ZONE_DMA32's size

2019-07-31 Thread Nicolas Saenz Julienne

Let the name indicate that they are used to calculate ZONE_DMA32's size
as opposed to ZONE_DMA.

Signed-off-by: Nicolas Saenz Julienne 
---

 arch/arm64/mm/init.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6112d6c90fa8..8956c22634dd 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -50,7 +50,7 @@
 s64 memstart_addr __ro_after_init = -1;
 EXPORT_SYMBOL(memstart_addr);
 
-phys_addr_t arm64_dma_phys_limit __ro_after_init;
+phys_addr_t arm64_dma32_phys_limit __ro_after_init;
 
 #ifdef CONFIG_KEXEC_CORE
 /*
@@ -168,7 +168,7 @@ static void __init reserve_elfcorehdr(void)
  * currently assumes that for memory starting above 4G, 32-bit devices will
  * use a DMA offset.
  */
-static phys_addr_t __init max_zone_dma_phys(void)
+static phys_addr_t __init max_zone_dma32_phys(void)
 {
phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, 32);
return min(offset + (1ULL << 32), memblock_end_of_DRAM());
@@ -181,7 +181,7 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA32
-   max_zone_pfns[ZONE_DMA32] = PFN_DOWN(arm64_dma_phys_limit);
+   max_zone_pfns[ZONE_DMA32] = PFN_DOWN(arm64_dma32_phys_limit);
 #endif
max_zone_pfns[ZONE_NORMAL] = max;
 
@@ -194,16 +194,16 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
 {
struct memblock_region *reg;
unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
-   unsigned long max_dma = min;
+   unsigned long max_dma32 = min;
 
memset(zone_size, 0, sizeof(zone_size));
 
/* 4GB maximum for 32-bit only capable devices */
 #ifdef CONFIG_ZONE_DMA32
-   max_dma = PFN_DOWN(arm64_dma_phys_limit);
-   zone_size[ZONE_DMA32] = max_dma - min;
+   max_dma32 = PFN_DOWN(arm64_dma32_phys_limit);
+   zone_size[ZONE_DMA32] = max_dma32 - min;
 #endif
-   zone_size[ZONE_NORMAL] = max - max_dma;
+   zone_size[ZONE_NORMAL] = max - max_dma32;
 
memcpy(zhole_size, zone_size, sizeof(zhole_size));
 
@@ -215,14 +215,14 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
continue;
 
 #ifdef CONFIG_ZONE_DMA32
-   if (start < max_dma) {
-   unsigned long dma_end = min(end, max_dma);
+   if (start < max_dma32) {
+   unsigned long dma_end = min(end, max_dma32);
zhole_size[ZONE_DMA32] -= dma_end - start;
}
 #endif
-   if (end > max_dma) {
+   if (end > max_dma32) {
unsigned long normal_end = min(end, max);
-   unsigned long normal_start = max(start, max_dma);
+   unsigned long normal_start = max(start, max_dma32);
zhole_size[ZONE_NORMAL] -= normal_end - normal_start;
}
}
@@ -407,9 +407,9 @@ void __init arm64_memblock_init(void)
 
/* 4GB maximum for 32-bit only capable devices */
if (IS_ENABLED(CONFIG_ZONE_DMA32))
-   arm64_dma_phys_limit = max_zone_dma_phys();
+   arm64_dma32_phys_limit = max_zone_dma32_phys();
else
-   arm64_dma_phys_limit = PHYS_MASK + 1;
+   arm64_dma32_phys_limit = PHYS_MASK + 1;
 
reserve_crashkernel();
 
@@ -417,7 +417,7 @@ void __init arm64_memblock_init(void)
 
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
 
-   dma_contiguous_reserve(arm64_dma_phys_limit);
+   dma_contiguous_reserve(arm64_dma32_phys_limit);
 }
 
 void __init bootmem_init(void)
@@ -521,7 +521,7 @@ static void __init free_unused_memmap(void)
 void __init mem_init(void)
 {
if (swiotlb_force == SWIOTLB_FORCE ||
-   max_pfn > (arm64_dma_phys_limit >> PAGE_SHIFT))
+   max_pfn > (arm64_dma32_phys_limit >> PAGE_SHIFT))
swiotlb_init(1);
else
swiotlb_force = SWIOTLB_NO_FORCE;
-- 
2.22.0

[PATCH 3/8] of/fdt: add function to get the SoC wide DMA addressable memory size

2019-07-31 Thread Nicolas Saenz Julienne

Some SoCs might have multiple interconnects each with their own DMA
addressing limitations. This function parses the 'dma-ranges' on each of
them and tries to guess the maximum SoC wide DMA addressable memory
size.

This is specially useful for arch code in order to properly setup CMA
and memory zones.

Signed-off-by: Nicolas Saenz Julienne 
---

 drivers/of/fdt.c   | 72 ++
 include/linux/of_fdt.h |  2 ++
 2 files changed, 74 insertions(+)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 9cdf14b9aaab..f2444c61a136 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -953,6 +953,78 @@ int __init early_init_dt_scan_chosen_stdout(void)
 }
 #endif
 
+/**
+ * early_init_dt_dma_zone_size - Look at all 'dma-ranges' and provide the
+ * maximum common dmable memory size.
+ *
+ * Some devices might have multiple interconnects each with their own DMA
+ * addressing limitations. For example the Raspberry Pi 4 has the following:
+ *
+ * soc {
+ * dma-ranges = <0xc000  0x0 0x  0x3c00>;
+ * [...]
+ * }
+ *
+ * v3dbus {
+ * dma-ranges = <0x  0x0 0x  0x3c00>;
+ * [...]
+ * }
+ *
+ * scb {
+ * dma-ranges = <0x0 0x  0x0 0x  0xfc00>;
+ * [...]
+ * }
+ *
+ * Here the area addressable by all devices is [0x-0x3bff]. Hence
+ * the function will write in 'data' a size of 0x3c00.
+ *
+ * Note that the implementation assumes all interconnects have the same 
physical
+ * memory view and that the mapping always start at the beginning of RAM.
+ */
+int __init early_init_dt_dma_zone_size(unsigned long node, const char *uname,
+  int depth, void *data)
+{
+   const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+   u64 phys_addr, dma_addr, size;
+   u64 *dma_zone_size = data;
+   int dma_addr_cells;
+   const __be32 *reg;
+   const void *prop;
+   int len;
+
+   if (depth == 0)
+   *dma_zone_size = 0;
+
+   /*
+* We avoid pci host controllers as they have their own way of using
+* 'dma-ranges'.
+*/
+   if (type && !strcmp(type, "pci"))
+   return 0;
+
+   reg = of_get_flat_dt_prop(node, "dma-ranges", &len);
+   if (!reg)
+   return 0;
+
+   prop = of_get_flat_dt_prop(node, "#address-cells", NULL);
+   if (prop)
+   dma_addr_cells = be32_to_cpup(prop);
+   else
+   dma_addr_cells = 1; /* arm64's default addr_cell size */
+
+   if (len < (dma_addr_cells + dt_root_addr_cells + dt_root_size_cells))
+   return 0;
+
+   dma_addr = dt_mem_next_cell(dma_addr_cells, ®);
+   phys_addr = dt_mem_next_cell(dt_root_addr_cells, ®);
+   size = dt_mem_next_cell(dt_root_size_cells, ®);
+
+   if (!*dma_zone_size || *dma_zone_size > size)
+   *dma_zone_size = size;
+
+   return 0;
+}
+
 /**
  * early_init_dt_scan_root - fetch the top level address and size cells
  */
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index acf820e88952..2ad36b7bd4fa 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -72,6 +72,8 @@ extern int early_init_dt_reserve_memory_arch(phys_addr_t 
base, phys_addr_t size,
 bool no_map);
 extern u64 dt_mem_next_cell(int s, const __be32 **cellp);
 
+extern int early_init_dt_dma_zone_size(unsigned long node, const char *uname,
+  int depth, void *data);
 /* Early flat tree scan hooks */
 extern int early_init_dt_scan_root(unsigned long node, const char *uname,
   int depth, void *data);
-- 
2.22.0

[PATCH 6/8] dma-direct: turn ARCH_ZONE_DMA_BITS into a variable

2019-07-31 Thread Nicolas Saenz Julienne

Some architectures, notably arm64, are interested in tweaking this
depending on their runtime dma addressing limitations.

Signed-off-by: Nicolas Saenz Julienne 
---

 arch/powerpc/include/asm/page.h |  9 -
 arch/powerpc/mm/mem.c   | 14 --
 arch/s390/include/asm/page.h|  2 --
 arch/s390/mm/init.c |  1 +
 include/linux/dma-direct.h  |  2 ++
 kernel/dma/direct.c |  8 +++-
 6 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 0d52f57fca04..73668a21ae78 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -319,13 +319,4 @@ struct vm_area_struct;
 #endif /* __ASSEMBLY__ */
 #include 
 
-/*
- * Allow 30-bit DMA for very limited Broadcom wifi chips on many powerbooks.
- */
-#ifdef CONFIG_PPC32
-#define ARCH_ZONE_DMA_BITS 30
-#else
-#define ARCH_ZONE_DMA_BITS 31
-#endif
-
 #endif /* _ASM_POWERPC_PAGE_H */
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 9191a66b3bc5..3792a998ca02 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -201,7 +202,7 @@ static int __init mark_nonram_nosave(void)
  * everything else. GFP_DMA32 page allocations automatically fall back to
  * ZONE_DMA.
  *
- * By using 31-bit unconditionally, we can exploit ARCH_ZONE_DMA_BITS to
+ * By using 31-bit unconditionally, we can exploit arch_zone_dma_bits to
  * inform the generic DMA mapping code.  32-bit only devices (if not handled
  * by an IOMMU anyway) will take a first dip into ZONE_NORMAL and get
  * otherwise served by ZONE_DMA.
@@ -237,9 +238,18 @@ void __init paging_init(void)
printk(KERN_DEBUG "Memory hole size: %ldMB\n",
   (long int)((top_of_ram - total_ram) >> 20));
 
+   /*
+* Allow 30-bit DMA for very limited Broadcom wifi chips on many
+* powerbooks.
+*/
+   if (IS_ENABLED(CONFIG_PPC32))
+   arch_zone_dma_bits = 30;
+   else
+   arch_zone_dma_bits = 31;
+
 #ifdef CONFIG_ZONE_DMA
max_zone_pfns[ZONE_DMA] = min(max_low_pfn,
- 1UL << (ARCH_ZONE_DMA_BITS - PAGE_SHIFT));
+ 1UL << (arch_zone_dma_bits - PAGE_SHIFT));
 #endif
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
 #ifdef CONFIG_HIGHMEM
diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h
index 823578c6b9e2..a4d38092530a 100644
--- a/arch/s390/include/asm/page.h
+++ b/arch/s390/include/asm/page.h
@@ -177,8 +177,6 @@ static inline int devmem_is_allowed(unsigned long pfn)
 #define VM_DATA_DEFAULT_FLAGS  (VM_READ | VM_WRITE | \
 VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
 
-#define ARCH_ZONE_DMA_BITS 31
-
 #include 
 #include 
 
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 20340a03ad90..07d93955d3e4 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -118,6 +118,7 @@ void __init paging_init(void)
 
sparse_memory_present_with_active_regions(MAX_NUMNODES);
sparse_init();
+   arch_zone_dma_bits = 31;
memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
max_zone_pfns[ZONE_DMA] = PFN_DOWN(MAX_DMA_ADDRESS);
max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index adf993a3bd58..a1b353b77858 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -5,6 +5,8 @@
 #include 
 #include 
 
+extern unsigned int arch_zone_dma_bits;
+
 #ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA
 #include 
 #else
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 59bdceea3737..40dfc9b4ee4c 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -19,9 +19,7 @@
  * Most architectures use ZONE_DMA for the first 16 Megabytes, but
  * some use it for entirely different regions:
  */
-#ifndef ARCH_ZONE_DMA_BITS
-#define ARCH_ZONE_DMA_BITS 24
-#endif
+unsigned int arch_zone_dma_bits __ro_after_init = 24;
 
 static void report_addr(struct device *dev, dma_addr_t dma_addr, size_t size)
 {
@@ -72,7 +70,7 @@ static gfp_t __dma_direct_optimal_gfp_mask(struct device 
*dev, u64 dma_mask,
 * Note that GFP_DMA32 and GFP_DMA are no ops without the corresponding
 * zones.
 */
-   if (*phys_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS))
+   if (*phys_mask <= DMA_BIT_MASK(arch_zone_dma_bits))
return GFP_DMA;
if (*phys_mask <= DMA_BIT_MASK(32))
return GFP_DMA32;
@@ -387,7 +385,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
u64 min_mask;
 
if (IS_ENABLED(CONFIG_ZONE_DMA))
-   min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS);
+   min_mask = DMA_BIT_MASK(arch_zone_dma_bits);
else
min_mask = DMA_BIT_MASK(32);
 
-- 
2.22.0

[PATCH 8/8] mm: comment arm64's usage of 'enum zone_type'

2019-07-31 Thread Nicolas Saenz Julienne

arm64 uses both ZONE_DMA and ZONE_DMA32 for the same reasons x86_64
does: peripherals with different DMA addressing limitations. This
updates both ZONE_DMAs comments to inform about the usage.

Signed-off-by: Nicolas Saenz Julienne 

---

 include/linux/mmzone.h | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d77d717c620c..8fa6bcf72e7c 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -365,23 +365,24 @@ enum zone_type {
 *
 * Some examples
 *
-* Architecture Limit
-* ---
-* parisc, ia64, sparc  <4G
-* s390, powerpc<2G
-* arm  Various
-* alphaUnlimited or 0-16MB.
+* Architecture Limit
+* --
+* parisc, ia64, sparc, arm64   <4G
+* s390, powerpc<2G
+* arm  Various
+* alphaUnlimited or 0-16MB.
 *
 * i386, x86_64 and multiple other arches
-*  <16M.
+*  <16M.
 */
ZONE_DMA,
 #endif
 #ifdef CONFIG_ZONE_DMA32
/*
-* x86_64 needs two ZONE_DMAs because it supports devices that are
-* only able to do DMA to the lower 16M but also 32 bit devices that
-* can only do DMA areas below 4G.
+* x86_64 and arm64 need two ZONE_DMAs because they support devices
+* that are only able to DMA a fraction of the 32 bit addressable
+* memory area, but also devices that are limited to that whole 32 bit
+* area.
 */
ZONE_DMA32,
 #endif
-- 
2.22.0

[PATCH 7/8] arm64: update arch_zone_dma_bits to fine tune dma-direct min mask

2019-07-31 Thread Nicolas Saenz Julienne

With the introduction of ZONE_DMA in arm64 devices are not forced to
support 32 bit DMA masks. We have to inform dma-direct of this
limitation whenever it happens.

Signed-off-by: Nicolas Saenz Julienne 
---

 arch/arm64/mm/init.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index f5279ef85756..b809f3259340 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -439,10 +440,14 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
-   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
arm64_dma_phys_limit = max_zone_dma_phys();
-   else
+
+   if (arm64_dma_phys_limit)
+   arch_zone_dma_bits = ilog2(arm64_dma_phys_limit) + 1;
+   } else {
arm64_dma_phys_limit = 0;
+   }
 
/* 4GB maximum for 32-bit only capable devices */
if (IS_ENABLED(CONFIG_ZONE_DMA32))
-- 
2.22.0

[PATCH 1/8] arm64: mm: use arm64_dma_phys_limit instead of calling max_zone_dma_phys()

2019-07-31 Thread Nicolas Saenz Julienne

By the time we call zones_sizes_init() arm64_dma_phys_limit already
contains the result of max_zone_dma_phys(). We use the variable instead
of calling the function directly to save some precious cpu time.

Signed-off-by: Nicolas Saenz Julienne 
---

 arch/arm64/mm/init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index f3c795278def..6112d6c90fa8 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -181,7 +181,7 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA32
-   max_zone_pfns[ZONE_DMA32] = PFN_DOWN(max_zone_dma_phys());
+   max_zone_pfns[ZONE_DMA32] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
max_zone_pfns[ZONE_NORMAL] = max;
 
-- 
2.22.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 0/8] Raspberry Pi 4 DMA addressing support

2019-07-31 Thread Nicolas Saenz Julienne

Hi all,
this series attempts to address some issues we found while bringing up
the new Raspberry Pi 4 in arm64 and it's intended to serve as a follow
up of this discussion:
https://lkml.org/lkml/2019/7/17/476

The new Raspberry Pi 4 has up to 4GB of memory but most peripherals can
only address the first GB: their DMA address range is
0xc000-0xfc00 which is aliased to the first GB of physical
memory 0x-0x3c00. Note that only some peripherals have these
limitations: the ARM cores, PCIe, V3D, GENET, and 40-bit DMA channels
have a wider view of the address space.

Part of this is solved in arm32 by setting up the machine specific
'.dma_zone_size = SZ_1G', which takes care of the allocating the
coherent memory area at the right spot. Yet no buffer bouncing (needed
for dma streaming) is available at the moment, but that's a story for
another series.

Unfortunately there is no such thing as '.dma_zone_size' in arm64 also
only ZONE_DMA32 is created which is interpreted by dma-direct and the
arm64 code as if all peripherals where be able to address the first 4GB
of memory.

In the light of this, the series implements the following changes:

- Add code that parses the device-tree in oder to find the SoC's common
  DMA area.

- Create a ZONE_DMA whenever that area is needed and add the rest of the
  lower 4 GB of memory to ZONE_DMA32*.

- Create the CMA area in a place suitable for all peripherals.

- Inform dma-direct of the new runtime calculated min_mask*.

That's all.

Regards,
Nicolas

* These solutions where already discussed on the previous RFC (see link
above).

---

Nicolas Saenz Julienne (8):
  arm64: mm: use arm64_dma_phys_limit instead of calling
max_zone_dma_phys()
  arm64: rename variables used to calculate ZONE_DMA32's size
  of/fdt: add function to get the SoC wide DMA addressable memory size
  arm64: re-introduce max_zone_dma_phys()
  arm64: use ZONE_DMA on DMA addressing limited devices
  dma-direct: turn ARCH_ZONE_DMA_BITS into a variable
  arm64: update arch_zone_dma_bits to fine tune dma-direct min mask
  mm: comment arm64's usage of 'enum zone_type'

 arch/arm64/Kconfig  |  4 ++
 arch/arm64/mm/init.c| 78 ++---
 arch/powerpc/include/asm/page.h |  9 
 arch/powerpc/mm/mem.c   | 14 +-
 arch/s390/include/asm/page.h|  2 -
 arch/s390/mm/init.c |  1 +
 drivers/of/fdt.c| 72 ++
 include/linux/dma-direct.h  |  2 +
 include/linux/mmzone.h  | 21 -
 include/linux/of_fdt.h  |  2 +
 kernel/dma/direct.c |  8 ++--
 11 files changed, 168 insertions(+), 45 deletions(-)

-- 
2.22.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 5/8] arm64: use ZONE_DMA on DMA addressing limited devices

2019-08-01 Thread Nicolas Saenz Julienne

On Wed, 2019-07-31 at 18:07 +0100, Catalin Marinas wrote:
> On Wed, Jul 31, 2019 at 05:47:48PM +0200, Nicolas Saenz Julienne wrote:
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 1c4ffabbe1cb..f5279ef85756 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -50,6 +50,13 @@
> >  s64 memstart_addr __ro_after_init = -1;
> >  EXPORT_SYMBOL(memstart_addr);
> >  
> > +/*
> > + * We might create both a ZONE_DMA and ZONE_DMA32. ZONE_DMA is needed if
> > there
> > + * are periferals unable to address the first naturally aligned 4GB of ram.
> > + * ZONE_DMA32 will be expanded to cover the rest of that memory. If such
> > + * limitations doesn't exist only ZONE_DMA32 is created.
> > + */
> 
> Shouldn't we instead only create ZONE_DMA to cover the whole 32-bit
> range and leave ZONE_DMA32 empty? Can__GFP_DMA allocations fall back
> onto ZONE_DMA32?

Hi Catalin, thanks for the review.

You're right, the GFP_DMA page allocation will fail with a nasty dmesg error if
ZONE_DMA is configured but empty. Unsurprisingly the opposite situation is fine
(GFP_DMA32 with an empty ZONE_DMA32).

I switched to the scheme you're suggesting for the next version of the series.
The comment will be something the likes of this:

/*
 * We create both a ZONE_DMA and ZONE_DMA32. ZONE_DMA's size is decided based
 * on whether the SoC's peripherals are able to address the first naturally
 * aligned 4 GB of ram.
 *
 * If limited, ZONE_DMA covers that area and ZONE_DMA32 the rest of that 32 bit
 * addressable memory.
 *
 * If not ZONE_DMA is expanded to cover the whole 32 bit addressable memory and
 * ZONE_DMA32 is left empty.
 */

 Regards,
 Nicolas

signature.asc
Description: This is a digitally signed message part

Re: [PATCH 6/8] dma-direct: turn ARCH_ZONE_DMA_BITS into a variable

2019-08-01 Thread Nicolas Saenz Julienne

Hi Christoph, thanks for the review.

On Thu, 2019-08-01 at 16:04 +0200, Christoph Hellwig wrote:
> A few nitpicks, otherwise this looks great:
> 
> > @@ -201,7 +202,7 @@ static int __init mark_nonram_nosave(void)
> >   * everything else. GFP_DMA32 page allocations automatically fall back to
> >   * ZONE_DMA.
> >   *
> > - * By using 31-bit unconditionally, we can exploit ARCH_ZONE_DMA_BITS to
> > + * By using 31-bit unconditionally, we can exploit arch_zone_dma_bits to
> >   * inform the generic DMA mapping code.  32-bit only devices (if not
> > handled
> >   * by an IOMMU anyway) will take a first dip into ZONE_NORMAL and get
> >   * otherwise served by ZONE_DMA.
> > @@ -237,9 +238,18 @@ void __init paging_init(void)
> > printk(KERN_DEBUG "Memory hole size: %ldMB\n",
> >(long int)((top_of_ram - total_ram) >> 20));
> >  
> > +   /*
> > +* Allow 30-bit DMA for very limited Broadcom wifi chips on many
> > +* powerbooks.
> > +*/
> > +   if (IS_ENABLED(CONFIG_PPC32))
> > +   arch_zone_dma_bits = 30;
> > +   else
> > +   arch_zone_dma_bits = 31;
> > +
> 
> So the above unconditionally comment obviously isn't true any more, and
> Ben also said for the recent ppc32 hack he'd prefer dynamic detection.
> 
> Maybe Ben and or other ppc folks can chime in an add a patch to the series
> to sort this out now that we have a dynamic ZONE_DMA threshold?

Noted, for now I'll remove the comment.

> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 59bdceea3737..40dfc9b4ee4c 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -19,9 +19,7 @@
> >   * Most architectures use ZONE_DMA for the first 16 Megabytes, but
> >   * some use it for entirely different regions:
> >   */
> > -#ifndef ARCH_ZONE_DMA_BITS
> > -#define ARCH_ZONE_DMA_BITS 24
> > -#endif
> > +unsigned int arch_zone_dma_bits __ro_after_init = 24;
> 
> I'd prefer to drop the arch_ prefix and just calls this zone_dma_bits.
> In the long run we really need to find a way to just automatically set
> this from the meminit code, but that is out of scope for this series.
> For now can you please just update the comment above to say something
> like:
> 
> /*
>  * Most architectures use ZONE_DMA for the first 16 Megabytes, but some use it
>  * it for entirely different regions.  In that case the arch code needs to
>  * override the variable below for dma-direct to work properly.
>  */

Ok perfect.



signature.asc
Description: This is a digitally signed message part

Re: [PATCH 5/8] arm64: use ZONE_DMA on DMA addressing limited devices

2019-08-01 Thread Nicolas Saenz Julienne

On Thu, 2019-08-01 at 17:07 +0100, Robin Murphy wrote:
> On 2019-08-01 4:44 pm, Nicolas Saenz Julienne wrote:
> > On Wed, 2019-07-31 at 18:07 +0100, Catalin Marinas wrote:
> > > On Wed, Jul 31, 2019 at 05:47:48PM +0200, Nicolas Saenz Julienne wrote:
> > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > > > index 1c4ffabbe1cb..f5279ef85756 100644
> > > > --- a/arch/arm64/mm/init.c
> > > > +++ b/arch/arm64/mm/init.c
> > > > @@ -50,6 +50,13 @@
> > > >   s64 memstart_addr __ro_after_init = -1;
> > > >   EXPORT_SYMBOL(memstart_addr);
> > > >   
> > > > +/*
> > > > + * We might create both a ZONE_DMA and ZONE_DMA32. ZONE_DMA is needed
> > > > if
> > > > there
> > > > + * are periferals unable to address the first naturally aligned 4GB of
> > > > ram.
> > > > + * ZONE_DMA32 will be expanded to cover the rest of that memory. If
> > > > such
> > > > + * limitations doesn't exist only ZONE_DMA32 is created.
> > > > + */
> > > 
> > > Shouldn't we instead only create ZONE_DMA to cover the whole 32-bit
> > > range and leave ZONE_DMA32 empty? Can__GFP_DMA allocations fall back
> > > onto ZONE_DMA32?
> > 
> > Hi Catalin, thanks for the review.
> > 
> > You're right, the GFP_DMA page allocation will fail with a nasty dmesg error
> > if
> > ZONE_DMA is configured but empty. Unsurprisingly the opposite situation is
> > fine
> > (GFP_DMA32 with an empty ZONE_DMA32).
> 
> Was that tested on something other than RPi4 with more than 4GB of RAM? 
> (i.e. with a non-empty ZONE_NORMAL either way)

No, all I did is play around with RPi4's memory size (1 GB vs 4 GB).

I'll see If I can get access to a dts based board with more than 4 GB, If not
I'll try to fake it. It's not ideal but I can set the limit on 3 GB and have
the 3 areas created (with and witouth an empty ZONE_DMA32).

On top of that, now that you ask, I realise I neglected all the ACPI based
servers. I have access to some so I'll make sure I test everything on them too
for the next series.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part

Re: [PATCH 3/8] of/fdt: add function to get the SoC wide DMA addressable memory size

2019-08-05 Thread Nicolas Saenz Julienne

Hi Rob,
Thanks for the review!

On Fri, 2019-08-02 at 11:17 -0600, Rob Herring wrote:
> On Wed, Jul 31, 2019 at 9:48 AM Nicolas Saenz Julienne
>  wrote:
> > Some SoCs might have multiple interconnects each with their own DMA
> > addressing limitations. This function parses the 'dma-ranges' on each of
> > them and tries to guess the maximum SoC wide DMA addressable memory
> > size.
> > 
> > This is specially useful for arch code in order to properly setup CMA
> > and memory zones.
> 
> We already have a way to setup CMA in reserved-memory, so why is this
> needed for that?

Correct me if I'm wrong but I got the feeling you got the point of the patch
later on.

> > Signed-off-by: Nicolas Saenz Julienne 
> > ---
> > 
> >  drivers/of/fdt.c   | 72 ++
> >  include/linux/of_fdt.h |  2 ++
> >  2 files changed, 74 insertions(+)
> > 
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index 9cdf14b9aaab..f2444c61a136 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -953,6 +953,78 @@ int __init early_init_dt_scan_chosen_stdout(void)
> >  }
> >  #endif
> > 
> > +/**
> > + * early_init_dt_dma_zone_size - Look at all 'dma-ranges' and provide the
> > + * maximum common dmable memory size.
> > + *
> > + * Some devices might have multiple interconnects each with their own DMA
> > + * addressing limitations. For example the Raspberry Pi 4 has the
> > following:
> > + *
> > + * soc {
> > + * dma-ranges = <0xc000  0x0 0x  0x3c00>;
> > + * [...]
> > + * }
> > + *
> > + * v3dbus {
> > + * dma-ranges = <0x  0x0 0x  0x3c00>;
> > + * [...]
> > + * }
> > + *
> > + * scb {
> > + * dma-ranges = <0x0 0x  0x0 0x  0xfc00>;
> > + * [...]
> > + * }
> > + *
> > + * Here the area addressable by all devices is [0x-0x3bff].
> > Hence
> > + * the function will write in 'data' a size of 0x3c00.
> > + *
> > + * Note that the implementation assumes all interconnects have the same
> > physical
> > + * memory view and that the mapping always start at the beginning of RAM.
> 
> Not really a valid assumption for general code.

Fair enough. On my defence I settled on that assumption after grepping all dts
and being unable to find a board that behaved otherwise.

[...]

> It's possible to have multiple levels of nodes and dma-ranges. You need to
> handle that case too. Doing that and handling differing address translations
> will be complicated.

Understood.

> IMO, I'd just do:
> 
> if (of_fdt_machine_is_compatible(blob, "brcm,bcm2711"))
> dma_zone_size = XX;
> 
> 2 lines of code is much easier to maintain than 10s of incomplete code
> and is clearer who needs this. Maybe if we have dozens of SoCs with
> this problem we should start parsing dma-ranges.

FYI that's what arm32 is doing at the moment and was my first instinct. But it
seems that arm64 has been able to survive so far without any machine specific
code and I have the feeling Catalin and Will will not be happy about this
solution. Am I wrong?



signature.asc
Description: This is a digitally signed message part

Re: [PATCH 3/8] of/fdt: add function to get the SoC wide DMA addressable memory size

2019-08-06 Thread Nicolas Saenz Julienne

Hi Rob,

On Mon, 2019-08-05 at 13:23 -0600, Rob Herring wrote:
> On Mon, Aug 5, 2019 at 10:03 AM Nicolas Saenz Julienne
>  wrote:
> > Hi Rob,
> > Thanks for the review!
> > 
> > On Fri, 2019-08-02 at 11:17 -0600, Rob Herring wrote:
> > > On Wed, Jul 31, 2019 at 9:48 AM Nicolas Saenz Julienne
> > >  wrote:
> > > > Some SoCs might have multiple interconnects each with their own DMA
> > > > addressing limitations. This function parses the 'dma-ranges' on each of
> > > > them and tries to guess the maximum SoC wide DMA addressable memory
> > > > size.
> > > > 
> > > > This is specially useful for arch code in order to properly setup CMA
> > > > and memory zones.
> > > 
> > > We already have a way to setup CMA in reserved-memory, so why is this
> > > needed for that?
> > 
> > Correct me if I'm wrong but I got the feeling you got the point of the patch
> > later on.
> 
> No, for CMA I don't. Can't we already pass a size and location for CMA
> region under /reserved-memory. The only advantage here is perhaps the
> CMA range could be anywhere in the DMA zone vs. a fixed location.

Now I get it, sorry I wasn't aware of that interface.

Still, I'm not convinced it matches RPi's use case as this would hard-code
CMA's size. Most people won't care, but for the ones that do, it's nicer to
change the value from the kernel command line than editing the dtb. I get that
if you need to, for example, reserve some memory for the video to work, it's
silly not to hard-code it. Yet due to the board's nature and users base I say
it's important to favor flexibility. It would also break compatibility with
earlier versions of the board and diverge from the downstream kernel behaviour.
Which is a bigger issue than it seems as most users don't always understand
which kernel they are running and unknowingly copy configuration options from
forums.

As I also need to know the DMA addressing limitations to properly configure
memory zones and dma-direct. Setting up the proper CMA constraints during the
arch's init will be trivial anyway.

> > > IMO, I'd just do:
> > > 
> > > if (of_fdt_machine_is_compatible(blob, "brcm,bcm2711"))
> > > dma_zone_size = XX;
> > > 
> > > 2 lines of code is much easier to maintain than 10s of incomplete code
> > > and is clearer who needs this. Maybe if we have dozens of SoCs with
> > > this problem we should start parsing dma-ranges.
> > 
> > FYI that's what arm32 is doing at the moment and was my first instinct. But
> > it
> > seems that arm64 has been able to survive so far without any machine
> > specific
> > code and I have the feeling Catalin and Will will not be happy about this
> > solution. Am I wrong?
> 
> No doubt. I'm fine if the 2 lines live in drivers/of/.
> 
> Note that I'm trying to reduce the number of early_init_dt_scan_*
> calls from arch code into the DT code so there's more commonality
> across architectures in the early DT scans. So ideally, this can all
> be handled under early_init_dt_scan() call.

How does this look? (I'll split it in two patches and add a comment explaining
why dt_dma_zone_size is needed)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index f2444c61a136..1395be40b722 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -30,6 +30,8 @@
 
 #include "of_private.h"
 
+u64 dt_dma_zone_size __ro_after_init;
+
 /*
  * of_fdt_limit_memory - limit the number of regions in the /memory node
  * @limit: maximum entries
@@ -802,6 +805,11 @@ const char * __init of_flat_dt_get_machine_name(void)
return name;
 }
 
+static const int __init of_fdt_machine_is_compatible(char *name)
+{
+   return of_compat_cmp(of_flat_dt_get_machine_name(), name, strlen(name));
+}
+
 /**
  * of_flat_dt_match_machine - Iterate match tables to find matching machine.
  *
@@ -1260,6 +1268,14 @@ void __init early_init_dt_scan_nodes(void)
of_scan_flat_dt(early_init_dt_scan_memory, NULL);
 }
 
+void __init early_init_dt_get_dma_zone_size(void)
+{
+   dt_dma_zone_size = 0;
+
+   if (of_fdt_machine_is_compatible("brcm,bcm2711"))
+   dt_dma_zone_size = 0x3c00;
+}
+
 bool __init early_init_dt_scan(void *params)
 {
bool status;
@@ -1269,6 +1285,7 @@ bool __init early_init_dt_scan(void *params)
return false;
 
early_init_dt_scan_nodes();
+   early_init_dt_get_dma_zone_size();
return true;
 }
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index 2ad36b7bd4fa..b5a9f685de14 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -27,6 +27,8

Re: [PATCH 3/8] of/fdt: add function to get the SoC wide DMA addressable memory size

2019-08-08 Thread Nicolas Saenz Julienne

On Thu, 2019-08-08 at 09:02 -0600, Rob Herring wrote:
> On Tue, Aug 6, 2019 at 12:12 PM Nicolas Saenz Julienne
>  wrote:
> > Hi Rob,
> > 
> > On Mon, 2019-08-05 at 13:23 -0600, Rob Herring wrote:
> > > On Mon, Aug 5, 2019 at 10:03 AM Nicolas Saenz Julienne
> > >  wrote:
> > > > Hi Rob,
> > > > Thanks for the review!
> > > > 
> > > > On Fri, 2019-08-02 at 11:17 -0600, Rob Herring wrote:
> > > > > On Wed, Jul 31, 2019 at 9:48 AM Nicolas Saenz Julienne
> > > > >  wrote:
> > > > > > Some SoCs might have multiple interconnects each with their own DMA
> > > > > > addressing limitations. This function parses the 'dma-ranges' on
> > > > > > each of
> > > > > > them and tries to guess the maximum SoC wide DMA addressable memory
> > > > > > size.
> > > > > > 
> > > > > > This is specially useful for arch code in order to properly setup
> > > > > > CMA
> > > > > > and memory zones.
> > > > > 
> > > > > We already have a way to setup CMA in reserved-memory, so why is this
> > > > > needed for that?
> > > > 
> > > > Correct me if I'm wrong but I got the feeling you got the point of the
> > > > patch
> > > > later on.
> > > 
> > > No, for CMA I don't. Can't we already pass a size and location for CMA
> > > region under /reserved-memory. The only advantage here is perhaps the
> > > CMA range could be anywhere in the DMA zone vs. a fixed location.
> > 
> > Now I get it, sorry I wasn't aware of that interface.
> > 
> > Still, I'm not convinced it matches RPi's use case as this would hard-code
> > CMA's size. Most people won't care, but for the ones that do, it's nicer to
> > change the value from the kernel command line than editing the dtb.
> 
> Sure, I fully agree and am not a fan of the CMA DT overlays I've seen.
> 
> > I get that
> > if you need to, for example, reserve some memory for the video to work, it's
> > silly not to hard-code it. Yet due to the board's nature and users base I
> > say
> > it's important to favor flexibility. It would also break compatibility with
> > earlier versions of the board and diverge from the downstream kernel
> > behaviour.
> > Which is a bigger issue than it seems as most users don't always understand
> > which kernel they are running and unknowingly copy configuration options
> > from
> > forums.
> > 
> > As I also need to know the DMA addressing limitations to properly configure
> > memory zones and dma-direct. Setting up the proper CMA constraints during
> > the
> > arch's init will be trivial anyway.
> 
> It was really just commentary on commit text as for CMA alone we have
> a solution already. I agree on the need for zones.

Ok, understood :)

> > > > > IMO, I'd just do:
> > > > > 
> > > > > if (of_fdt_machine_is_compatible(blob, "brcm,bcm2711"))
> > > > > dma_zone_size = XX;
> > > > > 
> > > > > 2 lines of code is much easier to maintain than 10s of incomplete code
> > > > > and is clearer who needs this. Maybe if we have dozens of SoCs with
> > > > > this problem we should start parsing dma-ranges.
> > > > 
> > > > FYI that's what arm32 is doing at the moment and was my first instinct.
> > > > But
> > > > it
> > > > seems that arm64 has been able to survive so far without any machine
> > > > specific
> > > > code and I have the feeling Catalin and Will will not be happy about
> > > > this
> > > > solution. Am I wrong?
> > > 
> > > No doubt. I'm fine if the 2 lines live in drivers/of/.
> > > 
> > > Note that I'm trying to reduce the number of early_init_dt_scan_*
> > > calls from arch code into the DT code so there's more commonality
> > > across architectures in the early DT scans. So ideally, this can all
> > > be handled under early_init_dt_scan() call.
> > 
> > How does this look? (I'll split it in two patches and add a comment
> > explaining
> > why dt_dma_zone_size is needed)
> > 
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index f2444c61a136..1395be40b722 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -30,6 +30,8 @@
> > 
> >  #include "of_private.h"
> > 
> > +u64 dt_dma_zone_size __ro_after_init;
> 
> Avoiding a call from arch code by just having a variable isn't really
> better. I'd rather see a common, non DT specific variable that can be
> adjusted. Something similar to initrd_start/end. Then the arch code
> doesn't have to care what hardware description code adjusted the
> value.

Way better, I'll update it.



signature.asc
Description: This is a digitally signed message part

[PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-07 Thread Nicolas Saenz Julienne

When allocating atomic DMA memory for a device, the dma-pool core
queries __dma_direct_optimal_gfp_mask() to check which atomic pool to
use. It turns out the GFP flag returned is only an optimistic guess.
The pool selected might sometimes live in a zone higher than the
device's view of memory.

As there isn't a way to grantee a mapping between a device's DMA
constraints and correct GFP flags this unifies both DMA atomic pools.
The resulting pool is allocated in the lower DMA zone available, if any,
so as for devices to always get accessible memory while having the
flexibility of using dma_pool_kernel for the non constrained ones.

Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
mask")
Reported-by: Jeremy Linton 
Suggested-by: Robin Murphy 
Signed-off-by: Nicolas Saenz Julienne 
---
 kernel/dma/pool.c | 47 +++
 1 file changed, 19 insertions(+), 28 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 8cfa01243ed2..883f7a583969 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -13,10 +13,11 @@
 #include 
 #include 
 
+#define GFP_ATOMIC_POOL_DMA(IS_ENABLED(CONFIG_ZONE_DMA) ? GFP_DMA : \
+IS_ENABLED(CONFIG_ZONE_DMA32) ? GFP_DMA32 : 0)
+
 static struct gen_pool *atomic_pool_dma __ro_after_init;
 static unsigned long pool_size_dma;
-static struct gen_pool *atomic_pool_dma32 __ro_after_init;
-static unsigned long pool_size_dma32;
 static struct gen_pool *atomic_pool_kernel __ro_after_init;
 static unsigned long pool_size_kernel;
 
@@ -42,16 +43,13 @@ static void __init dma_atomic_pool_debugfs_init(void)
return;
 
debugfs_create_ulong("pool_size_dma", 0400, root, &pool_size_dma);
-   debugfs_create_ulong("pool_size_dma32", 0400, root, &pool_size_dma32);
debugfs_create_ulong("pool_size_kernel", 0400, root, &pool_size_kernel);
 }
 
 static void dma_atomic_pool_size_add(gfp_t gfp, size_t size)
 {
-   if (gfp & __GFP_DMA)
+   if (gfp & GFP_ATOMIC_POOL_DMA)
pool_size_dma += size;
-   else if (gfp & __GFP_DMA32)
-   pool_size_dma32 += size;
else
pool_size_kernel += size;
 }
@@ -132,12 +130,11 @@ static void atomic_pool_resize(struct gen_pool *pool, 
gfp_t gfp)
 
 static void atomic_pool_work_fn(struct work_struct *work)
 {
-   if (IS_ENABLED(CONFIG_ZONE_DMA))
-   atomic_pool_resize(atomic_pool_dma,
-  GFP_KERNEL | GFP_DMA);
-   if (IS_ENABLED(CONFIG_ZONE_DMA32))
-   atomic_pool_resize(atomic_pool_dma32,
-  GFP_KERNEL | GFP_DMA32);
+   gfp_t dma_gfp = GFP_ATOMIC_POOL_DMA;
+
+   if (dma_gfp)
+   atomic_pool_resize(atomic_pool_dma, GFP_KERNEL | dma_gfp);
+
atomic_pool_resize(atomic_pool_kernel, GFP_KERNEL);
 }
 
@@ -168,6 +165,7 @@ static __init struct gen_pool 
*__dma_atomic_pool_init(size_t pool_size,
 
 static int __init dma_atomic_pool_init(void)
 {
+   gfp_t dma_gfp = GFP_ATOMIC_POOL_DMA;
int ret = 0;
 
/*
@@ -185,18 +183,13 @@ static int __init dma_atomic_pool_init(void)
GFP_KERNEL);
if (!atomic_pool_kernel)
ret = -ENOMEM;
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+
+   if (dma_gfp) {
atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
-   GFP_KERNEL | GFP_DMA);
+GFP_KERNEL | dma_gfp);
if (!atomic_pool_dma)
ret = -ENOMEM;
}
-   if (IS_ENABLED(CONFIG_ZONE_DMA32)) {
-   atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size,
-   GFP_KERNEL | GFP_DMA32);
-   if (!atomic_pool_dma32)
-   ret = -ENOMEM;
-   }
 
dma_atomic_pool_debugfs_init();
return ret;
@@ -206,14 +199,12 @@ postcore_initcall(dma_atomic_pool_init);
 static inline struct gen_pool *dev_to_pool(struct device *dev)
 {
u64 phys_mask;
-   gfp_t gfp;
-
-   gfp = dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
- &phys_mask);
-   if (IS_ENABLED(CONFIG_ZONE_DMA) && gfp == GFP_DMA)
-   return atomic_pool_dma;
-   if (IS_ENABLED(CONFIG_ZONE_DMA32) && gfp == GFP_DMA32)
-   return atomic_pool_dma32;
+
+   if (atomic_pool_dma &&
+   dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+   &phys_mask))
+   return atomic_pool_dma;
+
return atomic_pool_kernel;
 }
 
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-08 Thread Nicolas Saenz Julienne

Hi Jim,

On Tue, 2020-07-07 at 17:08 -0500, Jeremy Linton wrote:
> Hi,
> 
> I spun this up on my 8G model using the PFTF firmware from:
> 
> https://github.com/pftf/RPi4/releases
> 
> Which allows me to switch between ACPI/DT on the machine. In DT mode it 
> works fine now, 

Nice, would that count as a Tested-by from you?

> but with ACPI I continue to have failures unless I 
> disable CMA via cma=0 on the kernel command line. 

Yes, I see why, in atomic_pool_expand() memory is allocated from CMA without
checking its correctness. That calls for a separate fix. I'll try to think of
something.

> It think that is because
> 
> using DT:
> 
> [0.00] Reserved memory: created CMA memory pool at
> 0x3740, size 64 MiB
> 
> 
> using ACPI:
> [0.00] cma: Reserved 64 MiB at 0xf800
> 
> Which is AFAIK because the default arm64 CMA allocation is just below 
> the arm64_dma32_phys_limit.

As I'm sure you know, we fix the CMA address trough DT, isn't that possible
trough ACPI?

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-08 Thread Nicolas Saenz Julienne

On Wed, 2020-07-08 at 17:35 +0200, Christoph Hellwig wrote:
> On Tue, Jul 07, 2020 at 02:28:04PM +0200, Nicolas Saenz Julienne wrote:
> > When allocating atomic DMA memory for a device, the dma-pool core
> > queries __dma_direct_optimal_gfp_mask() to check which atomic pool to
> > use. It turns out the GFP flag returned is only an optimistic guess.
> > The pool selected might sometimes live in a zone higher than the
> > device's view of memory.
> > 
> > As there isn't a way to grantee a mapping between a device's DMA
> > constraints and correct GFP flags this unifies both DMA atomic pools.
> > The resulting pool is allocated in the lower DMA zone available, if any,
> > so as for devices to always get accessible memory while having the
> > flexibility of using dma_pool_kernel for the non constrained ones.
> > 
> > Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp
> > mask")
> > Reported-by: Jeremy Linton 
> > Suggested-by: Robin Murphy 
> > Signed-off-by: Nicolas Saenz Julienne 
> 
> Hmm, this is not what I expected from the previous thread.  I thought
> we'd just use one dma pool based on runtime available of the zones..

I may be misunderstanding you, but isn't that going back to how things used to
be before pulling in David Rientjes' work? The benefit of having a GFP_KERNEL
pool is that non-address-constrained devices can get their atomic memory there,
instead of consuming somewhat scarcer low memory.



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH] dma-pool: Do not allocate pool memory from CMA

2020-07-08 Thread Nicolas Saenz Julienne

There is no guarantee to CMA's placement, so allocating a zone specific
atomic pool from CMA might return memory from a completely different
memory zone. So stop using it.

Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
mask")
Reported-by: Jeremy Linton 
Signed-off-by: Nicolas Saenz Julienne 
---

An more costly alternative would be adding an option to
dma_alloc_from_contiguous() so it fails when the allocation doesn't fall
in a specific zone.

 kernel/dma/pool.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 8cfa01243ed2..4bc1c46ae6ef 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -6,7 +6,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -69,12 +68,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
 
do {
pool_size = 1 << (PAGE_SHIFT + order);
-
-   if (dev_get_cma_area(NULL))
-   page = dma_alloc_from_contiguous(NULL, 1 << order,
-order, false);
-   else
-   page = alloc_pages(gfp, order);
+   page = alloc_pages(gfp, order);
} while (!page && order-- > 0);
if (!page)
goto out;
@@ -118,8 +112,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
dma_common_free_remap(addr, pool_size);
 #endif
 free_page: __maybe_unused
-   if (!dma_release_from_contiguous(NULL, page, 1 << order))
-   __free_pages(page, order);
+   __free_pages(page, order);
 out:
return ret;
 }
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 0/4] dma-pool: Fix atomic pool selection

2020-07-09 Thread Nicolas Saenz Julienne

This is my attempt at fixing one of the regressions we've seen[1] after
the introduction of per-zone atomic pools.

This combined with "dma-pool: Do not allocate pool memory from CMA"[2]
should fix the boot issues on Jeremy's RPi4 setup.

[1] https://lkml.org/lkml/2020/7/2/974
[2] https://lkml.org/lkml/2020/7/8/1108

---

Nicolas Saenz Julienne (4):
  dma-direct: Provide function to check physical memory area validity
  dma-pool: Get rid of dma_in_atomic_pool()
  dma-pool: Introduce dma_guess_pool()
  dma-pool: Make sure atomic pool suits device

 include/linux/dma-direct.h |  1 +
 kernel/dma/direct.c|  2 +-
 kernel/dma/pool.c  | 76 +++---
 3 files changed, 56 insertions(+), 23 deletions(-)

-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 3/4] dma-pool: Introduce dma_guess_pool()

2020-07-09 Thread Nicolas Saenz Julienne

dma-pool's dev_to_pool() creates the false impression that there is a
way to grantee a mapping between a device's DMA constraints and an
atomic pool. It tuns out it's just a guess, and the device might need to
use an atomic pool containing memory from a 'safer' (or lower) memory
zone.

To help mitigate this, introduce dma_guess_pool() which can be fed a
device's DMA constraints or an atomic pool already known to be faulty in
order for it to provide a better guess at which pool to use.

Signed-off-by: Nicolas Saenz Julienne 
---
 kernel/dma/pool.c | 26 +++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 7363640fc91c..3d518de07617 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -203,7 +203,7 @@ static int __init dma_atomic_pool_init(void)
 }
 postcore_initcall(dma_atomic_pool_init);
 
-static inline struct gen_pool *dev_to_pool(struct device *dev)
+static inline struct gen_pool *dma_guess_pool_from_device(struct device *dev)
 {
u64 phys_mask;
gfp_t gfp;
@@ -217,10 +217,30 @@ static inline struct gen_pool *dev_to_pool(struct device 
*dev)
return atomic_pool_kernel;
 }
 
+static inline struct gen_pool *dma_get_safer_pool(struct gen_pool *bad_pool)
+{
+   if (bad_pool == atomic_pool_kernel)
+   return atomic_pool_dma32 ? : atomic_pool_dma;
+
+   if (bad_pool == atomic_pool_dma32)
+   return atomic_pool_dma;
+
+   return NULL;
+}
+
+static inline struct gen_pool *dma_guess_pool(struct device *dev,
+ struct gen_pool *bad_pool)
+{
+   if (bad_pool)
+   return dma_get_safer_pool(bad_pool);
+
+   return dma_guess_pool_from_device(dev);
+}
+
 void *dma_alloc_from_pool(struct device *dev, size_t size,
  struct page **ret_page, gfp_t flags)
 {
-   struct gen_pool *pool = dev_to_pool(dev);
+   struct gen_pool *pool = dma_guess_pool(dev, NULL);
unsigned long val;
void *ptr = NULL;
 
@@ -245,7 +265,7 @@ void *dma_alloc_from_pool(struct device *dev, size_t size,
 
 bool dma_free_from_pool(struct device *dev, void *start, size_t size)
 {
-   struct gen_pool *pool = dev_to_pool(dev);
+   struct gen_pool *pool = dma_guess_pool(dev, NULL);
 
if (!pool || !gen_pool_has_addr(pool, (unsigned long)start, size))
return false;
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 4/4] dma-pool: Make sure atomic pool suits device

2020-07-09 Thread Nicolas Saenz Julienne

When allocating DMA memory from a pool, the core can only guess which
atomic pool will fit a device's constraints. If it doesn't, get a safer
atomic pool and try again.

Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
mask")
Reported-by: Jeremy Linton 
Suggested-by: Robin Murphy 
Signed-off-by: Nicolas Saenz Julienne 
---
 kernel/dma/pool.c | 53 +--
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 3d518de07617..d48d9acb585f 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -240,35 +240,56 @@ static inline struct gen_pool *dma_guess_pool(struct 
device *dev,
 void *dma_alloc_from_pool(struct device *dev, size_t size,
  struct page **ret_page, gfp_t flags)
 {
-   struct gen_pool *pool = dma_guess_pool(dev, NULL);
-   unsigned long val;
+   struct gen_pool *pool = NULL;
+   unsigned long val = 0;
void *ptr = NULL;
-
-   if (!pool) {
-   WARN(1, "%pGg atomic pool not initialised!\n", &flags);
-   return NULL;
+   phys_addr_t phys;
+
+   while (1) {
+   pool = dma_guess_pool(dev, pool);
+   if (!pool) {
+   WARN(1, "Failed to get suitable pool for %s\n",
+dev_name(dev));
+   break;
+   }
+
+   val = gen_pool_alloc(pool, size);
+   if (!val)
+   continue;
+
+   phys = gen_pool_virt_to_phys(pool, val);
+   if (dma_coherent_ok(dev, phys, size))
+   break;
+
+   gen_pool_free(pool, val, size);
+   val = 0;
}
 
-   val = gen_pool_alloc(pool, size);
-   if (val) {
-   phys_addr_t phys = gen_pool_virt_to_phys(pool, val);
 
+   if (val) {
*ret_page = pfn_to_page(__phys_to_pfn(phys));
ptr = (void *)val;
memset(ptr, 0, size);
+
+   if (gen_pool_avail(pool) < atomic_pool_size)
+   schedule_work(&atomic_pool_work);
}
-   if (gen_pool_avail(pool) < atomic_pool_size)
-   schedule_work(&atomic_pool_work);
 
return ptr;
 }
 
 bool dma_free_from_pool(struct device *dev, void *start, size_t size)
 {
-   struct gen_pool *pool = dma_guess_pool(dev, NULL);
+   struct gen_pool *pool = NULL;
+
+   while (1) {
+   pool = dma_guess_pool(dev, pool);
+   if (!pool)
+   return false;
 
-   if (!pool || !gen_pool_has_addr(pool, (unsigned long)start, size))
-   return false;
-   gen_pool_free(pool, (unsigned long)start, size);
-   return true;
+   if (gen_pool_has_addr(pool, (unsigned long)start, size)) {
+   gen_pool_free(pool, (unsigned long)start, size);
+   return true;
+   }
+   }
 }
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 1/4] dma-direct: Provide function to check physical memory area validity

2020-07-09 Thread Nicolas Saenz Julienne

dma_coherent_ok() checks if a physical memory area fits a device's DMA
constraints.

Signed-off-by: Nicolas Saenz Julienne 
---
 include/linux/dma-direct.h | 1 +
 kernel/dma/direct.c| 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index cdfa400f89b3..cb23a8305132 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -69,6 +69,7 @@ static inline bool dma_capable(struct device *dev, dma_addr_t 
addr, size_t size,
 u64 dma_direct_get_required_mask(struct device *dev);
 gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
  u64 *phys_mask);
+bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 93f578a8e613..4de864cacd22 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -70,7 +70,7 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 
dma_mask,
return 0;
 }
 
-static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
+bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 {
return phys_to_dma_direct(dev, phys) + size - 1 <=
min_not_zero(dev->coherent_dma_mask, 
dev->bus_dma_limit);
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 2/4] dma-pool: Get rid of dma_in_atomic_pool()

2020-07-09 Thread Nicolas Saenz Julienne

The function is only used once and can be simplified to a one-liner.

Signed-off-by: Nicolas Saenz Julienne 
---
 kernel/dma/pool.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 8cfa01243ed2..7363640fc91c 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -217,15 +217,6 @@ static inline struct gen_pool *dev_to_pool(struct device 
*dev)
return atomic_pool_kernel;
 }
 
-static bool dma_in_atomic_pool(struct device *dev, void *start, size_t size)
-{
-   struct gen_pool *pool = dev_to_pool(dev);
-
-   if (unlikely(!pool))
-   return false;
-   return gen_pool_has_addr(pool, (unsigned long)start, size);
-}
-
 void *dma_alloc_from_pool(struct device *dev, size_t size,
  struct page **ret_page, gfp_t flags)
 {
@@ -256,7 +247,7 @@ bool dma_free_from_pool(struct device *dev, void *start, 
size_t size)
 {
struct gen_pool *pool = dev_to_pool(dev);
 
-   if (!dma_in_atomic_pool(dev, start, size))
+   if (!pool || !gen_pool_has_addr(pool, (unsigned long)start, size))
return false;
gen_pool_free(pool, (unsigned long)start, size);
return true;
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-10 Thread Nicolas Saenz Julienne

On Thu, 2020-07-09 at 14:49 -0700, David Rientjes wrote:
> On Wed, 8 Jul 2020, Christoph Hellwig wrote:
> 
> > On Wed, Jul 08, 2020 at 06:00:35PM +0200, Nicolas Saenz Julienne wrote:
> > > On Wed, 2020-07-08 at 17:35 +0200, Christoph Hellwig wrote:
> > > > On Tue, Jul 07, 2020 at 02:28:04PM +0200, Nicolas Saenz Julienne wrote:
> > > > > When allocating atomic DMA memory for a device, the dma-pool core
> > > > > queries __dma_direct_optimal_gfp_mask() to check which atomic pool to
> > > > > use. It turns out the GFP flag returned is only an optimistic guess.
> > > > > The pool selected might sometimes live in a zone higher than the
> > > > > device's view of memory.
> > > > > 
> > > > > As there isn't a way to grantee a mapping between a device's DMA
> > > > > constraints and correct GFP flags this unifies both DMA atomic pools.
> > > > > The resulting pool is allocated in the lower DMA zone available, if
> > > > > any,
> > > > > so as for devices to always get accessible memory while having the
> > > > > flexibility of using dma_pool_kernel for the non constrained ones.
> > > > > 
> > > > > Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map
> > > > > to gfp
> > > > > mask")
> > > > > Reported-by: Jeremy Linton 
> > > > > Suggested-by: Robin Murphy 
> > > > > Signed-off-by: Nicolas Saenz Julienne 
> > > > 
> > > > Hmm, this is not what I expected from the previous thread.  I thought
> > > > we'd just use one dma pool based on runtime available of the zones..
> > > 
> > > I may be misunderstanding you, but isn't that going back to how things
> > > used to
> > > be before pulling in David Rientjes' work? The benefit of having a
> > > GFP_KERNEL
> > > pool is that non-address-constrained devices can get their atomic memory
> > > there,
> > > instead of consuming somewhat scarcer low memory.
> > 
> > Yes, I think we are misunderstanding each other.  I don't want to remove
> > any pool, just make better runtime decisions when to use them.
> > 
> 
> Just to be extra explicit for the record and for my own understanding: 
> Nicolas, your patch series "dma-pool: Fix atomic pool selection" obsoletes 
> this patch, right? :)

Yes, that's right.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH] dma-pool: Only allocate from CMA when in same memory zone

2020-07-10 Thread Nicolas Saenz Julienne

There is no guarantee to CMA's placement, so allocating a zone specific
atomic pool from CMA might return memory from a completely different
memory zone. To get around this double check CMA's placement before
allocating from it.

Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
mask")
Reported-by: Jeremy Linton 
Signed-off-by: Nicolas Saenz Julienne 
---

This is a code intensive alternative to "dma-pool: Do not allocate pool
memory from CMA"[1].

[1] https://lkml.org/lkml/2020/7/8/1108

 kernel/dma/pool.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 8cfa01243ed2..ccf3eeb77e00 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -3,6 +3,7 @@
  * Copyright (C) 2012 ARM Ltd.
  * Copyright (C) 2020 Google LLC
  */
+#include 
 #include 
 #include 
 #include 
@@ -56,6 +57,39 @@ static void dma_atomic_pool_size_add(gfp_t gfp, size_t size)
pool_size_kernel += size;
 }
 
+static bool cma_in_zone(gfp_t gfp)
+{
+   u64 zone_dma_end, zone_dma32_end;
+   phys_addr_t base, end;
+   unsigned long size;
+   struct cma *cma;
+
+   cma = dev_get_cma_area(NULL);
+   if (!cma)
+   return false;
+
+   size = cma_get_size(cma);
+   if (!size)
+   return false;
+   base = cma_get_base(cma) - memblock_start_of_DRAM();
+   end = base + size - 1;
+
+   zone_dma_end = IS_ENABLED(CONFIG_ZONE_DMA) ? 
DMA_BIT_MASK(zone_dma_bits) : 0;
+   zone_dma32_end = IS_ENABLED(CONFIG_ZONE_DMA32) ? DMA_BIT_MASK(32) : 0;
+
+   /* CMA can't cross zone boundaries, see cma_activate_area() */
+   if (IS_ENABLED(CONFIG_ZONE_DMA) && gfp & GFP_DMA &&
+  end <= zone_dma_end)
+   return true;
+   else if (IS_ENABLED(CONFIG_ZONE_DMA32) && gfp & GFP_DMA32 &&
+   base > zone_dma_end && end <= zone_dma32_end)
+   return true;
+   else if (base > zone_dma32_end)
+   return true;
+
+   return false;
+}
+
 static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
  gfp_t gfp)
 {
@@ -70,7 +104,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
do {
pool_size = 1 << (PAGE_SHIFT + order);
 
-   if (dev_get_cma_area(NULL))
+   if (cma_in_zone(gfp))
page = dma_alloc_from_contiguous(NULL, 1 << order,
 order, false);
else
-- 
2.27.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 3/4] dma-pool: Introduce dma_guess_pool()

2020-07-14 Thread Nicolas Saenz Julienne

On Tue, 2020-07-14 at 13:22 +0200, Christoph Hellwig wrote:
> This one doesn't appear to actually apply on top of Linus' tree plus the
> two previous patches.

I'll take care of it.



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

1 2 3 >

1 - 100 of 220 matches

Mail list logo