Re: [PATCH 6/9] swiotlb: refactor swiotlb_tbl_map_single

2021-02-22 Thread Nicolas Saenz Julienne
Hi Christoph,

On Sun, 2021-02-07 at 17:03 +0100, Christoph Hellwig wrote:
> Split out a bunch of a self-contained helpers to make the function easier
> to follow.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/swiotlb.c | 179 +--
>  1 file changed, 89 insertions(+), 90 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index b38b1553c4663a..381c24ef1ac1d0 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,134 +468,133 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
> phys_addr_t tlb_addr,
>   }
>  }
>  
> 
> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t 
> orig_addr,
> - size_t mapping_size, size_t alloc_size,
> - enum dma_data_direction dir, unsigned long attrs)
> -{
> - dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
> - unsigned long flags;
> - phys_addr_t tlb_addr;
> - unsigned int nslots, stride, index, wrap;
> - int i;
> - unsigned long mask;
> - unsigned long offset_slots;
> - unsigned long max_slots;
> - unsigned long tmp_io_tlb_used;
> -
> - if (no_iotlb_memory)
> - panic("Can not allocate SWIOTLB buffer earlier and can't now 
> provide you with the DMA bounce buffer");
> -
> - if (mem_encrypt_active())
> - pr_warn_once("Memory encryption is active and system is using 
> DMA bounce buffers\n");
> +#define slot_addr(start, idx)((start) + ((idx) << IO_TLB_SHIFT))
>  
> 
> - if (mapping_size > alloc_size) {
> - dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: 
> %zd bytes)",
> -   mapping_size, alloc_size);
> - return (phys_addr_t)DMA_MAPPING_ERROR;
> - }
> -
> - mask = dma_get_seg_boundary(hwdev);
> +/*
> + * Carefully handle integer overflow which can occur when boundary_mask == 
> ~0UL.
> + */
> +static inline unsigned long get_max_slots(unsigned long boundary_mask)
> +{
> + if (boundary_mask == ~0UL)
> + return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> + return nr_slots(boundary_mask + 1);
> +}
>  
> 
> - tbl_dma_addr &= mask;
> +static unsigned int wrap_index(unsigned int index)
> +{
> + if (index >= io_tlb_nslabs)
> + return 0;
> + return index;
> +}
>  
> 
> - offset_slots = nr_slots(tbl_dma_addr);
> +/*
> + * Find a suitable number of IO TLB entries size that will fit this request 
> and
> + * allocate a buffer from that IO TLB pool.
> + */
> +static int find_slots(struct device *dev, size_t alloc_size)
> +{
> + unsigned long boundary_mask = dma_get_seg_boundary(dev);
> + dma_addr_t tbl_dma_addr =
> + phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
> + unsigned int max_slots = get_max_slots(boundary_mask);

'max_slots' should be 'unsigned long' here. Breaks SWIOTLB on RPi4. Do you want
me to send a fix or you prefer editing the patch?

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 3/3] dt-bindings: Fix errors in 'if' schemas

2021-02-03 Thread Nicolas Saenz Julienne
On Tue, 2021-02-02 at 14:55 -0600, Rob Herring wrote:
> Properties in if/then schemas weren't getting checked by the meta-schemas.
> Enabling meta-schema checks finds several errors.
> 
> The use of an 'items' schema (as opposed to the list form) is wrong in
> some cases as it applies to all entries. 'contains' is the correct schema
> to use in the case of multiple entries.
> 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> Cc: Maxime Ripard 
> Cc: Chen-Yu Tsai 
> Cc: Eric Anholt 
> Cc: Nicolas Saenz Julienne 
> Cc: Florian Fainelli 
> Cc: Ray Jui 
> Cc: Scott Branden 
> Cc: Pavel Machek 
> Cc: Ulf Hansson 
> Cc: Kishon Vijay Abraham I 
> Cc: Vinod Koul 
> Cc: Geert Uytterhoeven 
> Cc: Linus Walleij 
> Cc: Daniel Lezcano 
> Cc: linux-cry...@vger.kernel.org
> Cc: dri-de...@lists.freedesktop.org
> Cc: linux-l...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-g...@vger.kernel.org
> Signed-off-by: Rob Herring 
> ---

Reviewed-by: Nicolas Saenz Julienne 



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

2021-01-13 Thread Nicolas Saenz Julienne
Hi All,

On Tue, 2021-01-12 at 16:03 -0800, Florian Fainelli wrote:
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > Add the initialization function to create restricted DMA pools from
> > matching reserved-memory nodes in the device tree.
> > 
> > Signed-off-by: Claire Chang 
> > ---
> >  include/linux/device.h  |   4 ++
> >  include/linux/swiotlb.h |   7 +-
> >  kernel/dma/Kconfig  |   1 +
> >  kernel/dma/swiotlb.c| 144 ++--
> >  4 files changed, 131 insertions(+), 25 deletions(-)
> > 
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index 89bb8b84173e..ca6f71ec8871 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -413,6 +413,7 @@ struct dev_links_info {
> >   * @dma_pools: Dma pools (if dma'ble device).
> >   * @dma_mem:   Internal for coherent mem override.
> >   * @cma_area:  Contiguous memory area for dma allocations
> > + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
> >   * @archdata:  For arch-specific additions.
> >   * @of_node:   Associated device tree node.
> >   * @fwnode:Associated device node supplied by platform firmware.
> > @@ -515,6 +516,9 @@ struct device {
> >  #ifdef CONFIG_DMA_CMA
> >     struct cma *cma_area;   /* contiguous memory area for dma
> >    allocations */
> > +#endif
> > +#ifdef CONFIG_SWIOTLB
> > +   struct io_tlb_mem   *dma_io_tlb_mem;
> >  #endif
> >     /* arch specific additions */
> >     struct dev_archdata archdata;
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index dd8eb57cbb8f..a1bbd775 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
> >   *
> >   * @start: The start address of the swiotlb memory pool. Used to do a quick
> >   * range check to see if the memory was in fact allocated by this
> > - * API.
> > + * API. For restricted DMA pool, this is device tree adjustable.
> 
> Maybe write it as this is "firmware adjustable" such that when/if ACPI
> needs something like this, the description does not need updating.
> 
> [snip]
> 
> > +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> > +   struct device *dev)
> > +{
> > +   struct io_tlb_mem *mem = rmem->priv;
> > +   int ret;
> > +
> > +   if (dev->dma_io_tlb_mem)
> > +   return -EBUSY;
> > +
> > +   if (!mem) {
> > +   mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > +   if (!mem)
> > +   return -ENOMEM;
> > +
> > +   if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
> 
> MEMREMAP_WB sounds appropriate as a default.

As per the binding 'no-map' has to be disabled here. So AFAIU, this memory will
be part of the linear mapping. Is this really needed then?

> Documentation/devicetree/bindings/reserved-memory/ramoops.txt does
> define an "unbuffered" property which in premise could be applied to the
> generic reserved memory binding as well and that we may have to be
> honoring here, if we were to make it more generic. Oh well, this does
> not need to be addressed right now I guess.





signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v7 7/7] mm: Remove examples from enum zone_type comment

2020-11-19 Thread Nicolas Saenz Julienne
We can't really list every setup in common code. On top of that they are
unlikely to stay true for long as things change in the arch trees
independently of this comment.

Suggested-by: Christoph Hellwig 
Signed-off-by: Nicolas Saenz Julienne 
Reviewed-by: Christoph Hellwig 
---
 include/linux/mmzone.h | 20 
 1 file changed, 20 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8b074e2ba12c..15132adaa233 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -355,26 +355,6 @@ enum zone_type {
 * DMA mask is assumed when ZONE_DMA32 is defined. Some 64-bit
 * platforms may need both zones as they support peripherals with
 * different DMA addressing limitations.
-*
-* Some examples:
-*
-*  - i386 and x86_64 have a fixed 16M ZONE_DMA and ZONE_DMA32 for the
-*rest of the lower 4G.
-*
-*  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
-*the specific device.
-*
-*  - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
-*lower 4G.
-*
-*  - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
-*depending on the specific device.
-*
-*  - s390 uses ZONE_DMA fixed to the lower 2G.
-*
-*  - ia64 and riscv only use ZONE_DMA32.
-*
-*  - parisc uses neither.
 */
 #ifdef CONFIG_ZONE_DMA
ZONE_DMA,
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-11-19 Thread Nicolas Saenz Julienne
We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

The DMA layer also needs to be able to allocate memory that is
guaranteed to meet those DMA constraints, for bounce buffering as well
as allocating the backing for consistent mappings. This is why the 1 GB
ZONE_DMA was introduced recently. Unfortunately, it turns out the having
a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with kdump, and
potentially in other places where allocations cannot cross zone
boundaries. Therefore, we should avoid having two separate DMA zones
when possible.

So, with the help of of_dma_get_max_cpu_address() get the topmost
physical address accessible to all DMA masters in system and use that
information to fine-tune ZONE_DMA's size. In the absence of addressing
limited masters ZONE_DMA will span the whole 32-bit address space,
otherwise, in the case of the Raspberry Pi 4 it'll only span the 30-bit
address space, and have ZONE_DMA32 cover the rest of the 32-bit address
space.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v4:
 - Use fls64 as we're now using the max address (as opposed to the
   limit)

Changes since v3:
 - Simplify code for readability.

Changes since v2:
 - Updated commit log by shamelessly copying Ard's ACPI commit log

 arch/arm64/mm/init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 0954ea736987..a96d3fbbd12c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -42,8 +42,6 @@
 #include 
 #include 
 
-#define ARM64_ZONE_DMA_BITS30
-
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int 
zone_bits)
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
+   unsigned int __maybe_unused dt_zone_dma_bits;
 
 #ifdef CONFIG_ZONE_DMA
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
+   zone_dma_bits = min(32U, dt_zone_dma_bits);
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 4/7] of: unittest: Add test for of_dma_get_max_cpu_address()

2020-11-19 Thread Nicolas Saenz Julienne
Introduce a test for of_dma_get_max_cup_address(), it uses the same DT
data as the rest of dma-ranges unit tests.

Signed-off-by: Nicolas Saenz Julienne 
Reviewed-by: Rob Herring 

---
Changes since v5:
- Update address expected by test

Changes since v3:
 - Remove HAS_DMA guards

 drivers/of/unittest.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 06cc988faf78..98cc0163301b 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -869,6 +869,23 @@ static void __init of_unittest_changeset(void)
 #endif
 }
 
+static void __init of_unittest_dma_get_max_cpu_address(void)
+{
+   struct device_node *np;
+   phys_addr_t cpu_addr;
+
+   np = of_find_node_by_path("/testcase-data/address-tests");
+   if (!np) {
+   pr_err("missing testcase data\n");
+   return;
+   }
+
+   cpu_addr = of_dma_get_max_cpu_address(np);
+   unittest(cpu_addr == 0x4fff,
+"of_dma_get_max_cpu_address: wrong CPU addr %pad (expecting 
%x)\n",
+_addr, 0x4fff);
+}
+
 static void __init of_unittest_dma_ranges_one(const char *path,
u64 expect_dma_addr, u64 expect_paddr)
 {
@@ -3266,6 +3283,7 @@ static int __init of_unittest(void)
of_unittest_changeset();
of_unittest_parse_interrupts();
of_unittest_parse_interrupts_extended();
+   of_unittest_dma_get_max_cpu_address();
of_unittest_parse_dma_ranges();
of_unittest_pci_dma_ranges();
of_unittest_match_node();
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 6/7] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-11-19 Thread Nicolas Saenz Julienne
From: Ard Biesheuvel 

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.

This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.

Cc: Jeremy Linton 
Cc: Lorenzo Pieralisi 
Cc: Nicolas Saenz Julienne 
Cc: Rob Herring 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Cc: Hanjun Guo 
Cc: Sudeep Holla 
Cc: Anshuman Khandual 
Signed-off-by: Ard Biesheuvel 
[nsaenz: unified implementation with DT's counterpart]
Signed-off-by: Nicolas Saenz Julienne 
Tested-by: Jeremy Linton 
Acked-by: Lorenzo Pieralisi 
Acked-by: Hanjun Guo 

---

Changes since v3:
 - Use min_not_zero()
 - Check revision
 - Remove unnecessary #ifdef in zone_sizes_init()

 arch/arm64/mm/init.c  |  5 +++-
 drivers/acpi/arm64/iort.c | 55 +++
 include/linux/acpi_iort.h |  4 +++
 3 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a96d3fbbd12c..99741ba63cb8 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -186,11 +187,13 @@ static phys_addr_t __init max_zone_phys(unsigned int 
zone_bits)
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
+   unsigned int __maybe_unused acpi_zone_dma_bits;
unsigned int __maybe_unused dt_zone_dma_bits;
 
 #ifdef CONFIG_ZONE_DMA
+   acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
-   zone_dma_bits = min(32U, dt_zone_dma_bits);
+   zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 9929ff50c0c0..1787406684aa 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1718,3 +1718,58 @@ void __init acpi_iort_init(void)
 
iort_init_platform_devices();
 }
+
+#ifdef CONFIG_ZONE_DMA
+/*
+ * Extract the highest CPU physical address accessible to all DMA masters in
+ * the system. PHYS_ADDR_MAX is returned when no constrained device is found.
+ */
+phys_addr_t __init acpi_iort_dma_get_max_cpu_address(void)
+{
+   phys_addr_t limit = PHYS_ADDR_MAX;
+   struct acpi_iort_node *node, *end;
+   struct acpi_table_iort *iort;
+   acpi_status status;
+   int i;
+
+   if (acpi_disabled)
+   return limit;
+
+   status = acpi_get_table(ACPI_SIG_IORT, 0,
+   (struct acpi_table_header **));
+   if (ACPI_FAILURE(status))
+   return limit;
+
+   node = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->node_offset);
+   end = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->header.length);
+
+   for (i = 0; i < iort->node_count; i++) {
+   if (node >= end)
+   break;
+
+   switch (node->type) {
+   struct acpi_iort_named_component *ncomp;
+   struct acpi_iort_root_complex *rc;
+   phys_addr_t local_limit;
+
+   case ACPI_IORT_NODE_NAMED_COMPONENT:
+   ncomp = (struct acpi_iort_named_component 
*)node->node_data;
+   local_limit = DMA_BIT_MASK(ncomp->memory_address_limit);
+   

[PATCH v7 0/7] arm64: Default to 32-bit wide ZONE_DMA

2020-11-19 Thread Nicolas Saenz Julienne
Using two distinct DMA zones turned out to be problematic. Here's an
attempt go back to a saner default.

I tested this on both a RPi4 and QEMU.

---

Changes since v6:
 - Update patch #1 so we reserve crashkernel before request_standard_resources()
 - Tested on top of Catalin's mem_init() patches.

Changes since v5:
 - Unify ACPI/DT functions

Changes since v4:
 - Fix of_dma_get_max_cpu_address() so it returns the last addressable
   addres, not the limit

Changes since v3:
 - Drop patch adding define in dma-mapping
 - Address small review changes
 - Update Ard's patch
 - Add new patch removing examples from mmzone.h

Changes since v2:
 - Introduce Ard's patch
 - Improve OF dma-ranges parsing function
 - Add unit test for OF function
 - Address small changes
 - Move crashkernel reservation later in boot process

Changes since v1:
 - Parse dma-ranges instead of using machine compatible string

Ard Biesheuvel (1):
  arm64: mm: Set ZONE_DMA size based on early IORT scan

Nicolas Saenz Julienne (6):
  arm64: mm: Move reserve_crashkernel() into mem_init()
  arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
  of/address: Introduce of_dma_get_max_cpu_address()
  of: unittest: Add test for of_dma_get_max_cpu_address()
  arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
  mm: Remove examples from enum zone_type comment

 arch/arm64/mm/init.c  | 22 +---
 drivers/acpi/arm64/iort.c | 55 +++
 drivers/of/address.c  | 42 ++
 drivers/of/unittest.c | 18 +
 include/linux/acpi_iort.h |  4 +++
 include/linux/mmzone.h| 20 --
 include/linux/of.h|  7 +
 7 files changed, 139 insertions(+), 29 deletions(-)

-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-19 Thread Nicolas Saenz Julienne
crashkernel might reserve memory located in ZONE_DMA. We plan to delay
ZONE_DMA's initialization after unflattening the devicetree and ACPI's
boot table initialization, so move it later in the boot process.
Specifically into bootmem_init() since request_standard_resources()
depends on it.

Signed-off-by: Nicolas Saenz Julienne 
Tested-by: Jeremy Linton 

---

Changes since v6:
 - Move crashkernel reserve placement earlier, in bootmem_init()

 arch/arm64/mm/init.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 71d463544400..fafdf992fd32 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -389,8 +389,6 @@ void __init arm64_memblock_init(void)
else
arm64_dma32_phys_limit = PHYS_MASK + 1;
 
-   reserve_crashkernel();
-
reserve_elfcorehdr();
 
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
@@ -430,6 +428,12 @@ void __init bootmem_init(void)
sparse_init();
zone_sizes_init(min, max);
 
+   /*
+* request_standard_resources() depends on crashkernel's memory being
+* reserved, so do it here.
+*/
+   reserve_crashkernel();
+
memblock_dump_all();
 }
 
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v7 2/7] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()

2020-11-19 Thread Nicolas Saenz Julienne
zone_dma_bits's initialization happens earlier that it's actually
needed, in arm64_memblock_init(). So move it into the more suitable
zone_sizes_init().

Signed-off-by: Nicolas Saenz Julienne 
Tested-by: Jeremy Linton 
---
 arch/arm64/mm/init.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index fafdf992fd32..0954ea736987 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -190,6 +190,8 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA
+   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
 #ifdef CONFIG_ZONE_DMA32
@@ -379,11 +381,6 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
-   arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
-   }
-
if (IS_ENABLED(CONFIG_ZONE_DMA32))
arm64_dma32_phys_limit = max_zone_phys(32);
else
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-19 Thread Nicolas Saenz Julienne
On Thu, 2020-11-19 at 17:10 +, Catalin Marinas wrote:
> On Thu, Nov 19, 2020 at 03:09:58PM +0100, Nicolas Saenz Julienne wrote:
> > On Fri, 2020-11-13 at 11:29 +, Catalin Marinas wrote:
> > [...]
> > > > > > Let me stress that knowing the DMA constraints in the system before 
> > > > > > reserving
> > > > > > crashkernel's regions is necessary if we ever want it to work 
> > > > > > seamlessly on all
> > > > > > platforms. Be it small stuff like the Raspberry Pi or huge servers 
> > > > > > with TB of
> > > > > > memory.
> > > > > 
> > > > > Indeed. So we have 3 options (so far):
> > > > > 
> > > > > 1. Allow the crashkernel reservation to go into the linear map but set
> > > > >it to invalid once allocated.
> > > > > 
> > > > > 2. Parse the flattened DT (not sure what we do with ACPI) before
> > > > >creating the linear map. We may have to rely on some SoC ID here
> > > > >instead of actual DMA ranges.
> > > > > 
> > > > > 3. Assume the smallest ZONE_DMA possible on arm64 (1GB) for 
> > > > > crashkernel
> > > > >reservations and not rely on arm64_dma_phys_limit in
> > > > >reserve_crashkernel().
> > > > > 
> > > > > I think (2) we tried hard to avoid. Option (3) brings us back to the
> > > > > issues we had on large crashkernel reservations regressing on some
> > > > > platforms (though it's been a while since, they mostly went quiet ;)).
> > > > > However, with Chen's crashkernel patches we end up with two
> > > > > reservations, one in the low DMA zone and one higher, potentially 
> > > > > above
> > > > > 4GB. Having a fixed 1GB limit wouldn't be any worse for crashkernel
> > > > > reservations than what we have now.
> > > > > 
> > > > > If (1) works, I'd go for it (James knows this part better than me),
> > > > > otherwise we can go for (3).
> > > > 
> > > > Overall, I'd prefer (1) as well, and I'd be happy to have a got at it. 
> > > > If not
> > > > I'll append (3) in this series.
> > > 
> > > I think for 1 we could also remove the additional KEXEC_CORE checks,
> > > something like below, untested:
> > > 
> > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > index 3e5a6913acc8..27ab609c1c0c 100644
> > > --- a/arch/arm64/mm/mmu.c
> > > +++ b/arch/arm64/mm/mmu.c
> > > @@ -477,7 +477,8 @@ static void __init map_mem(pgd_t *pgdp)
> > >   int flags = 0;
> > >   u64 i;
> > >  
> > > - if (rodata_full || debug_pagealloc_enabled())
> > > + if (rodata_full || debug_pagealloc_enabled() ||
> > > + IS_ENABLED(CONFIG_KEXEC_CORE))
> > >   flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> > >  
> > >   /*
> > > @@ -487,11 +488,6 @@ static void __init map_mem(pgd_t *pgdp)
> > >* the following for-loop
> > >*/
> > >   memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> > > -#ifdef CONFIG_KEXEC_CORE
> > > - if (crashk_res.end)
> > > - memblock_mark_nomap(crashk_res.start,
> > > - resource_size(_res));
> > > -#endif
> > >  
> > >   /* map all the memory banks */
> > >   for_each_mem_range(i, , ) {
> > > @@ -518,21 +514,6 @@ static void __init map_mem(pgd_t *pgdp)
> > >   __map_memblock(pgdp, kernel_start, kernel_end,
> > >  PAGE_KERNEL, NO_CONT_MAPPINGS);
> > >   memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
> > > -
> > > -#ifdef CONFIG_KEXEC_CORE
> > > - /*
> > > -  * Use page-level mappings here so that we can shrink the region
> > > -  * in page granularity and put back unused memory to buddy system
> > > -  * through /sys/kernel/kexec_crash_size interface.
> > > -  */
> > > - if (crashk_res.end) {
> > > - __map_memblock(pgdp, crashk_res.start, crashk_res.end + 1,
> > > -PAGE_KERNEL,
> > > -NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
> > > - memblock_clear_nomap(crashk_res.start,
> > > -  resource_size(_res));
> > > - }
> > > -#endif
> > >  }
> > >  
> > >  void m

Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-19 Thread Nicolas Saenz Julienne
Hi Catalin, James,
sorry for the late reply but I got sidetracked.

On Fri, 2020-11-13 at 11:29 +, Catalin Marinas wrote:
[...]
> > > > Let me stress that knowing the DMA constraints in the system before 
> > > > reserving
> > > > crashkernel's regions is necessary if we ever want it to work 
> > > > seamlessly on all
> > > > platforms. Be it small stuff like the Raspberry Pi or huge servers with 
> > > > TB of
> > > > memory.
> > > 
> > > Indeed. So we have 3 options (so far):
> > > 
> > > 1. Allow the crashkernel reservation to go into the linear map but set
> > >it to invalid once allocated.
> > > 
> > > 2. Parse the flattened DT (not sure what we do with ACPI) before
> > >creating the linear map. We may have to rely on some SoC ID here
> > >instead of actual DMA ranges.
> > > 
> > > 3. Assume the smallest ZONE_DMA possible on arm64 (1GB) for crashkernel
> > >reservations and not rely on arm64_dma_phys_limit in
> > >reserve_crashkernel().
> > > 
> > > I think (2) we tried hard to avoid. Option (3) brings us back to the
> > > issues we had on large crashkernel reservations regressing on some
> > > platforms (though it's been a while since, they mostly went quiet ;)).
> > > However, with Chen's crashkernel patches we end up with two
> > > reservations, one in the low DMA zone and one higher, potentially above
> > > 4GB. Having a fixed 1GB limit wouldn't be any worse for crashkernel
> > > reservations than what we have now.
> > > 
> > > If (1) works, I'd go for it (James knows this part better than me),
> > > otherwise we can go for (3).
> > 
> > Overall, I'd prefer (1) as well, and I'd be happy to have a got at it. If 
> > not
> > I'll append (3) in this series.
> 
> I think for 1 we could also remove the additional KEXEC_CORE checks,
> something like below, untested:
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 3e5a6913acc8..27ab609c1c0c 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -477,7 +477,8 @@ static void __init map_mem(pgd_t *pgdp)
>   int flags = 0;
>   u64 i;
>  
> - if (rodata_full || debug_pagealloc_enabled())
> + if (rodata_full || debug_pagealloc_enabled() ||
> + IS_ENABLED(CONFIG_KEXEC_CORE))
>   flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>  
>   /*
> @@ -487,11 +488,6 @@ static void __init map_mem(pgd_t *pgdp)
>* the following for-loop
>*/
>   memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> -#ifdef CONFIG_KEXEC_CORE
> - if (crashk_res.end)
> - memblock_mark_nomap(crashk_res.start,
> - resource_size(_res));
> -#endif
>  
>   /* map all the memory banks */
>   for_each_mem_range(i, , ) {
> @@ -518,21 +514,6 @@ static void __init map_mem(pgd_t *pgdp)
>   __map_memblock(pgdp, kernel_start, kernel_end,
>  PAGE_KERNEL, NO_CONT_MAPPINGS);
>   memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
> -
> -#ifdef CONFIG_KEXEC_CORE
> - /*
> -  * Use page-level mappings here so that we can shrink the region
> -  * in page granularity and put back unused memory to buddy system
> -  * through /sys/kernel/kexec_crash_size interface.
> -  */
> - if (crashk_res.end) {
> - __map_memblock(pgdp, crashk_res.start, crashk_res.end + 1,
> -PAGE_KERNEL,
> -NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
> - memblock_clear_nomap(crashk_res.start,
> -  resource_size(_res));
> - }
> -#endif
>  }
>  
>  void mark_rodata_ro(void)

So as far as I'm concerned this is good enough for me. I took the time to
properly test crashkernel on RPi4 using the series, this patch, and another
small fix to properly update /proc/iomem.

I'll send v7 soon, but before, James (or anyone for that matter) any obvious
push-back to Catalin's solution?

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-12 Thread Nicolas Saenz Julienne
Hi Catalin,

On Tue, 2020-11-10 at 18:17 +, Catalin Marinas wrote:
> On Fri, Nov 06, 2020 at 07:46:29PM +0100, Nicolas Saenz Julienne wrote:
> > On Thu, 2020-11-05 at 16:11 +, James Morse wrote:
> > > On 03/11/2020 17:31, Nicolas Saenz Julienne wrote:
> > > > crashkernel might reserve memory located in ZONE_DMA. We plan to delay
> > > > ZONE_DMA's initialization after unflattening the devicetree and ACPI's
> > > > boot table initialization, so move it later in the boot process.
> > > > Specifically into mem_init(), this is the last place crashkernel will be
> > > > able to reserve the memory before the page allocator kicks in.
> > > > There
> > > > isn't any apparent reason for doing this earlier.
> > > 
> > > It's so that map_mem() can carve it out of the linear/direct map.
> > > This is so that stray writes from a crashing kernel can't accidentally 
> > > corrupt the kdump
> > > kernel. We depend on this if we continue with kdump, but failed to 
> > > offline all the other
> > > CPUs.
> > 
> > I presume here you refer to arch_kexec_protect_crashkres(), IIUC this will 
> > only
> > happen further down the line, after having loaded the kdump kernel image. 
> > But
> > it also depends on the mappings to be PAGE sized (flags == 
> > NO_BLOCK_MAPPINGS |
> > NO_CONT_MAPPINGS).
> 
> IIUC, arch_kexec_protect_crashkres() is only for the crashkernel image,
> not the whole reserved memory that the crashkernel will use. For the
> latter, we avoid the linear map by marking it as nomap in map_mem().

I'm not sure we're on the same page here, so sorry if this was already implied.

The crashkernel memory mapping is bypassed while preparing the linear mappings
but it is then mapped right away, with page granularity and !MTE.
See paging_init()->map_mem():

/*
 * Use page-level mappings here so that we can shrink the region
 * in page granularity and put back unused memory to buddy system
 * through /sys/kernel/kexec_crash_size interface.
 */
if (crashk_res.end) {
__map_memblock(pgdp, crashk_res.start, crashk_res.end + 1,
   PAGE_KERNEL,
   NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
memblock_clear_nomap(crashk_res.start,
 resource_size(_res));
}

IIUC the inconvenience here is that we need special mapping options for
crashkernel and updating those after having mapped that memory as regular
memory isn't possible/easy to do.

> > > We also depend on this when skipping the checksum code in purgatory, 
> > > which can be
> > > exceedingly slow.
> > 
> > This one I don't fully understand, so I'll lazily assume the prerequisite is
> > the same WRT how memory is mapped. :)
> > 
> > Ultimately there's also /sys/kernel/kexec_crash_size's handling. Same
> > prerequisite.
> > 
> > Keeping in mind acpi_table_upgrade() and unflatten_device_tree() depend on
> > having the linear mappings available.
> 
> So it looks like reserve_crashkernel() wants to reserve memory before
> setting up the linear map with the information about the DMA zones in
> place but that comes later when we can parse the firmware tables.
> 
> I wonder, instead of not mapping the crashkernel reservation, can we not
> do an arch_kexec_protect_crashkres() for the whole reservation after we
> created the linear map?

arch_kexec_protect_crashkres() depends on __change_memory_common() which
ultimately depends on the memory to be mapped with PAGE_SIZE pages. As I
comment above, the trick would work as long as there is as way to update the
linear mappings with whatever crashkernel needs later in the boot process.

> > Let me stress that knowing the DMA constraints in the system before 
> > reserving
> > crashkernel's regions is necessary if we ever want it to work seamlessly on 
> > all
> > platforms. Be it small stuff like the Raspberry Pi or huge servers with TB 
> > of
> > memory.
> 
> Indeed. So we have 3 options (so far):
> 
> 1. Allow the crashkernel reservation to go into the linear map but set
>it to invalid once allocated.
> 
> 2. Parse the flattened DT (not sure what we do with ACPI) before
>creating the linear map. We may have to rely on some SoC ID here
>instead of actual DMA ranges.
> 
> 3. Assume the smallest ZONE_DMA possible on arm64 (1GB) for crashkernel
>reservations and not rely on arm64_dma_phys_limit in
>reserve_crashkernel().
> 
> I think (2) we tried hard to avoid. Option (3) brings us back to the
> issues we had on large crashker

Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-06 Thread Nicolas Saenz Julienne
Hi James, thanks for the review. Some comments/questions below.

On Thu, 2020-11-05 at 16:11 +, James Morse wrote:
> Hi!
> 
> On 03/11/2020 17:31, Nicolas Saenz Julienne wrote:
> > crashkernel might reserve memory located in ZONE_DMA. We plan to delay
> > ZONE_DMA's initialization after unflattening the devicetree and ACPI's
> > boot table initialization, so move it later in the boot process.
> > Specifically into mem_init(), this is the last place crashkernel will be
> > able to reserve the memory before the page allocator kicks in.
> > There
> > isn't any apparent reason for doing this earlier.
> 
> It's so that map_mem() can carve it out of the linear/direct map.
> This is so that stray writes from a crashing kernel can't accidentally 
> corrupt the kdump
> kernel. We depend on this if we continue with kdump, but failed to offline 
> all the other
> CPUs.

I presume here you refer to arch_kexec_protect_crashkres(), IIUC this will only
happen further down the line, after having loaded the kdump kernel image. But
it also depends on the mappings to be PAGE sized (flags == NO_BLOCK_MAPPINGS |
NO_CONT_MAPPINGS).

> We also depend on this when skipping the checksum code in purgatory, which 
> can be
> exceedingly slow.

This one I don't fully understand, so I'll lazily assume the prerequisite is
the same WRT how memory is mapped. :)

Ultimately there's also /sys/kernel/kexec_crash_size's handling. Same
prerequisite.

Keeping in mind acpi_table_upgrade() and unflatten_device_tree() depend on
having the linear mappings available. I don't see any simple way of solving
this. Both moving the firmware description routines to use fixmap or correcting
the linear mapping further down the line so as to include kdump's regions, seem
excessive/impossible (feel free to correct me here). I'd be happy to hear
suggestions. Otherwise we're back to hard-coding the information as we
initially did.

Let me stress that knowing the DMA constraints in the system before reserving
crashkernel's regions is necessary if we ever want it to work seamlessly on all
platforms. Be it small stuff like the Raspberry Pi or huge servers with TB of
memory.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v6 7/7] mm: Remove examples from enum zone_type comment

2020-11-03 Thread Nicolas Saenz Julienne
We can't really list every setup in common code. On top of that they are
unlikely to stay true for long as things change in the arch trees
independently of this comment.

Suggested-by: Christoph Hellwig 
Signed-off-by: Nicolas Saenz Julienne 
Reviewed-by: Christoph Hellwig 
---
 include/linux/mmzone.h | 20 
 1 file changed, 20 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..9d0c454d23cd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -354,26 +354,6 @@ enum zone_type {
 * DMA mask is assumed when ZONE_DMA32 is defined. Some 64-bit
 * platforms may need both zones as they support peripherals with
 * different DMA addressing limitations.
-*
-* Some examples:
-*
-*  - i386 and x86_64 have a fixed 16M ZONE_DMA and ZONE_DMA32 for the
-*rest of the lower 4G.
-*
-*  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
-*the specific device.
-*
-*  - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
-*lower 4G.
-*
-*  - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
-*depending on the specific device.
-*
-*  - s390 uses ZONE_DMA fixed to the lower 2G.
-*
-*  - ia64 and riscv only use ZONE_DMA32.
-*
-*  - parisc uses neither.
 */
 #ifdef CONFIG_ZONE_DMA
ZONE_DMA,
-- 
2.29.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 6/7] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-11-03 Thread Nicolas Saenz Julienne
From: Ard Biesheuvel 

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.

This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.

Cc: Jeremy Linton 
Cc: Lorenzo Pieralisi 
Cc: Nicolas Saenz Julienne 
Cc: Rob Herring 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Cc: Hanjun Guo 
Cc: Sudeep Holla 
Cc: Anshuman Khandual 
Signed-off-by: Ard Biesheuvel 
[nsaenz: unified implementation with DT's counterpart]
Signed-off-by: Nicolas Saenz Julienne 
Tested-by: Jeremy Linton 
Acked-by: Lorenzo Pieralisi 
Acked-by: Hanjun Guo 

---

Changes since v5:
 - Unify with DT's counterpart, return phys_addr_t

Changes since v3:
 - Use min_not_zero()
 - Check revision
 - Remove unnecessary #ifdef in zone_sizes_init()

 arch/arm64/mm/init.c  |  5 +++-
 drivers/acpi/arm64/iort.c | 55 +++
 include/linux/acpi_iort.h |  4 +++
 3 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a2ce8a9a71a6..ca5d4b10679d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -186,11 +187,13 @@ static phys_addr_t __init max_zone_phys(unsigned int 
zone_bits)
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
+   unsigned int __maybe_unused acpi_zone_dma_bits;
unsigned int __maybe_unused dt_zone_dma_bits;
 
 #ifdef CONFIG_ZONE_DMA
+   acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
-   zone_dma_bits = min(32U, dt_zone_dma_bits);
+   zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 9929ff50c0c0..1787406684aa 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1718,3 +1718,58 @@ void __init acpi_iort_init(void)
 
iort_init_platform_devices();
 }
+
+#ifdef CONFIG_ZONE_DMA
+/*
+ * Extract the highest CPU physical address accessible to all DMA masters in
+ * the system. PHYS_ADDR_MAX is returned when no constrained device is found.
+ */
+phys_addr_t __init acpi_iort_dma_get_max_cpu_address(void)
+{
+   phys_addr_t limit = PHYS_ADDR_MAX;
+   struct acpi_iort_node *node, *end;
+   struct acpi_table_iort *iort;
+   acpi_status status;
+   int i;
+
+   if (acpi_disabled)
+   return limit;
+
+   status = acpi_get_table(ACPI_SIG_IORT, 0,
+   (struct acpi_table_header **));
+   if (ACPI_FAILURE(status))
+   return limit;
+
+   node = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->node_offset);
+   end = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->header.length);
+
+   for (i = 0; i < iort->node_count; i++) {
+   if (node >= end)
+   break;
+
+   switch (node->type) {
+   struct acpi_iort_named_component *ncomp;
+   struct acpi_iort_root_complex *rc;
+   phys_addr_t local_limit;
+
+   case ACPI_IORT_NODE_NAMED_COMPONENT:
+   ncomp = (struct acpi_iort_named_component 
*)node->node_data;
+   local_limit

[PATCH v6 4/7] of: unittest: Add test for of_dma_get_max_cpu_address()

2020-11-03 Thread Nicolas Saenz Julienne
Introduce a test for of_dma_get_max_cup_address(), it uses the same DT
data as the rest of dma-ranges unit tests.

Signed-off-by: Nicolas Saenz Julienne 
Reviewed-by: Rob Herring 

---
Changes since v5:
- Update address expected by test

Changes since v3:
 - Remove HAS_DMA guards

 drivers/of/unittest.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 06cc988faf78..98cc0163301b 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -869,6 +869,23 @@ static void __init of_unittest_changeset(void)
 #endif
 }
 
+static void __init of_unittest_dma_get_max_cpu_address(void)
+{
+   struct device_node *np;
+   phys_addr_t cpu_addr;
+
+   np = of_find_node_by_path("/testcase-data/address-tests");
+   if (!np) {
+   pr_err("missing testcase data\n");
+   return;
+   }
+
+   cpu_addr = of_dma_get_max_cpu_address(np);
+   unittest(cpu_addr == 0x4fff,
+"of_dma_get_max_cpu_address: wrong CPU addr %pad (expecting 
%x)\n",
+_addr, 0x4fff);
+}
+
 static void __init of_unittest_dma_ranges_one(const char *path,
u64 expect_dma_addr, u64 expect_paddr)
 {
@@ -3266,6 +3283,7 @@ static int __init of_unittest(void)
of_unittest_changeset();
of_unittest_parse_interrupts();
of_unittest_parse_interrupts_extended();
+   of_unittest_dma_get_max_cpu_address();
of_unittest_parse_dma_ranges();
of_unittest_pci_dma_ranges();
of_unittest_match_node();
-- 
2.29.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 3/7] of/address: Introduce of_dma_get_max_cpu_address()

2020-11-03 Thread Nicolas Saenz Julienne
Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
physical address addressable by all DMA masters in the system. It's
specially useful for setting memory zones sizes at early boot time.

Signed-off-by: Nicolas Saenz Julienne 
Reviewed-by: Rob Herring 

---

Changes since v4:
 - Return max address, not address limit (one off difference)

Changes since v3:
 - use u64 with cpu_end

Changes since v2:
 - Use PHYS_ADDR_MAX
 - return phys_dma_t
 - Rename function
 - Correct subject
 - Add support to start parsing from an arbitrary device node in order
   for the function to work with unit tests

 drivers/of/address.c | 42 ++
 include/linux/of.h   |  7 +++
 2 files changed, 49 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index eb9ab4f1e80b..09c0af7fd1c4 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const 
struct bus_dma_region **map)
 }
 #endif /* CONFIG_HAS_DMA */
 
+/**
+ * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
+ * @np: The node to start searching from or NULL to start from the root
+ *
+ * Gets the highest CPU physical address that is addressable by all DMA masters
+ * in the sub-tree pointed by np, or the whole tree if NULL is passed. If no
+ * DMA constrained device is found, it returns PHYS_ADDR_MAX.
+ */
+phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
+{
+   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
+   struct of_range_parser parser;
+   phys_addr_t subtree_max_addr;
+   struct device_node *child;
+   struct of_range range;
+   const __be32 *ranges;
+   u64 cpu_end = 0;
+   int len;
+
+   if (!np)
+   np = of_root;
+
+   ranges = of_get_property(np, "dma-ranges", );
+   if (ranges && len) {
+   of_dma_range_parser_init(, np);
+   for_each_of_range(, )
+   if (range.cpu_addr + range.size > cpu_end)
+   cpu_end = range.cpu_addr + range.size - 1;
+
+   if (max_cpu_addr > cpu_end)
+   max_cpu_addr = cpu_end;
+   }
+
+   for_each_available_child_of_node(np, child) {
+   subtree_max_addr = of_dma_get_max_cpu_address(child);
+   if (max_cpu_addr > subtree_max_addr)
+   max_cpu_addr = subtree_max_addr;
+   }
+
+   return max_cpu_addr;
+}
+
 /**
  * of_dma_is_coherent - Check if device is coherent
  * @np:device node
diff --git a/include/linux/of.h b/include/linux/of.h
index 5d51891cbf1a..9ed5b8532c30 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
   const char *map_name, const char *map_mask_name,
   struct device_node **target, u32 *id_out);
 
+phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
+
 #else /* CONFIG_OF */
 
 static inline void of_core_init(void)
@@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 id,
return -EINVAL;
 }
 
+static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np)
+{
+   return PHYS_ADDR_MAX;
+}
+
 #define of_match_ptr(_ptr) NULL
 #define of_match_node(_matches, _node) NULL
 #endif /* CONFIG_OF */
-- 
2.29.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 2/7] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()

2020-11-03 Thread Nicolas Saenz Julienne
zone_dma_bits's initialization happens earlier that it's actually
needed, in arm64_memblock_init(). So move it into the more suitable
zone_sizes_init().

Signed-off-by: Nicolas Saenz Julienne 
Tested-by: Jeremy Linton 
---
 arch/arm64/mm/init.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index fc4ab0d6d5d2..410721fc4fc0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -190,6 +190,8 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA
+   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
 #ifdef CONFIG_ZONE_DMA32
@@ -376,11 +378,6 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
-   arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
-   }
-
if (IS_ENABLED(CONFIG_ZONE_DMA32))
arm64_dma32_phys_limit = max_zone_phys(32);
else
-- 
2.29.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-03 Thread Nicolas Saenz Julienne
crashkernel might reserve memory located in ZONE_DMA. We plan to delay
ZONE_DMA's initialization after unflattening the devicetree and ACPI's
boot table initialization, so move it later in the boot process.
Specifically into mem_init(), this is the last place crashkernel will be
able to reserve the memory before the page allocator kicks in. There
isn't any apparent reason for doing this earlier.

Signed-off-by: Nicolas Saenz Julienne 
Tested-by: Jeremy Linton 
---
 arch/arm64/mm/init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 095540667f0f..fc4ab0d6d5d2 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -386,8 +386,6 @@ void __init arm64_memblock_init(void)
else
arm64_dma32_phys_limit = PHYS_MASK + 1;
 
-   reserve_crashkernel();
-
reserve_elfcorehdr();
 
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
@@ -508,6 +506,8 @@ void __init mem_init(void)
else
swiotlb_force = SWIOTLB_NO_FORCE;
 
+   reserve_crashkernel();
+
set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
-- 
2.29.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-11-03 Thread Nicolas Saenz Julienne
We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

The DMA layer also needs to be able to allocate memory that is
guaranteed to meet those DMA constraints, for bounce buffering as well
as allocating the backing for consistent mappings. This is why the 1 GB
ZONE_DMA was introduced recently. Unfortunately, it turns out the having
a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with kdump, and
potentially in other places where allocations cannot cross zone
boundaries. Therefore, we should avoid having two separate DMA zones
when possible.

So, with the help of of_dma_get_max_cpu_address() get the topmost
physical address accessible to all DMA masters in system and use that
information to fine-tune ZONE_DMA's size. In the absence of addressing
limited masters ZONE_DMA will span the whole 32-bit address space,
otherwise, in the case of the Raspberry Pi 4 it'll only span the 30-bit
address space, and have ZONE_DMA32 cover the rest of the 32-bit address
space.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v4:
 - Use fls64 as we're now using the max address (as opposed to the
   limit)

Changes since v3:
 - Simplify code for readability.

Changes since v2:
 - Updated commit log by shamelessly copying Ard's ACPI commit log

 arch/arm64/mm/init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 410721fc4fc0..a2ce8a9a71a6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -42,8 +42,6 @@
 #include 
 #include 
 
-#define ARM64_ZONE_DMA_BITS30
-
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int 
zone_bits)
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
+   unsigned int __maybe_unused dt_zone_dma_bits;
 
 #ifdef CONFIG_ZONE_DMA
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
+   zone_dma_bits = min(32U, dt_zone_dma_bits);
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
-- 
2.29.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 0/7] arm64: Default to 32-bit wide ZONE_DMA

2020-11-03 Thread Nicolas Saenz Julienne
Using two distinct DMA zones turned out to be problematic. Here's an
attempt go back to a saner default.

I tested this on both a RPi4 and QEMU.

---

Changes since v5:
 - Unify ACPI/DT functions

Changes since v4:
 - Fix of_dma_get_max_cpu_address() so it returns the last addressable
   addres, not the limit

Changes since v3:
 - Drop patch adding define in dma-mapping
 - Address small review changes
 - Update Ard's patch
 - Add new patch removing examples from mmzone.h

Changes since v2:
 - Introduce Ard's patch
 - Improve OF dma-ranges parsing function
 - Add unit test for OF function
 - Address small changes
 - Move crashkernel reservation later in boot process

Changes since v1:
 - Parse dma-ranges instead of using machine compatible string

Ard Biesheuvel (1):
  arm64: mm: Set ZONE_DMA size based on early IORT scan

Nicolas Saenz Julienne (6):
  arm64: mm: Move reserve_crashkernel() into mem_init()
  arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
  of/address: Introduce of_dma_get_max_cpu_address()
  of: unittest: Add test for of_dma_get_max_cpu_address()
  arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
  mm: Remove examples from enum zone_type comment

 arch/arm64/mm/init.c  | 18 ++---
 drivers/acpi/arm64/iort.c | 55 +++
 drivers/of/address.c  | 42 ++
 drivers/of/unittest.c | 18 +
 include/linux/acpi_iort.h |  4 +++
 include/linux/mmzone.h| 20 --
 include/linux/of.h|  7 +
 7 files changed, 135 insertions(+), 29 deletions(-)

-- 
2.29.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 0/7] arm64: Default to 32-bit wide ZONE_DMA

2020-11-03 Thread Nicolas Saenz Julienne
On Fri, 2020-10-30 at 18:11 +, Catalin Marinas wrote:
> On Thu, Oct 29, 2020 at 06:25:43PM +0100, Nicolas Saenz Julienne wrote:
> > Ard Biesheuvel (1):
> >   arm64: mm: Set ZONE_DMA size based on early IORT scan
> > 
> > Nicolas Saenz Julienne (6):
> >   arm64: mm: Move reserve_crashkernel() into mem_init()
> >   arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
> >   of/address: Introduce of_dma_get_max_cpu_address()
> >   of: unittest: Add test for of_dma_get_max_cpu_address()
> >   arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
> >   mm: Remove examples from enum zone_type comment
> 
> Thanks for putting this together. I had a minor comment but the patches
> look fine to me. We still need an ack from Rob on the DT patch and I can
> queue the series for 5.11.

I'm preparing a v6 unifying both functions as you suggested.

> Could you please also test the patch below on top of this series? It's
> the removal of the implied DMA offset in the max_zone_phys()
> calculation.

Yes, happily. Comments below.

> --8<-
> From 3ae252d888be4984a612236124f5b099e804c745 Mon Sep 17 00:00:00 2001
> From: Catalin Marinas 
> Date: Fri, 30 Oct 2020 18:07:34 +
> Subject: [PATCH] arm64: Ignore any DMA offsets in the max_zone_phys()
>  calculation
> 
> Currently, the kernel assumes that if RAM starts above 32-bit (or
> zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
> such constrained devices have a hardwired DMA offset. In practice, we
> haven't noticed any such hardware so let's assume that we can expand
> ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
> ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
> zone_bits.
> 
> Signed-off-by: Catalin Marinas 
> ---
>  arch/arm64/mm/init.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 095540667f0f..362160e16fb2 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -175,14 +175,21 @@ static void __init reserve_elfcorehdr(void)
>  #endif /* CONFIG_CRASH_DUMP */
>  
>  /*
> - * Return the maximum physical address for a zone with a given address size
> - * limit. It currently assumes that for memory starting above 4G, 32-bit
> - * devices will use a DMA offset.
> + * Return the maximum physical address for a zone accessible by the given 
> bits
> + * limit. If the DRAM starts above 32-bit, expand the zone to the maximum
> + * available memory, otherwise cap it at 32-bit.
>   */
>  static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
>  {
> - phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, 
> zone_bits);
> - return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
> + phys_addr_t zone_mask = (1ULL << zone_bits) - 1;

Maybe use DMA_BIT_MASK(), instead of the manual calculation?

> + phys_addr_t phys_start = memblock_start_of_DRAM();
> +
> + if (!(phys_start & U32_MAX))

I'd suggest using 'bigger than' instead of masks. Just to cover ourselves
against memory starting at odd locations. Also it'll behaves properly when
phys_start is zero (this breaks things on RPi4).

> + zone_mask = PHYS_ADDR_MAX;
> + else if (!(phys_start & zone_mask))
> + zone_mask = U32_MAX;
> +
> + return min(zone_mask + 1, memblock_end_of_DRAM());

This + 1 isn't going to play well when zone_mask is PHYS_ADDR_MAX.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v5 7/7] mm: Remove examples from enum zone_type comment

2020-10-29 Thread Nicolas Saenz Julienne
We can't really list every setup in common code. On top of that they are
unlikely to stay true for long as things change in the arch trees
independently of this comment.

Suggested-by: Christoph Hellwig 
Signed-off-by: Nicolas Saenz Julienne 
Reviewed-by: Christoph Hellwig 
---
 include/linux/mmzone.h | 20 
 1 file changed, 20 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..9d0c454d23cd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -354,26 +354,6 @@ enum zone_type {
 * DMA mask is assumed when ZONE_DMA32 is defined. Some 64-bit
 * platforms may need both zones as they support peripherals with
 * different DMA addressing limitations.
-*
-* Some examples:
-*
-*  - i386 and x86_64 have a fixed 16M ZONE_DMA and ZONE_DMA32 for the
-*rest of the lower 4G.
-*
-*  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
-*the specific device.
-*
-*  - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
-*lower 4G.
-*
-*  - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
-*depending on the specific device.
-*
-*  - s390 uses ZONE_DMA fixed to the lower 2G.
-*
-*  - ia64 and riscv only use ZONE_DMA32.
-*
-*  - parisc uses neither.
 */
 #ifdef CONFIG_ZONE_DMA
ZONE_DMA,
-- 
2.29.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 4/7] of: unittest: Add test for of_dma_get_max_cpu_address()

2020-10-29 Thread Nicolas Saenz Julienne
Introduce a test for of_dma_get_max_cup_address(), it uses the same DT
data as the rest of dma-ranges unit tests.

Signed-off-by: Nicolas Saenz Julienne 
Reviewed-by: Rob Herring 

---
Changes since v3:
 - Remove HAS_DMA guards

 drivers/of/unittest.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 06cc988faf78..b9a4d047a95e 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -869,6 +869,23 @@ static void __init of_unittest_changeset(void)
 #endif
 }
 
+static void __init of_unittest_dma_get_max_cpu_address(void)
+{
+   struct device_node *np;
+   phys_addr_t cpu_addr;
+
+   np = of_find_node_by_path("/testcase-data/address-tests");
+   if (!np) {
+   pr_err("missing testcase data\n");
+   return;
+   }
+
+   cpu_addr = of_dma_get_max_cpu_address(np);
+   unittest(cpu_addr == 0x5000,
+"of_dma_get_max_cpu_address: wrong CPU addr %pad (expecting 
%x)\n",
+_addr, 0x5000);
+}
+
 static void __init of_unittest_dma_ranges_one(const char *path,
u64 expect_dma_addr, u64 expect_paddr)
 {
@@ -3266,6 +3283,7 @@ static int __init of_unittest(void)
of_unittest_changeset();
of_unittest_parse_interrupts();
of_unittest_parse_interrupts_extended();
+   of_unittest_dma_get_max_cpu_address();
of_unittest_parse_dma_ranges();
of_unittest_pci_dma_ranges();
of_unittest_match_node();
-- 
2.29.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 0/7] arm64: Default to 32-bit wide ZONE_DMA

2020-10-29 Thread Nicolas Saenz Julienne
Using two distinct DMA zones turned out to be problematic. Here's an
attempt go back to a saner default.

I tested this on both a RPi4 and QEMU.

---

Changes since v4:
 - Fix of_dma_get_max_cpu_address() so it returns the last addressable
   addres, not the limit

Changes since v3:
 - Drop patch adding define in dma-mapping
 - Address small review changes
 - Update Ard's patch
 - Add new patch removing examples from mmzone.h

Changes since v2:
 - Introduce Ard's patch
 - Improve OF dma-ranges parsing function
 - Add unit test for OF function
 - Address small changes
 - Move crashkernel reservation later in boot process

Changes since v1:
 - Parse dma-ranges instead of using machine compatible string

Ard Biesheuvel (1):
  arm64: mm: Set ZONE_DMA size based on early IORT scan

Nicolas Saenz Julienne (6):
  arm64: mm: Move reserve_crashkernel() into mem_init()
  arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
  of/address: Introduce of_dma_get_max_cpu_address()
  of: unittest: Add test for of_dma_get_max_cpu_address()
  arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
  mm: Remove examples from enum zone_type comment

 arch/arm64/mm/init.c  | 16 ++--
 drivers/acpi/arm64/iort.c | 52 +++
 drivers/of/address.c  | 42 +++
 drivers/of/unittest.c | 18 ++
 include/linux/acpi_iort.h |  4 +++
 include/linux/mmzone.h| 20 ---
 include/linux/of.h|  7 ++
 7 files changed, 130 insertions(+), 29 deletions(-)

-- 
2.29.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 6/7] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-29 Thread Nicolas Saenz Julienne
From: Ard Biesheuvel 

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.

This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.

Cc: Jeremy Linton 
Cc: Lorenzo Pieralisi 
Cc: Nicolas Saenz Julienne 
Cc: Rob Herring 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Cc: Hanjun Guo 
Cc: Sudeep Holla 
Cc: Anshuman Khandual 
Signed-off-by: Ard Biesheuvel 
[nsaenz: Rebased, removed documentation change and add declaration in 
acpi_iort.h]
Signed-off-by: Nicolas Saenz Julienne 
Tested-by: Jeremy Linton 
Acked-by: Lorenzo Pieralisi 
Acked-by: Hanjun Guo 

---

Changes since v3:
 - Use min_not_zero()
 - Check revision
 - Remove unnecessary #ifdef in zone_sizes_init()
 
 arch/arm64/mm/init.c  |  3 ++-
 drivers/acpi/arm64/iort.c | 52 +++
 include/linux/acpi_iort.h |  4 +++
 3 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a2ce8a9a71a6..b9dc3831dd6b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -190,7 +191,7 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
 
 #ifdef CONFIG_ZONE_DMA
dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
-   zone_dma_bits = min(32U, dt_zone_dma_bits);
+   zone_dma_bits = min3(32U, dt_zone_dma_bits, 
acpi_iort_get_zone_dma_size());
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 9929ff50c0c0..05fe4a076bab 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1718,3 +1718,55 @@ void __init acpi_iort_init(void)
 
iort_init_platform_devices();
 }
+
+#ifdef CONFIG_ZONE_DMA
+/*
+ * Check the IORT whether any devices exist whose DMA mask is < 32 bits.
+ * If so, return the smallest value encountered, or 32 otherwise.
+ */
+unsigned int __init acpi_iort_get_zone_dma_size(void)
+{
+   struct acpi_table_iort *iort;
+   struct acpi_iort_node *node, *end;
+   acpi_status status;
+   u8 limit = 32;
+   int i;
+
+   if (acpi_disabled)
+   return limit;
+
+   status = acpi_get_table(ACPI_SIG_IORT, 0,
+   (struct acpi_table_header **));
+   if (ACPI_FAILURE(status))
+   return limit;
+
+   node = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->node_offset);
+   end = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->header.length);
+
+   for (i = 0; i < iort->node_count; i++) {
+   if (node >= end)
+   break;
+
+   switch (node->type) {
+   struct acpi_iort_named_component *ncomp;
+   struct acpi_iort_root_complex *rc;
+
+   case ACPI_IORT_NODE_NAMED_COMPONENT:
+   ncomp = (struct acpi_iort_named_component 
*)node->node_data;
+   limit = min_not_zero(limit, 
ncomp->memory_address_limit);
+   break;
+
+   case ACPI_IORT_NODE_PCI_ROOT_COMPLEX:
+   if (node->revision < 1)
+   break;
+
+   rc = (struct acpi_iort_root_complex *)node->node_data;
+   limit = min_not_zero(limit, rc->memory_address_limit);
+   break;

[PATCH v5 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-29 Thread Nicolas Saenz Julienne
We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

The DMA layer also needs to be able to allocate memory that is
guaranteed to meet those DMA constraints, for bounce buffering as well
as allocating the backing for consistent mappings. This is why the 1 GB
ZONE_DMA was introduced recently. Unfortunately, it turns out the having
a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with kdump, and
potentially in other places where allocations cannot cross zone
boundaries. Therefore, we should avoid having two separate DMA zones
when possible.

So, with the help of of_dma_get_max_cpu_address() get the topmost
physical address accessible to all DMA masters in system and use that
information to fine-tune ZONE_DMA's size. In the absence of addressing
limited masters ZONE_DMA will span the whole 32-bit address space,
otherwise, in the case of the Raspberry Pi 4 it'll only span the 30-bit
address space, and have ZONE_DMA32 cover the rest of the 32-bit address
space.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v4:
 - Use fls64 as we're now using the max address (as opposed to the
   limit)

Changes since v3:
 - Simplify code for readability.

Changes since v2:
 - Updated commit log by shamelessly copying Ard's ACPI commit log

 arch/arm64/mm/init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 410721fc4fc0..a2ce8a9a71a6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -42,8 +42,6 @@
 #include 
 #include 
 
-#define ARM64_ZONE_DMA_BITS30
-
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int 
zone_bits)
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
+   unsigned int __maybe_unused dt_zone_dma_bits;
 
 #ifdef CONFIG_ZONE_DMA
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
+   zone_dma_bits = min(32U, dt_zone_dma_bits);
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
-- 
2.29.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 2/7] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()

2020-10-29 Thread Nicolas Saenz Julienne
zone_dma_bits's initialization happens earlier that it's actually
needed, in arm64_memblock_init(). So move it into the more suitable
zone_sizes_init().

Signed-off-by: Nicolas Saenz Julienne 
Tested-by: Jeremy Linton 
---
 arch/arm64/mm/init.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index fc4ab0d6d5d2..410721fc4fc0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -190,6 +190,8 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA
+   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
 #ifdef CONFIG_ZONE_DMA32
@@ -376,11 +378,6 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
-   arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
-   }
-
if (IS_ENABLED(CONFIG_ZONE_DMA32))
arm64_dma32_phys_limit = max_zone_phys(32);
else
-- 
2.29.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-10-29 Thread Nicolas Saenz Julienne
crashkernel might reserve memory located in ZONE_DMA. We plan to delay
ZONE_DMA's initialization after unflattening the devicetree and ACPI's
boot table initialization, so move it later in the boot process.
Specifically into mem_init(), this is the last place crashkernel will be
able to reserve the memory before the page allocator kicks in. There
isn't any apparent reason for doing this earlier.

Signed-off-by: Nicolas Saenz Julienne 
Tested-by: Jeremy Linton 
---
 arch/arm64/mm/init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 095540667f0f..fc4ab0d6d5d2 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -386,8 +386,6 @@ void __init arm64_memblock_init(void)
else
arm64_dma32_phys_limit = PHYS_MASK + 1;
 
-   reserve_crashkernel();
-
reserve_elfcorehdr();
 
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
@@ -508,6 +506,8 @@ void __init mem_init(void)
else
swiotlb_force = SWIOTLB_NO_FORCE;
 
+   reserve_crashkernel();
+
set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
-- 
2.29.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 3/7] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-29 Thread Nicolas Saenz Julienne
Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
physical address addressable by all DMA masters in the system. It's
specially useful for setting memory zones sizes at early boot time.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v4:
 - Return max address, not address limit (one-off difference)

Changes since v3:
 - use u64 with cpu_end

Changes since v2:
 - Use PHYS_ADDR_MAX
 - return phys_dma_t
 - Rename function
 - Correct subject
 - Add support to start parsing from an arbitrary device node in order
   for the function to work with unit tests

 drivers/of/address.c | 42 ++
 include/linux/of.h   |  7 +++
 2 files changed, 49 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index eb9ab4f1e80b..09c0af7fd1c4 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const 
struct bus_dma_region **map)
 }
 #endif /* CONFIG_HAS_DMA */
 
+/**
+ * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
+ * @np: The node to start searching from or NULL to start from the root
+ *
+ * Gets the highest CPU physical address that is addressable by all DMA masters
+ * in the sub-tree pointed by np, or the whole tree if NULL is passed. If no
+ * DMA constrained device is found, it returns PHYS_ADDR_MAX.
+ */
+phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
+{
+   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
+   struct of_range_parser parser;
+   phys_addr_t subtree_max_addr;
+   struct device_node *child;
+   struct of_range range;
+   const __be32 *ranges;
+   u64 cpu_end = 0;
+   int len;
+
+   if (!np)
+   np = of_root;
+
+   ranges = of_get_property(np, "dma-ranges", );
+   if (ranges && len) {
+   of_dma_range_parser_init(, np);
+   for_each_of_range(, )
+   if (range.cpu_addr + range.size > cpu_end)
+   cpu_end = range.cpu_addr + range.size - 1;
+
+   if (max_cpu_addr > cpu_end)
+   max_cpu_addr = cpu_end;
+   }
+
+   for_each_available_child_of_node(np, child) {
+   subtree_max_addr = of_dma_get_max_cpu_address(child);
+   if (max_cpu_addr > subtree_max_addr)
+   max_cpu_addr = subtree_max_addr;
+   }
+
+   return max_cpu_addr;
+}
+
 /**
  * of_dma_is_coherent - Check if device is coherent
  * @np:device node
diff --git a/include/linux/of.h b/include/linux/of.h
index 5d51891cbf1a..9ed5b8532c30 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
   const char *map_name, const char *map_mask_name,
   struct device_node **target, u32 *id_out);
 
+phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
+
 #else /* CONFIG_OF */
 
 static inline void of_core_init(void)
@@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 id,
return -EINVAL;
 }
 
+static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np)
+{
+   return PHYS_ADDR_MAX;
+}
+
 #define of_match_ptr(_ptr) NULL
 #define of_match_node(_matches, _node) NULL
 #endif /* CONFIG_OF */
-- 
2.29.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 0/7] arm64: Default to 32-bit wide ZONE_DMA

2020-10-27 Thread Nicolas Saenz Julienne
On Fri, 2020-10-23 at 14:05 -0500, Jeremy Linton wrote:
> Hi,
> 
> On 10/21/20 7:34 AM, Nicolas Saenz Julienne wrote:
> > Using two distinct DMA zones turned out to be problematic. Here's an
> > attempt go back to a saner default.
> > 
> > I tested this on both a RPi4 and QEMU.
> 
> I've tested this in ACPI mode on the rpi4 (4+8G with/without the 3G 
> limiter) as well, with Ard's IORT patch. Nothing seems to have regressed.
> 
> Thanks,
> 
> Tested-by: Jeremy Linton 

Thanks!



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-23 Thread Nicolas Saenz Julienne
Hi Catalin,

On Thu, 2020-10-22 at 19:06 +0100, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> > @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int 
> > zone_bits)
> >  static void __init zone_sizes_init(unsigned long min, unsigned long max)
> >  {
> > unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> > +   unsigned int __maybe_unused dt_zone_dma_bits;
> >  
> >  #ifdef CONFIG_ZONE_DMA
> > -   zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > +   dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> > +   zone_dma_bits = min(32U, dt_zone_dma_bits);
> 
> A thought: can we remove the min here and expand ZONE_DMA to whatever
> dt_zone_dma_bits says? More on this below.

On most platforms we'd get PHYS_ADDR_MAX, or something bigger than the actual
amount of RAM. Which would ultimately create a system wide ZONE_DMA. At first
sight, I don't see it breaking dma-direct in any way.

On the other hand, there is a big amount of MMIO devices out there that can
only handle 32-bit addressing. Be it PCI cards or actual IP cores. To make
things worse, this limitation is often expressed in the driver, not FW (with
dma_set_mask() and friends). If those devices aren't behind an IOMMU we have be
able to provide at least 32-bit addressable memory. See this comment from
dma_direct_supported():

/*
 * Because 32-bit DMA masks are so common we expect every architecture
 * to be able to satisfy them - either by not supporting more physical
 * memory, or by providing a ZONE_DMA32.  If neither is the case, the
 * architecture needs to use an IOMMU instead of the direct mapping.
 */

I think, for the common case, we're stuck with at least one zone spanning the
32-bit address space.

> > arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> > max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> >  #endif
> 
> I was talking earlier to Ard and Robin on the ZONE_DMA32 history and the
> need for max_zone_phys(). This was rather theoretical, the Seattle
> platform has all RAM starting above 4GB and that led to an empty
> ZONE_DMA32 originally. The max_zone_phys() hack was meant to lift
> ZONE_DMA32 into the bottom of the RAM, on the assumption that such
> 32-bit devices would have a DMA offset hardwired. We are not aware of
> any such case on arm64 systems and even on Seattle, IIUC 32-bit devices
> only work if they are behind an SMMU (so no hardwired offset).
> 
> In hindsight, it would have made more sense on platforms with RAM above
> 4GB to expand ZONE_DMA32 to cover the whole memory (so empty
> ZONE_NORMAL). Something like:
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index a53c1e0fb017..7d5e3dd85617 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
>   */
>  static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
>  {
> - phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, 
> zone_bits);
> - return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
> + phys_addr_t zone_mask = 1ULL << zone_bits;
> +
> + if (!(memblock_start_of_DRAM() & zone_mask))
> + zone_mask = PHYS_ADDR_MAX;
> +
> + return min(zone_mask, memblock_end_of_DRAM());
>  }
>  
>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
> 
> I don't think this makes any difference for ZONE_DMA unless a
> broken DT or IORT reports the max CPU address below the start of DRAM.
> 
> There's a minor issue if of_dma_get_max_cpu_address() matches
> memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
> a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.

I agree it makes no sense to create more than one zone when the beginning of
RAM is located above the 32-bit address space. I'm all for disregarding the
possibility of hardwired offsets. As a bonus, as we already discussed some time
ago, this is something that never played well with current dma-direct code[1].

Regards,
Nicolas

[1] https://lkml.org/lkml/2020/9/8/377



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 3/7] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-22 Thread Nicolas Saenz Julienne
On Thu, 2020-10-22 at 14:23 +0200, Ard Biesheuvel wrote:
> > +/**
> > + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
> > + * @np: The node to start searching from or NULL to start from the root
> > + *
> > + * Gets the highest CPU physical address that is addressable by all DMA 
> > masters
> > + * in the sub-tree pointed by np, or the whole tree if NULL is passed. If 
> > no
> > + * DMA constrained device is found, it returns PHYS_ADDR_MAX.
> > + */
> > +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> > +{
> > +   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
> > +   struct of_range_parser parser;
> > +   phys_addr_t subtree_max_addr;
> > +   struct device_node *child;
> > +   struct of_range range;
> > +   const __be32 *ranges;
> > +   u64 cpu_end = 0;
> > +   int len;
> > +
> > +   if (!np)
> > +   np = of_root;
> > +
> > +   ranges = of_get_property(np, "dma-ranges", );
> > +   if (ranges && len) {
> > +   of_dma_range_parser_init(, np);
> > +   for_each_of_range(, )
> > +   if (range.cpu_addr + range.size > cpu_end)
> > +   cpu_end = range.cpu_addr + range.size;
> 
> 
> Shouldn't this be 'range.cpu_addr + range.size - 1' ?

Yes, I agree. In that case arm64's counterpart should be:

zone_dma_bits = max(32U, fls64(of_dma_get_max_cpu_address(NULL)));

I'll update it.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v4 0/7] arm64: Default to 32-bit wide ZONE_DMA

2020-10-21 Thread Nicolas Saenz Julienne
Using two distinct DMA zones turned out to be problematic. Here's an
attempt go back to a saner default.

I tested this on both a RPi4 and QEMU.

---

Changes since v3:
 - Drop patch adding define in dma-mapping
 - Address small review changes
 - Update Ard's patch
 - Add new patch removing examples from mmzone.h

Changes since v2:
 - Introduce Ard's patch
 - Improve OF dma-ranges parsing function
 - Add unit test for OF function
 - Address small changes
 - Move crashkernel reservation later in boot process

Changes since v1:
 - Parse dma-ranges instead of using machine compatible string

Ard Biesheuvel (1):
  arm64: mm: Set ZONE_DMA size based on early IORT scan

Nicolas Saenz Julienne (6):
  arm64: mm: Move reserve_crashkernel() into mem_init()
  arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
  of/address: Introduce of_dma_get_max_cpu_address()
  of: unittest: Add test for of_dma_get_max_cpu_address()
  arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
  mm: Remove examples from enum zone_type comment

 arch/arm64/mm/init.c  | 16 ++--
 drivers/acpi/arm64/iort.c | 52 +++
 drivers/of/address.c  | 42 +++
 drivers/of/unittest.c | 18 ++
 include/linux/acpi_iort.h |  4 +++
 include/linux/mmzone.h| 20 ---
 include/linux/of.h|  7 ++
 7 files changed, 130 insertions(+), 29 deletions(-)

-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 7/7] mm: Remove examples from enum zone_type comment

2020-10-21 Thread Nicolas Saenz Julienne
We can't really list every setup in common code. On top of that they are
unlikely to stay true for long as things change in the arch trees
independently of this comment.

Suggested-by: Christoph Hellwig 
Signed-off-by: Nicolas Saenz Julienne 
---
 include/linux/mmzone.h | 20 
 1 file changed, 20 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..9d0c454d23cd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -354,26 +354,6 @@ enum zone_type {
 * DMA mask is assumed when ZONE_DMA32 is defined. Some 64-bit
 * platforms may need both zones as they support peripherals with
 * different DMA addressing limitations.
-*
-* Some examples:
-*
-*  - i386 and x86_64 have a fixed 16M ZONE_DMA and ZONE_DMA32 for the
-*rest of the lower 4G.
-*
-*  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
-*the specific device.
-*
-*  - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
-*lower 4G.
-*
-*  - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
-*depending on the specific device.
-*
-*  - s390 uses ZONE_DMA fixed to the lower 2G.
-*
-*  - ia64 and riscv only use ZONE_DMA32.
-*
-*  - parisc uses neither.
 */
 #ifdef CONFIG_ZONE_DMA
ZONE_DMA,
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 2/7] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()

2020-10-21 Thread Nicolas Saenz Julienne
zone_dma_bits's initialization happens earlier that it's actually
needed, in arm64_memblock_init(). So move it into the more suitable
zone_sizes_init().

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/mm/init.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index fc4ab0d6d5d2..410721fc4fc0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -190,6 +190,8 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA
+   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
 #ifdef CONFIG_ZONE_DMA32
@@ -376,11 +378,6 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
-   arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
-   }
-
if (IS_ENABLED(CONFIG_ZONE_DMA32))
arm64_dma32_phys_limit = max_zone_phys(32);
else
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 3/7] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-21 Thread Nicolas Saenz Julienne
Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
physical address addressable by all DMA masters in the system. It's
specially useful for setting memory zones sizes at early boot time.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v3:
 - use u64 with cpu_end

Changes since v2:
 - Use PHYS_ADDR_MAX
 - return phys_dma_t
 - Rename function
 - Correct subject
 - Add support to start parsing from an arbitrary device node in order
   for the function to work with unit tests

 drivers/of/address.c | 42 ++
 include/linux/of.h   |  7 +++
 2 files changed, 49 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index eb9ab4f1e80b..47dfe5881e18 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const 
struct bus_dma_region **map)
 }
 #endif /* CONFIG_HAS_DMA */
 
+/**
+ * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
+ * @np: The node to start searching from or NULL to start from the root
+ *
+ * Gets the highest CPU physical address that is addressable by all DMA masters
+ * in the sub-tree pointed by np, or the whole tree if NULL is passed. If no
+ * DMA constrained device is found, it returns PHYS_ADDR_MAX.
+ */
+phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
+{
+   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
+   struct of_range_parser parser;
+   phys_addr_t subtree_max_addr;
+   struct device_node *child;
+   struct of_range range;
+   const __be32 *ranges;
+   u64 cpu_end = 0;
+   int len;
+
+   if (!np)
+   np = of_root;
+
+   ranges = of_get_property(np, "dma-ranges", );
+   if (ranges && len) {
+   of_dma_range_parser_init(, np);
+   for_each_of_range(, )
+   if (range.cpu_addr + range.size > cpu_end)
+   cpu_end = range.cpu_addr + range.size;
+
+   if (max_cpu_addr > cpu_end)
+   max_cpu_addr = cpu_end;
+   }
+
+   for_each_available_child_of_node(np, child) {
+   subtree_max_addr = of_dma_get_max_cpu_address(child);
+   if (max_cpu_addr > subtree_max_addr)
+   max_cpu_addr = subtree_max_addr;
+   }
+
+   return max_cpu_addr;
+}
+
 /**
  * of_dma_is_coherent - Check if device is coherent
  * @np:device node
diff --git a/include/linux/of.h b/include/linux/of.h
index 481ec0467285..db8db8f2c967 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
   const char *map_name, const char *map_mask_name,
   struct device_node **target, u32 *id_out);
 
+phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
+
 #else /* CONFIG_OF */
 
 static inline void of_core_init(void)
@@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 id,
return -EINVAL;
 }
 
+static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np)
+{
+   return PHYS_ADDR_MAX;
+}
+
 #define of_match_ptr(_ptr) NULL
 #define of_match_node(_matches, _node) NULL
 #endif /* CONFIG_OF */
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 4/7] of: unittest: Add test for of_dma_get_max_cpu_address()

2020-10-21 Thread Nicolas Saenz Julienne
Introduce a test for of_dma_get_max_cup_address(), it uses the same DT
data as the rest of dma-ranges unit tests.

Signed-off-by: Nicolas Saenz Julienne 

---
Changes since v3:
 - Remove HAS_DMA guards

 drivers/of/unittest.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 06cc988faf78..b9a4d047a95e 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -869,6 +869,23 @@ static void __init of_unittest_changeset(void)
 #endif
 }
 
+static void __init of_unittest_dma_get_max_cpu_address(void)
+{
+   struct device_node *np;
+   phys_addr_t cpu_addr;
+
+   np = of_find_node_by_path("/testcase-data/address-tests");
+   if (!np) {
+   pr_err("missing testcase data\n");
+   return;
+   }
+
+   cpu_addr = of_dma_get_max_cpu_address(np);
+   unittest(cpu_addr == 0x5000,
+"of_dma_get_max_cpu_address: wrong CPU addr %pad (expecting 
%x)\n",
+_addr, 0x5000);
+}
+
 static void __init of_unittest_dma_ranges_one(const char *path,
u64 expect_dma_addr, u64 expect_paddr)
 {
@@ -3266,6 +3283,7 @@ static int __init of_unittest(void)
of_unittest_changeset();
of_unittest_parse_interrupts();
of_unittest_parse_interrupts_extended();
+   of_unittest_dma_get_max_cpu_address();
of_unittest_parse_dma_ranges();
of_unittest_pci_dma_ranges();
of_unittest_match_node();
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-10-21 Thread Nicolas Saenz Julienne
crashkernel might reserve memory located in ZONE_DMA. We plan to delay
ZONE_DMA's initialization after unflattening the devicetree and ACPI's
boot table initialization, so move it later in the boot process.
Specifically into mem_init(), this is the last place crashkernel will be
able to reserve the memory before the page allocator kicks in. There
isn't any apparent reason for doing this earlier.

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/mm/init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 095540667f0f..fc4ab0d6d5d2 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -386,8 +386,6 @@ void __init arm64_memblock_init(void)
else
arm64_dma32_phys_limit = PHYS_MASK + 1;
 
-   reserve_crashkernel();
-
reserve_elfcorehdr();
 
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
@@ -508,6 +506,8 @@ void __init mem_init(void)
else
swiotlb_force = SWIOTLB_NO_FORCE;
 
+   reserve_crashkernel();
+
set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 6/7] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-21 Thread Nicolas Saenz Julienne
From: Ard Biesheuvel 

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.

This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.

Cc: Jeremy Linton 
Cc: Lorenzo Pieralisi 
Cc: Nicolas Saenz Julienne 
Cc: Rob Herring 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Cc: Hanjun Guo 
Cc: Sudeep Holla 
Cc: Anshuman Khandual 
Signed-off-by: Ard Biesheuvel 
[nsaenz: Rebased, removed documentation change and add declaration in 
acpi_iort.h]
Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v3:
 - Use min_not_zero()
 - Check ACPI revision
 - Remove unnecessary #ifdef in zone_sizes_init()

 arch/arm64/mm/init.c  |  3 ++-
 drivers/acpi/arm64/iort.c | 52 +++
 include/linux/acpi_iort.h |  4 +++
 3 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 94e38f99748b..f5d4f85506e4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -190,7 +191,7 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
 
 #ifdef CONFIG_ZONE_DMA
dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
-   zone_dma_bits = min(32U, dt_zone_dma_bits);
+   zone_dma_bits = min3(32U, dt_zone_dma_bits, 
acpi_iort_get_zone_dma_size());
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 9929ff50c0c0..05fe4a076bab 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1718,3 +1718,55 @@ void __init acpi_iort_init(void)
 
iort_init_platform_devices();
 }
+
+#ifdef CONFIG_ZONE_DMA
+/*
+ * Check the IORT whether any devices exist whose DMA mask is < 32 bits.
+ * If so, return the smallest value encountered, or 32 otherwise.
+ */
+unsigned int __init acpi_iort_get_zone_dma_size(void)
+{
+   struct acpi_table_iort *iort;
+   struct acpi_iort_node *node, *end;
+   acpi_status status;
+   u8 limit = 32;
+   int i;
+
+   if (acpi_disabled)
+   return limit;
+
+   status = acpi_get_table(ACPI_SIG_IORT, 0,
+   (struct acpi_table_header **));
+   if (ACPI_FAILURE(status))
+   return limit;
+
+   node = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->node_offset);
+   end = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->header.length);
+
+   for (i = 0; i < iort->node_count; i++) {
+   if (node >= end)
+   break;
+
+   switch (node->type) {
+   struct acpi_iort_named_component *ncomp;
+   struct acpi_iort_root_complex *rc;
+
+   case ACPI_IORT_NODE_NAMED_COMPONENT:
+   ncomp = (struct acpi_iort_named_component 
*)node->node_data;
+   limit = min_not_zero(limit, 
ncomp->memory_address_limit);
+   break;
+
+   case ACPI_IORT_NODE_PCI_ROOT_COMPLEX:
+   if (node->revision < 1)
+   break;
+
+   rc = (struct acpi_iort_root_complex *)node->node_data;
+   limit = min_not_zero(limit, rc->memory_address_limit);
+   break;
+   }
+   node = ACPI_ADD_PTR(st

[PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-21 Thread Nicolas Saenz Julienne
We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

The DMA layer also needs to be able to allocate memory that is
guaranteed to meet those DMA constraints, for bounce buffering as well
as allocating the backing for consistent mappings. This is why the 1 GB
ZONE_DMA was introduced recently. Unfortunately, it turns out the having
a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with kdump, and
potentially in other places where allocations cannot cross zone
boundaries. Therefore, we should avoid having two separate DMA zones
when possible.

So, with the help of of_dma_get_max_cpu_address() get the topmost
physical address accessible to all DMA masters in system and use that
information to fine-tune ZONE_DMA's size. In the absence of addressing
limited masters ZONE_DMA will span the whole 32-bit address space,
otherwise, in the case of the Raspberry Pi 4 it'll only span the 30-bit
address space, and have ZONE_DMA32 cover the rest of the 32-bit address
space.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v3:
 - Simplify code for readability.

Changes since v2:
 - Updated commit log by shamelessly copying Ard's ACPI commit log

 arch/arm64/mm/init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 410721fc4fc0..94e38f99748b 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -42,8 +42,6 @@
 #include 
 #include 
 
-#define ARM64_ZONE_DMA_BITS30
-
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int 
zone_bits)
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
+   unsigned int __maybe_unused dt_zone_dma_bits;
 
 #ifdef CONFIG_ZONE_DMA
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
+   zone_dma_bits = min(32U, dt_zone_dma_bits);
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-15 Thread Nicolas Saenz Julienne
On Thu, 2020-10-15 at 22:26 +0800, Hanjun Guo wrote:
> On 2020/10/15 3:12, Nicolas Saenz Julienne wrote:
> > From: Ard Biesheuvel 
> > 
> > We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
> > incorporating masters that can address less than 32 bits of DMA, in
> > particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
> > peripherals that can only address up to 1 GB (and its PCIe host
> > bridge can only access the bottom 3 GB)
> > 
> > Instructing the DMA layer about these limitations is straight-forward,
> > even though we had to fix some issues regarding memory limits set in
> > the IORT for named components, and regarding the handling of ACPI _DMA
> > methods. However, the DMA layer also needs to be able to allocate
> > memory that is guaranteed to meet those DMA constraints, for bounce
> > buffering as well as allocating the backing for consistent mappings.
> > 
> > This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
> > it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
> > problems with kdump, and potentially in other places where allocations
> > cannot cross zone boundaries. Therefore, we should avoid having two
> > separate DMA zones when possible.
> > 
> > So let's do an early scan of the IORT, and only create the ZONE_DMA
> > if we encounter any devices that need it. This puts the burden on
> > the firmware to describe such limitations in the IORT, which may be
> > redundant (and less precise) if _DMA methods are also being provided.
> > However, it should be noted that this situation is highly unusual for
> > arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
> > the _DMA method if implemented, and so we will not lose the ability to
> > perform streaming DMA outside the ZONE_DMA if the _DMA method permits
> > it.
> 
> Sorry, I'm still a little bit confused. With this patch, if we have
> a device which set the right _DMA method (DMA size >= 32), but with the
> wrong DMA size in IORT, we still have the ZONE_DMA created which
> is actually not needed?

Yes, that would be the case.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 6/8] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-15 Thread Nicolas Saenz Julienne
On Thu, 2020-10-15 at 07:39 +0200, Christoph Hellwig wrote:
> On Wed, Oct 14, 2020 at 09:12:08PM +0200, Nicolas Saenz Julienne wrote:
> > +   zone_dma_bits = min(zone_dma_bits,
> > +   (unsigned 
> > int)ilog2(of_dma_get_max_cpu_address(NULL)));
> 
> Plase avoid pointlessly long lines.  Especially if it is completely trivial
> by using either min_t or not overindenting like here.

Noted



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 5/8] dma-direct: Turn zone_dma_bits default value into a define

2020-10-15 Thread Nicolas Saenz Julienne
On Thu, 2020-10-15 at 07:38 +0200, Christoph Hellwig wrote:
> On Wed, Oct 14, 2020 at 09:12:07PM +0200, Nicolas Saenz Julienne wrote:
> > Set zone_dma_bits default value through a define so as for architectures
> > to be able to override it with their default value.
> 
> Architectures can do that already by assigning a value to zone_dma_bits
> at runtime.  I really do not want to add the extra clutter.

I'll remove it then.



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-15 Thread Nicolas Saenz Julienne
On Thu, 2020-10-15 at 07:42 +0200, Christoph Hellwig wrote:
> > +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> > +{
> > +   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
> > +   struct of_range_parser parser;
> > +   phys_addr_t subtree_max_addr;
> > +   struct device_node *child;
> > +   phys_addr_t cpu_end = 0;
> > +   struct of_range range;
> > +   const __be32 *ranges;
> > +   int len;
> > +
> > +   if (!np)
> > +   np = of_root;
> 
> Requiring of_root to be passed explicitly would seem more natural
> to me than the magic NULL argument.  There doesn't seem to be any
> precedent for that kind of calling convention either.

I inspired that behavior from __of_find_all_nodes(). I'll change it.



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 4/8] of: unittest: Add test for of_dma_get_max_cpu_address()

2020-10-15 Thread Nicolas Saenz Julienne
On Wed, 2020-10-14 at 17:04 -0500, Rob Herring wrote:
> On Wed, Oct 14, 2020 at 2:12 PM Nicolas Saenz Julienne
>  wrote:
> > Introduce a test for of_dma_get_max_cup_address(), it uses the same DT
> > data as the rest of dma-ranges unit tests.
> > 
> > Signed-off-by: Nicolas Saenz Julienne 
> > ---
> >  drivers/of/unittest.c | 20 
> >  1 file changed, 20 insertions(+)
> > 
> > diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
> > index 06cc988faf78..2cbf2a585c9f 100644
> > --- a/drivers/of/unittest.c
> > +++ b/drivers/of/unittest.c
> > @@ -869,6 +869,25 @@ static void __init of_unittest_changeset(void)
> >  #endif
> >  }
> > 
> > +static void __init of_unittest_dma_get_max_cpu_address(void)
> > +{
> > +#ifdef CONFIG_HAS_DMA
> 
> Can't the unittest run without this? I run the unittests under UML.

It was cargo culted from its sibling of_unittest_dma_ranges_one(), now that you
mention it, I can't seem to find the reason why it's here in the first place,
nor for other similar usages in OF code.

I ran the test in UML with all HAS_DMA conditionals removed from OF code and
things went well. I'll prepare a fix for that.

> > +   struct device_node *np;
> > +   phys_addr_t cpu_addr;
> > +
> > +   np = of_find_node_by_path("/testcase-data/address-tests");
> > +   if (!np) {
> > +   pr_err("missing testcase data\n");
> > +   return;
> > +   }
> > +
> > +   cpu_addr = of_dma_get_max_cpu_address(np);
> > +   unittest(cpu_addr == 0x5000ULL,
> > +"of_dma_get_max_cpu_address: wrong CPU addr %pad 
> > (expecting %llx)\n",
> > +_addr, 0x5000ULL);
> > +#endif
> > +}
> > +
> >  static void __init of_unittest_dma_ranges_one(const char *path,
> > u64 expect_dma_addr, u64 expect_paddr)
> >  {
> > @@ -3266,6 +3285,7 @@ static int __init of_unittest(void)
> > of_unittest_changeset();
> > of_unittest_parse_interrupts();
> > of_unittest_parse_interrupts_extended();
> > +   of_unittest_dma_get_max_cpu_address();
> > of_unittest_parse_dma_ranges();
> > of_unittest_pci_dma_ranges();
> > of_unittest_match_node();
> > --
> > 2.28.0
> > 



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-15 Thread Nicolas Saenz Julienne
On Thu, 2020-10-15 at 08:56 +0200, Ard Biesheuvel wrote:
> On Thu, 15 Oct 2020 at 00:03, Rob Herring  wrote:
> > On Wed, Oct 14, 2020 at 2:12 PM Nicolas Saenz Julienne
> >  wrote:
> > > Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
> > > physical address addressable by all DMA masters in the system. It's
> > > specially useful for setting memory zones sizes at early boot time.
> > > 
> > > Signed-off-by: Nicolas Saenz Julienne 
> > > 
> > > ---
> > > 
> > > Changes since v2:
> > >  - Use PHYS_ADDR_MAX
> > >  - return phys_dma_t
> > >  - Rename function
> > >  - Correct subject
> > >  - Add support to start parsing from an arbitrary device node in order
> > >for the function to work with unit tests
> > > 
> > >  drivers/of/address.c | 42 ++
> > >  include/linux/of.h   |  7 +++
> > >  2 files changed, 49 insertions(+)
> > > 
> > > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > > index eb9ab4f1e80b..b5a9695aaf82 100644
> > > --- a/drivers/of/address.c
> > > +++ b/drivers/of/address.c
> > > @@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const 
> > > struct bus_dma_region **map)
> > >  }
> > >  #endif /* CONFIG_HAS_DMA */
> > > 
> > > +/**
> > > + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
> > > + * @np: The node to start searching from or NULL to start from the root
> > > + *
> > > + * Gets the highest CPU physical address that is addressable by all DMA 
> > > masters
> > > + * in the system (or subtree when np is non-NULL). If no DMA constrained 
> > > device
> > > + * is found, it returns PHYS_ADDR_MAX.
> > > + */
> > > +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> > > +{
> > > +   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
> > 
> > One issue with using phys_addr_t is it may be 32-bit even though the
> > DT is 64-bit addresses. LPAE capable system with LPAE disabled. Maybe
> > the truncation is fine here? Maybe not.
> > 
> 
> PHYS_ADDR_MAX is the max addressable CPU address on the system, and so
> it makes sense to use it for the return type, and for the preliminary
> return value: this is actually what /prevents/ truncation, because we
> will only overwrite max_cpu_addr if the new u64 value is lower.
> 

Actually I now see how things might go south.

> > > +   if (ranges && len) {
> > > +   of_dma_range_parser_init(, np);
> > > +   for_each_of_range(, )
> > > +   if (range.cpu_addr + range.size > cpu_end)
> > > +   cpu_end = range.cpu_addr + range.size;

If cpu_end hits 0x1_, it'll overflow to 0. This is possible on 32-bit
systems (LPAE or not). And something similar might happen on LPAE disabled
systems.

I could add some extra logic, something like:

/* We overflowed */
if (cpu_end < range.cpu_addr)
cpu_end = PHYS_ADDR_MAX;

Which is not perfect but will cover most sensible cases.

Or simply deal internally in u64s, and upon returning, check if "max_cpu_addr"
falls higher than PHYS_ADDR_MAX.

> > > +
> > > +   if (max_cpu_addr > cpu_end)
> > > +   max_cpu_addr = cpu_end;
> > > +   }
> > > +
> > > +   for_each_available_child_of_node(np, child) {
> > > +   subtree_max_addr = of_dma_get_max_cpu_address(child);
> > > +   if (max_cpu_addr > subtree_max_addr)
> > > +   max_cpu_addr = subtree_max_addr;
> > > +   }
> > > +
> > > +   return max_cpu_addr;
> > > +}

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 1/8] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-10-15 Thread Nicolas Saenz Julienne
On Thu, 2020-10-15 at 09:40 +0100, Will Deacon wrote:
> On Wed, Oct 14, 2020 at 09:12:03PM +0200, Nicolas Saenz Julienne wrote:
> > crashkernel might reserve memory located in ZONE_DMA. We plan to delay
> > ZONE_DMA's initialization after unflattening the devicetree and ACPI's
> > boot table initialization, so move it later in the boot process.
> > Specifically into mem_init(), this is the last place crashkernel will be
> > able to reserve the memory before the page allocator kicks in and there is
> > no need to do it earlier.
> > 
> > Signed-off-by: Nicolas Saenz Julienne 
> > ---
> >  arch/arm64/mm/init.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Please can you cc me on the whole series next time? I know different
> maintainers have different preferences here, but I find it much easier to
> figure out what's happening when I can see all of the changes together.

Will do.

Regards,
Nicolas




signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-15 Thread Nicolas Saenz Julienne
On Wed, 2020-10-14 at 17:02 -0500, Rob Herring wrote:
> On Wed, Oct 14, 2020 at 2:12 PM Nicolas Saenz Julienne
>  wrote:
> > Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
> > physical address addressable by all DMA masters in the system. It's
> > specially useful for setting memory zones sizes at early boot time.
> > 
> > Signed-off-by: Nicolas Saenz Julienne 
> > 
> > ---

[...]

> > +   struct of_range_parser parser;
> > +   phys_addr_t subtree_max_addr;
> > +   struct device_node *child;
> > +   phys_addr_t cpu_end = 0;
> > +   struct of_range range;
> > +   const __be32 *ranges;
> > +   int len;
> > +
> > +   if (!np)
> > +   np = of_root;
> > +
> > +   ranges = of_get_property(np, "dma-ranges", );
> 
> I'm not really following why you changed the algorithm here. You're
> skipping disabled nodes which is good. Was there some other reason?

Yes, it's a little more complex. But I had to change it in order to be able to
start parsing down from an arbitrary device node, which is needed for the unit
tests.

for_each_of_allnodes() and friends will traverse the whole tree, regardless of
the starting point. I couldn't find a similar function that would just iterate
over a subsection of the tree, so I went with this recursive approach.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v3 4/8] of: unittest: Add test for of_dma_get_max_cpu_address()

2020-10-14 Thread Nicolas Saenz Julienne
Introduce a test for of_dma_get_max_cup_address(), it uses the same DT
data as the rest of dma-ranges unit tests.

Signed-off-by: Nicolas Saenz Julienne 
---
 drivers/of/unittest.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 06cc988faf78..2cbf2a585c9f 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -869,6 +869,25 @@ static void __init of_unittest_changeset(void)
 #endif
 }
 
+static void __init of_unittest_dma_get_max_cpu_address(void)
+{
+#ifdef CONFIG_HAS_DMA
+   struct device_node *np;
+   phys_addr_t cpu_addr;
+
+   np = of_find_node_by_path("/testcase-data/address-tests");
+   if (!np) {
+   pr_err("missing testcase data\n");
+   return;
+   }
+
+   cpu_addr = of_dma_get_max_cpu_address(np);
+   unittest(cpu_addr == 0x5000ULL,
+"of_dma_get_max_cpu_address: wrong CPU addr %pad (expecting 
%llx)\n",
+_addr, 0x5000ULL);
+#endif
+}
+
 static void __init of_unittest_dma_ranges_one(const char *path,
u64 expect_dma_addr, u64 expect_paddr)
 {
@@ -3266,6 +3285,7 @@ static int __init of_unittest(void)
of_unittest_changeset();
of_unittest_parse_interrupts();
of_unittest_parse_interrupts_extended();
+   of_unittest_dma_get_max_cpu_address();
of_unittest_parse_dma_ranges();
of_unittest_pci_dma_ranges();
of_unittest_match_node();
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-14 Thread Nicolas Saenz Julienne
From: Ard Biesheuvel 

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.

This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.

Cc: Jeremy Linton 
Cc: Lorenzo Pieralisi 
Cc: Nicolas Saenz Julienne 
Cc: Rob Herring 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Cc: Hanjun Guo 
Cc: Sudeep Holla 
Cc: Anshuman Khandual 
Signed-off-by: Ard Biesheuvel 
[nsaenz: Rebased, removed documentation change, warnings and add
declaration in acpi_iort.h]
Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/mm/init.c  |  6 +
 drivers/acpi/arm64/iort.c | 51 +++
 include/linux/acpi_iort.h |  4 +++
 3 files changed, 61 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 97b0d2768349..f321761eedb2 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -196,6 +197,11 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
 #ifdef CONFIG_ZONE_DMA
zone_dma_bits = min(zone_dma_bits,
(unsigned 
int)ilog2(of_dma_get_max_cpu_address(NULL)));
+
+   if (IS_ENABLED(CONFIG_ACPI))
+   zone_dma_bits = min(zone_dma_bits,
+   acpi_iort_get_zone_dma_size());
+
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 9929ff50c0c0..8f530bf3c03b 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1718,3 +1718,54 @@ void __init acpi_iort_init(void)
 
iort_init_platform_devices();
 }
+
+#ifdef CONFIG_ZONE_DMA
+/*
+ * Check the IORT whether any devices exist whose DMA mask is < 32 bits.
+ * If so, return the smallest value encountered, or 32 otherwise.
+ */
+unsigned int __init acpi_iort_get_zone_dma_size(void)
+{
+   struct acpi_table_iort *iort;
+   struct acpi_iort_node *node, *end;
+   acpi_status status;
+   u8 limit = 32;
+   int i;
+
+   if (acpi_disabled)
+   return limit;
+
+   status = acpi_get_table(ACPI_SIG_IORT, 0,
+   (struct acpi_table_header **));
+   if (ACPI_FAILURE(status))
+   return limit;
+
+   node = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->node_offset);
+   end = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->header.length);
+
+   for (i = 0; i < iort->node_count; i++) {
+   if (node >= end)
+   break;
+
+   switch (node->type) {
+   struct acpi_iort_named_component *ncomp;
+   struct acpi_iort_root_complex *rc;
+
+   case ACPI_IORT_NODE_NAMED_COMPONENT:
+   ncomp = (struct acpi_iort_named_component 
*)node->node_data;
+   if (ncomp->memory_address_limit)
+   limit = min(limit, ncomp->memory_address_limit);
+   break;
+
+   case ACPI_IORT_NODE_PCI_ROOT_COMPLEX:
+   rc = (struct acpi_iort_root_complex *)node->node_data;
+   if (rc->memory_address_limit)
+   limit = min(limit, rc->memory_address_limit);
+   break;
+   }
+   node = ACPI_ADD_PTR(struct acpi_iort_node, node, node->

[PATCH v3 2/8] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()

2020-10-14 Thread Nicolas Saenz Julienne
zone_dma_bits's initialization happens earlier that it's actually
needed, in arm64_memblock_init(). So move it into the more suitable
zone_sizes_init().

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/mm/init.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 1d29f2ca81c7..ef0ef0087e2c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -196,6 +196,8 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA
+   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
 #ifdef CONFIG_ZONE_DMA32
@@ -386,11 +388,6 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
-   arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
-   }
-
if (IS_ENABLED(CONFIG_ZONE_DMA32))
arm64_dma32_phys_limit = max_zone_phys(32);
else
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 6/8] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-14 Thread Nicolas Saenz Julienne
We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

The DMA layer also needs to be able to allocate memory that is
guaranteed to meet those DMA constraints, for bounce buffering as well
as allocating the backing for consistent mappings. This is why the 1 GB
ZONE_DMA was introduced recently. Unfortunately, it turns out the having
a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with kdump, and
potentially in other places where allocations cannot cross zone
boundaries. Therefore, we should avoid having two separate DMA zones
when possible.

So, with the help of of_dma_get_max_cpu_address() get the topmost
physical address accessible to all DMA masters in system and use that
information to fine-tune ZONE_DMA's size. In the absence of addressing
limited masters ZONE_DMA will span the whole 32-bit address space,
otherwise, in the case of the Raspberry Pi 4 it'll only span the 30-bit
address space, and have ZONE_DMA32 cover the rest of the 32-bit address
space.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v2:
 - Updated commit log by shamelessly copying Ard's ACPI counterpart commit log

 arch/arm64/include/asm/processor.h | 1 +
 arch/arm64/mm/init.c   | 5 ++---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index fce8cbecd6bc..c09d3f1a9a6b 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -97,6 +97,7 @@
 
 extern phys_addr_t arm64_dma_phys_limit;
 #define ARCH_LOW_ADDRESS_LIMIT (arm64_dma_phys_limit - 1)
+#define ZONE_DMA_BITS_DEFAULT  32
 
 struct debug_info {
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index ef0ef0087e2c..97b0d2768349 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -42,8 +42,6 @@
 #include 
 #include 
 
-#define ARM64_ZONE_DMA_BITS30
-
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -196,7 +194,8 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   zone_dma_bits = min(zone_dma_bits,
+   (unsigned 
int)ilog2(of_dma_get_max_cpu_address(NULL)));
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-14 Thread Nicolas Saenz Julienne
Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
physical address addressable by all DMA masters in the system. It's
specially useful for setting memory zones sizes at early boot time.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v2:
 - Use PHYS_ADDR_MAX
 - return phys_dma_t
 - Rename function
 - Correct subject
 - Add support to start parsing from an arbitrary device node in order
   for the function to work with unit tests

 drivers/of/address.c | 42 ++
 include/linux/of.h   |  7 +++
 2 files changed, 49 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index eb9ab4f1e80b..b5a9695aaf82 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const 
struct bus_dma_region **map)
 }
 #endif /* CONFIG_HAS_DMA */
 
+/**
+ * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
+ * @np: The node to start searching from or NULL to start from the root
+ *
+ * Gets the highest CPU physical address that is addressable by all DMA masters
+ * in the system (or subtree when np is non-NULL). If no DMA constrained device
+ * is found, it returns PHYS_ADDR_MAX.
+ */
+phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
+{
+   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
+   struct of_range_parser parser;
+   phys_addr_t subtree_max_addr;
+   struct device_node *child;
+   phys_addr_t cpu_end = 0;
+   struct of_range range;
+   const __be32 *ranges;
+   int len;
+
+   if (!np)
+   np = of_root;
+
+   ranges = of_get_property(np, "dma-ranges", );
+   if (ranges && len) {
+   of_dma_range_parser_init(, np);
+   for_each_of_range(, )
+   if (range.cpu_addr + range.size > cpu_end)
+   cpu_end = range.cpu_addr + range.size;
+
+   if (max_cpu_addr > cpu_end)
+   max_cpu_addr = cpu_end;
+   }
+
+   for_each_available_child_of_node(np, child) {
+   subtree_max_addr = of_dma_get_max_cpu_address(child);
+   if (max_cpu_addr > subtree_max_addr)
+   max_cpu_addr = subtree_max_addr;
+   }
+
+   return max_cpu_addr;
+}
+
 /**
  * of_dma_is_coherent - Check if device is coherent
  * @np:device node
diff --git a/include/linux/of.h b/include/linux/of.h
index 481ec0467285..db8db8f2c967 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
   const char *map_name, const char *map_mask_name,
   struct device_node **target, u32 *id_out);
 
+phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
+
 #else /* CONFIG_OF */
 
 static inline void of_core_init(void)
@@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 id,
return -EINVAL;
 }
 
+static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np)
+{
+   return PHYS_ADDR_MAX;
+}
+
 #define of_match_ptr(_ptr) NULL
 #define of_match_node(_matches, _node) NULL
 #endif /* CONFIG_OF */
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/8] arm64: Default to 32-bit wide ZONE_DMA

2020-10-14 Thread Nicolas Saenz Julienne
Using two distinct DMA zones turned out to be problematic. Here's an
attempt go back to a saner default.

I tested this on both a RPi4 and QEMU.

---

Changes since v2:
 - Introduce Ard's patch
 - Improve OF dma-ranges parsing function
 - Add unit test for OF function
 - Address small changes
 - Move crashkernel reservation later in boot process

Changes since v1:
 - Parse dma-ranges instead of using machine compatible string

Ard Biesheuvel (1):
  arm64: mm: Set ZONE_DMA size based on early IORT scan

Nicolas Saenz Julienne (7):
  arm64: mm: Move reserve_crashkernel() into mem_init()
  arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
  of/address: Introduce of_dma_get_max_cpu_address()
  of: unittest: Add test for of_dma_get_max_cpu_address()
  dma-direct: Turn zone_dma_bits default value into a define
  arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
  mm: Update DMA zones description

 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/mm/init.c   | 20 ++--
 drivers/acpi/arm64/iort.c  | 51 ++
 drivers/of/address.c   | 42 
 drivers/of/unittest.c  | 20 
 include/linux/acpi_iort.h  |  4 +++
 include/linux/dma-direct.h |  3 ++
 include/linux/mmzone.h |  5 +--
 include/linux/of.h |  7 
 kernel/dma/direct.c|  2 +-
 10 files changed, 143 insertions(+), 12 deletions(-)

-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 5/8] dma-direct: Turn zone_dma_bits default value into a define

2020-10-14 Thread Nicolas Saenz Julienne
Set zone_dma_bits default value through a define so as for architectures
to be able to override it with their default value.

Signed-off-by: Nicolas Saenz Julienne 
---
 include/linux/dma-direct.h | 3 +++
 kernel/dma/direct.c| 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 18aade195884..e433d90cbacf 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -12,6 +12,9 @@
 #include 
 #include 
 
+#ifndef ZONE_DMA_BITS_DEFAULT
+#define ZONE_DMA_BITS_DEFAULT  24
+#endif
 extern unsigned int zone_dma_bits;
 
 /*
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 06c111544f61..c0d97f536e93 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -20,7 +20,7 @@
  * it for entirely different regions. In that case the arch code needs to
  * override the variable below for dma-direct to work properly.
  */
-unsigned int zone_dma_bits __ro_after_init = 24;
+unsigned int zone_dma_bits __ro_after_init = ZONE_DMA_BITS_DEFAULT;
 
 static inline dma_addr_t phys_to_dma_direct(struct device *dev,
phys_addr_t phys)
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/8] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-10-14 Thread Nicolas Saenz Julienne
crashkernel might reserve memory located in ZONE_DMA. We plan to delay
ZONE_DMA's initialization after unflattening the devicetree and ACPI's
boot table initialization, so move it later in the boot process.
Specifically into mem_init(), this is the last place crashkernel will be
able to reserve the memory before the page allocator kicks in and there is
no need to do it earlier.

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/mm/init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a53c1e0fb017..1d29f2ca81c7 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -396,8 +396,6 @@ void __init arm64_memblock_init(void)
else
arm64_dma32_phys_limit = PHYS_MASK + 1;
 
-   reserve_crashkernel();
-
reserve_elfcorehdr();
 
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
@@ -518,6 +516,8 @@ void __init mem_init(void)
else
swiotlb_force = SWIOTLB_NO_FORCE;
 
+   reserve_crashkernel();
+
set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 8/8] mm: Update DMA zones description

2020-10-14 Thread Nicolas Saenz Julienne
The default behavior for arm64 changed, so reflect that.

Signed-off-by: Nicolas Saenz Julienne 
Acked-by: Catalin Marinas 
---
 include/linux/mmzone.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..4ee2306351b9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -363,8 +363,9 @@ enum zone_type {
 *  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
 *the specific device.
 *
-*  - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
-*lower 4G.
+*  - arm64 uses a single 4GB ZONE_DMA, except on the Raspberry Pi 4,
+*in which ZONE_DMA covers the first GB and ZONE_DMA32 the rest of
+*the lower 4GB.
 *
 *  - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
 *depending on the specific device.
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/5] of/address: Introduce of_dma_lower_bus_limit()

2020-10-14 Thread Nicolas Saenz Julienne
Hi Rob,

On Mon, 2020-10-12 at 10:25 -0500, Rob Herring wrote:
> On Sat, Oct 10, 2020 at 10:12 AM Nicolas Saenz Julienne
>  wrote:
> > The function provides the CPU physical address addressable by the most
> > constrained bus in the system. It might be useful in order to
> > dynamically set up memory zones during boot.
> > 
> > Signed-off-by: Nicolas Saenz Julienne 
> > ---
> >  drivers/of/address.c | 34 ++
> >  include/linux/of.h   |  7 +++
> >  2 files changed, 41 insertions(+)
> > 
> > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > index eb9ab4f1e80b..755e97b65096 100644
> > --- a/drivers/of/address.c
> > +++ b/drivers/of/address.c
> > @@ -1024,6 +1024,40 @@ int of_dma_get_range(struct device_node *np, const 
> > struct bus_dma_region **map)
> >  }
> >  #endif /* CONFIG_HAS_DMA */
> > 
> > +/**
> > + * of_dma_safe_phys_limit - Get system wide DMA safe address space
> > + *
> > + * Gets the CPU physical address limit for safe DMA addressing system wide 
> > by
> > + * searching for the most constraining dma-range. Otherwise it returns 
> > ~0ULL.
> > + */
> > +u64 __init of_dma_safe_phys_limit(void)
> > +{
> > +   struct device_node *np = NULL;
> > +   struct of_range_parser parser;
> > +   const __be32 *ranges = NULL;
> > +   u64 phys_dma_limit = ~0ULL;
> > +   struct of_range range;
> > +   int len;
> > +
> > +   for_each_of_allnodes(np) {
> > +   dma_addr_t cpu_end = 0;
> > +
> > +   ranges = of_get_property(np, "dma-ranges", );
> > +   if (!ranges || !len)
> > +   continue;
> > +
> > +   of_dma_range_parser_init(, np);
> > +   for_each_of_range(, )
> > +   if (range.cpu_addr + range.size > cpu_end)
> > +   cpu_end = range.cpu_addr + range.size;
> 
> This doesn't work if you have more than one level of dma-ranges. The
> address has to be translated first. It should be okay to do that on
> the start or end address (if not, your DT is broken).

for_each_of_range() calls of_pci_range_parser_one() which utimately populates
range.cpu_addr with of_translate_dma_address() results. Isn't that good enough?

> Please add/extend a unittest for this.

Will do.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 2/5] of/address: Introduce of_dma_lower_bus_limit()

2020-10-14 Thread Nicolas Saenz Julienne
On Sun, 2020-10-11 at 09:47 +0200, Ard Biesheuvel wrote:
> Hi Nicolas,
> 
> $SUBJECT is out of sync with the patch below. Also, for legibility, it
> helps if the commit log is intelligible by itself, rather than relying
> on $SUBJECT being the first line of the first paragraph.

Noted, I'll update all commit logs.

> On Sat, 10 Oct 2020 at 17:12, Nicolas Saenz Julienne
>  wrote:
> > The function provides the CPU physical address addressable by the most
> > constrained bus in the system. It might be useful in order to
> > dynamically set up memory zones during boot.
> > 
> > Signed-off-by: Nicolas Saenz Julienne 
> > ---
> >  drivers/of/address.c | 34 ++
> >  include/linux/of.h   |  7 +++
> >  2 files changed, 41 insertions(+)
> > 
> > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > index eb9ab4f1e80b..755e97b65096 100644
> > --- a/drivers/of/address.c
> > +++ b/drivers/of/address.c
> > @@ -1024,6 +1024,40 @@ int of_dma_get_range(struct device_node *np, const 
> > struct bus_dma_region **map)
> >  }
> >  #endif /* CONFIG_HAS_DMA */
> > 
> > +/**
> > + * of_dma_safe_phys_limit - Get system wide DMA safe address space
> > + *
> > + * Gets the CPU physical address limit for safe DMA addressing system wide 
> > by
> > + * searching for the most constraining dma-range. Otherwise it returns 
> > ~0ULL.
> > + */
> > +u64 __init of_dma_safe_phys_limit(void)
> 
> I don't think 'safe' strikes the right tone here. You are looking for
> the highest CPU address that is addressable by all DMA masters in the
> system.
> 
> Something like
> 
> of_dma_get_max_cpu_address(void)
> 
> perhaps? Also, since this is generic code, phys_addr_t is probably a
> better type to return.

Sonds good to me, I dindn't like the name I used either.

Will use with phys_addr_t.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 3/5] dma-direct: Turn zone_dma_bits default value into a define

2020-10-10 Thread Nicolas Saenz Julienne
So as for architecture code to set their own default values when
relevant.

Signed-off-by: Nicolas Saenz Julienne 
---

Note: This is not really needed, but I think it nicer having
architectures use this than setting zone_dma_bits in a random place in
arch code. That said, I'll hapily edit it out if you don't agree.

 include/linux/dma-direct.h | 3 +++
 kernel/dma/direct.c| 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 18aade195884..e433d90cbacf 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -12,6 +12,9 @@
 #include 
 #include 
 
+#ifndef ZONE_DMA_BITS_DEFAULT
+#define ZONE_DMA_BITS_DEFAULT  24
+#endif
 extern unsigned int zone_dma_bits;
 
 /*
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 06c111544f61..c0d97f536e93 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -20,7 +20,7 @@
  * it for entirely different regions. In that case the arch code needs to
  * override the variable below for dma-direct to work properly.
  */
-unsigned int zone_dma_bits __ro_after_init = 24;
+unsigned int zone_dma_bits __ro_after_init = ZONE_DMA_BITS_DEFAULT;
 
 static inline dma_addr_t phys_to_dma_direct(struct device *dev,
phys_addr_t phys)
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 4/5] arm64: mm: Dynamically resize zone_dma_bits based on system's constraints

2020-10-10 Thread Nicolas Saenz Julienne
With the help of of_dma_safe_phys_limit() we can get the topmost
physical address accessible for DMA to the whole system and use that
information to properly setup zone_dma_bits.

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/include/asm/processor.h | 1 +
 arch/arm64/mm/init.c   | 5 ++---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index fce8cbecd6bc..c09d3f1a9a6b 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -97,6 +97,7 @@
 
 extern phys_addr_t arm64_dma_phys_limit;
 #define ARCH_LOW_ADDRESS_LIMIT (arm64_dma_phys_limit - 1)
+#define ZONE_DMA_BITS_DEFAULT  32
 
 struct debug_info {
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 0eca5865dcb1..5934df93bf4d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -42,8 +42,6 @@
 #include 
 #include 
 
-#define ARM64_ZONE_DMA_BITS30
-
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -196,7 +194,8 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   zone_dma_bits = min(zone_dma_bits,
+   (unsigned int)ilog2(of_dma_safe_phys_limit()));
 
if (IS_ENABLED(CONFIG_ACPI)) {
extern unsigned int acpi_iort_get_zone_dma_size(void);
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/5] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()

2020-10-10 Thread Nicolas Saenz Julienne
There no use for initializing it earlier in arm64_memblock_init().

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/mm/init.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index f6902a2b4ea6..0eca5865dcb1 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -196,14 +196,16 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA
+   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+
if (IS_ENABLED(CONFIG_ACPI)) {
extern unsigned int acpi_iort_get_zone_dma_size(void);
 
zone_dma_bits = min(zone_dma_bits,
acpi_iort_get_zone_dma_size());
-   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
}
 
+   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
 #ifdef CONFIG_ZONE_DMA32
@@ -394,11 +396,6 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
-   arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
-   }
-
if (IS_ENABLED(CONFIG_ZONE_DMA32))
arm64_dma32_phys_limit = max_zone_phys(32);
else
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 2/5] of/address: Introduce of_dma_lower_bus_limit()

2020-10-10 Thread Nicolas Saenz Julienne
The function provides the CPU physical address addressable by the most
constrained bus in the system. It might be useful in order to
dynamically set up memory zones during boot.

Signed-off-by: Nicolas Saenz Julienne 
---
 drivers/of/address.c | 34 ++
 include/linux/of.h   |  7 +++
 2 files changed, 41 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index eb9ab4f1e80b..755e97b65096 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -1024,6 +1024,40 @@ int of_dma_get_range(struct device_node *np, const 
struct bus_dma_region **map)
 }
 #endif /* CONFIG_HAS_DMA */
 
+/**
+ * of_dma_safe_phys_limit - Get system wide DMA safe address space
+ *
+ * Gets the CPU physical address limit for safe DMA addressing system wide by
+ * searching for the most constraining dma-range. Otherwise it returns ~0ULL.
+ */
+u64 __init of_dma_safe_phys_limit(void)
+{
+   struct device_node *np = NULL;
+   struct of_range_parser parser;
+   const __be32 *ranges = NULL;
+   u64 phys_dma_limit = ~0ULL;
+   struct of_range range;
+   int len;
+
+   for_each_of_allnodes(np) {
+   dma_addr_t cpu_end = 0;
+
+   ranges = of_get_property(np, "dma-ranges", );
+   if (!ranges || !len)
+   continue;
+
+   of_dma_range_parser_init(, np);
+   for_each_of_range(, )
+   if (range.cpu_addr + range.size > cpu_end)
+   cpu_end = range.cpu_addr + range.size;
+
+   if (phys_dma_limit > cpu_end)
+   phys_dma_limit = cpu_end;
+   }
+
+   return phys_dma_limit;
+}
+
 /**
  * of_dma_is_coherent - Check if device is coherent
  * @np:device node
diff --git a/include/linux/of.h b/include/linux/of.h
index 481ec0467285..958c64cffa92 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
   const char *map_name, const char *map_mask_name,
   struct device_node **target, u32 *id_out);
 
+u64 of_dma_safe_phys_limit(void);
+
 #else /* CONFIG_OF */
 
 static inline void of_core_init(void)
@@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 id,
return -EINVAL;
 }
 
+static inline u64 of_dma_safe_phys_limit(void)
+{
+   return ~0ULL;
+}
+
 #define of_match_ptr(_ptr) NULL
 #define of_match_node(_matches, _node) NULL
 #endif /* CONFIG_OF */
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/5] arm64: Default to 32-bit wide ZONE_DMA

2020-10-10 Thread Nicolas Saenz Julienne
I realized this morning after reading Ard's patch fixing the same issue
in ACPI that we can move the zone_dma_bits initialization later in the
init process. This permits the use of OF to parse dma-ranges in the
system. Something we though we couldn't do on previous iterations of
this.

The series sits on top of Ard's patch "arm64: mm: set ZONE_DMA size
based on early IORT scan."

--- Original cover letter

Using two distinct DMA zones turned out to be problematic. Here's an
attempt go back to a saner default.

I tested this on both a RPi4 and QEMU.

---

Changes since v1:
 - Parse dma-ranges instead of using machine compatible string

Nicolas Saenz Julienne (5):
  arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
  of/address: Introduce of_dma_lower_bus_limit()
  dma-direct: Turn zone_dma_bits default value into a define
  arm64: mm: Dynamically resize zone_dma_bits based on system's
constraints
  mm: Update DMA zones description

 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/mm/init.c   | 12 ---
 drivers/of/address.c   | 34 ++
 include/linux/dma-direct.h |  3 +++
 include/linux/mmzone.h |  5 +++--
 include/linux/of.h |  7 ++
 kernel/dma/direct.c|  2 +-
 7 files changed, 53 insertions(+), 11 deletions(-)

-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 5/5] mm: Update DMA zones description

2020-10-10 Thread Nicolas Saenz Julienne
The default behavior for arm64 changed, so reflect that.

Signed-off-by: Nicolas Saenz Julienne 
Acked-by: Catalin Marinas 
---
 include/linux/mmzone.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..4ee2306351b9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -363,8 +363,9 @@ enum zone_type {
 *  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
 *the specific device.
 *
-*  - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
-*lower 4G.
+*  - arm64 uses a single 4GB ZONE_DMA, except on the Raspberry Pi 4,
+*in which ZONE_DMA covers the first GB and ZONE_DMA32 the rest of
+*the lower 4GB.
 *
 *  - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
 *depending on the specific device.
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-10 Thread Nicolas Saenz Julienne
On Sat, 2020-10-10 at 12:36 +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 19:10, Catalin Marinas  wrote:
> > On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> > > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
> > >  wrote:
> > > > We can move this check to IORT code and call it from arm64 if it
> > > > can be made to work.
> > > 
> > > Finding the smallest value in the IORT, and assigning it to
> > > zone_dma_bits if it is < 32 should be easy. But as I understand it,
> > > having these separate DMA and DMA32 zones is what breaks kdump, no? So
> > > how is this going to fix the underlying issue?
> > 
> > If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
> > allocations fall back to ZONE_DMA).
> > 
> > kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
> > be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
> > became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
> > may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
> > would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
> > (it's been a while since I looked), the kdump allocation couldn't span
> > multiple zones.
> > 
> > In a separate thread, we try to fix kdump to use allocations above 4G as
> > a fallback but this only fixes platforms with enough RAM (and maybe it's
> > only those platforms that care about kdump).
> > 
> 
> One thing that strikes me as odd is that we are applying the same
> shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if
> DRAM starts outside of the zone, it is shifted upwards.
> 
> On a typical ARM box, this gives me
> 
> [0.00] Zone ranges:
> [0.00]   DMA  [mem 0x8000-0xbfff]
> [0.00]   DMA32[mem 0xc000-0x]
> [0.00]   Normal   [mem 0x0001-0x000f]
> 
> i.e., the 30-bit addressable range has bit 31 set, which is weird.

Yes I agree it's weird, and IMO plain useless. I sent a series this summer to
address this[1], which ultimately triggered the discussion we're having right
now.

Although with with your latest patch and the DT counterpart, we should be OK.
It would be weird for a HW description to define DMA constraints that are
impossible to reach on that system.

> I wonder if it wouldn't be better (and less problematic in the general
> case) to drop this logic for ZONE_DMA, and simply let it remain empty
> unless there is really some memory there.

From my experience, you can't have empty ZONE_DMA when enabled. Empty
ZONE_DMA32 is OK though. Although I'm sure it's something that can be changed.

Regards,
Nicolas

[1] https://lkml.org/lkml/2020/8/19/1022




signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Nicolas Saenz Julienne
On Fri, 2020-10-09 at 11:13 +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 10:36, Nicolas Saenz Julienne
>  wrote:
> > On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> > > On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig  wrote:
> > > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > > Sadly I just realised that the series is incomplete, we have RPi4 
> > > > > users that
> > > > > want to boot unsing ACPI, and this series would break things for 
> > > > > them. I'll
> > > > > have a word with them to see what we can do for their use-case.
> > > > 
> > > > Stupid question:  why do these users insist on a totally unsuitable
> > > > interface? And why would we as Linux developers care to support such
> > > > a aims?
> > > 
> > > The point is really whether we want to revert changes in Linux that
> > > made both DT and ACPI boot work without quirks on RPi4.
> > 
> > Well, and broke a big amount of devices that were otherwise fine.
> > 
> 
> Yeah that was unfortunate.
> 
> > > Having to check the RPi4 compatible string or OEM id in core init code is
> > > awful, regardless of whether you boot via ACPI or via DT.
> > > 
> > > The problem with this hardware is that it uses a DMA mask which is
> > > narrower than 32, and the arm64 kernel is simply not set up to deal
> > > with that at all. On DT, we have DMA ranges properties and the likes
> > > to describe such limitations, on ACPI we have _DMA methods as well as
> > > DMA range attributes in the IORT, both of which are now handled
> > > correctly. So all the information is there, we just have to figure out
> > > how to consume it early on.
> > 
> > Is it worth the effort just for a single board? I don't know about ACPI but
> > parsing dma-ranges that early at boot time is not trivial. My intuition 
> > tells
> > me that it'd be even harder for ACPI, being a more complex data structure.
> > 
> 
> Yes, it will be harder, especially for the _DMA methods.
> 
> > > Interestingly, this limitation always existed in the SoC, but it
> > > wasn't until they started shipping it with more than 1 GB of DRAM that
> > > it became a problem. This means issues like this could resurface in
> > > the future with existing SoCs when they get shipped with more memory,
> > > and so I would prefer fixing this in a generic way.
> > 
> > Actually what I proposed here is pretty generic. Specially from arm64's
> > perspective. We call early_init_dt_scan(), which sets up zone_dma_bits 
> > based on
> > whatever it finds in DT. Both those operations are architecture independent.
> > arm64 arch code doesn't care about the logic involved in ascertaining
> > zone_dma_bits. I get that the last step isn't generic. But it's all setup 
> > so as
> > to make it as such whenever it's worth the effort.
> > 
> 
> The problem is that, while we are providing a full description of the
> SoC's capabilities, we short circuit this by inserting knowledge into
> the code (that is shared between all DT architectures) that
> "brcm,bcm2711" is special, and needs a DMA zone override.

Yes I understand this and I sympathize with it, not the most beautiful thing
out there :). But that's only half the issue, as I said, implementing this
early at boot time is a tangible amount of work and a burden to maintain just
for one board. So this is the compromise we discussed with the DT maintainer
(RobH). The series sets things up so as to be able to implement the right
thing transparently to arm64's architecture when deemed worth the effort.

Ultimately, if you're worried about inserting knowledge into the code, aren't
we doing that, in a more extreme way, when imposing an extra unwarranted zone
to the whole arm64 ecosystem?

Note that I'm more that happy to work on alternative solutions, but let's first
settle on what would be acceptable to everyone.

> I think for ACPI boot, we might be able to work around this by cold
> plugging the memory above 1 GB, but I have to double check whether it
> won't get pulled into ZONE_DMA32 anyway (unless anyone can answer that
> for me here from the top of their head)

Don't know much about what ACPI memory cold plugging involves, but we'll still
need a proper ZONE_DMA32 (i.e. spanning the whole 32-bit address space) for
RPi4.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Nicolas Saenz Julienne
Hi Jeremy.

On Thu, 2020-10-08 at 22:59 -0500, Jeremy Linton wrote:
> On 10/8/20 2:43 PM, Ard Biesheuvel wrote:
> > (+ Lorenzo)
> > 
> > On Thu, 8 Oct 2020 at 12:14, Catalin Marinas  
> > wrote:
> > > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > > On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> > > > > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne 
> > > > > wrote:
> > > > > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > > > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz 
> > > > > > > > Julienne wrote:
> > > > > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > > > > > --- a/drivers/of/fdt.c
> > > > > > > > > +++ b/drivers/of/fdt.c
> > > > > > > > > @@ -25,6 +25,7 @@
> > > > > > > > >   #include 
> > > > > > > > >   #include 
> > > > > > > > >   #include 
> > > > > > > > > +#include   /* for zone_dma_bits */
> > > > > > > > > 
> > > > > > > > >   #include   /* for COMMAND_LINE_SIZE */
> > > > > > > > >   #include 
> > > > > > > > > @@ -1198,6 +1199,14 @@ void __init 
> > > > > > > > > early_init_dt_scan_nodes(void)
> > > > > > > > >  of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > > > > > >   }
> > > > > > > > > 
> > > > > > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > > > > > +{
> > > > > > > > > +   unsigned long dt_root = of_get_flat_dt_root();
> > > > > > > > > +
> > > > > > > > > +   if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > > > > > +   zone_dma_bits = 30;
> > > > > > > > > +}
> > > > > > > > 
> > > > > > > > I think we could keep this entirely in the arm64 
> > > > > > > > setup_machine_fdt() and
> > > > > > > > not pollute the core code with RPi4-specific code.
> > > > > > > 
> > > > > > > Actually, even better, could we not move the check to
> > > > > > > arm64_memblock_init() when we initialise zone_dma_bits?
> > > > > > 
> > > > > > I did it this way as I vaguely remembered Rob saying he wanted to 
> > > > > > centralise
> > > > > > all early boot fdt code in one place. But I'll be happy to move it 
> > > > > > there.
> > > > > 
> > > > > I can see Rob replied and I'm fine if that's his preference. However,
> > > > > what I don't particularly like is that in the arm64 code, if
> > > > > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched 
> > > > > by
> > > > > the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> > > > > get a platform that actually needs 24 here (I truly hope not, but just
> > > > > the principle of relying on magic values)?
> > > > > 
> > > > > So rather than guessing, I'd prefer if the arch code can override
> > > > > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> > > > > need to explicitly touch the zone_dma_bits variable.
> > > > 
> > > > Yes, sonds like the way to go. TBH I wasn't happy with that solution 
> > > > either,
> > > > but couldn't think of a nicer alternative.
> > > > 
> > > > Sadly I just realised that the series is incomplete, we have RPi4 users 
> > > > that
> > > > want to boot unsing ACPI, and this series would break things for them. 
> > > > I'll
> > > > have a word with them to see what we can do for their use-case.
> > > 
> > > Is there a way to get some SoC information from ACPI?
> > > 
> > 
> > This is unfortunate. We used ACPI _DMA methods as they were designed
> > to communica

Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Nicolas Saenz Julienne
On Fri, 2020-10-09 at 09:37 +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 09:11, Christoph Hellwig  wrote:
> > On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> > > Sadly I just realised that the series is incomplete, we have RPi4 users 
> > > that
> > > want to boot unsing ACPI, and this series would break things for them. 
> > > I'll
> > > have a word with them to see what we can do for their use-case.
> > 
> > Stupid question:  why do these users insist on a totally unsuitable
> > interface? And why would we as Linux developers care to support such
> > a aims?
>
> The point is really whether we want to revert changes in Linux that
> made both DT and ACPI boot work without quirks on RPi4.

Well, and broke a big amount of devices that were otherwise fine.

> Having to check the RPi4 compatible string or OEM id in core init code is
> awful, regardless of whether you boot via ACPI or via DT.
>
> The problem with this hardware is that it uses a DMA mask which is
> narrower than 32, and the arm64 kernel is simply not set up to deal
> with that at all. On DT, we have DMA ranges properties and the likes
> to describe such limitations, on ACPI we have _DMA methods as well as
> DMA range attributes in the IORT, both of which are now handled
> correctly. So all the information is there, we just have to figure out
> how to consume it early on.

Is it worth the effort just for a single board? I don't know about ACPI but
parsing dma-ranges that early at boot time is not trivial. My intuition tells
me that it'd be even harder for ACPI, being a more complex data structure.

> Interestingly, this limitation always existed in the SoC, but it
> wasn't until they started shipping it with more than 1 GB of DRAM that
> it became a problem. This means issues like this could resurface in
> the future with existing SoCs when they get shipped with more memory,
> and so I would prefer fixing this in a generic way.

Actually what I proposed here is pretty generic. Specially from arm64's
perspective. We call early_init_dt_scan(), which sets up zone_dma_bits based on
whatever it finds in DT. Both those operations are architecture independent.
arm64 arch code doesn't care about the logic involved in ascertaining
zone_dma_bits. I get that the last step isn't generic. But it's all setup so as
to make it as such whenever it's worth the effort.

> Also, I assume papering over the issue like this does not fix the
> kdump issue fundamentally, it just works around it, and so we might
> run into this again in the future.

Any ideas? The way I understand it the kdump issue is just a shortcoming of
the memory zones design.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-08 Thread Nicolas Saenz Julienne
Hi Catalin, sorry for the late reply.

On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > --- a/drivers/of/fdt.c
> > > > > +++ b/drivers/of/fdt.c
> > > > > @@ -25,6 +25,7 @@
> > > > >  #include 
> > > > >  #include 
> > > > >  #include 
> > > > > +#include /* for zone_dma_bits */
> > > > >  
> > > > >  #include   /* for COMMAND_LINE_SIZE */
> > > > >  #include 
> > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > >   of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > >  }
> > > > >  
> > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > +{
> > > > > + unsigned long dt_root = of_get_flat_dt_root();
> > > > > +
> > > > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > + zone_dma_bits = 30;
> > > > > +}
> > > > 
> > > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > > not pollute the core code with RPi4-specific code.
> > > 
> > > Actually, even better, could we not move the check to
> > > arm64_memblock_init() when we initialise zone_dma_bits?
> > 
> > I did it this way as I vaguely remembered Rob saying he wanted to centralise
> > all early boot fdt code in one place. But I'll be happy to move it there.
> 
> I can see Rob replied and I'm fine if that's his preference. However,
> what I don't particularly like is that in the arm64 code, if
> zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
> the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> get a platform that actually needs 24 here (I truly hope not, but just
> the principle of relying on magic values)?
> 
> So rather than guessing, I'd prefer if the arch code can override
> ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> need to explicitly touch the zone_dma_bits variable.

Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
but couldn't think of a nicer alternative.

Sadly I just realised that the series is incomplete, we have RPi4 users that
want to boot unsing ACPI, and this series would break things for them. I'll
have a word with them to see what we can do for their use-case.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-01 Thread Nicolas Saenz Julienne
On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > Hi Nicolas,
> > 
> > Thanks for putting this together.
> > 
> > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > index 4602e467ca8b..cd0d115ef329 100644
> > > --- a/drivers/of/fdt.c
> > > +++ b/drivers/of/fdt.c
> > > @@ -25,6 +25,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include /* for zone_dma_bits */
> > >  
> > >  #include   /* for COMMAND_LINE_SIZE */
> > >  #include 
> > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > >   of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > >  }
> > >  
> > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > +{
> > > + unsigned long dt_root = of_get_flat_dt_root();
> > > +
> > > + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > + zone_dma_bits = 30;
> > > +}
> > 
> > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > not pollute the core code with RPi4-specific code.
> 
> Actually, even better, could we not move the check to
> arm64_memblock_init() when we initialise zone_dma_bits?

I did it this way as I vaguely remembered Rob saying he wanted to centralise
all early boot fdt code in one place. But I'll be happy to move it there.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH 0/4] arm64: Default to 32-bit wide ZONE_DMA

2020-10-01 Thread Nicolas Saenz Julienne
Using two distinct DMA zones turned out to be problematic. Here's an
attempt go back to a saner default.

I tested this on both a RPi4 and QEMU.

---

Nicolas Saenz Julienne (4):
  of/fdt: Update zone_dma_bits when running in bcm2711
  dma-direct: Turn zone_dma_bits default value into a define
  arm64: Default to 32-bit ZONE_DMA
  mm: Update DMA zones description with arm64 newer behavior

 arch/arm64/mm/init.c   | 12 
 drivers/of/fdt.c   | 10 ++
 include/linux/dma-direct.h |  1 +
 include/linux/mmzone.h |  5 +++--
 kernel/dma/direct.c|  2 +-
 5 files changed, 23 insertions(+), 7 deletions(-)

-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 3/4] arm64: Default to 32-bit ZONE_DMA

2020-10-01 Thread Nicolas Saenz Julienne
The Raspberry Pi 4 needs two DMA zones as some of its devices can only
DMA into the 30-bit physical address space. We solved that by creating
an extra ZONE_DMA covering the 30-bit. It turns out that creating extra
zones unnecessarily broke Kdump on large systems. So default to a single
32-bit wide ZONE_DMA and only define both zones if running on RPi4.

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/mm/init.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index e1a69a618832..3c3f462466eb 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -43,8 +43,6 @@
 #include 
 #include 
 
-#define ARM64_ZONE_DMA_BITS30
-
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -388,8 +386,14 @@ void __init arm64_memblock_init(void)
early_init_fdt_scan_reserved_mem();
 
if (IS_ENABLED(CONFIG_ZONE_DMA)) {
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
-   arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
+   /*
+* early_init_dt_scan() might alter zone_dma_bits based on the
+* device's DT. Otherwise, have it cover the 32-bit address
+* space.
+*/
+   if (zone_dma_bits == ZONE_DMA_BITS_DEFAULT)
+   zone_dma_bits = 32;
+   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
}
 
if (IS_ENABLED(CONFIG_ZONE_DMA32))
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/4] dma-direct: Turn zone_dma_bits default value into a define

2020-10-01 Thread Nicolas Saenz Julienne
Other code might need to reference it.

Signed-off-by: Nicolas Saenz Julienne 
---
 include/linux/dma-direct.h | 1 +
 kernel/dma/direct.c| 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 38ed3b55034d..ef19ff505d09 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 
+#define ZONE_DMA_BITS_DEFAULT  24
 extern unsigned int zone_dma_bits;
 
 /*
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 121a9c1969dd..4b3cfa02532b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -20,7 +20,7 @@
  * it for entirely different regions. In that case the arch code needs to
  * override the variable below for dma-direct to work properly.
  */
-unsigned int zone_dma_bits __ro_after_init = 24;
+unsigned int zone_dma_bits __ro_after_init = ZONE_DMA_BITS_DEFAULT;
 
 static inline dma_addr_t phys_to_dma_direct(struct device *dev,
phys_addr_t phys)
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 4/4] mm: Update DMA zones description

2020-10-01 Thread Nicolas Saenz Julienne
The default behavior for arm64 changed, so reflect that.

Signed-off-by: Nicolas Saenz Julienne 
---
 include/linux/mmzone.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..d28ce77ccc2a 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -363,8 +363,9 @@ enum zone_type {
 *  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
 *the specific device.
 *
-*  - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
-*lower 4G.
+*  - arm64 uses a single 4GB ZONE_DMA, except on the Raspberry Pi 4,
+*in which ZONE_DMA covers the first GB and ZONE_DMA32 the rest of
+*the lower 4GB.
 *
 *  - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
 *depending on the specific device.
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-01 Thread Nicolas Saenz Julienne
arm64 wants to be able to set ZONE_DMA's size depending on the specific
platform its being run on. Ideally this could be achieved in a smart way
by parsing all dma-ranges and calculating the smaller DMA constraint in
the system. Easier said than done. We compromised on a simpler solution
as the only platform interested in using this is the Raspberry Pi 4.

So update zone_dma_bits if the machine's compatible string matches
Raspberry Pi 4's, otherwise let arm64's mm code deal with it.

Signed-off-by: Nicolas Saenz Julienne 
---
 drivers/of/fdt.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 4602e467ca8b..cd0d115ef329 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include   /* for zone_dma_bits */
 
 #include   /* for COMMAND_LINE_SIZE */
 #include 
@@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
of_scan_flat_dt(early_init_dt_scan_memory, NULL);
 }
 
+void __init early_init_dt_update_zone_dma_bits(void)
+{
+   unsigned long dt_root = of_get_flat_dt_root();
+
+   if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
+   zone_dma_bits = 30;
+}
+
 bool __init early_init_dt_scan(void *params)
 {
bool status;
@@ -1207,6 +1216,7 @@ bool __init early_init_dt_scan(void *params)
return false;
 
early_init_dt_scan_nodes();
+   early_init_dt_update_zone_dma_bits();
return true;
 }
 
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v11 07/11] device-mapping: Introduce DMA range map, supplanting dma_pfn_offset

2020-09-08 Thread Nicolas Saenz Julienne
On Tue, 2020-09-08 at 11:43 +0200, Christoph Hellwig wrote:
> And because I like replying to myself so much, here is a link to the
> version with the arm cleanup patch applied.  Unlike the previous two
> attempts this has at least survived very basic sanity testing:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-ranges.2
> 
> Note that we'll still need to sort out the arm/keystone warnings from
> the original patch.  Do we have anyone on the CC list who knows that
> platform a little better to figure out if the ifdef solution would work?

Had to do the following to boot without errors:

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index ef61a33c47bc..7dd88a0b6d0b 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -97,6 +97,9 @@ static inline bool dma_capable(struct device *dev, dma_addr_t 
addr, size_t size,
 {
dma_addr_t end = addr + size - 1;
 
+   if (addr == DMA_MAPPING_ERROR)
+   return false;
+
if (!dev->dma_mask)
return false;
 
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 90f1ecb6baaf..25809703a5bf 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -71,7 +71,12 @@ static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, 
u64 dma_mask,
 
 static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 {
-   return phys_to_dma_direct(dev, phys) + size - 1 <=
+   dma_addr_t dma_addr = phys_to_dma_direct(dev, phys);
+
+   if (dma_addr == DMA_MAPPING_ERROR)
+   return false;
+
+   return dma_addr + size - 1 <=
min_not_zero(dev->coherent_dma_mask, 
dev->bus_dma_limit);
 }


Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v11 07/11] device-mapping: Introduce DMA range map, supplanting dma_pfn_offset

2020-09-07 Thread Nicolas Saenz Julienne
On Mon, 2020-09-07 at 13:40 -0400, Jim Quinlan wrote:
> On Mon, Sep 7, 2020 at 11:01 AM Nicolas Saenz Julienne
>  wrote:
> > > 
> > > Hi Nicolas,
> > > 
> > > Can you please help us out here?  It appears that your commit
> > 
> > It's dma_offset_from_dma_addr() that's causing trouble. It goes over all the
> > dma regions and, if no region matches the phys address range, it returns 0.
> > This isn't acceptable as is. In the past, we used to pass the offset
> > nonetheless, which would make the phys address grow out of the dma memory 
> > area
> > and fail the dma_capable() test.
> > 
> > For example, RPi4 devices attached to the old interconnect see phys [0x0
> > 0x3fff] at [0xc000 0x]. So say you're trying to convert
> > physical address 0xfa00, you'll get 0 from dma_offset_from_phys_addr()
> > (since your dma range only covers the first GB) to then test if 0xfa00 
> > is
> > dma_capable(), which it is, but for the wrong reasons. Causing DMA issues
> > further down the line.
> > 
> > I don't have a proper suggestion on how to solve this but here's the 
> > solution I
> > used:
> > 
> > diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> > index 4c4646761afe..40fe3c97f2be 100644
> > --- a/include/linux/dma-mapping.h
> > +++ b/include/linux/dma-mapping.h
> > @@ -217,11 +217,19 @@ static inline u64 dma_offset_from_phys_addr(struct 
> > device *dev,
> >  {
> > const struct bus_dma_region *m = dev->dma_range_map;
> > 
> > -   if (m)
> > +   if (m) {
> > for (; m->size; m++)
> > if (paddr >= m->cpu_start &&
> > paddr - m->cpu_start < m->size)
> > return m->offset;
> > +
> > +   /*
> > +* The physical address doesn't fit any of the DMA regions,
> > +* return an impossible to fulfill offset.
> > +*/
> > +   return -(1ULL << 46);
> > +   }
> > +
> > return 0;
> >  }
> Hi Nicolas,
> 
> Thanks for looking into this.  The concern I have with your solution
> is that returning an arbitrarily large offset might overlap with an
> improbable but valid usage.  AFAIK there is nothing that disallows
> mapping a device to anywhere within the 64bit range of PCIe space,
> including up to say 0x.

Indeed, that's why I wasn't all that happy with my solution.

As an alternative, how about returning '-dev->bus_dma_limit' instead of 0? It's
always 0 for the devices without bus_dma_regions, and, I think, an always
unattainable offset for devices that do (I tested it on RPi4 with the 30bit
limitd mmc controller and it seems to work alright).

--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -222,7 +222,8 @@ static inline u64 dma_offset_from_phys_addr(struct device 
*dev,
if (paddr >= m->cpu_start &&
paddr - m->cpu_start < m->size)
return m->offset;
-   return 0;
+
+   return -dev->bus_dma_limit;
 }

> As an alternative perhaps changing dma_capable() so that if the
> dma_range_map is present then it validates that both ends of the
> prospective DMA region get "hits" on one of the offset regions in the
> map.  Christoph, if you are okay with this I can quickly post a patch.

IIUC, by the time you enter dma_capable() you already converted the physical
address to DMA address and the damage is done.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v11 07/11] device-mapping: Introduce DMA range map, supplanting dma_pfn_offset

2020-09-07 Thread Nicolas Saenz Julienne
Hi Christoph, a small fix to your fixes:

On Tue, 2020-09-01 at 10:24 +0200, Christoph Hellwig wrote:
> I've applied this to the dma-mapping tree.
> 
> I had to resolve a conflict in drivers/of/address.c with a recent
> mainline commit.  I also applied the minor tweaks Andy pointed out
> plus a few more style changes.  A real change is that I changed the
> prototype for dma_copy_dma_range_map to require less boilerplate code.

diff --git a/drivers/usb/core/usb.c b/drivers/usb/core/usb.c
index 0b5f8d62f251..4cd012817af6 100644
--- a/drivers/usb/core/usb.c
+++ b/drivers/usb/core/usb.c
@@ -610,7 +610,7 @@ struct usb_device *usb_alloc_dev(struct usb_device *parent,
 * mask for the entire HCD, so don't do that.
 */
dev->dev.dma_mask = bus->sysdev->dma_mask;
-   if (!dma_copy_dma_range_map(>dev, bus->sysdev))
+   if (dma_copy_dma_range_map(>dev, bus->sysdev))
dev_err(>dev, "failed to copy DMA map\n");
set_dev_node(>dev, dev_to_node(bus->sysdev));
dev->state = USB_STATE_ATTACHED;

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v11 07/11] device-mapping: Introduce DMA range map, supplanting dma_pfn_offset

2020-09-07 Thread Nicolas Saenz Julienne
Hi Jim, sorry I'm a little late to the party, but was on vacation.

On Thu, 2020-09-03 at 13:32 -0400, Jim Quinlan wrote:
> On Wed, Sep 2, 2020 at 8:52 PM Nathan Chancellor
>  wrote:
> > On Wed, Sep 02, 2020 at 05:36:29PM -0700, Florian Fainelli wrote:
> > > 
> > > On 9/2/2020 3:38 PM, Nathan Chancellor wrote:
> > > [snip]
> > > > > Hello Nathan,
> > > > > 
> > > > > Can you tell me how much memory your RPI has and if all of it is
> > > > 
> > > > This is the 4GB version.
> > > > 
> > > > > accessible by the PCIe device?  Could you also please include the DTS
> > > > > of the PCIe node?  IIRC, the RPI firmware does some mangling of the
> > > > > PCIe DT before Linux boots -- could you describe what is going on
> > > > > there?
> > > > 
> > > > Unfortunately, I am not familiar with how to get this information. If
> > > > you could provide some instructions for how to do so, I am more than
> > > > happy to. I am not very knowleagable about the inner working of the Pi,
> > > > I mainly use it as a test platform for making sure that LLVM does not
> > > > cause problems on real devices.
> > > 
> > > Can you bring the dtc application to your Pi root filesystem, and if so, 
> > > can
> > > you run the following:
> > > 
> > > dtc -I fs -O dtb /proc/device-tree -f > /tmp/device.dtb
> > 
> > Sure, the result is attached.
> > 
> > > or cat /sys/firmware/fdt > device.dtb
> > > 
> > > and attach the resulting file?
> > > 
> > > > > Finally, can you attach the text of the full boot log?
> > > > 
> > > > I have attached a working and broken boot log. Thank you for the quick
> > > > response!
> > > 
> > > Is it possible for you to rebuild your kernel with CONFIG_MMC_DEBUG by any
> > > chance?
> > 
> > Of course. A new log is attached with the debug output from that config.
> 
> Hi Nicolas,
> 
> Can you please help us out here?  It appears that your commit

It's dma_offset_from_dma_addr() that's causing trouble. It goes over all the
dma regions and, if no region matches the phys address range, it returns 0.
This isn't acceptable as is. In the past, we used to pass the offset
nonetheless, which would make the phys address grow out of the dma memory area
and fail the dma_capable() test.

For example, RPi4 devices attached to the old interconnect see phys [0x0
0x3fff] at [0xc000 0x]. So say you're trying to convert
physical address 0xfa00, you'll get 0 from dma_offset_from_phys_addr()
(since your dma range only covers the first GB) to then test if 0xfa00 is
dma_capable(), which it is, but for the wrong reasons. Causing DMA issues
further down the line.

I don't have a proper suggestion on how to solve this but here's the solution I
used:

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4c4646761afe..40fe3c97f2be 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -217,11 +217,19 @@ static inline u64 dma_offset_from_phys_addr(struct device 
*dev,
 {
const struct bus_dma_region *m = dev->dma_range_map;
 
-   if (m)
+   if (m) {
for (; m->size; m++)
if (paddr >= m->cpu_start &&
paddr - m->cpu_start < m->size)
return m->offset;
+
+   /*
+* The physical address doesn't fit any of the DMA regions,
+* return an impossible to fulfill offset.
+*/
+   return -(1ULL << 46);
+   }
+
return 0;
 }

I didn't catch this on my previous tests as I was using a newer bcm2711 SoC
revision which expanded emmc2's DMA capabilities (can do 32 bit DMA as opposed
to 30 bit).

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v4 1/2] dma-pool: Only allocate from CMA when in same memory zone

2020-08-14 Thread Nicolas Saenz Julienne
There is no guarantee to CMA's placement, so allocating a zone specific
atomic pool from CMA might return memory from a completely different
memory zone. To get around this double check CMA's placement before
allocating from it.

Signed-off-by: Nicolas Saenz Julienne 
[hch: rebased, added a fallback to the page allocator, allow dipping into
  lower CMA pools]
Signed-off-by: Christoph Hellwig 
---

Changes since v3:
 - Do not use memblock_start_of_DRAM()

Changes since v2:
 - Go back to v1 behavior

 kernel/dma/pool.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 6bc74a2d5127..57f4a0f32a92 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -3,7 +3,9 @@
  * Copyright (C) 2012 ARM Ltd.
  * Copyright (C) 2020 Google LLC
  */
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -55,6 +57,29 @@ static void dma_atomic_pool_size_add(gfp_t gfp, size_t size)
pool_size_kernel += size;
 }
 
+static bool cma_in_zone(gfp_t gfp)
+{
+   unsigned long size;
+   phys_addr_t end;
+   struct cma *cma;
+
+   cma = dev_get_cma_area(NULL);
+   if (!cma)
+   return false;
+
+   size = cma_get_size(cma);
+   if (!size)
+   return false;
+
+   /* CMA can't cross zone boundaries, see cma_activate_area() */
+   end = cma_get_base(cma) + size - 1;
+   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+   return end <= DMA_BIT_MASK(zone_dma_bits);
+   if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
+   return end <= DMA_BIT_MASK(32);
+   return true;
+}
+
 static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
  gfp_t gfp)
 {
@@ -68,7 +93,11 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
 
do {
pool_size = 1 << (PAGE_SHIFT + order);
-   page = alloc_pages(gfp, order);
+   if (cma_in_zone(gfp))
+   page = dma_alloc_from_contiguous(NULL, 1 << order,
+order, false);
+   if (!page)
+   page = alloc_pages(gfp, order);
} while (!page && order-- > 0);
if (!page)
goto out;
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 0/2] dma-pool fixes

2020-08-14 Thread Nicolas Saenz Julienne
Now that we have an explanation to Amir's issue, we can re-spin this
series.

---
Changes since v3:
 - Do not use memblock_start_of_DRAM()

Changes since v2:
 - Go back to v1's behavior for patch #2

Changes since v1:
 - Make cma_in_zone() more strict, GFP_KERNEL doesn't default to true
   now

 - Check if phys_addr_ok() exists prior calling it

Christoph Hellwig (1):
  dma-pool: fix coherent pool allocations for IOMMU mappings

Nicolas Saenz Julienne (1):
  dma-pool: Only allocate from CMA when in same memory zone

 drivers/iommu/dma-iommu.c   |   4 +-
 include/linux/dma-direct.h  |   3 -
 include/linux/dma-mapping.h |   5 +-
 kernel/dma/direct.c |  13 +++-
 kernel/dma/pool.c   | 145 +++-
 5 files changed, 92 insertions(+), 78 deletions(-)

-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 2/2] dma-pool: fix coherent pool allocations for IOMMU mappings

2020-08-14 Thread Nicolas Saenz Julienne
From: Christoph Hellwig 

When allocating coherent pool memory for an IOMMU mapping we don't care
about the DMA mask.  Move the guess for the initial GFP mask into the
dma_direct_alloc_pages and pass dma_coherent_ok as a function pointer
argument so that it doesn't get applied to the IOMMU case.

Signed-off-by: Christoph Hellwig 
---

Changes since v1:
  - Check if phys_addr_ok() exists prior calling it

 drivers/iommu/dma-iommu.c   |   4 +-
 include/linux/dma-direct.h  |   3 -
 include/linux/dma-mapping.h |   5 +-
 kernel/dma/direct.c |  13 ++--
 kernel/dma/pool.c   | 114 +++-
 5 files changed, 62 insertions(+), 77 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..5141d49a046b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1035,8 +1035,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
!gfpflags_allow_blocking(gfp) && !coherent)
-   cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), ,
-  gfp);
+   page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), _addr,
+  gfp, NULL);
else
cpu_addr = iommu_dma_alloc_pages(dev, size, , gfp, attrs);
if (!cpu_addr)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 5a3ce2a24794..6e87225600ae 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -73,9 +73,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t 
addr, size_t size,
 }
 
 u64 dma_direct_get_required_mask(struct device *dev);
-gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
- u64 *phys_mask);
-bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 016b96b384bd..52635e91143b 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -522,8 +522,9 @@ void *dma_common_pages_remap(struct page **pages, size_t 
size,
pgprot_t prot, const void *caller);
 void dma_common_free_remap(void *cpu_addr, size_t size);
 
-void *dma_alloc_from_pool(struct device *dev, size_t size,
- struct page **ret_page, gfp_t flags);
+struct page *dma_alloc_from_pool(struct device *dev, size_t size,
+   void **cpu_addr, gfp_t flags,
+   bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t));
 bool dma_free_from_pool(struct device *dev, void *start, size_t size);
 
 int
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index bb0041e99659..db6ef07aec3b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -43,7 +43,7 @@ u64 dma_direct_get_required_mask(struct device *dev)
return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
 }
 
-gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
+static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
  u64 *phys_limit)
 {
u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit);
@@ -68,7 +68,7 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 
dma_mask,
return 0;
 }
 
-bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
+static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 {
return phys_to_dma_direct(dev, phys) + size - 1 <=
min_not_zero(dev->coherent_dma_mask, 
dev->bus_dma_limit);
@@ -161,8 +161,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
size = PAGE_ALIGN(size);
 
if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
-   ret = dma_alloc_from_pool(dev, size, , gfp);
-   if (!ret)
+   u64 phys_mask;
+
+   gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+   _mask);
+   page = dma_alloc_from_pool(dev, size, , gfp,
+   dma_coherent_ok);
+   if (!page)
return NULL;
goto done;
}
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 57f4a0f32a92..b0aaba4197ae 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -225,93 +225,75 @@ static int __init dma_atomic_pool_init(void)
 }
 postcore_initcall(dma_atomic_pool_init);
 
-static inline struct gen_pool *dma_guess_pool_from_device(struct device *dev)
+static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
 {
-   u64 phys_mask;
-   gfp_t gfp;
-
-   gfp = 

Re: [PATCH v3 2/2] dma-pool: Only allocate from CMA when in same memory zone

2020-08-14 Thread Nicolas Saenz Julienne
On Fri, 2020-08-14 at 08:06 +0200, Christoph Hellwig wrote:
> On Sat, Aug 08, 2020 at 08:33:54AM +0200, Christoph Hellwig wrote:
> > On Fri, Aug 07, 2020 at 10:50:19AM +0200, Nicolas Saenz Julienne wrote:
> > > On Fri, 2020-08-07 at 07:21 +0200, Christoph Hellwig wrote:
> > > > On Thu, Aug 06, 2020 at 08:47:55PM +0200, Nicolas Saenz Julienne wrote:
> > > > > There is no guarantee to CMA's placement, so allocating a zone 
> > > > > specific
> > > > > atomic pool from CMA might return memory from a completely different
> > > > > memory zone. To get around this double check CMA's placement before
> > > > > allocating from it.
> > > > 
> > > > As the builtbot pointed out, memblock_start_of_DRAM can't be used from
> > > > non-__init code.  But lookig at it I think throwing that in
> > > > is bogus anyway, as cma_get_base returns a proper physical address
> > > > already.
> > > 
> > > It does indeed, but I'm comparing CMA's base with bitmasks that don't 
> > > take into
> > > account where the memory starts. Say memory starts at 0x8000, and CMA 
> > > falls
> > > into ZONE_DMA [0x8000 0xC000], if you want to compare it with
> > > DMA_BIT_MASK(zone_dma_bits) you're forced to unify the memory bases.
> > > 
> > > That said, I now realize that this doesn't work for ZONE_DMA32 which has 
> > > a hard
> > > limit on 32bit addresses reglardless of the memory base.
> > > 
> > > That said I still need to call memblock_start_of_DRAM() any suggestions 
> > > WRT
> > > that? I could save the value in dma_atomic_pool_init(), which is __init 
> > > code.
> > 
> > The pool must be about a 32-bit physical address.  The offsets can be
> > different for every device..

I now see what you mean.

I was trying to blindly fit CMA with arm64's DMA zone setup, which, as it turns
out, doesn't really honor its purpose. arm64 introduced ZONE_DMA to provide a
30-bit address space, but we're creating it regardless of whether it exists or
not. This creates a mismatch between zone_dma_bits and ZONE_DMA's real
placement. I'll try to look at fixing that in arm64.

> Do you plan to resend this one without the memblock_start_of_DRAM
> thingy?

Yes, sorry for the wait, I've been on vacation and short on time, I'll send it
during the day.

Regards,
Nicolas

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/2] dma-pool: Only allocate from CMA when in same memory zone

2020-08-07 Thread Nicolas Saenz Julienne
On Fri, 2020-08-07 at 07:21 +0200, Christoph Hellwig wrote:
> On Thu, Aug 06, 2020 at 08:47:55PM +0200, Nicolas Saenz Julienne wrote:
> > There is no guarantee to CMA's placement, so allocating a zone specific
> > atomic pool from CMA might return memory from a completely different
> > memory zone. To get around this double check CMA's placement before
> > allocating from it.
> 
> As the builtbot pointed out, memblock_start_of_DRAM can't be used from
> non-__init code.  But lookig at it I think throwing that in
> is bogus anyway, as cma_get_base returns a proper physical address
> already.

It does indeed, but I'm comparing CMA's base with bitmasks that don't take into
account where the memory starts. Say memory starts at 0x8000, and CMA falls
into ZONE_DMA [0x8000 0xC000], if you want to compare it with
DMA_BIT_MASK(zone_dma_bits) you're forced to unify the memory bases.

That said, I now realize that this doesn't work for ZONE_DMA32 which has a hard
limit on 32bit addresses reglardless of the memory base.

That said I still need to call memblock_start_of_DRAM() any suggestions WRT
that? I could save the value in dma_atomic_pool_init(), which is __init code.



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v3 1/2] dma-pool: fix coherent pool allocations for IOMMU mappings

2020-08-06 Thread Nicolas Saenz Julienne
From: Christoph Hellwig 

When allocating coherent pool memory for an IOMMU mapping we don't care
about the DMA mask.  Move the guess for the initial GFP mask into the
dma_direct_alloc_pages and pass dma_coherent_ok as a function pointer
argument so that it doesn't get applied to the IOMMU case.

Signed-off-by: Christoph Hellwig 
---

Changes since v1:
  - Check if phys_addr_ok() exists prior calling it

 drivers/iommu/dma-iommu.c   |   4 +-
 include/linux/dma-direct.h  |   3 -
 include/linux/dma-mapping.h |   5 +-
 kernel/dma/direct.c |  13 ++--
 kernel/dma/pool.c   | 114 +++-
 5 files changed, 62 insertions(+), 77 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..5141d49a046b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1035,8 +1035,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
!gfpflags_allow_blocking(gfp) && !coherent)
-   cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), ,
-  gfp);
+   page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), _addr,
+  gfp, NULL);
else
cpu_addr = iommu_dma_alloc_pages(dev, size, , gfp, attrs);
if (!cpu_addr)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 5a3ce2a24794..6e87225600ae 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -73,9 +73,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t 
addr, size_t size,
 }
 
 u64 dma_direct_get_required_mask(struct device *dev);
-gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
- u64 *phys_mask);
-bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 016b96b384bd..52635e91143b 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -522,8 +522,9 @@ void *dma_common_pages_remap(struct page **pages, size_t 
size,
pgprot_t prot, const void *caller);
 void dma_common_free_remap(void *cpu_addr, size_t size);
 
-void *dma_alloc_from_pool(struct device *dev, size_t size,
- struct page **ret_page, gfp_t flags);
+struct page *dma_alloc_from_pool(struct device *dev, size_t size,
+   void **cpu_addr, gfp_t flags,
+   bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t));
 bool dma_free_from_pool(struct device *dev, void *start, size_t size);
 
 int
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index bb0041e99659..db6ef07aec3b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -43,7 +43,7 @@ u64 dma_direct_get_required_mask(struct device *dev)
return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
 }
 
-gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
+static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
  u64 *phys_limit)
 {
u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit);
@@ -68,7 +68,7 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 
dma_mask,
return 0;
 }
 
-bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
+static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 {
return phys_to_dma_direct(dev, phys) + size - 1 <=
min_not_zero(dev->coherent_dma_mask, 
dev->bus_dma_limit);
@@ -161,8 +161,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
size = PAGE_ALIGN(size);
 
if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
-   ret = dma_alloc_from_pool(dev, size, , gfp);
-   if (!ret)
+   u64 phys_mask;
+
+   gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+   _mask);
+   page = dma_alloc_from_pool(dev, size, , gfp,
+   dma_coherent_ok);
+   if (!page)
return NULL;
goto done;
}
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 6bc74a2d5127..5d071d4a3cba 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -196,93 +196,75 @@ static int __init dma_atomic_pool_init(void)
 }
 postcore_initcall(dma_atomic_pool_init);
 
-static inline struct gen_pool *dma_guess_pool_from_device(struct device *dev)
+static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
 {
-   u64 phys_mask;
-   gfp_t gfp;
-
-   gfp = 

[PATCH v3 2/2] dma-pool: Only allocate from CMA when in same memory zone

2020-08-06 Thread Nicolas Saenz Julienne
There is no guarantee to CMA's placement, so allocating a zone specific
atomic pool from CMA might return memory from a completely different
memory zone. To get around this double check CMA's placement before
allocating from it.

Signed-off-by: Nicolas Saenz Julienne 
[hch: rebased, added a fallback to the page allocator, allow dipping into
  lower CMA pools]
Signed-off-by: Christoph Hellwig 
---

Changes since v2:
 - Go back to v1 behavior

 kernel/dma/pool.c | 31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5d071d4a3cba..30d28d761490 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -3,7 +3,9 @@
  * Copyright (C) 2012 ARM Ltd.
  * Copyright (C) 2020 Google LLC
  */
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -55,6 +57,29 @@ static void dma_atomic_pool_size_add(gfp_t gfp, size_t size)
pool_size_kernel += size;
 }
 
+static bool cma_in_zone(gfp_t gfp)
+{
+   unsigned long size;
+   phys_addr_t end;
+   struct cma *cma;
+
+   cma = dev_get_cma_area(NULL);
+   if (!cma)
+   return false;
+
+   size = cma_get_size(cma);
+   if (!size)
+   return false;
+
+   /* CMA can't cross zone boundaries, see cma_activate_area() */
+   end = cma_get_base(cma) - memblock_start_of_DRAM() + size - 1;
+   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+   return end <= DMA_BIT_MASK(zone_dma_bits);
+   if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
+   return end <= DMA_BIT_MASK(32);
+   return true;
+}
+
 static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
  gfp_t gfp)
 {
@@ -68,7 +93,11 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
 
do {
pool_size = 1 << (PAGE_SHIFT + order);
-   page = alloc_pages(gfp, order);
+   if (cma_in_zone(gfp))
+   page = dma_alloc_from_contiguous(NULL, 1 << order,
+order, false);
+   if (!page)
+   page = alloc_pages(gfp, order);
} while (!page && order-- > 0);
if (!page)
goto out;
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/2] dma-pool fixes

2020-08-06 Thread Nicolas Saenz Julienne
Now that we have an explanation to Amir's issue, we can re-spin this
series.

---
Changes since v2:
 - Go back to v1's behavior for patch #2

Changes since v1:
 - Make cma_in_zone() more strict, GFP_KERNEL doesn't default to true
   now

 - Check if phys_addr_ok() exists prior calling it

Christoph Hellwig (1):
  dma-pool: fix coherent pool allocations for IOMMU mappings

Nicolas Saenz Julienne (1):
  dma-pool: Only allocate from CMA when in same memory zone

 drivers/iommu/dma-iommu.c   |   4 +-
 include/linux/dma-direct.h  |   3 -
 include/linux/dma-mapping.h |   5 +-
 kernel/dma/direct.c |  13 +++-
 kernel/dma/pool.c   | 145 +++-
 5 files changed, 92 insertions(+), 78 deletions(-)

-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] dma-pool: Only allocate from CMA when in same memory zone

2020-08-06 Thread Nicolas Saenz Julienne
On Thu, 2020-08-06 at 07:18 +0200, Christoph Hellwig wrote:
> On Tue, Aug 04, 2020 at 11:43:15AM +0200, Nicolas Saenz Julienne wrote:
> > > Second I don't see the need (and actually some harm) in preventing 
> > > GFP_KERNEL
> > > allocations from dipping into lower CMA areas - something that we did 
> > > support
> > > before 5.8 with the single pool.
> > 
> > My thinking is the least we pressure CMA the better, it's generally scarse, 
> > and
> > it'll not grow as the atomic pools grow. As far as harm is concerned, we now
> > check addresses for correctness, so we shouldn't run into problems.
> > 
> > There is a potential case for architectures defining a default CMA but not
> > defining DMA zones where this could be problematic. But isn't that just 
> > plain
> > abusing CMA? If you need low memory allocations, you should be defining DMA
> > zones.
> 
> The latter is pretty much what I expect, as we only support the default and
> per-device DMA CMAs.

Fair enough, should I send a v3 with everything cleaned-up/rebased, or you'd
rather pick it up from your version?



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 2/2] dma-pool: Only allocate from CMA when in same memory zone

2020-08-04 Thread Nicolas Saenz Julienne
On Tue, 2020-08-04 at 08:06 +0200, Christoph Hellwig wrote:
> On Mon, Aug 03, 2020 at 06:09:56PM +0200, Nicolas Saenz Julienne wrote:
> > +   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> > +   return end <= DMA_BIT_MASK(zone_dma_bits);
> > +   if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
> > +   return end <= DMA_BIT_MASK(32);
> > +   if (gfp & GFP_KERNEL)
> > +   return end > DMA_BIT_MASK(32);
> 
> So the GFP_KERNEL one here looks weird.  For one I don't think the if
> line is needed at all, and it just confuses things.

Yes, sorry, shoud've seen that.

> Second I don't see the need (and actually some harm) in preventing GFP_KERNEL
> allocations from dipping into lower CMA areas - something that we did support
> before 5.8 with the single pool.

My thinking is the least we pressure CMA the better, it's generally scarse, and
it'll not grow as the atomic pools grow. As far as harm is concerned, we now
check addresses for correctness, so we shouldn't run into problems.

There is a potential case for architectures defining a default CMA but not
defining DMA zones where this could be problematic. But isn't that just plain
abusing CMA? If you need low memory allocations, you should be defining DMA
zones.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH v2 2/2] dma-pool: Only allocate from CMA when in same memory zone

2020-08-03 Thread Nicolas Saenz Julienne
There is no guarantee to CMA's placement, so allocating a zone specific
atomic pool from CMA might return memory from a completely different
memory zone. To get around this double check CMA's placement before
allocating from it.

Signed-off-by: Nicolas Saenz Julienne 
---

Changes since v1:
 - Make cma_in_zone() more strict, GFP_KERNEL doesn't default to true
   now

 kernel/dma/pool.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5d071d4a3cba..582523ccf4fe 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -3,7 +3,9 @@
  * Copyright (C) 2012 ARM Ltd.
  * Copyright (C) 2020 Google LLC
  */
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -55,6 +57,32 @@ static void dma_atomic_pool_size_add(gfp_t gfp, size_t size)
pool_size_kernel += size;
 }
 
+static bool cma_in_zone(gfp_t gfp)
+{
+   unsigned long size;
+   phys_addr_t end;
+   struct cma *cma;
+
+   cma = dev_get_cma_area(NULL);
+   if (!cma)
+   return false;
+
+   size = cma_get_size(cma);
+   if (!size)
+   return false;
+
+   /* CMA can't cross zone boundaries, see cma_activate_area() */
+   end = cma_get_base(cma) - memblock_start_of_DRAM() + size - 1;
+   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
+   return end <= DMA_BIT_MASK(zone_dma_bits);
+   if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
+   return end <= DMA_BIT_MASK(32);
+   if (gfp & GFP_KERNEL)
+   return end > DMA_BIT_MASK(32);
+
+   return false;
+}
+
 static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
  gfp_t gfp)
 {
@@ -68,7 +96,11 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
 
do {
pool_size = 1 << (PAGE_SHIFT + order);
-   page = alloc_pages(gfp, order);
+   if (cma_in_zone(gfp))
+   page = dma_alloc_from_contiguous(NULL, 1 << order,
+order, false);
+   if (!page)
+   page = alloc_pages(gfp, order);
} while (!page && order-- > 0);
if (!page)
goto out;
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 0/2] dma-pool fixes

2020-08-03 Thread Nicolas Saenz Julienne
Now that we have an explanation to Amir's issue, I took the liberty to
respin the previous dma-pool fixes series with some changes/fixes of my
own.

---

Changes since v1:
 - Make cma_in_zone() more strict, GFP_KERNEL doesn't default to true
   now

 - Check if phys_addr_ok() exists prior calling it

Christoph Hellwig (1):
  dma-pool: fix coherent pool allocations for IOMMU mappings

Nicolas Saenz Julienne (1):
  dma-pool: Only allocate from CMA when in same memory zone

 drivers/iommu/dma-iommu.c   |   4 +-
 include/linux/dma-direct.h  |   3 -
 include/linux/dma-mapping.h |   5 +-
 kernel/dma/direct.c |  13 +++-
 kernel/dma/pool.c   | 148 
 5 files changed, 95 insertions(+), 78 deletions(-)

-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 1/2] dma-pool: fix coherent pool allocations for IOMMU mappings

2020-08-03 Thread Nicolas Saenz Julienne
From: Christoph Hellwig 

When allocating coherent pool memory for an IOMMU mapping we don't care
about the DMA mask.  Move the guess for the initial GFP mask into the
dma_direct_alloc_pages and pass dma_coherent_ok as a function pointer
argument so that it doesn't get applied to the IOMMU case.

Signed-off-by: Christoph Hellwig 
---

Changes since v1:
 - Check if phys_addr_ok() exists prior calling it

 drivers/iommu/dma-iommu.c   |   4 +-
 include/linux/dma-direct.h  |   3 -
 include/linux/dma-mapping.h |   5 +-
 kernel/dma/direct.c |  13 ++--
 kernel/dma/pool.c   | 114 +++-
 5 files changed, 62 insertions(+), 77 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..5141d49a046b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1035,8 +1035,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
!gfpflags_allow_blocking(gfp) && !coherent)
-   cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), ,
-  gfp);
+   page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), _addr,
+  gfp, NULL);
else
cpu_addr = iommu_dma_alloc_pages(dev, size, , gfp, attrs);
if (!cpu_addr)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index ab2e20cba951..ba22952c24e2 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -67,9 +67,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t 
addr, size_t size,
 }
 
 u64 dma_direct_get_required_mask(struct device *dev);
-gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
- u64 *phys_mask);
-bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index a33ed3954ed4..0dc08701d7b7 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -715,8 +715,9 @@ void *dma_common_pages_remap(struct page **pages, size_t 
size,
pgprot_t prot, const void *caller);
 void dma_common_free_remap(void *cpu_addr, size_t size);
 
-void *dma_alloc_from_pool(struct device *dev, size_t size,
- struct page **ret_page, gfp_t flags);
+struct page *dma_alloc_from_pool(struct device *dev, size_t size,
+   void **cpu_addr, gfp_t flags,
+   bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t));
 bool dma_free_from_pool(struct device *dev, void *start, size_t size);
 
 int
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 67f060b86a73..f17aec9d01f0 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -45,7 +45,7 @@ u64 dma_direct_get_required_mask(struct device *dev)
return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
 }
 
-gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
+static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
  u64 *phys_limit)
 {
u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit);
@@ -70,7 +70,7 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 
dma_mask,
return 0;
 }
 
-bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
+static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 {
return phys_to_dma_direct(dev, phys) + size - 1 <=
min_not_zero(dev->coherent_dma_mask, 
dev->bus_dma_limit);
@@ -163,8 +163,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
size = PAGE_ALIGN(size);
 
if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
-   ret = dma_alloc_from_pool(dev, size, , gfp);
-   if (!ret)
+   u64 phys_mask;
+
+   gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+   _mask);
+   page = dma_alloc_from_pool(dev, size, , gfp,
+   dma_coherent_ok);
+   if (!page)
return NULL;
goto done;
}
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 6bc74a2d5127..5d071d4a3cba 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -196,93 +196,75 @@ static int __init dma_atomic_pool_init(void)
 }
 postcore_initcall(dma_atomic_pool_init);
 
-static inline struct gen_pool *dma_guess_pool_from_device(struct device *dev)
+static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp)
 {
-   u64 phys_mask;
-   gfp_t gfp;
-
-   gfp = 

Re: [PATCH v9 08/12] device core: Introduce DMA range map, supplanting dma_pfn_offset

2020-08-01 Thread Nicolas Saenz Julienne
Hi Jim, here's some comments after testing your series against RPi4.

On Fri, 2020-07-24 at 16:33 -0400, Jim Quinlan wrote:
> The new field 'dma_range_map' in struct device is used to facilitate the
> use of single or multiple offsets between mapping regions of cpu addrs and
> dma addrs.  It subsumes the role of "dev->dma_pfn_offset" which was only
> capable of holding a single uniform offset and had no region bounds
> checking.
> 
> The function of_dma_get_range() has been modified so that it takes a single
> argument -- the device node -- and returns a map, NULL, or an error code.
> The map is an array that holds the information regarding the DMA regions.
> Each range entry contains the address offset, the cpu_start address, the
> dma_start address, and the size of the region.
> 
> of_dma_configure() is the typical manner to set range offsets but there are
> a number of ad hoc assignments to "dev->dma_pfn_offset" in the kernel
> driver code.  These cases now invoke the function
> dma_attach_offset_range(dev, cpu_addr, dma_addr, size).
> 
> Signed-off-by: Jim Quinlan 
> ---

[...]

> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index 8eea3f6e29a4..4b718d199efe 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -918,33 +918,33 @@ void __iomem *of_io_request_and_map(struct device_node 
> *np, int index,
>  }
>  EXPORT_SYMBOL(of_io_request_and_map);
>  
> +#ifdef CONFIG_HAS_DMA
>  /**
> - * of_dma_get_range - Get DMA range info
> + * of_dma_get_range - Get DMA range info and put it into a map array
>   * @np:  device node to get DMA range info
> - * @dma_addr:pointer to store initial DMA address of DMA range
> - * @paddr:   pointer to store initial CPU address of DMA range
> - * @size:pointer to store size of DMA range
> + * @map: dma range structure to return
>   *
>   * Look in bottom up direction for the first "dma-ranges" property
> - * and parse it.
> - *  dma-ranges format:
> + * and parse it.  Put the information into a DMA offset map array.
> + *
> + * dma-ranges format:
>   *   DMA addr (dma_addr) : naddr cells
>   *   CPU addr (phys_addr_t)  : pna cells
>   *   size: nsize cells
>   *
> - * It returns -ENODEV if "dma-ranges" property was not found
> - * for this device in DT.
> + * It returns -ENODEV if "dma-ranges" property was not found for this
> + * device in the DT.
>   */
> -int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 
> *size)
> +int of_dma_get_range(struct device_node *np, const struct bus_dma_region 
> **map)
>  {
>   struct device_node *node = of_node_get(np);
>   const __be32 *ranges = NULL;
> - int len;
> - int ret = 0;
>   bool found_dma_ranges = false;
>   struct of_range_parser parser;
>   struct of_range range;
> - u64 dma_start = U64_MAX, dma_end = 0, dma_offset = 0;
> + struct bus_dma_region *r;
> + int len, num_ranges = 0;
> + int ret;
>  
>   while (node) {
>   ranges = of_get_property(node, "dma-ranges", );
> @@ -970,44 +970,35 @@ int of_dma_get_range(struct device_node *np, u64 
> *dma_addr, u64 *paddr, u64 *siz
>   }
>  
>   of_dma_range_parser_init(, node);
> + for_each_of_range(, )
> + num_ranges++;
> +
> + of_dma_range_parser_init(, node);
> +
> + ret = -ENOMEM;
> + r = kcalloc(num_ranges + 1, sizeof(*r), GFP_KERNEL);
> + if (!r)
> + goto out;
>  
> + /*
> +  * Record all info in the generic DMA ranges array for struct device.
> +  */
> + *map = r;
>   for_each_of_range(, ) {
>   pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n",
>range.bus_addr, range.cpu_addr, range.size);
> -
> - if (dma_offset && range.cpu_addr - range.bus_addr != 
> dma_offset) {
> - pr_warn("Can't handle multiple dma-ranges with 
> different offsets on node(%pOF)\n", node);
> - /* Don't error out as we'd break some existing DTs */
> - continue;
> - }
> - dma_offset = range.cpu_addr - range.bus_addr;
> -
> - /* Take lower and upper limits */
> - if (range.bus_addr < dma_start)
> - dma_start = range.bus_addr;
> - if (range.bus_addr + range.size > dma_end)
> - dma_end = range.bus_addr + range.size;
> - }
> -
> - if (dma_start >= dma_end) {
> - ret = -EINVAL;
> - pr_debug("Invalid DMA ranges configuration on node(%pOF)\n",
> -  node);
> - goto out;
> + r->cpu_start = range.cpu_addr;
> + r->dma_start = range.bus_addr;
> + r->size = range.size;
> + r->offset = (u64)range.cpu_addr - (u64)range.bus_addr;
> + r++;
>   }
>  
> - *dma_addr = dma_start;
> - *size = dma_end - dma_start;
> - *paddr = dma_start + dma_offset;

Re: revert scope for 5.8, was Re: dma-pool fixes

2020-08-01 Thread Nicolas Saenz Julienne
On Sat, 2020-08-01 at 17:27 +0530, Amit Pundir wrote:

[...]

> > I'm between a rock and a hard place here.  If we simply want to revert
> > commits as-is to make sure both the Raspberry Pi 4 and thone phone do
> > not regress we'll have to go all the way back and revert the whole SEV
> > pool support.  I could try to manual revert of the multiple pool
> > support, but it is very late for that.
> 
> Hi, I found the problematic memory region. It was a memory
> chunk reserved/removed in the downstream tree but was
> seemingly reserved upstream for different drivers. I failed to
> calculate the length of the total region reserved downstream
> correctly. And there was still a portion of memory left unmarked,
> which I should have marked as reserved in my testing earlier
> today.
> 
> Sorry for all the noise and thanks Nicolas, Christoph and David
> for your patience.

That's great news, thanks for persevering!

Regards,
Nicolas

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: dma-pool fixes

2020-07-31 Thread Nicolas Saenz Julienne
On Fri, 2020-07-31 at 16:47 +0530, Amit Pundir wrote:
> On Fri, 31 Jul 2020 at 16:17, Nicolas Saenz Julienne

[...]

> > Ok, so lets see who's doing what and with what constraints:
> 
> Here is the relevant dmesg log: https://pastebin.ubuntu.com/p/dh3pPnxS2v/

Sadly nothing out of the ordinary, looks reasonable.

I have an idea, I've been going over the downstream device tree and it seems
the reserved-memory entries, specially the ones marked with 'no-map' don't
fully match what we have upstream. On top of that all these reserved areas seem
to fall into ZONE_DMA.

So, what could be happening is that, while allocating pages for the ZONE_DMA
atomic pool, something in the page allocator is either writing/mapping into a
reserved area triggering some kind of fault.

Amir, could you go over the no-map reserved-memory entries in the downstream
device-tree, both in 'beryllium-*.dtsi' (I think those are the relevant ones)
and 'sdm845.dtsi'[1], and make sure they match what you are using. If not just
edit them in and see if it helps. If you need any help with that I'll be happy
to give you a hand.

Regards,
Nicolas

[1] You could also extract the device tree from a device running with the
downstream kernel, whatever is easier for you.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: dma-pool fixes

2020-07-31 Thread Nicolas Saenz Julienne
Hi Amit,

On Wed, 2020-07-29 at 17:52 +0530, Amit Pundir wrote:
> On Wed, 29 Jul 2020 at 16:15, Nicolas Saenz Julienne
>  wrote:
> > On Tue, 2020-07-28 at 17:30 +0200, Christoph Hellwig wrote:
> > > On Tue, Jul 28, 2020 at 06:18:41PM +0530, Amit Pundir wrote:
> > > > > Oh well, this leaves me confused again.  It looks like your setup
> > > > > really needs a CMA in zone normal for the dma or dma32 pool.
> > > > 
> > > > Anything I should look up in the downstream kernel/dts?
> > > 
> > > I don't have a good idea right now.  Nicolas, can you think of something
> > > else?
> > 
> > To summarise, the device is:
> >  - Using the dma-direct code path.
> >  - Requesting ZONE_DMA memory to then fail when provided memory falls in
> >ZONE_DMA. Actually, the only acceptable memory comes from CMA, which is
> >located topmost of the 4GB boundary.
> > 
> > My wild guess is that we may be abusing an iommu identity mapping setup by
> > firmware.
> > 
> > That said, what would be helpful to me is to find out the troublesome 
> > device.
> > Amit, could you try adding this patch along with Christoph's modified series
> > (so the board boots). Ultimately DMA atomic allocations are not that 
> > common, so
> > we should get only a few hits:
> 
> Hi, still not hitting dma_alloc_from_pool().

Sorry I insisted, but not hitting the atomic path makes the issue even harder
to understand.

> I hit the following direct alloc path only once, at starting:
> 
> dma_alloc_coherent ()
> -> dma_alloc_attrs()
>-> dma_is_direct() -> dma_direct_alloc()
>   -> dma_direct_alloc_pages()
>  -> dma_should_alloc_from_pool() #returns FALSE from here
> 
> After that I'm hitting following iommu dma alloc path all the time:
> 
> dma_alloc_coherent()
> -> dma_alloc_attrs()
>-> (ops->alloc) -> iommu_dma_alloc()
>   -> iommu_dma_alloc_remap() #always returns from here
> 
> So dma_alloc_from_pool() is not getting called at all in either of the
> above cases.

Ok, so lets see who's doing what and with what constraints:

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..d28b3e4b91d3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -594,6 +594,9 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
dma_addr_t iova;
void *vaddr;
 
+   dev_info(dev, "%s, bus_dma_limit %llx, dma_mask %llx, coherent_dma_mask 
%llx, in irq %lu, size %lu, gfp %x, attrs %lx\n",
+__func__, dev->bus_dma_limit, *dev->dma_mask, 
dev->coherent_dma_mask, in_interrupt(), size, gfp, attrs);
+
*dma_handle = DMA_MAPPING_ERROR;
 
if (unlikely(iommu_dma_deferred_attach(dev, domain)))
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index bb0041e99659..e5474e709e7b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -160,6 +160,9 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
 
size = PAGE_ALIGN(size);
 
+   dev_info(dev, "%s, bus_dma_limit %llx, dma_mask %llx, coherent_dma_mask 
%llx, in irq %lu, size %lu, gfp %x, attrs %lx\n",
+__func__, dev->bus_dma_limit, *dev->dma_mask, 
dev->coherent_dma_mask, in_interrupt(), size, gfp, attrs);
+
if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
ret = dma_alloc_from_pool(dev, size, , gfp);
if (!ret)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-pool: Do not allocate pool memory from CMA

2020-07-31 Thread Nicolas Saenz Julienne
Hi Nathan,

On Thu, 2020-07-30 at 18:10 -0700, Nathan Chancellor wrote:
> On Tue, Jul 28, 2020 at 12:09:18PM +0200, Christoph Hellwig wrote:
> > Ok, I found a slight bug that wasn't intended.  I wanted to make sure
> > we can always fall back to a lower pool, but got that wrong.  Should be
> > fixed in the next version.
> 
> Hi Christoph and Nicolas,
> 
> Did a version of that series ever get send out? I am coming into the
> conversation late but I am running into an issue with the Raspberry Pi 4
> not booting on linux-next, which appears to be due to this patch now in
> mainline as commit d9765e41d8e9 ("dma-pool: do not allocate pool memory
> from CMA") combined with
> https://lore.kernel.org/lkml/20200725014529.1143208-2-jiaxun.y...@flygoat.com/
> in -next:
> 
> [1.423163] raspberrypi-firmware soc:firmware: Request 0x0001 returned 
> status 0x
> [1.431883] raspberrypi-firmware soc:firmware: Request 0x00030046 returned 
> status 0x
> [1.443888] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.452527] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 0 
> config (-22 80)
> [1.460836] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.469445] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 1 
> config (-22 81)
> [1.477735] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.486350] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 2 
> config (-22 82)
> [1.494639] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.503246] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 3 
> config (-22 83)
> [1.511529] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.520131] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 4 
> config (-22 84)
> [1.528414] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.537017] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 5 
> config (-22 85)
> [1.545299] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.553903] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 6 
> config (-22 86)
> [1.562184] raspberrypi-firmware soc:firmware: Request 0x00030043
> [1.570787] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 7 
> config (-22 87)
> [1.579897] raspberrypi-firmware soc:firmware: Request 0x00030030 returned 
> status 0x
> [1.589419] raspberrypi-firmware soc:firmware: Request 0x00028001 returned 
> status 0x
> [1.599391] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.608018] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 1 
> config (-22 81)
> [1.616313] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.624932] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 1 
> config (-22 81)
> [1.633195] pwrseq_simple: probe of wifi-pwrseq failed with error -22
> [1.643904] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.652544] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 2 
> config (-22 82)
> [1.660839] raspberrypi-firmware soc:firmware: Request 0x00030041 returned 
> status 0x
> [1.669446] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 2 
> state (-22 82)
> [1.677727] leds-gpio: probe of leds failed with error -22
> [1.683735] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.692346] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 6 
> config (-22 86)
> [1.700636] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.709240] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 6 
> config (-22 86)
> [1.717496] reg-fixed-voltage: probe of sd_vcc_reg failed with error -22
> [1.725546] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.734176] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 4 
> config (-22 84)
> [1.742465] raspberrypi-firmware soc:firmware: Request 0x00030043 returned 
> status 0x
> [1.751072] raspberrypi-exp-gpio soc:firmware:gpio: Failed to get GPIO 4 
> config (-22 84)
> [1.759332] gpio-regulator: probe of sd_io_1v8_reg failed with error -22
> [1.768042] raspberrypi-firmware soc:firmware: Request 0x00028001 returned 
> status 0x
> [1.780871] ALSA device list:
> [1.783960]   No soundcards found.
> [1.787633] Waiting for root device PARTUUID=45a8dd8a-02...
> 
> I am unsure if it is related to the issue that Amit is having or
> if that makes sense at all but I can reliably reproduce it.
> 
> v5.8-rc1: 

  1   2   3   >