Re: [PATCH] arm64: iommu: Refactoring common code.

2014-03-10 Thread Catalin Marinas
On Mon, Mar 10, 2014 at 04:32:50PM +, Ritesh Harjani wrote:
 Hi Everyone,
 
 Please find below a patch refactoring the common code out
 of arch/arm/mm/dma-mapping.c into lib/iommu-helper.c.
 
 This is just an initial version of the patch, to get more feedback and to
 know whether this is how we want to plan refactoring the iommu code into
 lib/iommu-helper.
 
 Please let me know what changes/suggestions you have in mind for this?
 
 
 
 
 Taking out the common code of buffer allocation and mapping
 for iommu from arch/arm to lib/iommu-helper file.
 
 Signed-off-by: Ritesh Harjani ritesh.harj...@gmail.com
 ---
  arch/arm/mm/dma-mapping.c| 159 +---
  include/linux/iommu-helper.h |  22 ++
  lib/iommu-helper.c   | 169 +++
  3 files changed, 210 insertions(+), 140 deletions(-)

I haven't had the time to look at this patch but I think you should
change the subject prefix to just arm (rather than arm64) and cc Russell
King and the linux-arm-kernel mailing list.

I'll have a look at the patch tomorrow.

Thanks.

-- 
Catalin


Re: [PATCH 1/2] [RFC] arm: iommu: Refactoring common code

2014-03-19 Thread Catalin Marinas
On Wed, Mar 19, 2014 at 02:58:26AM +, Ritesh Harjani wrote:
 Some suggestions/comments on this to take this discussion forward ?

Just a bit of patience ;). It's likely that you won't get much feedback
before 3.15-rc1 as people are busy preparing for the merge window.

-- 
Catalin


Re: [RFC PATCH 0/5] arm64: IOMMU-backed DMA mapping

2015-01-23 Thread Catalin Marinas
On Mon, Jan 12, 2015 at 08:48:52PM +, Robin Murphy wrote:
 Catalin Marinas (1):
   arm64: Combine coherent and non-coherent swiotlb dma_ops
 
 Robin Murphy (4):
   arm64: implement generic IOMMU configuration
   iommu: implement common IOMMU ops for DMA mapping
   arm64: add IOMMU dma_ops
   arm64: hook up IOMMU dma_ops

I queued the first two patches in this series for 3.20. The rest need to
go through some review rounds.

Thanks.

-- 
Catalin


Re: [PATCH v4 3/6] of: fix size when dma-range is not used

2015-02-02 Thread Catalin Marinas
On Fri, Jan 30, 2015 at 06:06:27PM +, Murali Karicheri wrote:
 On 01/28/2015 12:30 PM, Catalin Marinas wrote:
  I think we can remove this check altogether (we lived without it for a
  while) but we need to add 1 when calculating the mask:
 
  dev->coherent_dma_mask = min(DMA_BIT_MASK(32),
			       DMA_BIT_MASK(ilog2(size + 1)));
 
 For Keystone, the dma_addr has to be taken into account as well to determine the
 mask. The above will not work.

This was discussed before (not on this thread) and dma_addr should not
affect the mask, it only affects the pfn offset.

 Based on the discussion so far, this is the function I have come up with
 incorporating the suggestions. Please review it and see if I have
 missed anything. This works fine on Keystone.
 
 void of_dma_configure(struct device *dev, struct device_node *np)
 {
 	u64 dma_addr = 0, paddr, size;
 	int ret;
 	bool coherent;
 	unsigned long offset = 0;
 	struct iommu_ops *iommu;
 
 	/*
 	 * Set default size to cover the 32-bit. Drivers are expected to setup
 	 * the correct size and dma_mask.
 	 */
 	size = 1ULL << 32;
 
 	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
 	if (!ret) {
 		offset = PFN_DOWN(paddr - dma_addr);
 		if (!size) {
 			dev_err(dev, "Invalid size (%llx)\n", size);
 			return;
 		}
 		if (size & 1) {
 			size = size + 1;
 			dev_warn(dev, "Incorrect usage of size (%llx)\n", size);
 		}
 		dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", offset);
 	}
 	dev->dma_pfn_offset = offset;
 
 	/*
 	 * Coherent DMA masks larger than 32-bit must be explicitly set by the
 	 * driver.
 	 */
 	dev->coherent_dma_mask = min(DMA_BIT_MASK(32),
 				     DMA_BIT_MASK(ilog2(dma_addr + size)));

That's not correct, coherent_dma_mask should still be calculated solely
based on size, not dma_addr.

Functions like swiotlb_dma_supported() use phys_to_dma() which on ARM
(32-bit) subtracts the dma_pfn_offset, so the mask based on size works
fine.

In the arm64 tree, we haven't taken dma_pfn_offset into account for
phys_to_dma() yet but if needed for a SoC, we'll add it.
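
As a rough illustration (hypothetical names, modelled on the arm32 behaviour
described above; not a patch), once phys_to_dma() applies the pfn offset, a
mask derived purely from the size of the window is all the device has to fit:

	/* sketch only: what an arm32-style phys_to_dma() effectively does */
	static inline dma_addr_t sketch_phys_to_dma(struct device *dev, phys_addr_t paddr)
	{
		return (dma_addr_t)paddr - ((dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
	}

	/* with the offset subtracted, the bus address only has to fit the size-based mask */
	static inline bool sketch_dma_capable(struct device *dev, phys_addr_t paddr, u64 size_mask)
	{
		return sketch_phys_to_dma(dev, paddr) <= size_mask;
	}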

-- 
Catalin


Re: [PATCH v6 3/7] of: fix size when dma-range is not used

2015-02-06 Thread Catalin Marinas
On Thu, Feb 05, 2015 at 09:52:55PM +, Murali Karicheri wrote:
 Fix the dma-range size when the DT attribute is missing, i.e. set size to
 dev->coherent_dma_mask + 1 instead of dev->coherent_dma_mask. Also add
 code to check for invalid values of size configured in DT and log an error.
 
 Cc: Joerg Roedel j...@8bytes.org
 Cc: Grant Likely grant.lik...@linaro.org
 Cc: Rob Herring robh...@kernel.org
 Cc: Bjorn Helgaas bhelg...@google.com
 Cc: Will Deacon will.dea...@arm.com
 Cc: Russell King li...@arm.linux.org.uk
 Cc: Arnd Bergmann a...@arndb.de
 Cc: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
 
 Signed-off-by: Murali Karicheri m-kariche...@ti.com
 ---
  drivers/of/device.c |   17 -
  1 file changed, 16 insertions(+), 1 deletion(-)
 
 diff --git a/drivers/of/device.c b/drivers/of/device.c
 index 2de320d..314c8a9 100644
 --- a/drivers/of/device.c
 +++ b/drivers/of/device.c
  @@ -105,9 +105,24 @@ void of_dma_configure(struct device *dev, struct device_node *np)
  	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
  	if (ret < 0) {
  		dma_addr = offset = 0;
  -		size = dev->coherent_dma_mask;
  +		size = dev->coherent_dma_mask + 1;
  	} else {
  		offset = PFN_DOWN(paddr - dma_addr);
  +
  +		/*
  +		 * Add a work around to treat the size as mask + 1 in case
  +		 * it is defined in DT as a mask.
  +		 */
  +		if (size & 1) {
  +			dev_warn(dev, "Invalid size 0x%llx for dma-range\n",
  +				 size);
  +			size = size + 1;
  +		}
  +
  +		if (!size) {
  +			dev_err(dev, "Adjusted size 0x%llx invalid\n", size);
  +			return;
  +		}

Would it make sense to set coherent_dma_mask to 0 here to make this more
noticeable? It can be done together with the mask calculation from size.
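
A minimal sketch of that suggestion (hypothetical helper, not the code that
was merged) could be:

	/* hypothetical: combine the invalid-size check with the mask calculation */
	static void sketch_set_coherent_mask(struct device *dev, u64 size)
	{
		if (!size) {
			/* make subsequent DMA setup fail loudly rather than silently work */
			dev->coherent_dma_mask = 0;
			return;
		}
		dev->coherent_dma_mask = min(DMA_BIT_MASK(32),
					     DMA_BIT_MASK(ilog2(size)));
	}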

-- 
Catalin


Re: [PATCH v6 0/7] PCI: get DMA configuration from parent device

2015-02-06 Thread Catalin Marinas
On Thu, Feb 05, 2015 at 09:52:52PM +, Murali Karicheri wrote:
 This patch adds an important capability to the PCI driver on Keystone. I hope to
 have this merged to the upstream branch so that it is available for v3.20.

It's very late for 3.20 and the code hasn't been in linux-next at all
(but it's not me who's merging this code), unless you can squeeze it in
as a bug-fix.

But the good part is that there is more time to fix the dma mask setting
as well ;).

 PCI devices on Keystone don't have the correct dma_pfn_offset set. This patch
 adds the capability to set the DMA configuration such as dma-mask, dma_pfn_offset,
 and dma ops etc. using the information from DT. The prior RFCs and discussions
 are available at [1] and [2] below.

For the series, you can add:

Reviewed-by: Catalin Marinas catalin.mari...@arm.com


Re: [PATCH v4 6/6] arm: dma-mapping: updates to limit dma_mask and iommu mapping size

2015-01-27 Thread Catalin Marinas
On Tue, Jan 27, 2015 at 11:12:32AM +, Robin Murphy wrote:
 On 23/01/15 22:32, Murali Karicheri wrote:
  Limit the dma_mask to the minimum of dma_mask and dma_base + size - 1.
 
  Also, arm_iommu_create_mapping() has a size parameter of type size_t and
  arm_setup_iommu_dma_ops() can take a value higher than that. So
  limit the size to SIZE_MAX.
 
  Signed-off-by: Murali Karicheri m-kariche...@ti.com
  ---
arch/arm/mm/dma-mapping.c |   10 ++
1 file changed, 10 insertions(+)
 
  diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
  index 7864797..a1f9030 100644
  --- a/arch/arm/mm/dma-mapping.c
  +++ b/arch/arm/mm/dma-mapping.c
   @@ -2004,6 +2004,13 @@ static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
   	if (!iommu)
   		return false;
  
   +	/*
   +	 * currently arm_iommu_create_mapping() takes a max of size_t
   +	 * for size param. So check this limit for now.
   +	 */
   +	if (size > SIZE_MAX)
   +		return false;
   +
   	mapping = arm_iommu_create_mapping(dev->bus, dma_base, size);
   	if (IS_ERR(mapping)) {
   		pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n",
   @@ -2053,6 +2060,9 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
    {
   	struct dma_map_ops *dma_ops;
  
   +	/* limit dma_mask to the lower of the two values */
   +	*dev->dma_mask = min((*dev->dma_mask), (dma_base + size - 1));
  +
 
 Is there any reason not to do this in of_dma_configure? It seems like 
 something everyone could benefit from - I'd cooked up a dodgy workaround 
 for the same issue in my arm64 IOMMU code, but handling it generically 
 in common code would be much nicer.

I agree. I started something here:

http://article.gmane.org/gmane.linux.kernel/1835096

but I don't remember having reached a clear conclusion.
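
For reference, a sketch of what doing the clamping generically in
of_dma_configure() might look like (hypothetical, following Robin's suggestion
above; untested):

	/* hypothetical: clamp both masks to the bus window instead of per-arch code */
	static void sketch_clamp_dma_masks(struct device *dev, u64 dma_base, u64 size)
	{
		u64 limit = dma_base + size - 1;

		if (dev->dma_mask)
			*dev->dma_mask = min_t(u64, *dev->dma_mask, limit);
		dev->coherent_dma_mask = min_t(u64, dev->coherent_dma_mask, limit);
	}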

-- 
Catalin


Re: [PATCH v4 3/6] of: fix size when dma-range is not used

2015-01-28 Thread Catalin Marinas
On Tue, Jan 27, 2015 at 06:55:15PM +, Murali Karicheri wrote:
 On 01/27/2015 06:27 AM, Robin Murphy wrote:
  On 23/01/15 22:32, Murali Karicheri wrote:
   Fix the dma-range size when the DT attribute is missing, i.e. set size to
   dev->coherent_dma_mask + 1 instead of dev->coherent_dma_mask. To detect
   overflow when the mask is set to the max of u64, add a check, log an error and
   return.
   Some platforms use the mask format for size in DTS. So add a work around to
   catch this and fix it.
 
  Cc: Joerg Roedel j...@8bytes.org
  Cc: Grant Likely grant.lik...@linaro.org
  Cc: Rob Herring robh...@kernel.org
  Cc: Bjorn Helgaas bhelg...@google.com
  Cc: Will Deacon will.dea...@arm.com
  Cc: Russell King li...@arm.linux.org.uk
  Cc: Arnd Bergmann a...@arndb.de
  Cc: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
 
  Signed-off-by: Murali Karicheri m-kariche...@ti.com
  ---
  drivers/of/device.c | 14 +-
  1 file changed, 13 insertions(+), 1 deletion(-)
 
  diff --git a/drivers/of/device.c b/drivers/of/device.c
  index 2de320d..0a5ff54 100644
  --- a/drivers/of/device.c
  +++ b/drivers/of/device.c
   @@ -105,12 +105,24 @@ void of_dma_configure(struct device *dev, struct device_node *np)
   	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
   	if (ret < 0) {
   		dma_addr = offset = 0;
   -		size = dev->coherent_dma_mask;
   +		size = dev->coherent_dma_mask + 1;
   	} else {
   		offset = PFN_DOWN(paddr - dma_addr);
   +		/*
   +		 * Add a work around to treat the size as mask + 1 in case
   +		 * it is defined in DT as a mask.
   +		 */
   +		if (size & 1)
   +			size = size + 1;
   		dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", offset);
   	}
  
   +	/* if size is 0, we have an overflow of u64 */
   +	if (!size) {
   +		dev_err(dev, "invalid size\n");
   +		return;
   +	}
  +
 
  This seems potentially fragile to dodgy DTs given that we might also be
  using size to make a mask later. Would it make sense to double-up a
  sanity check as mask-format detection? Something like:
 
  if is_power_of_2(size)
  // use size
  else if is_power_of_2(size + 1)
  // use size + 1
  else
  // cry
 
 How about having the logic like this?
 
  	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
  	if (ret < 0) {
  		dma_addr = offset = 0;
  		size = dev->coherent_dma_mask + 1;
  	} else {
  		offset = PFN_DOWN(paddr - dma_addr);
  		dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", offset);
  	}
 
  	if (is_power_of_2(size + 1))
  		size = size + 1;
  	else if (!is_power_of_2(size)) {
  		dev_err(dev, "invalid size\n");
  		return;
  	}

In of_dma_configure(), we currently assume that the default coherent
mask is 32-bit. In this thread:

http://article.gmane.org/gmane.linux.kernel/1835096

we talked about setting the coherent mask based on size automatically.
I'm not sure about the size but I think we can assume it is 32-bit mask + 1
if it is not specified in the DT. Instead of just assuming a default
mask, let's assume a default size and create the mask based on this
(untested):

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 5b33c6a21807..9ff8d1286b44 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -170,10 +170,10 @@ static void of_dma_configure(struct device *dev)
struct iommu_ops *iommu;
 
/*
-* Set default dma-mask to 32 bit. Drivers are expected to setup
-* the correct supported dma_mask.
+* Set default size to cover the 32-bit. Drivers are expected to setup
+* the correct size and dma_mask.
 */
-	dev->coherent_dma_mask = DMA_BIT_MASK(32);
+	size = 1ULL << 32;
 
 	/*
 	 * Set it to coherent_dma_mask by default if the architecture
@@ -185,13 +185,24 @@ static void of_dma_configure(struct device *dev)
 	ret = of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size);
 	if (ret < 0) {
 		dma_addr = offset = 0;
-		size = dev->coherent_dma_mask;
 	} else {
 		offset = PFN_DOWN(paddr - dma_addr);
 		dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
 	}
 	dev->dma_pfn_offset = offset;
 
+	/*
+	 * Workaround for DTs setting the size to a mask or 0.
+	 */
+	if (is_power_of_2(size + 1))
+		size += 1;
+
+	/*
+	 * Coherent DMA masks larger than 32-bit must be explicitly set by the
+	 * driver.
+	 */
+	dev->coherent_dma_mask = min(DMA_BIT_MASK(32), DMA_BIT_MASK(ilog2(size)));
+
 	coherent = of_dma_is_coherent(dev->of_node);
 	dev_dbg(dev, "device is%sdma coherent\n",
 		coherent ? " " : " not ");

-- 
Catalin


Re: [PATCH v4 3/6] of: fix size when dma-range is not used

2015-01-28 Thread Catalin Marinas
On Wed, Jan 28, 2015 at 03:55:57PM +, Robin Murphy wrote:
 On 28/01/15 11:05, Catalin Marinas wrote:
  On Tue, Jan 27, 2015 at 06:55:15PM +, Murali Karicheri wrote:
  How about having the logic like this?
 
  	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
  	if (ret < 0) {
  		dma_addr = offset = 0;
  		size = dev->coherent_dma_mask + 1;
  	} else {
  		offset = PFN_DOWN(paddr - dma_addr);
  		dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", offset);
  	}
  
  	if (is_power_of_2(size + 1))
  		size = size + 1;
  	else if (!is_power_of_2(size)) {
  		dev_err(dev, "invalid size\n");
  		return;
  	}
 
  In of_dma_configure(), we currently assume that the default coherent
  mask is 32-bit. In this thread:
 
  http://article.gmane.org/gmane.linux.kernel/1835096
 
  we talked about setting the coherent mask based on size automatically.
  I'm not sure about the size but I think we can assume it is 32-bit mask + 1
  if it is not specified in the DT. Instead of just assuming a default
  mask, let's assume a default size and create the mask based on this
  (untested):
 
  diff --git a/drivers/of/platform.c b/drivers/of/platform.c
  index 5b33c6a21807..9ff8d1286b44 100644
  --- a/drivers/of/platform.c
  +++ b/drivers/of/platform.c
  @@ -170,10 +170,10 @@ static void of_dma_configure(struct device *dev)
  struct iommu_ops *iommu;
 
  /*
  -* Set default dma-mask to 32 bit. Drivers are expected to setup
  -* the correct supported dma_mask.
  +* Set default size to cover the 32-bit. Drivers are expected to setup
  +* the correct size and dma_mask.
   */
   -	dev->coherent_dma_mask = DMA_BIT_MASK(32);
   +	size = 1ULL << 32;
  
   	/*
   	 * Set it to coherent_dma_mask by default if the architecture
   @@ -185,13 +185,24 @@ static void of_dma_configure(struct device *dev)
   	ret = of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size);
   	if (ret < 0) {
   		dma_addr = offset = 0;
   -		size = dev->coherent_dma_mask;
   	} else {
   		offset = PFN_DOWN(paddr - dma_addr);
   		dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", dev->dma_pfn_offset);
   	}
   	dev->dma_pfn_offset = offset;
 
  +   /*
  +* Workaround for DTs setting the size to a mask or 0.
  +*/
  +   if (is_power_of_2(size + 1))
  +   size += 1;
 
 In fact, since the ilog2 below ends up effectively rounding down, we 
 might as well do away with this check as well and just add 1 
 unconditionally. The only time it makes any difference is when we want 
 it to anyway!

Well, we could simply ignore the is_power_of_2() check but without
incrementing size as we don't know what arch_setup_dma_ops() does with
it.

I think we can remove this check altogether (we lived without it for a
while) but we need to add 1 when calculating the mask:

dev->coherent_dma_mask = min(DMA_BIT_MASK(32),
			     DMA_BIT_MASK(ilog2(size + 1)));

-- 
Catalin


Re: [PATCH v4 3/6] of: fix size when dma-range is not used

2015-01-28 Thread Catalin Marinas
On Wed, Jan 28, 2015 at 03:45:19PM +, Rob Herring wrote:
 On Wed, Jan 28, 2015 at 5:05 AM, Catalin Marinas
 catalin.mari...@arm.com wrote:
  On Tue, Jan 27, 2015 at 06:55:15PM +, Murali Karicheri wrote:
  How about having the logic like this?
 
  	ret = of_dma_get_range(np, &dma_addr, &paddr, &size);
  	if (ret < 0) {
  		dma_addr = offset = 0;
  		size = dev->coherent_dma_mask + 1;
  	} else {
  		offset = PFN_DOWN(paddr - dma_addr);
  		dev_dbg(dev, "dma_pfn_offset(%#08lx)\n", offset);
  	}
 
  	if (is_power_of_2(size + 1))
  		size = size + 1;
  	else if (!is_power_of_2(size)) {
  		dev_err(dev, "invalid size\n");
  		return;
  	}
 
  In of_dma_configure(), we currently assume that the default coherent
  mask is 32-bit. In this thread:
 
  http://article.gmane.org/gmane.linux.kernel/1835096
 
  we talked about setting the coherent mask based on size automatically.
  I'm not sure about the size but I think we can assume it is 32-bit mask + 1
  if it is not specified in the DT. Instead of just assuming a default
  mask, let's assume a default size and create the mask based on this
  (untested):
 
  diff --git a/drivers/of/platform.c b/drivers/of/platform.c
  index 5b33c6a21807..9ff8d1286b44 100644
  --- a/drivers/of/platform.c
  +++ b/drivers/of/platform.c
  @@ -170,10 +170,10 @@ static void of_dma_configure(struct device *dev)
  struct iommu_ops *iommu;
 
  /*
   -	 * Set default dma-mask to 32 bit. Drivers are expected to setup
   -	 * the correct supported dma_mask.
   +	 * Set default size to cover the 32-bit. Drivers are expected to setup
   +	 * the correct size and dma_mask.
   	 */
   -	dev->coherent_dma_mask = DMA_BIT_MASK(32);
   +	size = 1ULL << 32;
  
   	/*
   	 * Set it to coherent_dma_mask by default if the architecture
   @@ -185,13 +185,24 @@ static void of_dma_configure(struct device *dev)
   	ret = of_dma_get_range(dev->of_node, &dma_addr, &paddr, &size);
   	if (ret < 0) {
   		dma_addr = offset = 0;
   -		size = dev->coherent_dma_mask;
 
 Are we assuming dma_addr, paddr and size are not touched on error? If
 so, we can get rid of this clause entirely.

We can if we initialise dma_addr and offset to 0 when declared in this
function. The dma_addr and size variables are later passed to the
arch_setup_dma_ops(), so they need to have some sane values independent
of the presence of dma-ranges in the DT.

   	} else {
   		offset = PFN_DOWN(paddr - dma_addr);
   		dev_dbg(dev, "dma_pfn_offset(%#08lx)\n",
   			dev->dma_pfn_offset);
   	}
   	dev->dma_pfn_offset = offset;
 
  +   /*
  +* Workaround for DTs setting the size to a mask or 0.
  +*/
  +   if (is_power_of_2(size + 1))
  +   size += 1;
 
 As I mentioned, I think power of 2 is too restrictive (from a DT
 perspective, even though the kernel may require a power of 2 here for
 the mask). Just checking that bit 0 is set should be enough.

The power of 2 was mainly to cover the case where the size is wrongly
written as a mask in the DT. If the size is deliberately not a power of
two and not a full mask, the above check won't change it. With my
proposal, ilog2 gets rid of the extra bits in size, except that if the size
was a mask because of a DT error, we lose a bit in the coherent_dma_mask.
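
A worked example with hypothetical DT values makes the trade-off concrete:

	/*
	 * Hypothetical values, illustrating the point above:
	 *
	 *   DT gives a proper size, e.g. size = 0x100000000 (4GB):
	 *     ilog2(size)     = 32  ->  DMA_BIT_MASK(32) = 0xffffffff   (correct)
	 *
	 *   DT wrongly encodes a mask as the size, e.g. size = 0xffffffff:
	 *     ilog2(size)     = 31  ->  DMA_BIT_MASK(31) = 0x7fffffff   (one bit lost)
	 *     ilog2(size + 1) = 32  ->  DMA_BIT_MASK(32) = 0xffffffff   (recovered)
	 */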

 Also, we need a WARN here so DTs get fixed.

I agree.

-- 
Catalin


Re: [PATCH v4 2/4] iommu: Implement common IOMMU ops for DMA mapping

2015-07-28 Thread Catalin Marinas
On Thu, Jul 16, 2015 at 07:40:13PM +0100, Robin Murphy wrote:
 +/**
 + * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
 + * @dev: Device to allocate memory for. Must be a real device
 + *attached to an iommu_dma_domain
 + * @size: Size of buffer in bytes
 + * @gfp: Allocation flags
 + * @prot: IOMMU mapping flags
 + * @handle: Out argument for allocated DMA handle
 + * @flush_page: Arch callback to flush a single page from all caches to
 + *   ensure DMA visibility of __GFP_ZERO. May be NULL if not
 + *   required.
 + *
 + * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
 + * but an IOMMU which supports smaller pages might not map the whole thing.
 + *
 + * Return: Array of struct page pointers describing the buffer,
 + *  or NULL on failure.
 + */
 +struct page **iommu_dma_alloc(struct device *dev, size_t size,
 + gfp_t gfp, int prot, dma_addr_t *handle,
 + void (*flush_page)(const void *, phys_addr_t))
 +{
 + struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 + struct iova_domain *iovad = domain->dma_api_cookie;
 + struct iova *iova;
 + struct page **pages;
 + struct sg_table sgt;
 + dma_addr_t dma_addr;
 + unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
 +
 + *handle = DMA_ERROR_CODE;
 +
 + pages = __iommu_dma_alloc_pages(count, gfp);
 + if (!pages)
 + 	return NULL;
 +
 + iova = __alloc_iova(iovad, size, dev->coherent_dma_mask);
 + if (!iova)
 + 	goto out_free_pages;
 +
 + size = iova_align(iovad, size);
 + if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
 + 	goto out_free_iova;
 +
 + if (gfp & __GFP_ZERO) {
 + struct sg_mapping_iter miter;
 + /*
 +  * The flushing provided by the SG_MITER_TO_SG flag only
 +  * applies to highmem pages, so it won't do the job here.
 +  */

The comment here is slightly wrong. The SG_MITER_FROM_SG flushing is
done by calling flush_kernel_dcache_page(). This function can be a no-op
even on architectures implementing highmem. For example, on arm32 it
only does some flushing if there is a potential D-cache alias with user
space. The flush that you call below is for DMA operations, something
entirely different.

 + 	sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
 + 	while (sg_miter_next(&miter)) {
 + 		memset(miter.addr, 0, PAGE_SIZE);

Isn't __GFP_ZERO already honoured by alloc_pages in
__iommu_dma_alloc_pages?

 + 		if (flush_page)
 + 			flush_page(miter.addr, page_to_phys(miter.page));

Could you instead do the check as !(prot & IOMMU_CACHE) so that it saves
architecture code from passing NULL when coherent?
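
In other words, something like this inside the zeroing loop quoted above
(hypothetical fragment, untested):

	/* hypothetical: skip the flush for cache-coherent (IOMMU_CACHE) mappings */
	if (!(prot & IOMMU_CACHE) && flush_page)
		flush_page(miter.addr, page_to_phys(miter.page));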

I'm still not convinced we need this flushing here but I'll follow up in
patch 3/4.

-- 
Catalin


Re: [PATCH v4 3/4] arm64: Add IOMMU dma_ops

2015-07-28 Thread Catalin Marinas
On Thu, Jul 16, 2015 at 07:40:14PM +0100, Robin Murphy wrote:
 +static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 +  dma_addr_t *handle, gfp_t gfp,
 +  struct dma_attrs *attrs)
 +{
 + bool coherent = is_device_dma_coherent(dev);
 + int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
 + void *addr;
 +
 + if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
 + return NULL;
 +
 + /*
 +  * Some drivers rely on this, and we may not want to potentially
 +  * expose stale kernel data to devices anyway.
 +  */
 + gfp |= __GFP_ZERO;
 +
 + if (gfp & __GFP_WAIT) {
 + struct page **pages;
 + pgprot_t pgprot = coherent ? __pgprot(PROT_NORMAL) :
 +  __pgprot(PROT_NORMAL_NC);
 +
 + pgprot = __get_dma_pgprot(attrs, pgprot, coherent);

I think you can simplify this:

	pgprot_t pgprot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);

(and we could do the same in __dma_alloc)

 + pages = iommu_dma_alloc(dev, size, gfp, ioprot, handle,
 + coherent ? NULL : flush_page);

I commented on patch 2/4. If we checked for IOMMU_CACHE in
iommu_dma_alloc(), we could always pass flush_page here without the
NULL.

 + if (!pages)
 + return NULL;
 +
 + addr = dma_common_pages_remap(pages, size, VM_USERMAP, pgprot,
 +   __builtin_return_address(0));
 + if (!addr)
 + iommu_dma_free(dev, pages, size, handle);

For arm64, it would have been easier to simply flush the caches here and
avoid the flush_page argument. However, I reread your earlier reply, so
if we ever honour the ATTR_NO_KERNEL_MAPPING, then we would have to
iterate over the individual pages in the arch code. Let's leave it as
per your patch 2/4 (with a flush_page argument).

-- 
Catalin


Re: [PATCH v4 4/4] arm64: Hook up IOMMU dma_ops

2015-07-28 Thread Catalin Marinas
On Thu, Jul 16, 2015 at 07:40:15PM +0100, Robin Murphy wrote:
 With iommu_dma_ops in place, hook them up to the configuration code, so
 IOMMU-fronted devices will get them automatically.
 
 Signed-off-by: Robin Murphy robin.mur...@arm.com

Acked-by: Catalin Marinas catalin.mari...@arm.com


Re: [PATCH v5 2/3] arm64: Add IOMMU dma_ops

2015-08-03 Thread Catalin Marinas
On Fri, Jul 31, 2015 at 06:18:28PM +0100, Robin Murphy wrote:
 Taking some inspiration from the arch/arm code, implement the
 arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.
 
 Unfortunately the device setup code has to start out as a big ugly mess
 in order to work usefully right now, as 'proper' operation depends on
 changes to device probe and DMA configuration ordering, IOMMU groups for
 platform devices, and default domain support in arm/arm64 IOMMU drivers.
 The workarounds here need only exist until that work is finished.
 
 Signed-off-by: Robin Murphy robin.mur...@arm.com

Reviewed-by: Catalin Marinas catalin.mari...@arm.com


Re: [PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping

2015-08-03 Thread Catalin Marinas
On Fri, Jul 31, 2015 at 06:18:27PM +0100, Robin Murphy wrote:
 Taking inspiration from the existing arch/arm code, break out some
 generic functions to interface the DMA-API to the IOMMU-API. This will
 do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
 
 Signed-off-by: Robin Murphy robin.mur...@arm.com

Acked-by: Catalin Marinas catalin.mari...@arm.com


Re: [PATCH v3 2/4] iommu: Implement common IOMMU ops for DMA mapping

2015-07-14 Thread Catalin Marinas
On Fri, Jul 10, 2015 at 08:19:33PM +0100, Robin Murphy wrote:
 +/*
 + * IOVAs are IOMMU _input_ addresses, so there still exists the possibility
 + * for static bus translation between device output and IOMMU input (yuck).
 + */
 +static inline dma_addr_t dev_dma_addr(struct device *dev, dma_addr_t addr)
 +{
 + dma_addr_t offset = (dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT;
 +
 + BUG_ON(addr < offset);
 + return addr - offset;
 +}

Are these just theoretical or do you expect to see some at some point? I
think the dma_limit in __alloc_iova() may also need to take the offset
into account (or just ignore them altogether for now).
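
If we ever did take the offset into account there, the IOVA limit would need
the offset added back on top of the device mask, roughly like this
(hypothetical sketch, untested):

	/* hypothetical: IOVAs are device addresses plus the fixed offset, so the
	 * allocation limit is the mask shifted up by that offset */
	static dma_addr_t sketch_iova_limit(struct device *dev, u64 dma_mask)
	{
		dma_addr_t offset = (dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT;

		return (dma_addr_t)dma_mask + offset;
	}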

 +
 +/**
 + * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
 + * @dir: Direction of DMA transfer
 + * @coherent: Is the DMA master cache-coherent?
 + *
 + * Return: corresponding IOMMU API page protection flags
 + */
 +int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
 +{
 + int prot = coherent ? IOMMU_CACHE : 0;
 +
 + switch (dir) {
 + case DMA_BIDIRECTIONAL:
 + return prot | IOMMU_READ | IOMMU_WRITE;
 + case DMA_TO_DEVICE:
 + return prot | IOMMU_READ;
 + case DMA_FROM_DEVICE:
 + return prot | IOMMU_WRITE;
 + default:
 + return 0;
 + }
 +}
 +
 +static struct iova *__alloc_iova(struct device *dev, size_t size, bool coherent)
 +{
 + struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 + struct iova_domain *iovad = domain->dma_api_cookie;
 + unsigned long shift = iova_shift(iovad);
 + unsigned long length = iova_align(iovad, size) >> shift;
 + u64 dma_limit = coherent ? dev->coherent_dma_mask : dma_get_mask(dev);

coherent and coherent_dma_mask have related meanings here. As far as I can
see in patch 3, the coherent information passed all the way to this
function states whether the device is cache coherent or not (and whether
cache maintenance is needed). The coherent_dma_mask refers to an
allocation mask for the dma_alloc_coherent() API but that doesn't
necessarily mean that the device is cache coherent. Similarly, dma_mask
is used for streaming DMA.

You can rename it to coherent_api or simply pass a u64 dma_mask
directly.
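
A sketch of the second option (hypothetical, based on the __alloc_iova()
quoted above; untested):

	/* hypothetical: take the limiting mask directly instead of a 'coherent' bool */
	static struct iova *__alloc_iova(struct device *dev, size_t size, u64 dma_limit)
	{
		struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
		struct iova_domain *iovad = domain->dma_api_cookie;
		unsigned long shift = iova_shift(iovad);
		unsigned long length = iova_align(iovad, size) >> shift;

		/* callers pass dev->coherent_dma_mask or dma_get_mask(dev) explicitly */
		return alloc_iova(iovad, length, dma_limit >> shift, true);
	}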

[...]
 +/**
 + * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
 + * @dev: Device to allocate memory for. Must be a real device
 + *attached to an iommu_dma_domain
 + * @size: Size of buffer in bytes
 + * @gfp: Allocation flags
 + * @prot: IOMMU mapping flags
 + * @coherent: Which dma_mask to base IOVA allocation on
 + * @handle: Out argument for allocated DMA handle
 + * @flush_page: Arch callback to flush a single page from caches as
 + *   necessary. May be NULL for coherent allocations
 + *
 + * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
 + * but an IOMMU which supports smaller pages might not map the whole thing.
 + * For now, the buffer is unconditionally zeroed for compatibility
 + *
 + * Return: Array of struct page pointers describing the buffer,
 + *  or NULL on failure.
 + */
 +struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 + int prot, bool coherent, dma_addr_t *handle,
 + void (*flush_page)(const void *, phys_addr_t))

So for this function, coherent should always be true since this is only
used with the coherent DMA API AFAICT.

 +{
 + struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 + struct iova_domain *iovad = domain->dma_api_cookie;
 + struct iova *iova;
 + struct page **pages;
 + struct sg_table sgt;
 + struct sg_mapping_iter miter;
 + dma_addr_t dma_addr;
 + unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
 +
 + *handle = DMA_ERROR_CODE;
 +
 + pages = __iommu_dma_alloc_pages(count, gfp);
 + if (!pages)
 + return NULL;
 +
 + iova = __alloc_iova(dev, size, coherent);

And here just __alloc_iova(dev, size, true);

[...]
 +dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 + unsigned long offset, size_t size, int prot, bool coherent)
 +{
 + dma_addr_t dma_addr;
 + struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 + struct iova_domain *iovad = domain->dma_api_cookie;
 + phys_addr_t phys = page_to_phys(page) + offset;
 + size_t iova_off = iova_offset(iovad, phys);
 + size_t len = iova_align(iovad, size + iova_off);
 + struct iova *iova = __alloc_iova(dev, len, coherent);

Here __alloc_iova(dev, len, false);

[...]
 +/*
 + * The DMA API client is passing in a scatterlist which could describe
 + * any old buffer layout, but the IOMMU API requires everything to be
 + * aligned to IOMMU pages. Hence the need for this complicated bit of
 + * impedance-matching, to be able to hand off a suitably-aligned list,
 + * but still preserve the original offsets and sizes for the caller.
 + */
 +int iommu_dma_map_sg(struct device *dev, struct 

Re: [PATCH v3 3/4] arm64: Add IOMMU dma_ops

2015-07-15 Thread Catalin Marinas
On Fri, Jul 10, 2015 at 08:19:34PM +0100, Robin Murphy wrote:
 diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
 index d16a1ce..ccadfd4 100644
 --- a/arch/arm64/mm/dma-mapping.c
 +++ b/arch/arm64/mm/dma-mapping.c
 @@ -526,3 +526,426 @@ static int __init dma_debug_do_init(void)
   return 0;
  }
  fs_initcall(dma_debug_do_init);
 +
 +
 +#ifdef CONFIG_IOMMU_DMA
 +#include <linux/dma-iommu.h>
 +#include <linux/platform_device.h>
 +#include <linux/amba/bus.h>
 +
 +/* Thankfully, all cache ops are by VA so we can ignore phys here */
 +static void flush_page(const void *virt, phys_addr_t phys)
 +{
 + __dma_flush_range(virt, virt + PAGE_SIZE);
 +}
 +
 +static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 +  dma_addr_t *handle, gfp_t gfp,
 +  struct dma_attrs *attrs)
 +{
 + bool coherent = is_device_dma_coherent(dev);
 + int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
 + void *addr;
 +
 + if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
 + return NULL;
 +
 + if (gfp & __GFP_WAIT) {
 + struct page **pages;
 + pgprot_t pgprot = coherent ? __pgprot(PROT_NORMAL) :
 +  __pgprot(PROT_NORMAL_NC);
 +
 + pgprot = __get_dma_pgprot(attrs, pgprot, coherent);
 + pages = iommu_dma_alloc(dev, size, gfp, ioprot, coherent,
 + handle, coherent ? NULL : flush_page);

As I replied already on the other patch, the coherent argument here
should always be true.

BTW, why do we need to call flush_page via iommu_dma_alloc() and not
flush the buffer directly in the arch __iommu_alloc_attrs()? We already
have the pointer and size (after remapping in the CPU address space); it
would keep iommu_dma_alloc() simpler.
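
Roughly this, as a hypothetical fragment of __iommu_alloc_attrs() (untested,
not the posted patch):

	/* hypothetical: flush once in the arch code after the CPU mapping exists */
	addr = dma_common_pages_remap(pages, size, VM_USERMAP, pgprot,
				      __builtin_return_address(0));
	if (addr && !coherent)
		__dma_flush_range(addr, addr + size);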

 + if (!pages)
 + return NULL;
 +
 + addr = dma_common_pages_remap(pages, size, VM_USERMAP, pgprot,
 +   __builtin_return_address(0));
 + if (!addr)
 + iommu_dma_free(dev, pages, size, handle);
 + } else {
 + struct page *page;
 + /*
 +  * In atomic context we can't remap anything, so we'll only
 +  * get the virtually contiguous buffer we need by way of a
 +  * physically contiguous allocation.
 +  */
 + if (coherent) {
 + page = alloc_pages(gfp, get_order(size));
 + addr = page ? page_address(page) : NULL;

We could even use __get_free_pages(gfp & ~__GFP_HIGHMEM) since we don't
have/need highmem on arm64.

 + } else {
 + addr = __alloc_from_pool(size, &page, gfp);
 + }
 + if (addr) {
 + *handle = iommu_dma_map_page(dev, page, 0, size,
 +  ioprot, false);

Why coherent == false?

 + if (iommu_dma_mapping_error(dev, *handle)) {
 + if (coherent)
 + __free_pages(page, get_order(size));
 + else
 + __free_from_pool(addr, size);
 + addr = NULL;
 + }
 + }
 + }
 + return addr;
 +}

In the second case here (!__GFP_WAIT), do we do any cache maintenance? I
can't see it and it's needed for the !coherent case.

 +static void __iommu_free_attrs(struct device *dev, size_t size, void 
 *cpu_addr,
 +dma_addr_t handle, struct dma_attrs *attrs)
 +{
 + /*
 +  * @cpu_addr will be one of 3 things depending on how it was allocated:
 +  * - A remapped array of pages from iommu_dma_alloc(), for all
 +  *   non-atomic allocations.
 +  * - A non-cacheable alias from the atomic pool, for atomic
 +  *   allocations by non-coherent devices.
 +  * - A normal lowmem address, for atomic allocations by
 +  *   coherent devices.
 +  * Hence how dodgy the below logic looks...
 +  */
 + if (__free_from_pool(cpu_addr, size)) {
 + iommu_dma_unmap_page(dev, handle, size, 0, NULL);
 + } else if (is_vmalloc_addr(cpu_addr)){
 + struct vm_struct *area = find_vm_area(cpu_addr);
 +
 + if (WARN_ON(!area || !area-pages))
 + return;
 + iommu_dma_free(dev, area-pages, size, handle);
 + dma_common_free_remap(cpu_addr, size, VM_USERMAP);
 + } else {
 + __free_pages(virt_to_page(cpu_addr), get_order(size));
 + iommu_dma_unmap_page(dev, handle, size, 0, NULL);

Just slightly paranoid but it's better to unmap the page from the iommu
space before freeing (in case there is some rogue device still accessing
it).
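
I.e. a hypothetical reordering of the fragment quoted above:

	} else {
		/* tear down the IOMMU mapping before releasing the pages */
		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
		__free_pages(virt_to_page(cpu_addr), get_order(size));
	}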

-- 
Catalin

Re: [PATCH v3 3/4] arm64: Add IOMMU dma_ops

2015-07-15 Thread Catalin Marinas
On Wed, Jul 15, 2015 at 05:27:22PM +0100, Robin Murphy wrote:
 On 15/07/15 10:31, Catalin Marinas wrote:
 On Fri, Jul 10, 2015 at 08:19:34PM +0100, Robin Murphy wrote:
 +   if (iommu_dma_mapping_error(dev, *handle)) {
 +   if (coherent)
 +   __free_pages(page, get_order(size));
 +   else
 +   __free_from_pool(addr, size);
 +   addr = NULL;
 +   }
 +   }
 +   }
 +   return addr;
 +}
 
 In the second case here (!__GFP_WAIT), do we do any cache maintenance? I
 can't see it and it's needed for the !coherent case.
 
 In the atomic non-coherent case, we're stealing from the atomic pool, so
 addr is already a non-cacheable alias (and alloc_from_pool does memset(0)
 through that). That shouldn't need anything extra, right?

You are right, we already flushed the cache for the atomic pool when we
allocated it in atomic_pool_init().

-- 
Catalin


Re: [PATCH 1/2] arm64: Workaround renaming of __GFP_WAIT

2015-10-16 Thread Catalin Marinas
On Fri, Oct 16, 2015 at 04:33:41PM +0100, Robin Murphy wrote:
> The imminent renaming of __GFP_WAIT in the mm tree conflicts with its
> use in the new IOMMU DMA ops; introduce a temporary local version of
> its replacement to smooth over the transition.
> 
> This patch should be reverted at 4.4-rc1.
> 
> CC: Mel Gorman <mgor...@techsingularity.net>
> CC: Andrew Morton <a...@linux-foundation.org>
> Reported-by: Sudeep Holla <sudeep.ho...@arm.com>
> Signed-off-by: Robin Murphy <robin.mur...@arm.com>

Acked-by: Catalin Marinas <catalin.mari...@arm.com>


Re: [PATCH 2/2] arm64: Use gfpflags_allow_blocking()

2015-10-16 Thread Catalin Marinas
On Fri, Oct 16, 2015 at 04:33:42PM +0100, Robin Murphy wrote:
> __GFP_WAIT is going away to live its life under a new identity; convert
> __iommu_alloc_attrs() to the new helper function instead.
> 
> CC: Mel Gorman <mgor...@techsingularity.net>
> CC: Andrew Morton <a...@linux-foundation.org>
> Reported-by: Sudeep Holla <sudeep.ho...@arm.com>
> Signed-off-by: Robin Murphy <robin.mur...@arm.com>

Acked-by: Catalin Marinas <catalin.mari...@arm.com>


Re: [PATCH v6 2/3] arm64: Add IOMMU dma_ops

2015-10-14 Thread Catalin Marinas
On Thu, Oct 01, 2015 at 08:13:59PM +0100, Robin Murphy wrote:
> Taking some inspiration from the arch/arm code, implement the
> arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.
> 
> Since there is still work to do elsewhere to make DMA configuration happen
> in a more appropriate order and properly support platform devices in the
> IOMMU core, the device setup code unfortunately starts out carrying some
> workarounds to ensure it works correctly in the current state of things.
> 
> Signed-off-by: Robin Murphy <robin.mur...@arm.com>

Sorry, I reviewed this patch before but forgot to ack it, so here it is:

Acked-by: Catalin Marinas <catalin.mari...@arm.com>

(and I'm fined for the arm64 patches here to go in via the iommu tree)

I assume part of this patch will disappear at some point when the device
probing order is sorted.

-- 
Catalin


Re: [RFC PATCH v2] iommu/amd: gray the 'irq_remap_table' object for kmemleak

2015-12-02 Thread Catalin Marinas
On 2 December 2015 at 13:59, Borislav Petkov  wrote:
> On Wed, Dec 02, 2015 at 02:48:47PM +0100, Michael Wang wrote:
>> I'm not sure why amd-iommu use get_page not kmalloc to initialize the
>> pointer table, but if it's necessary then the conflict will be there,
>> it's not the fault of driver or kmemleak, but the design require them
>> to cooperate with each other.
>
> So, according to you, we should go and "fix" all callers of
> __get_free_pages() to make kmemleak happy. Then when the next new tool
> comes along, we should "fix" another kernel API just so that the tools
> are happy.

Defending kmemleak here ;). Tracking page allocations in kmemleak by
intercepting __get_free_pages() has significant implications on false
negatives for two main reasons:

1. The sl?b allocators themselves use page allocations, so kmemleak
could end up detecting the same pointer twice, hiding a potential leak

2. Most page allocations do not contain data/pointers relevant to
kmemleak (e.g. page cache pages), however the randomness of such data
greatly diminishes kmemleak's ability to detect real leaks

Arguably, kmemleak could be made to detect both cases above by a
combination of page flags, additional annotations or specific page
alloc API. However, this has its own drawbacks in terms of code
complexity (potentially outside mm/kmemleak.c) and overhead.

Regarding a kmemleak_alloc() annotation like in the patch I suggested,
that's the second one I've seen needed outside alloc APIs (the first
one is commit f75782e4e067 - "block: kmemleak: Track the page
allocations for struct request"). If the number of such explicit
annotations stays small, it's better to keep it this way.
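
For illustration, such an annotation looks roughly like this (names and sizes
are hypothetical, in the style of the commit mentioned above):

	#include <linux/gfp.h>
	#include <linux/kmemleak.h>
	#include <linux/mm.h>

	/* hypothetical page-backed table that stores pointers kmemleak should scan */
	static void *sketch_alloc_table(size_t size)
	{
		void *table = (void *)__get_free_pages(GFP_KERNEL, get_order(size));

		if (table)
			kmemleak_alloc(table, size, 1, GFP_KERNEL);	/* tell kmemleak about it */
		return table;
	}

	static void sketch_free_table(void *table, size_t size)
	{
		kmemleak_free(table);
		free_pages((unsigned long)table, get_order(size));
	}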

There are other explicit annotations like kmemleak_not_leak() or
kmemleak_ignore() but these are for objects kmemleak knows about and
incorrectly reports them as leaks. Most of the time this is because the
pointers to such objects are stored in a different form (e.g. physical
address).

Anyway, kmemleak is not the only tool requiring annotations (take
spin_lock_nested() for example). If needed, we could do with an
additional page alloc/free API which informs kmemleak in the process
but I don't think it's worth it.

-- 
Catalin


Re: [PATCH 4/5] iommu/rockchip: add ARM64 cache flush operation for iommu

2016-05-23 Thread Catalin Marinas
On Mon, May 23, 2016 at 11:44:14AM +0100, Robin Murphy wrote:
> On 23/05/16 02:37, Shunqian Zheng wrote:
> >From: Simon 
> >
> >Signed-off-by: Simon 
> >---
> >  drivers/iommu/rockchip-iommu.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> >diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> >index 043d18c..1741b65 100644
> >--- a/drivers/iommu/rockchip-iommu.c
> >+++ b/drivers/iommu/rockchip-iommu.c
> >@@ -95,12 +95,16 @@ struct rk_iommu {
> >
> >  static inline void rk_table_flush(u32 *va, unsigned int count)
> >  {
> >+#if defined(CONFIG_ARM)
> > phys_addr_t pa_start = virt_to_phys(va);
> > phys_addr_t pa_end = virt_to_phys(va + count);
> > size_t size = pa_end - pa_start;
> >
> > __cpuc_flush_dcache_area(va, size);
> > outer_flush_range(pa_start, pa_end);
> >+#elif defined(CONFIG_ARM64)
> >+__dma_flush_range(va, va + count);
> >+#endif
> 
> Ugh, please don't use arch-private cache maintenance functions directly from
> a driver. Allocating/mapping page tables to be read by the IOMMU is still
> DMA, so using the DMA APIs is the correct way to manage them, *especially*
> if it needs to work across multiple architectures.

I fully agree, these functions should not be used in drivers.

-- 
Catalin


Re: [PATCH 4/5] iommu/rockchip: add ARM64 cache flush operation for iommu

2016-05-24 Thread Catalin Marinas
On Tue, May 24, 2016 at 10:31:17AM +0800, Shunqian Zheng wrote:
> On 2016年05月23日 21:35, Catalin Marinas wrote:
> >On Mon, May 23, 2016 at 11:44:14AM +0100, Robin Murphy wrote:
> >>On 23/05/16 02:37, Shunqian Zheng wrote:
> >>>From: Simon <x...@rock-chips.com>
> >>>
> >>>Signed-off-by: Simon <x...@rock-chips.com>
> >>>---
> >>>  drivers/iommu/rockchip-iommu.c | 4 
> >>>  1 file changed, 4 insertions(+)
> >>>
> >>>diff --git a/drivers/iommu/rockchip-iommu.c 
> >>>b/drivers/iommu/rockchip-iommu.c
> >>>index 043d18c..1741b65 100644
> >>>--- a/drivers/iommu/rockchip-iommu.c
> >>>+++ b/drivers/iommu/rockchip-iommu.c
> >>>@@ -95,12 +95,16 @@ struct rk_iommu {
> >>>
> >>>  static inline void rk_table_flush(u32 *va, unsigned int count)
> >>>  {
> >>>+#if defined(CONFIG_ARM)
> >>>   phys_addr_t pa_start = virt_to_phys(va);
> >>>   phys_addr_t pa_end = virt_to_phys(va + count);
> >>>   size_t size = pa_end - pa_start;
> >>>
> >>>   __cpuc_flush_dcache_area(va, size);
> >>>   outer_flush_range(pa_start, pa_end);
> >>>+#elif defined(CONFIG_ARM64)
> >>>+  __dma_flush_range(va, va + count);
> >>>+#endif
> >>Ugh, please don't use arch-private cache maintenance functions directly from
> >>a driver. Allocating/mapping page tables to be read by the IOMMU is still
> >>DMA, so using the DMA APIs is the correct way to manage them, *especially*
> >>if it needs to work across multiple architectures.
> 
> Would it be acceptable here to change __dma_flush_range() to
> __flush_dcache_area()? That would be easier for us.

It's not really acceptable for arm64, nor for arm32. Please fix this
driver in a similar way to commit e3c971960fd4 ("iommu/tegra-smmu:
Convert to use DMA API").

The only place where we allowed __flush_dcache_area() is in the GICv3
driver and that's because it hasn't been wired as a platform device
(yet).

-- 
Catalin

Re: [PATCH v4 1/8] arm64: mm: change IOMMU notifier action to attach DMA ops

2016-07-08 Thread Catalin Marinas
On Fri, Jul 08, 2016 at 03:55:21PM +0100, Will Deacon wrote:
> On Fri, Jul 01, 2016 at 05:50:10PM +0100, Robin Murphy wrote:
> > From: Lorenzo Pieralisi <lorenzo.pieral...@arm.com>
> > 
> > Current bus notifier in ARM64 (__iommu_attach_notifier)
> > attempts to attach dma_ops to a device on BUS_NOTIFY_ADD_DEVICE
> > action notification.
> > 
> > This will cause issues on ACPI based systems, where PCI devices
> > can be added before the IOMMUs the devices are attached to
> > had a chance to be probed, causing failures on attempts to
> > attach dma_ops in that the domain for the respective IOMMU
> > may not be set-up yet by the time the bus notifier is run.
> > 
> > Devices dma_ops do not require to be set-up till the matching
> > device drivers are probed. This means that instead of running
> > the notifier attaching dma_ops to devices (__iommu_attach_notifier)
> > on BUS_NOTIFY_ADD_DEVICE action, it can be run just before the
> > device driver is bound to the device in question (on action
> > BUS_NOTIFY_BIND_DRIVER) so that it is certain that its IOMMU
> > group and domain are set-up accordingly at the time the
> > notifier is triggered.
> > 
> > This patch changes the notifier action upon which dma_ops
> > are attached to devices and defer it to driver binding time,
> > so that IOMMU devices have a chance to be probed and to register
> > their bus notifiers before the dma_ops attach sequence for a
> > device is actually carried out.
> > 
> > As a result we also no longer need worry about racing with
> > iommu_bus_notifier(), or about retrying the queue in case devices
> > were added too early on DT-based systems, so clean up the notifier
> > itself plus the additional workaround from 722ec35f7fae ("arm64:
> > dma-mapping: fix handling of devices registered before arch_initcall")
> > 
> > Cc: Will Deacon <will.dea...@arm.com>
> > Cc: Catalin Marinas <catalin.mari...@arm.com>
> > Cc: Marek Szyprowski <m.szyprow...@samsung.com>
> > Signed-off-by: Lorenzo Pieralisi <lorenzo.pieral...@arm.com>
> > [rm: get rid of other now-redundant bits]
> > Signed-off-by: Robin Murphy <robin.mur...@arm.com>
> > ---
> > 
> > v4: No change.
> > 
> >  arch/arm64/mm/dma-mapping.c | 22 +-
> >  1 file changed, 5 insertions(+), 17 deletions(-)
> 
> Whilst this series is unlikely to make it for 4.8, this patch is
> independent and it would be good to see it queued in the arm64 tree for
> 4.8, if possible. It shouldn't change anything on its own, but it's a
> prerequisite for this series and anything on the IORT side from Lorenzo,
> so it makes sense to me to keep the delta down if we can.

I queued it for 4.8. Thanks.

-- 
Catalin


Re: [PATCH v2 1/2] iommu/dma: Implement dma_{map,unmap}_resource()

2016-11-10 Thread Catalin Marinas
On Thu, Nov 10, 2016 at 02:30:04PM +, Robin Murphy wrote:
> On 10/11/16 12:24, Joerg Roedel wrote:
> > On Wed, Oct 26, 2016 at 06:43:56PM +0100, Robin Murphy wrote:
> >> With the new dma_{map,unmap}_resource() functions added to the DMA API
> >> for the benefit of cases like slave DMA, add suitable implementations to
> >> the arsenal of our generic layer. Since cache maintenance should not be
> >> a concern, these can both be standalone callback implementations without
> >> the need for arch code wrappers.
> >>
> >> CC: Joerg Roedel 
> >> Signed-off-by: Robin Murphy 
> >> ---
> >>
> >> v2: Use iommu_dma_unmap_page for symmetry, instead of being sneaky.
> >>
> >>  drivers/iommu/dma-iommu.c | 13 +
> >>  include/linux/dma-iommu.h |  4 
> >>  2 files changed, 17 insertions(+)
> >>
> >> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> >> index c5ab8667e6f2..a2fd90a6a782 100644
> >> --- a/drivers/iommu/dma-iommu.c
> >> +++ b/drivers/iommu/dma-iommu.c
> >> @@ -624,6 +624,19 @@ void iommu_dma_unmap_sg(struct device *dev, struct 
> >> scatterlist *sg, int nents,
> >>__iommu_dma_unmap(iommu_get_domain_for_dev(dev), sg_dma_address(sg));
> >>  }
> >>  
> >> +dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
> >> +  size_t size, enum dma_data_direction dir, unsigned long attrs)
> >> +{
> >> +  return iommu_dma_map_page(dev, phys_to_page(phys), offset_in_page(phys),
> >> +  size, dma_direction_to_prot(dir, false) | IOMMU_MMIO);
> >> +}
> > 
> > Isn't the whole point of map_resource that we can map regions which have
> > no 'struct page' associated? The use phys_to_page() will certainly not
> > do the right thing here.
> 
> Perhaps it's a bit cheeky, but the struct page pointer is never
> dereferenced - the only thing iommu_dma_map_page() does with it is
> immediately convert it back again. Is there ever a case where
> page_to_phys(phys_to_page(phys)) != phys ?

Without SPARSEMEM_VMEMMAP or FLATMEM, it wouldn't work. For example,
__page_to_pfn() in the SPARSEMEM case, used by page_to_phys(), even
accesses the page structure (through page_to_section()).

If the page struct is not relevant to the iommu code in question, you
could factor it out into an __iommu_dma_map_pfn(). Or maybe move the
whole logic to iommu_dma_map_resource() and call it from
iommu_dma_map_page() with the page_to_phys(page) argument.
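
A sketch of that second option (hypothetical refactoring: a phys_addr_t-based
core with thin wrappers; this is roughly the shape the v3 patch further down
takes):

	/* hypothetical: common phys_addr_t-based helper, with page/resource wrappers */
	static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
					  size_t size, int prot);

	dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
				      unsigned long offset, size_t size, int prot)
	{
		return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot);
	}

	dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
					  size_t size, enum dma_data_direction dir,
					  unsigned long attrs)
	{
		return __iommu_dma_map(dev, phys, size,
				       dma_direction_to_prot(dir, false) | IOMMU_MMIO);
	}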

-- 
Catalin


Re: [PATCH v4 1/2] iommu/dma: Implement dma_{map,unmap}_resource()

2016-11-14 Thread Catalin Marinas
On Mon, Nov 14, 2016 at 12:16:26PM +, Robin Murphy wrote:
> With the new dma_{map,unmap}_resource() functions added to the DMA API
> for the benefit of cases like slave DMA, add suitable implementations to
> the arsenal of our generic layer. Since cache maintenance should not be
> a concern, these can both be standalone callback implementations without
> the need for arch code wrappers.
> 
> CC: Joerg Roedel <j...@8bytes.org>
> Signed-off-by: Robin Murphy <robin.mur...@arm.com>

FWIW:

Reviewed-by: Catalin Marinas <catalin.mari...@arm.com>


Re: [PATCH v3 2/2] arm64: Wire up iommu_dma_{map, unmap}_resource()

2016-11-13 Thread Catalin Marinas
On Fri, Nov 11, 2016 at 06:27:35PM +, Robin Murphy wrote:
> With no coherency to worry about, just plug'em straight in.
> 
> CC: Catalin Marinas <catalin.mari...@arm.com>
> CC: Will Deacon <will.dea...@arm.com>
> Signed-off-by: Robin Murphy <robin.mur...@arm.com>
> ---
>  arch/arm64/mm/dma-mapping.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 3f74d0d98de6..5cd0a383b14b 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -796,6 +796,8 @@ static struct dma_map_ops iommu_dma_ops = {
>   .sync_single_for_device = __iommu_sync_single_for_device,
>   .sync_sg_for_cpu = __iommu_sync_sg_for_cpu,
>   .sync_sg_for_device = __iommu_sync_sg_for_device,
> + .map_resource = iommu_dma_map_resource,
> + .unmap_resource = iommu_dma_unmap_resource,
>   .dma_supported = iommu_dma_supported,
>   .mapping_error = iommu_dma_mapping_error,

Acked-by: Catalin Marinas <catalin.mari...@arm.com>

I guess these two patches would go in via the iommu tree.

-- 
Catalin


Re: [PATCH v3 1/2] iommu/dma: Implement dma_{map,unmap}_resource()

2016-11-13 Thread Catalin Marinas
On Fri, Nov 11, 2016 at 06:27:34PM +, Robin Murphy wrote:
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index c5ab8667e6f2..8b745260b3bc 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -432,13 +432,12 @@ int iommu_dma_mmap(struct page **pages, size_t size, 
> struct vm_area_struct *vma)
>   return ret;
>  }
>  
> -dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
> - unsigned long offset, size_t size, int prot)
> +dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
> + size_t size, int prot)

This should be static.

-- 
Catalin


Re: [PATCH] arm64/dma-mapping: fix DMA_ATTR_FORCE_CONTIGUOUS mmaping code

2017-04-21 Thread Catalin Marinas
On Wed, Mar 29, 2017 at 01:55:32PM +0100, Robin Murphy wrote:
> On 29/03/17 11:05, Andrzej Hajda wrote:
> > In case of DMA_ATTR_FORCE_CONTIGUOUS allocations vm_area->pages
> > is invalid, so __iommu_mmap_attrs and __iommu_get_sgtable cannot use
> > it. In the first case a temporary pages array is passed to iommu_dma_mmap,
> > in the 2nd case a single-entry sg table is created directly instead
> > of calling the helper.
> > 
> > Fixes: 44176bb ("arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to IOMMU")
> > Signed-off-by: Andrzej Hajda 
> > ---
> > Hi,
> > 
> > I am not familiar with this framework so please don't be too cruel ;)
> > Alternative solution I see is to always create vm_area->pages,
> > I do not know which approach is preferred.
> > 
> > Regards
> > Andrzej
> > ---
> >  arch/arm64/mm/dma-mapping.c | 40 ++--
> >  1 file changed, 38 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> > index f7b5401..bba2bc8 100644
> > --- a/arch/arm64/mm/dma-mapping.c
> > +++ b/arch/arm64/mm/dma-mapping.c
> > @@ -704,7 +704,30 @@ static int __iommu_mmap_attrs(struct device *dev, 
> > struct vm_area_struct *vma,
> > return ret;
> >  
> > area = find_vm_area(cpu_addr);
> > -   if (WARN_ON(!area || !area->pages))
> > +   if (WARN_ON(!area))
> 
> From the look of things, it doesn't seem strictly necessary to change
> this, but whether that's a good thing is another matter. I'm not sure
> that dma_common_contiguous_remap() should really be leaving a dangling
> pointer in area->pages as it apparently does... :/

On this specific point, I don't think area->pages should be set either
(cc'ing Laura). As in the vmalloc vs vmap case, area->pages should only be
set when the pages need to be freed (via vfree), which is not the case here. The
dma_common_pages_remap() would need to set area->pages when called
directly from the iommu DMA ops. Proposal below, not tested with the
iommu ops. I assume the patch would cause __iommu_mmap_attrs() to return
-ENXIO if DMA_ATTR_FORCE_CONTIGUOUS is set.

8<-
diff --git a/drivers/base/dma-mapping.c b/drivers/base/dma-mapping.c
index efd71cf4fdea..ab7071041141 100644
--- a/drivers/base/dma-mapping.c
+++ b/drivers/base/dma-mapping.c
@@ -277,8 +277,8 @@ EXPORT_SYMBOL(dma_common_mmap);
  * remaps an array of PAGE_SIZE pages into another vm_area
  * Cannot be used in non-sleeping contexts
  */
-void *dma_common_pages_remap(struct page **pages, size_t size,
-   unsigned long vm_flags, pgprot_t prot,
+static struct vm_struct *__dma_common_pages_remap(struct page **pages,
+   size_t size, unsigned long vm_flags, pgprot_t prot,
const void *caller)
 {
struct vm_struct *area;
@@ -287,13 +287,26 @@ void *dma_common_pages_remap(struct page **pages, size_t size,
if (!area)
return NULL;
 
-   area->pages = pages;
-
if (map_vm_area(area, prot, pages)) {
vunmap(area->addr);
return NULL;
}
 
+   return area;
+}
+
+void *dma_common_pages_remap(struct page **pages, size_t size,
+   unsigned long vm_flags, pgprot_t prot,
+   const void *caller)
+{
+   struct vm_struct *area;
+
+   area = __dma_common_pages_remap(pages, size, vm_flags, prot, caller);
+   if (!area)
+   return NULL;
+
+   area->pages = pages;
+
return area->addr;
 }
 
@@ -308,7 +321,7 @@ void *dma_common_contiguous_remap(struct page *page, size_t 
size,
 {
int i;
struct page **pages;
-   void *ptr;
+   struct vm_struct *area;
unsigned long pfn;
 
pages = kmalloc(sizeof(struct page *) << get_order(size), GFP_KERNEL);
@@ -318,11 +331,13 @@ void *dma_common_contiguous_remap(struct page *page, 
size_t size,
for (i = 0, pfn = page_to_pfn(page); i < (size >> PAGE_SHIFT); i++)
pages[i] = pfn_to_page(pfn + i);
 
-   ptr = dma_common_pages_remap(pages, size, vm_flags, prot, caller);
+   area = __dma_common_pages_remap(pages, size, vm_flags, prot, caller);
 
kfree(pages);
 
-   return ptr;
+   if (!area)
+   return NULL;
+   return area->addr;
 }
 
 /*

-- 
Catalin


Re: [PATCH v5] arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to IOMMU

2017-04-20 Thread Catalin Marinas
Catching up with these threads, so replying to a patch I already
applied.

On Tue, Mar 07, 2017 at 06:43:32PM +0100, Geert Uytterhoeven wrote:
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -584,20 +584,7 @@ static void *__iommu_alloc_attrs(struct device *dev, 
> size_t size,
>*/
>   gfp |= __GFP_ZERO;
>  
> - if (gfpflags_allow_blocking(gfp)) {
> - struct page **pages;
> - pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);
> -
> - pages = iommu_dma_alloc(dev, iosize, gfp, attrs, ioprot,
> - handle, flush_page);
> - if (!pages)
> - return NULL;
> -
> - addr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
> -   __builtin_return_address(0));
> - if (!addr)
> - iommu_dma_free(dev, pages, iosize, handle);
> - } else {
> + if (!gfpflags_allow_blocking(gfp)) {
>   struct page *page;
>   /*
>* In atomic context we can't remap anything, so we'll only
> @@ -621,6 +608,45 @@ static void *__iommu_alloc_attrs(struct device *dev, 
> size_t size,
>   __free_from_pool(addr, size);
>   addr = NULL;
>   }
> + } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
> + pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);
> + struct page *page;
> +
> + page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
> +  get_order(size), gfp);
> + if (!page)
> + return NULL;
> +
> + *handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
> + if (iommu_dma_mapping_error(dev, *handle)) {
> + dma_release_from_contiguous(dev, page,
> + size >> PAGE_SHIFT);
> + return NULL;
> + }
> + if (!coherent)
> + __dma_flush_area(page_to_virt(page), iosize);
> +
> + addr = dma_common_contiguous_remap(page, size, VM_USERMAP,
> +prot,
> +__builtin_return_address(0));

Do we need to call dma_common_pages_remap() if the allocation is
coherent? In the __dma_alloc() case we don't do it but simply use
page_address(page) as returned by __dma_alloc_coherent().
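
For illustration, the kind of shortcut I have in mind, as a rough sketch
against __iommu_alloc_attrs() rather than a tested patch (the "coherent"
flag and the helpers are the ones already used in that function):

	if (coherent) {
		/* the linear map is already cacheable, no remap needed */
		addr = page_address(page);
	} else {
		pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);

		addr = dma_common_contiguous_remap(page, size, VM_USERMAP, prot,
						   __builtin_return_address(0));
	}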

(note that my comment is not meant to fix the issue reported by Andrzej
Hajda but I just spotted it)

-- 
Catalin


Re: [PATCH v2] arm64/dma-mapping: fix DMA_ATTR_FORCE_CONTIGUOUS mmaping code

2017-04-21 Thread Catalin Marinas
On Fri, Mar 31, 2017 at 01:02:51PM +0200, Andrzej Hajda wrote:
> In case of DMA_ATTR_FORCE_CONTIGUOUS allocations vm_area->pages
> is invalid. __iommu_mmap_attrs and __iommu_get_sgtable should not
> use it and take advantage of contiguous nature of the allocation.

As I replied to your original patch, I think
dma_common_contiguous_remap() should be fixed not to leave a dangling
area->pages pointer.

>  arch/arm64/mm/dma-mapping.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index f7b5401..41c7c36 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -703,6 +703,10 @@ static int __iommu_mmap_attrs(struct device *dev, struct 
> vm_area_struct *vma,
>   if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
>   return ret;
>  
> + if (attrs & DMA_ATTR_FORCE_CONTIGUOUS)
> + return vm_iomap_memory(vma,
> + page_to_phys(vmalloc_to_page(cpu_addr)), size);

I replied to Geert's patch on whether we actually need to call
dma_common_contiguous_remap() in __iommu_alloc_attrs() if the device is
coherent. We don't do this for the swiotlb allocation. If we are going
to change this, cpu_addr may or may not be in the vmalloc range and
vmalloc_to_page() would fail (I guess we could add an is_vmalloc_addr()
check).
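
Something like this (untested sketch, reusing the vm_iomap_memory() call
from the patch):

	struct page *page = is_vmalloc_addr(cpu_addr) ?
		vmalloc_to_page(cpu_addr) : virt_to_page(cpu_addr);

	return vm_iomap_memory(vma, page_to_phys(page), size);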

Alternatively, just open-code dma_common_contiguous_remap() in
__iommu_alloc_attrs() to keep a valid area->pages around (I think Geert
already suggested this). AFAICT, arch/arm does this already in its own
__iommu_alloc_buffer().

Yet another option would be for iommu_dma_alloc() to understand
DMA_ATTR_FORCE_CONTIGUOUS and remove the arm64-specific handling.

> @@ -715,8 +719,19 @@ static int __iommu_get_sgtable(struct device *dev, 
> struct sg_table *sgt,
>  size_t size, unsigned long attrs)
>  {
>   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> - struct vm_struct *area = find_vm_area(cpu_addr);
> + struct vm_struct *area;
> +
> + if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
> + int ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
> +
> + if (!ret)
> + sg_set_page(sgt->sgl, vmalloc_to_page(cpu_addr),
> + PAGE_ALIGN(size), 0);
>  
> + return ret;
> + }
> +
> + area = find_vm_area(cpu_addr);

As Russell already stated, this function needs a "fix" regardless of the
DMA_ATTR_FORCE_CONTIGUOUS as we may not have page structures
(*_from_coherent allocations). But I agree with you that
dma_get_sgtable() issues should be addressed separately (and I'm happy
to disable such API on arm64 until sorted).

-- 
Catalin


Re: [PATCH v5] arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to IOMMU

2017-03-21 Thread Catalin Marinas
On Tue, Mar 07, 2017 at 06:43:32PM +0100, Geert Uytterhoeven wrote:
> Add support for allocating physically contiguous DMA buffers on arm64
> systems with an IOMMU.  This can be useful when two or more devices
> with different memory requirements are involved in buffer sharing.
> 
> Note that as this uses the CMA allocator, setting the
> DMA_ATTR_FORCE_CONTIGUOUS attribute has a runtime-dependency on
> CONFIG_DMA_CMA, just like on arm32.
> 
> For arm64 systems using swiotlb, no changes are needed to support the
> allocation of physically contiguous DMA buffers:
>   - swiotlb always uses physically contiguous buffers (up to
> IO_TLB_SEGSIZE = 128 pages),
>   - arm64's __dma_alloc_coherent() already calls
> dma_alloc_from_contiguous() when CMA is available.
> 
> Signed-off-by: Geert Uytterhoeven 
> Acked-by: Laurent Pinchart 
> Reviewed-by: Robin Murphy 

Queued for 4.12. Thanks.

-- 
Catalin


Re: [PATCH v2] arm64/dma-mapping: fix DMA_ATTR_FORCE_CONTIGUOUS mmaping code

2017-04-24 Thread Catalin Marinas
Hi Geert,

On Fri, Apr 21, 2017 at 07:39:43PM +0200, Geert Uytterhoeven wrote:
> On Fri, Apr 21, 2017 at 7:32 PM, Catalin Marinas
> <catalin.mari...@arm.com> wrote:
> > On Fri, Mar 31, 2017 at 01:02:51PM +0200, Andrzej Hajda wrote:
> >> In case of DMA_ATTR_FORCE_CONTIGUOUS allocations vm_area->pages
> >> is invalid. __iommu_mmap_attrs and __iommu_get_sgtable should not
> >> use it and take advantage of contiguous nature of the allocation.
> >
> > As I replied to your original patch, I think
> > dma_common_contiguous_remap() should be fixed not to leave a dangling
> > area->pages pointer.
> >
> >>  arch/arm64/mm/dma-mapping.c | 17 -
> >>  1 file changed, 16 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> >> index f7b5401..41c7c36 100644
> >> --- a/arch/arm64/mm/dma-mapping.c
> >> +++ b/arch/arm64/mm/dma-mapping.c
> >> @@ -703,6 +703,10 @@ static int __iommu_mmap_attrs(struct device *dev, 
> >> struct vm_area_struct *vma,
> >>   if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
> >>   return ret;
> >>
> >> + if (attrs & DMA_ATTR_FORCE_CONTIGUOUS)
> >> + return vm_iomap_memory(vma,
> >> + page_to_phys(vmalloc_to_page(cpu_addr)), 
> >> size);
> >
> > I replied to Geert's patch on whether we actually need to call
> > dma_common_contiguous_remap() in __iommu_alloc_attrs() if the device is
> > coherent. We don't do this for the swiotlb allocation. If we are going
> > to change this, cpu_addr may or may not be in the vmalloc range and
> > vmalloc_to_page() would fail (I guess we could add an is_vmalloc_addr()
> > check).
> >
> > Alternatively, just open-code dma_common_contiguous_remap() in
> > __iommu_alloc_attrs() to keep a valid area->pages around (I think Geert
> > already suggested this). AFAICT, arch/arm does this already in its own
> > __iommu_alloc_buffer().
> >
> > Yet another option would be for iommu_dma_alloc() to understand
> > DMA_ATTR_FORCE_CONTIGUOUS and remove the arm64-specific handling.
> 
> That was actually the approach I took in my v1.
> V2 changed that to provide standalone iommu_dma_{alloc,free}_contiguous()
> functions.
> V3 changed that to call everything directly from the arm64 code.
> ...

I now read through the past discussions (as I ignored the threads
previously). I got Robin's point of not extending iommu_dma_alloc()
(though it looked simpler) and open-coding dma_common_contiguous_remap()
wouldn't make sense either as a way to pass this buffer to
iommu_dma_mmap() since it wasn't returned by iommu_dma_alloc().

So I think we should just fall back to the swiotlb ops for the mmap and
get_sgtable but since we don't have a dma_addr_t handle (we only have
the one of the other side of the IOMMU), we'd need to factor out the
common code from __swiotlb_mmap into a __swiotlb_mmap_page (similarly
for __swiotlb_get_sgtable). The __iommu_* functions would call these
with the correct page (from vmalloc_to_page) if
DMA_ATTR_FORCE_CONTIGUOUS and the buffer is not a candidate for
*_from_coherent.
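
Roughly what I mean, as an untested sketch (the real patch would also need
the equivalent helper for get_sgtable):

static int __swiotlb_mmap_page(struct vm_area_struct *vma,
			       struct page *page, size_t size)
{
	unsigned long pfn = page_to_pfn(page);
	unsigned long off = vma->vm_pgoff;
	unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
	int ret = -ENXIO;

	if (off < nr_pages && vma_pages(vma) <= (nr_pages - off))
		ret = remap_pfn_range(vma, vma->vm_start, pfn + off,
				      vma->vm_end - vma->vm_start,
				      vma->vm_page_prot);

	return ret;
}

__iommu_mmap_attrs() would then call
__swiotlb_mmap_page(vma, vmalloc_to_page(cpu_addr), size) in the
DMA_ATTR_FORCE_CONTIGUOUS case.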

While fixing/removing dma_get_sgtable() is a nice goal, we first need to
address DMA_ATTR_FORCE_CONTIGUOUS on arm64 since the patch breaks
existing use-cases (and I'd like to avoid reverting this patch).

-- 
Catalin


Re: [PATCH] arm64/dma-mapping: fix DMA_ATTR_FORCE_CONTIGUOUS mmaping code

2017-04-25 Thread Catalin Marinas
On Mon, Apr 24, 2017 at 09:58:23AM -0700, Laura Abbott wrote:
> On 04/21/2017 09:12 AM, Catalin Marinas wrote:
> > On Wed, Mar 29, 2017 at 01:55:32PM +0100, Robin Murphy wrote:
> >> On 29/03/17 11:05, Andrzej Hajda wrote:
> >>> In case of DMA_ATTR_FORCE_CONTIGUOUS allocations vm_area->pages
> >>> is invalid. __iommu_mmap_attrs and __iommu_get_sgtable cannot use
> >>> it. In first case temporary pages array is passed to iommu_dma_mmap,
> >>> in 2nd case single entry sg table is created directly instead
> >>> of calling helper.
> >>>
> >>> Fixes: 44176bb ("arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to 
> >>> IOMMU")
> >>> Signed-off-by: Andrzej Hajda <a.ha...@samsung.com>
> >>> ---
> >>> Hi,
> >>>
> >>> I am not familiar with this framework so please don't be too cruel ;)
> >>> Alternative solution I see is to always create vm_area->pages,
> >>> I do not know which approach is preferred.
> >>>
> >>> Regards
> >>> Andrzej
> >>> ---
> >>>  arch/arm64/mm/dma-mapping.c | 40 ++--
> >>>  1 file changed, 38 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> >>> index f7b5401..bba2bc8 100644
> >>> --- a/arch/arm64/mm/dma-mapping.c
> >>> +++ b/arch/arm64/mm/dma-mapping.c
> >>> @@ -704,7 +704,30 @@ static int __iommu_mmap_attrs(struct device *dev, 
> >>> struct vm_area_struct *vma,
> >>>   return ret;
> >>>  
> >>>   area = find_vm_area(cpu_addr);
> >>> - if (WARN_ON(!area || !area->pages))
> >>> + if (WARN_ON(!area))
> >>
> >> From the look of things, it doesn't seem strictly necessary to change
> >> this, but whether that's a good thing is another matter. I'm not sure
> >> that dma_common_contiguous_remap() should really be leaving a dangling
> >> pointer in area->pages as it apparently does... :/
> > 
> > On this specific point, I don't think area->pages should be set either
> > (cc'ing Laura). As in the vmalloc vs vmap case, area->pages should only be
> > set when the pages need to be freed (via vfree), which is not the case here. The
> > dma_common_pages_remap() would need to set area->pages when called
> > directly from the iommu DMA ops. Proposal below, not tested with the
> > iommu ops. I assume the patch would cause __iommu_mmap_attrs() to return
> > -ENXIO if DMA_ATTR_FORCE_CONTIGUOUS is set.
[...]
> From a quick glance, this looks okay. I can give a proper tag when
> the fix to allow mmaping comes in since I'm guessing -ENXIO is not the
> end goal.

I'll post a proper patch and description. Strictly speaking, the above
fix is independent and we should rather get -ENXIO on arm64's
__iommu_mmap_attrs than dereference an already freed pointer. However,
I'm not sure anyone else would trigger this. Even on arm64, once we fix
the mmap case with DMA_ATTR_FORCE_CONTIGUOUS, we wouldn't dereference
the dangling area->pages pointer.

-- 
Catalin


Re: [PATCH v3 3/3] arm64: Force swiotlb bounce buffering for non-coherent DMA with large CWG

2018-05-14 Thread Catalin Marinas
On Sat, May 12, 2018 at 02:38:29PM +0200, Christoph Hellwig wrote:
> On Fri, May 11, 2018 at 02:55:47PM +0100, Catalin Marinas wrote:
> > On systems with a Cache Writeback Granule (CTR_EL0.CWG) greater than
> > ARCH_DMA_MINALIGN, DMA cache maintenance on sub-CWG ranges is not safe,
> > leading to data corruption. If such configuration is detected, the
> > kernel will force swiotlb bounce buffering for all non-coherent devices.
> 
> Per the previous discussion I understand that so far this is a
> purely theoretical condition. 

That's what we think, at least for publicly available hardware.

> Given that I'd rather avoid committing this patch and just refuse to
> boot in this case.

I'll keep it to a WARN_TAINT() for now. Given that the warn triggers
only when cache_line_size() > ARCH_DMA_MINALIGN and we keep this
constant unchanged (128), it shouldn't be much different from our
current assumptions and no-one complained of DMA corruption so far.

> In a merge window or two I plan to have a noncoherent flag in struct
> device, at which point we can handle this entirely in common code.

Sounds ok, looking forward to this.

Thanks.

-- 
Catalin


[PATCH v3 1/3] Revert "arm64: Increase the max granular size"

2018-05-11 Thread Catalin Marinas
This reverts commit 97303480753e48fb313dc0e15daaf11b0451cdb8.

Commit 97303480753e ("arm64: Increase the max granular size") increased
the cache line size to 128 to match Cavium ThunderX, apparently for some
performance benefit which could not be confirmed. This change, however,
has an impact on the network packet allocation in certain circumstances,
requiring slightly over a 4K page with a significant performance
degradation. The patch reverts L1_CACHE_SHIFT back to 6 (64-byte cache
line).

Cc: Will Deacon <will.dea...@arm.com>
Cc: Robin Murphy <robin.mur...@arm.com>
Signed-off-by: Catalin Marinas <catalin.mari...@arm.com>
---
 arch/arm64/include/asm/cache.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index 9bbffc7a301f..1dd2c2db0010 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -33,7 +33,7 @@
 #define ICACHE_POLICY_VIPT 2
 #define ICACHE_POLICY_PIPT 3
 
-#define L1_CACHE_SHIFT 7
+#define L1_CACHE_SHIFT (6)
 #define L1_CACHE_BYTES (1 << L1_CACHE_SHIFT)
 
 /*


[PATCH v3 0/3] arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)

2018-05-11 Thread Catalin Marinas
Hi,

The previous version of this patch [1] didn't make it into 4.17 because
of the (compile-time) conflicts with the generic dma-direct.h changes.
I'm reposting it for 4.18 with some minor changes:

- phys_to_dma()/dma_to_phys() now gained underscores to match the
  generic dma-direct.h implementation

- the patch is split in three to make the changes clearer to the
  reviewers

If at some point in the future the generic swiotlb code gains
non-coherent DMA awareness, the last patch in the series could be
refactored. In the meantime, the simplest, non-intrusive approach is to
select ARCH_HAS_PHYS_TO_DMA on arm64 and force bounce buffering through
an arch-specific dma_capable() implementation.

Thanks,

Catalin

[1] http://lkml.kernel.org/r/20180228184720.25467-1-catalin.mari...@arm.com

Catalin Marinas (3):
  Revert "arm64: Increase the max granular size"
  arm64: Increase ARCH_DMA_MINALIGN to 128
  arm64: Force swiotlb bounce buffering for non-coherent DMA with large
CWG

 arch/arm64/Kconfig  |  1 +
 arch/arm64/include/asm/cache.h  |  6 +++---
 arch/arm64/include/asm/dma-direct.h | 43 +
 arch/arm64/kernel/cpufeature.c  |  9 ++--
 arch/arm64/mm/dma-mapping.c | 17 +++
 arch/arm64/mm/init.c|  3 ++-
 6 files changed, 68 insertions(+), 11 deletions(-)
 create mode 100644 arch/arm64/include/asm/dma-direct.h



[PATCH v3 2/3] arm64: Increase ARCH_DMA_MINALIGN to 128

2018-05-11 Thread Catalin Marinas
This patch increases the ARCH_DMA_MINALIGN to 128 so that it covers the
currently known Cache Writeback Granule (CTR_EL0.CWG) on arm64 and moves
the fallback in cache_line_size() from L1_CACHE_BYTES to this constant.

Cc: Will Deacon <will.dea...@arm.com>
Cc: Robin Murphy <robin.mur...@arm.com>
Signed-off-by: Catalin Marinas <catalin.mari...@arm.com>
---
 arch/arm64/include/asm/cache.h | 4 ++--
 arch/arm64/kernel/cpufeature.c | 9 ++---
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index 1dd2c2db0010..5df5cfe1c143 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -43,7 +43,7 @@
  * cache before the transfer is done, causing old data to be seen by
  * the CPU.
  */
-#define ARCH_DMA_MINALIGN  L1_CACHE_BYTES
+#define ARCH_DMA_MINALIGN  (128)
 
 #ifndef __ASSEMBLY__
 
@@ -77,7 +77,7 @@ static inline u32 cache_type_cwg(void)
 static inline int cache_line_size(void)
 {
u32 cwg = cache_type_cwg();
-   return cwg ? 4 << cwg : L1_CACHE_BYTES;
+   return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
 }
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9d1b06d67c53..fbee8c17a4e6 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1606,7 +1606,6 @@ static void __init setup_system_capabilities(void)
 void __init setup_cpu_features(void)
 {
u32 cwg;
-   int cls;
 
setup_system_capabilities();
mark_const_caps_ready();
@@ -1627,13 +1626,9 @@ void __init setup_cpu_features(void)
 * Check for sane CTR_EL0.CWG value.
 */
cwg = cache_type_cwg();
-   cls = cache_line_size();
if (!cwg)
-   pr_warn("No Cache Writeback Granule information, assuming cache 
line size %d\n",
-   cls);
-   if (L1_CACHE_BYTES < cls)
-   pr_warn("L1_CACHE_BYTES smaller than the Cache Writeback 
Granule (%d < %d)\n",
-   L1_CACHE_BYTES, cls);
+   pr_warn("No Cache Writeback Granule information, assuming %d\n",
+   ARCH_DMA_MINALIGN);
 }
 
 static bool __maybe_unused


[PATCH v3 3/3] arm64: Force swiotlb bounce buffering for non-coherent DMA with large CWG

2018-05-11 Thread Catalin Marinas
On systems with a Cache Writeback Granule (CTR_EL0.CWG) greater than
ARCH_DMA_MINALIGN, DMA cache maintenance on sub-CWG ranges is not safe,
leading to data corruption. If such configuration is detected, the
kernel will force swiotlb bounce buffering for all non-coherent devices.

Cc: Will Deacon <will.dea...@arm.com>
Cc: Robin Murphy <robin.mur...@arm.com>
Cc: Christoph Hellwig <h...@lst.de>
Signed-off-by: Catalin Marinas <catalin.mari...@arm.com>
---
 arch/arm64/Kconfig  |  1 +
 arch/arm64/include/asm/dma-direct.h | 43 +
 arch/arm64/mm/dma-mapping.c | 17 +++
 arch/arm64/mm/init.c|  3 ++-
 4 files changed, 63 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/dma-direct.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index eb2cf4938f6d..ef56b2478205 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -17,6 +17,7 @@ config ARM64
select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
select ARCH_HAS_KCOV
select ARCH_HAS_MEMBARRIER_SYNC_CORE
+   select ARCH_HAS_PHYS_TO_DMA
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_STRICT_KERNEL_RWX
diff --git a/arch/arm64/include/asm/dma-direct.h 
b/arch/arm64/include/asm/dma-direct.h
new file mode 100644
index ..0c18a4d56702
--- /dev/null
+++ b/arch/arm64/include/asm/dma-direct.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_DMA_DIRECT_H
+#define __ASM_DMA_DIRECT_H
+
+#include 
+#include 
+
+#include 
+
+DECLARE_STATIC_KEY_FALSE(swiotlb_noncoherent_bounce);
+
+static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr)
+{
+   dma_addr_t dev_addr = (dma_addr_t)paddr;
+
+   return dev_addr - ((dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
+}
+
+static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t 
dev_addr)
+{
+   phys_addr_t paddr = (phys_addr_t)dev_addr;
+
+   return paddr + ((phys_addr_t)dev->dma_pfn_offset << PAGE_SHIFT);
+}
+
+static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t 
size)
+{
+   if (!dev->dma_mask)
+   return false;
+
+   /*
+* Force swiotlb buffer bouncing when ARCH_DMA_MINALIGN < CWG. The
+* swiotlb bounce buffers are aligned to (1 << IO_TLB_SHIFT).
+*/
+   if (static_branch_unlikely(&swiotlb_noncoherent_bounce) &&
+   !is_device_dma_coherent(dev) &&
+   !is_swiotlb_buffer(__dma_to_phys(dev, addr)))
+   return false;
+
+   return addr + size - 1 <= *dev->dma_mask;
+}
+
+#endif /* __ASM_DMA_DIRECT_H */
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index a96ec0181818..1e9dac8684ca 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -33,6 +33,7 @@
 #include 
 
 static int swiotlb __ro_after_init;
+DEFINE_STATIC_KEY_FALSE(swiotlb_noncoherent_bounce);
 
 static pgprot_t __get_dma_pgprot(unsigned long attrs, pgprot_t prot,
 bool coherent)
@@ -504,6 +505,14 @@ static int __init arm64_dma_init(void)
max_pfn > (arm64_dma_phys_limit >> PAGE_SHIFT))
swiotlb = 1;
 
+   if (WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
+  TAINT_CPU_OUT_OF_SPEC,
+  "ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)",
+  ARCH_DMA_MINALIGN, cache_line_size())) {
+   swiotlb = 1;
+   static_branch_enable(&swiotlb_noncoherent_bounce);
+   }
+
return atomic_pool_init();
 }
 arch_initcall(arm64_dma_init);
@@ -882,6 +891,14 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 size,
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
 {
+   /*
+* Enable swiotlb for buffer bouncing if ARCH_DMA_MINALIGN < CWG.
+* dma_capable() forces the actual bounce if the device is
+* non-coherent.
+*/
+   if (static_branch_unlikely(&swiotlb_noncoherent_bounce) && !coherent)
+   iommu = NULL;
+
if (!dev->dma_ops)
dev->dma_ops = _swiotlb_dma_ops;
 
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 9f3c47acf8ff..664acf177799 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -586,7 +586,8 @@ static void __init free_unused_memmap(void)
 void __init mem_init(void)
 {
if (swiotlb_force == SWIOTLB_FORCE ||
-   max_pfn > (arm64_dma_phys_limit >> PAGE_SHIFT))
+   max_pfn > (arm64_dma_phys_limit >> PAGE_SHIFT) ||
+   ARCH_DMA_MINALIGN < cache_line_size())
swiotlb_init(1);
else
swiotlb_force = SWIOTLB_NO_FORCE;

Re: [PATCH v2 16/40] arm64: mm: Pin down ASIDs for sharing mm with devices

2018-05-15 Thread Catalin Marinas
Hi Jean-Philippe,

On Fri, May 11, 2018 at 08:06:17PM +0100, Jean-Philippe Brucker wrote:
> +unsigned long mm_context_get(struct mm_struct *mm)
> +{
> + unsigned long flags;
> + u64 asid;
> +
> + raw_spin_lock_irqsave(&cpu_asid_lock, flags);
> +
> + asid = atomic64_read(&mm->context.id);
> +
> + if (mm->context.pinned) {
> + mm->context.pinned++;
> + asid &= ~ASID_MASK;
> + goto out_unlock;
> + }
> +
> + if (nr_pinned_asids >= max_pinned_asids) {
> + asid = 0;
> + goto out_unlock;
> + }
> +
> + if (!asid_gen_match(asid)) {
> + /*
> +  * We went through one or more rollover since that ASID was
> +  * used. Ensure that it is still valid, or generate a new one.
> +  * The cpu argument isn't used by new_context.
> +  */
> + asid = new_context(mm, 0);
> + atomic64_set(&mm->context.id, asid);
> + }
> +
> + asid &= ~ASID_MASK;
> +
> + nr_pinned_asids++;
> + __set_bit(asid2idx(asid), pinned_asid_map);
> + mm->context.pinned++;
> +
> +out_unlock:
> + raw_spin_unlock_irqrestore(&cpu_asid_lock, flags);
> +
> + return asid;
> +}

With CONFIG_UNMAP_KERNEL_AT_EL0 (a.k.a. KPTI), the hardware ASID has bit
0 set automatically when entering user space (and cleared when getting
back to the kernel). If the returned asid value here is going to be used
as is in the calling code, you should probably set bit 0 when KPTI is
enabled.
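
Something along the lines of the below at the end of mm_context_get()
(sketch only; arm64_kernel_unmapped_at_el0() is the existing helper for
the KPTI case):

	/* user space runs with the hardware ASID | 1 when KPTI is enabled */
	if (arm64_kernel_unmapped_at_el0())
		asid |= 1;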

-- 
Catalin


Re: [PATCH 12/14] dma-direct: handle the memory encryption bit in common code

2018-03-19 Thread Catalin Marinas
On Mon, Mar 19, 2018 at 05:03:43PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 19, 2018 at 03:48:33PM +, Will Deacon wrote:
> > Why can't we just resolve the conflict by adding the underscores?
> 
> We can solve the conflict easily that way.  But that's not the point.
> 
> The point is that I've been fighting hard to consolidate dma code
> given that the behavior really is common and not arch specific.  And
> this one is another case like that:  the fact that the non-coherent
> dma boundary is bigger than the exposed size is something that can
> easily happen elsewhere, so there is no need to duplicate a lot
> of code for that.

I don't particularly like maintaining an arm64-specific dma-direct.h
either but arm64 seems to be the only architecture that needs to
potentially force a bounce when cache_line_size() > ARCH_DMA_MINALIGN
and the device is non-coherent. Note that lib/swiotlb.c doesn't even
deal with non-coherent DMA (e.g. map_sg doesn't have arch callbacks for
cache maintenance), so not disrupting lib/swiotlb.c seems to be the
least intrusive option.

> Nevermind that the commit should at least be three different patches:
> 
>  (1) revert the broken original commit
>  (2) increase the dma min alignment

Reverting the original commit could, on its own, break an SoC which
expects ARCH_DMA_MINALIGN == 128. So these two should be a single commit
(my patch only reverts the L1_CACHE_BYTES change rather than
ARCH_DMA_MINALIGN, the latter being correct as 128).

Anyway, it's queued already and we try not to rebase the branches we
published. Fix-ups on top are fine though.

>  (3) put the swiotlb workaround in place

As I said above, adding a check in swiotlb.c for
!is_device_dma_coherent(dev) && (ARCH_DMA_MINALIGN < cache_line_size())
feels too architecture specific. Adding yet another hook like
arch_dma_capable() doesn't feel right either since we already have the
possibility to override dma_capable() by selecting ARCH_HAS_PHYS_TO_DMA.

The "cleanest" I came up with for swiotlb.c was a new
DMA_ATTR_FORCE_BOUNCE attribute. However, it required more changes to
the arm64 dma-mapping.c than simply implementing an arch-specific
dma_capable().
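
Purely to illustrate the idea (the attribute was never merged, so the name
below is hypothetical), the check in swiotlb_map_page() would have looked
something like:

	/* hypothetical DMA_ATTR_FORCE_BOUNCE handling */
	if (dma_capable(dev, dev_addr, size) &&
	    !(attrs & DMA_ATTR_FORCE_BOUNCE) &&
	    swiotlb_force != SWIOTLB_FORCE)
		return dev_addr;

	/* otherwise fall through to map_single() and bounce */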

-- 
Catalin


Re: [PATCH 12/14] dma-direct: handle the memory encryption bit in common code

2018-03-20 Thread Catalin Marinas
On Mon, Mar 19, 2018 at 08:49:30PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 19, 2018 at 06:01:41PM +0000, Catalin Marinas wrote:
> > I don't particularly like maintaining an arm64-specific dma-direct.h
> > either but arm64 seems to be the only architecture that needs to
> > potentially force a bounce when cache_line_size() > ARCH_DMA_MINALIGN
> > and the device is non-coherent.
> 
> mips is another likely candidate, see all the recent drama about
> dma_get_alignmet().  And I'm also having major discussion about even
> exposing the cache line size architecturally for RISC-V, so changes
> are high it'll have to deal with this mess sooner or later as they
> probably can't agree on a specific cache line size.

On Arm, the cache line size varies between 32 and 128 on publicly
available hardware (and I wouldn't exclude higher numbers at some
point). In addition, the cache line size has a different meaning in the
DMA context: we call it the "cache writeback granule" on Arm, which is
greater than or equal to the minimum cache line size.

So the aim is to have L1_CACHE_BYTES small enough for acceptable
performance numbers and ARCH_DMA_MINALIGN the maximum from a correctness
perspective (the latter is defined by some larger cache lines in L2/L3).

To make things worse, there is no clear definition in the generic kernel
on what cache_line_size() means and the default definition returns
L1_CACHE_BYTES. On arm64, we define it to the hardware's cache
writeback granule (CWG), if available, with a fallback on
ARCH_DMA_MINALIGN. The network layer, OTOH, seems to assume that
SMP_CACHE_BYTES is sufficient for DMA alignment (L1_CACHE_BYTES in
arm64's case).
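
For reference, the CWG that arm64's cache_line_size() falls back through
comes straight from CTR_EL0 (simplified from arch/arm64/include/asm/cache.h):

#define CTR_CWG_SHIFT	24
#define CTR_CWG_MASK	15

static inline u32 cache_type_cwg(void)
{
	/* CTR_EL0.CWG: log2(words) of the cache writeback granule */
	return (read_cpuid_cachetype() >> CTR_CWG_SHIFT) & CTR_CWG_MASK;
}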

> > As I said above, adding a check in swiotlb.c for
> > !is_device_dma_coherent(dev) && (ARCH_DMA_MINALIGN < cache_line_size())
> > feels too architecture specific.
> 
> And what exactly is architecture specific about that?  It is a totally
> generic concept, which at this point also seems entirely theoretical
> based on the previous mail in this thread.

The concept may be generic but the kernel macros/functions used here
aren't. is_device_dma_coherent() is only defined on arm and arm64. The
relation between ARCH_DMA_MINALIGN, L1_CACHE_BYTES and cache_line_size()
seems to be pretty ad-hoc. ARCH_DMA_MINALIGN is also only defined for
some architectures and, while there is dma_get_cache_alignment() which
returns this constant, it doesn't seem to be used much.

I'm all for fixing this in a generic way but I think we first need
swiotlb.c to become aware of non-cache-coherent DMA devices.

-- 
Catalin


Re: [PATCH 08/10] swiotlb: don't dip into swiotlb pool for coherent allocations

2018-10-12 Thread Catalin Marinas
On Mon, Oct 08, 2018 at 10:02:44AM +0200, Christoph Hellwig wrote:
> All architectures that support swiotlb also have a zone that backs up
> these less than full addressing allocations (usually ZONE_DMA32).
> 
> Because of that it is rather pointless to fall back to the global swiotlb
> buffer if the normal dma direct allocation failed - the only thing this
> will do is to eat up bounce buffers that would be more useful to serve
> streaming mappings.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm64/mm/dma-mapping.c |   6 +--
>  include/linux/swiotlb.h |   5 --
>  kernel/dma/swiotlb.c| 105 +---
>  3 files changed, 5 insertions(+), 111 deletions(-)

Acked-by: Catalin Marinas 


Re: [PATCH 04/10] swiotlb: remove the overflow buffer

2018-10-12 Thread Catalin Marinas
On Mon, Oct 08, 2018 at 10:02:40AM +0200, Christoph Hellwig wrote:
> Like all other dma mapping drivers just return an error code instead
> of an actual memory buffer.  The reason for the overflow buffer was
> that at the time swiotlb was invented there was no way to check for
> dma mapping errors, but this has long been fixed.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm64/mm/dma-mapping.c   |  2 +-
>  arch/powerpc/kernel/dma-swiotlb.c |  4 +--
>  include/linux/dma-direct.h|  2 ++
>  include/linux/swiotlb.h   |  3 --
>  kernel/dma/direct.c   |  2 --
>  kernel/dma/swiotlb.c  | 59 ++-
>  6 files changed, 8 insertions(+), 64 deletions(-)

I went through the patches and they look fine to me (with the vunmap
fix for the last patch). For the arm64 bits:

Acked-by: Catalin Marinas 


Re: [PATCH 10/10] arm64: use the generic swiotlb_dma_ops

2018-10-12 Thread Catalin Marinas
On Fri, Oct 12, 2018 at 04:40:49PM +0200, Christoph Hellwig wrote:
> On Fri, Oct 12, 2018 at 02:01:00PM +0100, Robin Murphy wrote:
> > On 08/10/18 09:02, Christoph Hellwig wrote:
> >> Now that the generic swiotlb code supports non-coherent DMA we can switch
> >> to it for arm64.  For that we need to refactor the existing
> >> alloc/free/mmap/pgprot helpers to be used as the architecture hooks,
> >> and implement the standard arch_sync_dma_for_{device,cpu} hooks for
> >> cache maintaincance in the streaming dma hooks, which also implies
> >> using the generic dma_coherent flag in struct device.
> >>
> >> Note that we need to keep the old is_device_dma_coherent function around
> >> for now, so that the shared arm/arm64 Xen code keeps working.
> >
> > OK, so when I said last night that it boot-tested OK, that much was true, 
> > but then I shut the board down as I left and got a megasplosion of bad page 
> > state BUGs, e.g.:
> 
> I think this is because I am passing the wrong address to
> dma_direct_free_pages.  Please try this patch on top:
> 
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 3c75d69b54e7..4f0f92856c4c 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -126,10 +126,12 @@ void *arch_dma_alloc(struct device *dev, size_t size, 
> dma_addr_t *dma_handle,
>  void arch_dma_free(struct device *dev, size_t size, void *vaddr,
>   dma_addr_t dma_handle, unsigned long attrs)
>  {
> - if (__free_from_pool(vaddr, PAGE_ALIGN(size)))
> - return;
> - vunmap(vaddr);
> - dma_direct_free_pages(dev, size, vaddr, dma_handle, attrs);
> + if (!__free_from_pool(vaddr, PAGE_ALIGN(size))) {
> + void *kaddr = phys_to_virt(dma_to_phys(dev, dma_handle));
> +
> + vunmap(vaddr);
> + dma_direct_free_pages(dev, size, kaddr, dma_handle, attrs);
> + }
>  }
>  
>  long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr,

With this fix:

Acked-by: Catalin Marinas 


Re: [PATCH v2 RFC/RFT] dma-contiguous: Get normal pages for single-page allocations

2019-03-25 Thread Catalin Marinas
On Fri, Mar 22, 2019 at 01:09:26PM -0700, Nicolin Chen wrote:
> On Fri, Mar 22, 2019 at 10:57:13AM +0000, Catalin Marinas wrote:
> > > > Do you have any numbers to back this up? You don't seem to address
> > > > dma_direct_alloc() either but, as I said above, it's not trivial since
> > > > some platforms expect certain physical range for DMA allocations.
> > > 
> > > What's the dma_direct_alloc() here about? Mind elaborating?
> > 
> > I just did a grep for dma_alloc_from_contiguous() in the 5.1-rc1 kernel
> > and came up with __dma_direct_alloc_pages(). Should your patch cover
> > this as well?
> 
> I don't get the meaning of "cover this" here. What missing part
> do you refer to?

What I meant, do you need this hunk as well in your patch?

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index fcdb23e8d2fc..8955ba6f52fc 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -111,8 +111,7 @@ struct page *__dma_direct_alloc_pages(struct device *dev, 
size_t size,
 again:
/* CMA can be used only in the context which permits sleeping */
if (gfpflags_allow_blocking(gfp)) {
-   page = dma_alloc_from_contiguous(dev, count, page_order,
-gfp & __GFP_NOWARN);
+   page = dma_alloc_from_contiguous(dev, count, page_order, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_release_from_contiguous(dev, page, count);
page = NULL;


Re: [PATCH v2 RFC/RFT] dma-contiguous: Get normal pages for single-page allocations

2019-03-19 Thread Catalin Marinas
On Tue, Mar 05, 2019 at 10:32:02AM -0800, Nicolin Chen wrote:
> The addresses within a single page are always contiguous, so it's
> not so necessary to always allocate one single page from CMA area.
> Since the CMA area has a limited predefined size of space, it may
> run out of space in heavy use cases, where there might be quite a
> lot CMA pages being allocated for single pages.
> 
> However, there is also a concern that a device might care where a
> page comes from -- it might expect the page from CMA area and act
> differently if the page doesn't.
> 
> This patch tries to get normal pages for single-page allocations
> unless the device has its own CMA area. This would save resources
> from the CMA area for more CMA allocations. And it'd also reduce
> CMA fragmentations resulted from trivial allocations.

This is not sufficient. Some architectures/platforms declare limits on
the CMA range so that DMA is possible with all expected devices. For
example, on arm64 we keep the CMA in the lower 4GB of the address range,
though with this patch you only covered the iommu ops allocation.
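
For reference, this is done at boot in arm64_memblock_init() (abridged),
which is why the default CMA area always ends up below the 32-bit limit:

	arm64_dma_phys_limit = max_zone_dma_phys();	/* first 4GB of RAM */
	dma_contiguous_reserve(arm64_dma_phys_limit);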

Do you have any numbers to back this up? You don't seem to address
dma_direct_alloc() either but, as I said above, it's not trivial since
some platforms expect certain physical range for DMA allocations.

-- 
Catalin


Re: [PATCH v2 RFC/RFT] dma-contiguous: Get normal pages for single-page allocations

2019-03-22 Thread Catalin Marinas
Hi Nicolin,

On Thu, Mar 21, 2019 at 04:32:49PM -0700, Nicolin Chen wrote:
> On Tue, Mar 19, 2019 at 02:43:01PM +0000, Catalin Marinas wrote:
> > On Tue, Mar 05, 2019 at 10:32:02AM -0800, Nicolin Chen wrote:
> > > The addresses within a single page are always contiguous, so it's
> > > not so necessary to always allocate one single page from CMA area.
> > > Since the CMA area has a limited predefined size of space, it may
> > > run out of space in heavy use cases, where there might be quite a
> > > lot CMA pages being allocated for single pages.
> > > 
> > > However, there is also a concern that a device might care where a
> > > page comes from -- it might expect the page from CMA area and act
> > > differently if the page doesn't.
> > > 
> > > This patch tries to get normal pages for single-page allocations
> > > unless the device has its own CMA area. This would save resources
> > > from the CMA area for more CMA allocations. And it'd also reduce
> > > CMA fragmentations resulted from trivial allocations.
> > 
> > This is not sufficient. Some architectures/platforms declare limits on
> > the CMA range so that DMA is possible with all expected devices. For
> > example, on arm64 we keep the CMA in the lower 4GB of the address range,
> > though with this patch you only covered the iommu ops allocation.
> 
> I will follow the way of v1 by adding alloc_page()/free_page()
> function to those callers who don't have fallback allocations.
> In this way, archs may use different callbacks to alloc pages.
> 
> > Do you have any numbers to back this up? You don't seem to address
> > dma_direct_alloc() either but, as I said above, it's not trivial since
> > some platforms expect certain physical range for DMA allocations.
> 
> What's the dma_direct_alloc() here about? Mind elaborating?

I just did a grep for dma_alloc_from_contiguous() in the 5.1-rc1 kernel
and came up with __dma_direct_alloc_pages(). Should your patch cover
this as well?

-- 
Catalin


Re: add config symbols for arch_{setup,teardown}_dma_ops

2019-02-11 Thread Catalin Marinas
On Mon, Feb 11, 2019 at 02:21:56PM +0100, Christoph Hellwig wrote:
> Any chance to get a quick review on this small series?

For arm64:

Acked-by: Catalin Marinas 


Re: [PATCH v2 RFC/RFT 1/5] ARM: dma-mapping: Add fallback normal page allocations

2019-04-30 Thread Catalin Marinas
(catching up on email)

On Wed, Apr 24, 2019 at 09:26:52PM +0200, Christoph Hellwig wrote:
> On Wed, Apr 24, 2019 at 11:33:11AM -0700, Nicolin Chen wrote:
> > I feel it's similar to my previous set, which did most of these
> > internally except the renaming part. But Catalin had a concern
> > that some platforms might have limits on CMA range [1]. Will it
> > be still okay to do the fallback internally?
> > 
> > [1: https://www.spinics.net/lists/arm-kernel/msg714295.html ]
> 
> Catalins statement is correct, but I don't see how it applies to
> your patch.  Your patch just ensures that the fallback we have
> in most callers is uniformly applied everywhere.  The non-iommu
> callers will still need to select a specific zone and/or retry
> just the page allocator with other flags if the CMA (or fallback)
> page doesn't match what they need.  dma-direct does this correctly
> and I think the arm32 allocator does as well, although it is a bit
> hard to follow sometimes.

My reading of the arm32 __dma_alloc() is that if the conditions are
right for the CMA allocator (allows blocking) and there is a default CMA
area or a per-device one, the call ends up in cma_alloc() without any
fallback if such allocation fails. Whether this is on purpose, I'm not
entirely sure. There are a couple of arm32 SoCs which call
dma_declare_contiguous() or dma_contiguous_reserve_area() and a few DT
files describing a specific CMA range (e.g. arch/arm/boot/dts/sun5i.dtsi
with a comment that address must be kept in the lower 256MB).

If ZONE_DMA is set up correctly so that cma_alloc() is (or can be made)
interchangeable with alloc_pages(GFP_DMA) from a device DMA capability
perspective, I think it should be fine to have such a fallback.
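
In other words, the fallback being discussed is essentially (sketch,
error handling omitted):

	page = dma_alloc_from_contiguous(dev, count, get_order(size),
					 gfp & __GFP_NOWARN);
	if (!page)
		page = alloc_pages(gfp, get_order(size));	/* gfp carries the GFP_DMA/GFP_DMA32 zone modifier */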

-- 
Catalin


Re: [patch V2 08/29] mm/kmemleak: Simplify stacktrace handling

2019-04-18 Thread Catalin Marinas
On Thu, Apr 18, 2019 at 10:41:27AM +0200, Thomas Gleixner wrote:
> Replace the indirection through struct stack_trace by using the storage
> array based interfaces.
> 
> Signed-off-by: Thomas Gleixner 
> Cc: Catalin Marinas 
> Cc: linux...@kvack.org

Acked-by: Catalin Marinas 


Re: [PATCH 04/25] iommu/dma: Remove the flush_page callback

2019-05-03 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 06:51:53AM -0400, Christoph Hellwig wrote:
> We now have a arch_dma_prep_coherent architecture hook that is used
> for the generic DMA remap allocator, and we should use the same
> interface for the dma-iommu code.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Robin Murphy 
> ---
>  arch/arm64/mm/dma-mapping.c | 8 +---
>  drivers/iommu/dma-iommu.c   | 8 +++-
>  include/linux/dma-iommu.h   | 3 +--
>  3 files changed, 5 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 674860e3e478..10a8852c8b6a 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -104,12 +104,6 @@ arch_initcall(arm64_dma_init);
>  #include 
>  #include 
>  
> -/* Thankfully, all cache ops are by VA so we can ignore phys here */
> -static void flush_page(struct device *dev, const void *virt, phys_addr_t 
> phys)
> -{
> - __dma_flush_area(virt, PAGE_SIZE);
> -}

Rather than removing, should this not become arch_dma_prep_coherent()?
With patch 2 selecting the corresponding Kconfig option, I think with
this patch you'd get a build error (haven't tried).
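
If it does become a proper hook, the arm64 version would presumably be
the same one-liner flush_page() was doing (sketch):

void arch_dma_prep_coherent(struct page *page, size_t size)
{
	__dma_flush_area(page_address(page), size);
}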

-- 
Catalin


Re: implement generic dma_map_ops for IOMMUs v4

2019-05-03 Thread Catalin Marinas
On Thu, May 02, 2019 at 03:22:08PM +0200, Christoph Hellwig wrote:
> can you quickly look over the arm64 parts?  I'd really like to still
> get this series in for this merge window as it would conflict with
> a lot of dma-mapping work for next merge window, and we also have
> the amd and possibly intel iommu conversions to use it waiting.

Done. They look fine to me.

-- 
Catalin


Re: [PATCH 01/25] arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable

2019-05-03 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 06:51:50AM -0400, Christoph Hellwig wrote:
> DMA allocations that can't sleep may return non-remapped addresses, but
> we do not properly handle them in the mmap and get_sgtable methods.
> Resolve non-vmalloc addresses using virt_to_page to handle this corner
> case.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Robin Murphy 

Acked-by: Catalin Marinas 


Re: [PATCH 06/25] iommu/dma: move the arm64 wrappers to common code

2019-05-03 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 06:51:55AM -0400, Christoph Hellwig wrote:
> There is nothing really arm64 specific in the iommu_dma_ops
> implementation, so move it to dma-iommu.c and keep a lot of symbols
> self-contained.  Note the implementation does depend on the
> DMA_DIRECT_REMAP infrastructure for now, so we'll have to make the
> DMA_IOMMU support depend on it, but this will be relaxed soon.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Robin Murphy 

Acked-by: Catalin Marinas 


Re: [PATCH 02/25] dma-mapping: add a Kconfig symbol to indicated arch_dma_prep_coherent presence

2019-05-03 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 06:51:51AM -0400, Christoph Hellwig wrote:
> Add a Kconfig symbol that indicates an architecture provides a
> arch_dma_prep_coherent implementation, and provide a stub otherwise.
> 
> This will allow the generic dma-iommu code to use it while still
> allowing to be built for cache coherent architectures.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Robin Murphy 

Acked-by: Catalin Marinas 


Re: [PATCH 04/25] iommu/dma: Remove the flush_page callback

2019-05-03 Thread Catalin Marinas
On Fri, May 03, 2019 at 12:43:35PM +0100, Catalin Marinas wrote:
> On Tue, Apr 30, 2019 at 06:51:53AM -0400, Christoph Hellwig wrote:
> > We now have a arch_dma_prep_coherent architecture hook that is used
> > for the generic DMA remap allocator, and we should use the same
> > interface for the dma-iommu code.
> > 
> > Signed-off-by: Christoph Hellwig 
> > Reviewed-by: Robin Murphy 
> > ---
> >  arch/arm64/mm/dma-mapping.c | 8 +---
> >  drivers/iommu/dma-iommu.c   | 8 +++-
> >  include/linux/dma-iommu.h   | 3 +--
> >  3 files changed, 5 insertions(+), 14 deletions(-)
> > 
> > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> > index 674860e3e478..10a8852c8b6a 100644
> > --- a/arch/arm64/mm/dma-mapping.c
> > +++ b/arch/arm64/mm/dma-mapping.c
> > @@ -104,12 +104,6 @@ arch_initcall(arm64_dma_init);
> >  #include 
> >  #include 
> >  
> > -/* Thankfully, all cache ops are by VA so we can ignore phys here */
> > -static void flush_page(struct device *dev, const void *virt, phys_addr_t 
> > phys)
> > -{
> > -   __dma_flush_area(virt, PAGE_SIZE);
> > -}
> 
> Rather than removing, should this not become arch_dma_prep_coherent()?
> With patch 2 selecting the corresponding Kconfig option, I think with
> this patch you'd get a build error (haven't tried).

Ah, sorry, it was already there.

Acked-by: Catalin Marinas 


Re: [PATCH 24/25] arm64: switch copyright boilerplace to SPDX in dma-mapping.c

2019-05-03 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 06:52:13AM -0400, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> Acked-by: Robin Murphy 
> Reviewed-by: Mukesh Ojha 

Acked-by: Catalin Marinas 


Re: [PATCH 25/25] arm64: trim includes in dma-mapping.c

2019-05-03 Thread Catalin Marinas
On Tue, Apr 30, 2019 at 06:52:14AM -0400, Christoph Hellwig wrote:
> With most of the previous functionality now elsewhere a lot of the
> headers included in this file are not needed.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Catalin Marinas 


Re: [PATCH v2 01/11] asm-generic: add dma_zone_size

2019-08-30 Thread Catalin Marinas
On Mon, Aug 26, 2019 at 03:46:52PM +0200, Nicolas Saenz Julienne wrote:
> On Mon, 2019-08-26 at 09:09 +0200, Christoph Hellwig wrote:
> > On Tue, Aug 20, 2019 at 04:58:09PM +0200, Nicolas Saenz Julienne wrote:
> > > Some architectures have platform specific DMA addressing limitations.
> > > This will allow for hardware description code to provide the constraints
> > > in a generic manner, so as for arch code to properly setup it's memory
> > > zones and DMA mask.
> > 
> > I know this just spreads the arm code, but I still kinda hate it.
> 
> Rob's main concern was finding a way to pass the constraint from HW definition
> to arch without widening fdt's architecture specific function surface. I'd say
> it's fair to argue that having a generic mechanism makes sense as it'll now
> traverse multiple archs and subsystems.
> 
> I get adding globals like this is not very appealing, yet I went with it as it
> was the easier to integrate with arm's code. Any alternative suggestions?

In some discussion with Robin, since it's just RPi4 that we are aware of
having such requirement on arm64, he suggested that we have a permanent
ZONE_DMA on arm64 with a default size of 1GB. It should cover all arm64
SoCs we know of without breaking the single Image binary. The arch/arm
can use its current mach-* support.

I may like this more than the proposed early_init_dt_get_dma_zone_size()
here which checks for specific SoCs (my preferred way was to build the
mask from all buses described in DT but I hadn't realised the
complications).

-- 
Catalin


Re: [PATCH 1/2] dma-mapping: fix page attributes for dma_mmap_*

2019-08-05 Thread Catalin Marinas
On Mon, Aug 05, 2019 at 11:01:44AM +0300, Christoph Hellwig wrote:
> All the way back to introducing dma_common_mmap we've defaulyed to mark

s/defaulyed/defaulted/

> the pages as uncached.  But this is wrong for DMA coherent devices.
> Later on DMA_ATTR_WRITE_COMBINE also got incorrect treatment as that
> flag is only treated special on the alloc side for non-coherent devices.
> 
> Introduce a new dma_pgprot helper that deals with the check for coherent
> devices so that only the remapping cases even reach arch_dma_mmap_pgprot

s/even/ever/

> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 1d3f0b5a9940..bd2b039f43a6 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -14,9 +14,7 @@
>  pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
>   unsigned long attrs)
>  {
> - if (!dev_is_dma_coherent(dev) || (attrs & DMA_ATTR_WRITE_COMBINE))
> - return pgprot_writecombine(prot);
> - return prot;
> + return pgprot_writecombine(prot);
>  }

For arm64:

Acked-by: Catalin Marinas 


Re: [PATCH 5/8] arm64: use ZONE_DMA on DMA addressing limited devices

2019-07-31 Thread Catalin Marinas
On Wed, Jul 31, 2019 at 05:47:48PM +0200, Nicolas Saenz Julienne wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 1c4ffabbe1cb..f5279ef85756 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -50,6 +50,13 @@
>  s64 memstart_addr __ro_after_init = -1;
>  EXPORT_SYMBOL(memstart_addr);
>  
> +/*
> + * We might create both a ZONE_DMA and ZONE_DMA32. ZONE_DMA is needed if 
> there
> + * are periferals unable to address the first naturally aligned 4GB of ram.
> + * ZONE_DMA32 will be expanded to cover the rest of that memory. If such
> + * limitations doesn't exist only ZONE_DMA32 is created.
> + */

Shouldn't we instead only create ZONE_DMA to cover the whole 32-bit
range and leave ZONE_DMA32 empty? Can __GFP_DMA allocations fall back
onto ZONE_DMA32?

-- 
Catalin


Re: [RFC 3/4] dma-direct: add dma_direct_min_mask

2019-07-24 Thread Catalin Marinas
On Fri, Jul 19, 2019 at 03:08:52PM +0200, Nicolas Saenz Julienne wrote:
> On Thu, 2019-07-18 at 13:18 +0200, Nicolas Saenz Julienne wrote:
> > On Thu, 2019-07-18 at 11:15 +0200, Christoph Hellwig wrote:
> > > On Wed, Jul 17, 2019 at 05:31:34PM +0200, Nicolas Saenz Julienne wrote:
> > > > Historically devices with ZONE_DMA32 have been assumed to be able to
> > > > address at least the lower 4GB of ram for DMA. This is still the defualt
> > > > behavior yet the Raspberry Pi 4 is limited to the first GB of memory.
> > > > This has been observed to trigger failures in dma_direct_supported() as
> > > > the 'min_mask' isn't properly set.
> > > > 
> > > > We create 'dma_direct_min_mask' in order for the arch init code to be
> > > > able to fine-tune dma direct's 'min_dma' mask.
> > > 
> > > Normally we use ZONE_DMA for that case.
> > 
> > Fair enough, I didn't think of that possibility.
> > 
> > So would the arm64 maintainers be happy with something like this:
> > 
> > - ZONE_DMA: Follows standard definition, 16MB in size. ARCH_ZONE_DMA_BITS is
> > left as is.
> > - ZONE_DMA32: Will honor the most constraining 'dma-ranges'. Which so far 
> > for
> >   most devices is 4G, except for RPi4.
> > - ZONE_NORMAL: The rest of the memory.
> 
> Never mind this suggestion, I don't think it makes any sense. If anything 
> arm64
> seems to fit the ZONE_DMA usage pattern of arm and powerpc: where ZONE_DMA's
> size is decided based on ram size and/or board configuration. It was actually
> set-up like this until Christoph's ad67f5a6545f7 ("arm64: replace ZONE_DMA 
> with
> ZONE_DMA32").
> 
> So the easy solution would be to simply revert that commit. On one hand I feel
> it would be a step backwards as most 64 bit architectures have been moving to
> use ZONE_DMA32. On the other, current ZONE_DMA32 usage seems to be heavily
> rooted on having a 32 bit DMA mask*, which will no longer be the case on arm64
> if we want to support the RPi 4.
> 
> So the way I see it and lacking a better solution, the argument is stronger on
> moving back arm64 to using ZONE_DMA. Any comments/opinions?

As it was suggested in this or the previous thread, I'm not keen on
limiting ZONE_DMA32 to the smallest range the RPi4 can cover, as the naming
implies this zone should cover 32-bit devices that can deal with a full
32-bit mask.

I think it may be better if we have both ZONE_DMA and ZONE_DMA32 on
arm64. ZONE_DMA would be based on the smallest dma-ranges as described
in the DT while DMA32 covers the first naturally aligned 4GB of RAM
(unchanged). When a smaller ZONE_DMA is not needed, it could be expanded
to cover what would normally be ZONE_DMA32 (or could we have ZONE_DMA as
0-bytes? I don't think GFP_DMA can still allocate memory in this case).

We'd probably have to define ARCH_ZONE_DMA_BITS for arm64 to something
smaller than 32-bit but sufficient to cover the known platforms like
RPi4 (the current 24 is too small, so maybe 30). AFAICT,
__dma_direct_optimal_gfp_mask() figures out whether GFP_DMA or GFP_DMA32
should be passed.
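
As a rough sketch of the resulting layout (illustrative only;
example_zone_sizes_init() is a made-up name and the 1GB ZONE_DMA limit is just
the assumption discussed above, not a final value):

#include <linux/memblock.h>
#include <linux/mm.h>
#include <linux/sizes.h>

static void __init example_zone_sizes_init(void)
{
        unsigned long max_zone_pfns[MAX_NR_ZONES] = { 0 };
        phys_addr_t dram_end = memblock_end_of_DRAM();

        /* ZONE_DMA: the most constrained masters, e.g. the RPi4 bottom 1GB */
        max_zone_pfns[ZONE_DMA]    = PFN_DOWN(min_t(phys_addr_t, SZ_1G, dram_end));
        /* ZONE_DMA32: the rest of the naturally aligned first 4GB */
        max_zone_pfns[ZONE_DMA32]  = PFN_DOWN(min_t(phys_addr_t, SZ_4G, dram_end));
        /* ZONE_NORMAL: everything else */
        max_zone_pfns[ZONE_NORMAL] = PFN_DOWN(dram_end);

        free_area_init(max_zone_pfns); /* free_area_init_nodes() on older kernels */
}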

-- 
Catalin


Re: [PATCH] dma/direct: turn ARCH_ZONE_DMA_BITS into a variable

2019-10-31 Thread Catalin Marinas
On Thu, Oct 31, 2019 at 05:58:53PM +0100, Christoph Hellwig wrote:
> On Thu, Oct 31, 2019 at 05:22:59PM +0100, Nicolas Saenz Julienne wrote:
> > OK, I see what you mean now. It's wrong indeed.
> > 
> > The trouble is the ZONE_DMA series[1] in arm64, also due for v5.5, will be
> > affected by this patch. I don't know the right way to approach this problem
> > since depending on the merge order, this patch should be updated or the 
> > arm64
> > ZONE_DMA series fixed.
> > 
> > Maybe it's easier to just wait for v5.6.
> 
> Ok, I can wait.  Or the arm64 maintainers can pick up this patch if
> you want to add it to that series.

This branch is stable (I may add a fix but I'm not rebasing it) if you
want to base this patch on top:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/zone-dma

Otherwise, with your ack, I can add it on top of the above branch (aimed
at 5.5).

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 0/5] ARM: Raspberry Pi 4 DMA support

2019-10-15 Thread Catalin Marinas
On Tue, Oct 15, 2019 at 09:48:22AM +0200, Nicolas Saenz Julienne wrote:
> A little off topic but I was wondering if you have a preferred way to refer to
> the arm architecture in a way that it unambiguously excludes arm64 (for 
> example
> arm32 would work).

arm32 should be fine. Neither arm64 nor arm32 are officially endorsed
ARM Ltd names (officially the exception model is AArch32 while the
instruction set is one of A32/T32/T16).

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 0/5] ARM: Raspberry Pi 4 DMA support

2019-10-14 Thread Catalin Marinas
On Mon, Oct 14, 2019 at 08:31:02PM +0200, Nicolas Saenz Julienne wrote:
> the Raspberry Pi 4 offers up to 4GB of memory, of which only the first
> is DMA capable device wide. This forces us to use of bounce buffers,
> which are currently not very well supported by ARM's custom DMA ops.
> Among other things the current mechanism (see dmabounce.c) isn't
> suitable for high memory. Instead of fixing it, this series introduces a
> way of selecting dma-direct as the default DMA ops provider which allows
> for the Raspberry Pi to make use of swiotlb.

I presume these patches go on top of this series:

http://lkml.kernel.org/r/20190911182546.17094-1-nsaenzjulie...@suse.de

which I queued here:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/zone-dma

-- 
Catalin


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-02 Thread Catalin Marinas
On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > --- a/drivers/of/fdt.c
> > > > +++ b/drivers/of/fdt.c
> > > > @@ -25,6 +25,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include   /* for zone_dma_bits */
> > > >  
> > > >  #include   /* for COMMAND_LINE_SIZE */
> > > >  #include 
> > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > >  }
> > > >  
> > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > +{
> > > > +   unsigned long dt_root = of_get_flat_dt_root();
> > > > +
> > > > +   if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > +   zone_dma_bits = 30;
> > > > +}
> > > 
> > > I think we could keep this entirely in the arm64 setup_machine_fdt() and
> > > not pollute the core code with RPi4-specific code.
> > 
> > Actually, even better, could we not move the check to
> > arm64_memblock_init() when we initialise zone_dma_bits?
> 
> I did it this way as I vaguely remembered Rob saying he wanted to centralise
> all early boot fdt code in one place. But I'll be happy to move it there.

I can see Rob replied and I'm fine if that's his preference. However,
what I don't particularly like is that in the arm64 code, if
zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
the early_init_dt_update_zone_dma_bits(). What if at some point we'll
get a platform that actually needs 24 here (I truly hope not, but just
the principle of relying on magic values)?

So rather than guessing, I'd prefer if the arch code can override
ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
need to explicitly touch the zone_dma_bits variable.
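
Something along these lines, as a sketch only (the exact header locations are
a guess; ZONE_DMA_BITS_DEFAULT here is the macro from your series, only
zone_dma_bits itself exists in mainline today):

/* include/linux/dma-direct.h: generic default, overridable by the arch */
#ifndef ZONE_DMA_BITS_DEFAULT
#define ZONE_DMA_BITS_DEFAULT   24
#endif

/* arm64 (wherever fits, picked up before the generic definition) */
#define ZONE_DMA_BITS_DEFAULT   32

/* kernel/dma/direct.c */
unsigned int zone_dma_bits __ro_after_init = ZONE_DMA_BITS_DEFAULT;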

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-15 Thread Catalin Marinas
On Thu, Oct 15, 2020 at 10:26:18PM +0800, Hanjun Guo wrote:
> On 2020/10/15 3:12, Nicolas Saenz Julienne wrote:
> > From: Ard Biesheuvel 
> > 
> > We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
> > incorporating masters that can address less than 32 bits of DMA, in
> > particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
> > peripherals that can only address up to 1 GB (and its PCIe host
> > bridge can only access the bottom 3 GB)
> > 
> > Instructing the DMA layer about these limitations is straight-forward,
> > even though we had to fix some issues regarding memory limits set in
> > the IORT for named components, and regarding the handling of ACPI _DMA
> > methods. However, the DMA layer also needs to be able to allocate
> > memory that is guaranteed to meet those DMA constraints, for bounce
> > buffering as well as allocating the backing for consistent mappings.
> > 
> > This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
> > it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
> > problems with kdump, and potentially in other places where allocations
> > cannot cross zone boundaries. Therefore, we should avoid having two
> > separate DMA zones when possible.
> > 
> > So let's do an early scan of the IORT, and only create the ZONE_DMA
> > if we encounter any devices that need it. This puts the burden on
> > the firmware to describe such limitations in the IORT, which may be
> > redundant (and less precise) if _DMA methods are also being provided.
> > However, it should be noted that this situation is highly unusual for
> > arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
> > the _DMA method if implemented, and so we will not lose the ability to
> > perform streaming DMA outside the ZONE_DMA if the _DMA method permits
> > it.
> 
> Sorry, I'm still a little bit confused. With this patch, if we have
> a device which set the right _DMA method (DMA size >= 32), but with the
> wrong DMA size in IORT, we still have the ZONE_DMA created which
> is actually not needed?

With the current kernel, we get a ZONE_DMA already with an arbitrary
size of 1GB that matches what RPi4 needs. We are trying to eliminate
such an unnecessary ZONE_DMA based on some heuristics (well, something
that looks "better" than an OEM ID based quirk). Now, if we learn that
the IORT on platforms in the field is so broken as to describe DMA masks
only a few bits wide, we may have to go back to the OEM ID quirk.
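
If it comes to that, the quirk would look something like this (sketch only;
iort_oem_zone_dma_bits() is a made-up name and the quirk table entry is a
placeholder, not a real firmware OEM ID):

#include <linux/acpi.h>

struct dma_oem_quirk {
        char oem_id[ACPI_OEM_ID_SIZE + 1];
        unsigned int zone_dma_bits;
};

static const struct dma_oem_quirk dma_oem_quirks[] __initconst = {
        { "XXXXXX", 30 },       /* placeholder for the RPi4 firmware OEM ID */
};

static unsigned int __init iort_oem_zone_dma_bits(void)
{
        struct acpi_table_header *iort;
        unsigned int bits = 32;
        int i;

        if (ACPI_FAILURE(acpi_get_table(ACPI_SIG_IORT, 0, &iort)))
                return bits;

        for (i = 0; i < ARRAY_SIZE(dma_oem_quirks); i++) {
                if (!memcmp(iort->oem_id, dma_oem_quirks[i].oem_id,
                            ACPI_OEM_ID_SIZE)) {
                        bits = dma_oem_quirks[i].zone_dma_bits;
                        break;
                }
        }

        acpi_put_table(iort);
        return bits;
}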

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-12 Thread Catalin Marinas
On Mon, Oct 12, 2020 at 08:47:15AM +0200, Christoph Hellwig wrote:
> On Fri, Oct 09, 2020 at 06:10:52PM +0100, Catalin Marinas wrote:
> > kdump wants DMA-able memory and,
> 
> DMAable by whom?  The only way to guranteed DMAable memory is to use
> the DMA memory allocator(s) and pass a specific device to them.  Everyting
> else is just fundamentally broken.  Note that even when device is not
> DMAable we can still use swiotlb to access it.

What I meant is that the new kexec'ed kernel needs some memory in the
ZONE_DMA range, currently set to the bottom 30-bit even for platforms
that can cope with the whole 32-bit range (anything other than RPi4).
The memory range available to the kdump kernels is limited to what
reserve_crashkernel() allocated, which may not fit in the lower 1GB.

There are two ongoing threads (complementary):

1. Change the arm64 reserve_crashkernel() similar to x86 which allocates
   memory above 4G with a small block in the ZONE_DMA range.

2. Allow zone_dma_bits to be 32 for arm64 platforms other than RPi4.

The second point also fixes some regressions with CMA reservations that
could no longer fit in the lower 1GB.

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/5] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()

2020-10-12 Thread Catalin Marinas
On Sat, Oct 10, 2020 at 05:12:31PM +0200, Nicolas Saenz Julienne wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index f6902a2b4ea6..0eca5865dcb1 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -196,14 +196,16 @@ static void __init zone_sizes_init(unsigned long min, 
> unsigned long max)
>   unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
>  
>  #ifdef CONFIG_ZONE_DMA
> + zone_dma_bits = ARM64_ZONE_DMA_BITS;
> +
>   if (IS_ENABLED(CONFIG_ACPI)) {
>   extern unsigned int acpi_iort_get_zone_dma_size(void);
>  
>   zone_dma_bits = min(zone_dma_bits,
>   acpi_iort_get_zone_dma_size());
> - arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
>   }
>  
> + arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
>   max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif
>  #ifdef CONFIG_ZONE_DMA32
> @@ -394,11 +396,6 @@ void __init arm64_memblock_init(void)
>  
>   early_init_fdt_scan_reserved_mem();
>  
> - if (IS_ENABLED(CONFIG_ZONE_DMA)) {
> - zone_dma_bits = ARM64_ZONE_DMA_BITS;
> - arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
> - }

arm64_dma_phys_limit is used by memblock_alloc_low() (via
ARCH_LOW_ADDRESS_LIMIT). I think it's too late to leave its
initialisation to zone_sizes_init().

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-09 Thread Catalin Marinas
On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
>  wrote:
> > We can move this check to IORT code and call it from arm64 if it
> > can be made to work.
> 
> Finding the smallest value in the IORT, and assigning it to
> zone_dma_bits if it is < 32 should be easy. But as I understand it,
> having these separate DMA and DMA32 zones is what breaks kdump, no? So
> how is this going to fix the underlying issue?

If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
allocations fall back to ZONE_DMA).

kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
(it's been a while since I looked), the kdump allocation couldn't span
multiple zones.
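
To illustrate the constraint (find_crashkernel_base() is a made-up helper,
simplified from what reserve_crashkernel() does; not the real code):

#include <linux/memblock.h>
#include <linux/sizes.h>

static phys_addr_t __init find_crashkernel_base(phys_addr_t crash_size)
{
        /* arm64_dma_phys_limit: 1GB with the current ZONE_DMA, 4GB otherwise */
        return memblock_find_in_range(0, arm64_dma_phys_limit,
                                      crash_size, SZ_2M);
}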

In a separate thread, we try to fix kdump to use allocations above 4G as
a fallback but this only fixes platforms with enough RAM (and maybe it's
only those platforms that care about kdump).

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-10 Thread Catalin Marinas
On Sat, Oct 10, 2020 at 12:53:19PM +0200, Nicolas Saenz Julienne wrote:
> On Sat, 2020-10-10 at 12:36 +0200, Ard Biesheuvel wrote:
> > On Fri, 9 Oct 2020 at 19:10, Catalin Marinas  
> > wrote:
> > > On Fri, Oct 09, 2020 at 06:23:06PM +0200, Ard Biesheuvel wrote:
> > > > On Fri, 9 Oct 2020 at 17:24, Lorenzo Pieralisi
> > > >  wrote:
> > > > > We can move this check to IORT code and call it from arm64 if it
> > > > > can be made to work.
> > > > 
> > > > Finding the smallest value in the IORT, and assigning it to
> > > > zone_dma_bits if it is < 32 should be easy. But as I understand it,
> > > > having these separate DMA and DMA32 zones is what breaks kdump, no? So
> > > > how is this going to fix the underlying issue?
> > > 
> > > If zone_dma_bits is 32, ZONE_DMA32 disappears into ZONE_DMA (GFP_DMA32
> > > allocations fall back to ZONE_DMA).
> > > 
> > > kdump wants DMA-able memory and, without a 30-bit ZONE_DMA, that would
> > > be the bottom 32-bit. With the introduction of ZONE_DMA, this suddenly
> > > became 1GB. We could change kdump to allocate ZONE_DMA32 but this one
> > > may also be small as it lost 1GB to ZONE_DMA. However, the kdump kernel
> > > would need to be rebuilt without ZONE_DMA since it won't have any. IIRC
> > > (it's been a while since I looked), the kdump allocation couldn't span
> > > multiple zones.
> > > 
> > > In a separate thread, we try to fix kdump to use allocations above 4G as
> > > a fallback but this only fixes platforms with enough RAM (and maybe it's
> > > only those platforms that care about kdump).
> > > 
> > 
> > One thing that strikes me as odd is that we are applying the same
> > shifting logic to ZONE_DMA as we are applying to ZONE_DMA32, i.e., if
> > DRAM starts outside of the zone, it is shifted upwards.
> > 
> > On a typical ARM box, this gives me
> > 
> > [0.00] Zone ranges:
> > [0.00]   DMA  [mem 0x8000-0xbfff]
> > [0.00]   DMA32[mem 0xc000-0x]
> > [0.00]   Normal   [mem 0x0001-0x000f]
> > 
> > i.e., the 30-bit addressable range has bit 31 set, which is weird.
> 
> Yes I agree it's weird, and IMO plain useless. I sent a series this summer to
> address this[1], which ultimately triggered the discussion we're having right
> now.

I don't mind assuming that ZONE_DMA is always from pfn 0 (i.e. no
dma_offset for such constrained devices). But if ZONE_DMA32 is squeezed
out with ZONE_DMA extended to 4GB, it should allow non-zero upper 32
bits. IIRC we do have SoCs with RAM starting above 4GB.

However, your patch didn't completely solve the problem for non-RPi4
platforms as there's hardware with RAM starting at 0 that doesn't need
the 1GB ZONE_DMA. We may end up with a combination of the two
approaches.

> Although with with your latest patch and the DT counterpart, we should be OK.
> It would be weird for a HW description to define DMA constraints that are
> impossible to reach on that system.

I don't remember the difficulties with parsing a DT early for inferring
the ZONE_DMA requirements. Could we not check the dma-ranges property in
the soc node? I can see bcm2711.dtsi defines a 30-bit address range. We
are not interested in the absolute physical/bus addresses, just the
size to check whether it's smaller than 32-bit.

> > I wonder if it wouldn't be better (and less problematic in the general
> > case) to drop this logic for ZONE_DMA, and simply let it remain empty
> > unless there is really some memory there.
> 
> From my experience, you can't have empty ZONE_DMA when enabled. Empty
> ZONE_DMA32 is OK though. Although I'm sure it's something that can be changed.

Indeed, because we still have GFP_DMA around which can't fall back to
ZONE_DMA32 (well, unless CONFIG_ZONE_DMA is disabled).
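
Going back to the dma-ranges point above, a rough sketch of what I mean using
the flat DT accessors (fdt_soc_dma_size() is a made-up name and the four-cell
entry layout is an assumption matching bcm2711.dtsi, so this is not a general
dma-ranges parser):

#include <linux/of_fdt.h>

static u64 __init fdt_soc_dma_size(void)
{
        int soc = of_get_flat_dt_subnode_by_name(of_get_flat_dt_root(), "soc");
        const __be32 *prop;
        int len;

        if (soc < 0)
                return U64_MAX;

        prop = of_get_flat_dt_prop(soc, "dma-ranges", &len);
        if (!prop || len < 4 * (int)sizeof(__be32))
                return U64_MAX;

        /* last cell of the <child parent-hi parent-lo size> entry is the size */
        return be32_to_cpu(prop[3]);
}

If the returned size is below 4GB (0x40000000 on bcm2711), we would know a
ZONE_DMA smaller than 32-bit is needed, without caring about the offsets.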

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-08 Thread Catalin Marinas
On Thu, Oct 08, 2020 at 12:05:25PM +0200, Nicolas Saenz Julienne wrote:
> On Fri, 2020-10-02 at 12:55 +0100, Catalin Marinas wrote:
> > On Thu, Oct 01, 2020 at 07:31:19PM +0200, Nicolas Saenz Julienne wrote:
> > > On Thu, 2020-10-01 at 18:23 +0100, Catalin Marinas wrote:
> > > > On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> > > > > On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne 
> > > > > wrote:
> > > > > > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > > > > > index 4602e467ca8b..cd0d115ef329 100644
> > > > > > --- a/drivers/of/fdt.c
> > > > > > +++ b/drivers/of/fdt.c
> > > > > > @@ -25,6 +25,7 @@
> > > > > >  #include 
> > > > > >  #include 
> > > > > >  #include 
> > > > > > +#include   /* for zone_dma_bits */
> > > > > >  
> > > > > >  #include   /* for COMMAND_LINE_SIZE */
> > > > > >  #include 
> > > > > > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > > > > > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> > > > > >  }
> > > > > >  
> > > > > > +void __init early_init_dt_update_zone_dma_bits(void)
> > > > > > +{
> > > > > > +   unsigned long dt_root = of_get_flat_dt_root();
> > > > > > +
> > > > > > +   if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > > > > > +   zone_dma_bits = 30;
> > > > > > +}
> > > > > 
> > > > > I think we could keep this entirely in the arm64 setup_machine_fdt() 
> > > > > and
> > > > > not pollute the core code with RPi4-specific code.
> > > > 
> > > > Actually, even better, could we not move the check to
> > > > arm64_memblock_init() when we initialise zone_dma_bits?
> > > 
> > > I did it this way as I vaguely remembered Rob saying he wanted to 
> > > centralise
> > > all early boot fdt code in one place. But I'll be happy to move it there.
> > 
> > I can see Rob replied and I'm fine if that's his preference. However,
> > what I don't particularly like is that in the arm64 code, if
> > zone_dma_bits == 24, we set it to 32 assuming that it wasn't touched by
> > the early_init_dt_update_zone_dma_bits(). What if at some point we'll
> > get a platform that actually needs 24 here (I truly hope not, but just
> > the principle of relying on magic values)?
> > 
> > So rather than guessing, I'd prefer if the arch code can override
> > ZONE_DMA_BITS_DEFAULT. Then, in arm64, we'll just set it to 32 and no
> > need to explicitly touch the zone_dma_bits variable.
> 
> Yes, sonds like the way to go. TBH I wasn't happy with that solution either,
> but couldn't think of a nicer alternative.
> 
> Sadly I just realised that the series is incomplete, we have RPi4 users that
> want to boot unsing ACPI, and this series would break things for them. I'll
> have a word with them to see what we can do for their use-case.

Is there a way to get some SoC information from ACPI?

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/4] arm64: Default to 32-bit ZONE_DMA

2020-10-01 Thread Catalin Marinas
On Thu, Oct 01, 2020 at 06:17:39PM +0200, Nicolas Saenz Julienne wrote:
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index e1a69a618832..3c3f462466eb 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -43,8 +43,6 @@
>  #include 
>  #include 
>  
> -#define ARM64_ZONE_DMA_BITS  30
> -
>  /*
>   * We need to be able to catch inadvertent references to memstart_addr
>   * that occur (potentially in generic code) before arm64_memblock_init()
> @@ -388,8 +386,14 @@ void __init arm64_memblock_init(void)
>   early_init_fdt_scan_reserved_mem();
>  
>   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
> - zone_dma_bits = ARM64_ZONE_DMA_BITS;
> - arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
> + /*
> +  * early_init_dt_scan() might alter zone_dma_bits based on the
> +  * device's DT. Otherwise, have it cover the 32-bit address
> +  * space.
> +  */
> + if (zone_dma_bits == ZONE_DMA_BITS_DEFAULT)
> + zone_dma_bits = 32;
> + arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);

So here we assume that if zone_dma_bits is 24, it wasn't initialised. I
think it may be simpler if we just set it in setup_machine_fdt() to 32
or 30 if RPi4. This way we don't have to depend on what the core kernel
sets.
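
I.e. something like this (sketch only; arm64_set_zone_dma_bits() is a made-up
name for wherever this ends up in the early FDT handling):

#include <linux/dma-direct.h>
#include <linux/of_fdt.h>

static void __init arm64_set_zone_dma_bits(void)
{
        unsigned long dt_root = of_get_flat_dt_root();

        if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
                zone_dma_bits = 30;
        else
                zone_dma_bits = 32;
}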

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-01 Thread Catalin Marinas
On Thu, Oct 01, 2020 at 06:15:01PM +0100, Catalin Marinas wrote:
> Hi Nicolas,
> 
> Thanks for putting this together.
> 
> On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index 4602e467ca8b..cd0d115ef329 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -25,6 +25,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include   /* for zone_dma_bits */
> >  
> >  #include   /* for COMMAND_LINE_SIZE */
> >  #include 
> > @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
> > of_scan_flat_dt(early_init_dt_scan_memory, NULL);
> >  }
> >  
> > +void __init early_init_dt_update_zone_dma_bits(void)
> > +{
> > +   unsigned long dt_root = of_get_flat_dt_root();
> > +
> > +   if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> > +   zone_dma_bits = 30;
> > +}
> 
> I think we could keep this entirely in the arm64 setup_machine_fdt() and
> not pollute the core code with RPi4-specific code.

Actually, even better, could we not move the check to
arm64_memblock_init() when we initialise zone_dma_bits?

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/4] mm: Update DMA zones description

2020-10-01 Thread Catalin Marinas
On Thu, Oct 01, 2020 at 06:17:40PM +0200, Nicolas Saenz Julienne wrote:
> The default behavior for arm64 changed, so reflect that.
> 
> Signed-off-by: Nicolas Saenz Julienne 

Acked-by: Catalin Marinas 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/4] of/fdt: Update zone_dma_bits when running in bcm2711

2020-10-01 Thread Catalin Marinas
Hi Nicolas,

Thanks for putting this together.

On Thu, Oct 01, 2020 at 06:17:37PM +0200, Nicolas Saenz Julienne wrote:
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index 4602e467ca8b..cd0d115ef329 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include /* for zone_dma_bits */
>  
>  #include   /* for COMMAND_LINE_SIZE */
>  #include 
> @@ -1198,6 +1199,14 @@ void __init early_init_dt_scan_nodes(void)
>   of_scan_flat_dt(early_init_dt_scan_memory, NULL);
>  }
>  
> +void __init early_init_dt_update_zone_dma_bits(void)
> +{
> + unsigned long dt_root = of_get_flat_dt_root();
> +
> + if (of_flat_dt_is_compatible(dt_root, "brcm,bcm2711"))
> + zone_dma_bits = 30;
> +}

I think we could keep this entirely in the arm64 setup_machine_fdt() and
not pollute the core code with RPi4-specific code.

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-23 Thread Catalin Marinas
On Fri, Oct 23, 2020 at 05:27:49PM +0200, Nicolas Saenz Julienne wrote:
> On Thu, 2020-10-22 at 19:06 +0100, Catalin Marinas wrote:
> > On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> > > @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int 
> > > zone_bits)
> > >  static void __init zone_sizes_init(unsigned long min, unsigned long max)
> > >  {
> > >   unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> > > + unsigned int __maybe_unused dt_zone_dma_bits;
> > >  
> > >  #ifdef CONFIG_ZONE_DMA
> > > - zone_dma_bits = ARM64_ZONE_DMA_BITS;
> > > + dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> > > + zone_dma_bits = min(32U, dt_zone_dma_bits);
> > 
> > A thought: can we remove the min here and expand ZONE_DMA to whatever
> > dt_zone_dma_bits says? More on this below.
> 
> On most platforms we'd get PHYS_ADDR_MAX, or something bigger than the actual
> amount of RAM. Which would ultimately create a system wide ZONE_DMA. At first
> sight, I don't see it breaking dma-direct in any way.
> 
> On the other hand, there is a big amount of MMIO devices out there that can
> only handle 32-bit addressing. Be it PCI cards or actual IP cores. To make
> things worse, this limitation is often expressed in the driver, not FW (with
> dma_set_mask() and friends). If those devices aren't behind an IOMMU we have 
> be
> able to provide at least 32-bit addressable memory. See this comment from
> dma_direct_supported():
> 
> /*
>  * Because 32-bit DMA masks are so common we expect every architecture
>  * to be able to satisfy them - either by not supporting more physical
>  * memory, or by providing a ZONE_DMA32.  If neither is the case, the
>  * architecture needs to use an IOMMU instead of the direct mapping.
>  */
> 
> I think, for the common case, we're stuck with at least one zone spanning the
> 32-bit address space.

You are right, I guess it makes sense to keep a 32-bit zone as not all
devices would be described as such.

> > >   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> > >   max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> > >  #endif
> > 
> > I was talking earlier to Ard and Robin on the ZONE_DMA32 history and the
> > need for max_zone_phys(). This was rather theoretical, the Seattle
> > platform has all RAM starting above 4GB and that led to an empty
> > ZONE_DMA32 originally. The max_zone_phys() hack was meant to lift
> > ZONE_DMA32 into the bottom of the RAM, on the assumption that such
> > 32-bit devices would have a DMA offset hardwired. We are not aware of
> > any such case on arm64 systems and even on Seattle, IIUC 32-bit devices
> > only work if they are behind an SMMU (so no hardwired offset).
> > 
> > In hindsight, it would have made more sense on platforms with RAM above
> > 4GB to expand ZONE_DMA32 to cover the whole memory (so empty
> > ZONE_NORMAL). Something like:
> > 
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index a53c1e0fb017..7d5e3dd85617 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
> >   */
> >  static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> >  {
> > -   phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
> > -   return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
> > +   phys_addr_t zone_mask = 1ULL << zone_bits;
> > +
> > +   if (!(memblock_start_of_DRAM() & zone_mask))
> > +   zone_mask = PHYS_ADDR_MAX;
> > +
> > +   return min(zone_mask, memblock_end_of_DRAM());
> >  }
> >  
> >  static void __init zone_sizes_init(unsigned long min, unsigned long max)
> > 
> > I don't think this makes any difference for ZONE_DMA unless a
> > broken DT or IORT reports the max CPU address below the start of DRAM.
> > 
> > There's a minor issue if of_dma_get_max_cpu_address() matches
> > memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
> > a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.
> 
> I agree it makes no sense to create more than one zone when the beginning of
> RAM is located above the 32-bit address space. I'm all for disregarding the
> possibility of hardwired offsets. As a bonus, as we already discussed some 
> time
> ago, this is something that never played well with current dma-direct code[1].
> 
> [1] https://lkml.org/lkml/2020/9/8/377

Maybe this one is still worth fixing, at least fo

Re: [PATCH 04/15] arm64: numa: simplify dummy_numa_init()

2020-07-30 Thread Catalin Marinas
On Tue, Jul 28, 2020 at 08:11:42AM +0300, Mike Rapoport wrote:
> From: Mike Rapoport 
> 
> dummy_numa_init() loops over memblock.memory and passes nid=0 to
> numa_add_memblk() which essentially wraps memblock_set_node(). However,
> memblock_set_node() can cope with entire memory span itself, so the loop
> over memblock.memory regions is redundant.
> 
> Replace the loop with a single call to memblock_set_node() to the entire
> memory.
> 
> Signed-off-by: Mike Rapoport 

Acked-by: Catalin Marinas 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 00/12] ACPI/OF: Upgrade MSI/IOMMU ID mapping APIs

2020-07-28 Thread Catalin Marinas
On Fri, 19 Jun 2020 09:20:01 +0100, Lorenzo Pieralisi wrote:
> This series is a v2 of a previous posting:
> 
> v1 -> v2
> 
> - Removed _rid() wrappers
> - Fixed !CONFIG_ACPI compilation issue
> - Converted of_pci_iommu_init() to use of_iommu_configure_dev_id()
> 
> [...]

Applied to arm64 (for-next/msi-iommu), thanks!

[01/12] ACPI/IORT: Make iort_match_node_callback walk the ACPI namespace for NC
https://git.kernel.org/arm64/c/07d2e59f27cd
[02/12] ACPI/IORT: Make iort_get_device_domain IRQ domain agnostic
https://git.kernel.org/arm64/c/d1718a1b7a86
[03/12] ACPI/IORT: Make iort_msi_map_rid() PCI agnostic
https://git.kernel.org/arm64/c/39c3cf566cea
[04/12] ACPI/IORT: Remove useless PCI bus walk
https://git.kernel.org/arm64/c/3a3d208beede
[05/12] ACPI/IORT: Add an input ID to acpi_dma_configure()
https://git.kernel.org/arm64/c/b8e069a2a8da
[06/12] of/iommu: Make of_map_rid() PCI agnostic
https://git.kernel.org/arm64/c/746a71d02b5d
[07/12] of/device: Add input id to of_dma_configure()
https://git.kernel.org/arm64/c/a081bd4af4ce
[08/12] dt-bindings: arm: fsl: Add msi-map device-tree binding for fsl-mc bus
https://git.kernel.org/arm64/c/5bda70c6162d
[09/12] of/irq: make of_msi_map_get_device_domain() bus agnostic
https://git.kernel.org/arm64/c/6f881aba0110
[10/12] of/irq: Make of_msi_map_rid() PCI bus agnostic
https://git.kernel.org/arm64/c/2bcdd8f2c07f
[11/12] bus/fsl-mc: Refactor the MSI domain creation in the DPRC driver
https://git.kernel.org/arm64/c/998fb7badf03
[12/12] bus: fsl-mc: Add ACPI support for fsl-mc
https://git.kernel.org/arm64/c/6305166c8771

-- 
Catalin

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-10 Thread Catalin Marinas
On Fri, Nov 06, 2020 at 07:46:29PM +0100, Nicolas Saenz Julienne wrote:
> On Thu, 2020-11-05 at 16:11 +, James Morse wrote:
> > On 03/11/2020 17:31, Nicolas Saenz Julienne wrote:
> > > crashkernel might reserve memory located in ZONE_DMA. We plan to delay
> > > ZONE_DMA's initialization after unflattening the devicetree and ACPI's
> > > boot table initialization, so move it later in the boot process.
> > > Specifically into mem_init(), this is the last place crashkernel will be
> > > able to reserve the memory before the page allocator kicks in.
> > > There
> > > isn't any apparent reason for doing this earlier.
> > 
> > It's so that map_mem() can carve it out of the linear/direct map.
> > This is so that stray writes from a crashing kernel can't accidentally 
> > corrupt the kdump
> > kernel. We depend on this if we continue with kdump, but failed to offline 
> > all the other
> > CPUs.
> 
> I presume here you refer to arch_kexec_protect_crashkres(), IIUC this will 
> only
> happen further down the line, after having loaded the kdump kernel image. But
> it also depends on the mappings to be PAGE sized (flags == NO_BLOCK_MAPPINGS |
> NO_CONT_MAPPINGS).

IIUC, arch_kexec_protect_crashkres() is only for the crashkernel image,
not the whole reserved memory that the crashkernel will use. For the
latter, we avoid the linear map by marking it as nomap in map_mem().

> > We also depend on this when skipping the checksum code in purgatory, which 
> > can be
> > exceedingly slow.
> 
> This one I don't fully understand, so I'll lazily assume the prerequisite is
> the same WRT how memory is mapped. :)
> 
> Ultimately there's also /sys/kernel/kexec_crash_size's handling. Same
> prerequisite.
> 
> Keeping in mind acpi_table_upgrade() and unflatten_device_tree() depend on
> having the linear mappings available.

So it looks like reserve_crashkernel() wants to reserve memory before
setting up the linear map with the information about the DMA zones in
place but that comes later when we can parse the firmware tables.

I wonder, instead of not mapping the crashkernel reservation, can we not
do an arch_kexec_protect_crashkres() for the whole reservation after we
created the linear map?

> Let me stress that knowing the DMA constraints in the system before reserving
> crashkernel's regions is necessary if we ever want it to work seamlessly on 
> all
> platforms. Be it small stuff like the Raspberry Pi or huge servers with TB of
> memory.

Indeed. So we have 3 options (so far):

1. Allow the crashkernel reservation to go into the linear map but set
   it to invalid once allocated.

2. Parse the flattened DT (not sure what we do with ACPI) before
   creating the linear map. We may have to rely on some SoC ID here
   instead of actual DMA ranges.

3. Assume the smallest ZONE_DMA possible on arm64 (1GB) for crashkernel
   reservations and not rely on arm64_dma_phys_limit in
   reserve_crashkernel().

I think (2) we tried hard to avoid. Option (3) brings us back to the
issues we had on large crashkernel reservations regressing on some
platforms (though it's been a while since, they mostly went quiet ;)).
However, with Chen's crashkernel patches we end up with two
reservations, one in the low DMA zone and one higher, potentially above
4GB. Having a fixed 1GB limit wouldn't be any worse for crashkernel
reservations than what we have now.

If (1) works, I'd go for it (James knows this part better than me),
otherwise we can go for (3).
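
For (1), the rough shape I have in mind (sketch only;
invalidate_crashkernel_mapping() is a made-up name; it assumes the range was
mapped with page granularity, since set_memory_valid(), as used by
arch_kexec_protect_crashkres(), cannot split block mappings):

#include <linux/ioport.h>
#include <linux/kexec.h>
#include <linux/mm.h>

static void __init invalidate_crashkernel_mapping(void)
{
        int nr_pages;

        if (!crashk_res.end)
                return;

        nr_pages = resource_size(&crashk_res) >> PAGE_SHIFT;
        set_memory_valid((unsigned long)__va(crashk_res.start), nr_pages, 0);
}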

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-13 Thread Catalin Marinas
Hi Nicolas,

On Thu, Nov 12, 2020 at 04:56:38PM +0100, Nicolas Saenz Julienne wrote:
> On Tue, 2020-11-10 at 18:17 +0000, Catalin Marinas wrote:
> > On Fri, Nov 06, 2020 at 07:46:29PM +0100, Nicolas Saenz Julienne wrote:
> > > On Thu, 2020-11-05 at 16:11 +, James Morse wrote:
> > > > On 03/11/2020 17:31, Nicolas Saenz Julienne wrote:
> > > > > crashkernel might reserve memory located in ZONE_DMA. We plan to delay
> > > > > ZONE_DMA's initialization after unflattening the devicetree and ACPI's
> > > > > boot table initialization, so move it later in the boot process.
> > > > > Specifically into mem_init(), this is the last place crashkernel will 
> > > > > be
> > > > > able to reserve the memory before the page allocator kicks in.
> > > > > There
> > > > > isn't any apparent reason for doing this earlier.
> > > > 
> > > > It's so that map_mem() can carve it out of the linear/direct map.
> > > > This is so that stray writes from a crashing kernel can't accidentally 
> > > > corrupt the kdump
> > > > kernel. We depend on this if we continue with kdump, but failed to 
> > > > offline all the other
> > > > CPUs.
> > > 
> > > I presume here you refer to arch_kexec_protect_crashkres(), IIUC this 
> > > will only
> > > happen further down the line, after having loaded the kdump kernel image. 
> > > But
> > > it also depends on the mappings to be PAGE sized (flags == 
> > > NO_BLOCK_MAPPINGS |
> > > NO_CONT_MAPPINGS).
> > 
> > IIUC, arch_kexec_protect_crashkres() is only for the crashkernel image,
> > not the whole reserved memory that the crashkernel will use. For the
> > latter, we avoid the linear map by marking it as nomap in map_mem().
> 
> I'm not sure we're on the same page here, so sorry if this was already 
> implied.
> 
> The crashkernel memory mapping is bypassed while preparing the linear mappings
> but it is then mapped right away, with page granularity and !MTE.
> See paging_init()->map_mem():
> 
>   /*
>* Use page-level mappings here so that we can shrink the region
>* in page granularity and put back unused memory to buddy system
>* through /sys/kernel/kexec_crash_size interface.
>*/
>   if (crashk_res.end) {
>   __map_memblock(pgdp, crashk_res.start, crashk_res.end + 1,
>  PAGE_KERNEL,
>  NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
>   memblock_clear_nomap(crashk_res.start,
>        resource_size(&crashk_res));
>   }
> 
> IIUC the inconvenience here is that we need special mapping options for
> crashkernel and updating those after having mapped that memory as regular
> memory isn't possible/easy to do.

You are right, it still gets mapped but with page granularity. However,
to James' point, we still need to know the crashkernel range in
map_mem() as arch_kexec_protect_crashkres() relies on having page rather
than block mappings.

> > > > We also depend on this when skipping the checksum code in purgatory, 
> > > > which can be
> > > > exceedingly slow.
> > > 
> > > This one I don't fully understand, so I'll lazily assume the prerequisite 
> > > is
> > > the same WRT how memory is mapped. :)
> > > 
> > > Ultimately there's also /sys/kernel/kexec_crash_size's handling. Same
> > > prerequisite.
> > > 
> > > Keeping in mind acpi_table_upgrade() and unflatten_device_tree() depend on
> > > having the linear mappings available.
> > 
> > So it looks like reserve_crashkernel() wants to reserve memory before
> > setting up the linear map with the information about the DMA zones in
> > place but that comes later when we can parse the firmware tables.
> > 
> > I wonder, instead of not mapping the crashkernel reservation, can we not
> > do an arch_kexec_protect_crashkres() for the whole reservation after we
> > created the linear map?
> 
> arch_kexec_protect_crashkres() depends on __change_memory_common() which
> ultimately depends on the memory to be mapped with PAGE_SIZE pages. As I
> comment above, the trick would work as long as there is as way to update the
> linear mappings with whatever crashkernel needs later in the boot process.

Breaking block mappings into pages is a lot more difficult later. OTOH,
the default these days is rodata_full==true, so I don't think we have
block mappings anyway. We could add NO_BLOCK_MAPPINGS if KEXEC_CORE is
enabled.


Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-19 Thread Catalin Marinas
On Thu, Nov 19, 2020 at 06:25:29PM +0100, Nicolas Saenz Julienne wrote:
> On Thu, 2020-11-19 at 17:10 +0000, Catalin Marinas wrote:
> > On Thu, Nov 19, 2020 at 03:09:58PM +0100, Nicolas Saenz Julienne wrote:
> > > On Fri, 2020-11-13 at 11:29 +0000, Catalin Marinas wrote:
> > > [...]
> > > > > > > Let me stress that knowing the DMA constraints in the system 
> > > > > > > before reserving
> > > > > > > crashkernel's regions is necessary if we ever want it to work 
> > > > > > > seamlessly on all
> > > > > > > platforms. Be it small stuff like the Raspberry Pi or huge 
> > > > > > > servers with TB of
> > > > > > > memory.
> > > > > > 
> > > > > > Indeed. So we have 3 options (so far):
> > > > > > 
> > > > > > 1. Allow the crashkernel reservation to go into the linear map but 
> > > > > > set
> > > > > >it to invalid once allocated.
> > > > > > 
> > > > > > 2. Parse the flattened DT (not sure what we do with ACPI) before
> > > > > >creating the linear map. We may have to rely on some SoC ID here
> > > > > >instead of actual DMA ranges.
> > > > > > 
> > > > > > 3. Assume the smallest ZONE_DMA possible on arm64 (1GB) for 
> > > > > > crashkernel
> > > > > >reservations and not rely on arm64_dma_phys_limit in
> > > > > >reserve_crashkernel().
> > > > > > 
> > > > > > I think (2) we tried hard to avoid. Option (3) brings us back to the
> > > > > > issues we had on large crashkernel reservations regressing on some
> > > > > > platforms (though it's been a while since, they mostly went quiet 
> > > > > > ;)).
> > > > > > However, with Chen's crashkernel patches we end up with two
> > > > > > reservations, one in the low DMA zone and one higher, potentially 
> > > > > > above
> > > > > > 4GB. Having a fixed 1GB limit wouldn't be any worse for crashkernel
> > > > > > reservations than what we have now.
> > > > > > 
> > > > > > If (1) works, I'd go for it (James knows this part better than me),
> > > > > > otherwise we can go for (3).
> > > > > 
> > > > > Overall, I'd prefer (1) as well, and I'd be happy to have a go
> > > > > it. If not
> > > > > I'll append (3) in this series.
> > > > 
> > > > I think for 1 we could also remove the additional KEXEC_CORE checks,
> > > > something like below, untested:
> > > > 
> > > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > > index 3e5a6913acc8..27ab609c1c0c 100644
> > > > --- a/arch/arm64/mm/mmu.c
> > > > +++ b/arch/arm64/mm/mmu.c
> > > > @@ -477,7 +477,8 @@ static void __init map_mem(pgd_t *pgdp)
> > > > int flags = 0;
> > > > u64 i;
> > > >  
> > > > -   if (rodata_full || debug_pagealloc_enabled())
> > > > +   if (rodata_full || debug_pagealloc_enabled() ||
> > > > +   IS_ENABLED(CONFIG_KEXEC_CORE))
> > > > flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> > > >  
> > > > /*
> > > > @@ -487,11 +488,6 @@ static void __init map_mem(pgd_t *pgdp)
> > > >  * the following for-loop
> > > >  */
> > > > memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> > > > -#ifdef CONFIG_KEXEC_CORE
> > > > -   if (crashk_res.end)
> > > > -   memblock_mark_nomap(crashk_res.start,
> > > > -   resource_size(&crashk_res));
> > > > -#endif
> > > >  
> > > > /* map all the memory banks */
> > > > for_each_mem_range(i, &start, &end) {
> > > > @@ -518,21 +514,6 @@ static void __init map_mem(pgd_t *pgdp)
> > > > __map_memblock(pgdp, kernel_start, kernel_end,
> > > >PAGE_KERNEL, NO_CONT_MAPPINGS);
> > > > memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
> > > > -
> > > > -#ifdef CONFIG_KEXEC_CORE
> > > > -   /*
> > > > -* Use page-level mappings here so that we can shrink the region

Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-19 Thread Catalin Marinas
On Thu, Nov 19, 2020 at 03:09:58PM +0100, Nicolas Saenz Julienne wrote:
> On Fri, 2020-11-13 at 11:29 +0000, Catalin Marinas wrote:
> [...]
> > > > > Let me stress that knowing the DMA constraints in the system before 
> > > > > reserving
> > > > > crashkernel's regions is necessary if we ever want it to work 
> > > > > seamlessly on all
> > > > > platforms. Be it small stuff like the Raspberry Pi or huge servers 
> > > > > with TB of
> > > > > memory.
> > > > 
> > > > Indeed. So we have 3 options (so far):
> > > > 
> > > > 1. Allow the crashkernel reservation to go into the linear map but set
> > > >it to invalid once allocated.
> > > > 
> > > > 2. Parse the flattened DT (not sure what we do with ACPI) before
> > > >creating the linear map. We may have to rely on some SoC ID here
> > > >instead of actual DMA ranges.
> > > > 
> > > > 3. Assume the smallest ZONE_DMA possible on arm64 (1GB) for crashkernel
> > > >reservations and not rely on arm64_dma_phys_limit in
> > > >reserve_crashkernel().
> > > > 
> > > > I think (2) we tried hard to avoid. Option (3) brings us back to the
> > > > issues we had on large crashkernel reservations regressing on some
> > > > platforms (though it's been a while since, they mostly went quiet ;)).
> > > > However, with Chen's crashkernel patches we end up with two
> > > > reservations, one in the low DMA zone and one higher, potentially above
> > > > 4GB. Having a fixed 1GB limit wouldn't be any worse for crashkernel
> > > > reservations than what we have now.
> > > > 
> > > > If (1) works, I'd go for it (James knows this part better than me),
> > > > otherwise we can go for (3).
> > > 
> > > Overall, I'd prefer (1) as well, and I'd be happy to have a go at it. If 
> > > not
> > > I'll append (3) in this series.
> > 
> > I think for 1 we could also remove the additional KEXEC_CORE checks,
> > something like below, untested:
> > 
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 3e5a6913acc8..27ab609c1c0c 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -477,7 +477,8 @@ static void __init map_mem(pgd_t *pgdp)
> > int flags = 0;
> > u64 i;
> >  
> > -   if (rodata_full || debug_pagealloc_enabled())
> > +   if (rodata_full || debug_pagealloc_enabled() ||
> > +   IS_ENABLED(CONFIG_KEXEC_CORE))
> > flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> >  
> > /*
> > @@ -487,11 +488,6 @@ static void __init map_mem(pgd_t *pgdp)
> >  * the following for-loop
> >  */
> > memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
> > -#ifdef CONFIG_KEXEC_CORE
> > -   if (crashk_res.end)
> > -   memblock_mark_nomap(crashk_res.start,
> > -   resource_size(&crashk_res));
> > -#endif
> >  
> > /* map all the memory banks */
> > for_each_mem_range(i, &start, &end) {
> > @@ -518,21 +514,6 @@ static void __init map_mem(pgd_t *pgdp)
> > __map_memblock(pgdp, kernel_start, kernel_end,
> >PAGE_KERNEL, NO_CONT_MAPPINGS);
> > memblock_clear_nomap(kernel_start, kernel_end - kernel_start);
> > -
> > -#ifdef CONFIG_KEXEC_CORE
> > -   /*
> > -* Use page-level mappings here so that we can shrink the region
> > -* in page granularity and put back unused memory to buddy system
> > -* through /sys/kernel/kexec_crash_size interface.
> > -*/
> > -   if (crashk_res.end) {
> > -   __map_memblock(pgdp, crashk_res.start, crashk_res.end + 1,
> > -  PAGE_KERNEL,
> > -  NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS);
> > -   memblock_clear_nomap(crashk_res.start,
> > -  resource_size(&crashk_res));
> > -   }
> > -#endif
> >  }
> >  
> >  void mark_rodata_ro(void)
> 
> So as far as I'm concerned this is good enough for me. I took the time to
> properly test crashkernel on RPi4 using the series, this patch, and another
> small fix to properly update /proc/iomem.
> 
> I'll send v7 soon, but before, James (or anyone for that matter) any obvious
> push-back to Catalin's solution?

I talked to James earlier and he was suggesting that we check the
command line for any crashkernel reservations and only disable block
mappings in that case.

Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-19 Thread Catalin Marinas
On Thu, Nov 19, 2020 at 05:10:49PM +, Catalin Marinas wrote:
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ed71b1c305d7..acdec0c67d3b 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -469,6 +469,21 @@ void __init mark_linear_text_alias_ro(void)
>   PAGE_KERNEL_RO);
>  }
>  
> +static bool crash_mem_map __initdata;
> +
> +static int __init enable_crash_mem_map(char *arg)
> +{
> + /*
> +  * Proper parameter parsing is done by reserve_crashkernel(). We only
> +  * need to know if the linear map has to avoid block mappings so that
> +  * the crashkernel reservations can be unmapped later.
> +  */
> + crash_mem_map = false;

It should be set to true.

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 0/7] arm64: Default to 32-bit wide ZONE_DMA

2020-11-03 Thread Catalin Marinas
On Tue, Nov 03, 2020 at 06:00:33PM +0100, Nicolas Saenz Julienne wrote:
> On Fri, 2020-10-30 at 18:11 +0000, Catalin Marinas wrote:
> > On Thu, Oct 29, 2020 at 06:25:43PM +0100, Nicolas Saenz Julienne wrote:
> > > Ard Biesheuvel (1):
> > >   arm64: mm: Set ZONE_DMA size based on early IORT scan
> > > 
> > > Nicolas Saenz Julienne (6):
> > >   arm64: mm: Move reserve_crashkernel() into mem_init()
> > >   arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
> > >   of/address: Introduce of_dma_get_max_cpu_address()
> > >   of: unittest: Add test for of_dma_get_max_cpu_address()
> > >   arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
> > >   mm: Remove examples from enum zone_type comment
> > 
> > Thanks for putting this together. I had a minor comment but the patches
> > look fine to me. We still need an ack from Rob on the DT patch and I can
> > queue the series for 5.11.
> 
> I'm preparing a v6 unifying both functions as you suggested.
> 
> > Could you please also test the patch below on top of this series? It's
> > the removal of the implied DMA offset in the max_zone_phys()
> > calculation.
> 
> Yes, happily. Comments below.
> 
> > --8<-
> > From 3ae252d888be4984a612236124f5b099e804c745 Mon Sep 17 00:00:00 2001
> > From: Catalin Marinas 
> > Date: Fri, 30 Oct 2020 18:07:34 +
> > Subject: [PATCH] arm64: Ignore any DMA offsets in the max_zone_phys()
> >  calculation
> > 
> > Currently, the kernel assumes that if RAM starts above 32-bit (or
> > zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
> > such constrained devices have a hardwired DMA offset. In practice, we
> > haven't noticed any such hardware so let's assume that we can expand
> > ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
> > ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
> > zone_bits.
> > 
> > Signed-off-by: Catalin Marinas 
> > ---
> >  arch/arm64/mm/init.c | 17 -
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 095540667f0f..362160e16fb2 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -175,14 +175,21 @@ static void __init reserve_elfcorehdr(void)
> >  #endif /* CONFIG_CRASH_DUMP */
> >  
> >  /*
> > - * Return the maximum physical address for a zone with a given address size
> > - * limit. It currently assumes that for memory starting above 4G, 32-bit
> > - * devices will use a DMA offset.
> > + * Return the maximum physical address for a zone accessible by the given 
> > bits
> > + * limit. If the DRAM starts above 32-bit, expand the zone to the maximum
> > + * available memory, otherwise cap it at 32-bit.
> >   */
> >  static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> >  {
> > -   phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
> > -   return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
> > +   phys_addr_t zone_mask = (1ULL << zone_bits) - 1;
> 
> Maybe use DMA_BIT_MASK(), instead of the manual calculation?

Yes.

> 
> > +   phys_addr_t phys_start = memblock_start_of_DRAM();
> > +
> > +   if (!(phys_start & U32_MAX))
> 
> I'd suggest using 'bigger than' instead of masks. Just to cover ourselves
> against memory starting at odd locations. Also it'll behave properly when
> phys_start is zero (this breaks things on RPi4).

Good point.

> > +   zone_mask = PHYS_ADDR_MAX;
> > +   else if (!(phys_start & zone_mask))
> > +   zone_mask = U32_MAX;
> > +
> > +   return min(zone_mask + 1, memblock_end_of_DRAM());
> 
> This + 1 isn't going to play well when zone_mask is PHYS_ADDR_MAX.

You are right on PHYS_ADDR_MAX overflowing but I'd keep the +1 since
memblock_end_of_DRAM() returns the first byte past the accessible range
(so exclusive end).

I'll tweak this function a bit to avoid the overflow or use the
arm64-specific PHYS_MASK (that's never going to be the full 64 bits).
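
I.e. something like this as a sketch, with DMA_BIT_MASK() as you suggested and
the +1 applied after the min() so PHYS_ADDR_MAX cannot overflow (meant as a
drop-in for max_zone_phys() in arch/arm64/mm/init.c, untested):

static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
{
        phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
        phys_addr_t phys_start = memblock_start_of_DRAM();

        if (phys_start > U32_MAX)
                zone_mask = PHYS_ADDR_MAX;
        else if (phys_start > zone_mask)
                zone_mask = U32_MAX;

        return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
}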

Thanks.

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 6/7] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-30 Thread Catalin Marinas
On Thu, Oct 29, 2020 at 06:25:49PM +0100, Nicolas Saenz Julienne wrote:
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 9929ff50c0c0..05fe4a076bab 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1718,3 +1718,55 @@ void __init acpi_iort_init(void)
>  
>   iort_init_platform_devices();
>  }
> +
> +#ifdef CONFIG_ZONE_DMA
> +/*
> + * Check the IORT whether any devices exist whose DMA mask is < 32 bits.
> + * If so, return the smallest value encountered, or 32 otherwise.
> + */
> +unsigned int __init acpi_iort_get_zone_dma_size(void)
> +{
> + struct acpi_table_iort *iort;
> + struct acpi_iort_node *node, *end;
> + acpi_status status;
> + u8 limit = 32;

Nitpick: can we not return PHYS_ADDR_MAX here, for consistency with
of_dma_get_max_cpu_address()? There wouldn't be any functional change.

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 0/7] arm64: Default to 32-bit wide ZONE_DMA

2020-10-30 Thread Catalin Marinas
On Thu, Oct 29, 2020 at 06:25:43PM +0100, Nicolas Saenz Julienne wrote:
> Ard Biesheuvel (1):
>   arm64: mm: Set ZONE_DMA size based on early IORT scan
> 
> Nicolas Saenz Julienne (6):
>   arm64: mm: Move reserve_crashkernel() into mem_init()
>   arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
>   of/address: Introduce of_dma_get_max_cpu_address()
>   of: unittest: Add test for of_dma_get_max_cpu_address()
>   arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
>   mm: Remove examples from enum zone_type comment

Thanks for putting this together. I had a minor comment but the patches
look fine to me. We still need an ack from Rob on the DT patch and I can
queue the series for 5.11.

Could you please also test the patch below on top of this series? It's
the removal of the implied DMA offset in the max_zone_phys()
calculation.

--8<-
From 3ae252d888be4984a612236124f5b099e804c745 Mon Sep 17 00:00:00 2001
From: Catalin Marinas 
Date: Fri, 30 Oct 2020 18:07:34 +
Subject: [PATCH] arm64: Ignore any DMA offsets in the max_zone_phys()
 calculation

Currently, the kernel assumes that if RAM starts above 32-bit (or
zone_bits), there is still a ZONE_DMA/DMA32 at the bottom of the RAM and
such constrained devices have a hardwired DMA offset. In practice, we
haven't noticed any such hardware so let's assume that we can expand
ZONE_DMA32 to the available memory if no RAM below 4GB. Similarly,
ZONE_DMA is expanded to the 4GB limit if no RAM addressable by
zone_bits.

Signed-off-by: Catalin Marinas 
---
 arch/arm64/mm/init.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 095540667f0f..362160e16fb2 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -175,14 +175,21 @@ static void __init reserve_elfcorehdr(void)
 #endif /* CONFIG_CRASH_DUMP */
 
 /*
- * Return the maximum physical address for a zone with a given address size
- * limit. It currently assumes that for memory starting above 4G, 32-bit
- * devices will use a DMA offset.
+ * Return the maximum physical address for a zone accessible by the given bits
+ * limit. If the DRAM starts above 32-bit, expand the zone to the maximum
+ * available memory, otherwise cap it at 32-bit.
  */
 static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
 {
-   phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
-   return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
+   phys_addr_t zone_mask = (1ULL << zone_bits) - 1;
+   phys_addr_t phys_start = memblock_start_of_DRAM();
+
+   if (!(phys_start & U32_MAX))
+   zone_mask = PHYS_ADDR_MAX;
+   else if (!(phys_start & zone_mask))
+   zone_mask = U32_MAX;
+
+   return min(zone_mask + 1, memblock_end_of_DRAM());
 }
 
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 5/7] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-22 Thread Catalin Marinas
On Wed, Oct 21, 2020 at 02:34:35PM +0200, Nicolas Saenz Julienne wrote:
> @@ -188,9 +186,11 @@ static phys_addr_t __init max_zone_phys(unsigned int 
> zone_bits)
>  static void __init zone_sizes_init(unsigned long min, unsigned long max)
>  {
>   unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> + unsigned int __maybe_unused dt_zone_dma_bits;
>  
>  #ifdef CONFIG_ZONE_DMA
> - zone_dma_bits = ARM64_ZONE_DMA_BITS;
> + dt_zone_dma_bits = ilog2(of_dma_get_max_cpu_address(NULL));
> + zone_dma_bits = min(32U, dt_zone_dma_bits);

A thought: can we remove the min here and expand ZONE_DMA to whatever
dt_zone_dma_bits says? More on this below.

>   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
>   max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif

I was talking earlier to Ard and Robin about the ZONE_DMA32 history and
the need for max_zone_phys(). This was rather theoretical: the Seattle
platform has all RAM starting above 4GB, which originally led to an
empty ZONE_DMA32. The max_zone_phys() hack was meant to lift ZONE_DMA32
into the bottom of the RAM, on the assumption that such 32-bit devices
would have a hardwired DMA offset. We are not aware of any such case on
arm64 systems, and even on Seattle, IIUC, 32-bit devices only work if
they are behind an SMMU (so no hardwired offset).

In hindsight, it would have made more sense on platforms with RAM above
4GB to expand ZONE_DMA32 to cover the whole memory (so empty
ZONE_NORMAL). Something like:

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a53c1e0fb017..7d5e3dd85617 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -187,8 +187,12 @@ static void __init reserve_elfcorehdr(void)
  */
 static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
 {
-   phys_addr_t offset = memblock_start_of_DRAM() & GENMASK_ULL(63, zone_bits);
-   return min(offset + (1ULL << zone_bits), memblock_end_of_DRAM());
+   phys_addr_t zone_mask = 1ULL << zone_bits;
+
+   if (!(memblock_start_of_DRAM() & zone_mask))
+       zone_mask = PHYS_ADDR_MAX;
+
+   return min(zone_mask, memblock_end_of_DRAM());
 }
 
 static void __init zone_sizes_init(unsigned long min, unsigned long max)

I don't think this makes any difference for ZONE_DMA unless a
broken DT or IORT reports the max CPU address below the start of DRAM.

There's a minor issue if of_dma_get_max_cpu_address() matches
memblock_end_of_DRAM() but they are not a power of 2. We'd be left with
a bit of RAM at the end in ZONE_NORMAL due to ilog2 truncation.
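
A made-up example of that truncation, as a userspace sketch (the 6GB
figure is an assumption; ilog2_u64() just mirrors ilog2() for the
illustration):

#include <stdint.h>
#include <stdio.h>

/* floor(log2(v)), standing in for the kernel's ilog2() */
static unsigned int ilog2_u64(uint64_t v)
{
	return 63 - __builtin_clzll(v);
}

int main(void)
{
	/* assumed: DT max CPU address == end of DRAM == 6GB, not a power of 2 */
	uint64_t limit = 0x180000000ULL;
	unsigned int bits = ilog2_u64(limit);	/* truncates to 32 */

	printf("zone_dma_bits = %u, ZONE_DMA ends at %#llx, %#llx left in ZONE_NORMAL\n",
	       bits, 1ULL << bits,
	       (unsigned long long)(limit - (1ULL << bits)));
	return 0;
}

i.e. 2GB of the 6GB would end up in ZONE_NORMAL even though the whole
range is DMA-addressable.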

-- 
Catalin


Re: [PATCH 1/1] dma: coherent: check no-map property for arm64

2021-06-14 Thread Catalin Marinas
On Mon, Jun 14, 2021 at 06:07:04PM +0800, Dong Aisheng wrote:
> On Mon, Jun 14, 2021 at 4:36 PM Will Deacon  wrote:
> > On Fri, Jun 11, 2021 at 09:10:56PM +0800, Dong Aisheng wrote:
> > > Coherent dma on ARM64 also can't work with mapped system ram,
> > > that means 'no-map' property must be specified in dts.
> > > Add the missing check for ARM64 platforms as well.
> > > Besides 'no-map' checking, 'linux,dma-default' feature is also
> > > enabled for ARM64 along with this patch.
> >
> > Please can you explain _why_ it can't work? We don't need to tear down
> > aliases from the linear map for the streaming DMA API, so why is this
> > case different? Also, coherent devices wouldn't need this either way,
> > would they? What problem are you solving here?
> >
> 
> Not sure if i get your point correctly. Here is my understanding. (fix
> me if wrong)
> In current implementation, the coherent dma memory will be remapped as
> writecombine and uncached type which can't reuse the linear mapping.
> The prerequisite to do this is the memory must not be mapped System RAM.
> e.g. reserved memory with no-map property and invisible to the buddy system.

The architecture allows the system RAM to be mapped in the linear map
while there's another writecombine alias, as long as there are no dirty
cache lines that could be evicted randomly. This works fine with the DMA
API (and we have some cache maintenance when the non-cacheable mapping
is first created).

Looking at the rmem_dma_device_init() -> dma_init_coherent_memory(), it
ends up calling memremap(MEMREMAP_WC) which would warn if it intersects
with system RAM regardless of the architecture. If the memory region is
nomap, it doesn't end up as IORESOURCE_SYSTEM_RAM, so memremap() won't
warn. But why is this specific only to arm (or arm64)?
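
For reference, the check I'm referring to looks roughly like this (a
paraphrased sketch of kernel/iomem.c from memory, not a verbatim
quote):

/* Paraphrased sketch of memremap(); exact code and messages may differ. */
void *memremap(resource_size_t offset, size_t size, unsigned long flags)
{
	int is_ram = region_intersects(offset, size,
				       IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);
	void *addr = NULL;

	/* ... MEMREMAP_WB handling elided ... */

	/*
	 * WC/WT requested on a range that intersects System RAM: warn and
	 * fail. A nomap region is not IORESOURCE_SYSTEM_RAM, so it never
	 * hits this.
	 */
	if (!addr && is_ram == REGION_INTERSECTS && !(flags & MEMREMAP_WB)) {
		WARN_ONCE(1, "memremap attempted on ram %pa size: %#lx\n",
			  &offset, (unsigned long)size);
		return NULL;
	}

	/* ... ioremap_wc()/ioremap_wt() fallbacks elided ... */
	return addr;
}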

Is the "shared-dma-pool" property only meant for Normal Non-cacheable
memory (hence the MEMREMAP_WC flag)? If a system is fully cache
coherent, does this check still make sense or the DT is not supposed to
have such nodes?

> This seems a little different from CMA which the memory is still
> underlying managed by the buddy system in order to support migration.
> 
> The patch here does not resolve a real issue but just open the sanity check for
> ARM64 case as well as ARM  which reports the issue a little bit earlier at
> rmem_dma_setup() time.

I think we first need to figure out what the real issue is and then try
to solve it.

-- 
Catalin


Re: [PATCH] mm: kmemleak: Ignore kmemleak scanning on CMA regions

2021-12-02 Thread Catalin Marinas
On Sun, Nov 28, 2021 at 09:50:53AM +0800, Calvin Zhang wrote:
> On Sat, Nov 27, 2021 at 04:07:18PM -0800, Andrew Morton wrote:
> >On Fri, 26 Nov 2021 10:47:11 +0800 Calvin Zhang  wrote:
> >> Just like this:
> >> commit 620951e27457 ("mm/cma: make kmemleak ignore CMA regions").
> >> 
> >> Add kmemleak_ignore_phys() for CMA created from of reserved node.
[...]
> >The 620951e27457 changelog says "Without this, the kernel crashes...". 
> >Does your patch also fix a crash?  If so under what circumstances and
> >should we backport this fix into -stable kernels?
> 
> No crash occurred. 620951e27457 avoids crashes caused by accessing
> highmem and it was fixed later. Now kmemleak_alloc_phys() and
> kmemleak_ignore_phys() skip highmem. This patch is based on the
> point that CMA regions don't contain pointers to other kmemleak
> objects, and ignores CMA regions from reserved memory as what
> 620951e27457 did.

Note that kmemleak_ignore() only works if there was a prior
kmemleak_alloc() on that address range. With the previous commit we get
this via memblock_alloc_range(), but I fail to see one on the
rmem_cma_setup() path.
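
To illustrate the point (a hedged fragment only; signatures are from
memory and may differ between kernel versions):

/*
 * kmemleak can only ignore an object it already tracks, so the ignore
 * must follow a corresponding alloc covering the same range, e.g.:
 */
kmemleak_alloc_phys(rmem->base, rmem->size, 0, 0);	/* range becomes a tracked object */
kmemleak_ignore_phys(rmem->base);			/* ... which can then be ignored  */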

-- 
Catalin