Re: [PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-14 Thread Christoph Hellwig
> +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> +{
> + phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
> + struct of_range_parser parser;
> + phys_addr_t subtree_max_addr;
> + struct device_node *child;
> + phys_addr_t cpu_end = 0;
> + struct of_range range;
> + const __be32 *ranges;
> + int len;
> +
> + if (!np)
> + np = of_root;

Requiring of_root to be passed explicitly would seem more natural
to me than the magic NULL argument.  There doesn't seem to be any
precedent for that kind of calling convention either.
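
For illustration, a minimal sketch of the calling convention being suggested
(hypothetical, not part of the posted series): drop the NULL fallback and have
callers pass of_root themselves, e.g.

	/* of/address.c: no implicit default */
	phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
	{
		phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
		/* ... walk np's dma-ranges and children as before ... */
		return max_cpu_addr;
	}

	/* caller side, e.g. arm64's zone_sizes_init() */
	max_addr = of_dma_get_max_cpu_address(of_root);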
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 8/8] mm: Update DMA zones description

2020-10-14 Thread Christoph Hellwig
On Wed, Oct 14, 2020 at 09:12:10PM +0200, Nicolas Saenz Julienne wrote:
> The default behavior for arm64 changed, so reflect that.
> 
> Signed-off-by: Nicolas Saenz Julienne 
> Acked-by: Catalin Marinas 
> ---
>  include/linux/mmzone.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fb3bf696c05e..4ee2306351b9 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -363,8 +363,9 @@ enum zone_type {
>*  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
>*the specific device.
>*
> -  *  - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
> -  *lower 4G.
> +  *  - arm64 uses a single 4GB ZONE_DMA, except on the Raspberry Pi 4,
> +  *in which ZONE_DMA covers the first GB and ZONE_DMA32 the rest of
> +  *the lower 4GB.

Honestly I think this comment just needs to go away.  We can't really list
every setup in a comment in common code.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 6/8] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-14 Thread Christoph Hellwig
On Wed, Oct 14, 2020 at 09:12:08PM +0200, Nicolas Saenz Julienne wrote:
> + zone_dma_bits = min(zone_dma_bits,
> + (unsigned int)ilog2(of_dma_get_max_cpu_address(NULL)));

Please avoid pointlessly long lines.  Especially if it is completely trivial
by using either min_t or not overindenting like here.
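
For reference, a sketch of the shorter form being asked for, using the min_t()
helper from linux/kernel.h (an illustration only, not the actual respin):

	zone_dma_bits = min_t(unsigned int, zone_dma_bits,
			      ilog2(of_dma_get_max_cpu_address(NULL)));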
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 5/8] dma-direct: Turn zone_dma_bits default value into a define

2020-10-14 Thread Christoph Hellwig
On Wed, Oct 14, 2020 at 09:12:07PM +0200, Nicolas Saenz Julienne wrote:
> Set the zone_dma_bits default value through a define so that architectures
> can override it with their own default value.

Architectures can do that already by assigning a value to zone_dma_bits
at runtime.  I really do not want to add the extra clutter.
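
As a concrete example of the existing mechanism being referred to (a sketch
only): an architecture simply assigns the variable in its early init path,
before the zone sizes are computed, which is what arm64 already does in this
series:

	/* arch code, e.g. in zone_sizes_init() */
	#ifdef CONFIG_ZONE_DMA
		zone_dma_bits = 30;	/* arch-specific default, e.g. ARM64_ZONE_DMA_BITS */
	#endif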
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 3/3] iommu/tegra-smmu: Add PCI support

2020-10-14 Thread Nicolin Chen
On Wed, Oct 14, 2020 at 06:42:36PM +0100, Robin Murphy wrote:
> On 2020-10-09 17:19, Nicolin Chen wrote:
> > This patch simply adds support for PCI devices.
> > 
> > Reviewed-by: Dmitry Osipenko 
> > Tested-by: Dmitry Osipenko 
> > Signed-off-by: Nicolin Chen 
> > ---
> > 
> > Changelog
> > v6->v7
> >   * Renamed goto labels, suggested by Thierry.
> > v5->v6
> >   * Added Dmitry's Reviewed-by and Tested-by.
> > v4->v5
> >   * Added Dmitry's Reviewed-by
> > v3->v4
> >   * Dropped !iommu_present() check
> >   * Added CONFIG_PCI check in the exit path
> > v2->v3
> >   * Replaced ternary conditional operator with if-else in .device_group()
> >   * Dropped change in tegra_smmu_remove()
> > v1->v2
> >   * Added error-out labels in tegra_smmu_probe()
> >   * Dropped pci_request_acs() since IOMMU core would call it.
> > 
> >   drivers/iommu/tegra-smmu.c | 35 +--
> >   1 file changed, 25 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> > index be29f5977145..2941d6459076 100644
> > --- a/drivers/iommu/tegra-smmu.c
> > +++ b/drivers/iommu/tegra-smmu.c
> > @@ -10,6 +10,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -865,7 +866,11 @@ static struct iommu_group 
> > *tegra_smmu_device_group(struct device *dev)
> > group->smmu = smmu;
> > group->soc = soc;
> > -   group->group = iommu_group_alloc();
> > +   if (dev_is_pci(dev))
> > +   group->group = pci_device_group(dev);
> 
> Just to check, is it OK to have two or more swgroups "owning" the same
> iommu_group if an existing one gets returned here? It looks like that might
> not play nice with the use of iommu_group_set_iommudata().

Do you mean by "gets returned here" the "IS_ERR" check below?

> Robin.
> 
> > +   else
> > +   group->group = generic_device_group(dev);
> > +
> > if (IS_ERR(group->group)) {
> > devm_kfree(smmu->dev, group);
> > mutex_unlock(&smmu->lock);
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/amd: Increase interrupt remapping table limit to 512 entries

2020-10-14 Thread Suravee Suthikulpanit
Certain device drivers allocate IO queues on a per-CPU basis.
On the AMD EPYC platform, which can support up to 256 CPU threads,
this can exceed the current MAX_IRQS_PER_TABLE limit of 256,
and result in the error message:

AMD-Vi: Failed to allocate IRTE

This has been observed with certain NVME devices.

AMD IOMMU hardware can actually support up to 512 interrupt
remapping table entries. Therefore, update the driver to
match the hardware limit.

Please note that this also increases the size of interrupt remapping
table to 8KB per device when using the 128-bit IRTE format.
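
For clarity, the 8KB figure follows directly from the numbers above (a worked
example, not part of the patch): with the 128-bit IRTE format each entry is
16 bytes, so

	512 entries * 16 bytes/entry = 8192 bytes = 8KB per device

(versus 256 * 16 = 4KB with the previous limit).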

Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd/amd_iommu_types.h | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 30a5d412255a..427484c45589 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -406,7 +406,11 @@ extern bool amd_iommu_np_cache;
 /* Only true if all IOMMUs support device IOTLBs */
 extern bool amd_iommu_iotlb_sup;
 
-#define MAX_IRQS_PER_TABLE 256
+/*
+ * AMD IOMMU hardware only supports 512 IRTEs despite
+ * the architectural limit of 2048 entries.
+ */
+#define MAX_IRQS_PER_TABLE 512
 #define IRQ_TABLE_ALIGNMENT128
 
 struct irq_remap_table {
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-14 Thread Alex Williamson
On Wed, 14 Oct 2020 03:08:31 +
"Tian, Kevin"  wrote:

> > From: Jason Wang 
> > Sent: Tuesday, October 13, 2020 2:22 PM
> > 
> > 
> > On 2020/10/12 4:38 PM, Tian, Kevin wrote:  
> > >> From: Jason Wang 
> > >> Sent: Monday, September 14, 2020 12:20 PM
> > >>  
> > > [...]  
> > >   > If it's possible, I would suggest a generic uAPI instead of a VFIO
> > >> specific one.
> > >>
> > >> Jason suggest something like /dev/sva. There will be a lot of other
> > >> subsystems that could benefit from this (e.g vDPA).
> > >>
> > >> Have you ever considered this approach?
> > >>  
> > > Hi, Jason,
> > >
> > > We did some study on this approach and below is the output. It's a
> > > long write-up, but I didn't find a way to abstract it further without
> > > losing necessary context. Sorry about that.
> > >
> > > Overall the real purpose of this series is to enable IOMMU nested
> > > translation capability with vSVA as one major usage, through
> > > below new uAPIs:
> > >   1) Report/enable IOMMU nested translation capability;
> > >   2) Allocate/free PASID;
> > >   3) Bind/unbind guest page table;
> > >   4) Invalidate IOMMU cache;
> > >   5) Handle IOMMU page request/response (not in this series);
> > > 1/3/4) is the minimal set for using IOMMU nested translation, with
> > > the other two optional. For example, the guest may enable vSVA on
> > > a device without using PASID. Or, it may bind its gIOVA page table
> > > which doesn't require page fault support. Finally, all operations can
> > > be applied to either physical device or subdevice.
> > >
> > > Then we evaluated, for each uAPI, whether generalizing it is a good
> > > thing both in concept and with regard to complexity.
> > >
> > > First, unlike other uAPIs which are all backed by iommu_ops, PASID
> > > allocation/free is through the IOASID sub-system.  
> > 
> > 
> > A question here, is IOASID expected to be the single management
> > interface for PASID?  
> 
> yes
> 
> > 
> > (I'm asking since there are already vendor-specific IDA-based PASID
> > allocators, e.g. amdgpu_pasid_alloc())  
> 
> That code predates the introduction of the IOASID core. I think it
> should be changed to use the new generic interface. Jacob/Jean can
> better comment if other reasons exist for this exception.
> 
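
[For context, a rough sketch of the generic IOASID interface being referred
to; the exact signatures may differ by kernel version, and the amdgpu
conversion itself is hypothetical:

	#include <linux/ioasid.h>

	/* allocate a PASID from the shared pool instead of a private IDA */
	ioasid_t pasid = ioasid_alloc(NULL, 1, max_pasid, private_data);
	if (pasid == INVALID_IOASID)
		return -ENOSPC;
	/* ... */
	ioasid_free(pasid);
]
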
> > 
> >   
> > >   From this angle
> > > we feel generalizing PASID management does make some sense.
> > > First, PASID is just a number and not related to any device before
> > > it's bound to a page table and IOMMU domain. Second, PASID is a
> > > global resource (at least on Intel VT-d),  
> > 
> > 
> > I think we need a definition of "global" here. It looks to me for vt-d
> > the PASID table is per device.  
> 
> The PASID table is per device, so VT-d could in concept support
> per-device PASIDs. However, on Intel platforms we require PASIDs to be
> managed system-wide (across host and guest) when combining vSVA, SIOV,
> SR-IOV and ENQCMD together. Thus the host creates only one 'global'
> PASID namespace but does use per-device PASID tables to ensure
> isolation between devices on Intel platforms. ARM does it differently,
> as Jean explained: they have a global namespace for host processes on
> all host-owned devices (same as Intel), but a per-device namespace when
> a device (and its PASID table) is assigned to userspace.
> 
> > 
> > Another question: is it possible to have two DMAR hardware units (at
> > least I can see two even in my laptop)? In this case, is PASID still a
> > global resource?  
> 
> yes
> 
> > 
> >   
> > >   while having separate VFIO/
> > > VDPA allocation interfaces may easily cause confusion in userspace,
> > > e.g. which interface to use if both VFIO and VDPA devices exist.
> > > Moreover, a unified interface allows centralized control over how
> > > many PASIDs are allowed per process.  
> > 
> > 
> > Yes.
> > 
> >   
> > >
> > > One unclear part with this generalization is about the permission.
> > > Do we open this interface to any process or only to those which
> > > have assigned devices? If the latter, what would be the mechanism
> > > to coordinate between this new interface and specific passthrough
> > > frameworks?  
> > 
> > 
> > I'm not sure, but if you just want a permission, you probably can
> > introduce new capability (CAP_XXX) for this.
> > 
> >   
> > >   A trickier case: vSVA support on ARM (Eric/Jean
> > > please correct me) plans to use a per-device PASID namespace, which
> > > is built on a bind_pasid_table iommu callback to allow the guest to
> > > fully manage its PASIDs on a given passthrough device.  
> > 
> > 
> > I see, so I think the answer is to prepare for the namespace support
> > from the start. (btw, I don't see how namespace is handled in current
> > IOASID module?)  
> 
> The PASID table is based on GPA when nested translation is enabled 
> on ARM SMMU. This design implies that the guest manages the PASID
> table, and thus PASIDs, instead of going through a host-side API on the
> assigned device. From this angle we don't need explicit namespace in 

Re: [PATCH v3 4/8] of: unittest: Add test for of_dma_get_max_cpu_address()

2020-10-14 Thread Rob Herring
On Wed, Oct 14, 2020 at 2:12 PM Nicolas Saenz Julienne
 wrote:
>
> Introduce a test for of_dma_get_max_cpu_address(); it uses the same DT
> data as the rest of the dma-ranges unit tests.
>
> Signed-off-by: Nicolas Saenz Julienne 
> ---
>  drivers/of/unittest.c | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
> index 06cc988faf78..2cbf2a585c9f 100644
> --- a/drivers/of/unittest.c
> +++ b/drivers/of/unittest.c
> @@ -869,6 +869,25 @@ static void __init of_unittest_changeset(void)
>  #endif
>  }
>
> +static void __init of_unittest_dma_get_max_cpu_address(void)
> +{
> +#ifdef CONFIG_HAS_DMA

Can't the unittest run without this? I run the unittests under UML.
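
(For what it's worth, of_dma_get_max_cpu_address() in patch 3/8 is added after
the #endif /* CONFIG_HAS_DMA */ in drivers/of/address.c, so a sketch of the
simpler form would just drop the guard around the test body:

	static void __init of_unittest_dma_get_max_cpu_address(void)
	{
		/* no #ifdef CONFIG_HAS_DMA needed; the helper is always built */
		struct device_node *np;
		phys_addr_t cpu_addr;
		...
	}
)
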

> +   struct device_node *np;
> +   phys_addr_t cpu_addr;
> +
> +   np = of_find_node_by_path("/testcase-data/address-tests");
> +   if (!np) {
> +   pr_err("missing testcase data\n");
> +   return;
> +   }
> +
> +   cpu_addr = of_dma_get_max_cpu_address(np);
> +   unittest(cpu_addr == 0x5000ULL,
> +"of_dma_get_max_cpu_address: wrong CPU addr %pad (expecting 
> %llx)\n",
> +_addr, 0x5000ULL);
> +#endif
> +}
> +
>  static void __init of_unittest_dma_ranges_one(const char *path,
> u64 expect_dma_addr, u64 expect_paddr)
>  {
> @@ -3266,6 +3285,7 @@ static int __init of_unittest(void)
> of_unittest_changeset();
> of_unittest_parse_interrupts();
> of_unittest_parse_interrupts_extended();
> +   of_unittest_dma_get_max_cpu_address();
> of_unittest_parse_dma_ranges();
> of_unittest_pci_dma_ranges();
> of_unittest_match_node();
> --
> 2.28.0
>
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-14 Thread Rob Herring
On Wed, Oct 14, 2020 at 2:12 PM Nicolas Saenz Julienne
 wrote:
>
> Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
> physical address addressable by all DMA masters in the system. It's
> especially useful for setting memory zone sizes at early boot time.
>
> Signed-off-by: Nicolas Saenz Julienne 
>
> ---
>
> Changes since v2:
>  - Use PHYS_ADDR_MAX
>  - return phys_dma_t
>  - Rename function
>  - Correct subject
>  - Add support to start parsing from an arbitrary device node in order
>for the function to work with unit tests
>
>  drivers/of/address.c | 42 ++
>  include/linux/of.h   |  7 +++
>  2 files changed, 49 insertions(+)
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index eb9ab4f1e80b..b5a9695aaf82 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const 
> struct bus_dma_region **map)
>  }
>  #endif /* CONFIG_HAS_DMA */
>
> +/**
> + * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
> + * @np: The node to start searching from or NULL to start from the root
> + *
> + * Gets the highest CPU physical address that is addressable by all DMA 
> masters
> + * in the system (or subtree when np is non-NULL). If no DMA constrained 
> device
> + * is found, it returns PHYS_ADDR_MAX.
> + */
> +phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
> +{
> +   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;

One issue with using phys_addr_t is it may be 32-bit even though the
DT is 64-bit addresses. LPAE capable system with LPAE disabled. Maybe
the truncation is fine here? Maybe not.
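
To make the concern concrete, a hedged sketch of one way around the
truncation (hypothetical, not what the patch does): accumulate in u64 (struct
of_range's cpu_addr/size are u64) and clamp only when returning:

	u64 cpu_end = 0, max_cpu_addr = U64_MAX;

	for_each_of_range(&parser, &range)
		if (range.cpu_addr + range.size > cpu_end)
			cpu_end = range.cpu_addr + range.size;
	if (max_cpu_addr > cpu_end)
		max_cpu_addr = cpu_end;
	/* ... recurse into children the same way ... */
	return min_t(u64, max_cpu_addr, PHYS_ADDR_MAX);
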

> +   struct of_range_parser parser;
> +   phys_addr_t subtree_max_addr;
> +   struct device_node *child;
> +   phys_addr_t cpu_end = 0;
> +   struct of_range range;
> +   const __be32 *ranges;
> +   int len;
> +
> +   if (!np)
> +   np = of_root;
> +
> +   ranges = of_get_property(np, "dma-ranges", &len);

I'm not really following why you changed the algorithm here. You're
skipping disabled nodes which is good. Was there some other reason?

> +   if (ranges && len) {
> +   of_dma_range_parser_init(&parser, np);
> +   for_each_of_range(&parser, &range)
> +   if (range.cpu_addr + range.size > cpu_end)
> +   cpu_end = range.cpu_addr + range.size;
> +
> +   if (max_cpu_addr > cpu_end)
> +   max_cpu_addr = cpu_end;
> +   }
> +
> +   for_each_available_child_of_node(np, child) {
> +   subtree_max_addr = of_dma_get_max_cpu_address(child);
> +   if (max_cpu_addr > subtree_max_addr)
> +   max_cpu_addr = subtree_max_addr;
> +   }
> +
> +   return max_cpu_addr;
> +}
> +
>  /**
>   * of_dma_is_coherent - Check if device is coherent
>   * @np:device node
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 481ec0467285..db8db8f2c967 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
>const char *map_name, const char *map_mask_name,
>struct device_node **target, u32 *id_out);
>
> +phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
> +
>  #else /* CONFIG_OF */
>
>  static inline void of_core_init(void)
> @@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 
> id,
> return -EINVAL;
>  }
>
> +static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np)
> +{
> +   return PHYS_ADDR_MAX;
> +}
> +
>  #define of_match_ptr(_ptr) NULL
>  #define of_match_node(_matches, _node) NULL
>  #endif /* CONFIG_OF */
> --
> 2.28.0
>
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [git pull] IOMMU Updates for Linux v5.10

2020-10-14 Thread pr-tracker-bot
The pull request you sent on Tue, 13 Oct 2020 18:03:58 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
> tags/iommu-updates-v5.10

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/531d29b0b674036347a04c08c0898ff1aa522180

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [git pull] IOMMU Updates for Linux v5.10

2020-10-14 Thread Linus Torvalds
On Tue, Oct 13, 2020 at 9:04 AM Joerg Roedel  wrote:
>
> there is a minor conflict this time in include/linux/iommu.h which
> should be easy to resolve. I would attach my resolution, but somehow git
> [show|log] didn't show it to me.

So when a resolution takes one side over the other (as opposed to
mixing two sides), then "git show" doesn't show it as a conflict any
more.

The reason is simple: at "git merge" time, git figures out the common
base, so it has all of "base", "side1" and "side2" to look at and does
a three-way diff (for the simplified normal case). So it can see it as
a conflict, because it sees both where you came from, and where you
ended up.

But once you have resolved the conflict and committed the end result,
"git show" (and "git log") doesn't do the whole expensive "what was
the base" thing any more. "git show" just sees the parents (so "side1"
and "side2") and the end result, but doesn't really see - or care -
about the fact that some time in the distant past you also had that
"base" state.

As a result, "git show" doesn't ever really understand the notion of a
"merge conflict", and all it shows is really "whee, this end result
looks like neither side" as a kind of pseudo-conflict diff.

Anyway, thanks for describing the conflict; it was indeed not
complicated. This email is just to explain your "but somehow git
[show|log] didn't show it to me" thing.

   Linus
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 4/8] of: unittest: Add test for of_dma_get_max_cpu_address()

2020-10-14 Thread Nicolas Saenz Julienne
Introduce a test for of_dma_get_max_cpu_address(); it uses the same DT
data as the rest of the dma-ranges unit tests.

Signed-off-by: Nicolas Saenz Julienne 
---
 drivers/of/unittest.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 06cc988faf78..2cbf2a585c9f 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -869,6 +869,25 @@ static void __init of_unittest_changeset(void)
 #endif
 }
 
+static void __init of_unittest_dma_get_max_cpu_address(void)
+{
+#ifdef CONFIG_HAS_DMA
+   struct device_node *np;
+   phys_addr_t cpu_addr;
+
+   np = of_find_node_by_path("/testcase-data/address-tests");
+   if (!np) {
+   pr_err("missing testcase data\n");
+   return;
+   }
+
+   cpu_addr = of_dma_get_max_cpu_address(np);
+   unittest(cpu_addr == 0x5000ULL,
+"of_dma_get_max_cpu_address: wrong CPU addr %pad (expecting 
%llx)\n",
+_addr, 0x5000ULL);
+#endif
+}
+
 static void __init of_unittest_dma_ranges_one(const char *path,
u64 expect_dma_addr, u64 expect_paddr)
 {
@@ -3266,6 +3285,7 @@ static int __init of_unittest(void)
of_unittest_changeset();
of_unittest_parse_interrupts();
of_unittest_parse_interrupts_extended();
+   of_unittest_dma_get_max_cpu_address();
of_unittest_parse_dma_ranges();
of_unittest_pci_dma_ranges();
of_unittest_match_node();
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-14 Thread Nicolas Saenz Julienne
From: Ard Biesheuvel 

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB).

Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.

This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out that having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.

Cc: Jeremy Linton 
Cc: Lorenzo Pieralisi 
Cc: Nicolas Saenz Julienne 
Cc: Rob Herring 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Cc: Hanjun Guo 
Cc: Sudeep Holla 
Cc: Anshuman Khandual 
Signed-off-by: Ard Biesheuvel 
[nsaenz: Rebased, removed documentation change and warnings, added
declaration in acpi_iort.h]
Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/mm/init.c  |  6 +
 drivers/acpi/arm64/iort.c | 51 +++
 include/linux/acpi_iort.h |  4 +++
 3 files changed, 61 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 97b0d2768349..f321761eedb2 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -196,6 +197,11 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
 #ifdef CONFIG_ZONE_DMA
zone_dma_bits = min(zone_dma_bits,
			(unsigned int)ilog2(of_dma_get_max_cpu_address(NULL)));
+
+   if (IS_ENABLED(CONFIG_ACPI))
+   zone_dma_bits = min(zone_dma_bits,
+   acpi_iort_get_zone_dma_size());
+
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 9929ff50c0c0..8f530bf3c03b 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1718,3 +1718,54 @@ void __init acpi_iort_init(void)
 
iort_init_platform_devices();
 }
+
+#ifdef CONFIG_ZONE_DMA
+/*
+ * Check the IORT for any devices whose DMA mask is < 32 bits.
+ * If so, return the smallest value encountered, or 32 otherwise.
+ */
+unsigned int __init acpi_iort_get_zone_dma_size(void)
+{
+   struct acpi_table_iort *iort;
+   struct acpi_iort_node *node, *end;
+   acpi_status status;
+   u8 limit = 32;
+   int i;
+
+   if (acpi_disabled)
+   return limit;
+
+   status = acpi_get_table(ACPI_SIG_IORT, 0,
+   (struct acpi_table_header **)&iort);
+   if (ACPI_FAILURE(status))
+   return limit;
+
+   node = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->node_offset);
+   end = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->header.length);
+
+   for (i = 0; i < iort->node_count; i++) {
+   if (node >= end)
+   break;
+
+   switch (node->type) {
+   struct acpi_iort_named_component *ncomp;
+   struct acpi_iort_root_complex *rc;
+
+   case ACPI_IORT_NODE_NAMED_COMPONENT:
+   ncomp = (struct acpi_iort_named_component *)node->node_data;
+   if (ncomp->memory_address_limit)
+   limit = min(limit, ncomp->memory_address_limit);
+   break;
+
+   case ACPI_IORT_NODE_PCI_ROOT_COMPLEX:
+   rc = (struct acpi_iort_root_complex *)node->node_data;
+   if (rc->memory_address_limit)
+   limit = min(limit, rc->memory_address_limit);
+   break;
+   }
+   node = ACPI_ADD_PTR(struct acpi_iort_node, node, node->length);
+   }
+   

[PATCH v3 2/8] arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()

2020-10-14 Thread Nicolas Saenz Julienne
zone_dma_bits's initialization happens earlier than it's actually
needed, in arm64_memblock_init(). So move it into the more suitable
zone_sizes_init().

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/mm/init.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 1d29f2ca81c7..ef0ef0087e2c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -196,6 +196,8 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA
+   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
 #ifdef CONFIG_ZONE_DMA32
@@ -386,11 +388,6 @@ void __init arm64_memblock_init(void)
 
early_init_fdt_scan_reserved_mem();
 
-   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
-   arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
-   }
-
if (IS_ENABLED(CONFIG_ZONE_DMA32))
arm64_dma32_phys_limit = max_zone_phys(32);
else
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 6/8] arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges

2020-10-14 Thread Nicolas Saenz Julienne
We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB).

The DMA layer also needs to be able to allocate memory that is
guaranteed to meet those DMA constraints, for bounce buffering as well
as allocating the backing for consistent mappings. This is why the 1 GB
ZONE_DMA was introduced recently. Unfortunately, it turns out that having
a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes problems with kdump, and
potentially in other places where allocations cannot cross zone
boundaries. Therefore, we should avoid having two separate DMA zones
when possible.

So, with the help of of_dma_get_max_cpu_address(), get the topmost
physical address accessible to all DMA masters in the system and use
that information to fine-tune ZONE_DMA's size. In the absence of
addressing-limited masters ZONE_DMA will span the whole 32-bit address
space; otherwise, in the case of the Raspberry Pi 4, it'll only span the
30-bit address space, and ZONE_DMA32 will cover the rest of the 32-bit
address space.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v2:
 - Updated commit log by shamelessly copying Ard's ACPI counterpart commit log

 arch/arm64/include/asm/processor.h | 1 +
 arch/arm64/mm/init.c   | 5 ++---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/processor.h 
b/arch/arm64/include/asm/processor.h
index fce8cbecd6bc..c09d3f1a9a6b 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -97,6 +97,7 @@
 
 extern phys_addr_t arm64_dma_phys_limit;
 #define ARCH_LOW_ADDRESS_LIMIT (arm64_dma_phys_limit - 1)
+#define ZONE_DMA_BITS_DEFAULT  32
 
 struct debug_info {
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index ef0ef0087e2c..97b0d2768349 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -42,8 +42,6 @@
 #include 
 #include 
 
-#define ARM64_ZONE_DMA_BITS30
-
 /*
  * We need to be able to catch inadvertent references to memstart_addr
  * that occur (potentially in generic code) before arm64_memblock_init()
@@ -196,7 +194,8 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
 
 #ifdef CONFIG_ZONE_DMA
-   zone_dma_bits = ARM64_ZONE_DMA_BITS;
+   zone_dma_bits = min(zone_dma_bits,
+   (unsigned int)ilog2(of_dma_get_max_cpu_address(NULL)));
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 3/8] of/address: Introduce of_dma_get_max_cpu_address()

2020-10-14 Thread Nicolas Saenz Julienne
Introduce of_dma_get_max_cpu_address(), which provides the highest CPU
physical address addressable by all DMA masters in the system. It's
especially useful for setting memory zone sizes at early boot time.

Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v2:
 - Use PHYS_ADDR_MAX
 - return phys_dma_t
 - Rename function
 - Correct subject
 - Add support to start parsing from an arbitrary device node in order
   for the function to work with unit tests

 drivers/of/address.c | 42 ++
 include/linux/of.h   |  7 +++
 2 files changed, 49 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index eb9ab4f1e80b..b5a9695aaf82 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -1024,6 +1024,48 @@ int of_dma_get_range(struct device_node *np, const 
struct bus_dma_region **map)
 }
 #endif /* CONFIG_HAS_DMA */
 
+/**
+ * of_dma_get_max_cpu_address - Gets highest CPU address suitable for DMA
+ * @np: The node to start searching from or NULL to start from the root
+ *
+ * Gets the highest CPU physical address that is addressable by all DMA masters
+ * in the system (or subtree when np is non-NULL). If no DMA constrained device
+ * is found, it returns PHYS_ADDR_MAX.
+ */
+phys_addr_t __init of_dma_get_max_cpu_address(struct device_node *np)
+{
+   phys_addr_t max_cpu_addr = PHYS_ADDR_MAX;
+   struct of_range_parser parser;
+   phys_addr_t subtree_max_addr;
+   struct device_node *child;
+   phys_addr_t cpu_end = 0;
+   struct of_range range;
+   const __be32 *ranges;
+   int len;
+
+   if (!np)
+   np = of_root;
+
+   ranges = of_get_property(np, "dma-ranges", &len);
+   if (ranges && len) {
+   of_dma_range_parser_init(&parser, np);
+   for_each_of_range(&parser, &range)
+   if (range.cpu_addr + range.size > cpu_end)
+   cpu_end = range.cpu_addr + range.size;
+
+   if (max_cpu_addr > cpu_end)
+   max_cpu_addr = cpu_end;
+   }
+
+   for_each_available_child_of_node(np, child) {
+   subtree_max_addr = of_dma_get_max_cpu_address(child);
+   if (max_cpu_addr > subtree_max_addr)
+   max_cpu_addr = subtree_max_addr;
+   }
+
+   return max_cpu_addr;
+}
+
 /**
  * of_dma_is_coherent - Check if device is coherent
  * @np:device node
diff --git a/include/linux/of.h b/include/linux/of.h
index 481ec0467285..db8db8f2c967 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -558,6 +558,8 @@ int of_map_id(struct device_node *np, u32 id,
   const char *map_name, const char *map_mask_name,
   struct device_node **target, u32 *id_out);
 
+phys_addr_t of_dma_get_max_cpu_address(struct device_node *np);
+
 #else /* CONFIG_OF */
 
 static inline void of_core_init(void)
@@ -995,6 +997,11 @@ static inline int of_map_id(struct device_node *np, u32 id,
return -EINVAL;
 }
 
+static inline phys_addr_t of_dma_get_max_cpu_address(struct device_node *np)
+{
+   return PHYS_ADDR_MAX;
+}
+
 #define of_match_ptr(_ptr) NULL
 #define of_match_node(_matches, _node) NULL
 #endif /* CONFIG_OF */
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 0/8] arm64: Default to 32-bit wide ZONE_DMA

2020-10-14 Thread Nicolas Saenz Julienne
Using two distinct DMA zones turned out to be problematic. Here's an
attempt to go back to a saner default.

I tested this on both a RPi4 and QEMU.

---

Changes since v2:
 - Introduce Ard's patch
 - Improve OF dma-ranges parsing function
 - Add unit test for OF function
 - Address small changes
 - Move crashkernel reservation later in boot process

Changes since v1:
 - Parse dma-ranges instead of using machine compatible string

Ard Biesheuvel (1):
  arm64: mm: Set ZONE_DMA size based on early IORT scan

Nicolas Saenz Julienne (7):
  arm64: mm: Move reserve_crashkernel() into mem_init()
  arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
  of/address: Introduce of_dma_get_max_cpu_address()
  of: unittest: Add test for of_dma_get_max_cpu_address()
  dma-direct: Turn zone_dma_bits default value into a define
  arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
  mm: Update DMA zones description

 arch/arm64/include/asm/processor.h |  1 +
 arch/arm64/mm/init.c   | 20 ++--
 drivers/acpi/arm64/iort.c  | 51 ++
 drivers/of/address.c   | 42 
 drivers/of/unittest.c  | 20 
 include/linux/acpi_iort.h  |  4 +++
 include/linux/dma-direct.h |  3 ++
 include/linux/mmzone.h |  5 +--
 include/linux/of.h |  7 
 kernel/dma/direct.c|  2 +-
 10 files changed, 143 insertions(+), 12 deletions(-)

-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 5/8] dma-direct: Turn zone_dma_bits default value into a define

2020-10-14 Thread Nicolas Saenz Julienne
Set the zone_dma_bits default value through a define so that architectures
can override it with their own default value.

Signed-off-by: Nicolas Saenz Julienne 
---
 include/linux/dma-direct.h | 3 +++
 kernel/dma/direct.c| 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 18aade195884..e433d90cbacf 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -12,6 +12,9 @@
 #include 
 #include 
 
+#ifndef ZONE_DMA_BITS_DEFAULT
+#define ZONE_DMA_BITS_DEFAULT  24
+#endif
 extern unsigned int zone_dma_bits;
 
 /*
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 06c111544f61..c0d97f536e93 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -20,7 +20,7 @@
  * it for entirely different regions. In that case the arch code needs to
  * override the variable below for dma-direct to work properly.
  */
-unsigned int zone_dma_bits __ro_after_init = 24;
+unsigned int zone_dma_bits __ro_after_init = ZONE_DMA_BITS_DEFAULT;
 
 static inline dma_addr_t phys_to_dma_direct(struct device *dev,
phys_addr_t phys)
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 1/8] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-10-14 Thread Nicolas Saenz Julienne
crashkernel might reserve memory located in ZONE_DMA. We plan to delay
ZONE_DMA's initialization until after unflattening the devicetree and
ACPI's boot table initialization, so move the crashkernel reservation
later in the boot process, specifically into mem_init(). This is the
last place crashkernel will be able to reserve the memory before the
page allocator kicks in, and there is no need to do it earlier.

Signed-off-by: Nicolas Saenz Julienne 
---
 arch/arm64/mm/init.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a53c1e0fb017..1d29f2ca81c7 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -396,8 +396,6 @@ void __init arm64_memblock_init(void)
else
arm64_dma32_phys_limit = PHYS_MASK + 1;
 
-   reserve_crashkernel();
-
reserve_elfcorehdr();
 
high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
@@ -518,6 +516,8 @@ void __init mem_init(void)
else
swiotlb_force = SWIOTLB_NO_FORCE;
 
+   reserve_crashkernel();
+
set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 8/8] mm: Update DMA zones description

2020-10-14 Thread Nicolas Saenz Julienne
The default behavior for arm64 changed, so reflect that.

Signed-off-by: Nicolas Saenz Julienne 
Acked-by: Catalin Marinas 
---
 include/linux/mmzone.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fb3bf696c05e..4ee2306351b9 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -363,8 +363,9 @@ enum zone_type {
 *  - arm only uses ZONE_DMA, the size, up to 4G, may vary depending on
 *the specific device.
 *
-*  - arm64 has a fixed 1G ZONE_DMA and ZONE_DMA32 for the rest of the
-*lower 4G.
+*  - arm64 uses a single 4GB ZONE_DMA, except on the Raspberry Pi 4,
+*in which ZONE_DMA covers the first GB and ZONE_DMA32 the rest of
+*the lower 4GB.
 *
 *  - powerpc only uses ZONE_DMA, the size, up to 2G, may vary
 *depending on the specific device.
-- 
2.28.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v7 3/3] iommu/tegra-smmu: Add PCI support

2020-10-14 Thread Robin Murphy

On 2020-10-09 17:19, Nicolin Chen wrote:

This patch simply adds support for PCI devices.

Reviewed-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Signed-off-by: Nicolin Chen 
---

Changelog
v6->v7
  * Renamed goto labels, suggested by Thierry.
v5->v6
  * Added Dmitry's Reviewed-by and Tested-by.
v4->v5
  * Added Dmitry's Reviewed-by
v3->v4
  * Dropped !iommu_present() check
  * Added CONFIG_PCI check in the exit path
v2->v3
  * Replaced ternary conditional operator with if-else in .device_group()
  * Dropped change in tegra_smmu_remove()
v1->v2
  * Added error-out labels in tegra_smmu_probe()
  * Dropped pci_request_acs() since IOMMU core would call it.

  drivers/iommu/tegra-smmu.c | 35 +--
  1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index be29f5977145..2941d6459076 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -10,6 +10,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -865,7 +866,11 @@ static struct iommu_group *tegra_smmu_device_group(struct 
device *dev)
group->smmu = smmu;
group->soc = soc;
  
-	group->group = iommu_group_alloc();

+   if (dev_is_pci(dev))
+   group->group = pci_device_group(dev);


Just to check, is it OK to have two or more swgroups "owning" the same 
iommu_group if an existing one gets returned here? It looks like that 
might not play nice with the use of iommu_group_set_iommudata().
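
To spell out the hazard with a hypothetical sequence (not observed behaviour,
just the scenario being asked about): if pci_device_group() returns an
already-existing group for a second swgroup, the later
iommu_group_set_iommudata() call silently replaces the first swgroup's data
and release hook:

	/* first swgroup/device */
	group1->group = pci_device_group(dev1);
	iommu_group_set_iommudata(group1->group, group1, tegra_smmu_group_release);

	/* second swgroup/device, pci_device_group() hands back the same group */
	group2->group = pci_device_group(dev2);
	iommu_group_set_iommudata(group2->group, group2, tegra_smmu_group_release);
	/* group1 is no longer reachable via the iommu_group's iommudata */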


Robin.


+   else
+   group->group = generic_device_group(dev);
+
if (IS_ERR(group->group)) {
devm_kfree(smmu->dev, group);
		mutex_unlock(&smmu->lock);
@@ -1075,22 +1080,32 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
	iommu_device_set_fwnode(&smmu->iommu, dev->fwnode);
  
  	err = iommu_device_register(&smmu->iommu);

-   if (err) {
-   iommu_device_sysfs_remove(&smmu->iommu);
-   return ERR_PTR(err);
-   }
+   if (err)
+   goto remove_sysfs;
  
  	err = bus_set_iommu(&platform_bus_type, &tegra_smmu_ops);

-   if (err < 0) {
-   iommu_device_unregister(&smmu->iommu);
-   iommu_device_sysfs_remove(&smmu->iommu);
-   return ERR_PTR(err);
-   }
+   if (err < 0)
+   goto unregister;
+
+#ifdef CONFIG_PCI
+   err = bus_set_iommu(&pci_bus_type, &tegra_smmu_ops);
+   if (err < 0)
+   goto unset_platform_bus;
+#endif
  
  	if (IS_ENABLED(CONFIG_DEBUG_FS))

tegra_smmu_debugfs_init(smmu);
  
  	return smmu;

+
+unset_platform_bus: __maybe_unused;
+   bus_set_iommu(&platform_bus_type, NULL);
+unregister:
+   iommu_device_unregister(&smmu->iommu);
+remove_sysfs:
+   iommu_device_sysfs_remove(&smmu->iommu);
+
+   return ERR_PTR(err);
  }
  
  void tegra_smmu_remove(struct tegra_smmu *smmu)



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-14 Thread David Laight
> On Wed, Sep 30, 2020 at 6:09 PM Christoph Hellwig  wrote:
> >
> > Add a new API that returns a virtually non-contiguous array of pages
> > and dma address.  This API is only implemented for dma-iommu and will
> > not be implemented for non-iommu DMA API instances that have to allocate
> > contiguous memory.  It is up to the caller to check if the API is
> > available.

Isn't there already a flag that is only implemented for ARM
systems with iommu that forces pages to actually be physically
contiguous?

So isn't this some kind of strange opposite?

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 8/8] WIP: add a dma_alloc_contiguous API

2020-10-14 Thread Tomasz Figa
+CC Ricardo who will be looking into using this in the USB stack (UVC
camera driver).

On Wed, Sep 30, 2020 at 6:09 PM Christoph Hellwig  wrote:
>
> Add a new API that returns a virtually non-contiguous array of pages
> and dma address.  This API is only implemented for dma-iommu and will
> not be implemented for non-iommu DMA API instances that have to allocate
> contiguous memory.  It is up to the caller to check if the API is
> available.
>
> The intent is that media drivers can use this API if either:
>
>  - no kernel mapping or only temporary kernel mappings are required.
>That is as a better replacement for DMA_ATTR_NO_KERNEL_MAPPING
>  - a kernel mapping is required for cached and DMA mapped pages, but
>the driver also needs the pages to e.g. map them to userspace.
>In that sense it is a replacement for some aspects of the recently
>removed and never fully implemented DMA_ATTR_NON_CONSISTENT
>
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/dma-iommu.c   | 73 +
>  include/linux/dma-mapping.h |  9 +
>  kernel/dma/mapping.c| 35 ++
>  3 files changed, 93 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7922f545cd5eef..158026a856622c 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -565,23 +565,12 @@ static struct page **__iommu_dma_alloc_pages(struct 
> device *dev,
> return pages;
>  }
>
> -/**
> - * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space
> - * @dev: Device to allocate memory for. Must be a real device
> - *  attached to an iommu_dma_domain
> - * @size: Size of buffer in bytes
> - * @dma_handle: Out argument for allocated DMA handle
> - * @gfp: Allocation flags
> - * @prot: pgprot_t to use for the remapped mapping
> - * @attrs: DMA attributes for this allocation
> - *
> - * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
> +/*
> + * If size is less than PAGE_SIZE, then a full CPU page will be allocated,
>   * but an IOMMU which supports smaller pages might not map the whole thing.
> - *
> - * Return: Mapped virtual address, or NULL on failure.
>   */
> -static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
> -   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
> +static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
> +   size_t size, dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
> unsigned long attrs)
>  {
> struct iommu_domain *domain = iommu_get_dma_domain(dev);
> @@ -593,7 +582,6 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> size_t size,
> struct page **pages;
> struct sg_table sgt;
> dma_addr_t iova;
> -   void *vaddr;
>
> *dma_handle = DMA_MAPPING_ERROR;
>
> @@ -636,17 +624,10 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> size_t size,
> < size)
> goto out_free_sg;
>
> -   vaddr = dma_common_pages_remap(pages, size, prot,
> -   __builtin_return_address(0));
> -   if (!vaddr)
> -   goto out_unmap;
> -
> *dma_handle = iova;
> > sg_free_table(&sgt);
> -   return vaddr;
> +   return pages;
>
> -out_unmap:
> -   __iommu_dma_unmap(dev, iova, size);
>  out_free_sg:
> > sg_free_table(&sgt);
>  out_free_iova:
> @@ -656,6 +637,46 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
> size_t size,
> return NULL;
>  }
>
> +static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
> +   dma_addr_t *dma_handle, gfp_t gfp, pgprot_t prot,
> +   unsigned long attrs)
> +{
> +   struct page **pages;
> +   void *vaddr;
> +
> +   pages = __iommu_dma_alloc_noncontiguous(dev, size, dma_handle, gfp,
> +   prot, attrs);
> +   if (!pages)
> +   return NULL;
> +   vaddr = dma_common_pages_remap(pages, size, prot,
> +   __builtin_return_address(0));
> +   if (!vaddr)
> +   goto out_unmap;
> +   return vaddr;
> +
> +out_unmap:
> +   __iommu_dma_unmap(dev, *dma_handle, size);
> +   __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
> +   return NULL;
> +}
> +
> +#ifdef CONFIG_DMA_REMAP
> +static struct page **iommu_dma_alloc_noncontiguous(struct device *dev,
> +   size_t size, dma_addr_t *dma_handle, gfp_t gfp,
> +   unsigned long attrs)
> +{
> +   return __iommu_dma_alloc_noncontiguous(dev, size, dma_handle, gfp,
> +  PAGE_KERNEL, attrs);
> +}
> +
> +static void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
> +   struct page **pages, dma_addr_t dma_handle)
> +{
> +   __iommu_dma_unmap(dev, dma_handle, size);
> +   

Re: [PATCH next] iommu: intel: don't dereference iommu_device if IOMMU_API is not built

2020-10-14 Thread Joerg Roedel
On Wed, Oct 14, 2020 at 03:25:08PM +0800, Lu Baolu wrote:
> I suppose Joerg will pick this up. I guess you don't need to resend it
> unless Joerg asks you to do.

Yes, will pick this up soon, no need to re-send.

Thanks,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/5] of/address: Introduce of_dma_lower_bus_limit()

2020-10-14 Thread Rob Herring
On Wed, Oct 14, 2020 at 6:52 AM Nicolas Saenz Julienne
 wrote:
>
> Hi Rob,
>
> On Mon, 2020-10-12 at 10:25 -0500, Rob Herring wrote:
> > On Sat, Oct 10, 2020 at 10:12 AM Nicolas Saenz Julienne
> >  wrote:
> > > The function provides the CPU physical address addressable by the most
> > > constrained bus in the system. It might be useful in order to
> > > dynamically set up memory zones during boot.
> > >
> > > Signed-off-by: Nicolas Saenz Julienne 
> > > ---
> > >  drivers/of/address.c | 34 ++
> > >  include/linux/of.h   |  7 +++
> > >  2 files changed, 41 insertions(+)
> > >
> > > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > > index eb9ab4f1e80b..755e97b65096 100644
> > > --- a/drivers/of/address.c
> > > +++ b/drivers/of/address.c
> > > @@ -1024,6 +1024,40 @@ int of_dma_get_range(struct device_node *np, const 
> > > struct bus_dma_region **map)
> > >  }
> > >  #endif /* CONFIG_HAS_DMA */
> > >
> > > +/**
> > > + * of_dma_safe_phys_limit - Get system wide DMA safe address space
> > > + *
> > > + * Gets the CPU physical address limit for safe DMA addressing system 
> > > wide by
> > > + * searching for the most constraining dma-range. Otherwise it returns 
> > > ~0ULL.
> > > + */
> > > +u64 __init of_dma_safe_phys_limit(void)
> > > +{
> > > +   struct device_node *np = NULL;
> > > +   struct of_range_parser parser;
> > > +   const __be32 *ranges = NULL;
> > > +   u64 phys_dma_limit = ~0ULL;
> > > +   struct of_range range;
> > > +   int len;
> > > +
> > > +   for_each_of_allnodes(np) {
> > > +   dma_addr_t cpu_end = 0;
> > > +
> > > +   ranges = of_get_property(np, "dma-ranges", &len);
> > > +   if (!ranges || !len)
> > > +   continue;
> > > +
> > > +   of_dma_range_parser_init(&parser, np);
> > > +   for_each_of_range(&parser, &range)
> > > +   if (range.cpu_addr + range.size > cpu_end)
> > > +   cpu_end = range.cpu_addr + range.size;
> >
> > This doesn't work if you have more than one level of dma-ranges. The
> > address has to be translated first. It should be okay to do that on
> > the start or end address (if not, your DT is broken).
>
> for_each_of_range() calls of_pci_range_parser_one() which ultimately populates
> range.cpu_addr with of_translate_dma_address() results. Isn't that good 
> enough?

Indeed. I guess I was remembering the cases not using
for_each_of_range which forgot to do that...

Rob
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/5] of/address: Introduce of_dma_lower_bus_limit()

2020-10-14 Thread Nicolas Saenz Julienne
Hi Rob,

On Mon, 2020-10-12 at 10:25 -0500, Rob Herring wrote:
> On Sat, Oct 10, 2020 at 10:12 AM Nicolas Saenz Julienne
>  wrote:
> > The function provides the CPU physical address addressable by the most
> > constrained bus in the system. It might be useful in order to
> > dynamically set up memory zones during boot.
> > 
> > Signed-off-by: Nicolas Saenz Julienne 
> > ---
> >  drivers/of/address.c | 34 ++
> >  include/linux/of.h   |  7 +++
> >  2 files changed, 41 insertions(+)
> > 
> > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > index eb9ab4f1e80b..755e97b65096 100644
> > --- a/drivers/of/address.c
> > +++ b/drivers/of/address.c
> > @@ -1024,6 +1024,40 @@ int of_dma_get_range(struct device_node *np, const 
> > struct bus_dma_region **map)
> >  }
> >  #endif /* CONFIG_HAS_DMA */
> > 
> > +/**
> > + * of_dma_safe_phys_limit - Get system wide DMA safe address space
> > + *
> > + * Gets the CPU physical address limit for safe DMA addressing system wide 
> > by
> > + * searching for the most constraining dma-range. Otherwise it returns 
> > ~0ULL.
> > + */
> > +u64 __init of_dma_safe_phys_limit(void)
> > +{
> > +   struct device_node *np = NULL;
> > +   struct of_range_parser parser;
> > +   const __be32 *ranges = NULL;
> > +   u64 phys_dma_limit = ~0ULL;
> > +   struct of_range range;
> > +   int len;
> > +
> > +   for_each_of_allnodes(np) {
> > +   dma_addr_t cpu_end = 0;
> > +
> > +   ranges = of_get_property(np, "dma-ranges", &len);
> > +   if (!ranges || !len)
> > +   continue;
> > +
> > +   of_dma_range_parser_init(&parser, np);
> > +   for_each_of_range(&parser, &range)
> > +   if (range.cpu_addr + range.size > cpu_end)
> > +   cpu_end = range.cpu_addr + range.size;
> 
> This doesn't work if you have more than one level of dma-ranges. The
> address has to be translated first. It should be okay to do that on
> the start or end address (if not, your DT is broken).

for_each_of_range() calls of_pci_range_parser_one() which ultimately populates
range.cpu_addr with of_translate_dma_address() results. Isn't that good enough?

> Please add/extend a unittest for this.

Will do.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v2 2/5] of/address: Introduce of_dma_lower_bus_limit()

2020-10-14 Thread Nicolas Saenz Julienne
On Sun, 2020-10-11 at 09:47 +0200, Ard Biesheuvel wrote:
> Hi Nicolas,
> 
> $SUBJECT is out of sync with the patch below. Also, for legibility, it
> helps if the commit log is intelligible by itself, rather than relying
> on $SUBJECT being the first line of the first paragraph.

Noted, I'll update all commit logs.

> On Sat, 10 Oct 2020 at 17:12, Nicolas Saenz Julienne
>  wrote:
> > The function provides the CPU physical address addressable by the most
> > constrained bus in the system. It might be useful in order to
> > dynamically set up memory zones during boot.
> > 
> > Signed-off-by: Nicolas Saenz Julienne 
> > ---
> >  drivers/of/address.c | 34 ++
> >  include/linux/of.h   |  7 +++
> >  2 files changed, 41 insertions(+)
> > 
> > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > index eb9ab4f1e80b..755e97b65096 100644
> > --- a/drivers/of/address.c
> > +++ b/drivers/of/address.c
> > @@ -1024,6 +1024,40 @@ int of_dma_get_range(struct device_node *np, const 
> > struct bus_dma_region **map)
> >  }
> >  #endif /* CONFIG_HAS_DMA */
> > 
> > +/**
> > + * of_dma_safe_phys_limit - Get system wide DMA safe address space
> > + *
> > + * Gets the CPU physical address limit for safe DMA addressing system wide 
> > by
> > + * searching for the most constraining dma-range. Otherwise it returns 
> > ~0ULL.
> > + */
> > +u64 __init of_dma_safe_phys_limit(void)
> 
> I don't think 'safe' strikes the right tone here. You are looking for
> the highest CPU address that is addressable by all DMA masters in the
> system.
> 
> Something like
> 
> of_dma_get_max_cpu_address(void)
> 
> perhaps? Also, since this is generic code, phys_addr_t is probably a
> better type to return.

Sounds good to me, I didn't like the name I used either.

Will use phys_addr_t.

Regards,
Nicolas



signature.asc
Description: This is a digitally signed message part
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH next] iommu: intel: don't dereference iommu_device if IOMMU_API is not built

2020-10-14 Thread Lu Baolu

Hi Bartosz,

On 10/14/20 3:18 PM, Bartosz Golaszewski wrote:

On Wed, Oct 14, 2020 at 2:49 AM Lu Baolu  wrote:


On 10/13/20 3:30 PM, Bartosz Golaszewski wrote:

From: Bartosz Golaszewski 

Since commit c40c1018 ("iommu/vt-d: Gracefully handle DMAR units
with no supported address widths") dmar.c needs struct iommu_device to
be selected. We can drop this dependency by not dereferencing struct
iommu_device if IOMMU_API is not selected and by reusing the information
stored in iommu->drhd->ignored instead.

This fixes the following build error when IOMMU_API is not selected:

drivers/iommu/intel/dmar.c: In function ‘free_iommu’:
drivers/iommu/intel/dmar.c:1139:41: error: ‘struct iommu_device’ has no member 
named ‘ops’
   1139 |  if (intel_iommu_enabled && iommu->iommu.ops) {
  ^

Fixes: c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no supported 
address widths")
Signed-off-by: Bartosz Golaszewski 


With commit title adjusted to "iommu/vt-d: Don't dereference
iommu_device if IOMMU_API is not built",

Acked-by: Lu Baolu 



Do you want me to resend it again with a changed title or can you fix
it up when applying? Or should someone else pick it up?


I suppose Joerg will pick this up. I guess you don't need to resend it
unless Joerg asks you to do.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH next] iommu: intel: don't dereference iommu_device if IOMMU_API is not built

2020-10-14 Thread Bartosz Golaszewski
On Wed, Oct 14, 2020 at 2:49 AM Lu Baolu  wrote:
>
> On 10/13/20 3:30 PM, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski 
> >
> > Since commit c40c1018 ("iommu/vt-d: Gracefully handle DMAR units
> > with no supported address widths") dmar.c needs struct iommu_device to
> > be selected. We can drop this dependency by not dereferencing struct
> > iommu_device if IOMMU_API is not selected and by reusing the information
> > stored in iommu->drhd->ignored instead.
> >
> > This fixes the following build error when IOMMU_API is not selected:
> >
> > drivers/iommu/intel/dmar.c: In function ‘free_iommu’:
> > drivers/iommu/intel/dmar.c:1139:41: error: ‘struct iommu_device’ has no 
> > member named ‘ops’
> >   1139 |  if (intel_iommu_enabled && iommu->iommu.ops) {
> >  ^
> >
> > Fixes: c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no 
> > supported address widths")
> > Signed-off-by: Bartosz Golaszewski 
>
> With commit title adjusted to "iommu/vt-d: Don't dereference
> iommu_device if IOMMU_API is not built",
>
> Acked-by: Lu Baolu 
>

Do you want me to resend it again with a changed title or can you fix
it up when applying? Or should someone else pick it up?

Bartosz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu