Re: [PATCH v11 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-15 Thread Christoph Hellwig
On Wed, Jun 16, 2021 at 01:10:02PM +0800, Claire Chang wrote:
> On Wed, Jun 16, 2021 at 12:59 PM Christoph Hellwig  wrote:
> >
> > On Wed, Jun 16, 2021 at 12:04:16PM +0800, Claire Chang wrote:
> > > Just noticed that after propagating swiotlb_force setting into
> > > io_tlb_default_mem->force, the memory allocation behavior for
> > > swiotlb_force will change (i.e. always skipping arch_dma_alloc and
> > > dma_direct_alloc_from_pool).
> >
> > Yes, I think we need to split a "use_for_alloc" flag from the force flag.
> 
> How about splitting is_dev_swiotlb_force into is_swiotlb_force_bounce
> (io_tlb_mem->force_bounce) and is_swiotlb_force_alloc
> (io_tlb_mem->force_alloc)?

Yes, something like that.  I'd probably not use force for the alloc side
given that we otherwise never allocate from the swiotlb buffer.
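
For illustration, a minimal sketch of the split being discussed.  The field
and helper names follow the proposal in this thread and are assumptions, not
necessarily what ends up being merged:

/* Sketch: two independent flags on the pool instead of one "force" bit. */
struct io_tlb_mem {
	/* ... existing fields (start, end, nslabs, slots[], ...) ... */
	bool force_bounce;	/* always bounce streaming DMA via this pool */
	bool force_alloc;	/* also serve page allocations from this pool */
};

static inline bool is_swiotlb_force_bounce(struct device *dev)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	return mem && mem->force_bounce;
}

static inline bool is_swiotlb_force_alloc(struct device *dev)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	return mem && mem->force_alloc;
}

dma_direct_map_page() would then test is_swiotlb_force_bounce() while the
allocation paths test is_swiotlb_force_alloc(), so swiotlb=force keeps its
current allocation behaviour and only restricted pools opt into allocation.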


Re: [PATCH v11 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-15 Thread Claire Chang
On Wed, Jun 16, 2021 at 12:59 PM Christoph Hellwig  wrote:
>
> On Wed, Jun 16, 2021 at 12:04:16PM +0800, Claire Chang wrote:
> > Just noticed that after propagating swiotlb_force setting into
> > io_tlb_default_mem->force, the memory allocation behavior for
> > swiotlb_force will change (i.e. always skipping arch_dma_alloc and
> > dma_direct_alloc_from_pool).
>
> Yes, I think we need to split a "use_for_alloc" flag from the force flag.

How about splitting is_dev_swiotlb_force into is_swiotlb_force_bounce
(io_tlb_mem->force_bounce) and is_swiotlb_force_alloc
(io_tlb_mem->force_alloc)?


Re: [PATCH v11 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-15 Thread Christoph Hellwig
On Wed, Jun 16, 2021 at 12:04:16PM +0800, Claire Chang wrote:
> Just noticed that after propagating swiotlb_force setting into
> io_tlb_default_mem->force, the memory allocation behavior for
> swiotlb_force will change (i.e. always skipping arch_dma_alloc and
> dma_direct_alloc_from_pool).

Yes, I think we need to split a "use_for_alloc" flag from the force flag.


Re: [PATCH v11 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-15 Thread Claire Chang
On Wed, Jun 16, 2021 at 11:54 AM Claire Chang  wrote:
>
> Add the functions, swiotlb_{alloc,free} to support the memory allocation
> from restricted DMA pool.
>
> The restricted DMA pool is preferred if available.
>
> Note that since coherent allocation needs remapping, one must set up
> another device coherent pool by shared-dma-pool and use
> dma_alloc_from_dev_coherent instead for atomic coherent allocation.
>
> Signed-off-by: Claire Chang 
> Reviewed-by: Christoph Hellwig 
> ---
>  include/linux/swiotlb.h | 15 +
>  kernel/dma/direct.c | 50 ++---
>  kernel/dma/swiotlb.c| 45 +++--
>  3 files changed, 95 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index efcd56e3a16c..2d5ec670e064 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -156,4 +156,19 @@ static inline void swiotlb_adjust_size(unsigned long size)
>  extern void swiotlb_print_info(void);
>  extern void swiotlb_set_max_segment(unsigned int);
>
> +#ifdef CONFIG_DMA_RESTRICTED_POOL
> +struct page *swiotlb_alloc(struct device *dev, size_t size);
> +bool swiotlb_free(struct device *dev, struct page *page, size_t size);
> +#else
> +static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
> +{
> +   return NULL;
> +}
> +static inline bool swiotlb_free(struct device *dev, struct page *page,
> +   size_t size)
> +{
> +   return false;
> +}
> +#endif /* CONFIG_DMA_RESTRICTED_POOL */
> +
>  #endif /* __LINUX_SWIOTLB_H */
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 3713461d6fe0..da0e09621230 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
> min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
>  }
>
> +static void __dma_direct_free_pages(struct device *dev, struct page *page,
> +   size_t size)
> +{
> +   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
> +   swiotlb_free(dev, page, size))
> +   return;
> +   dma_free_contiguous(dev, page, size);
> +}
> +
>  static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
> gfp_t gfp)
>  {
> @@ -86,7 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
>
> gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
>    &phys_limit);
> -   page = dma_alloc_contiguous(dev, size, gfp);
> +   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL)) {
> +   page = swiotlb_alloc(dev, size);
> +   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> +   __dma_direct_free_pages(dev, page, size);
> +   return NULL;
> +   }
> +   }
> +
> +   if (!page)
> +   page = dma_alloc_contiguous(dev, size, gfp);
> if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
> dma_free_contiguous(dev, page, size);
> page = NULL;
> @@ -142,7 +160,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> gfp |= __GFP_NOWARN;
>
> if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
> -   !force_dma_unencrypted(dev)) {
> +   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
> page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
> if (!page)
> return NULL;
> @@ -155,18 +173,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> }
>
> if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
> -   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> -   !dev_is_dma_coherent(dev))
> +   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
> +   !is_dev_swiotlb_force(dev))
> return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);

Just noticed that after propagating swiotlb_force setting into
io_tlb_default_mem->force, the memory allocation behavior for
swiotlb_force will change (i.e. always skipping arch_dma_alloc and
dma_direct_alloc_from_pool).

>
> /*
>  * Remapping or decrypting memory may block. If either is required and
>  * we can't block, allocate the memory from the atomic pools.
> +* If restricted DMA (i.e., is_dev_swiotlb_force) is required, one must
> +* set up another device coherent pool by shared-dma-pool and use
> +* dma_alloc_from_dev_coherent instead.
>  */
> if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
> !gfpflags_allow_blocking(gfp) &&
> (force_dma_unencrypted(dev) ||
> -(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && 
> 

Re: [PATCH v10 00/12] Restricted DMA

2021-06-15 Thread Claire Chang
v11 https://lore.kernel.org/patchwork/cover/1447216/

On Tue, Jun 15, 2021 at 9:27 PM Claire Chang  wrote:
>
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
>
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> full chain of exploits; [2], [3]).
>
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
>
> [1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> [1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> [2] https://blade.tencent.com/en/advisories/qualpwn/
> [3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> [4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>
> v10:
> Address the comments in v9 to
>   - fix the dev->dma_io_tlb_mem assignment
>   - propagate swiotlb_force setting into io_tlb_default_mem->force
>   - move set_memory_decrypted out of swiotlb_init_io_tlb_mem
>   - move debugfs_dir declaration into the main CONFIG_DEBUG_FS block
>   - add swiotlb_ prefix to find_slots and release_slots
>   - merge the 3 alloc/free related patches
>   - move the CONFIG_DMA_RESTRICTED_POOL later
>
> v9:
> Address the comments in v7 to
>   - set swiotlb active pool to dev->dma_io_tlb_mem
>   - get rid of get_io_tlb_mem
>   - dig out the device struct for is_swiotlb_active
>   - move debugfs_create_dir out of swiotlb_create_debugfs
>   - do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem
>   - use IS_ENABLED in kernel/dma/direct.c
>   - fix redefinition of 'of_dma_set_restricted_buffer'
> https://lore.kernel.org/patchwork/cover/1445081/
>
> v8:
> - Fix reserved-memory.txt and add the reg property in example.
> - Fix sizeof for of_property_count_elems_of_size in
>   drivers/of/address.c#of_dma_set_restricted_buffer.
> - Apply Will's suggestion to try the OF node having DMA configuration in
>   drivers/of/address.c#of_dma_set_restricted_buffer.
> - Fix typo in the comment of drivers/of/address.c#of_dma_set_restricted_buffer.
> - Add error message for PageHighMem in
>   kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
>   rmem_swiotlb_setup.
> - Fix the message string in rmem_swiotlb_setup.
> https://lore.kernel.org/patchwork/cover/1437112/
>
> v7:
> Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
> https://lore.kernel.org/patchwork/cover/1431031/
>
> v6:
> Address the comments in v5
> https://lore.kernel.org/patchwork/cover/1423201/
>
> v5:
> Rebase on latest linux-next
> https://lore.kernel.org/patchwork/cover/1416899/
>
> v4:
> - Fix spinlock bad magic
> - Use rmem->name for debugfs entry
> - Address the comments in v3
> https://lore.kernel.org/patchwork/cover/1378113/
>
> v3:
> Using only one reserved memory region for both streaming DMA and memory
> allocation.
> https://lore.kernel.org/patchwork/cover/1360992/
>
> v2:
> Building on top of swiotlb.
> https://lore.kernel.org/patchwork/cover/1280705/
>
> v1:
> Using dma_map_ops.
> https://lore.kernel.org/patchwork/cover/1271660/
>
>
> Claire Chang (12):
>   swiotlb: Refactor swiotlb init functions
>   swiotlb: Refactor swiotlb_create_debugfs
>   swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
>   swiotlb: Update is_swiotlb_buffer to add a struct device argument
>   swiotlb: Update is_swiotlb_active to add a struct device argument
>   swiotlb: Use is_dev_swiotlb_force for swiotlb data bouncing
>   swiotlb: Move alloc_size to swiotlb_find_slots
>   swiotlb: Refactor swiotlb_tbl_unmap_single
>   swiotlb: Add restricted DMA pool initialization
>   swiotlb: Add restricted DMA alloc/free support
>   dt-bindings: of: Add restricted DMA pool
>   of: Add plumbing for restricted DMA pool
>
>  .../reserved-memory/reserved-memory.txt   |  36 ++-
>  drivers/base/core.c   |   4 +
>  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>  

[PATCH v11 12/12] of: Add plumbing for restricted DMA pool

2021-06-15 Thread Claire Chang
If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA pool when a restricted-dma-pool reserved-memory region
is present.
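
The call flow this patch adds, reconstructed from the diff below (a sketch,
not a literal excerpt):

/*
 * of_dma_configure_id(dev, np, ...)
 *   -> arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 *   -> if (!iommu)
 *          of_dma_set_restricted_buffer(dev, np)
 *            -> count the "memory-region" phandles of dev->of_node
 *               (or of np, the node carrying the DMA configuration)
 *            -> for the first phandle that is compatible with
 *               "restricted-dma-pool" and available:
 *                   of_reserved_mem_device_init_by_idx(dev, of_node, i)
 *                     -> the reserved_mem ops attach the pool to @dev
 */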

Signed-off-by: Claire Chang 
---
 drivers/of/address.c| 33 +
 drivers/of/device.c |  3 +++
 drivers/of/of_private.h |  6 ++
 3 files changed, 42 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 73ddf2540f3f..cdf700fba5c4 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include <linux/logic_pio.h>
 #include <linux/module.h>
 #include <linux/of_address.h>
+#include <linux/of_reserved_mem.h>
 #include <linux/pci.h>
 #include <linux/pci_regs.h>
 #include <linux/sizes.h>
@@ -1022,6 +1023,38 @@ int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map)
of_node_put(node);
return ret;
 }
+
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
+{
+   struct device_node *node, *of_node = dev->of_node;
+   int count, i;
+
+   count = of_property_count_elems_of_size(of_node, "memory-region",
+   sizeof(u32));
+   /*
+* If dev->of_node doesn't exist or doesn't contain memory-region, try
+* the OF node having DMA configuration.
+*/
+   if (count <= 0) {
+   of_node = np;
+   count = of_property_count_elems_of_size(
+   of_node, "memory-region", sizeof(u32));
+   }
+
+   for (i = 0; i < count; i++) {
+   node = of_parse_phandle(of_node, "memory-region", i);
+   /*
+* There might be multiple memory regions, but only one
+* restricted-dma-pool region is allowed.
+*/
+   if (of_device_is_compatible(node, "restricted-dma-pool") &&
+   of_device_is_available(node))
+   return of_reserved_mem_device_init_by_idx(dev, of_node,
+ i);
+   }
+
+   return 0;
+}
 #endif /* CONFIG_HAS_DMA */
 
 /**
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 6cb86de404f1..e68316836a7a 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
 
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 
+   if (!iommu)
+   return of_dma_set_restricted_buffer(dev, np);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d9e6a324de0a..25cebbed5f02 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -161,12 +161,18 @@ struct bus_dma_region;
 #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
 int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np);
 #else
 static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
 {
return -ENODEV;
 }
+static inline int of_dma_set_restricted_buffer(struct device *dev,
+  struct device_node *np)
+{
+   return -ENODEV;
+}
 #endif
 
 #endif /* _LINUX_OF_PRIVATE_H */
-- 
2.32.0.272.g935e593368-goog



[PATCH v11 11/12] dt-bindings: of: Add restricted DMA pool

2021-06-15 Thread Claire Chang
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.

Signed-off-by: Claire Chang 
---
 .../reserved-memory/reserved-memory.txt   | 36 +--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..46804f24df05 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
   used as a shared pool of DMA buffers for a set of devices. It can
   be used by an operating system to instantiate the necessary pool
   management subsystem if necessary.
+- restricted-dma-pool: This indicates a region of memory meant to be
+  used as a pool of restricted DMA buffers for a set of devices. The
+  memory region would be the only region accessible to those devices.
+  When using this, the no-map and reusable properties must not be set,
+  so the operating system can create a virtual mapping that will be used
+  for synchronization. The main purpose for restricted DMA is to
+  mitigate the lack of DMA access control on systems without an IOMMU,
+  which could result in the DMA accessing the system memory at
+  unexpected times and/or unexpected addresses, possibly leading to data
+  leakage or corruption. The feature on its own provides a basic level
+  of protection against the DMA overwriting buffer contents at
+  unexpected times. However, to protect against general data leakage and
+  system memory corruption, the system needs to provide a way to lock down
+  the memory access, e.g., MPU. Note that since coherent allocation
+  needs remapping, one must set up another device coherent pool by
+  shared-dma-pool and use dma_alloc_from_dev_coherent instead for atomic
+  coherent allocation.
- vendor specific string in the form <vendor>,[<device>-]<usage>
 no-map (optional) - empty property
 - Indicates the operating system must not create a virtual mapping
@@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for each corresponding
 
 Example
 ---
-This example defines 3 contiguous regions are defined for Linux kernel:
+This example defines 4 contiguous regions for Linux kernel:
 one default of all device drivers (named linux,cma@72000000 and 64MiB in size),
-one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB), and
-one for multimedia processing (named multimedia-memory@77000000, 64MiB).
+one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB),
+one for multimedia processing (named multimedia-memory@77000000, 64MiB), and
+one for restricted dma pool (named restricted_dma_reserved@0x50000000, 64MiB).
 
 / {
#address-cells = <1>;
@@ -120,6 +138,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
compatible = "acme,multimedia-memory";
reg = <0x77000000 0x4000000>;
};
+
+   restricted_dma_reserved: restricted_dma_reserved {
+   compatible = "restricted-dma-pool";
+   reg = <0x50000000 0x4000000>;
+   };
};
 
/* ... */
@@ -138,4 +161,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
memory-region = <&multimedia_reserved>;
/* ... */
};
+
+   pcie_device: pcie_device@0,0 {
+   reg = <0x83010000 0x0 0x00000000 0x0 0x00100000
+  0x83010000 0x0 0x00100000 0x0 0x00100000>;
+   memory-region = <&restricted_dma_mem_reserved>;
+   /* ... */
+   };
 };
-- 
2.32.0.272.g935e593368-goog



[PATCH v11 10/12] swiotlb: Add restricted DMA pool initialization

2021-06-15 Thread Claire Chang
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.

Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.
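
The tail of the kernel/dma/swiotlb.c hunk is cut off in this archive.
Reserved-memory setup functions are normally registered through the standard
of_reserved_mem interface, so the wiring presumably looks like the sketch
below (an assumption based on that interface, not a verbatim excerpt of the
patch):

#include <linux/of_reserved_mem.h>

/*
 * Sketch: register rmem_swiotlb_setup() so that every reserved-memory node
 * with compatible = "restricted-dma-pool" gets rmem->ops pointed at
 * rmem_swiotlb_ops at early boot, before devices are probed.
 */
RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);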

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 include/linux/swiotlb.h |  3 +-
 kernel/dma/Kconfig  | 14 
 kernel/dma/swiotlb.c| 75 +
 3 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 2d5ec670e064..9616346b727f 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force;
  * range check to see if the memory was in fact allocated by this
  * API.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
  * @used:  The number of used IO TLB block.
  * @list:  The free list describing the number of free entries available
  * from each index.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config DMA_RESTRICTED_POOL
+   bool "DMA Restricted Pool"
+   depends on OF && OF_RESERVED_MEM
+   select SWIOTLB
+   help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ <Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt>
+ and <kernel/dma/swiotlb.c>.
+ If unsure, say "n".
+
 #
 # Should be selected if we can mmap non-coherent mappings to userspace.
 # The only thing that is really required is a way to set an uncached bit
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6ad85b48f101..f3f271f7e272 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
 #ifdef CONFIG_DEBUG_FS
 #include <linux/debugfs.h>
 #endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include <linux/io.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <linux/of_reserved_mem.h>
+#include <linux/slab.h>
+#endif
 
 #include <asm/io.h>
 #include <asm/dma.h>
@@ -742,4 +749,72 @@ bool swiotlb_free(struct device *dev, struct page *page, size_t size)
return true;
 }
 
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   struct io_tlb_mem *mem = rmem->priv;
+   unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+   /*
+* Since multiple devices can share the same pool, the private data,
+* io_tlb_mem struct, will be initialized by the first device attached
+* to it.
+*/
+   if (!mem) {
+   mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+   if (!mem)
+   return -ENOMEM;
+
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
+   mem->force = true;
+   set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
+rmem->size >> PAGE_SHIFT);
+
+   rmem->priv = mem;
+
+   if (IS_ENABLED(CONFIG_DEBUG_FS)) {
+   mem->debugfs =
+   debugfs_create_dir(rmem->name, debugfs_dir);
+   swiotlb_create_debugfs_files(mem);
+   }
+   }
+
+   dev->dma_io_tlb_mem = mem;
+
+   return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+   .device_init = rmem_swiotlb_device_init,
+   .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+   unsigned long node = rmem->fdt_node;
+
+   if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+   of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+   of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+   of_get_flat_dt_prop(node, "no-map", NULL))
+   return -EINVAL;
+
+   if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base {
+   pr_err("Restricted DMA pool must be accessible within the 
linear mapping.");
+   return -EINVAL;
+   }
+
+   rmem->ops = &rmem_swiotlb_ops;
+   pr_info("Reserved memory: created restricted 

[PATCH v11 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-15 Thread Claire Chang
Add the functions, swiotlb_{alloc,free} to support the memory allocation
from restricted DMA pool.

The restricted DMA pool is preferred if available.

Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.
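
From a driver's point of view nothing changes: once its device is attached to
a restricted pool, the regular DMA API is routed there.  A hedged sketch (the
probe function and sizes below are hypothetical, not part of this series):

#include <linux/dma-mapping.h>
#include <linux/sizes.h>

static int demo_wifi_probe(struct device *dev)
{
	dma_addr_t dma;
	struct page *page;

	/*
	 * With CONFIG_DMA_RESTRICTED_POOL enabled and a restricted-dma-pool
	 * region attached to @dev, __dma_direct_alloc_pages() tries
	 * swiotlb_alloc() first and only falls back to dma_alloc_contiguous()
	 * when no restricted pool is available (see the diff below).
	 */
	page = dma_alloc_pages(dev, SZ_64K, &dma, DMA_BIDIRECTIONAL,
			       GFP_KERNEL);
	if (!page)
		return -ENOMEM;

	/* ... program the device with @dma ... */

	dma_free_pages(dev, SZ_64K, page, dma, DMA_BIDIRECTIONAL);
	return 0;
}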

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 include/linux/swiotlb.h | 15 +
 kernel/dma/direct.c | 50 ++---
 kernel/dma/swiotlb.c| 45 +++--
 3 files changed, 95 insertions(+), 15 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index efcd56e3a16c..2d5ec670e064 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -156,4 +156,19 @@ static inline void swiotlb_adjust_size(unsigned long size)
 extern void swiotlb_print_info(void);
 extern void swiotlb_set_max_segment(unsigned int);
 
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+#else
+static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+   return NULL;
+}
+static inline bool swiotlb_free(struct device *dev, struct page *page,
+   size_t size)
+{
+   return false;
+}
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 3713461d6fe0..da0e09621230 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+   size_t size)
+{
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+   swiotlb_free(dev, page, size))
+   return;
+   dma_free_contiguous(dev, page, size);
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
 {
@@ -86,7 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   &phys_limit);
-   page = dma_alloc_contiguous(dev, size, gfp);
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL)) {
+   page = swiotlb_alloc(dev, size);
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   __dma_direct_free_pages(dev, page, size);
+   return NULL;
+   }
+   }
+
+   if (!page)
+   page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
page = NULL;
@@ -142,7 +160,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -155,18 +173,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev))
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
/*
 * Remapping or decrypting memory may block. If either is required and
 * we can't block, allocate the memory from the atomic pools.
+* If restricted DMA (i.e., is_dev_swiotlb_force) is required, one must
+* set up another device coherent pool by shared-dma-pool and use
+* dma_alloc_from_dev_coherent instead.
 */
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
!gfpflags_allow_blocking(gfp) &&
(force_dma_unencrypted(dev) ||
-(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev))))
+(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+ !dev_is_dma_coherent(dev))) &&
+   !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
/* we always manually zero the memory once we are done */
@@ -237,7 +260,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return NULL;
}
 out_free_pages:
-   dma_free_contiguous(dev, page, size);
+   

[PATCH v11 08/12] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-06-15 Thread Claire Chang
Add a new function, swiotlb_release_slots, to make the code reusable for
supporting different bounce buffer pools.
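
With the CPU sync separated from the slot bookkeeping, a free path that has
no streaming mapping to sync can reuse swiotlb_release_slots() directly.  A
sketch of such a caller (the corresponding swiotlb_free() hunk of the
alloc/free patch is cut off elsewhere in this archive, so the exact body is
an assumption):

/* Sketch: releasing a page that was handed out from the bounce pool. */
bool swiotlb_free(struct device *dev, struct page *page, size_t size)
{
	phys_addr_t tlb_addr = page_to_phys(page);

	if (!is_swiotlb_buffer(dev, tlb_addr))
		return false;	/* not from this pool; caller frees it */

	swiotlb_release_slots(dev, tlb_addr);
	return true;
}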

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 037772724b3c..fec4934b9926 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -555,27 +555,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, enum dma_data_direction dir,
- unsigned long attrs)
+static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 {
-   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long flags;
-   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+   unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
int nslots = nr_slots(mem->slots[index].alloc_size + offset);
int count, i;
 
-   /*
-* First, sync the memory before unmapping the entry
-*/
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
@@ -610,6 +598,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
spin_unlock_irqrestore(&mem->lock, flags);
 }
 
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+ size_t mapping_size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   /*
+* First, sync the memory before unmapping the entry
+*/
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+   swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+   swiotlb_release_slots(dev, tlb_addr);
+}
+
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
 {
-- 
2.32.0.272.g935e593368-goog



[PATCH v11 07/12] swiotlb: Move alloc_size to swiotlb_find_slots

2021-06-15 Thread Claire Chang
Rename find_slots to swiotlb_find_slots and move the maintenance of
alloc_size to it for better code reusability later.
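
After this change a caller that only needs bounce slots, and has no original
buffer address to record, can use swiotlb_find_slots() directly.  A sketch of
such a caller (the body of the swiotlb_alloc() added later in the series is
cut off in this archive, so the implementation below is an assumption):

/* Sketch: slot-only allocation built on swiotlb_find_slots(). */
struct page *swiotlb_alloc(struct device *dev, size_t size)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
	phys_addr_t tlb_addr;
	int index;

	if (!mem)
		return NULL;

	/* orig_addr is 0: there is no streaming buffer behind these slots. */
	index = swiotlb_find_slots(dev, 0, size);
	if (index == -1)
		return NULL;

	tlb_addr = slot_addr(mem->start, index);
	return pfn_to_page(PFN_DOWN(tlb_addr));
}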

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index a9907ac262fc..037772724b3c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -431,8 +431,8 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int find_slots(struct device *dev, phys_addr_t orig_addr,
-   size_t alloc_size)
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+ size_t alloc_size)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
@@ -487,8 +487,11 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
return -1;
 
 found:
-   for (i = index; i < index + nslots; i++)
+   for (i = index; i < index + nslots; i++) {
mem->slots[i].list = 0;
+   mem->slots[i].alloc_size =
+   alloc_size - ((i - index) << IO_TLB_SHIFT);
+   }
for (i = index - 1;
 io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 mem->slots[i].list; i--)
@@ -529,7 +532,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}
 
-   index = find_slots(dev, orig_addr, alloc_size + offset);
+   index = swiotlb_find_slots(dev, orig_addr, alloc_size + offset);
if (index == -1) {
if (!(attrs & DMA_ATTR_NO_WARN))
dev_warn_ratelimited(dev,
@@ -543,11 +546,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 * This is needed when we sync the memory.  Then we sync the buffer if
 * needed.
 */
-   for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+   for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
-   mem->slots[index + i].alloc_size =
-   alloc_size - (i << IO_TLB_SHIFT);
-   }
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-- 
2.32.0.272.g935e593368-goog



[PATCH v11 06/12] swiotlb: Use is_dev_swiotlb_force for swiotlb data bouncing

2021-06-15 Thread Claire Chang
Propagate the swiotlb_force setting into io_tlb_default_mem->force and
use it to determine whether to bounce the data or not. This will be
useful later to allow for different pools.
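
The same flag is what a per-device pool can set so that every streaming
mapping of its devices is bounced.  A sketch under that assumption (the
helper name below is hypothetical; the restricted-pool patch later in the
series sets mem->force in rmem_swiotlb_device_init()):

/* Sketch: attaching a pool that forces bouncing for one device. */
static void demo_attach_forced_pool(struct device *dev, struct io_tlb_mem *mem)
{
	mem->force = true;		/* bounce all streaming DMA */
	dev->dma_io_tlb_mem = mem;	/* is_dev_swiotlb_force(dev) is now true */
}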

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 include/linux/swiotlb.h | 11 +++
 kernel/dma/direct.c |  2 +-
 kernel/dma/direct.h |  2 +-
 kernel/dma/swiotlb.c|  4 
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index dd1c30a83058..efcd56e3a16c 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
  * unmap calls.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
+ * @force:  %true if swiotlb is forced
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -94,6 +95,7 @@ struct io_tlb_mem {
spinlock_t lock;
struct dentry *debugfs;
bool late_alloc;
+   bool force;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
return mem && paddr >= mem->start && paddr < mem->end;
 }
 
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->force;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+   return false;
+}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..3713461d6fe0 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
if (is_swiotlb_active(dev) &&
-   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+   (dma_addressing_limited(dev) || is_dev_swiotlb_force(dev)))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
 }
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..6c4d13caceb1 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+   if (is_dev_swiotlb_force(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
 
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 101abeb0a57d..a9907ac262fc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->end = mem->start + bytes;
mem->index = 0;
mem->late_alloc = late_alloc;
+
+   if (swiotlb_force == SWIOTLB_FORCE)
+   mem->force = true;
+
spin_lock_init(&mem->lock);
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-- 
2.32.0.272.g935e593368-goog



[PATCH v11 05/12] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-15 Thread Claire Chang
Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 drivers/pci/xen-pcifront.c   | 2 +-
 include/linux/swiotlb.h  | 4 ++--
 kernel/dma/direct.c  | 2 +-
 kernel/dma/swiotlb.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index a9d65fc8aa0e..4b7afa0fc85d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
 
max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active()) {
+   if (is_swiotlb_active(obj->base.dev->dev)) {
unsigned int max_segment;
 
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 9662522aa066..be15bfd9e0ee 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-   need_swiotlb = is_swiotlb_active();
+   need_swiotlb = is_swiotlb_active(dev->dev);
 #endif
 
ret = ttm_bo_device_init(&drm->ttm.bdev, &nouveau_bo_driver,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..0d56985bfe81 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev)
 
spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !is_swiotlb_active()) {
+   if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
err = pci_xen_swiotlb_init_late();
if (err)
dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d1f3d95881cd..dd1c30a83058 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -112,7 +112,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
@@ -132,7 +132,7 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev)
return SIZE_MAX;
 }
 
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active() &&
+   if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index a9f5c08dd94a..101abeb0a57d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -663,9 +663,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
 {
-   return io_tlb_default_mem != NULL;
+   return dev->dma_io_tlb_mem != NULL;
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
-- 
2.32.0.272.g935e593368-goog



[PATCH v11 04/12] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-06-15 Thread Claire Chang
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  7 ---
 kernel/dma/direct.c   |  6 +++---
 kernel/dma/direct.h   |  6 +++---
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3087d9fa6065..10997ef541f8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -507,7 +507,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
 
__iommu_dma_unmap(dev, dma_addr, size);
 
-   if (unlikely(is_swiotlb_buffer(phys)))
+   if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -578,7 +578,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
}
 
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
 }
@@ -749,7 +749,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
 
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -762,7 +762,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_device(dev, phys, size, dir);
 
if (!dev_is_dma_coherent(dev))
@@ -783,7 +783,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -800,7 +800,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
 
for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
   sg->length, dir);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 4c89afc0df62..0c6ed09f8513 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 * in our domain. Therefore _only_ check address within our domain.
 */
if (pfn_valid(PFN_DOWN(paddr)))
-   return is_swiotlb_buffer(paddr);
+   return is_swiotlb_buffer(dev, paddr);
return 0;
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..d1f3d95881cd 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include <linux/device.h>
 #include <linux/dma-direction.h>
 #include <linux/init.h>
 #include <linux/types.h>
@@ -101,9 +102,9 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -115,7 +116,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-   if (unlikely(is_swiotlb_buffer(paddr)))
+   if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_sync_single_for_device(dev, paddr, sg->length,

[PATCH v11 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-15 Thread Claire Chang
Always have the pointer to the swiotlb pool used in struct device. This
could help simplify the code for other pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 drivers/base/core.c| 4 
 include/linux/device.h | 4 
 kernel/dma/swiotlb.c   | 8 
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index f29839382f81..cb3123e3954d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,7 @@
 #include <linux/netdevice.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
+#include <linux/swiotlb.h>
 #include <linux/sysfs.h>
 #include <linux/dma-map-ops.h> /* for dma_default_coherent */
 
@@ -2736,6 +2737,9 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
+#ifdef CONFIG_SWIOTLB
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+#endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index ba660731bd25..240d652a0696 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,6 +416,7 @@ struct dev_links_info {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -518,6 +519,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
struct cma *cma_area;   /* contiguous memory area for dma
   allocations */
+#endif
+#ifdef CONFIG_SWIOTLB
+   struct io_tlb_mem *dma_io_tlb_mem;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index af416bcd1914..a9f5c08dd94a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -339,7 +339,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
phys_addr_t orig_addr = mem->slots[index].orig_addr;
@@ -430,7 +430,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -507,7 +507,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -558,7 +558,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
  size_t mapping_size, enum dma_data_direction dir,
  unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
unsigned long flags;
unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
-- 
2.32.0.272.g935e593368-goog



[PATCH v11 02/12] swiotlb: Refactor swiotlb_create_debugfs

2021-06-15 Thread Claire Chang
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 3ba0f08a39a1..af416bcd1914 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -670,19 +670,26 @@ bool is_swiotlb_active(void)
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
+static struct dentry *debugfs_dir;
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
-
-   if (!mem)
-   return 0;
-   mem->debugfs = debugfs_create_dir("swiotlb", NULL);
debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, >nslabs);
debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, >used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+   struct io_tlb_mem *mem = io_tlb_default_mem;
+
+   debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+   if (mem) {
+   mem->debugfs = debugfs_dir;
+   swiotlb_create_debugfs_files(mem);
+   }
return 0;
 }
 
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 
 #endif
-- 
2.32.0.272.g935e593368-goog



[PATCH v11 01/12] swiotlb: Refactor swiotlb init functions

2021-06-15 Thread Claire Chang
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 49 ++--
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 52e2ac526757..3ba0f08a39a1 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+   unsigned long nslabs, bool late_alloc)
 {
+   void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+   mem->nslabs = nslabs;
+   mem->start = start;
+   mem->end = mem->start + bytes;
+   mem->index = 0;
+   mem->late_alloc = late_alloc;
+   spin_lock_init(&mem->lock);
+   for (i = 0; i < mem->nslabs; i++) {
+   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+   mem->slots[i].alloc_size = 0;
+   }
+   memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
struct io_tlb_mem *mem;
size_t alloc_size;
 
@@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
if (!mem)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
-   mem->nslabs = nslabs;
-   mem->start = __pa(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
+
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
 
io_tlb_default_mem = mem;
if (verbose)
@@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -297,20 +308,8 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!mem)
return -ENOMEM;
 
-   mem->nslabs = nslabs;
-   mem->start = virt_to_phys(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   mem->late_alloc = 1;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
-
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
-   memset(tlb, 0, bytes);
 
io_tlb_default_mem = mem;
swiotlb_print_info();
-- 
2.32.0.272.g935e593368-goog



[PATCH v11 00/12] Restricted DMA

2021-06-15 Thread Claire Chang
This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).

[1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

v11:
- Rebase against swiotlb devel/for-linus-5.14
- s/mempry/memory/g
- exchange the order of patch 09/12 and 10/12
https://lore.kernel.org/patchwork/cover/1446882/

v10:
Address the comments in v9 to
  - fix the dev->dma_io_tlb_mem assignment
  - propagate swiotlb_force setting into io_tlb_default_mem->force
  - move set_memory_decrypted out of swiotlb_init_io_tlb_mem
  - move debugfs_dir declaration into the main CONFIG_DEBUG_FS block
  - add swiotlb_ prefix to find_slots and release_slots
  - merge the 3 alloc/free related patches
  - move the CONFIG_DMA_RESTRICTED_POOL later

v9:
Address the comments in v7 to
  - set swiotlb active pool to dev->dma_io_tlb_mem
  - get rid of get_io_tlb_mem
  - dig out the device struct for is_swiotlb_active
  - move debugfs_create_dir out of swiotlb_create_debugfs
  - do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem
  - use IS_ENABLED in kernel/dma/direct.c
  - fix redefinition of 'of_dma_set_restricted_buffer'
https://lore.kernel.org/patchwork/cover/1445081/

v8:
- Fix reserved-memory.txt and add the reg property in example.
- Fix sizeof for of_property_count_elems_of_size in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Apply Will's suggestion to try the OF node having DMA configuration in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Fix typo in the comment of drivers/of/address.c#of_dma_set_restricted_buffer.
- Add error message for PageHighMem in
  kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
  rmem_swiotlb_setup.
- Fix the message string in rmem_swiotlb_setup.
https://lore.kernel.org/patchwork/cover/1437112/

v7:
Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
https://lore.kernel.org/patchwork/cover/1431031/

v6:
Address the comments in v5
https://lore.kernel.org/patchwork/cover/1423201/

v5:
Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/

v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/

v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.
https://lore.kernel.org/patchwork/cover/1360992/

v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/

Claire Chang (12):
  swiotlb: Refactor swiotlb init functions
  swiotlb: Refactor swiotlb_create_debugfs
  swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Use is_dev_swiotlb_force for swiotlb data bouncing
  swiotlb: Move alloc_size to swiotlb_find_slots
  swiotlb: Refactor swiotlb_tbl_unmap_single
  swiotlb: Add restricted DMA alloc/free support
  swiotlb: Add restricted DMA pool initialization
  dt-bindings: of: Add restricted DMA pool
  of: Add plumbing for restricted DMA pool

 .../reserved-memory/reserved-memory.txt   |  36 ++-
 drivers/base/core.c   |   4 +
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c |   2 +-
 drivers/iommu/dma-iommu.c |  12 +-
 drivers/of/address.c  |  33 

Re: [PATCH 1/1] iommu/arm-smmu-v3: remove unnecessary oom message

2021-06-15 Thread Leizhen (ThunderTown)



On 2021/6/15 19:55, Will Deacon wrote:
> On Tue, Jun 15, 2021 at 12:51:38PM +0100, Robin Murphy wrote:
>> On 2021-06-15 12:34, Will Deacon wrote:
>>> On Tue, Jun 15, 2021 at 07:22:10PM +0800, Leizhen (ThunderTown) wrote:


 On 2021/6/11 18:32, Will Deacon wrote:
> On Wed, Jun 09, 2021 at 08:54:38PM +0800, Zhen Lei wrote:
>> Fixes scripts/checkpatch.pl warning:
>> WARNING: Possible unnecessary 'out of memory' message
>>
>> Remove it can help us save a bit of memory.
>>
>> Signed-off-by: Zhen Lei 
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 ++--
>>   1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
>> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 2ddc3cd5a7d1..fd7c55b44881 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -2787,10 +2787,8 @@ static int arm_smmu_init_l1_strtab(struct 
>> arm_smmu_device *smmu)
>>  void *strtab = smmu->strtab_cfg.strtab;
>>  cfg->l1_desc = devm_kzalloc(smmu->dev, size, GFP_KERNEL);
>> -if (!cfg->l1_desc) {
>> -dev_err(smmu->dev, "failed to allocate l1 stream table 
>> desc\n");
>> +if (!cfg->l1_desc)
>
> What error do you get if devm_kzalloc() fails? I'd like to make sure it's
> easy to track down _which_ allocation failed in that case -- does it give
> you a line number, for example?

 When devm_kzalloc() fails, the OOM information is printed. No line number 
 information, but the
 size(order) and call stack is printed. It doesn't matter which allocation 
 failed, the failure
 is caused by insufficient system memory rather than the fault of the SMMU 
 driver. Therefore,
 the current printing is not helpful for locating the problem of 
 insufficient memory. After all,
 when memory allocation fails, the SMMU driver cannot work at a lower 
 specification.
>>>
>>> I don't entirely agree. Another reason for the failure is because the driver
>>> might be asking for a huge (or negative) allocation, in which case it might
>>> be instructive to have a look at the actual caller, particularly if the
>>> size is derived from hardware or firmware properties.
>>
>> Agreed - other than deliberately-contrived situations I don't think I've
>> ever hit a genuine OOM, but I definitely have debugged attempts to allocate
>> -1 of something. If the driver-specific message actually calls out the
>> critical information, e.g. "failed to allocate %d stream table entries", it
>> gives debugging a head start since the miscalculation is obvious, but a
>> static message that only identifies the callsite really only saves a quick
>> trip to scripts/faddr2line, and personally I've never found that
>> particularly valuable.
> 
> So it sounds like this particular patch is fine, but the one for smmuv2
> should leave the IRQ allocation message alone (by virtue of it printing
> something a bit more useful -- the number of irqs).

num_irqs = 0;
while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
	num_irqs++;
}

As the above code shows, num_irqs is calculated from the number of DTB or ACPI
configuration items, so it can't be too large. That is, there is almost zero
chance that devm_kcalloc() will fail because num_irqs is too large.


> 
> Will
> 
> .
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC] /dev/ioasid uAPI proposal

2021-06-15 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, June 16, 2021 7:59 AM
> 
> On Tue, Jun 15, 2021 at 11:56:28PM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Wednesday, June 16, 2021 7:41 AM
> > >
> > > On Tue, Jun 15, 2021 at 11:09:37PM +, Tian, Kevin wrote:
> > >
> > > > which information can you elaborate? This is the area which I'm not
> > > > familiar with thus would appreciate if you can help explain how this
> > > > bus specific information is utilized within the attach function or
> > > > sometime later.
> > >
> > > This is the idea that the device driver needs to specify which bus
> > > specific protocol it uses to issue DMA's when it attaches itself to an
> > > IOASID. For PCI:
> >
> > What about defining some general attributes instead of asking iommu
> > fd to understand those bus specific detail?
> 
> I prefer the API be very clear and intent driven, otherwise things
> just get confused.
> 
> The whole WBINVD/no-snoop discussion I think is proof of that :\
> 
> > from iommu p.o.v there is no difference from last one. In v2 the device
> > driver just needs to communicate the PASID virtualization policy at
> > device binding time,
> 
> I want it documented in the kernel source WTF is happening, because
> otherwise we are going to be completely lost in a few years. And your
> RFC did have device driver specific differences here
> 
> > > The device knows what it is going to do, we need to convey that to the
> > > IOMMU layer so it is prepared properly.
> >
> > Yes, but it's not necessarily to have iommu fd understand bus specific
> > attributes. In the end when /dev/iommu uAPI calls iommu layer interface,
> > it's all bus agnostic.
> 
> Why not? Just put some inline wrappers to translate the bus specific
> language to your generic language if that is what makes the most
> sense.
> 

I can do this. Thanks
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-15 Thread Jason Gunthorpe
On Tue, Jun 15, 2021 at 11:56:28PM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, June 16, 2021 7:41 AM
> > 
> > On Tue, Jun 15, 2021 at 11:09:37PM +, Tian, Kevin wrote:
> > 
> > > which information can you elaborate? This is the area which I'm not
> > > familiar with thus would appreciate if you can help explain how this
> > > bus specific information is utilized within the attach function or
> > > sometime later.
> > 
> > This is the idea that the device driver needs to specify which bus
> > specific protocol it uses to issue DMA's when it attaches itself to an
> > IOASID. For PCI:
> 
> What about defining some general attributes instead of asking iommu
> fd to understand those bus specific detail?

I prefer the API be very clear and intent driven, otherwise things
just get confused.

The whole WBINVD/no-snoop discussion I think is proof of that :\

> from iommu p.o.v there is no difference from last one. In v2 the device
> driver just needs to communicate the PASID virtualization policy at
> device binding time, 

I want it documented in the kernel source WTF is happening, because
otherwise we are going to be completely lost in a few years. And your
RFC did have device driver specific differences here

> > The device knows what it is going to do, we need to convey that to the
> > IOMMU layer so it is prepared properly.
> 
> Yes, but it's not necessarily to have iommu fd understand bus specific
> attributes. In the end when /dev/iommu uAPI calls iommu layer interface,
> it's all bus agnostic. 

Why not? Just put some inline wrappers to translate the bus specific
language to your generic language if that is what makes the most
sense.
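
As a purely illustrative sketch of that idea (struct attach_info and
ioasid_device_attach() are the ones proposed in this RFC; the wrapper name is
hypothetical and the in-tree struct pci_dev spelling is used), the
bus-specific entry point could be little more than:

/*
 * Hypothetical inline wrapper: translate the PCI-specific call into the
 * generic ioasid_device_attach() from the RFC.  None of this is a real
 * kernel API yet.
 */
static inline int ioasid_pci_device_attach(struct pci_dev *pdev,
					   struct ioasid_dev *idev,
					   u32 ioasid)
{
	struct attach_info info = {
		.ioasid	= ioasid,
		.pasid	= 0,	/* plain RID DMA, no PASID */
	};

	/*
	 * Any PCI-only detail (PASID use, ATS/PRI capability, ...) would be
	 * translated into generic attach_info fields right here.
	 */
	return ioasid_device_attach(idev, info);
}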

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC] /dev/ioasid uAPI proposal

2021-06-15 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, June 16, 2021 7:41 AM
> 
> On Tue, Jun 15, 2021 at 11:09:37PM +, Tian, Kevin wrote:
> 
> > which information can you elaborate? This is the area which I'm not
> > familiar with thus would appreciate if you can help explain how this
> > bus specific information is utilized within the attach function or
> > sometime later.
> 
> This is the idea that the device driver needs to specify which bus
> specific protocol it uses to issue DMA's when it attaches itself to an
> IOASID. For PCI:

What about defining some general attributes instead of asking iommu
fd to understand those bus-specific details?

> 
> - Normal RID DMA

this is struct device pointer

> - PASID DMA

this is PASID or SSID which is understood by underlying iommu driver

> - ENQCMD triggered PASID DMA

from iommu p.o.v there is no difference from last one. In v2 the device
driver just needs to communicate the PASID virtualization policy at
device binding time, e.g.  whether vPASID is allowed, if yes whether
vPASID must be registered to the kernel, if via kernel whether per-RID
vs. global, etc. This policy is then conveyed to userspace via device 
capability query interface via iommu fd.

> - ATS/PRI enabled or not

Just a generic "supports I/O page faults or not" attribute

> 
> And maybe more. Eg CXL has some other operating modes, I think
> 
> The device knows what it is going to do, we need to convey that to the
> IOMMU layer so it is prepared properly.
> 

Yes, but it's not necessary to have iommu fd understand bus-specific
attributes. In the end, when the /dev/iommu uAPI calls into the iommu layer
interface, it's all bus agnostic.
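
To make the "general attributes" idea concrete, one possible shape of a
bus-agnostic description (all field and flag names below are hypothetical,
not part of the RFC) that the device driver fills in, leaving iommu fd
unaware of PCI specifics:

/*
 * Hypothetical sketch only.  The device driver states *what* it will do
 * in generic terms; the iommu layer maps that onto bus-specific mechanics
 * (RID vs. PASID-tagged DMA, ATS/PRI, ...).
 */
struct ioasid_attach_info {
	u32	ioasid;
	u32	pasid;				/* 0 for plain "RID" DMA */
	u32	flags;
#define IOASID_ATTACH_PASID	(1U << 0)	/* DMA is tagged with @pasid */
#define IOASID_ATTACH_IOPF	(1U << 1)	/* device can take I/O page faults */
};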

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-15 Thread Jason Gunthorpe
On Tue, Jun 15, 2021 at 11:09:37PM +, Tian, Kevin wrote:

> which information can you elaborate? This is the area which I'm not
> familiar with thus would appreciate if you can help explain how this
> bus specific information is utilized within the attach function or 
> sometime later.

This is the idea that the device driver needs to specify which bus
specific protocol it uses to issue DMA's when it attaches itself to an
IOASID. For PCI:

- Normal RID DMA
- PASID DMA
- ENQCMD triggered PASID DMA
- ATS/PRI enabled or not

And maybe more. Eg CXL has some other operating modes, I think

The device knows what it is going to do, we need to convey that to the
IOMMU layer so it is prepared properly.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC] /dev/ioasid uAPI proposal

2021-06-15 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, June 16, 2021 7:02 AM
> 
> On Tue, Jun 15, 2021 at 10:59:06PM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Tuesday, June 15, 2021 11:07 PM
> > >
> > > On Tue, Jun 15, 2021 at 08:59:25AM +, Tian, Kevin wrote:
> > > > Hi, Jason,
> > > >
> > > > > From: Jason Gunthorpe
> > > > > Sent: Thursday, June 3, 2021 9:05 PM
> > > > >
> > > > > On Thu, Jun 03, 2021 at 06:39:30AM +, Tian, Kevin wrote:
> > > > > > > > Two helper functions are provided to support
> VFIO_ATTACH_IOASID:
> > > > > > > >
> > > > > > > > struct attach_info {
> > > > > > > > u32 ioasid;
> > > > > > > > // If valid, the PASID to be used physically
> > > > > > > > u32 pasid;
> > > > > > > > };
> > > > > > > > int ioasid_device_attach(struct ioasid_dev *dev,
> > > > > > > > struct attach_info info);
> > > > > > > > int ioasid_device_detach(struct ioasid_dev *dev, u32 
> > > > > > > > ioasid);
> > > > > > >
> > > > > > > Honestly, I still prefer this to be highly explicit as this is 
> > > > > > > where
> > > > > > > all device driver authors get involved:
> > > > > > >
> > > > > > > ioasid_pci_device_attach(struct pci_device *pdev, struct
> ioasid_dev
> > > *dev,
> > > > > > > u32 ioasid);
> > > > > > > ioasid_pci_device_pasid_attach(struct pci_device *pdev, u32
> > > > > *physical_pasid,
> > > > > > > struct ioasid_dev *dev, u32 ioasid);
> > > > > >
> > > > > > Then better naming it as pci_device_attach_ioasid since the 1st
> > > parameter
> > > > > > is struct pci_device?
> > > > >
> > > > > No, the leading tag indicates the API's primary subsystem, in this case
> > > > > it is iommu (and if you prefer list the iommu related arguments first)
> > > > >
> > > >
> > > > I have a question on this suggestion when working on v2.
> > > >
> > > > Within IOMMU fd it uses only the generic struct device pointer, which
> > > > is already saved in struct ioasid_dev at device bind time:
> > > >
> > > > struct ioasid_dev *ioasid_register_device(struct ioasid_ctx 
> > > > *ctx,
> > > > struct device *device, u64 device_label);
> > > >
> > > > What does this additional struct pci_device bring when it's specified in
> > > > the attach call? If we save it in attach_data, at which point will it be
> > > > used or checked?
> > >
> > > The above was for attaching to an ioasid not the register path
> >
> > Yes, I know. and this is my question. When receiving a struct pci_device
> > at attach time, what should IOMMU fd do with it? Just verify whether
> > pci_device->device is same as ioasid_dev->device? if saving it to per-device
> > attach data under ioasid then when will it be further used?
> >
> > I feel once ioasid_dev is returned in the register path, following 
> > operations
> > (unregister, attach, detach) just uses ioasid_dev as the main object.
> 
> The point of having the pci_device specific API was to convey bus
> specific information during the attachment to the IOASID.

Which information? Can you elaborate? This is an area I'm not familiar
with, so I would appreciate it if you could help explain how this
bus-specific information is utilized within the attach function or
sometime later.

> 
> The registration of the device to the iommu_fd doesn't need bus
> specific information, AFAIK? So just use a normal struct device
> pointer
> 

yes.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-15 Thread Jason Gunthorpe
On Tue, Jun 15, 2021 at 10:59:06PM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Tuesday, June 15, 2021 11:07 PM
> > 
> > On Tue, Jun 15, 2021 at 08:59:25AM +, Tian, Kevin wrote:
> > > Hi, Jason,
> > >
> > > > From: Jason Gunthorpe
> > > > Sent: Thursday, June 3, 2021 9:05 PM
> > > >
> > > > On Thu, Jun 03, 2021 at 06:39:30AM +, Tian, Kevin wrote:
> > > > > > > Two helper functions are provided to support VFIO_ATTACH_IOASID:
> > > > > > >
> > > > > > >   struct attach_info {
> > > > > > >   u32 ioasid;
> > > > > > >   // If valid, the PASID to be used physically
> > > > > > >   u32 pasid;
> > > > > > >   };
> > > > > > >   int ioasid_device_attach(struct ioasid_dev *dev,
> > > > > > >   struct attach_info info);
> > > > > > >   int ioasid_device_detach(struct ioasid_dev *dev, u32 ioasid);
> > > > > >
> > > > > > Honestly, I still prefer this to be highly explicit as this is where
> > > > > > all device driver authors get involved:
> > > > > >
> > > > > > ioasid_pci_device_attach(struct pci_device *pdev, struct ioasid_dev
> > *dev,
> > > > > > u32 ioasid);
> > > > > > ioasid_pci_device_pasid_attach(struct pci_device *pdev, u32
> > > > *physical_pasid,
> > > > > > struct ioasid_dev *dev, u32 ioasid);
> > > > >
> > > > > Then better naming it as pci_device_attach_ioasid since the 1st
> > parameter
> > > > > is struct pci_device?
> > > >
> > > > No, the leading tag indicates the API's primary subsystem, in this case
> > > > it is iommu (and if you prefer list the iommu related arguments first)
> > > >
> > >
> > > I have a question on this suggestion when working on v2.
> > >
> > > Within IOMMU fd it uses only the generic struct device pointer, which
> > > is already saved in struct ioasid_dev at device bind time:
> > >
> > >   struct ioasid_dev *ioasid_register_device(struct ioasid_ctx *ctx,
> > >   struct device *device, u64 device_label);
> > >
> > > What does this additional struct pci_device bring when it's specified in
> > > the attach call? If we save it in attach_data, at which point will it be
> > > used or checked?
> > 
> > The above was for attaching to an ioasid not the register path
> 
> Yes, I know. and this is my question. When receiving a struct pci_device
> at attach time, what should IOMMU fd do with it? Just verify whether 
> pci_device->device is same as ioasid_dev->device? if saving it to per-device
> attach data under ioasid then when will it be further used?
> 
> I feel once ioasid_dev is returned in the register path, following operations
> (unregister, attach, detach) just uses ioasid_dev as the main object.

The point of having the pci_device specific API was to convey bus
specific information during the attachment to the IOASID.

The registration of the device to the iommu_fd doesn't need bus
specific information, AFAIK? So just use a normal struct device
pointer

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC] /dev/ioasid uAPI proposal

2021-06-15 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, June 15, 2021 11:07 PM
> 
> On Tue, Jun 15, 2021 at 08:59:25AM +, Tian, Kevin wrote:
> > Hi, Jason,
> >
> > > From: Jason Gunthorpe
> > > Sent: Thursday, June 3, 2021 9:05 PM
> > >
> > > On Thu, Jun 03, 2021 at 06:39:30AM +, Tian, Kevin wrote:
> > > > > > Two helper functions are provided to support VFIO_ATTACH_IOASID:
> > > > > >
> > > > > > struct attach_info {
> > > > > > u32 ioasid;
> > > > > > // If valid, the PASID to be used physically
> > > > > > u32 pasid;
> > > > > > };
> > > > > > int ioasid_device_attach(struct ioasid_dev *dev,
> > > > > > struct attach_info info);
> > > > > > int ioasid_device_detach(struct ioasid_dev *dev, u32 ioasid);
> > > > >
> > > > > Honestly, I still prefer this to be highly explicit as this is where
> > > > > all device driver authors get involved:
> > > > >
> > > > > ioasid_pci_device_attach(struct pci_device *pdev, struct ioasid_dev
> *dev,
> > > > > u32 ioasid);
> > > > > ioasid_pci_device_pasid_attach(struct pci_device *pdev, u32
> > > *physical_pasid,
> > > > > struct ioasid_dev *dev, u32 ioasid);
> > > >
> > > > Then better naming it as pci_device_attach_ioasid since the 1st
> parameter
> > > > is struct pci_device?
> > >
> > > No, the leading tag indicates the API's primary subsystem, in this case
> > > it is iommu (and if you prefer list the iommu related arguments first)
> > >
> >
> > I have a question on this suggestion when working on v2.
> >
> > Within IOMMU fd it uses only the generic struct device pointer, which
> > is already saved in struct ioasid_dev at device bind time:
> >
> > struct ioasid_dev *ioasid_register_device(struct ioasid_ctx *ctx,
> > struct device *device, u64 device_label);
> >
> > What does this additional struct pci_device bring when it's specified in
> > the attach call? If we save it in attach_data, at which point will it be
> > used or checked?
> 
> The above was for attaching to an ioasid not the register path

Yes, I know, and this is my question. When receiving a struct pci_device
at attach time, what should IOMMU fd do with it? Just verify whether
pci_device->device is the same as ioasid_dev->device? If we save it to
per-device attach data under the ioasid, then when will it be further used?

I feel that once ioasid_dev is returned in the register path, the following
operations (unregister, attach, detach) just use ioasid_dev as the main object.

> 
> You should call 'device_label' 'device_cookie' if it is a user
> provided u64
> 

will do.

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-15 Thread Konrad Rzeszutek Wilk
On Tue, Jun 15, 2021 at 09:27:02PM +0800, Claire Chang wrote:
> Always have the pointer to the swiotlb pool used in struct device. This
> could help simplify the code for other pools.

Applying: swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
error: patch failed: kernel/dma/swiotlb.c:339
error: kernel/dma/swiotlb.c: patch does not apply
..

Would you be OK rebasing this against devel/for-linus-5.14 please?
(And please send out with the Reviewed-by from Christopher)

Thank you!
> 
> Signed-off-by: Claire Chang 
> ---
>  drivers/base/core.c| 4 
>  include/linux/device.h | 4 
>  kernel/dma/swiotlb.c   | 8 
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index b8a8c96dca58..eeb2d49d3aa3 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/swiotlb.h>
>  #include 
>  #include  /* for dma_default_coherent */
>  
> @@ -2846,6 +2847,9 @@ void device_initialize(struct device *dev)
>  defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
>   dev->dma_coherent = dma_default_coherent;
>  #endif
> +#ifdef CONFIG_SWIOTLB
> + dev->dma_io_tlb_mem = io_tlb_default_mem;
> +#endif
>  }
>  EXPORT_SYMBOL_GPL(device_initialize);
>  
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 4443e12238a0..2e9a378c9100 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -432,6 +432,7 @@ struct dev_links_info {
>   * @dma_pools:   Dma pools (if dma'ble device).
>   * @dma_mem: Internal for coherent mem override.
>   * @cma_area:Contiguous memory area for dma allocations
> + * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
>   * @archdata:For arch-specific additions.
>   * @of_node: Associated device tree node.
>   * @fwnode:  Associated device node supplied by platform firmware.
> @@ -540,6 +541,9 @@ struct device {
>  #ifdef CONFIG_DMA_CMA
>   struct cma *cma_area;   /* contiguous memory area for dma
>  allocations */
> +#endif
> +#ifdef CONFIG_SWIOTLB
> + struct io_tlb_mem *dma_io_tlb_mem;
>  #endif
>   /* arch specific additions */
>   struct dev_archdata archdata;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 97c6ad50fdc2..949a6bb21343 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -339,7 +339,7 @@ void __init swiotlb_exit(void)
>  static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t 
> size,
>  enum dma_data_direction dir)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
>   phys_addr_t orig_addr = mem->slots[index].orig_addr;
>   size_t alloc_size = mem->slots[index].alloc_size;
> @@ -421,7 +421,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
> unsigned int index)
>  static int find_slots(struct device *dev, phys_addr_t orig_addr,
>   size_t alloc_size)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   unsigned long boundary_mask = dma_get_seg_boundary(dev);
>   dma_addr_t tbl_dma_addr =
>   phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
> @@ -498,7 +498,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
> phys_addr_t orig_addr,
>   size_t mapping_size, size_t alloc_size,
>   enum dma_data_direction dir, unsigned long attrs)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   unsigned int offset = swiotlb_align_offset(dev, orig_addr);
>   unsigned int i;
>   int index;
> @@ -549,7 +549,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
> size_t mapping_size, enum dma_data_direction dir,
> unsigned long attrs)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
>   unsigned long flags;
>   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
>   int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> -- 
> 2.32.0.272.g935e593368-goog
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/9] iommu/arm-smmu: Implement ->probe_finalize()

2021-06-15 Thread Will Deacon
On Tue, Jun 15, 2021 at 07:21:35PM +0100, Will Deacon wrote:
> On Tue, Jun 15, 2021 at 06:12:13PM +, Krishna Reddy wrote:
> > > if (smmu->impl->probe_finalize)
> > 
> > The above is the issue. It should be updated as below similar to other 
> > instances impl callbacks.
> > if (smmu->impl && smmu->impl->probe_finalize)
> 
> I'll push a patch on top shortly...

Done:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-thierry/arm-smmu

I'll send this lot to Joerg tomorrow.

Thierry -- feel free to pull in the updated branch if you want the fix
sooner, as it may be a few days before this hits -next.

Cheers,

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD

2021-06-15 Thread Nadav Amit


> On Jun 15, 2021, at 12:20 PM, Robin Murphy  wrote:
> 
> On 2021-06-15 19:14, Nadav Amit wrote:
>>> On Jun 15, 2021, at 5:55 AM, Robin Murphy  wrote:
>>> 
>>> On 2021-06-07 19:25, Nadav Amit wrote:
 From: Nadav Amit 
 AMD's IOMMU can flush efficiently (i.e., in a single flush) any range.
 This is in contrast, for instance, to Intel IOMMUs that have a limit on
 the number of pages that can be flushed in a single flush.  In addition,
 AMD's IOMMU do not care about the page-size, so changes of the page size
 do not need to trigger a TLB flush.
 So in most cases, a TLB flush due to disjoint range or page-size changes
 are not needed for AMD. Yet, vIOMMUs require the hypervisor to
 synchronize the virtualized IOMMU's PTEs with the physical ones. This
 process induce overheads, so it is better not to cause unnecessary
 flushes, i.e., flushes of PTEs that were not modified.
 Implement and use amd_iommu_iotlb_gather_add_page() and use it instead
 of the generic iommu_iotlb_gather_add_page(). Ignore page-size changes
 and disjoint regions unless "non-present cache" feature is reported by
 the IOMMU capabilities, as this is an indication we are running on a
 physical IOMMU. A similar indication is used by VT-d (see "caching
 mode"). The new logic retains the same flushing behavior that we had
 before the introduction of page-selective IOTLB flushes for AMD.
 On virtualized environments, check if the newly flushed region and the
 gathered one are disjoint and flush if it is. Also check whether the new
 region would cause IOTLB invalidation of large region that would include
 unmodified PTE. The latter check is done according to the "order" of the
 IOTLB flush.
>>> 
>>> If it helps,
>>> 
>>> Reviewed-by: Robin Murphy 
>> Thanks!
>>> I wonder if it might be more effective to defer the alignment-based 
>>> splitting part to amd_iommu_iotlb_sync() itself, but that could be 
>>> investigated as another follow-up.
>> Note that the alignment-based splitting is only used for virtualized AMD 
>> IOMMUs, so it has no impact for most users.
>> Right now, the performance is kind of bad on VMs since AMD’s IOMMU driver 
>> does a full IOTLB flush whenever it unmaps more than a single page. So, 
>> although your idea makes sense, I do not know exactly how to implement it 
>> right now, and regardless it is likely to provide much lower performance 
>> improvements than those that avoiding full IOTLB flushes would.
>> Having said that, if I figure out a way to implement it, I would give it a 
>> try (although I am admittedly afraid of a complicated logic that might cause 
>> subtle, mostly undetectable bugs).
> 
> I was mainly thinking that when you observe a change in "order" and sync to 
> avoid over-invalidating adjacent pages, those pages may still be part of the 
> current unmap and you've just not seen them added yet. Hence simply gathering 
> contiguous pages regardless of alignment, then breaking the total range down 
> into appropriately-aligned commands in the sync once you know you've seen 
> everything, seems like it might allow issuing fewer commands overall. But I 
> haven't quite grasped the alignment rules either, so possibly this is moot 
> anyway.

Thanks for explaining. I think that what you propose makes sense. We might 
already flush more than needed in certain cases (e.g., patterns in which pages 
are added before and after the gathered range). I doubt these cases actually 
happen, and the tradeoff between being precise in what you flush (more flushes) 
and not causing the hypervisor to check unchanged mappings (synchronization 
cost) is less obvious.

I will see if I can change __domain_flush_pages() to your liking, and see 
whether it can be part of this series.
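
For illustration only, the kind of splitting being discussed could look
roughly like the sketch below: gather contiguous pages freely, then at sync
time break the total range into naturally aligned power-of-two blocks so
that no unmodified PTEs are covered.  issue_inv_iotlb_range() is a made-up
helper and none of this is the actual driver change:

#include <linux/bitops.h>
#include <linux/minmax.h>

/* Hypothetical stand-in for building/queueing one invalidation command. */
static void issue_inv_iotlb_range(unsigned long iova, unsigned long size)
{
	/* the real driver would build an INVALIDATE_IOTLB_PAGES command here */
}

static void flush_in_aligned_blocks(unsigned long start, unsigned long size)
{
	while (size) {
		/*
		 * Largest power-of-two block that is naturally aligned at
		 * 'start' and does not exceed the remaining size.
		 */
		unsigned long block = min(start ? start & -start : size,
					  1UL << __fls(size));

		issue_inv_iotlb_range(start, block);
		start += block;
		size -= block;
	}
}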



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 2/6] iommu/amd: Do not use flush-queue when NpCache is on

2021-06-15 Thread Robin Murphy

On 2021-06-15 19:26, Nadav Amit wrote:




On Jun 15, 2021, at 6:08 AM, Robin Murphy  wrote:

On 2021-06-07 19:25, Nadav Amit wrote:

From: Nadav Amit 
Do not use flush-queue on virtualized environments, where the NpCache
capability of the IOMMU is set. This is required to reduce
virtualization overheads.
This change follows a similar change to Intel's VT-d and a detailed
explanation as for the rationale is described in commit 29b32839725f
("iommu/vt-d: Do not use flush-queue when caching-mode is on").
Cc: Joerg Roedel 
Cc: Will Deacon 
Cc: Jiajun Cao 
Cc: Robin Murphy 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Nadav Amit 
---
  drivers/iommu/amd/init.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index d006724f4dc2..ba3b76ed776d 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1850,8 +1850,13 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
if (ret)
return ret;
  - if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
+   if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) {
+   if (!amd_iommu_unmap_flush)
+   pr_warn_once("IOMMU batching is disabled due to 
virtualization");


Nit: you can just use pr_warn() (or arguably pr_info()) since the explicit 
conditions already only match once.


Yes, my bad. I will fix it in the next version.


Speaking of which, it might be better to use amd_iommu_np_cache instead, since 
other patches are planning to clean up the last remnants of 
amd_iommu_unmap_flush.


I prefer that the other patches (that remove amd_iommu_unmap_flush) would 
address this code as well. I certainly do not want to embed amd_iommu_np_cache 
deep into the flushing logic. IOW: I don’t know what you have exactly in mind, 
but I prefer the code would be clear.

This code follows (copies?) the same pattern+logic from commit 5f3116ea8b5 
("iommu/vt-d: Do not use flush-queue when caching-mode is on”). I see that changed 
the code in commit 53255e545807c ("iommu: remove DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE”), 
but did not get rid of intel_iommu_strict, so please allow me to use 
amd_iommu_unmap_flush.


Sure, it was just a suggestion to pre-resolve one line of merge conflict 
with another series[1] which is also almost ready, and removes those 
local variables for both AMD and Intel. But there will still be other 
conflicts either way, so it's not a big deal.


Robin.

[1] 
https://lore.kernel.org/linux-iommu/1623414043-40745-5-git-send-email-john.ga...@huawei.com/



To remind you/me/whoever: disabling batching due to caching-mode/NP-cache is 
not inherently needed. It was not needed for quite some time on Intel, but 
somehow along the way the consolidated flushing code broke it, and now it is 
needed (without intrusive code changes).


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD

2021-06-15 Thread Robin Murphy

On 2021-06-15 19:14, Nadav Amit wrote:




On Jun 15, 2021, at 5:55 AM, Robin Murphy  wrote:

On 2021-06-07 19:25, Nadav Amit wrote:

From: Nadav Amit 
AMD's IOMMU can flush efficiently (i.e., in a single flush) any range.
This is in contrast, for instance, to Intel IOMMUs that have a limit on
the number of pages that can be flushed in a single flush.  In addition,
AMD's IOMMU do not care about the page-size, so changes of the page size
do not need to trigger a TLB flush.
So in most cases, a TLB flush due to disjoint range or page-size changes
are not needed for AMD. Yet, vIOMMUs require the hypervisor to
synchronize the virtualized IOMMU's PTEs with the physical ones. This
process induce overheads, so it is better not to cause unnecessary
flushes, i.e., flushes of PTEs that were not modified.
Implement and use amd_iommu_iotlb_gather_add_page() and use it instead
of the generic iommu_iotlb_gather_add_page(). Ignore page-size changes
and disjoint regions unless "non-present cache" feature is reported by
the IOMMU capabilities, as this is an indication we are running on a
physical IOMMU. A similar indication is used by VT-d (see "caching
mode"). The new logic retains the same flushing behavior that we had
before the introduction of page-selective IOTLB flushes for AMD.
On virtualized environments, check if the newly flushed region and the
gathered one are disjoint and flush if it is. Also check whether the new
region would cause IOTLB invalidation of large region that would include
unmodified PTE. The latter check is done according to the "order" of the
IOTLB flush.


If it helps,

Reviewed-by: Robin Murphy 


Thanks!



I wonder if it might be more effective to defer the alignment-based splitting 
part to amd_iommu_iotlb_sync() itself, but that could be investigated as 
another follow-up.


Note that the alignment-based splitting is only used for virtualized AMD 
IOMMUs, so it has no impact for most users.

Right now, the performance is kind of bad on VMs since AMD’s IOMMU driver does 
a full IOTLB flush whenever it unmaps more than a single page. So, although 
your idea makes sense, I do not know exactly how to implement it right now, and 
regardless it is likely to provide much lower performance improvements than 
those that avoiding full IOTLB flushes would.

Having said that, if I figure out a way to implement it, I would give it a try 
(although I am admittedly afraid of a complicated logic that might cause 
subtle, mostly undetectable bugs).


I was mainly thinking that when you observe a change in "order" and sync 
to avoid over-invalidating adjacent pages, those pages may still be part 
of the current unmap and you've just not seen them added yet. Hence 
simply gathering contiguous pages regardless of alignment, then breaking 
the total range down into appropriately-aligned commands in the sync 
once you know you've seen everything, seems like it might allow issuing 
fewer commands overall. But I haven't quite grasped the alignment rules 
either, so possibly this is moot anyway.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH 1/1] iommu/arm-smmu-v3: remove unnecessary oom message

2021-06-15 Thread Will Deacon
On Wed, 9 Jun 2021 20:54:38 +0800, Zhen Lei wrote:
> Fixes scripts/checkpatch.pl warning:
> WARNING: Possible unnecessary 'out of memory' message
> 
> Remove it can help us save a bit of memory.

Applied to will (for-joerg/arm-smmu/updates), thanks!

[1/1] iommu/arm-smmu-v3: Remove unnecessary oom message
  https://git.kernel.org/will/c/affa909571b0

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] dt-bindings: Drop redundant minItems/maxItems

2021-06-15 Thread Rob Herring
If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
same size as the list is redundant and can be dropped. Note that this is DT
schema specific behavior and not standard json-schema behavior. The tooling
will fixup the final schema adding any unspecified minItems/maxItems.

This condition is partially checked with the meta-schema already, but
only if both 'minItems' and 'maxItems' are equal to the 'items' length.
An improved meta-schema is pending.

Cc: Jens Axboe 
Cc: Stephen Boyd 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: Vinod Koul 
Cc: Bartosz Golaszewski 
Cc: Kamal Dasu 
Cc: Jonathan Cameron 
Cc: Lars-Peter Clausen 
Cc: Thomas Gleixner 
Cc: Marc Zyngier 
Cc: Joerg Roedel 
Cc: Jassi Brar 
Cc: Mauro Carvalho Chehab 
Cc: Krzysztof Kozlowski 
Cc: Ulf Hansson 
Cc: Jakub Kicinski 
Cc: Wolfgang Grandegger 
Cc: Marc Kleine-Budde 
Cc: Andrew Lunn 
Cc: Vivien Didelot 
Cc: Vladimir Oltean 
Cc: Bjorn Helgaas 
Cc: Kishon Vijay Abraham I 
Cc: Linus Walleij 
Cc: "Uwe Kleine-König" 
Cc: Lee Jones 
Cc: Ohad Ben-Cohen 
Cc: Mathieu Poirier 
Cc: Philipp Zabel 
Cc: Paul Walmsley 
Cc: Palmer Dabbelt 
Cc: Albert Ou 
Cc: Alessandro Zummo 
Cc: Alexandre Belloni 
Cc: Greg Kroah-Hartman 
Cc: Mark Brown 
Cc: Zhang Rui 
Cc: Daniel Lezcano 
Cc: Wim Van Sebroeck 
Cc: Guenter Roeck 
Signed-off-by: Rob Herring 
---
 .../devicetree/bindings/ata/nvidia,tegra-ahci.yaml  | 1 -
 .../devicetree/bindings/clock/allwinner,sun4i-a10-ccu.yaml  | 2 --
 .../devicetree/bindings/clock/qcom,gcc-apq8064.yaml | 1 -
 Documentation/devicetree/bindings/clock/qcom,gcc-sdx55.yaml | 2 --
 .../devicetree/bindings/clock/qcom,gcc-sm8350.yaml  | 2 --
 .../devicetree/bindings/clock/sprd,sc9863a-clk.yaml | 1 -
 .../devicetree/bindings/crypto/allwinner,sun8i-ce.yaml  | 2 --
 Documentation/devicetree/bindings/crypto/fsl-dcp.yaml   | 1 -
 .../display/allwinner,sun4i-a10-display-backend.yaml| 6 --
 .../bindings/display/allwinner,sun6i-a31-mipi-dsi.yaml  | 1 -
 .../bindings/display/allwinner,sun8i-a83t-dw-hdmi.yaml  | 4 
 .../bindings/display/allwinner,sun8i-a83t-hdmi-phy.yaml | 2 --
 .../bindings/display/allwinner,sun8i-r40-tcon-top.yaml  | 2 --
 .../devicetree/bindings/display/bridge/cdns,mhdp8546.yaml   | 2 --
 .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 2 --
 Documentation/devicetree/bindings/display/st,stm32-dsi.yaml | 2 --
 .../devicetree/bindings/display/st,stm32-ltdc.yaml  | 1 -
 .../devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml | 4 
 .../devicetree/bindings/dma/renesas,rcar-dmac.yaml  | 1 -
 .../devicetree/bindings/edac/amazon,al-mc-edac.yaml | 2 --
 Documentation/devicetree/bindings/eeprom/at24.yaml  | 1 -
 Documentation/devicetree/bindings/example-schema.yaml   | 2 --
 Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 1 -
 Documentation/devicetree/bindings/gpu/vivante,gc.yaml   | 1 -
 Documentation/devicetree/bindings/i2c/brcm,brcmstb-i2c.yaml | 1 -
 .../devicetree/bindings/i2c/marvell,mv64xxx-i2c.yaml| 2 --
 .../devicetree/bindings/i2c/mellanox,i2c-mlxbf.yaml | 1 -
 .../devicetree/bindings/iio/adc/amlogic,meson-saradc.yaml   | 1 -
 .../devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.yaml | 2 --
 .../bindings/interrupt-controller/fsl,irqsteer.yaml | 1 -
 .../bindings/interrupt-controller/loongson,liointc.yaml | 1 -
 Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml| 1 -
 .../devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml   | 1 -
 .../devicetree/bindings/mailbox/st,stm32-ipcc.yaml  | 2 --
 .../devicetree/bindings/media/amlogic,gx-vdec.yaml  | 1 -
 Documentation/devicetree/bindings/media/i2c/adv7604.yaml| 1 -
 .../devicetree/bindings/media/marvell,mmp2-ccic.yaml| 1 -
 .../devicetree/bindings/media/qcom,sc7180-venus.yaml| 1 -
 .../devicetree/bindings/media/qcom,sdm845-venus-v2.yaml | 1 -
 .../devicetree/bindings/media/qcom,sm8250-venus.yaml| 1 -
 Documentation/devicetree/bindings/media/renesas,drif.yaml   | 1 -
 .../bindings/memory-controllers/mediatek,smi-common.yaml| 6 ++
 .../bindings/memory-controllers/mediatek,smi-larb.yaml  | 1 -
 .../devicetree/bindings/mmc/allwinner,sun4i-a10-mmc.yaml| 2 --
 Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml| 1 -
 Documentation/devicetree/bindings/mmc/mtk-sd.yaml   | 2 --
 Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml | 2 --
 Documentation/devicetree/bindings/mmc/sdhci-am654.yaml  | 1 -
 Documentation/devicetree/bindings/mmc/sdhci-pxa.yaml| 1 -
 .../devicetree/bindings/net/amlogic,meson-dwmac.yaml| 2 --
 .../devicetree/bindings/net/brcm,bcm4908-enet.yaml  | 2 --
 Documentation/devicetree/bindings/net/can/bosch,m_can.yaml  | 2 --
 Documentation/devicetree/bindings/net/dsa/brcm,sf2.yaml | 2 --
 

Re: [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers

2021-06-15 Thread Nadav Amit



> On Jun 15, 2021, at 12:05 PM, Nadav Amit  wrote:
> 
> 
> 
>> On Jun 15, 2021, at 3:42 AM, Robin Murphy  wrote:
>> 
>> On 2021-06-07 19:25, Nadav Amit wrote:
>>> From: Robin Murphy 
>>> The Mediatek driver is not the only one which might want a basic
>>> address-based gathering behaviour, so although it's arguably simple
>>> enough to open-code, let's factor it out for the sake of cleanliness.
>>> Let's also take this opportunity to document the intent of these
>>> helpers for clarity.
>>> Cc: Joerg Roedel 
>>> Cc: Will Deacon 
>>> Cc: Jiajun Cao 
>>> Cc: Robin Murphy 
>>> Cc: Lu Baolu 
>>> Cc: iommu@lists.linux-foundation.org
>>> Cc: linux-ker...@vger.kernel.org
>>> Signed-off-by: Robin Murphy 
>> 
>> Nit: missing your signoff.
>> 
>>> ---
>>> Changes from Robin's version:
>>> * Added iommu_iotlb_gather_add_range() stub !CONFIG_IOMMU_API
>> 
>> Out of curiosity, is there any config in which a stub is actually needed? 
>> Unlike iommu_iotlb_gather_init(), I would have thought that these helpers 
>> should only ever be called by driver code which already depends on IOMMU_API.
> 
> Indeed, this was only done as a defensive step.
> 
> I will remove it. I see no reason for it. Sorry for ruining your patch.

And remove the stub for iommu_iotlb_gather_is_disjoint() as well.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/6] iommu: Improve iommu_iotlb_gather helpers

2021-06-15 Thread Nadav Amit



> On Jun 15, 2021, at 3:42 AM, Robin Murphy  wrote:
> 
> On 2021-06-07 19:25, Nadav Amit wrote:
>> From: Robin Murphy 
>> The Mediatek driver is not the only one which might want a basic
>> address-based gathering behaviour, so although it's arguably simple
>> enough to open-code, let's factor it out for the sake of cleanliness.
>> Let's also take this opportunity to document the intent of these
>> helpers for clarity.
>> Cc: Joerg Roedel 
>> Cc: Will Deacon 
>> Cc: Jiajun Cao 
>> Cc: Robin Murphy 
>> Cc: Lu Baolu 
>> Cc: iommu@lists.linux-foundation.org
>> Cc: linux-ker...@vger.kernel.org
>> Signed-off-by: Robin Murphy 
> 
> Nit: missing your signoff.
> 
>> ---
>> Changes from Robin's version:
>> * Added iommu_iotlb_gather_add_range() stub !CONFIG_IOMMU_API
> 
> Out of curiosity, is there any config in which a stub is actually needed? 
> Unlike iommu_iotlb_gather_init(), I would have thought that these helpers 
> should only ever be called by driver code which already depends on IOMMU_API.

Indeed, this was only done as a defensive step.

I will remove it. I see no reason for it. Sorry for ruining your patch.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 4/6] iommu: Factor iommu_iotlb_gather_is_disjoint() out

2021-06-15 Thread Nadav Amit


> On Jun 15, 2021, at 3:29 AM, Will Deacon  wrote:
> 
> On Fri, Jun 11, 2021 at 09:50:31AM -0700, Nadav Amit wrote:
>> 
>> 
>>> On Jun 11, 2021, at 6:57 AM, Will Deacon  wrote:
>>> 
>>> On Mon, Jun 07, 2021 at 11:25:39AM -0700, Nadav Amit wrote:
 From: Nadav Amit 
 
 Refactor iommu_iotlb_gather_add_page() and factor out the logic that
 detects whether IOTLB gather range and a new range are disjoint. To be
 used by the next patch that implements different gathering logic for
 AMD.
 
 Cc: Joerg Roedel 
 Cc: Will Deacon 
 Cc: Jiajun Cao 
 Cc: Robin Murphy 
 Cc: Lu Baolu 
 Cc: iommu@lists.linux-foundation.org
 Cc: linux-ker...@vger.kernel.org>
 Signed-off-by: Nadav Amit 
 ---
 include/linux/iommu.h | 41 +
 1 file changed, 33 insertions(+), 8 deletions(-)
>>> 
>>> [...]
>>> 
 diff --git a/include/linux/iommu.h b/include/linux/iommu.h
 index f254c62f3720..b5a2bfc68fb0 100644
 --- a/include/linux/iommu.h
 +++ b/include/linux/iommu.h
 @@ -497,6 +497,28 @@ static inline void iommu_iotlb_sync(struct 
 iommu_domain *domain,
iommu_iotlb_gather_init(iotlb_gather);
 }
 
 +/**
 + * iommu_iotlb_gather_is_disjoint - Checks whether a new range is disjoint
 + *
 + * @gather: TLB gather data
 + * @iova: start of page to invalidate
 + * @size: size of page to invalidate
 + *
 + * Helper for IOMMU drivers to check whether a new range is and the 
 gathered
 + * range are disjoint.
>>> 
>>> I can't quite parse this. Delete the "is"?
>> 
>> Indeed. Will do (I mean I will do ;-) )
>> 
>>> 
   For many IOMMUs, flushing the IOMMU in this case is
 + * better than merging the two, which might lead to unnecessary 
 invalidations.
 + */
 +static inline
 +bool iommu_iotlb_gather_is_disjoint(struct iommu_iotlb_gather *gather,
 +  unsigned long iova, size_t size)
 +{
 +  unsigned long start = iova, end = start + size - 1;
 +
 +  return gather->end != 0 &&
 +  (end + 1 < gather->start || start > gather->end + 1);
 +}
 +
 +
 /**
 * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation
 * @gather: TLB gather data
 @@ -533,20 +555,16 @@ static inline void 
 iommu_iotlb_gather_add_page(struct iommu_domain *domain,
   struct iommu_iotlb_gather 
 *gather,
   unsigned long iova, size_t size)
 {
 -  unsigned long start = iova, end = start + size - 1;
 -
/*
 * If the new page is disjoint from the current range or is mapped at
 * a different granularity, then sync the TLB so that the gather
 * structure can be rewritten.
 */
 -  if (gather->pgsize != size ||
 -  end + 1 < gather->start || start > gather->end + 1) {
 -  if (gather->pgsize)
 -  iommu_iotlb_sync(domain, gather);
 -  gather->pgsize = size;
 -  }
 +  if ((gather->pgsize && gather->pgsize != size) ||
 +  iommu_iotlb_gather_is_disjoint(gather, iova, size))
 +  iommu_iotlb_sync(domain, gather);
 
 +  gather->pgsize = size;
>>> 
>>> Why have you made this unconditional? I think it's ok, but just not sure
>>> if it's necessary or not.
>> 
>> In regard to gather->pgsize, this function had (and has) an
>> invariant, in which gather->pgsize always represents the flushing
>> granularity of its range. Arguably, “size" should never be
>> zero, but lets assume for the matter of discussion that it might.
>> 
>> If “size” equals to “gather->pgsize”, then the assignment in
>> question has no impact.
>> 
>> Otherwise, if “size” is non-zero, then iommu_iotlb_sync() would
>> initialize the size and range (see iommu_iotlb_gather_init()),
>> and the invariant is kept.
>> 
>> Otherwise, “size” is zero, and “gather” already holds a range,
>> so gather->pgsize is non-zero and
>> (gather->pgsize && gather->pgsize != size) is true. Therefore,
>> again, iommu_iotlb_sync() would be called and initialize the
>> size.
>> 
>> I think that this change makes the code much simpler to read.
>> It probably has no performance impact as “gather” is probably
>> cached and anyhow accessed shortly after.
> 
> Thanks. I was just interested in whether it had a functional impact (I don't
> think it does) or whether it was just cleanup.
> 
> With the updated comment:
> 
> Acked-by: Will Deacon 

Thanks. I will add the explanation to the commit log, but not to the code in 
order not to inflate it too much.



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 6/6] iommu/amd: Sync once for scatter-gather operations

2021-06-15 Thread Nadav Amit



> On Jun 15, 2021, at 4:25 AM, Robin Murphy  wrote:
> 
> On 2021-06-07 19:25, Nadav Amit wrote:
>> From: Nadav Amit 
>> On virtual machines, software must flush the IOTLB after each page table
>> entry update.
>> The iommu_map_sg() code iterates through the given scatter-gather list
>> and invokes iommu_map() for each element in the scatter-gather list,
>> which calls into the vendor IOMMU driver through iommu_ops callback. As
>> the result, a single sg mapping may lead to multiple IOTLB flushes.
>> Fix this by adding amd_iotlb_sync_map() callback and flushing at this
>> point after all sg mappings we set.
>> This commit is followed and inspired by commit 933fcd01e97e2
>> ("iommu/vt-d: Add iotlb_sync_map callback").
>> Cc: Joerg Roedel 
>> Cc: Will Deacon 
>> Cc: Jiajun Cao 
>> Cc: Robin Murphy 
>> Cc: Lu Baolu 
>> Cc: iommu@lists.linux-foundation.org
>> Cc: linux-ker...@vger.kernel.org
>> Signed-off-by: Nadav Amit 
>> ---
>>  drivers/iommu/amd/iommu.c | 15 ---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
>> index 128f2e889ced..dd23566f1db8 100644
>> --- a/drivers/iommu/amd/iommu.c
>> +++ b/drivers/iommu/amd/iommu.c
>> @@ -2027,6 +2027,16 @@ static int amd_iommu_attach_device(struct 
>> iommu_domain *dom,
>>  return ret;
>>  }
>>  +static void amd_iommu_iotlb_sync_map(struct iommu_domain *dom,
>> + unsigned long iova, size_t size)
>> +{
>> +struct protection_domain *domain = to_pdomain(dom);
>> +struct io_pgtable_ops *ops = &domain->iop.iop.ops;
>> +
>> +if (ops->map)
> 
> Not too critical since you're only moving existing code around, but is 
> ops->map ever not set? Either way the check ends up looking rather 
> out-of-place here :/
> 
> It's not very clear what the original intent was - I do wonder whether it's 
> supposed to be related to PAGE_MODE_NONE, but given that amd_iommu_map() has 
> an explicit check and errors out early in that case, we'd never get here 
> anyway. Possibly something to come back and clean up later?

[ +Suravee ]

According to what I see in the git log, the checks for ops->map (as well as
ops->unmap) were introduced relatively recently by Suravee [1] in preparation
for AMD IOMMU v2 page tables [2]. Since I do not know what he plans, I prefer
not to touch this code.

[1] 
https://lore.kernel.org/linux-iommu/20200923101442.73157-13-suravee.suthikulpa...@amd.com/
[2] 
https://lore.kernel.org/linux-iommu/20200923101442.73157-1-suravee.suthikulpa...@amd.com/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/6] iommu/amd: Do not use flush-queue when NpCache is on

2021-06-15 Thread Nadav Amit


> On Jun 15, 2021, at 6:08 AM, Robin Murphy  wrote:
> 
> On 2021-06-07 19:25, Nadav Amit wrote:
>> From: Nadav Amit 
>> Do not use flush-queue on virtualized environments, where the NpCache
>> capability of the IOMMU is set. This is required to reduce
>> virtualization overheads.
>> This change follows a similar change to Intel's VT-d and a detailed
>> explanation as for the rationale is described in commit 29b32839725f
>> ("iommu/vt-d: Do not use flush-queue when caching-mode is on").
>> Cc: Joerg Roedel 
>> Cc: Will Deacon 
>> Cc: Jiajun Cao 
>> Cc: Robin Murphy 
>> Cc: Lu Baolu 
>> Cc: iommu@lists.linux-foundation.org
>> Cc: linux-ker...@vger.kernel.org
>> Signed-off-by: Nadav Amit 
>> ---
>>  drivers/iommu/amd/init.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>> index d006724f4dc2..ba3b76ed776d 100644
>> --- a/drivers/iommu/amd/init.c
>> +++ b/drivers/iommu/amd/init.c
>> @@ -1850,8 +1850,13 @@ static int __init iommu_init_pci(struct amd_iommu 
>> *iommu)
>>  if (ret)
>>  return ret;
>>  -   if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
>> +if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) {
>> +if (!amd_iommu_unmap_flush)
>> +pr_warn_once("IOMMU batching is disabled due to 
>> virtualization");
> 
> Nit: you can just use pr_warn() (or arguably pr_info()) since the explicit 
> conditions already only match once.

Yes, my bad. I will fix it in the next version.

> Speaking of which, it might be better to use amd_iommu_np_cache instead, 
> since other patches are planning to clean up the last remnants of 
> amd_iommu_unmap_flush.

I would prefer that the other patches (the ones that remove
amd_iommu_unmap_flush) address this code as well. I certainly do not want to
embed amd_iommu_np_cache deep into the flushing logic. IOW: I don't know
exactly what you have in mind, but I prefer that the code stays clear.

This code follows (copies?) the same pattern+logic from commit 5f3116ea8b5
("iommu/vt-d: Do not use flush-queue when caching-mode is on"). I see that the
code was changed in commit 53255e545807c ("iommu: remove
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE"), but that change did not get rid of
intel_iommu_strict, so please allow me to keep using amd_iommu_unmap_flush.

To remind you/me/whoever: disabling batching due to caching-mode/NP-cache is 
not inherently needed. It was not needed for quite some time on Intel, but 
somehow along the way the consolidated flushing code broke it, and now it is 
needed (without intrusive code changes).

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 3/9] iommu/arm-smmu: Implement ->probe_finalize()

2021-06-15 Thread Will Deacon
On Tue, Jun 15, 2021 at 06:12:13PM +, Krishna Reddy wrote:
> > if (smmu->impl->probe_finalize)
> 
> The above is the issue. It should be updated as below similar to other 
> instances impl callbacks.
> if (smmu->impl && smmu->impl->probe_finalize)

I'll push a patch on top shortly...

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 5/6] iommu/amd: Tailored gather logic for AMD

2021-06-15 Thread Nadav Amit


> On Jun 15, 2021, at 5:55 AM, Robin Murphy  wrote:
> 
> On 2021-06-07 19:25, Nadav Amit wrote:
>> From: Nadav Amit 
>> AMD's IOMMU can flush efficiently (i.e., in a single flush) any range.
>> This is in contrast, for instance, to Intel IOMMUs that have a limit on
>> the number of pages that can be flushed in a single flush.  In addition,
>> AMD's IOMMU do not care about the page-size, so changes of the page size
>> do not need to trigger a TLB flush.
>> So in most cases, a TLB flush due to disjoint range or page-size changes
>> are not needed for AMD. Yet, vIOMMUs require the hypervisor to
>> synchronize the virtualized IOMMU's PTEs with the physical ones. This
>> process induce overheads, so it is better not to cause unnecessary
>> flushes, i.e., flushes of PTEs that were not modified.
>> Implement and use amd_iommu_iotlb_gather_add_page() and use it instead
>> of the generic iommu_iotlb_gather_add_page(). Ignore page-size changes
>> and disjoint regions unless "non-present cache" feature is reported by
>> the IOMMU capabilities, as this is an indication we are running on a
>> physical IOMMU. A similar indication is used by VT-d (see "caching
>> mode"). The new logic retains the same flushing behavior that we had
>> before the introduction of page-selective IOTLB flushes for AMD.
>> On virtualized environments, check if the newly flushed region and the
>> gathered one are disjoint and flush if it is. Also check whether the new
>> region would cause IOTLB invalidation of large region that would include
>> unmodified PTE. The latter check is done according to the "order" of the
>> IOTLB flush.
> 
> If it helps,
> 
> Reviewed-by: Robin Murphy 

Thanks!


> I wonder if it might be more effective to defer the alignment-based splitting 
> part to amd_iommu_iotlb_sync() itself, but that could be investigated as 
> another follow-up.

Note that the alignment-based splitting is only used for virtualized AMD 
IOMMUs, so it has no impact for most users.

Right now, the performance is kind of bad on VMs since AMD’s IOMMU driver does 
a full IOTLB flush whenever it unmaps more than a single page. So, although 
your idea makes sense, I do not know exactly how to implement it right now, and 
regardless it is likely to provide a much smaller performance improvement than 
avoiding full IOTLB flushes would.

Having said that, if I figure out a way to implement it, I would give it a try 
(although I am admittedly afraid of a complicated logic that might cause 
subtle, mostly undetectable bugs).
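For reference, below is a self-contained sketch of the disjoint-range policy
described in the commit message above. The struct, the names and the NP-cache
flag are stand-ins for illustration only (and the order-based check is
omitted); this is not the actual driver code.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the gather state; 0/0 means "nothing queued" in this sketch. */
struct toy_gather {
	unsigned long start;
	unsigned long end;	/* inclusive */
};

/*
 * On a physical IOMMU (no NP-cache) just grow the pending range.  On a
 * vIOMMU, flush the pending range first if the new page is disjoint from
 * it, so that the eventual flush never covers unmodified PTEs.
 */
static void toy_gather_add_page(struct toy_gather *g, unsigned long iova,
				size_t size, bool np_cache)
{
	unsigned long end = iova + size - 1;

	if (np_cache && g->end && (iova > g->end + 1 || end + 1 < g->start)) {
		printf("flush [%#lx, %#lx]\n", g->start, g->end);
		g->start = g->end = 0;
	}

	if (!g->end) {
		g->start = iova;
		g->end = end;
		return;
	}
	if (iova < g->start)
		g->start = iova;
	if (end > g->end)
		g->end = end;
}

int main(void)
{
	struct toy_gather g = { 0, 0 };

	toy_gather_add_page(&g, 0x1000, 0x1000, true);	/* starts the range */
	toy_gather_add_page(&g, 0x2000, 0x1000, true);	/* adjacent: merged */
	toy_gather_add_page(&g, 0x80000, 0x1000, true);	/* disjoint: flush */
	printf("pending [%#lx, %#lx]\n", g.start, g.end);
	return 0;
}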
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

RE: [PATCH v3 3/9] iommu/arm-smmu: Implement ->probe_finalize()

2021-06-15 Thread Krishna Reddy
> if (smmu->impl->probe_finalize)

The above is the issue. It should be updated as below, similar to the other 
impl callback call sites.
if (smmu->impl && smmu->impl->probe_finalize)

-KR

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/9] iommu/arm-smmu: Implement ->probe_finalize()

2021-06-15 Thread Robin Murphy

On 2021-06-15 19:01, Marek Szyprowski wrote:

Hi,

On 03.06.2021 18:46, Thierry Reding wrote:

From: Thierry Reding 

Implement a ->probe_finalize() callback that can be used by vendor
implementations to perform extra programming necessary after devices
have been attached to the SMMU.

Signed-off-by: Thierry Reding 


This patch landed recently in linux-next as commit 0d97174aeadf
("iommu/arm-smmu: Implement ->probe_finalize()"). It causes the
following issue on ARM Juno R1 board:


[...]


+static void arm_smmu_probe_finalize(struct device *dev)
+{
+   struct arm_smmu_master_cfg *cfg;
+   struct arm_smmu_device *smmu;
+
+   cfg = dev_iommu_priv_get(dev);
+   smmu = cfg->smmu;
+
+   if (smmu->impl->probe_finalize)


Oops, indeed that needs to check smmu->impl first.

Robin.


+   smmu->impl->probe_finalize(smmu, dev);
+}
+

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 3/9] iommu/arm-smmu: Implement ->probe_finalize()

2021-06-15 Thread Marek Szyprowski
Hi,

On 03.06.2021 18:46, Thierry Reding wrote:
> From: Thierry Reding 
>
> Implement a ->probe_finalize() callback that can be used by vendor
> implementations to perform extra programming necessary after devices
> have been attached to the SMMU.
>
> Signed-off-by: Thierry Reding 

This patch landed recently in linux-next as commit 0d97174aeadf 
("iommu/arm-smmu: Implement ->probe_finalize()"). It causes the 
following issue on ARM Juno R1 board:

arm-smmu 2b50.iommu: probing hardware configuration...
arm-smmu 2b50.iommu: SMMUv1 with:
arm-smmu 2b50.iommu: stage 2 translation
arm-smmu 2b50.iommu: coherent table walk
arm-smmu 2b50.iommu: stream matching with 32 register groups
arm-smmu 2b50.iommu: 4 context banks (4 stage-2 only)
arm-smmu 2b50.iommu: Supported page sizes: 0x60211000
arm-smmu 2b50.iommu: Stage-2: 40-bit IPA -> 40-bit PA
arm-smmu 7fb0.iommu: probing hardware configuration...
arm-smmu 7fb0.iommu: SMMUv1 with:
arm-smmu 7fb0.iommu: stage 2 translation
arm-smmu 7fb0.iommu: coherent table walk
arm-smmu 7fb0.iommu: stream matching with 16 register groups
arm-smmu 7fb0.iommu: 4 context banks (4 stage-2 only)
arm-smmu 7fb0.iommu: Supported page sizes: 0x60211000
arm-smmu 7fb0.iommu: Stage-2: 40-bit IPA -> 40-bit PA
arm-smmu 7fb1.iommu: probing hardware configuration...
arm-smmu 7fb1.iommu: SMMUv1 with:
arm-smmu 7fb1.iommu: stage 2 translation
arm-smmu 7fb1.iommu: non-coherent table walk
arm-smmu 7fb1.iommu: (IDR0.CTTW overridden by FW configuration)
arm-smmu 7fb1.iommu: stream matching with 2 register groups
arm-smmu 7fb1.iommu: 1 context banks (1 stage-2 only)
arm-smmu 7fb1.iommu: Supported page sizes: 0x60211000
arm-smmu 7fb1.iommu: Stage-2: 40-bit IPA -> 40-bit PA
arm-smmu 7fb2.iommu: probing hardware configuration...
arm-smmu 7fb2.iommu: SMMUv1 with:
arm-smmu 7fb2.iommu: stage 2 translation
arm-smmu 7fb2.iommu: non-coherent table walk
arm-smmu 7fb2.iommu: (IDR0.CTTW overridden by FW configuration)
arm-smmu 7fb2.iommu: stream matching with 2 register groups
arm-smmu 7fb2.iommu: 1 context banks (1 stage-2 only)
arm-smmu 7fb2.iommu: Supported page sizes: 0x60211000
arm-smmu 7fb2.iommu: Stage-2: 40-bit IPA -> 40-bit PA
arm-smmu 7fb3.iommu: probing hardware configuration...
arm-smmu 7fb3.iommu: SMMUv1 with:
arm-smmu 7fb3.iommu: stage 2 translation
arm-smmu 7fb3.iommu: coherent table walk
arm-smmu 7fb3.iommu: stream matching with 2 register groups
arm-smmu 7fb3.iommu: 1 context banks (1 stage-2 only)
arm-smmu 7fb3.iommu: Supported page sizes: 0x60211000
arm-smmu 7fb3.iommu: Stage-2: 40-bit IPA -> 40-bit PA
tda998x 0-0070: found TDA19988
tda998x 0-0071: found TDA19988
brd: module loaded
loop: module loaded
megasas: 07.714.04.00-rc1
sata_sil24 :03:00.0: Adding to iommu group 0
Unable to handle kernel NULL pointer dereference at virtual address 
0070
Mem abort info:
   ESR = 0x9604
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
Data abort info:
   ISV = 0, ISS = 0x0004
   CM = 0, WnR = 0
[0070] user address but active_mm is swapper
Internal error: Oops: 9604 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1+ #3466
Hardware name: ARM Juno development board (r1) (DT)
pstate: 2005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
pc : arm_smmu_probe_finalize+0x14/0x48
lr : iommu_probe_device+0x74/0x120
...
Call trace:
  arm_smmu_probe_finalize+0x14/0x48
  of_iommu_configure+0xe4/0x1b8
  of_dma_configure_id+0xf8/0x2d8
  pci_dma_configure+0x44/0x88
  really_probe+0xc0/0x3c0
  driver_probe_device+0x60/0xc0
  device_driver_attach+0x6c/0x78
  __driver_attach+0xc0/0x100
  bus_for_each_dev+0x68/0xc8
  driver_attach+0x20/0x28
  bus_add_driver+0x168/0x1f8
  driver_register+0x60/0x110
  __pci_register_driver+0x5c/0x68
  sil24_pci_driver_init+0x20/0x28
  do_one_initcall+0x84/0x450
  kernel_init_freeable+0x2dc/0x334
  kernel_init+0x10/0x110
  ret_from_fork+0x10/0x18
Code: b40001e1 f9405821 f9400023 f9401461 (f9403822)
---[ end trace 561eda4b855861d1 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00240022,25006086
Memory Limit: none
---[ end Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x000b ]---

> ---
> Changes in v2:
> - remove unnecessarily paranoid check
>
>   drivers/iommu/arm/arm-smmu/arm-smmu.c | 13 +
>   drivers/iommu/arm/arm-smmu/arm-smmu.h |  1 +
>   2 files changed, 14 insertions(+)
>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> 

[PATCH v6 06/15] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts

2021-06-15 Thread Georgi Djakov
From: Will Deacon 

The 'addr_merge' parameter to iommu_pgsize() is a fabricated address
intended to describe the alignment requirements to consider when
choosing an appropriate page size. On the iommu_map() path, this address
is the logical OR of the virtual and physical addresses.

Subsequent improvements to iommu_pgsize() will need to check the
alignment of the virtual and physical components of 'addr_merge'
independently, so pass them in as separate parameters and reconstruct
'addr_merge' locally.

No functional change.

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Georgi Djakov 
---
 drivers/iommu/iommu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 80e471ada358..80e14c139d40 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2375,12 +2375,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain 
*domain, dma_addr_t iova)
 }
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
 
-static size_t iommu_pgsize(struct iommu_domain *domain,
-  unsigned long addr_merge, size_t size)
+static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
+  phys_addr_t paddr, size_t size)
 {
unsigned int pgsize_idx;
unsigned long pgsizes;
size_t pgsize;
+   unsigned long addr_merge = paddr | iova;
 
/* Page sizes supported by the hardware and small enough for @size */
pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);
@@ -2433,7 +2434,7 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
	pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
 
while (size) {
-   size_t pgsize = iommu_pgsize(domain, iova | paddr, size);
+   size_t pgsize = iommu_pgsize(domain, iova, paddr, size);
 
pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 iova, &paddr, pgsize);
@@ -2521,8 +2522,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 * or we hit an area that isn't mapped.
 */
while (unmapped < size) {
-   size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
+   size_t pgsize;
 
+   pgsize = iommu_pgsize(domain, iova, iova, size - unmapped);
unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
if (!unmapped_page)
break;
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 04/15] iommu: Add a map_pages() op for IOMMU drivers

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

Add a callback for IOMMU drivers to provide a path for the
IOMMU framework to call into an IOMMU driver, which can
call into the io-pgtable code, to map a physically contiguous
range of pages of the same size.

For IOMMU drivers that do not specify a map_pages() callback,
the existing logic of mapping memory one page block at a time
will be used.
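
For drivers that provide the new callback directly, the contract amounts to
mapping at most pgcount blocks of pgsize and reporting partial progress
through *mapped so that the core can unwind on failure. Below is a
self-contained sketch of that contract, using stand-in types and names
rather than the kernel's:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t phys_addr_t;

/* Stand-in for a driver's existing single-block map routine. */
static int toy_map_one(unsigned long iova, phys_addr_t paddr, size_t pgsize)
{
	printf("map iova=%#lx paddr=%#llx size=%#zx\n",
	       iova, (unsigned long long)paddr, pgsize);
	return 0;
}

/*
 * Sketch of the map_pages() contract: map up to pgcount blocks of pgsize
 * and report how much was actually mapped through *mapped.
 */
static int toy_map_pages(unsigned long iova, phys_addr_t paddr,
			 size_t pgsize, size_t pgcount, size_t *mapped)
{
	int ret = 0;

	while (pgcount--) {
		ret = toy_map_one(iova, paddr, pgsize);
		if (ret)
			break;
		iova += pgsize;
		paddr += pgsize;
		*mapped += pgsize;
	}
	return ret;
}

int main(void)
{
	size_t mapped = 0;

	toy_map_pages(0x100000, 0x200000, 0x1000, 4, &mapped);
	printf("mapped %#zx bytes\n", mapped);
	return 0;
}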

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Acked-by: Lu Baolu 
Signed-off-by: Georgi Djakov 
---
 include/linux/iommu.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 25a844121be5..d7989d4a7404 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -180,6 +180,8 @@ struct iommu_iotlb_gather {
  * @attach_dev: attach device to an iommu domain
  * @detach_dev: detach device from an iommu domain
  * @map: map a physically contiguous memory region to an iommu domain
+ * @map_pages: map a physically contiguous set of pages of the same size to
+ * an iommu domain.
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @unmap_pages: unmap a number of pages of the same size from an iommu domain
  * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
@@ -230,6 +232,9 @@ struct iommu_ops {
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
int (*map)(struct iommu_domain *domain, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+   int (*map_pages)(struct iommu_domain *domain, unsigned long iova,
+phys_addr_t paddr, size_t pgsize, size_t pgcount,
+int prot, gfp_t gfp, size_t *mapped);
size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 size_t size, struct iommu_iotlb_gather *iotlb_gather);
size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 12/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

Implement the unmap_pages() callback for the ARM v7s io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Georgi Djakov 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index d4004bcf333a..1af060686985 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -710,15 +710,32 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable 
*data,
return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep);
 }
 
-static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
-   size_t size, struct iommu_iotlb_gather *gather)
+static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long 
iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *gather)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
+   size_t unmapped = 0, ret;
 
if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
return 0;
 
-   return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd);
+   while (pgcount--) {
+   ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd);
+   if (!ret)
+   break;
+
+   unmapped += pgsize;
+   iova += pgsize;
+   }
+
+   return unmapped;
+}
+
+static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova,
+   size_t size, struct iommu_iotlb_gather *gather)
+{
+   return arm_v7s_unmap_pages(ops, iova, size, 1, gather);
 }
 
 static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops,
@@ -781,6 +798,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
io_pgtable_cfg *cfg,
data->iop.ops = (struct io_pgtable_ops) {
.map= arm_v7s_map,
.unmap  = arm_v7s_unmap,
+   .unmap_pages= arm_v7s_unmap_pages,
.iova_to_phys   = arm_v7s_iova_to_phys,
};
 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 00/15] Optimizing iommu_[map/unmap] performance

2021-06-15 Thread Georgi Djakov
When unmapping a buffer from an IOMMU domain, the IOMMU framework unmaps
the buffer at a granule of the largest page size that is supported by
the IOMMU hardware and fits within the buffer. For every block that
is unmapped, the IOMMU framework will call into the IOMMU driver, and
then the io-pgtable framework to walk the page tables to find the entry
that corresponds to the IOVA, and then unmaps the entry.

This can be suboptimal in scenarios where a buffer or a piece of a
buffer can be split into several contiguous page blocks of the same size.
For example, consider an IOMMU that supports 4 KB page blocks, 2 MB page
blocks, and 1 GB page blocks, and a buffer that is 4 MB in size is being
unmapped at IOVA 0. The current call-flow will result in 4 indirect calls,
and 2 page table walks, to unmap 2 entries that are next to each other in
the page-tables, when both entries could have been unmapped in one shot
by clearing both page table entries in the same call.

The same optimization is applicable to mapping buffers as well, so these
patches add a pair of callbacks, unmap_pages() and map_pages(), to the
io-pgtable code and IOMMU drivers. They unmap or map an IOVA range
consisting of a number of pages of a single page size supported by the
IOMMU hardware, allowing multiple page table entries to be manipulated
within the same set of indirect calls. The callbacks are introduced
alongside the existing ones to give other IOMMU drivers/io-pgtable formats
time to convert, so that the transition can be done piecemeal.
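
To make the 4 MB example above concrete, below is a self-contained userspace
mock-up of the page-size/count selection. GENMASK/__fls/__ffs are
re-implemented locally and the function only mirrors the arithmetic; it is
not the kernel implementation.

#include <stdio.h>
#include <stddef.h>

#define GENMASK(h, l)	(((~0UL) << (l)) & (~0UL >> (63 - (h))))

static unsigned int fls_ul(unsigned long x) { return 63 - __builtin_clzl(x); }
static unsigned int ffs_ul(unsigned long x) { return __builtin_ctzl(x); }

static size_t pick_pgsize(unsigned long pgsize_bitmap, unsigned long iova,
			  unsigned long paddr, size_t size, size_t *count)
{
	unsigned long addr_merge = paddr | iova;
	unsigned long pgsizes;
	unsigned int pgsize_idx, pgsize_idx_next;
	size_t pgsize, pgsize_next, offset;

	/* Page sizes supported by the hardware and small enough for @size */
	pgsizes = pgsize_bitmap & GENMASK(fls_ul(size), 0);
	if (addr_merge)
		pgsizes &= GENMASK(ffs_ul(addr_merge), 0);

	pgsize_idx = fls_ul(pgsizes);
	pgsize = 1UL << pgsize_idx;

	/* How many such blocks fit before the next bigger supported size? */
	pgsizes = pgsize_bitmap & ~GENMASK(pgsize_idx, 0);
	if (pgsizes) {
		pgsize_idx_next = ffs_ul(pgsizes);
		pgsize_next = 1UL << pgsize_idx_next;
		if (!((iova ^ paddr) & (pgsize_next - 1))) {
			offset = pgsize_next - (addr_merge & (pgsize_next - 1));
			if (offset + pgsize_next <= size)
				size = offset;
		}
	}
	*count = size >> pgsize_idx;
	return pgsize;
}

int main(void)
{
	/* 4K | 2M | 1G page sizes, 4 MB buffer mapped at IOVA 0 / PA 0 */
	unsigned long bitmap = (1UL << 12) | (1UL << 21) | (1UL << 30);
	size_t count;
	size_t pgsize = pick_pgsize(bitmap, 0, 0, 4UL << 20, &count);

	/* Prints pgsize=0x200000 count=2: a single unmap_pages() call covers
	 * what previously took two unmap() calls and two page table walks. */
	printf("pgsize=%#zx count=%zu\n", pgsize, count);
	return 0;
}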

Changes since V5: 
(https://lore.kernel.org/r/20210408171402.12607-1-isa...@codeaurora.org/)

* Rebased on next-20210515.
* Fixed minor checkpatch warnings - indentation, extra blank lines.
* Use the correct function argument in __arm_lpae_map(). (chenxiang)

Changes since V4:

* Fixed type for addr_merge from phys_addr_t to unsigned long so
  that GENMASK() can be used.
* Hooked up arm_v7s_[unmap/map]_pages to the io-pgtable ops.
* Introduced a macro for calculating the number of page table entries
  for the ARM LPAE io-pgtable format.

Changes since V3:

* Removed usage of ULL variants of bitops from Will's patches, as
  they were not needed.
* Instead of unmapping/mapping pgcount pages, unmap_pages() and
  map_pages() will at most unmap and map pgcount pages, allowing
  for part of the pages in pgcount to be mapped and unmapped. This
  was done to simplify the handling in the io-pgtable layer.
* Extended the existing PTE manipulation methods in io-pgtable-arm
  to handle multiple entries, per Robin's suggestion, eliminating
  the need to add functions to clear multiple PTEs.
* Implemented a naive form of [map/unmap]_pages() for ARM v7s io-pgtable
  format.
* arm_[v7s/lpae]_[map/unmap] will call
  arm_[v7s/lpae]_[map_pages/unmap_pages] with an argument of 1 page.
* The arm_smmu_[map/unmap] functions have been removed, since they
  have been replaced by arm_smmu_[map/unmap]_pages.

Changes since V2:

* Added a check in __iommu_map() to check for the existence
  of either the map or map_pages callback as per Lu's suggestion.

Changes since V1:

* Implemented the map_pages() callbacks
* Integrated Will's patches into this series which
  address several concerns about how iommu_pgsize() partitioned a
  buffer (I made a minor change to the patch which changes
  iommu_pgsize() to use bitmaps by using the ULL variants of
  the bitops)

Isaac J. Manjarres (12):
  iommu/io-pgtable: Introduce unmap_pages() as a page table op
  iommu: Add an unmap_pages() op for IOMMU drivers
  iommu/io-pgtable: Introduce map_pages() as a page table op
  iommu: Add a map_pages() op for IOMMU drivers
  iommu: Add support for the map_pages() callback
  iommu/io-pgtable-arm: Prepare PTE methods for handling multiple
entries
  iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
  iommu/io-pgtable-arm: Implement arm_lpae_map_pages()
  iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()
  iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages()
  iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback
  iommu/arm-smmu: Implement the map_pages() IOMMU driver callback

Will Deacon (3):
  iommu: Use bitmap to calculate page size in iommu_pgsize()
  iommu: Split 'addr_merge' argument to iommu_pgsize() into separate
parts
  iommu: Hook up '->unmap_pages' driver callback

 drivers/iommu/arm/arm-smmu/arm-smmu.c |  18 ++--
 drivers/iommu/io-pgtable-arm-v7s.c|  50 +++--
 drivers/iommu/io-pgtable-arm.c| 188 ++
 drivers/iommu/iommu.c | 129 +--
 include/linux/io-pgtable.h|   8 ++
 include/linux/iommu.h |   9 ++
 6 files changed, 287 insertions(+), 115 deletions(-)

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 07/15] iommu: Hook up '->unmap_pages' driver callback

2021-06-15 Thread Georgi Djakov
From: Will Deacon 

Extend iommu_pgsize() to populate an optional 'count' parameter so that
we can direct the unmapping operation to the ->unmap_pages callback if it
has been provided by the driver.

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Georgi Djakov 
---
 drivers/iommu/iommu.c | 59 +++
 1 file changed, 50 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 80e14c139d40..725622c7e603 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2376,11 +2376,11 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain 
*domain, dma_addr_t iova)
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
 
 static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova,
-  phys_addr_t paddr, size_t size)
+  phys_addr_t paddr, size_t size, size_t *count)
 {
-   unsigned int pgsize_idx;
+   unsigned int pgsize_idx, pgsize_idx_next;
unsigned long pgsizes;
-   size_t pgsize;
+   size_t offset, pgsize, pgsize_next;
unsigned long addr_merge = paddr | iova;
 
/* Page sizes supported by the hardware and small enough for @size */
@@ -2396,7 +2396,36 @@ static size_t iommu_pgsize(struct iommu_domain *domain, 
unsigned long iova,
/* Pick the biggest page size remaining */
pgsize_idx = __fls(pgsizes);
pgsize = BIT(pgsize_idx);
+   if (!count)
+   return pgsize;
 
+   /* Find the next biggest supported page size, if it exists */
+   pgsizes = domain->pgsize_bitmap & ~GENMASK(pgsize_idx, 0);
+   if (!pgsizes)
+   goto out_set_count;
+
+   pgsize_idx_next = __ffs(pgsizes);
+   pgsize_next = BIT(pgsize_idx_next);
+
+   /*
+* There's no point trying a bigger page size unless the virtual
+* and physical addresses are similarly offset within the larger page.
+*/
+   if ((iova ^ paddr) & (pgsize_next - 1))
+   goto out_set_count;
+
+   /* Calculate the offset to the next page size alignment boundary */
+   offset = pgsize_next - (addr_merge & (pgsize_next - 1));
+
+   /*
+* If size is big enough to accommodate the larger page, reduce
+* the number of smaller pages.
+*/
+   if (offset + pgsize_next <= size)
+   size = offset;
+
+out_set_count:
+   *count = size >> pgsize_idx;
return pgsize;
 }
 
@@ -2434,7 +2463,7 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
	pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
 
while (size) {
-   size_t pgsize = iommu_pgsize(domain, iova, paddr, size);
+   size_t pgsize = iommu_pgsize(domain, iova, paddr, size, NULL);
 
pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 iova, &paddr, pgsize);
@@ -2485,6 +2514,19 @@ int iommu_map_atomic(struct iommu_domain *domain, 
unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_map_atomic);
 
+static size_t __iommu_unmap_pages(struct iommu_domain *domain,
+ unsigned long iova, size_t size,
+ struct iommu_iotlb_gather *iotlb_gather)
+{
+   const struct iommu_ops *ops = domain->ops;
+   size_t pgsize, count;
+
+   pgsize = iommu_pgsize(domain, iova, iova, size, &count);
+   return ops->unmap_pages ?
+  ops->unmap_pages(domain, iova, pgsize, count, iotlb_gather) :
+  ops->unmap(domain, iova, pgsize, iotlb_gather);
+}
+
 static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long iova, size_t size,
struct iommu_iotlb_gather *iotlb_gather)
@@ -2494,7 +2536,7 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
unsigned long orig_iova = iova;
unsigned int min_pagesz;
 
-   if (unlikely(ops->unmap == NULL ||
+   if (unlikely(!(ops->unmap || ops->unmap_pages) ||
 domain->pgsize_bitmap == 0UL))
return 0;
 
@@ -2522,10 +2564,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
 * or we hit an area that isn't mapped.
 */
while (unmapped < size) {
-   size_t pgsize;
-
-   pgsize = iommu_pgsize(domain, iova, iova, size - unmapped);
-   unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
+   unmapped_page = __iommu_unmap_pages(domain, iova,
+   size - unmapped,
+   iotlb_gather);
if (!unmapped_page)
break;
 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 08/15] iommu: Add support for the map_pages() callback

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

Since iommu_pgsize can calculate how many pages of the
same size can be mapped/unmapped before the next largest
page size boundary, add support for invoking an IOMMU
driver's map_pages() callback, if it provides one.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Georgi Djakov 
---
 drivers/iommu/iommu.c | 43 +++
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 725622c7e603..89f8ab6a72a9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2429,6 +2429,30 @@ static size_t iommu_pgsize(struct iommu_domain *domain, 
unsigned long iova,
return pgsize;
 }
 
+static int __iommu_map_pages(struct iommu_domain *domain, unsigned long iova,
+phys_addr_t paddr, size_t size, int prot,
+gfp_t gfp, size_t *mapped)
+{
+   const struct iommu_ops *ops = domain->ops;
+   size_t pgsize, count;
+   int ret;
+
+   pgsize = iommu_pgsize(domain, iova, paddr, size, &count);
+
+   pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx count %ld\n",
+iova, &paddr, pgsize, count);
+
+   if (ops->map_pages) {
+   ret = ops->map_pages(domain, iova, paddr, pgsize, count, prot,
+gfp, mapped);
+   } else {
+   ret = ops->map(domain, iova, paddr, pgsize, prot, gfp);
+   *mapped = ret ? 0 : pgsize;
+   }
+
+   return ret;
+}
+
 static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
@@ -2439,7 +2463,7 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
phys_addr_t orig_paddr = paddr;
int ret = 0;
 
-   if (unlikely(ops->map == NULL ||
+   if (unlikely(!(ops->map || ops->map_pages) ||
 domain->pgsize_bitmap == 0UL))
return -ENODEV;
 
@@ -2463,18 +2487,21 @@ static int __iommu_map(struct iommu_domain *domain, 
unsigned long iova,
	pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size);
 
while (size) {
-   size_t pgsize = iommu_pgsize(domain, iova, paddr, size, NULL);
+   size_t mapped = 0;
 
-   pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
-iova, , pgsize);
-   ret = ops->map(domain, iova, paddr, pgsize, prot, gfp);
+   ret = __iommu_map_pages(domain, iova, paddr, size, prot, gfp,
+   &mapped);
+   /*
+* Some pages may have been mapped, even if an error occurred,
+* so we should account for those so they can be unmapped.
+*/
+   size -= mapped;
 
if (ret)
break;
 
-   iova += pgsize;
-   paddr += pgsize;
-   size -= pgsize;
+   iova += mapped;
+   paddr += mapped;
}
 
/* unroll mapping in case something went wrong */
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 14/15] iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

Implement the unmap_pages() callback for the ARM SMMU driver
to allow calls from iommu_unmap to unmap multiple pages of
the same size in one call. Also, remove the unmap() callback
for the SMMU driver, as it will no longer be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Georgi Djakov 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 61233bcc4588..593a15cfa8d5 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1210,8 +1210,9 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
return ret;
 }
 
-static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
-size_t size, struct iommu_iotlb_gather *gather)
+static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long 
iova,
+  size_t pgsize, size_t pgcount,
+  struct iommu_iotlb_gather *iotlb_gather)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -1221,7 +1222,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, 
unsigned long iova,
return 0;
 
arm_smmu_rpm_get(smmu);
-   ret = ops->unmap(ops, iova, size, gather);
+   ret = ops->unmap_pages(ops, iova, pgsize, pgcount, iotlb_gather);
arm_smmu_rpm_put(smmu);
 
return ret;
@@ -1574,7 +1575,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_free= arm_smmu_domain_free,
.attach_dev = arm_smmu_attach_dev,
.map= arm_smmu_map,
-   .unmap  = arm_smmu_unmap,
+   .unmap_pages= arm_smmu_unmap_pages,
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
.iova_to_phys   = arm_smmu_iova_to_phys,
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 13/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages()

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

Implement the map_pages() callback for the ARM v7s io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Georgi Djakov 
---
 drivers/iommu/io-pgtable-arm-v7s.c | 26 ++
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index 1af060686985..5db90d7ce2ec 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -519,11 +519,12 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, 
unsigned long iova,
return __arm_v7s_map(data, iova, paddr, size, prot, lvl + 1, cptep, 
gfp);
 }
 
-static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
-   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+phys_addr_t paddr, size_t pgsize, size_t pgcount,
+int prot, gfp_t gfp, size_t *mapped)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops);
-   int ret;
+   int ret = -EINVAL;
 
if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
paddr >= (1ULL << data->iop.cfg.oas)))
@@ -533,7 +534,17 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, 
unsigned long iova,
if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
return 0;
 
-   ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd, gfp);
+   while (pgcount--) {
+   ret = __arm_v7s_map(data, iova, paddr, pgsize, prot, 1, 
data->pgd,
+   gfp);
+   if (ret)
+   break;
+
+   iova += pgsize;
+   paddr += pgsize;
+   if (mapped)
+   *mapped += pgsize;
+   }
/*
 * Synchronise all PTE updates for the new mapping before there's
 * a chance for anything to kick off a table walk for the new iova.
@@ -543,6 +554,12 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, 
unsigned long iova,
return ret;
 }
 
+static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova,
+  phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+{
+   return arm_v7s_map_pages(ops, iova, paddr, size, 1, prot, gfp, NULL);
+}
+
 static void arm_v7s_free_pgtable(struct io_pgtable *iop)
 {
struct arm_v7s_io_pgtable *data = io_pgtable_to_data(iop);
@@ -797,6 +814,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
io_pgtable_cfg *cfg,
 
data->iop.ops = (struct io_pgtable_ops) {
.map= arm_v7s_map,
+   .map_pages  = arm_v7s_map_pages,
.unmap  = arm_v7s_unmap,
.unmap_pages= arm_v7s_unmap_pages,
.iova_to_phys   = arm_v7s_iova_to_phys,
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 01/15] iommu/io-pgtable: Introduce unmap_pages() as a page table op

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

The io-pgtable code expects to operate on a single block or
granule of memory that is supported by the IOMMU hardware when
unmapping memory.

This means that when a large buffer that consists of multiple
such blocks is unmapped, the io-pgtable code will walk the page
tables to the correct level to unmap each block, even for blocks
that are virtually contiguous and at the same level, which can
incur an overhead in performance.

Introduce the unmap_pages() page table op to express to the
io-pgtable code that it should unmap a number of blocks of
the same size, instead of a single block. Doing so allows
multiple blocks to be unmapped in one call to the io-pgtable
code, reducing the number of page table walks, and indirect
calls.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Will Deacon 
Signed-off-by: Georgi Djakov 
---
 include/linux/io-pgtable.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 4d40dfa75b55..9391c5fa71e6 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -144,6 +144,7 @@ struct io_pgtable_cfg {
  *
  * @map:  Map a physically contiguous memory region.
  * @unmap:Unmap a physically contiguous memory region.
+ * @unmap_pages:  Unmap a range of virtually contiguous pages of the same size.
  * @iova_to_phys: Translate iova to physical address.
  *
  * These functions map directly onto the iommu_ops member functions with
@@ -154,6 +155,9 @@ struct io_pgtable_ops {
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
size_t size, struct iommu_iotlb_gather *gather);
+   size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *gather);
phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops,
unsigned long iova);
 };
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 10/15] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

Implement the unmap_pages() callback for the ARM LPAE io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Georgi Djakov 
---
 drivers/iommu/io-pgtable-arm.c | 75 +++---
 1 file changed, 49 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index ea66b10c04c4..1b690911995a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -46,6 +46,9 @@
 #define ARM_LPAE_PGD_SIZE(d)   \
(sizeof(arm_lpae_iopte) << (d)->pgd_bits)
 
+#define ARM_LPAE_PTES_PER_TABLE(d) \
+   (ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte)))
+
 /*
  * Calculate the index at level l used to map virtual address a using the
  * pagetable in d.
@@ -253,8 +256,8 @@ static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, 
arm_lpae_iopte pte,
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
   struct iommu_iotlb_gather *gather,
-  unsigned long iova, size_t size, int lvl,
-  arm_lpae_iopte *ptep);
+  unsigned long iova, size_t size, size_t pgcount,
+  int lvl, arm_lpae_iopte *ptep);
 
 static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
phys_addr_t paddr, arm_lpae_iopte prot,
@@ -298,7 +301,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable 
*data,
size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
 
tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz,
+   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1,
 lvl, tblp) != sz) {
WARN_ON(1);
return -EINVAL;
@@ -526,14 +529,15 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,
   struct iommu_iotlb_gather *gather,
   unsigned long iova, size_t size,
   arm_lpae_iopte blk_pte, int lvl,
-  arm_lpae_iopte *ptep)
+  arm_lpae_iopte *ptep, size_t pgcount)
 {
	struct io_pgtable_cfg *cfg = &data->iop.cfg;
arm_lpae_iopte pte, *tablep;
phys_addr_t blk_paddr;
size_t tablesz = ARM_LPAE_GRANULE(data);
size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-   int i, unmap_idx = -1;
+   int ptes_per_table = ARM_LPAE_PTES_PER_TABLE(data);
+   int i, unmap_idx_start = -1, num_entries = 0, max_entries;
 
if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
return 0;
@@ -542,15 +546,18 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,
if (!tablep)
return 0; /* Bytes unmapped */
 
-   if (size == split_sz)
-   unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   if (size == split_sz) {
+   unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   max_entries = ptes_per_table - unmap_idx_start;
+   num_entries = min_t(int, pgcount, max_entries);
+   }
 
blk_paddr = iopte_to_paddr(blk_pte, data);
pte = iopte_prot(blk_pte);
 
-   for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
+   for (i = 0; i < ptes_per_table; i++, blk_paddr += split_sz) {
/* Unmap! */
-   if (i == unmap_idx)
+   if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries))
continue;
 
		__arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]);
@@ -568,38 +575,44 @@ static size_t arm_lpae_split_blk_unmap(struct 
arm_lpae_io_pgtable *data,
return 0;
 
tablep = iopte_deref(pte, data);
-   } else if (unmap_idx >= 0) {
-   io_pgtable_tlb_add_page(&data->iop, gather, iova, size);
-   return size;
+   } else if (unmap_idx_start >= 0) {
+   for (i = 0; i < num_entries; i++)
+   io_pgtable_tlb_add_page(&data->iop, gather,
+   iova + i * size, size);
+
+   return num_entries * size;
}
 
-   return __arm_lpae_unmap(data, gather, iova, size, lvl, tablep);
+   return __arm_lpae_unmap(data, gather, iova, size, pgcount, lvl, tablep);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
   struct iommu_iotlb_gather *gather,
-  unsigned long iova, size_t size, int lvl,
-  arm_lpae_iopte *ptep)
+  unsigned long 

[PATCH v6 11/15] iommu/io-pgtable-arm: Implement arm_lpae_map_pages()

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

Implement the map_pages() callback for the ARM LPAE io-pgtable
format.

Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Georgi Djakov 
---
 drivers/iommu/io-pgtable-arm.c | 41 +++--
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 1b690911995a..6a6af9b0678e 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -344,20 +344,30 @@ static arm_lpae_iopte 
arm_lpae_install_table(arm_lpae_iopte *table,
 }
 
 static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
- phys_addr_t paddr, size_t size, arm_lpae_iopte prot,
- int lvl, arm_lpae_iopte *ptep, gfp_t gfp)
+ phys_addr_t paddr, size_t size, size_t pgcount,
+ arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep,
+ gfp_t gfp, size_t *mapped)
 {
arm_lpae_iopte *cptep, pte;
size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data);
size_t tblsz = ARM_LPAE_GRANULE(data);
	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   int ret = 0, num_entries, max_entries, map_idx_start;
 
/* Find our entry at the current level */
-   ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
+   map_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data);
+   ptep += map_idx_start;
 
/* If we can install a leaf entry at this level, then do so */
-   if (size == block_size)
-   return arm_lpae_init_pte(data, iova, paddr, prot, lvl, 1, ptep);
+   if (size == block_size) {
+   max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start;
+   num_entries = min_t(int, pgcount, max_entries);
+   ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, 
num_entries, ptep);
+   if (!ret && mapped)
+   *mapped += num_entries * size;
+
+   return ret;
+   }
 
/* We can't allocate tables at the final level */
if (WARN_ON(lvl >= ARM_LPAE_MAX_LEVELS - 1))
@@ -386,7 +396,8 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, 
unsigned long iova,
}
 
/* Rinse, repeat */
-   return __arm_lpae_map(data, iova, paddr, size, prot, lvl + 1, cptep, 
gfp);
+   return __arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1,
+ cptep, gfp, mapped);
 }
 
 static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
@@ -453,8 +464,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
arm_lpae_io_pgtable *data,
return pte;
 }
 
-static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
-   phys_addr_t paddr, size_t size, int iommu_prot, gfp_t 
gfp)
+static int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova,
+ phys_addr_t paddr, size_t pgsize, size_t pgcount,
+ int iommu_prot, gfp_t gfp, size_t *mapped)
 {
struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
struct io_pgtable_cfg *cfg = >iop.cfg;
@@ -463,7 +475,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
arm_lpae_iopte prot;
long iaext = (s64)iova >> cfg->ias;
 
-   if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+   if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize))
return -EINVAL;
 
if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
@@ -476,7 +488,8 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
return 0;
 
prot = arm_lpae_prot_to_pte(data, iommu_prot);
-   ret = __arm_lpae_map(data, iova, paddr, size, prot, lvl, ptep, gfp);
+   ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl,
+ptep, gfp, mapped);
/*
 * Synchronise all PTE updates for the new mapping before there's
 * a chance for anything to kick off a table walk for the new iova.
@@ -486,6 +499,13 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, 
unsigned long iova,
return ret;
 }
 
+static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
+   phys_addr_t paddr, size_t size, int iommu_prot, gfp_t 
gfp)
+{
+   return arm_lpae_map_pages(ops, iova, paddr, size, 1, iommu_prot, gfp,
+ NULL);
+}
+
 static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
arm_lpae_iopte *ptep)
 {
@@ -782,6 +802,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 
data->iop.ops = (struct io_pgtable_ops) {
.map= arm_lpae_map,
+   .map_pages  = arm_lpae_map_pages,
.unmap  = arm_lpae_unmap,

[PATCH v6 02/15] iommu: Add an unmap_pages() op for IOMMU drivers

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

Add a callback for IOMMU drivers to provide a path for the
IOMMU framework to call into an IOMMU driver, which can call
into the io-pgtable code, to unmap a virtually contiguous
range of pages of the same size.

For IOMMU drivers that do not specify an unmap_pages() callback,
the existing logic of unmapping memory one page block at a time
will be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Will Deacon 
Acked-by: Lu Baolu 
Signed-off-by: Georgi Djakov 
---
 include/linux/iommu.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..25a844121be5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -181,6 +181,7 @@ struct iommu_iotlb_gather {
  * @detach_dev: detach device from an iommu domain
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @unmap_pages: unmap a number of pages of the same size from an iommu domain
  * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
  * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
  * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
@@ -231,6 +232,9 @@ struct iommu_ops {
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 size_t size, struct iommu_iotlb_gather *iotlb_gather);
+   size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova,
+ size_t pgsize, size_t pgcount,
+ struct iommu_iotlb_gather *iotlb_gather);
void (*flush_iotlb_all)(struct iommu_domain *domain);
void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova,
   size_t size);
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 15/15] iommu/arm-smmu: Implement the map_pages() IOMMU driver callback

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

Implement the map_pages() callback for the ARM SMMU driver
to allow calls from iommu_map to map multiple pages of
the same size in one call. Also, remove the map() callback
for the ARM SMMU driver, as it will no longer be used.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Georgi Djakov 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 593a15cfa8d5..c1ca3b49a620 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1193,8 +1193,9 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
return ret;
 }
 
-static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
-   phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t pgsize, size_t pgcount,
+ int prot, gfp_t gfp, size_t *mapped)
 {
struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu;
@@ -1204,7 +1205,7 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
return -ENODEV;
 
arm_smmu_rpm_get(smmu);
-   ret = ops->map(ops, iova, paddr, size, prot, gfp);
+   ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, 
mapped);
arm_smmu_rpm_put(smmu);
 
return ret;
@@ -1574,7 +1575,7 @@ static struct iommu_ops arm_smmu_ops = {
.domain_alloc   = arm_smmu_domain_alloc,
.domain_free= arm_smmu_domain_free,
.attach_dev = arm_smmu_attach_dev,
-   .map= arm_smmu_map,
+   .map_pages  = arm_smmu_map_pages,
.unmap_pages= arm_smmu_unmap_pages,
.flush_iotlb_all= arm_smmu_flush_iotlb_all,
.iotlb_sync = arm_smmu_iotlb_sync,
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 05/15] iommu: Use bitmap to calculate page size in iommu_pgsize()

2021-06-15 Thread Georgi Djakov
From: Will Deacon 

Avoid the potential for shifting values by amounts greater than the
width of their type by using a bitmap to compute page size in
iommu_pgsize().
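
Below is a self-contained illustration of the hazard being avoided, using a
local GENMASK stand-in rather than the kernel's:

#include <stdio.h>
#include <stddef.h>

#define GENMASK(h, l)	(((~0UL) << (l)) & (~0UL >> (63 - (h))))

int main(void)
{
	size_t size = ~0UL;	/* worst case: top bit of size set */
	unsigned int pgsize_idx = 63 - __builtin_clzl(size);	/* __fls(size) == 63 */

	/* The old form, (1UL << (pgsize_idx + 1)) - 1, would shift a 64-bit
	 * value by 64 when pgsize_idx is 63, which is undefined behaviour.
	 * The bitmap form below never shifts by more than 63 bits. */
	unsigned long pgsizes = GENMASK(pgsize_idx, 0);

	printf("mask for __fls(size)=%u: %#lx\n", pgsize_idx, pgsizes);
	return 0;
}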

Signed-off-by: Will Deacon 
Signed-off-by: Isaac J. Manjarres 
Signed-off-by: Georgi Djakov 
---
 drivers/iommu/iommu.c | 31 ---
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5419c4b9f27a..80e471ada358 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -8,6 +8,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2378,30 +2379,22 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
   unsigned long addr_merge, size_t size)
 {
unsigned int pgsize_idx;
+   unsigned long pgsizes;
size_t pgsize;
 
-   /* Max page size that still fits into 'size' */
-   pgsize_idx = __fls(size);
+   /* Page sizes supported by the hardware and small enough for @size */
+   pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0);
 
-   /* need to consider alignment requirements ? */
-   if (likely(addr_merge)) {
-   /* Max page size allowed by address */
-   unsigned int align_pgsize_idx = __ffs(addr_merge);
-   pgsize_idx = min(pgsize_idx, align_pgsize_idx);
-   }
-
-   /* build a mask of acceptable page sizes */
-   pgsize = (1UL << (pgsize_idx + 1)) - 1;
-
-   /* throw away page sizes not supported by the hardware */
-   pgsize &= domain->pgsize_bitmap;
+   /* Constrain the page sizes further based on the maximum alignment */
+   if (likely(addr_merge))
+   pgsizes &= GENMASK(__ffs(addr_merge), 0);
 
-   /* make sure we're still sane */
-   BUG_ON(!pgsize);
+   /* Make sure we have at least one suitable page size */
+   BUG_ON(!pgsizes);
 
-   /* pick the biggest page */
-   pgsize_idx = __fls(pgsize);
-   pgsize = 1UL << pgsize_idx;
+   /* Pick the biggest page size remaining */
+   pgsize_idx = __fls(pgsizes);
+   pgsize = BIT(pgsize_idx);
 
return pgsize;
 }
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v6 09/15] iommu/io-pgtable-arm: Prepare PTE methods for handling multiple entries

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

The PTE methods currently operate on a single entry. In preparation
for manipulating multiple PTEs in one map or unmap call, allow them
to handle multiple PTEs.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Robin Murphy 
Signed-off-by: Georgi Djakov 
---
 drivers/iommu/io-pgtable-arm.c | 78 --
 1 file changed, 44 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..ea66b10c04c4 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -232,20 +232,23 @@ static void __arm_lpae_free_pages(void *pages, size_t 
size,
free_pages((unsigned long)pages, get_order(size));
 }
 
-static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep,
+static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
struct io_pgtable_cfg *cfg)
 {
dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep),
-  sizeof(*ptep), DMA_TO_DEVICE);
+  sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
 static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte,
-  struct io_pgtable_cfg *cfg)
+  int num_entries, struct io_pgtable_cfg *cfg)
 {
-   *ptep = pte;
+   int i;
+
+   for (i = 0; i < num_entries; i++)
+   ptep[i] = pte;
 
if (!cfg->coherent_walk)
-   __arm_lpae_sync_pte(ptep, cfg);
+   __arm_lpae_sync_pte(ptep, num_entries, cfg);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -255,47 +258,54 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable 
*data,
 
 static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
phys_addr_t paddr, arm_lpae_iopte prot,
-   int lvl, arm_lpae_iopte *ptep)
+   int lvl, int num_entries, arm_lpae_iopte *ptep)
 {
arm_lpae_iopte pte = prot;
+   struct io_pgtable_cfg *cfg = &data->iop.cfg;
+   size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+   int i;
 
if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1)
pte |= ARM_LPAE_PTE_TYPE_PAGE;
else
pte |= ARM_LPAE_PTE_TYPE_BLOCK;
 
-   pte |= paddr_to_iopte(paddr, data);
+   for (i = 0; i < num_entries; i++)
+   ptep[i] = pte | paddr_to_iopte(paddr + i * sz, data);
 
-   __arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
+   if (!cfg->coherent_walk)
+   __arm_lpae_sync_pte(ptep, num_entries, cfg);
 }
 
 static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 unsigned long iova, phys_addr_t paddr,
-arm_lpae_iopte prot, int lvl,
+arm_lpae_iopte prot, int lvl, int num_entries,
 arm_lpae_iopte *ptep)
 {
-   arm_lpae_iopte pte = *ptep;
-
-   if (iopte_leaf(pte, lvl, data->iop.fmt)) {
-   /* We require an unmap first */
-   WARN_ON(!selftest_running);
-   return -EEXIST;
-   } else if (iopte_type(pte) == ARM_LPAE_PTE_TYPE_TABLE) {
-   /*
-* We need to unmap and free the old table before
-* overwriting it with a block entry.
-*/
-   arm_lpae_iopte *tblp;
-   size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
-
-   tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
-   if (__arm_lpae_unmap(data, NULL, iova, sz, lvl, tblp) != sz) {
-   WARN_ON(1);
-   return -EINVAL;
+   int i;
+
+   for (i = 0; i < num_entries; i++)
+   if (iopte_leaf(ptep[i], lvl, data->iop.fmt)) {
+   /* We require an unmap first */
+   WARN_ON(!selftest_running);
+   return -EEXIST;
+   } else if (iopte_type(ptep[i]) == ARM_LPAE_PTE_TYPE_TABLE) {
+   /*
+* We need to unmap and free the old table before
+* overwriting it with a block entry.
+*/
+   arm_lpae_iopte *tblp;
+   size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
+
+   tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
+   if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz,
+lvl, tblp) != sz) {
+   WARN_ON(1);
+   return -EINVAL;
+   }
}
-   }
 
-   __arm_lpae_init_pte(data, paddr, prot, lvl, ptep);
+   __arm_lpae_init_pte(data, paddr, prot, lvl, num_entries, ptep);
return 0;
 }
 
@@ -323,7 

[PATCH v6 03/15] iommu/io-pgtable: Introduce map_pages() as a page table op

2021-06-15 Thread Georgi Djakov
From: "Isaac J. Manjarres" 

Mapping memory into io-pgtables follows the same semantics
that unmapping memory used to follow (i.e. a buffer will be
mapped one page block per call to the io-pgtable code). This
means that it can be optimized in the same way that unmapping
memory was, so add a map_pages() callback to the io-pgtable
ops structure, so that a range of pages of the same size
can be mapped within the same call.

Signed-off-by: Isaac J. Manjarres 
Suggested-by: Will Deacon 
Signed-off-by: Georgi Djakov 
---
 include/linux/io-pgtable.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 9391c5fa71e6..c43f3b899d2a 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -143,6 +143,7 @@ struct io_pgtable_cfg {
  * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers.
  *
  * @map:  Map a physically contiguous memory region.
+ * @map_pages:Map a physically contiguous range of pages of the same size.
  * @unmap:Unmap a physically contiguous memory region.
  * @unmap_pages:  Unmap a range of virtually contiguous pages of the same size.
  * @iova_to_phys: Translate iova to physical address.
@@ -153,6 +154,9 @@ struct io_pgtable_cfg {
 struct io_pgtable_ops {
int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
+   int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova,
+phys_addr_t paddr, size_t pgsize, size_t pgcount,
+int prot, gfp_t gfp, size_t *mapped);
size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova,
size_t size, struct iommu_iotlb_gather *gather);
size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova,
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Plan for /dev/ioasid RFC v2

2021-06-15 Thread Alex Williamson
On Tue, 15 Jun 2021 01:21:35 +
"Tian, Kevin"  wrote:

> > From: Jason Gunthorpe 
> > Sent: Monday, June 14, 2021 9:38 PM
> > 
> > On Mon, Jun 14, 2021 at 03:09:31AM +, Tian, Kevin wrote:
> >   
> > > If a device can be always blocked from accessing memory in the IOMMU
> > > before it's bound to a driver or more specifically before the driver
> > > moves it to a new security context, then there is no need for VFIO
> > > to track whether IOASIDfd has taken over ownership of the DMA
> > > context for all devices within a group.  
> > 
> > I've been assuming we'd do something like this, where when a device is
> > first turned into a VFIO it tells the IOMMU layer that this device
> > should be DMA blocked unless an IOASID is attached to
> > it. Disconnecting an IOASID returns it to blocked.  
> 
> Or just make sure a device is in block-DMA when it's unbound from a
> driver or a security context. Then no need to explicitly tell IOMMU layer 
> to do so when it's bound to a new driver.
> 
> Currently the default domain type applies even when a device is not
> bound. This implies that if iommu=passthrough a device is always 
> allowed to access arbitrary system memory with or without a driver.
> I feel the current domain type (identity, dma, unmanaged) should apply
> only when a driver is loaded...

Note that vfio does not currently require all devices in the group to
be bound to drivers.  Other devices within the group, those bound to
vfio drivers, can be used in this configuration.  This is not
necessarily recommended though as a non-vfio, non-stub driver binding
to one of those devices can trigger a BUG_ON.

> > > If this works I didn't see the need for vfio to keep the sequence.
> > > VFIO still keeps group fd to claim ownership of all devices in a
> > > group.  
> > 
> > As Alex says you still have to deal with the problem that device A in
> > a group can gain control of device B in the same group.  
> 
> There is no isolation in the group, so how could vfio prevent device
> A from gaining control of device B? For example, when both are attached
> to the same GPA address space with the device MMIO BARs included, devA
> can do p2p to devB. It's entirely the user's policy how to deal with devices
> within the group. 

The latter is user policy, yes, but it's a system security issue that
the user cannot use device A to control device B if the user doesn't
have access to both devices, ie. doesn't own the group.  vfio would
prevent this by not allowing access to device A while device B is
insecure and would require that all devices within the group remain in
a secure, user owned state for the extent of access to device A.

> > This means device A and B can not be used from to two different
> > security contexts.  
> 
> It depends on how the security context is defined. From iommu layer
> p.o.v, an IOASID is a security context which isolates a device from
> the rest of the system (but not the sibling in the same group). As you
> suggested earlier, it's completely sane if an user wants to attach
> devices in a group to different IOASIDs. Here I just talk about this fact.

This is sane, yes, but that doesn't give us license to allow the user
to access device A regardless of the state of device B.

> > 
> > If the /dev/iommu FD is the security context then the tracking is
> > needed there.
> >   
> 
> As I replied to Alex, my point is that VFIO doesn't need to know the
> attaching status of each device in a group before it can allow user to
> access a device. As long as a device in a group either in block DMA
> or switch to a new address space created via /dev/iommu FD, there's
> no problem to allow user accessing it. User cannot do harm to the
> world outside of the group. User knows there is no isolation within
> the group. that is it.

This is self-contradictory, "vfio doesn't need to know the attachment
status"... "[a]s long as a device in a group either in block DMA or
switch to a new address space".  So vfio does need to know the latter.
How does it know that?  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: Plan for /dev/ioasid RFC v2

2021-06-15 Thread Alex Williamson
On Tue, 15 Jun 2021 02:31:39 +
"Tian, Kevin"  wrote:

> > From: Alex Williamson 
> > Sent: Tuesday, June 15, 2021 12:28 AM
> >   
> [...]
> > > IOASID. Today the group fd requires an IOASID before it hands out a
> > > device_fd. With iommu_fd the device_fd will not allow IOCTLs until it
> > > has a blocked DMA IOASID and is successefully joined to an iommu_fd.  
> > 
> > Which is the root of my concern.  Who owns ioctls to the device fd?
> > It's my understanding this is a vfio provided file descriptor and it's
> > therefore vfio's responsibility.  A device-level IOASID interface
> > therefore requires that vfio manage the group aspect of device access.
> > AFAICT, that means that device access can therefore only begin when all
> > devices for a given group are attached to the IOASID and must halt for
> > all devices in the group if any device is ever detached from an IOASID,
> > even temporarily.  That suggests a lot more oversight of the IOASIDs by
> > vfio than I'd prefer.
> >   
> 
> This is possibly the point that is worthy of more clarification and
> alignment, as it sounds like the root of controversy here.
> 
> I feel the goal of vfio group management is more about ownership, i.e. 
> all devices within a group must be assigned to a single user. Following
> the three rules defined by Jason, what we really care about is whether a group
> of devices can be isolated from the rest of the world, i.e. no access to
> memory/device outside of its security context and no access to its 
> security context from devices outside of this group. This can be achieved
> as long as every device in the group is either in block-DMA state when 
> it's not attached to any security context or attached to an IOASID context 
> in IOMMU fd.
> 
> As long as group-level isolation is satisfied, how devices within a group 
> are further managed is decided by the user (unattached, all attached to 
> same IOASID, attached to different IOASIDs) as long as the user 
> understands the implication of the lack of isolation within the group. This 
> is where a device-centric model comes into play. Misconfiguration just hurts 
> the user itself.
> 
> If this rationale can be agreed on, then I don't see the point of having VFIO
> mandate that all devices in the group must be attached/detached in
> lockstep. 

In theory this sounds great, but there are still too many assumptions
and too much hand waving about where isolation occurs for me to feel
like I really have the complete picture.  So let's walk through some
examples.  Please fill in and correct where I'm wrong.

1) A dual-function PCIe e1000e NIC where the functions are grouped
   together due to ACS isolation issues.

   a) Initial state: functions 0 & 1 are both bound to e1000e driver.

   b) Admin uses driverctl to bind function 1 to vfio-pci, creating
  vfio device file, which is chmod'd to grant to a user.

   c) User opens vfio function 1 device file and an iommu_fd, binds
   device_fd to iommu_fd.

   Does this succeed?
 - if no, specifically where does it fail?
 - if yes, vfio can now allow access to the device?

   d) Repeat b) for function 0.

   e) Repeat c), still using function 1, is it different?  Where?  Why?

2) The same NIC as 1)

   a) Initial state: functions 0 & 1 bound to vfio-pci, vfio device
  files granted to user, user has bound both device_fds to the same
  iommu_fd.

   AIUI, even though not bound to an IOASID, vfio can now enable access
   through the device_fds, right?  What specific entity has placed these
   devices into a block DMA state, when, and how?

   b) Both devices are attached to the same IOASID.

   Are we assuming that each device was atomically moved to the new
   IOMMU context by the IOASID code?  What if the IOMMU cannot change
   the domain atomically?

   c) The device_fd for function 1 is detached from the IOASID.

   Are we assuming the reverse of b) performed by the IOASID code?

   d) The device_fd for function 1 is unbound from the iommu_fd.

   Does this succeed?
 - if yes, what is the resulting IOMMU context of the device and
   who owns it?
 - if no, well, that results in numerous tear-down issues.

   e) Function 1 is unbound from vfio-pci.

   Does this work or is it blocked?  If blocked, by what entity
   specifically?

   f) Function 1 is bound to e1000e driver.

   We clearly have a violation here, specifically where and by who in
   this path should have prevented us from getting here or who pushes
   the BUG_ON to abort this?

3) A dual-function conventional PCI e1000 NIC where the functions are
   grouped together due to shared RID.

   a) Repeat 2.a) and 2.b) such that we have a valid, user accessible
  devices in the same IOMMU context.

   b) Function 1 is detached from the IOASID.

   I think function 1 cannot be placed into a different IOMMU context
   here, does the detach work?  What's the IOMMU context now?

   c) A new IOASID is alloc'd within the existing iommu_fd and function
  

Re: [RFC PATCH V3 08/11] swiotlb: Add bounce buffer remap address setting function

2021-06-15 Thread Tianyu Lan

On 6/14/2021 11:32 PM, Christoph Hellwig wrote:

On Mon, Jun 14, 2021 at 02:49:51PM +0100, Robin Murphy wrote:

FWIW, I think a better generalisation for this would be allowing
set_memory_decrypted() to return an address rather than implicitly
operating in-place, and hide all the various hypervisor hooks behind that.


Yes, something like that would be a good idea.  As-is
set_memory_decrypted is a pretty horrible API anyway due to passing
the address as void, and taking a size parameter while it works in units
of pages.  So I'd very much welcome a major overhaul of this API.



Hi Christoph and Robin:
	Thanks for your suggestion. I will try this idea in the next version.
Besides moving the address translation into set_memory_decrypted() and
returning the address, do you want to make other changes to the API in
order to make it more reasonable (e.g. the size parameter)?
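For example, something like the below rough sketch is what I currently
imagine (the name and exact signature are only placeholders, not a
settled API):

	/*
	 * Hypothetical variant: decrypt the range and return the address
	 * the caller should use from now on (which may be a remapped
	 * address), or NULL on failure.
	 */
	void *set_memory_decrypted_map(void *addr, unsigned long size);

	ring_buffer = set_memory_decrypted_map(ring_buffer, ring_size);
	if (!ring_buffer)
		return -ENOMEM;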


Thanks
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-15 Thread Jason Gunthorpe
On Tue, Jun 15, 2021 at 08:59:25AM +, Tian, Kevin wrote:
> Hi, Jason,
> 
> > From: Jason Gunthorpe
> > Sent: Thursday, June 3, 2021 9:05 PM
> > 
> > On Thu, Jun 03, 2021 at 06:39:30AM +, Tian, Kevin wrote:
> > > > > Two helper functions are provided to support VFIO_ATTACH_IOASID:
> > > > >
> > > > >   struct attach_info {
> > > > >   u32 ioasid;
> > > > >   // If valid, the PASID to be used physically
> > > > >   u32 pasid;
> > > > >   };
> > > > >   int ioasid_device_attach(struct ioasid_dev *dev,
> > > > >   struct attach_info info);
> > > > >   int ioasid_device_detach(struct ioasid_dev *dev, u32 ioasid);
> > > >
> > > > Honestly, I still prefer this to be highly explicit as this is where
> > > > all device driver authors get invovled:
> > > >
> > > > ioasid_pci_device_attach(struct pci_device *pdev, struct ioasid_dev 
> > > > *dev,
> > > > u32 ioasid);
> > > > ioasid_pci_device_pasid_attach(struct pci_device *pdev, u32
> > *physical_pasid,
> > > > struct ioasid_dev *dev, u32 ioasid);
> > >
> > > Then better naming it as pci_device_attach_ioasid since the 1st parameter
> > > is struct pci_device?
> > 
> > No, the leading tag indicates the API's primary subystem, in this case
> > it is iommu (and if you prefer list the iommu related arguments first)
> > 
> 
> I have a question on this suggestion when working on v2.
> 
> Within IOMMU fd it uses only the generic struct device pointer, which
> is already saved in struct ioasid_dev at device bind time:
> 
>   struct ioasid_dev *ioasid_register_device(struct ioasid_ctx *ctx,
>   struct device *device, u64 device_label);
> 
> What does this additional struct pci_device bring when it's specified in
> the attach call? If we save it in attach_data, at which point will it be
> used or checked? 

The above was for attaching to an ioasid, not the register path

You should call 'device_label' 'device_cookie' if it is a
user-provided u64
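
e.g. (illustrative only, same registration signature as quoted above,
just with the argument renamed):

	struct ioasid_dev *ioasid_register_device(struct ioasid_ctx *ctx,
						  struct device *device,
						  u64 device_cookie);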

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH V3 10/11] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-06-15 Thread Tianyu Lan

On 6/14/2021 11:33 PM, Christoph Hellwig wrote:

On Mon, Jun 14, 2021 at 10:04:06PM +0800, Tianyu Lan wrote:

The pages in the hv_page_buffer array here are in the kernel linear
mapping. The packet sent to host will contain an array which contains
transaction data. In the isolation VM, data in these pages needs to be
copied to the bounce buffer, and so we call dma_map_single() here to map
these data pages with the bounce buffer. The vmbus has a ring buffer where
the send/receive packets are copied to/from. The ring buffer has been
remapped to the extra space above the shared gpa boundary/vTOM while probing
the Netvsc driver, and so we do not call the DMA map function for the vmbus
ring buffer.


So why do we have all that PFN magic instead of using struct page or
the usual kernel I/O buffers that contain a page pointer?



These PFNs are originally part of the Hyper-V protocol data and will be sent
to the host. The host accepts these GFNs and copies data from/to guest memory.
The translation from VA to PA is done by the caller that populates the
hv_page_buffer array. I will try calling the DMA map function before
populating struct hv_page_buffer, which can avoid the redundant
translation between PA and VA.
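
Roughly along these lines (only a sketch; 'hdev' and 'pb' are placeholders
for the actual netvsc variables):

	/* Map first, then fill the PFN list from the returned DMA address
	 * instead of doing a separate VA->PA translation.
	 */
	dma_addr = dma_map_single(&hdev->device, data, len, DMA_TO_DEVICE);
	if (dma_mapping_error(&hdev->device, dma_addr))
		return -ENOMEM;

	pb[0].pfn    = dma_addr >> HV_HYP_PAGE_SHIFT;
	pb[0].offset = dma_addr & (HV_HYP_PAGE_SIZE - 1);
	pb[0].len    = len;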

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 10/10] Documentation: Add documentation for VDUSE

2021-06-15 Thread Xie Yongji
VDUSE (vDPA Device in Userspace) is a framework to support
implementing software-emulated vDPA devices in userspace. This
document is intended to clarify the VDUSE design and usage.

Signed-off-by: Xie Yongji 
---
 Documentation/userspace-api/index.rst |   1 +
 Documentation/userspace-api/vduse.rst | 222 ++
 2 files changed, 223 insertions(+)
 create mode 100644 Documentation/userspace-api/vduse.rst

diff --git a/Documentation/userspace-api/index.rst 
b/Documentation/userspace-api/index.rst
index 0b5eefed027e..c432be070f67 100644
--- a/Documentation/userspace-api/index.rst
+++ b/Documentation/userspace-api/index.rst
@@ -27,6 +27,7 @@ place where this information is gathered.
iommu
media/index
sysfs-platform_profile
+   vduse
 
 .. only::  subproject and html
 
diff --git a/Documentation/userspace-api/vduse.rst 
b/Documentation/userspace-api/vduse.rst
new file mode 100644
index ..2f9cd1a4e530
--- /dev/null
+++ b/Documentation/userspace-api/vduse.rst
@@ -0,0 +1,222 @@
+==
+VDUSE - "vDPA Device in Userspace"
+==
+
+vDPA (virtio data path acceleration) device is a device that uses a
+datapath which complies with the virtio specifications with vendor
+specific control path. vDPA devices can be both physically located on
+the hardware or emulated by software. VDUSE is a framework that makes it
+possible to implement software-emulated vDPA devices in userspace. And
+to make it simple, the emulated vDPA device's control path is handled in
+the kernel and only the data path is implemented in the userspace.
+
+Note that only the virtio block device is supported by the VDUSE framework now,
+which can reduce security risks when the userspace process that implements
+the data path is run by an unprivileged user. Support for other device
+types can be added after the security issue is clarified or fixed in the 
future.
+
+Start/Stop VDUSE devices
+
+
+VDUSE devices are started as follows:
+
+1. Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
+   /dev/vduse/control.
+
+2. Begin processing VDUSE messages from /dev/vduse/$NAME. The first
+   messages will arrive while attaching the VDUSE instance to vDPA bus.
+
+3. Send the VDPA_CMD_DEV_NEW netlink message to attach the VDUSE
+   instance to vDPA bus.
+
+VDUSE devices are stopped as follows:
+
+1. Send the VDPA_CMD_DEV_DEL netlink message to detach the VDUSE
+   instance from vDPA bus.
+
+2. Close the file descriptor referring to /dev/vduse/$NAME
+
+3. Destroy the VDUSE instance with ioctl(VDUSE_DESTROY_DEV) on
+   /dev/vduse/control
+
+The netlink messages mentioned above can be sent via the vdpa tool in iproute2
+or use the below sample codes:
+
+.. code-block:: c
+
+   static int netlink_add_vduse(const char *name, enum vdpa_command cmd)
+   {
+   struct nl_sock *nlsock;
+   struct nl_msg *msg;
+   int famid;
+
+   nlsock = nl_socket_alloc();
+   if (!nlsock)
+   return -ENOMEM;
+
+   if (genl_connect(nlsock))
+   goto free_sock;
+
+   famid = genl_ctrl_resolve(nlsock, VDPA_GENL_NAME);
+   if (famid < 0)
+   goto close_sock;
+
+   msg = nlmsg_alloc();
+   if (!msg)
+   goto close_sock;
+
+   if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, famid, 0, 0, 
cmd, 0))
+   goto nla_put_failure;
+
+   NLA_PUT_STRING(msg, VDPA_ATTR_DEV_NAME, name);
+   if (cmd == VDPA_CMD_DEV_NEW)
+   NLA_PUT_STRING(msg, VDPA_ATTR_MGMTDEV_DEV_NAME, 
"vduse");
+
+   if (nl_send_sync(nlsock, msg))
+   goto close_sock;
+
+   nl_close(nlsock);
+   nl_socket_free(nlsock);
+
+   return 0;
+   nla_put_failure:
+   nlmsg_free(msg);
+   close_sock:
+   nl_close(nlsock);
+   free_sock:
+   nl_socket_free(nlsock);
+   return -1;
+   }
+
+How VDUSE works
+---
+
+Since the emulated vDPA device's control path is handled in the kernel,
+a message-based communication protocol and a few types of control messages
+are introduced by the VDUSE framework to make userspace aware of the data
+path related changes:
+
+- VDUSE_GET_VQ_STATE: Get the state for virtqueue from userspace
+
+- VDUSE_START_DATAPLANE: Notify userspace to start the dataplane
+
+- VDUSE_STOP_DATAPLANE: Notify userspace to stop the dataplane
+
+- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping in device 
IOTLB
+
+Userspace needs to read()/write() on /dev/vduse/$NAME to receive/reply
+those control messages from/to VDUSE kernel module as follows:
+
+.. code-block:: c
+
+   static int vduse_message_handler(int dev_fd)
+   {
+   int len;
+

[PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-15 Thread Xie Yongji
This VDUSE driver enables implementing vDPA devices in userspace.
The vDPA device's control path is handled in kernel and the data
path is handled in userspace.

A message mechanism is used by the VDUSE driver to forward some control
messages, such as starting/stopping the datapath, to userspace. Userspace
can use read()/write() to receive/reply to those control messages.

And some ioctls are introduced to help userspace to implement the
data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file
descriptors referring to vDPA device's iova regions. Then userspace
can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES
and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features
and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD
ioctls can be used to inject interrupt and setup the kickfd for
virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the
configuration space and inject a config interrupt.

Signed-off-by: Xie Yongji 
---
 Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
 drivers/vdpa/Kconfig   |   10 +
 drivers/vdpa/Makefile  |1 +
 drivers/vdpa/vdpa_user/Makefile|5 +
 drivers/vdpa/vdpa_user/vduse_dev.c | 1453 
 include/uapi/linux/vduse.h |  143 ++
 6 files changed, 1613 insertions(+)
 create mode 100644 drivers/vdpa/vdpa_user/Makefile
 create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
 create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 9bfc2b510c64..acd95e9dcfe7 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#Include File  
 Comments
 'z'   10-4F  drivers/s390/crypto/zcrypt_api.hconflict!
 '|'   00-7F  linux/media.h
 0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
 0x89  00-06  arch/x86/include/asm/sockios.h
 0x89  0B-DF  linux/sockios.h
 0x89  E0-EF  linux/sockios.h 
SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index a503c1b2bfd9..6e23bce6433a 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
  vDPA block device simulator which terminates IO request in a
  memory buffer.
 
+config VDPA_USER
+   tristate "VDUSE (vDPA Device in Userspace) support"
+   depends on EVENTFD && MMU && HAS_DMA
+   select DMA_OPS
+   select VHOST_IOTLB
+   select IOMMU_IOVA
+   help
+ With VDUSE it is possible to emulate a vDPA Device
+ in a userspace program.
+
 config IFCVF
tristate "Intel IFC VF vDPA driver"
depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 67fe7f3d6943..f02ebed33f19 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_VDPA) += vdpa.o
 obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
 obj-$(CONFIG_IFCVF)+= ifcvf/
 obj-$(CONFIG_MLX5_VDPA) += mlx5/
 obj-$(CONFIG_VP_VDPA)+= virtio_pci/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index ..260e0b26af99
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index ..5271cbd15e28
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1453 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+#define DRV_AUTHOR   "Yongji Xie "
+#define DRV_DESC "vDPA Device in Userspace"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_MAX_BOUNCE_SIZE (64 * 1024 * 1024)
+#define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
+#define VDUSE_REQUEST_TIMEOUT 30
+
+struct vduse_virtqueue {
+   u16 index;
+   u32 num;
+   u32 avail_idx;
+   u64 desc_addr;
+   u64 driver_addr;
+   u64 device_addr;
+   bool ready;
+   bool kicked;
+   spinlock_t kick_lock;
+   spinlock_t irq_lock;
+   struct eventfd_ctx *kickfd;
+   struct vdpa_callback cb;
+   struct work_struct 

[PATCH v8 08/10] vduse: Implement an MMU-based IOMMU driver

2021-06-15 Thread Xie Yongji
This implements an MMU-based IOMMU driver to support mapping
kernel DMA buffers into userspace. The basic idea behind it is
treating the MMU (VA->PA) as an IOMMU (IOVA->PA). The driver will set
up MMU mappings instead of IOMMU mappings for the DMA transfer so
that the userspace process is able to use its virtual addresses to
access the DMA buffer in the kernel.

And to avoid security issues, a bounce-buffering mechanism is
introduced to prevent userspace from accessing the original buffer
directly.

Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vdpa/vdpa_user/iova_domain.c | 545 +++
 drivers/vdpa/vdpa_user/iova_domain.h |  73 +
 2 files changed, 618 insertions(+)
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h

diff --git a/drivers/vdpa/vdpa_user/iova_domain.c 
b/drivers/vdpa/vdpa_user/iova_domain.c
new file mode 100644
index ..ad45026f5423
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -0,0 +1,545 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * MMU-based IOMMU implementation
+ *
+ * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+static int vduse_iotlb_add_range(struct vduse_iova_domain *domain,
+u64 start, u64 last,
+u64 addr, unsigned int perm,
+struct file *file, u64 offset)
+{
+   struct vdpa_map_file *map_file;
+   int ret;
+
+   map_file = kmalloc(sizeof(*map_file), GFP_ATOMIC);
+   if (!map_file)
+   return -ENOMEM;
+
+   map_file->file = get_file(file);
+   map_file->offset = offset;
+
+   ret = vhost_iotlb_add_range_ctx(domain->iotlb, start, last,
+   addr, perm, map_file);
+   if (ret) {
+   fput(map_file->file);
+   kfree(map_file);
+   return ret;
+   }
+   return 0;
+}
+
+static void vduse_iotlb_del_range(struct vduse_iova_domain *domain,
+ u64 start, u64 last)
+{
+   struct vdpa_map_file *map_file;
+   struct vhost_iotlb_map *map;
+
+   while ((map = vhost_iotlb_itree_first(domain->iotlb, start, last))) {
+   map_file = (struct vdpa_map_file *)map->opaque;
+   fput(map_file->file);
+   kfree(map_file);
+   vhost_iotlb_map_free(domain->iotlb, map);
+   }
+}
+
+int vduse_domain_set_map(struct vduse_iova_domain *domain,
+struct vhost_iotlb *iotlb)
+{
+   struct vdpa_map_file *map_file;
+   struct vhost_iotlb_map *map;
+   u64 start = 0ULL, last = ULLONG_MAX;
+   int ret;
+
+   spin_lock(&domain->iotlb_lock);
+   vduse_iotlb_del_range(domain, start, last);
+
+   for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+map = vhost_iotlb_itree_next(map, start, last)) {
+   map_file = (struct vdpa_map_file *)map->opaque;
+   ret = vduse_iotlb_add_range(domain, map->start, map->last,
+   map->addr, map->perm,
+   map_file->file,
+   map_file->offset);
+   if (ret)
+   goto err;
+   }
+   spin_unlock(&domain->iotlb_lock);
+
+   return 0;
+err:
+   vduse_iotlb_del_range(domain, start, last);
+   spin_unlock(&domain->iotlb_lock);
+   return ret;
+}
+
+void vduse_domain_clear_map(struct vduse_iova_domain *domain,
+   struct vhost_iotlb *iotlb)
+{
+   struct vhost_iotlb_map *map;
+   u64 start = 0ULL, last = ULLONG_MAX;
+
+   spin_lock(&domain->iotlb_lock);
+   for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+map = vhost_iotlb_itree_next(map, start, last)) {
+   vduse_iotlb_del_range(domain, map->start, map->last);
+   }
+   spin_unlock(&domain->iotlb_lock);
+}
+
+static int vduse_domain_map_bounce_page(struct vduse_iova_domain *domain,
+u64 iova, u64 size, u64 paddr)
+{
+   struct vduse_bounce_map *map;
+   u64 last = iova + size - 1;
+
+   while (iova <= last) {
+   map = &domain->bounce_maps[iova >> PAGE_SHIFT];
+   if (!map->bounce_page) {
+   map->bounce_page = alloc_page(GFP_ATOMIC);
+   if (!map->bounce_page)
+   return -ENOMEM;
+   }
+   map->orig_phys = paddr;
+   paddr += PAGE_SIZE;
+   iova += PAGE_SIZE;
+   }
+   return 0;
+}
+
+static void vduse_domain_unmap_bounce_page(struct vduse_iova_domain *domain,
+  u64 iova, u64 size)
+{
+   struct 

[PATCH v8 07/10] vdpa: Support transferring virtual addressing during DMA mapping

2021-06-15 Thread Xie Yongji
This patch introduces an attribute for a vDPA device to indicate
whether virtual addresses can be used. If the vDPA device driver sets
it, the vhost-vdpa bus driver will not pin user pages and will transfer
the userspace virtual address instead of the physical address during
DMA mapping. The corresponding vma->vm_file and offset will also be
passed as an opaque pointer.
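
For instance (illustrative only, "foo" is a made-up parent driver), a
driver whose device has an on-chip IOMMU (i.e. provides set_map or
dma_map) would opt in at allocation time:

	foo = vdpa_alloc_device(struct foo_device, vdpa, dev,
				&foo_vdpa_ops, NULL, true);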

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vdpa/ifcvf/ifcvf_main.c   |  2 +-
 drivers/vdpa/mlx5/net/mlx5_vnet.c |  2 +-
 drivers/vdpa/vdpa.c   |  9 +++-
 drivers/vdpa/vdpa_sim/vdpa_sim.c  |  2 +-
 drivers/vdpa/virtio_pci/vp_vdpa.c |  2 +-
 drivers/vhost/vdpa.c  | 99 ++-
 include/linux/vdpa.h  | 19 ++--
 7 files changed, 116 insertions(+), 19 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index ab0ab5cf0f6e..daf9746e51e6 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -476,7 +476,7 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
}
 
adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
-   dev, &ifc_vdpa_ops, NULL);
+   dev, &ifc_vdpa_ops, NULL, false);
if (adapter == NULL) {
IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
return -ENOMEM;
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c 
b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index dda5dc6f7737..2b7ca111f039 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2012,7 +2012,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev 
*v_mdev, const char *name)
max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
 
ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, 
mdev->device, &mlx5_vdpa_ops,
-name);
+name, false);
if (IS_ERR(ndev))
return PTR_ERR(ndev);
 
diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index bb3f1d1f0422..8f01d6a7ecc5 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -71,6 +71,7 @@ static void vdpa_release_dev(struct device *d)
  * @config: the bus operations that is supported by this device
  * @size: size of the parent structure that contains private data
  * @name: name of the vdpa device; optional.
+ * @use_va: indicate whether virtual address must be used by this device
  *
  * Driver should use vdpa_alloc_device() wrapper macro instead of
  * using this directly.
@@ -80,7 +81,8 @@ static void vdpa_release_dev(struct device *d)
  */
 struct vdpa_device *__vdpa_alloc_device(struct device *parent,
const struct vdpa_config_ops *config,
-   size_t size, const char *name)
+   size_t size, const char *name,
+   bool use_va)
 {
struct vdpa_device *vdev;
int err = -EINVAL;
@@ -91,6 +93,10 @@ struct vdpa_device *__vdpa_alloc_device(struct device 
*parent,
if (!!config->dma_map != !!config->dma_unmap)
goto err;
 
+   /* It should only work for the device that use on-chip IOMMU */
+   if (use_va && !(config->dma_map || config->set_map))
+   goto err;
+
err = -ENOMEM;
vdev = kzalloc(size, GFP_KERNEL);
if (!vdev)
@@ -106,6 +112,7 @@ struct vdpa_device *__vdpa_alloc_device(struct device 
*parent,
vdev->index = err;
vdev->config = config;
vdev->features_valid = false;
+   vdev->use_va = use_va;
 
if (name)
	err = dev_set_name(&vdev->dev, "%s", name);
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index efd0cb3d964d..a43479cf57ea 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -250,7 +250,7 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr 
*dev_attr)
	ops = &vdpasim_config_ops;
 
vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops,
-   dev_attr->name);
+   dev_attr->name, false);
if (!vdpasim)
goto err_alloc;
 
diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c 
b/drivers/vdpa/virtio_pci/vp_vdpa.c
index c76ebb531212..f907f42e83bb 100644
--- a/drivers/vdpa/virtio_pci/vp_vdpa.c
+++ b/drivers/vdpa/virtio_pci/vp_vdpa.c
@@ -399,7 +399,7 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
return ret;
 
vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa,
-   dev, &vp_vdpa_ops, NULL);
+   dev, &vp_vdpa_ops, NULL, false);
if (vp_vdpa == NULL) {
dev_err(dev, "vp_vdpa: Failed to allocate vDPA structure\n");

[PATCH v8 06/10] vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()

2021-06-15 Thread Xie Yongji
The upcoming patch is going to support VA mapping/unmapping.
So let's factor out the logic of PA mapping/unmapping first
to make the code more readable.

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vhost/vdpa.c | 53 +---
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 1d5c5c6b6d5d..c5ec45b920f8 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -498,7 +498,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
return r;
 }
 
-static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last)
+static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, u64 start, u64 last)
 {
	struct vhost_dev *dev = &v->vdev;
struct vhost_iotlb *iotlb = dev->iotlb;
@@ -520,6 +520,11 @@ static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, 
u64 start, u64 last)
}
 }
 
+static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last)
+{
+   return vhost_vdpa_pa_unmap(v, start, last);
+}
+
 static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v)
 {
	struct vhost_dev *dev = &v->vdev;
@@ -600,37 +605,28 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 
iova, u64 size)
}
 }
 
-static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
-  struct vhost_iotlb_msg *msg)
+static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
+u64 iova, u64 size, u64 uaddr, u32 perm)
 {
	struct vhost_dev *dev = &v->vdev;
-   struct vhost_iotlb *iotlb = dev->iotlb;
struct page **page_list;
unsigned long list_size = PAGE_SIZE / sizeof(struct page *);
unsigned int gup_flags = FOLL_LONGTERM;
unsigned long npages, cur_base, map_pfn, last_pfn = 0;
unsigned long lock_limit, sz2pin, nchunks, i;
-   u64 iova = msg->iova;
+   u64 start = iova;
long pinned;
int ret = 0;
 
-   if (msg->iova < v->range.first ||
-   msg->iova + msg->size - 1 > v->range.last)
-   return -EINVAL;
-
-   if (vhost_iotlb_itree_first(iotlb, msg->iova,
-   msg->iova + msg->size - 1))
-   return -EEXIST;
-
/* Limit the use of memory for bookkeeping */
page_list = (struct page **) __get_free_page(GFP_KERNEL);
if (!page_list)
return -ENOMEM;
 
-   if (msg->perm & VHOST_ACCESS_WO)
+   if (perm & VHOST_ACCESS_WO)
gup_flags |= FOLL_WRITE;
 
-   npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;
+   npages = PAGE_ALIGN(size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;
if (!npages) {
ret = -EINVAL;
goto free;
@@ -644,7 +640,7 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
goto unlock;
}
 
-   cur_base = msg->uaddr & PAGE_MASK;
+   cur_base = uaddr & PAGE_MASK;
iova &= PAGE_MASK;
nchunks = 0;
 
@@ -675,7 +671,7 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT;
ret = vhost_vdpa_map(v, iova, csize,
 map_pfn << PAGE_SHIFT,
-msg->perm);
+perm);
if (ret) {
/*
 * Unpin the pages that are left 
unmapped
@@ -704,7 +700,7 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
 
/* Pin the rest chunk */
ret = vhost_vdpa_map(v, iova, (last_pfn - map_pfn + 1) << PAGE_SHIFT,
-map_pfn << PAGE_SHIFT, msg->perm);
+map_pfn << PAGE_SHIFT, perm);
 out:
if (ret) {
if (nchunks) {
@@ -723,13 +719,32 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
for (pfn = map_pfn; pfn <= last_pfn; pfn++)
unpin_user_page(pfn_to_page(pfn));
}
-   vhost_vdpa_unmap(v, msg->iova, msg->size);
+   vhost_vdpa_unmap(v, start, size);
}
 unlock:
mmap_read_unlock(dev->mm);
 free:
free_page((unsigned long)page_list);
return ret;
+
+}
+
+static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,
+  struct vhost_iotlb_msg *msg)
+{
+   struct vhost_dev *dev = &v->vdev;
+   struct vhost_iotlb *iotlb = dev->iotlb;
+
+   if (msg->iova < v->range.first ||
+   msg->iova + msg->size - 1 > v->range.last)
+   return -EINVAL;
+
+   if (vhost_iotlb_itree_first(iotlb, msg->iova,
+   

[PATCH v8 05/10] vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()

2021-06-15 Thread Xie Yongji
Add an opaque pointer for DMA mapping.

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 6 +++---
 drivers/vhost/vdpa.c | 2 +-
 include/linux/vdpa.h | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 98f793bc9376..efd0cb3d964d 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -542,14 +542,14 @@ static int vdpasim_set_map(struct vdpa_device *vdpa,
 }
 
 static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size,
-  u64 pa, u32 perm)
+  u64 pa, u32 perm, void *opaque)
 {
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
int ret;
 
	spin_lock(&vdpasim->iommu_lock);
-   ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa,
-   perm);
+   ret = vhost_iotlb_add_range_ctx(vdpasim->iommu, iova, iova + size - 1,
+   pa, perm, opaque);
	spin_unlock(&vdpasim->iommu_lock);
 
return ret;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index fb41db3da611..1d5c5c6b6d5d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -565,7 +565,7 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
return r;
 
if (ops->dma_map) {
-   r = ops->dma_map(vdpa, iova, size, pa, perm);
+   r = ops->dma_map(vdpa, iova, size, pa, perm, NULL);
} else if (ops->set_map) {
if (!v->in_batch)
r = ops->set_map(vdpa, dev->iotlb);
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index f311d227aa1b..281f768cb597 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -245,7 +245,7 @@ struct vdpa_config_ops {
/* DMA ops */
int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size,
-  u64 pa, u32 perm);
+  u64 pa, u32 perm, void *opaque);
int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size);
 
/* Free device resources */
-- 
2.11.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 04/10] vhost-iotlb: Add an opaque pointer for vhost IOTLB

2021-06-15 Thread Xie Yongji
Add an opaque pointer to the vhost IOTLB map, and introduce
vhost_iotlb_add_range_ctx() to accept it.
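
For example (illustrative), a caller can stash a per-mapping cookie and
read it back later through the opaque field:

	err = vhost_iotlb_add_range_ctx(iotlb, start, last, addr, perm, ctx);
	/* ... later, e.g. at unmap time ... */
	map = vhost_iotlb_itree_first(iotlb, start, last);
	if (map)
		ctx = map->opaque;

Existing vhost_iotlb_add_range() callers are unchanged.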

Suggested-by: Jason Wang 
Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 drivers/vhost/iotlb.c   | 20 
 include/linux/vhost_iotlb.h |  3 +++
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
index 0fd3f87e913c..5c99e1112cbb 100644
--- a/drivers/vhost/iotlb.c
+++ b/drivers/vhost/iotlb.c
@@ -36,19 +36,21 @@ void vhost_iotlb_map_free(struct vhost_iotlb *iotlb,
 EXPORT_SYMBOL_GPL(vhost_iotlb_map_free);
 
 /**
- * vhost_iotlb_add_range - add a new range to vhost IOTLB
+ * vhost_iotlb_add_range_ctx - add a new range to vhost IOTLB
  * @iotlb: the IOTLB
  * @start: start of the IOVA range
  * @last: last of IOVA range
  * @addr: the address that is mapped to @start
  * @perm: access permission of this range
+ * @opaque: the opaque pointer for the new mapping
  *
  * Returns an error last is smaller than start or memory allocation
  * fails
  */
-int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
- u64 start, u64 last,
- u64 addr, unsigned int perm)
+int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb,
+ u64 start, u64 last,
+ u64 addr, unsigned int perm,
+ void *opaque)
 {
struct vhost_iotlb_map *map;
 
@@ -71,6 +73,7 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
map->last = last;
map->addr = addr;
map->perm = perm;
+   map->opaque = opaque;
 
iotlb->nmaps++;
	vhost_iotlb_itree_insert(map, &iotlb->root);
@@ -80,6 +83,15 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
 
return 0;
 }
+EXPORT_SYMBOL_GPL(vhost_iotlb_add_range_ctx);
+
+int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
+ u64 start, u64 last,
+ u64 addr, unsigned int perm)
+{
+   return vhost_iotlb_add_range_ctx(iotlb, start, last,
+addr, perm, NULL);
+}
 EXPORT_SYMBOL_GPL(vhost_iotlb_add_range);
 
 /**
diff --git a/include/linux/vhost_iotlb.h b/include/linux/vhost_iotlb.h
index 6b09b786a762..2d0e2f52f938 100644
--- a/include/linux/vhost_iotlb.h
+++ b/include/linux/vhost_iotlb.h
@@ -17,6 +17,7 @@ struct vhost_iotlb_map {
u32 perm;
u32 flags_padding;
u64 __subtree_last;
+   void *opaque;
 };
 
 #define VHOST_IOTLB_FLAG_RETIRE 0x1
@@ -29,6 +30,8 @@ struct vhost_iotlb {
unsigned int flags;
 };
 
+int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, u64 start, u64 last,
+ u64 addr, unsigned int perm, void *opaque);
 int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start, u64 last,
  u64 addr, unsigned int perm);
 void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last);
-- 
2.11.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 03/10] eventfd: Increase the recursion depth of eventfd_signal()

2021-06-15 Thread Xie Yongji
Increase the recursion depth of eventfd_signal() to 1. This
is the maximum recursion depth we have found so far, which
can be triggered with the following call chain:

kvm_io_bus_write[kvm]
  --> ioeventfd_write   [kvm]
--> eventfd_signal  [eventfd]
  --> vhost_poll_wakeup [vhost]
--> vduse_vdpa_kick_vq  [vduse]
  --> eventfd_signal[eventfd]

Signed-off-by: Xie Yongji 
Acked-by: Jason Wang 
---
 fs/eventfd.c| 2 +-
 include/linux/eventfd.h | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index e265b6dd4f34..cc7cd1dbedd3 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -71,7 +71,7 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
 * it returns true, the eventfd_signal() call should be deferred to a
 * safe context.
 */
-   if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)))
+   if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH))
return 0;
 
	spin_lock_irqsave(&ctx->wqh.lock, flags);
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index fa0a524baed0..886d99cd38ef 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -29,6 +29,9 @@
 #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
 #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
 
+/* Maximum recursion depth */
+#define EFD_WAKE_DEPTH 1
+
 struct eventfd_ctx;
 struct file;
 
@@ -47,7 +50,7 @@ DECLARE_PER_CPU(int, eventfd_wake_count);
 
 static inline bool eventfd_signal_count(void)
 {
-   return this_cpu_read(eventfd_wake_count);
+   return this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH;
 }
 
 #else /* CONFIG_EVENTFD */
-- 
2.11.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 02/10] file: Export receive_fd() to modules

2021-06-15 Thread Xie Yongji
Export receive_fd() so that modules can use it to pass a
file descriptor between processes without missing any of the
security checks.
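
For example (illustrative), a module holding a struct file *file can
install it into the current process:

	fd = receive_fd(file, O_CLOEXEC);
	if (fd < 0)
		return fd;
	/* fd now refers to 'file' in current->files and can be handed
	 * back to userspace, e.g. as an ioctl return value. */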

Signed-off-by: Xie Yongji 
---
 fs/file.c| 6 ++
 include/linux/file.h | 7 +++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 86dc9956af32..210e540672aa 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1134,6 +1134,12 @@ int receive_fd_replace(int new_fd, struct file *file, 
unsigned int o_flags)
return new_fd;
 }
 
+int receive_fd(struct file *file, unsigned int o_flags)
+{
+   return __receive_fd(file, NULL, o_flags);
+}
+EXPORT_SYMBOL_GPL(receive_fd);
+
 static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags)
 {
int err = -EBADF;
diff --git a/include/linux/file.h b/include/linux/file.h
index 2de2e4613d7b..51e830b4fe3a 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -94,6 +94,9 @@ extern void fd_install(unsigned int fd, struct file *file);
 
 extern int __receive_fd(struct file *file, int __user *ufd,
unsigned int o_flags);
+
+extern int receive_fd(struct file *file, unsigned int o_flags);
+
 static inline int receive_fd_user(struct file *file, int __user *ufd,
  unsigned int o_flags)
 {
@@ -101,10 +104,6 @@ static inline int receive_fd_user(struct file *file, int 
__user *ufd,
return -EFAULT;
return __receive_fd(file, ufd, o_flags);
 }
-static inline int receive_fd(struct file *file, unsigned int o_flags)
-{
-   return __receive_fd(file, NULL, o_flags);
-}
 int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags);
 
 extern void flush_delayed_fput(void);
-- 
2.11.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 01/10] iova: Export alloc_iova_fast() and free_iova_fast();

2021-06-15 Thread Xie Yongji
Export alloc_iova_fast() and free_iova_fast() so that
modules can use them to improve IOVA allocation efficiency.
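
For example (illustrative), a module with its own struct iova_domain
*iovad can do:

	pfn = alloc_iova_fast(iovad, nr_pages, limit_pfn, true);
	if (!pfn)
		return -ENOMEM;
	/* the IOVA is pfn << iova_shift(iovad) ... */
	free_iova_fast(iovad, pfn, nr_pages);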

Signed-off-by: Xie Yongji 
---
 drivers/iommu/iova.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b7ecd5b08039..59916b4b7fe9 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -518,6 +518,7 @@ alloc_iova_fast(struct iova_domain *iovad, unsigned long 
size,
 
return new_iova->pfn_lo;
 }
+EXPORT_SYMBOL_GPL(alloc_iova_fast);
 
 /**
  * free_iova_fast - free iova pfn range into rcache
@@ -535,6 +536,7 @@ free_iova_fast(struct iova_domain *iovad, unsigned long 
pfn, unsigned long size)
 
free_iova(iovad, pfn);
 }
+EXPORT_SYMBOL_GPL(free_iova_fast);
 
 #define fq_ring_for_each(i, fq) \
for ((i) = (fq)->head; (i) != (fq)->tail; (i) = ((i) + 1) % 
IOVA_FQ_SIZE)
-- 
2.11.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v8 00/10] Introduce VDUSE - vDPA Device in Userspace

2021-06-15 Thread Xie Yongji
This series introduces a framework that makes it possible to implement
software-emulated vDPA devices in userspace. And to make it simple, the
emulated vDPA device's control path is handled in the kernel and only the
data path is implemented in the userspace.

Since the emulated vDPA device's control path is handled in the kernel,
a message mechanism is introduced to make userspace aware of the data
path related changes. Userspace can use read()/write() to receive/reply to
the control messages.

In the data path, the core maps the DMA buffer into the VDUSE daemon's
address space, which can be implemented in different ways depending on
the vdpa bus to which the vDPA device is attached.

In the virtio-vdpa case, we implement an MMU-based on-chip IOMMU driver with
a bounce-buffering mechanism to achieve that. And in the vhost-vdpa case, the
DMA buffer resides in a userspace memory region which can be shared with the
VDUSE userspace process by transferring the shmfd.

The details and our use case are shown below:

[ASCII architecture diagram, garbled in the archive. It shows: a container
 using /dev/vdx via the virtio-blk driver, virtio bus and virtio-vdpa driver,
 and a QEMU(VM) using /dev/vhost-vdpa-x via the vhost device and vhost-vdpa
 driver; both sit on vdpa devices on the vdpa bus, provided by the vduse
 driver. The VDUSE daemon in userspace implements the vDPA device emulation
 and a block driver which talks over TCP/IP through the NIC to the remote
 storages.]

We make use of it to implement a block device connecting to
our distributed storage, which can be used both in containers and
VMs. Thus, we can have a unified technology stack in these two cases.

To test it with null-blk:

  $ qemu-storage-daemon \
  --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
  --monitor chardev=charmonitor \
  --blockdev 
driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0
 \
  --export 
type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128

The 

Re: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list

2021-06-15 Thread Robin Murphy

On 2021-06-15 12:51, Sai Prakash Ranjan wrote:

Hi Krishna,

On 2021-06-14 23:18, Krishna Reddy wrote:
Right, but we won't know until we profile the specific usecases or try them in
a generic workload to see if they affect the performance. Sure, over-invalidation
is a concern where multiple buffers can be mapped to the same context and the
cache is not usable at the time for lookup and such, but we don't do it for
small buffers and only for large buffers, which means thousands of TLB entry
mappings, in which case TLBIASID is preferred (note: I mentioned the HW team
recommendation to use it for anything greater than 128 TLB entries) in my
earlier reply. And also note that we do this only for the partial walk flush;
we are not arbitrarily changing all the TLBIs to ASID based.


Most of the heavy bw use cases do involve processing larger buffers.
When the physical memory is allocated dis-contiguously at page_size
(let's use 4KB here) granularity, each aligned 2MB chunk's IOVA unmap would
involve performing a TLBIASID, as 2MB is not a leaf. Essentially, it happens
all the time during large buffer unmaps and potentially impacts active traffic
on other large buffers. Depending on how much latency HW engines can absorb,
the overflow/underflow issues for ISO engines can be sporadic and vendor
specific. Performing TLBIASID as the default for all SoCs is not a safe
operation.



Ok, so what I gather from this is that it's not easy to test for the
negative impact, you don't have data on such impact yet, and the behaviour is
very vendor specific. To add on the qcom implementation, we have several
performance improvements for TLB cache invalidations in HW, like wait-for-safe
(for realtime clients such as camera and display) and a few others to allow
for cache lookups/updates when TLBI is in progress for the same context bank,
so at least we are good here.



I am no camera expert, but what the camera team mentioned is that there
is a thread which frees memory (large unused memory buffers) periodically,
which ends up taking around 100+ms and causing some camera test failures with
frame drops. Parallel efforts are already being made to optimize this usage of
the thread, but as I mentioned previously, this is *not camera specific*;
let's say someone else invokes such large unmaps, they are going to face the
same issue.


From the above, it doesn't look like the root cause of the frame drops is
fully understood. Why is a 100+ms delay causing camera frame drops? Is the
same thread submitting the buffers to the camera after the unmap is complete?
If not, how is the unmap latency causing the issue here?



Ok, since you are interested in the camera usecase, I have requested more
details from the camera team and will share them once they come back.
However, I don't think it's good to have the unmap latency at all, and that
is what is being addressed by this patch.





> If unmap is queued and performed on a background thread, would it
> resolve the frame drops?

Not sure I understand what you mean by queuing on a background thread, but
with that or not, we still do the same number of TLBIs and hop through
iommu->io-pgtable->arm-smmu to perform the unmap, so how will that
help?


I mean adding the unmap requests into a queue and processing them from
a different thread. It is not to reduce the TLBIs, but to avoid blocking
subsequent buffer allocations and IOVA map requests if they are being issued
from the same thread that is performing the unmap. If the unmap is already
performed from a different thread, then the issue still needs to be root
caused to understand it fully. Check for any serialization issues.



This patch is to optimize the unmap latency caused by the large number of MMIO
writes (TLBIVAs) wasting CPU cycles, and not to fix the camera issue, which
can probably be solved by parallelization. It seems to me like you are ok with
the unmap latency in general, which we are not, and we want to avoid that
latency.

Hi @Robin, from these discussions it seems they are not ok with the change
for all SoC vendor implementations and do not have any data on such an impact.
As I mentioned above, on QCOM platforms we do have several optimizations in HW
for TLBIs and would like to make use of them and reduce the unmap latency.
What do you think, should this be made implementation specific?


Yes, it sounds like there's enough uncertainty for now that this needs 
to be an opt-in feature. However, I still think that non-strict mode 
could use it generically, since that's all about over-invalidating to 
save time on individual unmaps - and relatively non-deterministic - already.


So maybe we have a second set of iommu_flush_ops, or just a flag 
somewhere to control the tlb_flush_walk functions internally, and the 
choice can be made in the iommu_get_dma_strict() test, but also forced 
on all the time by your init_context hook. What do you reckon?
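
i.e. very roughly something like this in the unmap path (sketch only, the 
quirk name is made up):

	if (iop->cfg.quirks & IO_PGTABLE_QUIRK_TLB_INV_ALL)
		io_pgtable_tlb_flush_all(iop);
	else
		io_pgtable_tlb_flush_walk(iop, iova, size, granule);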


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v10 10/12] swiotlb: Add restricted DMA alloc/free support

2021-06-15 Thread Christoph Hellwig
On Tue, Jun 15, 2021 at 09:27:09PM +0800, Claire Chang wrote:
> Add the functions, swiotlb_{alloc,free} to support the memory allocation
> from restricted DMA pool.
> 
> The restricted DMA pool is preferred if available.
> 
> Note that since coherent allocation needs remapping, one must set up
> another device coherent pool by shared-dma-pool and use
> dma_alloc_from_dev_coherent instead for atomic coherent allocation.

Note: when applied this should go before the next patch to make sure
bisection works fine.

>  #ifdef CONFIG_DMA_RESTRICTED_POOL
> +struct page *swiotlb_alloc(struct device *dev, size_t size)
> +{
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> + phys_addr_t tlb_addr;
> + int index;
> +
> + /*
> +  * Skip io_tlb_default_mem since swiotlb_alloc doesn't support atomic
> +  * coherent allocation. Otherwise might break existing devices.
> +  * One must set up another device coherent pool by shared-dma-pool and
> +  * use dma_alloc_from_dev_coherent instead for atomic coherent
> +  * allocation to avoid mempry remapping.

s/mempry/memory/g

Otherwise looks good:

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 09/12] swiotlb: Add restricted DMA pool initialization

2021-06-15 Thread Christoph Hellwig
On Tue, Jun 15, 2021 at 09:27:08PM +0800, Claire Chang wrote:
> Add the initialization function to create restricted DMA pools from
> matching reserved-memory nodes.
> 
> Regardless of swiotlb setting, the restricted DMA pool is preferred if
> available.
> 
> The restricted DMA pools provide a basic level of protection against the
> DMA overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system
> needs to provide a way to lock down the memory access, e.g., MPU.
> 
> Signed-off-by: Claire Chang 

Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 08/12] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-06-15 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 07/12] swiotlb: Move alloc_size to swiotlb_find_slots

2021-06-15 Thread Christoph Hellwig
On Tue, Jun 15, 2021 at 09:27:06PM +0800, Claire Chang wrote:
> Rename find_slots to swiotlb_find_slots and move the maintenance of
> alloc_size to it for better code reusability later.
> 
> Signed-off-by: Claire Chang 

Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 06/12] swiotlb: Use is_dev_swiotlb_force for swiotlb data bouncing

2021-06-15 Thread Christoph Hellwig
On Tue, Jun 15, 2021 at 09:27:05PM +0800, Claire Chang wrote:
> Propagate the swiotlb_force setting into io_tlb_default_mem->force and
> use it to determine whether to bounce the data or not. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 

Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 05/12] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-15 Thread Christoph Hellwig
On Tue, Jun 15, 2021 at 09:27:04PM +0800, Claire Chang wrote:
> Update is_swiotlb_active to add a struct device argument. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 

Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 04/12] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-06-15 Thread Christoph Hellwig
On Tue, Jun 15, 2021 at 09:27:03PM +0800, Claire Chang wrote:
> Update is_swiotlb_buffer to add a struct device argument. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 

Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-15 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 02/12] swiotlb: Refactor swiotlb_create_debugfs

2021-06-15 Thread Christoph Hellwig
On Tue, Jun 15, 2021 at 09:27:01PM +0800, Claire Chang wrote:
> Split the debugfs creation to make the code reusable for supporting
> different bounce buffer pools.
> 
> Signed-off-by: Claire Chang 

Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 01/12] swiotlb: Refactor swiotlb init functions

2021-06-15 Thread Christoph Hellwig
On Tue, Jun 15, 2021 at 09:27:00PM +0800, Claire Chang wrote:
> Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
> initialization to make the code reusable.

Looks good,

Reviewed-by: Christoph Hellwig 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v9 00/14] Restricted DMA

2021-06-15 Thread Claire Chang
v10 here: https://lore.kernel.org/patchwork/cover/1446882/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 12/12] of: Add plumbing for restricted DMA pool

2021-06-15 Thread Claire Chang
If a device is not behind an IOMMU, we look up the device node and set
up restricted DMA when a restricted-dma-pool region is present.

Signed-off-by: Claire Chang 
---
 drivers/of/address.c| 33 +
 drivers/of/device.c |  3 +++
 drivers/of/of_private.h |  6 ++
 3 files changed, 42 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 3b2acca7e363..c8066d95ff0e 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1001,6 +1002,38 @@ int of_dma_get_range(struct device_node *np, const 
struct bus_dma_region **map)
of_node_put(node);
return ret;
 }
+
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
+{
+   struct device_node *node, *of_node = dev->of_node;
+   int count, i;
+
+   count = of_property_count_elems_of_size(of_node, "memory-region",
+   sizeof(u32));
+   /*
+* If dev->of_node doesn't exist or doesn't contain memory-region, try
+* the OF node having DMA configuration.
+*/
+   if (count <= 0) {
+   of_node = np;
+   count = of_property_count_elems_of_size(
+   of_node, "memory-region", sizeof(u32));
+   }
+
+   for (i = 0; i < count; i++) {
+   node = of_parse_phandle(of_node, "memory-region", i);
+   /*
+* There might be multiple memory regions, but only one
+* restricted-dma-pool region is allowed.
+*/
+   if (of_device_is_compatible(node, "restricted-dma-pool") &&
+   of_device_is_available(node))
+   return of_reserved_mem_device_init_by_idx(dev, of_node,
+ i);
+   }
+
+   return 0;
+}
 #endif /* CONFIG_HAS_DMA */
 
 /**
diff --git a/drivers/of/device.c b/drivers/of/device.c
index c5a9473a5fb1..2defdca418ec 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
 
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 
+   if (!iommu)
+   return of_dma_set_restricted_buffer(dev, np);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index 631489f7f8c0..376462798f7e 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -163,12 +163,18 @@ struct bus_dma_region;
 #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
 int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np);
 #else
 static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
 {
return -ENODEV;
 }
+static inline int of_dma_set_restricted_buffer(struct device *dev,
+  struct device_node *np)
+{
+   return -ENODEV;
+}
 #endif
 
 void fdt_init_reserved_mem(void);
-- 
2.32.0.272.g935e593368-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 11/12] dt-bindings: of: Add restricted DMA pool

2021-06-15 Thread Claire Chang
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.

Signed-off-by: Claire Chang 
---
 .../reserved-memory/reserved-memory.txt   | 36 +--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..46804f24df05 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
   used as a shared pool of DMA buffers for a set of devices. It can
   be used by an operating system to instantiate the necessary pool
   management subsystem if necessary.
+- restricted-dma-pool: This indicates a region of memory meant to be
+  used as a pool of restricted DMA buffers for a set of devices. The
+  memory region would be the only region accessible to those devices.
+  When using this, the no-map and reusable properties must not be set,
+  so the operating system can create a virtual mapping that will be used
+  for synchronization. The main purpose for restricted DMA is to
+  mitigate the lack of DMA access control on systems without an IOMMU,
+  which could result in the DMA accessing the system memory at
+  unexpected times and/or unexpected addresses, possibly leading to data
+  leakage or corruption. The feature on its own provides a basic level
+  of protection against the DMA overwriting buffer contents at
+  unexpected times. However, to protect against general data leakage and
+  system memory corruption, the system needs to provide a way to lock
+  down the memory access, e.g., MPU. Note that since coherent allocation
+  needs remapping, one must set up another device coherent pool by
+  shared-dma-pool and use dma_alloc_from_dev_coherent instead for atomic
+  coherent allocation.
 - vendor specific string in the form ,[-]
 no-map (optional) - empty property
 - Indicates the operating system must not create a virtual mapping
@@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for 
each corresponding
 
 Example
 ---
-This example defines 3 contiguous regions are defined for Linux kernel:
+This example defines 4 contiguous regions for Linux kernel:
 one default of all device drivers (named linux,cma@7200 and 64MiB in size),
-one dedicated to the framebuffer device (named framebuffer@7800, 8MiB), and
-one for multimedia processing (named multimedia-memory@7700, 64MiB).
+one dedicated to the framebuffer device (named framebuffer@7800, 8MiB),
+one for multimedia processing (named multimedia-memory@7700, 64MiB), and
+one for restricted dma pool (named restricted_dma_reserved@0x5000, 64MiB).
 
 / {
#address-cells = <1>;
@@ -120,6 +138,11 @@ one for multimedia processing (named 
multimedia-memory@7700, 64MiB).
compatible = "acme,multimedia-memory";
reg = <0x7700 0x400>;
};
+
+   restricted_dma_reserved: restricted_dma_reserved {
+   compatible = "restricted-dma-pool";
+   reg = <0x5000 0x400>;
+   };
};
 
/* ... */
@@ -138,4 +161,11 @@ one for multimedia processing (named 
multimedia-memory@7700, 64MiB).
memory-region = <_reserved>;
/* ... */
};
+
+   pcie_device: pcie_device@0,0 {
+   reg = <0x8301 0x0 0x 0x0 0x0010
+  0x8301 0x0 0x0010 0x0 0x0010>;
+		memory-region = <&restricted_dma_reserved>;
+   /* ... */
+   };
 };
-- 
2.32.0.272.g935e593368-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
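
A quick illustration, not part of the series: a driver whose device references
a restricted-dma-pool needs no new API. Streaming mappings made through the
usual DMA API are simply bounced through the restricted pool. The function and
buffer names below are made up.

/* Hypothetical driver snippet -- illustrative only, not from the series. */
#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/errno.h>

static int example_xmit(struct device *dev, void *frame, size_t len)
{
	dma_addr_t dma;

	/* Streaming DMA: bounced in and out of the restricted pool. */
	dma = dma_map_single(dev, frame, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... hand "dma" to the hardware and wait for completion ... */

	dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
	return 0;
}

Coherent allocations are the exception called out in the binding text above:
dma_alloc_coherent() relies on the additional per-device coherent pool
declared via shared-dma-pool.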


[PATCH v10 10/12] swiotlb: Add restricted DMA alloc/free support

2021-06-15 Thread Claire Chang
Add the functions, swiotlb_{alloc,free} to support the memory allocation
from restricted DMA pool.

The restricted DMA pool is preferred if available.

Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 15 +
 kernel/dma/direct.c | 50 ++---
 kernel/dma/swiotlb.c| 42 --
 3 files changed, 92 insertions(+), 15 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index e76ac469..9616346b727f 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -157,4 +157,19 @@ static inline void swiotlb_adjust_size(unsigned long size)
 extern void swiotlb_print_info(void);
 extern void swiotlb_set_max_segment(unsigned int);
 
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+#else
+static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+   return NULL;
+}
+static inline bool swiotlb_free(struct device *dev, struct page *page,
+   size_t size)
+{
+   return false;
+}
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 3713461d6fe0..da0e09621230 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+   size_t size)
+{
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+   swiotlb_free(dev, page, size))
+   return;
+   dma_free_contiguous(dev, page, size);
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
 {
@@ -86,7 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device 
*dev, size_t size,
 
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
					   &phys_limit);
-   page = dma_alloc_contiguous(dev, size, gfp);
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL)) {
+   page = swiotlb_alloc(dev, size);
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   __dma_direct_free_pages(dev, page, size);
+   return NULL;
+   }
+   }
+
+   if (!page)
+   page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
page = NULL;
@@ -142,7 +160,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -155,18 +173,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev))
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
/*
 * Remapping or decrypting memory may block. If either is required and
 * we can't block, allocate the memory from the atomic pools.
+* If restricted DMA (i.e., is_dev_swiotlb_force) is required, one must
+* set up another device coherent pool by shared-dma-pool and use
+* dma_alloc_from_dev_coherent instead.
 */
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
!gfpflags_allow_blocking(gfp) &&
(force_dma_unencrypted(dev) ||
-	     (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev))))
+(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+ !dev_is_dma_coherent(dev))) &&
+   !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
/* we always manually zero the memory once we are done */
@@ -237,7 +260,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return NULL;
}
 out_free_pages:
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
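
The kernel/dma/swiotlb.c hunk of this patch is cut off above. For orientation
only, here is a sketch of swiotlb_alloc()/swiotlb_free() built on
swiotlb_find_slots() and swiotlb_release_slots() from the other patches in
this series; treat it as an approximation of the missing hunk, not as the
authoritative diff.

/* Sketch only -- approximates the truncated swiotlb.c hunk. */
struct page *swiotlb_alloc(struct device *dev, size_t size)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
	phys_addr_t tlb_addr;
	int index;

	if (!mem)
		return NULL;

	index = swiotlb_find_slots(dev, 0, size);
	if (index == -1)
		return NULL;

	tlb_addr = slot_addr(mem->start, index);
	return pfn_to_page(PFN_DOWN(tlb_addr));
}

bool swiotlb_free(struct device *dev, struct page *page, size_t size)
{
	phys_addr_t tlb_addr = page_to_phys(page);

	if (!is_swiotlb_buffer(dev, tlb_addr))
		return false;

	swiotlb_release_slots(dev, tlb_addr);
	return true;
}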

[PATCH v10 09/12] swiotlb: Add restricted DMA pool initialization

2021-06-15 Thread Claire Chang
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.

Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h |  3 +-
 kernel/dma/Kconfig  | 14 
 kernel/dma/swiotlb.c| 78 +
 3 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index efcd56e3a16c..e76ac469 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force;
  * range check to see if the memory was in fact allocated by this
  * API.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
  * @used:  The number of used IO TLB block.
  * @list:  The free list describing the number of free entries available
  * from each index.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config DMA_RESTRICTED_POOL
+   bool "DMA Restricted Pool"
+   depends on OF && OF_RESERVED_MEM
+   select SWIOTLB
+   help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ 

+ and .
+ If unsure, say "n".
+
 #
 # Should be selected if we can mmap non-coherent mappings to userspace.
 # The only thing that is really required is a way to set an uncached bit
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 3c01162c400b..ef1ccd63534d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
 #ifdef CONFIG_DEBUG_FS
 #include 
 #endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include 
+#include 
+#include 
+#include 
+#include 
+#endif
 
 #include 
 #include 
@@ -693,3 +700,74 @@ static int __init swiotlb_create_default_debugfs(void)
 late_initcall(swiotlb_create_default_debugfs);
 
 #endif
+
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   struct io_tlb_mem *mem = rmem->priv;
+   unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+   /*
+* Since multiple devices can share the same pool, the private data,
+* io_tlb_mem struct, will be initialized by the first device attached
+* to it.
+*/
+   if (!mem) {
+   mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+   if (!mem)
+   return -ENOMEM;
+
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
+   mem->force = true;
+   set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
+rmem->size >> PAGE_SHIFT);
+
+   rmem->priv = mem;
+
+   if (IS_ENABLED(CONFIG_DEBUG_FS)) {
+   mem->debugfs =
+   debugfs_create_dir(rmem->name, debugfs_dir);
+   swiotlb_create_debugfs_files(mem);
+   }
+   }
+
+   dev->dma_io_tlb_mem = mem;
+
+   return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+   .device_init = rmem_swiotlb_device_init,
+   .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+   unsigned long node = rmem->fdt_node;
+
+   if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+   of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+   of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+   of_get_flat_dt_prop(node, "no-map", NULL))
+   return -EINVAL;
+
+	if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
+		pr_err("Restricted DMA pool must be accessible within the linear mapping.");
+   return -EINVAL;
+   }
+
	rmem->ops = &rmem_swiotlb_ops;
+   pr_info("Reserved 
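
The diff above is truncated as well. For orientation only: reserved-memory
setup hooks of this kind are registered with RESERVEDMEM_OF_DECLARE(), so the
assumed tail of the hunk is roughly:

/* Assumed completion of the truncated hunk -- registers the setup hook. */
RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);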

[PATCH v10 08/12] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-06-15 Thread Claire Chang
Add a new function, swiotlb_release_slots, to make the code reusable for
supporting different bounce buffer pools.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e498f11e150e..3c01162c400b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -546,27 +546,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, enum dma_data_direction dir,
- unsigned long attrs)
+static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 {
-   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long flags;
-   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+   unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
int nslots = nr_slots(mem->slots[index].alloc_size + offset);
int count, i;
 
-   /*
-* First, sync the memory before unmapping the entry
-*/
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
@@ -601,6 +589,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
	spin_unlock_irqrestore(&mem->lock, flags);
 }
 
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+ size_t mapping_size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   /*
+* First, sync the memory before unmapping the entry
+*/
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+   swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+   swiotlb_release_slots(dev, tlb_addr);
+}
+
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
 {
-- 
2.32.0.272.g935e593368-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 07/12] swiotlb: Move alloc_size to swiotlb_find_slots

2021-06-15 Thread Claire Chang
Rename find_slots to swiotlb_find_slots and move the maintenance of
alloc_size to it for better code reusability later.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 5af47a8f68b8..e498f11e150e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -422,8 +422,8 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int find_slots(struct device *dev, phys_addr_t orig_addr,
-   size_t alloc_size)
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+ size_t alloc_size)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
@@ -478,8 +478,11 @@ static int find_slots(struct device *dev, phys_addr_t 
orig_addr,
return -1;
 
 found:
-   for (i = index; i < index + nslots; i++)
+   for (i = index; i < index + nslots; i++) {
mem->slots[i].list = 0;
+   mem->slots[i].alloc_size =
+   alloc_size - ((i - index) << IO_TLB_SHIFT);
+   }
for (i = index - 1;
 io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 mem->slots[i].list; i--)
@@ -520,7 +523,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}
 
-   index = find_slots(dev, orig_addr, alloc_size + offset);
+   index = swiotlb_find_slots(dev, orig_addr, alloc_size + offset);
if (index == -1) {
if (!(attrs & DMA_ATTR_NO_WARN))
dev_warn_ratelimited(dev,
@@ -534,11 +537,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
 * This is needed when we sync the memory.  Then we sync the buffer if
 * needed.
 */
-   for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+   for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
-   mem->slots[index + i].alloc_size =
-   alloc_size - (i << IO_TLB_SHIFT);
-   }
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-- 
2.32.0.272.g935e593368-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 06/12] swiotlb: Use is_dev_swiotlb_force for swiotlb data bouncing

2021-06-15 Thread Claire Chang
Propagate the swiotlb_force setting into io_tlb_default_mem->force and
use it to determine whether to bounce the data or not. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 11 +++
 kernel/dma/direct.c |  2 +-
 kernel/dma/direct.h |  2 +-
 kernel/dma/swiotlb.c|  4 
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index dd1c30a83058..efcd56e3a16c 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
  * unmap calls.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
+ * @force:  %true if swiotlb is forced
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -94,6 +95,7 @@ struct io_tlb_mem {
spinlock_t lock;
struct dentry *debugfs;
bool late_alloc;
+   bool force;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
return mem && paddr >= mem->start && paddr < mem->end;
 }
 
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->force;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 {
return false;
 }
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+   return false;
+}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..3713461d6fe0 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
if (is_swiotlb_active(dev) &&
-   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+   (dma_addressing_limited(dev) || is_dev_swiotlb_force(dev)))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
 }
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..6c4d13caceb1 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device 
*dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+   if (is_dev_swiotlb_force(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
 
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d07e32020edf..5af47a8f68b8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->end = mem->start + bytes;
mem->index = 0;
mem->late_alloc = late_alloc;
+
+   if (swiotlb_force == SWIOTLB_FORCE)
+   mem->force = true;
+
	spin_lock_init(&mem->lock);
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-- 
2.32.0.272.g935e593368-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
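
For illustration, a hypothetical helper (not in the patch) that restates the
check dma_direct_map_page() performs after this change. Once a restricted pool
sets mem->force (see the pool initialization patch), the same per-device test
covers it without any global state; this assumes the kernel/dma/direct.h
context for dma_capable() and is_dev_swiotlb_force().

/* Hypothetical helper -- mirrors the condition in dma_direct_map_page(). */
static inline bool dma_direct_needs_bounce(struct device *dev,
					   dma_addr_t dma_addr, size_t size)
{
	return is_dev_swiotlb_force(dev) ||
	       unlikely(!dma_capable(dev, dma_addr, size, true));
}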


[PATCH v10 05/12] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-15 Thread Claire Chang
Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 drivers/pci/xen-pcifront.c   | 2 +-
 include/linux/swiotlb.h  | 4 ++--
 kernel/dma/direct.c  | 2 +-
 kernel/dma/swiotlb.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c 
b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index ce6b664b10aa..89a894354263 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct 
drm_i915_gem_object *obj)
 
max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active()) {
+   if (is_swiotlb_active(obj->base.dev->dev)) {
unsigned int max_segment;
 
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index f4c2e46b6fe1..2ca9d9a9e5d5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -276,7 +276,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-   need_swiotlb = is_swiotlb_active();
+   need_swiotlb = is_swiotlb_active(dev->dev);
 #endif
 
	ret = ttm_device_init(&drm->ttm.bdev, &nouveau_bo_driver, drm->dev->dev,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..0d56985bfe81 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct 
pcifront_device *pdev)
 
	spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !is_swiotlb_active()) {
+	if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
err = pci_xen_swiotlb_init_late();
if (err)
			dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d1f3d95881cd..dd1c30a83058 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -112,7 +112,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
@@ -132,7 +132,7 @@ static inline size_t swiotlb_max_mapping_size(struct device 
*dev)
return SIZE_MAX;
 }
 
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active() &&
+   if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 949a6bb21343..d07e32020edf 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -654,9 +654,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
 {
-   return io_tlb_default_mem != NULL;
+   return dev->dma_io_tlb_mem != NULL;
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
-- 
2.32.0.272.g935e593368-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
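
A small usage illustration, not from the series: with the new signature,
callers ask about a specific device rather than the global default pool. The
function name below is made up.

/* Hypothetical caller -- only illustrates the per-device signature. */
#include <linux/device.h>
#include <linux/swiotlb.h>

static void example_report_bounce(struct device *dev)
{
	if (is_swiotlb_active(dev))
		dev_info(dev, "DMA to this device is bounce-buffered\n");
}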


[PATCH v10 04/12] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-06-15 Thread Claire Chang
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
---
 drivers/iommu/dma-iommu.c | 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  7 ---
 kernel/dma/direct.c   |  6 +++---
 kernel/dma/direct.h   |  6 +++---
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 5d96fcc45fec..1a6a08908245 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -506,7 +506,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
dma_addr_t dma_addr,
 
__iommu_dma_unmap(dev, dma_addr, size);
 
-   if (unlikely(is_swiotlb_buffer(phys)))
+   if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -577,7 +577,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
*dev, phys_addr_t phys,
}
 
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
 }
@@ -783,7 +783,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
*dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
 
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -796,7 +796,7 @@ static void iommu_dma_sync_single_for_device(struct device 
*dev,
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_device(dev, phys, size, dir);
 
if (!dev_is_dma_coherent(dev))
@@ -817,7 +817,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -834,7 +834,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
 
for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
   sg->length, dir);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 24d11861ac7d..0c4fb34f11ab 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, 
dma_addr_t dma_addr)
 * in our domain. Therefore _only_ check address within our domain.
 */
if (pfn_valid(PFN_DOWN(paddr)))
-   return is_swiotlb_buffer(paddr);
+   return is_swiotlb_buffer(dev, paddr);
return 0;
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..d1f3d95881cd 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include 
 #include 
 #include 
 #include 
@@ -101,9 +102,9 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -115,7 +116,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-   if (unlikely(is_swiotlb_buffer(paddr)))
+   if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_sync_single_for_device(dev, paddr, sg->length,

[PATCH v10 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-15 Thread Claire Chang
Always have the pointer to the swiotlb pool used in struct device. This
could help simplify the code for other pools.

Signed-off-by: Claire Chang 
---
 drivers/base/core.c| 4 
 include/linux/device.h | 4 
 kernel/dma/swiotlb.c   | 8 
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b8a8c96dca58..eeb2d49d3aa3 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include  /* for dma_default_coherent */
 
@@ -2846,6 +2847,9 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
+#ifdef CONFIG_SWIOTLB
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+#endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index 4443e12238a0..2e9a378c9100 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -432,6 +432,7 @@ struct dev_links_info {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -540,6 +541,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
struct cma *cma_area;   /* contiguous memory area for dma
   allocations */
+#endif
+#ifdef CONFIG_SWIOTLB
+   struct io_tlb_mem *dma_io_tlb_mem;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 97c6ad50fdc2..949a6bb21343 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -339,7 +339,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t 
size,
   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
@@ -421,7 +421,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -498,7 +498,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -549,7 +549,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
  size_t mapping_size, enum dma_data_direction dir,
  unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
unsigned long flags;
unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
-- 
2.32.0.272.g935e593368-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 02/12] swiotlb: Refactor swiotlb_create_debugfs

2021-06-15 Thread Claire Chang
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c64298e416c8..97c6ad50fdc2 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -661,19 +661,26 @@ bool is_swiotlb_active(void)
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
+static struct dentry *debugfs_dir;
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
-
-   if (!mem)
-   return 0;
-   mem->debugfs = debugfs_create_dir("swiotlb", NULL);
	debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
	debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+   struct io_tlb_mem *mem = io_tlb_default_mem;
+
+   debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+   if (mem) {
+   mem->debugfs = debugfs_dir;
+   swiotlb_create_debugfs_files(mem);
+   }
return 0;
 }
 
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 
 #endif
-- 
2.32.0.272.g935e593368-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 01/12] swiotlb: Refactor swiotlb init functions

2021-06-15 Thread Claire Chang
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 49 ++--
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..c64298e416c8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+   unsigned long nslabs, bool late_alloc)
 {
+   void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+   mem->nslabs = nslabs;
+   mem->start = start;
+   mem->end = mem->start + bytes;
+   mem->index = 0;
+   mem->late_alloc = late_alloc;
+	spin_lock_init(&mem->lock);
+   for (i = 0; i < mem->nslabs; i++) {
+   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+   mem->slots[i].alloc_size = 0;
+   }
+   memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
struct io_tlb_mem *mem;
size_t alloc_size;
 
@@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long 
nslabs, int verbose)
if (!mem)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
-   mem->nslabs = nslabs;
-   mem->start = __pa(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-	spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
+
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
 
io_tlb_default_mem = mem;
if (verbose)
@@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -297,20 +308,8 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!mem)
return -ENOMEM;
 
-   mem->nslabs = nslabs;
-   mem->start = virt_to_phys(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   mem->late_alloc = 1;
-	spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
-
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
-   memset(tlb, 0, bytes);
 
io_tlb_default_mem = mem;
swiotlb_print_info();
-- 
2.32.0.272.g935e593368-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v10 00/12] Restricted DMA

2021-06-15 Thread Claire Chang
This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).

[1a] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] 
https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] 
https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

v10:
Address the comments in v9 to
  - fix the dev->dma_io_tlb_mem assignment
  - propagate swiotlb_force setting into io_tlb_default_mem->force
  - move set_memory_decrypted out of swiotlb_init_io_tlb_mem
  - move debugfs_dir declaration into the main CONFIG_DEBUG_FS block
  - add swiotlb_ prefix to find_slots and release_slots
  - merge the 3 alloc/free related patches
  - move the CONFIG_DMA_RESTRICTED_POOL later

v9:
Address the comments in v7 to
  - set swiotlb active pool to dev->dma_io_tlb_mem
  - get rid of get_io_tlb_mem
  - dig out the device struct for is_swiotlb_active
  - move debugfs_create_dir out of swiotlb_create_debugfs
  - do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem
  - use IS_ENABLED in kernel/dma/direct.c
  - fix redefinition of 'of_dma_set_restricted_buffer'
https://lore.kernel.org/patchwork/cover/1445081/

v8:
- Fix reserved-memory.txt and add the reg property in example.
- Fix sizeof for of_property_count_elems_of_size in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Apply Will's suggestion to try the OF node having DMA configuration in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Fix typo in the comment of drivers/of/address.c#of_dma_set_restricted_buffer.
- Add error message for PageHighMem in
  kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
  rmem_swiotlb_setup.
- Fix the message string in rmem_swiotlb_setup.
https://lore.kernel.org/patchwork/cover/1437112/

v7:
Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
https://lore.kernel.org/patchwork/cover/1431031/

v6:
Address the comments in v5
https://lore.kernel.org/patchwork/cover/1423201/

v5:
Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/

v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/

v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.
https://lore.kernel.org/patchwork/cover/1360992/

v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/


Claire Chang (12):
  swiotlb: Refactor swiotlb init functions
  swiotlb: Refactor swiotlb_create_debugfs
  swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Use is_dev_swiotlb_force for swiotlb data bouncing
  swiotlb: Move alloc_size to swiotlb_find_slots
  swiotlb: Refactor swiotlb_tbl_unmap_single
  swiotlb: Add restricted DMA pool initialization
  swiotlb: Add restricted DMA alloc/free support
  dt-bindings: of: Add restricted DMA pool
  of: Add plumbing for restricted DMA pool

 .../reserved-memory/reserved-memory.txt   |  36 ++-
 drivers/base/core.c   |   4 +
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c |   2 +-
 drivers/iommu/dma-iommu.c |  12 +-
 drivers/of/address.c  |  33 +++
 drivers/of/device.c   |   3 +
 drivers/of/of_private.h   |   6 +
 drivers/pci/xen-pcifront.c|   2 

  1   2   >