Re: [PATCH v13 00/12] Restricted DMA

2021-06-18 Thread Claire Chang
v14: https://lore.kernel.org/patchwork/cover/1448954/


[PATCH v14 12/12] of: Add plumbing for restricted DMA pool

2021-06-18 Thread Claire Chang
If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA when the restricted-dma-pool is present.

Signed-off-by: Claire Chang 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 drivers/of/address.c| 33 +
 drivers/of/device.c |  3 +++
 drivers/of/of_private.h |  6 ++
 3 files changed, 42 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 73ddf2540f3f..cdf700fba5c4 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1022,6 +1023,38 @@ int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map)
of_node_put(node);
return ret;
 }
+
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
+{
+   struct device_node *node, *of_node = dev->of_node;
+   int count, i;
+
+   count = of_property_count_elems_of_size(of_node, "memory-region",
+   sizeof(u32));
+   /*
+* If dev->of_node doesn't exist or doesn't contain memory-region, try
+* the OF node having DMA configuration.
+*/
+   if (count <= 0) {
+   of_node = np;
+   count = of_property_count_elems_of_size(
+   of_node, "memory-region", sizeof(u32));
+   }
+
+   for (i = 0; i < count; i++) {
+   node = of_parse_phandle(of_node, "memory-region", i);
+   /*
+* There might be multiple memory regions, but only one
+* restricted-dma-pool region is allowed.
+*/
+   if (of_device_is_compatible(node, "restricted-dma-pool") &&
+   of_device_is_available(node))
+   return of_reserved_mem_device_init_by_idx(dev, of_node,
+ i);
+   }
+
+   return 0;
+}
 #endif /* CONFIG_HAS_DMA */
 
 /**
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 6cb86de404f1..e68316836a7a 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
 
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 
+   if (!iommu)
+   return of_dma_set_restricted_buffer(dev, np);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d9e6a324de0a..25cebbed5f02 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -161,12 +161,18 @@ struct bus_dma_region;
 #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
 int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np);
 #else
 static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
 {
return -ENODEV;
 }
+static inline int of_dma_set_restricted_buffer(struct device *dev,
+  struct device_node *np)
+{
+   return -ENODEV;
+}
 #endif
 
 #endif /* _LINUX_OF_PRIVATE_H */
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 11/12] dt-bindings: of: Add restricted DMA pool

2021-06-18 Thread Claire Chang
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.

Signed-off-by: Claire Chang 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 .../reserved-memory/reserved-memory.txt   | 36 +--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..39b5f4c5a511 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
   used as a shared pool of DMA buffers for a set of devices. It can
   be used by an operating system to instantiate the necessary pool
   management subsystem if necessary.
+- restricted-dma-pool: This indicates a region of memory meant to be
+  used as a pool of restricted DMA buffers for a set of devices. The
+  memory region would be the only region accessible to those devices.
+  When using this, the no-map and reusable properties must not be set,
+  so the operating system can create a virtual mapping that will be used
+  for synchronization. The main purpose for restricted DMA is to
+  mitigate the lack of DMA access control on systems without an IOMMU,
+  which could result in the DMA accessing the system memory at
+  unexpected times and/or unexpected addresses, possibly leading to data
+  leakage or corruption. The feature on its own provides a basic level
+  of protection against the DMA overwriting buffer contents at
+  unexpected times. However, to protect against general data leakage and
+  system memory corruption, the system needs to provide a way to lock down
+  the memory access, e.g., MPU. Note that since coherent allocation
+  needs remapping, one must set up another device coherent pool by
+  shared-dma-pool and use dma_alloc_from_dev_coherent instead for atomic
+  coherent allocation.
 - vendor specific string in the form <vendor>,[<device>-]<usage>
 no-map (optional) - empty property
 - Indicates the operating system must not create a virtual mapping
@@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for each corresponding
 
 Example
 ---
-This example defines 3 contiguous regions are defined for Linux kernel:
+This example defines 4 contiguous regions for Linux kernel:
 one default of all device drivers (named linux,cma@7200 and 64MiB in size),
-one dedicated to the framebuffer device (named framebuffer@7800, 8MiB), and
-one for multimedia processing (named multimedia-memory@7700, 64MiB).
+one dedicated to the framebuffer device (named framebuffer@7800, 8MiB),
+one for multimedia processing (named multimedia-memory@7700, 64MiB), and
+one for restricted dma pool (named restricted_dma_reserved@0x5000, 64MiB).
 
 / {
#address-cells = <1>;
@@ -120,6 +138,11 @@ one for multimedia processing (named multimedia-memory@7700, 64MiB).
compatible = "acme,multimedia-memory";
reg = <0x7700 0x400>;
};
+
+   restricted_dma_reserved: restricted_dma_reserved {
+   compatible = "restricted-dma-pool";
+   reg = <0x5000 0x400>;
+   };
};
 
/* ... */
@@ -138,4 +161,11 @@ one for multimedia processing (named multimedia-memory@7700, 64MiB).
	memory-region = <&multimedia_reserved>;
/* ... */
};
+
+   pcie_device: pcie_device@0,0 {
+   reg = <0x8301 0x0 0x 0x0 0x0010
+  0x8301 0x0 0x0010 0x0 0x0010>;
+   memory-region = <&restricted_dma_reserved>;
+   /* ... */
+   };
 };
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 10/12] swiotlb: Add restricted DMA pool initialization

2021-06-18 Thread Claire Chang
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.

Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.
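
As a rough sizing illustration (mine, not from the patch): the reserved region
is carved into IO_TLB_SIZE (2 KiB) slots, mirroring the nslabs computation in
rmem_swiotlb_device_init() below, so for example a 64 MiB restricted-dma-pool
yields 32768 slots. The helper name is invented for the example.

#include <linux/swiotlb.h>

/* Illustrative only: bounce/allocation slots provided by a reserved region,
 * e.g. 0x4000000 (64 MiB) >> IO_TLB_SHIFT = 32768 slots of IO_TLB_SIZE each. */
static unsigned long example_restricted_pool_nslabs(unsigned long pool_size)
{
	return pool_size >> IO_TLB_SHIFT;
}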

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 include/linux/swiotlb.h |  3 +-
 kernel/dma/Kconfig  | 14 
 kernel/dma/swiotlb.c| 76 +
 3 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index a73fad460162..175b6c113ed8 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force;
  * range check to see if the memory was in fact allocated by this
  * API.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
  * @used:  The number of used IO TLB block.
  * @list:  The free list describing the number of free entries available
  * from each index.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config DMA_RESTRICTED_POOL
+   bool "DMA Restricted Pool"
+   depends on OF && OF_RESERVED_MEM
+   select SWIOTLB
+   help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ 

+ and .
+ If unsure, say "n".
+
 #
 # Should be selected if we can mmap non-coherent mappings to userspace.
 # The only thing that is really required is a way to set an uncached bit
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 273b21090ee8..1aef294c82b5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
 #ifdef CONFIG_DEBUG_FS
 #include 
 #endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include 
+#include 
+#include 
+#include 
+#include 
+#endif
 
 #include 
 #include 
@@ -736,4 +743,73 @@ bool swiotlb_free(struct device *dev, struct page *page, size_t size)
return true;
 }
 
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   struct io_tlb_mem *mem = rmem->priv;
+   unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+   /*
+* Since multiple devices can share the same pool, the private data,
+* io_tlb_mem struct, will be initialized by the first device attached
+* to it.
+*/
+   if (!mem) {
+   mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+   if (!mem)
+   return -ENOMEM;
+
+   set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
+rmem->size >> PAGE_SHIFT);
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
+   mem->force_bounce = true;
+   mem->for_alloc = true;
+
+   rmem->priv = mem;
+
+   if (IS_ENABLED(CONFIG_DEBUG_FS)) {
+   mem->debugfs =
+   debugfs_create_dir(rmem->name, debugfs_dir);
+   swiotlb_create_debugfs_files(mem);
+   }
+   }
+
+   dev->dma_io_tlb_mem = mem;
+
+   return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+   .device_init = rmem_swiotlb_device_init,
+   .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+   unsigned long node = rmem->fdt_node;
+
+   if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+   of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+   of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+   of_get_flat_dt_prop(node, "no-map", NULL))
+   return -EINVAL;
+
+   if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
+   pr_err("Restricted DMA pool must be accessible within the linear mapping.");
+   return -EINVAL;

[PATCH v14 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-18 Thread Claire Chang
Add the functions swiotlb_{alloc,free} and is_swiotlb_for_alloc to
support memory allocation from the restricted DMA pool.

The restricted DMA pool is preferred if available.

Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.
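
For illustration only (not part of the patch): a driver keeps using the
generic DMA API, and the restricted pool is picked underneath when
is_swiotlb_for_alloc() is true for the device. The helper name and size below
are made up.

#include <linux/dma-mapping.h>
#include <linux/sizes.h>

static void *example_alloc_ring(struct device *dev, dma_addr_t *dma)
{
	/*
	 * Blocking coherent allocation: with this patch it is expected to be
	 * served from the restricted pool via swiotlb_alloc(). Atomic
	 * (GFP_ATOMIC) coherent allocations need the additional
	 * shared-dma-pool mentioned above instead, since remapping may block.
	 */
	return dma_alloc_coherent(dev, SZ_64K, dma, GFP_KERNEL);
}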

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
Acked-by: Stefano Stabellini 
---
 include/linux/swiotlb.h | 26 ++
 kernel/dma/direct.c | 49 +++--
 kernel/dma/swiotlb.c| 38 ++--
 3 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8d8855c77d9a..a73fad460162 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -85,6 +85,7 @@ extern enum swiotlb_force swiotlb_force;
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
+ * @for_alloc:  %true if the pool is used for memory allocation
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -96,6 +97,7 @@ struct io_tlb_mem {
struct dentry *debugfs;
bool late_alloc;
bool force_bounce;
+   bool for_alloc;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -156,4 +158,28 @@ static inline void swiotlb_adjust_size(unsigned long size)
 extern void swiotlb_print_info(void);
 extern void swiotlb_set_max_segment(unsigned int);
 
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->for_alloc;
+}
+#else
+static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+   return NULL;
+}
+static inline bool swiotlb_free(struct device *dev, struct page *page,
+   size_t size)
+{
+   return false;
+}
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+   return false;
+}
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a92465b4eb12..2de33e5d302b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+   size_t size)
+{
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+   swiotlb_free(dev, page, size))
+   return;
+   dma_free_contiguous(dev, page, size);
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
 {
@@ -86,6 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
    &phys_limit);
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+   is_swiotlb_for_alloc(dev)) {
+   page = swiotlb_alloc(dev, size);
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   __dma_direct_free_pages(dev, page, size);
+   return NULL;
+   }
+   return page;
+   }
+
page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
@@ -142,7 +161,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -155,18 +174,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev))
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_swiotlb_for_alloc(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
/*
 * Remapping or decrypting memory may block. If either is required and
 * we can't block, allocate the memory from the atomic pools.
+* If restricted DMA (i.e., is_swiotlb_for_alloc) is required, one must set up
+* another device coherent pool by shared-dma-pool and use
+* dma_alloc_from_dev_coherent instead.

[PATCH v14 08/12] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-06-18 Thread Claire Chang
Add a new function, swiotlb_release_slots, to make the code reusable for
supporting different bounce buffer pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index daf38a52e66d..e79383df5d4a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -556,27 +556,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, enum dma_data_direction dir,
- unsigned long attrs)
+static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 {
-   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long flags;
-   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+   unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
int nslots = nr_slots(mem->slots[index].alloc_size + offset);
int count, i;
 
-   /*
-* First, sync the memory before unmapping the entry
-*/
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
@@ -611,6 +599,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
	spin_unlock_irqrestore(&mem->lock, flags);
 }
 
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+ size_t mapping_size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   /*
+* First, sync the memory before unmapping the entry
+*/
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+   swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+   swiotlb_release_slots(dev, tlb_addr);
+}
+
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
 {
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 07/12] swiotlb: Move alloc_size to swiotlb_find_slots

2021-06-18 Thread Claire Chang
Rename find_slots to swiotlb_find_slots and move the maintenance of
alloc_size to it for better code reusability later.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0d294bbf274c..daf38a52e66d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -432,8 +432,8 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int find_slots(struct device *dev, phys_addr_t orig_addr,
-   size_t alloc_size)
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+ size_t alloc_size)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
@@ -488,8 +488,11 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
return -1;
 
 found:
-   for (i = index; i < index + nslots; i++)
+   for (i = index; i < index + nslots; i++) {
mem->slots[i].list = 0;
+   mem->slots[i].alloc_size =
+   alloc_size - ((i - index) << IO_TLB_SHIFT);
+   }
for (i = index - 1;
 io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 mem->slots[i].list; i--)
@@ -530,7 +533,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}
 
-   index = find_slots(dev, orig_addr, alloc_size + offset);
+   index = swiotlb_find_slots(dev, orig_addr, alloc_size + offset);
if (index == -1) {
if (!(attrs & DMA_ATTR_NO_WARN))
dev_warn_ratelimited(dev,
@@ -544,11 +547,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 * This is needed when we sync the memory.  Then we sync the buffer if
 * needed.
 */
-   for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+   for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
-   mem->slots[index + i].alloc_size =
-   alloc_size - (i << IO_TLB_SHIFT);
-   }
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-18 Thread Claire Chang
Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
use it to determine whether to bounce the data or not. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
Acked-by: Stefano Stabellini 
---
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   | 11 +++
 kernel/dma/direct.c   |  2 +-
 kernel/dma/direct.h   |  2 +-
 kernel/dma/swiotlb.c  |  4 
 5 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 0c6ed09f8513..4730a146fa35 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -369,7 +369,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
if (dma_capable(dev, dev_addr, size, true) &&
!range_straddles_page_boundary(phys, size) &&
!xen_arch_need_swiotlb(dev, phys, dev_addr) &&
-   swiotlb_force != SWIOTLB_FORCE)
+   !is_swiotlb_force_bounce(dev))
goto done;
 
/*
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index dd1c30a83058..8d8855c77d9a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
  * unmap calls.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
+ * @force_bounce: %true if swiotlb bouncing is forced
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -94,6 +95,7 @@ struct io_tlb_mem {
spinlock_t lock;
struct dentry *debugfs;
bool late_alloc;
+   bool force_bounce;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
return mem && paddr >= mem->start && paddr < mem->end;
 }
 
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->force_bounce;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+   return false;
+}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..a92465b4eb12 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
if (is_swiotlb_active(dev) &&
-   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+   (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
 }
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..4632b0f4f72e 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+   if (is_swiotlb_force_bounce(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
 
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8a120f42340b..0d294bbf274c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->end = mem->start + bytes;
mem->index = 0;
mem->late_alloc = late_alloc;
+
+   if (swiotlb_force == SWIOTLB_FORCE)
+   mem->force_bounce = true;
+
	spin_lock_init(&mem->lock);
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 05/12] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-18 Thread Claire Chang
Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
Acked-by: Stefano Stabellini 
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 drivers/pci/xen-pcifront.c   | 2 +-
 include/linux/swiotlb.h  | 4 ++--
 kernel/dma/direct.c  | 2 +-
 kernel/dma/swiotlb.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index a9d65fc8aa0e..4b7afa0fc85d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
 
max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active()) {
+   if (is_swiotlb_active(obj->base.dev->dev)) {
unsigned int max_segment;
 
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 9662522aa066..be15bfd9e0ee 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-   need_swiotlb = is_swiotlb_active();
+   need_swiotlb = is_swiotlb_active(dev->dev);
 #endif
 
	ret = ttm_bo_device_init(&drm->ttm.bdev, &nouveau_bo_driver,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..0d56985bfe81 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev)
 
	spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !is_swiotlb_active()) {
+   if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
err = pci_xen_swiotlb_init_late();
if (err)
	dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d1f3d95881cd..dd1c30a83058 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -112,7 +112,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
@@ -132,7 +132,7 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev)
return SIZE_MAX;
 }
 
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active() &&
+   if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 72a4289faed1..8a120f42340b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -664,9 +664,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
 {
-   return io_tlb_default_mem != NULL;
+   return dev->dma_io_tlb_mem != NULL;
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 04/12] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-06-18 Thread Claire Chang
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
Acked-by: Stefano Stabellini 
---
 drivers/iommu/dma-iommu.c | 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  7 ---
 kernel/dma/direct.c   |  6 +++---
 kernel/dma/direct.h   |  6 +++---
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3087d9fa6065..10997ef541f8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -507,7 +507,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
 
__iommu_dma_unmap(dev, dma_addr, size);
 
-   if (unlikely(is_swiotlb_buffer(phys)))
+   if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -578,7 +578,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
}
 
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
 }
@@ -749,7 +749,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
 
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -762,7 +762,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_device(dev, phys, size, dir);
 
if (!dev_is_dma_coherent(dev))
@@ -783,7 +783,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -800,7 +800,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
 
for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
   sg->length, dir);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 4c89afc0df62..0c6ed09f8513 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 * in our domain. Therefore _only_ check address within our domain.
 */
if (pfn_valid(PFN_DOWN(paddr)))
-   return is_swiotlb_buffer(paddr);
+   return is_swiotlb_buffer(dev, paddr);
return 0;
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..d1f3d95881cd 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include 
 #include 
 #include 
 #include 
@@ -101,9 +102,9 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -115,7 +116,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-   if (unlikely(is_swiotlb_buffer(paddr)))
+   if (unlikely(is_swiotlb_buffer(dev, paddr)))
  

[PATCH v14 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-18 Thread Claire Chang
Always have the pointer to the swiotlb pool used in struct device. This
could help simplify the code for other pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
Acked-by: Stefano Stabellini 
---
 drivers/base/core.c| 4 
 include/linux/device.h | 4 
 kernel/dma/swiotlb.c   | 8 
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index f29839382f81..cb3123e3954d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include  /* for dma_default_coherent */
 
@@ -2736,6 +2737,9 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
+#ifdef CONFIG_SWIOTLB
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+#endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index ba660731bd25..240d652a0696 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,6 +416,7 @@ struct dev_links_info {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -518,6 +519,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
struct cma *cma_area;   /* contiguous memory area for dma
   allocations */
+#endif
+#ifdef CONFIG_SWIOTLB
+   struct io_tlb_mem *dma_io_tlb_mem;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ede66df6835b..72a4289faed1 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -340,7 +340,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
    enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
phys_addr_t orig_addr = mem->slots[index].orig_addr;
@@ -431,7 +431,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -508,7 +508,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -559,7 +559,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
  size_t mapping_size, enum dma_data_direction dir,
  unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
unsigned long flags;
unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 02/12] swiotlb: Refactor swiotlb_create_debugfs

2021-06-18 Thread Claire Chang
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1f9b2b9e7490..ede66df6835b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -671,19 +671,26 @@ bool is_swiotlb_active(void)
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
+static struct dentry *debugfs_dir;
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
-
-   if (!mem)
-   return 0;
-   mem->debugfs = debugfs_create_dir("swiotlb", NULL);
	debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
	debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+   struct io_tlb_mem *mem = io_tlb_default_mem;
+
+   debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+   if (mem) {
+   mem->debugfs = debugfs_dir;
+   swiotlb_create_debugfs_files(mem);
+   }
return 0;
 }
 
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 
 #endif
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 01/12] swiotlb: Refactor swiotlb init functions

2021-06-18 Thread Claire Chang
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 50 ++--
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 52e2ac526757..1f9b2b9e7490 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+   unsigned long nslabs, bool late_alloc)
 {
+   void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+   mem->nslabs = nslabs;
+   mem->start = start;
+   mem->end = mem->start + bytes;
+   mem->index = 0;
+   mem->late_alloc = late_alloc;
+   spin_lock_init(&mem->lock);
+   for (i = 0; i < mem->nslabs; i++) {
+   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+   mem->slots[i].alloc_size = 0;
+   }
+   memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
struct io_tlb_mem *mem;
size_t alloc_size;
 
@@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
if (!mem)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
-   mem->nslabs = nslabs;
-   mem->start = __pa(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
+
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
 
io_tlb_default_mem = mem;
if (verbose)
@@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!mem)
return -ENOMEM;
 
-   mem->nslabs = nslabs;
-   mem->start = virt_to_phys(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   mem->late_alloc = 1;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
-
+   memset(mem, 0, sizeof(*mem));
set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
-   memset(tlb, 0, bytes);
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
 
io_tlb_default_mem = mem;
swiotlb_print_info();
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 00/12] Restricted DMA

2021-06-18 Thread Claire Chang
This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
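
To make the driver-facing side concrete, here is a minimal illustrative sketch
(not part of this series): a driver for a device whose DT node points at a
restricted-dma-pool keeps using the ordinary streaming DMA API, and the
bouncing into the reserved region happens underneath. The function and buffer
names are invented for the example.

#include <linux/dma-mapping.h>

static int example_queue_rx_buffer(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma;

	/* With force_bounce set for the device's pool, the buffer is bounced
	 * through the reserved region, so the device only ever sees addresses
	 * inside that region. */
	dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... program "dma" into the hardware and wait for completion ... */

	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);
	return 0;
}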

[1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

v14:
- Move set_memory_decrypted before swiotlb_init_io_tlb_mem (patch 01/12, 10,12)
- Add Stefano's Acked-by tag from v13

v13:
- Fix xen-swiotlb issues
  - memset in patch 01/12
  - is_swiotlb_force_bounce in patch 06/12
- Fix the dts example typo in reserved-memory.txt
- Add Stefano and Will's Tested-by tag from v12
https://lore.kernel.org/patchwork/cover/1448001/

v12:
Split is_dev_swiotlb_force into is_swiotlb_force_bounce (patch 06/12) and
is_swiotlb_for_alloc (patch 09/12)
https://lore.kernel.org/patchwork/cover/1447254/

v11:
- Rebase against swiotlb devel/for-linus-5.14
- s/mempry/memory/g
- exchange the order of patch 09/12 and 10/12
https://lore.kernel.org/patchwork/cover/1447216/

v10:
Address the comments in v9 to
  - fix the dev->dma_io_tlb_mem assignment
  - propagate swiotlb_force setting into io_tlb_default_mem->force
  - move set_memory_decrypted out of swiotlb_init_io_tlb_mem
  - move debugfs_dir declaration into the main CONFIG_DEBUG_FS block
  - add swiotlb_ prefix to find_slots and release_slots
  - merge the 3 alloc/free related patches
  - move the CONFIG_DMA_RESTRICTED_POOL later
https://lore.kernel.org/patchwork/cover/1446882/

v9:
Address the comments in v7 to
  - set swiotlb active pool to dev->dma_io_tlb_mem
  - get rid of get_io_tlb_mem
  - dig out the device struct for is_swiotlb_active
  - move debugfs_create_dir out of swiotlb_create_debugfs
  - do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem
  - use IS_ENABLED in kernel/dma/direct.c
  - fix redefinition of 'of_dma_set_restricted_buffer'
https://lore.kernel.org/patchwork/cover/1445081/

v8:
- Fix reserved-memory.txt and add the reg property in example.
- Fix sizeof for of_property_count_elems_of_size in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Apply Will's suggestion to try the OF node having DMA configuration in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Fix typo in the comment of drivers/of/address.c#of_dma_set_restricted_buffer.
- Add error message for PageHighMem in
  kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
  rmem_swiotlb_setup.
- Fix the message string in rmem_swiotlb_setup.
https://lore.kernel.org/patchwork/cover/1437112/

v7:
Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
https://lore.kernel.org/patchwork/cover/1431031/

v6:
Address the comments in v5
https://lore.kernel.org/patchwork/cover/1423201/

v5:
Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/

v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/

v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.
https://lore.kernel.org/patchwork/cover/1360992/

v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/


Claire Chang (12):
  swiotlb: Refactor swiotlb init functions
  swiotlb: Refactor swiotlb_create_debugfs
  swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
  swiotlb: Move alloc_size 

Re: [PATCHv2 2/3] iommu/io-pgtable: Optimize partial walk flush for large scatter-gather list

2021-06-18 Thread Doug Anderson
Hi,

On Thu, Jun 17, 2021 at 7:51 PM Sai Prakash Ranjan wrote:
>
> Currently for iommu_unmap() of large scatter-gather list with page size
> elements, the majority of time is spent in flushing of partial walks in
> __arm_lpae_unmap() which is a VA based TLB invalidation invalidating
> page-by-page on iommus like arm-smmu-v2 (TLBIVA) which do not support
> range based invalidations like on arm-smmu-v3.2.
>
> For example: to unmap a 32MB scatter-gather list with page size elements
> (8192 entries), there are 16->2MB buffer unmaps based on the pgsize (2MB
> for 4K granule) and each of 2MB will further result in 512 TLBIVAs (2MB/4K)
> resulting in a total of 8192 TLBIVAs (512*16) for 16->2MB causing a huge
> overhead.
>
> So instead use tlb_flush_all() callback (TLBIALL/TLBIASID) to invalidate
> the entire context for partial walk flush on select few platforms where
> cost of over-invalidation is less than unmap latency
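
(For reference, a quick illustration of the arithmetic quoted above; the
constants simply restate the example, nothing more.)

#include <stdio.h>

int main(void)
{
	unsigned long unmaps  = (32UL << 20) / (2UL << 20); /* 16 x 2MB unmaps  */
	unsigned long tlbivas = (2UL << 20) / (4UL << 10);  /* 512 TLBIVAs each */

	printf("total TLBIVAs = %lu\n", unmaps * tlbivas);  /* 16 * 512 = 8192 */
	return 0;
}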

It would probably be worth punching this description up a little bit.
Elsewhere you said in more detail why this over-invalidation is less
of a big deal for the Qualcomm SMMU. It's probably worth saying
something like that here, too. Like this bit paraphrased from your
other email:

On qcom impl, we have several performance improvements for TLB cache
invalidations in HW like wait-for-safe (for realtime clients such as
camera and display) and few others to allow for cache lookups/updates
when TLBI is in progress for the same context bank.


> using the newly
> introduced quirk IO_PGTABLE_QUIRK_TLB_INV_ALL. We also do this for
> non-strict mode given its all about over-invalidation saving time on
> individual unmaps and non-deterministic generally.

As per usual I'm mostly clueless, but I don't quite understand why you
want this new behavior for non-strict mode. To me it almost seems like
the opposite? Specifically, non-strict mode is already outside the
critical path today and so there's no need to optimize it. I'm
probably not explaining myself clearly, but I guess I'm thinking:

a) today for strict, unmap is in the critical path and it's important
to get it out of there. Getting it out of the critical path is so
important that we're willing to over-invalidate to speed up the
critical path.

b) today for non-strict, unmap is not in the critical path.

So I would almost expect your patch to _disable_ your new feature for
non-strict mappings, not auto-enable your new feature for non-strict
mappings.

If I'm babbling, feel free to ignore. ;-) Looking back, I guess Robin
was the one that suggested the behavior you're implementing, so it's
more likely he's right than I am. ;-)


-Doug


Re: [PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch

2021-06-18 Thread Robin Murphy

On 2021-06-18 17:12, Ross Philipson wrote:

The IOMMU should always be set to default translated type after
the PMRs are disabled to protect the MLE from DMA.

Signed-off-by: Ross Philipson 
---
  drivers/iommu/intel/iommu.c | 5 +
  drivers/iommu/iommu.c   | 6 +-
  2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284..4f0256d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -41,6 +41,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device *dev)
   */
  static int device_def_domain_type(struct device *dev)
  {
+   /* Do not allow identity domain when Secure Launch is configured */
+   if (slaunch_get_flags() & SL_FLAG_ACTIVE)
+   return IOMMU_DOMAIN_DMA;


Is this specific to Intel? It seems like it could easily be done 
commonly like the check for untrusted external devices.



+
if (dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(dev);
  
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c

index 808ab70d..d49b7dd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -23,6 +23,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  static struct kset *iommu_group_kset;

@@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
  {
if (cmd_line)
iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
-   iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
+
+   /* Do not allow identity domain when Secure Launch is configured */
+   if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
+   iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;


Quietly ignoring the setting and possibly leaving iommu_def_domain_type 
uninitialised (note that 0 is not actually a usable type) doesn't seem 
great. AFAICS this probably warrants similar treatment to the 
mem_encrypt_active() case - there doesn't seem a great deal of value in 
trying to save users from themselves if they care about measured boot 
yet explicitly pass options which may compromise measured boot. If you 
really want to go down that route there's at least the sysfs interface 
you'd need to nobble as well, not to mention the various ways of 
completely disabling IOMMUs...


It might be reasonable to make IOMMU_DEFAULT_PASSTHROUGH depend on 
!SECURE_LAUNCH for clarity though.


Robin.


  }
  
  void iommu_set_default_translated(bool cmd_line)





Re: [RFC] /dev/ioasid uAPI proposal

2021-06-18 Thread Jason Gunthorpe
On Fri, Jun 18, 2021 at 07:03:31PM +0200, Jean-Philippe Brucker wrote:

> configuration. The Arm SMMUs have a lot of small features that
> implementations can mix and match and that a user shouldn't have to care
> about, and there are lots of different implementations by various
> vendors.

This is really something to think about carefully in this RFC - I do
have a guess that a 'HW specific' channel is going to be useful here.

If the goal is for qemu to provide all these fiddly things and they
cannot be SW emulated, then direct access to the fiddly HW native
stuff is possibly necessary.

We've kind of seen this mistake in DRM and RDMA
historically. Attempting to generalize too early, or generalize
something that is really a one off. Better for everyone to just keep
it as a one off.

Jason


Re: Plan for /dev/ioasid RFC v2

2021-06-18 Thread Jason Gunthorpe
On Fri, Jun 18, 2021 at 04:57:40PM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Friday, June 18, 2021 8:20 AM
> > 
> > On Thu, Jun 17, 2021 at 03:14:52PM -0600, Alex Williamson wrote:
> > 
> > > I've referred to this as a limitation of type1, that we can't put
> > > devices within the same group into different address spaces, such as
> > > behind separate vRoot-Ports in a vIOMMU config, but really, who cares?
> > > As isolation support improves we see fewer multi-device groups, this
> > > scenario becomes the exception.  Buy better hardware to use the devices
> > > independently.
> > 
> > This is basically my thinking too, but my conclusion is that we should
> > not continue to make groups central to the API.
> > 
> > As I've explained to David this is actually causing functional
> > problems and mess - and I don't see a clean way to keep groups central
> > but still have the device in control of what is happening. We need
> > this device <-> iommu connection to be direct to robustly model all
> > the things that are in the RFC.
> > 
> > To keep groups central someone needs to sketch out how to solve
> > today's mdev SW page table and mdev PASID issues in a clean
> > way. Device centric is my suggestion on how to make it clean, but I
> > haven't heard an alternative??
> > 
> > So, I view the purpose of this discussion to scope out what a
> > device-centric world looks like and then if we can securely fit in the
> > legacy non-isolated world on top of that clean future oriented
> > API. Then decide if it is work worth doing or not.
> > 
> > To my mind it looks like it is not so bad, granted not every detail is
> > clear, and no code has be sketched, but I don't see a big scary
> > blocker emerging. An extra ioctl or two, some special logic that
> > activates for >1 device groups that looks a lot like VFIO's current
> > logic..
> > 
> > At some level I would be perfectly fine if we made the group FD part
> > of the API for >1 device groups - except that complexifies every user
> > space implementation to deal with that. It doesn't feel like a good
> > trade off.
> > 
> 
> Would it be an acceptable tradeoff by leaving >1 device groups 
> supported only via legacy VFIO (which is anyway kept for backward 
> compatibility), if we think such scenario is being deprecated over 
> time (thus little value to add new features on it)? Then all new 
> sub-systems including vdpa and new vfio only support singleton 
> device group via /dev/iommu...

That might just be a great idea - userspace has to support those APIs
anyhow; if it can be made trivially obvious to use this fallback even
though /dev/iommu is available, it is a great place to start. It also
means PASID/etc are naturally blocked off.

Maybe years down the road we will want to harmonize them, so I would
still sketch it out enough to be confident it could be implemented..

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

2021-06-18 Thread Jianxiong Gao via iommu
> Jianxiong Gao, before spending more time on this, could you also try
> Chanho Park's patch?
> https://lore.kernel.org/linux-iommu/20210510091816.ga2...@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f
>
I have tested Chanho Park's patch and it works for us.
The NVMe driver performs correctly with the patch.

I have tested the patch on 06af8679449d.

-- 
Jianxiong Gao
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC] /dev/ioasid uAPI proposal

2021-06-18 Thread Jean-Philippe Brucker
On Thu, Jun 17, 2021 at 01:00:14PM +1000, David Gibson wrote:
> On Thu, Jun 10, 2021 at 06:37:31PM +0200, Jean-Philippe Brucker wrote:
> > On Tue, Jun 08, 2021 at 04:31:50PM +1000, David Gibson wrote:
> > > For the qemu case, I would imagine a two stage fallback:
> > > 
> > > 1) Ask for the exact IOMMU capabilities (including pagetable
> > >format) that the vIOMMU has.  If the host can supply, you're
> > >good
> > > 
> > > 2) If not, ask for a kernel managed IOAS.  Verify that it can map
> > >all the IOVA ranges the guest vIOMMU needs, and has an equal or
> > >smaller pagesize than the guest vIOMMU presents.  If so,
> > >software emulate the vIOMMU by shadowing guest io pagetable
> > >updates into the kernel managed IOAS.
> > > 
> > > 3) You're out of luck, don't start.
> > > 
> > > For both (1) and (2) I'd expect it to be asking this question *after*
> > > saying what devices are attached to the IOAS, based on the virtual
> > > hardware configuration.  That doesn't cover hotplug, of course, for
> > > that you have to just fail the hotplug if the new device isn't
> > > supportable with the IOAS you already have.
> > 
> > Yes. So there is a point in time when the IOAS is frozen, and cannot take
> > in new incompatible devices. I think that can support the usage I had in
> > mind. If the VMM (non-QEMU, let's say) wanted to create one IOASID FD per
> > feature set it could bind the first device, freeze the features, then bind
> 
> Are you thinking of this "freeze the features" as an explicitly
> triggered action?  I have suggested that an explicit "ENABLE" step
> might be useful, but that hasn't had much traction from what I've
> seen.

Seems like we do need an explicit enable step for the flow you described
above:

a) Bind all devices to an ioasid. Each bind succeeds.
b) Ask for a specific set of features for this aggregate of device. Ask
   for (1), fall back to (2), or abort.
c) Boot the VM
d) Hotplug a device, bind it to the ioasid. We're long past negotiating
   features for the ioasid, so the host needs to reject the bind if the
   new device is incompatible with what was requested at (b)

So a successful request at (b) would be the point where we change the
behavior of bind.

Since the kernel needs a form of feature check in any case, I still have a
preference for aborting the bind at (a) if the device isn't exactly
compatible with other devices already in the ioasid, because it might be
simpler to implement in the host, but I don't feel strongly about this.
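
Very roughly, in made-up pseudo-uAPI (every ioctl name and flag below is
invented purely to mirror steps (a)-(d); none of this exists today):

	/* (a) bind all cold-plugged devices; each bind succeeds */
	ioasid = ioctl(iommu_fd, IOASID_ALLOC, 0);
	for (i = 0; i < ndev; i++)
		ioctl(dev_fd[i], DEVICE_BIND_IOASID, &ioasid);

	/* (b) freeze the feature set for the aggregate: try the exact
	 * vIOMMU features first, fall back to a kernel-managed IOAS,
	 * otherwise abort */
	feat.flags = IOASID_FEAT_NESTED;
	if (ioctl(iommu_fd, IOASID_SET_FEATURES, &feat)) {
		feat.flags = IOASID_FEAT_KERNEL_MANAGED;
		if (ioctl(iommu_fd, IOASID_SET_FEATURES, &feat))
			abort();
	}

	/* (c) boot the VM */

	/* (d) hotplug: the host rejects the bind if the device is
	 * incompatible with the features frozen in (b) */
	if (ioctl(hotplug_fd, DEVICE_BIND_IOASID, &ioasid))
		fail_hotplug();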


> > I'd like to understand better where the difficulty lies, with migration.
> > Is the problem, once we have a guest running on physical machine A, to
> > make sure that physical machine B supports the same IOMMU properties
> > before migrating the VM over to B?  Why can't QEMU (instead of the user)
> > select a feature set on machine A, then when time comes to migrate, query
> > all information from the host kernel on machine B and check that it
> > matches what was picked for machine A?  Or is it only trying to
> > accommodate different sets of features between A and B, that would be too
> > difficult?
> 
> There are two problems
> 
> 1) Although it could be done in theory, it's hard, and it would need a
> huge rewrite to qemu's whole migration infrastructure to do this.
> We'd need a way of representing host features, working out which sets
> are compatible with which others depending on what things the guest is
> allowed to use, encoding the information in the migration stream and
> reporting failure.  None of this exists now.
> 
> Indeed qemu requires that you create the (stopped) machine on the
> destination (including virtual hardware configuration) before even
> attempting to process the incoming migration.  It does not for the
> most part transfer the machine configuration in the migration stream.
> Now, that's generally considered a flaw with the design, but fixing it
> is a huge project that no-one's really had the energy to begin despite
> the idea being around for years.
> 
> 2) It makes behaviour really hard to predict for management layers
> above.  Things like oVirt automatically migrate around a cluster for
> load balancing.  At the moment the model which works is basically that
> if you request the same guest features on each end of the
> migration, and qemu starts with that configuration on each end, the
> migration should work (or only fail for transient reasons).  If you
> can't know if the migration is possible until you get the incoming
> stream, reporting and exposing what will and won't work to the layer
> above also becomes an immensely fiddly problem.

That was really useful, thanks. One thing I'm worried about is the user
having to know way too much detail about IOMMUs in order to pick a precise
configuration. The Arm SMMUs have a lot of small features that
implementations can mix and match and that a user shouldn't have to care
about, and there are lots of different implementations by various vendors.

RE: Plan for /dev/ioasid RFC v2

2021-06-18 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Friday, June 18, 2021 8:20 AM
> 
> On Thu, Jun 17, 2021 at 03:14:52PM -0600, Alex Williamson wrote:
> 
> > I've referred to this as a limitation of type1, that we can't put
> > devices within the same group into different address spaces, such as
> > behind separate vRoot-Ports in a vIOMMU config, but really, who cares?
> > As isolation support improves we see fewer multi-device groups, this
> > scenario becomes the exception.  Buy better hardware to use the devices
> > independently.
> 
> This is basically my thinking too, but my conclusion is that we should
> not continue to make groups central to the API.
> 
> As I've explained to David this is actually causing functional
> problems and mess - and I don't see a clean way to keep groups central
> but still have the device in control of what is happening. We need
> this device <-> iommu connection to be direct to robustly model all
> the things that are in the RFC.
> 
> To keep groups central someone needs to sketch out how to solve
> today's mdev SW page table and mdev PASID issues in a clean
> way. Device centric is my suggestion on how to make it clean, but I
> haven't heard an alternative??
> 
> So, I view the purpose of this discussion to scope out what a
> device-centric world looks like and then if we can securely fit in the
> legacy non-isolated world on top of that clean future oriented
> API. Then decide if it is work worth doing or not.
> 
> To my mind it looks like it is not so bad, granted not every detail is
> clear, and no code has been sketched, but I don't see a big scary
> blocker emerging. An extra ioctl or two, some special logic that
> activates for >1 device groups that looks a lot like VFIO's current
> logic..
> 
> At some level I would be perfectly fine if we made the group FD part
> of the API for >1 device groups - except that complexifies every user
> space implementation to deal with that. It doesn't feel like a good
> trade off.
> 

Would it be an acceptable tradeoff to leave >1 device groups
supported only via legacy VFIO (which is anyway kept for backward
compatibility), if we think such a scenario is being deprecated over
time (thus little value in adding new features for it)? Then all new
sub-systems, including vdpa and new vfio, would only support singleton
device groups via /dev/iommu...

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 2/5] ACPI: Move IOMMU setup code out of IORT

2021-06-18 Thread Robin Murphy

On 2021-06-18 16:20, Jean-Philippe Brucker wrote:

Extract the code that sets up the IOMMU infrastructure from IORT, since
it can be reused by VIOT. Move it one level up into a new
acpi_iommu_configure_id() function, which calls the IORT parsing
function which in turn calls the acpi_iommu_fwspec_init() helper.
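
For reference, the rough shape after this patch (a sketch only; error
handling, probe deferral and the fwspec replay in scan.c are elided):

	static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
								const u32 *id_in)
	{
		struct iommu_fwspec *fwspec;

		/* table-specific parsing: IORT today, VIOT later in the series */
		if (iort_iommu_configure_id(dev, id_in))
			return NULL;

		/* the parser recorded the ops via acpi_iommu_fwspec_init() */
		fwspec = dev_iommu_fwspec_get(dev);
		return fwspec ? fwspec->ops : NULL;
	}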


Reviewed-by: Robin Murphy 


Signed-off-by: Jean-Philippe Brucker 
---
  include/acpi/acpi_bus.h   |  3 ++
  include/linux/acpi_iort.h |  8 ++---
  drivers/acpi/arm64/iort.c | 74 +--
  drivers/acpi/scan.c   | 73 +-
  4 files changed, 86 insertions(+), 72 deletions(-)

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 3a82faac5767..41f092a269f6 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -588,6 +588,9 @@ struct acpi_pci_root {
  
  bool acpi_dma_supported(struct acpi_device *adev);

  enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
+int acpi_iommu_fwspec_init(struct device *dev, u32 id,
+  struct fwnode_handle *fwnode,
+  const struct iommu_ops *ops);
  int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset,
   u64 *size);
  int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index f7f054833afd..f1f0842a2cb2 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -35,8 +35,7 @@ void acpi_configure_pmsi_domain(struct device *dev);
  int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
  /* IOMMU interface */
  int iort_dma_get_ranges(struct device *dev, u64 *size);
-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in);
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head);
  phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
  #else
@@ -50,9 +49,8 @@ static inline void acpi_configure_pmsi_domain(struct device 
*dev) { }
  /* IOMMU interface */
  static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
  { return -ENODEV; }
-static inline const struct iommu_ops *iort_iommu_configure_id(
- struct device *dev, const u32 *id_in)
-{ return NULL; }
+static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
+{ return -ENODEV; }
  static inline
  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head)
  { return 0; }
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index a940be1cf2af..487d1095030d 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -806,23 +806,6 @@ static struct acpi_iort_node 
*iort_get_msi_resv_iommu(struct device *dev)
return NULL;
  }
  
-static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device *dev)

-{
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-
-   return (fwspec && fwspec->ops) ? fwspec->ops : NULL;
-}
-
-static inline int iort_add_device_replay(struct device *dev)
-{
-   int err = 0;
-
-   if (dev->bus && !device_iommu_mapped(dev))
-   err = iommu_probe_device(dev);
-
-   return err;
-}
-
  /**
   * iort_iommu_msi_get_resv_regions - Reserved region driver helper
   * @dev: Device from iommu_get_resv_regions()
@@ -900,18 +883,6 @@ static inline bool iort_iommu_driver_enabled(u8 type)
}
  }
  
-static int arm_smmu_iort_xlate(struct device *dev, u32 streamid,

-  struct fwnode_handle *fwnode,
-  const struct iommu_ops *ops)
-{
-   int ret = iommu_fwspec_init(dev, fwnode, ops);
-
-   if (!ret)
-   ret = iommu_fwspec_add_ids(dev, &streamid, 1);
-
-   return ret;
-}
-
  static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
  {
struct acpi_iort_root_complex *pci_rc;
@@ -946,7 +917,7 @@ static int iort_iommu_xlate(struct device *dev, struct 
acpi_iort_node *node,
return iort_iommu_driver_enabled(node->type) ?
   -EPROBE_DEFER : -ENODEV;
  
-	return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops);

+   return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops);
  }
  
  struct iort_pci_alias_info {

@@ -1020,24 +991,13 @@ static int iort_nc_iommu_map_id(struct device *dev,
   * @dev: device to configure
   * @id_in: optional input id const value pointer
   *
- * Returns: iommu_ops pointer on configuration success
- *  NULL on configuration failure
+ * Returns: 0 on success, <0 on failure
   */
-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in)
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
  {
struct acpi_iort_node 

Re: [PATCH v5 1/5] ACPI: arm64: Move DMA setup operations out of IORT

2021-06-18 Thread Robin Murphy

On 2021-06-18 16:20, Jean-Philippe Brucker wrote:

Extract generic DMA setup code out of IORT, so it can be reused by VIOT.
Keep it in drivers/acpi/arm64 for now, since it could break x86
platforms that haven't run this code so far, if they have invalid
tables.


Reviewed-by: Robin Murphy 


Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
  drivers/acpi/arm64/Makefile |  1 +
  include/linux/acpi.h|  3 +++
  include/linux/acpi_iort.h   |  6 ++---
  drivers/acpi/arm64/dma.c| 50 ++
  drivers/acpi/arm64/iort.c   | 54 ++---
  drivers/acpi/scan.c |  2 +-
  6 files changed, 66 insertions(+), 50 deletions(-)
  create mode 100644 drivers/acpi/arm64/dma.c

diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 6ff50f4ed947..66acbe77f46e 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -1,3 +1,4 @@
  # SPDX-License-Identifier: GPL-2.0-only
  obj-$(CONFIG_ACPI_IORT)   += iort.o
  obj-$(CONFIG_ACPI_GTDT)   += gtdt.o
+obj-y  += dma.o
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c60745f657e9..7aaa9559cc19 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -259,9 +259,12 @@ void acpi_numa_x2apic_affinity_init(struct 
acpi_srat_x2apic_cpu_affinity *pa);
  
  #ifdef CONFIG_ARM64

  void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa);
+void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size);
  #else
  static inline void
  acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { }
+static inline void
+acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) { }
  #endif
  
  int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma);

diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index 1a12baa58e40..f7f054833afd 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -34,7 +34,7 @@ struct irq_domain *iort_get_device_domain(struct device *dev, 
u32 id,
  void acpi_configure_pmsi_domain(struct device *dev);
  int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
  /* IOMMU interface */
-void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size);
+int iort_dma_get_ranges(struct device *dev, u64 *size);
  const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
const u32 *id_in);
  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head);
@@ -48,8 +48,8 @@ static inline struct irq_domain *iort_get_device_domain(
  { return NULL; }
  static inline void acpi_configure_pmsi_domain(struct device *dev) { }
  /* IOMMU interface */
-static inline void iort_dma_setup(struct device *dev, u64 *dma_addr,
- u64 *size) { }
+static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
+{ return -ENODEV; }
  static inline const struct iommu_ops *iort_iommu_configure_id(
  struct device *dev, const u32 *id_in)
  { return NULL; }
diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
new file mode 100644
index ..f16739ad3cc0
--- /dev/null
+++ b/drivers/acpi/arm64/dma.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+
+void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
+{
+   int ret;
+   u64 end, mask;
+   u64 dmaaddr = 0, size = 0, offset = 0;
+
+   /*
+* If @dev is expected to be DMA-capable then the bus code that created
+* it should have initialised its dma_mask pointer by this point. For
+* now, we'll continue the legacy behaviour of coercing it to the
+* coherent mask if not, but we'll no longer do so quietly.
+*/
+   if (!dev->dma_mask) {
+   dev_warn(dev, "DMA mask not set\n");
+   dev->dma_mask = &dev->coherent_dma_mask;
+   }
+
+   if (dev->coherent_dma_mask)
+   size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);
+   else
+   size = 1ULL << 32;
+
+   ret = acpi_dma_get_range(dev, &dmaaddr, &offset, &size);
+   if (ret == -ENODEV)
+   ret = iort_dma_get_ranges(dev, &size);
+   if (!ret) {
+   /*
+* Limit coherent and dma mask based on size retrieved from
+* firmware.
+*/
+   end = dmaaddr + size - 1;
+   mask = DMA_BIT_MASK(ilog2(end) + 1);
+   dev->bus_dma_limit = end;
+   dev->coherent_dma_mask = min(dev->coherent_dma_mask, mask);
+   *dev->dma_mask = min(*dev->dma_mask, mask);
+   }
+
+   *dma_addr = dmaaddr;
+   *dma_size = size;
+
+   ret = dma_direct_set_offset(dev, dmaaddr + offset, dmaaddr, size);
+
+   dev_dbg(dev, "dma_offset(%#08llx)%s\n", offset, ret ? " failed!" : "");
+}
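
To make the mask arithmetic above concrete, a standalone illustration (plain
userspace C with stand-ins for the kernel helpers; the window below is made up):

	#include <stdint.h>
	#include <stdio.h>

	#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

	static int ilog2_u64(uint64_t v)
	{
		int i = -1;

		while (v) {
			v >>= 1;
			i++;
		}
		return i;
	}

	int main(void)
	{
		/* e.g. firmware describes a 2 GiB window starting at 0 */
		uint64_t dmaaddr = 0, size = 0x80000000ULL;
		uint64_t end = dmaaddr + size - 1;                /* 0x7fffffff */
		uint64_t mask = DMA_BIT_MASK(ilog2_u64(end) + 1); /* 31-bit mask */

		printf("bus_dma_limit=%#llx mask=%#llx\n",
		       (unsigned long long)end, (unsigned long long)mask);
		return 0;
	}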

Re: [PATCH v5 4/5] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()

2021-06-18 Thread Robin Murphy

On 2021-06-18 16:20, Jean-Philippe Brucker wrote:

Passing a 64-bit address width to iommu_setup_dma_ops() is valid on
virtual platforms, but isn't currently possible. The overflow check in
iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass
a limit address instead of a size, so callers don't have to fake a size
to work around the check.

The base and limit parameters are being phased out, because:
* they are redundant for x86 callers. dma-iommu already reserves the
   first page, and the upper limit is already in domain->geometry.
* they can now be obtained from dev->dma_range_map on Arm.
But removing them on Arm isn't completely straightforward so is left for
future work. As an intermediate step, simplify the x86 callers by
passing dummy limits.
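
To illustrate the overflow mentioned above (standalone C, not kernel code;
the aperture value is made up):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t base = 0, aperture_start = 0x1000;

		/* size-based interface: a full 64-bit width means size = 1 << 64,
		 * which wraps to 0 in a u64, so "base + size <= aperture_start"
		 * spuriously rejects the domain */
		uint64_t size = 0;	/* (1 << 64) after wrapping */
		printf("size-based check rejects: %d\n",
		       base + size <= aperture_start);

		/* limit-based interface: the same range expressed as a last
		 * address, no overflow possible */
		uint64_t limit = UINT64_MAX;
		printf("limit-based check rejects: %d\n",
		       limit < aperture_start);
		return 0;
	}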


Reviewed-by: Robin Murphy 


Signed-off-by: Jean-Philippe Brucker 
---
  include/linux/dma-iommu.h   |  4 ++--
  arch/arm64/mm/dma-mapping.c |  2 +-
  drivers/iommu/amd/iommu.c   |  2 +-
  drivers/iommu/dma-iommu.c   | 12 ++--
  drivers/iommu/intel/iommu.c |  5 +
  5 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 6e75a2d689b4..758ca4694257 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -19,7 +19,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, 
dma_addr_t base);
  void iommu_put_dma_cookie(struct iommu_domain *domain);
  
  /* Setup call for arch DMA mapping code */

-void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
+void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
  
  /* The DMA API isn't _quite_ the whole story, though... */

  /*
@@ -50,7 +50,7 @@ struct msi_msg;
  struct device;
  
  static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base,

-   u64 size)
+  u64 dma_limit)
  {
  }
  
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c

index 4bf1dd3eb041..6719f9efea09 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 
size,
  
  	dev->dma_coherent = coherent;

if (iommu)
-   iommu_setup_dma_ops(dev, dma_base, size);
+   iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
  
  #ifdef CONFIG_XEN

if (xen_swiotlb_detect())
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3ac42bbdefc6..216323fb27ef 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1713,7 +1713,7 @@ static void amd_iommu_probe_finalize(struct device *dev)
/* Domains are initialized for this device - have a look what we ended 
up with */
domain = iommu_get_domain_for_dev(dev);
if (domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
else
set_dma_ops(dev, NULL);
  }
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..c62e19bed302 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -319,16 +319,16 @@ static bool dev_is_untrusted(struct device *dev)
   * iommu_dma_init_domain - Initialise a DMA mapping domain
   * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
   * @base: IOVA at which the mappable address space starts
- * @size: Size of IOVA space
+ * @limit: Last address of the IOVA space
   * @dev: Device the domain is being initialised for
   *
- * @base and @size should be exact multiples of IOMMU page granularity to
+ * @base and @limit + 1 should be exact multiples of IOMMU page granularity to
   * avoid rounding surprises. If necessary, we reserve the page at address 0
   * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but
   * any change which could make prior IOVAs invalid will fail.
   */
  static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
-   u64 size, struct device *dev)
+dma_addr_t limit, struct device *dev)
  {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
@@ -346,7 +346,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
/* Check the domain allows at least some access to the device... */
if (domain->geometry.force_aperture) {
if (base > domain->geometry.aperture_end ||
-   base + size <= domain->geometry.aperture_start) {
+   limit < domain->geometry.aperture_start) {
pr_warn("specified DMA range outside IOMMU 
capability\n");
return -EFAULT;
}
@@ -1308,7 +1308,7 @@ static const struct dma_map_ops iommu_dma_ops = {
   * The IOMMU core code allocates the default DMA domain, 

Re: [PATCH v5 3/5] ACPI: Add driver for the VIOT table

2021-06-18 Thread Rafael J. Wysocki
On Fri, Jun 18, 2021 at 5:33 PM Jean-Philippe Brucker
 wrote:
>
> The ACPI Virtual I/O Translation Table describes the topology of
> para-virtual platforms, similarly to vendor tables DMAR, IVRS and IORT.
> For now it describes the relation between virtio-iommu and the endpoints
> it manages.
>
> Three steps are needed to configure DMA of endpoints:
>
> (1) acpi_viot_init(): parse the VIOT table, find or create the fwnode
> associated to each vIOMMU device. This needs to happen after
> acpi_scan_init(), because it relies on the struct device and their
> fwnode to be available.
>
> (2) When probing the vIOMMU device, the driver registers its IOMMU ops
> within the IOMMU subsystem. This step doesn't require any
> intervention from the VIOT driver.
>
> (3) viot_iommu_configure(): before binding the endpoint to a driver,
> find the associated IOMMU ops. Register them, along with the
> endpoint ID, into the device's iommu_fwspec.
>
> If step (3) happens before step (2), it is deferred until the IOMMU is
> initialized, then retried.
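
A compressed sketch of that retry (the endpoint/viommu lookup helpers below
are invented for brevity; only the deferral logic is the point):

	int viot_iommu_configure(struct device *dev)
	{
		struct viot_iommu *viommu = viot_viommu_for(dev);	/* invented */
		u32 epid = viot_endpoint_id_for(dev);			/* invented */
		const struct iommu_ops *ops;

		if (!viommu)
			return -ENODEV;

		/* step (2) registers the ops against the fwnode; until the
		 * virtio-iommu driver has probed, this comes back empty */
		ops = iommu_ops_from_fwnode(viommu->fwnode);
		if (!ops)
			return -EPROBE_DEFER;	/* retried later */

		/* step (3): record ops + endpoint ID in the iommu_fwspec */
		return acpi_iommu_fwspec_init(dev, epid, viommu->fwnode, ops);
	}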
>
> Tested-by: Eric Auger 
> Reviewed-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 

From the general ACPI perspective

Acked-by: Rafael J. Wysocki 

and I'm assuming that it will be routed through a different tree.

> ---
>  drivers/acpi/Kconfig  |   3 +
>  drivers/iommu/Kconfig |   1 +
>  drivers/acpi/Makefile |   2 +
>  include/linux/acpi_viot.h |  19 ++
>  drivers/acpi/bus.c|   2 +
>  drivers/acpi/scan.c   |   3 +
>  drivers/acpi/viot.c   | 366 ++
>  MAINTAINERS   |   8 +
>  8 files changed, 404 insertions(+)
>  create mode 100644 include/linux/acpi_viot.h
>  create mode 100644 drivers/acpi/viot.c
>
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index eedec61e3476..3758c6940ed7 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -526,6 +526,9 @@ endif
>
>  source "drivers/acpi/pmic/Kconfig"
>
> +config ACPI_VIOT
> +   bool
> +
>  endif  # ACPI
>
>  config X86_PM_TIMER
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 1f111b399bca..aff8a4830dd1 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -403,6 +403,7 @@ config VIRTIO_IOMMU
> depends on ARM64
> select IOMMU_API
> select INTERVAL_TREE
> +   select ACPI_VIOT if ACPI
> help
>   Para-virtualised IOMMU driver with virtio.
>
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 700b41adf2db..a6e644c48987 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -118,3 +118,5 @@ video-objs  += acpi_video.o video_detect.o
>  obj-y  += dptf/
>
>  obj-$(CONFIG_ARM64)+= arm64/
> +
> +obj-$(CONFIG_ACPI_VIOT)+= viot.o
> diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
> new file mode 100644
> index ..1eb8ee5b0e5f
> --- /dev/null
> +++ b/include/linux/acpi_viot.h
> @@ -0,0 +1,19 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef __ACPI_VIOT_H__
> +#define __ACPI_VIOT_H__
> +
> +#include 
> +
> +#ifdef CONFIG_ACPI_VIOT
> +void __init acpi_viot_init(void);
> +int viot_iommu_configure(struct device *dev);
> +#else
> +static inline void acpi_viot_init(void) {}
> +static inline int viot_iommu_configure(struct device *dev)
> +{
> +   return -ENODEV;
> +}
> +#endif
> +
> +#endif /* __ACPI_VIOT_H__ */
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index a4bd673934c0..d6f4e2f06fdb 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -27,6 +27,7 @@
>  #include 
>  #endif
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1334,6 +1335,7 @@ static int __init acpi_init(void)
> acpi_wakeup_device_init();
> acpi_debugger_init();
> acpi_setup_sb_notify_handler();
> +   acpi_viot_init();
> return 0;
>  }
>
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 2a2e690040e9..3e2bb04ab528 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1556,6 +1557,8 @@ static const struct iommu_ops 
> *acpi_iommu_configure_id(struct device *dev,
> return ops;
>
> err = iort_iommu_configure_id(dev, id_in);
> +   if (err && err != -EPROBE_DEFER)
> +   err = viot_iommu_configure(dev);
>
> /*
>  * If we have reason to believe the IOMMU driver missed the initial
> diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
> new file mode 100644
> index ..d2256326c73a
> --- /dev/null
> +++ b/drivers/acpi/viot.c
> @@ -0,0 +1,366 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtual I/O topology
> + *
> + * The Virtual I/O Translation Table (VIOT) describes the topology of
> + * para-virtual IOMMUs and the endpoints they 

[PATCH] iommu/arm-smmu-v3: Add default domain quirk for Arm Fast Models

2021-06-18 Thread Robin Murphy
Arm Fast Models are still implementing legacy virtio-pci devices behind
the SMMU, which continue to be problematic as "real hardware" devices
(from the point of view of the simulated system) without the mitigating
VIRTIO_F_ACCESS_PLATFORM feature.

Since we now have the ability to force passthrough on a device-specific
basis, let's use it to work around this particular oddity so that people
who just want to boot Linux on a model don't have to fiddle around with
things to avoid the SMMU getting in the way by default (and I don't have
to keep telling them which things...)

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 54b2f27b81d4..13cf16e8f45b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2649,6 +2649,20 @@ static int arm_smmu_dev_disable_feature(struct device 
*dev,
}
 }
 
+static int arm_smmu_def_domain_type(struct device *dev)
+{
+   if (dev_is_pci(dev)) {
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   /* Legacy virtio-block devices on Arm Fast Models */
+   if (pdev->vendor == 0x1af4 && pdev->device == 0x1001 &&
+   pdev->subsystem_vendor == 0x00ff && pdev->subsystem_device == 0x0002)
+   return IOMMU_DOMAIN_IDENTITY;
+   }
+
+   return 0;
+}
+
 static struct iommu_ops arm_smmu_ops = {
.capable= arm_smmu_capable,
.domain_alloc   = arm_smmu_domain_alloc,
@@ -2673,6 +2687,7 @@ static struct iommu_ops arm_smmu_ops = {
.sva_bind   = arm_smmu_sva_bind,
.sva_unbind = arm_smmu_sva_unbind,
.sva_get_pasid  = arm_smmu_sva_get_pasid,
+   .def_domain_type= arm_smmu_def_domain_type,
.pgsize_bitmap  = -1UL, /* Restricted during device attach */
.owner  = THIS_MODULE,
 };
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 07/12] x86: Secure Launch SMP bringup support

2021-06-18 Thread Ross Philipson
On Intel, the APs are left in a well documented state after TXT performs
the late launch. Specifically they cannot have #INIT asserted on them so
a standard startup via INIT/SIPI/SIPI cannot be performed. Instead the
early SL stub code parked the APs in a pause/jmp loop waiting for an NMI.
The modified SMP boot code is called for the Secure Launch case. The
jump address for the RM piggy entry point is fixed up in the jump where
the APs are waiting and an NMI IPI is sent to the AP. The AP vectors to
the Secure Launch entry point in the RM piggy which mimics what the real
mode code would do, then jumps to the standard RM piggy protected mode
entry point.

Signed-off-by: Ross Philipson 
---
 arch/x86/include/asm/realmode.h  |  3 ++
 arch/x86/kernel/smpboot.c| 86 
 arch/x86/realmode/rm/header.S|  3 ++
 arch/x86/realmode/rm/trampoline_64.S | 37 
 4 files changed, 129 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 5db5d08..ef37bf1 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -37,6 +37,9 @@ struct real_mode_header {
 #ifdef CONFIG_X86_64
u32 machine_real_restart_seg;
 #endif
+#ifdef CONFIG_SECURE_LAUNCH
+   u32 sl_trampoline_start32;
+#endif
 };
 
 /* This must match data at realmode/rm/trampoline_{32,64}.S */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7770245..c324b04 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1023,6 +1024,83 @@ int common_cpu_up(unsigned int cpu, struct task_struct 
*idle)
return 0;
 }
 
+#ifdef CONFIG_SECURE_LAUNCH
+
+static atomic_t first_ap_only = {1};
+
+/*
+ * Called to fix the long jump address for the waiting APs to vector to
+ * the correct startup location in the Secure Launch stub in the rmpiggy.
+ */
+static int
+slaunch_fixup_jump_vector(void)
+{
+   struct sl_ap_wake_info *ap_wake_info;
+   u32 *ap_jmp_ptr = NULL;
+
+   if (!atomic_dec_and_test(&first_ap_only))
+   return 0;
+
+   ap_wake_info = slaunch_get_ap_wake_info();
+
+   ap_jmp_ptr = (u32 *)__va(ap_wake_info->ap_wake_block +
+ap_wake_info->ap_jmp_offset);
+
+   *ap_jmp_ptr = real_mode_header->sl_trampoline_start32;
+
+   pr_info("TXT AP long jump address updated\n");
+
+   return 0;
+}
+
+/*
+ * TXT AP startup is quite different than normal. The APs cannot have #INIT
+ * asserted on them or receive SIPIs. The early Secure Launch code has parked
+ * the APs in a pause loop waiting to receive an NMI. This will wake the APs
+ * and have them jump to the protected mode code in the rmpiggy where the rest
+ * of the SMP boot of the AP will proceed normally.
+ */
+static int
+slaunch_wakeup_cpu_from_txt(int cpu, int apicid)
+{
+   unsigned long send_status = 0, accept_status = 0;
+
+   /* Only done once */
+   if (slaunch_fixup_jump_vector())
+   return -1;
+
+   /* Send NMI IPI to idling AP and wake it up */
+   apic_icr_write(APIC_DM_NMI, apicid);
+
+   if (init_udelay == 0)
+   udelay(10);
+   else
+   udelay(300);
+
+   send_status = safe_apic_wait_icr_idle();
+
+   if (init_udelay == 0)
+   udelay(10);
+   else
+   udelay(300);
+
+   accept_status = (apic_read(APIC_ESR) & 0xEF);
+
+   if (send_status)
+   pr_err("Secure Launch IPI never delivered???\n");
+   if (accept_status)
+   pr_err("Secure Launch IPI delivery error (%lx)\n",
+   accept_status);
+
+   return (send_status | accept_status);
+}
+
+#else
+
+#define slaunch_wakeup_cpu_from_txt(cpu, apicid)   0
+
+#endif  /* !CONFIG_SECURE_LAUNCH */
+
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -1077,6 +1155,13 @@ static int do_boot_cpu(int apicid, int cpu, struct 
task_struct *idle,
cpumask_clear_cpu(cpu, cpu_initialized_mask);
smp_mb();
 
+   /* With Intel TXT, the AP startup is totally different */
+   if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) ==
+  (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) {
+   boot_error = slaunch_wakeup_cpu_from_txt(cpu, apicid);
+   goto txt_wake;
+   }
+
/*
 * Wake up a CPU in difference cases:
 * - Use the method in the APIC driver if it's defined
@@ -1089,6 +1174,7 @@ static int do_boot_cpu(int apicid, int cpu, struct 
task_struct *idle,
boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
 cpu0_nmi_registered);
 
+txt_wake:
if (!boot_error) {
/*
 * Wait 10s total for 

[PATCH v2 01/12] x86/boot: Place kernel_info at a fixed offset

2021-06-18 Thread Ross Philipson
From: Arvind Sankar 

There are use cases for storing the offset of a symbol in kernel_info.
For example, the trenchboot series [0] needs to store the offset of the
Measured Launch Environment header in kernel_info.

Since commit (note: commit ID from tip/master)

  527afc212231 ("x86/boot: Check that there are no run-time relocations")

run-time relocations are not allowed in the compressed kernel, so simply
using the symbol in kernel_info, as

.long   symbol

will cause a linker error because this is not position-independent.

With kernel_info being a separate object file and in a different section
from startup_32, there is no way to calculate the offset of a symbol
from the start of the image in a position-independent way.

To enable such use cases, put kernel_info into its own section which is
placed at a predetermined offset (KERNEL_INFO_OFFSET) via the linker
script. This will allow calculating the symbol offset in a
position-independent way, by adding the offset from the start of
kernel_info to KERNEL_INFO_OFFSET.
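
As a worked example of that arithmetic (offsets made up; only the x86_64
KERNEL_INFO_OFFSET value is taken from this patch):

	#include <stdio.h>

	#define KERNEL_INFO_OFFSET	0x500	/* x86_64 value from this patch */
	#define MLE_HEADER_DELTA	0x1000	/* pretend (mle_header - kernel_info) */

	int main(void)
	{
		/* both terms are link-time constants, so no run-time relocation
		 * is emitted; the result is the symbol's offset from the start
		 * of the image */
		printf("rva(mle_header) = %#x\n",
		       MLE_HEADER_DELTA + KERNEL_INFO_OFFSET);	/* 0x1500 */
		return 0;
	}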

Ensure that kernel_info is aligned, and use the SYM_DATA.* macros
instead of bare labels. This stores the size of the kernel_info
structure in the ELF symbol table.

Signed-off-by: Arvind Sankar 
Cc: Ross Philipson 
Signed-off-by: Ross Philipson 
---
 arch/x86/boot/compressed/kernel_info.S | 19 +++
 arch/x86/boot/compressed/kernel_info.h | 12 
 arch/x86/boot/compressed/vmlinux.lds.S |  6 ++
 3 files changed, 33 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/boot/compressed/kernel_info.h

diff --git a/arch/x86/boot/compressed/kernel_info.S 
b/arch/x86/boot/compressed/kernel_info.S
index f818ee8..c18f071 100644
--- a/arch/x86/boot/compressed/kernel_info.S
+++ b/arch/x86/boot/compressed/kernel_info.S
@@ -1,12 +1,23 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
+#include 
 #include 
+#include "kernel_info.h"
 
-   .section ".rodata.kernel_info", "a"
+/*
+ * If a field needs to hold the offset of a symbol from the start
+ * of the image, use the macro below, eg
+ * .long   rva(symbol)
+ * This will avoid creating run-time relocations, which are not
+ * allowed in the compressed kernel.
+ */
+
+#define rva(X) (((X) - kernel_info) + KERNEL_INFO_OFFSET)
 
-   .global kernel_info
+   .section ".rodata.kernel_info", "a"
 
-kernel_info:
+   .balign 16
+SYM_DATA_START(kernel_info)
/* Header, Linux top (structure). */
.ascii  "LToP"
/* Size. */
@@ -19,4 +30,4 @@ kernel_info:
 
 kernel_info_var_len_data:
/* Empty for time being... */
-kernel_info_end:
+SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
diff --git a/arch/x86/boot/compressed/kernel_info.h 
b/arch/x86/boot/compressed/kernel_info.h
new file mode 100644
index ..c127f84
--- /dev/null
+++ b/arch/x86/boot/compressed/kernel_info.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef BOOT_COMPRESSED_KERNEL_INFO_H
+#define BOOT_COMPRESSED_KERNEL_INFO_H
+
+#ifdef CONFIG_X86_64
+#define KERNEL_INFO_OFFSET 0x500
+#else /* 32-bit */
+#define KERNEL_INFO_OFFSET 0x100
+#endif
+
+#endif /* BOOT_COMPRESSED_KERNEL_INFO_H */
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
b/arch/x86/boot/compressed/vmlinux.lds.S
index 112b237..84c7b4d 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -7,6 +7,7 @@ OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT)
 
 #include 
 #include 
+#include "kernel_info.h"
 
 #ifdef CONFIG_X86_64
 OUTPUT_ARCH(i386:x86-64)
@@ -27,6 +28,11 @@ SECTIONS
HEAD_TEXT
_ehead = . ;
}
+   .rodata.kernel_info KERNEL_INFO_OFFSET : {
+   *(.rodata.kernel_info)
+   }
+   ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, "kernel_info at bad 
address!")
+
.rodata..compressed : {
*(.rodata..compressed)
}
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 02/12] x86: Secure Launch Kconfig

2021-06-18 Thread Ross Philipson
Initial bits to bring in Secure Launch functionality. Add Kconfig
options for compiling in/out the Secure Launch code.

Signed-off-by: Ross Philipson 
---
 arch/x86/Kconfig | 32 
 1 file changed, 32 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0045e1b..65d69f0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1989,6 +1989,38 @@ config EFI_MIXED
 
   If unsure, say N.
 
+config SECURE_LAUNCH
+   bool "Secure Launch support"
+   default n
+   depends on X86_64 && X86_X2APIC
+   help
+  The Secure Launch feature allows a kernel to be loaded
+  directly through an Intel TXT measured launch. Intel TXT
+  establishes a Dynamic Root of Trust for Measurement (DRTM)
+  where the CPU measures the kernel image. This feature then
+  continues the measurement chain over kernel configuration
+  information and init images.
+
+config SECURE_LAUNCH_ALT_PCR19
+   bool "Secure Launch Alternate PCR 19 usage"
+   default n
+   depends on SECURE_LAUNCH
+   help
+  In the post ACM environment, Secure Launch by default measures
+  configuration information into PCR18. This feature allows finer
+  control over measurements by moving configuration measurements
+  into PCR19.
+
+config SECURE_LAUNCH_ALT_PCR20
+   bool "Secure Launch Alternate PCR 20 usage"
+   default n
+   depends on SECURE_LAUNCH
+   help
+  In the post ACM environment, Secure Launch by default measures
+  image data like any external initrd into PCR17. This feature
+  allows finer control over measurements by moving image measurements
+  into PCR20.
+
 source "kernel/Kconfig.hz"
 
 config KEXEC
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 00/12] x86: Trenchboot secure dynamic launch Linux kernel support

2021-06-18 Thread Ross Philipson
The focus of the TrenchBoot project (https://github.com/TrenchBoot) is to
enhance boot security and integrity. This requires the Linux kernel to be
directly invoked by an x86 dynamic launch in order to establish a
Dynamic Root of Trust for Measurement (DRTM). The dynamic launch will
be initiated by a boot loader with associated support added to it, for
example the first targeted boot loader will be GRUB2. An integral part of
establishing the DRTM involves measuring everything that is intended to
be run (kernel image, initrd, etc) and everything that will configure
that kernel to run (command line, boot params, etc) into specific PCRs,
the DRTM PCRs (17-22), in the TPM. Another key aspect is that the dynamic
launch is rooted in hardware; that is to say, the hardware (CPU) is what
takes the first measurement in the chain of integrity measurements. On
Intel this is done using the GETSEC instruction provided by Intel's TXT
and the SKINIT instruction provided by AMD's AMD-V. Information on these
technologies can be readily found online. This patchset introduces Intel
TXT support.

To enable the kernel to be launched by GETSEC, a stub must be built
into the setup section of the compressed kernel to handle the specific
state that the dynamic launch process leaves the BSP in. Also this stub
must measure everything that is going to be used as early as possible.
This stub code and subsequent code must also deal with the specific
state that the dynamic launch leaves the APs in.

A quick note on terminology. The larger open source project itself is
called TrenchBoot, which is hosted on GitHub (links below). The kernel
feature enabling the use of the x86 technology is referred to as "Secure
Launch" within the kernel code. As such the prefixes sl_/SL_ or
slaunch/SLAUNCH will be seen in the code. The stub code discussed above
is referred to as the SL stub.

Note that patch 1 was authored by Arvind Sankar. We were not able to get
a status on this patch but Secure Launch depends on it so it is included
with the set.

The basic flow is:

 - Entry from the dynamic launch jumps to the SL stub
 - SL stub fixes up the world on the BSP
 - For TXT, SL stub wakes the APs, fixes up their worlds
 - For TXT, APs are left halted waiting for an NMI to wake them
 - SL stub jumps to startup_32
 - SL main locates the TPM event log and writes the measurements of
   configuration and module information into it.
 - Kernel boot proceeds normally from this point.
 - During early setup, slaunch_setup() runs to finish some validation
   and setup tasks.
 - The SMP bringup code is modified to wake the waiting APs. APs vector
   to rmpiggy and start up normally from that point.
 - SL platform module is registered as a late initcall module. It reads
   the TPM event log and extends the measurements taken into the TPM PCRs.
 - SL platform module initializes the securityfs interface to allow
   access to the TPM event log and TXT public registers.
 - Kernel boot finishes booting normally
 - SEXIT support to leave SMX mode is present on the kexec path and
   the various reboot paths (poweroff, reset, halt).

Links:

The TrenchBoot project, including documentation:

https://github.com/trenchboot

Intel TXT is documented in its own specification and in the SDM Instruction Set 
volume:

https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
https://software.intel.com/en-us/articles/intel-sdm

AMD SKINIT is documented in the System Programming manual:

https://www.amd.com/system/files/TechDocs/24593.pdf

GRUB2 pre-launch support patchset (WIP):

https://lists.gnu.org/archive/html/grub-devel/2020-05/msg00011.html

Thanks
Ross Philipson and Daniel P. Smith

Changes in v2:

 - Modified 32b entry code to prevent causing relocations in the compressed
   kernel.
 - Dropped patches for compressed kernel TPM PCR extender.
 - Modified event log code to insert log delimiter events and not rely
   on TPM access.
 - Stop extending PCRs in the early Secure Launch stub code.
 - Removed Kconfig options for hash algorithms and use the algorithms the
   ACM used.
 - Match Secure Launch measurement algorithm use to those reported in the
   TPM 2.0 event log.
 - Read the TPM events out of the TPM and extend them into the PCRs using
   the mainline TPM driver. This is done in the late initcall module.
 - Allow use of alternate PCR 19 and 20 for post ACM measurements.
 - Add Kconfig constraints needed by Secure Launch (disable KASLR
   and add x2apic dependency).
 - Fix testing of SL_FLAGS when determining if Secure Launch is active
   and the architecture is TXT.
 - Use SYM_DATA_START_LOCAL macros in early entry point code.
 - Security audit changes:
   - Validate buffers passed to MLE do not overlap the MLE and are
 properly laid out.
   - Validate buffers and memory regions used by the MLE are
 protected by IOMMU PMRs.
 - Force IOMMU to not use passthrough mode during a Secure Launch.
 - Prevent KASLR use during a 

[PATCH v2 05/12] x86: Secure Launch kernel early boot stub

2021-06-18 Thread Ross Philipson
The Secure Launch (SL) stub provides the entry point for Intel TXT (and
later AMD SKINIT) to vector to during the late launch. The symbol
sl_stub_entry is that entry point and its offset into the kernel is
conveyed to the launching code using the MLE (Measured Launch
Environment) header in the structure named mle_header. The offset of the
MLE header is set in the kernel_info. The routine sl_stub contains the
very early late launch setup code responsible for setting up the basic
environment to allow the normal kernel startup_32 code to proceed. It is
also responsible for properly waking and handling the APs on Intel
platforms. The routine sl_main, which runs after entering 64b mode, is
responsible for measuring configuration and module information before it
is used, such as the boot params, the kernel command line, the TXT heap,
an external initramfs, etc.
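
To make "measure before use" concrete, a minimal sketch for one of those
items, the command line (sha256_init/update/final are the early SHA helpers
added earlier in the series; sl_tpm2_log_event() is an invented stand-in for
the real event-log writer):

	static void sl_measure_cmdline(struct boot_params *bp)
	{
		const char *cmdline =
			(const char *)(unsigned long)bp->hdr.cmd_line_ptr;
		struct sha256_state sctx;
		u8 digest[SHA256_DIGEST_SIZE];

		sha256_init(&sctx);
		sha256_update(&sctx, (const u8 *)cmdline, strlen(cmdline));
		sha256_final(&sctx, digest);

		/* record in the DRTM TPM event log; the late initcall module
		 * extends it into the PCRs once the TPM driver is available */
		sl_tpm2_log_event(digest, sizeof(digest));
	}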

Signed-off-by: Ross Philipson 
---
 Documentation/x86/boot.rst |  13 +
 arch/x86/boot/compressed/Makefile  |   3 +-
 arch/x86/boot/compressed/head_64.S |  37 ++
 arch/x86/boot/compressed/kaslr.c   |  11 +
 arch/x86/boot/compressed/kernel_info.S |  33 ++
 arch/x86/boot/compressed/sl_main.c | 523 ++
 arch/x86/boot/compressed/sl_stub.S | 667 +
 arch/x86/kernel/asm-offsets.c  |  19 +
 8 files changed, 1305 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/boot/compressed/sl_main.c
 create mode 100644 arch/x86/boot/compressed/sl_stub.S

diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst
index fc84491..7623f60 100644
--- a/Documentation/x86/boot.rst
+++ b/Documentation/x86/boot.rst
@@ -1026,6 +1026,19 @@ Offset/size: 0x000c/4
 
   This field contains maximal allowed type for setup_data and setup_indirect 
structs.
 
+   =
+Field name:mle_header_offset
+Offset/size:   0x0010/4
+   =
+
+  This field contains the offset to the Secure Launch Measured Launch 
Environment
+  (MLE) header. This offset is used to locate information needed during a 
secure
+  late launch using Intel TXT. If the offset is zero, the kernel does not have
+  Secure Launch capabilities. The MLE entry point is called from TXT on the BSP
following a successful measured launch. The specific state of the processors is
+  outlined in the TXT Software Development Guide, the latest can be found here:
+  
https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
+
 
 The Image Checksum
 ==
diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 059d49a..1fe55a5 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -102,7 +102,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
 vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
 efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
 
-vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o 
$(obj)/early_sha256.o
+vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o 
$(obj)/early_sha256.o \
+   $(obj)/sl_main.o $(obj)/sl_stub.o
 
 $(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index a2347de..b35e072 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -498,6 +498,17 @@ trampoline_return:
pushq   $0
popfq
 
+#ifdef CONFIG_SECURE_LAUNCH
+   pushq   %rsi
+
+   /* Ensure the relocation region is covered by a PMR */
+   movq    %rbx, %rdi
+   movl    $(_bss - startup_32), %esi
+   callq   sl_check_region
+
+   popq%rsi
+#endif
+
 /*
  * Copy the compressed kernel to the end of our buffer
  * where decompression in place becomes safe.
@@ -556,6 +567,32 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
shrq$3, %rcx
rep stosq
 
+#ifdef CONFIG_SECURE_LAUNCH
+   /*
+* Have to do the final early sl stub work in 64b area.
+*
+* *** NOTE ***
+*
+* Several boot params get used before we get a chance to measure
+* them in this call. This is a known issue and we currently don't
+* have a solution. The scratch field doesn't matter. There is no
+* obvious way to do anything about the use of kernel_alignment or
+* init_size though these seem low risk with all the PMR and overlap
+* checks in place.
+*/
+   pushq   %rsi
+
+   movq    %rsi, %rdi
+   callq   sl_main
+
+   /* Ensure the decompression location is covered by a PMR */
+   movq    %rbp, %rdi
+   movq    output_len(%rip), %rsi
+   callq   sl_check_region
+
+   popq%rsi
+#endif
+
 /*
  * If running as an SEV guest, the encryption mask is required in the
  * page-table setup code below. When the 

[PATCH v2 10/12] x86: Secure Launch late initcall platform module

2021-06-18 Thread Ross Philipson
From: "Daniel P. Smith" 

The Secure Launch platform module is a late init module. During the
init call, the TPM event log is read and measurements taken in the
early boot stub code are located. These measurements are extended
into the TPM PCRs using the mainline TPM kernel driver.
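
The "extend into the TPM PCRs" step boils down to something like this
(a sketch; walking the event log is elided, and in practice every allocated
PCR bank is filled, not just SHA-256):

	static int sl_extend_event(struct tpm_chip *tpm, u32 pcr,
				   const u8 *sha256_digest)
	{
		struct tpm_digest *digests;
		int i, ret;

		digests = kcalloc(tpm->nr_allocated_banks, sizeof(*digests),
				  GFP_KERNEL);
		if (!digests)
			return -ENOMEM;

		for (i = 0; i < tpm->nr_allocated_banks; i++) {
			digests[i].alg_id = tpm->allocated_banks[i].alg_id;
			if (digests[i].alg_id == TPM_ALG_SHA256)
				memcpy(digests[i].digest, sha256_digest,
				       SHA256_DIGEST_SIZE);
		}

		/* mainline TPM driver interface */
		ret = tpm_pcr_extend(tpm, pcr, digests);
		kfree(digests);
		return ret;
	}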

The platform module also registers the securityfs nodes to allow
access to TXT register fields on Intel along with the fetching of
and writing events to the late launch TPM log.

Signed-off-by: Daniel P. Smith 
Signed-off-by: garnetgrimm 
Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/Makefile   |   1 +
 arch/x86/kernel/slmodule.c | 495 +
 2 files changed, 496 insertions(+)
 create mode 100644 arch/x86/kernel/slmodule.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 574e643..1187077 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -81,6 +81,7 @@ obj-$(CONFIG_IA32_EMULATION)  += tls.o
 obj-y  += step.o
 obj-$(CONFIG_INTEL_TXT)+= tboot.o
 obj-$(CONFIG_SECURE_LAUNCH)+= slaunch.o
+obj-$(CONFIG_SECURE_LAUNCH)+= slmodule.o
 obj-$(CONFIG_ISA_DMA_API)  += i8237.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
diff --git a/arch/x86/kernel/slmodule.c b/arch/x86/kernel/slmodule.c
new file mode 100644
index ..807f9ca
--- /dev/null
+++ b/arch/x86/kernel/slmodule.c
@@ -0,0 +1,495 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Secure Launch late validation/setup, securityfs exposure and
+ * finalization support.
+ *
+ * Copyright (c) 2021 Apertus Solutions, LLC
+ * Copyright (c) 2021 Assured Information Security, Inc.
+ * Copyright (c) 2021, Oracle and/or its affiliates.
+ *
+ * Author(s):
+ * Daniel P. Smith 
+ * Garnet T. Grimm 
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SL_FS_ENTRIES  10
+/* root directory node must be last */
+#define SL_ROOT_DIR_ENTRY  (SL_FS_ENTRIES - 1)
+#define SL_TXT_DIR_ENTRY   (SL_FS_ENTRIES - 2)
+#define SL_TXT_FILE_FIRST  (SL_TXT_DIR_ENTRY - 1)
+#define SL_TXT_ENTRY_COUNT 7
+
+#define DECLARE_TXT_PUB_READ_U(size, fmt, msg_size)\
+static ssize_t txt_pub_read_u##size(unsigned int offset,   \
+   loff_t *read_offset,\
+   size_t read_len,\
+   char __user *buf)   \
+{  \
+   void __iomem *txt;  \
+   char msg_buffer[msg_size];  \
+   u##size reg_value = 0;  \
+   txt = ioremap(TXT_PUB_CONFIG_REGS_BASE, \
+   TXT_NR_CONFIG_PAGES * PAGE_SIZE);   \
+   if (IS_ERR(txt))\
+   return PTR_ERR(txt);\
+   memcpy_fromio(&reg_value, txt + offset, sizeof(u##size));   \
+   iounmap(txt);   \
+   snprintf(msg_buffer, msg_size, fmt, reg_value); \
+   return simple_read_from_buffer(buf, read_len, read_offset,  \
+   &msg_buffer, msg_size); \
+}
+
+DECLARE_TXT_PUB_READ_U(8, "%#04x\n", 6);
+DECLARE_TXT_PUB_READ_U(32, "%#010x\n", 12);
+DECLARE_TXT_PUB_READ_U(64, "%#018llx\n", 20);
+
+#define DECLARE_TXT_FOPS(reg_name, reg_offset, reg_size)   \
+static ssize_t txt_##reg_name##_read(struct file *flip,
\
+   char __user *buf, size_t read_len, loff_t *read_offset) \
+{  \
+   return txt_pub_read_u##reg_size(reg_offset, read_offset,\
+   read_len, buf); \
+}  \
+static const struct file_operations reg_name##_ops = { \
+   .read = txt_##reg_name##_read,  \
+}
+
+DECLARE_TXT_FOPS(sts, TXT_CR_STS, 64);
+DECLARE_TXT_FOPS(ests, TXT_CR_ESTS, 8);
+DECLARE_TXT_FOPS(errorcode, TXT_CR_ERRORCODE, 32);
+DECLARE_TXT_FOPS(didvid, TXT_CR_DIDVID, 64);
+DECLARE_TXT_FOPS(e2sts, TXT_CR_E2STS, 64);
+DECLARE_TXT_FOPS(ver_emif, TXT_CR_VER_EMIF, 32);
+DECLARE_TXT_FOPS(scratchpad, TXT_CR_SCRATCHPAD, 64);
+
+/*
+ * Securityfs exposure
+ */
+struct memfile {
+   char *name;
+   void *addr;
+   size_t size;
+};
+
+static struct memfile sl_evtlog = {"eventlog", 0, 0};
+static void *txt_heap;
+static struct 

[PATCH v2 03/12] x86: Secure Launch main header file

2021-06-18 Thread Ross Philipson
Introduce the main Secure Launch header file used in the early SL stub
and the early setup code.

Signed-off-by: Ross Philipson 
---
 include/linux/slaunch.h | 540 
 1 file changed, 540 insertions(+)
 create mode 100644 include/linux/slaunch.h

diff --git a/include/linux/slaunch.h b/include/linux/slaunch.h
new file mode 100644
index ..dd8d92e
--- /dev/null
+++ b/include/linux/slaunch.h
@@ -0,0 +1,540 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Main Secure Launch header file.
+ *
+ * Copyright (c) 2021, Oracle and/or its affiliates.
+ */
+
+#ifndef _LINUX_SLAUNCH_H
+#define _LINUX_SLAUNCH_H
+
+/*
+ * Secure Launch Defined State Flags
+ */
+#define SL_FLAG_ACTIVE 0x0001
+#define SL_FLAG_ARCH_SKINIT0x0002
+#define SL_FLAG_ARCH_TXT   0x0004
+
+/*
+ * Secure Launch CPU Type
+ */
+#define SL_CPU_AMD 1
+#define SL_CPU_INTEL   2
+
+#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
+
+#define __SL32_CS  0x0008
+#define __SL32_DS  0x0010
+
+#define INTEL_CPUID_MFGID_EBX  0x756e6547 /* Genu */
+#define INTEL_CPUID_MFGID_EDX  0x49656e69 /* ineI */
+#define INTEL_CPUID_MFGID_ECX  0x6c65746e /* ntel */
+
+#define AMD_CPUID_MFGID_EBX0x68747541 /* Auth */
+#define AMD_CPUID_MFGID_EDX0x69746e65 /* enti */
+#define AMD_CPUID_MFGID_ECX0x444d4163 /* cAMD */
+
+/*
+ * Intel Safer Mode Extensions (SMX)
+ *
+ * Intel SMX provides a programming interface to establish a Measured Launch
+ * Environment (MLE). The measurement and protection mechanisms are supported by
+ * the capabilities of an Intel Trusted Execution Technology (TXT) platform. SMX is
+ * the processor’s programming interface in an Intel TXT platform.
+ *
+ * See Intel SDM Volume 2 - 6.1 "Safer Mode Extensions Reference"
+ */
+
+/*
+ * SMX GETSEC Leaf Functions
+ */
+#define SMX_X86_GETSEC_SEXIT   5
+#define SMX_X86_GETSEC_SMCTRL  7
+#define SMX_X86_GETSEC_WAKEUP  8
+
+/*
+ * Intel Trusted Execution Technology MMIO Registers Banks
+ */
+#define TXT_PUB_CONFIG_REGS_BASE   0xfed3
+#define TXT_PRIV_CONFIG_REGS_BASE  0xfed2
+#define TXT_NR_CONFIG_PAGES ((TXT_PUB_CONFIG_REGS_BASE - \
+ TXT_PRIV_CONFIG_REGS_BASE) >> PAGE_SHIFT)
+
+/*
+ * Intel Trusted Execution Technology (TXT) Registers
+ */
+#define TXT_CR_STS 0x
+#define TXT_CR_ESTS0x0008
+#define TXT_CR_ERRORCODE   0x0030
+#define TXT_CR_CMD_RESET   0x0038
+#define TXT_CR_CMD_CLOSE_PRIVATE   0x0048
+#define TXT_CR_DIDVID  0x0110
+#define TXT_CR_VER_EMIF0x0200
+#define TXT_CR_CMD_UNLOCK_MEM_CONFIG   0x0218
+#define TXT_CR_SINIT_BASE  0x0270
+#define TXT_CR_SINIT_SIZE  0x0278
+#define TXT_CR_MLE_JOIN0x0290
+#define TXT_CR_HEAP_BASE   0x0300
+#define TXT_CR_HEAP_SIZE   0x0308
+#define TXT_CR_SCRATCHPAD  0x0378
+#define TXT_CR_CMD_OPEN_LOCALITY1  0x0380
+#define TXT_CR_CMD_CLOSE_LOCALITY1 0x0388
+#define TXT_CR_CMD_OPEN_LOCALITY2  0x0390
+#define TXT_CR_CMD_CLOSE_LOCALITY2 0x0398
+#define TXT_CR_CMD_SECRETS 0x08e0
+#define TXT_CR_CMD_NO_SECRETS  0x08e8
+#define TXT_CR_E2STS   0x08f0
+
+/* TXT default register value */
+#define TXT_REGVALUE_ONE   0x1ULL
+
+/* TXTCR_STS status bits */
+#define TXT_SENTER_DONE_STS(1<<0)
+#define TXT_SEXIT_DONE_STS (1<<1)
+
+/*
+ * SINIT/MLE Capabilities Field Bit Definitions
+ */
+#define TXT_SINIT_MLE_CAP_WAKE_GETSEC  0
+#define TXT_SINIT_MLE_CAP_WAKE_MONITOR 1
+
+/*
+ * OS/MLE Secure Launch Specific Definitions
+ */
+#define TXT_OS_MLE_STRUCT_VERSION  1
+#define TXT_OS_MLE_MAX_VARIABLE_MTRRS  32
+
+/*
+ * TXT Heap Table Enumeration
+ */
+#define TXT_BIOS_DATA_TABLE1
+#define TXT_OS_MLE_DATA_TABLE  2
+#define TXT_OS_SINIT_DATA_TABLE3
+#define TXT_SINIT_MLE_DATA_TABLE   4
+#define TXT_SINIT_TABLE_MAX    TXT_SINIT_MLE_DATA_TABLE
+
+/*
+ * Secure Launch Defined Error Codes used in MLE-initiated TXT resets.
+ *
+ * TXT Specification
+ * Appendix I ACM Error Codes
+ */
+#define SL_ERROR_GENERIC   0xc0008001
+#define SL_ERROR_TPM_INIT  0xc0008002
+#define SL_ERROR_TPM_INVALID_LOG20 0xc0008003
+#define SL_ERROR_TPM_LOGGING_FAILED0xc0008004
+#define SL_ERROR_REGION_STRADDLE_4GB   0xc0008005
+#define SL_ERROR_TPM_EXTEND0xc0008006
+#define SL_ERROR_MTRR_INV_VCNT 0xc0008007
+#define SL_ERROR_MTRR_INV_DEF_TYPE 0xc0008008
+#define SL_ERROR_MTRR_INV_BASE 0xc0008009
+#define SL_ERROR_MTRR_INV_MASK 0xc000800a
+#define SL_ERROR_MSR_INV_MISC_EN   0xc000800b
+#define SL_ERROR_INV_AP_INTERRUPT  0xc000800c
+#define SL_ERROR_INTEGER_OVERFLOW  0xc000800d
+#define SL_ERROR_HEAP_WALK 0xc000800e
+#define SL_ERROR_HEAP_MAP  

[PATCH v2 04/12] x86: Add early SHA support for Secure Launch early measurements

2021-06-18 Thread Ross Philipson
From: "Daniel P. Smith" 

The SHA algorithms are necessary to measure configuration information into
the TPM as early as possible before using the values. This implementation
uses the established approach of #including the SHA libraries directly in
the code since the compressed kernel is not uncompressed at this point.

The SHA code here has its origins in the code from the main kernel, commit
c4d5b9ffa31f (crypto: sha1 - implement base layer for SHA-1). That code could
not be pulled directly into the setup portion of the compressed kernel because
of other dependencies it pulls in. The result is this is a modified copy of
that code that still leverages the core SHA algorithms.

Signed-off-by: Daniel P. Smith 
Signed-off-by: Ross Philipson 
---
 arch/x86/boot/compressed/Makefile   |   2 +
 arch/x86/boot/compressed/early_sha1.c   | 103 
 arch/x86/boot/compressed/early_sha1.h   |  17 ++
 arch/x86/boot/compressed/early_sha256.c |   7 +++
 lib/crypto/sha256.c |   8 +++
 lib/sha1.c  |   4 ++
 6 files changed, 141 insertions(+)
 create mode 100644 arch/x86/boot/compressed/early_sha1.c
 create mode 100644 arch/x86/boot/compressed/early_sha1.h
 create mode 100644 arch/x86/boot/compressed/early_sha256.c

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 431bf7f..059d49a 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -102,6 +102,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
 vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
 efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
 
+vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o 
$(obj)/early_sha256.o
+
 $(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE
$(call if_changed,ld)
 
diff --git a/arch/x86/boot/compressed/early_sha1.c 
b/arch/x86/boot/compressed/early_sha1.c
new file mode 100644
index ..74f4654
--- /dev/null
+++ b/arch/x86/boot/compressed/early_sha1.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Apertus Solutions, LLC.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "early_sha1.h"
+
+#define SHA1_DISABLE_EXPORT
+#include "../../../../lib/sha1.c"
+
+/* The SHA1 implementation in lib/sha1.c was written to get the workspace
+ * buffer as a parameter. This wrapper function provides a container
+ * around a temporary workspace that is cleared after the transform completes.
+ */
+static void __sha_transform(u32 *digest, const char *data)
+{
+   u32 ws[SHA1_WORKSPACE_WORDS];
+
+   sha1_transform(digest, data, ws);
+
+   memset(ws, 0, sizeof(ws));
+   /*
+* As this is cryptographic code, prevent the memset 0 from being
+* optimized out potentially leaving secrets in memory.
+*/
+   wmb();
+
+}
+
+void early_sha1_init(struct sha1_state *sctx)
+{
+   sha1_init(sctx->state);
+   sctx->count = 0;
+}
+
+void early_sha1_update(struct sha1_state *sctx,
+  const u8 *data,
+  unsigned int len)
+{
+   unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+
+   sctx->count += len;
+
+   if (likely((partial + len) >= SHA1_BLOCK_SIZE)) {
+   int blocks;
+
+   if (partial) {
+   int p = SHA1_BLOCK_SIZE - partial;
+
+   memcpy(sctx->buffer + partial, data, p);
+   data += p;
+   len -= p;
+
+   __sha_transform(sctx->state, sctx->buffer);
+   }
+
+   blocks = len / SHA1_BLOCK_SIZE;
+   len %= SHA1_BLOCK_SIZE;
+
+   if (blocks) {
+   while (blocks--) {
+   __sha_transform(sctx->state, data);
+   data += SHA1_BLOCK_SIZE;
+   }
+   }
+   partial = 0;
+   }
+
+   if (len)
+   memcpy(sctx->buffer + partial, data, len);
+}
+
+void early_sha1_final(struct sha1_state *sctx, u8 *out)
+{
+   const int bit_offset = SHA1_BLOCK_SIZE - sizeof(__be64);
+   __be64 *bits = (__be64 *)(sctx->buffer + bit_offset);
+   __be32 *digest = (__be32 *)out;
+   unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+   int i;
+
+   sctx->buffer[partial++] = 0x80;
+   if (partial > bit_offset) {
+   memset(sctx->buffer + partial, 0x0, SHA1_BLOCK_SIZE - partial);
+   partial = 0;
+
+   __sha_transform(sctx->state, sctx->buffer);
+   }
+
+   memset(sctx->buffer + partial, 0x0, bit_offset - partial);
+   *bits = cpu_to_be64(sctx->count << 3);
+   __sha_transform(sctx->state, sctx->buffer);
+
+   for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
+   put_unaligned_be32(sctx->state[i], digest++);
+
+   *sctx 
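
A minimal usage sketch of the three wrappers defined above (illustrative only,
not part of the patch; measure_config() and its arguments are invented names):

	static void measure_config(const u8 *buf, unsigned int len,
				   u8 digest[SHA1_DIGEST_SIZE])
	{
		struct sha1_state sctx;

		/* init -> update -> final, as the compressed-kernel stub would
		 * do to hash early configuration data before it is used */
		early_sha1_init(&sctx);
		early_sha1_update(&sctx, buf, len);
		early_sha1_final(&sctx, digest);
	}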

[PATCH v2 06/12] x86: Secure Launch kernel late boot stub

2021-06-18 Thread Ross Philipson
The routine slaunch_setup is called out of the x86 specific setup_arch
routine during early kernel boot. After determining what platform is
present, various operations specific to that platform occur. This
includes finalizing settings for the platform late launch and verifying
that memory protections are in place.

For TXT, this code also reserves the original compressed kernel setup
area where the APs were left looping so that this memory cannot be used.

Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/Makefile   |   1 +
 arch/x86/kernel/setup.c|   3 +
 arch/x86/kernel/slaunch.c  | 472 +
 drivers/iommu/intel/dmar.c |   4 +
 4 files changed, 480 insertions(+)
 create mode 100644 arch/x86/kernel/slaunch.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0f66682..574e643 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -80,6 +80,7 @@ obj-$(CONFIG_X86_32)  += tls.o
 obj-$(CONFIG_IA32_EMULATION)   += tls.o
 obj-y  += step.o
 obj-$(CONFIG_INTEL_TXT)+= tboot.o
+obj-$(CONFIG_SECURE_LAUNCH)+= slaunch.o
 obj-$(CONFIG_ISA_DMA_API)  += i8237.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 1e72062..7f5d12a 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -993,6 +994,8 @@ void __init setup_arch(char **cmdline_p)
early_gart_iommu_check();
 #endif
 
+   slaunch_setup();
+
/*
 * partially used pages are not usable - thus
 * we are rounding upwards:
diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c
new file mode 100644
index ..a24d384
--- /dev/null
+++ b/arch/x86/kernel/slaunch.c
@@ -0,0 +1,472 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Secure Launch late validation/setup and finalization support.
+ *
+ * Copyright (c) 2021, Oracle and/or its affiliates.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static u32 sl_flags;
+static struct sl_ap_wake_info ap_wake_info;
+static u64 evtlog_addr;
+static u32 evtlog_size;
+static u64 vtd_pmr_lo_size;
+
+/* This should be plenty of room */
+static u8 txt_dmar[PAGE_SIZE] __aligned(16);
+
+u32 slaunch_get_flags(void)
+{
+   return sl_flags;
+}
+EXPORT_SYMBOL(slaunch_get_flags);
+
+struct sl_ap_wake_info *slaunch_get_ap_wake_info(void)
+{
+   return &ap_wake_info;
+}
+
+struct acpi_table_header *slaunch_get_dmar_table(struct acpi_table_header 
*dmar)
+{
+   /* The DMAR is only stashed and provided via TXT on Intel systems */
+   if (memcmp(txt_dmar, "DMAR", 4))
+   return dmar;
+
+   return (struct acpi_table_header *)(&txt_dmar[0]);
+}
+
+void __noreturn slaunch_txt_reset(void __iomem *txt,
+ const char *msg, u64 error)
+{
+   u64 one = 1, val;
+
+   pr_err("%s", msg);
+
+   /*
+* This performs a TXT reset with a sticky error code. The reads of
+* TXT_CR_E2STS act as barriers.
+*/
+   memcpy_toio(txt + TXT_CR_ERRORCODE, &error, sizeof(error));
+   memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val));
+   memcpy_toio(txt + TXT_CR_CMD_NO_SECRETS, &one, sizeof(one));
+   memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val));
+   memcpy_toio(txt + TXT_CR_CMD_UNLOCK_MEM_CONFIG, &one, sizeof(one));
+   memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val));
+   memcpy_toio(txt + TXT_CR_CMD_RESET, &one, sizeof(one));
+
+   for ( ; ; )
+   asm volatile ("hlt");
+
+   unreachable();
+}
+
+/*
+ * The TXT heap is too big to map all at once with early_ioremap
+ * so it is done a table at a time.
+ */
+static void __init *txt_early_get_heap_table(void __iomem *txt, u32 type,
+u32 bytes)
+{
+   void *heap;
+   u64 base, size, offset = 0;
+   int i;
+
+   if (type > TXT_SINIT_TABLE_MAX)
+   slaunch_txt_reset(txt,
+   "Error invalid table type for early heap walk\n",
+   SL_ERROR_HEAP_WALK);
+
+   memcpy_fromio(&base, txt + TXT_CR_HEAP_BASE, sizeof(base));
+   memcpy_fromio(&size, txt + TXT_CR_HEAP_SIZE, sizeof(size));
+
+   /* Iterate over heap tables looking for table of "type" */
+   for (i = 0; i < type; i++) {
+   base += offset;
+   heap = early_memremap(base, sizeof(u64));
+   if (!heap)
+   slaunch_txt_reset(txt,
+   "Error early_memremap of heap for heap walk\n",
+   SL_ERROR_HEAP_MAP);
+
+   offset = *((u64 *)heap);
+
+   /*
+* After the 
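
The diffstat above also lists a small change to drivers/iommu/intel/dmar.c that
is not visible in this excerpt. A hedged sketch of what such a hook could look
like (illustrative only, not the actual hunk; it assumes dmar.c's dmar_tbl
pointer):

	/* Hypothetical call site after the firmware DMAR has been mapped:
	 * let Secure Launch substitute the copy it stashed from the TXT heap. */
	dmar_tbl = slaunch_get_dmar_table(dmar_tbl);

slaunch_get_dmar_table(), shown earlier in this patch, returns the caller's
pointer unchanged when no stashed "DMAR" signature is present, so on non-TXT
boots such a hook is a no-op.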

[PATCH v2 09/12] reboot: Secure Launch SEXIT support on reboot paths

2021-06-18 Thread Ross Philipson
If the MLE kernel is being powered off, rebooted or halted,
then SEXIT must be called. Note that the SEXIT GETSEC leaf
can only be called after a machine_shutdown() has been done on
these paths. The machine_shutdown() is not called on a few paths
like when poweroff action does not have a poweroff callback (into
ACPI code) or when an emergency reset is done. In these cases,
just the TXT registers are finalized but SEXIT is skipped.

Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/reboot.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index b29657b..796cbfb 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -731,6 +732,7 @@ static void native_machine_restart(char *__unused)
 
if (!reboot_force)
machine_shutdown();
+   slaunch_finalize(!reboot_force);
__machine_emergency_restart(0);
 }
 
@@ -741,6 +743,9 @@ static void native_machine_halt(void)
 
tboot_shutdown(TB_SHUTDOWN_HALT);
 
+   /* SEXIT done after machine_shutdown() to meet TXT requirements */
+   slaunch_finalize(1);
+
stop_this_cpu(NULL);
 }
 
@@ -749,8 +754,12 @@ static void native_machine_power_off(void)
if (pm_power_off) {
if (!reboot_force)
machine_shutdown();
+   slaunch_finalize(!reboot_force);
pm_power_off();
+   } else {
+   slaunch_finalize(0);
}
+
/* A fallback in case there is no PM info available */
tboot_shutdown(TB_SHUTDOWN_HALT);
 }
@@ -778,6 +787,7 @@ void machine_shutdown(void)
 
 void machine_emergency_restart(void)
 {
+   slaunch_finalize(0);
__machine_emergency_restart(1);
 }
 
-- 
1.8.3.1



[PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch

2021-06-18 Thread Ross Philipson
The IOMMU should always be set to default translated type after
the PMRs are disabled to protect the MLE from DMA.

Signed-off-by: Ross Philipson 
---
 drivers/iommu/intel/iommu.c | 5 +
 drivers/iommu/iommu.c   | 6 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284..4f0256d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device *dev)
  */
 static int device_def_domain_type(struct device *dev)
 {
+   /* Do not allow identity domain when Secure Launch is configured */
+   if (slaunch_get_flags() & SL_FLAG_ACTIVE)
+   return IOMMU_DOMAIN_DMA;
+
if (dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(dev);
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d..d49b7dd 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct kset *iommu_group_kset;
@@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
 {
if (cmd_line)
iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
-   iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
+
+   /* Do not allow identity domain when Secure Launch is configured */
+   if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
+   iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
 }
 
 void iommu_set_default_translated(bool cmd_line)
-- 
1.8.3.1



[PATCH v2 08/12] kexec: Secure Launch kexec SEXIT support

2021-06-18 Thread Ross Philipson
Prior to running the next kernel via kexec, the Secure Launch code
closes down private SMX resources and does an SEXIT. This allows the
next kernel to start normally without any issues starting the APs etc.

Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/slaunch.c | 71 +++
 kernel/kexec_core.c   |  4 +++
 2 files changed, 75 insertions(+)

diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c
index a24d384..8db557a 100644
--- a/arch/x86/kernel/slaunch.c
+++ b/arch/x86/kernel/slaunch.c
@@ -470,3 +470,74 @@ void __init slaunch_setup(void)
vendor[3] == INTEL_CPUID_MFGID_EDX)
slaunch_setup_intel();
 }
+
+static inline void smx_getsec_sexit(void)
+{
+   asm volatile (".byte 0x0f,0x37\n"
+ : : "a" (SMX_X86_GETSEC_SEXIT));
+}
+
+void slaunch_finalize(int do_sexit)
+{
+   void __iomem *config;
+   u64 one = TXT_REGVALUE_ONE, val;
+
+   if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) !=
+   (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT))
+   return;
+
+   config = ioremap(TXT_PRIV_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES *
+PAGE_SIZE);
+   if (!config) {
+   pr_emerg("Error SEXIT failed to ioremap TXT private reqs\n");
+   return;
+   }
+
+   /* Clear secrets bit for SEXIT */
+   memcpy_toio(config + TXT_CR_CMD_NO_SECRETS, &one, sizeof(one));
+   memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+   /* Unlock memory configurations */
+   memcpy_toio(config + TXT_CR_CMD_UNLOCK_MEM_CONFIG, &one, sizeof(one));
+   memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+   /* Close the TXT private register space */
+   memcpy_toio(config + TXT_CR_CMD_CLOSE_PRIVATE, &one, sizeof(one));
+   memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+   /*
+* Calls to iounmap are not being done because of the state of the
+* system this late in the kexec process. Local IRQs are disabled and
+* iounmap causes a TLB flush which in turn causes a warning. Leaving
+* these mappings is not an issue since the next kernel is going to
+* completely re-setup memory management.
+*/
+
+   /* Map public registers and do a final read fence */
+   config = ioremap(TXT_PUB_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES *
+PAGE_SIZE);
+   if (!config) {
+   pr_emerg("Error SEXIT failed to ioremap TXT public reqs\n");
+   return;
+   }
+
+   memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+   pr_emerg("TXT clear secrets bit and unlock memory complete.");
+
+   if (!do_sexit)
+   return;
+
+   if (smp_processor_id() != 0) {
+   pr_emerg("Error TXT SEXIT must be called on CPU 0\n");
+   return;
+   }
+
+   /* Disable SMX mode */
+   cr4_set_bits(X86_CR4_SMXE);
+
+   /* Do the SEXIT SMX operation */
+   smx_getsec_sexit();
+
+   pr_emerg("TXT SEXIT complete.");
+}
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index f099bae..1dcf20c 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1178,6 +1179,9 @@ int kernel_kexec(void)
cpu_hotplug_enable();
pr_notice("Starting new kernel\n");
machine_shutdown();
+
+   /* Finalize TXT registers and do SEXIT */
+   slaunch_finalize(1);
}
 
kmsg_dump(KMSG_DUMP_SHUTDOWN);
-- 
1.8.3.1



[PATCH v2 11/12] tpm: Allow locality 2 to be set when initializing the TPM for Secure Launch

2021-06-18 Thread Ross Philipson
The Secure Launch MLE environment uses PCRs that are only accessible from
the DRTM locality 2. By default the TPM drivers always initialize the
locality to 0. When a Secure Launch is in progress, initialize the
locality to 2.

Signed-off-by: Ross Philipson 
---
 drivers/char/tpm/tpm-chip.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index ddaeceb..48b9351 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "tpm.h"
 
 DEFINE_IDR(dev_nums_idr);
@@ -34,12 +35,20 @@
 
 static int tpm_request_locality(struct tpm_chip *chip)
 {
-   int rc;
+   int rc, locality;
 
if (!chip->ops->request_locality)
return 0;
 
-   rc = chip->ops->request_locality(chip, 0);
+   if (slaunch_get_flags() & SL_FLAG_ACTIVE) {
+   dev_dbg(&chip->dev, "setting TPM locality to 2 for MLE\n");
+   locality = 2;
+   } else {
+   dev_dbg(&chip->dev, "setting TPM locality to 0\n");
+   locality = 0;
+   }
+
+   rc = chip->ops->request_locality(chip, locality);
if (rc < 0)
return rc;
 
-- 
1.8.3.1



Re: Plan for /dev/ioasid RFC v2

2021-06-18 Thread Alex Williamson
On Fri, 18 Jun 2021 08:37:35 -0700
"Raj, Ashok"  wrote:

> On Fri, Jun 18, 2021 at 12:15:06PM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 18, 2021 at 03:47:51PM +0200, Joerg Roedel wrote:  
> > > Hi Kevin,
> > > 
> > > On Thu, Jun 17, 2021 at 07:31:03AM +, Tian, Kevin wrote:  
> > > > Now let's talk about the new IOMMU behavior:
> > > > 
> > > > -   A device is blocked from doing DMA to any resource outside of
> > > > its group when it's probed by the IOMMU driver. This could be a
> > > > special state w/o attaching to any domain, or a new special domain
> > > > type which differentiates it from existing domain types (identity, 
> > > > dma, or unmanged). Actually existing code already includes a
> > > > IOMMU_DOMAIN_BLOCKED type but nobody uses it.  
> > > 
> > > There is a reason for the default domain to exist: Devices which require
> > > RMRR mappings to be present. You can't just block all DMA from devices
> > > until a driver takes over, we put much effort into making sure there is
> > > not even a small window in time where RMRR regions (unity mapped regions
> > > on AMD) are not mapped.  
> > 
> > Yes, I think the DMA blocking can only start around/after a VFIO type
> > driver has probed() and bound to a device in the group, not much
> > different from today.  
> 
> Does this mean when a device has a required "RMRR" that requires a unity
> mapping we block assigning those devices to guests? I remember we had some
> restriction but there was a need to go around it at some point in time.
> 
> - Either we disallow assigning devices with RMRR
> - Break that unity map when the device is probed and after which any RMRR
>   access from device will fault.

We currently disallow assignment of RMRR encumbered devices except for
the known cases of USB and IGD.  In the general case, an RMRR imposes
a requirement on the host system to maintain ranges of identity mapping
that is incompatible with userspace ownership of the device and IOVA
address space.  AFAICT, nothing changes in the /dev/iommu model that
would make it safe to entrust userspace with RMRR encumbered devices.
Thanks,

Alex



Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems

2021-06-18 Thread Suman Anna via iommu
Hi Rob,

On 6/15/21 2:15 PM, Rob Herring wrote:
> If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
> same size as the list is redundant and can be dropped. Note that this is DT
> schema specific behavior and not standard json-schema behavior. The tooling
> will fixup the final schema adding any unspecified minItems/maxItems.
> 
> This condition is partially checked with the meta-schema already, but
> only if both 'minItems' and 'maxItems' are equal to the 'items' length.
> An improved meta-schema is pending.
> 
> Cc: Jens Axboe 
> Cc: Stephen Boyd 
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> Cc: Vinod Koul 
> Cc: Bartosz Golaszewski 
> Cc: Kamal Dasu 
> Cc: Jonathan Cameron 
> Cc: Lars-Peter Clausen 
> Cc: Thomas Gleixner 
> Cc: Marc Zyngier 
> Cc: Joerg Roedel 
> Cc: Jassi Brar 
> Cc: Mauro Carvalho Chehab 
> Cc: Krzysztof Kozlowski 
> Cc: Ulf Hansson 
> Cc: Jakub Kicinski 
> Cc: Wolfgang Grandegger 
> Cc: Marc Kleine-Budde 
> Cc: Andrew Lunn 
> Cc: Vivien Didelot 
> Cc: Vladimir Oltean 
> Cc: Bjorn Helgaas 
> Cc: Kishon Vijay Abraham I 
> Cc: Linus Walleij 
> Cc: "Uwe Kleine-König" 
> Cc: Lee Jones 
> Cc: Ohad Ben-Cohen 
> Cc: Mathieu Poirier 
> Cc: Philipp Zabel 
> Cc: Paul Walmsley 
> Cc: Palmer Dabbelt 
> Cc: Albert Ou 
> Cc: Alessandro Zummo 
> Cc: Alexandre Belloni 
> Cc: Greg Kroah-Hartman 
> Cc: Mark Brown 
> Cc: Zhang Rui 
> Cc: Daniel Lezcano 
> Cc: Wim Van Sebroeck 
> Cc: Guenter Roeck 
> Signed-off-by: Rob Herring 
> ---
>  .../devicetree/bindings/ata/nvidia,tegra-ahci.yaml  | 1 -
>  .../devicetree/bindings/clock/allwinner,sun4i-a10-ccu.yaml  | 2 --
>  .../devicetree/bindings/clock/qcom,gcc-apq8064.yaml | 1 -
>  Documentation/devicetree/bindings/clock/qcom,gcc-sdx55.yaml | 2 --
>  .../devicetree/bindings/clock/qcom,gcc-sm8350.yaml  | 2 --
>  .../devicetree/bindings/clock/sprd,sc9863a-clk.yaml | 1 -
>  .../devicetree/bindings/crypto/allwinner,sun8i-ce.yaml  | 2 --
>  Documentation/devicetree/bindings/crypto/fsl-dcp.yaml   | 1 -
>  .../display/allwinner,sun4i-a10-display-backend.yaml| 6 --
>  .../bindings/display/allwinner,sun6i-a31-mipi-dsi.yaml  | 1 -
>  .../bindings/display/allwinner,sun8i-a83t-dw-hdmi.yaml  | 4 
>  .../bindings/display/allwinner,sun8i-a83t-hdmi-phy.yaml | 2 --
>  .../bindings/display/allwinner,sun8i-r40-tcon-top.yaml  | 2 --
>  .../devicetree/bindings/display/bridge/cdns,mhdp8546.yaml   | 2 --
>  .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 2 --
>  Documentation/devicetree/bindings/display/st,stm32-dsi.yaml | 2 --
>  .../devicetree/bindings/display/st,stm32-ltdc.yaml  | 1 -
>  .../devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml | 4 
>  .../devicetree/bindings/dma/renesas,rcar-dmac.yaml  | 1 -
>  .../devicetree/bindings/edac/amazon,al-mc-edac.yaml | 2 --
>  Documentation/devicetree/bindings/eeprom/at24.yaml  | 1 -
>  Documentation/devicetree/bindings/example-schema.yaml   | 2 --
>  Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 1 -
>  Documentation/devicetree/bindings/gpu/vivante,gc.yaml   | 1 -
>  Documentation/devicetree/bindings/i2c/brcm,brcmstb-i2c.yaml | 1 -
>  .../devicetree/bindings/i2c/marvell,mv64xxx-i2c.yaml| 2 --
>  .../devicetree/bindings/i2c/mellanox,i2c-mlxbf.yaml | 1 -
>  .../devicetree/bindings/iio/adc/amlogic,meson-saradc.yaml   | 1 -
>  .../devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.yaml | 2 --
>  .../bindings/interrupt-controller/fsl,irqsteer.yaml | 1 -
>  .../bindings/interrupt-controller/loongson,liointc.yaml | 1 -
>  Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml| 1 -
>  .../devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml   | 1 -
>  .../devicetree/bindings/mailbox/st,stm32-ipcc.yaml  | 2 --
>  .../devicetree/bindings/media/amlogic,gx-vdec.yaml  | 1 -
>  Documentation/devicetree/bindings/media/i2c/adv7604.yaml| 1 -
>  .../devicetree/bindings/media/marvell,mmp2-ccic.yaml| 1 -
>  .../devicetree/bindings/media/qcom,sc7180-venus.yaml| 1 -
>  .../devicetree/bindings/media/qcom,sdm845-venus-v2.yaml | 1 -
>  .../devicetree/bindings/media/qcom,sm8250-venus.yaml| 1 -
>  Documentation/devicetree/bindings/media/renesas,drif.yaml   | 1 -
>  .../bindings/memory-controllers/mediatek,smi-common.yaml| 6 ++
>  .../bindings/memory-controllers/mediatek,smi-larb.yaml  | 1 -
>  .../devicetree/bindings/mmc/allwinner,sun4i-a10-mmc.yaml| 2 --
>  Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml| 1 -
>  Documentation/devicetree/bindings/mmc/mtk-sd.yaml   | 2 --
>  Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml | 2 --
>  Documentation/devicetree/bindings/mmc/sdhci-am654.yaml  | 1 -
>  Documentation/devicetree/bindings/mmc/sdhci-pxa.yaml| 1 -
>  .../devicetree/bindings/net/amlogic,meson-dwmac.yaml

Re: Plan for /dev/ioasid RFC v2

2021-06-18 Thread Raj, Ashok
On Fri, Jun 18, 2021 at 12:15:06PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 18, 2021 at 03:47:51PM +0200, Joerg Roedel wrote:
> > Hi Kevin,
> > 
> > On Thu, Jun 17, 2021 at 07:31:03AM +, Tian, Kevin wrote:
> > > Now let's talk about the new IOMMU behavior:
> > > 
> > > -   A device is blocked from doing DMA to any resource outside of
> > > its group when it's probed by the IOMMU driver. This could be a
> > > special state w/o attaching to any domain, or a new special domain
> > > type which differentiates it from existing domain types (identity, 
> > > dma, or unmanged). Actually existing code already includes a
> > > IOMMU_DOMAIN_BLOCKED type but nobody uses it.
> > 
> > There is a reason for the default domain to exist: Devices which require
> > RMRR mappings to be present. You can't just block all DMA from devices
> > until a driver takes over, we put much effort into making sure there is
> > not even a small window in time where RMRR regions (unity mapped regions
> > on AMD) are not mapped.
> 
> Yes, I think the DMA blocking can only start around/after a VFIO type
> driver has probed() and bound to a device in the group, not much
> different from today.

Does this mean when a device has a required "RMRR" that requires a unity
mapping we block assigning those devices to guests? I remember we had some
restriction but there was a need to go around it at some point in time.

- Either we disallow assigning devices with RMRR
- Break that unity map when the device is probed and after which any RMRR
  access from device will fault.

Cheers,
Ashok


[PATCH v5 5/5] iommu/virtio: Enable x86 support

2021-06-18 Thread Jean-Philippe Brucker
With the VIOT support in place, x86 platforms can now use the
virtio-iommu.

Because the other x86 IOMMU drivers aren't yet ready to use the
acpi_dma_setup() path, x86 doesn't implement arch_setup_dma_ops() at the
moment. Similarly to Vt-d and AMD IOMMU, clear the DMA ops and call
iommu_setup_dma_ops() from probe_finalize().

Acked-by: Joerg Roedel 
Acked-by: Michael S. Tsirkin 
Tested-by: Eric Auger 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/Kconfig|  3 ++-
 drivers/iommu/dma-iommu.c|  1 +
 drivers/iommu/virtio-iommu.c | 11 +++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index aff8a4830dd1..07b7c25cbed8 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -400,8 +400,9 @@ config HYPERV_IOMMU
 config VIRTIO_IOMMU
tristate "Virtio IOMMU driver"
depends on VIRTIO
-   depends on ARM64
+   depends on (ARM64 || X86)
select IOMMU_API
+   select IOMMU_DMA
select INTERVAL_TREE
select ACPI_VIOT if ACPI
help
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c62e19bed302..9dbbc95c8189 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1330,6 +1330,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 dma_limit)
 pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA 
ops\n",
 dev_name(dev));
 }
+EXPORT_SYMBOL_GPL(iommu_setup_dma_ops);
 
 static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
phys_addr_t msi_addr, struct iommu_domain *domain)
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index c6e5ee4d9cef..fe581f0c9b3a 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -904,6 +905,15 @@ static struct iommu_device *viommu_probe_device(struct 
device *dev)
return ERR_PTR(ret);
 }
 
+static void viommu_probe_finalize(struct device *dev)
+{
+#ifndef CONFIG_ARCH_HAS_SETUP_DMA_OPS
+   /* First clear the DMA ops in case we're switching from a DMA domain */
+   set_dma_ops(dev, NULL);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
+#endif
+}
+
 static void viommu_release_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
@@ -940,6 +950,7 @@ static struct iommu_ops viommu_ops = {
.iova_to_phys   = viommu_iova_to_phys,
.iotlb_sync = viommu_iotlb_sync,
.probe_device   = viommu_probe_device,
+   .probe_finalize = viommu_probe_finalize,
.release_device = viommu_release_device,
.device_group   = viommu_device_group,
.get_resv_regions   = viommu_get_resv_regions,
-- 
2.32.0



[PATCH v5 3/5] ACPI: Add driver for the VIOT table

2021-06-18 Thread Jean-Philippe Brucker
The ACPI Virtual I/O Translation Table describes topology of
para-virtual platforms, similarly to vendor tables DMAR, IVRS and IORT.
For now it describes the relation between virtio-iommu and the endpoints
it manages.

Three steps are needed to configure DMA of endpoints:

(1) acpi_viot_init(): parse the VIOT table, find or create the fwnode
associated to each vIOMMU device. This needs to happen after
acpi_scan_init(), because it relies on the struct device and their
fwnode to be available.

(2) When probing the vIOMMU device, the driver registers its IOMMU ops
within the IOMMU subsystem. This step doesn't require any
intervention from the VIOT driver.

(3) viot_iommu_configure(): before binding the endpoint to a driver,
find the associated IOMMU ops. Register them, along with the
endpoint ID, into the device's iommu_fwspec.

If step (3) happens before step (2), it is deferred until the IOMMU is
initialized, then retried.
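
A minimal sketch of the deferral in step (3) (illustrative only; apart from
viot_iommu_configure() and acpi_iommu_fwspec_init(), every name below is
invented and does not reflect the actual viot.c code):

	int viot_iommu_configure(struct device *dev)
	{
		/* hypothetical lookup against the state parsed in step (1) */
		struct viot_endpoint *ep = viot_find_endpoint(dev);

		if (!ep)
			return -ENODEV;

		/* step (2) has not run yet: no IOMMU ops registered, retry later */
		if (!ep->viommu_ops)
			return -EPROBE_DEFER;

		return acpi_iommu_fwspec_init(dev, ep->endpoint_id,
					      ep->viommu_fwnode, ep->viommu_ops);
	}

Only the control flow matters here: returning -EPROBE_DEFER is what allows the
endpoint's DMA configuration to be retried once the vIOMMU has registered its
ops.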

Tested-by: Eric Auger 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/Kconfig  |   3 +
 drivers/iommu/Kconfig |   1 +
 drivers/acpi/Makefile |   2 +
 include/linux/acpi_viot.h |  19 ++
 drivers/acpi/bus.c|   2 +
 drivers/acpi/scan.c   |   3 +
 drivers/acpi/viot.c   | 366 ++
 MAINTAINERS   |   8 +
 8 files changed, 404 insertions(+)
 create mode 100644 include/linux/acpi_viot.h
 create mode 100644 drivers/acpi/viot.c

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index eedec61e3476..3758c6940ed7 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -526,6 +526,9 @@ endif
 
 source "drivers/acpi/pmic/Kconfig"
 
+config ACPI_VIOT
+   bool
+
 endif  # ACPI
 
 config X86_PM_TIMER
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bca..aff8a4830dd1 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -403,6 +403,7 @@ config VIRTIO_IOMMU
depends on ARM64
select IOMMU_API
select INTERVAL_TREE
+   select ACPI_VIOT if ACPI
help
  Para-virtualised IOMMU driver with virtio.
 
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 700b41adf2db..a6e644c48987 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -118,3 +118,5 @@ video-objs  += acpi_video.o video_detect.o
 obj-y  += dptf/
 
 obj-$(CONFIG_ARM64)+= arm64/
+
+obj-$(CONFIG_ACPI_VIOT)+= viot.o
diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
new file mode 100644
index ..1eb8ee5b0e5f
--- /dev/null
+++ b/include/linux/acpi_viot.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __ACPI_VIOT_H__
+#define __ACPI_VIOT_H__
+
+#include 
+
+#ifdef CONFIG_ACPI_VIOT
+void __init acpi_viot_init(void);
+int viot_iommu_configure(struct device *dev);
+#else
+static inline void acpi_viot_init(void) {}
+static inline int viot_iommu_configure(struct device *dev)
+{
+   return -ENODEV;
+}
+#endif
+
+#endif /* __ACPI_VIOT_H__ */
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index a4bd673934c0..d6f4e2f06fdb 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -27,6 +27,7 @@
 #include 
 #endif
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1334,6 +1335,7 @@ static int __init acpi_init(void)
acpi_wakeup_device_init();
acpi_debugger_init();
acpi_setup_sb_notify_handler();
+   acpi_viot_init();
return 0;
 }
 
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 2a2e690040e9..3e2bb04ab528 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1556,6 +1557,8 @@ static const struct iommu_ops 
*acpi_iommu_configure_id(struct device *dev,
return ops;
 
err = iort_iommu_configure_id(dev, id_in);
+   if (err && err != -EPROBE_DEFER)
+   err = viot_iommu_configure(dev);
 
/*
 * If we have reason to believe the IOMMU driver missed the initial
diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
new file mode 100644
index ..d2256326c73a
--- /dev/null
+++ b/drivers/acpi/viot.c
@@ -0,0 +1,366 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Virtual I/O topology
+ *
+ * The Virtual I/O Translation Table (VIOT) describes the topology of
+ * para-virtual IOMMUs and the endpoints they manage. The OS uses it to
+ * initialize devices in the right order, preventing endpoints from issuing DMA
+ * before their IOMMU is ready.
+ *
+ * When binding a driver to a device, before calling the device driver's 
probe()
+ * method, the driver infrastructure calls dma_configure(). At that point the
+ * VIOT driver looks for an IOMMU associated to the device in the VIOT table.
+ * If an IOMMU exists and has been initialized, the VIOT driver initializes the
+ * device's IOMMU 

[PATCH v5 4/5] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()

2021-06-18 Thread Jean-Philippe Brucker
Passing a 64-bit address width to iommu_setup_dma_ops() is valid on
virtual platforms, but isn't currently possible. The overflow check in
iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass
a limit address instead of a size, so callers don't have to fake a size
to work around the check.

The base and limit parameters are being phased out, because:
* they are redundant for x86 callers. dma-iommu already reserves the
  first page, and the upper limit is already in domain->geometry.
* they can now be obtained from dev->dma_range_map on Arm.
But removing them on Arm isn't completely straightforward so is left for
future work. As an intermediate step, simplify the x86 callers by
passing dummy limits.

Signed-off-by: Jean-Philippe Brucker 
---
 include/linux/dma-iommu.h   |  4 ++--
 arch/arm64/mm/dma-mapping.c |  2 +-
 drivers/iommu/amd/iommu.c   |  2 +-
 drivers/iommu/dma-iommu.c   | 12 ++--
 drivers/iommu/intel/iommu.c |  5 +
 5 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 6e75a2d689b4..758ca4694257 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -19,7 +19,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, 
dma_addr_t base);
 void iommu_put_dma_cookie(struct iommu_domain *domain);
 
 /* Setup call for arch DMA mapping code */
-void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size);
+void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
 
 /* The DMA API isn't _quite_ the whole story, though... */
 /*
@@ -50,7 +50,7 @@ struct msi_msg;
 struct device;
 
 static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base,
-   u64 size)
+  u64 dma_limit)
 {
 }
 
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 4bf1dd3eb041..6719f9efea09 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 
size,
 
dev->dma_coherent = coherent;
if (iommu)
-   iommu_setup_dma_ops(dev, dma_base, size);
+   iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);
 
 #ifdef CONFIG_XEN
if (xen_swiotlb_detect())
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3ac42bbdefc6..216323fb27ef 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1713,7 +1713,7 @@ static void amd_iommu_probe_finalize(struct device *dev)
/* Domains are initialized for this device - have a look what we ended 
up with */
domain = iommu_get_domain_for_dev(dev);
if (domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
else
set_dma_ops(dev, NULL);
 }
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..c62e19bed302 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -319,16 +319,16 @@ static bool dev_is_untrusted(struct device *dev)
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
  * @base: IOVA at which the mappable address space starts
- * @size: Size of IOVA space
+ * @limit: Last address of the IOVA space
  * @dev: Device the domain is being initialised for
  *
- * @base and @size should be exact multiples of IOMMU page granularity to
+ * @base and @limit + 1 should be exact multiples of IOMMU page granularity to
  * avoid rounding surprises. If necessary, we reserve the page at address 0
  * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but
  * any change which could make prior IOVAs invalid will fail.
  */
 static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
-   u64 size, struct device *dev)
+dma_addr_t limit, struct device *dev)
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
@@ -346,7 +346,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
/* Check the domain allows at least some access to the device... */
if (domain->geometry.force_aperture) {
if (base > domain->geometry.aperture_end ||
-   base + size <= domain->geometry.aperture_start) {
+   limit < domain->geometry.aperture_start) {
pr_warn("specified DMA range outside IOMMU 
capability\n");
return -EFAULT;
}
@@ -1308,7 +1308,7 @@ static const struct dma_map_ops iommu_dma_ops = {
  * The IOMMU core code allocates the default DMA domain, which the underlying
  * IOMMU driver needs to support via the dma-iommu layer.
  */
-void iommu_setup_dma_ops(struct 

[PATCH v5 2/5] ACPI: Move IOMMU setup code out of IORT

2021-06-18 Thread Jean-Philippe Brucker
Extract the code that sets up the IOMMU infrastructure from IORT, since
it can be reused by VIOT. Move it one level up into a new
acpi_iommu_configure_id() function, which calls the IORT parsing
function which in turn calls the acpi_iommu_fwspec_init() helper.

Signed-off-by: Jean-Philippe Brucker 
---
 include/acpi/acpi_bus.h   |  3 ++
 include/linux/acpi_iort.h |  8 ++---
 drivers/acpi/arm64/iort.c | 74 +--
 drivers/acpi/scan.c   | 73 +-
 4 files changed, 86 insertions(+), 72 deletions(-)

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 3a82faac5767..41f092a269f6 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -588,6 +588,9 @@ struct acpi_pci_root {
 
 bool acpi_dma_supported(struct acpi_device *adev);
 enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev);
+int acpi_iommu_fwspec_init(struct device *dev, u32 id,
+  struct fwnode_handle *fwnode,
+  const struct iommu_ops *ops);
 int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset,
   u64 *size);
 int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr,
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index f7f054833afd..f1f0842a2cb2 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -35,8 +35,7 @@ void acpi_configure_pmsi_domain(struct device *dev);
 int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 /* IOMMU interface */
 int iort_dma_get_ranges(struct device *dev, u64 *size);
-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in);
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head);
 phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
 #else
@@ -50,9 +49,8 @@ static inline void acpi_configure_pmsi_domain(struct device 
*dev) { }
 /* IOMMU interface */
 static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
 { return -ENODEV; }
-static inline const struct iommu_ops *iort_iommu_configure_id(
- struct device *dev, const u32 *id_in)
-{ return NULL; }
+static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
+{ return -ENODEV; }
 static inline
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 { return 0; }
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index a940be1cf2af..487d1095030d 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -806,23 +806,6 @@ static struct acpi_iort_node 
*iort_get_msi_resv_iommu(struct device *dev)
return NULL;
 }
 
-static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device *dev)
-{
-   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
-
-   return (fwspec && fwspec->ops) ? fwspec->ops : NULL;
-}
-
-static inline int iort_add_device_replay(struct device *dev)
-{
-   int err = 0;
-
-   if (dev->bus && !device_iommu_mapped(dev))
-   err = iommu_probe_device(dev);
-
-   return err;
-}
-
 /**
  * iort_iommu_msi_get_resv_regions - Reserved region driver helper
  * @dev: Device from iommu_get_resv_regions()
@@ -900,18 +883,6 @@ static inline bool iort_iommu_driver_enabled(u8 type)
}
 }
 
-static int arm_smmu_iort_xlate(struct device *dev, u32 streamid,
-  struct fwnode_handle *fwnode,
-  const struct iommu_ops *ops)
-{
-   int ret = iommu_fwspec_init(dev, fwnode, ops);
-
-   if (!ret)
-   ret = iommu_fwspec_add_ids(dev, &streamid, 1);
-
-   return ret;
-}
-
 static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node)
 {
struct acpi_iort_root_complex *pci_rc;
@@ -946,7 +917,7 @@ static int iort_iommu_xlate(struct device *dev, struct 
acpi_iort_node *node,
return iort_iommu_driver_enabled(node->type) ?
   -EPROBE_DEFER : -ENODEV;
 
-   return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops);
+   return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops);
 }
 
 struct iort_pci_alias_info {
@@ -1020,24 +991,13 @@ static int iort_nc_iommu_map_id(struct device *dev,
  * @dev: device to configure
  * @id_in: optional input id const value pointer
  *
- * Returns: iommu_ops pointer on configuration success
- *  NULL on configuration failure
+ * Returns: 0 on success, <0 on failure
  */
-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in)
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
 {
struct acpi_iort_node *node;
-   const struct iommu_ops *ops;
int err = -ENODEV;
 
-   /*
-* If we already translated the 

[PATCH v5 1/5] ACPI: arm64: Move DMA setup operations out of IORT

2021-06-18 Thread Jean-Philippe Brucker
Extract generic DMA setup code out of IORT, so it can be reused by VIOT.
Keep it in drivers/acpi/arm64 for now, since it could break x86
platforms that haven't run this code so far, if they have invalid
tables.

Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/acpi/arm64/Makefile |  1 +
 include/linux/acpi.h|  3 +++
 include/linux/acpi_iort.h   |  6 ++---
 drivers/acpi/arm64/dma.c| 50 ++
 drivers/acpi/arm64/iort.c   | 54 ++---
 drivers/acpi/scan.c |  2 +-
 6 files changed, 66 insertions(+), 50 deletions(-)
 create mode 100644 drivers/acpi/arm64/dma.c

diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 6ff50f4ed947..66acbe77f46e 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_ACPI_IORT)+= iort.o
 obj-$(CONFIG_ACPI_GTDT)+= gtdt.o
+obj-y  += dma.o
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index c60745f657e9..7aaa9559cc19 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -259,9 +259,12 @@ void acpi_numa_x2apic_affinity_init(struct 
acpi_srat_x2apic_cpu_affinity *pa);
 
 #ifdef CONFIG_ARM64
 void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa);
+void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size);
 #else
 static inline void
 acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { }
+static inline void
+acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) { }
 #endif
 
 int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma);
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index 1a12baa58e40..f7f054833afd 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -34,7 +34,7 @@ struct irq_domain *iort_get_device_domain(struct device *dev, 
u32 id,
 void acpi_configure_pmsi_domain(struct device *dev);
 int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id);
 /* IOMMU interface */
-void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size);
+int iort_dma_get_ranges(struct device *dev, u64 *size);
 const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
const u32 *id_in);
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head 
*head);
@@ -48,8 +48,8 @@ static inline struct irq_domain *iort_get_device_domain(
 { return NULL; }
 static inline void acpi_configure_pmsi_domain(struct device *dev) { }
 /* IOMMU interface */
-static inline void iort_dma_setup(struct device *dev, u64 *dma_addr,
- u64 *size) { }
+static inline int iort_dma_get_ranges(struct device *dev, u64 *size)
+{ return -ENODEV; }
 static inline const struct iommu_ops *iort_iommu_configure_id(
  struct device *dev, const u32 *id_in)
 { return NULL; }
diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c
new file mode 100644
index ..f16739ad3cc0
--- /dev/null
+++ b/drivers/acpi/arm64/dma.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include 
+#include 
+#include 
+#include 
+
+void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size)
+{
+   int ret;
+   u64 end, mask;
+   u64 dmaaddr = 0, size = 0, offset = 0;
+
+   /*
+* If @dev is expected to be DMA-capable then the bus code that created
+* it should have initialised its dma_mask pointer by this point. For
+* now, we'll continue the legacy behaviour of coercing it to the
+* coherent mask if not, but we'll no longer do so quietly.
+*/
+   if (!dev->dma_mask) {
+   dev_warn(dev, "DMA mask not set\n");
+   dev->dma_mask = &dev->coherent_dma_mask;
+   }
+
+   if (dev->coherent_dma_mask)
+   size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1);
+   else
+   size = 1ULL << 32;
+
+   ret = acpi_dma_get_range(dev, &dmaaddr, &offset, &size);
+   if (ret == -ENODEV)
+   ret = iort_dma_get_ranges(dev, &size);
+   if (!ret) {
+   /*
+* Limit coherent and dma mask based on size retrieved from
+* firmware.
+*/
+   end = dmaaddr + size - 1;
+   mask = DMA_BIT_MASK(ilog2(end) + 1);
+   dev->bus_dma_limit = end;
+   dev->coherent_dma_mask = min(dev->coherent_dma_mask, mask);
+   *dev->dma_mask = min(*dev->dma_mask, mask);
+   }
+
+   *dma_addr = dmaaddr;
+   *dma_size = size;
+
+   ret = dma_direct_set_offset(dev, dmaaddr + offset, dmaaddr, size);
+
+   dev_dbg(dev, "dma_offset(%#08llx)%s\n", offset, ret ? " failed!" : "");
+}
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 3912a1f6058e..a940be1cf2af 100644
--- 

[PATCH v5 0/5] Add support for ACPI VIOT

2021-06-18 Thread Jean-Philippe Brucker
Add a driver for the ACPI VIOT table, which provides topology
information for para-virtual IOMMUs. Enable virtio-iommu on
non-devicetree platforms, including x86.

Since v4 [1]:
* Fixes (comments, wrong argument, unused variable)
* Removed patch 5 that wrongly moved set_dma_ops(dev, NULL) into dma-iommu.
  The simplification of limit parameters for x86 callers is now in patch 4.
* Release ACPI table after parsing
* Added review and tested tags, thanks for all the feedback!

You can find a QEMU implementation at [2], with extra support for
testing all VIOT nodes including MMIO-based endpoints and IOMMU.
This series is at [3].

[1] 
https://lore.kernel.org/linux-iommu/20210610075130.67517-1-jean-phili...@linaro.org/
[2] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/acpi
[3] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/acpi

Jean-Philippe Brucker (5):
  ACPI: arm64: Move DMA setup operations out of IORT
  ACPI: Move IOMMU setup code out of IORT
  ACPI: Add driver for the VIOT table
  iommu/dma: Pass address limit rather than size to
iommu_setup_dma_ops()
  iommu/virtio: Enable x86 support

 drivers/acpi/Kconfig |   3 +
 drivers/iommu/Kconfig|   4 +-
 drivers/acpi/Makefile|   2 +
 drivers/acpi/arm64/Makefile  |   1 +
 include/acpi/acpi_bus.h  |   3 +
 include/linux/acpi.h |   3 +
 include/linux/acpi_iort.h|  14 +-
 include/linux/acpi_viot.h|  19 ++
 include/linux/dma-iommu.h|   4 +-
 arch/arm64/mm/dma-mapping.c  |   2 +-
 drivers/acpi/arm64/dma.c |  50 +
 drivers/acpi/arm64/iort.c| 128 ++--
 drivers/acpi/bus.c   |   2 +
 drivers/acpi/scan.c  |  78 +++-
 drivers/acpi/viot.c  | 366 +++
 drivers/iommu/amd/iommu.c|   2 +-
 drivers/iommu/dma-iommu.c|  13 +-
 drivers/iommu/intel/iommu.c  |   5 +-
 drivers/iommu/virtio-iommu.c |  11 ++
 MAINTAINERS  |   8 +
 20 files changed, 581 insertions(+), 137 deletions(-)
 create mode 100644 include/linux/acpi_viot.h
 create mode 100644 drivers/acpi/arm64/dma.c
 create mode 100644 drivers/acpi/viot.c

-- 
2.32.0



Re: Plan for /dev/ioasid RFC v2

2021-06-18 Thread Jason Gunthorpe
On Fri, Jun 18, 2021 at 03:47:51PM +0200, Joerg Roedel wrote:
> Hi Kevin,
> 
> On Thu, Jun 17, 2021 at 07:31:03AM +, Tian, Kevin wrote:
> > Now let's talk about the new IOMMU behavior:
> > 
> > -   A device is blocked from doing DMA to any resource outside of
> > its group when it's probed by the IOMMU driver. This could be a
> > special state w/o attaching to any domain, or a new special domain
> > type which differentiates it from existing domain types (identity, 
> > dma, or unmanged). Actually existing code already includes a
> > IOMMU_DOMAIN_BLOCKED type but nobody uses it.
> 
> There is a reason for the default domain to exist: Devices which require
> RMRR mappings to be present. You can't just block all DMA from devices
> until a driver takes over, we put much effort into making sure there is
> not even a small window in time where RMRR regions (unity mapped regions
> on AMD) are not mapped.

Yes, I think the DMA blocking can only start around/after a VFIO type
driver has probed() and bound to a device in the group, not much
different from today.

Jason


Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions

2021-06-18 Thread Christoph Hellwig
On Fri, Jun 18, 2021 at 09:09:17AM -0500, Tom Lendacky wrote:
> > swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem
> > and memblock_alloc[1] will do memset in memblock_alloc_try_nid[2], so
> > swiotlb_init_with_tbl is also good.
> > I'm happy to add the memset in swiotlb_init_io_tlb_mem if you think
> > it's clearer and safer.
> 
> On x86, if the memset is done before set_memory_decrypted() and memory
> encryption is active, then the memory will look like ciphertext afterwards
> and not be zeroes. If zeroed memory is required, then a memset must be
> done after the set_memory_decrypted() calls.

Which should be fine - we don't care that the memory is cleared to 0,
just that it doesn't leak other data.  Maybe a comment would be useful,
though,
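
A short sketch of the ordering being discussed, using the names from the quoted
patch (illustrative only):

	/* Mark the bounce buffer as decrypted/shared first... */
	set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
	/* ...then clear it, so the zeroing goes through the decrypted mapping
	 * and does not read back as ciphertext. Per the reply above, zeroes
	 * are not strictly required here, only that no prior data leaks. */
	memset(tlb, 0, bytes);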


Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions

2021-06-18 Thread Tom Lendacky via iommu
On 6/18/21 1:25 AM, Claire Chang wrote:
> On Fri, Jun 18, 2021 at 7:30 AM Stefano Stabellini
>  wrote:
>>
>> On Thu, 17 Jun 2021, Claire Chang wrote:
>>> Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
>>> initialization to make the code reusable.
>>>
>>> Signed-off-by: Claire Chang 
>>> Reviewed-by: Christoph Hellwig 
>>> Tested-by: Stefano Stabellini 
>>> Tested-by: Will Deacon 
>>> ---
>>>  kernel/dma/swiotlb.c | 50 ++--
>>>  1 file changed, 25 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>>> index 52e2ac526757..47bb2a766798 100644
>>> --- a/kernel/dma/swiotlb.c
>>> +++ b/kernel/dma/swiotlb.c
>>> @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
>>>   memset(vaddr, 0, bytes);
>>>  }
>>>
>>> -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
>>> verbose)
>>> +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t 
>>> start,
>>> + unsigned long nslabs, bool late_alloc)
>>>  {
>>> + void *vaddr = phys_to_virt(start);
>>>   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>>> +
>>> + mem->nslabs = nslabs;
>>> + mem->start = start;
>>> + mem->end = mem->start + bytes;
>>> + mem->index = 0;
>>> + mem->late_alloc = late_alloc;
>>> + spin_lock_init(&mem->lock);
>>> + for (i = 0; i < mem->nslabs; i++) {
>>> + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
>>> + mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>>> + mem->slots[i].alloc_size = 0;
>>> + }
>>> + memset(vaddr, 0, bytes);
>>> +}
>>> +
>>> +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
>>> verbose)
>>> +{
>>>   struct io_tlb_mem *mem;
>>>   size_t alloc_size;
>>>
>>> @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned 
>>> long nslabs, int verbose)
>>>   if (!mem)
>>>   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
>>> __func__, alloc_size, PAGE_SIZE);
>>> - mem->nslabs = nslabs;
>>> - mem->start = __pa(tlb);
>>> - mem->end = mem->start + bytes;
>>> - mem->index = 0;
>>> - spin_lock_init(&mem->lock);
>>> - for (i = 0; i < mem->nslabs; i++) {
>>> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
>>> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>>> - mem->slots[i].alloc_size = 0;
>>> - }
>>> +
>>> + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
>>>
>>>   io_tlb_default_mem = mem;
>>>   if (verbose)
>>> @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
>>>  int
>>>  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>>>  {
>>> - unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>>>   struct io_tlb_mem *mem;
>>> + unsigned long bytes = nslabs << IO_TLB_SHIFT;
>>>
>>>   if (swiotlb_force == SWIOTLB_NO_FORCE)
>>>   return 0;
>>> @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long 
>>> nslabs)
>>>   if (!mem)
>>>   return -ENOMEM;
>>>
>>> - mem->nslabs = nslabs;
>>> - mem->start = virt_to_phys(tlb);
>>> - mem->end = mem->start + bytes;
>>> - mem->index = 0;
>>> - mem->late_alloc = 1;
>>> - spin_lock_init(&mem->lock);
>>> - for (i = 0; i < mem->nslabs; i++) {
>>> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
>>> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>>> - mem->slots[i].alloc_size = 0;
>>> - }
>>> -
>>> + memset(mem, 0, sizeof(*mem));
>>> + swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
>>>   set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
>>> - memset(tlb, 0, bytes);
>>
>> This is good for swiotlb_late_init_with_tbl. However I have just noticed
>> that mem could also be allocated from swiotlb_init_with_tbl, in which
>> case the zeroing is missing. I think we need another memset in
>> swiotlb_init_with_tbl as well. Or maybe it could be better to have a
>> single memset at the beginning of swiotlb_init_io_tlb_mem instead. Up to
>> you.
> 
> swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem
> and memblock_alloc[1] will do memset in memblock_alloc_try_nid[2], so
> swiotlb_init_with_tbl is also good.
> I'm happy to add the memset in swiotlb_init_io_tlb_mem if you think
> it's clearer and safer.

On x86, if the memset is done before set_memory_decrypted() and memory
encryption is active, then the memory will look like ciphertext afterwards
and not be zeroes. If zeroed memory is required, then a memset must be
done after the set_memory_decrypted() calls.
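
For the late-init path above that would mean ordering the calls roughly
like this (sketch only, not the exact patch):

	set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
	/* swiotlb_init_io_tlb_mem() zeroes the buffer, now through the
	 * decrypted mapping, so the contents end up as real zeroes
	 */
	swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);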

Thanks,
Tom

> 
> [1] 
> 

Re: Plan for /dev/ioasid RFC v2

2021-06-18 Thread Joerg Roedel
Hi Kevin,

On Thu, Jun 17, 2021 at 07:31:03AM +, Tian, Kevin wrote:
> Now let's talk about the new IOMMU behavior:
> 
> -   A device is blocked from doing DMA to any resource outside of
> its group when it's probed by the IOMMU driver. This could be a
> special state w/o attaching to any domain, or a new special domain
> type which differentiates it from existing domain types (identity, 
> dma, or unmanaged). Actually existing code already includes a
> IOMMU_DOMAIN_BLOCKED type but nobody uses it.

There is a reason for the default domain to exist: Devices which require
RMRR mappings to be present. You can't just block all DMA from devices
until a driver takes over, we put much effort into making sure there is
not even a small window in time where RMRR regions (unity mapped regions
on AMD) are not mapped.

And if a device has no RMRR regions defined, then the default domain
will be identical to a blocking domain. Device driver bugs don't count
here, as they can be fixed. The kernel trusts itself, so we can rely on
drivers unmapping all of their DMA buffers. Maybe that should be checked
by dma-debug to find violations there.
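
Roughly the kind of check that could live there (sketch; the helper name
is hypothetical, but dma-debug already tracks mappings per device):

	/* on driver unbind, complain if the device still owns live mappings */
	if (dma_debug_dev_has_mappings(dev))
		dev_warn(dev, "driver unbound with DMA mappings still active\n");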

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH][next] iommu/vt-d: Fix dereference of pointer info before it is null checked

2021-06-18 Thread Joerg Roedel
On Fri, Jun 11, 2021 at 02:50:24PM +0100, Colin King wrote:
>  drivers/iommu/intel/iommu.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-18 Thread Lu Baolu

On 2021/6/18 19:34, John Garry wrote:

We only ever now set strict mode enabled in iommu_set_dma_strict(), so
just remove the argument.

Signed-off-by: John Garry 
Reviewed-by: Robin Murphy 
---
  drivers/iommu/amd/init.c| 2 +-
  drivers/iommu/intel/iommu.c | 6 +++---
  drivers/iommu/iommu.c   | 5 ++---
  include/linux/iommu.h   | 2 +-
  4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1e641cb6dddc..6e12a615117b 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3099,7 +3099,7 @@ static int __init parse_amd_iommu_options(char *str)
for (; *str; ++str) {
if (strncmp(str, "fullflush", 9) == 0) {
pr_warn("amd_iommu=fullflush deprecated; use iommu.strict=1 
instead\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
}
if (strncmp(str, "force_enable", 12) == 0)
amd_iommu_force_enable = true;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0f9d8116..77d0834fb0df 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -454,7 +454,7 @@ static int __init intel_iommu_setup(char *str)
iommu_dma_forcedac = true;
} else if (!strncmp(str, "strict", 6)) {
pr_warn("intel_iommu=strict deprecated; use iommu.strict=1 
instead\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
} else if (!strncmp(str, "sp_off", 6)) {
pr_info("Disable supported super page\n");
intel_iommu_superpage = 0;
@@ -4382,7 +4382,7 @@ int __init intel_iommu_init(void)
 */
if (cap_caching_mode(iommu->cap)) {
pr_info_once("IOMMU batching disallowed due to 
virtualization\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
}
iommu_device_sysfs_add(&iommu->iommu, NULL,
   intel_iommu_groups,
@@ -5699,7 +5699,7 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev 
*dev)
} else if (dmar_map_gfx) {
/* we have to ensure the gfx device is idle before we flush */
pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
}
  }
  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, 
quirk_calpella_no_shadow_gtt);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 60b1ec42e73b..ff221d3ddcbc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str)
  }
  early_param("iommu.strict", iommu_dma_setup);
  
-void iommu_set_dma_strict(bool strict)

+void iommu_set_dma_strict(void)
  {
-   if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-   iommu_dma_strict = strict;
+   iommu_dma_strict = true;
  }
  
  bool iommu_get_dma_strict(struct iommu_domain *domain)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..754f67d6dd90 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -476,7 +476,7 @@ int iommu_enable_nesting(struct iommu_domain *domain);
  int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks);
  
-void iommu_set_dma_strict(bool val);

+void iommu_set_dma_strict(void);
  bool iommu_get_dma_strict(struct iommu_domain *domain);
  
  extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,




Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 4/6] iommu/vt-d: Add support for IOMMU default DMA mode build options

2021-06-18 Thread Lu Baolu

On 2021/6/18 19:34, John Garry wrote:

From: Zhen Lei 

Make IOMMU_DEFAULT_LAZY default for when INTEL_IOMMU config is set,
as is current behaviour.

Also delete global flag intel_iommu_strict:
- In intel_iommu_setup(), call iommu_set_dma_strict(true) directly. Also
   remove the print, as iommu_subsys_init() prints the mode and we have
   already marked this param as deprecated.

- For cap_caching_mode() check in intel_iommu_setup(), call
   iommu_set_dma_strict(true) directly; also reword the accompanying print
   with a level downgrade and also add the missing '\n'.

- For Ironlake GPU, again call iommu_set_dma_strict(true) directly and
   keep the accompanying print.

[jpg: Remove intel_iommu_strict]
Signed-off-by: Zhen Lei 
Signed-off-by: John Garry 
---
  drivers/iommu/Kconfig   |  1 +
  drivers/iommu/intel/iommu.c | 15 ++-
  2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 0327a942fdb7..c214a36eb2dc 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -94,6 +94,7 @@ choice
prompt "IOMMU default DMA IOTLB invalidation mode"
depends on IOMMU_DMA
  
+	default IOMMU_DEFAULT_LAZY if INTEL_IOMMU

default IOMMU_DEFAULT_STRICT
help
  This option allows an IOMMU DMA IOTLB invalidation mode to be
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 29497113d748..0f9d8116 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -361,7 +361,6 @@ int intel_iommu_enabled = 0;
  EXPORT_SYMBOL_GPL(intel_iommu_enabled);
  
  static int dmar_map_gfx = 1;

-static int intel_iommu_strict;
  static int intel_iommu_superpage = 1;
  static int iommu_identity_mapping;
  static int iommu_skip_te_disable;
@@ -455,8 +454,7 @@ static int __init intel_iommu_setup(char *str)
iommu_dma_forcedac = true;
} else if (!strncmp(str, "strict", 6)) {
pr_warn("intel_iommu=strict deprecated; use iommu.strict=1 
instead\n");
-   pr_info("Disable batched IOTLB flush\n");
-   intel_iommu_strict = 1;
+   iommu_set_dma_strict(true);
} else if (!strncmp(str, "sp_off", 6)) {
pr_info("Disable supported super page\n");
intel_iommu_superpage = 0;
@@ -4382,9 +4380,9 @@ int __init intel_iommu_init(void)
 * is likely to be much lower than the overhead of synchronizing
 * the virtual and physical IOMMU page-tables.
 */
-   if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
-   pr_warn("IOMMU batching is disabled due to 
virtualization");
-   intel_iommu_strict = 1;
+   if (cap_caching_mode(iommu->cap)) {
+   pr_info_once("IOMMU batching disallowed due to 
virtualization\n");
+   iommu_set_dma_strict(true);
}
iommu_device_sysfs_add(&iommu->iommu, NULL,
   intel_iommu_groups,
@@ -4393,7 +4391,6 @@ int __init intel_iommu_init(void)
}
up_read(&dmar_global_lock);
  
-	iommu_set_dma_strict(intel_iommu_strict);

bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
if (si_domain && !hw_pass_through)
register_memory_notifier(&intel_iommu_memory_nb);
@@ -5702,8 +5699,8 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev 
*dev)
} else if (dmar_map_gfx) {
/* we have to ensure the gfx device is idle before we flush */
pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n");
-   intel_iommu_strict = 1;
-   }
+   iommu_set_dma_strict(true);
+   }
  }
  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, 
quirk_calpella_no_shadow_gtt);
  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, 
quirk_calpella_no_shadow_gtt);



Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 3/6] iommu: Enhance IOMMU default DMA mode build options

2021-06-18 Thread Lu Baolu

On 2021/6/18 19:34, John Garry wrote:

From: Zhen Lei 

First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
opportunity to set {lazy|strict} mode as default at build time. Then put
the two config options in a choice, as they are mutually exclusive.

[jpg: Make choice between strict and lazy only (and not passthrough)]
Signed-off-by: Zhen Lei 
Signed-off-by: John Garry 
Reviewed-by: Robin Murphy 
---
  .../admin-guide/kernel-parameters.txt |  3 +-
  drivers/iommu/Kconfig | 40 +++
  drivers/iommu/iommu.c |  2 +-
  3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 673952379900..a1b7c8526bb5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2046,9 +2046,10 @@
  throughput at the cost of reduced device isolation.
  Will fall back to strict mode if not supported by
  the relevant IOMMU driver.
-   1 - Strict mode (default).
+   1 - Strict mode.
  DMA unmap operations invalidate IOMMU hardware TLBs
  synchronously.
+   unset - Use value of CONFIG_IOMMU_DEFAULT_{LAZY,STRICT}.
Note: on x86, the default behaviour depends on the
equivalent driver-specific parameters, but a strict
mode explicitly specified by either method takes
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bca..0327a942fdb7 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -90,6 +90,46 @@ config IOMMU_DEFAULT_PASSTHROUGH
  
  	  If unsure, say N here.
  
+choice

+   prompt "IOMMU default DMA IOTLB invalidation mode"
+   depends on IOMMU_DMA
+
+   default IOMMU_DEFAULT_STRICT
+   help
+ This option allows an IOMMU DMA IOTLB invalidation mode to be
+ chosen at build time, to override the default mode of each ARCH,
+ removing the need to pass in kernel parameters through command line.
+ It is still possible to provide common boot params to override this
+ config.
+
+ If unsure, keep the default.
+
+config IOMMU_DEFAULT_STRICT
+   bool "strict"
+   help
+ For every IOMMU DMA unmap operation, the flush operation of IOTLB and
+ the free operation of IOVA are guaranteed to be done in the unmap
+ function.
+
+config IOMMU_DEFAULT_LAZY
+   bool "lazy"
+   help
+ Support lazy mode, where for every IOMMU DMA unmap operation, the
+ flush operation of IOTLB and the free operation of IOVA are deferred.
+ They are only guaranteed to be done before the related IOVA will be
+ reused.
+
+ The isolation provided in this mode is not as secure as STRICT mode,
+ such that a vulnerable time window may be created between the DMA
+ unmap and the mappings cached in the IOMMU IOTLB or device TLB
+ finally being invalidated, where the device could still access the
+ memory which has already been unmapped by the device driver.
+ However this mode may provide better performance in high throughput
+ scenarios, and is still considerably more secure than passthrough
+ mode or no IOMMU.
+
+endchoice
+
  config OF_IOMMU
def_bool y
depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index cf58949cc2f3..60b1ec42e73b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -29,7 +29,7 @@ static struct kset *iommu_group_kset;
  static DEFINE_IDA(iommu_group_ida);
  
  static unsigned int iommu_def_domain_type __read_mostly;

-static bool iommu_dma_strict __read_mostly = true;
+static bool iommu_dma_strict __read_mostly = 
IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
  static u32 iommu_cmd_line __read_mostly;
  
  struct iommu_group {




Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 2/6] iommu: Print strict or lazy mode at init time

2021-06-18 Thread Lu Baolu

On 2021/6/18 19:34, John Garry wrote:

As well as the default domain type, it's useful to know whether strict
or lazy for DMA domains, so add this info in a separate print.

The (strict/lazy) mode may also be set via the iommu.strict early_param, but
this will be processed prior to iommu_subsys_init(), so that print will be
accurate for drivers which don't set the mode via custom means.

For the drivers which set the mode via custom means - AMD and Intel drivers
- they maintain prints to inform a change in policy or that custom cmdline
methods to change policy are deprecated.

Signed-off-by: John Garry 
Reviewed-by: Robin Murphy 
---
  drivers/iommu/iommu.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5419c4b9f27a..cf58949cc2f3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -138,6 +138,11 @@ static int __init iommu_subsys_init(void)
(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ?
"(set via kernel command line)" : "");
  
+	pr_info("DMA domain TLB invalidation policy: %s mode %s\n",

+   iommu_dma_strict ? "strict" : "lazy",
+   (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ?
+   "(set via kernel command line)" : "");
+
return 0;
  }
  subsys_initcall(iommu_subsys_init);



Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 1/6] iommu: Deprecate Intel and AMD cmdline methods to enable strict mode

2021-06-18 Thread Lu Baolu

On 2021/6/18 19:34, John Garry wrote:

Now that the x86 drivers support iommu.strict, deprecate the custom
methods.

Signed-off-by: John Garry 
Acked-by: Robin Murphy 
---
  Documentation/admin-guide/kernel-parameters.txt | 9 ++---
  drivers/iommu/amd/init.c| 4 +++-
  drivers/iommu/intel/iommu.c | 1 +
  3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 30e9dd52464e..673952379900 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -290,10 +290,7 @@
amd_iommu=  [HW,X86-64]
Pass parameters to the AMD IOMMU driver in the system.
Possible values are:
-   fullflush - enable flushing of IO/TLB entries when
-   they are unmapped. Otherwise they are
-   flushed before they will be reused, which
-   is a lot of faster
+   fullflush - Deprecated, equivalent to iommu.strict=1
off   - do not initialize any AMD IOMMU found in
the system
force_isolation - Force device isolation for all
@@ -1948,9 +1945,7 @@
this case, gfx device will use physical address for
DMA.
strict [Default Off]
-   With this option on every unmap_single operation will
-   result in a hardware IOTLB flush operation as opposed
-   to batching them for performance.
+   Deprecated, equivalent to iommu.strict=1.
sp_off [Default Off]
By default, super page will be supported if Intel IOMMU
has the capability. With this option, super page will
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 46280e6e1535..3a2fb805f11e 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3098,8 +3098,10 @@ static int __init parse_amd_iommu_intr(char *str)
  static int __init parse_amd_iommu_options(char *str)
  {
for (; *str; ++str) {
-   if (strncmp(str, "fullflush", 9) == 0)
+   if (strncmp(str, "fullflush", 9) == 0) {
+   pr_warn("amd_iommu=fullflush deprecated; use iommu.strict=1 
instead\n");
amd_iommu_unmap_flush = true;
+   }
if (strncmp(str, "force_enable", 12) == 0)
amd_iommu_force_enable = true;
if (strncmp(str, "off", 3) == 0)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index bd93c7ec879e..29497113d748 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -454,6 +454,7 @@ static int __init intel_iommu_setup(char *str)
pr_warn("intel_iommu=forcedac deprecated; use iommu.forcedac 
instead\n");
iommu_dma_forcedac = true;
} else if (!strncmp(str, "strict", 6)) {
+   pr_warn("intel_iommu=strict deprecated; use iommu.strict=1 
instead\n");
pr_info("Disable batched IOTLB flush\n");
intel_iommu_strict = 1;
} else if (!strncmp(str, "sp_off", 6)) {



Reviewed-by: Lu Baolu 

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2] iommu: rockchip: Fix physical address decoding

2021-06-18 Thread Benjamin Gaignard
Restore bits 39 to 32 at correct position.
It reverses the operation done in rk_dma_addr_dte_v2().

Fixes: c55356c534aa ("iommu: rockchip: Add support for iommu v2")

Reported-by: Dan Carpenter 
Signed-off-by: Benjamin Gaignard 
---
 drivers/iommu/rockchip-iommu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 94b9d8e5b9a40..9febfb7f3025b 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -544,12 +544,14 @@ static inline u32 rk_dma_addr_dte(dma_addr_t dt_dma)
 }
 
 #define DT_HI_MASK GENMASK_ULL(39, 32)
+#define DTE_BASE_HI_MASK GENMASK(11, 4)
 #define DT_SHIFT   28
 
 static inline phys_addr_t rk_dte_addr_phys_v2(u32 addr)
 {
-   return (phys_addr_t)(addr & RK_DTE_PT_ADDRESS_MASK) |
-  ((addr & DT_HI_MASK) << DT_SHIFT);
+   u64 addr64 = addr;
+   return (phys_addr_t)(addr64 & RK_DTE_PT_ADDRESS_MASK) |
+  ((addr64 & DTE_BASE_HI_MASK) << DT_SHIFT);
 }
 
 static inline u32 rk_dma_addr_dte_v2(dma_addr_t dt_dma)
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: rockchip: Fix physical address decoding

2021-06-18 Thread Robin Murphy

On 2021-06-18 12:46, Benjamin Gaignard wrote:

Restore bits 39 to 32 at correct position.
It reverses the operation done in rk_dma_addr_dte_v2().

Fixes: c55356c534aa ("iommu: rockchip: Add support for iommu v2")

Reported-by: Dan Carpenter 
Signed-off-by: Benjamin Gaignard 
---
  drivers/iommu/rockchip-iommu.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 94b9d8e5b9a40..e55482a477080 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -544,12 +544,13 @@ static inline u32 rk_dma_addr_dte(dma_addr_t dt_dma)
  }
  
  #define DT_HI_MASK GENMASK_ULL(39, 32)

+#define DTE_BASE_HI_MASK GENMASK(11, 4)
  #define DT_SHIFT   28
  
  static inline phys_addr_t rk_dte_addr_phys_v2(u32 addr)

  {
return (phys_addr_t)(addr & RK_DTE_PT_ADDRESS_MASK) |
-  ((addr & DT_HI_MASK) << DT_SHIFT);
+  ((addr & DTE_BASE_HI_MASK) << DT_SHIFT);


I think this still has the problem that you're shifting off the top of 
addr *before* the cast takes effect.
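
i.e. the u32 needs widening before the shift, e.g. (sketch):

	u64 addr64 = addr;

	return (phys_addr_t)(addr64 & RK_DTE_PT_ADDRESS_MASK) |
	       ((addr64 & DTE_BASE_HI_MASK) << DT_SHIFT);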


Robin.


  }
  
  static inline u32 rk_dma_addr_dte_v2(dma_addr_t dt_dma)



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu: rockchip: Fix physical address decoding

2021-06-18 Thread Benjamin Gaignard
Restore bits 39 to 32 at correct position.
It reverses the operation done in rk_dma_addr_dte_v2().

Fixes: c55356c534aa ("iommu: rockchip: Add support for iommu v2")

Reported-by: Dan Carpenter 
Signed-off-by: Benjamin Gaignard 
---
 drivers/iommu/rockchip-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 94b9d8e5b9a40..e55482a477080 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -544,12 +544,13 @@ static inline u32 rk_dma_addr_dte(dma_addr_t dt_dma)
 }
 
 #define DT_HI_MASK GENMASK_ULL(39, 32)
+#define DTE_BASE_HI_MASK GENMASK(11, 4)
 #define DT_SHIFT   28
 
 static inline phys_addr_t rk_dte_addr_phys_v2(u32 addr)
 {
return (phys_addr_t)(addr & RK_DTE_PT_ADDRESS_MASK) |
-  ((addr & DT_HI_MASK) << DT_SHIFT);
+  ((addr & DTE_BASE_HI_MASK) << DT_SHIFT);
 }
 
 static inline u32 rk_dma_addr_dte_v2(dma_addr_t dt_dma)
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v14 5/6] iommu/amd: Add support for IOMMU default DMA mode build options

2021-06-18 Thread John Garry
From: Zhen Lei 

Make IOMMU_DEFAULT_LAZY default for when AMD_IOMMU config is set, which
matches current behaviour.

For "fullflush" param, just call iommu_set_dma_strict(true) directly.

Since we get a strict vs lazy mode print already in iommu_subsys_init(),
and maintain a deprecation print when "fullflush" param is passed, drop the
prints in amd_iommu_init_dma_ops().

Finally drop global flag amd_iommu_unmap_flush, as it no longer has any
purpose.

[jpg: Rebase for relocated file and drop amd_iommu_unmap_flush]
Signed-off-by: Zhen Lei 
Signed-off-by: John Garry 
---
 drivers/iommu/Kconfig   | 2 +-
 drivers/iommu/amd/amd_iommu_types.h | 6 --
 drivers/iommu/amd/init.c| 3 +--
 drivers/iommu/amd/iommu.c   | 6 --
 4 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c214a36eb2dc..fd1ad28dd5ee 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -94,7 +94,7 @@ choice
prompt "IOMMU default DMA IOTLB invalidation mode"
depends on IOMMU_DMA
 
-   default IOMMU_DEFAULT_LAZY if INTEL_IOMMU
+   default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU)
default IOMMU_DEFAULT_STRICT
help
  This option allows an IOMMU DMA IOTLB invalidation mode to be
diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 94c1a7a9876d..8dbe61e2b3c1 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -779,12 +779,6 @@ extern u16 amd_iommu_last_bdf;
 /* allocation bitmap for domain ids */
 extern unsigned long *amd_iommu_pd_alloc_bitmap;
 
-/*
- * If true, the addresses will be flushed on unmap time, not when
- * they are reused
- */
-extern bool amd_iommu_unmap_flush;
-
 /* Smallest max PASID supported by any IOMMU in the system */
 extern u32 amd_iommu_max_pasid;
 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 3a2fb805f11e..1e641cb6dddc 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -161,7 +161,6 @@ u16 amd_iommu_last_bdf; /* largest PCI 
device id we have
   to handle */
 LIST_HEAD(amd_iommu_unity_map);/* a list of required unity 
mappings
   we find in ACPI */
-bool amd_iommu_unmap_flush;/* if true, flush on every unmap */
 
 LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the
   system */
@@ -3100,7 +3099,7 @@ static int __init parse_amd_iommu_options(char *str)
for (; *str; ++str) {
if (strncmp(str, "fullflush", 9) == 0) {
pr_warn("amd_iommu=fullflush deprecated; use 
iommu.strict=1 instead\n");
-   amd_iommu_unmap_flush = true;
+   iommu_set_dma_strict(true);
}
if (strncmp(str, "force_enable", 12) == 0)
amd_iommu_force_enable = true;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index b1fbf2c83df5..32b541ee2e11 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1775,12 +1775,6 @@ void amd_iommu_domain_update(struct protection_domain 
*domain)
 static void __init amd_iommu_init_dma_ops(void)
 {
swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0;
-
-   if (amd_iommu_unmap_flush)
-   pr_info("IO/TLB flush on unmap enabled\n");
-   else
-   pr_info("Lazy IO/TLB flushing enabled\n");
-   iommu_set_dma_strict(amd_iommu_unmap_flush);
 }
 
 int __init amd_iommu_init_api(void)
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-18 Thread John Garry
We only ever now set strict mode enabled in iommu_set_dma_strict(), so
just remove the argument.

Signed-off-by: John Garry 
Reviewed-by: Robin Murphy 
---
 drivers/iommu/amd/init.c| 2 +-
 drivers/iommu/intel/iommu.c | 6 +++---
 drivers/iommu/iommu.c   | 5 ++---
 include/linux/iommu.h   | 2 +-
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 1e641cb6dddc..6e12a615117b 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3099,7 +3099,7 @@ static int __init parse_amd_iommu_options(char *str)
for (; *str; ++str) {
if (strncmp(str, "fullflush", 9) == 0) {
pr_warn("amd_iommu=fullflush deprecated; use 
iommu.strict=1 instead\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
}
if (strncmp(str, "force_enable", 12) == 0)
amd_iommu_force_enable = true;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0f9d8116..77d0834fb0df 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -454,7 +454,7 @@ static int __init intel_iommu_setup(char *str)
iommu_dma_forcedac = true;
} else if (!strncmp(str, "strict", 6)) {
pr_warn("intel_iommu=strict deprecated; use 
iommu.strict=1 instead\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
} else if (!strncmp(str, "sp_off", 6)) {
pr_info("Disable supported super page\n");
intel_iommu_superpage = 0;
@@ -4382,7 +4382,7 @@ int __init intel_iommu_init(void)
 */
if (cap_caching_mode(iommu->cap)) {
pr_info_once("IOMMU batching disallowed due to 
virtualization\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
}
iommu_device_sysfs_add(&iommu->iommu, NULL,
   intel_iommu_groups,
@@ -5699,7 +5699,7 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev 
*dev)
} else if (dmar_map_gfx) {
/* we have to ensure the gfx device is idle before we flush */
pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n");
-   iommu_set_dma_strict(true);
+   iommu_set_dma_strict();
}
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, 
quirk_calpella_no_shadow_gtt);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 60b1ec42e73b..ff221d3ddcbc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str)
 }
 early_param("iommu.strict", iommu_dma_setup);
 
-void iommu_set_dma_strict(bool strict)
+void iommu_set_dma_strict(void)
 {
-   if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-   iommu_dma_strict = strict;
+   iommu_dma_strict = true;
 }
 
 bool iommu_get_dma_strict(struct iommu_domain *domain)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..754f67d6dd90 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -476,7 +476,7 @@ int iommu_enable_nesting(struct iommu_domain *domain);
 int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks);
 
-void iommu_set_dma_strict(bool val);
+void iommu_set_dma_strict(void);
 bool iommu_get_dma_strict(struct iommu_domain *domain);
 
 extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v14 4/6] iommu/vt-d: Add support for IOMMU default DMA mode build options

2021-06-18 Thread John Garry
From: Zhen Lei 

Make IOMMU_DEFAULT_LAZY default for when INTEL_IOMMU config is set,
as is current behaviour.

Also delete global flag intel_iommu_strict:
- In intel_iommu_setup(), call iommu_set_dma_strict(true) directly. Also
  remove the print, as iommu_subsys_init() prints the mode and we have
  already marked this param as deprecated.

- For cap_caching_mode() check in intel_iommu_setup(), call
  iommu_set_dma_strict(true) directly; also reword the accompanying print
  with a level downgrade and also add the missing '\n'.

- For Ironlake GPU, again call iommu_set_dma_strict(true) directly and
  keep the accompanying print.

[jpg: Remove intel_iommu_strict]
Signed-off-by: Zhen Lei 
Signed-off-by: John Garry 
---
 drivers/iommu/Kconfig   |  1 +
 drivers/iommu/intel/iommu.c | 15 ++-
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 0327a942fdb7..c214a36eb2dc 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -94,6 +94,7 @@ choice
prompt "IOMMU default DMA IOTLB invalidation mode"
depends on IOMMU_DMA
 
+   default IOMMU_DEFAULT_LAZY if INTEL_IOMMU
default IOMMU_DEFAULT_STRICT
help
  This option allows an IOMMU DMA IOTLB invalidation mode to be
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 29497113d748..0f9d8116 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -361,7 +361,6 @@ int intel_iommu_enabled = 0;
 EXPORT_SYMBOL_GPL(intel_iommu_enabled);
 
 static int dmar_map_gfx = 1;
-static int intel_iommu_strict;
 static int intel_iommu_superpage = 1;
 static int iommu_identity_mapping;
 static int iommu_skip_te_disable;
@@ -455,8 +454,7 @@ static int __init intel_iommu_setup(char *str)
iommu_dma_forcedac = true;
} else if (!strncmp(str, "strict", 6)) {
pr_warn("intel_iommu=strict deprecated; use 
iommu.strict=1 instead\n");
-   pr_info("Disable batched IOTLB flush\n");
-   intel_iommu_strict = 1;
+   iommu_set_dma_strict(true);
} else if (!strncmp(str, "sp_off", 6)) {
pr_info("Disable supported super page\n");
intel_iommu_superpage = 0;
@@ -4382,9 +4380,9 @@ int __init intel_iommu_init(void)
 * is likely to be much lower than the overhead of synchronizing
 * the virtual and physical IOMMU page-tables.
 */
-   if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
-   pr_warn("IOMMU batching is disabled due to 
virtualization");
-   intel_iommu_strict = 1;
+   if (cap_caching_mode(iommu->cap)) {
+   pr_info_once("IOMMU batching disallowed due to 
virtualization\n");
+   iommu_set_dma_strict(true);
}
iommu_device_sysfs_add(&iommu->iommu, NULL,
   intel_iommu_groups,
@@ -4393,7 +4391,6 @@ int __init intel_iommu_init(void)
}
up_read(&dmar_global_lock);
 
-   iommu_set_dma_strict(intel_iommu_strict);
bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
if (si_domain && !hw_pass_through)
register_memory_notifier(&intel_iommu_memory_nb);
@@ -5702,8 +5699,8 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev 
*dev)
} else if (dmar_map_gfx) {
/* we have to ensure the gfx device is idle before we flush */
pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n");
-   intel_iommu_strict = 1;
-   }
+   iommu_set_dma_strict(true);
+   }
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, 
quirk_calpella_no_shadow_gtt);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, 
quirk_calpella_no_shadow_gtt);
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v14 3/6] iommu: Enhance IOMMU default DMA mode build options

2021-06-18 Thread John Garry
From: Zhen Lei 

First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the
opportunity to set {lazy|strict} mode as default at build time. Then put
the two config options in a choice, as they are mutually exclusive.

[jpg: Make choice between strict and lazy only (and not passthrough)]
Signed-off-by: Zhen Lei 
Signed-off-by: John Garry 
Reviewed-by: Robin Murphy 
---
 .../admin-guide/kernel-parameters.txt |  3 +-
 drivers/iommu/Kconfig | 40 +++
 drivers/iommu/iommu.c |  2 +-
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 673952379900..a1b7c8526bb5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2046,9 +2046,10 @@
  throughput at the cost of reduced device isolation.
  Will fall back to strict mode if not supported by
  the relevant IOMMU driver.
-   1 - Strict mode (default).
+   1 - Strict mode.
  DMA unmap operations invalidate IOMMU hardware TLBs
  synchronously.
+   unset - Use value of CONFIG_IOMMU_DEFAULT_{LAZY,STRICT}.
Note: on x86, the default behaviour depends on the
equivalent driver-specific parameters, but a strict
mode explicitly specified by either method takes
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bca..0327a942fdb7 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -90,6 +90,46 @@ config IOMMU_DEFAULT_PASSTHROUGH
 
  If unsure, say N here.
 
+choice
+   prompt "IOMMU default DMA IOTLB invalidation mode"
+   depends on IOMMU_DMA
+
+   default IOMMU_DEFAULT_STRICT
+   help
+ This option allows an IOMMU DMA IOTLB invalidation mode to be
+ chosen at build time, to override the default mode of each ARCH,
+ removing the need to pass in kernel parameters through command line.
+ It is still possible to provide common boot params to override this
+ config.
+
+ If unsure, keep the default.
+
+config IOMMU_DEFAULT_STRICT
+   bool "strict"
+   help
+ For every IOMMU DMA unmap operation, the flush operation of IOTLB and
+ the free operation of IOVA are guaranteed to be done in the unmap
+ function.
+
+config IOMMU_DEFAULT_LAZY
+   bool "lazy"
+   help
+ Support lazy mode, where for every IOMMU DMA unmap operation, the
+ flush operation of IOTLB and the free operation of IOVA are deferred.
+ They are only guaranteed to be done before the related IOVA will be
+ reused.
+
+ The isolation provided in this mode is not as secure as STRICT mode,
+ such that a vulnerable time window may be created between the DMA
+ unmap and the mappings cached in the IOMMU IOTLB or device TLB
+ finally being invalidated, where the device could still access the
+ memory which has already been unmapped by the device driver.
+ However this mode may provide better performance in high throughput
+ scenarios, and is still considerably more secure than passthrough
+ mode or no IOMMU.
+
+endchoice
+
 config OF_IOMMU
def_bool y
depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index cf58949cc2f3..60b1ec42e73b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -29,7 +29,7 @@ static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
 
 static unsigned int iommu_def_domain_type __read_mostly;
-static bool iommu_dma_strict __read_mostly = true;
+static bool iommu_dma_strict __read_mostly = 
IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT);
 static u32 iommu_cmd_line __read_mostly;
 
 struct iommu_group {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v14 2/6] iommu: Print strict or lazy mode at init time

2021-06-18 Thread John Garry
As well as the default domain type, it's useful to know whether strict
or lazy for DMA domains, so add this info in a separate print.

The (strict/lazy) mode may also be set via the iommu.strict early_param, but
this will be processed prior to iommu_subsys_init(), so that print will be
accurate for drivers which don't set the mode via custom means.

For the drivers which set the mode via custom means - AMD and Intel drivers
- they maintain prints to inform a change in policy or that custom cmdline
methods to change policy are deprecated.
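
With this applied, boot gains a line along the lines of (illustrative only;
the exact prefix depends on pr_fmt):

	iommu: DMA domain TLB invalidation policy: lazy mode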

Signed-off-by: John Garry 
Reviewed-by: Robin Murphy 
---
 drivers/iommu/iommu.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5419c4b9f27a..cf58949cc2f3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -138,6 +138,11 @@ static int __init iommu_subsys_init(void)
(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ?
"(set via kernel command line)" : "");
 
+   pr_info("DMA domain TLB invalidation policy: %s mode %s\n",
+   iommu_dma_strict ? "strict" : "lazy",
+   (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ?
+   "(set via kernel command line)" : "");
+
return 0;
 }
 subsys_initcall(iommu_subsys_init);
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v14 0/6] iommu: Enhance IOMMU default DMA mode build options

2021-06-18 Thread John Garry
This is a reboot of Zhen Lei's series from a couple of years ago, which
never made it across the line.

I still think that it has some value, so taking up the mantle.

Motivation:
Allow lazy mode to be the default mode for DMA domains for all ARCHs, and not
only those which hardcode it (to be lazy). For ARM64, currently we must use
a kernel command line parameter to use lazy mode, which is less than
ideal.
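
For example, on arm64 today lazy mode has to be requested per boot, whereas
with this series it can simply be selected at build time (illustrative):

	# today, on the kernel command line:
	iommu.strict=0

	# with this series, in the kernel config:
	CONFIG_IOMMU_DEFAULT_LAZY=y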

I have now included the print for strict/lazy mode, which I originally
sent in:
https://lore.kernel.org/linux-iommu/72eb3de9-1d1c-ae46-c5a9-95f26525d...@huawei.com/

There was some concern there about drivers and their custom prints
conflicting with the print in that patch, but I think that it
should be ok.

Based on next-20210611 + "iommu: Update "iommu.strict" documentation"

Differences to v13:
- Improve strict mode deprecation messages and cut out some
  kernel-parameters.txt legacy description
- Add tag in 1/6
- use pr_info_once() for vt-d message about VM and caching

Differences to v12:
- Add Robin's RB tags (thanks!)
- Add a patch to mark x86 strict cmdline params as deprecated
- Improve wording in Kconfig change and tweak iommu_dma_strict declaration

Differences to v11:
- Rebase to next-20210610
- Drop strict mode globals in Intel and AMD drivers
- Include patch to print strict vs lazy mode
- Include patch to remove argument from iommu_set_dma_strict()

John Garry (3):
  iommu: Deprecate Intel and AMD cmdline methods to enable strict mode
  iommu: Print strict or lazy mode at init time
  iommu: Remove mode argument from iommu_set_dma_strict()

Zhen Lei (3):
  iommu: Enhance IOMMU default DMA mode build options
  iommu/vt-d: Add support for IOMMU default DMA mode build options
  iommu/amd: Add support for IOMMU default DMA mode build options

 .../admin-guide/kernel-parameters.txt | 12 ++
 drivers/iommu/Kconfig | 41 +++
 drivers/iommu/amd/amd_iommu_types.h   |  6 ---
 drivers/iommu/amd/init.c  |  7 ++--
 drivers/iommu/amd/iommu.c |  6 ---
 drivers/iommu/intel/iommu.c   | 16 
 drivers/iommu/iommu.c | 12 --
 include/linux/iommu.h |  2 +-
 8 files changed, 65 insertions(+), 37 deletions(-)

-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v14 1/6] iommu: Deprecate Intel and AMD cmdline methods to enable strict mode

2021-06-18 Thread John Garry
Now that the x86 drivers support iommu.strict, deprecate the custom
methods.

Signed-off-by: John Garry 
Acked-by: Robin Murphy 
---
 Documentation/admin-guide/kernel-parameters.txt | 9 ++---
 drivers/iommu/amd/init.c| 4 +++-
 drivers/iommu/intel/iommu.c | 1 +
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 30e9dd52464e..673952379900 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -290,10 +290,7 @@
amd_iommu=  [HW,X86-64]
Pass parameters to the AMD IOMMU driver in the system.
Possible values are:
-   fullflush - enable flushing of IO/TLB entries when
-   they are unmapped. Otherwise they are
-   flushed before they will be reused, which
-   is a lot of faster
+   fullflush - Deprecated, equivalent to iommu.strict=1
off   - do not initialize any AMD IOMMU found in
the system
force_isolation - Force device isolation for all
@@ -1948,9 +1945,7 @@
this case, gfx device will use physical address for
DMA.
strict [Default Off]
-   With this option on every unmap_single operation will
-   result in a hardware IOTLB flush operation as opposed
-   to batching them for performance.
+   Deprecated, equivalent to iommu.strict=1.
sp_off [Default Off]
By default, super page will be supported if Intel IOMMU
has the capability. With this option, super page will
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 46280e6e1535..3a2fb805f11e 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3098,8 +3098,10 @@ static int __init parse_amd_iommu_intr(char *str)
 static int __init parse_amd_iommu_options(char *str)
 {
for (; *str; ++str) {
-   if (strncmp(str, "fullflush", 9) == 0)
+   if (strncmp(str, "fullflush", 9) == 0) {
+   pr_warn("amd_iommu=fullflush deprecated; use 
iommu.strict=1 instead\n");
amd_iommu_unmap_flush = true;
+   }
if (strncmp(str, "force_enable", 12) == 0)
amd_iommu_force_enable = true;
if (strncmp(str, "off", 3) == 0)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index bd93c7ec879e..29497113d748 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -454,6 +454,7 @@ static int __init intel_iommu_setup(char *str)
pr_warn("intel_iommu=forcedac deprecated; use 
iommu.forcedac instead\n");
iommu_dma_forcedac = true;
} else if (!strncmp(str, "strict", 6)) {
+   pr_warn("intel_iommu=strict deprecated; use 
iommu.strict=1 instead\n");
pr_info("Disable batched IOTLB flush\n");
intel_iommu_strict = 1;
} else if (!strncmp(str, "sp_off", 6)) {
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 5/6] iommu/dma: Simplify calls to iommu_setup_dma_ops()

2021-06-18 Thread Robin Murphy

On 2021-06-18 11:50, Jean-Philippe Brucker wrote:

On Wed, Jun 16, 2021 at 06:02:39PM +0100, Robin Murphy wrote:

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c62e19bed302..175f8eaeb5b3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1322,7 +1322,9 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 dma_limit)
if (domain->type == IOMMU_DOMAIN_DMA) {
if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
goto out_err;
-   dev->dma_ops = &iommu_dma_ops;
+   set_dma_ops(dev, &iommu_dma_ops);
+   } else {
+   set_dma_ops(dev, NULL);


I'm not keen on moving this here, since iommu-dma only knows that its own
ops are right for devices it *is* managing; it can't assume any particular
ops are appropriate for devices it isn't. The idea here is that
arch_setup_dma_ops() may have already set the appropriate ops for the
non-IOMMU case, so if the default domain type is passthrough then we leave
those in place.

For example, I do still plan to revisit my conversion of arch/arm someday,
at which point I'd have to undo this for that reason.


Makes sense, I'll remove this bit.


Simplifying the base and size arguments is of course fine, but TBH I'd say
rip the whole bloody lot out of the arch_setup_dma_ops() flow now. It's a
considerable faff passing them around for nothing but a tenuous sanity check
in iommu_dma_init_domain(), and now that dev->dma_range_map is a common
thing we should expect that to give us any relevant limitations if we even
still care.


So I started working on this but it gets too bulky for a preparatory
patch. Dropping the parameters from arch_setup_dma_ops() seems especially
complicated because arm32 does need the size parameter for IOMMU mappings
and that value falls back to the bus DMA mask or U32_MAX in the absence of
dma-ranges. I could try to dig into this for a separate series.

Even only dropping the parameters from iommu_setup_dma_ops() isn't
completely trivial (8 files changed, 55 insertions(+), 36 deletions(-)
because we still need the lower IOVA limit from dma_range_map), so I'd
rather send it separately and have it sit in -next for a while.


Oh, sure, I didn't mean to imply that the whole cleanup should be within 
the scope of this series, just that we can shave off as much as we *do* 
need to touch here (which TBH is pretty much what you're doing already), 
and mainly to start taking the attitude that these arguments are now 
superseded and increasingly vestigial.


I expected the cross-arch cleanup to be a bit fiddly, but I'd forgotten 
that arch/arm was still actively using these values, so maybe I can 
revisit this when I pick up my iommu-dma conversion again (I swear it's 
not dead, just resting!)


Cheers,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 5/6] iommu/dma: Simplify calls to iommu_setup_dma_ops()

2021-06-18 Thread Jean-Philippe Brucker
On Wed, Jun 16, 2021 at 06:02:39PM +0100, Robin Murphy wrote:
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index c62e19bed302..175f8eaeb5b3 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -1322,7 +1322,9 @@ void iommu_setup_dma_ops(struct device *dev, u64 
> > dma_base, u64 dma_limit)
> > if (domain->type == IOMMU_DOMAIN_DMA) {
> > if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
> > goto out_err;
> > -   dev->dma_ops = &iommu_dma_ops;
> > +   set_dma_ops(dev, &iommu_dma_ops);
> > +   } else {
> > +   set_dma_ops(dev, NULL);
> 
> I'm not keen on moving this here, since iommu-dma only knows that its own
> ops are right for devices it *is* managing; it can't assume any particular
> ops are appropriate for devices it isn't. The idea here is that
> arch_setup_dma_ops() may have already set the appropriate ops for the
> non-IOMMU case, so if the default domain type is passthrough then we leave
> those in place.
> 
> For example, I do still plan to revisit my conversion of arch/arm someday,
> at which point I'd have to undo this for that reason.

Makes sense, I'll remove this bit.

> Simplifying the base and size arguments is of course fine, but TBH I'd say
> rip the whole bloody lot out of the arch_setup_dma_ops() flow now. It's a
> considerable faff passing them around for nothing but a tenuous sanity check
> in iommu_dma_init_domain(), and now that dev->dma_range_map is a common
> thing we should expect that to give us any relevant limitations if we even
> still care.

So I started working on this but it gets too bulky for a preparatory
patch. Dropping the parameters from arch_setup_dma_ops() seems especially
complicated because arm32 does need the size parameter for IOMMU mappings
and that value falls back to the bus DMA mask or U32_MAX in the absence of
dma-ranges. I could try to dig into this for a separate series.

Even only dropping the parameters from iommu_setup_dma_ops() isn't
completely trivial (8 files changed, 55 insertions(+), 36 deletions(-)
because we still need the lower IOVA limit from dma_range_map), so I'd
rather send it separately and have it sit in -next for a while.

Thanks,
Jean

> 
> That said, those are all things which can be fixed up later if the series is
> otherwise ready to go and there's still a chance of landing it for 5.14. If
> you do have any other reason to respin, then I think the x86 probe_finalize
> functions simply want an unconditional set_dma_ops(dev, NULL) before the
> iommu_setup_dma_ops() call.
> 
> Cheers,
> Robin.
> 
> > }
> > return;
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 85f18342603c..8d866940692a 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -5165,15 +5165,7 @@ static void intel_iommu_release_device(struct device 
> > *dev)
> >   static void intel_iommu_probe_finalize(struct device *dev)
> >   {
> > -   dma_addr_t base = IOVA_START_PFN << VTD_PAGE_SHIFT;
> > -   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> > -   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > -
> > -   if (domain && domain->type == IOMMU_DOMAIN_DMA)
> > -   iommu_setup_dma_ops(dev, base,
> > -   __DOMAIN_MAX_ADDR(dmar_domain->gaw));
> > -   else
> > -   set_dma_ops(dev, NULL);
> > +   iommu_setup_dma_ops(dev, 0, U64_MAX);
> >   }
> >   static void intel_iommu_get_resv_regions(struct device *device,
> > 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 4/6] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()

2021-06-18 Thread Jean-Philippe Brucker
On Wed, Jun 16, 2021 at 05:28:59PM +0200, Eric Auger wrote:
> Hi Jean,
> 
> On 6/10/21 9:51 AM, Jean-Philippe Brucker wrote:
> > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> > index 4bf1dd3eb041..7bd1d2199141 100644
> > --- a/arch/arm64/mm/dma-mapping.c
> > +++ b/arch/arm64/mm/dma-mapping.c
> > @@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
> > u64 size,
> >  
> > dev->dma_coherent = coherent;
> > if (iommu)
> > -   iommu_setup_dma_ops(dev, dma_base, size);
> > +   iommu_setup_dma_ops(dev, dma_base, size - dma_base - 1);
> I don't get  size - dma_base - 1?

Because it's wrong, should be dma_base + size - 1. Thanks for catching it!
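
i.e. (sketch of the fixed call):

	iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1);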

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 2/6] ACPI: Move IOMMU setup code out of IORT

2021-06-18 Thread Robin Murphy

On 2021-06-18 08:41, Jean-Philippe Brucker wrote:

Hi Eric,

On Wed, Jun 16, 2021 at 11:35:13AM +0200, Eric Auger wrote:

-const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
-   const u32 *id_in)
+int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
  {
struct acpi_iort_node *node;
-   const struct iommu_ops *ops;
+   const struct iommu_ops *ops = NULL;


Oops, I need to remove this (and add -Werror to my tests.)



+static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
+  const u32 *id_in)
+{
+   int err;
+   const struct iommu_ops *ops;
+
+   /*
+* If we already translated the fwspec there is nothing left to do,
+* return the iommu_ops.
+*/
+   ops = acpi_iommu_fwspec_ops(dev);
+   if (ops)
+   return ops;
+
+   err = iort_iommu_configure_id(dev, id_in);
+
+   /*
+* If we have reason to believe the IOMMU driver missed the initial
+* add_device callback for dev, replay it to get things in order.
+*/
+   if (!err && dev->bus && !device_iommu_mapped(dev))
+   err = iommu_probe_device(dev);

Previously we had:
     if (!err) {
         ops = iort_fwspec_iommu_ops(dev);
         err = iort_add_device_replay(dev);
     }

Please can you explain the transform? I see the

acpi_iommu_fwspec_ops call below but it is not straightforward to me.


I figured that iort_add_device_replay() is only used once and is
sufficiently simple to be inlined manually (saving 10 lines). Then I
replaced the ops assignment with returns, which saves another line and may
be slightly clearer?  I guess it's mostly a matter of taste, the behavior
should be exactly the same.


Right, IIRC the multiple assignments to ops were more of a haphazard 
evolution inherited from the DT version, and looking at it now I think 
the multiple-return is indeed a bit nicer.


Similarly, it looks like the factoring out of iort_add_device_replay() 
was originally an attempt to encapsulate the IOMMU_API dependency, but 
things have moved around a lot since then, so that seems like a sensible 
simplification to make too.


Robin.




Also the comment mentions replay. Unsure if it is still OK.


The "replay" part is, but "add_device" isn't accurate because it has since
been replaced by probe_device. I'll refresh the comment.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH] eventfd: Enlarge recursion limit to allow vhost to work

2021-06-18 Thread He Zhe
commit b5e683d5cab8 ("eventfd: track eventfd_signal() recursion depth")
introduces a percpu counter that tracks the percpu recursion depth and
warns if it is greater than zero, to avoid potential deadlock and stack
overflow.

However, sometimes different eventfds may be used in parallel. Specifically,
when heavy network load goes through kvm and vhost working together as below,
it triggers the following call trace.

-  100.00%
   - 66.51%
ret_from_fork
kthread
  - vhost_worker
 - 33.47% handle_tx_kick
  handle_tx
  handle_tx_copy
  vhost_tx_batch.isra.0
  vhost_add_used_and_signal_n
  eventfd_signal
 - 33.05% handle_rx_net
  handle_rx
  vhost_add_used_and_signal_n
  eventfd_signal
   - 33.49%
ioctl
entry_SYSCALL_64_after_hwframe
do_syscall_64
__x64_sys_ioctl
ksys_ioctl
do_vfs_ioctl
kvm_vcpu_ioctl
kvm_arch_vcpu_ioctl_run
vmx_handle_exit
handle_ept_misconfig
kvm_io_bus_write
__kvm_io_bus_write
eventfd_signal

001: WARNING: CPU: 1 PID: 1503 at fs/eventfd.c:73 eventfd_signal+0x85/0xa0
 snip 
001: Call Trace:
001:  vhost_signal+0x15e/0x1b0 [vhost]
001:  vhost_add_used_and_signal_n+0x2b/0x40 [vhost]
001:  handle_rx+0xb9/0x900 [vhost_net]
001:  handle_rx_net+0x15/0x20 [vhost_net]
001:  vhost_worker+0xbe/0x120 [vhost]
001:  kthread+0x106/0x140
001:  ? log_used.part.0+0x20/0x20 [vhost]
001:  ? kthread_park+0x90/0x90
001:  ret_from_fork+0x35/0x40
001: ---[ end trace 0003 ]---

This patch enlarges the limit to 1, which is the maximum recursion depth we
have found so far.

The credit of modification for eventfd_signal_count goes to
Xie Yongji 

Signed-off-by: He Zhe 
---
 fs/eventfd.c| 3 ++-
 include/linux/eventfd.h | 5 -
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index e265b6dd4f34..add6af91cacf 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -71,7 +71,8 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
 * it returns true, the eventfd_signal() call should be deferred to a
 * safe context.
 */
-   if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)))
+   if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count) >
+   EFD_WAKE_COUNT_MAX))
return 0;
 
spin_lock_irqsave(&ctx->wqh.lock, flags);
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index fa0a524baed0..74be152ebe87 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -29,6 +29,9 @@
 #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
 #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
 
+/* This is the maximum recursion depth we find so far */
+#define EFD_WAKE_COUNT_MAX 1
+
 struct eventfd_ctx;
 struct file;
 
@@ -47,7 +50,7 @@ DECLARE_PER_CPU(int, eventfd_wake_count);
 
 static inline bool eventfd_signal_count(void)
 {
-   return this_cpu_read(eventfd_wake_count);
+   return this_cpu_read(eventfd_wake_count) > EFD_WAKE_COUNT_MAX;
 }
 
 #else /* CONFIG_EVENTFD */
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems

2021-06-18 Thread Arnaud POULIQUEN
Hello Rob,

On 6/15/21 9:15 PM, Rob Herring wrote:
> If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
> same size as the list is redundant and can be dropped. Note that this is DT
> schema specific behavior and not standard json-schema behavior. The tooling
> will fixup the final schema adding any unspecified minItems/maxItems.
> 
> This condition is partially checked with the meta-schema already, but
> only if both 'minItems' and 'maxItems' are equal to the 'items' length.
> An improved meta-schema is pending.
> 
[...]
>  .../devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.yaml | 2 --
[...]
>  .../devicetree/bindings/mailbox/st,stm32-ipcc.yaml  | 2 --
[...]
>  .../devicetree/bindings/remoteproc/st,stm32-rproc.yaml  | 2 --
[...]
>  Documentation/devicetree/bindings/sound/st,stm32-sai.yaml   | 3 ---
Reviewed-by: Arnaud Pouliquen 

Thanks,
Arnaud
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 03/10] eventfd: Increase the recursion depth of eventfd_signal()

2021-06-18 Thread He Zhe



On 6/18/21 11:29 AM, Yongji Xie wrote:
> On Thu, Jun 17, 2021 at 4:34 PM He Zhe  wrote:
>>
>>
>> On 6/15/21 10:13 PM, Xie Yongji wrote:
>>> Increase the recursion depth of eventfd_signal() to 1. This
>>> is the maximum recursion depth we have found so far, which
>>> can be triggered with the following call chain:
>>>
>>> kvm_io_bus_write[kvm]
>>>   --> ioeventfd_write   [kvm]
>>> --> eventfd_signal  [eventfd]
>>>   --> vhost_poll_wakeup [vhost]
>>> --> vduse_vdpa_kick_vq  [vduse]
>>>   --> eventfd_signal[eventfd]
>>>
>>> Signed-off-by: Xie Yongji 
>>> Acked-by: Jason Wang 
>> The fix had been posted one year ago.
>>
>> https://lore.kernel.org/lkml/20200410114720.24838-1-zhe...@windriver.com/
>>
> OK, so it seems to be a fix for the RT system if my understanding is
> correct? Any reason why it's not merged? I'm happy to rebase my series
> on your patch if you'd like to repost it.

It works for both the mainline and RT kernels. The folks just reproduced it
in their RT environments.

This patch somehow hasn't received a maintainer's reply, so it is not merged yet.

And OK, I'll resend the patch.

>
> BTW, I also notice another thread for this issue:
>
> https://lore.kernel.org/linux-fsdevel/dm6pr11mb420291b550a10853403c7592ff...@dm6pr11mb4202.namprd11.prod.outlook.com/T/

This is the same approach as my v1:

https://lore.kernel.org/lkml/3b4aa4cb-0e76-89c2-c48a-cf24e1a36...@kernel.dk/

which was not what the maintainer wanted.


>
>>> ---
>>>  fs/eventfd.c| 2 +-
>>>  include/linux/eventfd.h | 5 -
>>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/eventfd.c b/fs/eventfd.c
>>> index e265b6dd4f34..cc7cd1dbedd3 100644
>>> --- a/fs/eventfd.c
>>> +++ b/fs/eventfd.c
>>> @@ -71,7 +71,7 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
>>>* it returns true, the eventfd_signal() call should be deferred to a
>>>* safe context.
>>>*/
>>> - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count)))
>>> + if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH))
>>>   return 0;
>>>
>>>   spin_lock_irqsave(&ctx->wqh.lock, flags);
>>> diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
>>> index fa0a524baed0..886d99cd38ef 100644
>>> --- a/include/linux/eventfd.h
>>> +++ b/include/linux/eventfd.h
>>> @@ -29,6 +29,9 @@
>>>  #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
>>>  #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
>>>
>>> +/* Maximum recursion depth */
>>> +#define EFD_WAKE_DEPTH 1
>>> +
>>>  struct eventfd_ctx;
>>>  struct file;
>>>
>>> @@ -47,7 +50,7 @@ DECLARE_PER_CPU(int, eventfd_wake_count);
>>>
>>>  static inline bool eventfd_signal_count(void)
>>>  {
>>> - return this_cpu_read(eventfd_wake_count);
>>> + return this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH;
>> count is just a count. How deep is acceptable should be decided where
>> eventfd_signal_count() is called.
>>
> The return value of this function is boolean rather than integer.
> Please see the comments in eventfd_signal():
>
> "then it should check eventfd_signal_count() before calling this
> function. If it returns true, the eventfd_signal() call should be
> deferred to a safe context."

OK. Now that the maintainer has commented as such, we can use it accordingly,
though I still have the feeling that the function name and the type of the
return value don't match.
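
(For illustration, the deferral pattern that comment asks callers to follow
could look like the sketch below. my_ctx, my_notify() and the work item are
hypothetical names; only eventfd_signal() and eventfd_signal_count() are
real eventfd APIs.)

static struct eventfd_ctx *my_ctx;	/* hypothetical consumer state */

static void my_deferred_signal(struct work_struct *work)
{
	eventfd_signal(my_ctx, 1);	/* now running in a safe context */
}
static DECLARE_WORK(my_deferred_work, my_deferred_signal);

static void my_notify(void)
{
	if (eventfd_signal_count())
		schedule_work(&my_deferred_work);	/* nested on this CPU: defer */
	else
		eventfd_signal(my_ctx, 1);
}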


Thanks,
Zhe

>
> Thanks,
> Yongji

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: Fix W=1 clang warning in intel/perf.c

2021-06-18 Thread Joerg Roedel
On Thu, Jun 17, 2021 at 01:51:24PM -0700, Nathan Chancellor wrote:
> kernel-doc is run automatically with W=1, regardless of gcc versus clang.

I see, thanks. Will update the commit message.

Thanks,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 3/6] ACPI: Add driver for the VIOT table

2021-06-18 Thread Jean-Philippe Brucker
On Thu, Jun 17, 2021 at 01:50:59PM +0200, Rafael J. Wysocki wrote:
> > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > index be7da23fad76..b835ca702ff0 100644
> > --- a/drivers/acpi/bus.c
> > +++ b/drivers/acpi/bus.c
> > @@ -27,6 +27,7 @@
> >  #include 
> >  #endif
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -1339,6 +1340,7 @@ static int __init acpi_init(void)
> > pci_mmcfg_late_init();
> > acpi_iort_init();
> > acpi_scan_init();
> > +   acpi_viot_init();
> 
> Is there a specific reason why to call it right here?
> 
> In particular, does it need to be called after acpi_scan_init()?  And
> does it need to be called before the subsequent functions?  If so,
> then why?

It does need to be called after acpi_scan_init(), because it relies on
struct devices and their fwnodes being initialized. In particular, to find a
PCI device we call pci_get_domain_bus_and_slot(), which needs the PCI
topology made available by acpi_scan_init().

It does not need to be called before the subsequent functions, however; I
can move it to the end.
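
Concretely, that would look something like the sketch below (unrelated calls
in acpi_init() abridged; only the ordering is the point):

static int __init acpi_init(void)
{
	/* ... */
	pci_mmcfg_late_init();
	acpi_iort_init();
	acpi_scan_init();	/* creates the devices/fwnodes VIOT needs */
	/* ... remaining late-init calls unchanged ... */
	acpi_viot_init();	/* moved to the end of acpi_init() */
	return 0;
}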

> > +void __init acpi_viot_init(void)
> > +{
> > +   int i;
> > +   acpi_status status;
> > +   struct acpi_table_header *hdr;
> > +   struct acpi_viot_header *node;
> > +
> > +   status = acpi_get_table(ACPI_SIG_VIOT, 0, &hdr);
> > +   if (ACPI_FAILURE(status)) {
> > +   if (status != AE_NOT_FOUND) {
> > +   const char *msg = acpi_format_exception(status);
> > +
> > +   pr_err("Failed to get table, %s\n", msg);
> > +   }
> > +   return;
> > +   }
> > +
> > +   viot = (void *)hdr;
> > +
> > +   node = ACPI_ADD_PTR(struct acpi_viot_header, viot, 
> > viot->node_offset);
> > +   for (i = 0; i < viot->node_count; i++) {
> > +   if (viot_parse_node(node))
> > +   return;
> > +
> > +   node = ACPI_ADD_PTR(struct acpi_viot_header, node,
> > +   node->length);
> > +   }
> 
> Do you still need the table after the above is complete?  If not,
> release the reference on it acquired above.

We don't need the table anymore, I'll drop the reference
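
For illustration, dropping the reference would just mean putting the table
once parsing is finished; a sketch (error paths would need the same
treatment, and it assumes nothing keeps pointing into the table afterwards):

void __init acpi_viot_init(void)
{
	/* ... acpi_get_table(ACPI_SIG_VIOT, 0, &hdr) and node parsing as above ... */

	acpi_put_table(hdr);	/* table contents are no longer needed */
}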

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 1/6] iommu: Deprecate Intel and AMD cmdline methods to enable strict mode

2021-06-18 Thread John Garry

On 17/06/2021 20:01, Robin Murphy wrote:

  DMA.
-    strict [Default Off]
+    strict [Default Off] [Deprecated, use iommu.strict instead]
  With this option on every unmap_single operation will
  result in a hardware IOTLB flush operation as opposed
  to batching them for performance.


FWIW I'd be inclined to replace both whole descriptions with just 
something like "Deprecated, equivalent to iommu.strict=1".



diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 46280e6e1535..9f3096d650aa 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3098,8 +3098,10 @@ static int __init parse_amd_iommu_intr(char *str)
  static int __init parse_amd_iommu_options(char *str)
  {
  for (; *str; ++str) {
-    if (strncmp(str, "fullflush", 9) == 0)
+    if (strncmp(str, "fullflush", 9) == 0) {
+    pr_warn("amd_iommu=fullflush deprecated; use iommu.strict 
instead\n");


Nit: maybe we should spell out "...use =1 instead" in all of 
these messages just in case anyone takes them literally?



(I'm not sure 
the options parse correctly with no argument)


I don't think they do.

But I'll take both suggestions on board.
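
For illustration, the reworded warning might then read roughly as below, and
the kernel-parameters.txt descriptions would shrink to just "Deprecated,
equivalent to iommu.strict=1". This is a sketch of the suggested wording
only; the existing handling of the option is unchanged and elided.

static int __init parse_amd_iommu_options(char *str)
{
	for (; *str; ++str) {
		if (strncmp(str, "fullflush", 9) == 0) {
			pr_warn("amd_iommu=fullflush deprecated; use iommu.strict=1 instead\n");
			/* ... existing fullflush handling unchanged ... */
		}
		/* ... other options ... */
	}
	return 1;
}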



Either way,

Acked-by: Robin Murphy 


Cheers!

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v4 3/6] ACPI: Add driver for the VIOT table

2021-06-18 Thread Jean-Philippe Brucker
On Wed, Jun 16, 2021 at 03:26:08PM +0200, Eric Auger wrote:
> > +   default:
> > +   pr_warn("Unsupported node %x\n", hdr->type);
> > +   ret = 0;
> > +   goto err_free;
> > +   }
> > +
> > +   /*
> > +* To be compatible with future versions of the table which may include
> > +* other node types, keep parsing.
> > +*/
> nit: doesn't this comment rather apply to the default clause in the
> switch.

Yes, the comment doesn't accurately explain the code below, I'll tweak it.

/*
 * A future version of the table may use the node for other purposes.
 * Keep parsing.
 */

> In case the PCI range node or the single MMIO endpoint node does
> not refer to any translation element, isn't it simply an error case?

It is permissible in my opinion. If a future version of the spec appends
new fields to the MMIO endpoint describing some PV property (I can't think
of a useful example), then the table can contain the vIOMMU topology as
usual plus one MMIO node that's only here to describe that property, and
doesn't have a translation element. If we encounter that I think we should
keep parsing.

> > +   if (!ep->viommu) {
> > +   pr_warn("No IOMMU node found\n");
> > +   ret = 0;
> > +   goto err_free;
> > +   }

> Besides
> Reviewed-by: Eric Auger 

Thanks!
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/6] ACPI: Move IOMMU setup code out of IORT

2021-06-18 Thread Jean-Philippe Brucker
Hi Eric,

On Wed, Jun 16, 2021 at 11:35:13AM +0200, Eric Auger wrote:
> > -const struct iommu_ops *iort_iommu_configure_id(struct device *dev,
> > -   const u32 *id_in)
> > +int iort_iommu_configure_id(struct device *dev, const u32 *id_in)
> >  {
> > struct acpi_iort_node *node;
> > -   const struct iommu_ops *ops;
> > +   const struct iommu_ops *ops = NULL;

Oops, I need to remove this (and add -Werror to my tests.)


> > +static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev,
> > +  const u32 *id_in)
> > +{
> > +   int err;
> > +   const struct iommu_ops *ops;
> > +
> > +   /*
> > +* If we already translated the fwspec there is nothing left to do,
> > +* return the iommu_ops.
> > +*/
> > +   ops = acpi_iommu_fwspec_ops(dev);
> > +   if (ops)
> > +   return ops;
> > +
> > +   err = iort_iommu_configure_id(dev, id_in);
> > +
> > +   /*
> > +* If we have reason to believe the IOMMU driver missed the initial
> > +* add_device callback for dev, replay it to get things in order.
> > +*/
> > +   if (!err && dev->bus && !device_iommu_mapped(dev))
> > +   err = iommu_probe_device(dev);
> Previously we had:
>     if (!err) {
>         ops = iort_fwspec_iommu_ops(dev);
>         err = iort_add_device_replay(dev);
>     }
> 
> Please can you explain the transform? I see the
> 
> acpi_iommu_fwspec_ops call below but it is not straightforward to me.

I figured that iort_add_device_replay() is only used once and is
sufficiently simple to be inlined manually (saving 10 lines). Then I
replaced the ops assignment with returns, which saves another line and may
be slightly clearer?  I guess it's mostly a matter of taste, the behavior
should be exactly the same.

> Also the comment mentions replay. Unsure if it is still OK.

The "replay" part is, but "add_device" isn't accurate because it has since
been replaced by probe_device. I'll refresh the comment.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v13 4/6] iommu/vt-d: Add support for IOMMU default DMA mode build options

2021-06-18 Thread John Garry

On 18/06/2021 02:46, Lu Baolu wrote:

Hi baolu,


I need to change that. How about this:

bool print_warning = false;

for_each_active_iommu(iommu, drhd) {
 /*
  * The flush queue implementation does not perform
  * page-selective invalidations that are required for efficient
  * TLB flushes in virtual environments.  The benefit of batching
  * is likely to be much lower than the overhead of synchronizing
  * the virtual and physical IOMMU page-tables.
  */
 if (!print_warning && cap_caching_mode(iommu->cap)) {
 pr_warn("IOMMU batching disallowed due to virtualization\n");
 iommu_set_dma_strict(true);
 print_warning = true;
 }
 ...
}

or use pr_warn_once().


 From my p.o.v., pr_warn_once() is better.

How about using pr_info_once()? I don't think it's a warning; it's
just a policy choice in a VM environment.


ok, I can go with that, which Robin mostly agrees with.
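
For illustration, with pr_info_once() the per-loop flag in the earlier
sketch is no longer needed; roughly (same assumptions as the snippet above):

for_each_active_iommu(iommu, drhd) {
	/*
	 * The flush queue implementation does not perform the
	 * page-selective invalidations that are required for efficient
	 * TLB flushes in virtual environments, so fall back to strict
	 * mode when running under a hypervisor.
	 */
	if (cap_caching_mode(iommu->cap)) {
		pr_info_once("IOMMU batching disallowed due to virtualization\n");
		iommu_set_dma_strict(true);
	}
	/* ... */
}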

Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [bug report] iommu/vt-d: Allocate/register iopf queue for sva devices

2021-06-18 Thread Lu Baolu

Thanks for letting me know about this. We already have a fix in linux-next.

commit 90141a0fb72340937ed9ec05301cc548901d8eec
Author: Colin Ian King 
Date:   Fri Jun 11 14:50:24 2021 +0100

iommu/vt-d: Fix dereference of pointer info before it is null checked

    The assignment of iommu from info->iommu occurs before info is null
    checked, hence leading to a potential null pointer dereference issue.
    Fix this by assigning iommu and checking if iommu is null after null
    checking info.

    Fixes: 4c82b88696ac ("iommu/vt-d: Allocate/register iopf queue for sva devices")

    Signed-off-by: Colin Ian King 
    Addresses-Coverity: ("Dereference before null check")
    Link: https://lore.kernel.org/r/20210611135024.32781-1-colin.k...@canonical.com

Signed-off-by: Lu Baolu 
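
The shape of that fix is roughly the following (sketch, not the literal
commit): dereference info->iommu only after info has been null checked.

static int intel_iommu_enable_sva(struct device *dev)
{
	struct device_domain_info *info = get_domain_info(dev);
	struct intel_iommu *iommu;

	if (!info || dmar_disabled)
		return -EINVAL;

	iommu = info->iommu;
	if (!iommu)
		return -EINVAL;

	/* ... */
}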

Best regards,
baolu

On 2021/6/18 14:55, Dan Carpenter wrote:

Hello Lu Baolu,

This is a semi-automatic email about new static checker warnings.

The patch 4c82b88696ac: "iommu/vt-d: Allocate/register iopf queue for
sva devices" from Jun 10, 2021, leads to the following Smatch
complaint:

 drivers/iommu/intel/iommu.c:5335 intel_iommu_enable_sva()
 warn: variable dereferenced before check 'info' (see line 5332)

drivers/iommu/intel/iommu.c
   5331 struct device_domain_info *info = get_domain_info(dev);
   5332 struct intel_iommu *iommu = info->iommu;
 ^^^
Dereferenced

   5333 int ret;
   5334 
   5335 if (!info || !iommu || dmar_disabled)
 ^
Checked too late.  


   5336 return -EINVAL;
   5337 

regards,
dan carpenter


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[bug report] iommu/vt-d: Allocate/register iopf queue for sva devices

2021-06-18 Thread Dan Carpenter
Hello Lu Baolu,

This is a semi-automatic email about new static checker warnings.

The patch 4c82b88696ac: "iommu/vt-d: Allocate/register iopf queue for 
sva devices" from Jun 10, 2021, leads to the following Smatch 
complaint:

drivers/iommu/intel/iommu.c:5335 intel_iommu_enable_sva()
warn: variable dereferenced before check 'info' (see line 5332)

drivers/iommu/intel/iommu.c
  5331  struct device_domain_info *info = get_domain_info(dev);
  5332  struct intel_iommu *iommu = info->iommu;
^^^
Dereferenced

  5333  int ret;
  5334  
  5335  if (!info || !iommu || dmar_disabled)
^
Checked too late.  


  5336  return -EINVAL;
  5337  

regards,
dan carpenter
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions

2021-06-18 Thread Claire Chang
On Fri, Jun 18, 2021 at 7:30 AM Stefano Stabellini
 wrote:
>
> On Thu, 17 Jun 2021, Claire Chang wrote:
> > Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
> > initialization to make the code reusable.
> >
> > Signed-off-by: Claire Chang 
> > Reviewed-by: Christoph Hellwig 
> > Tested-by: Stefano Stabellini 
> > Tested-by: Will Deacon 
> > ---
> >  kernel/dma/swiotlb.c | 50 ++--
> >  1 file changed, 25 insertions(+), 25 deletions(-)
> >
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 52e2ac526757..47bb2a766798 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
> >   memset(vaddr, 0, bytes);
> >  }
> >
> > -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> > verbose)
> > +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t 
> > start,
> > + unsigned long nslabs, bool late_alloc)
> >  {
> > + void *vaddr = phys_to_virt(start);
> >   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> > +
> > + mem->nslabs = nslabs;
> > + mem->start = start;
> > + mem->end = mem->start + bytes;
> > + mem->index = 0;
> > + mem->late_alloc = late_alloc;
> > + spin_lock_init(&mem->lock);
> > + for (i = 0; i < mem->nslabs; i++) {
> > + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> > + mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> > + mem->slots[i].alloc_size = 0;
> > + }
> > + memset(vaddr, 0, bytes);
> > +}
> > +
> > +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> > verbose)
> > +{
> >   struct io_tlb_mem *mem;
> >   size_t alloc_size;
> >
> > @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned 
> > long nslabs, int verbose)
> >   if (!mem)
> >   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> > __func__, alloc_size, PAGE_SIZE);
> > - mem->nslabs = nslabs;
> > - mem->start = __pa(tlb);
> > - mem->end = mem->start + bytes;
> > - mem->index = 0;
> > - spin_lock_init(&mem->lock);
> > - for (i = 0; i < mem->nslabs; i++) {
> > - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> > - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> > - mem->slots[i].alloc_size = 0;
> > - }
> > +
> > + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
> >
> >   io_tlb_default_mem = mem;
> >   if (verbose)
> > @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
> >  int
> >  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
> >  {
> > - unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> >   struct io_tlb_mem *mem;
> > + unsigned long bytes = nslabs << IO_TLB_SHIFT;
> >
> >   if (swiotlb_force == SWIOTLB_NO_FORCE)
> >   return 0;
> > @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long 
> > nslabs)
> >   if (!mem)
> >   return -ENOMEM;
> >
> > - mem->nslabs = nslabs;
> > - mem->start = virt_to_phys(tlb);
> > - mem->end = mem->start + bytes;
> > - mem->index = 0;
> > - mem->late_alloc = 1;
> > - spin_lock_init(&mem->lock);
> > - for (i = 0; i < mem->nslabs; i++) {
> > - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> > - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> > - mem->slots[i].alloc_size = 0;
> > - }
> > -
> > + memset(mem, 0, sizeof(*mem));
> > + swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
> >   set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
> > - memset(tlb, 0, bytes);
>
> This is good for swiotlb_late_init_with_tbl. However I have just noticed
> that mem could also be allocated from swiotlb_init_with_tbl, in which
> case the zeroing is missing. I think we need another memset in
> swiotlb_init_with_tbl as well. Or maybe it could be better to have a
> single memset at the beginning of swiotlb_init_io_tlb_mem instead. Up to
> you.

swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem
and memblock_alloc[1] will do memset in memblock_alloc_try_nid[2], so
swiotlb_init_with_tbl is also good.
I'm happy to add the memset in swiotlb_init_io_tlb_mem if you think
it's clearer and safer.

[1] 
https://elixir.bootlin.com/linux/v5.13-rc6/source/include/linux/memblock.h#L407
[2] https://elixir.bootlin.com/linux/v5.13-rc6/source/mm/memblock.c#L1555
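
For reference, the "single memset" variant Stefano mentions would be a
one-liner at the top of the helper, e.g. (sketch only; the slots[] entries
are still initialized by the loop that follows):

static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
				    unsigned long nslabs, bool late_alloc)
{
	void *vaddr = phys_to_virt(start);
	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;

	memset(mem, 0, sizeof(*mem));	/* zero the fixed part of the struct */
	mem->nslabs = nslabs;
	/* ... rest as in the patch above, including memset(vaddr, 0, bytes) ... */
}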
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu