Re: [PATCH v13 00/12] Restricted DMA
v14: https://lore.kernel.org/patchwork/cover/1448954/
[PATCH v14 12/12] of: Add plumbing for restricted DMA pool
If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA when the restricted-dma-pool is present.

Signed-off-by: Claire Chang
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
---
 drivers/of/address.c    | 33 +
 drivers/of/device.c     |  3 +++
 drivers/of/of_private.h |  6 ++
 3 files changed, 42 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 73ddf2540f3f..cdf700fba5c4 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1022,6 +1023,38 @@ int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map)
 	of_node_put(node);
 	return ret;
 }
+
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
+{
+	struct device_node *node, *of_node = dev->of_node;
+	int count, i;
+
+	count = of_property_count_elems_of_size(of_node, "memory-region",
+						sizeof(u32));
+	/*
+	 * If dev->of_node doesn't exist or doesn't contain memory-region, try
+	 * the OF node having DMA configuration.
+	 */
+	if (count <= 0) {
+		of_node = np;
+		count = of_property_count_elems_of_size(
+			of_node, "memory-region", sizeof(u32));
+	}
+
+	for (i = 0; i < count; i++) {
+		node = of_parse_phandle(of_node, "memory-region", i);
+		/*
+		 * There might be multiple memory regions, but only one
+		 * restricted-dma-pool region is allowed.
+		 */
+		if (of_device_is_compatible(node, "restricted-dma-pool") &&
+		    of_device_is_available(node))
+			return of_reserved_mem_device_init_by_idx(dev, of_node,
+								  i);
+	}
+
+	return 0;
+}
 #endif /* CONFIG_HAS_DMA */
 
 /**
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 6cb86de404f1..e68316836a7a 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
 
 	arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 
+	if (!iommu)
+		return of_dma_set_restricted_buffer(dev, np);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d9e6a324de0a..25cebbed5f02 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -161,12 +161,18 @@ struct bus_dma_region;
 #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
 int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np);
 #else
 static inline int of_dma_get_range(struct device_node *np,
 		const struct bus_dma_region **map)
 {
 	return -ENODEV;
 }
+static inline int of_dma_set_restricted_buffer(struct device *dev,
+					       struct device_node *np)
+{
+	return -ENODEV;
+}
 #endif
 
 #endif /* _LINUX_OF_PRIVATE_H */
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH v14 11/12] dt-bindings: of: Add restricted DMA pool
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.

Signed-off-by: Claire Chang
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
---
 .../reserved-memory/reserved-memory.txt | 36 +--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..39b5f4c5a511 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
           used as a shared pool of DMA buffers for a set of devices. It can
           be used by an operating system to instantiate the necessary pool
           management subsystem if necessary.
+        - restricted-dma-pool: This indicates a region of memory meant to be
+          used as a pool of restricted DMA buffers for a set of devices. The
+          memory region would be the only region accessible to those devices.
+          When using this, the no-map and reusable properties must not be
+          set, so the operating system can create a virtual mapping that
+          will be used for synchronization. The main purpose for restricted
+          DMA is to mitigate the lack of DMA access control on systems
+          without an IOMMU, which could result in the DMA accessing the
+          system memory at unexpected times and/or unexpected addresses,
+          possibly leading to data leakage or corruption. The feature on its
+          own provides a basic level of protection against the DMA
+          overwriting buffer contents at unexpected times. However, to
+          protect against general data leakage and system memory corruption,
+          the system needs to provide a way to lock down the memory access,
+          e.g., MPU. Note that since coherent allocation needs remapping,
+          one must set up another device coherent pool by shared-dma-pool
+          and use dma_alloc_from_dev_coherent instead for atomic coherent
+          allocation.
        - vendor specific string in the form <vendor>,[<device>-]<usage>
 no-map (optional) - empty property
     - Indicates the operating system must not create a virtual mapping
@@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for each corresponding
 
 Example
 -------
-This example defines 3 contiguous regions are defined for Linux kernel:
+This example defines 4 contiguous regions for Linux kernel:
 one default of all device drivers (named linux,cma@72000000 and 64MiB in size),
-one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB), and
-one for multimedia processing (named multimedia-memory@77000000, 64MiB).
+one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB),
+one for multimedia processing (named multimedia-memory@77000000, 64MiB), and
+one for restricted dma pool (named restricted_dma_reserved@0x50000000, 64MiB).
 
 / {
 	#address-cells = <1>;
 	#size-cells = <1>;
@@ -120,6 +138,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
 		compatible = "acme,multimedia-memory";
 		reg = <0x77000000 0x4000000>;
 	};
+
+	restricted_dma_reserved: restricted_dma_reserved {
+		compatible = "restricted-dma-pool";
+		reg = <0x50000000 0x4000000>;
+	};
 };
 
 /* ... */
 
@@ -138,4 +161,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
 		memory-region = <&multimedia_reserved>;
 		/* ... */
 	};
+
+	pcie_device: pcie_device@0,0 {
+		reg = <0x83010000 0x0 0x00000000 0x0 0x00100000
+		       0x83010000 0x0 0x00100000 0x0 0x00100000>;
+		memory-region = <&restricted_dma_reserved>;
+		/* ... */
+	};
 };
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH v14 10/12] swiotlb: Add restricted DMA pool initialization
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.

Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
---
 include/linux/swiotlb.h |  3 +-
 kernel/dma/Kconfig      | 14 ++++++
 kernel/dma/swiotlb.c    | 76 +
 3 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index a73fad460162..175b6c113ed8 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force;
  *		range check to see if the memory was in fact allocated by this
  *		API.
  * @nslabs:	The number of IO TLB blocks (in groups of 64) between @start and
- *		@end. This is command line adjustable via setup_io_tlb_npages.
+ *		@end. For default swiotlb, this is command line adjustable via
+ *		setup_io_tlb_npages.
  * @used:	The number of used IO TLB block.
  * @list:	The free list describing the number of free entries available
  *		from each index.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
 	bool
 	select NEED_DMA_MAP_STATE
 
+config DMA_RESTRICTED_POOL
+	bool "DMA Restricted Pool"
+	depends on OF && OF_RESERVED_MEM
+	select SWIOTLB
+	help
+	  This enables support for restricted DMA pools which provide a level of
+	  DMA memory protection on systems with limited hardware protection
+	  capabilities, such as those lacking an IOMMU.
+
+	  For more information see
+	  <Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt>
+	  and <kernel/dma/swiotlb.c>.
+	  If unsure, say "n".
+
 #
 # Should be selected if we can mmap non-coherent mappings to userspace.
 # The only thing that is really required is a way to set an uncached bit
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 273b21090ee8..1aef294c82b5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
 #ifdef CONFIG_DEBUG_FS
 #include
 #endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include
+#include
+#include
+#include
+#include
+#endif
 
 #include
 #include
@@ -736,4 +743,73 @@ bool swiotlb_free(struct device *dev, struct page *page, size_t size)
 	return true;
 }
 
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+				    struct device *dev)
+{
+	struct io_tlb_mem *mem = rmem->priv;
+	unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+	/*
+	 * Since multiple devices can share the same pool, the private data,
+	 * io_tlb_mem struct, will be initialized by the first device attached
+	 * to it.
+	 */
+	if (!mem) {
+		mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+		if (!mem)
+			return -ENOMEM;
+
+		set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
+				     rmem->size >> PAGE_SHIFT);
+		swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
+		mem->force_bounce = true;
+		mem->for_alloc = true;
+
+		rmem->priv = mem;
+
+		if (IS_ENABLED(CONFIG_DEBUG_FS)) {
+			mem->debugfs =
+				debugfs_create_dir(rmem->name, debugfs_dir);
+			swiotlb_create_debugfs_files(mem);
+		}
+	}
+
+	dev->dma_io_tlb_mem = mem;
+
+	return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+					struct device *dev)
+{
+	dev->dma_io_tlb_mem = io_tlb_default_mem;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+	.device_init = rmem_swiotlb_device_init,
+	.device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+	unsigned long node = rmem->fdt_node;
+
+	if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+	    of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+	    of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+	    of_get_flat_dt_prop(node, "no-map", NULL))
+		return -EINVAL;
+
+	if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
+		pr_err("Restricted DMA pool must be accessible within the linear mapping.");
+		return -EINVAL;
+	}
+
+	rmem->ops = &rmem_swiotlb_ops;
+	return 0;
+}
+
+RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);
[PATCH v14 09/12] swiotlb: Add restricted DMA alloc/free support
Add the functions, swiotlb_{alloc,free} and is_swiotlb_for_alloc, to
support memory allocation from the restricted DMA pool. The restricted
DMA pool is preferred if available.

Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
Acked-by: Stefano Stabellini
---
 include/linux/swiotlb.h | 26 ++++++
 kernel/dma/direct.c     | 49 ++-
 kernel/dma/swiotlb.c    | 38 ++--
 3 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8d8855c77d9a..a73fad460162 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -85,6 +85,7 @@ extern enum swiotlb_force swiotlb_force;
  * @debugfs:	The dentry to debugfs.
  * @late_alloc:	%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
+ * @for_alloc:	%true if the pool is used for memory allocation
  */
 struct io_tlb_mem {
 	phys_addr_t start;
@@ -96,6 +97,7 @@ struct io_tlb_mem {
 	struct dentry *debugfs;
 	bool late_alloc;
 	bool force_bounce;
+	bool for_alloc;
 	struct io_tlb_slot {
 		phys_addr_t orig_addr;
 		size_t alloc_size;
@@ -156,4 +158,28 @@ static inline void swiotlb_adjust_size(unsigned long size)
 extern void swiotlb_print_info(void);
 extern void swiotlb_set_max_segment(unsigned int);
 
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+	return dev->dma_io_tlb_mem->for_alloc;
+}
+#else
+static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+	return NULL;
+}
+static inline bool swiotlb_free(struct device *dev, struct page *page,
+				size_t size)
+{
+	return false;
+}
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+	return false;
+}
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a92465b4eb12..2de33e5d302b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 		min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+				    size_t size)
+{
+	if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+	    swiotlb_free(dev, page, size))
+		return;
+	dma_free_contiguous(dev, page, size);
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 		gfp_t gfp)
 {
@@ -86,6 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 	gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
 					   &phys_limit);
+	if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+	    is_swiotlb_for_alloc(dev)) {
+		page = swiotlb_alloc(dev, size);
+		if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+			__dma_direct_free_pages(dev, page, size);
+			return NULL;
+		}
+		return page;
+	}
+
 	page = dma_alloc_contiguous(dev, size, gfp);
 	if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
 		dma_free_contiguous(dev, page, size);
@@ -142,7 +161,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 		gfp |= __GFP_NOWARN;
 
 	if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-	    !force_dma_unencrypted(dev)) {
+	    !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
 		page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
 		if (!page)
 			return NULL;
@@ -155,18 +174,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
 	}
 
 	if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-	    !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-	    !dev_is_dma_coherent(dev))
+	    !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+	    !is_swiotlb_for_alloc(dev))
 		return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
 	/*
 	 * Remapping or decrypting memory may block. If either is required and
 	 * we can't block, allocate the memory from the atomic pools.
+	 * If restricted DMA (i.e., is_swiotlb_for_alloc) is required, one must
+	 * set up another device coherent pool by shared-dma-pool and use
+	 * dma_alloc_from_dev_coherent instead.
 	 */
[PATCH v14 08/12] swiotlb: Refactor swiotlb_tbl_unmap_single
Add a new function, swiotlb_release_slots, to make the code reusable for
supporting different bounce buffer pools.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
---
 kernel/dma/swiotlb.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index daf38a52e66d..e79383df5d4a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -556,27 +556,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
-			      size_t mapping_size, enum dma_data_direction dir,
-			      unsigned long attrs)
+static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 {
-	struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
+	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	unsigned long flags;
-	unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+	unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
 	int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
 	int nslots = nr_slots(mem->slots[index].alloc_size + offset);
 	int count, i;
 
-	/*
-	 * First, sync the memory before unmapping the entry
-	 */
-	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-	    (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-		swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
 	/*
 	 * Return the buffer to the free list by setting the corresponding
 	 * entries to indicate the number of contiguous entries available.
@@ -611,6 +599,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 	spin_unlock_irqrestore(&mem->lock, flags);
 }
 
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+			      size_t mapping_size, enum dma_data_direction dir,
+			      unsigned long attrs)
+{
+	/*
+	 * First, sync the memory before unmapping the entry
+	 */
+	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+	    (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+		swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+	swiotlb_release_slots(dev, tlb_addr);
+}
+
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
 		size_t size, enum dma_data_direction dir)
 {
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH v14 07/12] swiotlb: Move alloc_size to swiotlb_find_slots
Rename find_slots to swiotlb_find_slots and move the maintenance of
alloc_size to it for better code reusability later.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
---
 kernel/dma/swiotlb.c | 16
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0d294bbf274c..daf38a52e66d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -432,8 +432,8 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int find_slots(struct device *dev, phys_addr_t orig_addr,
-		      size_t alloc_size)
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+			      size_t alloc_size)
 {
 	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	unsigned long boundary_mask = dma_get_seg_boundary(dev);
@@ -488,8 +488,11 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
 	return -1;
 
 found:
-	for (i = index; i < index + nslots; i++)
+	for (i = index; i < index + nslots; i++) {
 		mem->slots[i].list = 0;
+		mem->slots[i].alloc_size =
+			alloc_size - ((i - index) << IO_TLB_SHIFT);
+	}
 	for (i = index - 1;
 	     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 	     mem->slots[i].list; i--)
@@ -530,7 +533,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		return (phys_addr_t)DMA_MAPPING_ERROR;
 	}
 
-	index = find_slots(dev, orig_addr, alloc_size + offset);
+	index = swiotlb_find_slots(dev, orig_addr, alloc_size + offset);
 	if (index == -1) {
 		if (!(attrs & DMA_ATTR_NO_WARN))
 			dev_warn_ratelimited(dev,
@@ -544,11 +547,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	 * This is needed when we sync the memory. Then we sync the buffer if
 	 * needed.
 	 */
-	for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+	for (i = 0; i < nr_slots(alloc_size + offset); i++)
 		mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
-		mem->slots[index + i].alloc_size =
-			alloc_size - (i << IO_TLB_SHIFT);
-	}
 	tlb_addr = slot_addr(mem->start, index) + offset;
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH v14 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
use it to determine whether to bounce the data or not. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
Acked-by: Stefano Stabellini
---
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   | 11 +++
 kernel/dma/direct.c       |  2 +-
 kernel/dma/direct.h       |  2 +-
 kernel/dma/swiotlb.c      |  4
 5 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 0c6ed09f8513..4730a146fa35 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -369,7 +369,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 	if (dma_capable(dev, dev_addr, size, true) &&
 	    !range_straddles_page_boundary(phys, size) &&
 		!xen_arch_need_swiotlb(dev, phys, dev_addr) &&
-		swiotlb_force != SWIOTLB_FORCE)
+		!is_swiotlb_force_bounce(dev))
 		goto done;
 
 	/*
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index dd1c30a83058..8d8855c77d9a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
  *		unmap calls.
  * @debugfs:	The dentry to debugfs.
  * @late_alloc:	%true if allocated using the page allocator
+ * @force_bounce: %true if swiotlb bouncing is forced
  */
 struct io_tlb_mem {
 	phys_addr_t start;
@@ -94,6 +95,7 @@ struct io_tlb_mem {
 	spinlock_t lock;
 	struct dentry *debugfs;
 	bool late_alloc;
+	bool force_bounce;
 	struct io_tlb_slot {
 		phys_addr_t orig_addr;
 		size_t alloc_size;
@@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 	return mem && paddr >= mem->start && paddr < mem->end;
 }
 
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+	return dev->dma_io_tlb_mem->force_bounce;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
 	return false;
 }
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+	return false;
+}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..a92465b4eb12 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 {
 	/* If SWIOTLB is active, use its maximum mapping size */
 	if (is_swiotlb_active(dev) &&
-	    (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+	    (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
 		return swiotlb_max_mapping_size(dev);
 	return SIZE_MAX;
 }
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..4632b0f4f72e 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
 	phys_addr_t phys = page_to_phys(page) + offset;
 	dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-	if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+	if (is_swiotlb_force_bounce(dev))
 		return swiotlb_map(dev, phys, size, dir, attrs);
 
 	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8a120f42340b..0d294bbf274c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 	mem->end = mem->start + bytes;
 	mem->index = 0;
 	mem->late_alloc = late_alloc;
+
+	if (swiotlb_force == SWIOTLB_FORCE)
+		mem->force_bounce = true;
+
 	spin_lock_init(&mem->lock);
 	for (i = 0; i < mem->nslabs; i++) {
 		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH v14 05/12] swiotlb: Update is_swiotlb_active to add a struct device argument
Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
Acked-by: Stefano Stabellini
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c        | 2 +-
 drivers/pci/xen-pcifront.c                   | 2 +-
 include/linux/swiotlb.h                      | 4 ++--
 kernel/dma/direct.c                          | 2 +-
 kernel/dma/swiotlb.c                         | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index a9d65fc8aa0e..4b7afa0fc85d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
 	max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-	if (is_swiotlb_active()) {
+	if (is_swiotlb_active(obj->base.dev->dev)) {
 		unsigned int max_segment;
 
 		max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 9662522aa066..be15bfd9e0ee 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
 	}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-	need_swiotlb = is_swiotlb_active();
+	need_swiotlb = is_swiotlb_active(dev->dev);
 #endif
 
 	ret = ttm_bo_device_init(&drm->ttm.bdev, &nouveau_bo_driver,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..0d56985bfe81 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev)
 
 	spin_unlock(&pcifront_dev_lock);
 
-	if (!err && !is_swiotlb_active()) {
+	if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
 		err = pci_xen_swiotlb_init_late();
 		if (err)
 			dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d1f3d95881cd..dd1c30a83058 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -112,7 +112,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
@@ -132,7 +132,7 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev)
 	return SIZE_MAX;
 }
 
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
 {
 	return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
 	/* If SWIOTLB is active, use its maximum mapping size */
-	if (is_swiotlb_active() &&
+	if (is_swiotlb_active(dev) &&
 	    (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
 		return swiotlb_max_mapping_size(dev);
 	return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 72a4289faed1..8a120f42340b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -664,9 +664,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
 	return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
 {
-	return io_tlb_default_mem != NULL;
+	return dev->dma_io_tlb_mem != NULL;
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH v14 04/12] swiotlb: Update is_swiotlb_buffer to add a struct device argument
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
Acked-by: Stefano Stabellini
---
 drivers/iommu/dma-iommu.c | 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  7 ---
 kernel/dma/direct.c       |  6 +++---
 kernel/dma/direct.h       |  6 +++---
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3087d9fa6065..10997ef541f8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -507,7 +507,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
 
 	__iommu_dma_unmap(dev, dma_addr, size);
 
-	if (unlikely(is_swiotlb_buffer(phys)))
+	if (unlikely(is_swiotlb_buffer(dev, phys)))
 		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -578,7 +578,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
 	}
 
 	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
 		swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
 	return iova;
 }
@@ -749,7 +749,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
 	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_cpu(phys, size, dir);
 
-	if (is_swiotlb_buffer(phys))
+	if (is_swiotlb_buffer(dev, phys))
 		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -762,7 +762,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-	if (is_swiotlb_buffer(phys))
+	if (is_swiotlb_buffer(dev, phys))
 		swiotlb_sync_single_for_device(dev, phys, size, dir);
 
 	if (!dev_is_dma_coherent(dev))
@@ -783,7 +783,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
 		if (!dev_is_dma_coherent(dev))
 			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-		if (is_swiotlb_buffer(sg_phys(sg)))
+		if (is_swiotlb_buffer(dev, sg_phys(sg)))
 			swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
 						    sg->length, dir);
 	}
@@ -800,7 +800,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
 		return;
 
 	for_each_sg(sgl, sg, nelems, i) {
-		if (is_swiotlb_buffer(sg_phys(sg)))
+		if (is_swiotlb_buffer(dev, sg_phys(sg)))
 			swiotlb_sync_single_for_device(dev, sg_phys(sg),
 						       sg->length, dir);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 4c89afc0df62..0c6ed09f8513 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 	 * in our domain. Therefore _only_ check address within our domain.
 	 */
 	if (pfn_valid(PFN_DOWN(paddr)))
-		return is_swiotlb_buffer(paddr);
+		return is_swiotlb_buffer(dev, paddr);
 	return 0;
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..d1f3d95881cd 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include
 #include
 #include
 #include
@@ -101,9 +102,9 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
+	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
 	return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -115,7 +116,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
 	return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
 	for_each_sg(sgl, sg, nents, i) {
 		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-		if (unlikely(is_swiotlb_buffer(paddr)))
+		if (unlikely(is_swiotlb_buffer(dev, paddr)))
[PATCH v14 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
Always have the pointer to the swiotlb pool used in struct device. This
could help simplify the code for other pools.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
Acked-by: Stefano Stabellini
---
 drivers/base/core.c    | 4 ++++
 include/linux/device.h | 4 ++++
 kernel/dma/swiotlb.c   | 8 ++++----
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index f29839382f81..cb3123e3954d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include /* for dma_default_coherent */
 
@@ -2736,6 +2737,9 @@ void device_initialize(struct device *dev)
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
 	dev->dma_coherent = dma_default_coherent;
 #endif
+#ifdef CONFIG_SWIOTLB
+	dev->dma_io_tlb_mem = io_tlb_default_mem;
+#endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index ba660731bd25..240d652a0696 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,6 +416,7 @@ struct dev_links_info {
 * @dma_pools:	Dma pools (if dma'ble device).
 * @dma_mem:	Internal for coherent mem override.
 * @cma_area:	Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Pointer to the swiotlb pool used. Not for driver use.
 * @archdata:	For arch-specific additions.
 * @of_node:	Associated device tree node.
 * @fwnode:	Associated device node supplied by platform firmware.
@@ -518,6 +519,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
 	struct cma *cma_area;		/* contiguous memory area for dma
 					   allocations */
+#endif
+#ifdef CONFIG_SWIOTLB
+	struct io_tlb_mem *dma_io_tlb_mem;
 #endif
 	/* arch specific additions */
 	struct dev_archdata	archdata;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ede66df6835b..72a4289faed1 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -340,7 +340,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
 			   enum dma_data_direction dir)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
+	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
 	unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
 	phys_addr_t orig_addr = mem->slots[index].orig_addr;
@@ -431,7 +431,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
 		      size_t alloc_size)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
+	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	unsigned long boundary_mask = dma_get_seg_boundary(dev);
 	dma_addr_t tbl_dma_addr =
 		phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -508,7 +508,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		size_t mapping_size, size_t alloc_size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
+	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
 	unsigned int i;
 	int index;
@@ -559,7 +559,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 			      size_t mapping_size, enum dma_data_direction dir,
 			      unsigned long attrs)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
+	struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
 	unsigned long flags;
 	unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
 	int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH v14 02/12] swiotlb: Refactor swiotlb_create_debugfs
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
---
 kernel/dma/swiotlb.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1f9b2b9e7490..ede66df6835b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -671,19 +671,26 @@ bool is_swiotlb_active(void)
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
+static struct dentry *debugfs_dir;
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
-
-	if (!mem)
-		return 0;
-	mem->debugfs = debugfs_create_dir("swiotlb", NULL);
 	debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
 	debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+	struct io_tlb_mem *mem = io_tlb_default_mem;
+
+	debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+	if (mem) {
+		mem->debugfs = debugfs_dir;
+		swiotlb_create_debugfs_files(mem);
+	}
 	return 0;
 }
 
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 
 #endif
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH v14 01/12] swiotlb: Refactor swiotlb init functions
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
Tested-by: Stefano Stabellini
Tested-by: Will Deacon
---
 kernel/dma/swiotlb.c | 50 ++--
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 52e2ac526757..1f9b2b9e7490 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
 	memset(vaddr, 0, bytes);
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+				    unsigned long nslabs, bool late_alloc)
 {
+	void *vaddr = phys_to_virt(start);
 	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+	mem->nslabs = nslabs;
+	mem->start = start;
+	mem->end = mem->start + bytes;
+	mem->index = 0;
+	mem->late_alloc = late_alloc;
+	spin_lock_init(&mem->lock);
+	for (i = 0; i < mem->nslabs; i++) {
+		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+		mem->slots[i].alloc_size = 0;
+	}
+	memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
 	struct io_tlb_mem *mem;
 	size_t alloc_size;
 
@@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 	if (!mem)
 		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
 		      __func__, alloc_size, PAGE_SIZE);
-	mem->nslabs = nslabs;
-	mem->start = __pa(tlb);
-	mem->end = mem->start + bytes;
-	mem->index = 0;
-	spin_lock_init(&mem->lock);
-	for (i = 0; i < mem->nslabs; i++) {
-		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-		mem->slots[i].alloc_size = 0;
-	}
+
+	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
 
 	io_tlb_default_mem = mem;
 	if (verbose)
@@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
-	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
 	struct io_tlb_mem *mem;
+	unsigned long bytes = nslabs << IO_TLB_SHIFT;
 
 	if (swiotlb_force == SWIOTLB_NO_FORCE)
 		return 0;
@@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 	if (!mem)
 		return -ENOMEM;
 
-	mem->nslabs = nslabs;
-	mem->start = virt_to_phys(tlb);
-	mem->end = mem->start + bytes;
-	mem->index = 0;
-	mem->late_alloc = 1;
-	spin_lock_init(&mem->lock);
-	for (i = 0; i < mem->nslabs; i++) {
-		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-		mem->slots[i].alloc_size = 0;
-	}
-
+	memset(mem, 0, sizeof(*mem));
 	set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
-	memset(tlb, 0, bytes);
+	swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
 
 	io_tlb_default_mem = mem;
 	swiotlb_print_info();
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH v14 00/12] Restricted DMA
This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus
is not behind an IOMMU. As PCI-e, by design, gives the device full
access to system memory, a vulnerability in the Wi-Fi firmware could
easily escalate to a full system exploit (remote wifi exploits: [1a],
[1b] that shows a full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA.
Restricted DMA utilizes the existing swiotlb to bounce streaming DMA in
and out of a specially allocated region and does memory allocation from
the same region. The feature on its own provides a basic level of
protection against the DMA overwriting buffer contents at unexpected
times. However, to protect against general data leakage and system
memory corruption, the system needs to provide a way to restrict the DMA
to a predefined memory region (this is usually done at firmware level,
e.g. MPU in ATF on some ARM platforms [4]).

[1a] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

v14:
- Move set_memory_decrypted before swiotlb_init_io_tlb_mem (patch 01/12, 10/12)
- Add Stefano's Acked-by tag from v13

v13:
- Fix xen-swiotlb issues
  - memset in patch 01/12
  - is_swiotlb_force_bounce in patch 06/12
- Fix the dts example typo in reserved-memory.txt
- Add Stefano and Will's Tested-by tag from v12
https://lore.kernel.org/patchwork/cover/1448001/

v12: Split is_dev_swiotlb_force into is_swiotlb_force_bounce (patch
06/12) and is_swiotlb_for_alloc (patch 09/12)
https://lore.kernel.org/patchwork/cover/1447254/

v11:
- Rebase against swiotlb devel/for-linus-5.14
- s/mempry/memory/g
- exchange the order of patch 09/12 and 10/12
https://lore.kernel.org/patchwork/cover/1447216/

v10: Address the comments in v9 to
- fix the dev->dma_io_tlb_mem assignment
- propagate swiotlb_force setting into io_tlb_default_mem->force
- move set_memory_decrypted out of swiotlb_init_io_tlb_mem
- move debugfs_dir declaration into the main CONFIG_DEBUG_FS block
- add swiotlb_ prefix to find_slots and release_slots
- merge the 3 alloc/free related patches
- move the CONFIG_DMA_RESTRICTED_POOL later
https://lore.kernel.org/patchwork/cover/1446882/

v9: Address the comments in v7 to
- set swiotlb active pool to dev->dma_io_tlb_mem
- get rid of get_io_tlb_mem
- dig out the device struct for is_swiotlb_active
- move debugfs_create_dir out of swiotlb_create_debugfs
- do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem
- use IS_ENABLED in kernel/dma/direct.c
- fix redefinition of 'of_dma_set_restricted_buffer'
https://lore.kernel.org/patchwork/cover/1445081/

v8:
- Fix reserved-memory.txt and add the reg property in example.
- Fix sizeof for of_property_count_elems_of_size in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Apply Will's suggestion to try the OF node having DMA configuration in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Fix typo in the comment of drivers/of/address.c#of_dma_set_restricted_buffer.
- Add error message for PageHighMem in
  kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
  rmem_swiotlb_setup.
- Fix the message string in rmem_swiotlb_setup.
https://lore.kernel.org/patchwork/cover/1437112/

v7: Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
https://lore.kernel.org/patchwork/cover/1431031/

v6: Address the comments in v5
https://lore.kernel.org/patchwork/cover/1423201/

v5: Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/

v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/

v3: Using only one reserved memory region for both streaming DMA and
memory allocation.
https://lore.kernel.org/patchwork/cover/1360992/

v2: Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1: Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/

Claire Chang (12):
  swiotlb: Refactor swiotlb init functions
  swiotlb: Refactor swiotlb_create_debugfs
  swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
  swiotlb: Move alloc_size to swiotlb_find_slots
  swiotlb: Refactor swiotlb_tbl_unmap_single
  swiotlb: Add restricted DMA alloc/free support
  swiotlb: Add restricted DMA pool initialization
  dt-bindings: of: Add restricted DMA pool
  of: Add plumbing for restricted DMA pool
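To make the mechanism concrete, a minimal device tree fragment using the
binding from patch 11/12 could look like the sketch below; the node
names, addresses, and the Wi-Fi device here are illustrative only, not
taken from the series:

	reserved-memory {
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;

		/* 64 MiB pool: the only memory the Wi-Fi device may reach via DMA */
		wifi_restricted_dma: restricted-dma@50000000 {
			compatible = "restricted-dma-pool";
			reg = <0x50000000 0x4000000>;
		};
	};

	pcie {
		/* ... */
		wifi@0,0 {
			/* streaming DMA and swiotlb allocations bounce through the pool */
			memory-region = <&wifi_restricted_dma>;
			/* ... */
		};
	};

With patch 12/12, of_dma_configure_id() wires this up automatically for
devices that are not behind an IOMMU.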
Re: [PATCHv2 2/3] iommu/io-pgtable: Optimize partial walk flush for large scatter-gather list
Hi,

On Thu, Jun 17, 2021 at 7:51 PM Sai Prakash Ranjan wrote:
>
> Currently for iommu_unmap() of large scatter-gather list with page size
> elements, the majority of time is spent in flushing of partial walks in
> __arm_lpae_unmap() which is a VA based TLB invalidation invalidating
> page-by-page on iommus like arm-smmu-v2 (TLBIVA) which do not support
> range based invalidations like on arm-smmu-v3.2.
>
> For example: to unmap a 32MB scatter-gather list with page size elements
> (8192 entries), there are 16 2MB buffer unmaps based on the pgsize (2MB
> for 4K granule) and each 2MB will further result in 512 TLBIVAs (2MB/4K),
> resulting in a total of 8192 TLBIVAs (512*16) for 16 2MB buffers, causing
> a huge overhead.
>
> So instead use tlb_flush_all() callback (TLBIALL/TLBIASID) to invalidate
> the entire context for partial walk flush on select few platforms where
> cost of over-invalidation is less than unmap latency

It would probably be worth punching this description up a little bit.
Elsewhere you said in more detail why this over-invalidation is less
of a big deal for the Qualcomm SMMU. It's probably worth saying
something like that here, too. Like this bit paraphrased from your
other email:

On qcom impl, we have several performance improvements for TLB cache
invalidations in HW like wait-for-safe (for realtime clients such as
camera and display) and few others to allow for cache lookups/updates
when TLBI is in progress for the same context bank.

> using the newly
> introduced quirk IO_PGTABLE_QUIRK_TLB_INV_ALL. We also do this for
> non-strict mode given its all about over-invalidation saving time on
> individual unmaps and non-deterministic generally.

As per usual I'm mostly clueless, but I don't quite understand why you
want this new behavior for non-strict mode. To me it almost seems like
the opposite? Specifically, non-strict mode is already outside the
critical path today and so there's no need to optimize it. I'm probably
not explaining myself clearly, but I guess I'm thinking:

a) today for strict, unmap is in the critical path and it's important
to get it out of there. Getting it out of the critical path is so
important that we're willing to over-invalidate to speed up the
critical path.

b) today for non-strict, unmap is not in the critical path.

So I would almost expect your patch to _disable_ your new feature for
non-strict mappings, not auto-enable your new feature for non-strict
mappings.

If I'm babbling, feel free to ignore. ;-) Looking back, I guess Robin
was the one that suggested the behavior you're implementing, so it's
more likely he's right than I am. ;-)

-Doug
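For readers following the thread, the tradeoff under discussion can be
sketched roughly as below. This assumes the IO_PGTABLE_QUIRK_TLB_INV_ALL
quirk named in the patch and the iommu_flush_ops callbacks from
include/linux/io-pgtable.h; the function and its placement are
illustrative, not the actual patch:

	/*
	 * Sketch only: choose one full-context invalidation over
	 * page-by-page TLBIVA when the implementation opts in.
	 */
	static void flush_partial_walk(struct io_pgtable *iop, unsigned long iova,
				       size_t size, size_t granule)
	{
		if (iop->cfg.quirks & IO_PGTABLE_QUIRK_TLB_INV_ALL) {
			/* One TLBIALL/TLBIASID: over-invalidates, but O(1) */
			iop->cfg.tlb->tlb_flush_all(iop->cookie);
			return;
		}

		/*
		 * Default: VA-based walk flush, which arm-smmu-v2 turns into
		 * one TLBIVA per page (8192 of them in the 32MB example).
		 */
		iop->cfg.tlb->tlb_flush_walk(iova, size, granule, iop->cookie);
	}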
Re: [PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch
On 2021-06-18 17:12, Ross Philipson wrote:
> The IOMMU should always be set to default translated type after
> the PMRs are disabled to protect the MLE from DMA.
>
> Signed-off-by: Ross Philipson
> ---
>   drivers/iommu/intel/iommu.c | 5 +
>   drivers/iommu/iommu.c       | 6 +-
>   2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index be35284..4f0256d 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -41,6 +41,7 @@
>   #include
>   #include
>   #include
> +#include
>   #include
>   #include
>   #include
> @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device *dev)
>    */
>   static int device_def_domain_type(struct device *dev)
>   {
> +	/* Do not allow identity domain when Secure Launch is configured */
> +	if (slaunch_get_flags() & SL_FLAG_ACTIVE)
> +		return IOMMU_DOMAIN_DMA;

Is this specific to Intel? It seems like it could easily be done
commonly like the check for untrusted external devices.

> +
>   	if (dev_is_pci(dev)) {
>   		struct pci_dev *pdev = to_pci_dev(dev);
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 808ab70d..d49b7dd 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -23,6 +23,7 @@
>   #include
>   #include
>   #include
> +#include
>   #include
>
>   static struct kset *iommu_group_kset;
> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
>   {
>   	if (cmd_line)
>   		iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
> -	iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
> +
> +	/* Do not allow identity domain when Secure Launch is configured */
> +	if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
> +		iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;

Quietly ignoring the setting and possibly leaving iommu_def_domain_type
uninitialised (note that 0 is not actually a usable type) doesn't seem
great. AFAICS this probably warrants similar treatment to the
mem_encrypt_active() case - there doesn't seem a great deal of value in
trying to save users from themselves if they care about measured boot
yet explicitly pass options which may compromise measured boot. If you
really want to go down that route there's at least the sysfs interface
you'd need to nobble as well, not to mention the various ways of
completely disabling IOMMUs...

It might be reasonable to make IOMMU_DEFAULT_PASSTHROUGH depend on
!SECURE_LAUNCH for clarity though.

Robin.

>   }
>
>   void iommu_set_default_translated(bool cmd_line)
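A sketch of that last suggestion, for reference (hypothetical, not part
of the posted patch; SECURE_LAUNCH is the config symbol introduced by
the series, and the prompt text mirrors the existing option in
drivers/iommu/Kconfig):

	# Illustrative only: gate the passthrough default at build time
	# instead of silently overriding it at runtime.
	config IOMMU_DEFAULT_PASSTHROUGH
		bool "IOMMU passthrough by default"
		depends on IOMMU_API
		depends on !SECURE_LAUNCH
		help
		  Enable passthrough by default, removing the need to pass in
		  iommu.passthrough=on or iommu=pt through command line.

This would surface the conflict at configuration time rather than
leaving iommu_def_domain_type quietly unset.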
Re: [RFC] /dev/ioasid uAPI proposal
On Fri, Jun 18, 2021 at 07:03:31PM +0200, Jean-Philippe Brucker wrote:
> configuration. The Arm SMMUs have a lot of small features that
> implementations can mix and match and that a user shouldn't have to care
> about, and there are lots of different implementations by various
> vendors.

This is really something to think about carefully in this RFC - I do
have a guess that a 'HW specific' channel is going to be useful here.

If the goal is for qemu to provide all these fiddly things and they
cannot be SW emulated, then direct access to the fiddly HW native
stuff is possibly necessary.

We've kind of seen this mistake in DRM and RDMA historically.
Attempting to generalize too early, or generalize something that is
really a one off. Better for everyone to just keep it as a one off.

Jason
Re: Plan for /dev/ioasid RFC v2
On Fri, Jun 18, 2021 at 04:57:40PM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Friday, June 18, 2021 8:20 AM
> >
> > On Thu, Jun 17, 2021 at 03:14:52PM -0600, Alex Williamson wrote:
> >
> > > I've referred to this as a limitation of type1, that we can't put
> > > devices within the same group into different address spaces, such as
> > > behind separate vRoot-Ports in a vIOMMU config, but really, who cares?
> > > As isolation support improves we see fewer multi-device groups, this
> > > scenario becomes the exception. Buy better hardware to use the devices
> > > independently.
> >
> > This is basically my thinking too, but my conclusion is that we should
> > not continue to make groups central to the API.
> >
> > As I've explained to David this is actually causing functional
> > problems and mess - and I don't see a clean way to keep groups central
> > but still have the device in control of what is happening. We need
> > this device <-> iommu connection to be direct to robustly model all
> > the things that are in the RFC.
> >
> > To keep groups central someone needs to sketch out how to solve
> > today's mdev SW page table and mdev PASID issues in a clean
> > way. Device centric is my suggestion on how to make it clean, but I
> > haven't heard an alternative??
> >
> > So, I view the purpose of this discussion to scope out what a
> > device-centric world looks like and then if we can securely fit in the
> > legacy non-isolated world on top of that clean future oriented
> > API. Then decide if it is work worth doing or not.
> >
> > To my mind it looks like it is not so bad, granted not every detail is
> > clear, and no code has been sketched, but I don't see a big scary
> > blocker emerging. An extra ioctl or two, some special logic that
> > activates for >1 device groups that looks a lot like VFIO's current
> > logic..
> >
> > At some level I would be perfectly fine if we made the group FD part
> > of the API for >1 device groups - except that complexifies every user
> > space implementation to deal with that. It doesn't feel like a good
> > trade off.
>
> Would it be an acceptable tradeoff by leaving >1 device groups
> supported only via legacy VFIO (which is anyway kept for backward
> compatibility), if we think such scenario is being deprecated over
> time (thus little value to add new features on it)? Then all new
> sub-systems including vdpa and new vfio only support singleton
> device group via /dev/iommu...

That might just be a great idea - userspace has to support those APIs
anyhow, if it can be made trivially obvious to use this fallback even
though /dev/iommu is available it is a great place to start.

It also means PASID/etc are naturally blocked off.

Maybe years down the road we will want to harmonize them, so I would
still sketch it out enough to be confident it could be implemented..

Jason
Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)
> Jianxiong Gao, before spending more time on this, could you also try
> Chanho Park's patch?
> https://lore.kernel.org/linux-iommu/20210510091816.ga2...@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

I have tested Chanho Park's patch and it works for us. The NVMe driver
performs correctly with the patch. I have tested the patch on top of
06af8679449d.

--
Jianxiong Gao
Re: [RFC] /dev/ioasid uAPI proposal
On Thu, Jun 17, 2021 at 01:00:14PM +1000, David Gibson wrote: > On Thu, Jun 10, 2021 at 06:37:31PM +0200, Jean-Philippe Brucker wrote: > > On Tue, Jun 08, 2021 at 04:31:50PM +1000, David Gibson wrote: > > > For the qemu case, I would imagine a two stage fallback: > > > > > > 1) Ask for the exact IOMMU capabilities (including pagetable > > >format) that the vIOMMU has. If the host can supply, you're > > >good > > > > > > 2) If not, ask for a kernel managed IOAS. Verify that it can map > > >all the IOVA ranges the guest vIOMMU needs, and has an equal or > > >smaller pagesize than the guest vIOMMU presents. If so, > > >software emulate the vIOMMU by shadowing guest io pagetable > > >updates into the kernel managed IOAS. > > > > > > 3) You're out of luck, don't start. > > > > > > For both (1) and (2) I'd expect it to be asking this question *after* > > > saying what devices are attached to the IOAS, based on the virtual > > > hardware configuration. That doesn't cover hotplug, of course, for > > > that you have to just fail the hotplug if the new device isn't > > > supportable with the IOAS you already have. > > > > Yes. So there is a point in time when the IOAS is frozen, and cannot take > > in new incompatible devices. I think that can support the usage I had in > > mind. If the VMM (non-QEMU, let's say) wanted to create one IOASID FD per > > feature set it could bind the first device, freeze the features, then bind > > Are you thinking of this "freeze the features" as an explicitly > triggered action? I have suggested that an explicit "ENABLE" step > might be useful, but that hasn't had much traction from what I've > seen. Seems like we do need an explicit enable step for the flow you described above: a) Bind all devices to an ioasid. Each bind succeeds. b) Ask for a specific set of features for this aggregate of device. Ask for (1), fall back to (2), or abort. c) Boot the VM d) Hotplug a device, bind it to the ioasid. We're long past negotiating features for the ioasid, so the host needs to reject the bind if the new device is incompatible with what was requested at (b) So a successful request at (b) would be the point where we change the behavior of bind. Since the kernel needs a form of feature check in any case, I still have a preference for aborting the bind at (a) if the device isn't exactly compatible with other devices already in the ioasid, because it might be simpler to implement in the host, but I don't feel strongly about this. > > I'd like to understand better where the difficulty lies, with migration. > > Is the problem, once we have a guest running on physical machine A, to > > make sure that physical machine B supports the same IOMMU properties > > before migrating the VM over to B? Why can't QEMU (instead of the user) > > select a feature set on machine A, then when time comes to migrate, query > > all information from the host kernel on machine B and check that it > > matches what was picked for machine A? Or is it only trying to > > accommodate different sets of features between A and B, that would be too > > difficult? > > There are two problems > > 1) Although it could be done in theory, it's hard, and it would need a > huge rewrite to qemu's whole migration infrastructure to do this. > We'd need a way of representing host features, working out which sets > are compatible with which others depending on what things the guest is > allowed to use, encoding the information in the migration stream and > reporting failure. None of this exists now. 
> > Indeed qemu requires that you create the (stopped) machine on the > destination (including virtual hardware configuration) before even > attempting to process the incoming migration. It does not for the > most part transfer the machine configuration in the migration stream. > Now, that's generally considered a flaw with the design, but fixing it > is a huge project that no-one's really had the energy to begin despite > the idea being around for years. > > 2) It makes behaviour really hard to predict for management layers > above. Things like oVirt automatically migrate around a cluster for > load balancing. At the moment the model which works is basically that > you if you request the same guest features on each end of the > migration, and qemu starts with that configuration on each end, the > migration should work (or only fail for transient reasons). If you > can't know if the migration is possible until you get the incoming > stream, reporting and exposing what will and won't work to the layer > above also becomes an immensely fiddly problem. That was really useful, thanks. One thing I'm worried about is the user having to know way too much detail about IOMMUs in order to pick a precise configuration. The Arm SMMUs have a lot of small features that implementations can mix and match and that a user shouldn't have to care about, and there are
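For concreteness, the a)-d) flow above could look like the following from a VMM. This is only a sketch of the sequencing: IOASID_BIND_DEVICE, IOASID_SET_FEATURES and struct ioasid_features are hypothetical names, not settled uAPI.

#include <sys/ioctl.h>

/* All IOASID_* names below are hypothetical, for illustration only. */
static int ioasid_setup(int ioasid_fd, const int *dev_fds, int ndev,
                        struct ioasid_features *feat)
{
        int i;

        /* a) Bind all cold-plugged devices; each bind succeeds. */
        for (i = 0; i < ndev; i++)
                if (ioctl(ioasid_fd, IOASID_BIND_DEVICE, &dev_fds[i]))
                        return -1;

        /*
         * b) Freeze the feature set for this aggregate of devices;
         *    fall back to a kernel-managed IOAS or abort on failure.
         */
        if (ioctl(ioasid_fd, IOASID_SET_FEATURES, feat))
                return -1;

        /*
         * c) Boot the VM. d) A later hotplug bind must now be rejected
         *    by the host if the device is incompatible with the
         *    features frozen here.
         */
        return 0;
}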
RE: Plan for /dev/ioasid RFC v2
> From: Jason Gunthorpe > Sent: Friday, June 18, 2021 8:20 AM > > On Thu, Jun 17, 2021 at 03:14:52PM -0600, Alex Williamson wrote: > > > I've referred to this as a limitation of type1, that we can't put > > devices within the same group into different address spaces, such as > > behind separate vRoot-Ports in a vIOMMU config, but really, who cares? > > As isolation support improves we see fewer multi-device groups, this > > scenario becomes the exception. Buy better hardware to use the devices > > independently. > > This is basically my thinking too, but my conclusion is that we should > not continue to make groups central to the API. > > As I've explained to David this is actually causing functional > problems and mess - and I don't see a clean way to keep groups central > but still have the device in control of what is happening. We need > this device <-> iommu connection to be direct to robustly model all > the things that are in the RFC. > > To keep groups central someone needs to sketch out how to solve > today's mdev SW page table and mdev PASID issues in a clean > way. Device centric is my suggestion on how to make it clean, but I > haven't heard an alternative?? > > So, I view the purpose of this discussion to scope out what a > device-centric world looks like and then if we can securely fit in the > legacy non-isolated world on top of that clean future oriented > API. Then decide if it is work worth doing or not. > > To my mind it looks like it is not so bad, granted not every detail is > clear, and no code has be sketched, but I don't see a big scary > blocker emerging. An extra ioctl or two, some special logic that > activates for >1 device groups that looks a lot like VFIO's current > logic.. > > At some level I would be perfectly fine if we made the group FD part > of the API for >1 device groups - except that complexifies every user > space implementation to deal with that. It doesn't feel like a good > trade off. > Would it be an acceptable tradeoff by leaving >1 device groups supported only via legacy VFIO (which is anyway kept for backward compatibility), if we think such scenario is being deprecated over time (thus little value to add new features on it)? Then all new sub-systems including vdpa and new vfio only support singleton device group via /dev/iommu... Thanks Kevin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
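To make the proposed trade-off concrete, a sketch of what a singleton-only rule could look like at bind time. struct iommu_ctx stands in for the /dev/iommu context, and iommu_group_device_count() is currently a private helper in drivers/iommu/iommu.c, so a real patch would need an exported equivalent; this only illustrates the policy.

static int iommu_ctx_bind_device(struct iommu_ctx *ictx, struct device *dev)
{
        struct iommu_group *group = iommu_group_get(dev);
        int ret = 0;

        if (!group)
                return -ENODEV;

        /* Multi-device groups stay on legacy VFIO. */
        if (iommu_group_device_count(group) > 1)
                ret = -EPERM;

        iommu_group_put(group);
        if (ret)
                return ret;

        /* ... proceed with the device-centric attach ... */
        return 0;
}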
Re: [PATCH v5 2/5] ACPI: Move IOMMU setup code out of IORT
On 2021-06-18 16:20, Jean-Philippe Brucker wrote: Extract the code that sets up the IOMMU infrastructure from IORT, since it can be reused by VIOT. Move it one level up into a new acpi_iommu_configure_id() function, which calls the IORT parsing function which in turn calls the acpi_iommu_fwspec_init() helper. Reviewed-by: Robin Murphy Signed-off-by: Jean-Philippe Brucker --- include/acpi/acpi_bus.h | 3 ++ include/linux/acpi_iort.h | 8 ++--- drivers/acpi/arm64/iort.c | 74 +-- drivers/acpi/scan.c | 73 +- 4 files changed, 86 insertions(+), 72 deletions(-) diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 3a82faac5767..41f092a269f6 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -588,6 +588,9 @@ struct acpi_pci_root { bool acpi_dma_supported(struct acpi_device *adev); enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev); +int acpi_iommu_fwspec_init(struct device *dev, u32 id, + struct fwnode_handle *fwnode, + const struct iommu_ops *ops); int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset, u64 *size); int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr, diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index f7f054833afd..f1f0842a2cb2 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -35,8 +35,7 @@ void acpi_configure_pmsi_domain(struct device *dev); int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); /* IOMMU interface */ int iort_dma_get_ranges(struct device *dev, u64 *size); -const struct iommu_ops *iort_iommu_configure_id(struct device *dev, - const u32 *id_in); +int iort_iommu_configure_id(struct device *dev, const u32 *id_in); int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); phys_addr_t acpi_iort_dma_get_max_cpu_address(void); #else @@ -50,9 +49,8 @@ static inline void acpi_configure_pmsi_domain(struct device *dev) { } /* IOMMU interface */ static inline int iort_dma_get_ranges(struct device *dev, u64 *size) { return -ENODEV; } -static inline const struct iommu_ops *iort_iommu_configure_id( - struct device *dev, const u32 *id_in) -{ return NULL; } +static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in) +{ return -ENODEV; } static inline int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) { return 0; } diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index a940be1cf2af..487d1095030d 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -806,23 +806,6 @@ static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev) return NULL; } -static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device *dev) -{ - struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); - - return (fwspec && fwspec->ops) ? 
fwspec->ops : NULL; -} - -static inline int iort_add_device_replay(struct device *dev) -{ - int err = 0; - - if (dev->bus && !device_iommu_mapped(dev)) - err = iommu_probe_device(dev); - - return err; -} - /** * iort_iommu_msi_get_resv_regions - Reserved region driver helper * @dev: Device from iommu_get_resv_regions() @@ -900,18 +883,6 @@ static inline bool iort_iommu_driver_enabled(u8 type) } } -static int arm_smmu_iort_xlate(struct device *dev, u32 streamid, - struct fwnode_handle *fwnode, - const struct iommu_ops *ops) -{ - int ret = iommu_fwspec_init(dev, fwnode, ops); - - if (!ret) - ret = iommu_fwspec_add_ids(dev, , 1); - - return ret; -} - static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node) { struct acpi_iort_root_complex *pci_rc; @@ -946,7 +917,7 @@ static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node, return iort_iommu_driver_enabled(node->type) ? -EPROBE_DEFER : -ENODEV; - return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops); + return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops); } struct iort_pci_alias_info { @@ -1020,24 +991,13 @@ static int iort_nc_iommu_map_id(struct device *dev, * @dev: device to configure * @id_in: optional input id const value pointer * - * Returns: iommu_ops pointer on configuration success - * NULL on configuration failure + * Returns: 0 on success, <0 on failure */ -const struct iommu_ops *iort_iommu_configure_id(struct device *dev, - const u32 *id_in) +int iort_iommu_configure_id(struct device *dev, const u32 *id_in) { struct acpi_iort_node
Re: [PATCH v5 1/5] ACPI: arm64: Move DMA setup operations out of IORT
On 2021-06-18 16:20, Jean-Philippe Brucker wrote: Extract generic DMA setup code out of IORT, so it can be reused by VIOT. Keep it in drivers/acpi/arm64 for now, since it could break x86 platforms that haven't run this code so far, if they have invalid tables. Reviewed-by: Robin Murphy Reviewed-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- drivers/acpi/arm64/Makefile | 1 + include/linux/acpi.h| 3 +++ include/linux/acpi_iort.h | 6 ++--- drivers/acpi/arm64/dma.c| 50 ++ drivers/acpi/arm64/iort.c | 54 ++--- drivers/acpi/scan.c | 2 +- 6 files changed, 66 insertions(+), 50 deletions(-) create mode 100644 drivers/acpi/arm64/dma.c diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile index 6ff50f4ed947..66acbe77f46e 100644 --- a/drivers/acpi/arm64/Makefile +++ b/drivers/acpi/arm64/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0-only obj-$(CONFIG_ACPI_IORT) += iort.o obj-$(CONFIG_ACPI_GTDT) += gtdt.o +obj-y += dma.o diff --git a/include/linux/acpi.h b/include/linux/acpi.h index c60745f657e9..7aaa9559cc19 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -259,9 +259,12 @@ void acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa); #ifdef CONFIG_ARM64 void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa); +void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size); #else static inline void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { } +static inline void +acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) { } #endif int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma); diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index 1a12baa58e40..f7f054833afd 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -34,7 +34,7 @@ struct irq_domain *iort_get_device_domain(struct device *dev, u32 id, void acpi_configure_pmsi_domain(struct device *dev); int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); /* IOMMU interface */ -void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size); +int iort_dma_get_ranges(struct device *dev, u64 *size); const struct iommu_ops *iort_iommu_configure_id(struct device *dev, const u32 *id_in); int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); @@ -48,8 +48,8 @@ static inline struct irq_domain *iort_get_device_domain( { return NULL; } static inline void acpi_configure_pmsi_domain(struct device *dev) { } /* IOMMU interface */ -static inline void iort_dma_setup(struct device *dev, u64 *dma_addr, - u64 *size) { } +static inline int iort_dma_get_ranges(struct device *dev, u64 *size) +{ return -ENODEV; } static inline const struct iommu_ops *iort_iommu_configure_id( struct device *dev, const u32 *id_in) { return NULL; } diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c new file mode 100644 index ..f16739ad3cc0 --- /dev/null +++ b/drivers/acpi/arm64/dma.c @@ -0,0 +1,50 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include + +void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) +{ + int ret; + u64 end, mask; + u64 dmaaddr = 0, size = 0, offset = 0; + + /* +* If @dev is expected to be DMA-capable then the bus code that created +* it should have initialised its dma_mask pointer by this point. For +* now, we'll continue the legacy behaviour of coercing it to the +* coherent mask if not, but we'll no longer do so quietly. 
+*/ + if (!dev->dma_mask) { + dev_warn(dev, "DMA mask not set\n"); + dev->dma_mask = >coherent_dma_mask; + } + + if (dev->coherent_dma_mask) + size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1); + else + size = 1ULL << 32; + + ret = acpi_dma_get_range(dev, , , ); + if (ret == -ENODEV) + ret = iort_dma_get_ranges(dev, ); + if (!ret) { + /* +* Limit coherent and dma mask based on size retrieved from +* firmware. +*/ + end = dmaaddr + size - 1; + mask = DMA_BIT_MASK(ilog2(end) + 1); + dev->bus_dma_limit = end; + dev->coherent_dma_mask = min(dev->coherent_dma_mask, mask); + *dev->dma_mask = min(*dev->dma_mask, mask); + } + + *dma_addr = dmaaddr; + *dma_size = size; + + ret = dma_direct_set_offset(dev, dmaaddr + offset, dmaaddr, size); + + dev_dbg(dev, "dma_offset(%#08llx)%s\n", offset, ret ? " failed!" : ""); +}
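As a worked example of the clamping above (values assumed, not from the patch): a firmware-described window of 1 GiB starting at 2 GiB gives end = 0xbfffffff and mask = DMA_BIT_MASK(32), so bus_dma_limit becomes 0xbfffffff and both DMA masks are capped at 32 bits. The same arithmetic in standalone form:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint64_t dmaaddr = 0x80000000ULL, size = 0x40000000ULL;
        uint64_t end = dmaaddr + size - 1;              /* 0xbfffffff */
        unsigned int bits = 64 - __builtin_clzll(end);  /* ilog2(end) + 1 */
        uint64_t mask = bits >= 64 ? ~0ULL : (1ULL << bits) - 1;

        printf("end=%#llx mask=%#llx (%u bits)\n",
               (unsigned long long)end, (unsigned long long)mask, bits);
        return 0;
}

Note also that max(coherent_dma_mask, coherent_dma_mask + 1) earlier in the function is the overflow-safe way to turn a mask into a size when the mask may already be ~0ULL.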
Re: [PATCH v5 4/5] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()
On 2021-06-18 16:20, Jean-Philippe Brucker wrote: Passing a 64-bit address width to iommu_setup_dma_ops() is valid on virtual platforms, but isn't currently possible. The overflow check in iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass a limit address instead of a size, so callers don't have to fake a size to work around the check. The base and limit parameters are being phased out, because: * they are redundant for x86 callers. dma-iommu already reserves the first page, and the upper limit is already in domain->geometry. * they can now be obtained from dev->dma_range_map on Arm. But removing them on Arm isn't completely straightforward so is left for future work. As an intermediate step, simplify the x86 callers by passing dummy limits. Reviewed-by: Robin Murphy Signed-off-by: Jean-Philippe Brucker --- include/linux/dma-iommu.h | 4 ++-- arch/arm64/mm/dma-mapping.c | 2 +- drivers/iommu/amd/iommu.c | 2 +- drivers/iommu/dma-iommu.c | 12 ++-- drivers/iommu/intel/iommu.c | 5 + 5 files changed, 11 insertions(+), 14 deletions(-) diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index 6e75a2d689b4..758ca4694257 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -19,7 +19,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base); void iommu_put_dma_cookie(struct iommu_domain *domain); /* Setup call for arch DMA mapping code */ -void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size); +void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit); /* The DMA API isn't _quite_ the whole story, though... */ /* @@ -50,7 +50,7 @@ struct msi_msg; struct device; static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base, - u64 size) + u64 dma_limit) { } diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 4bf1dd3eb041..6719f9efea09 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, dev->dma_coherent = coherent; if (iommu) - iommu_setup_dma_ops(dev, dma_base, size); + iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1); #ifdef CONFIG_XEN if (xen_swiotlb_detect()) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 3ac42bbdefc6..216323fb27ef 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1713,7 +1713,7 @@ static void amd_iommu_probe_finalize(struct device *dev) /* Domains are initialized for this device - have a look what we ended up with */ domain = iommu_get_domain_for_dev(dev); if (domain->type == IOMMU_DOMAIN_DMA) - iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0); + iommu_setup_dma_ops(dev, 0, U64_MAX); else set_dma_ops(dev, NULL); } diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 7bcdd1205535..c62e19bed302 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -319,16 +319,16 @@ static bool dev_is_untrusted(struct device *dev) * iommu_dma_init_domain - Initialise a DMA mapping domain * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() * @base: IOVA at which the mappable address space starts - * @size: Size of IOVA space + * @limit: Last address of the IOVA space * @dev: Device the domain is being initialised for * - * @base and @size should be exact multiples of IOMMU page granularity to + * @base and @limit + 1 should be exact multiples of IOMMU page granularity to * avoid rounding surprises. 
If necessary, we reserve the page at address 0 * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but * any change which could make prior IOVAs invalid will fail. */ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, - u64 size, struct device *dev) +dma_addr_t limit, struct device *dev) { struct iommu_dma_cookie *cookie = domain->iova_cookie; unsigned long order, base_pfn; @@ -346,7 +346,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, /* Check the domain allows at least some access to the device... */ if (domain->geometry.force_aperture) { if (base > domain->geometry.aperture_end || - base + size <= domain->geometry.aperture_start) { + limit < domain->geometry.aperture_start) { pr_warn("specified DMA range outside IOMMU capability\n"); return -EFAULT; } @@ -1308,7 +1308,7 @@ static const struct dma_map_ops iommu_dma_ops = { * The IOMMU core code allocates the default DMA domain,
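The off-by-one the patch is avoiding, in standalone form (values assumed): a full 64-bit IOVA space cannot be expressed as a size without overflowing u64, but it can be expressed as a limit.

#include <assert.h>
#include <stdint.h>

int main(void)
{
        uint64_t base = 0x1000, size = 1ULL << 32;

        /* Conversion done by the arm64 caller above: */
        uint64_t limit = base + size - 1;
        assert(limit == 0x100000fffULL);

        /* A full 64-bit space is simply limit = UINT64_MAX; expressing
         * it as a size (1 << 64) would wrap to 0 in a u64, which is
         * what broke the old overflow check. */
        return 0;
}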
Re: [PATCH v5 3/5] ACPI: Add driver for the VIOT table
On Fri, Jun 18, 2021 at 5:33 PM Jean-Philippe Brucker wrote: > > The ACPI Virtual I/O Translation Table describes topology of > para-virtual platforms, similarly to vendor tables DMAR, IVRS and IORT. > For now it describes the relation between virtio-iommu and the endpoints > it manages. > > Three steps are needed to configure DMA of endpoints: > > (1) acpi_viot_init(): parse the VIOT table, find or create the fwnode > associated to each vIOMMU device. This needs to happen after > acpi_scan_init(), because it relies on the struct device and their > fwnode to be available. > > (2) When probing the vIOMMU device, the driver registers its IOMMU ops > within the IOMMU subsystem. This step doesn't require any > intervention from the VIOT driver. > > (3) viot_iommu_configure(): before binding the endpoint to a driver, > find the associated IOMMU ops. Register them, along with the > endpoint ID, into the device's iommu_fwspec. > > If step (3) happens before step (2), it is deferred until the IOMMU is > initialized, then retried. > > Tested-by: Eric Auger > Reviewed-by: Eric Auger > Signed-off-by: Jean-Philippe Brucker >From the general ACPI perspective Acked-by: Rafael J. Wysocki and I'm assuming that it will be routed through a different tree. > --- > drivers/acpi/Kconfig | 3 + > drivers/iommu/Kconfig | 1 + > drivers/acpi/Makefile | 2 + > include/linux/acpi_viot.h | 19 ++ > drivers/acpi/bus.c| 2 + > drivers/acpi/scan.c | 3 + > drivers/acpi/viot.c | 366 ++ > MAINTAINERS | 8 + > 8 files changed, 404 insertions(+) > create mode 100644 include/linux/acpi_viot.h > create mode 100644 drivers/acpi/viot.c > > diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig > index eedec61e3476..3758c6940ed7 100644 > --- a/drivers/acpi/Kconfig > +++ b/drivers/acpi/Kconfig > @@ -526,6 +526,9 @@ endif > > source "drivers/acpi/pmic/Kconfig" > > +config ACPI_VIOT > + bool > + > endif # ACPI > > config X86_PM_TIMER > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index 1f111b399bca..aff8a4830dd1 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -403,6 +403,7 @@ config VIRTIO_IOMMU > depends on ARM64 > select IOMMU_API > select INTERVAL_TREE > + select ACPI_VIOT if ACPI > help > Para-virtualised IOMMU driver with virtio. 
> > diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile > index 700b41adf2db..a6e644c48987 100644 > --- a/drivers/acpi/Makefile > +++ b/drivers/acpi/Makefile > @@ -118,3 +118,5 @@ video-objs += acpi_video.o video_detect.o > obj-y += dptf/ > > obj-$(CONFIG_ARM64)+= arm64/ > + > +obj-$(CONFIG_ACPI_VIOT)+= viot.o > diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h > new file mode 100644 > index ..1eb8ee5b0e5f > --- /dev/null > +++ b/include/linux/acpi_viot.h > @@ -0,0 +1,19 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > + > +#ifndef __ACPI_VIOT_H__ > +#define __ACPI_VIOT_H__ > + > +#include > + > +#ifdef CONFIG_ACPI_VIOT > +void __init acpi_viot_init(void); > +int viot_iommu_configure(struct device *dev); > +#else > +static inline void acpi_viot_init(void) {} > +static inline int viot_iommu_configure(struct device *dev) > +{ > + return -ENODEV; > +} > +#endif > + > +#endif /* __ACPI_VIOT_H__ */ > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c > index a4bd673934c0..d6f4e2f06fdb 100644 > --- a/drivers/acpi/bus.c > +++ b/drivers/acpi/bus.c > @@ -27,6 +27,7 @@ > #include > #endif > #include > +#include > #include > #include > #include > @@ -1334,6 +1335,7 @@ static int __init acpi_init(void) > acpi_wakeup_device_init(); > acpi_debugger_init(); > acpi_setup_sb_notify_handler(); > + acpi_viot_init(); > return 0; > } > > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c > index 2a2e690040e9..3e2bb04ab528 100644 > --- a/drivers/acpi/scan.c > +++ b/drivers/acpi/scan.c > @@ -9,6 +9,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -1556,6 +1557,8 @@ static const struct iommu_ops > *acpi_iommu_configure_id(struct device *dev, > return ops; > > err = iort_iommu_configure_id(dev, id_in); > + if (err && err != -EPROBE_DEFER) > + err = viot_iommu_configure(dev); > > /* > * If we have reason to believe the IOMMU driver missed the initial > diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c > new file mode 100644 > index ..d2256326c73a > --- /dev/null > +++ b/drivers/acpi/viot.c > @@ -0,0 +1,366 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Virtual I/O topology > + * > + * The Virtual I/O Translation Table (VIOT) describes the topology of > + * para-virtual IOMMUs and the endpoints they
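A sketch of the step (2)/(3) interleaving described in the commit message, using the real iommu_ops_from_fwnode() lookup and the acpi_iommu_fwspec_init() helper from patch 2/5; viot_dev_iommu_init() and its parameters are illustrative, not the patch's actual names.

static int viot_dev_iommu_init(struct device *dev, u32 epid,
                               struct fwnode_handle *iommu_fwnode)
{
        const struct iommu_ops *ops = iommu_ops_from_fwnode(iommu_fwnode);

        /* Step (3) before step (2): the vIOMMU driver hasn't registered
         * its ops yet, so ask the driver core to retry us later. */
        if (!ops)
                return -EPROBE_DEFER;

        /* Step (3) proper: record ops + endpoint ID in the fwspec. */
        return acpi_iommu_fwspec_init(dev, epid, iommu_fwnode, ops);
}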
[PATCH] iommu/arm-smmu-v3: Add default domain quirk for Arm Fast Models
Arm Fast Models are still implementing legacy virtio-pci devices behind the SMMU, which continue to be problematic as "real hardware" devices (from the point of view of the simulated system) without the mitigating VIRTIO_F_ACCESS_PLATFORM feature. Since we now have the ability to force passthrough on a device-specific basis, let's use it to work around this particular oddity so that people who just want to boot Linux on a model don't have to fiddle around with things to avoid the SMMU getting in the way by default (and I don't have to keep telling them which things...) Signed-off-by: Robin Murphy --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 54b2f27b81d4..13cf16e8f45b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2649,6 +2649,20 @@ static int arm_smmu_dev_disable_feature(struct device *dev, } } +static int arm_smmu_def_domain_type(struct device *dev) +{ + if (dev_is_pci(dev)) { + struct pci_dev *pdev = to_pci_dev(dev); + + /* Legacy virtio-block devices on Arm Fast Models */ + if (pdev->vendor == 0x1af4 && pdev->device == 0x1001 && + pdev->subsystem_vendor == 0x00ff && pdev->subsystem_device == 0x0002) + return IOMMU_DOMAIN_IDENTITY; + } + + return 0; +} + static struct iommu_ops arm_smmu_ops = { .capable= arm_smmu_capable, .domain_alloc = arm_smmu_domain_alloc, @@ -2673,6 +2687,7 @@ static struct iommu_ops arm_smmu_ops = { .sva_bind = arm_smmu_sva_bind, .sva_unbind = arm_smmu_sva_unbind, .sva_get_pasid = arm_smmu_sva_get_pasid, + .def_domain_type= arm_smmu_def_domain_type, .pgsize_bitmap = -1UL, /* Restricted during device attach */ .owner = THIS_MODULE, }; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
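For reference, the way the core is expected to consume this quirk, simplified from the iommu core of this era (illustrative, not part of the patch): a non-zero .def_domain_type return overrides the global default when the group's default domain is allocated.

static int iommu_get_def_domain_type(struct device *dev)
{
        const struct iommu_ops *ops = dev->bus->iommu_ops;

        if (ops->def_domain_type)
                return ops->def_domain_type(dev); /* IOMMU_DOMAIN_IDENTITY here */

        return 0;       /* no preference: fall back to iommu_def_domain_type */
}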
[PATCH v2 07/12] x86: Secure Launch SMP bringup support
On Intel, the APs are left in a well-documented state after TXT performs the late launch. Specifically, they cannot have #INIT asserted on them, so a standard startup via INIT/SIPI/SIPI cannot be performed. Instead the early SL stub code parked the APs in a pause/jmp loop waiting for an NMI. The modified SMP boot code is called for the Secure Launch case. The jump address for the RM piggy entry point is fixed up in the jump where the APs are waiting and an NMI IPI is sent to the AP. The AP vectors to the Secure Launch entry point in the RM piggy, which mimics what the real mode code would do, then jumps to the standard RM piggy protected mode entry point. Signed-off-by: Ross Philipson --- arch/x86/include/asm/realmode.h | 3 ++ arch/x86/kernel/smpboot.c| 86 arch/x86/realmode/rm/header.S| 3 ++ arch/x86/realmode/rm/trampoline_64.S | 37 4 files changed, 129 insertions(+) diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h index 5db5d08..ef37bf1 100644 --- a/arch/x86/include/asm/realmode.h +++ b/arch/x86/include/asm/realmode.h @@ -37,6 +37,9 @@ struct real_mode_header { #ifdef CONFIG_X86_64 u32 machine_real_restart_seg; #endif +#ifdef CONFIG_SECURE_LAUNCH + u32 sl_trampoline_start32; +#endif }; /* This must match data at realmode/rm/trampoline_{32,64}.S */ diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 7770245..c324b04 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -57,6 +57,7 @@ #include #include #include +#include #include #include @@ -1023,6 +1024,83 @@ int common_cpu_up(unsigned int cpu, struct task_struct *idle) return 0; } +#ifdef CONFIG_SECURE_LAUNCH + +static atomic_t first_ap_only = {1}; + +/* + * Called to fix the long jump address for the waiting APs to vector to + * the correct startup location in the Secure Launch stub in the rmpiggy. + */ +static int +slaunch_fixup_jump_vector(void) +{ + struct sl_ap_wake_info *ap_wake_info; + u32 *ap_jmp_ptr = NULL; + + if (!atomic_dec_and_test(_ap_only)) + return 0; + + ap_wake_info = slaunch_get_ap_wake_info(); + + ap_jmp_ptr = (u32 *)__va(ap_wake_info->ap_wake_block + +ap_wake_info->ap_jmp_offset); + + *ap_jmp_ptr = real_mode_header->sl_trampoline_start32; + + pr_info("TXT AP long jump address updated\n"); + + return 0; +} + +/* + * TXT AP startup is quite different from normal. The APs cannot have #INIT + * asserted on them or receive SIPIs. The early Secure Launch code has parked + * the APs in a pause loop waiting to receive an NMI. This will wake the APs + * and have them jump to the protected mode code in the rmpiggy where the rest + * of the SMP boot of the AP will proceed normally.
+ */ +static int +slaunch_wakeup_cpu_from_txt(int cpu, int apicid) +{ + unsigned long send_status = 0, accept_status = 0; + + /* Only done once */ + if (slaunch_fixup_jump_vector()) + return -1; + + /* Send NMI IPI to idling AP and wake it up */ + apic_icr_write(APIC_DM_NMI, apicid); + + if (init_udelay == 0) + udelay(10); + else + udelay(300); + + send_status = safe_apic_wait_icr_idle(); + + if (init_udelay == 0) + udelay(10); + else + udelay(300); + + accept_status = (apic_read(APIC_ESR) & 0xEF); + + if (send_status) + pr_err("Secure Launch IPI never delivered???\n"); + if (accept_status) + pr_err("Secure Launch IPI delivery error (%lx)\n", + accept_status); + + return (send_status | accept_status); +} + +#else + +#define slaunch_wakeup_cpu_from_txt(cpu, apicid) 0 + +#endif /* !CONFIG_SECURE_LAUNCH */ + /* * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad * (ie clustered apic addressing mode), this is a LOGICAL apic ID. @@ -1077,6 +1155,13 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle, cpumask_clear_cpu(cpu, cpu_initialized_mask); smp_mb(); + /* With Intel TXT, the AP startup is totally different */ + if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) == + (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) { + boot_error = slaunch_wakeup_cpu_from_txt(cpu, apicid); + goto txt_wake; + } + /* * Wake up a CPU in difference cases: * - Use the method in the APIC driver if it's defined @@ -1089,6 +1174,7 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle, boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid, cpu0_nmi_registered); +txt_wake: if (!boot_error) { /* * Wait 10s total for
[PATCH v2 01/12] x86/boot: Place kernel_info at a fixed offset
From: Arvind Sankar There are use cases for storing the offset of a symbol in kernel_info. For example, the trenchboot series [0] needs to store the offset of the Measured Launch Environment header in kernel_info. Since commit (note: commit ID from tip/master) 527afc212231 ("x86/boot: Check that there are no run-time relocations") run-time relocations are not allowed in the compressed kernel, so simply using the symbol in kernel_info, as .long symbol will cause a linker error because this is not position-independent. With kernel_info being a separate object file and in a different section from startup_32, there is no way to calculate the offset of a symbol from the start of the image in a position-independent way. To enable such use cases, put kernel_info into its own section which is placed at a predetermined offset (KERNEL_INFO_OFFSET) via the linker script. This will allow calculating the symbol offset in a position-independent way, by adding the offset from the start of kernel_info to KERNEL_INFO_OFFSET. Ensure that kernel_info is aligned, and use the SYM_DATA.* macros instead of bare labels. This stores the size of the kernel_info structure in the ELF symbol table. Signed-off-by: Arvind Sankar Cc: Ross Philipson Signed-off-by: Ross Philipson --- arch/x86/boot/compressed/kernel_info.S | 19 +++ arch/x86/boot/compressed/kernel_info.h | 12 arch/x86/boot/compressed/vmlinux.lds.S | 6 ++ 3 files changed, 33 insertions(+), 4 deletions(-) create mode 100644 arch/x86/boot/compressed/kernel_info.h diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S index f818ee8..c18f071 100644 --- a/arch/x86/boot/compressed/kernel_info.S +++ b/arch/x86/boot/compressed/kernel_info.S @@ -1,12 +1,23 @@ /* SPDX-License-Identifier: GPL-2.0 */ +#include #include +#include "kernel_info.h" - .section ".rodata.kernel_info", "a" +/* + * If a field needs to hold the offset of a symbol from the start + * of the image, use the macro below, eg + * .long rva(symbol) + * This will avoid creating run-time relocations, which are not + * allowed in the compressed kernel. + */ + +#define rva(X) (((X) - kernel_info) + KERNEL_INFO_OFFSET) - .global kernel_info + .section ".rodata.kernel_info", "a" -kernel_info: + .balign 16 +SYM_DATA_START(kernel_info) /* Header, Linux top (structure). */ .ascii "LToP" /* Size. */ @@ -19,4 +30,4 @@ kernel_info: kernel_info_var_len_data: /* Empty for time being... */ -kernel_info_end: +SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end) diff --git a/arch/x86/boot/compressed/kernel_info.h b/arch/x86/boot/compressed/kernel_info.h new file mode 100644 index ..c127f84 --- /dev/null +++ b/arch/x86/boot/compressed/kernel_info.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef BOOT_COMPRESSED_KERNEL_INFO_H +#define BOOT_COMPRESSED_KERNEL_INFO_H + +#ifdef CONFIG_X86_64 +#define KERNEL_INFO_OFFSET 0x500 +#else /* 32-bit */ +#define KERNEL_INFO_OFFSET 0x100 +#endif + +#endif /* BOOT_COMPRESSED_KERNEL_INFO_H */ diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S index 112b237..84c7b4d 100644 --- a/arch/x86/boot/compressed/vmlinux.lds.S +++ b/arch/x86/boot/compressed/vmlinux.lds.S @@ -7,6 +7,7 @@ OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT) #include #include +#include "kernel_info.h" #ifdef CONFIG_X86_64 OUTPUT_ARCH(i386:x86-64) @@ -27,6 +28,11 @@ SECTIONS HEAD_TEXT _ehead = . 
; } + .rodata.kernel_info KERNEL_INFO_OFFSET : { + *(.rodata.kernel_info) + } + ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, "kernel_info at bad address!") + .rodata..compressed : { *(.rodata..compressed) } -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
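A standalone illustration of why rva() needs no run-time relocation (addresses invented): the difference between two symbols in the same image is fixed at link time, so adding the constant KERNEL_INFO_OFFSET yields a constant image offset.

#include <stdint.h>
#include <stdio.h>

#define KERNEL_INFO_OFFSET 0x500        /* 64-bit value from the header above */

int main(void)
{
        /* Hypothetical link-time addresses of kernel_info and a field
         * inside it; only their difference matters. */
        uint64_t kernel_info = 0x1000500, mle_header = 0x1000520;

        printf("rva(mle_header) = %#llx\n",
               (unsigned long long)((mle_header - kernel_info) +
                                    KERNEL_INFO_OFFSET));       /* 0x520 */
        return 0;
}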
[PATCH v2 02/12] x86: Secure Launch Kconfig
Initial bits to bring in Secure Launch functionality. Add Kconfig options for compiling in/out the Secure Launch code. Signed-off-by: Ross Philipson --- arch/x86/Kconfig | 32 1 file changed, 32 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 0045e1b..65d69f0 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1989,6 +1989,38 @@ config EFI_MIXED If unsure, say N. +config SECURE_LAUNCH + bool "Secure Launch support" + default n + depends on X86_64 && X86_X2APIC + help + The Secure Launch feature allows a kernel to be loaded + directly through an Intel TXT measured launch. Intel TXT + establishes a Dynamic Root of Trust for Measurement (DRTM) + where the CPU measures the kernel image. This feature then + continues the measurement chain over kernel configuration + information and init images. + +config SECURE_LAUNCH_ALT_PCR19 + bool "Secure Launch Alternate PCR 19 usage" + default n + depends on SECURE_LAUNCH + help + In the post ACM environment, Secure Launch by default measures + configuration information into PCR18. This feature allows finer + control over measurements by moving configuration measurements + into PCR19. + +config SECURE_LAUNCH_ALT_PCR20 + bool "Secure Launch Alternate PCR 20 usage" + default n + depends on SECURE_LAUNCH + help + In the post ACM environment, Secure Launch by default measures + image data like any external initrd into PCR17. This feature + allows finer control over measurements by moving image measurements + into PCR20. + source "kernel/Kconfig.hz" config KEXEC -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
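A configuration fragment that would satisfy the dependencies above (illustrative; the option values are the point, not a complete .config):

# 64-bit with x2APIC, as required by the depends lines
CONFIG_X86_64=y
CONFIG_X86_X2APIC=y
CONFIG_SECURE_LAUNCH=y
# Optional: move config/image measurements off the default PCRs
# CONFIG_SECURE_LAUNCH_ALT_PCR19 is not set
CONFIG_SECURE_LAUNCH_ALT_PCR20=y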
[PATCH v2 00/12] x86: Trenchboot secure dynamic launch Linux kernel support
The focus of the TrenchBoot project (https://github.com/TrenchBoot) is to enhance boot security and integrity. This requires the Linux kernel to be directly invoked by the x86 dynamic launch to establish a Dynamic Root of Trust for Measurement (DRTM). The dynamic launch will be initiated by a boot loader with associated support added to it; for example, the first targeted boot loader will be GRUB2. An integral part of establishing the DRTM involves measuring everything that is intended to be run (kernel image, initrd, etc) and everything that will configure that kernel to run (command line, boot params, etc) into specific PCRs, the DRTM PCRs (17-22), in the TPM. Another key aspect is that the dynamic launch is rooted in hardware; that is to say, the hardware (CPU) is what takes the first measurement for the chain of integrity measurements. On Intel this is done using the GETSEC instruction provided by Intel's TXT; on AMD it is done using the SKINIT instruction provided by AMD-V. Information on these technologies can be readily found online. This patchset introduces Intel TXT support. To enable the kernel to be launched by GETSEC, a stub must be built into the setup section of the compressed kernel to handle the specific state that the dynamic launch process leaves the BSP in. Also, this stub must measure everything that is going to be used as early as possible. This stub code and subsequent code must also deal with the specific state that the dynamic launch leaves the APs in. A quick note on terminology. The larger open source project itself is called TrenchBoot, which is hosted on GitHub (links below). The kernel feature enabling the use of the x86 technology is referred to as "Secure Launch" within the kernel code. As such, the prefixes sl_/SL_ or slaunch/SLAUNCH will be seen in the code. The stub code discussed above is referred to as the SL stub. Note that patch 1 was authored by Arvind Sankar. We were not able to get a status on this patch but Secure Launch depends on it so it is included with the set. The basic flow is: - Entry from the dynamic launch jumps to the SL stub - SL stub fixes up the world on the BSP - For TXT, SL stub wakes the APs, fixes up their worlds - For TXT, APs are left halted waiting for an NMI to wake them - SL stub jumps to startup_32 - SL main locates the TPM event log and writes the measurements of configuration and module information into it. - Kernel boot proceeds normally from this point. - During early setup, slaunch_setup() runs to finish some validation and setup tasks. - The SMP bringup code is modified to wake the waiting APs. APs vector to rmpiggy and start up normally from that point. - SL platform module is registered as a late initcall module. It reads the TPM event log and extends the measurements taken into the TPM PCRs. - SL platform module initializes the securityfs interface to allow access to the TPM event log and TXT public registers. - Kernel boot finishes normally. - SEXIT support to leave SMX mode is present on the kexec path and the various reboot paths (poweroff, reset, halt).
Links: The Trenchboot project including documentation: https://github.com/trenchboot Intel TXT is documented in its own specification and in the SDM Instruction Set volume: https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf https://software.intel.com/en-us/articles/intel-sdm AMD SKINIT is documented in the System Programming manual: https://www.amd.com/system/files/TechDocs/24593.pdf GRUB2 pre-launch support patchset (WIP): https://lists.gnu.org/archive/html/grub-devel/2020-05/msg00011.html Thanks Ross Philipson and Daniel P. Smith Changes in v2: - Modified 32b entry code to prevent causing relocations in the compressed kernel. - Dropped patches for compressed kernel TPM PCR extender. - Modified event log code to insert log delimiter events and not rely on TPM access. - Stop extending PCRs in the early Secure Launch stub code. - Removed Kconfig options for hash algorithms and use the algorithms the ACM used. - Match Secure Launch measurement algorithm use to those reported in the TPM 2.0 event log. - Read the TPM events out of the TPM and extend them into the PCRs using the mainline TPM driver. This is done in the late initcall module. - Allow use of alternate PCR 19 and 20 for post ACM measurements. - Add Kconfig constraints needed by Secure Launch (disable KASLR and add x2apic dependency). - Fix testing of SL_FLAGS when determining if Secure Launch is active and the architecture is TXT. - Use SYM_DATA_START_LOCAL macros in early entry point code. - Security audit changes: - Validate buffers passed to MLE do not overlap the MLE and are properly laid out. - Validate buffers and memory regions used by the MLE are protected by IOMMU PMRs. - Force IOMMU to not use passthrough mode during a Secure Launch. - Prevent KASLR use during a
[PATCH v2 05/12] x86: Secure Launch kernel early boot stub
The Secure Launch (SL) stub provides the entry point for Intel TXT (and later AMD SKINIT) to vector to during the late launch. The symbol sl_stub_entry is that entry point, and its offset into the kernel is conveyed to the launching code using the MLE (Measured Launch Environment) header in the structure named mle_header. The offset of the MLE header is set in the kernel_info. The routine sl_stub contains the very early late launch setup code responsible for setting up the basic environment to allow the normal kernel startup_32 code to proceed. It is also responsible for properly waking and handling the APs on Intel platforms. The routine sl_main, which runs after entering 64b mode, is responsible for measuring configuration and module information before it is used: the boot params, the kernel command line, the TXT heap, an external initramfs, etc. Signed-off-by: Ross Philipson --- Documentation/x86/boot.rst | 13 + arch/x86/boot/compressed/Makefile | 3 +- arch/x86/boot/compressed/head_64.S | 37 ++ arch/x86/boot/compressed/kaslr.c | 11 + arch/x86/boot/compressed/kernel_info.S | 33 ++ arch/x86/boot/compressed/sl_main.c | 523 ++ arch/x86/boot/compressed/sl_stub.S | 667 + arch/x86/kernel/asm-offsets.c | 19 + 8 files changed, 1305 insertions(+), 1 deletion(-) create mode 100644 arch/x86/boot/compressed/sl_main.c create mode 100644 arch/x86/boot/compressed/sl_stub.S diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst index fc84491..7623f60 100644 --- a/Documentation/x86/boot.rst +++ b/Documentation/x86/boot.rst @@ -1026,6 +1026,19 @@ Offset/size: 0x000c/4 This field contains maximal allowed type for setup_data and setup_indirect structs. + = +Field name:mle_header_offset +Offset/size: 0x0010/4 + = + + This field contains the offset to the Secure Launch Measured Launch Environment + (MLE) header. This offset is used to locate information needed during a secure + late launch using Intel TXT. If the offset is zero, the kernel does not have + Secure Launch capabilities. The MLE entry point is called from TXT on the BSP + following a successful measured launch.
The specific state of the processors is + outlined in the TXT Software Development Guide; the latest can be found here: + https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf + The Image Checksum == diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 059d49a..1fe55a5 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -102,7 +102,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a -vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o \ + $(obj)/sl_main.o $(obj)/sl_stub.o $(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE $(call if_changed,ld) diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index a2347de..b35e072 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -498,6 +498,17 @@ trampoline_return: pushq $0 popfq +#ifdef CONFIG_SECURE_LAUNCH + pushq %rsi + + /* Ensure the relocation region is covered by a PMR */ + movq%rbx, %rdi + movl$(_bss - startup_32), %esi + callq sl_check_region + + popq%rsi +#endif + /* * Copy the compressed kernel to the end of our buffer * where decompression in place becomes safe. */ @@ -556,6 +567,32 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated) shrq$3, %rcx rep stosq +#ifdef CONFIG_SECURE_LAUNCH + /* +* Have to do the final early sl stub work in the 64b area. +* +* *** NOTE *** +* +* Several boot params get used before we get a chance to measure +* them in this call. This is a known issue and we currently don't +* have a solution. The scratch field doesn't matter. There is no +* obvious way to do anything about the use of kernel_alignment or +* init_size though these seem low risk with all the PMR and overlap +* checks in place. +*/ + pushq %rsi + + movq%rsi, %rdi + callq sl_main + + /* Ensure the decompression location is covered by a PMR */ + movq%rbp, %rdi + movqoutput_len(%rip), %rsi + callq sl_check_region + + popq%rsi +#endif + /* * If running as an SEV guest, the encryption mask is required in the * page-table setup code below. When the
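A sketch of how a boot loader could consume the new boot.rst field above. The helper is hypothetical and assumes the caller has already located kernel_info in the image (e.g. via its "LToP" signature).

#include <stdint.h>
#include <string.h>

static uint32_t get_mle_header_offset(const uint8_t *kernel_info)
{
        uint32_t off;

        /* Field documented above at offset 0x0010, size 4. */
        memcpy(&off, kernel_info + 0x10, sizeof(off));
        return off;     /* zero: kernel is not Secure Launch capable */
}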
[PATCH v2 10/12] x86: Secure Launch late initcall platform module
From: "Daniel P. Smith" The Secure Launch platform module is a late init module. During the init call, the TPM event log is read and measurements taken in the early boot stub code are located. These measurements are extended into the TPM PCRs using the mainline TPM kernel driver. The platform module also registers the securityfs nodes to allow access to TXT register fields on Intel along with the fetching of and writing events to the late launch TPM log. Signed-off-by: Daniel P. Smith Signed-off-by: garnetgrimm Signed-off-by: Ross Philipson --- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/slmodule.c | 495 + 2 files changed, 496 insertions(+) create mode 100644 arch/x86/kernel/slmodule.c diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 574e643..1187077 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -81,6 +81,7 @@ obj-$(CONFIG_IA32_EMULATION) += tls.o obj-y += step.o obj-$(CONFIG_INTEL_TXT)+= tboot.o obj-$(CONFIG_SECURE_LAUNCH)+= slaunch.o +obj-$(CONFIG_SECURE_LAUNCH)+= slmodule.o obj-$(CONFIG_ISA_DMA_API) += i8237.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-y += cpu/ diff --git a/arch/x86/kernel/slmodule.c b/arch/x86/kernel/slmodule.c new file mode 100644 index ..807f9ca --- /dev/null +++ b/arch/x86/kernel/slmodule.c @@ -0,0 +1,495 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Secure Launch late validation/setup, securityfs exposure and + * finalization support. + * + * Copyright (c) 2021 Apertus Solutions, LLC + * Copyright (c) 2021 Assured Information Security, Inc. + * Copyright (c) 2021, Oracle and/or its affiliates. + * + * Author(s): + * Daniel P. Smith + * Garnet T. Grimm + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define SL_FS_ENTRIES 10 +/* root directory node must be last */ +#define SL_ROOT_DIR_ENTRY (SL_FS_ENTRIES - 1) +#define SL_TXT_DIR_ENTRY (SL_FS_ENTRIES - 2) +#define SL_TXT_FILE_FIRST (SL_TXT_DIR_ENTRY - 1) +#define SL_TXT_ENTRY_COUNT 7 + +#define DECLARE_TXT_PUB_READ_U(size, fmt, msg_size)\ +static ssize_t txt_pub_read_u##size(unsigned int offset, \ + loff_t *read_offset,\ + size_t read_len,\ + char __user *buf) \ +{ \ + void __iomem *txt; \ + char msg_buffer[msg_size]; \ + u##size reg_value = 0; \ + txt = ioremap(TXT_PUB_CONFIG_REGS_BASE, \ + TXT_NR_CONFIG_PAGES * PAGE_SIZE); \ + if (IS_ERR(txt))\ + return PTR_ERR(txt);\ + memcpy_fromio(_value, txt + offset, sizeof(u##size)); \ + iounmap(txt); \ + snprintf(msg_buffer, msg_size, fmt, reg_value); \ + return simple_read_from_buffer(buf, read_len, read_offset, \ + _buffer, msg_size); \ +} + +DECLARE_TXT_PUB_READ_U(8, "%#04x\n", 6); +DECLARE_TXT_PUB_READ_U(32, "%#010x\n", 12); +DECLARE_TXT_PUB_READ_U(64, "%#018llx\n", 20); + +#define DECLARE_TXT_FOPS(reg_name, reg_offset, reg_size) \ +static ssize_t txt_##reg_name##_read(struct file *flip, \ + char __user *buf, size_t read_len, loff_t *read_offset) \ +{ \ + return txt_pub_read_u##reg_size(reg_offset, read_offset,\ + read_len, buf); \ +} \ +static const struct file_operations reg_name##_ops = { \ + .read = txt_##reg_name##_read, \ +} + +DECLARE_TXT_FOPS(sts, TXT_CR_STS, 64); +DECLARE_TXT_FOPS(ests, TXT_CR_ESTS, 8); +DECLARE_TXT_FOPS(errorcode, TXT_CR_ERRORCODE, 32); +DECLARE_TXT_FOPS(didvid, TXT_CR_DIDVID, 64); +DECLARE_TXT_FOPS(e2sts, TXT_CR_E2STS, 64); +DECLARE_TXT_FOPS(ver_emif, TXT_CR_VER_EMIF, 32); +DECLARE_TXT_FOPS(scratchpad, TXT_CR_SCRATCHPAD, 64); + +/* + * Securityfs exposure + */ 
+struct memfile { + char *name; + void *addr; + size_t size; +}; + +static struct memfile sl_evtlog = {"eventlog", 0, 0}; +static void *txt_heap; +static struct
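For readability, the first instantiation above, DECLARE_TXT_PUB_READ_U(8, "%#04x\n", 6), expands to roughly the following (reformatted, comments added):

static ssize_t txt_pub_read_u8(unsigned int offset, loff_t *read_offset,
                               size_t read_len, char __user *buf)
{
        void __iomem *txt;
        char msg_buffer[6];
        u8 reg_value = 0;

        /* Map the public TXT register bank for the read, then unmap. */
        txt = ioremap(TXT_PUB_CONFIG_REGS_BASE,
                      TXT_NR_CONFIG_PAGES * PAGE_SIZE);
        if (IS_ERR(txt))
                return PTR_ERR(txt);
        memcpy_fromio(&reg_value, txt + offset, sizeof(u8));
        iounmap(txt);

        /* Format the register and copy it out via securityfs read(). */
        snprintf(msg_buffer, 6, "%#04x\n", reg_value);
        return simple_read_from_buffer(buf, read_len, read_offset,
                                       msg_buffer, 6);
}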
[PATCH v2 03/12] x86: Secure Launch main header file
Introduce the main Secure Launch header file used in the early SL stub and the early setup code. Signed-off-by: Ross Philipson --- include/linux/slaunch.h | 540 1 file changed, 540 insertions(+) create mode 100644 include/linux/slaunch.h diff --git a/include/linux/slaunch.h b/include/linux/slaunch.h new file mode 100644 index ..dd8d92e --- /dev/null +++ b/include/linux/slaunch.h @@ -0,0 +1,540 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Main Secure Launch header file. + * + * Copyright (c) 2021, Oracle and/or its affiliates. + */ + +#ifndef _LINUX_SLAUNCH_H +#define _LINUX_SLAUNCH_H + +/* + * Secure Launch Defined State Flags + */ +#define SL_FLAG_ACTIVE 0x0001 +#define SL_FLAG_ARCH_SKINIT0x0002 +#define SL_FLAG_ARCH_TXT 0x0004 + +/* + * Secure Launch CPU Type + */ +#define SL_CPU_AMD 1 +#define SL_CPU_INTEL 2 + +#if IS_ENABLED(CONFIG_SECURE_LAUNCH) + +#define __SL32_CS 0x0008 +#define __SL32_DS 0x0010 + +#define INTEL_CPUID_MFGID_EBX 0x756e6547 /* Genu */ +#define INTEL_CPUID_MFGID_EDX 0x49656e69 /* ineI */ +#define INTEL_CPUID_MFGID_ECX 0x6c65746e /* ntel */ + +#define AMD_CPUID_MFGID_EBX0x68747541 /* Auth */ +#define AMD_CPUID_MFGID_EDX0x69746e65 /* enti */ +#define AMD_CPUID_MFGID_ECX0x444d4163 /* cAMD */ + +/* + * Intel Safer Mode Extensions (SMX) + * + * Intel SMX provides a programming interface to establish a Measured Launch + * Environment (MLE). The measurement and protection mechanisms are supported by the + * capabilities of an Intel Trusted Execution Technology (TXT) platform. SMX is + * the processor’s programming interface in an Intel TXT platform. + * + * See Intel SDM Volume 2 - 6.1 "Safer Mode Extensions Reference" + */ + +/* + * SMX GETSEC Leaf Functions + */ +#define SMX_X86_GETSEC_SEXIT 5 +#define SMX_X86_GETSEC_SMCTRL 7 +#define SMX_X86_GETSEC_WAKEUP 8 + +/* + * Intel Trusted Execution Technology MMIO Registers Banks + */ +#define TXT_PUB_CONFIG_REGS_BASE 0xfed30000 +#define TXT_PRIV_CONFIG_REGS_BASE 0xfed20000 +#define TXT_NR_CONFIG_PAGES ((TXT_PUB_CONFIG_REGS_BASE - \ + TXT_PRIV_CONFIG_REGS_BASE) >> PAGE_SHIFT) + +/* + * Intel Trusted Execution Technology (TXT) Registers + */ +#define TXT_CR_STS 0x0000 +#define TXT_CR_ESTS0x0008 +#define TXT_CR_ERRORCODE 0x0030 +#define TXT_CR_CMD_RESET 0x0038 +#define TXT_CR_CMD_CLOSE_PRIVATE 0x0048 +#define TXT_CR_DIDVID 0x0110 +#define TXT_CR_VER_EMIF0x0200 +#define TXT_CR_CMD_UNLOCK_MEM_CONFIG 0x0218 +#define TXT_CR_SINIT_BASE 0x0270 +#define TXT_CR_SINIT_SIZE 0x0278 +#define TXT_CR_MLE_JOIN0x0290 +#define TXT_CR_HEAP_BASE 0x0300 +#define TXT_CR_HEAP_SIZE 0x0308 +#define TXT_CR_SCRATCHPAD 0x0378 +#define TXT_CR_CMD_OPEN_LOCALITY1 0x0380 +#define TXT_CR_CMD_CLOSE_LOCALITY1 0x0388 +#define TXT_CR_CMD_OPEN_LOCALITY2 0x0390 +#define TXT_CR_CMD_CLOSE_LOCALITY2 0x0398 +#define TXT_CR_CMD_SECRETS 0x08e0 +#define TXT_CR_CMD_NO_SECRETS 0x08e8 +#define TXT_CR_E2STS 0x08f0 + +/* TXT default register value */ +#define TXT_REGVALUE_ONE 0x1ULL + +/* TXTCR_STS status bits */ +#define TXT_SENTER_DONE_STS(1<<0) +#define TXT_SEXIT_DONE_STS (1<<1) + +/* + * SINIT/MLE Capabilities Field Bit Definitions + */ +#define TXT_SINIT_MLE_CAP_WAKE_GETSEC 0 +#define TXT_SINIT_MLE_CAP_WAKE_MONITOR 1 + +/* + * OS/MLE Secure Launch Specific Definitions + */ +#define TXT_OS_MLE_STRUCT_VERSION 1 +#define TXT_OS_MLE_MAX_VARIABLE_MTRRS 32 + +/* + * TXT Heap Table Enumeration + */ +#define TXT_BIOS_DATA_TABLE1 +#define TXT_OS_MLE_DATA_TABLE 2 +#define TXT_OS_SINIT_DATA_TABLE3 +#define TXT_SINIT_MLE_DATA_TABLE 4 +#define TXT_SINIT_TABLE_MAXTXT_SINIT_MLE_DATA_TABLE + +/* + *
Secure Launch Defined Error Codes used in MLE-initiated TXT resets. + * + * TXT Specification + * Appendix I ACM Error Codes + */ +#define SL_ERROR_GENERIC 0xc0008001 +#define SL_ERROR_TPM_INIT 0xc0008002 +#define SL_ERROR_TPM_INVALID_LOG20 0xc0008003 +#define SL_ERROR_TPM_LOGGING_FAILED0xc0008004 +#define SL_ERROR_REGION_STRADDLE_4GB 0xc0008005 +#define SL_ERROR_TPM_EXTEND0xc0008006 +#define SL_ERROR_MTRR_INV_VCNT 0xc0008007 +#define SL_ERROR_MTRR_INV_DEF_TYPE 0xc0008008 +#define SL_ERROR_MTRR_INV_BASE 0xc0008009 +#define SL_ERROR_MTRR_INV_MASK 0xc000800a +#define SL_ERROR_MSR_INV_MISC_EN 0xc000800b +#define SL_ERROR_INV_AP_INTERRUPT 0xc000800c +#define SL_ERROR_INTEGER_OVERFLOW 0xc000800d +#define SL_ERROR_HEAP_WALK 0xc000800e +#define SL_ERROR_HEAP_MAP
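A consumer-side helper equivalent to the checks used elsewhere in this series (the SMP bringup and IOMMU patches test exactly this flag pair; the helper name here is illustrative):

static inline bool slaunch_is_txt_launch(void)
{
        u32 mask = SL_FLAG_ACTIVE | SL_FLAG_ARCH_TXT;

        /* Both "a launch happened" and "it was TXT" must be set. */
        return (slaunch_get_flags() & mask) == mask;
}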
[PATCH v2 04/12] x86: Add early SHA support for Secure Launch early measurements
From: "Daniel P. Smith" The SHA algorithms are necessary to measure configuration information into the TPM as early as possible before using the values. This implementation uses the established approach of #including the SHA libraries directly in the code since the compressed kernel is not uncompressed at this point. The SHA code here has its origins in the code from the main kernel, commit c4d5b9ffa31f (crypto: sha1 - implement base layer for SHA-1). That code could not be pulled directly into the setup portion of the compressed kernel because of other dependencies it pulls in. The result is this is a modified copy of that code that still leverages the core SHA algorithms. Signed-off-by: Daniel P. Smith Signed-off-by: Ross Philipson --- arch/x86/boot/compressed/Makefile | 2 + arch/x86/boot/compressed/early_sha1.c | 103 arch/x86/boot/compressed/early_sha1.h | 17 ++ arch/x86/boot/compressed/early_sha256.c | 7 +++ lib/crypto/sha256.c | 8 +++ lib/sha1.c | 4 ++ 6 files changed, 141 insertions(+) create mode 100644 arch/x86/boot/compressed/early_sha1.c create mode 100644 arch/x86/boot/compressed/early_sha1.h create mode 100644 arch/x86/boot/compressed/early_sha256.c diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 431bf7f..059d49a 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -102,6 +102,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a +vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o $(obj)/early_sha256.o + $(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE $(call if_changed,ld) diff --git a/arch/x86/boot/compressed/early_sha1.c b/arch/x86/boot/compressed/early_sha1.c new file mode 100644 index ..74f4654 --- /dev/null +++ b/arch/x86/boot/compressed/early_sha1.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2021 Apertus Solutions, LLC. + */ + +#include +#include +#include +#include +#include + +#include "early_sha1.h" + +#define SHA1_DISABLE_EXPORT +#include "../../../../lib/sha1.c" + +/* The SHA1 implementation in lib/sha1.c was written to get the workspace + * buffer as a parameter. This wrapper function provides a container + * around a temporary workspace that is cleared after the transform completes. + */ +static void __sha_transform(u32 *digest, const char *data) +{ + u32 ws[SHA1_WORKSPACE_WORDS]; + + sha1_transform(digest, data, ws); + + memset(ws, 0, sizeof(ws)); + /* +* As this is cryptographic code, prevent the memset 0 from being +* optimized out potentially leaving secrets in memory. 
+*/ + wmb(); + +} + +void early_sha1_init(struct sha1_state *sctx) +{ + sha1_init(sctx->state); + sctx->count = 0; +} + +void early_sha1_update(struct sha1_state *sctx, + const u8 *data, + unsigned int len) +{ + unsigned int partial = sctx->count % SHA1_BLOCK_SIZE; + + sctx->count += len; + + if (likely((partial + len) >= SHA1_BLOCK_SIZE)) { + int blocks; + + if (partial) { + int p = SHA1_BLOCK_SIZE - partial; + + memcpy(sctx->buffer + partial, data, p); + data += p; + len -= p; + + __sha_transform(sctx->state, sctx->buffer); + } + + blocks = len / SHA1_BLOCK_SIZE; + len %= SHA1_BLOCK_SIZE; + + if (blocks) { + while (blocks--) { + __sha_transform(sctx->state, data); + data += SHA1_BLOCK_SIZE; + } + } + partial = 0; + } + + if (len) + memcpy(sctx->buffer + partial, data, len); +} + +void early_sha1_final(struct sha1_state *sctx, u8 *out) +{ + const int bit_offset = SHA1_BLOCK_SIZE - sizeof(__be64); + __be64 *bits = (__be64 *)(sctx->buffer + bit_offset); + __be32 *digest = (__be32 *)out; + unsigned int partial = sctx->count % SHA1_BLOCK_SIZE; + int i; + + sctx->buffer[partial++] = 0x80; + if (partial > bit_offset) { + memset(sctx->buffer + partial, 0x0, SHA1_BLOCK_SIZE - partial); + partial = 0; + + __sha_transform(sctx->state, sctx->buffer); + } + + memset(sctx->buffer + partial, 0x0, bit_offset - partial); + *bits = cpu_to_be64(sctx->count << 3); + __sha_transform(sctx->state, sctx->buffer); + + for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++) + put_unaligned_be32(sctx->state[i], digest++); + + *sctx
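Usage of the helpers above follows the usual init/update/final pattern; a sketch of measuring one buffer in the compressed-kernel environment:

static void sl_sha1_measure_example(const u8 *data, u32 len,
                                    u8 digest[SHA1_DIGEST_SIZE])
{
        struct sha1_state sctx;

        early_sha1_init(&sctx);
        early_sha1_update(&sctx, data, len);    /* may be called repeatedly */
        early_sha1_final(&sctx, digest);        /* 20-byte digest out */
}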
[PATCH v2 06/12] x86: Secure Launch kernel late boot stub
The routine slaunch_setup is called out of the x86 specific setup_arch routine during early kernel boot. After determining what platform is present, various operations specific to that platform occur. This includes finalizing settings for the platform late launch and verifying that memory protections are in place. For TXT, this code also reserves the original compressed kernel setup area where the APs were left looping so that this memory cannot be used. Signed-off-by: Ross Philipson --- arch/x86/kernel/Makefile | 1 + arch/x86/kernel/setup.c| 3 + arch/x86/kernel/slaunch.c | 472 + drivers/iommu/intel/dmar.c | 4 + 4 files changed, 480 insertions(+) create mode 100644 arch/x86/kernel/slaunch.c diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 0f66682..574e643 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -80,6 +80,7 @@ obj-$(CONFIG_X86_32) += tls.o obj-$(CONFIG_IA32_EMULATION) += tls.o obj-y += step.o obj-$(CONFIG_INTEL_TXT)+= tboot.o +obj-$(CONFIG_SECURE_LAUNCH)+= slaunch.o obj-$(CONFIG_ISA_DMA_API) += i8237.o obj-$(CONFIG_STACKTRACE) += stacktrace.o obj-y += cpu/ diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 1e72062..7f5d12a 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -993,6 +994,8 @@ void __init setup_arch(char **cmdline_p) early_gart_iommu_check(); #endif + slaunch_setup(); + /* * partially used pages are not usable - thus * we are rounding upwards: diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c new file mode 100644 index ..a24d384 --- /dev/null +++ b/arch/x86/kernel/slaunch.c @@ -0,0 +1,472 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Secure Launch late validation/setup and finalization support. + * + * Copyright (c) 2021, Oracle and/or its affiliates. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static u32 sl_flags; +static struct sl_ap_wake_info ap_wake_info; +static u64 evtlog_addr; +static u32 evtlog_size; +static u64 vtd_pmr_lo_size; + +/* This should be plenty of room */ +static u8 txt_dmar[PAGE_SIZE] __aligned(16); + +u32 slaunch_get_flags(void) +{ + return sl_flags; +} +EXPORT_SYMBOL(slaunch_get_flags); + +struct sl_ap_wake_info *slaunch_get_ap_wake_info(void) +{ + return &ap_wake_info; +} + +struct acpi_table_header *slaunch_get_dmar_table(struct acpi_table_header *dmar) +{ + /* The DMAR is only stashed and provided via TXT on Intel systems */ + if (memcmp(txt_dmar, "DMAR", 4)) + return dmar; + + return (struct acpi_table_header *)(&txt_dmar[0]); +} + +void __noreturn slaunch_txt_reset(void __iomem *txt, + const char *msg, u64 error) +{ + u64 one = 1, val; + + pr_err("%s", msg); + + /* +* This performs a TXT reset with a sticky error code. The reads of +* TXT_CR_E2STS act as barriers. +*/ + memcpy_toio(txt + TXT_CR_ERRORCODE, &error, sizeof(error)); + memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val)); + memcpy_toio(txt + TXT_CR_CMD_NO_SECRETS, &one, sizeof(one)); + memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val)); + memcpy_toio(txt + TXT_CR_CMD_UNLOCK_MEM_CONFIG, &one, sizeof(one)); + memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val)); + memcpy_toio(txt + TXT_CR_CMD_RESET, &one, sizeof(one)); + + for ( ; ; ) + asm volatile ("hlt"); + + unreachable(); +} + +/* + * The TXT heap is too big to map all at once with early_ioremap + * so it is done a table at a time. 
+ */ static void __init *txt_early_get_heap_table(void __iomem *txt, u32 type, +u32 bytes) { + void *heap; + u64 base, size, offset = 0; + int i; + + if (type > TXT_SINIT_TABLE_MAX) + slaunch_txt_reset(txt, + "Error invalid table type for early heap walk\n", + SL_ERROR_HEAP_WALK); + + memcpy_fromio(&base, txt + TXT_CR_HEAP_BASE, sizeof(base)); + memcpy_fromio(&size, txt + TXT_CR_HEAP_SIZE, sizeof(size)); + + /* Iterate over heap tables looking for table of "type" */ + for (i = 0; i < type; i++) { + base += offset; + heap = early_memremap(base, sizeof(u64)); + if (!heap) + slaunch_txt_reset(txt, + "Error early_memremap of heap for heap walk\n", + SL_ERROR_HEAP_MAP); + + offset = *((u64 *)heap); + + /* +* After the
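A minimal sketch of how such a heap walk completes, assuming (per the TXT specification) that the first u64 of each heap table holds that table's total size, so summing sizes steps from one table to the next. Illustrative only, with error handling elided:

	for (i = 0; i < type; i++) {
		base += offset;
		heap = early_memremap(base, sizeof(u64));
		if (!heap)
			return NULL;	/* sketch: the real code does a TXT reset */
		offset = *((u64 *)heap);	/* first u64 = table size */
		early_memunmap(heap, sizeof(u64));
	}
	/* base + offset now points at the requested table; the real code
	 * may additionally skip the leading size field itself. */
	return early_memremap(base + offset, bytes);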
[PATCH v2 09/12] reboot: Secure Launch SEXIT support on reboot paths
If the MLE kernel is being powered off, rebooted or halted, then SEXIT must be called. Note that the SEXIT GETSEC leaf can only be called after a machine_shutdown() has been done on these paths. The machine_shutdown() is not called on a few paths like when poweroff action does not have a poweroff callback (into ACPI code) or when an emergency reset is done. In these cases, just the TXT registers are finalized but SEXIT is skipped. Signed-off-by: Ross Philipson --- arch/x86/kernel/reboot.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index b29657b..796cbfb 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -731,6 +732,7 @@ static void native_machine_restart(char *__unused) if (!reboot_force) machine_shutdown(); + slaunch_finalize(!reboot_force); __machine_emergency_restart(0); } @@ -741,6 +743,9 @@ static void native_machine_halt(void) tboot_shutdown(TB_SHUTDOWN_HALT); + /* SEXIT done after machine_shutdown() to meet TXT requirements */ + slaunch_finalize(1); + stop_this_cpu(NULL); } @@ -749,8 +754,12 @@ static void native_machine_power_off(void) if (pm_power_off) { if (!reboot_force) machine_shutdown(); + slaunch_finalize(!reboot_force); pm_power_off(); + } else { + slaunch_finalize(0); } + /* A fallback in case there is no PM info available */ tboot_shutdown(TB_SHUTDOWN_HALT); } @@ -778,6 +787,7 @@ void machine_shutdown(void) void machine_emergency_restart(void) { + slaunch_finalize(0); __machine_emergency_restart(1); } -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
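Condensed, the paths touched here all follow one rule, SEXIT only when machine_shutdown() has run; a summary sketch, not code from the patch:

	/*
	 * native_machine_restart    -> slaunch_finalize(!reboot_force)
	 * native_machine_halt       -> slaunch_finalize(1), after
	 *                              machine_shutdown()
	 * native_machine_power_off  -> slaunch_finalize(!reboot_force), or
	 *                              slaunch_finalize(0) when there is no
	 *                              pm_power_off callback
	 * machine_emergency_restart -> slaunch_finalize(0): no shutdown ran,
	 *                              so TXT registers only, no SEXIT
	 */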
[PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch
The IOMMU should always be set to default translated type after the PMRs are disabled to protect the MLE from DMA. Signed-off-by: Ross Philipson --- drivers/iommu/intel/iommu.c | 5 + drivers/iommu/iommu.c | 6 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index be35284..4f0256d 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include #include @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device *dev) */ static int device_def_domain_type(struct device *dev) { + /* Do not allow identity domain when Secure Launch is configured */ + if (slaunch_get_flags() & SL_FLAG_ACTIVE) + return IOMMU_DOMAIN_DMA; + if (dev_is_pci(dev)) { struct pci_dev *pdev = to_pci_dev(dev); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 808ab70d..d49b7dd 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -23,6 +23,7 @@ #include #include #include +#include #include static struct kset *iommu_group_kset; @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line) { if (cmd_line) iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; + + /* Do not allow identity domain when Secure Launch is configured */ + if (!(slaunch_get_flags() & SL_FLAG_ACTIVE)) + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; } void iommu_set_default_translated(bool cmd_line) -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
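This works because the IOMMU core treats a nonzero return from the driver's def_domain_type hook as overriding the global default domain type, so even a kernel booted with iommu=pt cannot hand out an identity domain here. A sketch of the hook's shape:

	static int def_domain_type_sketch(struct device *dev)
	{
		if (slaunch_get_flags() & SL_FLAG_ACTIVE)
			return IOMMU_DOMAIN_DMA;	/* force translation */
		return 0;	/* 0 = no preference, fall back to the default */
	}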
[PATCH v2 08/12] kexec: Secure Launch kexec SEXIT support
Prior to running the next kernel via kexec, the Secure Launch code closes down private SMX resources and does an SEXIT. This allows the next kernel to start normally without any issues starting the APs etc. Signed-off-by: Ross Philipson --- arch/x86/kernel/slaunch.c | 71 +++ kernel/kexec_core.c | 4 +++ 2 files changed, 75 insertions(+) diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c index a24d384..8db557a 100644 --- a/arch/x86/kernel/slaunch.c +++ b/arch/x86/kernel/slaunch.c @@ -470,3 +470,74 @@ void __init slaunch_setup(void) vendor[3] == INTEL_CPUID_MFGID_EDX) slaunch_setup_intel(); } + +static inline void smx_getsec_sexit(void) +{ + asm volatile (".byte 0x0f,0x37\n" + : : "a" (SMX_X86_GETSEC_SEXIT)); +} + +void slaunch_finalize(int do_sexit) +{ + void __iomem *config; + u64 one = TXT_REGVALUE_ONE, val; + + if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) != + (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) + return; + + config = ioremap(TXT_PRIV_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES * +PAGE_SIZE); + if (!config) { + pr_emerg("Error SEXIT failed to ioremap TXT private regs\n"); + return; + } + + /* Clear secrets bit for SEXIT */ + memcpy_toio(config + TXT_CR_CMD_NO_SECRETS, &one, sizeof(one)); + memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val)); + + /* Unlock memory configurations */ + memcpy_toio(config + TXT_CR_CMD_UNLOCK_MEM_CONFIG, &one, sizeof(one)); + memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val)); + + /* Close the TXT private register space */ + memcpy_toio(config + TXT_CR_CMD_CLOSE_PRIVATE, &one, sizeof(one)); + memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val)); + + /* +* Calls to iounmap are not being done because of the state of the +* system this late in the kexec process. Local IRQs are disabled and +* iounmap causes a TLB flush which in turn causes a warning. Leaving +* these mappings is not an issue since the next kernel is going to +* completely re-setup memory management. +*/ + + /* Map public registers and do a final read fence */ + config = ioremap(TXT_PUB_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES * +PAGE_SIZE); + if (!config) { + pr_emerg("Error SEXIT failed to ioremap TXT public regs\n"); + return; + } + + memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val)); + + pr_emerg("TXT clear secrets bit and unlock memory complete."); + + if (!do_sexit) + return; + + if (smp_processor_id() != 0) { + pr_emerg("Error TXT SEXIT must be called on CPU 0\n"); + return; + } + + /* Ensure SMX mode is enabled for the GETSEC instruction */ + cr4_set_bits(X86_CR4_SMXE); + + /* Do the SEXIT SMX operation */ + smx_getsec_sexit(); + + pr_emerg("TXT SEXIT complete."); +} diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index f099bae..1dcf20c 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -38,6 +38,7 @@ #include #include #include +#include #include #include @@ -1178,6 +1179,9 @@ int kernel_kexec(void) cpu_hotplug_enable(); pr_notice("Starting new kernel\n"); machine_shutdown(); + + /* Finalize TXT registers and do SEXIT */ + slaunch_finalize(1); } kmsg_dump(KMSG_DUMP_SHUTDOWN); -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
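For reference, the .byte 0x0f,0x37 sequence encodes the GETSEC instruction, which dispatches on EAX (CAPABILITIES is leaf 0, SENTER leaf 4, SEXIT leaf 5 in the SDM's SMX chapter) and raises #UD unless CR4.SMXE is set, hence the cr4_set_bits() above. A generic sketch, with the wrapper name invented for illustration:

	/* GETSEC selects its leaf function via EAX; requires CR4.SMXE=1
	 * or the CPU raises #UD. */
	static inline void getsec(unsigned long leaf)
	{
		asm volatile (".byte 0x0f,0x37" : : "a" (leaf) : "memory");
	}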
[PATCH v2 11/12] tpm: Allow locality 2 to be set when initializing the TPM for Secure Launch
The Secure Launch MLE environment uses PCRs that are only accessible from the DRTM locality 2. By default the TPM drivers always initialize the locality to 0. When a Secure Launch is in progress, initialize the locality to 2. Signed-off-by: Ross Philipson --- drivers/char/tpm/tpm-chip.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index ddaeceb..48b9351 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -23,6 +23,7 @@ #include #include #include +#include #include "tpm.h" DEFINE_IDR(dev_nums_idr); @@ -34,12 +35,20 @@ static int tpm_request_locality(struct tpm_chip *chip) { - int rc; + int rc, locality; if (!chip->ops->request_locality) return 0; - rc = chip->ops->request_locality(chip, 0); + if (slaunch_get_flags() & SL_FLAG_ACTIVE) { + dev_dbg(&chip->dev, "setting TPM locality to 2 for MLE\n"); + locality = 2; + } else { + dev_dbg(&chip->dev, "setting TPM locality to 0\n"); + locality = 0; + } + + rc = chip->ops->request_locality(chip, locality); if (rc < 0) return rc; -- 1.8.3.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Plan for /dev/ioasid RFC v2
On Fri, 18 Jun 2021 08:37:35 -0700 "Raj, Ashok" wrote: > On Fri, Jun 18, 2021 at 12:15:06PM -0300, Jason Gunthorpe wrote: > > On Fri, Jun 18, 2021 at 03:47:51PM +0200, Joerg Roedel wrote: > > > Hi Kevin, > > > > > > On Thu, Jun 17, 2021 at 07:31:03AM +, Tian, Kevin wrote: > > > > Now let's talk about the new IOMMU behavior: > > > > > > > > - A device is blocked from doing DMA to any resource outside of > > > > its group when it's probed by the IOMMU driver. This could be a > > > > special state w/o attaching to any domain, or a new special domain > > > > type which differentiates it from existing domain types (identity, > > > > dma, or unmanged). Actually existing code already includes a > > > > IOMMU_DOMAIN_BLOCKED type but nobody uses it. > > > > > > There is a reason for the default domain to exist: Devices which require > > > RMRR mappings to be present. You can't just block all DMA from devices > > > until a driver takes over, we put much effort into making sure there is > > > not even a small window in time where RMRR regions (unity mapped regions > > > on AMD) are not mapped. > > > > Yes, I think the DMA blocking can only start around/after a VFIO type > > driver has probed() and bound to a device in the group, not much > > different from today. > > Does this mean when a device has a required "RMRR" that requires a unity > mapping we block assigning those devices to guests? I remember we had some > restriction but there was a need to go around it at some point in time. > > - Either we disallow assigning devices with RMRR > - Break that unity map when the device is probed and after which any RMRR > access from device will fault. We currently disallow assignment of RMRR encumbered devices except for the known cases of USB and IGD. In the general case, an RMRR imposes a requirement on the host system to maintain ranges of identity mapping that is incompatible with userspace ownership of the device and IOVA address space. AFAICT, nothing changes in the /dev/iommu model that would make it safe to entrust userspace with RMRR encumbered devices. Thanks, Alex ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems
Hi Rob, On 6/15/21 2:15 PM, Rob Herring wrote: > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the > same size as the list is redundant and can be dropped. Note that is DT > schema specific behavior and not standard json-schema behavior. The tooling > will fixup the final schema adding any unspecified minItems/maxItems. > > This condition is partially checked with the meta-schema already, but > only if both 'minItems' and 'maxItems' are equal to the 'items' length. > An improved meta-schema is pending. > > Cc: Jens Axboe > Cc: Stephen Boyd > Cc: Herbert Xu > Cc: "David S. Miller" > Cc: David Airlie > Cc: Daniel Vetter > Cc: Vinod Koul > Cc: Bartosz Golaszewski > Cc: Kamal Dasu > Cc: Jonathan Cameron > Cc: Lars-Peter Clausen > Cc: Thomas Gleixner > Cc: Marc Zyngier > Cc: Joerg Roedel > Cc: Jassi Brar > Cc: Mauro Carvalho Chehab > Cc: Krzysztof Kozlowski > Cc: Ulf Hansson > Cc: Jakub Kicinski > Cc: Wolfgang Grandegger > Cc: Marc Kleine-Budde > Cc: Andrew Lunn > Cc: Vivien Didelot > Cc: Vladimir Oltean > Cc: Bjorn Helgaas > Cc: Kishon Vijay Abraham I > Cc: Linus Walleij > Cc: "Uwe Kleine-König" > Cc: Lee Jones > Cc: Ohad Ben-Cohen > Cc: Mathieu Poirier > Cc: Philipp Zabel > Cc: Paul Walmsley > Cc: Palmer Dabbelt > Cc: Albert Ou > Cc: Alessandro Zummo > Cc: Alexandre Belloni > Cc: Greg Kroah-Hartman > Cc: Mark Brown > Cc: Zhang Rui > Cc: Daniel Lezcano > Cc: Wim Van Sebroeck > Cc: Guenter Roeck > Signed-off-by: Rob Herring > --- > .../devicetree/bindings/ata/nvidia,tegra-ahci.yaml | 1 - > .../devicetree/bindings/clock/allwinner,sun4i-a10-ccu.yaml | 2 -- > .../devicetree/bindings/clock/qcom,gcc-apq8064.yaml | 1 - > Documentation/devicetree/bindings/clock/qcom,gcc-sdx55.yaml | 2 -- > .../devicetree/bindings/clock/qcom,gcc-sm8350.yaml | 2 -- > .../devicetree/bindings/clock/sprd,sc9863a-clk.yaml | 1 - > .../devicetree/bindings/crypto/allwinner,sun8i-ce.yaml | 2 -- > Documentation/devicetree/bindings/crypto/fsl-dcp.yaml | 1 - > .../display/allwinner,sun4i-a10-display-backend.yaml| 6 -- > .../bindings/display/allwinner,sun6i-a31-mipi-dsi.yaml | 1 - > .../bindings/display/allwinner,sun8i-a83t-dw-hdmi.yaml | 4 > .../bindings/display/allwinner,sun8i-a83t-hdmi-phy.yaml | 2 -- > .../bindings/display/allwinner,sun8i-r40-tcon-top.yaml | 2 -- > .../devicetree/bindings/display/bridge/cdns,mhdp8546.yaml | 2 -- > .../bindings/display/rockchip/rockchip,dw-hdmi.yaml | 2 -- > Documentation/devicetree/bindings/display/st,stm32-dsi.yaml | 2 -- > .../devicetree/bindings/display/st,stm32-ltdc.yaml | 1 - > .../devicetree/bindings/display/xlnx/xlnx,zynqmp-dpsub.yaml | 4 > .../devicetree/bindings/dma/renesas,rcar-dmac.yaml | 1 - > .../devicetree/bindings/edac/amazon,al-mc-edac.yaml | 2 -- > Documentation/devicetree/bindings/eeprom/at24.yaml | 1 - > Documentation/devicetree/bindings/example-schema.yaml | 2 -- > Documentation/devicetree/bindings/gpu/brcm,bcm-v3d.yaml | 1 - > Documentation/devicetree/bindings/gpu/vivante,gc.yaml | 1 - > Documentation/devicetree/bindings/i2c/brcm,brcmstb-i2c.yaml | 1 - > .../devicetree/bindings/i2c/marvell,mv64xxx-i2c.yaml| 2 -- > .../devicetree/bindings/i2c/mellanox,i2c-mlxbf.yaml | 1 - > .../devicetree/bindings/iio/adc/amlogic,meson-saradc.yaml | 1 - > .../devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.yaml | 2 -- > .../bindings/interrupt-controller/fsl,irqsteer.yaml | 1 - > .../bindings/interrupt-controller/loongson,liointc.yaml | 1 - > Documentation/devicetree/bindings/iommu/arm,smmu-v3.yaml| 1 - > .../devicetree/bindings/iommu/renesas,ipmmu-vmsa.yaml | 1 - > 
.../devicetree/bindings/mailbox/st,stm32-ipcc.yaml | 2 -- > .../devicetree/bindings/media/amlogic,gx-vdec.yaml | 1 - > Documentation/devicetree/bindings/media/i2c/adv7604.yaml| 1 - > .../devicetree/bindings/media/marvell,mmp2-ccic.yaml| 1 - > .../devicetree/bindings/media/qcom,sc7180-venus.yaml| 1 - > .../devicetree/bindings/media/qcom,sdm845-venus-v2.yaml | 1 - > .../devicetree/bindings/media/qcom,sm8250-venus.yaml| 1 - > Documentation/devicetree/bindings/media/renesas,drif.yaml | 1 - > .../bindings/memory-controllers/mediatek,smi-common.yaml| 6 ++ > .../bindings/memory-controllers/mediatek,smi-larb.yaml | 1 - > .../devicetree/bindings/mmc/allwinner,sun4i-a10-mmc.yaml| 2 -- > Documentation/devicetree/bindings/mmc/fsl-imx-esdhc.yaml| 1 - > Documentation/devicetree/bindings/mmc/mtk-sd.yaml | 2 -- > Documentation/devicetree/bindings/mmc/renesas,sdhi.yaml | 2 -- > Documentation/devicetree/bindings/mmc/sdhci-am654.yaml | 1 - > Documentation/devicetree/bindings/mmc/sdhci-pxa.yaml| 1 - > .../devicetree/bindings/net/amlogic,meson-dwmac.yaml
Re: Plan for /dev/ioasid RFC v2
On Fri, Jun 18, 2021 at 12:15:06PM -0300, Jason Gunthorpe wrote: > On Fri, Jun 18, 2021 at 03:47:51PM +0200, Joerg Roedel wrote: > > Hi Kevin, > > > > On Thu, Jun 17, 2021 at 07:31:03AM +, Tian, Kevin wrote: > > > Now let's talk about the new IOMMU behavior: > > > > > > - A device is blocked from doing DMA to any resource outside of > > > its group when it's probed by the IOMMU driver. This could be a > > > special state w/o attaching to any domain, or a new special domain > > > type which differentiates it from existing domain types (identity, > > > dma, or unmanged). Actually existing code already includes a > > > IOMMU_DOMAIN_BLOCKED type but nobody uses it. > > > > There is a reason for the default domain to exist: Devices which require > > RMRR mappings to be present. You can't just block all DMA from devices > > until a driver takes over, we put much effort into making sure there is > > not even a small window in time where RMRR regions (unity mapped regions > > on AMD) are not mapped. > > Yes, I think the DMA blocking can only start around/after a VFIO type > driver has probed() and bound to a device in the group, not much > different from today. Does this mean when a device has a required "RMRR" that requires a unity mapping we block assigning those devices to guests? I remember we had some restriction but there was a need to go around it at some point in time. - Either we disallow assigning devices with RMRR - Break that unity map when the device is probed and after which any RMRR access from device will fault. Cheers, Ashok ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 5/5] iommu/virtio: Enable x86 support
With the VIOT support in place, x86 platforms can now use the virtio-iommu. Because the other x86 IOMMU drivers aren't yet ready to use the acpi_dma_setup() path, x86 doesn't implement arch_setup_dma_ops() at the moment. Similarly to Vt-d and AMD IOMMU, clear the DMA ops and call iommu_setup_dma_ops() from probe_finalize(). Acked-by: Joerg Roedel Acked-by: Michael S. Tsirkin Tested-by: Eric Auger Reviewed-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/Kconfig| 3 ++- drivers/iommu/dma-iommu.c| 1 + drivers/iommu/virtio-iommu.c | 11 +++ 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index aff8a4830dd1..07b7c25cbed8 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -400,8 +400,9 @@ config HYPERV_IOMMU config VIRTIO_IOMMU tristate "Virtio IOMMU driver" depends on VIRTIO - depends on ARM64 + depends on (ARM64 || X86) select IOMMU_API + select IOMMU_DMA select INTERVAL_TREE select ACPI_VIOT if ACPI help diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index c62e19bed302..9dbbc95c8189 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1330,6 +1330,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit) pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA ops\n", dev_name(dev)); } +EXPORT_SYMBOL_GPL(iommu_setup_dma_ops); static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, phys_addr_t msi_addr, struct iommu_domain *domain) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index c6e5ee4d9cef..fe581f0c9b3a 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -904,6 +905,15 @@ static struct iommu_device *viommu_probe_device(struct device *dev) return ERR_PTR(ret); } +static void viommu_probe_finalize(struct device *dev) +{ +#ifndef CONFIG_ARCH_HAS_SETUP_DMA_OPS + /* First clear the DMA ops in case we're switching from a DMA domain */ + set_dma_ops(dev, NULL); + iommu_setup_dma_ops(dev, 0, U64_MAX); +#endif +} + static void viommu_release_device(struct device *dev) { struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); @@ -940,6 +950,7 @@ static struct iommu_ops viommu_ops = { .iova_to_phys = viommu_iova_to_phys, .iotlb_sync = viommu_iotlb_sync, .probe_device = viommu_probe_device, + .probe_finalize = viommu_probe_finalize, .release_device = viommu_release_device, .device_group = viommu_device_group, .get_resv_regions = viommu_get_resv_regions, -- 2.32.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v5 3/5] ACPI: Add driver for the VIOT table
The ACPI Virtual I/O Translation Table describes topology of para-virtual platforms, similarly to vendor tables DMAR, IVRS and IORT. For now it describes the relation between virtio-iommu and the endpoints it manages. Three steps are needed to configure DMA of endpoints: (1) acpi_viot_init(): parse the VIOT table, find or create the fwnode associated to each vIOMMU device. This needs to happen after acpi_scan_init(), because it relies on the struct device and their fwnode to be available. (2) When probing the vIOMMU device, the driver registers its IOMMU ops within the IOMMU subsystem. This step doesn't require any intervention from the VIOT driver. (3) viot_iommu_configure(): before binding the endpoint to a driver, find the associated IOMMU ops. Register them, along with the endpoint ID, into the device's iommu_fwspec. If step (3) happens before step (2), it is deferred until the IOMMU is initialized, then retried. Tested-by: Eric Auger Reviewed-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- drivers/acpi/Kconfig | 3 + drivers/iommu/Kconfig | 1 + drivers/acpi/Makefile | 2 + include/linux/acpi_viot.h | 19 ++ drivers/acpi/bus.c| 2 + drivers/acpi/scan.c | 3 + drivers/acpi/viot.c | 366 ++ MAINTAINERS | 8 + 8 files changed, 404 insertions(+) create mode 100644 include/linux/acpi_viot.h create mode 100644 drivers/acpi/viot.c diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index eedec61e3476..3758c6940ed7 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -526,6 +526,9 @@ endif source "drivers/acpi/pmic/Kconfig" +config ACPI_VIOT + bool + endif # ACPI config X86_PM_TIMER diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 1f111b399bca..aff8a4830dd1 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -403,6 +403,7 @@ config VIRTIO_IOMMU depends on ARM64 select IOMMU_API select INTERVAL_TREE + select ACPI_VIOT if ACPI help Para-virtualised IOMMU driver with virtio. 
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 700b41adf2db..a6e644c48987 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -118,3 +118,5 @@ video-objs += acpi_video.o video_detect.o obj-y += dptf/ obj-$(CONFIG_ARM64)+= arm64/ + +obj-$(CONFIG_ACPI_VIOT)+= viot.o diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h new file mode 100644 index ..1eb8ee5b0e5f --- /dev/null +++ b/include/linux/acpi_viot.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef __ACPI_VIOT_H__ +#define __ACPI_VIOT_H__ + +#include + +#ifdef CONFIG_ACPI_VIOT +void __init acpi_viot_init(void); +int viot_iommu_configure(struct device *dev); +#else +static inline void acpi_viot_init(void) {} +static inline int viot_iommu_configure(struct device *dev) +{ + return -ENODEV; +} +#endif + +#endif /* __ACPI_VIOT_H__ */ diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c index a4bd673934c0..d6f4e2f06fdb 100644 --- a/drivers/acpi/bus.c +++ b/drivers/acpi/bus.c @@ -27,6 +27,7 @@ #include #endif #include +#include #include #include #include @@ -1334,6 +1335,7 @@ static int __init acpi_init(void) acpi_wakeup_device_init(); acpi_debugger_init(); acpi_setup_sb_notify_handler(); + acpi_viot_init(); return 0; } diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index 2a2e690040e9..3e2bb04ab528 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -1556,6 +1557,8 @@ static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev, return ops; err = iort_iommu_configure_id(dev, id_in); + if (err && err != -EPROBE_DEFER) + err = viot_iommu_configure(dev); /* * If we have reason to believe the IOMMU driver missed the initial diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c new file mode 100644 index ..d2256326c73a --- /dev/null +++ b/drivers/acpi/viot.c @@ -0,0 +1,366 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Virtual I/O topology + * + * The Virtual I/O Translation Table (VIOT) describes the topology of + * para-virtual IOMMUs and the endpoints they manage. The OS uses it to + * initialize devices in the right order, preventing endpoints from issuing DMA + * before their IOMMU is ready. + * + * When binding a driver to a device, before calling the device driver's probe() + * method, the driver infrastructure calls dma_configure(). At that point the + * VIOT driver looks for an IOMMU associated to the device in the VIOT table. + * If an IOMMU exists and has been initialized, the VIOT driver initializes the + * device's IOMMU
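A hedged sketch of the step-(3) lookup described above, where viot_find_endpoint() and viommu_ops_ready() are invented names standing in for the driver's internals; only acpi_iommu_fwspec_init() (from patch 2) and iommu_ops_from_fwnode() are existing interfaces:

	static int viot_configure_sketch(struct device *dev)
	{
		u32 epid;
		struct fwnode_handle *fwnode = viot_find_endpoint(dev, &epid);

		if (!fwnode)
			return -ENODEV;		/* no VIOT entry: untranslated */
		if (!viommu_ops_ready(fwnode))
			return -EPROBE_DEFER;	/* step (2) not done yet, retry */
		return acpi_iommu_fwspec_init(dev, epid, fwnode,
					      iommu_ops_from_fwnode(fwnode));
	}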
[PATCH v5 4/5] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()
Passing a 64-bit address width to iommu_setup_dma_ops() is valid on virtual platforms, but isn't currently possible. The overflow check in iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass a limit address instead of a size, so callers don't have to fake a size to work around the check. The base and limit parameters are being phased out, because: * they are redundant for x86 callers. dma-iommu already reserves the first page, and the upper limit is already in domain->geometry. * they can now be obtained from dev->dma_range_map on Arm. But removing them on Arm isn't completely straightforward so is left for future work. As an intermediate step, simplify the x86 callers by passing dummy limits. Signed-off-by: Jean-Philippe Brucker --- include/linux/dma-iommu.h | 4 ++-- arch/arm64/mm/dma-mapping.c | 2 +- drivers/iommu/amd/iommu.c | 2 +- drivers/iommu/dma-iommu.c | 12 ++-- drivers/iommu/intel/iommu.c | 5 + 5 files changed, 11 insertions(+), 14 deletions(-) diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index 6e75a2d689b4..758ca4694257 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -19,7 +19,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base); void iommu_put_dma_cookie(struct iommu_domain *domain); /* Setup call for arch DMA mapping code */ -void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size); +void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit); /* The DMA API isn't _quite_ the whole story, though... */ /* @@ -50,7 +50,7 @@ struct msi_msg; struct device; static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base, - u64 size) + u64 dma_limit) { } diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 4bf1dd3eb041..6719f9efea09 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, dev->dma_coherent = coherent; if (iommu) - iommu_setup_dma_ops(dev, dma_base, size); + iommu_setup_dma_ops(dev, dma_base, dma_base + size - 1); #ifdef CONFIG_XEN if (xen_swiotlb_detect()) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 3ac42bbdefc6..216323fb27ef 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1713,7 +1713,7 @@ static void amd_iommu_probe_finalize(struct device *dev) /* Domains are initialized for this device - have a look what we ended up with */ domain = iommu_get_domain_for_dev(dev); if (domain->type == IOMMU_DOMAIN_DMA) - iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0); + iommu_setup_dma_ops(dev, 0, U64_MAX); else set_dma_ops(dev, NULL); } diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 7bcdd1205535..c62e19bed302 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -319,16 +319,16 @@ static bool dev_is_untrusted(struct device *dev) * iommu_dma_init_domain - Initialise a DMA mapping domain * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() * @base: IOVA at which the mappable address space starts - * @size: Size of IOVA space + * @limit: Last address of the IOVA space * @dev: Device the domain is being initialised for * - * @base and @size should be exact multiples of IOMMU page granularity to + * @base and @limit + 1 should be exact multiples of IOMMU page granularity to * avoid rounding surprises. If necessary, we reserve the page at address 0 * to ensure it is an invalid IOVA. 
It is safe to reinitialise a domain, but * any change which could make prior IOVAs invalid will fail. */ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, - u64 size, struct device *dev) +dma_addr_t limit, struct device *dev) { struct iommu_dma_cookie *cookie = domain->iova_cookie; unsigned long order, base_pfn; @@ -346,7 +346,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, /* Check the domain allows at least some access to the device... */ if (domain->geometry.force_aperture) { if (base > domain->geometry.aperture_end || - base + size <= domain->geometry.aperture_start) { + limit < domain->geometry.aperture_start) { pr_warn("specified DMA range outside IOMMU capability\n"); return -EFAULT; } @@ -1308,7 +1308,7 @@ static const struct dma_map_ops iommu_dma_ops = { * The IOMMU core code allocates the default DMA domain, which the underlying * IOMMU driver needs to support via the dma-iommu layer. */ -void iommu_setup_dma_ops(struct
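The overflow the new convention avoids is easy to demonstrate standalone (userspace illustration, not kernel code):

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint64_t base = 0x1000, size = UINT64_MAX; /* 64-bit width */

		/* Old convention: base + size wraps around, so checks like
		 * "base + size <= aperture_start" misfire. */
		printf("base + size = %#llx (wrapped)\n",
		       (unsigned long long)(base + size));

		/* New convention: pass the last usable address instead;
		 * "limit < aperture_start" needs no arithmetic at all. */
		uint64_t limit = UINT64_MAX;
		printf("limit = %#llx\n", (unsigned long long)limit);
		return 0;
	}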
[PATCH v5 2/5] ACPI: Move IOMMU setup code out of IORT
Extract the code that sets up the IOMMU infrastructure from IORT, since it can be reused by VIOT. Move it one level up into a new acpi_iommu_configure_id() function, which calls the IORT parsing function which in turn calls the acpi_iommu_fwspec_init() helper. Signed-off-by: Jean-Philippe Brucker --- include/acpi/acpi_bus.h | 3 ++ include/linux/acpi_iort.h | 8 ++--- drivers/acpi/arm64/iort.c | 74 +-- drivers/acpi/scan.c | 73 +- 4 files changed, 86 insertions(+), 72 deletions(-) diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 3a82faac5767..41f092a269f6 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -588,6 +588,9 @@ struct acpi_pci_root { bool acpi_dma_supported(struct acpi_device *adev); enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev); +int acpi_iommu_fwspec_init(struct device *dev, u32 id, + struct fwnode_handle *fwnode, + const struct iommu_ops *ops); int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset, u64 *size); int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr, diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index f7f054833afd..f1f0842a2cb2 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -35,8 +35,7 @@ void acpi_configure_pmsi_domain(struct device *dev); int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); /* IOMMU interface */ int iort_dma_get_ranges(struct device *dev, u64 *size); -const struct iommu_ops *iort_iommu_configure_id(struct device *dev, - const u32 *id_in); +int iort_iommu_configure_id(struct device *dev, const u32 *id_in); int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); phys_addr_t acpi_iort_dma_get_max_cpu_address(void); #else @@ -50,9 +49,8 @@ static inline void acpi_configure_pmsi_domain(struct device *dev) { } /* IOMMU interface */ static inline int iort_dma_get_ranges(struct device *dev, u64 *size) { return -ENODEV; } -static inline const struct iommu_ops *iort_iommu_configure_id( - struct device *dev, const u32 *id_in) -{ return NULL; } +static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in) +{ return -ENODEV; } static inline int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) { return 0; } diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index a940be1cf2af..487d1095030d 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -806,23 +806,6 @@ static struct acpi_iort_node *iort_get_msi_resv_iommu(struct device *dev) return NULL; } -static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device *dev) -{ - struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); - - return (fwspec && fwspec->ops) ? 
fwspec->ops : NULL; -} - -static inline int iort_add_device_replay(struct device *dev) -{ - int err = 0; - - if (dev->bus && !device_iommu_mapped(dev)) - err = iommu_probe_device(dev); - - return err; -} - /** * iort_iommu_msi_get_resv_regions - Reserved region driver helper * @dev: Device from iommu_get_resv_regions() @@ -900,18 +883,6 @@ static inline bool iort_iommu_driver_enabled(u8 type) } } -static int arm_smmu_iort_xlate(struct device *dev, u32 streamid, - struct fwnode_handle *fwnode, - const struct iommu_ops *ops) -{ - int ret = iommu_fwspec_init(dev, fwnode, ops); - - if (!ret) - ret = iommu_fwspec_add_ids(dev, , 1); - - return ret; -} - static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node) { struct acpi_iort_root_complex *pci_rc; @@ -946,7 +917,7 @@ static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node, return iort_iommu_driver_enabled(node->type) ? -EPROBE_DEFER : -ENODEV; - return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops); + return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops); } struct iort_pci_alias_info { @@ -1020,24 +991,13 @@ static int iort_nc_iommu_map_id(struct device *dev, * @dev: device to configure * @id_in: optional input id const value pointer * - * Returns: iommu_ops pointer on configuration success - * NULL on configuration failure + * Returns: 0 on success, <0 on failure */ -const struct iommu_ops *iort_iommu_configure_id(struct device *dev, - const u32 *id_in) +int iort_iommu_configure_id(struct device *dev, const u32 *id_in) { struct acpi_iort_node *node; - const struct iommu_ops *ops; int err = -ENODEV; - /* -* If we already translated the
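Judging from the deleted arm_smmu_iort_xlate() hunk it generalizes, the new acpi_iommu_fwspec_init() helper essentially reduces to:

	int acpi_iommu_fwspec_init(struct device *dev, u32 id,
				   struct fwnode_handle *fwnode,
				   const struct iommu_ops *ops)
	{
		int ret = iommu_fwspec_init(dev, fwnode, ops);

		if (!ret)
			ret = iommu_fwspec_add_ids(dev, &id, 1);
		return ret;
	}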
[PATCH v5 1/5] ACPI: arm64: Move DMA setup operations out of IORT
Extract generic DMA setup code out of IORT, so it can be reused by VIOT. Keep it in drivers/acpi/arm64 for now, since it could break x86 platforms that haven't run this code so far, if they have invalid tables. Reviewed-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- drivers/acpi/arm64/Makefile | 1 + include/linux/acpi.h| 3 +++ include/linux/acpi_iort.h | 6 ++--- drivers/acpi/arm64/dma.c| 50 ++ drivers/acpi/arm64/iort.c | 54 ++--- drivers/acpi/scan.c | 2 +- 6 files changed, 66 insertions(+), 50 deletions(-) create mode 100644 drivers/acpi/arm64/dma.c diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile index 6ff50f4ed947..66acbe77f46e 100644 --- a/drivers/acpi/arm64/Makefile +++ b/drivers/acpi/arm64/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0-only obj-$(CONFIG_ACPI_IORT)+= iort.o obj-$(CONFIG_ACPI_GTDT)+= gtdt.o +obj-y += dma.o diff --git a/include/linux/acpi.h b/include/linux/acpi.h index c60745f657e9..7aaa9559cc19 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -259,9 +259,12 @@ void acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa); #ifdef CONFIG_ARM64 void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa); +void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size); #else static inline void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { } +static inline void +acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) { } #endif int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma); diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index 1a12baa58e40..f7f054833afd 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -34,7 +34,7 @@ struct irq_domain *iort_get_device_domain(struct device *dev, u32 id, void acpi_configure_pmsi_domain(struct device *dev); int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); /* IOMMU interface */ -void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size); +int iort_dma_get_ranges(struct device *dev, u64 *size); const struct iommu_ops *iort_iommu_configure_id(struct device *dev, const u32 *id_in); int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); @@ -48,8 +48,8 @@ static inline struct irq_domain *iort_get_device_domain( { return NULL; } static inline void acpi_configure_pmsi_domain(struct device *dev) { } /* IOMMU interface */ -static inline void iort_dma_setup(struct device *dev, u64 *dma_addr, - u64 *size) { } +static inline int iort_dma_get_ranges(struct device *dev, u64 *size) +{ return -ENODEV; } static inline const struct iommu_ops *iort_iommu_configure_id( struct device *dev, const u32 *id_in) { return NULL; } diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c new file mode 100644 index ..f16739ad3cc0 --- /dev/null +++ b/drivers/acpi/arm64/dma.c @@ -0,0 +1,50 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include +#include + +void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) +{ + int ret; + u64 end, mask; + u64 dmaaddr = 0, size = 0, offset = 0; + + /* +* If @dev is expected to be DMA-capable then the bus code that created +* it should have initialised its dma_mask pointer by this point. For +* now, we'll continue the legacy behaviour of coercing it to the +* coherent mask if not, but we'll no longer do so quietly. 
+*/ + if (!dev->dma_mask) { + dev_warn(dev, "DMA mask not set\n"); + dev->dma_mask = &dev->coherent_dma_mask; + } + + if (dev->coherent_dma_mask) + size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1); + else + size = 1ULL << 32; + + ret = acpi_dma_get_range(dev, &dmaaddr, &offset, &size); + if (ret == -ENODEV) + ret = iort_dma_get_ranges(dev, &size); + if (!ret) { + /* +* Limit coherent and dma mask based on size retrieved from +* firmware. +*/ + end = dmaaddr + size - 1; + mask = DMA_BIT_MASK(ilog2(end) + 1); + dev->bus_dma_limit = end; + dev->coherent_dma_mask = min(dev->coherent_dma_mask, mask); + *dev->dma_mask = min(*dev->dma_mask, mask); + } + + *dma_addr = dmaaddr; + *dma_size = size; + + ret = dma_direct_set_offset(dev, dmaaddr + offset, dmaaddr, size); + + dev_dbg(dev, "dma_offset(%#08llx)%s\n", offset, ret ? " failed!" : ""); +} diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index 3912a1f6058e..a940be1cf2af 100644 ---
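A worked example of the clamping above, with illustrative numbers: if firmware reports a DMA window at 0x80000000 of size 0x80000000, then:

	end  = 0x80000000 + 0x80000000 - 1;	/* = 0xffffffff */
	mask = DMA_BIT_MASK(ilog2(end) + 1);	/* ilog2 = 31 -> 32-bit mask */
	/* bus_dma_limit becomes 0xffffffff and both DMA masks are clamped
	 * to min(current mask, DMA_BIT_MASK(32)). */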
[PATCH v5 0/5] Add support for ACPI VIOT
Add a driver for the ACPI VIOT table, which provides topology information for para-virtual IOMMUs. Enable virtio-iommu on non-devicetree platforms, including x86. Since v4 [1]: * Fixes (comments, wrong argument, unused variable) * Removed patch 5 that wrongly moved set_dma_ops(dev, NULL) into dma-iommu. The simplification of limit parameters for x86 callers is now in patch 4. * Release ACPI table after parsing * Added review and tested tags, thanks for all the feedback! You can find a QEMU implementation at [2], with extra support for testing all VIOT nodes including MMIO-based endpoints and IOMMU. This series is at [3]. [1] https://lore.kernel.org/linux-iommu/20210610075130.67517-1-jean-phili...@linaro.org/ [2] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/acpi [3] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/acpi Jean-Philippe Brucker (5): ACPI: arm64: Move DMA setup operations out of IORT ACPI: Move IOMMU setup code out of IORT ACPI: Add driver for the VIOT table iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops() iommu/virtio: Enable x86 support drivers/acpi/Kconfig | 3 + drivers/iommu/Kconfig| 4 +- drivers/acpi/Makefile| 2 + drivers/acpi/arm64/Makefile | 1 + include/acpi/acpi_bus.h | 3 + include/linux/acpi.h | 3 + include/linux/acpi_iort.h| 14 +- include/linux/acpi_viot.h| 19 ++ include/linux/dma-iommu.h| 4 +- arch/arm64/mm/dma-mapping.c | 2 +- drivers/acpi/arm64/dma.c | 50 + drivers/acpi/arm64/iort.c| 128 ++-- drivers/acpi/bus.c | 2 + drivers/acpi/scan.c | 78 +++- drivers/acpi/viot.c | 366 +++ drivers/iommu/amd/iommu.c| 2 +- drivers/iommu/dma-iommu.c| 13 +- drivers/iommu/intel/iommu.c | 5 +- drivers/iommu/virtio-iommu.c | 11 ++ MAINTAINERS | 8 + 20 files changed, 581 insertions(+), 137 deletions(-) create mode 100644 include/linux/acpi_viot.h create mode 100644 drivers/acpi/arm64/dma.c create mode 100644 drivers/acpi/viot.c -- 2.32.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Plan for /dev/ioasid RFC v2
On Fri, Jun 18, 2021 at 03:47:51PM +0200, Joerg Roedel wrote: > Hi Kevin, > > On Thu, Jun 17, 2021 at 07:31:03AM +, Tian, Kevin wrote: > > Now let's talk about the new IOMMU behavior: > > > > - A device is blocked from doing DMA to any resource outside of > > its group when it's probed by the IOMMU driver. This could be a > > special state w/o attaching to any domain, or a new special domain > > type which differentiates it from existing domain types (identity, > > dma, or unmanged). Actually existing code already includes a > > IOMMU_DOMAIN_BLOCKED type but nobody uses it. > > There is a reason for the default domain to exist: Devices which require > RMRR mappings to be present. You can't just block all DMA from devices > until a driver takes over, we put much effort into making sure there is > not even a small window in time where RMRR regions (unity mapped regions > on AMD) are not mapped. Yes, I think the DMA blocking can only start around/after a VFIO type driver has probed() and bound to a device in the group, not much different from today. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions
On Fri, Jun 18, 2021 at 09:09:17AM -0500, Tom Lendacky wrote: > > swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem > > and memblock_alloc[1] will do memset in memblock_alloc_try_nid[2], so > > swiotlb_init_with_tbl is also good. > > I'm happy to add the memset in swiotlb_init_io_tlb_mem if you think > > it's clearer and safer. > > On x86, if the memset is done before set_memory_decrypted() and memory > encryption is active, then the memory will look like ciphertext afterwards > and not be zeroes. If zeroed memory is required, then a memset must be > done after the set_memory_decrypted() calls. Which should be fine - we don't care that the memory is cleared to 0, just that it doesn't leak other data. Maybe a comment would be useful, though. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions
On 6/18/21 1:25 AM, Claire Chang wrote: > On Fri, Jun 18, 2021 at 7:30 AM Stefano Stabellini > wrote: >> >> On Thu, 17 Jun 2021, Claire Chang wrote: >>> Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct >>> initialization to make the code reusable. >>> >>> Signed-off-by: Claire Chang >>> Reviewed-by: Christoph Hellwig >>> Tested-by: Stefano Stabellini >>> Tested-by: Will Deacon >>> --- >>> kernel/dma/swiotlb.c | 50 ++-- >>> 1 file changed, 25 insertions(+), 25 deletions(-) >>> >>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c >>> index 52e2ac526757..47bb2a766798 100644 >>> --- a/kernel/dma/swiotlb.c >>> +++ b/kernel/dma/swiotlb.c >>> @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void) >>> memset(vaddr, 0, bytes); >>> } >>> >>> -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int >>> verbose) >>> +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t >>> start, >>> + unsigned long nslabs, bool late_alloc) >>> { >>> + void *vaddr = phys_to_virt(start); >>> unsigned long bytes = nslabs << IO_TLB_SHIFT, i; >>> + >>> + mem->nslabs = nslabs; >>> + mem->start = start; >>> + mem->end = mem->start + bytes; >>> + mem->index = 0; >>> + mem->late_alloc = late_alloc; >>> + spin_lock_init(>lock); >>> + for (i = 0; i < mem->nslabs; i++) { >>> + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); >>> + mem->slots[i].orig_addr = INVALID_PHYS_ADDR; >>> + mem->slots[i].alloc_size = 0; >>> + } >>> + memset(vaddr, 0, bytes); >>> +} >>> + >>> +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int >>> verbose) >>> +{ >>> struct io_tlb_mem *mem; >>> size_t alloc_size; >>> >>> @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned >>> long nslabs, int verbose) >>> if (!mem) >>> panic("%s: Failed to allocate %zu bytes align=0x%lx\n", >>> __func__, alloc_size, PAGE_SIZE); >>> - mem->nslabs = nslabs; >>> - mem->start = __pa(tlb); >>> - mem->end = mem->start + bytes; >>> - mem->index = 0; >>> - spin_lock_init(>lock); >>> - for (i = 0; i < mem->nslabs; i++) { >>> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); >>> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR; >>> - mem->slots[i].alloc_size = 0; >>> - } >>> + >>> + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false); >>> >>> io_tlb_default_mem = mem; >>> if (verbose) >>> @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size) >>> int >>> swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs) >>> { >>> - unsigned long bytes = nslabs << IO_TLB_SHIFT, i; >>> struct io_tlb_mem *mem; >>> + unsigned long bytes = nslabs << IO_TLB_SHIFT; >>> >>> if (swiotlb_force == SWIOTLB_NO_FORCE) >>> return 0; >>> @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long >>> nslabs) >>> if (!mem) >>> return -ENOMEM; >>> >>> - mem->nslabs = nslabs; >>> - mem->start = virt_to_phys(tlb); >>> - mem->end = mem->start + bytes; >>> - mem->index = 0; >>> - mem->late_alloc = 1; >>> - spin_lock_init(>lock); >>> - for (i = 0; i < mem->nslabs; i++) { >>> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); >>> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR; >>> - mem->slots[i].alloc_size = 0; >>> - } >>> - >>> + memset(mem, 0, sizeof(*mem)); >>> + swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true); >>> set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT); >>> - memset(tlb, 0, bytes); >> >> This is good for swiotlb_late_init_with_tbl. 
However I have just noticed >> that mem could also be allocated from swiotlb_init_with_tbl, in which >> case the zeroing is missing. I think we need another memset in >> swiotlb_init_with_tbl as well. Or maybe it could be better to have a >> single memset at the beginning of swiotlb_init_io_tlb_mem instead. Up to >> you. > > swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem > and memblock_alloc[1] will do memset in memblock_alloc_try_nid[2], so > swiotlb_init_with_tbl is also good. > I'm happy to add the memset in swiotlb_init_io_tlb_mem if you think > it's clearer and safer. On x86, if the memset is done before set_memory_decrypted() and memory encryption is active, then the memory will look like ciphertext afterwards and not be zeroes. If zeroed memory is required, then a memset must be done after the set_memory_decrypted() calls. Thanks, Tom > > [1] >
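The ordering constraint Tom describes comes down to this (sketch using the same kernel APIs as the patch):

	/* Wrong under SME/SEV: zeros written through the still-encrypted
	 * mapping read back as ciphertext once the mapping is decrypted. */
	memset(vaddr, 0, bytes);
	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);

	/* Right: flip the mapping first, then zero. */
	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
	memset(vaddr, 0, bytes);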
Re: Plan for /dev/ioasid RFC v2
Hi Kevin, On Thu, Jun 17, 2021 at 07:31:03AM +, Tian, Kevin wrote: > Now let's talk about the new IOMMU behavior: > > - A device is blocked from doing DMA to any resource outside of > its group when it's probed by the IOMMU driver. This could be a > special state w/o attaching to any domain, or a new special domain > type which differentiates it from existing domain types (identity, > dma, or unmanged). Actually existing code already includes a > IOMMU_DOMAIN_BLOCKED type but nobody uses it. There is a reason for the default domain to exist: Devices which require RMRR mappings to be present. You can't just block all DMA from devices until a driver takes over, we put much effort into making sure there is not even a small window in time where RMRR regions (unity mapped regions on AMD) are not mapped. And if a device has no RMRR regions defined, then the default domain will be identical to a blocking domain. Device driver bugs don't count here, as they can be fixed. The kernel trusts itself, so we can rely on drivers unmapping all of their DMA buffers. Maybe that should be checked by dma-debug to find violations there. Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH][next] iommu/vt-d: Fix dereference of pointer info before it is null checked
On Fri, Jun 11, 2021 at 02:50:24PM +0100, Colin King wrote: > drivers/iommu/intel/iommu.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) Applied, thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()
On 2021/6/18 19:34, John Garry wrote: We only ever now set strict mode enabled in iommu_set_dma_strict(), so just remove the argument. Signed-off-by: John Garry Reviewed-by: Robin Murphy --- drivers/iommu/amd/init.c| 2 +- drivers/iommu/intel/iommu.c | 6 +++--- drivers/iommu/iommu.c | 5 ++--- include/linux/iommu.h | 2 +- 4 files changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 1e641cb6dddc..6e12a615117b 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -3099,7 +3099,7 @@ static int __init parse_amd_iommu_options(char *str) for (; *str; ++str) { if (strncmp(str, "fullflush", 9) == 0) { pr_warn("amd_iommu=fullflush deprecated; use iommu.strict=1 instead\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } if (strncmp(str, "force_enable", 12) == 0) amd_iommu_force_enable = true; diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 0f9d8116..77d0834fb0df 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -454,7 +454,7 @@ static int __init intel_iommu_setup(char *str) iommu_dma_forcedac = true; } else if (!strncmp(str, "strict", 6)) { pr_warn("intel_iommu=strict deprecated; use iommu.strict=1 instead\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } else if (!strncmp(str, "sp_off", 6)) { pr_info("Disable supported super page\n"); intel_iommu_superpage = 0; @@ -4382,7 +4382,7 @@ int __init intel_iommu_init(void) */ if (cap_caching_mode(iommu->cap)) { pr_info_once("IOMMU batching disallowed due to virtualization\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } iommu_device_sysfs_add(&iommu->iommu, NULL, intel_iommu_groups, @@ -5699,7 +5699,7 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev *dev) } else if (dmar_map_gfx) { /* we have to ensure the gfx device is idle before we flush */ pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, quirk_calpella_no_shadow_gtt); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 60b1ec42e73b..ff221d3ddcbc 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str) } early_param("iommu.strict", iommu_dma_setup); -void iommu_set_dma_strict(bool strict) +void iommu_set_dma_strict(void) { - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) - iommu_dma_strict = strict; + iommu_dma_strict = true; } bool iommu_get_dma_strict(struct iommu_domain *domain) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 32d448050bf7..754f67d6dd90 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -476,7 +476,7 @@ int iommu_enable_nesting(struct iommu_domain *domain); int iommu_set_pgtable_quirks(struct iommu_domain *domain, unsigned long quirks); -void iommu_set_dma_strict(bool val); +void iommu_set_dma_strict(void); bool iommu_get_dma_strict(struct iommu_domain *domain); extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev, Reviewed-by: Lu Baolu Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v14 4/6] iommu/vt-d: Add support for IOMMU default DMA mode build options
On 2021/6/18 19:34, John Garry wrote: From: Zhen Lei Make IOMMU_DEFAULT_LAZY default for when INTEL_IOMMU config is set, as is current behaviour. Also delete global flag intel_iommu_strict: - In intel_iommu_setup(), call iommu_set_dma_strict(true) directly. Also remove the print, as iommu_subsys_init() prints the mode and we have already marked this param as deprecated. - For cap_caching_mode() check in intel_iommu_setup(), call iommu_set_dma_strict(true) directly; also reword the accompanying print with a level downgrade and also add the missing '\n'. - For Ironlake GPU, again call iommu_set_dma_strict(true) directly and keep the accompanying print. [jpg: Remove intel_iommu_strict] Signed-off-by: Zhen Lei Signed-off-by: John Garry --- drivers/iommu/Kconfig | 1 + drivers/iommu/intel/iommu.c | 15 ++- 2 files changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 0327a942fdb7..c214a36eb2dc 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -94,6 +94,7 @@ choice prompt "IOMMU default DMA IOTLB invalidation mode" depends on IOMMU_DMA + default IOMMU_DEFAULT_LAZY if INTEL_IOMMU default IOMMU_DEFAULT_STRICT help This option allows an IOMMU DMA IOTLB invalidation mode to be diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 29497113d748..0f9d8116 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -361,7 +361,6 @@ int intel_iommu_enabled = 0; EXPORT_SYMBOL_GPL(intel_iommu_enabled); static int dmar_map_gfx = 1; -static int intel_iommu_strict; static int intel_iommu_superpage = 1; static int iommu_identity_mapping; static int iommu_skip_te_disable; @@ -455,8 +454,7 @@ static int __init intel_iommu_setup(char *str) iommu_dma_forcedac = true; } else if (!strncmp(str, "strict", 6)) { pr_warn("intel_iommu=strict deprecated; use iommu.strict=1 instead\n"); - pr_info("Disable batched IOTLB flush\n"); - intel_iommu_strict = 1; + iommu_set_dma_strict(true); } else if (!strncmp(str, "sp_off", 6)) { pr_info("Disable supported super page\n"); intel_iommu_superpage = 0; @@ -4382,9 +4380,9 @@ int __init intel_iommu_init(void) * is likely to be much lower than the overhead of synchronizing * the virtual and physical IOMMU page-tables. */ - if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) { - pr_warn("IOMMU batching is disabled due to virtualization"); - intel_iommu_strict = 1; + if (cap_caching_mode(iommu->cap)) { + pr_info_once("IOMMU batching disallowed due to virtualization\n"); + iommu_set_dma_strict(true); } iommu_device_sysfs_add(&iommu->iommu, NULL, intel_iommu_groups, @@ -4393,7 +4391,6 @@ int __init intel_iommu_init(void) } up_read(&dmar_global_lock); - iommu_set_dma_strict(intel_iommu_strict); bus_set_iommu(&pci_bus_type, &intel_iommu_ops); if (si_domain && !hw_pass_through) register_memory_notifier(&intel_iommu_memory_nb); @@ -5702,8 +5699,8 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev *dev) } else if (dmar_map_gfx) { /* we have to ensure the gfx device is idle before we flush */ pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n"); - intel_iommu_strict = 1; - } + iommu_set_dma_strict(true); + } } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, quirk_calpella_no_shadow_gtt); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, quirk_calpella_no_shadow_gtt); Reviewed-by: Lu Baolu Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v14 3/6] iommu: Enhance IOMMU default DMA mode build options
On 2021/6/18 19:34, John Garry wrote: From: Zhen Lei First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the opportunity to set {lazy|strict} mode as default at build time. Then put the two config options in an choice, as they are mutually exclusive. [jpg: Make choice between strict and lazy only (and not passthrough)] Signed-off-by: Zhen Lei Signed-off-by: John Garry Reviewed-by: Robin Murphy --- .../admin-guide/kernel-parameters.txt | 3 +- drivers/iommu/Kconfig | 40 +++ drivers/iommu/iommu.c | 2 +- 3 files changed, 43 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 673952379900..a1b7c8526bb5 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2046,9 +2046,10 @@ throughput at the cost of reduced device isolation. Will fall back to strict mode if not supported by the relevant IOMMU driver. - 1 - Strict mode (default). + 1 - Strict mode. DMA unmap operations invalidate IOMMU hardware TLBs synchronously. + unset - Use value of CONFIG_IOMMU_DEFAULT_{LAZY,STRICT}. Note: on x86, the default behaviour depends on the equivalent driver-specific parameters, but a strict mode explicitly specified by either method takes diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 1f111b399bca..0327a942fdb7 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -90,6 +90,46 @@ config IOMMU_DEFAULT_PASSTHROUGH If unsure, say N here. +choice + prompt "IOMMU default DMA IOTLB invalidation mode" + depends on IOMMU_DMA + + default IOMMU_DEFAULT_STRICT + help + This option allows an IOMMU DMA IOTLB invalidation mode to be + chosen at build time, to override the default mode of each ARCH, + removing the need to pass in kernel parameters through command line. + It is still possible to provide common boot params to override this + config. + + If unsure, keep the default. + +config IOMMU_DEFAULT_STRICT + bool "strict" + help + For every IOMMU DMA unmap operation, the flush operation of IOTLB and + the free operation of IOVA are guaranteed to be done in the unmap + function. + +config IOMMU_DEFAULT_LAZY + bool "lazy" + help + Support lazy mode, where for every IOMMU DMA unmap operation, the + flush operation of IOTLB and the free operation of IOVA are deferred. + They are only guaranteed to be done before the related IOVA will be + reused. + + The isolation provided in this mode is not as secure as STRICT mode, + such that a vulnerable time window may be created between the DMA + unmap and the mappings cached in the IOMMU IOTLB or device TLB + finally being invalidated, where the device could still access the + memory which has already been unmapped by the device driver. + However this mode may provide better performance in high throughput + scenarios, and is still considerably more secure than passthrough + mode or no IOMMU. 
+ +endchoice + config OF_IOMMU def_bool y depends on OF && IOMMU_API diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index cf58949cc2f3..60b1ec42e73b 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -29,7 +29,7 @@ static struct kset *iommu_group_kset; static DEFINE_IDA(iommu_group_ida); static unsigned int iommu_def_domain_type __read_mostly; -static bool iommu_dma_strict __read_mostly = true; +static bool iommu_dma_strict __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT); static u32 iommu_cmd_line __read_mostly; struct iommu_group { Reviewed-by: Lu Baolu Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v14 2/6] iommu: Print strict or lazy mode at init time
On 2021/6/18 19:34, John Garry wrote: As well as the default domain type, it's useful to know whether strict or lazy mode is in use for DMA domains, so add this info in a separate print. The (strict/lazy) mode may also be set via the iommu.strict early param, but this will be processed prior to iommu_subsys_init(), so that print will be accurate for drivers which don't set the mode via custom means. For the drivers which set the mode via custom means - the AMD and Intel drivers - they maintain prints to inform of a change in policy or that the custom cmdline methods to change policy are deprecated. Signed-off-by: John Garry Reviewed-by: Robin Murphy --- drivers/iommu/iommu.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 5419c4b9f27a..cf58949cc2f3 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -138,6 +138,11 @@ static int __init iommu_subsys_init(void) (iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ? "(set via kernel command line)" : ""); + pr_info("DMA domain TLB invalidation policy: %s mode %s\n", + iommu_dma_strict ? "strict" : "lazy", + (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ? + "(set via kernel command line)" : ""); + return 0; } subsys_initcall(iommu_subsys_init); Reviewed-by: Lu Baolu Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
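For illustration, on a lazy-default build this patch makes early boot log something like the following; the "iommu:" prefix is an assumption based on the file's usual pr_fmt, and the second line would gain "(set via kernel command line)" when iommu.strict was passed:

	iommu: Default domain type: Translated
	iommu: DMA domain TLB invalidation policy: lazy mode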
Re: [PATCH v14 1/6] iommu: Deprecate Intel and AMD cmdline methods to enable strict mode
On 2021/6/18 19:34, John Garry wrote: Now that the x86 drivers support iommu.strict, deprecate the custom methods. Signed-off-by: John Garry Acked-by: Robin Murphy --- Documentation/admin-guide/kernel-parameters.txt | 9 ++--- drivers/iommu/amd/init.c| 4 +++- drivers/iommu/intel/iommu.c | 1 + 3 files changed, 6 insertions(+), 8 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 30e9dd52464e..673952379900 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -290,10 +290,7 @@ amd_iommu= [HW,X86-64] Pass parameters to the AMD IOMMU driver in the system. Possible values are: - fullflush - enable flushing of IO/TLB entries when - they are unmapped. Otherwise they are - flushed before they will be reused, which - is a lot of faster + fullflush - Deprecated, equivalent to iommu.strict=1 off - do not initialize any AMD IOMMU found in the system force_isolation - Force device isolation for all @@ -1948,9 +1945,7 @@ this case, gfx device will use physical address for DMA. strict [Default Off] - With this option on every unmap_single operation will - result in a hardware IOTLB flush operation as opposed - to batching them for performance. + Deprecated, equivalent to iommu.strict=1. sp_off [Default Off] By default, super page will be supported if Intel IOMMU has the capability. With this option, super page will diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 46280e6e1535..3a2fb805f11e 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -3098,8 +3098,10 @@ static int __init parse_amd_iommu_intr(char *str) static int __init parse_amd_iommu_options(char *str) { for (; *str; ++str) { - if (strncmp(str, "fullflush", 9) == 0) + if (strncmp(str, "fullflush", 9) == 0) { + pr_warn("amd_iommu=fullflush deprecated; use iommu.strict=1 instead\n"); amd_iommu_unmap_flush = true; + } if (strncmp(str, "force_enable", 12) == 0) amd_iommu_force_enable = true; if (strncmp(str, "off", 3) == 0) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index bd93c7ec879e..29497113d748 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -454,6 +454,7 @@ static int __init intel_iommu_setup(char *str) pr_warn("intel_iommu=forcedac deprecated; use iommu.forcedac instead\n"); iommu_dma_forcedac = true; } else if (!strncmp(str, "strict", 6)) { + pr_warn("intel_iommu=strict deprecated; use iommu.strict=1 instead\n"); pr_info("Disable batched IOTLB flush\n"); intel_iommu_strict = 1; } else if (!strncmp(str, "sp_off", 6)) { Reviewed-by: Lu Baolu Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2] iommu: rockchip: Fix physical address decoding
Restore bits 39 to 32 at correct position. It reverses the operation done in rk_dma_addr_dte_v2(). Fixes: c55356c534aa ("iommu: rockchip: Add support for iommu v2") Reported-by: Dan Carpenter Signed-off-by: Benjamin Gaignard --- drivers/iommu/rockchip-iommu.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index 94b9d8e5b9a40..9febfb7f3025b 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -544,12 +544,14 @@ static inline u32 rk_dma_addr_dte(dma_addr_t dt_dma) } #define DT_HI_MASK GENMASK_ULL(39, 32) +#define DTE_BASE_HI_MASK GENMASK(11, 4) #define DT_SHIFT 28 static inline phys_addr_t rk_dte_addr_phys_v2(u32 addr) { - return (phys_addr_t)(addr & RK_DTE_PT_ADDRESS_MASK) | - ((addr & DT_HI_MASK) << DT_SHIFT); + u64 addr64 = addr; + return (phys_addr_t)(addr64 & RK_DTE_PT_ADDRESS_MASK) | + ((addr64 & DTE_BASE_HI_MASK) << DT_SHIFT); } static inline u32 rk_dma_addr_dte_v2(dma_addr_t dt_dma) -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
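A standalone sketch of why the u64 widening matters: with a plain u32, the left shift happens in 32-bit arithmetic and the high bits are gone before any cast to phys_addr_t can help. The values below are made up for illustration; 0xff0 is GENMASK(11, 4) and 28 is DT_SHIFT:

	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		uint32_t dte = 0xa0;	/* bits 11:4 = 0x0a, i.e. PA bits 39:32 in the v2 layout */

		/* v1-style: mask and shift in 32 bits, then widen -- too late */
		uint64_t bad = (uint64_t)((dte & 0xff0u) << 28);
		/* v2-style: widen first, then shift */
		uint64_t good = ((uint64_t)dte & 0xff0u) << 28;

		/* prints bad=0 good=0xa00000000 */
		printf("bad=%#llx good=%#llx\n",
		       (unsigned long long)bad, (unsigned long long)good);
		return 0;
	}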
Re: [PATCH] iommu: rockchip: Fix physical address decoding
On 2021-06-18 12:46, Benjamin Gaignard wrote: Restore bits 39 to 32 at correct position. It reverses the operation done in rk_dma_addr_dte_v2(). Fixes: c55356c534aa ("iommu: rockchip: Add support for iommu v2") Reported-by: Dan Carpenter Signed-off-by: Benjamin Gaignard --- drivers/iommu/rockchip-iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index 94b9d8e5b9a40..e55482a477080 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -544,12 +544,13 @@ static inline u32 rk_dma_addr_dte(dma_addr_t dt_dma) } #define DT_HI_MASK GENMASK_ULL(39, 32) +#define DTE_BASE_HI_MASK GENMASK(11, 4) #define DT_SHIFT 28 static inline phys_addr_t rk_dte_addr_phys_v2(u32 addr) { return (phys_addr_t)(addr & RK_DTE_PT_ADDRESS_MASK) | - ((addr & DT_HI_MASK) << DT_SHIFT); + ((addr & DTE_BASE_HI_MASK) << DT_SHIFT); I think this still has the problem that you're shifting off the top of addr *before* the cast takes effect. Robin. } static inline u32 rk_dma_addr_dte_v2(dma_addr_t dt_dma) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu: rockchip: Fix physical address decoding
Restore bits 39 to 32 at correct position. It reverses the operation done in rk_dma_addr_dte_v2(). Fixes: c55356c534aa ("iommu: rockchip: Add support for iommu v2") Reported-by: Dan Carpenter Signed-off-by: Benjamin Gaignard --- drivers/iommu/rockchip-iommu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c index 94b9d8e5b9a40..e55482a477080 100644 --- a/drivers/iommu/rockchip-iommu.c +++ b/drivers/iommu/rockchip-iommu.c @@ -544,12 +544,13 @@ static inline u32 rk_dma_addr_dte(dma_addr_t dt_dma) } #define DT_HI_MASK GENMASK_ULL(39, 32) +#define DTE_BASE_HI_MASK GENMASK(11, 4) #define DT_SHIFT 28 static inline phys_addr_t rk_dte_addr_phys_v2(u32 addr) { return (phys_addr_t)(addr & RK_DTE_PT_ADDRESS_MASK) | - ((addr & DT_HI_MASK) << DT_SHIFT); + ((addr & DTE_BASE_HI_MASK) << DT_SHIFT); } static inline u32 rk_dma_addr_dte_v2(dma_addr_t dt_dma) -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v14 5/6] iommu/amd: Add support for IOMMU default DMA mode build options
From: Zhen Lei Make IOMMU_DEFAULT_LAZY default for when AMD_IOMMU config is set, which matches current behaviour. For "fullflush" param, just call iommu_set_dma_strict(true) directly. Since we get a strict vs lazy mode print already in iommu_subsys_init(), and maintain a deprecation print when "fullflush" param is passed, drop the prints in amd_iommu_init_dma_ops(). Finally drop the global flag amd_iommu_unmap_flush, as it no longer has any purpose. [jpg: Rebase for relocated file and drop amd_iommu_unmap_flush] Signed-off-by: Zhen Lei Signed-off-by: John Garry --- drivers/iommu/Kconfig | 2 +- drivers/iommu/amd/amd_iommu_types.h | 6 -- drivers/iommu/amd/init.c| 3 +-- drivers/iommu/amd/iommu.c | 6 -- 4 files changed, 2 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index c214a36eb2dc..fd1ad28dd5ee 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -94,7 +94,7 @@ choice prompt "IOMMU default DMA IOTLB invalidation mode" depends on IOMMU_DMA - default IOMMU_DEFAULT_LAZY if INTEL_IOMMU + default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU) default IOMMU_DEFAULT_STRICT help This option allows an IOMMU DMA IOTLB invalidation mode to be diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h index 94c1a7a9876d..8dbe61e2b3c1 100644 --- a/drivers/iommu/amd/amd_iommu_types.h +++ b/drivers/iommu/amd/amd_iommu_types.h @@ -779,12 +779,6 @@ extern u16 amd_iommu_last_bdf; /* allocation bitmap for domain ids */ extern unsigned long *amd_iommu_pd_alloc_bitmap; -/* - * If true, the addresses will be flushed on unmap time, not when - * they are reused - */ -extern bool amd_iommu_unmap_flush; - /* Smallest max PASID supported by any IOMMU in the system */ extern u32 amd_iommu_max_pasid; diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 3a2fb805f11e..1e641cb6dddc 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -161,7 +161,6 @@ u16 amd_iommu_last_bdf; /* largest PCI device id we have to handle */ LIST_HEAD(amd_iommu_unity_map);/* a list of required unity mappings we find in ACPI */ -bool amd_iommu_unmap_flush;/* if true, flush on every unmap */ LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the system */ @@ -3100,7 +3099,7 @@ static int __init parse_amd_iommu_options(char *str) for (; *str; ++str) { if (strncmp(str, "fullflush", 9) == 0) { pr_warn("amd_iommu=fullflush deprecated; use iommu.strict=1 instead\n"); - amd_iommu_unmap_flush = true; + iommu_set_dma_strict(true); } if (strncmp(str, "force_enable", 12) == 0) amd_iommu_force_enable = true; diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index b1fbf2c83df5..32b541ee2e11 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1775,12 +1775,6 @@ void amd_iommu_domain_update(struct protection_domain *domain) static void __init amd_iommu_init_dma_ops(void) { swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0; - - if (amd_iommu_unmap_flush) - pr_info("IO/TLB flush on unmap enabled\n"); - else - pr_info("Lazy IO/TLB flushing enabled\n"); - iommu_set_dma_strict(amd_iommu_unmap_flush); } int __init amd_iommu_init_api(void) -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
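One detail worth spelling out: Kconfig picks the first `default` whose condition evaluates true, so after this patch the choice block resolves to lazy whenever AMD_IOMMU or INTEL_IOMMU is enabled and to strict everywhere else, matching each driver's historical behaviour:

	choice
		prompt "IOMMU default DMA IOTLB invalidation mode"
		depends on IOMMU_DMA

		default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU)
		default IOMMU_DEFAULT_STRICT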
[PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()
We now only ever set strict mode enabled in iommu_set_dma_strict(), so just remove the argument. Signed-off-by: John Garry Reviewed-by: Robin Murphy --- drivers/iommu/amd/init.c| 2 +- drivers/iommu/intel/iommu.c | 6 +++--- drivers/iommu/iommu.c | 5 ++--- include/linux/iommu.h | 2 +- 4 files changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 1e641cb6dddc..6e12a615117b 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -3099,7 +3099,7 @@ static int __init parse_amd_iommu_options(char *str) for (; *str; ++str) { if (strncmp(str, "fullflush", 9) == 0) { pr_warn("amd_iommu=fullflush deprecated; use iommu.strict=1 instead\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } if (strncmp(str, "force_enable", 12) == 0) amd_iommu_force_enable = true; diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 0f9d8116..77d0834fb0df 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -454,7 +454,7 @@ static int __init intel_iommu_setup(char *str) iommu_dma_forcedac = true; } else if (!strncmp(str, "strict", 6)) { pr_warn("intel_iommu=strict deprecated; use iommu.strict=1 instead\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } else if (!strncmp(str, "sp_off", 6)) { pr_info("Disable supported super page\n"); intel_iommu_superpage = 0; @@ -4382,7 +4382,7 @@ int __init intel_iommu_init(void) */ if (cap_caching_mode(iommu->cap)) { pr_info_once("IOMMU batching disallowed due to virtualization\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } iommu_device_sysfs_add(&iommu->iommu, NULL, intel_iommu_groups, @@ -5699,7 +5699,7 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev *dev) } else if (dmar_map_gfx) { /* we have to ensure the gfx device is idle before we flush */ pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, quirk_calpella_no_shadow_gtt); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 60b1ec42e73b..ff221d3ddcbc 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str) } early_param("iommu.strict", iommu_dma_setup); -void iommu_set_dma_strict(bool strict) +void iommu_set_dma_strict(void) { - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) - iommu_dma_strict = strict; + iommu_dma_strict = true; } bool iommu_get_dma_strict(struct iommu_domain *domain) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 32d448050bf7..754f67d6dd90 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -476,7 +476,7 @@ int iommu_enable_nesting(struct iommu_domain *domain); int iommu_set_pgtable_quirks(struct iommu_domain *domain, unsigned long quirks); -void iommu_set_dma_strict(bool val); +void iommu_set_dma_strict(void); bool iommu_get_dma_strict(struct iommu_domain *domain); extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev, -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v14 4/6] iommu/vt-d: Add support for IOMMU default DMA mode build options
From: Zhen Lei Make IOMMU_DEFAULT_LAZY default for when INTEL_IOMMU config is set, as is current behaviour. Also delete global flag intel_iommu_strict: - In intel_iommu_setup(), call iommu_set_dma_strict(true) directly. Also remove the print, as iommu_subsys_init() prints the mode and we have already marked this param as deprecated. - For the cap_caching_mode() check in intel_iommu_init(), call iommu_set_dma_strict(true) directly; also reword the accompanying print with a level downgrade and add the missing '\n'. - For Ironlake GPU, again call iommu_set_dma_strict(true) directly and keep the accompanying print. [jpg: Remove intel_iommu_strict] Signed-off-by: Zhen Lei Signed-off-by: John Garry --- drivers/iommu/Kconfig | 1 + drivers/iommu/intel/iommu.c | 15 ++- 2 files changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 0327a942fdb7..c214a36eb2dc 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -94,6 +94,7 @@ choice prompt "IOMMU default DMA IOTLB invalidation mode" depends on IOMMU_DMA + default IOMMU_DEFAULT_LAZY if INTEL_IOMMU default IOMMU_DEFAULT_STRICT help This option allows an IOMMU DMA IOTLB invalidation mode to be diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 29497113d748..0f9d8116 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -361,7 +361,6 @@ int intel_iommu_enabled = 0; EXPORT_SYMBOL_GPL(intel_iommu_enabled); static int dmar_map_gfx = 1; -static int intel_iommu_strict; static int intel_iommu_superpage = 1; static int iommu_identity_mapping; static int iommu_skip_te_disable; @@ -455,8 +454,7 @@ static int __init intel_iommu_setup(char *str) iommu_dma_forcedac = true; } else if (!strncmp(str, "strict", 6)) { pr_warn("intel_iommu=strict deprecated; use iommu.strict=1 instead\n"); - pr_info("Disable batched IOTLB flush\n"); - intel_iommu_strict = 1; + iommu_set_dma_strict(true); } else if (!strncmp(str, "sp_off", 6)) { pr_info("Disable supported super page\n"); intel_iommu_superpage = 0; @@ -4382,9 +4380,9 @@ int __init intel_iommu_init(void) * is likely to be much lower than the overhead of synchronizing * the virtual and physical IOMMU page-tables. */ - if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) { - pr_warn("IOMMU batching is disabled due to virtualization"); - intel_iommu_strict = 1; + if (cap_caching_mode(iommu->cap)) { + pr_info_once("IOMMU batching disallowed due to virtualization\n"); + iommu_set_dma_strict(true); } iommu_device_sysfs_add(&iommu->iommu, NULL, intel_iommu_groups, @@ -4393,7 +4391,6 @@ int __init intel_iommu_init(void) } up_read(&dmar_global_lock); - iommu_set_dma_strict(intel_iommu_strict); bus_set_iommu(&pci_bus_type, &intel_iommu_ops); if (si_domain && !hw_pass_through) register_memory_notifier(&intel_iommu_memory_nb); @@ -5702,8 +5699,8 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev *dev) } else if (dmar_map_gfx) { /* we have to ensure the gfx device is idle before we flush */ pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n"); - intel_iommu_strict = 1; - } + iommu_set_dma_strict(true); + } } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, quirk_calpella_no_shadow_gtt); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, quirk_calpella_no_shadow_gtt); -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v14 3/6] iommu: Enhance IOMMU default DMA mode build options
From: Zhen Lei First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the opportunity to set {lazy|strict} mode as default at build time. Then put the two config options in a choice, as they are mutually exclusive. [jpg: Make choice between strict and lazy only (and not passthrough)] Signed-off-by: Zhen Lei Signed-off-by: John Garry Reviewed-by: Robin Murphy --- .../admin-guide/kernel-parameters.txt | 3 +- drivers/iommu/Kconfig | 40 +++ drivers/iommu/iommu.c | 2 +- 3 files changed, 43 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 673952379900..a1b7c8526bb5 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2046,9 +2046,10 @@ throughput at the cost of reduced device isolation. Will fall back to strict mode if not supported by the relevant IOMMU driver. - 1 - Strict mode (default). + 1 - Strict mode. DMA unmap operations invalidate IOMMU hardware TLBs synchronously. + unset - Use value of CONFIG_IOMMU_DEFAULT_{LAZY,STRICT}. Note: on x86, the default behaviour depends on the equivalent driver-specific parameters, but a strict mode explicitly specified by either method takes diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 1f111b399bca..0327a942fdb7 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -90,6 +90,46 @@ config IOMMU_DEFAULT_PASSTHROUGH If unsure, say N here. +choice + prompt "IOMMU default DMA IOTLB invalidation mode" + depends on IOMMU_DMA + + default IOMMU_DEFAULT_STRICT + help + This option allows an IOMMU DMA IOTLB invalidation mode to be + chosen at build time, to override the default mode of each ARCH, + removing the need to pass in kernel parameters through command line. + It is still possible to provide common boot params to override this + config. + + If unsure, keep the default. + +config IOMMU_DEFAULT_STRICT + bool "strict" + help + For every IOMMU DMA unmap operation, the flush operation of IOTLB and + the free operation of IOVA are guaranteed to be done in the unmap + function. + +config IOMMU_DEFAULT_LAZY + bool "lazy" + help + Support lazy mode, where for every IOMMU DMA unmap operation, the + flush operation of IOTLB and the free operation of IOVA are deferred. + They are only guaranteed to be done before the related IOVA will be + reused. + + The isolation provided in this mode is not as secure as STRICT mode, + such that a vulnerable time window may be created between the DMA + unmap and the mappings cached in the IOMMU IOTLB or device TLB + finally being invalidated, where the device could still access the + memory which has already been unmapped by the device driver. + However this mode may provide better performance in high throughput + scenarios, and is still considerably more secure than passthrough + mode or no IOMMU. 
+ +endchoice + config OF_IOMMU def_bool y depends on OF && IOMMU_API diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index cf58949cc2f3..60b1ec42e73b 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -29,7 +29,7 @@ static struct kset *iommu_group_kset; static DEFINE_IDA(iommu_group_ida); static unsigned int iommu_def_domain_type __read_mostly; -static bool iommu_dma_strict __read_mostly = true; +static bool iommu_dma_strict __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT); static u32 iommu_cmd_line __read_mostly; struct iommu_group { -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v14 2/6] iommu: Print strict or lazy mode at init time
As well as the default domain type, it's useful to know whether strict or lazy mode is in use for DMA domains, so add this info in a separate print. The (strict/lazy) mode may also be set via the iommu.strict early param, but this will be processed prior to iommu_subsys_init(), so that print will be accurate for drivers which don't set the mode via custom means. For the drivers which set the mode via custom means - the AMD and Intel drivers - they maintain prints to inform of a change in policy or that the custom cmdline methods to change policy are deprecated. Signed-off-by: John Garry Reviewed-by: Robin Murphy --- drivers/iommu/iommu.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 5419c4b9f27a..cf58949cc2f3 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -138,6 +138,11 @@ static int __init iommu_subsys_init(void) (iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ? "(set via kernel command line)" : ""); + pr_info("DMA domain TLB invalidation policy: %s mode %s\n", + iommu_dma_strict ? "strict" : "lazy", + (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ? + "(set via kernel command line)" : ""); + return 0; } subsys_initcall(iommu_subsys_init); -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v14 0/6] iommu: Enhance IOMMU default DMA mode build options
This is a reboot of Zhen Lei's series from a couple of years ago, which never made it across the line. I still think that it has some value, so taking up the mantle. Motivation: Allow lazy mode to be the default mode for DMA domains on all ARCHs, and not only those which hardcode it (to be lazy). For ARM64, currently we must use a kernel command line parameter to use lazy mode, which is less than ideal. I have now included the print for strict/lazy mode, which I originally sent in: https://lore.kernel.org/linux-iommu/72eb3de9-1d1c-ae46-c5a9-95f26525d...@huawei.com/ There was some concern there about drivers and their custom prints conflicting with the print in that patch, but I think that it should be ok. Based on next-20210611 + "iommu: Update "iommu.strict" documentation" Differences to v13: - Improve strict mode deprecation messages and cut out some kernel-parameters.txt legacy description - Add tag in 1/6 - use pr_info_once() for vt-d message about VM and caching Differences to v12: - Add Robin's RB tags (thanks!) - Add a patch to mark x86 strict cmdline params as deprecated - Improve wording in Kconfig change and tweak iommu_dma_strict declaration Differences to v11: - Rebase to next-20210610 - Drop strict mode globals in Intel and AMD drivers - Include patch to print strict vs lazy mode - Include patch to remove argument from iommu_set_dma_strict() John Garry (3): iommu: Deprecate Intel and AMD cmdline methods to enable strict mode iommu: Print strict or lazy mode at init time iommu: Remove mode argument from iommu_set_dma_strict() Zhen Lei (3): iommu: Enhance IOMMU default DMA mode build options iommu/vt-d: Add support for IOMMU default DMA mode build options iommu/amd: Add support for IOMMU default DMA mode build options .../admin-guide/kernel-parameters.txt | 12 ++ drivers/iommu/Kconfig | 41 +++ drivers/iommu/amd/amd_iommu_types.h | 6 --- drivers/iommu/amd/init.c | 7 ++-- drivers/iommu/amd/iommu.c | 6 --- drivers/iommu/intel/iommu.c | 16 drivers/iommu/iommu.c | 12 -- include/linux/iommu.h | 2 +- 8 files changed, 65 insertions(+), 37 deletions(-) -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v14 1/6] iommu: Deprecate Intel and AMD cmdline methods to enable strict mode
Now that the x86 drivers support iommu.strict, deprecate the custom methods. Signed-off-by: John Garry Acked-by: Robin Murphy --- Documentation/admin-guide/kernel-parameters.txt | 9 ++--- drivers/iommu/amd/init.c| 4 +++- drivers/iommu/intel/iommu.c | 1 + 3 files changed, 6 insertions(+), 8 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 30e9dd52464e..673952379900 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -290,10 +290,7 @@ amd_iommu= [HW,X86-64] Pass parameters to the AMD IOMMU driver in the system. Possible values are: - fullflush - enable flushing of IO/TLB entries when - they are unmapped. Otherwise they are - flushed before they will be reused, which - is a lot of faster + fullflush - Deprecated, equivalent to iommu.strict=1 off - do not initialize any AMD IOMMU found in the system force_isolation - Force device isolation for all @@ -1948,9 +1945,7 @@ this case, gfx device will use physical address for DMA. strict [Default Off] - With this option on every unmap_single operation will - result in a hardware IOTLB flush operation as opposed - to batching them for performance. + Deprecated, equivalent to iommu.strict=1. sp_off [Default Off] By default, super page will be supported if Intel IOMMU has the capability. With this option, super page will diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 46280e6e1535..3a2fb805f11e 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -3098,8 +3098,10 @@ static int __init parse_amd_iommu_intr(char *str) static int __init parse_amd_iommu_options(char *str) { for (; *str; ++str) { - if (strncmp(str, "fullflush", 9) == 0) + if (strncmp(str, "fullflush", 9) == 0) { + pr_warn("amd_iommu=fullflush deprecated; use iommu.strict=1 instead\n"); amd_iommu_unmap_flush = true; + } if (strncmp(str, "force_enable", 12) == 0) amd_iommu_force_enable = true; if (strncmp(str, "off", 3) == 0) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index bd93c7ec879e..29497113d748 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -454,6 +454,7 @@ static int __init intel_iommu_setup(char *str) pr_warn("intel_iommu=forcedac deprecated; use iommu.forcedac instead\n"); iommu_dma_forcedac = true; } else if (!strncmp(str, "strict", 6)) { + pr_warn("intel_iommu=strict deprecated; use iommu.strict=1 instead\n"); pr_info("Disable batched IOTLB flush\n"); intel_iommu_strict = 1; } else if (!strncmp(str, "sp_off", 6)) { -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
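For illustration, booting with the deprecated options after this patch yields warnings along these lines; the "AMD-Vi:"/"DMAR:" prefixes are assumptions based on each driver's usual pr_fmt:

	AMD-Vi: amd_iommu=fullflush deprecated; use iommu.strict=1 instead
	DMAR: intel_iommu=strict deprecated; use iommu.strict=1 instead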
Re: [PATCH v4 5/6] iommu/dma: Simplify calls to iommu_setup_dma_ops()
On 2021-06-18 11:50, Jean-Philippe Brucker wrote: On Wed, Jun 16, 2021 at 06:02:39PM +0100, Robin Murphy wrote: diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index c62e19bed302..175f8eaeb5b3 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1322,7 +1322,9 @@ void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit) if (domain->type == IOMMU_DOMAIN_DMA) { if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev)) goto out_err; - dev->dma_ops = &iommu_dma_ops; + set_dma_ops(dev, &iommu_dma_ops); + } else { + set_dma_ops(dev, NULL); I'm not keen on moving this here, since iommu-dma only knows that its own ops are right for devices it *is* managing; it can't assume any particular ops are appropriate for devices it isn't. The idea here is that arch_setup_dma_ops() may have already set the appropriate ops for the non-IOMMU case, so if the default domain type is passthrough then we leave those in place. For example, I do still plan to revisit my conversion of arch/arm someday, at which point I'd have to undo this for that reason. Makes sense, I'll remove this bit. Simplifying the base and size arguments is of course fine, but TBH I'd say rip the whole bloody lot out of the arch_setup_dma_ops() flow now. It's a considerable faff passing them around for nothing but a tenuous sanity check in iommu_dma_init_domain(), and now that dev->dma_range_map is a common thing we should expect that to give us any relevant limitations if we even still care. So I started working on this but it gets too bulky for a preparatory patch. Dropping the parameters from arch_setup_dma_ops() seems especially complicated because arm32 does need the size parameter for IOMMU mappings and that value falls back to the bus DMA mask or U32_MAX in the absence of dma-ranges. I could try to dig into this for a separate series. Even only dropping the parameters from iommu_setup_dma_ops() isn't completely trivial (8 files changed, 55 insertions(+), 36 deletions(-) because we still need the lower IOVA limit from dma_range_map), so I'd rather send it separately and have it sit in -next for a while. Oh, sure, I didn't mean to imply that the whole cleanup should be within the scope of this series, just that we can shave off as much as we *do* need to touch here (which TBH is pretty much what you're doing already), and mainly to start taking the attitude that these arguments are now superseded and increasingly vestigial. I expected the cross-arch cleanup to be a bit fiddly, but I'd forgotten that arch/arm was still actively using these values, so maybe I can revisit this when I pick up my iommu-dma conversion again (I swear it's not dead, just resting!) Cheers, Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 5/6] iommu/dma: Simplify calls to iommu_setup_dma_ops()
On Wed, Jun 16, 2021 at 06:02:39PM +0100, Robin Murphy wrote: > > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > > index c62e19bed302..175f8eaeb5b3 100644 > > --- a/drivers/iommu/dma-iommu.c > > +++ b/drivers/iommu/dma-iommu.c > > @@ -1322,7 +1322,9 @@ void iommu_setup_dma_ops(struct device *dev, u64 > > dma_base, u64 dma_limit) > > if (domain->type == IOMMU_DOMAIN_DMA) { > > if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev)) > > goto out_err; > > - dev->dma_ops = &iommu_dma_ops; > > + set_dma_ops(dev, &iommu_dma_ops); > > + } else { > > + set_dma_ops(dev, NULL); > > I'm not keen on moving this here, since iommu-dma only knows that its own > ops are right for devices it *is* managing; it can't assume any particular > ops are appropriate for devices it isn't. The idea here is that > arch_setup_dma_ops() may have already set the appropriate ops for the > non-IOMMU case, so if the default domain type is passthrough then we leave > those in place. > > For example, I do still plan to revisit my conversion of arch/arm someday, > at which point I'd have to undo this for that reason. Makes sense, I'll remove this bit. > Simplifying the base and size arguments is of course fine, but TBH I'd say > rip the whole bloody lot out of the arch_setup_dma_ops() flow now. It's a > considerable faff passing them around for nothing but a tenuous sanity check > in iommu_dma_init_domain(), and now that dev->dma_range_map is a common > thing we should expect that to give us any relevant limitations if we even > still care. So I started working on this but it gets too bulky for a preparatory patch. Dropping the parameters from arch_setup_dma_ops() seems especially complicated because arm32 does need the size parameter for IOMMU mappings and that value falls back to the bus DMA mask or U32_MAX in the absence of dma-ranges. I could try to dig into this for a separate series. Even only dropping the parameters from iommu_setup_dma_ops() isn't completely trivial (8 files changed, 55 insertions(+), 36 deletions(-) because we still need the lower IOVA limit from dma_range_map), so I'd rather send it separately and have it sit in -next for a while. Thanks, Jean > > That said, those are all things which can be fixed up later if the series is > otherwise ready to go and there's still a chance of landing it for 5.14. If > you do have any other reason to respin, then I think the x86 probe_finalize > functions simply want an unconditional set_dma_ops(dev, NULL) before the > iommu_setup_dma_ops() call. > > Cheers, > Robin. > > > } > > return; > > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > > index 85f18342603c..8d866940692a 100644 > > --- a/drivers/iommu/intel/iommu.c > > +++ b/drivers/iommu/intel/iommu.c > > @@ -5165,15 +5165,7 @@ static void intel_iommu_release_device(struct device > > *dev) > > static void intel_iommu_probe_finalize(struct device *dev) > > { > > - dma_addr_t base = IOVA_START_PFN << VTD_PAGE_SHIFT; > > - struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > > - struct dmar_domain *dmar_domain = to_dmar_domain(domain); > > - > > - if (domain && domain->type == IOMMU_DOMAIN_DMA) > > - iommu_setup_dma_ops(dev, base, > > - __DOMAIN_MAX_ADDR(dmar_domain->gaw)); > > - else > > - set_dma_ops(dev, NULL); > > + iommu_setup_dma_ops(dev, 0, U64_MAX); > > } > > static void intel_iommu_get_resv_regions(struct device *device, > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
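For reference, a rough sketch of deriving the lower IOVA limit from dev->dma_range_map, as discussed above — dma_range_map_min() is a hypothetical helper, assuming the map is the usual zero-size-terminated array of struct bus_dma_region:

	#include <linux/dma-direct.h>

	/* hypothetical helper: smallest DMA address covered by the range map */
	static u64 dma_range_map_min(const struct bus_dma_region *map)
	{
		u64 lo = U64_MAX;

		for (; map->size; map++)
			lo = min(lo, map->dma_start);

		return lo == U64_MAX ? 0 : lo;	/* empty map: no lower restriction */
	}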
Re: [PATCH v4 4/6] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()
On Wed, Jun 16, 2021 at 05:28:59PM +0200, Eric Auger wrote: > Hi Jean, > > On 6/10/21 9:51 AM, Jean-Philippe Brucker wrote: > > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c > > index 4bf1dd3eb041..7bd1d2199141 100644 > > --- a/arch/arm64/mm/dma-mapping.c > > +++ b/arch/arm64/mm/dma-mapping.c > > @@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, > > u64 size, > > > > dev->dma_coherent = coherent; > > if (iommu) > > - iommu_setup_dma_ops(dev, dma_base, size); > > + iommu_setup_dma_ops(dev, dma_base, size - dma_base - 1); > I don't get size - dma_base - 1? Because it's wrong, should be dma_base + size - 1. Thanks for catching it! Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 2/6] ACPI: Move IOMMU setup code out of IORT
On 2021-06-18 08:41, Jean-Philippe Brucker wrote: Hi Eric, On Wed, Jun 16, 2021 at 11:35:13AM +0200, Eric Auger wrote: -const struct iommu_ops *iort_iommu_configure_id(struct device *dev, - const u32 *id_in) +int iort_iommu_configure_id(struct device *dev, const u32 *id_in) { struct acpi_iort_node *node; - const struct iommu_ops *ops; + const struct iommu_ops *ops = NULL; Oops, I need to remove this (and add -Werror to my tests.) +static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev, + const u32 *id_in) +{ + int err; + const struct iommu_ops *ops; + + /* +* If we already translated the fwspec there is nothing left to do, +* return the iommu_ops. +*/ + ops = acpi_iommu_fwspec_ops(dev); + if (ops) + return ops; + + err = iort_iommu_configure_id(dev, id_in); + + /* +* If we have reason to believe the IOMMU driver missed the initial +* add_device callback for dev, replay it to get things in order. +*/ + if (!err && dev->bus && !device_iommu_mapped(dev)) + err = iommu_probe_device(dev); Previously we had: if (!err) { ops = iort_fwspec_iommu_ops(dev); err = iort_add_device_replay(dev); } Please can you explain the transform? I see the acpi_iommu_fwspec_ops call below but is it not straightforward to me. I figured that iort_add_device_replay() is only used once and is sufficiently simple to be inlined manually (saving 10 lines). Then I replaced the ops assignment with returns, which saves another line and may be slightly clearer? I guess it's mostly a matter of taste, the behavior should be exactly the same. Right, IIRC the multiple assignments to ops were more of a haphazard evolution inherited from the DT version, and looking at it now I think the multiple-return is indeed a bit nicer. Similarly, it looks like the factoring out of iort_add_device_replay() was originally an attempt to encapsulate the IOMMU_API dependency, but things have moved around a lot since then, so that seems like a sensible simplification to make too. Robin. Also the comment mentions replay. Unsure if it is still OK. The "replay" part is, but "add_device" isn't accurate because it has since been replaced by probe_device. I'll refresh the comment. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] eventfd: Enlarge recursion limit to allow vhost to work
commit b5e683d5cab8 ("eventfd: track eventfd_signal() recursion depth") introduces a percpu counter that tracks the percpu recursion depth and warns if it is greater than zero, to avoid potential deadlock and stack overflow. However sometimes different eventfds may be used in parallel. Specifically, when heavy network load goes through kvm and vhost, working as below, it triggers the following call trace. - 100.00% - 66.51% ret_from_fork kthread - vhost_worker - 33.47% handle_tx_kick handle_tx handle_tx_copy vhost_tx_batch.isra.0 vhost_add_used_and_signal_n eventfd_signal - 33.05% handle_rx_net handle_rx vhost_add_used_and_signal_n eventfd_signal - 33.49% ioctl entry_SYSCALL_64_after_hwframe do_syscall_64 __x64_sys_ioctl ksys_ioctl do_vfs_ioctl kvm_vcpu_ioctl kvm_arch_vcpu_ioctl_run vmx_handle_exit handle_ept_misconfig kvm_io_bus_write __kvm_io_bus_write eventfd_signal 001: WARNING: CPU: 1 PID: 1503 at fs/eventfd.c:73 eventfd_signal+0x85/0xa0 snip 001: Call Trace: 001: vhost_signal+0x15e/0x1b0 [vhost] 001: vhost_add_used_and_signal_n+0x2b/0x40 [vhost] 001: handle_rx+0xb9/0x900 [vhost_net] 001: handle_rx_net+0x15/0x20 [vhost_net] 001: vhost_worker+0xbe/0x120 [vhost] 001: kthread+0x106/0x140 001: ? log_used.part.0+0x20/0x20 [vhost] 001: ? kthread_park+0x90/0x90 001: ret_from_fork+0x35/0x40 001: ---[ end trace 0003 ]--- This patch enlarges the limit to 1, which is the maximum recursion depth we have found so far. Credit for the eventfd_signal_count modification goes to Xie Yongji Signed-off-by: He Zhe --- fs/eventfd.c| 3 ++- include/linux/eventfd.h | 5 - 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/fs/eventfd.c b/fs/eventfd.c index e265b6dd4f34..add6af91cacf 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -71,7 +71,8 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n) * it returns true, the eventfd_signal() call should be deferred to a * safe context. */ - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count))) + if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count) > + EFD_WAKE_COUNT_MAX)) return 0; spin_lock_irqsave(&ctx->wqh.lock, flags); diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h index fa0a524baed0..74be152ebe87 100644 --- a/include/linux/eventfd.h +++ b/include/linux/eventfd.h @@ -29,6 +29,9 @@ #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK) #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE) +/* This is the maximum recursion depth we find so far */ +#define EFD_WAKE_COUNT_MAX 1 + struct eventfd_ctx; struct file; @@ -47,7 +50,7 @@ DECLARE_PER_CPU(int, eventfd_wake_count); static inline bool eventfd_signal_count(void) { - return this_cpu_read(eventfd_wake_count); + return this_cpu_read(eventfd_wake_count) > EFD_WAKE_COUNT_MAX; } #else /* CONFIG_EVENTFD */ -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
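For context, the per-cpu counter being compared against brackets the wakeup inside eventfd_signal(), so any signal nested under the wakeup path observes a non-zero depth; abridged from fs/eventfd.c as introduced by commit b5e683d5cab8:

	spin_lock_irqsave(&ctx->wqh.lock, flags);
	this_cpu_inc(eventfd_wake_count);	/* depth++ for the duration of the wakeup */
	if (ULLONG_MAX - ctx->count < n)
		n = ULLONG_MAX - ctx->count;
	ctx->count += n;
	if (waitqueue_active(&ctx->wqh))
		wake_up_locked_poll(&ctx->wqh, EPOLLIN);
	this_cpu_dec(eventfd_wake_count);	/* nested eventfd_signal() calls see depth > 0 */
	spin_unlock_irqrestore(&ctx->wqh.lock, flags);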
Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems
Hello Rob, On 6/15/21 9:15 PM, Rob Herring wrote: > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the > same size as the list is redundant and can be dropped. Note that this is DT > schema specific behavior and not standard json-schema behavior. The tooling > will fix up the final schema, adding any unspecified minItems/maxItems. > > This condition is partially checked with the meta-schema already, but > only if both 'minItems' and 'maxItems' are equal to the 'items' length. > An improved meta-schema is pending. > [...] > .../devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.yaml | 2 -- [...] > .../devicetree/bindings/mailbox/st,stm32-ipcc.yaml | 2 -- [...] > .../devicetree/bindings/remoteproc/st,stm32-rproc.yaml | 2 -- [...] > Documentation/devicetree/bindings/sound/st,stm32-sai.yaml | 3 --- Reviewed-by: Arnaud Pouliquen Thanks, Arnaud ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v8 03/10] eventfd: Increase the recursion depth of eventfd_signal()
On 6/18/21 11:29 AM, Yongji Xie wrote: > On Thu, Jun 17, 2021 at 4:34 PM He Zhe wrote: >> >> >> On 6/15/21 10:13 PM, Xie Yongji wrote: >>> Increase the recursion depth of eventfd_signal() to 1. This >>> is the maximum recursion depth we have found so far, which >>> can be triggered with the following call chain: >>> >>> kvm_io_bus_write[kvm] >>> --> ioeventfd_write [kvm] >>> --> eventfd_signal [eventfd] >>> --> vhost_poll_wakeup [vhost] >>> --> vduse_vdpa_kick_vq [vduse] >>> --> eventfd_signal[eventfd] >>> >>> Signed-off-by: Xie Yongji >>> Acked-by: Jason Wang >> The fix had been posted one year ago. >> >> https://lore.kernel.org/lkml/20200410114720.24838-1-zhe...@windriver.com/ >> > OK, so it seems to be a fix for the RT system if my understanding is > correct? Any reason why it's not merged? I'm happy to rebase my series > on your patch if you'd like to repost it. It works for both the mainline and RT kernels. The folks just reproduced it in their RT environments. This patch somehow hasn't gotten a maintainer's reply, so it's not merged yet. And OK, I'll resend the patch. > > BTW, I also notice another thread for this issue: > > https://lore.kernel.org/linux-fsdevel/dm6pr11mb420291b550a10853403c7592ff...@dm6pr11mb4202.namprd11.prod.outlook.com/T/ This is the same way as my v1 https://lore.kernel.org/lkml/3b4aa4cb-0e76-89c2-c48a-cf24e1a36...@kernel.dk/ which was not what the maintainer wanted. > >>> --- >>> fs/eventfd.c| 2 +- >>> include/linux/eventfd.h | 5 - >>> 2 files changed, 5 insertions(+), 2 deletions(-) >>> >>> diff --git a/fs/eventfd.c b/fs/eventfd.c >>> index e265b6dd4f34..cc7cd1dbedd3 100644 >>> --- a/fs/eventfd.c >>> +++ b/fs/eventfd.c >>> @@ -71,7 +71,7 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n) >>>* it returns true, the eventfd_signal() call should be deferred to a >>>* safe context. >>>*/ >>> - if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count))) >>> + if (WARN_ON_ONCE(this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH)) >>> return 0; >>> >>> spin_lock_irqsave(&ctx->wqh.lock, flags); >>> diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h >>> index fa0a524baed0..886d99cd38ef 100644 >>> --- a/include/linux/eventfd.h >>> +++ b/include/linux/eventfd.h >>> @@ -29,6 +29,9 @@ >>> #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK) >>> #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE) >>> >>> +/* Maximum recursion depth */ >>> +#define EFD_WAKE_DEPTH 1 >>> + >>> struct eventfd_ctx; >>> struct file; >>> >>> @@ -47,7 +50,7 @@ DECLARE_PER_CPU(int, eventfd_wake_count); >>> >>> static inline bool eventfd_signal_count(void) >>> { >>> - return this_cpu_read(eventfd_wake_count); >>> + return this_cpu_read(eventfd_wake_count) > EFD_WAKE_DEPTH; >> count is just count. How deep is acceptable should be put >> where eventfd_signal_count is called. >> > The return value of this function is boolean rather than integer. > Please see the comments in eventfd_signal(): > > "then it should check eventfd_signal_count() before calling this > function. If it returns true, the eventfd_signal() call should be > deferred to a safe context." OK. Now that the maintainer comments as such we can use it accordingly, though I still get the feeling that the function name and the type of the return value don't match. Thanks, Zhe > > Thanks, > Yongji ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: Fix W=1 clang warning in intel/perf.c
On Thu, Jun 17, 2021 at 01:51:24PM -0700, Nathan Chancellor wrote: > kernel-doc is run automatically with W=1, regardless of gcc versus clang. I see, thanks. Will update the commit message. Thanks, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/6] ACPI: Add driver for the VIOT table
On Thu, Jun 17, 2021 at 01:50:59PM +0200, Rafael J. Wysocki wrote: > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c > > index be7da23fad76..b835ca702ff0 100644 > > --- a/drivers/acpi/bus.c > > +++ b/drivers/acpi/bus.c > > @@ -27,6 +27,7 @@ > > #include > > #endif > > #include > > +#include > > #include > > #include > > #include > > @@ -1339,6 +1340,7 @@ static int __init acpi_init(void) > > pci_mmcfg_late_init(); > > acpi_iort_init(); > > acpi_scan_init(); > > + acpi_viot_init(); > > Is there a specific reason why to call it right here? > > In particular, does it need to be called after acpi_scan_init()? And > does it need to be called before the subsequent functions? If so, > then why? It does need to be called after acpi_scan_init(), because it relies on struct devices and their fwnodes being initialized. In particular to find a PCI device we call pci_get_domain_bus_and_slot(), which needs the PCI topology made available by acpi_scan_init(). It does not need to be before the subsequent functions however, I can move it to the end. > > +void __init acpi_viot_init(void) > > +{ > > + int i; > > + acpi_status status; > > + struct acpi_table_header *hdr; > > + struct acpi_viot_header *node; > > + > > + status = acpi_get_table(ACPI_SIG_VIOT, 0, &hdr); > > + if (ACPI_FAILURE(status)) { > > + if (status != AE_NOT_FOUND) { > > + const char *msg = acpi_format_exception(status); > > + > > + pr_err("Failed to get table, %s\n", msg); > > + } > > + return; > > + } > > + > > + viot = (void *)hdr; > > + > > + node = ACPI_ADD_PTR(struct acpi_viot_header, viot, > > viot->node_offset); > > + for (i = 0; i < viot->node_count; i++) { > > + if (viot_parse_node(node)) > > + return; > > + > > + node = ACPI_ADD_PTR(struct acpi_viot_header, node, > > + node->length); > > + } > > Do you still need the table after the above is complete? If not, > release the reference on it acquired above. We don't need the table anymore, I'll drop the reference Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
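A minimal sketch of the requested cleanup, assuming the parser copies everything it needs out of the table before acpi_viot_init() returns:

	/* at the end of acpi_viot_init(), once all nodes are parsed: */
	acpi_put_table(hdr);	/* drop the reference taken by acpi_get_table() */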
Re: [PATCH v13 1/6] iommu: Deprecate Intel and AMD cmdline methods to enable strict mode
On 17/06/2021 20:01, Robin Murphy wrote: DMA. - strict [Default Off] + strict [Default Off] [Deprecated, use iommu.strict instead] With this option on every unmap_single operation will result in a hardware IOTLB flush operation as opposed to batching them for performance. FWIW I'd be inclined to replace both whole descriptions with just something like "Deprecated, equivalent to iommu.strict=1". diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 46280e6e1535..9f3096d650aa 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -3098,8 +3098,10 @@ static int __init parse_amd_iommu_intr(char *str) static int __init parse_amd_iommu_options(char *str) { for (; *str; ++str) { - if (strncmp(str, "fullflush", 9) == 0) + if (strncmp(str, "fullflush", 9) == 0) { + pr_warn("amd_iommu=fullflush deprecated; use iommu.strict instead\n"); Nit: maybe we should spell out "...use =1 instead" in all of these messages just in case anyone takes them literally? (I'm not sure the options parse correctly with no argument) I don't think they do. But I'll take both suggestions on board. Either way, Acked-by: Robin Murphy Cheers! ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/6] ACPI: Add driver for the VIOT table
On Wed, Jun 16, 2021 at 03:26:08PM +0200, Eric Auger wrote: > > + default: > > + pr_warn("Unsupported node %x\n", hdr->type); > > + ret = 0; > > + goto err_free; > > + } > > + > > + /* > > +* To be compatible with future versions of the table which may include > > +* other node types, keep parsing. > > +*/ > nit: doesn't this comment rather apply to the default clause in the > switch? Yes, the comment doesn't accurately explain the code below, I'll tweak it. /* * A future version of the table may use the node for other purposes. * Keep parsing. */ > In case the PCI range node or the single MMIO endpoint node does > not refer to any translation element, isn't it simply an error case? It is permissible in my opinion. If a future version of the spec appends new fields to the MMIO endpoint describing some PV property (I can't think of a useful example), then the table can contain the vIOMMU topology as usual plus one MMIO node that's only here to describe that property, and doesn't have a translation element. If we encounter that I think we should keep parsing. > > + if (!ep->viommu) { > > + pr_warn("No IOMMU node found\n"); > > + ret = 0; > > + goto err_free; > > + } > Besides > Reviewed-by: Eric Auger Thanks! Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 2/6] ACPI: Move IOMMU setup code out of IORT
Hi Eric, On Wed, Jun 16, 2021 at 11:35:13AM +0200, Eric Auger wrote: > > -const struct iommu_ops *iort_iommu_configure_id(struct device *dev, > > - const u32 *id_in) > > +int iort_iommu_configure_id(struct device *dev, const u32 *id_in) > > { > > struct acpi_iort_node *node; > > - const struct iommu_ops *ops; > > + const struct iommu_ops *ops = NULL; Oops, I need to remove this (and add -Werror to my tests.) > > +static const struct iommu_ops *acpi_iommu_configure_id(struct device *dev, > > + const u32 *id_in) > > +{ > > + int err; > > + const struct iommu_ops *ops; > > + > > + /* > > +* If we already translated the fwspec there is nothing left to do, > > +* return the iommu_ops. > > +*/ > > + ops = acpi_iommu_fwspec_ops(dev); > > + if (ops) > > + return ops; > > + > > + err = iort_iommu_configure_id(dev, id_in); > > + > > + /* > > +* If we have reason to believe the IOMMU driver missed the initial > > +* add_device callback for dev, replay it to get things in order. > > +*/ > > + if (!err && dev->bus && !device_iommu_mapped(dev)) > > + err = iommu_probe_device(dev); > Previously we had: > if (!err) { > ops = iort_fwspec_iommu_ops(dev); > err = iort_add_device_replay(dev); > } > > Please can you explain the transform? I see the > > acpi_iommu_fwspec_ops call below but is it not straightforward to me. I figured that iort_add_device_replay() is only used once and is sufficiently simple to be inlined manually (saving 10 lines). Then I replaced the ops assignment with returns, which saves another line and may be slightly clearer? I guess it's mostly a matter of taste, the behavior should be exactly the same. > Also the comment mentions replay. Unsure if it is still OK. The "replay" part is, but "add_device" isn't accurate because it has since been replaced by probe_device. I'll refresh the comment. Thanks, Jean ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v13 4/6] iommu/vt-d: Add support for IOMMU default DMA mode build options
On 18/06/2021 02:46, Lu Baolu wrote: Hi baolu, I need to change that. How about this: bool print_warning = false; for_each_active_iommu(iommu, drhd) { /* * The flush queue implementation does not perform * page-selective invalidations that are required for efficient * TLB flushes in virtual environments. The benefit of batching * is likely to be much lower than the overhead of synchronizing * the virtual and physical IOMMU page-tables. */ if (!print_warning && cap_caching_mode(iommu->cap)) { pr_warn("IOMMU batching disallowed due to virtualization\n"); iommu_set_dma_strict(true); print_warning = true; } ... } or use pr_warn_once(). From my p.o.v, pr_*_once() is better. How about using pr_info_once()? I don't think it's a warning, it's just a policy choice in a VM environment. ok, I can go with that, which Robin mostly agrees with. Thanks, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
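For reference, the once-only behaviour being chosen here can be pictured as the following open-coded equivalent — a minimal sketch with a hypothetical wrapper, not the kernel's exact printk_once() macro:

	static void report_caching_mode(void)
	{
		static bool printed;	/* one message per boot, however many IOMMUs match */

		if (!printed) {
			printed = true;
			pr_info("IOMMU batching disallowed due to virtualization\n");
		}
	}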
Re: [bug report] iommu/vt-d: Allocate/register iopf queue for sva devices
Thanks for letting me know this. We already have a fix in linux-next. commit 90141a0fb72340937ed9ec05301cc548901d8eec Author: Colin Ian King Date: Fri Jun 11 14:50:24 2021 +0100 iommu/vt-d: Fix dereference of pointer info before it is null checked The assignment of iommu from info->iommu occurs before info is null checked hence leading to a potential null pointer dereference issue. Fix this by assigning iommu and checking if iommu is null after null checking info. Fixes: 4c82b88696ac ("iommu/vt-d: Allocate/register iopf queue for sva devices") Signed-off-by: Colin Ian King Addresses-Coverity: ("Dereference before null check") Link: https://lore.kernel.org/r/20210611135024.32781-1-colin.k...@canonical.com Signed-off-by: Lu Baolu Best regards, baolu On 2021/6/18 14:55, Dan Carpenter wrote: Hello Lu Baolu, This is a semi-automatic email about new static checker warnings. The patch 4c82b88696ac: "iommu/vt-d: Allocate/register iopf queue for sva devices" from Jun 10, 2021, leads to the following Smatch complaint: drivers/iommu/intel/iommu.c:5335 intel_iommu_enable_sva() warn: variable dereferenced before check 'info' (see line 5332) drivers/iommu/intel/iommu.c 5331 struct device_domain_info *info = get_domain_info(dev); 5332 struct intel_iommu *iommu = info->iommu; ^^^ Dereferenced 5333 int ret; 5334 5335 if (!info || !iommu || dmar_disabled) ^ Checked too late. 5336 return -EINVAL; 5337 regards, dan carpenter ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
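The shape of the fix is the classic reordering: don't dereference info until it has been null-checked. A sketch of the corrected function entry, per the commit quoted above:

	struct device_domain_info *info = get_domain_info(dev);
	struct intel_iommu *iommu;

	if (!info)
		return -EINVAL;

	iommu = info->iommu;	/* safe: info is known non-NULL here */
	if (!iommu || dmar_disabled)
		return -EINVAL;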
[bug report] iommu/vt-d: Allocate/register iopf queue for sva devices
Hello Lu Baolu, This is a semi-automatic email about new static checker warnings. The patch 4c82b88696ac: "iommu/vt-d: Allocate/register iopf queue for sva devices" from Jun 10, 2021, leads to the following Smatch complaint: drivers/iommu/intel/iommu.c:5335 intel_iommu_enable_sva() warn: variable dereferenced before check 'info' (see line 5332) drivers/iommu/intel/iommu.c 5331 struct device_domain_info *info = get_domain_info(dev); 5332 struct intel_iommu *iommu = info->iommu; ^^^ Dereferenced 5333 int ret; 5334 5335 if (!info || !iommu || dmar_disabled) ^ Checked too late. 5336 return -EINVAL; 5337 regards, dan carpenter ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions
On Fri, Jun 18, 2021 at 7:30 AM Stefano Stabellini wrote:
>
> On Thu, 17 Jun 2021, Claire Chang wrote:
> > Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem
> > struct initialization to make the code reusable.
> >
> > Signed-off-by: Claire Chang
> > Reviewed-by: Christoph Hellwig
> > Tested-by: Stefano Stabellini
> > Tested-by: Will Deacon
> > ---
> >  kernel/dma/swiotlb.c | 50 ++--
> >  1 file changed, 25 insertions(+), 25 deletions(-)
> >
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 52e2ac526757..47bb2a766798 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
> >  	memset(vaddr, 0, bytes);
> >  }
> >
> > -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> > +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> > +				    unsigned long nslabs, bool late_alloc)
> >  {
> > +	void *vaddr = phys_to_virt(start);
> >  	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> > +
> > +	mem->nslabs = nslabs;
> > +	mem->start = start;
> > +	mem->end = mem->start + bytes;
> > +	mem->index = 0;
> > +	mem->late_alloc = late_alloc;
> > +	spin_lock_init(&mem->lock);
> > +	for (i = 0; i < mem->nslabs; i++) {
> > +		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> > +		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> > +		mem->slots[i].alloc_size = 0;
> > +	}
> > +	memset(vaddr, 0, bytes);
> > +}
> > +
> > +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> > +{
> >  	struct io_tlb_mem *mem;
> >  	size_t alloc_size;
> >
> > @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> >  	if (!mem)
> >  		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> >  		      __func__, alloc_size, PAGE_SIZE);
> > -	mem->nslabs = nslabs;
> > -	mem->start = __pa(tlb);
> > -	mem->end = mem->start + bytes;
> > -	mem->index = 0;
> > -	spin_lock_init(&mem->lock);
> > -	for (i = 0; i < mem->nslabs; i++) {
> > -		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> > -		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> > -		mem->slots[i].alloc_size = 0;
> > -	}
> > +
> > +	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
> >
> >  	io_tlb_default_mem = mem;
> >  	if (verbose)
> > @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
> >  int
> >  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
> >  {
> > -	unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
> >  	struct io_tlb_mem *mem;
> > +	unsigned long bytes = nslabs << IO_TLB_SHIFT;
> >
> >  	if (swiotlb_force == SWIOTLB_NO_FORCE)
> >  		return 0;
> > @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
> >  	if (!mem)
> >  		return -ENOMEM;
> >
> > -	mem->nslabs = nslabs;
> > -	mem->start = virt_to_phys(tlb);
> > -	mem->end = mem->start + bytes;
> > -	mem->index = 0;
> > -	mem->late_alloc = 1;
> > -	spin_lock_init(&mem->lock);
> > -	for (i = 0; i < mem->nslabs; i++) {
> > -		mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
> > -		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
> > -		mem->slots[i].alloc_size = 0;
> > -	}
> > -
> > +	memset(mem, 0, sizeof(*mem));
> > +	swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
> >  	set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
> > -	memset(tlb, 0, bytes);
>
> This is good for swiotlb_late_init_with_tbl.
> However I have just noticed that mem could also be allocated from
> swiotlb_init_with_tbl, in which case the zeroing is missing. I think
> we need another memset in swiotlb_init_with_tbl as well. Or maybe it
> could be better to have a single memset at the beginning of
> swiotlb_init_io_tlb_mem instead. Up to you.

swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem,
and memblock_alloc[1] will do the memset in memblock_alloc_try_nid[2],
so swiotlb_init_with_tbl is also good. I'm happy to add the memset in
swiotlb_init_io_tlb_mem if you think it's clearer and safer.

[1] https://elixir.bootlin.com/linux/v5.13-rc6/source/include/linux/memblock.h#L407
[2] https://elixir.bootlin.com/linux/v5.13-rc6/source/mm/memblock.c#L1555
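A minimal sketch of the "single memset" option mentioned above, on the
assumption that it would zero the struct header at the top of
swiotlb_init_io_tlb_mem (an illustration, not the final patch):

	static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem,
					    phys_addr_t start,
					    unsigned long nslabs,
					    bool late_alloc)
	{
		void *vaddr = phys_to_virt(start);
		unsigned long bytes = nslabs << IO_TLB_SHIFT, i;

		/*
		 * Zero the struct header here so no caller has to
		 * (memblock_alloc already zeroes, kmalloc does not).
		 * The slots[] array is fully initialized in the loop
		 * below, so sizeof(*mem) is enough.
		 */
		memset(mem, 0, sizeof(*mem));

		mem->nslabs = nslabs;
		mem->start = start;
		mem->end = mem->start + bytes;
		mem->late_alloc = late_alloc;
		spin_lock_init(&mem->lock);
		for (i = 0; i < mem->nslabs; i++) {
			mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
			mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
			mem->slots[i].alloc_size = 0;
		}
		memset(vaddr, 0, bytes);
	}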