[PATCH v1] xen: remove a confusing comment on auto-translated guest I/O

2023-08-02 Thread Petr Tesarik
From: Petr Tesarik 

After removing the conditional return from xen_create_contiguous_region(),
the accompanying comment was left in place, but it now precedes an
unrelated conditional and confuses readers.

Fixes: 989513a735f5 ("xen: cleanup pvh leftovers from pv-only sources")
Signed-off-by: Petr Tesarik 
---
 arch/x86/xen/mmu_pv.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index e0a975165de7..804a5441324c 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2310,12 +2310,6 @@ int xen_create_contiguous_region(phys_addr_t pstart, 
unsigned int order,
int success;
unsigned long vstart = (unsigned long)phys_to_virt(pstart);
 
-   /*
-* Currently an auto-translated guest will not perform I/O, nor will
-* it require PAE page directories below 4GB. Therefore any calls to
-* this function are redundant and can be ignored.
-*/
-
if (unlikely(order > MAX_CONTIG_ORDER))
return -ENOMEM;
 
-- 
2.25.1




Re: [PATCH v7 0/9] Allow dynamic allocation of software IO TLB bounce buffers

2023-08-01 Thread Petr Tesarik
On 8/1/2023 6:03 PM, Christoph Hellwig wrote:
> Thanks,
> 
> I've applied this to a new swiotlb-dynamic branch that I'll pull into
> the dma-mapping for-next tree.

Thank you.

I guess I can prepare some follow-up series now. ;-)

Petr T




[PATCH v7 7/9] swiotlb: determine potential physical address limit

2023-08-01 Thread Petr Tesarik
From: Petr Tesarik 

The value returned by default_swiotlb_limit() should be constant, because
it is used to decide whether DMA can be used. To allow allocating memory
pools on the fly, use the maximum possible physical address rather than the
highest address used by the default pool.

For swiotlb_init_remap(), this is either an arch-specific limit used by
memblock_alloc_low(), or the highest directly mapped physical address if
the initialization flags include SWIOTLB_ANY. For swiotlb_init_late(), the
highest address is determined by the GFP flags.
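
Not taken from the patch, but a minimal sketch of the caller-side check that
motivates keeping the returned value constant, in the spirit of the
xen_swiotlb_dma_supported() change in patch 2/9 (the function name here is
made up):

#include <linux/swiotlb.h>
#include <linux/types.h>

/*
 * Sketch only: a dma_supported()-style check. Bounce buffering can be
 * relied upon only if the device's DMA mask covers the highest physical
 * address that any current or future swiotlb pool may occupy.
 */
static int example_dma_supported(u64 mask)
{
        return default_swiotlb_limit() <= mask;
}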

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |  2 ++
 kernel/dma/swiotlb.c| 14 ++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 66867d2188ba..9825fa14abe4 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -109,6 +109,7 @@ struct io_tlb_pool {
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
  * @can_grow:  %true if more pools can be allocated dynamically.
+ * @phys_limit:Maximum allowed physical address.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -123,6 +124,7 @@ struct io_tlb_mem {
bool for_alloc;
 #ifdef CONFIG_SWIOTLB_DYNAMIC
bool can_grow;
+   u64 phys_limit;
 #endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 30d0fcc3ccb9..0fa081defdbd 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -334,6 +334,10 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
 #ifdef CONFIG_SWIOTLB_DYNAMIC
if (!remap)
io_tlb_default_mem.can_grow = true;
+   if (flags & SWIOTLB_ANY)
+   io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
+   else
+   io_tlb_default_mem.phys_limit = ARCH_LOW_ADDRESS_LIMIT;
 #endif
 
if (!default_nareas)
@@ -409,6 +413,12 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 #ifdef CONFIG_SWIOTLB_DYNAMIC
if (!remap)
io_tlb_default_mem.can_grow = true;
+   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
+   io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
+   else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
+   io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
+   else
+   io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
 #endif
 
if (!default_nareas)
@@ -1397,7 +1407,11 @@ phys_addr_t default_swiotlb_base(void)
  */
 phys_addr_t default_swiotlb_limit(void)
 {
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   return io_tlb_default_mem.phys_limit;
+#else
return io_tlb_default_mem.defpool.end - 1;
+#endif
 }
 
 #ifdef CONFIG_DEBUG_FS
-- 
2.25.1




[PATCH v7 8/9] swiotlb: allocate a new memory pool when existing pools are full

2023-08-01 Thread Petr Tesarik
From: Petr Tesarik 

When swiotlb_find_slots() cannot find suitable slots, schedule the
allocation of a new memory pool. It is not possible to allocate the pool
immediately, because this code may run in interrupt context, which is not
suitable for large memory allocations. This means that the memory pool will
be available too late for the currently requested mapping, but the stress
on the software IO TLB allocator is likely to continue, and subsequent
allocations will benefit from the additional pool eventually.

Keep all memory pools for an allocator in an RCU list to avoid locking on
the read side. For modifications, add a new spinlock to struct io_tlb_mem.

The spinlock also protects updates to the total number of slabs (nslabs in
struct io_tlb_mem), but not reads of the value. Readers may therefore
encounter a stale value, but this is not an issue:

- swiotlb_tbl_map_single() and is_swiotlb_active() only check for non-zero
  value. This is ensured by the existence of the default memory pool,
  allocated at boot.

- The exact value is used only for non-critical purposes (debugfs, kernel
  messages).
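
For illustration, a minimal, self-contained sketch of the lockless-read,
locked-write discipline used for the pool list (the names are made up; this
is not the patch code):

#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>

struct pool {
        struct list_head node;
        unsigned long nslabs;
};

static LIST_HEAD(pools);
static DEFINE_SPINLOCK(pools_lock);
static unsigned long total_nslabs;      /* may be read without the lock */

/* Writer side: serialized by the spinlock; the list update is RCU-safe. */
static void pool_add(struct pool *p)
{
        spin_lock(&pools_lock);
        list_add_rcu(&p->node, &pools);
        total_nslabs += p->nslabs;      /* readers may observe a stale total */
        spin_unlock(&pools_lock);
}

/* Reader side: no lock, only an RCU read-side critical section. */
static struct pool *pool_find(unsigned long key)
{
        struct pool *p, *ret = NULL;

        rcu_read_lock();
        list_for_each_entry_rcu(p, &pools, node) {
                if (p->nslabs == key) {         /* stand-in for a range check */
                        ret = p;
                        break;
                }
        }
        rcu_read_unlock();
        return ret;
}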

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |   8 +++
 kernel/dma/swiotlb.c| 148 +---
 2 files changed, 131 insertions(+), 25 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 9825fa14abe4..8371c92a0271 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct page;
@@ -104,12 +105,16 @@ struct io_tlb_pool {
 /**
  * struct io_tlb_mem - Software IO TLB allocator
  * @defpool:   Default (initial) IO TLB memory pool descriptor.
+ * @pool:  IO TLB memory pool descriptor (if not dynamic).
  * @nslabs:Total number of IO TLB slabs in all pools.
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
  * @can_grow:  %true if more pools can be allocated dynamically.
  * @phys_limit:Maximum allowed physical address.
+ * @lock:  Lock to synchronize changes to the list.
+ * @pools: List of IO TLB memory pool descriptors (if dynamic).
+ * @dyn_alloc: Dynamic IO TLB pool allocation work.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -125,6 +130,9 @@ struct io_tlb_mem {
 #ifdef CONFIG_SWIOTLB_DYNAMIC
bool can_grow;
u64 phys_limit;
+   spinlock_t lock;
+   struct list_head pools;
+   struct work_struct dyn_alloc;
 #endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0fa081defdbd..adf80dec42d7 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -79,8 +79,23 @@ struct io_tlb_slot {
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+
+static void swiotlb_dyn_alloc(struct work_struct *work);
+
+static struct io_tlb_mem io_tlb_default_mem = {
+   .lock = __SPIN_LOCK_UNLOCKED(io_tlb_default_mem.lock),
+   .pools = LIST_HEAD_INIT(io_tlb_default_mem.pools),
+   .dyn_alloc = __WORK_INITIALIZER(io_tlb_default_mem.dyn_alloc,
+   swiotlb_dyn_alloc),
+};
+
+#else  /* !CONFIG_SWIOTLB_DYNAMIC */
+
 static struct io_tlb_mem io_tlb_default_mem;
 
+#endif /* CONFIG_SWIOTLB_DYNAMIC */
+
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
 static unsigned long default_nareas;
 
@@ -278,6 +293,23 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool 
*mem, phys_addr_t start,
return;
 }
 
+/**
+ * add_mem_pool() - add a memory pool to the allocator
+ * @mem:   Software IO TLB allocator.
+ * @pool:  Memory pool to be added.
+ */
+static void add_mem_pool(struct io_tlb_mem *mem, struct io_tlb_pool *pool)
+{
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   spin_lock(&mem->lock);
+   list_add_rcu(&pool->node, &mem->pools);
+   mem->nslabs += pool->nslabs;
+   spin_unlock(&mem->lock);
+#else
+   mem->nslabs = pool->nslabs;
+#endif
+}
+
 static void __init *swiotlb_memblock_alloc(unsigned long nslabs,
unsigned int flags,
int (*remap)(void *tlb, unsigned long nslabs))
@@ -375,7 +407,7 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
 
swiotlb_init_io_tlb_pool(mem, __pa(tlb), nslabs, false,
 default_nareas);
-   io_tlb_default_mem.nslabs = nslabs;
+   add_mem_pool(&io_tlb_default_mem, mem);
 
if (flags & SWIOTLB_VERBOSE)
swiotlb_print_info();
@@ -474,7 +506,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);

Re: [PATCH v6 0/9] Allow dynamic allocation of software IO TLB bounce buffers

2023-08-01 Thread Petr Tesarik
On 7/31/2023 9:46 PM, Petr Tesařík wrote:
> V Mon, 31 Jul 2023 18:04:09 +0200
> Christoph Hellwig  napsáno:
> 
>> I was just going to apply this, but patch 1 seems to have a non-trivial
>> conflict with the is_swiotlb_active removal in pci-dma.c.  Can you resend
>> against the current dma-mapping for-next tree?
> 
> Sure thing, will re-send tomorrow morning.

After commit f9a38ea5172a ("x86: always initialize xen-swiotlb when
xen-pcifront is enabling") removed that call to swiotlb_init_late(),
there is nothing to patch, and the hunk can be dropped.

I have just sent v7.

Petr T




[PATCH v7 9/9] swiotlb: search the software IO TLB only if the device makes use of it

2023-08-01 Thread Petr Tesarik
From: Petr Tesarik 

Skip searching the software IO TLB if a device has never used it, making
sure these devices are not affected by the introduction of multiple IO TLB
memory pools.

An additional memory barrier is required to ensure that the new value of the
flag is visible to other CPUs after mapping a new bounce buffer. For
efficiency, the flag check should be inlined, and then the memory barrier
must be moved to is_swiotlb_buffer(). However, it can replace the existing
barrier in swiotlb_find_pool(), because all callers use is_swiotlb_buffer()
first to verify that the buffer address belongs to the software IO TLB.
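
The barrier pairing is the classic publish/observe pattern; a hedged sketch
with made-up names (the patch orders its stores slightly differently, but
the pairing works the same way):

#include <linux/compiler.h>
#include <linux/smp.h>
#include <linux/types.h>

static int payload;     /* stands in for the RCU list of pools */
static bool flag;       /* stands in for dev->dma_uses_io_tlb */

/* Writer: make the payload visible before the flag that announces it. */
static void writer(void)
{
        payload = 42;
        smp_wmb();                      /* pairs with smp_rmb() in reader() */
        WRITE_ONCE(flag, true);
}

/* Reader: once the flag is seen, the payload is guaranteed to be visible. */
static int reader(void)
{
        if (!READ_ONCE(flag))
                return -1;
        smp_rmb();                      /* pairs with smp_wmb() in writer() */
        return payload;
}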

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |  2 ++
 include/linux/swiotlb.h |  7 ++-
 kernel/dma/swiotlb.c| 14 ++
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 5fd89c9d005c..6fc808d22bfd 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -628,6 +628,7 @@ struct device_physical_location {
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
  * @dma_io_tlb_lock:   Protects changes to the list of active pools.
+ * @dma_uses_io_tlb: %true if device has used the software IO TLB.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -737,6 +738,7 @@ struct device {
 #ifdef CONFIG_SWIOTLB_DYNAMIC
struct list_head dma_io_tlb_pools;
spinlock_t dma_io_tlb_lock;
+   bool dma_uses_io_tlb;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8371c92a0271..b4536626f8ff 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -172,8 +172,13 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
if (!mem)
return false;
 
-   if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC))
+   if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) {
+   /* Pairs with smp_wmb() in swiotlb_find_slots() and
+* swiotlb_dyn_alloc(), which modify the RCU lists.
+*/
+   smp_rmb();
return swiotlb_find_pool(dev, paddr);
+   }
return paddr >= mem->defpool.start && paddr < mem->defpool.end;
 }
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index adf80dec42d7..d7eac84f975b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -730,7 +730,7 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
 
add_mem_pool(mem, pool);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
+   /* Pairs with smp_rmb() in is_swiotlb_buffer(). */
smp_wmb();
 }
 
@@ -764,11 +764,6 @@ struct io_tlb_pool *swiotlb_find_pool(struct device *dev, 
phys_addr_t paddr)
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
struct io_tlb_pool *pool;
 
-   /* Pairs with smp_wmb() in swiotlb_find_slots() and
-* swiotlb_dyn_alloc(), which modify the RCU lists.
-*/
-   smp_rmb();
-
rcu_read_lock();
list_for_each_entry_rcu(pool, &mem->pools, node) {
if (paddr >= pool->start && paddr < pool->end)
@@ -813,6 +808,7 @@ void swiotlb_dev_init(struct device *dev)
 #ifdef CONFIG_SWIOTLB_DYNAMIC
INIT_LIST_HEAD(&dev->dma_io_tlb_pools);
spin_lock_init(&dev->dma_io_tlb_lock);
+   dev->dma_uses_io_tlb = false;
 #endif
 }
 
@@ -1157,9 +1153,11 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
list_add_rcu(&pool->node, &dev->dma_io_tlb_pools);
spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
-   smp_wmb();
 found:
+   dev->dma_uses_io_tlb = true;
+   /* Pairs with smp_rmb() in is_swiotlb_buffer() */
+   smp_wmb();
+
*retpool = pool;
return index;
 }
-- 
2.25.1




[PATCH v7 6/9] swiotlb: if swiotlb is full, fall back to a transient memory pool

2023-08-01 Thread Petr Tesarik
From: Petr Tesarik 

Try to allocate a transient memory pool if no suitable slots can be found
and the respective SWIOTLB is allowed to grow. The transient pool is just
big enough for this one bounce buffer. It is inserted into a per-device
list of transient memory pools, and it is freed again when the bounce
buffer is unmapped.

Transient memory pools are kept in an RCU list. A memory barrier is
required after adding a new entry, because any address within a transient
buffer must be immediately recognized as belonging to the SWIOTLB, even if
it is passed to another CPU.

Deletion does not require any synchronization beyond RCU ordering
guarantees. After a buffer is unmapped, its physical addresses may no
longer be passed to the DMA API, so the memory range of the corresponding
stale entry in the RCU list never matches. If the memory range gets
allocated again, then it happens only after an RCU quiescent state.

Since bounce buffers can now be allocated from different pools, add a
parameter to swiotlb_alloc_pool() to let the caller know which memory pool
is used. Add swiotlb_find_pool() to find the memory pool corresponding to
an address. This function is now also used by is_swiotlb_buffer(), because
a simple boundary check is no longer sufficient.

The logic in swiotlb_alloc_tlb() is taken from __dma_direct_alloc_pages(),
simplified and enhanced to use coherent memory pools if needed.

Note that this is not the most efficient way to provide a bounce buffer,
but when a DMA buffer can't be mapped, something may (and will) actually
break. At that point it is better to make an allocation, even if it may be
an expensive operation.
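
Condensed into a hedged sketch (simplified, made-up helpers; the real pool
also carries the slot metadata and uses a dedicated RCU callback instead of
kfree_rcu()), the life cycle of a transient pool looks like this:

#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct transient_pool {
        struct list_head node;
        struct rcu_head rcu;
        /* ... start/end, slot array, etc. ... */
};

static LIST_HEAD(dev_pools);            /* stands in for dev->dma_io_tlb_pools */
static DEFINE_SPINLOCK(dev_pools_lock); /* stands in for dev->dma_io_tlb_lock */

/* Map-path fallback: allocate a one-off pool and publish it immediately. */
static struct transient_pool *transient_pool_add(gfp_t gfp)
{
        struct transient_pool *pool = kzalloc(sizeof(*pool), gfp);
        unsigned long flags;

        if (!pool)
                return NULL;
        spin_lock_irqsave(&dev_pools_lock, flags);
        list_add_rcu(&pool->node, &dev_pools);
        spin_unlock_irqrestore(&dev_pools_lock, flags);
        return pool;
}

/* Unmap path: unlink the pool and free it only after a grace period. */
static void transient_pool_del(struct transient_pool *pool)
{
        unsigned long flags;

        spin_lock_irqsave(&dev_pools_lock, flags);
        list_del_rcu(&pool->node);
        spin_unlock_irqrestore(&dev_pools_lock, flags);
        kfree_rcu(pool, rcu);
}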

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   6 +
 include/linux/dma-mapping.h |   2 +
 include/linux/swiotlb.h |  29 +++-
 kernel/dma/direct.c |   2 +-
 kernel/dma/swiotlb.c| 316 +++-
 5 files changed, 345 insertions(+), 10 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index d9754a68ba95..5fd89c9d005c 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -626,6 +626,8 @@ struct device_physical_location {
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
+ * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
+ * @dma_io_tlb_lock:   Protects changes to the list of active pools.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -731,6 +733,10 @@ struct device {
 #endif
 #ifdef CONFIG_SWIOTLB
struct io_tlb_mem *dma_io_tlb_mem;
+#endif
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   struct list_head dma_io_tlb_pools;
+   spinlock_t dma_io_tlb_lock;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index e13050eb9777..f0ccca16a0ac 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -418,6 +418,8 @@ static inline void dma_sync_sgtable_for_device(struct 
device *dev,
 #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
 #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
 
+bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size);
+
 static inline void *dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp)
 {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 57be2a0a9fbf..66867d2188ba 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -80,6 +80,9 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @area_nslabs: Number of slots in each area.
  * @areas: Array of memory area descriptors.
  * @slots: Array of slot descriptors.
+ * @node:  Member of the IO TLB memory pool list.
+ * @rcu:   RCU head for swiotlb_dyn_free().
+ * @transient:  %true if transient memory pool.
  */
 struct io_tlb_pool {
phys_addr_t start;
@@ -91,6 +94,11 @@ struct io_tlb_pool {
unsigned int area_nslabs;
struct io_tlb_area *areas;
struct io_tlb_slot *slots;
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   struct list_head node;
+   struct rcu_head rcu;
+   bool transient;
+#endif
 };
 
 /**
@@ -122,6 +130,20 @@ struct io_tlb_mem {
 #endif
 };
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+
+struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr);
+
+#else
+
+static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev,
+   phys_addr_t paddr)
+{
+   return &dev->dma_io_tlb_mem->defpool;
+}
+
+#endif
+
 /**
  * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb
  * @dev:Device which has mapped the buffer.
@@ -137,7 +

[PATCH v7 5/9] swiotlb: add a flag whether SWIOTLB is allowed to grow

2023-08-01 Thread Petr Tesarik
From: Petr Tesarik 

Add a config option (CONFIG_SWIOTLB_DYNAMIC) to enable or disable dynamic
allocation of additional bounce buffers.

If this option is set, mark the default SWIOTLB as able to grow and
restricted DMA pools as unable.

However, if the address of the default memory pool is explicitly queried,
make the default SWIOTLB also unable to grow. This is currently used to set
up PCI BAR movable regions on some Octeon MIPS boards which may not be able
to use a SWIOTLB pool elsewhere in physical memory. See octeon_pci_setup()
for more details.

If a remap function is specified, it must also be called on any dynamically
allocated pools, but there are some issues:

- The remap function may block, so it should not be called from an atomic
  context.
- There is no corresponding unremap() function if the memory pool is
  freed.
- The only in-tree implementation (xen_swiotlb_fixup) requires that the
  number of slots in the memory pool is a multiple of SWIOTLB_SEGSIZE.

Keep it simple for now and disable growing the SWIOTLB if a remap function
was specified.
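
The resulting policy, restated as a hedged sketch (not code from the patch):

#include <linux/kconfig.h>
#include <linux/types.h>

/*
 * Growth stays disabled when dynamic SWIOTLB is not configured, when a
 * remap function was supplied, for restricted DMA pools, and once the
 * default pool's base address has been handed out (octeon_pci_setup()).
 */
static bool swiotlb_may_grow(bool has_remap, bool restricted_pool,
                             bool base_was_queried)
{
        if (!IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC))
                return false;
        return !has_remap && !restricted_pool && !base_was_queried;
}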

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |  4 
 kernel/dma/Kconfig  | 13 +
 kernel/dma/swiotlb.c| 13 +
 3 files changed, 30 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 247f0ab8795a..57be2a0a9fbf 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -100,6 +100,7 @@ struct io_tlb_pool {
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @can_grow:  %true if more pools can be allocated dynamically.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -112,6 +113,9 @@ struct io_tlb_mem {
struct dentry *debugfs;
bool force_bounce;
bool for_alloc;
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   bool can_grow;
+#endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 562463fe30ea..4c1e9a3c0ab6 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -90,6 +90,19 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config SWIOTLB_DYNAMIC
+   bool "Dynamic allocation of DMA bounce buffers"
+   default n
+   depends on SWIOTLB
+   help
+ This enables dynamic resizing of the software IO TLB. The kernel
+ starts with one memory pool at boot and it will allocate additional
+ pools as needed. To reduce run-time kernel memory requirements, you
+ may have to specify a smaller size of the initial pool using
+ "swiotlb=" on the kernel command line.
+
+ If unsure, say N.
+
 config DMA_BOUNCE_UNALIGNED_KMALLOC
bool
depends on SWIOTLB
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6fc2606e014b..767c8fb36a6b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -330,6 +330,11 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
io_tlb_default_mem.force_bounce =
swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   if (!remap)
+   io_tlb_default_mem.can_grow = true;
+#endif
+
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
 
@@ -400,6 +405,11 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
io_tlb_default_mem.force_bounce = swiotlb_force_bounce;
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   if (!remap)
+   io_tlb_default_mem.can_grow = true;
+#endif
+
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
 
@@ -1073,6 +1083,9 @@ bool is_swiotlb_active(struct device *dev)
  */
 phys_addr_t default_swiotlb_base(void)
 {
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   io_tlb_default_mem.can_grow = false;
+#endif
return io_tlb_default_mem.defpool.start;
 }
 
-- 
2.25.1




[PATCH v7 4/9] swiotlb: separate memory pool data from other allocator data

2023-08-01 Thread Petr Tesarik
From: Petr Tesarik 

Carve out memory pool specific fields from struct io_tlb_mem. The original
struct now contains shared data for the whole allocator, while the new
struct io_tlb_pool contains data that is specific to one memory pool of
(potentially) many.

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   2 +-
 include/linux/swiotlb.h |  45 +++
 kernel/dma/swiotlb.c| 175 +---
 3 files changed, 140 insertions(+), 82 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index bbaeabd04b0d..d9754a68ba95 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -625,7 +625,7 @@ struct device_physical_location {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
- * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
+ * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 31625ae507ea..247f0ab8795a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,8 +62,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 
 /**
- * struct io_tlb_mem - IO TLB Memory Pool Descriptor
- *
+ * struct io_tlb_pool - IO TLB memory pool descriptor
  * @start: The start address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
@@ -73,15 +72,34 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
  * may be remapped in the memory encrypted case and store virtual
  * address for bounce buffer operation.
- * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. For default swiotlb, this is command line adjustable via
- * setup_io_tlb_npages.
+ * @nslabs:The number of IO TLB slots between @start and @end. For the
+ * default swiotlb, this can be adjusted with a boot parameter,
+ * see setup_io_tlb_npages().
+ * @late_alloc:%true if allocated using the page allocator.
+ * @nareas:Number of areas in the pool.
+ * @area_nslabs: Number of slots in each area.
+ * @areas: Array of memory area descriptors.
+ * @slots: Array of slot descriptors.
+ */
+struct io_tlb_pool {
+   phys_addr_t start;
+   phys_addr_t end;
+   void *vaddr;
+   unsigned long nslabs;
+   bool late_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
+   struct io_tlb_slot *slots;
+};
+
+/**
+ * struct io_tlb_mem - Software IO TLB allocator
+ * @defpool:   Default (initial) IO TLB memory pool descriptor.
+ * @nslabs:Total number of IO TLB slabs in all pools.
  * @debugfs:   The dentry to debugfs.
- * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
- * @nareas:  The area number in the pool.
- * @area_nslabs: The slot number in the area.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -89,18 +107,11 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
phys,
  * in debugfs.
  */
 struct io_tlb_mem {
-   phys_addr_t start;
-   phys_addr_t end;
-   void *vaddr;
+   struct io_tlb_pool defpool;
unsigned long nslabs;
struct dentry *debugfs;
-   bool late_alloc;
bool force_bounce;
bool for_alloc;
-   unsigned int nareas;
-   unsigned int area_nslabs;
-   struct io_tlb_area *areas;
-   struct io_tlb_slot *slots;
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
@@ -122,7 +133,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
-   return mem && paddr >= mem->start && paddr < mem->end;
+   return mem && paddr >= mem->defpool.start && paddr < mem->defpool.end;
 }
 
 static inline bool is_swiotlb_force_bounce(struct device *dev)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 66793b59c290..6fc2606e014b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -209,7 +209,7 @@ void __init swiotlb_adjust_size(unsigned long size)
 
 void swiotlb_print_info(void)
 {
-   struct io_tlb_m

[PATCH v7 3/9] swiotlb: add documentation and rename swiotlb_do_find_slots()

2023-08-01 Thread Petr Tesarik
From: Petr Tesarik 

Add some kernel-doc comments and move the existing documentation of struct
io_tlb_slot to its correct location. The latter was forgotten in commit
942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").

Use the opportunity to give swiotlb_do_find_slots() a more descriptive name
and make it clear how it differs from swiotlb_find_slots().

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h | 15 +++---
 kernel/dma/swiotlb.c| 61 +
 2 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 2d453b3e7771..31625ae507ea 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -76,10 +76,6 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
- * @list:  The free list describing the number of free entries available
- * from each index.
- * @orig_addr: The original address corresponding to a mapped entry.
- * @alloc_size:Size of the allocated buffer.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
@@ -111,6 +107,17 @@ struct io_tlb_mem {
 #endif
 };
 
+/**
+ * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb
+ * @dev:Device which has mapped the buffer.
+ * @paddr:  Physical address within the DMA buffer.
+ *
+ * Check if @paddr points into a bounce buffer.
+ *
+ * Return:
+ * * %true if @paddr points into a bounce buffer
+ * * %false otherwise
+ */
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0840bc15fb53..66793b59c290 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -62,6 +62,13 @@
 
 #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
+/**
+ * struct io_tlb_slot - IO TLB slot descriptor
+ * @orig_addr: The original address corresponding to a mapped entry.
+ * @alloc_size:Size of the allocated buffer.
+ * @list:  The free list describing the number of free entries available
+ * from each index.
+ */
 struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -635,11 +642,22 @@ static void dec_used(struct io_tlb_mem *mem, unsigned int 
nslots)
 }
 #endif /* CONFIG_DEBUG_FS */
 
-/*
- * Find a suitable number of IO TLB entries size that will fit this request and
- * allocate a buffer from that IO TLB pool.
+/**
+ * swiotlb_area_find_slots() - search for slots in one IO TLB memory area
+ * @dev:   Device which maps the buffer.
+ * @area_index:Index of the IO TLB memory area to be searched.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Find a suitable sequence of IO TLB entries for the request and allocate
+ * a buffer from the given IO TLB memory area.
+ * This function takes care of locking.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
  */
-static int swiotlb_do_find_slots(struct device *dev, int area_index,
+static int swiotlb_area_find_slots(struct device *dev, int area_index,
phys_addr_t orig_addr, size_t alloc_size,
unsigned int alloc_align_mask)
 {
@@ -734,6 +752,19 @@ static int swiotlb_do_find_slots(struct device *dev, int 
area_index,
return slot_index;
 }
 
+/**
+ * swiotlb_find_slots() - search for slots in the whole swiotlb
+ * @dev:   Device which maps the buffer.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Search through the whole software IO TLB to find a sequence of slots that
+ * match the allocation constraints.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
+ */
 static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size, unsigned int alloc_align_mask)
 {
@@ -742,8 +773,8 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
int i = start, index;
 
do {
-   index = swiotlb_do_find_slots(dev, i, orig_addr, alloc_size,
- alloc_align_mask);
+   index = swiotlb_area_find_slots(dev, i, orig_addr, alloc_size,
+   alloc_align_mask);
i

[PATCH v7 2/9] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-08-01 Thread Petr Tesarik
From: Petr Tesarik 

SWIOTLB implementation details should not be exposed to the rest of the
kernel. This will allow changes to the implementation without
modifying non-swiotlb code.

To avoid breaking existing users, provide helper functions for the few
required fields.

As a bonus, using a helper function to initialize struct device makes it
possible to get rid of an #ifdef in the driver core.
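
A hedged sketch of what the new accessors look like from the caller's side
(the reporting function is made up; the helpers are the ones introduced
here):

#include <linux/device.h>
#include <linux/printk.h>
#include <linux/swiotlb.h>
#include <linux/types.h>

/* Outside code no longer touches io_tlb_default_mem directly. */
static void example_report(struct device *dev)
{
        phys_addr_t base, limit;

        if (!is_swiotlb_allocated())
                return;

        base = default_swiotlb_base();
        limit = default_swiotlb_limit();
        pr_info("%s: default swiotlb %pa..%pa, active: %d\n",
                dev_name(dev), &base, &limit, is_swiotlb_active(dev));
}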

Signed-off-by: Petr Tesarik 
---
 arch/mips/pci/pci-octeon.c |  2 +-
 drivers/base/core.c|  4 +---
 drivers/xen/swiotlb-xen.c  |  2 +-
 include/linux/swiotlb.h| 25 +++-
 kernel/dma/swiotlb.c   | 39 +-
 mm/slab_common.c   |  5 ++---
 6 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
index e457a18cbdc5..d19d9d456309 100644
--- a/arch/mips/pci/pci-octeon.c
+++ b/arch/mips/pci/pci-octeon.c
@@ -664,7 +664,7 @@ static int __init octeon_pci_setup(void)
 
/* BAR1 movable regions contiguous to cover the swiotlb */
octeon_bar1_pci_phys =
-   io_tlb_default_mem.start & ~((1ull << 22) - 1);
+   default_swiotlb_base() & ~((1ull << 22) - 1);
 
for (index = 0; index < 32; index++) {
union cvmx_pci_bar1_indexx bar1_index;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 3dff5037943e..46d1d78c5beb 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3108,9 +3108,7 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
-#ifdef CONFIG_SWIOTLB
-   dev->dma_io_tlb_mem = &io_tlb_default_mem;
-#endif
+   swiotlb_dev_init(dev);
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 67aa74d20162..946bd56f0ac5 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -381,7 +381,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct 
scatterlist *sgl,
 static int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-   return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask;
+   return xen_phys_to_dma(hwdev, default_swiotlb_limit()) <= mask;
 }
 
 const struct dma_map_ops xen_swiotlb_dma_ops = {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4e52cd5e0bdc..2d453b3e7771 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -110,7 +110,6 @@ struct io_tlb_mem {
atomic_long_t used_hiwater;
 #endif
 };
-extern struct io_tlb_mem io_tlb_default_mem;
 
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
@@ -128,13 +127,22 @@ static inline bool is_swiotlb_force_bounce(struct device 
*dev)
 
 void swiotlb_init(bool addressing_limited, unsigned int flags);
 void __init swiotlb_exit(void);
+void swiotlb_dev_init(struct device *dev);
 size_t swiotlb_max_mapping_size(struct device *dev);
+bool is_swiotlb_allocated(void);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+phys_addr_t default_swiotlb_base(void);
+phys_addr_t default_swiotlb_limit(void);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
 }
+
+static inline void swiotlb_dev_init(struct device *dev)
+{
+}
+
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
@@ -151,6 +159,11 @@ static inline size_t swiotlb_max_mapping_size(struct 
device *dev)
return SIZE_MAX;
 }
 
+static inline bool is_swiotlb_allocated(void)
+{
+   return false;
+}
+
 static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
@@ -159,6 +172,16 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline phys_addr_t default_swiotlb_base(void)
+{
+   return 0;
+}
+
+static inline phys_addr_t default_swiotlb_limit(void)
+{
+   return 0;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ee57fd9949dc..0840bc15fb53 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -71,7 +71,7 @@ struct io_tlb_slot {
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
-struct io_tlb_mem io_tlb_default_mem;
+static struct io_tlb_mem io_tlb_default_mem;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
 static unsigned long default_nareas;
@@ -489,6 +489,15 @@ void __init swiotlb_exit(void)
memset(mem, 0, sizeof(*mem));
 }
 
+/**
+ * swiotlb_dev_init() - initialize swiotlb fields in  device
+ * @dev:   Device to be initialized.
+ */
+void swiotlb_dev_init(struct device *dev)
+{
+   dev->dma_io_tlb_mem = &io_tlb_default_mem;
+}
+

[PATCH v7 1/9] swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated

2023-08-01 Thread Petr Tesarik
From: Petr Tesarik 

If swiotlb is allocated, immediately return 0, so callers do not have to
check io_tlb_default_mem.nslabs explicitly.

Signed-off-by: Petr Tesarik 
---
 arch/arm/xen/mm.c| 10 --
 kernel/dma/swiotlb.c |  3 +++
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 3d826c0b5fee..882cd70c7a2f 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -125,12 +125,10 @@ static int __init xen_mm_init(void)
return 0;
 
/* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
-   rc = swiotlb_init_late(swiotlb_size_or_default(),
-  xen_swiotlb_gfp(), NULL);
-   if (rc < 0)
-   return rc;
-   }
+   rc = swiotlb_init_late(swiotlb_size_or_default(),
+  xen_swiotlb_gfp(), NULL);
+   if (rc < 0)
+   return rc;
 
cflush.op = 0;
cflush.a.dev_bus_addr = 0;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1c0a49e6685a..ee57fd9949dc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -384,6 +384,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
bool retried = false;
int rc = 0;
 
+   if (io_tlb_default_mem.nslabs)
+   return 0;
+
if (swiotlb_force_disable)
return 0;
 
-- 
2.25.1




[PATCH v7 0/9] Allow dynamic allocation of software IO TLB bounce buffers

2023-08-01 Thread Petr Tesarik
From: Petr Tesarik 

Motivation
==========

The software IO TLB was designed with these assumptions:

1) It would not be used much. Small systems (little RAM) don't need it, and
   big systems (lots of RAM) would have modern DMA controllers and an IOMMU
   chip to handle legacy devices.
2) A small fixed memory area (64 MiB by default) is sufficient to
   handle the few cases which require a bounce buffer.
3) 64 MiB is little enough that it has no impact on the rest of the
   system.
4) Bounce buffers require large contiguous chunks of low memory. Such
   memory is precious and can be allocated only early at boot.

It turns out they are not always true:

1) Embedded systems may have more than 4GiB RAM but no IOMMU and legacy
   32-bit peripheral busses and/or DMA controllers.
2) CoCo VMs use bounce buffers for all I/O but may need substantially more
   than 64 MiB.
3) Embedded developers put as many features as possible into the available
   memory. A few dozen "missing" megabytes may limit what features can be
   implemented.
4) If CMA is available, it can allocate large contiguous chunks even after
   the system has run for some time.

Goals
=====

The goal of this work is to start with a small software IO TLB at boot and
expand it later when/if needed.

Design
======

This version of the patch series retains the current slot allocation
algorithm with multiple areas to reduce lock contention, but additional
slots can be added when necessary.

These alternatives have been considered:

- Allocate and free buffers as needed using the direct DMA API. This works
  quite well, except in CoCo VMs where each allocation/free requires
  decrypting/encrypting memory, which is a very expensive operation.

- Allocate a very large software IO TLB at boot, but allow migrating pages
  to/from it (like CMA does). For systems with CMA, this would mean two big
  allocations at boot. Finding the balance between CMA, SWIOTLB and the rest
  of available RAM can be challenging. More importantly, there is no clear
  benefit compared to allocating SWIOTLB memory pools from the CMA.

Implementation Constraints
==========================

These constraints have been taken into account:

1) Minimize impact on devices which do not benefit from the change.
2) Minimize the number of memory decryption/encryption operations.
3) Avoid contention on a lock or atomic variable to preserve parallel
   scalability.

Additionally, the software IO TLB code is also used to implement restricted
DMA pools. These pools are restricted to a pre-defined physical memory
region and must not use any other memory. In other words, dynamic
allocation of memory pools must be disabled for restricted DMA pools.

Data Structures
===============

The existing struct io_tlb_mem is the central type for a SWIOTLB allocator,
but it now contains multiple memory pools::

  io_tlb_mem
  +---------+   io_tlb_pool
  | SWIOTLB |   +-------+   +-------+   +-------+
  |allocator|-->|default|-->|dynamic|-->|dynamic|-->...
  |         |   |memory |   |memory |   |memory |
  +---------+   | pool  |   | pool  |   | pool  |
                +-------+   +-------+   +-------+

The allocator structure contains global state (such as flags and counters)
and structures needed to schedule new allocations. Each memory pool
contains the actual buffer slots and metadata. The first memory pool in the
list is the default memory pool allocated statically at early boot.

New memory pools are allocated from a kernel worker thread. That's because
bounce buffers are allocated when mapping a DMA buffer, which may happen in
interrupt context where large atomic allocations would probably fail.
Allocation from process context is much more likely to succeed, especially
if it can use CMA.
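
A minimal sketch of this deferral (made-up names; the series embeds the work
item in struct io_tlb_mem and sets up a full IO TLB pool rather than a plain
buffer):

#include <linux/sizes.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static void *extra_pool;

static void grow_fn(struct work_struct *work)
{
        /* Process context: a large allocation that would likely fail in
         * atomic context is fine here.
         */
        void *pool = kvzalloc(SZ_1M, GFP_KERNEL);

        if (!pool)
                return;
        /* ... initialize slots and publish; here we only remember it ... */
        extra_pool = pool;
}

static DECLARE_WORK(grow_work, grow_fn);

/* Called from the mapping path, possibly in interrupt context. */
static void request_more_slots(void)
{
        schedule_work(&grow_work);      /* cheap and safe in atomic context */
}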

Nonetheless, the onset of a load spike may fill up the SWIOTLB before the
worker has a chance to run. In that case, try to allocate a small transient
memory pool to accommodate the request. If memory is encrypted and the
device cannot do DMA to encrypted memory, this buffer is allocated from the
coherent atomic DMA memory pool. Reducing the size of SWIOTLB may therefore
require increasing the size of the coherent pool with the "coherent_pool"
command-line parameter.

Performance
===========

All testing compared a vanilla v6.4-rc6 kernel with a fully patched
kernel. The kernel was booted with "swiotlb=force" to allow stress-testing
the software IO TLB on a high-performance device that would otherwise not
need it. CONFIG_DEBUG_FS was set to 'y' to match the configuration of
popular distribution kernels; it is understood that parallel workloads
suffer from contention on the recently added debugfs atomic counters.

These benchmarks were run:

- small: single-threaded I/O of 4 KiB blocks,
- big: single-threaded I/O of 64 KiB blocks,
- 4way: 4-way parallel I/O of 4 KiB blocks.

In all tested cases, the default 64 MiB SWIOTLB would be sufficient (but
wasteful). The "default" 

[PATCH v6 7/9] swiotlb: determine potential physical address limit

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

The value returned by default_swiotlb_limit() should be constant, because
it is used to decide whether DMA can be used. To allow allocating memory
pools on the fly, use the maximum possible physical address rather than the
highest address used by the default pool.

For swiotlb_init_remap(), this is either an arch-specific limit used by
memblock_alloc_low(), or the highest directly mapped physical address if
the initialization flags include SWIOTLB_ANY. For swiotlb_init_late(), the
highest address is determined by the GFP flags.

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |  2 ++
 kernel/dma/swiotlb.c| 14 ++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 66867d2188ba..9825fa14abe4 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -109,6 +109,7 @@ struct io_tlb_pool {
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
  * @can_grow:  %true if more pools can be allocated dynamically.
+ * @phys_limit:Maximum allowed physical address.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -123,6 +124,7 @@ struct io_tlb_mem {
bool for_alloc;
 #ifdef CONFIG_SWIOTLB_DYNAMIC
bool can_grow;
+   u64 phys_limit;
 #endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6e985f65b9f5..ca3aa03f37ba 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -334,6 +334,10 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
 #ifdef CONFIG_SWIOTLB_DYNAMIC
if (!remap)
io_tlb_default_mem.can_grow = true;
+   if (flags & SWIOTLB_ANY)
+   io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
+   else
+   io_tlb_default_mem.phys_limit = ARCH_LOW_ADDRESS_LIMIT;
 #endif
 
if (!default_nareas)
@@ -409,6 +413,12 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 #ifdef CONFIG_SWIOTLB_DYNAMIC
if (!remap)
io_tlb_default_mem.can_grow = true;
+   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
+   io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
+   else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
+   io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
+   else
+   io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
 #endif
 
if (!default_nareas)
@@ -1398,7 +1408,11 @@ phys_addr_t default_swiotlb_base(void)
  */
 phys_addr_t default_swiotlb_limit(void)
 {
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   return io_tlb_default_mem.phys_limit;
+#else
return io_tlb_default_mem.defpool.end - 1;
+#endif
 }
 
 #ifdef CONFIG_DEBUG_FS
-- 
2.25.1




[PATCH v6 6/9] swiotlb: if swiotlb is full, fall back to a transient memory pool

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Try to allocate a transient memory pool if no suitable slots can be found
and the respective SWIOTLB is allowed to grow. The transient pool is just
big enough for this one bounce buffer. It is inserted into a per-device
list of transient memory pools, and it is freed again when the bounce
buffer is unmapped.

Transient memory pools are kept in an RCU list. A memory barrier is
required after adding a new entry, because any address within a transient
buffer must be immediately recognized as belonging to the SWIOTLB, even if
it is passed to another CPU.

Deletion does not require any synchronization beyond RCU ordering
guarantees. After a buffer is unmapped, its physical addresses may no
longer be passed to the DMA API, so the memory range of the corresponding
stale entry in the RCU list never matches. If the memory range gets
allocated again, then it happens only after an RCU quiescent state.

Since bounce buffers can now be allocated from different pools, add a
parameter to swiotlb_alloc_pool() to let the caller know which memory pool
is used. Add swiotlb_find_pool() to find the memory pool corresponding to
an address. This function is now also used by is_swiotlb_buffer(), because
a simple boundary check is no longer sufficient.

The logic in swiotlb_alloc_tlb() is taken from __dma_direct_alloc_pages(),
simplified and enhanced to use coherent memory pools if needed.

Note that this is not the most efficient way to provide a bounce buffer,
but when a DMA buffer can't be mapped, something may (and will) actually
break. At that point it is better to make an allocation, even if it may be
an expensive operation.

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   6 +
 include/linux/dma-mapping.h |   2 +
 include/linux/swiotlb.h |  29 +++-
 kernel/dma/direct.c |   2 +-
 kernel/dma/swiotlb.c| 316 +++-
 5 files changed, 345 insertions(+), 10 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index d9754a68ba95..5fd89c9d005c 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -626,6 +626,8 @@ struct device_physical_location {
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
+ * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
+ * @dma_io_tlb_lock:   Protects changes to the list of active pools.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -731,6 +733,10 @@ struct device {
 #endif
 #ifdef CONFIG_SWIOTLB
struct io_tlb_mem *dma_io_tlb_mem;
+#endif
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   struct list_head dma_io_tlb_pools;
+   spinlock_t dma_io_tlb_lock;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index e13050eb9777..f0ccca16a0ac 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -418,6 +418,8 @@ static inline void dma_sync_sgtable_for_device(struct 
device *dev,
 #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
 #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
 
+bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size);
+
 static inline void *dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp)
 {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 57be2a0a9fbf..66867d2188ba 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -80,6 +80,9 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @area_nslabs: Number of slots in each area.
  * @areas: Array of memory area descriptors.
  * @slots: Array of slot descriptors.
+ * @node:  Member of the IO TLB memory pool list.
+ * @rcu:   RCU head for swiotlb_dyn_free().
+ * @transient:  %true if transient memory pool.
  */
 struct io_tlb_pool {
phys_addr_t start;
@@ -91,6 +94,11 @@ struct io_tlb_pool {
unsigned int area_nslabs;
struct io_tlb_area *areas;
struct io_tlb_slot *slots;
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   struct list_head node;
+   struct rcu_head rcu;
+   bool transient;
+#endif
 };
 
 /**
@@ -122,6 +130,20 @@ struct io_tlb_mem {
 #endif
 };
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+
+struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr);
+
+#else
+
+static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev,
+   phys_addr_t paddr)
+{
+   return &dev->dma_io_tlb_mem->defpool;
+}
+
+#endif
+
 /**
  * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb
  * @dev:Device which has mapped the buffer.
@@ -137,7 +

[PATCH v6 9/9] swiotlb: search the software IO TLB only if the device makes use of it

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Skip searching the software IO TLB if a device has never used it, making
sure these devices are not affected by the introduction of multiple IO TLB
memory pools.

An additional memory barrier is required to ensure that the new value of the
flag is visible to other CPUs after mapping a new bounce buffer. For
efficiency, the flag check should be inlined, and then the memory barrier
must be moved to is_swiotlb_buffer(). However, it can replace the existing
barrier in swiotlb_find_pool(), because all callers use is_swiotlb_buffer()
first to verify that the buffer address belongs to the software IO TLB.

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |  2 ++
 include/linux/swiotlb.h |  7 ++-
 kernel/dma/swiotlb.c| 14 ++
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 5fd89c9d005c..6fc808d22bfd 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -628,6 +628,7 @@ struct device_physical_location {
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
  * @dma_io_tlb_lock:   Protects changes to the list of active pools.
+ * @dma_uses_io_tlb: %true if device has used the software IO TLB.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -737,6 +738,7 @@ struct device {
 #ifdef CONFIG_SWIOTLB_DYNAMIC
struct list_head dma_io_tlb_pools;
spinlock_t dma_io_tlb_lock;
+   bool dma_uses_io_tlb;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8371c92a0271..b4536626f8ff 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -172,8 +172,13 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
if (!mem)
return false;
 
-   if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC))
+   if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) {
+   /* Pairs with smp_wmb() in swiotlb_find_slots() and
+* swiotlb_dyn_alloc(), which modify the RCU lists.
+*/
+   smp_rmb();
return swiotlb_find_pool(dev, paddr);
+   }
return paddr >= mem->defpool.start && paddr < mem->defpool.end;
 }
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1560a3e484b9..1fe64573d828 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -730,7 +730,7 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
 
add_mem_pool(mem, pool);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
+   /* Pairs with smp_rmb() in is_swiotlb_buffer(). */
smp_wmb();
 }
 
@@ -764,11 +764,6 @@ struct io_tlb_pool *swiotlb_find_pool(struct device *dev, 
phys_addr_t paddr)
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
struct io_tlb_pool *pool;
 
-   /* Pairs with smp_wmb() in swiotlb_find_slots() and
-* swiotlb_dyn_alloc(), which modify the RCU lists.
-*/
-   smp_rmb();
-
rcu_read_lock();
list_for_each_entry_rcu(pool, &mem->pools, node) {
if (paddr >= pool->start && paddr < pool->end)
@@ -813,6 +808,7 @@ void swiotlb_dev_init(struct device *dev)
 #ifdef CONFIG_SWIOTLB_DYNAMIC
INIT_LIST_HEAD(&dev->dma_io_tlb_pools);
spin_lock_init(&dev->dma_io_tlb_lock);
+   dev->dma_uses_io_tlb = false;
 #endif
 }
 
@@ -1157,9 +1153,11 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
list_add_rcu(&pool->node, &dev->dma_io_tlb_pools);
spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
-   smp_wmb();
 found:
+   dev->dma_uses_io_tlb = true;
+   /* Pairs with smp_rmb() in is_swiotlb_buffer() */
+   smp_wmb();
+
*retpool = pool;
return index;
 }
-- 
2.25.1




[PATCH v6 8/9] swiotlb: allocate a new memory pool when existing pools are full

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

When swiotlb_find_slots() cannot find suitable slots, schedule the
allocation of a new memory pool. It is not possible to allocate the pool
immediately, because this code may run in interrupt context, which is not
suitable for large memory allocations. This means that the memory pool will
be available too late for the currently requested mapping, but the stress
on the software IO TLB allocator is likely to continue, and subsequent
allocations will benefit from the additional pool eventually.

Keep all memory pools for an allocator in an RCU list to avoid locking on
the read side. For modifications, add a new spinlock to struct io_tlb_mem.

The spinlock also protects updates to the total number of slabs (nslabs in
struct io_tlb_mem), but not reads of the value. Readers may therefore
encounter a stale value, but this is not an issue:

- swiotlb_tbl_map_single() and is_swiotlb_active() only check for non-zero
  value. This is ensured by the existence of the default memory pool,
  allocated at boot.

- The exact value is used only for non-critical purposes (debugfs, kernel
  messages).

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |   8 +++
 kernel/dma/swiotlb.c| 148 +---
 2 files changed, 131 insertions(+), 25 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 9825fa14abe4..8371c92a0271 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct page;
@@ -104,12 +105,16 @@ struct io_tlb_pool {
 /**
  * struct io_tlb_mem - Software IO TLB allocator
  * @defpool:   Default (initial) IO TLB memory pool descriptor.
+ * @pool:  IO TLB memory pool descriptor (if not dynamic).
  * @nslabs:Total number of IO TLB slabs in all pools.
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
  * @can_grow:  %true if more pools can be allocated dynamically.
  * @phys_limit:Maximum allowed physical address.
+ * @lock:  Lock to synchronize changes to the list.
+ * @pools: List of IO TLB memory pool descriptors (if dynamic).
+ * @dyn_alloc: Dynamic IO TLB pool allocation work.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -125,6 +130,9 @@ struct io_tlb_mem {
 #ifdef CONFIG_SWIOTLB_DYNAMIC
bool can_grow;
u64 phys_limit;
+   spinlock_t lock;
+   struct list_head pools;
+   struct work_struct dyn_alloc;
 #endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ca3aa03f37ba..1560a3e484b9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -79,8 +79,23 @@ struct io_tlb_slot {
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+
+static void swiotlb_dyn_alloc(struct work_struct *work);
+
+static struct io_tlb_mem io_tlb_default_mem = {
+   .lock = __SPIN_LOCK_UNLOCKED(io_tlb_default_mem.lock),
+   .pools = LIST_HEAD_INIT(io_tlb_default_mem.pools),
+   .dyn_alloc = __WORK_INITIALIZER(io_tlb_default_mem.dyn_alloc,
+   swiotlb_dyn_alloc),
+};
+
+#else  /* !CONFIG_SWIOTLB_DYNAMIC */
+
 static struct io_tlb_mem io_tlb_default_mem;
 
+#endif /* CONFIG_SWIOTLB_DYNAMIC */
+
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
 static unsigned long default_nareas;
 
@@ -278,6 +293,23 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool 
*mem, phys_addr_t start,
return;
 }
 
+/**
+ * add_mem_pool() - add a memory pool to the allocator
+ * @mem:   Software IO TLB allocator.
+ * @pool:  Memory pool to be added.
+ */
+static void add_mem_pool(struct io_tlb_mem *mem, struct io_tlb_pool *pool)
+{
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   spin_lock(&mem->lock);
+   list_add_rcu(&pool->node, &mem->pools);
+   mem->nslabs += pool->nslabs;
+   spin_unlock(&mem->lock);
+#else
+   mem->nslabs = pool->nslabs;
+#endif
+}
+
 static void __init *swiotlb_memblock_alloc(unsigned long nslabs,
unsigned int flags,
int (*remap)(void *tlb, unsigned long nslabs))
@@ -375,7 +407,7 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
 
swiotlb_init_io_tlb_pool(mem, __pa(tlb), nslabs, false,
 default_nareas);
-   io_tlb_default_mem.nslabs = nslabs;
+   add_mem_pool(&io_tlb_default_mem, mem);
 
if (flags & SWIOTLB_VERBOSE)
swiotlb_print_info();
@@ -474,7 +506,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);

[PATCH v6 5/9] swiotlb: add a flag whether SWIOTLB is allowed to grow

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Add a config option (CONFIG_SWIOTLB_DYNAMIC) to enable or disable dynamic
allocation of additional bounce buffers.

If this option is set, mark the default SWIOTLB as able to grow and
restricted DMA pools as unable.

However, if the address of the default memory pool is explicitly queried,
make the default SWIOTLB also unable to grow. This is currently used to set
up PCI BAR movable regions on some Octeon MIPS boards which may not be able
to use a SWIOTLB pool elsewhere in physical memory. See octeon_pci_setup()
for more details.

If a remap function is specified, it must also be called on any dynamically
allocated pools, but there are some issues:

- The remap function may block, so it should not be called from an atomic
  context.
- There is no corresponding unremap() function if the memory pool is
  freed.
- The only in-tree implementation (xen_swiotlb_fixup) requires that the
  number of slots in the memory pool is a multiple of SWIOTLB_SEGSIZE.

Keep it simple for now and disable growing the SWIOTLB if a remap function
was specified.

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |  4 
 kernel/dma/Kconfig  | 13 +
 kernel/dma/swiotlb.c| 13 +
 3 files changed, 30 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 247f0ab8795a..57be2a0a9fbf 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -100,6 +100,7 @@ struct io_tlb_pool {
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @can_grow:  %true if more pools can be allocated dynamically.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -112,6 +113,9 @@ struct io_tlb_mem {
struct dentry *debugfs;
bool force_bounce;
bool for_alloc;
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   bool can_grow;
+#endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 11d077003205..68c61fdf2b44 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -90,6 +90,19 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config SWIOTLB_DYNAMIC
+   bool "Dynamic allocation of DMA bounce buffers"
+   default n
+   depends on SWIOTLB
+   help
+ This enables dynamic resizing of the software IO TLB. The kernel
+ starts with one memory pool at boot and it will allocate additional
+ pools as needed. To reduce run-time kernel memory requirements, you
+ may have to specify a smaller size of the initial pool using
+ "swiotlb=" on the kernel command line.
+
+ If unsure, say N.
+
 config DMA_BOUNCE_UNALIGNED_KMALLOC
bool
depends on SWIOTLB
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 616113760b60..346857581b75 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -330,6 +330,11 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
io_tlb_default_mem.force_bounce =
swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   if (!remap)
+   io_tlb_default_mem.can_grow = true;
+#endif
+
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
 
@@ -400,6 +405,11 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
io_tlb_default_mem.force_bounce = swiotlb_force_bounce;
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   if (!remap)
+   io_tlb_default_mem.can_grow = true;
+#endif
+
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
 
@@ -1074,6 +1084,9 @@ EXPORT_SYMBOL_GPL(is_swiotlb_active);
  */
 phys_addr_t default_swiotlb_base(void)
 {
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   io_tlb_default_mem.can_grow = false;
+#endif
return io_tlb_default_mem.defpool.start;
 }
 
-- 
2.25.1




[PATCH v6 4/9] swiotlb: separate memory pool data from other allocator data

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Carve out memory pool specific fields from struct io_tlb_mem. The original
struct now contains shared data for the whole allocator, while the new
struct io_tlb_pool contains data that is specific to one memory pool of
(potentially) many.

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   2 +-
 include/linux/swiotlb.h |  45 +++
 kernel/dma/swiotlb.c| 175 +---
 3 files changed, 140 insertions(+), 82 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index bbaeabd04b0d..d9754a68ba95 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -625,7 +625,7 @@ struct device_physical_location {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
- * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
+ * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 31625ae507ea..247f0ab8795a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,8 +62,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 
 /**
- * struct io_tlb_mem - IO TLB Memory Pool Descriptor
- *
+ * struct io_tlb_pool - IO TLB memory pool descriptor
  * @start: The start address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
@@ -73,15 +72,34 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
  * may be remapped in the memory encrypted case and store virtual
  * address for bounce buffer operation.
- * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. For default swiotlb, this is command line adjustable via
- * setup_io_tlb_npages.
+ * @nslabs:The number of IO TLB slots between @start and @end. For the
+ * default swiotlb, this can be adjusted with a boot parameter,
+ * see setup_io_tlb_npages().
+ * @late_alloc:%true if allocated using the page allocator.
+ * @nareas:Number of areas in the pool.
+ * @area_nslabs: Number of slots in each area.
+ * @areas: Array of memory area descriptors.
+ * @slots: Array of slot descriptors.
+ */
+struct io_tlb_pool {
+   phys_addr_t start;
+   phys_addr_t end;
+   void *vaddr;
+   unsigned long nslabs;
+   bool late_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
+   struct io_tlb_slot *slots;
+};
+
+/**
+ * struct io_tlb_mem - Software IO TLB allocator
+ * @defpool:   Default (initial) IO TLB memory pool descriptor.
+ * @nslabs:Total number of IO TLB slabs in all pools.
  * @debugfs:   The dentry to debugfs.
- * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
- * @nareas:  The area number in the pool.
- * @area_nslabs: The slot number in the area.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -89,18 +107,11 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
phys,
  * in debugfs.
  */
 struct io_tlb_mem {
-   phys_addr_t start;
-   phys_addr_t end;
-   void *vaddr;
+   struct io_tlb_pool defpool;
unsigned long nslabs;
struct dentry *debugfs;
-   bool late_alloc;
bool force_bounce;
bool for_alloc;
-   unsigned int nareas;
-   unsigned int area_nslabs;
-   struct io_tlb_area *areas;
-   struct io_tlb_slot *slots;
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
@@ -122,7 +133,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
-   return mem && paddr >= mem->start && paddr < mem->end;
+   return mem && paddr >= mem->defpool.start && paddr < mem->defpool.end;
 }
 
 static inline bool is_swiotlb_force_bounce(struct device *dev)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ef5d5e41a17f..616113760b60 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -209,7 +209,7 @@ void __init swiotlb_adjust_size(unsigned long size)
 
 void swiotlb_print_info(void)
 {
-   struct io_tlb_m

[PATCH v6 3/9] swiotlb: add documentation and rename swiotlb_do_find_slots()

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Add some kernel-doc comments and move the existing documentation of struct
io_tlb_slot to its correct location. The latter was forgotten in commit
942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").

Use the opportunity to give swiotlb_do_find_slots() a more descriptive name
and make it clear how it differs from swiotlb_find_slots().

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h | 15 +++---
 kernel/dma/swiotlb.c| 61 +
 2 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 2d453b3e7771..31625ae507ea 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -76,10 +76,6 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
- * @list:  The free list describing the number of free entries available
- * from each index.
- * @orig_addr: The original address corresponding to a mapped entry.
- * @alloc_size:Size of the allocated buffer.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
@@ -111,6 +107,17 @@ struct io_tlb_mem {
 #endif
 };
 
+/**
+ * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb
+ * @dev:Device which has mapped the buffer.
+ * @paddr:  Physical address within the DMA buffer.
+ *
+ * Check if @paddr points into a bounce buffer.
+ *
+ * Return:
+ * * %true if @paddr points into a bounce buffer
+ * * %false otherwise
+ */
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0b173303e088..ef5d5e41a17f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -62,6 +62,13 @@
 
 #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
+/**
+ * struct io_tlb_slot - IO TLB slot descriptor
+ * @orig_addr: The original address corresponding to a mapped entry.
+ * @alloc_size:Size of the allocated buffer.
+ * @list:  The free list describing the number of free entries available
+ * from each index.
+ */
 struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -635,11 +642,22 @@ static void dec_used(struct io_tlb_mem *mem, unsigned int 
nslots)
 }
 #endif /* CONFIG_DEBUG_FS */
 
-/*
- * Find a suitable number of IO TLB entries size that will fit this request and
- * allocate a buffer from that IO TLB pool.
+/**
+ * swiotlb_area_find_slots() - search for slots in one IO TLB memory area
+ * @dev:   Device which maps the buffer.
+ * @area_index:Index of the IO TLB memory area to be searched.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Find a suitable sequence of IO TLB entries for the request and allocate
+ * a buffer from the given IO TLB memory area.
+ * This function takes care of locking.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
  */
-static int swiotlb_do_find_slots(struct device *dev, int area_index,
+static int swiotlb_area_find_slots(struct device *dev, int area_index,
phys_addr_t orig_addr, size_t alloc_size,
unsigned int alloc_align_mask)
 {
@@ -734,6 +752,19 @@ static int swiotlb_do_find_slots(struct device *dev, int 
area_index,
return slot_index;
 }
 
+/**
+ * swiotlb_find_slots() - search for slots in the whole swiotlb
+ * @dev:   Device which maps the buffer.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Search through the whole software IO TLB to find a sequence of slots that
+ * match the allocation constraints.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
+ */
 static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size, unsigned int alloc_align_mask)
 {
@@ -742,8 +773,8 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
int i = start, index;
 
do {
-   index = swiotlb_do_find_slots(dev, i, orig_addr, alloc_size,
- alloc_align_mask);
+   index = swiotlb_area_find_slots(dev, i, orig_addr, alloc_size,
+   alloc_align_mask);
i

[PATCH v6 2/9] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

SWIOTLB implementation details should not be exposed to the rest of the
kernel. This allows the implementation to be changed without modifying
non-swiotlb code.

To avoid breaking existing users, provide helper functions for the few
required fields.

As a bonus, using a helper function to initialize struct device makes it
possible to get rid of an #ifdef in driver core.

Signed-off-by: Petr Tesarik 
---
 arch/mips/pci/pci-octeon.c |  2 +-
 drivers/base/core.c|  4 +---
 drivers/xen/swiotlb-xen.c  |  2 +-
 include/linux/swiotlb.h| 25 +++-
 kernel/dma/swiotlb.c   | 39 +-
 mm/slab_common.c   |  5 ++---
 6 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
index e457a18cbdc5..d19d9d456309 100644
--- a/arch/mips/pci/pci-octeon.c
+++ b/arch/mips/pci/pci-octeon.c
@@ -664,7 +664,7 @@ static int __init octeon_pci_setup(void)
 
/* BAR1 movable regions contiguous to cover the swiotlb */
octeon_bar1_pci_phys =
-   io_tlb_default_mem.start & ~((1ull << 22) - 1);
+   default_swiotlb_base() & ~((1ull << 22) - 1);
 
for (index = 0; index < 32; index++) {
union cvmx_pci_bar1_indexx bar1_index;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 3dff5037943e..46d1d78c5beb 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3108,9 +3108,7 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
-#ifdef CONFIG_SWIOTLB
-   dev->dma_io_tlb_mem = &io_tlb_default_mem;
-#endif
+   swiotlb_dev_init(dev);
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 67aa74d20162..946bd56f0ac5 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -381,7 +381,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct 
scatterlist *sgl,
 static int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-   return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask;
+   return xen_phys_to_dma(hwdev, default_swiotlb_limit()) <= mask;
 }
 
 const struct dma_map_ops xen_swiotlb_dma_ops = {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4e52cd5e0bdc..2d453b3e7771 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -110,7 +110,6 @@ struct io_tlb_mem {
atomic_long_t used_hiwater;
 #endif
 };
-extern struct io_tlb_mem io_tlb_default_mem;
 
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
@@ -128,13 +127,22 @@ static inline bool is_swiotlb_force_bounce(struct device 
*dev)
 
 void swiotlb_init(bool addressing_limited, unsigned int flags);
 void __init swiotlb_exit(void);
+void swiotlb_dev_init(struct device *dev);
 size_t swiotlb_max_mapping_size(struct device *dev);
+bool is_swiotlb_allocated(void);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+phys_addr_t default_swiotlb_base(void);
+phys_addr_t default_swiotlb_limit(void);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
 }
+
+static inline void swiotlb_dev_init(struct device *dev)
+{
+}
+
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
@@ -151,6 +159,11 @@ static inline size_t swiotlb_max_mapping_size(struct 
device *dev)
return SIZE_MAX;
 }
 
+static inline bool is_swiotlb_allocated(void)
+{
+   return false;
+}
+
 static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
@@ -159,6 +172,16 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline phys_addr_t default_swiotlb_base(void)
+{
+   return 0;
+}
+
+static inline phys_addr_t default_swiotlb_limit(void)
+{
+   return 0;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 66fc8ec9ae45..0b173303e088 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -71,7 +71,7 @@ struct io_tlb_slot {
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
-struct io_tlb_mem io_tlb_default_mem;
+static struct io_tlb_mem io_tlb_default_mem;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
 static unsigned long default_nareas;
@@ -489,6 +489,15 @@ void __init swiotlb_exit(void)
memset(mem, 0, sizeof(*mem));
 }
 
+/**
+ * swiotlb_dev_init() - initialize swiotlb fields in &struct device
+ * @dev:   Device to be initialized.
+ */
+void swiotlb_dev_init(struct device *dev)
+{
+   dev->dma_io_tlb_mem = &io_tlb_default_mem;
+}
+

[PATCH v6 1/9] swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

If swiotlb is allocated, immediately return 0, so callers do not have to
check io_tlb_default_mem.nslabs explicitly.

Signed-off-by: Petr Tesarik 
---
 arch/arm/xen/mm.c | 10 --
 arch/x86/kernel/pci-dma.c | 12 ++--
 kernel/dma/swiotlb.c  |  3 +++
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 3d826c0b5fee..882cd70c7a2f 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -125,12 +125,10 @@ static int __init xen_mm_init(void)
return 0;
 
/* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
-   rc = swiotlb_init_late(swiotlb_size_or_default(),
-  xen_swiotlb_gfp(), NULL);
-   if (rc < 0)
-   return rc;
-   }
+   rc = swiotlb_init_late(swiotlb_size_or_default(),
+  xen_swiotlb_gfp(), NULL);
+   if (rc < 0)
+   return rc;
 
cflush.op = 0;
cflush.a.dev_bus_addr = 0;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index de6be0a3965e..08988b0a1c91 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -86,16 +86,16 @@ static void __init pci_xen_swiotlb_init(void)
 
 int pci_xen_swiotlb_init_late(void)
 {
+   int rc;
+
	if (dma_ops == &xen_swiotlb_dma_ops)
return 0;
 
/* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
-   int rc = swiotlb_init_late(swiotlb_size_or_default(),
-  GFP_KERNEL, xen_swiotlb_fixup);
-   if (rc < 0)
-   return rc;
-   }
+   rc = swiotlb_init_late(swiotlb_size_or_default(),
+  GFP_KERNEL, xen_swiotlb_fixup);
+   if (rc < 0)
+   return rc;
 
/* XXX: this switches the dma ops under live devices! */
	dma_ops = &xen_swiotlb_dma_ops;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 2b83e3ad9dca..66fc8ec9ae45 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -384,6 +384,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
bool retried = false;
int rc = 0;
 
+   if (io_tlb_default_mem.nslabs)
+   return 0;
+
if (swiotlb_force_disable)
return 0;
 
-- 
2.25.1




[PATCH v6 0/9] Allow dynamic allocation of software IO TLB bounce buffers

2023-07-27 Thread Petr Tesarik
From: Petr Tesarik 

Motivation
==

The software IO TLB was designed with these assumptions:

1) It would not be used much. Small systems (little RAM) don't need it, and
   big systems (lots of RAM) would have modern DMA controllers and an IOMMU
   chip to handle legacy devices.
2) A small fixed memory area (64 MiB by default) is sufficient to
   handle the few cases which require a bounce buffer.
3) 64 MiB is little enough that it has no impact on the rest of the
   system.
4) Bounce buffers require large contiguous chunks of low memory. Such
   memory is precious and can be allocated only early at boot.

It turns out they are not always true:

1) Embedded systems may have more than 4GiB RAM but no IOMMU and legacy
   32-bit peripheral busses and/or DMA controllers.
2) CoCo VMs use bounce buffers for all I/O but may need substantially more
   than 64 MiB.
3) Embedded developers put as many features as possible into the available
   memory. A few dozen "missing" megabytes may limit what features can be
   implemented.
4) If CMA is available, it can allocate large contiguous chunks even after
   the system has run for some time.

Goals
=

The goal of this work is to start with a small software IO TLB at boot and
expand it later when/if needed.

Design
==

This version of the patch series retains the current slot allocation
algorithm with multiple areas to reduce lock contention, but additional
slots can be added when necessary.

These alternatives have been considered:

- Allocate and free buffers as needed using the direct DMA API. This works
  quite well, except in CoCo VMs where each allocation/free requires
  decrypting/encrypting memory, which is a very expensive operation.

- Allocate a very large software IO TLB at boot, but allow migrating pages
  to/from it (like CMA does). For systems with CMA, this would mean two big
  allocations at boot. Finding the balance between CMA, SWIOTLB and rest of
  available RAM can be challenging. More importantly, there is no clear
  benefit compared to allocating SWIOTLB memory pools from the CMA.

Implementation Constraints
==

These constraints have been taken into account:

1) Minimize impact on devices which do not benefit from the change.
2) Minimize the number of memory decryption/encryption operations.
3) Avoid contention on a lock or atomic variable to preserve parallel
   scalability.

Additionally, the software IO TLB code is used to implement restricted
DMA pools. These pools are restricted to a pre-defined physical memory
region and must not use any other memory. In other words, dynamic
allocation of memory pools must be disabled for restricted DMA pools.

Data Structures
===

The existing struct io_tlb_mem is the central type for a SWIOTLB allocator,
but it now contains multiple memory pools::

  io_tlb_mem
  +-+   io_tlb_pool
  | SWIOTLB |   +---+   +---+   +---+
  |allocator|-->|default|-->|dynamic|-->|dynamic|-->...
  | |   |memory |   |memory |   |memory |
  +-+   | pool  |   | pool  |   | pool  |
+---+   +---+   +---+

The allocator structure contains global state (such as flags and counters)
and structures needed to schedule new allocations. Each memory pool
contains the actual buffer slots and metadata. The first memory pool in the
list is the default memory pool allocated statically at early boot.
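
For orientation, here is a condensed and non-authoritative view of the two
structures; only the fields relevant to the pool list are shown, and the
full definitions are in patches 4/9 and 8/9:

struct io_tlb_pool {			/* one contiguous bounce-buffer region */
	phys_addr_t start, end;		/* physical range covered by this pool */
	unsigned long nslabs;		/* IO TLB slots in this pool */
	struct io_tlb_slot *slots;	/* per-slot metadata */
	struct list_head node;		/* linkage into io_tlb_mem.pools */
	/* ... vaddr, areas, late_alloc, transient, ... */
};

struct io_tlb_mem {			/* one allocator (default or restricted) */
	struct io_tlb_pool defpool;	/* static pool set up at early boot */
	struct list_head pools;		/* RCU list: default + dynamic pools */
	unsigned long nslabs;		/* total slots across all pools */
	struct work_struct dyn_alloc;	/* worker that adds new pools */
	bool can_grow;			/* false for restricted DMA pools */
	/* ... */
};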

New memory pools are allocated from a kernel worker thread. That's because
bounce buffers are allocated when mapping a DMA buffer, which may happen in
interrupt context where large atomic allocations would probably fail.
Allocation from process context is much more likely to succeed, especially
if it can use CMA.

Nonetheless, the onset of a load spike may fill up the SWIOTLB before the
worker has a chance to run. In that case, try to allocate a small transient
memory pool to accommodate the request. If memory is encrypted and the
device cannot do DMA to encrypted memory, this buffer is allocated from the
coherent atomic DMA memory pool. Reducing the size of SWIOTLB may therefore
require increasing the size of the coherent pool with the "coherent_pool"
command-line parameter.
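
The resulting mapping path can be summarized by the sketch below. The
example_* helpers are placeholders for logic that actually lives in
swiotlb_find_slots(); only the overall decision order is meant to be
accurate:

static int example_search_existing_pools(struct device *dev, phys_addr_t addr,
					 size_t size, struct io_tlb_pool **pool);
static int example_alloc_transient_pool(struct device *dev, phys_addr_t addr,
					size_t size, struct io_tlb_pool **pool);

static int example_map(struct device *dev, phys_addr_t orig_addr, size_t size,
		       struct io_tlb_pool **retpool)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
	int index;

	/* 1) Try the default pool and any previously added dynamic pools. */
	index = example_search_existing_pools(dev, orig_addr, size, retpool);
	if (index >= 0)
		return index;

	if (!mem->can_grow)
		return -1;		/* e.g. restricted DMA pools */

	/* 2) Ask the worker to add a regular pool for future requests. */
	schedule_work(&mem->dyn_alloc);

	/* 3) Serve this request from a freshly allocated transient pool. */
	return example_alloc_transient_pool(dev, orig_addr, size, retpool);
}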

Performance
===

All testing compared a vanilla v6.4-rc6 kernel with a fully patched
kernel. The kernel was booted with "swiotlb=force" to allow stress-testing
the software IO TLB on a high-performance device that would otherwise not
need it. CONFIG_DEBUG_FS was set to 'y' to match the configuration of
popular distribution kernels; it is understood that parallel workloads
suffer from contention on the recently added debugfs atomic counters.

These benchmarks were run:

- small: single-threaded I/O of 4 KiB blocks,
- big: single-threaded I/O of 64 KiB blocks,
- 4way: 4-way parallel I/O of 4 KiB blocks.

In all tested cases, the default 64 MiB SWIOTLB would be sufficient (but
wasteful). The "default" 

[PATCH v5 3/9] swiotlb: add documentation and rename swiotlb_do_find_slots()

2023-07-24 Thread Petr Tesarik
From: Petr Tesarik 

Add some kernel-doc comments and move the existing documentation of struct
io_tlb_slot to its correct location. The latter was forgotten in commit
942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").

Use the opportunity to give swiotlb_do_find_slots() a more descriptive name
and make it clear how it differs from swiotlb_find_slots().

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h | 15 +++---
 kernel/dma/swiotlb.c| 61 +
 2 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 3b372364a144..b122ad90660d 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -76,10 +76,6 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
- * @list:  The free list describing the number of free entries available
- * from each index.
- * @orig_addr: The original address corresponding to a mapped entry.
- * @alloc_size:Size of the allocated buffer.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
@@ -111,6 +107,17 @@ struct io_tlb_mem {
 #endif
 };
 
+/**
+ * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb
+ * @dev:Device which has mapped the buffer.
+ * @paddr:  Physical address within the DMA buffer.
+ *
+ * Check if @paddr points into a bounce buffer.
+ *
+ * Return:
+ * * %true if @paddr points into a bounce buffer
+ * * %false otherwise
+ */
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6734ef7b9d8d..7eafb8ceb577 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -62,6 +62,13 @@
 
 #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
+/**
+ * struct io_tlb_slot - IO TLB slot descriptor
+ * @orig_addr: The original address corresponding to a mapped entry.
+ * @alloc_size:Size of the allocated buffer.
+ * @list:  The free list describing the number of free entries available
+ * from each index.
+ */
 struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -635,11 +642,22 @@ static void dec_used(struct io_tlb_mem *mem, unsigned int 
nslots)
 }
 #endif /* CONFIG_DEBUG_FS */
 
-/*
- * Find a suitable number of IO TLB entries size that will fit this request and
- * allocate a buffer from that IO TLB pool.
+/**
+ * swiotlb_area_find_slots() - search for slots in one IO TLB memory area
+ * @dev:   Device which maps the buffer.
+ * @area_index:Index of the IO TLB memory area to be searched.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Find a suitable sequence of IO TLB entries for the request and allocate
+ * a buffer from the given IO TLB memory area.
+ * This function takes care of locking.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
  */
-static int swiotlb_do_find_slots(struct device *dev, int area_index,
+static int swiotlb_area_find_slots(struct device *dev, int area_index,
phys_addr_t orig_addr, size_t alloc_size,
unsigned int alloc_align_mask)
 {
@@ -734,6 +752,19 @@ static int swiotlb_do_find_slots(struct device *dev, int 
area_index,
return slot_index;
 }
 
+/**
+ * swiotlb_find_slots() - search for slots in the whole swiotlb
+ * @dev:   Device which maps the buffer.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Search through the whole software IO TLB to find a sequence of slots that
+ * match the allocation constraints.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
+ */
 static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size, unsigned int alloc_align_mask)
 {
@@ -742,8 +773,8 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
int i = start, index;
 
do {
-   index = swiotlb_do_find_slots(dev, i, orig_addr, alloc_size,
- alloc_align_mask);
+   index = swiotlb_area_find_slots(dev, i, orig_addr, alloc_size,
+   alloc_align_mask);
i

[PATCH v5 4/9] swiotlb: separate memory pool data from other allocator data

2023-07-24 Thread Petr Tesarik
From: Petr Tesarik 

Carve out memory pool specific fields from struct io_tlb_mem. The original
struct now contains shared data for the whole allocator, while the new
struct io_tlb_pool contains data that is specific to one memory pool of
(potentially) many.

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   2 +-
 include/linux/swiotlb.h |  45 +++
 kernel/dma/swiotlb.c| 175 +---
 3 files changed, 140 insertions(+), 82 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index bbaeabd04b0d..d9754a68ba95 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -625,7 +625,7 @@ struct device_physical_location {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
- * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
+ * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index b122ad90660d..fd01a0870b38 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,8 +62,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 
 /**
- * struct io_tlb_mem - IO TLB Memory Pool Descriptor
- *
+ * struct io_tlb_pool - IO TLB memory pool descriptor
  * @start: The start address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
@@ -73,15 +72,34 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
  * may be remapped in the memory encrypted case and store virtual
  * address for bounce buffer operation.
- * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. For default swiotlb, this is command line adjustable via
- * setup_io_tlb_npages.
+ * @nslabs:The number of IO TLB slots between @start and @end. For the
+ * default swiotlb, this can be adjusted with a boot parameter,
+ * see setup_io_tlb_npages().
+ * @late_alloc:%true if allocated using the page allocator.
+ * @nareas:Number of areas in the pool.
+ * @area_nslabs: Number of slots in each area.
+ * @areas: Array of memory area descriptors.
+ * @slots: Array of slot descriptors.
+ */
+struct io_tlb_pool {
+   phys_addr_t start;
+   phys_addr_t end;
+   void *vaddr;
+   unsigned long nslabs;
+   bool late_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
+   struct io_tlb_slot *slots;
+};
+
+/**
+ * struct io_tlb_mem - Software IO TLB allocator
+ * @defpool:   Default (initial) IO TLB memory pool descriptor.
+ * @nslabs:Total number of IO TLB slabs in all pools.
  * @debugfs:   The dentry to debugfs.
- * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
- * @nareas:  The area number in the pool.
- * @area_nslabs: The slot number in the area.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -89,18 +107,11 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
phys,
  * in debugfs.
  */
 struct io_tlb_mem {
-   phys_addr_t start;
-   phys_addr_t end;
-   void *vaddr;
+   struct io_tlb_pool defpool;
unsigned long nslabs;
struct dentry *debugfs;
-   bool late_alloc;
bool force_bounce;
bool for_alloc;
-   unsigned int nareas;
-   unsigned int area_nslabs;
-   struct io_tlb_area *areas;
-   struct io_tlb_slot *slots;
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
@@ -122,7 +133,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
-   return mem && paddr >= mem->start && paddr < mem->end;
+   return mem && paddr >= mem->defpool.start && paddr < mem->defpool.end;
 }
 
 static inline bool is_swiotlb_force_bounce(struct device *dev)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7eafb8ceb577..11bacde00df7 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -209,7 +209,7 @@ void __init swiotlb_adjust_size(unsigned long size)
 
 void swiotlb_print_info(void)
 {
-   struct io_tlb_m

[PATCH v5 5/9] swiotlb: add a flag whether SWIOTLB is allowed to grow

2023-07-24 Thread Petr Tesarik
From: Petr Tesarik 

Add a config option (CONFIG_SWIOTLB_DYNAMIC) to enable or disable dynamic
allocation of additional bounce buffers.

If this option is set, mark the default SWIOTLB as able to grow and
restricted DMA pools as unable.

However, if the address of the default memory pool is explicitly queried,
make the default SWIOTLB also unable to grow. This is currently used to set
up PCI BAR movable regions on some Octeon MIPS boards which may not be able
to use a SWIOTLB pool elsewhere in physical memory. See octeon_pci_setup()
for more details.

If a remap function is specified, it must also be called on any dynamically
allocated pools, but there are some issues:

- The remap function may block, so it should not be called from an atomic
  context.
- There is no corresponding unremap() function if the memory pool is
  freed.
- The only in-tree implementation (xen_swiotlb_fixup) requires that the
  number of slots in the memory pool is a multiple of SWIOTLB_SEGSIZE.

Keep it simple for now and disable growing the SWIOTLB if a remap function
was specified.

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |  4 
 kernel/dma/Kconfig  | 13 +
 kernel/dma/swiotlb.c| 13 +
 3 files changed, 30 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index fd01a0870b38..78a51d2f9f5c 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -100,6 +100,7 @@ struct io_tlb_pool {
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @can_grow:  %true if more pools can be allocated dynamically.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -112,6 +113,9 @@ struct io_tlb_mem {
struct dentry *debugfs;
bool force_bounce;
bool for_alloc;
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   bool can_grow;
+#endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 11d077003205..68c61fdf2b44 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -90,6 +90,19 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config SWIOTLB_DYNAMIC
+   bool "Dynamic allocation of DMA bounce buffers"
+   default n
+   depends on SWIOTLB
+   help
+ This enables dynamic resizing of the software IO TLB. The kernel
+ starts with one memory pool at boot and it will allocate additional
+ pools as needed. To reduce run-time kernel memory requirements, you
+ may have to specify a smaller size of the initial pool using
+ "swiotlb=" on the kernel command line.
+
+ If unsure, say N.
+
 config DMA_BOUNCE_UNALIGNED_KMALLOC
bool
depends on SWIOTLB
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 11bacde00df7..5acb4552f869 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -330,6 +330,11 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
io_tlb_default_mem.force_bounce =
swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   if (!remap)
+   io_tlb_default_mem.can_grow = true;
+#endif
+
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
 
@@ -400,6 +405,11 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
io_tlb_default_mem.force_bounce = swiotlb_force_bounce;
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   if (!remap)
+   io_tlb_default_mem.can_grow = true;
+#endif
+
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
 
@@ -1066,6 +1076,9 @@ EXPORT_SYMBOL_GPL(is_swiotlb_active);
  */
 phys_addr_t default_swiotlb_base(void)
 {
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   io_tlb_default_mem.can_grow = false;
+#endif
return io_tlb_default_mem.defpool.start;
 }
 
-- 
2.25.1




[PATCH v5 9/9] swiotlb: search the software IO TLB only if the device makes use of it

2023-07-24 Thread Petr Tesarik
From: Petr Tesarik 

Skip searching the software IO TLB if a device has never used it, making
sure these devices are not affected by the introduction of multiple IO TLB
memory pools.

An additional memory barrier is required to ensure that the new value of the
flag is visible to other CPUs after mapping a new bounce buffer. For
efficiency, the flag check should be inlined, and then the memory barrier
must be moved to is_swiotlb_buffer(). However, it can replace the existing
barrier in swiotlb_find_pool(), because all callers use is_swiotlb_buffer()
first to verify that the buffer address belongs to the software IO TLB.
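
The end result can be sketched as follows; this is not the literal diff (the
real check sits in is_swiotlb_buffer() under CONFIG_SWIOTLB_DYNAMIC, and the
function name below is made up), but it shows how the per-device flag
short-circuits the pool search and where the barrier ends up:

static inline bool example_is_swiotlb_buffer(struct device *dev,
					     phys_addr_t paddr)
{
	if (!READ_ONCE(dev->dma_uses_io_tlb))
		return false;		/* device never bounced: skip the search */
	/* Pairs with the smp_wmb() issued after publishing a new pool. */
	smp_rmb();
	return swiotlb_find_pool(dev, paddr) != NULL;
}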

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |  2 ++
 include/linux/swiotlb.h |  7 ++-
 kernel/dma/swiotlb.c| 14 ++
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 5fd89c9d005c..6fc808d22bfd 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -628,6 +628,7 @@ struct device_physical_location {
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
  * @dma_io_tlb_lock:   Protects changes to the list of active pools.
+ * @dma_uses_io_tlb: %true if device has used the software IO TLB.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -737,6 +738,7 @@ struct device {
 #ifdef CONFIG_SWIOTLB_DYNAMIC
struct list_head dma_io_tlb_pools;
spinlock_t dma_io_tlb_lock;
+   bool dma_uses_io_tlb;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8a9249685d0d..b3cf602922dc 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -172,8 +172,13 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
if (!mem)
return false;
 
-   if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC))
+   if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) {
+   /* Pairs with smp_wmb() in swiotlb_find_slots() and
+* swiotlb_dyn_alloc(), which modify the RCU lists.
+*/
+   smp_rmb();
return swiotlb_find_pool(dev, paddr);
+   }
return paddr >= mem->defpool.start && paddr < mem->defpool.end;
 }
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 3156bb6aa343..87b8c2417c68 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -730,7 +730,7 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
 
add_mem_pool(mem, pool);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
+   /* Pairs with smp_rmb() in is_swiotlb_buffer(). */
smp_wmb();
 }
 
@@ -764,11 +764,6 @@ struct io_tlb_pool *swiotlb_find_pool(struct device *dev, 
phys_addr_t paddr)
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
struct io_tlb_pool *pool;
 
-   /* Pairs with smp_wmb() in swiotlb_find_slots() and
-* swiotlb_dyn_alloc(), which modify the RCU lists.
-*/
-   smp_rmb();
-
rcu_read_lock();
	list_for_each_entry_rcu(pool, &mem->pools, node) {
if (paddr >= pool->start && paddr < pool->end)
@@ -813,6 +808,7 @@ void swiotlb_dev_init(struct device *dev)
 #ifdef CONFIG_SWIOTLB_DYNAMIC
	INIT_LIST_HEAD(&dev->dma_io_tlb_pools);
	spin_lock_init(&dev->dma_io_tlb_lock);
+   dev->dma_uses_io_tlb = false;
 #endif
 }
 
@@ -1157,9 +1153,11 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
	list_add_rcu(&pool->node, &dev->dma_io_tlb_pools);
	spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
-   smp_wmb();
 found:
+   dev->dma_uses_io_tlb = true;
+   /* Pairs with smp_rmb() in is_swiotlb_buffer() */
+   smp_wmb();
+
*retpool = pool;
return index;
 }
-- 
2.25.1




[PATCH v5 8/9] swiotlb: allocate a new memory pool when existing pools are full

2023-07-24 Thread Petr Tesarik
From: Petr Tesarik 

When swiotlb_find_slots() cannot find suitable slots, schedule the
allocation of a new memory pool. It is not possible to allocate the pool
immediately, because this code may run in interrupt context, which is not
suitable for large memory allocations. This means that the memory pool will
be available too late for the currently requested mapping, but the stress
on the software IO TLB allocator is likely to continue, and subsequent
allocations will benefit from the additional pool eventually.

Keep all memory pools for an allocator in an RCU list to avoid locking on
the read side. For modifications, add a new spinlock to struct io_tlb_mem.

The spinlock also protects updates to the total number of slabs (nslabs in
struct io_tlb_mem), but not reads of the value. Readers may therefore
encounter a stale value, but this is not an issue:

- swiotlb_tbl_map_single() and is_swiotlb_active() only check for non-zero
  value. This is ensured by the existence of the default memory pool,
  allocated at boot.

- The exact value is used only for non-critical purposes (debugfs, kernel
  messages).

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |   8 +++
 kernel/dma/swiotlb.c| 148 +---
 2 files changed, 131 insertions(+), 25 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d5ce51657fac..8a9249685d0d 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct page;
@@ -104,12 +105,16 @@ struct io_tlb_pool {
 /**
  * struct io_tlb_mem - Software IO TLB allocator
  * @defpool:   Default (initial) IO TLB memory pool descriptor.
+ * @pool:  IO TLB memory pool descriptor (if not dynamic).
  * @nslabs:Total number of IO TLB slabs in all pools.
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
  * @can_grow:  %true if more pools can be allocated dynamically.
  * @phys_limit:Maximum allowed physical address.
+ * @lock:  Lock to synchronize changes to the list.
+ * @pools: List of IO TLB memory pool descriptors (if dynamic).
+ * @dyn_alloc: Dynamic IO TLB pool allocation work.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -125,6 +130,9 @@ struct io_tlb_mem {
 #ifdef CONFIG_SWIOTLB_DYNAMIC
bool can_grow;
u64 phys_limit;
+   spinlock_t lock;
+   struct list_head pools;
+   struct work_struct dyn_alloc;
 #endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7e8edf011bba..3156bb6aa343 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -79,8 +79,23 @@ struct io_tlb_slot {
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+
+static void swiotlb_dyn_alloc(struct work_struct *work);
+
+static struct io_tlb_mem io_tlb_default_mem = {
+   .lock = __SPIN_LOCK_UNLOCKED(io_tlb_default_mem.lock),
+   .pools = LIST_HEAD_INIT(io_tlb_default_mem.pools),
+   .dyn_alloc = __WORK_INITIALIZER(io_tlb_default_mem.dyn_alloc,
+   swiotlb_dyn_alloc),
+};
+
+#else  /* !CONFIG_SWIOTLB_DYNAMIC */
+
 static struct io_tlb_mem io_tlb_default_mem;
 
+#endif /* CONFIG_SWIOTLB_DYNAMIC */
+
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
 static unsigned long default_nareas;
 
@@ -278,6 +293,23 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool 
*mem, phys_addr_t start,
return;
 }
 
+/**
+ * add_mem_pool() - add a memory pool to the allocator
+ * @mem:   Software IO TLB allocator.
+ * @pool:  Memory pool to be added.
+ */
+static void add_mem_pool(struct io_tlb_mem *mem, struct io_tlb_pool *pool)
+{
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   spin_lock(&mem->lock);
+   list_add_rcu(&pool->node, &mem->pools);
+   mem->nslabs += pool->nslabs;
+   spin_unlock(&mem->lock);
+#else
+   mem->nslabs = pool->nslabs;
+#endif
+}
+
 static void __init *swiotlb_memblock_alloc(unsigned long nslabs,
unsigned int flags,
int (*remap)(void *tlb, unsigned long nslabs))
@@ -375,7 +407,7 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
 
swiotlb_init_io_tlb_pool(mem, __pa(tlb), nslabs, false,
 default_nareas);
-   io_tlb_default_mem.nslabs = nslabs;
+   add_mem_pool(&io_tlb_default_mem, mem);
 
if (flags & SWIOTLB_VERBOSE)
swiotlb_print_info();
@@ -474,7 +506,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);

[PATCH v5 7/9] swiotlb: determine potential physical address limit

2023-07-24 Thread Petr Tesarik
From: Petr Tesarik 

The value returned by default_swiotlb_limit() should be constant, because
it is used to decide whether DMA can be used. To allow allocating memory
pools on the fly, use the maximum possible physical address rather than the
highest address used by the default pool.

For swiotlb_init_remap(), this is either an arch-specific limit used by
memblock_alloc_low(), or the highest directly mapped physical address if
the initialization flags include SWIOTLB_ANY. For swiotlb_init_late(), the
highest address is determined by the GFP flags.

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |  2 ++
 kernel/dma/swiotlb.c| 14 ++
 2 files changed, 16 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index ad5bdf66ed7a..d5ce51657fac 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -109,6 +109,7 @@ struct io_tlb_pool {
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
  * @can_grow:  %true if more pools can be allocated dynamically.
+ * @phys_limit:Maximum allowed physical address.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -123,6 +124,7 @@ struct io_tlb_mem {
bool for_alloc;
 #ifdef CONFIG_SWIOTLB_DYNAMIC
bool can_grow;
+   u64 phys_limit;
 #endif
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8672810f7c56..7e8edf011bba 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -334,6 +334,10 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
 #ifdef CONFIG_SWIOTLB_DYNAMIC
if (!remap)
io_tlb_default_mem.can_grow = true;
+   if (flags & SWIOTLB_ANY)
+   io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
+   else
+   io_tlb_default_mem.phys_limit = ARCH_LOW_ADDRESS_LIMIT;
 #endif
 
if (!default_nareas)
@@ -409,6 +413,12 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 #ifdef CONFIG_SWIOTLB_DYNAMIC
if (!remap)
io_tlb_default_mem.can_grow = true;
+   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
+   io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
+   else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
+   io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
+   else
+   io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
 #endif
 
if (!default_nareas)
@@ -1390,7 +1400,11 @@ phys_addr_t default_swiotlb_base(void)
  */
 phys_addr_t default_swiotlb_limit(void)
 {
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   return io_tlb_default_mem.phys_limit;
+#else
return io_tlb_default_mem.defpool.end - 1;
+#endif
 }
 
 #ifdef CONFIG_DEBUG_FS
-- 
2.25.1




[PATCH v5 6/9] swiotlb: if swiotlb is full, fall back to a transient memory pool

2023-07-24 Thread Petr Tesarik
From: Petr Tesarik 

Try to allocate a transient memory pool if no suitable slots can be found
and the respective SWIOTLB is allowed to grow. The transient pool is just
big enough for this one bounce buffer. It is inserted into a per-device
list of transient memory pools, and it is freed again when the bounce
buffer is unmapped.

Transient memory pools are kept in an RCU list. A memory barrier is
required after adding a new entry, because any address within a transient
buffer must be immediately recognized as belonging to the SWIOTLB, even if
it is passed to another CPU.

Deletion does not require any synchronization beyond RCU ordering
guarantees. After a buffer is unmapped, its physical addresses may no
longer be passed to the DMA API, so the memory range of the corresponding
stale entry in the RCU list never matches. If the memory range gets
allocated again, then it happens only after a RCU quiescent state.

Since bounce buffers can now be allocated from different pools, add a
parameter to swiotlb_alloc_pool() to let the caller know which memory pool
is used. Add swiotlb_find_pool() to find the memory pool corresponding to
an address. This function is now also used by is_swiotlb_buffer(), because
a simple boundary check is no longer sufficient.

The logic in swiotlb_alloc_tlb() is taken from __dma_direct_alloc_pages(),
simplified and enhanced to use coherent memory pools if needed.

Note that this is not the most efficient way to provide a bounce buffer,
but when a DMA buffer can't be mapped, something may (and will) actually
break. At that point it is better to make an allocation, even if it may be
an expensive operation.
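
For completeness, a sketch of the release side, under the assumption that
the placeholder helper below stands in for the real unmap path;
swiotlb_dyn_free() is the RCU callback named in the io_tlb_pool kernel-doc,
and RCU ordering guarantees that concurrent list walkers never touch freed
memory:

static void example_release_transient_pool(struct device *dev,
					   struct io_tlb_pool *pool)
{
	unsigned long flags;

	spin_lock_irqsave(&dev->dma_io_tlb_lock, flags);
	list_del_rcu(&pool->node);
	spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);

	/* Free only after a grace period; readers may still hold a pointer. */
	call_rcu(&pool->rcu, swiotlb_dyn_free);
}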

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   6 +
 include/linux/dma-mapping.h |   2 +
 include/linux/swiotlb.h |  29 +++-
 kernel/dma/direct.c |   2 +-
 kernel/dma/swiotlb.c| 316 +++-
 5 files changed, 345 insertions(+), 10 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index d9754a68ba95..5fd89c9d005c 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -626,6 +626,8 @@ struct device_physical_location {
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
+ * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
+ * @dma_io_tlb_lock:   Protects changes to the list of active pools.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -731,6 +733,10 @@ struct device {
 #endif
 #ifdef CONFIG_SWIOTLB
struct io_tlb_mem *dma_io_tlb_mem;
+#endif
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   struct list_head dma_io_tlb_pools;
+   spinlock_t dma_io_tlb_lock;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index e13050eb9777..f0ccca16a0ac 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -418,6 +418,8 @@ static inline void dma_sync_sgtable_for_device(struct 
device *dev,
 #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
 #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
 
+bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size);
+
 static inline void *dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp)
 {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 78a51d2f9f5c..ad5bdf66ed7a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -80,6 +80,9 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @area_nslabs: Number of slots in each area.
  * @areas: Array of memory area descriptors.
  * @slots: Array of slot descriptors.
+ * @node:  Member of the IO TLB memory pool list.
+ * @rcu:   RCU head for swiotlb_dyn_free().
+ * @transient:  %true if transient memory pool.
  */
 struct io_tlb_pool {
phys_addr_t start;
@@ -91,6 +94,11 @@ struct io_tlb_pool {
unsigned int area_nslabs;
struct io_tlb_area *areas;
struct io_tlb_slot *slots;
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+   struct list_head node;
+   struct rcu_head rcu;
+   bool transient;
+#endif
 };
 
 /**
@@ -122,6 +130,20 @@ struct io_tlb_mem {
 #endif
 };
 
+#ifdef CONFIG_SWIOTLB_DYNAMIC
+
+struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr);
+
+#else
+
+static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev,
+   phys_addr_t paddr)
+{
+   return &dev->dma_io_tlb_mem->defpool;
+}
+
+#endif
+
 /**
  * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb
  * @dev:Device which has mapped the buffer.
@@ -137,7 +

[PATCH v5 2/9] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-07-24 Thread Petr Tesarik
From: Petr Tesarik 

SWIOTLB implementation details should not be exposed to the rest of the
kernel. This allows the implementation to be changed without modifying
non-swiotlb code.

To avoid breaking existing users, provide helper functions for the few
required fields.

As a bonus, using a helper function to initialize struct device makes it
possible to get rid of an #ifdef in driver core.
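
For illustration only (not part of the patch), code outside swiotlb that
needs to know where the default pool may live would now call the helpers
instead of reading io_tlb_default_mem; the function below is made up:

  /* Does a DMA mask cover every address the default SWIOTLB may use? */
  static bool example_mask_covers_default_swiotlb(u64 dma_mask)
  {
          return default_swiotlb_limit() <= dma_mask;
  }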

Signed-off-by: Petr Tesarik 
---
 arch/mips/pci/pci-octeon.c |  2 +-
 drivers/base/core.c|  4 +---
 drivers/xen/swiotlb-xen.c  |  2 +-
 include/linux/swiotlb.h| 19 ++-
 kernel/dma/swiotlb.c   | 31 ++-
 5 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
index e457a18cbdc5..d19d9d456309 100644
--- a/arch/mips/pci/pci-octeon.c
+++ b/arch/mips/pci/pci-octeon.c
@@ -664,7 +664,7 @@ static int __init octeon_pci_setup(void)
 
/* BAR1 movable regions contiguous to cover the swiotlb */
octeon_bar1_pci_phys =
-   io_tlb_default_mem.start & ~((1ull << 22) - 1);
+   default_swiotlb_base() & ~((1ull << 22) - 1);
 
for (index = 0; index < 32; index++) {
union cvmx_pci_bar1_indexx bar1_index;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 3dff5037943e..46d1d78c5beb 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3108,9 +3108,7 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
-#ifdef CONFIG_SWIOTLB
-   dev->dma_io_tlb_mem = &io_tlb_default_mem;
-#endif
+   swiotlb_dev_init(dev);
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 67aa74d20162..946bd56f0ac5 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -381,7 +381,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct 
scatterlist *sgl,
 static int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-   return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask;
+   return xen_phys_to_dma(hwdev, default_swiotlb_limit()) <= mask;
 }
 
 const struct dma_map_ops xen_swiotlb_dma_ops = {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4e52cd5e0bdc..3b372364a144 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -110,7 +110,6 @@ struct io_tlb_mem {
atomic_long_t used_hiwater;
 #endif
 };
-extern struct io_tlb_mem io_tlb_default_mem;
 
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
@@ -128,13 +127,21 @@ static inline bool is_swiotlb_force_bounce(struct device 
*dev)
 
 void swiotlb_init(bool addressing_limited, unsigned int flags);
 void __init swiotlb_exit(void);
+void swiotlb_dev_init(struct device *dev);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+phys_addr_t default_swiotlb_base(void);
+phys_addr_t default_swiotlb_limit(void);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
 }
+
+static inline void swiotlb_dev_init(struct device *dev)
+{
+}
+
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
@@ -159,6 +166,16 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline phys_addr_t default_swiotlb_base(void)
+{
+   return 0;
+}
+
+static inline phys_addr_t default_swiotlb_limit(void)
+{
+   return 0;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 66fc8ec9ae45..6734ef7b9d8d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -71,7 +71,7 @@ struct io_tlb_slot {
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
-struct io_tlb_mem io_tlb_default_mem;
+static struct io_tlb_mem io_tlb_default_mem;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
 static unsigned long default_nareas;
@@ -489,6 +489,15 @@ void __init swiotlb_exit(void)
memset(mem, 0, sizeof(*mem));
 }
 
+/**
+ * swiotlb_dev_init() - initialize swiotlb fields in  device
+ * @dev:   Device to be initialized.
+ */
+void swiotlb_dev_init(struct device *dev)
+{
+   dev->dma_io_tlb_mem = &io_tlb_default_mem;
+}
+
 /*
  * Return the offset into a iotlb slot required to keep the device happy.
  */
@@ -961,6 +970,26 @@ bool is_swiotlb_active(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
+/**
+ * default_swiotlb_base() - get the base address of the default SWIOTLB
+ *
+ * Get the lowest physical address used by the default software IO TLB pool.

[PATCH v5 1/9] swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated

2023-07-24 Thread Petr Tesarik
From: Petr Tesarik 

If the swiotlb is already allocated, return 0 immediately, so callers do
not have to check io_tlb_default_mem.nslabs explicitly.

Signed-off-by: Petr Tesarik 
---
 arch/arm/xen/mm.c | 10 --
 arch/x86/kernel/pci-dma.c | 12 ++--
 kernel/dma/swiotlb.c  |  3 +++
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 3d826c0b5fee..882cd70c7a2f 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -125,12 +125,10 @@ static int __init xen_mm_init(void)
return 0;
 
/* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
-   rc = swiotlb_init_late(swiotlb_size_or_default(),
-  xen_swiotlb_gfp(), NULL);
-   if (rc < 0)
-   return rc;
-   }
+   rc = swiotlb_init_late(swiotlb_size_or_default(),
+  xen_swiotlb_gfp(), NULL);
+   if (rc < 0)
+   return rc;
 
cflush.op = 0;
cflush.a.dev_bus_addr = 0;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index de6be0a3965e..08988b0a1c91 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -86,16 +86,16 @@ static void __init pci_xen_swiotlb_init(void)
 
 int pci_xen_swiotlb_init_late(void)
 {
+   int rc;
+
	if (dma_ops == &xen_swiotlb_dma_ops)
return 0;
 
/* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
-   int rc = swiotlb_init_late(swiotlb_size_or_default(),
-  GFP_KERNEL, xen_swiotlb_fixup);
-   if (rc < 0)
-   return rc;
-   }
+   rc = swiotlb_init_late(swiotlb_size_or_default(),
+  GFP_KERNEL, xen_swiotlb_fixup);
+   if (rc < 0)
+   return rc;
 
/* XXX: this switches the dma ops under live devices! */
	dma_ops = &xen_swiotlb_dma_ops;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 2b83e3ad9dca..66fc8ec9ae45 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -384,6 +384,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
bool retried = false;
int rc = 0;
 
+   if (io_tlb_default_mem.nslabs)
+   return 0;
+
if (swiotlb_force_disable)
return 0;
 
-- 
2.25.1




[PATCH v5 0/9] Allow dynamic allocation of software IO TLB bounce buffers

2023-07-24 Thread Petr Tesarik
From: Petr Tesarik 

Motivation
==

The software IO TLB was designed with these assumptions:

1) It would not be used much. Small systems (little RAM) don't need it, and
   big systems (lots of RAM) would have modern DMA controllers and an IOMMU
   chip to handle legacy devices.
2) A small fixed memory area (64 MiB by default) is sufficient to
   handle the few cases which require a bounce buffer.
3) 64 MiB is little enough that it has no impact on the rest of the
   system.
4) Bounce buffers require large contiguous chunks of low memory. Such
   memory is precious and can be allocated only early at boot.

It turns out they are not always true:

1) Embedded systems may have more than 4GiB RAM but no IOMMU and legacy
   32-bit peripheral busses and/or DMA controllers.
2) CoCo VMs use bounce buffers for all I/O but may need substantially more
   than 64 MiB.
3) Embedded developers put as many features as possible into the available
   memory. A few dozen "missing" megabytes may limit what features can be
   implemented.
4) If CMA is available, it can allocate large contiguous chunks even after
   the system has run for some time.

Goals
=

The goal of this work is to start with a small software IO TLB at boot and
expand it later when/if needed.

Design
==

This version of the patch series retains the current slot allocation
algorithm with multiple areas to reduce lock contention, but additional
slots can be added when necessary.

These alternatives have been considered:

- Allocate and free buffers as needed using direct DMA API. This works
  quite well, except in CoCo VMs where each allocation/free requires
  decrypting/encrypting memory, which is a very expensive operation.

- Allocate a very large software IO TLB at boot, but allow migrating pages
  to/from it (like CMA does). For systems with CMA, this would mean two big
  allocations at boot. Finding the balance between CMA, SWIOTLB and rest of
  available RAM can be challenging. More importantly, there is no clear
  benefit compared to allocating SWIOTLB memory pools from the CMA.

Implementation Constraints
==

These constraints have been taken into account:

1) Minimize impact on devices which do not benefit from the change.
2) Minimize the number of memory decryption/encryption operations.
3) Avoid contention on a lock or atomic variable to preserve parallel
   scalability.

Additionally, the software IO TLB code is also used to implement restricted
DMA pools. These pools are restricted to a pre-defined physical memory
region and must not use any other memory. In other words, dynamic
allocation of memory pools must be disabled for restricted DMA pools.

Data Structures
===

The existing struct io_tlb_mem is the central type for a SWIOTLB allocator,
but it now contains multiple memory pools::

  io_tlb_mem
  +-+   io_tlb_pool
  | SWIOTLB |   +---+   +---+   +---+
  |allocator|-->|default|-->|dynamic|-->|dynamic|-->...
  | |   |memory |   |memory |   |memory |
  +-+   | pool  |   | pool  |   | pool  |
+---+   +---+   +---+

The allocator structure contains global state (such as flags and counters)
and structures needed to schedule new allocations. Each memory pool
contains the actual buffer slots and metadata. The first memory pool in the
list is the default memory pool allocated statically at early boot.
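
In a condensed sketch (field names follow the patches, everything else is
trimmed), the split looks like this::

  struct io_tlb_pool {                    /* one contiguous chunk of slots */
          phys_addr_t start;
          phys_addr_t end;
          struct io_tlb_slot *slots;
          /* ... per-pool bookkeeping (areas, used counts, ...) */
  };

  struct io_tlb_mem {                     /* per-allocator state */
          struct list_head pools;         /* default pool + dynamic pools */
          unsigned long nslabs;           /* total slots across all pools */
          /* ... flags, limits, counters, allocation work */
  };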

New memory pools are allocated from a kernel worker thread. That's because
bounce buffers are allocated when mapping a DMA buffer, which may happen in
interrupt context where large atomic allocations would probably fail.
Allocation from process context is much more likely to succeed, especially
if it can use CMA.
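
In rough outline (simplified; swiotlb_new_pool() is a placeholder for the
real allocation helper), growing the SWIOTLB looks like this::

  /* Mapping path, possibly in interrupt context: only schedule the work. */
  schedule_work(&mem->dyn_alloc);

  /* Worker, process context: the large allocation, which may use CMA. */
  static void swiotlb_dyn_alloc(struct work_struct *work)
  {
          struct io_tlb_mem *mem =
                  container_of(work, struct io_tlb_mem, dyn_alloc);
          struct io_tlb_pool *pool;

          pool = swiotlb_new_pool(mem);   /* placeholder */
          if (pool)
                  add_mem_pool(mem, pool);
  }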

Nonetheless, the onset of a load spike may fill up the SWIOTLB before the
worker has a chance to run. In that case, try to allocate a small transient
memory pool to accommodate the request. If memory is encrypted and the
device cannot do DMA to encrypted memory, this buffer is allocated from the
coherent atomic DMA memory pool. Reducing the size of SWIOTLB may therefore
require increasing the size of the coherent pool with the "coherent_pool"
command-line parameter.

Performance
===

All testing compared a vanilla v6.4-rc6 kernel with a fully patched
kernel. The kernel was booted with "swiotlb=force" to allow stress-testing
the software IO TLB on a high-performance device that would otherwise not
need it. CONFIG_DEBUG_FS was set to 'y' to match the configuration of
popular distribution kernels; it is understood that parallel workloads
suffer from contention on the recently added debugfs atomic counters.

These benchmarks were run:

- small: single-threaded I/O of 4 KiB blocks,
- big: single-threaded I/O of 64 KiB blocks,
- 4way: 4-way parallel I/O of 4 KiB blocks.

In all tested cases, the default 64 MiB SWIOTLB would be sufficient (but
wasteful). The "default" 

[PATCH v4 8/8] swiotlb: search the software IO TLB only if a device makes use of it

2023-07-13 Thread Petr Tesarik
From: Petr Tesarik 

Skip searching the software IO TLB if a device has never used it, making
sure these devices are not affected by the introduction of multiple IO TLB
memory pools.

An additional memory barrier is required to ensure that the new value of the
flag is visible to other CPUs after mapping a new bounce buffer. For
efficiency, the flag check should be inlined, and then the memory barrier
must be moved to is_swiotlb_buffer(). However, it can replace the existing
barrier in swiotlb_find_pool(), because all callers use is_swiotlb_buffer()
first to verify that the buffer address belongs to the software IO TLB.
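
Schematically (condensed from the diff below, error handling omitted), the
pairing looks like this:

  /* writer, e.g. swiotlb_find_slots() */
  list_add_rcu(&pool->node, &dev->dma_io_tlb_pools);
  dev->dma_uses_io_tlb = true;
  smp_wmb();                              /* publish both stores ...       */

  /* reader: is_swiotlb_buffer() */
  smp_rmb();                              /* ... before the flag is tested */
  if (dev->dma_uses_io_tlb)
          pool = swiotlb_find_pool(dev, paddr);   /* RCU list walk */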

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |  2 ++
 include/linux/swiotlb.h |  6 +-
 kernel/dma/swiotlb.c| 14 ++
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 549b0a62455c..86871d628648 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -628,6 +628,7 @@ struct device_physical_location {
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
  * @dma_io_tlb_lock:   Protects changes to the list of active pools.
+ * @dma_uses_io_tlb: %true if device has used the software IO TLB.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -735,6 +736,7 @@ struct device {
struct io_tlb_mem *dma_io_tlb_mem;
struct list_head dma_io_tlb_pools;
spinlock_t dma_io_tlb_lock;
+   bool dma_uses_io_tlb;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 06fd94de1cd8..8069cb62c893 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -150,7 +150,11 @@ struct io_tlb_pool *swiotlb_find_pool(struct device *dev, 
phys_addr_t paddr);
  */
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   return dev->dma_io_tlb_mem &&
+   /* Pairs with smp_wmb() in swiotlb_find_slots() and
+* swiotlb_dyn_alloc(), which modify the RCU lists.
+*/
+   smp_rmb();
+   return dev->dma_uses_io_tlb &&
!!swiotlb_find_pool(dev, paddr);
 }
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9c66ec2c47dd..854d139ddcb7 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -706,7 +706,7 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
 
add_mem_pool(mem, pool);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
+   /* Pairs with smp_rmb() in is_swiotlb_buffer(). */
smp_wmb();
 }
 
@@ -734,6 +734,7 @@ void swiotlb_dev_init(struct device *dev)
	dev->dma_io_tlb_mem = &io_tlb_default_mem;
	INIT_LIST_HEAD(&dev->dma_io_tlb_pools);
	spin_lock_init(&dev->dma_io_tlb_lock);
+   dev->dma_uses_io_tlb = false;
 }
 
 /**
@@ -751,11 +752,6 @@ struct io_tlb_pool *swiotlb_find_pool(struct device *dev, 
phys_addr_t paddr)
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
struct io_tlb_pool *pool;
 
-   /* Pairs with smp_wmb() in swiotlb_find_slots() and
-* swiotlb_dyn_alloc(), which modify the RCU lists.
-*/
-   smp_rmb();
-
rcu_read_lock();
	list_for_each_entry_rcu(pool, &mem->pools, node) {
if (paddr >= pool->start && paddr < pool->end)
@@ -1128,9 +1124,11 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
	list_add_rcu(&pool->node, &dev->dma_io_tlb_pools);
	spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
-   smp_wmb();
 found:
+   dev->dma_uses_io_tlb = true;
+   /* Pairs with smp_rmb() in is_swiotlb_buffer() */
+   smp_wmb();
+
*retpool = pool;
return index;
 }
-- 
2.25.1




[PATCH v4 7/8] swiotlb: allocate a new memory pool when existing pools are full

2023-07-13 Thread Petr Tesarik
From: Petr Tesarik 

When swiotlb_find_slots() cannot find suitable slots, schedule the
allocation of a new memory pool. It is not possible to allocate the pool
immediately, because this code may run in interrupt context, which is not
suitable for large memory allocations. This means that the memory pool will
be available too late for the currently requested mapping, but the stress
on the software IO TLB allocator is likely to continue, and subsequent
allocations will benefit from the additional pool eventually.

Keep all memory pools for an allocator in an RCU list to avoid locking on
the read side. For modifications, add a new spinlock to struct io_tlb_mem.

The spinlock also protects updates to the total number of slabs (nslabs in
struct io_tlb_mem), but not reads of the value. Readers may therefore
encounter a stale value, but this is not an issue:

- swiotlb_tbl_map_single() and is_swiotlb_active() only check for non-zero
  value. This is ensured by the existence of the default memory pool,
  allocated at boot.

- The exact value is used only for non-critical purposes (debugfs, kernel
  messages).
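
Schematically (a condensed view of the code in this patch), writers take the
new spinlock while readers rely on RCU alone:

  /* writer: add_mem_pool() */
  spin_lock(&mem->lock);
  list_add_rcu(&pool->node, &mem->pools);
  mem->nslabs += pool->nslabs;            /* readers may see a stale value */
  spin_unlock(&mem->lock);

  /* reader: swiotlb_find_pool(), lockless */
  struct io_tlb_pool *found = NULL;

  rcu_read_lock();
  list_for_each_entry_rcu(pool, &mem->pools, node) {
          if (paddr >= pool->start && paddr < pool->end) {
                  found = pool;
                  break;
          }
  }
  rcu_read_unlock();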

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |   9 ++-
 kernel/dma/swiotlb.c| 134 +++-
 2 files changed, 113 insertions(+), 30 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index ff8f5150f4de..06fd94de1cd8 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct page;
@@ -103,13 +104,15 @@ struct io_tlb_pool {
 
 /**
  * struct io_tlb_mem - Software IO TLB allocator
- * @pool:  IO TLB memory pool descriptor.
+ * @lock:  Lock to synchronize changes to the list.
+ * @pools: List of IO TLB memory pool descriptors.
  * @nslabs:Total number of IO TLB slabs in all pools.
  * @phys_limit:Maximum allowed physical address.
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
  * @can_grow:  %true if more pools can be allocated dynamically.
+ * @dyn_alloc: Dynamic IO TLB pool allocation work.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -117,13 +120,15 @@ struct io_tlb_pool {
  * in debugfs.
  */
 struct io_tlb_mem {
-   struct io_tlb_pool *pool;
+   spinlock_t lock;
+   struct list_head pools;
unsigned long nslabs;
u64 phys_limit;
struct dentry *debugfs;
bool force_bounce;
bool for_alloc;
bool can_grow;
+   struct work_struct dyn_alloc;
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d6a05727efc5..9c66ec2c47dd 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -79,9 +79,14 @@ struct io_tlb_slot {
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
+static void swiotlb_dyn_alloc(struct work_struct *work);
+
 static struct io_tlb_pool io_tlb_default_pool;
-static struct io_tlb_mem io_tlb_default_mem = {
-   .pool = &io_tlb_default_pool,
+struct io_tlb_mem io_tlb_default_mem = {
+   .lock = __SPIN_LOCK_UNLOCKED(io_tlb_default_mem.lock),
+   .pools = LIST_HEAD_INIT(io_tlb_default_mem.pools),
+   .dyn_alloc = __WORK_INITIALIZER(io_tlb_default_mem.dyn_alloc,
+   swiotlb_dyn_alloc),
 };
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
@@ -281,6 +286,19 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool 
*mem, phys_addr_t start,
return;
 }
 
+/**
+ * add_mem_pool() - add a memory pool to the allocator
+ * @mem:   Software IO TLB allocator.
+ * @pool:  Memory pool to be added.
+ */
+static void add_mem_pool(struct io_tlb_mem *mem, struct io_tlb_pool *pool)
+{
+   spin_lock(&mem->lock);
+   list_add_rcu(&pool->node, &mem->pools);
+   mem->nslabs += pool->nslabs;
+   spin_unlock(&mem->lock);
+}
+
 static void __init *swiotlb_memblock_alloc(unsigned long nslabs,
unsigned int flags,
int (*remap)(void *tlb, unsigned long nslabs))
@@ -374,7 +392,7 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
 
swiotlb_init_io_tlb_pool(mem, __pa(tlb), nslabs, false,
 default_nareas);
-   io_tlb_default_mem.nslabs = nslabs;
+   add_mem_pool(&io_tlb_default_mem, mem);
 
if (flags & SWIOTLB_VERBOSE)
swiotlb_print_info();
@@ -466,7 +484,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
swiotlb_init_io_tlb_pool

[PATCH v4 6/8] swiotlb: determine potential physical address limit

2023-07-13 Thread Petr Tesarik
From: Petr Tesarik 

The value returned by default_swiotlb_limit() should be constant, because
it is used to decide whether DMA can be used. To allow allocating memory
pools on the fly, use the maximum possible physical address rather than the
highest address used by the default pool.

For swiotlb_init_remap(), this is either an arch-specific limit used by
memblock_alloc_low(), or the highest directly mapped physical address if
the initialization flags include SWIOTLB_ANY. For swiotlb_init_late(), the
highest address is determined by the GFP flags.

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |  2 ++
 kernel/dma/swiotlb.c| 12 +++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index b642e7739604..ff8f5150f4de 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -105,6 +105,7 @@ struct io_tlb_pool {
  * struct io_tlb_mem - Software IO TLB allocator
  * @pool:  IO TLB memory pool descriptor.
  * @nslabs:Total number of IO TLB slabs in all pools.
+ * @phys_limit:Maximum allowed physical address.
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
@@ -118,6 +119,7 @@ struct io_tlb_pool {
 struct io_tlb_mem {
struct io_tlb_pool *pool;
unsigned long nslabs;
+   u64 phys_limit;
struct dentry *debugfs;
bool force_bounce;
bool for_alloc;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 6ec5a81acc2a..d6a05727efc5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -334,6 +334,10 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
io_tlb_default_mem.force_bounce =
swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
io_tlb_default_mem.can_grow = !remap;
+   if (flags & SWIOTLB_ANY)
+   io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
+   else
+   io_tlb_default_mem.phys_limit = ARCH_LOW_ADDRESS_LIMIT;
 
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
@@ -402,6 +406,12 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
io_tlb_default_mem.force_bounce = swiotlb_force_bounce;
io_tlb_default_mem.can_grow = !remap;
+   if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
+   io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
+   else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
+   io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
+   else
+   io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
 
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
@@ -1338,7 +1348,7 @@ phys_addr_t default_swiotlb_start(void)
  */
 phys_addr_t default_swiotlb_limit(void)
 {
-   return io_tlb_default_pool.end - 1;
+   return io_tlb_default_mem.phys_limit;
 }
 
 #ifdef CONFIG_DEBUG_FS
-- 
2.25.1




[PATCH v4 5/8] swiotlb: if swiotlb is full, fall back to a transient memory pool

2023-07-13 Thread Petr Tesarik
From: Petr Tesarik 

Try to allocate a transient memory pool if no suitable slots can be found
and the respective SWIOTLB is allowed to grow. The transient pool is just
big enough for this one bounce buffer. It is inserted into a per-device
list of transient memory pools, and it is freed again when the bounce
buffer is unmapped.

Transient memory pools are kept in an RCU list. A memory barrier is
required after adding a new entry, because any address within a transient
buffer must be immediately recognized as belonging to the SWIOTLB, even if
it is passed to another CPU.

Deletion does not require any synchronization beyond RCU ordering
guarantees. After a buffer is unmapped, its physical addresses may no
longer be passed to the DMA API, so the memory range of the corresponding
stale entry in the RCU list never matches. If the memory range gets
allocated again, then it happens only after an RCU quiescent state.
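
In outline (condensed; accounting and error handling omitted), retiring a
transient pool on unmap therefore needs nothing more than RCU:

  spin_lock_irqsave(&dev->dma_io_tlb_lock, flags);
  list_del_rcu(&pool->node);
  spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);

  /* The pool memory is freed only after a grace period has elapsed. */
  call_rcu(&pool->rcu, swiotlb_dyn_free);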

Since bounce buffers can now be allocated from different pools, add a
parameter to swiotlb_alloc_pool() to let the caller know which memory pool
is used. Add swiotlb_find_pool() to find the memory pool corresponding to
an address. This function is now also used by is_swiotlb_buffer(), because
a simple boundary check is no longer sufficient.
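
After this change, the boundary check reduces to a pool lookup; condensed
(memory barriers omitted):

  static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
  {
          return dev->dma_io_tlb_mem && !!swiotlb_find_pool(dev, paddr);
  }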

The logic in swiotlb_alloc_tlb() is taken from __dma_direct_alloc_pages(),
simplified and enhanced to use coherent memory pools if needed.

Note that this is not the most efficient way to provide a bounce buffer,
but when a DMA buffer can't be mapped, something may (and will) actually
break. At that point it is better to make an allocation, even if it may be
an expensive operation.

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   4 +
 include/linux/dma-mapping.h |   2 +
 include/linux/swiotlb.h |  13 +-
 kernel/dma/direct.c |   2 +-
 kernel/dma/swiotlb.c| 270 ++--
 5 files changed, 277 insertions(+), 14 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index d9754a68ba95..549b0a62455c 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -626,6 +626,8 @@ struct device_physical_location {
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
+ * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
+ * @dma_io_tlb_lock:   Protects changes to the list of active pools.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -731,6 +733,8 @@ struct device {
 #endif
 #ifdef CONFIG_SWIOTLB
struct io_tlb_mem *dma_io_tlb_mem;
+   struct list_head dma_io_tlb_pools;
+   spinlock_t dma_io_tlb_lock;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index e13050eb9777..f0ccca16a0ac 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -418,6 +418,8 @@ static inline void dma_sync_sgtable_for_device(struct 
device *dev,
 #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
 #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
 
+bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size);
+
 static inline void *dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp)
 {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 81f8c901e888..b642e7739604 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -63,6 +63,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 
 /**
  * struct io_tlb_pool - IO TLB memory pool descriptor
+ * @node:  Member of the IO TLB memory pool list.
  * @start: The start address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
@@ -77,22 +78,27 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * see setup_io_tlb_npages().
  * @used:  The number of used IO TLB slots.
  * @late_alloc:%true if allocated using the page allocator.
+ * @transient:  %true if transient memory pool.
  * @nareas:Number of areas in the pool.
  * @area_nslabs: Number of slots in each area.
  * @areas: Array of memory area descriptors.
  * @slots: Array of slot descriptors.
+ * @rcu:   RCU head for swiotlb_dyn_free().
  */
 struct io_tlb_pool {
+   struct list_head node;
phys_addr_t start;
phys_addr_t end;
void *vaddr;
unsigned long nslabs;
unsigned long used;
bool late_alloc;
+   bool transient;
unsigned int nareas;
unsigned int area_nslabs;
struct io_tlb_area *areas;
struct io_tlb_slot *slots

[PATCH v4 4/8] swiotlb: add a flag whether a SWIOTLB is allowed to grow

2023-07-13 Thread Petr Tesarik
From: Petr Tesarik 

Mark the default SWIOTLB as able to grow and restricted DMA pools as
unable.

However, if the address of the default memory pool is explicitly queried,
make the default SWIOTLB also unable to grow. This is currently used to set
up PCI BAR movable regions on some Octeon MIPS boards which may not be able
to use a SWIOTLB pool elsewhere in physical memory. See octeon_pci_setup()
for more details.

If a remap function is specified, it must be also called on any dynamically
allocated pools, but there are some issues:

- The remap function may block, so it should not be called from an atomic
  context.
- There is no corresponding unremap() function if the memory pool is
  freed.
- The only in-tree implementation (xen_swiotlb_fixup) requires that the
  number of slots in the memory pool is a multiple of SWIOTLB_SEGSIZE.

Keep it simple for now and disable growing the SWIOTLB if a remap function
was specified.

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h | 2 ++
 kernel/dma/swiotlb.c| 4 
 2 files changed, 6 insertions(+)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d669e11e2827..81f8c901e888 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -102,6 +102,7 @@ struct io_tlb_pool {
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @can_grow:  %true if more pools can be allocated dynamically.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -114,6 +115,7 @@ struct io_tlb_mem {
struct dentry *debugfs;
bool force_bounce;
bool for_alloc;
+   bool can_grow;
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index a80b77de8829..16e5b9a82902 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -332,6 +332,7 @@ void __init swiotlb_init_remap(bool addressing_limit, 
unsigned int flags,
 
io_tlb_default_mem.force_bounce =
swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
+   io_tlb_default_mem.can_grow = !remap;
 
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
@@ -399,6 +400,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
return 0;
 
io_tlb_default_mem.force_bounce = swiotlb_force_bounce;
+   io_tlb_default_mem.can_grow = !remap;
 
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
@@ -1074,6 +1076,7 @@ EXPORT_SYMBOL_GPL(is_swiotlb_active);
  */
 phys_addr_t default_swiotlb_start(void)
 {
+   io_tlb_default_mem.can_grow = false;
return io_tlb_default_pool.start;
 }
 
@@ -1236,6 +1239,7 @@ static int rmem_swiotlb_device_init(struct reserved_mem 
*rmem,
 false, nareas);
mem->force_bounce = true;
mem->for_alloc = true;
+   mem->can_grow = false;
mem->pool = pool;
mem->nslabs = nslabs;
 
-- 
2.25.1




[PATCH v4 3/8] swiotlb: separate memory pool data from other allocator data

2023-07-13 Thread Petr Tesarik
From: Petr Tesarik 

Carve out memory pool specific fields from struct io_tlb_mem. The original
struct now contains shared data for the whole allocator, while the new
struct io_tlb_pool contains data that is specific to one memory pool of
(potentially) many.

Allocate both structures together for restricted DMA pools to keep the
error cleanup path simple.

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   2 +-
 include/linux/swiotlb.h |  47 +++
 kernel/dma/swiotlb.c| 181 +---
 3 files changed, 147 insertions(+), 83 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index bbaeabd04b0d..d9754a68ba95 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -625,7 +625,7 @@ struct device_physical_location {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
- * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
+ * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 39313c3a791a..d669e11e2827 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,8 +62,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 
 /**
- * struct io_tlb_mem - IO TLB Memory Pool Descriptor
- *
+ * struct io_tlb_pool - IO TLB memory pool descriptor
  * @start: The start address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
@@ -73,15 +72,36 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
  * may be remapped in the memory encrypted case and store virtual
  * address for bounce buffer operation.
- * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. For default swiotlb, this is command line adjustable via
- * setup_io_tlb_npages.
+ * @nslabs:The number of IO TLB slots between @start and @end. For the
+ * default swiotlb, this can be adjusted with a boot parameter,
+ * see setup_io_tlb_npages().
+ * @used:  The number of used IO TLB slots.
+ * @late_alloc:%true if allocated using the page allocator.
+ * @nareas:Number of areas in the pool.
+ * @area_nslabs: Number of slots in each area.
+ * @areas: Array of memory area descriptors.
+ * @slots: Array of slot descriptors.
+ */
+struct io_tlb_pool {
+   phys_addr_t start;
+   phys_addr_t end;
+   void *vaddr;
+   unsigned long nslabs;
+   unsigned long used;
+   bool late_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
+   struct io_tlb_slot *slots;
+};
+
+/**
+ * struct io_tlb_mem - Software IO TLB allocator
+ * @pool:  IO TLB memory pool descriptor.
+ * @nslabs:Total number of IO TLB slabs in all pools.
  * @debugfs:   The dentry to debugfs.
- * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
- * @nareas:  The area number in the pool.
- * @area_nslabs: The slot number in the area.
  * @total_used:The total number of slots in the pool that are 
currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -89,18 +109,11 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
phys,
  * in debugfs.
  */
 struct io_tlb_mem {
-   phys_addr_t start;
-   phys_addr_t end;
-   void *vaddr;
+   struct io_tlb_pool *pool;
unsigned long nslabs;
struct dentry *debugfs;
-   bool late_alloc;
bool force_bounce;
bool for_alloc;
-   unsigned int nareas;
-   unsigned int area_nslabs;
-   struct io_tlb_area *areas;
-   struct io_tlb_slot *slots;
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
@@ -122,7 +135,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
-   return mem && paddr >= mem->start && paddr < mem->end;
+   return mem && paddr >= mem->pool->start && paddr < mem->pool->end;
 }
 
 static inline bool is_swiotlb_force_bounce(struct device *dev)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 01161d040639..a80b77de8829 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/d

[PATCH v4 2/8] swiotlb: add documentation and rename swiotlb_do_find_slots()

2023-07-13 Thread Petr Tesarik
From: Petr Tesarik 

Add some kernel-doc comments and move the existing documentation of struct
io_tlb_slot to its correct location. The latter was forgotten in commit
942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").

Use the opportunity to give swiotlb_do_find_slots() a more descriptive
name, which makes it clear how it differs from swiotlb_find_slots().

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h | 15 +++---
 kernel/dma/swiotlb.c| 61 +
 2 files changed, 66 insertions(+), 10 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 07216af59e93..39313c3a791a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -76,10 +76,6 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
- * @list:  The free list describing the number of free entries available
- * from each index.
- * @orig_addr: The original address corresponding to a mapped entry.
- * @alloc_size:Size of the allocated buffer.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
@@ -111,6 +107,17 @@ struct io_tlb_mem {
 #endif
 };
 
+/**
+ * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb
+ * @dev:Device which has mapped the buffer.
+ * @paddr:  Physical address within the DMA buffer.
+ *
+ * Check if @paddr points into a bounce buffer.
+ *
+ * Return:
+ * * %true if @paddr points into a bounce buffer
+ * * %false otherwise
+ */
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 873b077d7e37..01161d040639 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -62,6 +62,13 @@
 
 #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
+/**
+ * struct io_tlb_slot - IO TLB slot descriptor
+ * @orig_addr: The original address corresponding to a mapped entry.
+ * @alloc_size:Size of the allocated buffer.
+ * @list:  The free list describing the number of free entries available
+ * from each index.
+ */
 struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -632,11 +639,22 @@ static void dec_used(struct io_tlb_mem *mem, unsigned int 
nslots)
 }
 #endif /* CONFIG_DEBUG_FS */
 
-/*
- * Find a suitable number of IO TLB entries size that will fit this request and
- * allocate a buffer from that IO TLB pool.
+/**
+ * area_find_slots() - search for slots in one IO TLB memory area
+ * @dev:   Device which maps the buffer.
+ * @area_index:Index of the IO TLB memory area to be searched.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Find a suitable sequence of IO TLB entries for the request and allocate
+ * a buffer from the given IO TLB memory area.
+ * This function takes care of locking.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
  */
-static int swiotlb_do_find_slots(struct device *dev, int area_index,
+static int area_find_slots(struct device *dev, int area_index,
phys_addr_t orig_addr, size_t alloc_size,
unsigned int alloc_align_mask)
 {
@@ -731,6 +749,19 @@ static int swiotlb_do_find_slots(struct device *dev, int 
area_index,
return slot_index;
 }
 
+/**
+ * swiotlb_find_slots() - search for slots in the whole swiotlb
+ * @dev:   Device which maps the buffer.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Search through the whole software IO TLB to find a sequence of slots that
+ * match the allocation constraints.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
+ */
 static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size, unsigned int alloc_align_mask)
 {
@@ -739,8 +770,8 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
int i = start, index;
 
do {
-   index = swiotlb_do_find_slots(dev, i, orig_addr, alloc_size,
- alloc_align_mask);
+   index = area_find_slots(dev, i, orig_addr, alloc_size,
+   alloc_align_mask);
if (index >= 0)

[PATCH v4 1/8] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-07-13 Thread Petr Tesarik
From: Petr Tesarik 

SWIOTLB implementation details should not be exposed to the rest of the
kernel. This will allow the implementation to be changed without
modifying non-swiotlb code.

To avoid breaking existing users, provide helper functions for the few
required fields.

As a bonus, using a helper function to initialize struct device makes it
possible to get rid of an #ifdef in driver core.

Signed-off-by: Petr Tesarik 
---
 arch/arm/xen/mm.c  |  2 +-
 arch/mips/pci/pci-octeon.c |  2 +-
 arch/x86/kernel/pci-dma.c  |  2 +-
 drivers/base/core.c|  4 +---
 drivers/xen/swiotlb-xen.c  |  2 +-
 include/linux/swiotlb.h| 25 +++-
 kernel/dma/swiotlb.c   | 39 +-
 7 files changed, 67 insertions(+), 9 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 3d826c0b5fee..0f32c14eb786 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -125,7 +125,7 @@ static int __init xen_mm_init(void)
return 0;
 
/* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
+   if (!is_swiotlb_allocated()) {
rc = swiotlb_init_late(swiotlb_size_or_default(),
   xen_swiotlb_gfp(), NULL);
if (rc < 0)
diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
index e457a18cbdc5..c5c4c1f7d5e4 100644
--- a/arch/mips/pci/pci-octeon.c
+++ b/arch/mips/pci/pci-octeon.c
@@ -664,7 +664,7 @@ static int __init octeon_pci_setup(void)
 
/* BAR1 movable regions contiguous to cover the swiotlb */
octeon_bar1_pci_phys =
-   io_tlb_default_mem.start & ~((1ull << 22) - 1);
+   default_swiotlb_start() & ~((1ull << 22) - 1);
 
for (index = 0; index < 32; index++) {
union cvmx_pci_bar1_indexx bar1_index;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index de6be0a3965e..08c6ffc3550f 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -90,7 +90,7 @@ int pci_xen_swiotlb_init_late(void)
return 0;
 
/* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
+   if (!is_swiotlb_allocated()) {
int rc = swiotlb_init_late(swiotlb_size_or_default(),
   GFP_KERNEL, xen_swiotlb_fixup);
if (rc < 0)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 3dff5037943e..46d1d78c5beb 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3108,9 +3108,7 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
-#ifdef CONFIG_SWIOTLB
-   dev->dma_io_tlb_mem = &io_tlb_default_mem;
-#endif
+   swiotlb_dev_init(dev);
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 67aa74d20162..946bd56f0ac5 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -381,7 +381,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct 
scatterlist *sgl,
 static int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-   return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask;
+   return xen_phys_to_dma(hwdev, default_swiotlb_limit()) <= mask;
 }
 
 const struct dma_map_ops xen_swiotlb_dma_ops = {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4e52cd5e0bdc..07216af59e93 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -110,7 +110,6 @@ struct io_tlb_mem {
atomic_long_t used_hiwater;
 #endif
 };
-extern struct io_tlb_mem io_tlb_default_mem;
 
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
@@ -128,13 +127,22 @@ static inline bool is_swiotlb_force_bounce(struct device 
*dev)
 
 void swiotlb_init(bool addressing_limited, unsigned int flags);
 void __init swiotlb_exit(void);
+void swiotlb_dev_init(struct device *dev);
 size_t swiotlb_max_mapping_size(struct device *dev);
+bool is_swiotlb_allocated(void);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+phys_addr_t default_swiotlb_start(void);
+phys_addr_t default_swiotlb_limit(void);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
 }
+
+static inline void swiotlb_dev_init(struct device *dev)
+{
+}
+
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
@@ -151,6 +159,11 @@ static inline size_t swiotlb_max_mapping_size(struct 
device *dev)
return SIZE_MAX;
 }
 
+static inline bool is_swiotlb_allocated(void)
+{
+   return false;
+}
+
 static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
@@ -159,6 +172,16 @@ static inline 

[PATCH v4 0/8] Allow dynamic allocation of software IO TLB bounce buffers

2023-07-13 Thread Petr Tesarik
From: Petr Tesarik 

Motivation
==

The software IO TLB was designed with these assumptions:

1) It would not be used much. Small systems (little RAM) don't need it, and
   big systems (lots of RAM) would have modern DMA controllers and an IOMMU
   chip to handle legacy devices.
2) A small fixed memory area (64 MiB by default) is sufficient to
   handle the few cases which require a bounce buffer.
3) 64 MiB is little enough that it has no impact on the rest of the
   system.
4) Bounce buffers require large contiguous chunks of low memory. Such
   memory is precious and can be allocated only early at boot.

It turns out they are not always true:

1) Embedded systems may have more than 4GiB RAM but no IOMMU and legacy
   32-bit peripheral busses and/or DMA controllers.
2) CoCo VMs use bounce buffers for all I/O but may need substantially more
   than 64 MiB.
3) Embedded developers put as many features as possible into the available
   memory. A few dozen "missing" megabytes may limit what features can be
   implemented.
4) If CMA is available, it can allocate large contiguous chunks even after
   the system has run for some time.

Goals
=

The goal of this work is to start with a small software IO TLB at boot and
expand it later when/if needed.

Design
==

This version of the patch series retains the current slot allocation
algorithm with multiple areas to reduce lock contention, but additional
slots can be added when necessary.

These alternatives have been considered:

- Allocate and free buffers as needed using direct DMA API. This works
  quite well, except in CoCo VMs where each allocation/free requires
  decrypting/encrypting memory, which is a very expensive operation.

- Allocate a very large software IO TLB at boot, but allow migrating pages
  to/from it (like CMA does). For systems with CMA, this would mean two big
  allocations at boot. Finding the balance between CMA, SWIOTLB and rest of
  available RAM can be challenging. More importantly, there is no clear
  benefit compared to allocating SWIOTLB memory pools from the CMA.

Implementation Constraints
==

These constraints have been taken into account:

1) Minimize impact on devices which do not benefit from the change.
2) Minimize the number of memory decryption/encryption operations.
3) Avoid contention on a lock or atomic variable to preserve parallel
   scalability.

Additionally, the software IO TLB code is also used to implement restricted
DMA pools. These pools are restricted to a pre-defined physical memory
region and must not use any other memory. In other words, dynamic
allocation of memory pools must be disabled for restricted DMA pools.

Data Structures
===

The existing struct io_tlb_mem is the central type for a SWIOTLB allocator,
but it now contains multiple memory pools::

  io_tlb_mem
  +-+   io_tlb_pool
  | SWIOTLB |   +---+   +---+   +---+
  |allocator|-->|default|-->|dynamic|-->|dynamic|-->...
  | |   |memory |   |memory |   |memory |
  +-+   | pool  |   | pool  |   | pool  |
+---+   +---+   +---+

The allocator structure contains global state (such as flags and counters)
and structures needed to schedule new allocations. Each memory pool
contains the actual buffer slots and metadata. The first memory pool in the
list is the default memory pool allocated statically at early boot.

New memory pools are allocated from a kernel worker thread. That's because
bounce buffers are allocated when mapping a DMA buffer, which may happen in
interrupt context where large atomic allocations would probably fail.
Allocation from process context is much more likely to succeed, especially
if it can use CMA.

Nonetheless, the onset of a load spike may fill up the SWIOTLB before the
worker has a chance to run. In that case, try to allocate a small transient
memory pool to accommodate the request. If memory is encrypted and the
device cannot do DMA to encrypted memory, this buffer is allocated from the
coherent atomic DMA memory pool. Reducing the size of SWIOTLB may therefore
require increasing the size of the coherent pool with the "coherent_pool"
command-line parameter.

Performance
===

All testing compared a vanilla v6.4-rc6 kernel with a fully patched
kernel. The kernel was booted with "swiotlb=force" to allow stress-testing
the software IO TLB on a high-performance device that would otherwise not
need it. CONFIG_DEBUG_FS was set to 'y' to match the configuration of
popular distribution kernels; it is understood that parallel workloads
suffer from contention on the recently added debugfs atomic counters.

These benchmarks were run:

- small: single-threaded I/O of 4 KiB blocks,
- big: single-threaded I/O of 64 KiB blocks,
- 4way: 4-way parallel I/O of 4 KiB blocks.

In all tested cases, the default 64 MiB SWIOTLB would be sufficient (but
wasteful). The "default" 

[PATCH v3 0/7] Allow dynamic allocation of software IO TLB bounce buffers

2023-06-27 Thread Petr Tesarik
From: Petr Tesarik 

Note: This patch series depends on fixes from this thread:

https://lore.kernel.org/linux-iommu/cover.1687784289.git.petr.tesarik@huawei.com/T/

Motivation
==

The software IO TLB was designed with these assumptions:

1) It would not be used much. Small systems (little RAM) don't need it, and
   big systems (lots of RAM) would have modern DMA controllers and an IOMMU
   chip to handle legacy devices.
2) A small fixed memory area (64 MiB by default) is sufficient to
   handle the few cases which require a bounce buffer.
3) 64 MiB is little enough that it has no impact on the rest of the
   system.
4) Bounce buffers require large contiguous chunks of low memory. Such
   memory is precious and can be allocated only early at boot.

It turns out they are not always true:

1) Embedded systems may have more than 4GiB RAM but no IOMMU and legacy
   32-bit peripheral busses and/or DMA controllers.
2) CoCo VMs use bounce buffers for all I/O but may need substantially more
   than 64 MiB.
3) Embedded developers put as many features as possible into the available
   memory. A few dozen "missing" megabytes may limit what features can be
   implemented.
4) If CMA is available, it can allocate large contiguous chunks even after
   the system has run for some time.

Goals
=

The goal of this work is to start with a small software IO TLB at boot and
expand it later when/if needed.

Design
==

This version of the patch series retains the current slot allocation
algorithm with multiple areas to reduce lock contention, but additional
slots can be added when necessary.

These alternatives have been considered:

- Allocate and free buffers as needed using direct DMA API. This works
  quite well, except in CoCo VMs where each allocation/free requires
  decrypting/encrypting memory, which is a very expensive operation.

- Allocate a very large software IO TLB at boot, but allow migrating pages
  to/from it (like CMA does). For systems with CMA, this would mean two big
  allocations at boot. Finding the balance between CMA, SWIOTLB and rest of
  available RAM can be challenging. More importantly, there is no clear
  benefit compared to allocating SWIOTLB memory pools from the CMA.

Implementation Constraints
==

These constraints have been taken into account:

1) Minimize impact on devices which do not benefit from the change.
2) Minimize the number of memory decryption/encryption operations.
3) Avoid contention on a lock or atomic variable to preserve parallel
   scalability.

Additionally, the software IO TLB code is also used to implement restricted
DMA pools. These pools are restricted to a pre-defined physical memory
region and must not use any other memory. In other words, dynamic
allocation of memory pools must be disabled for restricted DMA pools.

Data Structures
===

The existing struct io_tlb_mem is the central type for a SWIOTLB allocator,
but it now contains multiple memory pools::

  io_tlb_mem
  +-+   io_tlb_pool
  | SWIOTLB |   +---+   +---+   +---+
  |allocator|-->|default|-->|dynamic|-->|dynamic|-->...
  | |   |memory |   |memory |   |memory |
  +-+   | pool  |   | pool  |   | pool  |
+---+   +---+   +---+

The allocator structure contains global state (such as flags and counters)
and structures needed to schedule new allocations. Each memory pool
contains the actual buffer slots and metadata. The first memory pool in the
list is the default memory pool allocated statically at early boot.

New memory pools are allocated from a kernel worker thread. That's because
bounce buffers are allocated when mapping a DMA buffer, which may happen in
interrupt context where large atomic allocations would probably fail.
Allocation from process context is much more likely to succeed, especially
if it can use CMA.

Nonetheless, the onset of a load spike may fill up the SWIOTLB before the
worker has a chance to run. In that case, try to allocate a small transient
memory pool to accommodate the request. If memory is encrypted and the
device cannot do DMA to encrypted memory, this buffer is allocated from the
coherent atomic DMA memory pool. Reducing the size of SWIOTLB may therefore
require increasing the size of the coherent pool with the "coherent_pool"
command-line parameter.

Performance
===

All testing compared a vanilla v6.4-rc6 kernel with a fully patched
kernel. The kernel was booted with "swiotlb=force" to allow stress-testing
the software IO TLB on a high-performance device that would otherwise not
need it. CONFIG_DEBUG_FS was set to 'y' to match the configuration of
popular distribution kernels; it is understood that parallel workloads
suffer from contention on the recently added debugfs atomic counters.

These benchmarks were run:

- small: single-threaded I/O of 4 KiB blocks,
- big: single-threaded I/O of 64 KiB blocks,

[PATCH v3 6/7] swiotlb: allocate a new memory pool when existing pools are full

2023-06-27 Thread Petr Tesarik
From: Petr Tesarik 

When swiotlb_find_slots() cannot find suitable slots, schedule the
allocation of a new memory pool. It is not possible to allocate the pool
immediately, because this code may run in interrupt context, which is not
suitable for large memory allocations. This means that the memory pool will
be available too late for the currently requested mapping, but the stress
on the software IO TLB allocator is likely to continue, and subsequent
allocations will benefit from the additional pool eventually.

Keep all memory pools for an allocator in an RCU list to avoid locking on
the read side. For modifications, add a new spinlock to struct io_tlb_mem.

The spinlock also protects updates to the total number of slabs (nslabs in
struct io_tlb_mem), but not reads of the value. Readers may therefore
encounter a stale value, but this is not an issue:

- swiotlb_tbl_map_single() and is_swiotlb_active() only check for non-zero
  value. This is ensured by the existence of the default memory pool,
  allocated at boot.

- The exact value is used only for non-critical purposes (debugfs, kernel
  messages).

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |   9 ++-
 kernel/dma/swiotlb.c| 141 +++-
 2 files changed, 119 insertions(+), 31 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4a3af1c216d0..ae402890ba41 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct device;
 struct page;
@@ -103,12 +104,14 @@ struct io_tlb_pool {
 
 /**
  * struct io_tlb_mem - Software IO TLB allocator
- * @pool:  IO TLB memory pool descriptor.
+ * @lock:  Lock to synchronize changes to the list.
+ * @pools: List of IO TLB memory pool descriptors.
  * @nslabs:Total number of IO TLB slabs in all pools.
  * @phys_limit:Maximum allowed physical address.
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @dyn_alloc: Dynamic IO TLB pool allocation work.
 * @total_used:The total number of slots in the pool that are currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -116,12 +119,14 @@ struct io_tlb_pool {
  * in debugfs.
  */
 struct io_tlb_mem {
-   struct io_tlb_pool *pool;
+   spinlock_t lock;
+   struct list_head pools;
unsigned long nslabs;
u64 phys_limit;
struct dentry *debugfs;
bool force_bounce;
bool for_alloc;
+   struct work_struct dyn_alloc;
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 5bb83097ade6..7661a6402e80 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -79,9 +79,14 @@ struct io_tlb_slot {
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
+static void swiotlb_dyn_alloc(struct work_struct *work);
+
 static struct io_tlb_pool io_tlb_default_pool;
-static struct io_tlb_mem io_tlb_default_mem = {
-   .pool = &io_tlb_default_pool,
+struct io_tlb_mem io_tlb_default_mem = {
+   .lock = __SPIN_LOCK_UNLOCKED(io_tlb_default_mem.lock),
+   .pools = LIST_HEAD_INIT(io_tlb_default_mem.pools),
+   .dyn_alloc = __WORK_INITIALIZER(io_tlb_default_mem.dyn_alloc,
+   swiotlb_dyn_alloc),
 };
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
@@ -281,6 +286,19 @@ static void swiotlb_init_io_tlb_pool(struct io_tlb_pool *mem, phys_addr_t start,
return;
 }
 
+/**
+ * add_mem_pool() - add a memory pool to the allocator
+ * @mem:   Software IO TLB allocator.
+ * @pool:  Memory pool to be added.
+ */
+static void add_mem_pool(struct io_tlb_mem *mem, struct io_tlb_pool *pool)
+{
+   spin_lock(&mem->lock);
+   list_add_rcu(&pool->node, &mem->pools);
+   mem->nslabs += pool->nslabs;
+   spin_unlock(&mem->lock);
+}
+
 static void __init *swiotlb_memblock_alloc(unsigned long nslabs,
unsigned int flags,
int (*remap)(void *tlb, unsigned long nslabs))
@@ -372,7 +390,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
 
swiotlb_init_io_tlb_pool(mem, __pa(tlb), nslabs, false,
 default_nareas);
-   io_tlb_default_mem.nslabs = nslabs;
+   add_mem_pool(&io_tlb_default_mem, mem);
 
if (flags & SWIOTLB_VERBOSE)
swiotlb_print_info();
@@ -463,7 +481,7 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 (nslabs << IO_TLB_SHIFT) >> PAGE_SHIFT);
swiotlb_init_io_tlb_pool(mem, virt_to_phys(vstart), nslabs, true,
 nareas);
-   io_tlb_default_m

[PATCH v3 7/7] swiotlb: search the software IO TLB only if a device makes use of it

2023-06-27 Thread Petr Tesarik
From: Petr Tesarik 

Skip searching the software IO TLB if a device has never used it, making
sure these devices are not affected by the introduction of multiple IO TLB
memory pools.

An additional memory barrier is required to ensure that the new value of the
flag is visible to other CPUs after mapping a new bounce buffer. For
efficiency, the flag check should be inlined, and then the memory barrier
must be moved to is_swiotlb_buffer(). However, it can replace the existing
barrier in swiotlb_find_pool(), because all callers use is_swiotlb_buffer()
first to verify that the buffer address belongs to the software IO TLB.
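
Condensed, the pairing looks like this (illustrative helpers only, not the
literal hunks below):

static void note_io_tlb_used(struct device *dev)        /* map side */
{
        dev->dma_uses_io_tlb = true;
        smp_wmb();      /* publish flag and RCU list before readers */
}

static bool maybe_io_tlb(struct device *dev, phys_addr_t paddr) /* lookup side */
{
        smp_rmb();      /* pairs with the smp_wmb() above */
        return dev->dma_uses_io_tlb && !!swiotlb_find_pool(dev, paddr);
}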

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |  2 ++
 include/linux/swiotlb.h |  6 +-
 kernel/dma/swiotlb.c| 14 ++
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index a1ee4c5924b8..d4b35925a6d1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -512,6 +512,7 @@ struct device_physical_location {
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
  * @dma_io_tlb_lock:   Protects changes to the list of active pools.
+ * @dma_uses_io_tlb: %true if device has used the software IO TLB.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -619,6 +620,7 @@ struct device {
struct io_tlb_mem *dma_io_tlb_mem;
struct list_head dma_io_tlb_pools;
spinlock_t dma_io_tlb_lock;
+   bool dma_uses_io_tlb;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index ae402890ba41..eb30b5c624f1 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -148,7 +148,11 @@ struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr);
  */
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   return dev->dma_io_tlb_mem &&
+   /* Pairs with smp_wmb() in swiotlb_find_slots() and
+* swiotlb_dyn_alloc(), which modify the RCU lists.
+*/
+   smp_rmb();
+   return dev->dma_uses_io_tlb &&
!!swiotlb_find_pool(dev, paddr);
 }
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 7661a6402e80..26ab9ed2921b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -701,7 +701,7 @@ static void swiotlb_dyn_alloc(struct work_struct *work)
 
add_mem_pool(mem, pool);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
+   /* Pairs with smp_rmb() in is_swiotlb_buffer(). */
smp_wmb();
 }
 
@@ -729,6 +729,7 @@ void swiotlb_dev_init(struct device *dev)
dev->dma_io_tlb_mem = &io_tlb_default_mem;
INIT_LIST_HEAD(&dev->dma_io_tlb_pools);
spin_lock_init(&dev->dma_io_tlb_lock);
+   dev->dma_uses_io_tlb = false;
 }
 
 /**
@@ -746,11 +747,6 @@ struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr)
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
struct io_tlb_pool *pool;
 
-   /* Pairs with smp_wmb() in swiotlb_find_slots() and
-* swiotlb_dyn_alloc(), which modify the RCU lists.
-*/
-   smp_rmb();
-
rcu_read_lock();
list_for_each_entry_rcu(pool, &mem->pools, node) {
if (paddr >= pool->start && paddr < pool->end)
@@ -1125,9 +1121,11 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
list_add_rcu(&pool->node, &dev->dma_io_tlb_pools);
spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);
 
-   /* Pairs with smp_rmb() in swiotlb_find_pool(). */
-   smp_wmb();
 found:
+   dev->dma_uses_io_tlb = true;
+   /* Pairs with smp_rmb() in is_swiotlb_buffer() */
+   smp_wmb();
+
*retpool = pool;
return index;
 }
-- 
2.25.1




[PATCH v3 3/7] swiotlb: separate memory pool data from other allocator data

2023-06-27 Thread Petr Tesarik
From: Petr Tesarik 

Carve out memory pool specific fields from struct io_tlb_mem. The original
struct now contains shared data for the whole allocator, while the new
struct io_tlb_pool contains data that is specific to one memory pool of
(potentially) many.

Allocate both structures together for restricted DMA pools to keep the
error cleanup path simple.
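
Roughly, the restricted-DMA setup can then make a single allocation (sketch
only, not the literal hunk):

        struct io_tlb_mem *mem;

        mem = kzalloc(sizeof(*mem) + sizeof(struct io_tlb_pool), GFP_KERNEL);
        if (!mem)
                return -ENOMEM;
        mem->pool = (struct io_tlb_pool *)(mem + 1);
        /* any later failure only needs kfree(mem) */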

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   2 +-
 include/linux/swiotlb.h |  49 ++-
 kernel/dma/swiotlb.c| 181 +---
 3 files changed, 147 insertions(+), 85 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 472dd24d4823..83081aa99e6a 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -509,7 +509,7 @@ struct device_physical_location {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
- * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
+ * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 9f486843a66a..0aa6868cb68c 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,8 +62,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 
 /**
- * struct io_tlb_mem - IO TLB Memory Pool Descriptor
- *
+ * struct io_tlb_pool - IO TLB memory pool descriptor
  * @start: The start address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
@@ -73,16 +72,36 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
  * may be remapped in the memory encrypted case and store virtual
  * address for bounce buffer operation.
- * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. For default swiotlb, this is command line adjustable via
- * setup_io_tlb_npages.
- * @used:  The number of used IO TLB block.
+ * @nslabs:The number of IO TLB slots between @start and @end. For the
+ * default swiotlb, this can be adjusted with a boot parameter,
+ * see setup_io_tlb_npages().
+ * @used:  The number of used IO TLB slots.
+ * @late_alloc:%true if allocated using the page allocator.
+ * @nareas:Number of areas in the pool.
+ * @area_nslabs: Number of slots in each area.
+ * @areas: Array of memory area descriptors.
+ * @slots: Array of slot descriptors.
+ */
+struct io_tlb_pool {
+   phys_addr_t start;
+   phys_addr_t end;
+   void *vaddr;
+   unsigned long nslabs;
+   unsigned long used;
+   bool late_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
+   struct io_tlb_slot *slots;
+};
+
+/**
+ * struct io_tlb_mem - Software IO TLB allocator
+ * @pool:  IO TLB memory pool descriptor.
+ * @nslabs:Total number of IO TLB slabs in all pools.
  * @debugfs:   The dentry to debugfs.
- * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
- * @nareas:  The area number in the pool.
- * @area_nslabs: The slot number in the area.
 * @total_used:The total number of slots in the pool that are currently used
  * across all areas. Used only for calculating used_hiwater in
  * debugfs.
@@ -90,19 +109,11 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * in debugfs.
  */
 struct io_tlb_mem {
-   phys_addr_t start;
-   phys_addr_t end;
-   void *vaddr;
+   struct io_tlb_pool *pool;
unsigned long nslabs;
-   unsigned long used;
struct dentry *debugfs;
-   bool late_alloc;
bool force_bounce;
bool for_alloc;
-   unsigned int nareas;
-   unsigned int area_nslabs;
-   struct io_tlb_area *areas;
-   struct io_tlb_slot *slots;
 #ifdef CONFIG_DEBUG_FS
atomic_long_t total_used;
atomic_long_t used_hiwater;
@@ -124,7 +135,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
-   return mem && paddr >= mem->start && paddr < mem->end;
+   return mem && paddr >= mem->pool->start && paddr < mem->pool->end;
 }
 
 static inline bool is_swiotlb_force_bounce(struct device *dev)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 888c25a9

[PATCH v3 1/7] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-06-27 Thread Petr Tesarik
From: Petr Tesarik 

SWIOTLB implementation details should not be exposed to the rest of the
kernel. This makes it possible to change the implementation without
modifying non-swiotlb code.

To avoid breaking existing users, provide helper functions for the few
required fields. Enhance is_swiotlb_active() to work with the default IO
TLB pool when passed a NULL device.
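
Callers outside swiotlb.c then follow this pattern (cf. the hunks below; a
NULL device queries the default IO TLB):

        int rc;

        if (!is_swiotlb_active(NULL)) {
                /* the default software IO TLB was never initialized */
                rc = swiotlb_init_late(swiotlb_size_or_default(),
                                       GFP_KERNEL, NULL);
                if (rc < 0)
                        return rc;
        }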

As a bonus, using a helper function to initialize struct device makes it
possible to get rid of an #ifdef in driver core.

Signed-off-by: Petr Tesarik 
---
 arch/arm/xen/mm.c  |  2 +-
 arch/mips/pci/pci-octeon.c |  2 +-
 arch/x86/kernel/pci-dma.c  |  2 +-
 drivers/base/core.c|  4 +---
 drivers/xen/swiotlb-xen.c  |  2 +-
 include/linux/swiotlb.h| 17 -
 kernel/dma/swiotlb.c   | 39 --
 7 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 3d826c0b5fee..529aabc1be38 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -125,7 +125,7 @@ static int __init xen_mm_init(void)
return 0;
 
/* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
+   if (!is_swiotlb_active(NULL)) {
rc = swiotlb_init_late(swiotlb_size_or_default(),
   xen_swiotlb_gfp(), NULL);
if (rc < 0)
diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
index e457a18cbdc5..c5c4c1f7d5e4 100644
--- a/arch/mips/pci/pci-octeon.c
+++ b/arch/mips/pci/pci-octeon.c
@@ -664,7 +664,7 @@ static int __init octeon_pci_setup(void)
 
/* BAR1 movable regions contiguous to cover the swiotlb */
octeon_bar1_pci_phys =
-   io_tlb_default_mem.start & ~((1ull << 22) - 1);
+   default_swiotlb_start() & ~((1ull << 22) - 1);
 
for (index = 0; index < 32; index++) {
union cvmx_pci_bar1_indexx bar1_index;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index de6be0a3965e..33e960672837 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -90,7 +90,7 @@ int pci_xen_swiotlb_init_late(void)
return 0;
 
/* we can work with the default swiotlb */
-   if (!io_tlb_default_mem.nslabs) {
+   if (!is_swiotlb_active(NULL)) {
int rc = swiotlb_init_late(swiotlb_size_or_default(),
   GFP_KERNEL, xen_swiotlb_fixup);
if (rc < 0)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 3dff5037943e..46d1d78c5beb 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3108,9 +3108,7 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
-#ifdef CONFIG_SWIOTLB
-   dev->dma_io_tlb_mem = &io_tlb_default_mem;
-#endif
+   swiotlb_dev_init(dev);
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 67aa74d20162..946bd56f0ac5 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -381,7 +381,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
 static int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
-   return xen_phys_to_dma(hwdev, io_tlb_default_mem.end - 1) <= mask;
+   return xen_phys_to_dma(hwdev, default_swiotlb_limit()) <= mask;
 }
 
 const struct dma_map_ops xen_swiotlb_dma_ops = {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7af2673b47ba..b38045be532d 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -112,7 +112,6 @@ struct io_tlb_mem {
atomic_long_t used_hiwater;
 #endif
 };
-extern struct io_tlb_mem io_tlb_default_mem;
 
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
@@ -130,13 +129,19 @@ static inline bool is_swiotlb_force_bounce(struct device *dev)
 
 void swiotlb_init(bool addressing_limited, unsigned int flags);
 void __init swiotlb_exit(void);
+void swiotlb_dev_init(struct device *dev);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+phys_addr_t default_swiotlb_start(void);
+phys_addr_t default_swiotlb_limit(void);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
 }
+static inline void swiotlb_dev_init(struct device *dev)
+{
+}
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
@@ -161,6 +166,16 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline phys_addr_t default_swiotlb_start(void)
+{
+   return 0;
+}
+
+static inline phys_addr_t defaul

[PATCH v3 5/7] swiotlb: determine potential physical address limit

2023-06-27 Thread Petr Tesarik
From: Petr Tesarik 

The value returned by default_swiotlb_limit() should not change, because it
is used to decide whether DMA can be used. To allow allocating memory pools
on the fly, use the maximum possible physical address rather than the
highest address used by the default pool.

For swiotlb_init_remap(), this is either an arch-specific limit used by
memblock_alloc_low(), or the highest directly mapped physical address if
the initialization flags include SWIOTLB_ANY. For swiotlb_init_late(), the
highest address is determined by the GFP flags.
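
A consumer can then compare a device's DMA mask against the worst case, e.g.
(hypothetical helper, mirroring the Xen caller converted earlier in the
series):

static int foo_dma_supported(struct device *dev, u64 mask)
{
        /* every possible bounce buffer must stay below the mask */
        return phys_to_dma(dev, default_swiotlb_limit()) <= mask;
}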

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h |  2 ++
 kernel/dma/swiotlb.c| 11 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index ae1688438850..4a3af1c216d0 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -105,6 +105,7 @@ struct io_tlb_pool {
  * struct io_tlb_mem - Software IO TLB allocator
  * @pool:  IO TLB memory pool descriptor.
  * @nslabs:Total number of IO TLB slabs in all pools.
+ * @phys_limit:Maximum allowed physical address.
  * @debugfs:   The dentry to debugfs.
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
@@ -117,6 +118,7 @@ struct io_tlb_pool {
 struct io_tlb_mem {
struct io_tlb_pool *pool;
unsigned long nslabs;
+   u64 phys_limit;
struct dentry *debugfs;
bool force_bounce;
bool for_alloc;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 06b4fa7c2e9b..5bb83097ade6 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -333,6 +333,9 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
 
io_tlb_default_mem.force_bounce =
swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
+   io_tlb_default_mem.phys_limit = flags & SWIOTLB_ANY
+   ? virt_to_phys(high_memory - 1)
+   : ARCH_LOW_ADDRESS_LIMIT;
 
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
@@ -400,6 +403,12 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
return 0;
 
io_tlb_default_mem.force_bounce = swiotlb_force_bounce;
+   io_tlb_default_mem.phys_limit =
+   IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA)
+   ? DMA_BIT_MASK(zone_dma_bits)
+   : (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32)
+  ? DMA_BIT_MASK(32)
+  : virt_to_phys(high_memory - 1));
 
if (!default_nareas)
swiotlb_adjust_nareas(num_possible_cpus());
@@ -1308,7 +1317,7 @@ phys_addr_t default_swiotlb_start(void)
  */
 phys_addr_t default_swiotlb_limit(void)
 {
-   return io_tlb_default_pool.end - 1;
+   return io_tlb_default_mem.phys_limit;
 }
 
 #ifdef CONFIG_DEBUG_FS
-- 
2.25.1




[PATCH v3 2/7] swiotlb: add documentation and rename swiotlb_do_find_slots()

2023-06-27 Thread Petr Tesarik
From: Petr Tesarik 

Add some kernel-doc comments and move the existing documentation of struct
io_tlb_slot to its correct location. The latter was forgotten in commit
942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").

Use the opportunity to give swiotlb_do_find_slots() a more descriptive
name, which makes it clear how it differs from swiotlb_find_slots().

Signed-off-by: Petr Tesarik 
---
 include/linux/swiotlb.h | 15 
 kernel/dma/swiotlb.c| 52 -
 2 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index b38045be532d..9f486843a66a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -77,10 +77,6 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
  * @used:  The number of used IO TLB block.
- * @list:  The free list describing the number of free entries available
- * from each index.
- * @orig_addr: The original address corresponding to a mapped entry.
- * @alloc_size:Size of the allocated buffer.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
@@ -113,6 +109,17 @@ struct io_tlb_mem {
 #endif
 };
 
+/**
+ * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb
+ * @dev:Device which has mapped the buffer.
+ * @paddr:  Physical address within the DMA buffer.
+ *
+ * Check if @paddr points into a bounce buffer.
+ *
+ * Return:
+ * * %true if @paddr points into a bounce buffer
+ * * %false otherwise
+ */
 static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 63d318059f27..888c25a9bfcc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -62,6 +62,13 @@
 
 #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
+/**
+ * struct io_tlb_slot - IO TLB slot descriptor
+ * @orig_addr: The original address corresponding to a mapped entry.
+ * @alloc_size:Size of the allocated buffer.
+ * @list:  The free list describing the number of free entries available
+ * from each index.
+ */
 struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -632,11 +639,22 @@ static void dec_used(struct io_tlb_mem *mem, unsigned int nslots)
 }
 #endif /* CONFIG_DEBUG_FS */
 
-/*
- * Find a suitable number of IO TLB entries size that will fit this request and
- * allocate a buffer from that IO TLB pool.
+/**
+ * area_find_slots() - search for slots in one IO TLB memory area
+ * @dev:   Device which maps the buffer.
+ * @area_index:Index of the IO TLB memory area to be searched.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Find a suitable sequence of IO TLB entries for the request and allocate
+ * a buffer from the given IO TLB memory area.
+ * This function takes care of locking.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
  */
-static int swiotlb_do_find_slots(struct device *dev, int area_index,
+static int area_find_slots(struct device *dev, int area_index,
phys_addr_t orig_addr, size_t alloc_size,
unsigned int alloc_align_mask)
 {
@@ -731,6 +749,19 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
return slot_index;
 }
 
+/**
+ * swiotlb_find_slots() - search for slots in the whole swiotlb
+ * @dev:   Device which maps the buffer.
+ * @orig_addr: Original (non-bounced) IO buffer address.
+ * @alloc_size: Total requested size of the bounce buffer,
+ * including initial alignment padding.
+ * @alloc_align_mask:  Required alignment of the allocated buffer.
+ *
+ * Search through the whole software IO TLB to find a sequence of slots that
+ * match the allocation constraints.
+ *
+ * Return: Index of the first allocated slot, or -1 on error.
+ */
 static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size, unsigned int alloc_align_mask)
 {
@@ -739,8 +770,8 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
int i = start, index;
 
do {
-   index = swiotlb_do_find_slots(dev, i, orig_addr, alloc_size,
- alloc_align_mask);
+   index = area_find_slots(dev, i, orig_addr, alloc_size,
+   alloc_align_mask);
if (index >= 0)
return index;
if (++

[PATCH v3 4/7] swiotlb: if swiotlb is full, fall back to a transient memory pool

2023-06-27 Thread Petr Tesarik
From: Petr Tesarik 

Try to allocate a transient memory pool if no suitable slots can be found,
except when allocating from a restricted pool. The transient pool is just
big enough for this one bounce buffer. It is inserted into a per-device
list of transient memory pools, and it is freed again when the bounce
buffer is unmapped.

Transient memory pools are kept in an RCU list. A memory barrier is
required after adding a new entry, because any address within a transient
buffer must be immediately recognized as belonging to the SWIOTLB, even if
it is passed to another CPU.

Deletion does not require any synchronization beyond RCU ordering
guarantees. After a buffer is unmapped, its physical addresses may no
longer be passed to the DMA API, so the memory range of the corresponding
stale entry in the RCU list never matches. If the memory range gets
allocated again, that happens only after an RCU quiescent state.
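
A sketch of the unmap-side teardown implied by the above (the function name
is invented; the pool is freed by swiotlb_dyn_free() only after the grace
period):

static void swiotlb_del_transient_pool(struct device *dev,
                                       struct io_tlb_pool *pool)
{
        unsigned long flags;

        spin_lock_irqsave(&dev->dma_io_tlb_lock, flags);
        list_del_rcu(&pool->node);
        spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags);

        /* readers may still see the stale entry until then */
        call_rcu(&pool->rcu, swiotlb_dyn_free);
}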

Since bounce buffers can now be allocated from different pools, add a
parameter to swiotlb_alloc_pool() to let the caller know which memory pool
is used. Add swiotlb_find_pool() to find the memory pool corresponding to
an address. This function is now also used by is_swiotlb_buffer(), because
a simple boundary check is no longer sufficient.

The logic in swiotlb_alloc_tlb() is taken from __dma_direct_alloc_pages(),
simplified and enhanced to use coherent memory pools if needed.

Note that this is not the most efficient way to provide a bounce buffer,
but when a DMA buffer can't be mapped, something may (and will) actually
break. At that point it is better to make an allocation, even if it may be
an expensive operation.

Signed-off-by: Petr Tesarik 
---
 include/linux/device.h  |   4 +
 include/linux/dma-mapping.h |   2 +
 include/linux/swiotlb.h |  13 +-
 kernel/dma/direct.c |   2 +-
 kernel/dma/swiotlb.c| 265 ++--
 5 files changed, 272 insertions(+), 14 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 83081aa99e6a..a1ee4c5924b8 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -510,6 +510,8 @@ struct device_physical_location {
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
  * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
+ * @dma_io_tlb_pools:  List of transient swiotlb memory pools.
+ * @dma_io_tlb_lock:   Protects changes to the list of active pools.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -615,6 +617,8 @@ struct device {
 #endif
 #ifdef CONFIG_SWIOTLB
struct io_tlb_mem *dma_io_tlb_mem;
+   struct list_head dma_io_tlb_pools;
+   spinlock_t dma_io_tlb_lock;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0ee20b764000..c36c5a546787 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -417,6 +417,8 @@ static inline void dma_sync_sgtable_for_device(struct device *dev,
 #define dma_get_sgtable(d, t, v, h, s) dma_get_sgtable_attrs(d, t, v, h, s, 0)
 #define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, 0)
 
+bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size);
+
 static inline void *dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp)
 {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 0aa6868cb68c..ae1688438850 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -63,6 +63,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 
 /**
  * struct io_tlb_pool - IO TLB memory pool descriptor
+ * @node:  Member of the IO TLB memory pool list.
  * @start: The start address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
@@ -77,22 +78,27 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
  * see setup_io_tlb_npages().
  * @used:  The number of used IO TLB slots.
  * @late_alloc:%true if allocated using the page allocator.
+ * @transient:  %true if transient memory pool.
  * @nareas:Number of areas in the pool.
  * @area_nslabs: Number of slots in each area.
  * @areas: Array of memory area descriptors.
  * @slots: Array of slot descriptors.
+ * @rcu:   RCU head for swiotlb_dyn_free().
  */
 struct io_tlb_pool {
+   struct list_head node;
phys_addr_t start;
phys_addr_t end;
void *vaddr;
unsigned long nslabs;
unsigned long used;
bool late_alloc;
+   bool transient;
unsigned int nareas;
unsigned int area_nslabs;
struct io_tlb_area *areas;
struct io_tlb_slot *slots

Re: [Xen-devel] where can I find the 'address translation' code in Xen?

2018-04-30 Thread Petr Tesarik
Hi Minjun,

On Mon, 30 Apr 2018 17:17:36 +0900
Minjun Hong <nickey...@gmail.com> wrote:

> On Mon, Apr 30, 2018 at 3:44 PM, Petr Tesarik <ptesa...@suse.cz> wrote:
> 
> > Hi Minjun,
> >
> > On Sun, 29 Apr 2018 19:11:30 +0900
> > Minjun Hong <nickey...@gmail.com> wrote:
> >  
> > >[...]
> > > My question is,
> > >
> > > 1. Is it certain that the function will be called even though the HW already
> > > translates the address and populates the TLB entry?  
> >
> > I think you miss the point. The hardware page tables for PV domains
> > already contain machine addresses. In other words, virtual addresses
> > get translated directly to machine addresses by the hardware page table
> > walker.
> >  
> > > 2. I'm just asking, is there any code in Xen that is related to the
> > > behavior of the 'hardware walker'?  
> >
> > Not sure what you mean. Maybe this question is no longer pertinent
> > given the explanation above?
> >
> > HTH,
> > Petr T
> >
> > ___
> > Xen-devel mailing list
> > Xen-devel@lists.xenproject.org
> > https://lists.xenproject.org/mailman/listinfo/xen-devel  
> 
> 
> 
> Thanks for your kind answer, Petr.
> 
> In your answer to my first question, I was wondering why the function (
> guest_walk_tables()) should be called even after the address translation
> and TLB filling has been completed by the hardware page table walker.
> As you mentioned, the original goal (address translation) is achieved by the
> hardware. If so, is its role like the bottom half of an interrupt?

Ah, now I'm starting to get your point. So, you're interested in
what happens after guest page tables are updated.

> I want to know whether this function is always called after the translation
> by the hardware page walker.

It seems to me that there are quite a few ways to avoid a full page
table walk, but I'm not an expert on this topic, sorry.

Petr T


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] where can I find the 'address translation' code in Xen?

2018-04-30 Thread Petr Tesarik
Hi Minjun,

On Sun, 29 Apr 2018 19:11:30 +0900
Minjun Hong  wrote:

>[...]
> My question is,
> 
> 1. Is it certain that the function will be called even though the HW already
> translates the address and populates the TLB entry?

I think you miss the point. The hardware page tables for PV domains
already contain machine addresses. In other words, virtual addresses
get translated directly to machine addresses by the hardware page table
walker.

> 2. I'm just asking, is there any code in Xen that is related to the
> behavior of the 'hardware walker'?

Not sure what you mean. Maybe this question is no longer pertinent
given the explanation above?

HTH,
Petr T

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Should PV frontend drivers trust the backends?

2018-04-26 Thread Petr Tesarik
On Wed, 25 Apr 2018 13:47:09 +
Paul Durrant  wrote:

> > -Original Message-
> > From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On
> > Behalf Of Juergen Gross
> > Sent: 25 April 2018 13:43
> > To: xen-devel 
> > Subject: [Xen-devel] Should PV frontend drivers trust the backends?
> > 
> > This is a followup of a discussion on IRC:
> > 
> > The main question of the discussion was: "Should frontend drivers
> > trust their backends not doing malicious actions?"
> > 
>[...]
> I see the general question as being analogous to 'should a Linux
> device driver trust its hardware' and I think the answer for a
> general purpose OS like linux is 'yes'.

I can see how this is analogous, but it's not identical. Traditionally,
hardware has full control of the system anyway, so it makes little
sense to distrust it. It does make sense to validate the data and retry
an invalid operation (if possible) or crash the system (if technically
impossible).

However, a backend driver runs in a domain that is not much different
from the domain running the frontend driver, so it is theoretically
possible to implement some support in Xen itself. Now, if you're asking
whether Xen _should_ add complex handling of resilient domain-to-domain
communication, that's purely a matter of taste.

FWIW my vote is: Do nothing. Keep Xen architecture simple.

Petr T

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] x86: Do not reserve a crash kernel region if booted on Xen PV

2018-04-25 Thread Petr Tesarik
Xen PV domains cannot shut down and start a crash kernel. Instead,
the crashing kernel makes a SCHEDOP_shutdown hypercall with the
reason code SHUTDOWN_crash, cf. xen_crash_shutdown() machine op in
arch/x86/xen/enlighten_pv.c.
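
For reference, that path boils down to roughly the following (condensed; not
part of this patch):

        struct sched_shutdown r = { .reason = SHUTDOWN_crash };

        /* hand control back to Xen; no crash kernel is ever entered */
        if (HYPERVISOR_sched_op(SCHEDOP_shutdown, &r))
                BUG();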

A crash kernel reservation is merely a waste of RAM in this case. It
may also confuse users of kexec_load(2) and/or kexec_file_load(2).
When flags include KEXEC_ON_CRASH or KEXEC_FILE_ON_CRASH,
respectively, these syscalls return success, which is technically
correct, but the crash kexec image will never actually be used.

Signed-off-by: Petr Tesarik <ptesa...@suse.com>
---
 arch/x86/kernel/setup.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6285697b6e56..5c623dfe39d1 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -534,6 +535,11 @@ static void __init reserve_crashkernel(void)
high = true;
}
 
+   if (xen_pv_domain()) {
+   pr_info("Ignoring crashkernel for a Xen PV domain\n");
+   return;
+   }
+
/* 0 means: find the address automatically */
if (crash_base <= 0) {
/*
-- 
2.13.6

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel