Re: cleanup swiotlb initialization v8

2022-04-06 Thread Konrad Rzeszutek Wilk
On Mon, Apr 04, 2022 at 07:05:44AM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> this series tries to clean up the swiotlb initialization, including
> that of swiotlb-xen.  To get there it also removes the x86 iommu table
> infrastructure that massively obfuscates the initialization path.
> 
> Git tree:
> 
> git://git.infradead.org/users/hch/misc.git swiotlb-init-cleanup
> 
> Gitweb:
> 
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-init-cleanup
> 
> Changes since v7:
>  - rebased to Linux 5.18-rc1
>  - better document the lower bound swiotlb size for xen-swiotlb
>  - improve the nslabs calculation for the retry case in
>    swiotlb_init_remap and swiotlb_init_late

Hey Christoph,

Feel free to tack on

Reviewed-by: Konrad Rzeszutek Wilk 

on them if you would like.

Thank you for doing the spring cleaning of this codebase!


Re: [PATCH 10/15] swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction

2022-04-06 Thread Konrad Rzeszutek Wilk
> diff --git a/arch/powerpc/platforms/pseries/svm.c 
> b/arch/powerpc/platforms/pseries/svm.c
> index c5228f4969eb2..3b4045d508ec8 100644
> --- a/arch/powerpc/platforms/pseries/svm.c
> +++ b/arch/powerpc/platforms/pseries/svm.c
> @@ -28,7 +28,7 @@ static int __init init_svm(void)
>* need to use the SWIOTLB buffer for DMA even if dma_capable() says
>* otherwise.
>*/
> - swiotlb_force = SWIOTLB_FORCE;
> + ppc_swiotlb_flags |= SWIOTLB_ANY | SWIOTLB_FORCE;

This is the only place you set ppc_swiotlb_flags, so I wonder why
the '|=' instead of just '='?

Either way:
Reviewed-by: Konrad Rzeszutek Wilk 

Thank you!
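
For illustration, a minimal sketch of the '|=' vs '=' question, using
hypothetical bit values rather than the real powerpc definitions:

#include <stdio.h>

/* Hypothetical bit values, for illustration only. */
#define SWIOTLB_ANY   (1 << 0)
#define SWIOTLB_FORCE (1 << 1)

unsigned long ppc_swiotlb_flags;  /* zero-initialized global, as in the kernel */

int main(void)
{
        /* '|=' preserves any bits another setter may have set earlier... */
        ppc_swiotlb_flags |= SWIOTLB_ANY | SWIOTLB_FORCE;

        /* ...while '=' would overwrite them. With a single assignment site
         * the two behave identically, which is the point of the question. */
        printf("flags: %#lx\n", ppc_swiotlb_flags);
        return 0;
}

With only one assignment site '|=' is pure future-proofing against a
second setter appearing later.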


Re: [PATCH 07/15] x86: remove the IOMMU table infrastructure

2022-04-06 Thread Konrad Rzeszutek Wilk
On Mon, Apr 04, 2022 at 07:05:51AM +0200, Christoph Hellwig wrote:
> The IOMMU table tries to separate the different IOMMUs into different
> backends, but actually requires various cross calls.
> 
> Rewrite the code to do the generic swiotlb/swiotlb-xen setup directly
> in pci-dma.c and then just call into the IOMMU drivers.
> 
> Signed-off-by: Christoph Hellwig 

Hey Christoph,

There is a bit of background behind this - the whole IOMMU table
machinery was created because, at that point in time, pci_iommu_alloc
was getting way too unwieldy and there needed to be a more 'structured'
way to express dependencies.

Hence this creation... But as Christoph points out, it has gotten out
of hand. So smashing it back down to a simpler mechanism is good.

Reviewed-by: Konrad Rzeszutek Wilk 

Thank you!

> ---
>  arch/ia64/include/asm/iommu_table.h|   7 --
>  arch/x86/include/asm/dma-mapping.h |   1 -
>  arch/x86/include/asm/gart.h|   5 +-
>  arch/x86/include/asm/iommu.h   |   6 ++
>  arch/x86/include/asm/iommu_table.h | 102 ---
>  arch/x86/include/asm/swiotlb.h |  30 ---
>  arch/x86/include/asm/xen/swiotlb-xen.h |   2 -
>  arch/x86/kernel/Makefile   |   2 -
>  arch/x86/kernel/amd_gart_64.c  |   5 +-
>  arch/x86/kernel/aperture_64.c  |  14 ++--
>  arch/x86/kernel/pci-dma.c  | 107 -
>  arch/x86/kernel/pci-iommu_table.c  |  77 --
>  arch/x86/kernel/pci-swiotlb.c  |  77 --
>  arch/x86/kernel/tboot.c|   1 -
>  arch/x86/kernel/vmlinux.lds.S  |  12 ---
>  arch/x86/xen/Makefile  |   2 -
>  arch/x86/xen/pci-swiotlb-xen.c |  96 --
>  drivers/iommu/amd/init.c   |   6 --
>  drivers/iommu/amd/iommu.c  |   5 +-
>  drivers/iommu/intel/dmar.c |   6 +-
>  include/linux/dmar.h   |   6 +-
>  21 files changed, 110 insertions(+), 459 deletions(-)
>  delete mode 100644 arch/ia64/include/asm/iommu_table.h
>  delete mode 100644 arch/x86/include/asm/iommu_table.h
>  delete mode 100644 arch/x86/include/asm/swiotlb.h
>  delete mode 100644 arch/x86/kernel/pci-iommu_table.c
>  delete mode 100644 arch/x86/kernel/pci-swiotlb.c
>  delete mode 100644 arch/x86/xen/pci-swiotlb-xen.c
> 
> diff --git a/arch/ia64/include/asm/iommu_table.h 
> b/arch/ia64/include/asm/iommu_table.h
> deleted file mode 100644
> index cc96116ac276a..0
> --- a/arch/ia64/include/asm/iommu_table.h
> +++ /dev/null
> @@ -1,7 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> -#ifndef _ASM_IA64_IOMMU_TABLE_H
> -#define _ASM_IA64_IOMMU_TABLE_H
> -
> -#define IOMMU_INIT_POST(_detect)
> -
> -#endif /* _ASM_IA64_IOMMU_TABLE_H */
> diff --git a/arch/x86/include/asm/dma-mapping.h 
> b/arch/x86/include/asm/dma-mapping.h
> index bb1654fe0ce74..256fd8115223d 100644
> --- a/arch/x86/include/asm/dma-mapping.h
> +++ b/arch/x86/include/asm/dma-mapping.h
> @@ -9,7 +9,6 @@
>  
>  #include 
>  #include 
> -#include 
>  
>  extern int iommu_merge;
>  extern int panic_on_overflow;
> diff --git a/arch/x86/include/asm/gart.h b/arch/x86/include/asm/gart.h
> index 3185565743459..5af8088a10df6 100644
> --- a/arch/x86/include/asm/gart.h
> +++ b/arch/x86/include/asm/gart.h
> @@ -38,7 +38,7 @@ extern int gart_iommu_aperture_disabled;
>  extern void early_gart_iommu_check(void);
>  extern int gart_iommu_init(void);
>  extern void __init gart_parse_options(char *);
> -extern int gart_iommu_hole_init(void);
> +void gart_iommu_hole_init(void);
>  
>  #else
>  #define gart_iommu_aperture0
> @@ -51,9 +51,8 @@ static inline void early_gart_iommu_check(void)
>  static inline void gart_parse_options(char *options)
>  {
>  }
> -static inline int gart_iommu_hole_init(void)
> +static inline void gart_iommu_hole_init(void)
>  {
> - return -ENODEV;
>  }
>  #endif
>  
> diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
> index bf1ed2ddc74bd..dba89ed40d38d 100644
> --- a/arch/x86/include/asm/iommu.h
> +++ b/arch/x86/include/asm/iommu.h
> @@ -9,6 +9,12 @@
>  extern int force_iommu, no_iommu;
>  extern int iommu_detected;
>  
> +#ifdef CONFIG_SWIOTLB
> +extern bool x86_swiotlb_enable;
> +#else
> +#define x86_swiotlb_enable false
> +#endif
> +
>  /* 10 seconds */
>  #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
>  
> diff --git a/arch/x86/include/asm/iommu_table.h 
> b/arch/x86/include/asm/iommu_table.h
> deleted file mode 100644
> index 1fb3fd1a83c25..0
> --- a/arch/x86/include/asm/iommu_table.h

Re: [PATCH 07/11] x86: remove the IOMMU table infrastructure

2022-03-01 Thread Konrad Rzeszutek Wilk
> -#include 
> -
> -/*
> - * History lesson:
> - * The execution chain of IOMMUs in 2.6.36 looks as so:
> - *
> - *            [xen-swiotlb]
> - *                 |
> - *         +----[swiotlb *]--+
> - *        /         |         \
> - *       /          |          \
> - *    [GART]     [Calgary]  [Intel VT-d]
> - *     /
> - *    /
> - * [AMD-Vi]

.. snip..
> - *
>  void __init pci_iommu_alloc(void)
>  {
> - struct iommu_table_entry *p;
> -
> - sort_iommu_table(__iommu_table, __iommu_table_end);
> - check_iommu_entries(__iommu_table, __iommu_table_end);
> -
> - for (p = __iommu_table; p < __iommu_table_end; p++) {
> - if (p && p->detect && p->detect() > 0) {
> - p->flags |= IOMMU_DETECTED;
> - if (p->early_init)
> - p->early_init();
> - if (p->flags & IOMMU_FINISH_IF_DETECTED)
> - break;
> - }
> + if (xen_pv_domain()) {
> + pci_xen_swiotlb_init();
> + return;
>   }
> + pci_swiotlb_detect_4gb();

I think you also need to check for IBM Calgary?

> + gart_iommu_hole_init();
> + amd_iommu_detect();
> + detect_intel_iommu();
> + if (x86_swiotlb_enable)
> + swiotlb_init(0);


Re: [PATCH] swiotlb: allocate memory in a cache-friendly way

2021-09-16 Thread Konrad Rzeszutek Wilk
On Wed, Sep 01, 2021 at 12:21:35PM +0800, Chao Gao wrote:
> Currently, swiotlb uses a global index to indicate the starting point
> of next search. The index increases from 0 to the number of slots - 1
> and then wraps around. It is straightforward but not cache-friendly
> because the "oldest" slot in swiotlb tends to be used first.
> 
> Freed slots are probably accessed right before being freed, especially
> in VM's case (device backends access them in DMA_TO_DEVICE mode; guest
> accesses them in other DMA modes). Thus those just freed slots may
> reside in cache. Then reusing those just freed slots can reduce cache
> misses.
> 
> To that end, maintain a free list of free slots: freed slots are inserted
> at the head, and the search for free slots always starts from the head.
> 
> With this optimization, network throughput of sending data from host to
> guest, measured by iperf3, increases by 7%.

Wow, that is pretty awesome!

Are there any other benchmarks that you ran that showed a negative
performance impact?

Thank you.
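
Before the quoted message continues below, a minimal user-space sketch
of the LIFO reuse idea described above (illustrative only; the kernel
patch uses list_head, this sketch uses a bare singly linked list):

#include <stdio.h>

#define NSLOTS 8

struct slot {
        int id;
        struct slot *next;      /* singly linked free list */
};

static struct slot slots[NSLOTS];
static struct slot *free_head;  /* LIFO head: last freed is reused first */

static void slot_free(struct slot *s)
{
        s->next = free_head;    /* push at the head... */
        free_head = s;
}

static struct slot *slot_alloc(void)
{
        struct slot *s = free_head;     /* ...pop from the head, so the most
                                         * recently touched (cache-warm) slot
                                         * is handed out first */
        if (s)
                free_head = s->next;
        return s;
}

int main(void)
{
        for (int i = 0; i < NSLOTS; i++) {
                slots[i].id = i;
                slot_free(&slots[i]);
        }

        struct slot *s = slot_alloc();  /* gets slot 7, the last one freed */
        slot_free(s);
        printf("reused slot %d first\n", slot_alloc()->id);     /* 7 again */
        return 0;
}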
> 
> A bad side effect of this patch is we cannot use a large stride to skip
> unaligned slots when there is an alignment requirement. Currently, a
> large stride is used when a) device has an alignment requirement, stride
> is calculated according to the requirement; b) the requested size is
> larger than PAGE_SIZE. For x86 with 4KB page size, stride is set to 2.
> 
> For case a), few devices have an alignment requirement; the impact is
> limited. For case b) this patch probably leads to one (or more, if the page
> size is larger than 4K) additional lookup; but as the "io_tlb_slot" structs
> of free slots are also accessed when freeing slots, they probably reside in
> the CPU cache as well, so the overhead is almost negligible.
> 
> Suggested-by: Andi Kleen 
> Signed-off-by: Chao Gao 
> ---
>  include/linux/swiotlb.h | 15 --
>  kernel/dma/swiotlb.c| 43 +++--
>  2 files changed, 20 insertions(+), 38 deletions(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index b0cb2a9973f4..8cafafd218af 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -63,6 +63,13 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
> phys,
>  #ifdef CONFIG_SWIOTLB
>  extern enum swiotlb_force swiotlb_force;
>  
> +struct io_tlb_slot {
> + phys_addr_t orig_addr;
> + size_t alloc_size;
> + unsigned int list;
> + struct list_head node;
> +};
> +
>  /**
>   * struct io_tlb_mem - IO TLB Memory Pool Descriptor
>   *
> @@ -93,17 +100,13 @@ struct io_tlb_mem {
>   phys_addr_t end;
>   unsigned long nslabs;
>   unsigned long used;
> - unsigned int index;
> + struct list_head free_slots;
>   spinlock_t lock;
>   struct dentry *debugfs;
>   bool late_alloc;
>   bool force_bounce;
>   bool for_alloc;
> - struct io_tlb_slot {
> - phys_addr_t orig_addr;
> - size_t alloc_size;
> - unsigned int list;
> - } *slots;
> + struct io_tlb_slot *slots;
>  };
>  extern struct io_tlb_mem io_tlb_default_mem;
>  
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 87c40517e822..12b5b8471e54 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -184,7 +184,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
> *mem, phys_addr_t start,
>   mem->nslabs = nslabs;
>   mem->start = start;
>   mem->end = mem->start + bytes;
> - mem->index = 0;
> + INIT_LIST_HEAD(&mem->free_slots);
>   mem->late_alloc = late_alloc;
>  
>   if (swiotlb_force == SWIOTLB_FORCE)
> @@ -195,6 +195,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
> *mem, phys_addr_t start,
>   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
>   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>   mem->slots[i].alloc_size = 0;
> + list_add_tail(&mem->slots[i].node, &mem->free_slots);
>   }
>   memset(vaddr, 0, bytes);
>  }
> @@ -447,13 +448,6 @@ static inline unsigned long get_max_slots(unsigned long 
> boundary_mask)
>   return nr_slots(boundary_mask + 1);
>  }
>  
> -static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
> -{
> - if (index >= mem->nslabs)
> - return 0;
> - return index;
> -}
> -
>  /*
>   * Find a suitable number of IO TLB entries size that will fit this request 
> and
>   * allocate a buffer from that IO TLB pool.
> @@ -462,38 +456,29 @@ static int swiotlb_find_slots(struct device *dev, 
> phys_addr_t orig_addr,
> size_t alloc_size)
>  {
>   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> + struct io_tlb_slot *slot, *tmp;
>   unsigned long boundary_mask = dma_get_seg_boundary(dev);
>   dma_addr_t tbl_dma_addr =
>   phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
>   unsigned long max_slots = get_max_slots(boundary_mask);
>   

Re: [PATCH] swiotlb: use depends on for DMA_RESTRICTED_POOL

2021-08-31 Thread Konrad Rzeszutek Wilk
On Tue, Aug 31, 2021 at 04:17:47PM +0100, Will Deacon wrote:
> On Fri, Aug 27, 2021 at 11:48:02AM +0800, Claire Chang wrote:
> > Use depends on instead of select for DMA_RESTRICTED_POOL; otherwise it
> > will make SWIOTLB user configurable and cause compile errors for some
> > arch (e.g. mips).
> > 
> > Fixes: 0b84e4f8b793 ("swiotlb: Add restricted DMA pool initialization")
> > Reported-by: Guenter Roeck 
> > Signed-off-by: Claire Chang 
> > ---
> >  kernel/dma/Kconfig | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> > index fd4db714d86b..1b02179758cb 100644
> > --- a/kernel/dma/Kconfig
> > +++ b/kernel/dma/Kconfig
> > @@ -82,8 +82,7 @@ config SWIOTLB
> >  
> >  config DMA_RESTRICTED_POOL
> > bool "DMA Restricted Pool"
> > -   depends on OF && OF_RESERVED_MEM
> > -   select SWIOTLB
> > +   depends on OF && OF_RESERVED_MEM && SWIOTLB
> > help
> >   This enables support for restricted DMA pools which provide a level of
> >   DMA memory protection on systems with limited hardware protection
> 
> Acked-by: Will Deacon 

I put it in linux-next and devel/for-linus-5.15.

I will wait a day and see if there are any issues with linux-next and if
not will send out a GIT pull.

Thanks!


> 
> Will


Re: [PATCH v2 0/2] Don't fail device probing due to of_dma_set_restricted_buffer()

2021-08-23 Thread Konrad Rzeszutek Wilk
On Mon, Aug 16, 2021 at 02:26:15PM +0100, Will Deacon wrote:
> Hi all,
> 
> This is v2 of the patch I previously posted here:
> 
>   https://lore.kernel.org/r/20210805094736.902-1-w...@kernel.org
> 
> Changes since v1 are:
> 
>   * Move of_dma_set_restricted_buffer() into of/device.c (Rob)
>   * Use IS_ENABLED() instead of 'static inline' stub (Rob)
> 
> This applies on Konrad's devel/for-linus-5.15 branch in swiotlb.git

It should show up later today.

Thanks!
> 
> Cheers,
> 
> Will
> 
> Cc: Claire Chang 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Christoph Hellwig 
> Cc: Rob Herring 
> Cc: Robin Murphy 
> 
> --->8
> 
> Will Deacon (2):
>   of: Move of_dma_set_restricted_buffer() into device.c
>   of: restricted dma: Don't fail device probe on rmem init failure
> 
>  drivers/of/address.c| 33 -
>  drivers/of/device.c | 39 ++-
>  drivers/of/of_private.h |  7 ---
>  3 files changed, 38 insertions(+), 41 deletions(-)
> 
> -- 
> 2.33.0.rc1.237.g0d66db33f3-goog
> 


Re: [PATCH v2 4/4] swiotlb: Free tbl memory in swiotlb_exit()

2021-07-31 Thread Konrad Rzeszutek Wilk
On Sat, Jul 31, 2021 at 11:26:11AM -0700, Guenter Roeck wrote:
> Hi,
> 
> On Tue, Jul 20, 2021 at 02:38:26PM +0100, Will Deacon wrote:
> > Although swiotlb_exit() frees the 'slots' metadata array referenced by
> > 'io_tlb_default_mem', it leaves the underlying buffer pages allocated
> > despite no longer being usable.
> > 
> > Extend swiotlb_exit() to free the buffer pages as well as the slots
> > array.
> > 
> 
> This patch causes qemu pseries emulations to crash. Backtrace and bisect
> log see below. Reverting it fixes the problem.

I am 99% sure it is fixed by this patch (which should be in linux-next
in 5 minutes):


From a449ffaf9181b5a2dc705d8a06b13e0068207fd4 Mon Sep 17 00:00:00 2001
From: Will Deacon 
Date: Fri, 30 Jul 2021 12:42:31 +0100
Subject: [PATCH] powerpc/svm: Don't issue ultracalls if !mem_encrypt_active()

Commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()")
introduced a set_memory_encrypted() call to swiotlb_exit() so that the
buffer pages are returned to an encrypted state prior to being freed.

Sachin reports that this leads to the following crash on a Power server:

[0.010799] software IO TLB: tearing down default memory pool
[0.010805] [ cut here ]
[0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!

Nick spotted that this is because set_memory_encrypted() is issuing an
ultracall which doesn't exist for the processor, and should therefore
be gated by mem_encrypt_active() to mirror the x86 implementation.

Cc: Konrad Rzeszutek Wilk 
Cc: Claire Chang 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Fixes: ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()")
Suggested-by: Nicholas Piggin 
Reported-by: Sachin Sant 
Tested-by: Sachin Sant 
Tested-by: Nathan Chancellor 
Link: 
https://lore.kernel.org/r/1905cd70-7656-42ae-99e2-a31fc3812...@linux.vnet.ibm.com/
Signed-off-by: Will Deacon 
Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/powerpc/platforms/pseries/svm.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/svm.c 
b/arch/powerpc/platforms/pseries/svm.c
index 1d829e257996..87f001b4c4e4 100644
--- a/arch/powerpc/platforms/pseries/svm.c
+++ b/arch/powerpc/platforms/pseries/svm.c
@@ -63,6 +63,9 @@ void __init svm_swiotlb_init(void)
 
 int set_memory_encrypted(unsigned long addr, int numpages)
 {
+   if (!mem_encrypt_active())
+   return 0;
+
if (!PAGE_ALIGNED(addr))
return -EINVAL;
 
@@ -73,6 +76,9 @@ int set_memory_encrypted(unsigned long addr, int numpages)
 
 int set_memory_decrypted(unsigned long addr, int numpages)
 {
+   if (!mem_encrypt_active())
+   return 0;
+
if (!PAGE_ALIGNED(addr))
return -EINVAL;
 
-- 
2.31.1

> 
> Guenter
> 
> ---
> Crash log:
> 
> ...
> [0.937801][T1] software IO TLB: tearing down default memory pool
> [0.938939][T1] [ cut here ]
> [0.940331][T1] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
> [0.940787][T1] Oops: Exception in kernel mode, sig: 5 [#1]
> [0.940883][T1] BE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA 
> pSeries
> [0.940999][T1] Modules linked in:
> [0.941240][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.14.0-rc3-next-20210729 #1
> [0.941523][T1] NIP:  c0031310 LR: c00312f4 CTR: 
> c000c5f0
> [0.941608][T1] REGS: c8687ac0 TRAP: 0700   Not tainted  
> (5.14.0-rc3-next-20210729)
> [0.941795][T1] MSR:  80029032   CR: 
> 2400  XER: 
> [0.942031][T1] CFAR: c0300360 IRQMASK: 3
> [0.942031][T1] GPR00: c00312f4 c8687d60 
> c22ee300 0003
> [0.942031][T1] GPR04: 033f 0400 
>  3e5a
> [0.942031][T1] GPR08: 3e5a 0001 
>  0003
> [0.942031][T1] GPR12:  c2fb 
> c00129c0 
> [0.942031][T1] GPR16:   
>  
> [0.942031][T1] GPR20:   
>  c21d8b00
> [0.942031][T1] GPR24:   
>  0400
> [0.942031][T1] GPR28: 033f f134 
> c176c068 c8687e80
> [0.942884][T1] NIP [c0031310] 
> .system_call_exception+0x70/0x2d0
> [0.943399][T1] LR [c00312f4] .system_call_exception+0x54/0x2d0
> [0.943594][T1] Call Trace:
> [0.943667][T1] [c8687d60] [c00312f4]

Re: [powerpc][next-20210727] Boot failure - kernel BUG at arch/powerpc/kernel/interrupt.c:98!

2021-07-29 Thread Konrad Rzeszutek Wilk
On Thu, Jul 29, 2021 at 05:13:36PM +0100, Will Deacon wrote:
> On Wed, Jul 28, 2021 at 10:35:34AM -0700, Nathan Chancellor wrote:
> > On Wed, Jul 28, 2021 at 01:31:06PM +0530, Sachin Sant wrote:
> > > next-20210723 was good. The boot failure seems to have been introduced 
> > > with next-20210726.
> > > 
> > > I have attached the boot log.
> > 
> > I noticed this with OpenSUSE's ppc64le config [1] and my bisect landed on
> > commit ad6c00283163 ("swiotlb: Free tbl memory in swiotlb_exit()"). That
> > series just keeps on giving...

Low-level cross-platform changes tend to do that. And thank you for testing
it and finding this bug. Please let me know if the patch works so I can add
it to the patch series.
> 
> Yes, but look how handy our new print is!

:)
> 
> [0.010799] software IO TLB: tearing down default memory pool
> [0.010805] [ cut here ]
> [0.010808] kernel BUG at arch/powerpc/kernel/interrupt.c:98!
> 
> Following Nick's suggestion, the diff below should help? I don't have a
> relevant box on which I can test it though.
> 
> Will
> 
> --->8
> 
> diff --git a/arch/powerpc/platforms/pseries/svm.c 
> b/arch/powerpc/platforms/pseries/svm.c
> index 1d829e257996..87f001b4c4e4 100644
> --- a/arch/powerpc/platforms/pseries/svm.c
> +++ b/arch/powerpc/platforms/pseries/svm.c
> @@ -63,6 +63,9 @@ void __init svm_swiotlb_init(void)
>  
>  int set_memory_encrypted(unsigned long addr, int numpages)
>  {
> +   if (!mem_encrypt_active())
> +   return 0;
> +
> if (!PAGE_ALIGNED(addr))
> return -EINVAL;
>  
> @@ -73,6 +76,9 @@ int set_memory_encrypted(unsigned long addr, int numpages)
>  
>  int set_memory_decrypted(unsigned long addr, int numpages)
>  {
> +   if (!mem_encrypt_active())
> +   return 0;
> +
> if (!PAGE_ALIGNED(addr))
> return -EINVAL;
>  


Re: [PATCH 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-07-29 Thread Konrad Rzeszutek Wilk
On Wed, Jul 28, 2021 at 10:52:25AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed via
> an extra address space which is above shared_gpa_boundary
> (e.g. a 39-bit address line) reported by the Hyper-V CPUID ISOLATION_CONFIG.
> The physical address used for access will be the original physical address +
> shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV-SNP
> spec is called the virtual top of memory (vTOM). Memory addresses below
> vTOM are automatically treated as private while memory above
> vTOM is treated as shared.
> 
> Use dma_map_decrypted() in the swiotlb code, store the remap address
> returned, and use it to copy data from/to the swiotlb bounce buffer.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  include/linux/swiotlb.h |  4 
>  kernel/dma/swiotlb.c| 11 ---
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index f507e3eacbea..584560ecaa8e 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force;
>   * @end: The end address of the swiotlb memory pool. Used to do a quick
>   *   range check to see if the memory was in fact allocated by this
>   *   API.
> + * @vaddr:   The vaddr of the swiotlb memory pool. The swiotlb
> + *   memory pool may be remapped in the memory encrypted case and 
> store
> + *   virtual address for bounce buffer operation.
>   * @nslabs:  The number of IO TLB blocks (in groups of 64) between @start and
>   *   @end. For default swiotlb, this is command line adjustable via
>   *   setup_io_tlb_npages.
> @@ -89,6 +92,7 @@ extern enum swiotlb_force swiotlb_force;
>  struct io_tlb_mem {
>   phys_addr_t start;
>   phys_addr_t end;
> + void *vaddr;
>   unsigned long nslabs;
>   unsigned long used;
>   unsigned int index;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 1fa81c096c1d..6866e5784b53 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -194,8 +194,13 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
> *mem, phys_addr_t start,
>   mem->slots[i].alloc_size = 0;
>   }
>  
> - set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
> - memset(vaddr, 0, bytes);
> + mem->vaddr = dma_map_decrypted(vaddr, bytes);
> + if (!mem->vaddr) {
> + pr_err("Failed to decrypt memory.\n");

I am wondering if it would be worth returning an error code in this
function instead of just printing an error?

For this patch I think it is OK, but perhaps going forward this would be
better done as I am thinking: is there some global guest->hypervisor
reporting mechanism so that if this fails, it ends up being bubbled up
to the Hyper-V console or some such?

And ditto for other hypervisors?
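
Concretely, the suggestion amounts to something like the sketch below
(simplified arguments and an illustrative error value; dma_map_decrypted()
is the helper introduced by the quoted series, not a mainline API):

/*
 * Sketch only: let the init helper fail instead of just logging, so the
 * caller can propagate (or report) the failure.
 */
static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, void *vaddr,
                                   size_t bytes)
{
        mem->vaddr = dma_map_decrypted(vaddr, bytes);
        if (!mem->vaddr) {
                pr_err("Failed to decrypt memory.\n");
                return -ENOMEM;         /* bubble the failure up */
        }
        memset(mem->vaddr, 0, bytes);
        return 0;
}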


> + return;
> + }
> +
> + memset(mem->vaddr, 0, bytes);
>  }
>  
>  int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> verbose)
> @@ -360,7 +365,7 @@ static void swiotlb_bounce(struct device *dev, 
> phys_addr_t tlb_addr, size_t size
>   phys_addr_t orig_addr = mem->slots[index].orig_addr;
>   size_t alloc_size = mem->slots[index].alloc_size;
>   unsigned long pfn = PFN_DOWN(orig_addr);
> - unsigned char *vaddr = phys_to_virt(tlb_addr);
> + unsigned char *vaddr = mem->vaddr + tlb_addr - mem->start;
>   unsigned int tlb_offset;
>  
>   if (orig_addr == INVALID_PHYS_ADDR)
> -- 
> 2.25.1
> 
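
As a side note for readers of the archive, the address adjustment the
commit message describes boils down to the following sketch (the boundary
value is hypothetical, taken from the 39-bit example above; this is not
the Hyper-V code):

#include <stdio.h>
#include <stdint.h>

/* Hypothetical: vTOM at bit 39, as in the commit message's example. */
#define SHARED_GPA_BOUNDARY (1ULL << 39)

/* Addresses below vTOM are treated as private (encrypted); the alias of
 * the same page above vTOM is the shared (unencrypted) view of it. */
static uint64_t to_shared(uint64_t phys)
{
        return phys + SHARED_GPA_BOUNDARY;
}

int main(void)
{
        uint64_t bounce = 0x12345000ULL;        /* some bounce buffer page */

        printf("private: %#llx, shared alias: %#llx\n",
               (unsigned long long)bounce,
               (unsigned long long)to_shared(bounce));
        return 0;
}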


Re: [PATCH v2 0/4] Fix restricted DMA vs swiotlb_exit()

2021-07-23 Thread Konrad Rzeszutek Wilk
On Tue, Jul 20, 2021 at 02:38:22PM +0100, Will Deacon wrote:
> Hi again, folks,
> 
> This is version two of the patch series I posted yesterday:
> 
>   https://lore.kernel.org/r/20210719123054.6844-1-w...@kernel.org
> 
> The only changes since v1 are:
> 
>   * Squash patches 2 and 3, amending the commit message accordingly
>   * Add Reviewed-by and Tested-by tags from Christoph and Claire (thanks!)
> 
> I'd usually leave it a bit longer between postings, but since this fixes
> issues with patches in -next I thought I'd spin a new version immediately.

Thank you!

I put them in devel/for-linus-5.15 and linux-next.


Re: [PATCH v2 1/1] s390/pv: fix the forcing of the swiotlb

2021-07-23 Thread Konrad Rzeszutek Wilk
On Sat, Jul 24, 2021 at 01:17:46AM +0200, Halil Pasic wrote:
> Since commit 903cd0f315fe ("swiotlb: Use is_swiotlb_force_bounce for
> swiotlb data bouncing") if code sets swiotlb_force it needs to do so
> before the swiotlb is initialised. Otherwise
> io_tlb_default_mem->force_bounce will not get set to true, and devices
> that use (the default) swiotlb will not bounce despite swiotlb_force
> having the value of SWIOTLB_FORCE.
> 
> Let us restore swiotlb functionality for PV by fulfilling this new
> requirement.
> 
> This change addresses what turned out to be a fragility in
> commit 64e1f0c531d1 ("s390/mm: force swiotlb for protected
> virtualization"), which ain't exactly broken in its original context,
> but could give us some more headache if people backport the broken
> change and forget this fix.
> 
> Signed-off-by: Halil Pasic 
> Tested-by: Christian Borntraeger 
> Reviewed-by: Christian Borntraeger 
> Fixes: 903cd0f315fe ("swiotlb: Use is_swiotlb_force_bounce for swiotlb data 
> bouncing")
> Fixes: 64e1f0c531d1 ("s390/mm: force swiotlb for protected virtualization")
> Cc: sta...@vger.kernel.org #5.3+
> 
> ---

Picked it up and stuck it in linux-next with the other set of patches (Will's 
fixes).


Re: [PATCH v2 0/4] Fix restricted DMA vs swiotlb_exit()

2021-07-23 Thread Konrad Rzeszutek Wilk
On Fri, Jul 23, 2021 at 10:50:57AM +0200, Christian Borntraeger wrote:
> 
> 
> On 23.07.21 10:47, Halil Pasic wrote:
> > On Fri, 23 Jul 2021 08:14:19 +0200
> > Christian Borntraeger  wrote:
> > 
> > > Resending with the correct email of Heiko
> > > 
> > > On 23.07.21 03:12, Halil Pasic wrote:
> > > > On Thu, 22 Jul 2021 21:22:58 +0200
> > > > Christian Borntraeger  wrote:
> > > > > On 20.07.21 15:38, Will Deacon wrote:
> > > > > > Hi again, folks,
> > > > > > 
> > > > > > This is version two of the patch series I posted yesterday:
> > > > > > 
> > > > > >  https://lore.kernel.org/r/20210719123054.6844-1-w...@kernel.org
> > > > > > 
> > > > > > The only changes since v1 are:
> > > > > > 
> > > > > >  * Squash patches 2 and 3, amending the commit message 
> > > > > > accordingly
> > > > > >  * Add Reviewed-by and Tested-by tags from Christoph and Claire 
> > > > > > (thanks!)
> > > > > > 
> > > > > > I'd usually leave it a bit longer between postings, but since this 
> > > > > > fixes
> > > > > > issues with patches in -next I thought I'd spin a new version 
> > > > > > immediately.
> > > > > > 
> > > > > > Cheers,
> > > > > 
> > > > > FWIW, I just bisected virtio-errors with secure execution mode
> > > > > qemu-system-s390x: virtio-serial-bus: Unexpected port id 4205794771 
> > > > > for device virtio-serial0.0
> > > > > 
> > > > > to
> > > > > commit 903cd0f315fe426c6a64c54ed389de0becb663dc
> > > > > Author: Claire Chang 
> > > > > Date:   Thu Jun 24 23:55:20 2021 +0800
> > > > > 
> > > > > swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
> > > > > 
> > > > > Unfortunately this patch series does NOT fix this issue, so it seems 
> > > > > that even more
> > > > > things are broken.
> > > > > 
> > > > > Any idea what else might be broken?
> > > > 
> > > > I've done some debugging, and I think I know what is going on. Since
> > > > that commit we need to set swiotlb_force before the swiotlb itself is
> > > > initialized. So the patch below should fix the problem.
> > > > 
> > > > 8<-
> > > > 
> > > > From: Halil Pasic 
> > > > Date: Fri, 23 Jul 2021 02:57:06 +0200
> > > > Subject: [PATCH 1/1] s390/pv: fix the forcing of the swiotlb
> > > > 
> > > > Since commit 903cd0f315fe ("swiotlb: Use is_swiotlb_force_bounce for
> > > > swiotlb data bouncing") if code sets swiotlb_force it needs to do so
> > > > before the swiotlb is initialised. Otherwise
> > > > io_tlb_default_mem->force_bounce will not get set to true, and devices
> > > > that use (the default) swiotlb will not bounce despite swiotlb_force
> > > > having the value of SWIOTLB_FORCE.
> > > > 
> > > > Let us restore swiotlb functionality for PV by fulfilling this new
> > > > requirement.
> > > I would add:
> > > Fixes: 903cd0f315fe ("swiotlb: Use is_swiotlb_force_bounce for swiotlb 
> > > data bouncing")
> > > as this patch breaks things
> > > and
> > > Fixes: 64e1f0c531d1 ("s390/mm: force swiotlb for protected 
> > > virtualization")
> > > 
> > > to make the s390 init code more robust in case people start backporting 
> > > things.
> > 
> > I agree. Do we want this backported to the stable releases that have
> > 64e1f0c531d1  (i.e. do we need a cc stable) or should the fixes tag just
> > serve as metadata? My guess is, it's the former. In that sense should I
> > add the tags along with an explanation for the second fixes respin with
> > cc stable?
> > 
> > (BTW I don't think this formally qualifies for the stable backports, but
> > I hope we can make an exception...)
> 
> I think it makes sense for stable as it is cleaner to set the flags before
> calling the init function. cc stable would be better and the right way
> according to process, but the Fixes tag is mostly enough.

But the reason for fixing this is for code that is not yet in Linus's
tree?

I can just pick this patch up and add it in the pile I have for the next
merge window?
> 
> > 
> > > 
> > > > Signed-off-by: Halil Pasic 
> > > 
> > > I can confirm that this fixes the problem. This also makes sense codewise.
> > > 
> > > Tested-by: Christian Borntraeger 
> > > Reviewed-by: Christian Borntraeger 
> > 
> > Thanks!
> > 
> > Regards,
> > Halil
> > > 
> > > Konrad, Heiko, Vasily, any preference which tree this goes? I think s390
> > > would be easiest, but that requires that the patches in the swiotlb tree 
> > > have
> > > fixed commit IDs.
> > > 
> > > > ---
> > > >arch/s390/mm/init.c | 2 +-
> > > >1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> > > > index 8ac710de1ab1..07bbee9b7320 100644
> > > > --- a/arch/s390/mm/init.c
> > > > +++ b/arch/s390/mm/init.c
> > > > @@ -186,9 +186,9 @@ static void pv_init(void)
> > > > return;
> > > > /* make sure bounce buffers are shared */
> > > > +   swiotlb_force = SWIOTLB_FORCE;
> > > > swiotlb_init(1);
> > > > swiotlb_update_mem_attributes();

Re: [PATCH v2] dma-mapping: use vmalloc_to_page for vmalloc addresses

2021-07-13 Thread Konrad Rzeszutek Wilk
On Tue, Jun 22, 2021 at 04:34:14PM +0300, Roman Skakun wrote:
> This commit fixes an incorrect conversion from
> cpu_addr to page address in cases where we get a virtual
> address that was allocated in the vmalloc range.
> As a result, virt_to_page() cannot convert this address
> properly and returns an incorrect page address.
> 
> We need to detect such cases and obtain the page address using
> vmalloc_to_page() instead.
> 
> Signed-off-by: Roman Skakun 
> Reviewed-by: Andrii Anisov 
> ---
> Hey!
> Thanks for suggestions, Christoph!
> I updated the patch according to your advice.
> But I'm surprised that nobody caught this problem
> in the common code before. It looks a bit strange to me.

This looks like it wasn't picked up? Should it go in rc1?
> 
> 
>  kernel/dma/ops_helpers.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
> index 910ae69cae77..782728d8a393 100644
> --- a/kernel/dma/ops_helpers.c
> +++ b/kernel/dma/ops_helpers.c
> @@ -5,6 +5,14 @@
>   */
>  #include 
>  
> +static struct page *cpu_addr_to_page(void *cpu_addr)
> +{
> + if (is_vmalloc_addr(cpu_addr))
> + return vmalloc_to_page(cpu_addr);
> + else
> + return virt_to_page(cpu_addr);
> +}
> +
>  /*
>   * Create scatter-list for the already allocated DMA buffer.
>   */
> @@ -12,7 +20,7 @@ int dma_common_get_sgtable(struct device *dev, struct 
> sg_table *sgt,
>void *cpu_addr, dma_addr_t dma_addr, size_t size,
>unsigned long attrs)
>  {
> - struct page *page = virt_to_page(cpu_addr);
> + struct page *page = cpu_addr_to_page(cpu_addr);
>   int ret;
>  
>   ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
> @@ -43,7 +51,7 @@ int dma_common_mmap(struct device *dev, struct 
> vm_area_struct *vma,
>   return -ENXIO;
>  
>   return remap_pfn_range(vma, vma->vm_start,
> - page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff,
> + page_to_pfn(cpu_addr_to_page(cpu_addr)) + vma->vm_pgoff,
>   user_count << PAGE_SHIFT, vma->vm_page_prot);
>  #else
>   return -ENXIO;
> -- 
> 2.25.1
> 


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-13 Thread Konrad Rzeszutek Wilk
..snip..
> > > I think the main question I have is how would you like to see patches for
> > > 5.15? i.e. as patches on top of devel/for-linus-5.14 or something else?
> > 
> > Yes that would be perfect. If there are any dependencies on the rc1, I
> > can rebase it on top of that.
> 
> Yes, please, rebasing would be very helpful. The broader rework of
> 'io_tlb_default_mem' is going to conflict quite badly otherwise.

There is a devel/for-linus-5.15 (based on v5.14-rc1) now.

Thank you!
> 
> Cheers,
> 
> Will


Re: [PATCH] iommu/amd: Enable swiotlb if any device supports iommu v2 and uses identity mapping

2021-07-13 Thread Konrad Rzeszutek Wilk
On Thu, Jul 08, 2021 at 03:43:42PM +0100, Robin Murphy wrote:
> On 2021-07-08 14:57, Kai-Heng Feng wrote:
> > On Thu, Jul 8, 2021 at 6:18 PM Robin Murphy  wrote:
> > > 
> > > On 2021-07-08 10:28, Joerg Roedel wrote:
> > > > On Thu, Jul 08, 2021 at 03:42:32PM +0800, Kai-Heng Feng wrote:
> > > > > @@ -344,6 +344,9 @@ static int iommu_init_device(struct device *dev)
> > > > > 
> > > > >   iommu = amd_iommu_rlookup_table[dev_data->devid];
> > > > >   dev_data->iommu_v2 = iommu->is_iommu_v2;
> > > > > +
> > > > > +if (dev_data->iommu_v2)
> > > > > +swiotlb = 1;
> > > > 
> > > > This looks like the big hammer, as it will affect all other systems
> > > > where the AMD GPUs are in their own group.
> > > > 
> > > > What is needed here is an explicit check whether a non-iommu-v2 device
> > > > is direct-mapped because it shares a group with the GPU, and only enable
> > > > swiotlb in this case.
> > > 
> > > Right, it's basically about whether any DMA-limited device might at any
> > > time end up in an IOMMU_DOMAIN_IDENTITY domain. And given the
> > > possibility of device hotplug and the user being silly with the sysfs
> > > interface, I don't think we can categorically determine that at boot time.
> > > 
> > > Also note that Intel systems are likely to be similarly affected (in
> > > fact intel-iommu doesn't even have the iommu_default_passthough() check
> > > so it's probably even easier to blow up).
> > 
> > swiotlb is enabled by pci_swiotlb_detect_4gb() and intel-iommu doesn't
> > disable it.
> 
> Oh, right... I did say I found this dance hard to follow. Clearly I
> shouldn't have trusted what I thought I remembered from looking at it
> yesterday :)
> 
> Also not helped by the fact that it sets iommu_detected which *does* disable
> SWIOTLB, but only on IA-64.
> 
> > I wonder if we can take the same approach in amd-iommu?
> 
> Certainly if there's a precedent for leaving SWIOTLB enabled even if it
> *might* be redundant, that seems like the easiest option (it's what we do on
> arm64 too, but then we have system topologies where some devices may not be
> behind IOMMUs even when others are). More fun would be to try to bring it up
> at the first sign of IOMMU_DOMAIN_IDENTITY if it was disabled previously,
> but I don't have the highest hope of that being practical.


It is kind of silly to enable SWIOTLB which will just eat 64MB of memory
"just in case".

The SWIOTLB does have support to do late initialization (xen-pcifront
does that for example - so if you add devices that can't do 64-bit it
will allocate something like 4MB).

Would that be a better choice going forward - that is, allocate this only
under those conditions? (A sketch of the idea follows below.)
> 
> Robin.
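
A rough sketch of that idea, assuming a hypothetical driver hook
(swiotlb_late_init_with_default_size() is the existing late-init entry
point xen-pcifront goes through; the policy shown is an assumption, not
current kernel behaviour):

#include <linux/dma-mapping.h>
#include <linux/swiotlb.h>

/* Hypothetical hook: only pay for a bounce pool once a DMA-limited
 * device actually shows up, instead of 64MB at boot "just in case". */
static int maybe_enable_bounce(struct device *dev)
{
        if (dma_get_mask(dev) >= DMA_BIT_MASK(64))
                return 0;       /* device can address all memory */

        /* xen-pcifront-style late allocation: ~4MB instead of 64MB. */
        return swiotlb_late_init_with_default_size(4 << 20);
}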


Re: [PATCH] swiotlb: add overflow checks to swiotlb_bounce

2021-07-13 Thread Konrad Rzeszutek Wilk
On Wed, Jul 07, 2021 at 02:12:54PM +0900, Dominique Martinet wrote:
> This is a follow-up on 5f89468e2f06 ("swiotlb: manipulate orig_addr
> when tlb_addr has offset") which fixed unaligned dma mappings,
> making sure the following overflows are caught:
> 
> - offset of the start of the slot within the device bigger than
> requested address' offset, in other words if the base address
> given in swiotlb_tbl_map_single to create the mapping (orig_addr)
> was after the requested address for the sync (tlb_offset) in the
> same block:
> 
>  |------------------------------------------| block
>               <----------------------------> mapped part of the block
>               ^
>               orig_addr
>        ^
>        invalid tlb_addr for sync
> 
> - if the resulting offset was bigger than the allocation size
> this one could happen if the mapping was not until the end. e.g.
> 
>  |------------------------------------------| block
>       <---------------------->               mapped part of the block
>       ^                                  ^
>       orig_addr                          invalid tlb_addr
> 
> Both should never happen, so print a warning and bail out without trying
> to adjust the sizes/offsets: the first one could try to sync from
> orig_addr to whatever is left of the requested size, but the latter
> really has nothing to sync there...
> 
> Signed-off-by: Dominique Martinet 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Bumyong Lee 
> Cc: Chanho Park 
> Cc: Christoph Hellwig 
> ---
> 
> Hi Konrad,
> 
> here's the follow up for the swiotlb/caamjr regression I had promissed.

Awesome!
> It doesn't really change anything, and I confirmed I don't hit either of
> the warnings on our board, but it's probably best to have as either
> could really happen.

:nods:

I put it in the devel/for-linus-5.14 and linux-next. Thank you!
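
For reference, the two guards described above boil down to something like
this inside swiotlb_bounce() (a sketch based on the description; variable
names follow the quoted code, the warning text is illustrative):

unsigned int tlb_offset = tlb_addr & (IO_TLB_SIZE - 1);
unsigned int orig_offset = swiotlb_align_offset(dev, orig_addr);

if (tlb_offset < orig_offset) {
        /* Case 1: the sync starts before the mapped part of the slot. */
        dev_WARN_ONCE(dev, 1, "tlb_addr precedes mapped region\n");
        return;
}

tlb_offset -= orig_offset;
if (tlb_offset > alloc_size) {
        /* Case 2: the sync starts past the end of the mapped region. */
        dev_WARN_ONCE(dev, 1, "tlb_addr past end of mapped region\n");
        return;
}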


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Konrad Rzeszutek Wilk
On Tue, Jul 06, 2021 at 05:57:21PM +0100, Will Deacon wrote:
> On Tue, Jul 06, 2021 at 10:46:07AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote:
> > > On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > > > FWIW I was pondering the question of whether to do something along 
> > > > those 
> > > > lines or just scrap the default assignment entirely, so since I hadn't 
> > > > got 
> > > > round to saying that I've gone ahead and hacked up the alternative 
> > > > (similarly untested) for comparison :)
> > > >
> > > > TBH I'm still not sure which one I prefer...
> > > 
> > > Claire did implement something like your suggestion originally, but
> > > I don't really like it as it doesn't scale for adding multiple global
> > > pools, e.g. for the 64-bit addressable one for the various encrypted
> > > secure guest schemes.
> > 
> > Couple of things:
> >  - I am not pushing to Linus the Claire's patchset until we have a
> >resolution on this. I hope you all agree that is a sensible way
> >forward as much as I hate doing that.
> 
> Sure, it's a pity but we could clearly use a bit more time to get these
> just right and we've run out of time for 5.14.
> 
> I think the main question I have is how would you like to see patches for
> 5.15? i.e. as patches on top of devel/for-linus-5.14 or something else?

Yes that would be perfect. If there are any dependencies on the rc1, I
can rebase it on top of that.

> 
> Cheers,
> 
> Will


Re: [PATCH v15 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-07-06 Thread Konrad Rzeszutek Wilk
On Tue, Jul 06, 2021 at 04:05:13PM +0200, Christoph Hellwig wrote:
> On Tue, Jul 06, 2021 at 03:01:04PM +0100, Robin Murphy wrote:
> > FWIW I was pondering the question of whether to do something along those 
> > lines or just scrap the default assignment entirely, so since I hadn't got 
> > round to saying that I've gone ahead and hacked up the alternative 
> > (similarly untested) for comparison :)
> >
> > TBH I'm still not sure which one I prefer...
> 
> Claire did implement something like your suggestion originally, but
> I don't really like it as it doesn't scale for adding multiple global
> pools, e.g. for the 64-bit addressable one for the various encrypted
> secure guest schemes.

Couple of things:
 - I am not pushing to Linus the Claire's patchset until we have a
   resolution on this. I hope you all agree that is a sensible way
   forward as much as I hate doing that.

 - I like Robin's fix as it is the simplest looking. Would love to see if it
   does fix the problem.

 - Christopher - we can always add multiple pools as the next milestone
   and just focus on this feature getting tested extensively during this
   release.

 - Would it be worth (for the future, or maybe in another tiny fix) to also
   add a printk in swiotlb when we de-allocate the buffer, so that when
   someone looks through the `dmesg` it becomes much easier to diagnose
   issues? (A sketch of the idea is shown below.)
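
A sketch of that last idea (the "software IO TLB: tearing down default
memory pool" line quoted in later crash reports in this archive is exactly
this kind of breadcrumb; the "software IO TLB:" prefix would come from the
file's pr_fmt):

void swiotlb_exit(void)
{
        /* Leave a breadcrumb in dmesg so stale-pool bugs are obvious. */
        pr_info("tearing down default memory pool\n");

        /* ...free the slots metadata and the buffer pages... */
}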



Re: [PATCH v14 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-24 Thread Konrad Rzeszutek Wilk
On Thu, Jun 24, 2021 at 11:58:57PM +0800, Claire Chang wrote:
> On Thu, Jun 24, 2021 at 11:56 PM Konrad Rzeszutek Wilk
>  wrote:
> >
> > On Thu, Jun 24, 2021 at 10:10:51AM -0400, Qian Cai wrote:
> > >
> > >
> > > On 6/24/2021 7:48 AM, Will Deacon wrote:
> > > > Ok, diff below which attempts to tackle the offset issue I mentioned as
> > > > well. Qian Cai -- please can you try with these changes?
> > >
> > > This works fine.
> >
> > Cool. Let me squash this patch in #6 and rebase the rest of them.
> >
> > Claire, could you check the devel/for-linus-5.14 say by end of today to
> > double check that I didn't mess anything up please?
> 
> I just submitted v15 here
> (https://lore.kernel.org/patchwork/cover/1451322/) in case it's
> helpful.

Oh! Nice!
> I'll double check of course. Thanks for the efforts!

I ended up using your patch #6 and #7. Please double-check.


Re: [PATCH v15 00/12] Restricted DMA

2021-06-24 Thread Konrad Rzeszutek Wilk
On Thu, Jun 24, 2021 at 11:55:14PM +0800, Claire Chang wrote:
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
> 
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> full chain of exploits; [2], [3]).
> 
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
> 
> [1a] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> [1b] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> [2] https://blade.tencent.com/en/advisories/qualpwn/
> [3] 
> https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> [4] 
> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> 
> v15:
> - Apply Will's diff (https://lore.kernel.org/patchwork/patch/1448957/#1647521)
>   to fix the crash reported by Qian.
> - Add Stefano's Acked-by tag for patch 01/12 from v14

That all should be now be on

https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/
devel/for-linus-5.14 (and linux-next)



Re: [PATCH v14 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-24 Thread Konrad Rzeszutek Wilk
On Thu, Jun 24, 2021 at 10:10:51AM -0400, Qian Cai wrote:
> 
> 
> On 6/24/2021 7:48 AM, Will Deacon wrote:
> > Ok, diff below which attempts to tackle the offset issue I mentioned as
> > well. Qian Cai -- please can you try with these changes?
> 
> This works fine.

Cool. Let me squash this patch in #6 and rebase the rest of them.

Claire, could you check the devel/for-linus-5.14 say by end of today to
double check that I didn't mess anything up please?

Will,

Thank you for generating the fix! I am going to run it on x86 and Xen
to make sure all is good (granted last time I ran devel/for-linus-5.14
on that setup I didn't see any errors so I need to double check
I didn't do something silly like run a wrong kernel).


> 
> > 
> > Will
> > 
> > --->8
> > 
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index 175b6c113ed8..39284ff2a6cd 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -116,7 +116,9 @@ static inline bool is_swiotlb_buffer(struct device 
> > *dev, phys_addr_t paddr)
> >  
> >  static inline bool is_swiotlb_force_bounce(struct device *dev)
> >  {
> > -   return dev->dma_io_tlb_mem->force_bounce;
> > +   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> > +
> > +   return mem && mem->force_bounce;
> >  }
> >  
> >  void __init swiotlb_exit(void);
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 44be8258e27b..0ffbaae9fba2 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -449,6 +449,7 @@ static int swiotlb_find_slots(struct device *dev, 
> > phys_addr_t orig_addr,
> > dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> > unsigned int nslots = nr_slots(alloc_size), stride;
> > unsigned int index, wrap, count = 0, i;
> > +   unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> > unsigned long flags;
> >  
> > BUG_ON(!nslots);
> > @@ -497,7 +498,7 @@ static int swiotlb_find_slots(struct device *dev, 
> > phys_addr_t orig_addr,
> > for (i = index; i < index + nslots; i++) {
> > mem->slots[i].list = 0;
> > mem->slots[i].alloc_size =
> > -   alloc_size - ((i - index) << IO_TLB_SHIFT);
> > +   alloc_size - (offset + ((i - index) << 
> > IO_TLB_SHIFT));
> > }
> > for (i = index - 1;
> >  io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
> > 


Re: [PATCH v14 00/12] Restricted DMA

2021-06-23 Thread Konrad Rzeszutek Wilk
On Sat, Jun 19, 2021 at 11:40:31AM +0800, Claire Chang wrote:
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
> 
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> full chain of exploits; [2], [3]).
> 
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
> 
> [1a] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> [1b] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> [2] https://blade.tencent.com/en/advisories/qualpwn/
> [3] 
> https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> [4] 
> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

Heya Claire,

I put all your patches on
https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/log/?h=devel/for-linus-5.14

Please double-check that they all look ok.

Thank you!


[GIT PULL] (stable) stable/for-linus-5.14

2021-06-22 Thread Konrad Rzeszutek Wilk
Hey Linus,

Please git pull the following branch:

git pull git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git 
stable/for-linus-5.14

which has the fix for the DMA operations regression where the offset was
ignored and corruptions would appear.

Going forward there will be cleanups to make the offset and alignment
logic clearer, and better test cases to help with this.

Bumyong Lee (1):
  swiotlb: manipulate orig_addr when tlb_addr has offset

 kernel/dma/swiotlb.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)


Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

2021-06-22 Thread Konrad Rzeszutek Wilk
On Tue, Jun 22, 2021 at 04:48:24PM +0900, 'Dominique MARTINET' wrote:
> Konrad Rzeszutek Wilk wrote on Mon, Jun 21, 2021 at 09:16:43AM -0400:
> > The beauty of 'devel' and 'linux-next' is that they can be reshuffled and
> > mangled. I pushed the original patch from Bumyong there and will let
> > it sit for a day and then create a stable branch and give it to Linus.
> 
> Thanks, that should be good.
> 
> Do you want me to send a follow-up patch with the two extra checks
> (tlb_addr & (IO_TLB_SIZE -1)) > swiotlb_align_offset(dev, orig_addr)
> tlb_offset < alloc_size
> 
> or are we certain this can't ever happen?

I would love more patches and I saw the previous one you posted.

But we only got two (or one) weeks before the next merge window opens
so I am sending to Linus the one that was tested with NVMe and crypto
(see above).

That is the
https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=stable/for-linus-5.14

And then after Linus releases 5.14, I would love to take your
cleanup on top of that and test it?

> (I didn't see any hit in dmesg when I ran with these, but my opinion is
> better safe than sorry...)
> 
> 
> > Then I need to expand the test-regression bucket so that this does not
> > happen again. Dominique, how easy would it be to purchase one of those
> > devices?
> 
> My company is making such a device, but it's not on the market yet
> (it was planned for August; with some delays in provisioning it'll
> probably be a bit late), and it would mean buying from Japan so I'm not
> sure how convenient that would be...
> 
> These are originally NXP devices so I assume Horia would have better
> suggestions, if you would?
> 
> 
> > I was originally thinking to create a crypto device in QEMU to simulate
> > this but that may take longer to write than just getting the real thing.
> > 
> > Or I could create some fake devices with weird offsets and write a driver
> > for them to exercise this, like this one I had done some time ago that
> > needs some brushing off.
> 
> Just a fake device with fake offsets as a test is probably good enough;
> ideally it would need to exercise both failures we've seen (offset in
> dma_sync_single_for_device like caam does and in the original mapping (I
> assume?) like the NVMe driver does), but that sounds possible :)

Yup. Working on that now.
> 
> 
> Thanks again!
> -- 
> Dominique


Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

2021-06-21 Thread Konrad Rzeszutek Wilk
On Mon, Jun 21, 2021 at 01:14:48PM +0900, 'Dominique MARTINET' wrote:
> Chanho Park wrote on Mon, Jun 21, 2021 at 11:55:22AM +0900:
> > Sure. No problem. But, the patch was already stacked on Konrad's tree
> > and linux-next as well.
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=devel/for-linus-5.14&id=33d1641f38f0c327bc3e5c21de585c77a6512bc6
> >  
> 
> That patch is slightly different, it's a rewrite Konrad did that mixes
> in Linus' suggestion[1], which breaks things for the NVMe usecase
> Jianxiong Gao has.
> 
> [1] offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1)
> 
> 
> Konrad is aware so I think it shouldn't be submitted :)

The beauty of 'devel' and 'linux-next' is that they can be reshuffled and
mangled. I pushed the original patch from Bumyong there and will let
it sit for a day and then create a stable branch and give it to Linus.

Then I need to expand the test-regression bucket so that this does not
happen again. Dominique, how easy would it be to purchase one of those
devices?

I was originally thinking to create a crypto device in QEMU to simulate
this but that may take longer to write than just getting the real thing.

Or I could create some fake devices with weird offsets and write a driver
for them to exercise this, like this one I had done some time ago that
needs some brushing off.


Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

2021-06-16 Thread Konrad Rzeszutek Wilk
On Wed, Jun 16, 2021 at 01:49:54PM -0700, Jianxiong Gao wrote:
> On Fri, Jun 11, 2021 at 3:35 AM Konrad Rzeszutek Wilk
>  wrote:
> >
> > On Fri, Jun 11, 2021 at 08:21:53AM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:
> > > > I've noticed the failure also in v5.10 and v5.11 stable kernels,
> > > > since the patch set has been backported.
> > >
> > > FYI, there has been a patch on the list that should have fixed this
> > > for about a month:
> > >
> > > https://lore.kernel.org/linux-iommu/20210510091816.ga2...@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f
> > >
> > > but it seems like it never got picked up.
> >
> > Jianxiong,
> > Would you be up for testing this patch on your NVMe rig please? I don't
> > foresee a problem, but just in case
> >
> I have tested the attached patch and it generates an error when
> formatting a disk as xfs in an RHEL 8 environment:

Thank you for testing that - and this is a bummer indeed.

Jianxiong,
How unique is this NVMe? Should I be able to reproduce this with any
type or is it specific to Google Cloud?

Dominique, Horia,

Are those crypto devices somehow easily available to test out the
patches?

P.S.
Most unfortunate timing - I am out in a rural area in the US with poor
Internet, so I won't be able to dig fully into this until Monday.

Re: [PATCH v10 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-15 Thread Konrad Rzeszutek Wilk
On Tue, Jun 15, 2021 at 09:27:02PM +0800, Claire Chang wrote:
> Always have the pointer to the swiotlb pool used in struct device. This
> could help simplify the code for other pools.

Applying: swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
error: patch failed: kernel/dma/swiotlb.c:339
error: kernel/dma/swiotlb.c: patch does not apply
..

Would you be OK rebasing this against devel/for-linus-5.14 please?
(And please send it out with the Reviewed-by from Christoph.)

Thank you!
> 
> Signed-off-by: Claire Chang 
> ---
>  drivers/base/core.c| 4 
>  include/linux/device.h | 4 
>  kernel/dma/swiotlb.c   | 8 
>  3 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index b8a8c96dca58..eeb2d49d3aa3 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include  /* for dma_default_coherent */
>  
> @@ -2846,6 +2847,9 @@ void device_initialize(struct device *dev)
>  defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
>   dev->dma_coherent = dma_default_coherent;
>  #endif
> +#ifdef CONFIG_SWIOTLB
> + dev->dma_io_tlb_mem = io_tlb_default_mem;
> +#endif
>  }
>  EXPORT_SYMBOL_GPL(device_initialize);
>  
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 4443e12238a0..2e9a378c9100 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -432,6 +432,7 @@ struct dev_links_info {
>   * @dma_pools:   Dma pools (if dma'ble device).
>   * @dma_mem: Internal for coherent mem override.
>   * @cma_area:Contiguous memory area for dma allocations
> + * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
>   * @archdata:For arch-specific additions.
>   * @of_node: Associated device tree node.
>   * @fwnode:  Associated device node supplied by platform firmware.
> @@ -540,6 +541,9 @@ struct device {
>  #ifdef CONFIG_DMA_CMA
>   struct cma *cma_area;   /* contiguous memory area for dma
>  allocations */
> +#endif
> +#ifdef CONFIG_SWIOTLB
> + struct io_tlb_mem *dma_io_tlb_mem;
>  #endif
>   /* arch specific additions */
>   struct dev_archdata archdata;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 97c6ad50fdc2..949a6bb21343 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -339,7 +339,7 @@ void __init swiotlb_exit(void)
>  static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t 
> size,
>  enum dma_data_direction dir)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
>   phys_addr_t orig_addr = mem->slots[index].orig_addr;
>   size_t alloc_size = mem->slots[index].alloc_size;
> @@ -421,7 +421,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
> unsigned int index)
>  static int find_slots(struct device *dev, phys_addr_t orig_addr,
>   size_t alloc_size)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   unsigned long boundary_mask = dma_get_seg_boundary(dev);
>   dma_addr_t tbl_dma_addr =
>   phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
> @@ -498,7 +498,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
> phys_addr_t orig_addr,
>   size_t mapping_size, size_t alloc_size,
>   enum dma_data_direction dir, unsigned long attrs)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>   unsigned int offset = swiotlb_align_offset(dev, orig_addr);
>   unsigned int i;
>   int index;
> @@ -549,7 +549,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
> size_t mapping_size, enum dma_data_direction dir,
> unsigned long attrs)
>  {
> - struct io_tlb_mem *mem = io_tlb_default_mem;
> + struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
>   unsigned long flags;
>   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
>   int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
> -- 
> 2.32.0.272.g935e593368-goog
> 


Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

2021-06-11 Thread Konrad Rzeszutek Wilk
On Fri, Jun 11, 2021 at 08:21:53AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:
> > I've noticed the failure also in v5.10 and v5.11 stable kernels,
> > since the patch set has been backported.
> 
> FYI, there has been a patch on the list that should have fixed this
> for about a month:
> 
> https://lore.kernel.org/linux-iommu/20210510091816.ga2...@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f
> 
> but it seems like it never got picked up.

Yikes!

Dominique,

Would you be up to testing the attached (and inline) patch please?

Linus,

Would you be terribly offended if I took your code (s/unsigned
long/unsigned int), and used Chanho's description of the problem (see below)?

Christoph,
I took the liberty of putting your Reviewed-by on the patch, you OK with
that?

Jianxiong,
Would you be up for testing this patch on your NVMe rig please? I don't
foresee a problem, but just in case

From f06da55596675383fbf2563fe2919b2a8f68901b Mon Sep 17 00:00:00 2001
From: Linus Torvalds 
Date: Thu, 10 Jun 2021 12:41:26 -0700
Subject: [PATCH] swiotlb: manipulate orig_addr when tlb_addr has offset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In case a driver wants to sync part of a range with an offset,
swiotlb_tbl_sync_single() copies from the orig_addr base to tlb_addr with
the offset and ends up with a data mismatch.

The offset handling was removed in
"swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single",
but said logic has to be added back in.

[From Linus's email:
That commit removed the offset calculation entirely, because the old

(unsigned long)tlb_addr & (IO_TLB_SIZE - 1)

was wrong, but instead of removing it, I think it should have just
fixed it to be

(tlb_addr - mem->start) & (IO_TLB_SIZE - 1);

instead. That way the slot offset always matches the slot index calculation.]

The use-case that drivers are hitting is as follow:

1. Get dma_addr_t from dma_map_single()

dma_addr_t tlb_addr = dma_map_single(dev, vaddr, vsize, DMA_TO_DEVICE);

|<---vsize->|
+---+
|   | original buffer
+---+
  vaddr

 swiotlb_align_offset
 |<->|<---vsize->|
 +---+---+
 |   |   | swiotlb buffer
 +---+---+
  tlb_addr

2. Do something
3. Sync dma_addr_t through dma_sync_single_for_device(..)

dma_sync_single_for_device(dev, tlb_addr + offset, size, DMA_TO_DEVICE);

  Error case.
Copy data to original buffer but it is from base addr (instead of
  base addr + offset) in original buffer:

 swiotlb_align_offset
 |<->|<- offset ->|<- size ->|
 +---+---+
 |   ||##|   | swiotlb buffer
 +---+---+
  tlb_addr

|<- size ->|
+---+
|##|| original buffer
+---+
  vaddr

The fix is to copy the data to the original buffer and take into
account the offset, like so:

 swiotlb_align_offset
 |<->|<- offset ->|<- size ->|
 +---+---+
 |   ||##|   | swiotlb buffer
 +---+---+
  tlb_addr

|<- offset ->|<- size ->|
+---+
||##|   | original buffer
+---+
  vaddr

[This patch text is from Bumyong's email; and his solution was very close
to Linus's, so incorporating his text]

Fixes: 16fc3cef33a0 ("swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single")
Signed-off-by: Bumyong Lee 
Signed-off-by: Chanho Park 
Reviewed-by: Christoph Hellwig 
Reported-by: Dominique MARTINET 
Reported-by: Horia Geantă 
Tested-by: Horia Geantă 
Signed-off-by: Linus Torvalds 
CC: sta...@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk 
---
 kernel/dma/swiotlb.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d50..dc438b5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -342,6 +342,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size
 {
struct io_tlb_mem *mem = io_tlb_default_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
+   unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
phys_addr_t orig_addr = mem->slots[index].orig_addr;
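
The diff is truncated in the archive. Going by the description above, the
remainder of swiotlb_bounce() presumably ends up applying the offset along
these lines - a reconstruction, not the literal committed hunk:

	if (orig_addr == INVALID_PHYS_ADDR)
		return;

	/* Bounce from/to the same offset within the original buffer
	 * that tlb_addr points at within its slot. */
	orig_addr += offset;
	alloc_size -= offset;

	/* ... the existing copy logic, now using the adjusted
	 * orig_addr and alloc_size ... */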

Re: [PATCH v1 5/8] dma: Use size for swiotlb boundary checks

2021-06-02 Thread Konrad Rzeszutek Wilk
On Wed, Jun 02, 2021 at 05:41:30PM -0700, Andi Kleen wrote:
> swiotlb currently only uses the start address of a DMA to check if something
> is in the swiotlb or not. But with virtio and untrusted hosts the host
> could give some DMA mapping that crosses the swiotlb boundaries,
> potentially leaking or corrupting data. Add size checks to all the swiotlb
> checks and reject any DMAs that cross the swiotlb buffer boundaries.

I seem to be only CC-ed on this and #7, so please bear with me.

But could you explain to me why please:

commit daf9514fd5eb098d7d6f3a1247cb8cc48fc94155 (swiotlb/stable/for-linus-5.12)
Author: Martin Radev 
Date:   Tue Jan 12 16:07:29 2021 +0100

swiotlb: Validate bounce size in the sync/unmap path

does not solve the problem as well?

> 
> Signed-off-by: Andi Kleen 
> ---
>  drivers/iommu/dma-iommu.c   | 13 ++---
>  drivers/xen/swiotlb-xen.c   | 11 ++-
>  include/linux/dma-mapping.h |  4 ++--
>  include/linux/swiotlb.h |  8 +---
>  kernel/dma/direct.c |  8 
>  kernel/dma/direct.h |  8 
>  kernel/dma/mapping.c|  4 ++--
>  net/xdp/xsk_buff_pool.c |  2 +-
>  8 files changed, 30 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 7bcdd1205535..7ef13198721b 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -504,7 +504,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
> dma_addr_t dma_addr,
>  
>   __iommu_dma_unmap(dev, dma_addr, size);
>  
> - if (unlikely(is_swiotlb_buffer(phys)))
> + if (unlikely(is_swiotlb_buffer(phys, size)))
>   swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>  }
>  
> @@ -575,7 +575,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
> *dev, phys_addr_t phys,
>   }
>  
>   iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
> - if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
> + if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys, org_size))
>   swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
>   return iova;
>  }
> @@ -781,7 +781,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
> *dev,
>   if (!dev_is_dma_coherent(dev))
>   arch_sync_dma_for_cpu(phys, size, dir);
>  
> - if (is_swiotlb_buffer(phys))
> + if (is_swiotlb_buffer(phys, size))
>   swiotlb_sync_single_for_cpu(dev, phys, size, dir);
>  }
>  
> @@ -794,7 +794,7 @@ static void iommu_dma_sync_single_for_device(struct 
> device *dev,
>   return;
>  
>   phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> - if (is_swiotlb_buffer(phys))
> + if (is_swiotlb_buffer(phys, size))
>   swiotlb_sync_single_for_device(dev, phys, size, dir);
>  
>   if (!dev_is_dma_coherent(dev))
> @@ -815,7 +815,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>   if (!dev_is_dma_coherent(dev))
>   arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
>  
> - if (is_swiotlb_buffer(sg_phys(sg)))
> + if (is_swiotlb_buffer(sg_phys(sg), sg->length))
>   swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
>   sg->length, dir);
>   }
> @@ -832,10 +832,9 @@ static void iommu_dma_sync_sg_for_device(struct device 
> *dev,
>   return;
>  
>   for_each_sg(sgl, sg, nelems, i) {
> - if (is_swiotlb_buffer(sg_phys(sg)))
> + if (is_swiotlb_buffer(sg_phys(sg), sg->length))
>   swiotlb_sync_single_for_device(dev, sg_phys(sg),
>  sg->length, dir);
> -
>   if (!dev_is_dma_coherent(dev))
>   arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
>   }
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 24d11861ac7d..333846af8d35 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -89,7 +89,8 @@ static inline int range_straddles_page_boundary(phys_addr_t 
> p, size_t size)
>   return 0;
>  }
>  
> -static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
> +static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr,
> +  size_t size)
>  {
>   unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
>   unsigned long xen_pfn = bfn_to_local_pfn(bfn);
> @@ -100,7 +101,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, 
> dma_addr_t dma_addr)
>* in our domain. Therefore _only_ check address within our domain.
>*/
>   if (pfn_valid(PFN_DOWN(paddr)))
> - return is_swiotlb_buffer(paddr);
> + return is_swiotlb_buffer(paddr, size);
>   return 0;
>  }
>  
> @@ -431,7 +432,7 @@ static void xen_swiotlb_unmap_page(struct device 
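
The quoted diff is cut off in the archive. For reference, the shape of the
size-aware check this series proposes is roughly the following - a sketch
against the 5.13-era globals, not the exact patch:

static inline bool is_swiotlb_buffer(phys_addr_t paddr, size_t size)
{
	/* The whole range must sit inside the bounce buffer; a mapping
	 * straddling its edge is no longer treated as swiotlb-owned. */
	return paddr >= io_tlb_start && paddr + size <= io_tlb_end;
}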

Re: [PATCH v7 01/15] swiotlb: Refactor swiotlb init functions

2021-05-24 Thread Konrad Rzeszutek Wilk
> > do the set_memory_decrypted()+memset(). Is this okay or should
> > swiotlb_init_io_tlb_mem() add an additional argument to do this
> > conditionally?
> 
> I'm actually not sure if this it okay. If not, will add an additional
> argument for it.

Any observations discovered? (Want to make sure my memory-cache has the
correct semantics for set_memory_decrypted in mind).
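
For reference, the pattern in question is roughly this (a sketch of the
init path, not the code under review):

	void *vaddr = phys_to_virt(start);
	size_t bytes = nslabs << IO_TLB_SHIFT;

	/* Bounce buffers must be shared with the hypervisor on
	 * memory-encrypted guests, so mark them decrypted and wipe
	 * them before first use. */
	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
	memset(vaddr, 0, bytes);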
> 
> > --
> > Florian


Re: [PATCH v7 05/15] swiotlb: Add a new get_io_tlb_mem getter

2021-05-24 Thread Konrad Rzeszutek Wilk
On Tue, May 18, 2021 at 02:51:52PM +0800, Claire Chang wrote:
> Still keep this function because directly using dev->dma_io_tlb_mem
> will cause issues for memory allocation for existing devices. The pool
> can't support atomic coherent allocation so we need to distinguish the
> per device pool and the default pool in swiotlb_alloc.

The above should really be rolled into the commit message. You can prefix
it with "The reason it was done this way was because directly using .."




Re: [PATCH v7 04/15] swiotlb: Add restricted DMA pool initialization

2021-05-24 Thread Konrad Rzeszutek Wilk
On Tue, May 18, 2021 at 02:48:35PM +0800, Claire Chang wrote:
> I didn't move this to a separate file because I feel it might be
> confusing for swiotlb_alloc/free (and need more functions to be
> non-static).
> Maybe instead of moving to a separate file, we can try to come up with
> a better naming?

I think you are referring to:

rmem_swiotlb_setup

?

Which is ARM specific and inside the generic code?



Christoph wants to unify it all so there is one single source, but the
"you separate arch code out from generic" maxim makes me want to move
it out.

I agree that if you move it out from generic to arch-specific we have to
expose more of the swiotlb functions, which would undo Christoph's
cleanup work.

How about this - let's leave it as-is for now, and when there are more
use-cases we can revisit it and, if needed, move the code?



Re: i915 and swiotlb_max_segment

2021-05-20 Thread Konrad Rzeszutek Wilk
On Mon, May 10, 2021 at 05:25:25PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> swiotlb_max_segment is a rather strange "API" export by swiotlb.c,
> and i915 is the only (remaining) user.
> 
> swiotlb_max_segment returns 0 if swiotlb is not in use, 1 if
> SWIOTLB_FORCE is set or swiotlb-xen is used, and the swiotlb segment
> size when swiotlb is otherwise enabled.
> 
> i915 then uses it to:
> 
>  a) decided on the max order in i915_gem_object_get_pages_internal
>  b) decide on a max segment size in i915_sg_segment_size
> 
> for a) it really seems i915 should switch to dma_alloc_noncoherent
> or dma_alloc_noncontigous ASAP instead of using alloc_page and
> streaming DMA mappings.  Any chance I could trick one of the i915
> maintaines into doing just that given that the callchain is not
> exactly trivial?
> 
> For b) I'm not sure swiotlb and i915 really agree on the meaning
> of the value.  swiotlb_set_max_segment basically returns the entire
> size of the swiotlb buffer, while i915 seems to use it to limit
> the size each scatterlist entry.  It seems like dma_max_mapping_size
> might be the best value to use here.

Yes. The background behind that was that SWIOTLB would fail because,
well, the size of the sg was too large, and some way to limit it to a
max size was needed - dma_max_mapping_size "should" be just fine.
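
Concretely, the driver-side cap would look something like this (an
illustrative sketch, not the actual i915 change; the function name is
made up):

static unsigned int my_sg_segment_size(struct device *dev)
{
	/* dma_max_mapping_size() already reflects the swiotlb limit
	 * when mappings may bounce, so no swiotlb-specific API is
	 * needed here. */
	size_t max = dma_max_mapping_size(dev);

	return min_t(size_t, max, UINT_MAX);
}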

> 
> Once that is fixed I'd like to kill off swiotlb_max_segment as it is
> a horribly confusing API.


Re: [PATCH] swiotlb: don't override user specified size in swiotlb_adjust_size

2021-04-29 Thread Konrad Rzeszutek Wilk
On Thu, Apr 29, 2021 at 08:45:51AM -0500, Tom Lendacky wrote:
> On 4/29/21 1:28 AM, Christoph Hellwig wrote:
> > If the user already specified a swiotlb size on the command line,
> > swiotlb_adjust_size should not overwrite it.
> > 
> > Fixes: 2cbc2776efe4 ("swiotlb: remove swiotlb_nr_tbl")
> > Reported-by: Tom Lendacky 
> 
> Thanks, Christoph!
> 
> Tested-by: Tom Lendacky 

Awesome!  I put it on the stable/for-linus-5.13 branch and will send a
GIT PULL to Linus later this week.

Thank you!
> 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  kernel/dma/swiotlb.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 8635a57f88e9..8ca7d505d61c 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -118,6 +118,8 @@ void __init swiotlb_adjust_size(unsigned long size)
> >  * architectures such as those supporting memory encryption to
> >  * adjust/expand SWIOTLB size for their use.
> >  */
> > +   if (default_nslabs != IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT)
> > +   return;
> > size = ALIGN(size, IO_TLB_SIZE);
> > default_nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
> > pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
> > 


Re: [Resend RFC PATCH V2 09/12] swiotlb: Add bounce buffer remap address setting function

2021-04-15 Thread Konrad Rzeszutek Wilk
On Wed, Apr 14, 2021 at 10:49:42AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> For a Hyper-V isolation VM with AMD SEV-SNP, the bounce buffer (shared
> memory) needs to be accessed via an extra address space (e.g. addresses
> above bit 39). Hyper-V code may remap the extra address space outside of
> swiotlb. swiotlb_bounce()

May? Isn't this a MUST in this case?

> needs to use the remapped virtual address to copy data from/to the bounce
> buffer. Add a new interface, swiotlb_set_bounce_remap(), to do that.

I am a bit lost - why can't you use swiotlb_init and pass in your
DMA pool that is already above bit 39?

> 
> Signed-off-by: Tianyu Lan 
> ---
>  include/linux/swiotlb.h |  5 +
>  kernel/dma/swiotlb.c| 13 -
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index d9c9fc9ca5d2..3ccd08116683 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -82,8 +82,13 @@ unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
>  bool is_swiotlb_active(void);
>  void __init swiotlb_adjust_size(unsigned long new_size);
> +void swiotlb_set_bounce_remap(unsigned char *vaddr);
>  #else
>  #define swiotlb_force SWIOTLB_NO_FORCE
> +static inline void swiotlb_set_bounce_remap(unsigned char *vaddr)
> +{
> +}
> +
>  static inline bool is_swiotlb_buffer(phys_addr_t paddr)
>  {
>   return false;
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 7c42df6e6100..5fd2db6aa149 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -94,6 +94,7 @@ static unsigned int io_tlb_index;
>   * not be bounced (unless SWIOTLB_FORCE is set).
>   */
>  static unsigned int max_segment;
> +static unsigned char *swiotlb_bounce_remap_addr;
>  
>  /*
>   * We need to save away the original address corresponding to a mapped entry
> @@ -421,6 +422,11 @@ void __init swiotlb_exit(void)
>   swiotlb_cleanup();
>  }
>  
> +void swiotlb_set_bounce_remap(unsigned char *vaddr)
> +{
> + swiotlb_bounce_remap_addr = vaddr;
> +}
> +
>  /*
>   * Bounce: copy the swiotlb buffer from or back to the original dma location
>   */
> @@ -428,7 +434,12 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
> phys_addr_t tlb_addr,
>  size_t size, enum dma_data_direction dir)
>  {
>   unsigned long pfn = PFN_DOWN(orig_addr);
> - unsigned char *vaddr = phys_to_virt(tlb_addr);
> + unsigned char *vaddr;
> +
> + if (swiotlb_bounce_remap_addr)
> + vaddr = swiotlb_bounce_remap_addr + tlb_addr - io_tlb_start;
> + else
> + vaddr = phys_to_virt(tlb_addr);
>  
>   if (PageHighMem(pfn_to_page(pfn))) {
>   /* The buffer does not have a mapping.  Map it in and copy */
> -- 
> 2.25.1
> 


Re: [Resend RFC PATCH V2 07/12] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM

2021-04-15 Thread Konrad Rzeszutek Wilk
On Wed, Apr 14, 2021 at 10:49:40AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> VMbus ring buffers are shared with the host and need to
> be accessed via the extra address space of an Isolation VM with
> SNP support. This patch maps the ring buffer
> address into the extra address space via ioremap(). The HV host

Why do you need to use ioremap()? Why not just use vmap?


> visibility hvcall smears data in the ring buffer, so
> reset the ring buffer memory to zero after calling the
> visibility hvcall.

So you are exposing these two:
 EXPORT_SYMBOL_GPL(get_vm_area);
 EXPORT_SYMBOL_GPL(ioremap_page_range);

But if you used vmap wouldn't you get the same thing for free?
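
For comparison, the vmap() route would look roughly like this - a
hypothetical sketch, and it only works if the aliased pages above
shared_gpa_boundary actually have struct pages, which may be the real
reason ioremap was picked:

#include <linux/slab.h>
#include <linux/vmalloc.h>

static void *ring_buffer_vmap(struct page *pages, u32 page_cnt)
{
	struct page **wrap;
	void *vaddr;
	int i;

	/* First page holds struct hv_ring_buffer; map the data pages
	 * twice for wraparound - the same layout the existing
	 * hv_ringbuffer_init() builds. */
	wrap = kcalloc(page_cnt * 2 - 1, sizeof(*wrap), GFP_KERNEL);
	if (!wrap)
		return NULL;

	wrap[0] = pages;
	for (i = 0; i < 2 * (page_cnt - 1); i++)
		wrap[i + 1] = &pages[i % (page_cnt - 1) + 1];

	vaddr = vmap(wrap, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
	kfree(wrap);	/* PTEs are populated; the temp array can go */
	return vaddr;
}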

> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/hv/channel.c  | 10 +
>  drivers/hv/hyperv_vmbus.h |  2 +
>  drivers/hv/ring_buffer.c  | 83 +--
>  mm/ioremap.c  |  1 +
>  mm/vmalloc.c  |  1 +
>  5 files changed, 76 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
> index 407b74d72f3f..4a9fb7ad4c72 100644
> --- a/drivers/hv/channel.c
> +++ b/drivers/hv/channel.c
> @@ -634,6 +634,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
>   if (err)
>   goto error_clean_ring;
>  
> + err = hv_ringbuffer_post_init(&newchannel->outbound,
> +   page, send_pages);
> + if (err)
> + goto error_free_gpadl;
> +
> + err = hv_ringbuffer_post_init(&newchannel->inbound,
> +   &page[send_pages], recv_pages);
> + if (err)
> + goto error_free_gpadl;
> +
>   /* Create and init the channel open message */
>   open_info = kzalloc(sizeof(*open_info) +
>  sizeof(struct vmbus_channel_open_channel),
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 0778add21a9c..d78a04ad5490 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
>  /* Interface */
>  
>  void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> + struct page *pages, u32 page_cnt);
>  
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  struct page *pages, u32 pagecnt);
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 35833d4d1a1d..c8b0f7b45158 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -17,6 +17,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include "hyperv_vmbus.h"
>  
> @@ -188,6 +190,44 @@ void hv_ringbuffer_pre_init(struct vmbus_channel 
> *channel)
>   mutex_init(&channel->outbound.ring_buffer_mutex);
>  }
>  
> +int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
> +struct page *pages, u32 page_cnt)
> +{
> + struct vm_struct *area;
> + u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
> + unsigned long vaddr;
> + int err = 0;
> +
> + if (!hv_isolation_type_snp())
> + return 0;
> +
> + physic_addr += ms_hyperv.shared_gpa_boundary;
> + area = get_vm_area((2 * page_cnt - 1) * PAGE_SIZE, VM_IOREMAP);
> + if (!area || !area->addr)
> + return -EFAULT;
> +
> + vaddr = (unsigned long)area->addr;
> + err = ioremap_page_range(vaddr, vaddr + page_cnt * PAGE_SIZE,
> +physic_addr, PAGE_KERNEL_IO);
> + err |= ioremap_page_range(vaddr + page_cnt * PAGE_SIZE,
> +   vaddr + (2 * page_cnt - 1) * PAGE_SIZE,
> +   physic_addr + PAGE_SIZE, PAGE_KERNEL_IO);
> + if (err) {
> + vunmap((void *)vaddr);
> + return -EFAULT;
> + }
> +
> + /* Clean memory after setting host visibility. */
> + memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
> +
> + ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
> + ring_info->ring_buffer->read_index = 0;
> + ring_info->ring_buffer->write_index = 0;
> + ring_info->ring_buffer->feature_bits.value = 1;
> +
> + return 0;
> +}
> +
>  /* Initialize the ring buffer. */
>  int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
>  struct page *pages, u32 page_cnt)
> @@ -197,33 +237,34 @@ int hv_ringbuffer_init(struct hv_ring_buffer_info 
> *ring_info,
>  
>   BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
>  
> - /*
> -  * First page holds struct hv_ring_buffer, do wraparound mapping for
> -  * the rest.
> -  */
> - pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
> -GFP_KERNEL);
> - if (!pages_wraparound)
> - return -ENOMEM;
> -
> - pages_wraparound[0] = pages;
> - for (i = 0; i < 2 * (page_cnt - 1); i++)
> - pa

Re: [Resend RFC PATCH V2 06/12] HV/Vmbus: Add SNP support for VMbus channel initiate message

2021-04-15 Thread Konrad Rzeszutek Wilk
On Wed, Apr 14, 2021 at 10:49:39AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> The physical address of monitor pages in the CHANNELMSG_INITIATE_CONTACT
> msg should be in the extra address space for SNP support and these

What is this 'extra address space'? Is that just normal virtual address
space of the Linux kernel?

> pages should also be accessed via the extra address space inside the
> Linux guest, remapping the extra address with the ioremap function.

OK, why do you need to use ioremap on them? Why not use vmap for
example? What is it that makes ioremap the right candidate?





> 
> Signed-off-by: Tianyu Lan 
> ---
>  drivers/hv/connection.c   | 62 +++
>  drivers/hv/hyperv_vmbus.h |  1 +
>  2 files changed, 63 insertions(+)
> 
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 79bca653dce9..a0be9c11d737 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -101,6 +101,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
> *msginfo, u32 version)
>  
>   msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
>   msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
> +
> + if (hv_isolation_type_snp()) {
> + msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
> + msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
> + }
> +
>   msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
>  
>   /*
> @@ -145,6 +151,29 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
> *msginfo, u32 version)
>   return -ECONNREFUSED;
>   }
>  
> + if (hv_isolation_type_snp()) {
> + vmbus_connection.monitor_pages_va[0]
> + = vmbus_connection.monitor_pages[0];
> + vmbus_connection.monitor_pages[0]
> + = ioremap_cache(msg->monitor_page1, HV_HYP_PAGE_SIZE);
> + if (!vmbus_connection.monitor_pages[0])
> + return -ENOMEM;
> +
> + vmbus_connection.monitor_pages_va[1]
> + = vmbus_connection.monitor_pages[1];
> + vmbus_connection.monitor_pages[1]
> + = ioremap_cache(msg->monitor_page2, HV_HYP_PAGE_SIZE);
> + if (!vmbus_connection.monitor_pages[1]) {
> + vunmap(vmbus_connection.monitor_pages[0]);
> + return -ENOMEM;
> + }
> +
> + memset(vmbus_connection.monitor_pages[0], 0x00,
> +HV_HYP_PAGE_SIZE);
> + memset(vmbus_connection.monitor_pages[1], 0x00,
> +HV_HYP_PAGE_SIZE);
> + }
> +
>   return ret;
>  }
>  
> @@ -156,6 +185,7 @@ int vmbus_connect(void)
>   struct vmbus_channel_msginfo *msginfo = NULL;
>   int i, ret = 0;
>   __u32 version;
> + u64 pfn[2];
>  
>   /* Initialize the vmbus connection */
>   vmbus_connection.conn_state = CONNECTING;
> @@ -213,6 +243,16 @@ int vmbus_connect(void)
>   goto cleanup;
>   }
>  
> + if (hv_isolation_type_snp()) {
> + pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
> + pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
> + if (hv_mark_gpa_visibility(2, pfn,
> + VMBUS_PAGE_VISIBLE_READ_WRITE)) {
> + ret = -EFAULT;
> + goto cleanup;
> + }
> + }
> +
>   msginfo = kzalloc(sizeof(*msginfo) +
> sizeof(struct vmbus_channel_initiate_contact),
> GFP_KERNEL);
> @@ -279,6 +319,8 @@ int vmbus_connect(void)
>  
>  void vmbus_disconnect(void)
>  {
> + u64 pfn[2];
> +
>   /*
>* First send the unload request to the host.
>*/
> @@ -298,6 +340,26 @@ void vmbus_disconnect(void)
>   vmbus_connection.int_page = NULL;
>   }
>  
> + if (hv_isolation_type_snp()) {
> + if (vmbus_connection.monitor_pages_va[0]) {
> + vunmap(vmbus_connection.monitor_pages[0]);
> + vmbus_connection.monitor_pages[0]
> + = vmbus_connection.monitor_pages_va[0];
> + vmbus_connection.monitor_pages_va[0] = NULL;
> + }
> +
> + if (vmbus_connection.monitor_pages_va[1]) {
> + vunmap(vmbus_connection.monitor_pages[1]);
> + vmbus_connection.monitor_pages[1]
> + = vmbus_connection.monitor_pages_va[1];
> + vmbus_connection.monitor_pages_va[1] = NULL;
> + }
> +
> + pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
> + pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
> + hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
> + }
> +
>   hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages

Re: [Resend RFC PATCH V2 04/12] HV: Add Write/Read MSR registers via ghcb

2021-04-15 Thread Konrad Rzeszutek Wilk
On Wed, Apr 14, 2021 at 10:49:37AM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> Hyper-V provides GHCB protocol to write Synthetic Interrupt
> Controller MSR registers and these registers are emulated by
> Hypervisor rather than paravisor.

What is a paravisor? Is that VMPL0, to borrow AMD speak from
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf
 
> 
> Hyper-V requires writing SINTx MSR registers twice (once via
> GHCB and once via the wrmsr instruction, including the proxy bit 21)

Why? And what does 'proxy bit 21' mean? Isn't it just setting a bit
on the MSR?

Oh. Are you writing to it twice because you need to let the hypervisor
(Hyper-V) know, which is done via the WRMSR, and then the inner
hypervisor (VMPL0) via the SINTx MSR?


> Guest OS ID MSR also needs to be set via GHCB.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  arch/x86/hyperv/hv_init.c   |  18 +
>  arch/x86/hyperv/ivm.c   | 130 
>  arch/x86/include/asm/mshyperv.h |  87 +
>  arch/x86/kernel/cpu/mshyperv.c  |   3 +
>  drivers/hv/hv.c |  65 +++-
>  include/asm-generic/mshyperv.h  |   4 +-
>  6 files changed, 261 insertions(+), 46 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 90e65fbf4c58..87b1dd9c84d6 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -475,6 +475,9 @@ void __init hyperv_init(void)
>  
>   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
>   *ghcb_base = ghcb_va;
> +
> + /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
>   }
>  
>   rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -561,6 +564,7 @@ void hyperv_cleanup(void)
>  
>   /* Reset our OS id */
>   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> + hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
>  
>   /*
>* Reset hypercall page reference before reset the page,
> @@ -668,17 +672,3 @@ bool hv_is_hibernation_supported(void)
>   return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
> -
> -enum hv_isolation_type hv_get_isolation_type(void)
> -{
> - if (!(ms_hyperv.features_b & HV_ISOLATION))
> - return HV_ISOLATION_TYPE_NONE;
> - return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
> -}
> -EXPORT_SYMBOL_GPL(hv_get_isolation_type);
> -
> -bool hv_is_isolation_supported(void)
> -{
> - return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
> -}
> -EXPORT_SYMBOL_GPL(hv_is_isolation_supported);
> diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
> index a5950b7a9214..2ec64b367aaf 100644
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -6,12 +6,139 @@
>   *  Tianyu Lan 
>   */
>  
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  
> +union hv_ghcb {
> + struct ghcb ghcb;
> +} __packed __aligned(PAGE_SIZE);
> +
> +void hv_ghcb_msr_write(u64 msr, u64 value)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!ms_hyperv.ghcb_base)
> + return;
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return;
> + }
> +
> + memset(hv_ghcb, 0x00, HV_HYP_PAGE_SIZE);
> +
> + hv_ghcb->ghcb.protocol_version = 1;
> + hv_ghcb->ghcb.ghcb_usage = 0;
> +
> + ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
> + ghcb_set_rcx(&hv_ghcb->ghcb, msr);
> + ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
> + ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
> + ghcb_set_sw_exit_info_1(&hv_ghcb->ghcb, 1);
> + ghcb_set_sw_exit_info_2(&hv_ghcb->ghcb, 0);
> +
> + VMGEXIT();
> +
> + if ((hv_ghcb->ghcb.save.sw_exit_info_1 & 0x) == 1)
> + pr_warn("Fail to write msr via ghcb.\n.");
> +
> + local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(hv_ghcb_msr_write);
> +
> +void hv_ghcb_msr_read(u64 msr, u64 *value)
> +{
> + union hv_ghcb *hv_ghcb;
> + void **ghcb_base;
> + unsigned long flags;
> +
> + if (!ms_hyperv.ghcb_base)
> + return;
> +
> + local_irq_save(flags);
> + ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
> + hv_ghcb = (union hv_ghcb *)*ghcb_base;
> + if (!hv_ghcb) {
> + local_irq_restore(flags);
> + return;
> + }
> +
> + memset(hv_ghcb, 0x00, PAGE_SIZE);
> + hv_ghcb->ghcb.protocol_version = 1;
> + hv_ghcb->ghcb.ghcb_usage = 0;
> +
> + ghcb_set_sw_exit_code(&hv_ghcb->ghcb, SVM_EXIT_MSR);
> + ghcb_

Re: [PATCH v3] swiotlb: Make SWIOTLB_NO_FORCE perform no allocation

2021-04-09 Thread Konrad Rzeszutek Wilk
On Thu, Apr 08, 2021 at 08:13:07PM -0700, Florian Fainelli wrote:
> 
> 
> On 3/24/2021 1:42 AM, Christoph Hellwig wrote:
> > On Mon, Mar 22, 2021 at 06:53:49PM -0700, Florian Fainelli wrote:
> >> When SWIOTLB_NO_FORCE is used, there should really be no allocations of
> >> default_nslabs to occur since we are not going to use those slabs. If a
> >> platform was somehow setting swiotlb_no_force and a later call to
> >> swiotlb_init() was to be made we would still be proceeding with
> >> allocating the default SWIOTLB size (64MB), whereas if swiotlb=noforce
> >> was set on the kernel command line we would have only allocated 2KB.
> >>
> >> This would be inconsistent and the point of initializing default_nslabs
> >> to 1, was intended to allocate the minimum amount of memory possible, so
> >> simply remove that minimal allocation period.
> >>
> >> Signed-off-by: Florian Fainelli 
> > 
> > Looks good,
> > 
> > Reviewed-by: Christoph Hellwig 
> > 
> 
> Thanks! Konrad, can you apply this patch to your for-linus-5.13 branch
> if you are also happy with it?

It should now be visible?
> -- 
> Florian


Re: [PATCH] ARM: Qualify enabling of swiotlb_init()

2021-04-01 Thread Konrad Rzeszutek Wilk
On Tue, Mar 30, 2021 at 07:36:07AM +0200, Christoph Hellwig wrote:
> On Mon, Mar 29, 2021 at 12:30:42PM -0700, Florian Fainelli wrote:
> > Should I toss this in Russell's patch tracker or do you need me to make
> > some changes to the patch?
> 
> Due to all the other changes in this area I don't think anything but
> the swiotlb tree makes much sense here.

I've put them all on 

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git
devel/for-linus-5.13




Re: [PATCH] ARM: Qualify enabling of swiotlb_init()

2021-03-19 Thread Konrad Rzeszutek Wilk
On Fri, Mar 19, 2021 at 02:07:31PM +0100, Christoph Hellwig wrote:
> On Thu, Mar 18, 2021 at 09:03:33PM -0700, Florian Fainelli wrote:
> >  #ifdef CONFIG_ARM_LPAE
> > +   if (swiotlb_force == SWIOTLB_FORCE ||
> > +   max_pfn > arm_dma_pfn_limit)
> 
> Does arm_dma_pfn_limit do the right thing even with the weirdest
> remapping ranges?  Maybe a comment here would be useful.
> 
> > +   swiotlb_init(1);
> > +   else
> > +   swiotlb_force = SWIOTLB_NO_FORCE;
> 
> Konrad: what do you think of setting swiotlb_force to SWIOTLB_NO_FORCE
> and only switching it to SWIOTLB_NORMAL when swiotlb_init* is called?
> That kind makes more sense than forcing the callers to do it.
> 
> While we're at it, I think swiotlb_force should probably be renamed to
> swiotlb_mode or somethng like that.

swiotlb_mode sounds good.

Also it got me thinking - ARM on Xen at some point was a bit strange,
so I am not sure how the logic works here. Stefano?


Re: [PATCH] swiotlb: Make SWIOTLB_NO_FORCE perform no allocation

2021-03-18 Thread Konrad Rzeszutek Wilk
On Thu, Mar 18, 2021 at 09:00:54PM -0700, Florian Fainelli wrote:
> When SWIOTLB_NO_FORCE is used, there should really be no allocations of
> io_tlb_nslabs to occur since we are not going to use those slabs. If a
> platform was somehow setting swiotlb_no_force and a later call to
> swiotlb_init() was to be made we would still be proceeding with
> allocating the default SWIOTLB size (64MB), whereas if swiotlb=noforce
> was set on the kernel command line we would have only allocated 2KB.
> 
> This would be inconsistent and the point of initializing io_tlb_nslabs
> to 1, was to avoid hitting the test for io_tlb_nslabs being 0/not
> initialized.

Could you rebase this on devel/for-linus-5.13 in

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git

please?
> 
> Signed-off-by: Florian Fainelli 
> ---
>  kernel/dma/swiotlb.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c10e855a03bc..526c8321b76f 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -121,12 +121,10 @@ setup_io_tlb_npages(char *str)
>   }
>   if (*str == ',')
>   ++str;
> - if (!strcmp(str, "force")) {
> + if (!strcmp(str, "force"))
>   swiotlb_force = SWIOTLB_FORCE;
> - } else if (!strcmp(str, "noforce")) {
> + else if (!strcmp(str, "noforce"))
>   swiotlb_force = SWIOTLB_NO_FORCE;
> - io_tlb_nslabs = 1;
> - }
>  
>   return 0;
>  }
> @@ -284,6 +282,9 @@ swiotlb_init(int verbose)
>   unsigned char *vstart;
>   unsigned long bytes;
>  
> + if (swiotlb_force == SWIOTLB_NO_FORCE)
> + goto out;
> +
>   if (!io_tlb_nslabs) {
>   io_tlb_nslabs = (default_size >> IO_TLB_SHIFT);
>   io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
> @@ -302,6 +303,7 @@ swiotlb_init(int verbose)
>   io_tlb_start = 0;
>   }
>   pr_warn("Cannot allocate buffer");
> +out:
>   no_iotlb_memory = true;
>  }
>  
> -- 
> 2.25.1
> 


Re: [PATCH] swiotlb: Add swiotlb=off to disable SWIOTLB

2021-03-18 Thread Konrad Rzeszutek Wilk
> > 
> > In fact I should have looked more closely at that myself - checking
> > debugfs on my 4GB arm64 board actually shows io_tlb_nslabs = 0, and
> > indeed we are bypassing initialisation completely and (ab)using
> > SWIOTLB_NO_FORCE to cover it up, so I guess it probably *is* safe now
> > for the noforce option to do the same for itself and save even that one
> > page.
> 
> OK, I can submit a patch that does that. 5.12-rc3 works correctly for me
> here as well and only allocates SWIOTLB when needed which in our case is
> either:
> 
> - we have DRAM at PA >= 4GB
> - we have limited peripherals (Raspberry Pi 4 derivative) that can only
> address the lower 1GB
> 
> Now let's see if we can get ARM 32-bit to match :)

Whatever patch you come up with, if it is against SWIOTLB please base it
on top of devel/for-linus-5.12 in
https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/

Thx
> -- 
> Florian


Re: [PATCH 12/14] swiotlb: move global variables into a new io_tlb_mem structure

2021-03-17 Thread Konrad Rzeszutek Wilk
On Wed, Mar 17, 2021 at 06:57:42PM +0100, Christoph Hellwig wrote:
> On Wed, Mar 17, 2021 at 01:51:56PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 17, 2021 at 02:53:27PM +0100, Christoph Hellwig wrote:
> > > On Wed, Mar 17, 2021 at 01:42:07PM +, Konrad Rzeszutek Wilk wrote:
> > > > > - alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(size_t));
> > > > > - io_tlb_alloc_size = memblock_alloc(alloc_size, PAGE_SIZE);
> > > > > - if (!io_tlb_alloc_size)
> > > > > - panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> > > > > -   __func__, alloc_size, PAGE_SIZE);
> > > > 
> > > > Shouldn't this be converted to:
> > > > mem->alloc_size = memblock_alloc(alloc_size, PAGE_SIZE);
> > > > if (...)
> > > > 
> > > > Seems that it got lost in the search and replace?
> > > 
> > > Yes, I messed that up during the rebase.  That being said it magically
> > > gets fixed in the next patch..
> > 
> > Yes. However if someone does a bisection they are going to be mighty unhappy
> > with you.
> 
> Sure, I was planning on fixing it anyway.  Just waiting for feedback
> on the rest of the patches before doing a respin.

I put the patches up to this one on:

 git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb 
devel/for-linus-5.13

Would you be OK rebasing on top of that and sending the patches?

Juergen, would you be OK testing that branch on your Xen setup?


Re: [PATCH 12/14] swiotlb: move global variables into a new io_tlb_mem structure

2021-03-17 Thread Konrad Rzeszutek Wilk
On Wed, Mar 17, 2021 at 02:53:27PM +0100, Christoph Hellwig wrote:
> On Wed, Mar 17, 2021 at 01:42:07PM +0000, Konrad Rzeszutek Wilk wrote:
> > > - alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(size_t));
> > > - io_tlb_alloc_size = memblock_alloc(alloc_size, PAGE_SIZE);
> > > - if (!io_tlb_alloc_size)
> > > - panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> > > -   __func__, alloc_size, PAGE_SIZE);
> > 
> > Shouldn't this be converted to:
> > mem->alloc_size = memblock_alloc(alloc_size, PAGE_SIZE);
> > if (...)
> > 
> > Seems that it got lost in the search and replace?
> 
> Yes, I messed that up during the rebase.  That being said it magically
> gets fixed in the next patch..

Yes. However if someone does a bisection they are going to be mighty unhappy
with you.


Re: [PATCH 12/14] swiotlb: move global variables into a new io_tlb_mem structure

2021-03-17 Thread Konrad Rzeszutek Wilk
..snip..
>  int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>  {
..snip..
>   /*
>* Allocate and initialize the free list array.  This array is used
>* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
> -  * between io_tlb_start and io_tlb_end.
> +  * between mem->start and mem->end.
>*/
> - alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(int));
> - io_tlb_list = memblock_alloc(alloc_size, PAGE_SIZE);
> - if (!io_tlb_list)
> + alloc_size = PAGE_ALIGN(mem->nslabs * sizeof(int));
> + mem->list = memblock_alloc(alloc_size, PAGE_SIZE);
> + if (!mem->list)
>   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> __func__, alloc_size, PAGE_SIZE);
>  
> - alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t));
> - io_tlb_orig_addr = memblock_alloc(alloc_size, PAGE_SIZE);
> - if (!io_tlb_orig_addr)
> + alloc_size = PAGE_ALIGN(mem->nslabs * sizeof(phys_addr_t));
> + mem->orig_addr = memblock_alloc(alloc_size, PAGE_SIZE);
> + if (!mem->orig_addr)
>   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> __func__, alloc_size, PAGE_SIZE);
>  
> - alloc_size = PAGE_ALIGN(io_tlb_nslabs * sizeof(size_t));
> - io_tlb_alloc_size = memblock_alloc(alloc_size, PAGE_SIZE);
> - if (!io_tlb_alloc_size)
> - panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
> -   __func__, alloc_size, PAGE_SIZE);

Shouldn't this be converted to:
mem->alloc_size = memblock_alloc(alloc_size, PAGE_SIZE);
if (...)

Seems that it got lost in the search and replace?
> -
> - for (i = 0; i < io_tlb_nslabs; i++) {
> - io_tlb_list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i);
> - io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
> - io_tlb_alloc_size[i] = 0;
> + for (i = 0; i < mem->nslabs; i++) {
> + mem->list[i] = IO_TLB_SEGSIZE - io_tlb_offset(i);
> + mem->orig_addr[i] = INVALID_PHYS_ADDR;
> + mem->alloc_size[i] = 0;
>   }
> - io_tlb_index = 0;
>   no_iotlb_memory = false;
>  
>   if (verbose)
>   swiotlb_print_info();
>  
> - swiotlb_set_max_segment(io_tlb_nslabs << IO_TLB_SHIFT);
> + swiotlb_set_max_segment(mem->nslabs << IO_TLB_SHIFT);
>   return 0;
>  }
>  

..snip..
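
Spelled out, the conversion that went missing would presumably be:

	alloc_size = PAGE_ALIGN(mem->nslabs * sizeof(size_t));
	mem->alloc_size = memblock_alloc(alloc_size, PAGE_SIZE);
	if (!mem->alloc_size)
		panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
		      __func__, alloc_size, PAGE_SIZE);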


Re: [PATCH 03/14] swiotlb: move orig addr and size validation into swiotlb_bounce

2021-03-16 Thread Konrad Rzeszutek Wilk
On Mon, Mar 01, 2021 at 08:44:25AM +0100, Christoph Hellwig wrote:
> Move the code to find and validate the original buffer address and size
> from the callers into swiotlb_bounce.  This means a tiny bit of extra
> work in the swiotlb_map path, but avoids code duplication and leads to
> a better code structure.


Reviewed-by: Konrad Rzeszutek Wilk 


Re: [PATCH 02/14] swiotlb: remove the alloc_size parameter to swiotlb_tbl_unmap_single

2021-03-16 Thread Konrad Rzeszutek Wilk
On Mon, Mar 01, 2021 at 08:44:24AM +0100, Christoph Hellwig wrote:
> Now that swiotlb remembers the allocation size there is no need to pass
> it back to swiotlb_tbl_unmap_single.
Reviewed-by: Konrad Rzeszutek Wilk 



Re: [PATCH] swiotlb: Fix type of max_slot

2021-03-02 Thread Konrad Rzeszutek Wilk

On 3/2/21 12:21 PM, Kunihiko Hayashi wrote:

After the refactoring, the type of max_slots changed from unsigned
long to unsigned int. The return type of the function get_max_slots() and
the 4th argument type of iommu_is_span_boundary() now differ from the
type of max_slots, which finally trips the BUG_ON assertion in
iommu_is_span_boundary().

Cc: Christoph Hellwig 
Fixes: 567d877f9a7d ("swiotlb: refactor swiotlb_tbl_map_single")
Signed-off-by: Kunihiko Hayashi 


I think this is all good. Looking at Linus's master I see:


537 unsigned long max_slots = get_max_slots(boundary_mask);

?


---
  kernel/dma/swiotlb.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 369e4c3..c10e855 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -534,7 +534,7 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
-   unsigned int max_slots = get_max_slots(boundary_mask);
+   unsigned long max_slots = get_max_slots(boundary_mask);
unsigned int iotlb_align_mask =
dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
unsigned int nslots = nr_slots(alloc_size), stride;
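
To make the failure mode concrete - a sketch, with the exact constant
depending on IO_TLB_SHIFT and BITS_PER_LONG:

	/* A device with no segment boundary restriction: */
	unsigned long boundary_mask = ~0UL;
	/* get_max_slots() then returns a single bit far above bit 31,
	 * presumably 1UL << (BITS_PER_LONG - IO_TLB_SHIFT). */
	unsigned long max_slots = get_max_slots(boundary_mask);
	unsigned int truncated = max_slots;	/* == 0 */
	/* iommu_is_span_boundary() is then handed 0 as boundary_size
	 * and trips BUG_ON(!is_power_of_2(boundary_size)) - the
	 * iommu-helper.h:30 BUG seen in the oops. */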





Re: [PATCH] swiotlb: swiotlb_tbl_map_single() kernel BUG in iommu-helper.h:30

2021-02-26 Thread Konrad Rzeszutek Wilk
On Fri, Feb 26, 2021 at 01:18:14PM -0800, Brad Larson wrote:
> On Fri, Feb 26, 2021 at 12:43:07PM -0800, Brad Larson wrote:
> > Kernel Oops introduced in next-20210222 due to the get_max_slots
> > return arg size. In the function find_slots() the variable max_slots
> > is zero when boundary_mask is 0x.
> 
> I am looking at the stable/for-linus-5.12 and what I sent out for
> a GIT PULL and I believe this is already squashed in:
> 
> 531 static int find_slots(struct device *dev, phys_addr_t orig_addr,
> 532 		size_t alloc_size)
> 533 {
> 534 	unsigned long boundary_mask = dma_get_seg_boundary(dev);
> 535 	dma_addr_t tbl_dma_addr =
> 536 		phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
> 537 	unsigned long max_slots = get_max_slots(boundary_mask);
> 
> Could you double-check please?
> 
>  
> Yes, this is squashed in the snippet you are showing.  The bug was
> introduced in next-20210222 and is still there when I updated to
> next-20210226 today.

Duh! It should be fixed now for the next one. Thank you!


Re: [PATCH] swiotlb: swiotlb_tbl_map_single() kernel BUG in iommu-helper.h:30

2021-02-26 Thread Konrad Rzeszutek Wilk
On Fri, Feb 26, 2021 at 12:43:07PM -0800, Brad Larson wrote:
> Kernel Oops introduced in next-20210222 due to the get_max_slots return arg size.
> In the function find_slots() the variable max_slots is zero when boundary_mask is
> 0x.

I am looking at the stable/for-linus-5.12 and what I sent out for
a GIT PULL and I believe this is already squashed in:

531 static int find_slots(struct device *dev, phys_addr_t orig_addr,
532 		size_t alloc_size)
533 {
534 	unsigned long boundary_mask = dma_get_seg_boundary(dev);
535 	dma_addr_t tbl_dma_addr =
536 		phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
537 	unsigned long max_slots = get_max_slots(boundary_mask);

Could you double-check please?

> 
> [0.242119] kernel BUG at ./include/linux/iommu-helper.h:30!
> [0.247793] Internal error: Oops - BUG: 0 [#1] SMP
> [0.252595] Modules linked in:
> [0.255657] CPU: 0 PID: 93 Comm: kworker/0:1 Not tainted 
> 5.11.0-next-20210224+ #25
> [0.263245] Hardware name: Elba ASIC Board (DT)
> [0.267784] Workqueue: events_freezable mmc_rescan
> [0.272592] pstate: 6085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
> [0.278612] pc : swiotlb_tbl_map_single+0x2b0/0x6a0
> [0.283505] lr : swiotlb_tbl_map_single+0x440/0x6a0
> [0.288395] sp : ffc0122736b0
> [0.291713] x29: ffc0122736b0 x28: ffc010e3
> [0.297039] x27: bbf58000 x26: 
> [0.302364] x25:  x24: 0001
> [0.307689] x23:  x22: 
> [0.313013] x21:  x20: 
> [0.318338] x19: 001241fd4600 x18: ffc010d288c8
> [0.323662] x17: 0007 x16: 0001
> [0.328987] x15: ffc092273367 x14: 3a424c54204f4920
> [0.334311] x13: 6572617774666f73 x12: 20726e2030207865
> [0.339636] x11: 646e692078787820 x10: 3062653737317830
> [0.344960] x9 : 2074666968732031 x8 : ff977cf82368
> [0.350285] x7 : 0001 x6 : c000efff
> [0.355609] x5 : 00017fe8 x4 : 
> [0.360934] x3 :  x2 : 18b0d50da009d000
> [0.366258] x1 :  x0 : 0042
> [0.371583] Call trace:
> [0.374032]  swiotlb_tbl_map_single+0x2b0/0x6a0
> [0.378573]  swiotlb_map+0xa8/0x2b0
> 
> Signed-off-by: Brad Larson 
> ---
>  kernel/dma/swiotlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 369e4c3a0f2b..c10e855a03bc 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -534,7 +534,7 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
>   unsigned long boundary_mask = dma_get_seg_boundary(dev);
>   dma_addr_t tbl_dma_addr =
>   phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
> - unsigned int max_slots = get_max_slots(boundary_mask);
> + unsigned long max_slots = get_max_slots(boundary_mask);
>   unsigned int iotlb_align_mask =
>   dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
>   unsigned int nslots = nr_slots(alloc_size), stride;
> -- 
> 2.17.1
> 


Re: [PATCH 6/9] swiotlb: refactor swiotlb_tbl_map_single

2021-02-22 Thread Konrad Rzeszutek Wilk
> > +static int find_slots(struct device *dev, size_t alloc_size)
> > +{
> > +   unsigned long boundary_mask = dma_get_seg_boundary(dev);
> > +   dma_addr_t tbl_dma_addr =
> > +   phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
> > +   unsigned int max_slots = get_max_slots(boundary_mask);
> 
> 'max_slots' should be 'unsigned long' here. Breaks SWIOTLB on RPi4. Do
> you want me to send a fix, or do you prefer editing the patch?

I can roll it in. Thx!


Re: [PATCH 3/8] xen-swiotlb: use io_tlb_end in xen_swiotlb_dma_supported

2021-02-19 Thread Konrad Rzeszutek Wilk
On Sun, Feb 07, 2021 at 05:09:29PM +0100, Christoph Hellwig wrote:
> Use the existing variable that holds the physical address for
> xen_io_tlb_end to simplify xen_swiotlb_dma_supported a bit, and remove
> the otherwise unused xen_io_tlb_end variable and the xen_virt_to_bus
> helper.
> 
Reviewed-by: Konrad Rzeszutek Wilk 

> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/xen/swiotlb-xen.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index a4026822a889f7..4298f74a083985 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -46,7 +46,7 @@
>   * API.
>   */
>  
> -static char *xen_io_tlb_start, *xen_io_tlb_end;
> +static char *xen_io_tlb_start;
>  static unsigned long xen_io_tlb_nslabs;
>  /*
>   * Quick lookup value of the bus address of the IOTLB.
> @@ -82,11 +82,6 @@ static inline phys_addr_t xen_dma_to_phys(struct device 
> *dev,
>   return xen_bus_to_phys(dev, dma_to_phys(dev, dma_addr));
>  }
>  
> -static inline dma_addr_t xen_virt_to_bus(struct device *dev, void *address)
> -{
> - return xen_phys_to_dma(dev, virt_to_phys(address));
> -}
> -
>  static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
>  {
>   unsigned long next_bfn, xen_pfn = XEN_PFN_DOWN(p);
> @@ -250,7 +245,6 @@ int __ref xen_swiotlb_init(int verbose, bool early)
>   rc = swiotlb_late_init_with_tbl(xen_io_tlb_start, 
> xen_io_tlb_nslabs);
>  
>  end:
> - xen_io_tlb_end = xen_io_tlb_start + bytes;
>   if (!rc)
>   swiotlb_set_max_segment(PAGE_SIZE);
>  
> @@ -558,7 +552,7 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct 
> scatterlist *sgl,
>  static int
>  xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  {
> - return xen_virt_to_bus(hwdev, xen_io_tlb_end - 1) <= mask;
> + return xen_phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
>  }
>  
>  const struct dma_map_ops xen_swiotlb_dma_ops = {
> -- 
> 2.29.2
> 


Re: [PATCH 2/8] xen-swiotlb: use is_swiotlb_buffer in is_xen_swiotlb_buffer

2021-02-19 Thread Konrad Rzeszutek Wilk
On Sun, Feb 07, 2021 at 05:09:28PM +0100, Christoph Hellwig wrote:
> Use the is_swiotlb_buffer to check if a physical address is
> a swiotlb buffer.  This works because xen-swiotlb does use the
> same buffer as the main swiotlb code, and xen_io_tlb_{start,end}
> are just the addresses for it that went through phys_to_virt.
> 

Reviewed-by: Konrad Rzeszutek Wilk 

> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/xen/swiotlb-xen.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 2b385c1b4a99cb..a4026822a889f7 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -111,10 +111,8 @@ static int is_xen_swiotlb_buffer(struct device *dev, 
> dma_addr_t dma_addr)
>* have the same virtual address as another address
>* in our domain. Therefore _only_ check address within our domain.
>*/
> - if (pfn_valid(PFN_DOWN(paddr))) {
> - return paddr >= virt_to_phys(xen_io_tlb_start) &&
> -paddr < virt_to_phys(xen_io_tlb_end);
> - }
> + if (pfn_valid(PFN_DOWN(paddr)))
> + return is_swiotlb_buffer(paddr);
>   return 0;
>  }
>  
> -- 
> 2.29.2
> 


Re: [PATCH RFC v1 5/6] xen-swiotlb: convert variables to arrays

2021-02-19 Thread Konrad Rzeszutek Wilk
On Sun, Feb 07, 2021 at 04:56:01PM +0100, Christoph Hellwig wrote:
> On Thu, Feb 04, 2021 at 09:40:23AM +0100, Christoph Hellwig wrote:
> > So one thing that has been on my mind for a while:  I'd really like
> > to kill the separate dma ops in Xen swiotlb.  If we compare xen-swiotlb
> > to swiotlb the main difference seems to be:
> > 
> >  - additional reasons to bounce I/O vs the plain DMA capable
> >  - the possibility to do a hypercall on arm/arm64
> >  - an extra translation layer before doing the phys_to_dma and vice
> >versa
> >  - an special memory allocator
> > 
> > I wonder if inbetween a few jump labels or other no overhead enablement
> > options and possibly better use of the dma_range_map we could kill
> > off most of swiotlb-xen instead of maintaining all this code duplication?
> 
> So I looked at this a bit more.
> 
> For x86 with XENFEAT_auto_translated_physmap (how common is that?)

Juergen, Boris please correct me if I am wrong, but that 
XENFEAT_auto_translated_physmap
only works for PVH guests?

> pfn_to_gfn is a nop, so plain phys_to_dma/dma_to_phys do work as-is.
> 
> xen_arch_need_swiotlb always returns true for x86, and
> range_straddles_page_boundary should never be true for the
> XENFEAT_auto_translated_physmap case.

Correct. The kernel should have no clue of what the real MFNs are
for PFNs.
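
(For reference, that helper is roughly - a sketch along the lines of
arch/x86/include/asm/xen/page.h:

	static inline unsigned long pfn_to_gfn(unsigned long pfn)
	{
		if (xen_feature(XENFEAT_auto_translated_physmap))
			return pfn;	/* PVH/HVM: identity, no P2M lookup */
		return pfn_to_mfn(pfn);	/* classic PV: real MFN lookup */
	}

so with the feature set the translation degenerates to the identity,
which is why plain phys_to_dma works as-is.)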
> 
> So as far as I can tell the mapping fast path for the
> XENFEAT_auto_translated_physmap can be trivially reused from swiotlb.
> 
> That leaves us with the next more complicated case, x86 or fully cache
> coherent arm{,64} without XENFEAT_auto_translated_physmap.  In that case
> we need to patch in a phys_to_dma/dma_to_phys that performs the MFN
> lookup, which could be done using alternatives or jump labels.
> I think if that is done right we should also be able to let that cover
> the foreign pages in is_xen_swiotlb_buffer/is_swiotlb_buffer, but
> in that worst case that would need another alternative / jump label.
> 
> For non-coherent arm{,64} we'd also need to use alternatives or jump
> labels to for the cache maintainance ops, but that isn't a hard problem
> either.
> 
> 


Re: preserve DMA offsets when using swiotlb v2

2021-02-19 Thread Konrad Rzeszutek Wilk
On Tue, Feb 09, 2021 at 11:49:40AM -0800, Jianxiong Gao wrote:
> > Sorry for being a little pushy, any chance we could get this reviewed
> > in time for the 5.12 merge window?
> >
> Tested and it looks good. Thanks for clearing it up!

Let me put it in my regression bucket, as last time it was able to catch
issues which Jianxiong's rig did not.

> 
> Gao


Re: [PATCH] swiotlb: Validate bounce size in the sync/unmap path

2021-02-08 Thread Konrad Rzeszutek Wilk
On Fri, Feb 05, 2021 at 06:58:52PM +0100, Christoph Hellwig wrote:
> On Wed, Feb 03, 2021 at 02:36:38PM -0500, Konrad Rzeszutek Wilk wrote:
> > > So what?  If you guys want to provide a new capability you'll have to do
> > > work.  And designing a new protocol based around the fact that the
> > > hardware/hypervisor is not trusted and a copy is always required makes
> > > a lot of more sense than throwing in band aids all over the place.
> > 
> > If you don't trust the hypervisor, what would this capability be in?
> 
> Well, they don't trust the hypervisor to not attack the guest somehow,
> except through the data read.  I never really understood the concept,
> as it leaves too many holes.
> 
> But the point is that these schemes want to force bounce buffering
> because they think it is more secure.  And if that is what you want
> you better have protocol build around the fact that each I/O needs
> to use bounce buffers, so you make those buffers the actual shared
> memory use for communication, and build the protocol around it.

Right. That is what the SWIOTLB pool ends up being, as it is allocated at
bootup, where the guest tells the hypervisor: these are shared and
clear-text.
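
That bootup step is basically "mark the pool decrypted" - a sketch along
the lines of swiotlb_update_mem_attributes():

	vaddr = phys_to_virt(io_tlb_start);
	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
	memset(vaddr, 0, bytes);

after which both the guest and the hypervisor see the pool as clear-text.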

> E.g. you don't force the ridiculous NVMe PRP offset rules on the block
> layer, just to make a complicated swiotlb allocation that needs to
> preserve the alignment just do I/O.  But instead you have a trivial

I agree that NVMe is being silly. It could have allocated a coherent
pool, used that, and done its own offsetting within it. That would in
essence carve out a static pool within the static SWIOTLB one.

TTM does that - it has its own DMA machinery on top of DMA API to deal
with its "passing" buffers from one application to another and the fun
of keeping track of that.
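
To make the carve-out idea concrete - a hypothetical sketch, all names
made up, just to illustrate the offset-preserving suballocation:

	/* one coherent chunk owned by the driver, handed out per slot so
	 * the payload keeps its page offset and the PRP rules still hold
	 * after the copy */
	struct nvme_bounce_pool {
		void		*vaddr;
		dma_addr_t	dma;
	};

	static void *nvme_pool_slot(struct nvme_bounce_pool *p,
				    unsigned int slot, phys_addr_t orig_addr)
	{
		size_t offset = orig_addr & (PAGE_SIZE - 1);

		return p->vaddr + slot * PAGE_SIZE + offset;
	}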

> ring buffer or whatever because you know I/O will be copied anyway
> and none of all the hard work higher layers do to make the I/O suitable
> for a normal device apply.

I lost you here. Sorry, are you saying to have a simple ring protocol
(like NVMe has), where the ring entries (SG or DMA phys) are statically
allocated, and whenever the NVMe driver gets data from user-space it
would copy it in there?



Re: [PATCH RFC v1 2/6] swiotlb: convert variables to arrays

2021-02-04 Thread Konrad Rzeszutek Wilk
On Thu, Feb 04, 2021 at 11:49:23AM +, Robin Murphy wrote:
> On 2021-02-04 07:29, Christoph Hellwig wrote:
> > On Wed, Feb 03, 2021 at 03:37:05PM -0800, Dongli Zhang wrote:
> > > This patch converts several swiotlb related variables to arrays, in
> > > order to maintain stat/status for different swiotlb buffers. Here are
> > > variables involved:
> > > 
> > > - io_tlb_start and io_tlb_end
> > > - io_tlb_nslabs and io_tlb_used
> > > - io_tlb_list
> > > - io_tlb_index
> > > - max_segment
> > > - io_tlb_orig_addr
> > > - no_iotlb_memory
> > > 
> > > There is no functional change and this is to prepare to enable 64-bit
> > > swiotlb.
> > 
> > Claire Chang (on Cc) already posted a patch like this a month ago,
> > which looks much better because it actually uses a struct instead
> > of all the random variables.
> 
> Indeed, I skimmed the cover letter and immediately thought that this whole
> thing is just the restricted DMA pool concept[1] again, only from a slightly
> different angle.


Kind of. Let me lay out how some of these pieces are right now:

+---+  +--+
|   |  |  |
|   |  |  |
|   a)Xen-SWIOTLB   |  | b)SWIOTLB (for !Xen) |
|   |  |  |
+---XX--+  +---X--+
   X
  XX XXX
X   XX

 +--XX---+
 |   |
 |   |
 |   c) SWIOTLB generic  |
 |   |
 +---+

Dongli's patches modify the SWIOTLB generic c), and Xen-SWIOTLB a)
parts.

Also see the IOMMU_INIT logic, which lays this out in a bit more depth
(for example how to enable SWIOTLB on AMD boxes, or IBM with Calgary
IOMMU, etc - see iommu_table.h).

Furthermore it lays the groundwork to allocate AMD SEV SWIOTLB buffers
later after boot (so that you can stitch different pools together).
All the bits are kind of inside of the SWIOTLB code. And it also changes
the Xen-SWIOTLB to do something similar.

The mempool did it similarly by taking the internal parts (aka the
various io_tlb) of SWIOTLB and exposing them out and having
other code:

+---+  +--+
|   |  |  |
|   |  |  |
| a)Xen-SWIOTLB |  | b)SWIOTLB (for !Xen) |
|   |  |  |
+---XX--+  +---X--+
   X
  XX XXX
X   XX

 +--XX---+ +--+
 |   | | Device tree  |
 |   +<+ enabling SWIOTLB |
 |c) SWIOTLB generic | |  |
 |   | | mempool  |
 +---+ +--+

What I was suggesting to Claire was to follow the Xen model, that is,
do something like this:

+---+  +--+   ++
|   |  |  |   ||
|   |  |  |   ||
| a)Xen-SWIOTLB |  | b)SWIOTLB (for !Xen) |   | e) DT-SWIOTLB  |
|   |  |  |   ||
+---XX--+  +---X--+   +XX-X+
   XXXX X X XX X XX
  XX XXX
X   XX X

 +--XXX--+
 |   |
 |   |
 |c) SWIOTLB generic |
 |   |
 +---+


that is, use the SWIOTLB generic parts, and then bolt the
device-tree logic on top, along with the mempool logic.



But Christoph has an interesting suggestion, which is
to squash all the existing code (a, b, c) together
and pepper it with various jump labels.


So:


-+
| SWIOTLB:   |
||
|  a) SWIOTLB (for non-Xen)  |
|  b) Xen-SWIOTLB|
|  c) DT-SWIOTLB |
||
||
-+


with all the various bits (M2P/P2M for Xen, mempool for ARM,
and normal allocation for BM) in one big file.


Re: [PATCH] swiotlb: Validate bounce size in the sync/unmap path

2021-02-03 Thread Konrad Rzeszutek Wilk
On Wed, Feb 03, 2021 at 01:49:22PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 18, 2021 at 12:44:58PM +0100, Martin Radev wrote:
> > Your comment makes sense but then that would require the cooperation
> > of these vendors and the cloud providers to agree on something meaningful.
> > I am also not sure whether the end result would be better than hardening
> > this interface to catch corruption. There is already some validation in
> > unmap path anyway.
> 
> So what?  If you guys want to provide a new capability you'll have to do
> work.  And designing a new protocol based around the fact that the
> hardware/hypervisor is not trusted and a copy is always required makes
> a lot of more sense than throwing in band aids all over the place.

If you don't trust the hypervisor, what would this capability be in?

I suppose you mean this would need to be in the guest kernel and
this protocol would depend on .. not-hypervisor and most certainly not
the virtio or any SR-IOV device. That removes a lot of options.

The one sensible one is the IOMMU (since folks will trust OEM vendors like
Intel or AMD to provide the memory encryption, they will presumably also
trust the IOMMU - I hope?) - and they do have plans for that with their
IOMMU frameworks, which will remove the need for SWIOTLB (I hope).

But that is not now, but in future.


Re: [PATCH] swiotlb: Validate bounce size in the sync/unmap path

2021-02-02 Thread Konrad Rzeszutek Wilk
On Tue, Feb 02, 2021 at 04:34:09PM -0600, Tom Lendacky wrote:
> On 2/2/21 10:37 AM, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jan 25, 2021 at 07:33:35PM +0100, Martin Radev wrote:
> >> On Mon, Jan 18, 2021 at 10:14:28AM -0500, Konrad Rzeszutek Wilk wrote:
> >>> On Mon, Jan 18, 2021 at 12:44:58PM +0100, Martin Radev wrote:
> >>>> On Wed, Jan 13, 2021 at 12:30:17PM +0100, Christoph Hellwig wrote:
> >>>>> On Tue, Jan 12, 2021 at 04:07:29PM +0100, Martin Radev wrote:
> >>>>>> The size of the buffer being bounced is not checked if it happens
> >>>>>> to be larger than the size of the mapped buffer. Because the size
> >>>>>> can be controlled by a device, as it's the case with virtio devices,
> >>>>>> this can lead to memory corruption.
> >>>>>>
> >>>>>
> >>>>> I'm really worried about all these hodge podge hacks for not trusted
> >>>>> hypervisors in the I/O stack.  Instead of trying to harden protocols
> >>>>> that are fundamentally not designed for this, how about instead coming
> >>>>> up with a new paravirtualized I/O interface that is specifically
> >>>>> designed for use with an untrusted hypervisor from the start?
> >>>>
> >>>> Your comment makes sense but then that would require the cooperation
> >>>> of these vendors and the cloud providers to agree on something 
> >>>> meaningful.
> >>>> I am also not sure whether the end result would be better than hardening
> >>>> this interface to catch corruption. There is already some validation in
> >>>> unmap path anyway.
> >>>>
> >>>> Another possibility is to move this hardening to the common virtio code,
> >>>> but I think the code may become more complicated there since it would
> >>>> require tracking both the dma_addr and length for each descriptor.
> >>>
> >>> Christoph,
> >>>
> >>> I've been wrestling with the same thing - this is specific to busted
> >>> drivers. And in reality you could do the same thing with a hardware
> >>> virtio device (see example in http://thunderclap.io/) - where the
> >>> mitigation is 'enable the IOMMU to do its job.'.
> >>>
> >>> AMD SEV documents speak about utilizing IOMMU to do this (AMD SEV-SNP)..
> >>> and while that is great in the future, SEV without IOMMU is now here.
> >>>
> >>> Doing a full circle here, this issue can be exploited with virtio
> >>> but you could say do that with real hardware too if you hacked the
> >>> firmware, so if you say used Intel SR-IOV NIC that was compromised
> >>> on an AMD SEV machine, and plumbed in the guest - the IOMMU inside
> >>> of the guest would be SWIOTLB code. Last line of defense against
> >>> bad firmware to say.
> >>>
> >>> As such I am leaning towards taking this code, but I am worried
> >>> about the performance hit .. but perhaps I shouldn't as if you
> >>> are using SWIOTLB=force already you are kind of taking a
> >>> performance hit?
> >>>
> >>
> >> I have not measured the performance degradation. This will hit all AMD SEV,
> >> Intel TDX, IBM Protected Virtualization VMs. I don't expect the hit to
> >> be large since there are only few added operations per hundreads of copied
> >> bytes. I could try to measure the performance hit by running some benchmark
> >> with virtio-net/virtio-blk/virtio-rng.
> >>
> >> Earlier I said:
> >>>> Another possibility is to move this hardening to the common virtio code,
> >>>> but I think the code may become more complicated there since it would
> >>>> require tracking both the dma_addr and length for each descriptor.
> >>
> >> Unfortunately, this doesn't make sense. Even if there's validation for
> >> the size in the common virtio layer, there will be some other device
> >> which controls a dma_addr and length passed to dma_unmap

Re: [PATCH] swiotlb: Validate bounce size in the sync/unmap path

2021-02-02 Thread Konrad Rzeszutek Wilk
On Mon, Jan 25, 2021 at 07:33:35PM +0100, Martin Radev wrote:
> On Mon, Jan 18, 2021 at 10:14:28AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jan 18, 2021 at 12:44:58PM +0100, Martin Radev wrote:
> > > On Wed, Jan 13, 2021 at 12:30:17PM +0100, Christoph Hellwig wrote:
> > > > On Tue, Jan 12, 2021 at 04:07:29PM +0100, Martin Radev wrote:
> > > > > The size of the buffer being bounced is not checked if it happens
> > > > > to be larger than the size of the mapped buffer. Because the size
> > > > > can be controlled by a device, as it's the case with virtio devices,
> > > > > this can lead to memory corruption.
> > > > > 
> > > > 
> > > > I'm really worried about all these hodge podge hacks for not trusted
> > > > hypervisors in the I/O stack.  Instead of trying to harden protocols
> > > > that are fundamentally not designed for this, how about instead coming
> > > > up with a new paravirtualized I/O interface that is specifically
> > > > designed for use with an untrusted hypervisor from the start?
> > > 
> > > Your comment makes sense but then that would require the cooperation
> > > of these vendors and the cloud providers to agree on something meaningful.
> > > I am also not sure whether the end result would be better than hardening
> > > this interface to catch corruption. There is already some validation in
> > > unmap path anyway.
> > > 
> > > Another possibility is to move this hardening to the common virtio code,
> > > but I think the code may become more complicated there since it would
> > > require tracking both the dma_addr and length for each descriptor.
> > 
> > Christoph,
> > 
> > I've been wrestling with the same thing - this is specific to busted
> > drivers. And in reality you could do the same thing with a hardware
> > virtio device (see example in http://thunderclap.io/) - where the
> > mitigation is 'enable the IOMMU to do its job.'.
> > 
> > AMD SEV documents speak about utilizing IOMMU to do this (AMD SEV-SNP)..
> > and while that is great in the future, SEV without IOMMU is now here.
> > 
> > Doing a full circle here, this issue can be exploited with virtio
> > but you could say do that with real hardware too if you hacked the
> > firmware, so if you say used Intel SR-IOV NIC that was compromised
> > on an AMD SEV machine, and plumbed in the guest - the IOMMU inside
> > of the guest would be SWIOTLB code. Last line of defense against
> > bad firmware to say.
> > 
> > As such I am leaning towards taking this code, but I am worried
> > about the performance hit .. but perhaps I shouldn't as if you
> > are using SWIOTLB=force already you are kind of taking a
> > performance hit?
> > 
> 
> I have not measured the performance degradation. This will hit all AMD SEV,
> Intel TDX, IBM Protected Virtualization VMs. I don't expect the hit to
> be large since there are only few added operations per hundreads of copied
> bytes. I could try to measure the performance hit by running some benchmark
> with virtio-net/virtio-blk/virtio-rng.
> 
> Earlier I said:
> > > Another possibility is to move this hardening to the common virtio code,
> > > but I think the code may become more complicated there since it would
> > > require tracking both the dma_addr and length for each descriptor.
> 
> Unfortunately, this doesn't make sense. Even if there's validation for
> the size in the common virtio layer, there will be some other device
> which controls a dma_addr and length passed to dma_unmap* in the
> corresponding driver. The device can target a specific dma-mapped private
> buffer by changing the dma_addr and set a good length to overwrite buffers
> following it.
> 
> So, instead of doing the check in every driver and hitting a performance
> cost even when swiotlb is not used, it's probably better to fix it in
> swiotlb.
> 
> @Tom Lendacky, do you think that it makes sense to harden swiotlb or
> some other approach may be better for the SEV features?

I am not Tom, but this change seems the right way forward regardless of
whether it is TDX, AMD SEV, or any other architecture that encrypts memory
and uses SWIOTLB.

Let me queue it up in the development branch and do some regression testing.
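
Roughly, the hardening boils down to refusing device-supplied values that
fall outside what was mapped - a sketch against the 5.10-era globals, not
the literal patch:

	/* in swiotlb_tbl_unmap_single() / the sync path: */
	if (tlb_addr < io_tlb_start || tlb_addr >= io_tlb_end) {
		dev_WARN_ONCE(hwdev, 1, "swiotlb addr outside pool\n");
		return;
	}
	if (mapping_size > alloc_size) {
		dev_WARN_ONCE(hwdev, 1, "swiotlb bounce larger than mapping\n");
		return;
	}

so a misbehaving device can no longer reach memory outside the pool.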


Re: [PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.

2021-01-28 Thread Konrad Rzeszutek Wilk
On Wed, Jan 27, 2021 at 04:38:28PM -0800, Jianxiong Gao wrote:
> For devices that need to preserve address offset on mapping through
> swiotlb, this patch adds offset preserving based on page_offset_mask
> and keeps the offset if the mask is non zero. This is needed for
> device drivers like NVMe.



Didn't you send this patch like a month ago and someone pointed
out that the right fix would be in the NVMe driver?

Is there an issue with fixing the NVMe driver?

> 
> Signed-off-by: Jianxiong Gao 
> ---
>  kernel/dma/swiotlb.c | 25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 7c42df6e6100..4cab35f2c9bc 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,7 +468,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>   dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
>   unsigned long flags;
>   phys_addr_t tlb_addr;
> - unsigned int nslots, stride, index, wrap;
> + unsigned int nslots, stride, index, wrap, page_offset_mask, page_offset;
>   int i;
>   unsigned long mask;
>   unsigned long offset_slots;
> @@ -500,12 +500,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device 
> *hwdev, phys_addr_t orig_addr,
>   ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
>   : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
>  
> + page_offset_mask = dma_get_page_offset_mask(hwdev);
> + page_offset = orig_addr & page_offset_mask;
> + alloc_size += page_offset;
> +
>   /*
>* For mappings greater than or equal to a page, we limit the stride
>* (and hence alignment) to a page size.
>*/
>   nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT;
> - if (alloc_size >= PAGE_SIZE)
> + if ((alloc_size >= PAGE_SIZE) || (page_offset_mask > (1 << 
> IO_TLB_SHIFT)))
>   stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT));
>   else
>   stride = 1;
> @@ -583,6 +587,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>*/
>   for (i = 0; i < nslots; i++)
>   io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> + /*
> +  * When keeping the offset of the original data, we need to advance
> +  * the tlb_addr by the offset of orig_addr.
> +  */
> + tlb_addr += page_offset;
>   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>   (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>   swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
> DMA_TO_DEVICE);
> @@ -598,7 +607,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
> enum dma_data_direction dir, unsigned long attrs)
>  {
>   unsigned long flags;
> - int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> 
> IO_TLB_SHIFT;
> + unsigned int num_page_offset_slabs, page_offset_mask = 
> dma_get_page_offset_mask(hwdev);
> + int i, count;
> + int nslots = ALIGN(alloc_size + tlb_addr & page_offset_mask, 1 << 
> IO_TLB_SHIFT) >> IO_TLB_SHIFT;
>   int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
>   phys_addr_t orig_addr = io_tlb_orig_addr[index];
>  
> @@ -610,6 +621,14 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>   ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL)))
>   swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
> DMA_FROM_DEVICE);
>  
> + /*
> +  * When dma_get_page_offset_mask is used, we may have padded more slabs
> +  * when padding exceeds one slab. We need to move index back to the
> +  * beginning of the padding.
> +  */
> + num_page_offset_slabs =  (tlb_addr & page_offset_mask) / (1 << 
> IO_TLB_SHIFT);
> + index -= num_page_offset_slabs;
> +
>   /*
>* Return the buffer to the free list by setting the corresponding
>* entries to indicate the number of contiguous entries available.
> -- 
> 2.27.0
> 


Re: [PATCH 2/5] kernel/dma: remove unnecessary unmap_kernel_range

2021-01-26 Thread Konrad Rzeszutek Wilk
On Tue, Jan 26, 2021 at 02:54:01PM +1000, Nicholas Piggin wrote:
> vunmap will remove ptes.

Should there be some ASSERT after the vunmap to make sure that is the
case? 
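
Something like this is what I have in mind (just a sketch - after vunmap()
the alias should be gone, so a page-table walk ought to come back empty):

	vunmap(cpu_addr);
	/* hypothetical sanity check: no page may still be mapped here */
	WARN_ON_ONCE(vmalloc_to_page(cpu_addr));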
> 
> Cc: Christoph Hellwig 
> Cc: Marek Szyprowski 
> Cc: Robin Murphy 
> Cc: iommu@lists.linux-foundation.org
> Signed-off-by: Nicholas Piggin 
> ---
>  kernel/dma/remap.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
> index 905c3fa005f1..b4526668072e 100644
> --- a/kernel/dma/remap.c
> +++ b/kernel/dma/remap.c
> @@ -66,6 +66,5 @@ void dma_common_free_remap(void *cpu_addr, size_t size)
>   return;
>   }
>  
> - unmap_kernel_range((unsigned long)cpu_addr, PAGE_ALIGN(size));
>   vunmap(cpu_addr);
>  }
> -- 
> 2.23.0
> 


Re: [PATCH] swiotlb: Validate bounce size in the sync/unmap path

2021-01-18 Thread Konrad Rzeszutek Wilk
On Mon, Jan 18, 2021 at 12:44:58PM +0100, Martin Radev wrote:
> On Wed, Jan 13, 2021 at 12:30:17PM +0100, Christoph Hellwig wrote:
> > On Tue, Jan 12, 2021 at 04:07:29PM +0100, Martin Radev wrote:
> > > The size of the buffer being bounced is not checked if it happens
> > > to be larger than the size of the mapped buffer. Because the size
> > > can be controlled by a device, as it's the case with virtio devices,
> > > this can lead to memory corruption.
> > > 
> > 
> > I'm really worried about all these hodge podge hacks for not trusted
> > hypervisors in the I/O stack.  Instead of trying to harden protocols
> > that are fundamentally not designed for this, how about instead coming
> > up with a new paravirtualized I/O interface that is specifically
> > designed for use with an untrusted hypervisor from the start?
> 
> Your comment makes sense but then that would require the cooperation
> of these vendors and the cloud providers to agree on something meaningful.
> I am also not sure whether the end result would be better than hardening
> this interface to catch corruption. There is already some validation in
> unmap path anyway.
> 
> Another possibility is to move this hardening to the common virtio code,
> but I think the code may become more complicated there since it would
> require tracking both the dma_addr and length for each descriptor.

Christoph,

I've been wrestling with the same thing - this is specific to busted
drivers. And in reality you could do the same thing with a hardware
virtio device (see example in http://thunderclap.io/) - where the
mitigation is 'enable the IOMMU to do its job.'.

AMD SEV documents speak about utilizing IOMMU to do this (AMD SEV-SNP)..
and while that is great in the future, SEV without IOMMU is now here.

Doing a full circle here, this issue can be exploited with virtio
but you could say do that with real hardware too if you hacked the
firmware, so if you say used Intel SR-IOV NIC that was compromised
on an AMD SEV machine, and plumbed in the guest - the IOMMU inside
of the guest would be SWIOTLB code. Last line of defense against
bad firmware to say.

As such I am leaning towards taking this code, but I am worried
about the performance hit .. but perhaps I shouldn't as if you
are using SWIOTLB=force already you are kind of taking a
performance hit?



Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

2021-01-07 Thread Konrad Rzeszutek Wilk
On Thu, Jan 07, 2021 at 10:09:14AM -0800, Florian Fainelli wrote:
> On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
> >> Hi Greg and Konrad,
> >>
> >> This change is intended to be non-arch specific. Any arch that lacks DMA 
> >> access
> >> control and has devices not behind an IOMMU can make use of it. Could you 
> >> share
> >> why you think this should be arch specific?
> > 
> > The idea behind non-arch-specific code is for it to be generic. The devicetree
> > is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
> > be in arch specific code.
> 
> In premise the same code could be used with an ACPI enabled system with
> an appropriate service to identify the restricted DMA regions and unlock
> them.

Which this patchset is not.

> 
> More than 1 architecture requiring this function (ARM and ARM64 are the
> two I can think of needing this immediately) sort of calls for making
> the code architecture agnostic since past 2, you need something that scales.

I believe the use-case is for ARM64 at this moment.

> 
> There is already code today under kernel/dma/contiguous.c that is only
> activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
> no different.
> -- 
> Florian


Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

2021-01-07 Thread Konrad Rzeszutek Wilk
On Fri, Jan 08, 2021 at 01:39:43AM +0800, Claire Chang wrote:
> On Thu, Jan 7, 2021 at 2:58 AM Konrad Rzeszutek Wilk
>  wrote:
> >
> > On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> > > Introduce the new compatible string, restricted-dma-pool, for restricted
> > > DMA. One can specify the address and length of the restricted DMA memory
> > > region by restricted-dma-pool in the device tree.
> > >
> > > Signed-off-by: Claire Chang 
> > > ---
> > >  .../reserved-memory/reserved-memory.txt   | 24 +++
> > >  1 file changed, 24 insertions(+)
> > >
> > > diff --git 
> > > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
> > > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > index e8d3096d922c..44975e2a1fd2 100644
> > > --- 
> > > a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > +++ 
> > > b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> > >used as a shared pool of DMA buffers for a set of devices. It 
> > > can
> > >be used by an operating system to instantiate the necessary 
> > > pool
> > >management subsystem if necessary.
> > > +- restricted-dma-pool: This indicates a region of memory meant 
> > > to be
> > > +  used as a pool of restricted DMA buffers for a set of devices. 
> > > The
> > > +  memory region would be the only region accessible to those 
> > > devices.
> > > +  When using this, the no-map and reusable properties must not 
> > > be set,
> > > +  so the operating system can create a virtual mapping that will 
> > > be used
> > > +  for synchronization. The main purpose for restricted DMA is to
> > > +  mitigate the lack of DMA access control on systems without an 
> > > IOMMU,
> > > +  which could result in the DMA accessing the system memory at
> > > +  unexpected times and/or unexpected addresses, possibly leading 
> > > to data
> > > +  leakage or corruption. The feature on its own provides a basic 
> > > level
> > > +  of protection against the DMA overwriting buffer contents at
> > > +  unexpected times. However, to protect against general data 
> > > leakage and
> > > +  system memory corruption, the system needs to provide way to 
> > > restrict
> > > +  the DMA to a predefined memory region.
> >
> > Heya!
> >
> > I think I am missing something obvious here so please bear with my
> > questions:
> >
> >  - This code adds the means of having the SWIOTLB pool tied to a specific
> >memory correct?
> 
> It doesn't affect the existing SWIOTLB. It just utilizes the existing SWIOTLB
> code to create another DMA pool tied to a specific memory region for a given 
> set
> of devices. It bounces the streaming DMA (map/unmap) in and out of that region
> and does the memory allocation (dma_direct_alloc) from the same region.

Right, so why can't it follow the same mechanism that Xen SWIOTLB does - which
had exactly the same problem (needed special handling on the pool) - and use
similar code?

> 
> >
> >
> >  - Nothing stops the physical device from bypassing the SWIOTLB buffer.
> >That is if an errant device screwed up the length or DMA address, the
> >    SWIOTLB would gladly do what the device told it to do?
> 
> So the system needs to provide a way to lock down the memory access, e.g. MPU.

OK! Would it be prudent to have this in the description above perhaps?
> 
> >
> >  - This has to be combined with SWIOTLB-force-ish to always use the
> >bounce buffer, otherwise you could still do DMA without using
> >SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?
> 
> Since restricted DMA is for the devices that are not behind an IOMMU, I change
> the criteria
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE))`
> to
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)`
> in dma_direct_map_page().
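
(In context, that makes the top of dma_direct_map_page() read roughly as
follows - a sketch, with dma_io_tlb_mem being the per-device field from
Claire's series:

	if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)
		return swiotlb_map(dev, phys, size, dir, attrs);

i.e. a device with a restricted pool always bounces, whatever
dma_capable() says.)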
> 
> Also, even if SWIOTLB=force, the restricted DMA pool is preferred if available
> (get_io_tlb_mem in https://lore.kernel.org/patchwork/patch/1360995/).
> 
> Thanks!


Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

2021-01-07 Thread Konrad Rzeszutek Wilk
On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
> Hi Greg and Konrad,
> 
> This change is intended to be non-arch specific. Any arch that lacks DMA 
> access
> control and has devices not behind an IOMMU can make use of it. Could you 
> share
> why you think this should be arch specific?

The idea behind non-arch-specific code is for it to be generic. The devicetree
is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
be in arch specific code.

> 
> Thanks!


Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

2021-01-06 Thread Konrad Rzeszutek Wilk
On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> Introduce the new compatible string, restricted-dma-pool, for restricted
> DMA. One can specify the address and length of the restricted DMA memory
> region by restricted-dma-pool in the device tree.
> 
> Signed-off-by: Claire Chang 
> ---
>  .../reserved-memory/reserved-memory.txt   | 24 +++
>  1 file changed, 24 insertions(+)
> 
> diff --git 
> a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
> b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> index e8d3096d922c..44975e2a1fd2 100644
> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
>used as a shared pool of DMA buffers for a set of devices. It can
>be used by an operating system to instantiate the necessary pool
>management subsystem if necessary.
> +- restricted-dma-pool: This indicates a region of memory meant to be
> +  used as a pool of restricted DMA buffers for a set of devices. The
> +  memory region would be the only region accessible to those devices.
> +  When using this, the no-map and reusable properties must not be 
> set,
> +  so the operating system can create a virtual mapping that will be 
> used
> +  for synchronization. The main purpose for restricted DMA is to
> +  mitigate the lack of DMA access control on systems without an 
> IOMMU,
> +  which could result in the DMA accessing the system memory at
> +  unexpected times and/or unexpected addresses, possibly leading to 
> data
> +  leakage or corruption. The feature on its own provides a basic 
> level
> +  of protection against the DMA overwriting buffer contents at
> +  unexpected times. However, to protect against general data leakage 
> and
> +  system memory corruption, the system needs to provide way to 
> restrict
> +  the DMA to a predefined memory region.

Heya!

I think I am missing something obvious here so please bear with my
questions:

 - This code adds the means of having the SWIOTLB pool tied to a specific
   memory correct?

 - Nothing stops the physical device from bypassing the SWIOTLB buffer.
   That is if an errant device screwed up the length or DMA address, the
   SWIOTLB would gladly do what the device told it to do?

 - This has to be combined with SWIOTLB-force-ish to always use the
   bounce buffer, otherwise you could still do DMA without using
   SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?


Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool

2021-01-06 Thread Konrad Rzeszutek Wilk
Hello!

In this file:

> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index e4368159f88a..7fb2ac087d23 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
..

> +static const struct reserved_mem_ops rmem_swiotlb_ops = {
> + .device_init= rmem_swiotlb_device_init,
> + .device_release = rmem_swiotlb_device_release,
> +};
> +
> +static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
> +{
> + unsigned long node = rmem->fdt_node;
> +
> + if (of_get_flat_dt_prop(node, "reusable", NULL) ||
> + of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
> + of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
> + of_get_flat_dt_prop(node, "no-map", NULL))
> + return -EINVAL;
> +
> + rmem->ops = &rmem_swiotlb_ops;
> + pr_info("Reserved memory: created device swiotlb memory pool at %pa, 
> size %ld MiB\n",
> + &rmem->base, (unsigned long)rmem->size / SZ_1M);
> + return 0;
> +}
> +
> +RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);

The code should be as much as possible arch-agnostic. That is why there
are multiple -swiotlb files scattered in arch directories that own the
architecture specific code.

Would it be possible to move the code there and perhaps have an
ARM-specific front-end for this DMA restricted pool there? See for example
the xen-swiotlb code.

Cheers!

Konrad


Re: [PATCH] x86/iommu: Fix two minimal issues in check_iommu_entries()

2021-01-04 Thread Konrad Rzeszutek Wilk
On Wed, Dec 23, 2020 at 02:24:12PM +0800, Zhenzhong Duan wrote:
> check_iommu_entries() checks for cyclic dependency in iommu entries
> and fixes the cyclic dependency by setting x->depend to NULL. But
> this repair isn't correct if q is in front of p; there will be an
> "EXECUTION ORDER INVALID!" report following. Fix it by NULLing
> whichever is in front.
> 
> The second issue is about the report of execution order reversal:
> the order is reversed incorrectly in the report; fix it.

Heya!

When you debugged this, did you by any chance save the
serial logs and the debug logs to double-check it?

Thanks!
> 
> Signed-off-by: Zhenzhong Duan 
> ---
>  arch/x86/kernel/pci-iommu_table.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/pci-iommu_table.c 
> b/arch/x86/kernel/pci-iommu_table.c
> index 2e9006c..40c8249 100644
> --- a/arch/x86/kernel/pci-iommu_table.c
> +++ b/arch/x86/kernel/pci-iommu_table.c
> @@ -60,7 +60,10 @@ void __init check_iommu_entries(struct iommu_table_entry 
> *start,
>   printk(KERN_ERR "CYCLIC DEPENDENCY FOUND! %pS depends 
> on %pS and vice-versa. BREAKING IT.\n",
>  p->detect, q->detect);
>   /* Heavy handed way..*/
> - x->depend = NULL;
> + if (p > q)
> + q->depend = NULL;
> + else
> + p->depend = NULL;
>   }
>   }
>  
> @@ -68,7 +71,7 @@ void __init check_iommu_entries(struct iommu_table_entry 
> *start,
>   q = find_dependents_of(p, finish, p);
>   if (q && q > p) {
>   printk(KERN_ERR "EXECUTION ORDER INVALID! %pS should be 
> called before %pS!\n",
> -p->detect, q->detect);
> +q->detect, p->detect);
>   }
>   }
>  }
> -- 
> 1.8.3.1
> 


Re: swiotlb/virtio: unchecked device dma address and length

2020-12-16 Thread Konrad Rzeszutek Wilk
..snip..
>> > > This raises two issues:
>> > > 1) swiotlb_tlb_unmap_single fails to check whether the index generated
>> > > from the dma_addr is in range of the io_tlb_orig_addr array.
>> > That is fairly simple to implement I would think. That is it can check
>> > that the dma_addr is from the PA in the io_tlb pool when SWIOTLB=force
>> > is used.
>> 
>> 
>> I'm not sure this can fix all the cases. It looks to me we should map
>> descriptor coherent but readonly (which is not supported by current DMA
>> API).
>
>Neither is this supported but encrypted memory technologies.


-ECONFUSED.

Could you state this once more, please? I am not exactly sure what you are
saying.

>
>> Otherwise, device can modify the desc[i].addr/desc[i].len at any time
>to
>> pretend a valid mapping.
>> 
>> Thanks
>> 
>> 
>> > 



Re: swiotlb/virtio: unchecked device dma address and length

2020-12-16 Thread Konrad Rzeszutek Wilk
On December 16, 2020 1:41:48 AM EST, Jason Wang  wrote:
>
>
>- Original Message -
>> 
>> 
>> - Original Message -
>> > .snip.
>> > > > > This raises two issues:
>> > > > > 1) swiotlb_tlb_unmap_single fails to check whether the index
>> > > > > generated from the dma_addr is in range of the io_tlb_orig_addr array.
>> > > > That is fairly simple to implement I would think. That is it can check
>> > > > that the dma_addr is from the PA in the io_tlb pool when SWIOTLB=force
>> > > > is used.
>> > > 
>> > > 
>> > > I'm not sure this can fix all the cases. It looks to me we should map
>> > > descriptor coherent but readonly (which is not supported by current DMA
>> > > API).
>> > 
>> > I think I am missing something obvious here. The attacker is the
>> > hypervisor, aka the owner of the VirtIO device (ring0). The attacker is
>> > the one that provides the addr/len - having that readonly from a guest
>> > perspective does not change the fact that the hypervisor can modify the
>> > memory range by mapping it via a different virtual address in the
>> > hypervisor? (aka aliasing it).
>> 
>> Right, but if we allow hypervisor to provide arbitrary addr/len, does
>> it mean hypervisor can read encrypted content of encrypted memory of
>> guest through swiotlb?

Yes.
>> 
>> Thanks
>
>Actually not. I think you're right.


Your sentence is very confusing.

On a DMA unmap, SWIOTLB (when force is used) trusts the driver to provide
the correct DMA address and length, which SWIOTLB uses to match to its
associated original PA address.

Think of the original PA having a mapping to a PA in the SWIOTLB pool.
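
The pairing is literally just (a sketch of the 5.10-era lookup - note
there is no bounds check anywhere):

	int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT;
	phys_addr_t orig_addr = io_tlb_orig_addr[index];

so a device-controlled tlb_addr indexes io_tlb_orig_addr[] unchecked.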


The length is not checked, so the attacker can modify it to, say, a huge
number and cause the SWIOTLB bounce code to write or read data from a
non-SWIOTLB PA into the SWIOTLB PA pool.




>
>Thanks
>
>> 
>> > > 
>> > > Otherwise, device can modify the desc[i].addr/desc[i].len at any time to
>> > > pretend a valid mapping.
>> > 
>> > With swiotlb=force, as long as addr/len are within the PA boundaries
>> > of the SWIOTLB pool, this should be OK?
>> > 
>> > After all that whole area is in cleartext and visible to the attacker.
>> > 
>> > 
>> 



Re: swiotlb/virtio: unchecked device dma address and length

2020-12-15 Thread Konrad Rzeszutek Wilk
On Tue, Dec 15, 2020 at 11:54:08AM +0100, Felicitas Hetzelt wrote:
> Hello,
> thank you all for looking into this! To answer some of the questions:
>  - Did you have already some PoC fixes for this:
>We don't have a full PoC or fix currently. Thought we have a PoC
>with which were able to overwrite memory outside of the mapped
>dma region.
>  - Is there a CVE associated with this?
>No
>  - Is there a paper on this you all are working on?
>Yes, we were planning to use this bug (among others
>in a paper)
> 
> Could you point us to the intel thunder issue that you mentioned?

ThunderClap was it!

https://lwn.net/Articles/786558/

Cc-ing Lu Baolu ..

Hm, this was a year ago and it looks like there are some extra SWIOTLB
patches to be done?

> 
> On 12/15/20 9:47 AM, Ashish Kalra wrote:
> > On Mon, Dec 14, 2020 at 04:49:50PM -0500, Konrad Rzeszutek Wilk wrote:
> >> On Fri, Dec 11, 2020 at 06:31:21PM +0100, Felicitas Hetzelt wrote:
> >>> Hello,
> >>
> >> Hi! Please see below my responses.
> >>
> >>> we have been analyzing the Hypervisor-OS interface of Linux
> >>> and discovered bugs in the swiotlb/virtio implementation that can be
> >>> triggered from a malicious Hypervisor / virtual device.
> >>> With SEV, the SWIOTLB implementation is forcefully enabled and would
> >>> always be used. Thus, all virtio devices and others would use it under
> >>> the hood.
> >>>
> >>> The reason for analyzing this interface is that, technologies such as
> >>> Intel's Trusted Domain Extensions [1] and AMD's Secure Nested Paging [2]
> >>> change the threat model assumed by various Linux kernel subsystems.
> >>> These technologies take the presence of a fully malicious hypervisor
> >>> into account and aim to provide protection for virtual machines in such
> >>> an environment. Therefore, all input received from the hypervisor or an
> >>> external device should be carefully validated. Note that these issues
> >>> are of little (or no) relevance in a "normal" virtualization setup,
> >>> nevertheless we believe that it is required to fix them if TDX or SNP is
> >>> used.
> >>>
> >>> We are happy to provide more information if needed!
> >>>
> >>> [1]
> >>> https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
> >>>
> >>> [2] 
> >>> [2] https://www.amd.com/en/processors/amd-secure-encrypted-virtualization
> >>>
> >>> Bug:
> >>> OOB memory write.
> >>> dma_unmap_single -> swiotlb_tbl_unmap_single is invoked with dma_addr
> >>> and length parameters that are under control of the device.
> >>> This happens e.g. in virtio_ring:
> >>> https://elixir.bootlin.com/linux/v5.10-rc7/source/drivers/virtio/virtio_ring.c#L378
> >>
> >> Heya!
> >>
> >> Thank you for pointing this out! I've a couple of questions and hope you 
> >> can
> >> help me out with them.
> >>
> >> Also CC-ing AMD / TDX folks.
> >>>
> > 
> > Adding more relevant folks in AMD.
> > 
> > Needless to say, the swiotlb code needs to validate this external untrusted 
> > input dma_addr and length parameters.
> > 
> > Thanks,
> > Ashish
> > 
> >>> This raises two issues:
> >>> 1) swiotlb_tlb_unmap_single fails to check whether th

Re: swiotlb/virtio: unchecked device dma address and length

2020-12-15 Thread Konrad Rzeszutek Wilk
.snip.
> > > This raises two issues:
> > > 1) swiotlb_tlb_unmap_single fails to check whether the index generated
> > > from the dma_addr is in range of the io_tlb_orig_addr array.
> > That is fairly simple to implement I would think. That is it can check
> > that the dma_addr is from the PA in the io_tlb pool when SWIOTLB=force
> > is used.
> 
> 
> I'm not sure this can fix all the cases. It looks to me we should map
> descriptor coherent but readonly (which is not supported by current DMA
> API).

I think I am missing something obvious here. The attacker is the hypervisor, aka
the owner of the VirtIO device (ring0). The attacker is the one that
provides the addr/len - having that readonly from a guest perspective
does not change the fact that the hypervisor can modify the memory range
by mapping it via a different virtual address in the hypervisor? (aka
aliasing it).
> 
> Otherwise, device can modify the desc[i].addr/desc[i].len at any time to
> pretend a valid mapping.

With swiotlb=force, as long as addr/len are within the PA boundaries
of the SWIOTLB pool, this should be OK?

After all that whole area is in cleartext and visible to the attacker.


Re: swiotlb/virtio: unchecked device dma address and length

2020-12-14 Thread Konrad Rzeszutek Wilk
On Fri, Dec 11, 2020 at 06:31:21PM +0100, Felicitas Hetzelt wrote:
> Hello,

Hi! Please see below my responses.

> we have been analyzing the Hypervisor-OS interface of Linux
> and discovered bugs in the swiotlb/virtio implementation that can be
> triggered from a malicious Hypervisor / virtual device.
> With SEV, the SWIOTLB implementation is forcefully enabled and would
> always be used. Thus, all virtio devices and others would use it under
> the hood.
> 
> The reason for analyzing this interface is that, technologies such as
> Intel's Trusted Domain Extensions [1] and AMD's Secure Nested Paging [2]
> change the threat model assumed by various Linux kernel subsystems.
> These technologies take the presence of a fully malicious hypervisor
> into account and aim to provide protection for virtual machines in such
> an environment. Therefore, all input received from the hypervisor or an
> external device should be carefully validated. Note that these issues
> are of little (or no) relevance in a "normal" virtualization setup,
> nevertheless we believe that it is required to fix them if TDX or SNP is
> used.
> 
> We are happy to provide more information if needed!
> 
> [1]
> https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
> 
> [2] https://www.amd.com/en/processors/amd-secure-encrypted-virtualization
> 
> Bug:
> OOB memory write.
> dma_unmap_single -> swiotlb_tbl_unmap_single is invoked with dma_addr
> and length parameters that are under control of the device.
> This happens e.g. in virtio_ring:
> https://elixir.bootlin.com/linux/v5.10-rc7/source/drivers/virtio/virtio_ring.c#L378

Heya!

Thank you for pointing this out! I've a couple of questions and hope you can
help me out with them.

Also CC-ing AMD / TDX folks.
> 
> This raises two issues:
> 1) swiotlb_tlb_unmap_single fails to check whether the index generated
> from the dma_addr is in range of the io_tlb_orig_addr array.

That is fairly simple to implement I would think. That is it can check
that the dma_addr is from the PA in the io_tlb pool when SWIOTLB=force
is used.

> 2) when swiotlb_bounce is called the device controls the length of the
> memory copied to the cpu address.

So.. this sounds very similar to the Intel Thunder.. something issue
where this exact issue was fixed by handing the DMA off to the SWIOTLB
bounce code.

But if that is broken, then that CVE is still not fixed?

So the issue here is that swiotlb_tbl_unmap_single(..,mapping_size,) is
under the attacker's control. Ugh.

One way could be to have an io_tlb_orig_addr-ish array with the length
of the mappings to double-check?
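
Something along these lines (a sketch; the array name is made up):

	/* filled next to io_tlb_orig_addr[] at map time */
	static size_t *io_tlb_orig_size;

	/* map: */
	io_tlb_orig_size[index] = mapping_size;

	/* unmap/sync: refuse a device-supplied size larger than mapped */
	if (mapping_size > io_tlb_orig_size[index]) {
		dev_WARN_ONCE(hwdev, 1, "swiotlb size mismatch\n");
		return;
	}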

Couple more questions:
 - Did you have already some PoC fixes for this? 
 - Is there a CVE associated with this?
 - Is there a paper on this you all are working on?

Thank you!


Re: [PATCH] [PATCH] Keep offset when mapping data via SWIOTLB.

2020-12-11 Thread Konrad Rzeszutek Wilk
On Mon, Dec 07, 2020 at 01:42:04PM -0800, Jianxiong Gao wrote:
> NVMe driver and other applications depend on the data offset
> to operate correctly. Currently when unaligned data is mapped via
> SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. When
> booting with --swiotlb=force option and using NVMe as interface,
> running mkfs.xfs on Rhel fails because of the unalignment issue.
> This patch makes sure the mapped data preserves
> its offset of the original address. Tested on latest kernel that
> this patch fixes the issue.
> 
> Signed-off-by: Jianxiong Gao 
> Acked-by: David Rientjes 

This breaks DHCP with upstream kernel (applied this on top v5.10-rc7)
and used swiotlb=262144,force and now the dhclient is not working:

[  119.300502] bnxt_en :3b:00.0 eno2np0: NIC Link is Up, 25000 Mbps full 
duplex, Flow control: ON - receive & transmit
[  119.437573] bnxt_en :3b:00.0 eno2np0: FEC autoneg off encoding: None
[   90.064220] dracut-initqueue[1477]: Warning: dhcp for interface eno2np0 
failed
[  101.155295] dracut-initqueue[1477]: Warning: dhcp for interfa[  142.361359] 
bnxt_en :3b:00.1 eno3np1: NIC Link is Up, 25000 Mbps full duplex, Flow 
control: ON - receive & transmit
ce eno2np0 faile[  142.501860] bnxt_en :3b:00.1 eno3np1: FEC autoneg off 
encoding: None
d
[  113.054108] dracut-initqueue[1477]: Warning: dhcp for interface eno3np1 
failed
[  123.867108] dracut-initqueue[1477]: Warning: dhcp for interface eno3np1 
failed
[  251.888002] dracut-initqueue[1477]: Warning: dracut-initqueue timeout - 
starting timeout scripts

Dropping from linux-next.

> ---
>  kernel/dma/swiotlb.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 781b9dca197c..56a35e71b3fd 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -483,6 +483,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>   max_slots = mask + 1
>   ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
>   : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> + 
> + /*
> +  * We need to keep the offset when mapping, so adding the offset
> +  * to the total set we need to allocate in SWIOTLB
> +  */
> + alloc_size += offset_in_page(orig_addr);
>  
>   /*
>* For mappings greater than or equal to a page, we limit the stride
> @@ -567,6 +573,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>*/
>   for (i = 0; i < nslots; i++)
>   io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> + /*
> +  * When keeping the offset of the original data, we need to advance
> +  * the tlb_addr by the offset of orig_addr.
> +  */
> + tlb_addr += orig_addr & (PAGE_SIZE - 1);
>   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>   (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>   swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
> DMA_TO_DEVICE);
> -- 
> 2.27.0
> 
> 


Re: [PATCH v8] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-12-08 Thread Konrad Rzeszutek Wilk
On December 8, 2020 6:01:19 PM EST, Borislav Petkov  wrote:
>On Tue, Dec 08, 2020 at 05:22:20PM -0500, Konrad Rzeszutek Wilk wrote:
>> I will fix it up.
>
>So who's picking this up? If not me then I probably should have a
>detailed look at the x86 bits before it goes in...

I was planning to pick this up (got one more SWIOTLB related patch).

That said, if you have the time to take a peek at the x86 bits - that would
be awesome!




Re: [PATCH v8] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-12-08 Thread Konrad Rzeszutek Wilk
On Mon, Dec 07, 2020 at 11:10:57PM +0000, Ashish Kalra wrote:
> From: Ashish Kalra 
> 
> For SEV, all DMA to and from guest has to use shared (un-encrypted) pages.
> SEV uses SWIOTLB to make this happen without requiring changes to device
> drivers.  However, depending on workload being run, the default 64MB of
> SWIOTLB might not be enough and SWIOTLB may run out of buffers to use
> for DMA, resulting in I/O errors and/or performance degradation for
> high I/O workloads.
> 
> Adjust the default size of SWIOTLB for SEV guests using a
> percentage of the total memory available to guest for SWIOTLB buffers.
> 
> Using late_initcall() interface to invoke swiotlb_adjust() does not
> work as the size adjustment needs to be done before mem_encrypt_init()
> and reserve_crashkernel() which use the allocated SWIOTLB buffer size,
> hence call it explicitly from setup_arch().
> 
> The SWIOTLB default size adjustment needs to be added as an architecture
> specific interface/callback to allow architectures such as those supporting
> memory encryption to adjust/expand SWIOTLB size for their use.
> 
> v5 fixed build errors and warnings as
> Reported-by: kbuild test robot 
> 
> Signed-off-by: Ashish Kalra 
> ---
>  arch/x86/kernel/setup.c   |  2 ++
>  arch/x86/mm/mem_encrypt.c | 37 +
>  include/linux/swiotlb.h   |  6 ++
>  kernel/dma/swiotlb.c  | 22 ++
>  4 files changed, 67 insertions(+)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 84f581c91db4..31e24e198061 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1149,6 +1149,8 @@ void __init setup_arch(char **cmdline_p)
>   if (boot_cpu_has(X86_FEATURE_GBPAGES))
>   hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
>  
> + swiotlb_adjust();
> +
>   /*
>* Reserve memory for crash kernel after SRAT is parsed so that it
>* won't consume hotpluggable memory.
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 1bcfbcd2bfd7..d1b8d60040cf 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -485,7 +485,44 @@ static void print_mem_encrypt_feature_info(void)
>   pr_cont("\n");
>  }
>  
> +/*
> + * The percentage of guest memory used here for SWIOTLB buffers
> + * is more of an approximation of the static adjustment which
> + * is 128M for <1G guests, 256M for 1G-4G guests and 512M for >4G guests.

No?

It is 64MB for <1G, and ~128MB to 256MB for 1G-to-4G.

I will fix it up.
> + */
> +#define SEV_ADJUST_SWIOTLB_SIZE_PERCENT  6
> +
>  /* Architecture __weak replacement functions */
> +unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
> +{
> + unsigned long size = iotlb_default_size;
> +
> + /*
> +  * For SEV, all DMA has to occur via shared/unencrypted pages.
> +  * SEV uses SWOTLB to make this happen without changing device
> +  * drivers. However, depending on the workload being run, the
> +  * default 64MB of SWIOTLB may not be enough and`SWIOTLB may
> +  * run out of buffers for DMA, resulting in I/O errors and/or
> +  * performance degradation especially with high I/O workloads.
> +  * Adjust the default size of SWIOTLB for SEV guests using
> +  * a percentage of guest memory for SWIOTLB buffers.
> +  * Also as the SWIOTLB bounce buffer memory is allocated
> +  * from low memory, ensure that the adjusted size is within
> +  * the limits of low available memory.
> +  *
> +  */
> + if (sev_active()) {
> + phys_addr_t total_mem = memblock_phys_mem_size();
> +
> + size = total_mem * SEV_ADJUST_SWIOTLB_SIZE_PERCENT / 100;
> + size = clamp_val(size, iotlb_default_size, SZ_1G);
> + pr_info("SWIOTLB bounce buffer size adjusted to %luMB for SEV",
> + size >> 20);
> + }
> +
> + return size;
> +}
> +
>  void __init mem_encrypt_init(void)
>  {
>   if (!sme_me_mask)
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 3bb72266a75a..b5904fa4b67c 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
>  int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
>  extern unsigned long swiotlb_nr_tbl(void);
>  unsigned long swiotlb_size_or_default(void);
> +unsigned long __init arch_swiotlb_adjust(unsigned long size);
>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
>  extern int swiotlb_late_init_with_default_size(size_t default_size);
>  extern void __init swiotlb_update_mem_attributes(void);
> @@ -77,6 +78,7 @@ void __init swiotlb_exit(void);
>  unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
>  bool is_swiotlb_active(void);
> +void __init swiotlb_adjust(void);
>  #else
>  #define swiotlb_force SWIOTLB_NO_FORCE
>  static inline bool is_swi
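
To make the 6% heuristic above concrete, here is a worked example (mine,
assuming the clamp bounds from the patch and a 64-bit unsigned long):

	/* Hypothetical stand-alone rendition of arch_swiotlb_adjust(). */
	#include <stdio.h>

	#define SZ_1G		(1UL << 30)
	#define IO_TLB_DEFAULT	(64UL << 20)	/* 64MB default pool */
	#define PERCENT		6

	static unsigned long sev_swiotlb_size(unsigned long total_mem)
	{
		unsigned long size = total_mem * PERCENT / 100;

		/* clamp_val(size, iotlb_default_size, SZ_1G) */
		if (size < IO_TLB_DEFAULT)
			size = IO_TLB_DEFAULT;
		if (size > SZ_1G)
			size = SZ_1G;
		return size;
	}

	int main(void)
	{
		/*  1GB guest ->   64MB (6% is ~61MB, clamped up to the default)
		 *  4GB guest ->  245MB
		 * 32GB guest -> 1024MB (capped at SZ_1G) */
		printf("%luMB\n", sev_swiotlb_size(32 * SZ_1G) >> 20);
		return 0;
	}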

Re: [PATCH] [PATCH] Keep offset when mapping data via SWIOTLB.

2020-12-08 Thread Konrad Rzeszutek Wilk
On Mon, Dec 07, 2020 at 01:42:04PM -0800, Jianxiong Gao wrote:
> NVMe driver and other applications depend on the data offset
> to operate correctly. Currently when unaligned data is mapped via
> SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. When
> booting with --swiotlb=force option and using NVMe as interface,
> running mkfs.xfs on Rhel fails because of the unalignment issue.
> This patch makes sure the mapped data preserves
> its offset of the orginal address. Tested on latest kernel that
> this patch fixes the issue.

Let's reword this comment a bit more, since you are not providing
the RHEL bug and are instead focusing on the upstream kernel.

I can do that for you.

> 
> Signed-off-by: Jianxiong Gao 
> Acked-by: David Rientjes 
> ---
>  kernel/dma/swiotlb.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 781b9dca197c..56a35e71b3fd 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -483,6 +483,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>   max_slots = mask + 1
>   ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
>   : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> + 
> + /*
> +  * We need to keep the offset when mapping, so adding the offset
> +  * to the total set we need to allocate in SWIOTLB
> +  */
> + alloc_size += offset_in_page(orig_addr);
>  
>   /*
>* For mappings greater than or equal to a page, we limit the stride
> @@ -567,6 +573,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>*/
>   for (i = 0; i < nslots; i++)
>   io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> + /*
> +  * When keeping the offset of the original data, we need to advance
> +  * the tlb_addr by the offset of orig_addr.
> +  */
> + tlb_addr += orig_addr & (PAGE_SIZE - 1);
>   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>   (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>   swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
> DMA_TO_DEVICE);
> -- 
> 2.27.0
> 
> 
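
For context on why NVMe cares about this (my summary, not from the patch):
NVMe builds its PRP entries from the low bits of the DMA address, so the
bounce buffer must keep the buffer's offset within the page. The invariant
the patch restores, as a small sketch (the helper name is made up):

	#include <assert.h>
	#include <stdint.h>

	#define PAGE_SIZE 4096UL

	static void check_bounced_offset(uint64_t orig_addr, uint64_t tlb_addr)
	{
		/* After swiotlb mapping, the intra-page offset the device
		 * derives from the DMA address must match the original. */
		assert((orig_addr & (PAGE_SIZE - 1)) ==
		       (tlb_addr & (PAGE_SIZE - 1)));
	}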


Re: [PATCH v6] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-12-01 Thread Konrad Rzeszutek Wilk
On Tue, Nov 24, 2020 at 11:46:22PM +0000, Ashish Kalra wrote:
> Hello Konrad, 
> 
> On Mon, Nov 23, 2020 at 10:56:31PM +0000, Ashish Kalra wrote:
> > Hello Konrad,
> > 
> > On Mon, Nov 23, 2020 at 12:56:32PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Mon, Nov 23, 2020 at 06:06:47PM +0100, Borislav Petkov wrote:
> > > > On Thu, Nov 19, 2020 at 09:42:05PM +0000, Ashish Kalra wrote:
> > > > > From: Ashish Kalra 
> > > > > 
> > > > > For SEV, all DMA to and from guest has to use shared (un-encrypted) 
> > > > > pages.
> > > > > SEV uses SWIOTLB to make this happen without requiring changes to 
> > > > > device
> > > > > drivers.  However, depending on workload being run, the default 64MB 
> > > > > of
> > > > > SWIOTLB might not be enough and SWIOTLB may run out of buffers to use
> > > > > for DMA, resulting in I/O errors and/or performance degradation for
> > > > > high I/O workloads.
> > > > > 
> > > > > Increase the default size of SWIOTLB for SEV guests using a minimum
> > > > > value of 128MB and a maximum value of 512MB, determining on amount
> > > > > of provisioned guest memory.
> > > > 
> > > > That sentence needs massaging.
> > > > 
> > > > > Using late_initcall() interface to invoke swiotlb_adjust() does not
> > > > > work as the size adjustment needs to be done before mem_encrypt_init()
> > > > > and reserve_crashkernel() which use the allocated SWIOTLB buffer size,
> > > > > hence calling it explicitly from setup_arch().
> > > > 
> > > > "hence call it ... "
> > > > 
> > > > > 
> > > > > The SWIOTLB default size adjustment is added as an architecture 
> > > > > specific
> > > > 
> > > > "... is added... " needs to be "Add ..."
> > > > 
> > > > > interface/callback to allow architectures such as those supporting 
> > > > > memory
> > > > > encryption to adjust/expand SWIOTLB size for their use.
> > > > > 
> > > > > v5 fixed build errors and warnings as
> > > > > Reported-by: kbuild test robot 
> > > > > 
> > > > > Signed-off-by: Ashish Kalra 
> > > > > ---
> > > > >  arch/x86/kernel/setup.c   |  2 ++
> > > > >  arch/x86/mm/mem_encrypt.c | 32 
> > > > >  include/linux/swiotlb.h   |  6 ++
> > > > >  kernel/dma/swiotlb.c  | 24 
> > > > >  4 files changed, 64 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > index 3511736fbc74..b073d58dd4a3 100644
> > > > > --- a/arch/x86/kernel/setup.c
> > > > > +++ b/arch/x86/kernel/setup.c
> > > > > @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> > > > >   if (boot_cpu_has(X86_FEATURE_GBPAGES))
> > > > >   hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> > > > >  
> > > > > + swiotlb_adjust();
> > > > > +
> > > > >   /*
> > > > >* Reserve memory for crash kernel after SRAT is parsed so that 
> > > > > it
> > > > >* won't consume hotpluggable memory.
> > > > > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > > > > index 3f248f0d0e07..c79a0d761db5 100644
> > > > > --- a/arch/x86/mm/mem_encrypt.c
> > > > > +++ b/arch/x86/mm/mem_encrypt.c
> > > > > @@ -490,6 +490,38 @@ static void print_mem_encrypt_feature_info(void)
> > > > >  }
> > > > >  
> > > > >  /* Architecture __weak replacement functions */
> > > > > +unsigned long __init arch_swiotlb_adjust(unsigned long 
> > > > > iotlb_default_size)
> > > > > +{
> > > > > + unsigned long size = 0;
> > > > 
> > > > unsigned long size = iotlb_default_size;
> > > > 
> > > > > +
> > > > > + /*
> > > > > +  * For SEV, all DMA has to occur via shared/unencrypted pages.
> > > > > +  * SEV uses SWOTLB to make this happen without changing device
> > > > > +  * drivers. However, depe

Re: [PATCH 1/1] x86/tboot: Don't disable swiotlb when iommu is forced on

2020-11-25 Thread Konrad Rzeszutek Wilk
On Wed, Nov 25, 2020 at 03:51:30PM +0000, Will Deacon wrote:
> Hi Konrad,
> 
> On Wed, Nov 25, 2020 at 10:41:53AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Nov 25, 2020 at 02:05:15PM +0000, Will Deacon wrote:
> > > On Wed, 25 Nov 2020 09:41:24 +0800, Lu Baolu wrote:
> > > > After commit 327d5b2fee91c ("iommu/vt-d: Allow 32bit devices to uses DMA
> > > > domain"), swiotbl could also be used for direct memory access if IOMMU
> > > > is enabled but a device is configured to pass through the DMA 
> > > > translation.
> > > > Keep swiotlb when IOMMU is forced on, otherwise, some devices won't work
> > > > if "iommu=pt" kernel parameter is used.
> > > 
> > > Applied to arm64 (for-next/iommu/fixes), thanks!
> > > 
> > > [1/1] x86/tboot: Don't disable swiotlb when iommu is forced on
> > >   https://git.kernel.org/arm64/c/e2be2a833ab5
> > 
> > But tboot never ran on ARM. It is Intel-specific.
> > 
> > I think either Thomas or I should take this patch.
> 
> FWIW, I did check with Thomas before I picked it up. I know it looks weird
> going via arm64, but that's only because I'm temporarily handling the IOMMU
> tree there (including vt-d changes) while Joerg is away. Since this fixes a
> vt-d regression, I thought I'd pick it up along with the other IOMMU fixes I
> have queued for -rc6.
> 

Aah, I missed the memo :-)

> That said, if you insist, then I can revert it. I'm really only trying to
> help here.

Nah. Enjoy picking up patches!

Thanks!
> 
> Will


Re: [PATCH 1/1] x86/tboot: Don't disable swiotlb when iommu is forced on

2020-11-25 Thread Konrad Rzeszutek Wilk
On Wed, Nov 25, 2020 at 02:05:15PM +0000, Will Deacon wrote:
> On Wed, 25 Nov 2020 09:41:24 +0800, Lu Baolu wrote:
> > After commit 327d5b2fee91c ("iommu/vt-d: Allow 32bit devices to uses DMA
> > domain"), swiotbl could also be used for direct memory access if IOMMU
> > is enabled but a device is configured to pass through the DMA translation.
> > Keep swiotlb when IOMMU is forced on, otherwise, some devices won't work
> > if "iommu=pt" kernel parameter is used.
> 
> Applied to arm64 (for-next/iommu/fixes), thanks!
> 
> [1/1] x86/tboot: Don't disable swiotlb when iommu is forced on
>   https://git.kernel.org/arm64/c/e2be2a833ab5

But tboot never ran on ARM. It is Intel-specific.

I think either Thomas or I should take this patch.


Re: [PATCH] [PATCH] Adding offset keeping option when mapping data via SWIOTLB.

2020-11-24 Thread Konrad Rzeszutek Wilk
On Mon, Nov 23, 2020 at 02:18:07PM -0800, Jianxiong Gao wrote:
> NVMe driver and other applications may depend on the data offset
> to operate correctly. Currently when unaligned data is mapped via
> SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. When
> booting with --swiotlb=force option and using NVMe as interface,
> running mkfs.xfs on Rhel fails because of the unalignment issue.

RHEL? So a specific RHEL kernel. Is there a Red Hat bug created
for this that can be linked to this patch to make it easier
for folks to figure this out?

Why would you be using swiotlb=force?
Ah, you are using AMD SEV!

> This patch adds an option to make sure the mapped data preserves
> its offset of the orginal addrss. Tested on latest kernel that

s/addrss/address/
> this patch fixes the issue.
> 
> Signed-off-by: Jianxiong Gao 
> Acked-by: David Rientjes 
> ---
>  drivers/nvme/host/pci.c |  3 ++-
>  include/linux/dma-mapping.h |  8 
>  kernel/dma/swiotlb.c| 13 +
>  3 files changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 0578ff253c47..a366fb8a1ff0 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -833,7 +833,8 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, 
> struct request *req,
>   iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN);
>   else
>   nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents,
> -  rq_dma_dir(req), DMA_ATTR_NO_WARN);
> + rq_dma_dir(req),
> + DMA_ATTR_NO_WARN|DMA_ATTR_SWIOTLB_KEEP_OFFSET);
>   if (!nr_mapped)
>   goto out;
>  
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index 956151052d45..e46d23d9fa20 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -61,6 +61,14 @@
>   */
>  #define DMA_ATTR_PRIVILEGED  (1UL << 9)
>  
> +/*
> + * DMA_ATTR_SWIOTLB_KEEP_OFFSET: used to indicate that the buffer has to keep
> + * its offset when mapped via SWIOTLB. Some application functionality depends
> + * on the address offset, thus when buffers are mapped via SWIOTLB, the 
> offset
> + * needs to be preserved.
> + */
> +#define DMA_ATTR_SWIOTLB_KEEP_OFFSET (1UL << 10)
> +
>  /*
>   * A dma_addr_t can hold any valid DMA or bus address for the platform.  It 
> can
>   * be given to a device to use as a DMA source or target.  It is specific to 
> a
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 781b9dca197c..f43d7be1342d 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -483,6 +483,13 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>   max_slots = mask + 1
>   ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
>   : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> + 
> + /*
> +  * If we need to keep the offset when mapping, we need to add the offset
> +  * to the total set we need to allocate in SWIOTLB
> +  */
> + if (attrs & DMA_ATTR_SWIOTLB_KEEP_OFFSET)
> + alloc_size += offset_in_page(orig_addr);
>  
>   /*
>* For mappings greater than or equal to a page, we limit the stride
> @@ -567,6 +574,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, 
> phys_addr_t orig_addr,
>*/
>   for (i = 0; i < nslots; i++)
>   io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> + /*
> +  * When keeping the offset of the original data, we need to advance
> +  * the tlb_addr by the offset of orig_addr.
> +  */
> + if (attrs & DMA_ATTR_SWIOTLB_KEEP_OFFSET)
> + tlb_addr += orig_addr & (PAGE_SIZE - 1);
>   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>   (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>   swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
> DMA_TO_DEVICE);
> -- 
> 2.27.0
> 
> 
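
For illustration, this is how a driver would opt in, modeled on the nvme
hunk above (sketch only; DMA_ATTR_SWIOTLB_KEEP_OFFSET exists only in this
patch, and the surrounding fragment is made up):

	dma_addr_t addr = dma_map_single_attrs(dev, buf, len, DMA_TO_DEVICE,
					       DMA_ATTR_NO_WARN |
					       DMA_ATTR_SWIOTLB_KEEP_OFFSET);
	if (dma_mapping_error(dev, addr))
		return -ENOMEM;	/* or BLK_STS_RESOURCE in a block driver */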


Re: [PATCH v6] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-11-23 Thread Konrad Rzeszutek Wilk
On Mon, Nov 23, 2020 at 07:02:15PM +0100, Borislav Petkov wrote:
> On Mon, Nov 23, 2020 at 12:56:32PM -0500, Konrad Rzeszutek Wilk wrote:
> > This is not going to work for TDX. I think having a registration
> > to SWIOTLB to have this function would be better going forward.
> > 
> > As in there will be a swiotlb_register_adjuster() which AMD SEV
> > code can call at start, also TDX can do it (and other platforms).
> 
> Oh do tell. It doesn't need to adjust size?

I am assuming that TDX is going to have the exact same issue that
AMD SEV will have.

Are you recommending a unified x86-specific callback where we check that:

 - CPUID_AMD_SEV or CPUID_INTEL_TDX is set, and
 - no vIOMMU is present, and then adjust the size?
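
A sketch of what such a registration interface might look like (purely
hypothetical -- this API was never written; it is just the shape of the
swiotlb_register_adjuster() idea quoted above):

	typedef unsigned long (*swiotlb_adjust_fn)(unsigned long default_size);

	static swiotlb_adjust_fn swiotlb_adjuster;

	/* SEV or TDX init code would call this early in boot. */
	void __init swiotlb_register_adjuster(swiotlb_adjust_fn fn)
	{
		swiotlb_adjuster = fn;
	}

	static unsigned long swiotlb_adjusted_size(unsigned long default_size)
	{
		return swiotlb_adjuster ? swiotlb_adjuster(default_size)
					: default_size;
	}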

> 
> -- 
> Regards/Gruss,
> Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette


Re: [PATCH v6] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-11-23 Thread Konrad Rzeszutek Wilk
On Mon, Nov 23, 2020 at 06:06:47PM +0100, Borislav Petkov wrote:
> On Thu, Nov 19, 2020 at 09:42:05PM +0000, Ashish Kalra wrote:
> > From: Ashish Kalra 
> > 
> > For SEV, all DMA to and from guest has to use shared (un-encrypted) pages.
> > SEV uses SWIOTLB to make this happen without requiring changes to device
> > drivers.  However, depending on workload being run, the default 64MB of
> > SWIOTLB might not be enough and SWIOTLB may run out of buffers to use
> > for DMA, resulting in I/O errors and/or performance degradation for
> > high I/O workloads.
> > 
> > Increase the default size of SWIOTLB for SEV guests using a minimum
> > value of 128MB and a maximum value of 512MB, determining on amount
> > of provisioned guest memory.
> 
> That sentence needs massaging.
> 
> > Using late_initcall() interface to invoke swiotlb_adjust() does not
> > work as the size adjustment needs to be done before mem_encrypt_init()
> > and reserve_crashkernel() which use the allocated SWIOTLB buffer size,
> > hence calling it explicitly from setup_arch().
> 
> "hence call it ... "
> 
> > 
> > The SWIOTLB default size adjustment is added as an architecture specific
> 
> "... is added... " needs to be "Add ..."
> 
> > interface/callback to allow architectures such as those supporting memory
> > encryption to adjust/expand SWIOTLB size for their use.
> > 
> > v5 fixed build errors and warnings as
> > Reported-by: kbuild test robot 
> > 
> > Signed-off-by: Ashish Kalra 
> > ---
> >  arch/x86/kernel/setup.c   |  2 ++
> >  arch/x86/mm/mem_encrypt.c | 32 
> >  include/linux/swiotlb.h   |  6 ++
> >  kernel/dma/swiotlb.c  | 24 
> >  4 files changed, 64 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 3511736fbc74..b073d58dd4a3 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> > if (boot_cpu_has(X86_FEATURE_GBPAGES))
> > hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> >  
> > +   swiotlb_adjust();
> > +
> > /*
> >  * Reserve memory for crash kernel after SRAT is parsed so that it
> >  * won't consume hotpluggable memory.
> > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > index 3f248f0d0e07..c79a0d761db5 100644
> > --- a/arch/x86/mm/mem_encrypt.c
> > +++ b/arch/x86/mm/mem_encrypt.c
> > @@ -490,6 +490,38 @@ static void print_mem_encrypt_feature_info(void)
> >  }
> >  
> >  /* Architecture __weak replacement functions */
> > +unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
> > +{
> > +   unsigned long size = 0;
> 
>   unsigned long size = iotlb_default_size;
> 
> > +
> > +   /*
> > +* For SEV, all DMA has to occur via shared/unencrypted pages.
> > +* SEV uses SWOTLB to make this happen without changing device
> > +* drivers. However, depending on the workload being run, the
> > +* default 64MB of SWIOTLB may not be enough & SWIOTLB may
>^
> 
> Use words pls, not "&".
> 
> 
> > +* run out of buffers for DMA, resulting in I/O errors and/or
> > +* performance degradation especially with high I/O workloads.
> > +* Increase the default size of SWIOTLB for SEV guests using
> > +* a minimum value of 128MB and a maximum value of 512MB,
> > +* depending on amount of provisioned guest memory.
> > +*/
> > +   if (sev_active()) {
> > +   phys_addr_t total_mem = memblock_phys_mem_size();
> > +
> > +   if (total_mem <= SZ_1G)
> > +   size = max(iotlb_default_size, (unsigned long) SZ_128M);
> > +   else if (total_mem <= SZ_4G)
> > +   size = max(iotlb_default_size, (unsigned long) SZ_256M);

That is eating 128MB for 1GB, aka 12% of the guest memory allocated
statically for this.

And for guests that are 2GB, that is 12% until it gets to 3GB, when it is
8%, and then 6% at 4GB.

I would prefer this to be based on your memory count, that is, 6% of total
memory. And then going forward we can allocate memory _after_ boot and then
stitch the late SWIOTLB pool and allocate on demand.
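
(For the record, the arithmetic: 128MB/1GB = 12.5%, 256MB/2GB = 12.5%,
256MB/3GB = ~8.3%, 256MB/4GB = 6.25% -- which is where the
6%-of-total-memory figure comes from.)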
the late SWIOTLB pool and allocate on demand.



> > +   else
> > +   size = max(iotlb_default_size, (unsigned long) SZ_512M);
> > +
> > +   pr_info("SWIOTLB bounce buffer size adjusted to %luMB for SEV 
> > platform",
> 
> just "... for SEV" - no need for "platform".
> 
> ...
> 
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index c19379fabd20..3be9a19ea0a5 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -163,6 +163,30 @@ unsigned long swiotlb_size_or_default(void)
> > return size ? size : (IO_TLB_DEFAULT_SIZE);
> >  }
> >  
> > +unsigned long __init __weak arch_swiotlb_adjust(unsigned long size)
> > +{
> > +   return 0;
> 
> That, of course, needs to return size, not 0.

This is not going to work for TDX. I think havi

Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-11-17 Thread Konrad Rzeszutek Wilk
On Tue, Nov 17, 2020 at 07:04:59PM +0000, Kalra, Ashish wrote:
> Hello Konrad,
> 
> Actually I didn’t get that, do you mean you are taking 1G and <=4G cases out 
> of the patch and only going to apply the >4G case as part of the patch ?

That was the thought, but now I am wondering how TDX is going to work
with this. That is, the __weak override won't work on a distro kernel
that has to run on both AMD and Intel. Hmm.

Let me dust off the late-SWIOTLB patches that we developed internally
some time ago.

> 
> Thanks,
> Ashish 
> 
> > On Nov 17, 2020, at 11:38 AM, Kalra, Ashish  wrote:
> > 
> > Hello Konrad, 
> > 
> >> On Tue, Nov 17, 2020 at 12:00:03PM -0500, Konrad Rzeszutek Wilk wrote:
> >> .snip..
> >>>>>> Let's break this down:
> >>>>>> 
> >>>>>> How does the performance improve for one single device if you increase 
> >>>>>> the SWIOTLB?
> >>>>>> Is there a specific device/driver that you can talk about that improve 
> >>>>>> with this patch?
> >>>>>> 
> >>>>>> 
> >>>>> 
> >>>>> Yes, these are mainly for multi-queue devices such as NICs or even
> >>>>> multi-queue virtio. 
> >>>>> 
> >>>>> This basically improves performance with concurrent DMA, hence,
> >>>>> basically multi-queue devices.
> >>>> 
> >>>> OK, and for _1GB_ guest - what are the "internal teams/external 
> >>>> customers" amount 
> >>>> of CPUs they use? Please let's use real use-cases.
> >>> 
> >>>>> I am sure you will understand we cannot share any external customer
> >>>>> data as all that customer information is proprietary.
> >>>>> 
> >>>>> In similar situation if you have to share Oracle data, you will
> >>>>> surely have the same concerns and i don't think you will be able
> >>>>> to share any such information externally, i.e., outside Oracle.
> >>>>> 
> >>>> I am asking for a simple query - what amount of CPUs does a 1GB
> >>>> guest have? The reason for this should be fairly obvious - if
> >>>> it is a 1vCPU, then there is no multi-queue and the existing
> >>>> SWIOTLB pool size as it is OK.
> >>>> 
> >>>> If however there are say 2 and multiqueue is enabled, that
> >>>> gives me an idea of how many you use and I can find out what
> >>>> the maximum pool size usage of virtio there is with that configuration.
> >>> 
> >>> Again we cannot share any customer data.
> >>> 
> >>> Also i don't think there can be a definitive answer to how many vCPUs a
> >>> 1GB guest will have, it will depend on what kind of configuration we are
> >>> testing.
> >>> 
> >>> For example, i usually setup 4-16 vCPUs for as low as 512M configured
> >>> gueest memory.
> >> 
> >> Sure, but you are not the normal user.
> >> 
> >> That is, I don't like that for 1GB guests your patch ends up doubling the
> >> SWIOTLB memory pool. That seems to me we are trying to solve a problem
> >> that normal users will not hit. That is why I want 'here is the customer
> >> bug'.
> >> 
> >> Here is what I am going to do - I will take the 1GB and 4GB cases out of
> >> your patch and call it a day. If there are customers who start reporting
> >> issues we can revisit that. Nothing wrong with 'Reported-by: XYZ' (we
> >> often ask the
> >> customer if he or she would like to be recognized on upstream bugs).
> >> 
> > 
> > Ok.
> > 
> >> And in the meantime I am going to look about adding ..
> >>> 
> >>> I have been also testing with 16 vCPUs configuration for 512M-1G guest
> >>> memory with Mellanox SRIOV NICs, and this will be a multi-queue NIC
> >>> device environment.
> >> 
> >> .. late SWIOTLB expansion to stitch the DMA pools together as both
> >> Mellanox and VirtIO are not 32-bit DMA bound.
> >> 
> >>> 
> >>> So we might be having less configured guest memory, but we still might
> >>> be using that configuration with I/O intensive workloads.
> >>> 
> > 
> > I am going to submit v4 of my current patch-set which uses max() instead
> > of clamp() and also replaces constants defined in this patch with the
> > pre-defined ones in sizes.h
> > 
> > Thanks,
> > Ashish

Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-11-17 Thread Konrad Rzeszutek Wilk
.snip..
> > > > Let's break this down:
> > > > 
> > > > How does the performance improve for one single device if you increase 
> > > > the SWIOTLB?
> > > > Is there a specific device/driver that you can talk about that improve 
> > > > with this patch?
> > > > 
> > > > 
> > > 
> > > Yes, these are mainly for multi-queue devices such as NICs or even
> > > multi-queue virtio. 
> > > 
> > > This basically improves performance with concurrent DMA, hence,
> > > basically multi-queue devices.
> > 
> > OK, and for _1GB_ guest - what are the "internal teams/external customers" 
> > amount 
> > of CPUs they use? Please let's use real use-cases.
> 
> >> I am sure you will understand we cannot share any external customer
> >> data as all that customer information is proprietary.
> >>
> >> In similar situation if you have to share Oracle data, you will
> >> surely have the same concerns and i don't think you will be able
> >> to share any such information externally, i.e., outside Oracle.
> >>
> >I am asking for a simple query - what amount of CPUs does a 1GB
> >guest have? The reason for this should be fairly obvious - if
> >it is a 1vCPU, then there is no multi-queue and the existing
> >SWIOTLB pool size as it is OK.
> >
> >If however there are say 2 and multiqueue is enabled, that
> >gives me an idea of how many you use and I can find out what
> >the maximum pool size usage of virtio there is with that configuration.
> 
> Again we cannot share any customer data.
> 
> Also i don't think there can be a definitive answer to how many vCPUs a
> 1GB guest will have, it will depend on what kind of configuration we are
> testing.
> 
> For example, i usually setup 4-16 vCPUs for as low as 512M configured
> gueest memory.

Sure, but you are not the normal user.

That is, I don't like that for 1GB guests your patch ends up doubling the
SWIOTLB memory pool. That seems to me we are trying to solve a problem
that normal users will not hit. That is why I want 'here is the customer
bug'.

Here is what I am going to do - I will take the 1GB and 4GB cases out of
your patch and call it a day. If there are customers who start reporting issues
we can revisit that. Nothing wrong with 'Reported-by: XYZ' (we often ask the
customer if he or she would like to be recognized on upstream bugs).

And in the meantime I am going to look about adding ..
> 
> I have been also testing with 16 vCPUs configuration for 512M-1G guest
> memory with Mellanox SRIOV NICs, and this will be a multi-queue NIC
> device environment.

.. late SWIOTLB expansion to stitch the DMA pools together as both
Mellanox and VirtIO are not 32-bit DMA bound.

> 
> So we might be having less configured guest memory, but we still might
> be using that configuration with I/O intensive workloads.
> 
> Thanks,
> Ashish


Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-11-13 Thread Konrad Rzeszutek Wilk
On Thu, Nov 05, 2020 at 09:20:45PM +0000, Ashish Kalra wrote:
> On Thu, Nov 05, 2020 at 03:20:07PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 05, 2020 at 07:38:28PM +0000, Ashish Kalra wrote:
> > > On Thu, Nov 05, 2020 at 02:06:49PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > .
> > > > > > Right, so I am wondering if we can do this better.
> > > > > > 
> > > > > > That is you are never going to get any 32-bit devices with SEV 
> > > > > > right? That
> > > > > > is there is nothing that bounds you to always use the memory below 
> > > > > > 4GB?
> > > > > > 
> > > > > 
> > > > > We do support 32-bit PCIe passthrough devices with SEV.
> > > > 
> > > > Ewww..  Which devices would this be?
> > > 
> > > That will be difficult to predict as customers could be doing
> > > passthrough of all kinds of devices.
> > 
> > But SEV is not on some 1990s hardware. It has PCIe; there are no PCI
> > slots in there.
> > 
> > Is it really possible to have a PCIe device that can't do more than 32-bit 
> > DMA?
> > 
> > > 
> > > > > 
> > > > > Therefore, we can't just depend on >4G memory for SWIOTLB bounce 
> > > > > buffering
> > > > > when there is I/O pressure, because we do need to support device
> > > > > passthrough of 32-bit devices.
> > > > 
> > > > Presumably there is just a handful of them?
> > > >
> > > Again, it will be incorrect to assume this.
> > > 
> > > > > 
> > > > > Considering this, we believe that this patch needs to adjust/extend
> > > > > boot-allocation of SWIOTLB and we want to keep it simple to do this
> > > > > within a range determined by amount of allocated guest memory.
> > > > 
> > > > I would prefer to not have to revert this in a year as customers
> > > > complain about "I paid $$$ and I am wasting half a gig on something 
> > > > I am not using" and giving customers knobs to tweak this instead of
> > > > doing the right thing from the start.
> > > 
> > > Currently, we face a lot of situations where we have to tell our
> > > internal teams/external customers to explicitly increase SWIOTLB buffer
> > > via the swiotlb parameter on the kernel command line, especially to
> > > get better I/O performance numbers with SEV. 
> > 
> > Presumably these are 64-bit?
> > 
> > And what devices do you speak of that are actually affected
> > performance-wise? Increasing the SWIOTLB just means we have more
> > memory, which in my mind means you can have _more_ devices in the guest
> > that won't run into DMA mapping returning an error.
> > 
> > Not necessarily that one device suddenly can go faster.
> > 
> > > 
> > > So by having this SWIOTLB size adjustment done implicitly (even using a
> > > static logic) is a great win-win situation. In other words, having even
> > > a simple and static default increase of SWIOTLB buffer size for SEV is
> > > really useful for us.
> > > 
> > > We can always think of adding all kinds of heuristics to this, but that
> > > just adds too much complexity without any predictable performance gain.
> > > 
> > > And to add, the patch extends the SWIOTLB size as an architecture
> > > specific callback, currently it is a simple and static logic for SEV/x86
> > > specific, but there is always an option to tweak/extend it with
> > > additional logic in the future.
> > 
> > Right, and that is what I would like to talk about as I think you
> > are going to disappear (aka, busy with other stuff) after this patch goes 
> > in.
> > 
> > I need to understand this more than "performance" and "internal teams"
> > requirements to come up with a better way going forward as surely other
> > platforms will hit the same issue anyhow.
> > 
> > Let's break this down:
> > 
> > How does the performance improve for one single device if you increase the 
> > SWIOTLB?
> > Is there a specific device/driver that you can talk about that improve with 
> > this patch?
> > 
> > 
> 
> Yes, these are mainly for multi-queue devices such as NICs or even
> multi-queue virtio. 
> 
> This basically improves performance with concurrent DMA, hence,
> basically multi-queue devices.

OK, and for a _1GB_ guest - what number of CPUs do the "internal
teams/external customers" use? Please let's use real use-cases.


Re: [PATCH for-5.10] swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single

2020-11-10 Thread Konrad Rzeszutek Wilk
On Tue, Nov 10, 2020 at 10:14:21AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 04, 2020 at 09:04:38AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Tue, Nov 03, 2020 at 10:46:43AM +0100, Christoph Hellwig wrote:
> > > ping?
> > 
> > Hopefully this goes through. I am in the process of testing it but ran
> > into testing issues that I believe are unrelated.
> 
> Did you manage to make any progress?  This fixes an issue with the

YES!!
> new support for systems with DMA offsets in 5.10..

OK. Sending the git pull request in a minute or two.


Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-11-05 Thread Konrad Rzeszutek Wilk
On Thu, Nov 05, 2020 at 07:38:28PM +0000, Ashish Kalra wrote:
> On Thu, Nov 05, 2020 at 02:06:49PM -0500, Konrad Rzeszutek Wilk wrote:
> > .
> > > > Right, so I am wondering if we can do this better.
> > > > 
> > > > That is you are never going to get any 32-bit devices with SEV right? 
> > > > That
> > > > is there is nothing that bounds you to always use the memory below 4GB?
> > > > 
> > > 
> > > We do support 32-bit PCIe passthrough devices with SEV.
> > 
> > Ewww..  Which devices would this be?
> 
> That will be difficult to predict as customers could be doing
> passthrough of all kinds of devices.

But SEV is not on some 1990s hardware. It has PCIe; there are no PCI slots
in there.

Is it really possible to have a PCIe device that can't do more than 32-bit DMA?

> 
> > > 
> > > Therefore, we can't just depend on >4G memory for SWIOTLB bounce buffering
> > > when there is I/O pressure, because we do need to support device
> > > passthrough of 32-bit devices.
> > 
Presumably there is just a handful of them?
> >
> Again, it will be incorrect to assume this.
> 
> > > 
> > > Considering this, we believe that this patch needs to adjust/extend
> > > boot-allocation of SWIOTLB and we want to keep it simple to do this
> > > within a range determined by amount of allocated guest memory.
> > 
> > I would prefer to not have to revert this in a year as customers
> > complain about "I paid $$$ and I am wasting half a gig on something 
> > I am not using" and giving customers knobs to tweak this instead of
> > doing the right thing from the start.
> 
> Currently, we face a lot of situations where we have to tell our
> internal teams/external customers to explicitly increase SWIOTLB buffer
> via the swiotlb parameter on the kernel command line, especially to
> get better I/O performance numbers with SEV. 

Presumably these are 64-bit?

And what devices do you speak of that are actually affected
performance-wise? Increasing the SWIOTLB just means we have more
memory, which in my mind means you can have _more_ devices in the guest
that won't run into DMA mapping returning an error.

Not necessarily that one device suddenly can go faster.

> 
> So by having this SWIOTLB size adjustment done implicitly (even using a
> static logic) is a great win-win situation. In other words, having even
> a simple and static default increase of SWIOTLB buffer size for SEV is
> really useful for us.
> 
> We can always think of adding all kinds of heuristics to this, but that
> just adds too much complexity without any predictable performance gain.
> 
> And to add, the patch extends the SWIOTLB size as an architecture
> specific callback, currently it is a simple and static logic for SEV/x86
> specific, but there is always an option to tweak/extend it with
> additional logic in the future.

Right, and that is what I would like to talk about as I think you
are going to disappear (aka, busy with other stuff) after this patch goes in.

I need to understand this beyond the "performance" and "internal teams"
requirements, to come up with a better way going forward, as surely other
platforms will hit the same issue anyhow.

Let's break this down:

How does the performance improve for one single device if you increase the
SWIOTLB?
Is there a specific device/driver that you can talk about that improves with
this patch?


> 
> Thanks,
> Ashish'
> 
> > 
> > That is, the right thing would be something less static.
> > 
> > Can you work with me on what that could be please?
> > 
> > > 
> > > Thanks,
> > > Ashish
> > > 
> > > > What I wonder is if we can combine the boot-allocation of the SWIOTLB
> > > > with the post-boot-allocation of SWIOTLB to stitch together
> > > > continuous physical ranges.
> > > > 
> > > > That way you have the flexibility at the start of using 64MB but if 
> > > > there
> > > > is pressure, we grow to a bigger size?
> > > > 
> > > > > 
> > > > > Thanks,
> > > > > Ashish
> > > > > 
> > > > > > > memory.
> > > > > > > 
> > > > > > > Using late_initcall() interface to invoke
> > > > > > > swiotlb_adjust() does not work as the size
> > > > > > > adjustment needs to be done before mem_encrypt_init()
> > > > > > > and reserve_crashkernel() which use the allocated
> > > > > > > SWIOTLB buffe

Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-11-05 Thread Konrad Rzeszutek Wilk
.
> > Right, so I am wondering if we can do this better.
> > 
> > That is you are never going to get any 32-bit devices with SEV right? That
> > is there is nothing that bounds you to always use the memory below 4GB?
> > 
> 
> We do support 32-bit PCIe passthrough devices with SEV.

Ewww..  Which devices would this be?
> 
> Therefore, we can't just depend on >4G memory for SWIOTLB bounce buffering
> when there is I/O pressure, because we do need to support device
> passthrough of 32-bit devices.

Presumably there is just a handful of them?

> 
> Considering this, we believe that this patch needs to adjust/extend
> boot-allocation of SWIOTLB and we want to keep it simple to do this
> within a range determined by amount of allocated guest memory.

I would prefer not to have to revert this in a year as customers
complain about "I paid $$$ and I am wasting half a gig on something
I am not using", and not to give customers knobs to tweak this instead of
doing the right thing from the start.

That is, the right thing would be something less static.

Can you work with me on what that could be, please?

> 
> Thanks,
> Ashish
> 
> > What I wonder is if we can combine the boot-allocation of the SWIOTLB
> > with the post-boot-allocation of SWIOTLB to stitch together
> > continuous physical ranges.
> > 
> > That way you have the flexibility at the start of using 64MB but if there
> > is pressure, we grow to a bigger size?
> > 
> > > 
> > > Thanks,
> > > Ashish
> > > 
> > > > > memory.
> > > > > 
> > > > > Using late_initcall() interface to invoke
> > > > > swiotlb_adjust() does not work as the size
> > > > > adjustment needs to be done before mem_encrypt_init()
> > > > > and reserve_crashkernel() which use the allocated
> > > > > SWIOTLB buffer size, hence calling it explicitly
> > > > > from setup_arch().
> > > > > 
> > > > > The SWIOTLB default size adjustment is added as an
> > > > > architecture specific interface/callback to allow
> > > > > architectures such as those supporting memory
> > > > > encryption to adjust/expand SWIOTLB size for their
> > > > > use.
> > > > > 
> > > > > Signed-off-by: Ashish Kalra 
> > > > > ---
> > > > >  arch/x86/kernel/setup.c   |  2 ++
> > > > >  arch/x86/mm/mem_encrypt.c | 42 
> > > > > +++
> > > > >  include/linux/swiotlb.h   |  1 +
> > > > >  kernel/dma/swiotlb.c  | 27 +
> > > > >  4 files changed, 72 insertions(+)
> > > > > 
> > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > index 3511736fbc74..b073d58dd4a3 100644
> > > > > --- a/arch/x86/kernel/setup.c
> > > > > +++ b/arch/x86/kernel/setup.c
> > > > > @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> > > > >   if (boot_cpu_has(X86_FEATURE_GBPAGES))
> > > > >   hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> > > > >  
> > > > > + swiotlb_adjust();
> > > > > +
> > > > >   /*
> > > > >* Reserve memory for crash kernel after SRAT is parsed so that 
> > > > > it
> > > > >* won't consume hotpluggable memory.
> > > > > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > > > > index 3f248f0d0e07..e0deb157cddd 100644
> > > > > --- a/arch/x86/mm/mem_encrypt.c
> > > > > +++ b/arch/x86/mm/mem_encrypt.c
> > > > > @@ -489,7 +489,49 @@ static void print_mem_encrypt_feature_info(void)
> > > > >   pr_cont("\n");
> > > > >  }
> > > > >  
> > > > > +#define TOTAL_MEM_1G 0x40000000UL
> > > > > +#define TOTAL_MEM_4G 0x100000000UL
> > > > > +
> > > > > +#define SIZE_128M (128UL<<20)
> > > > > +#define SIZE_256M (256UL<<20)
> > > > > +#define SIZE_512M (512UL<<20)
> > > > > +
> > > > >  /* Architecture __weak replacement functions */
> > > > > +unsigned long __init arch_swiotlb_adjust(unsigned long 
> > > > > iotlb_default_size)
> > > > > +{
> > > > > + unsigned long size = 0;
> > > > > +
> > > > > + /*
> > > > > +  * For SEV, all DMA has to occur via shared/unencrypted pages.
> > > > > +  * SEV uses SWOTLB to make this happen without changing device
> > > > > +  * drivers. However, depending on the workload being run, the
> > > > > +  * default 64MB of SWIOTLB may not be enough & SWIOTLB may
> > > > > +  * run out of buffers for DMA, resulting in I/O errors and/or
> > > > > +  * performance degradation especially with high I/O workloads.
> > > > > +  * Increase the default size of SWIOTLB for SEV guests using
> > > > > +  * a minimum value of 128MB and a maximum value of 512MB,
> > > > > +  * depending on amount of provisioned guest memory.
> > > > > +  */
> > > > > + if (sev_active()) {
> > > > > + phys_addr_t total_mem = memblock_phys_mem_size();
> > > > > +
> > > > > + if (total_mem <= TOTAL_MEM_1G)
> > > > > + size = clamp(iotlb_default_size * 2, SIZE_128M,
> > > > > +  SIZE_128M);
> > > > > + else if (total_mem <= TOTAL_MEM_4G)
> > > > > +

Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-11-05 Thread Konrad Rzeszutek Wilk
On Wed, Nov 04, 2020 at 10:39:13PM +0000, Ashish Kalra wrote:
> Hello Konrad,
> 
> On Wed, Nov 04, 2020 at 05:14:52PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Nov 04, 2020 at 10:08:04PM +0000, Ashish Kalra wrote:
> > > From: Ashish Kalra 
> > > 
> > > For SEV, all DMA to and from guest has to use shared
> > > (un-encrypted) pages. SEV uses SWIOTLB to make this
> > > happen without requiring changes to device drivers.
> > > However, depending on workload being run, the default
> > > 64MB of SWIOTLB might not be enough and SWIOTLB
> > > may run out of buffers to use for DMA, resulting
> > > in I/O errors and/or performance degradation for
> > > high I/O workloads.
> > > 
> > > Increase the default size of SWIOTLB for SEV guests
> > > using a minimum value of 128MB and a maximum value
> > 
> > 
> > 
> > 64MB for a 1GB VM is not enough?
> > 
> > > of 512MB, determining on amount of provisioned guest
> > 
> > I like how the implementation is done... but
> > the choices of memory sizes seem very much
> > random. Could there be some math behind this?
> >
> 
> Earlier the patch was based on using a % of guest memory, as below:
> 
> +#define SEV_ADJUST_SWIOTLB_SIZE_PERCENT5
> +#define SEV_ADJUST_SWIOTLB_SIZE_MAX(1UL << 30)
> ...
> ...
> +   if (sev_active() && !io_tlb_nslabs) {
> +   unsigned long total_mem = get_num_physpages() << PAGE_SHIFT;
> +
> +   default_size = total_mem *
> +   SEV_ADJUST_SWIOTLB_SIZE_PERCENT / 100;
> +
> +   default_size = ALIGN(default_size, 1 << IO_TLB_SHIFT);
> +
> +   default_size = clamp_val(default_size, IO_TLB_DEFAULT_SIZE,
> +   SEV_ADJUST_SWIOTLB_SIZE_MAX);
> +   }
> 
> But, then it is difficult to predict what % of guest memory to use ?
> 
> Then there are other factors to consider, such as vcpu_count or if there
> is going to be high I/O workload, etc.
> 
> But that all makes it very complicated, what we basically want is a
> range from 128M to 512M and that's why the current patch which picks up
> this range from the amount of allocated guest memory keeps it simple. 

Right, so I am wondering if we can do this better.

That is, you are never going to get any 32-bit devices with SEV, right?
That is, there is nothing that binds you to always use the memory below 4GB?

What I wonder is if we can combine the boot-allocation of the SWIOTLB
with the post-boot-allocation of the SWIOTLB to stitch together
continuous physical ranges.

That way you have the flexibility of starting with 64MB, but if there
is pressure, we grow to a bigger size?
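
A rough sketch of that stitching idea (entirely hypothetical -- the helper
names below did not exist at the time):

	/* Grow-on-pressure flow, sketch only. */
	static phys_addr_t swiotlb_map_or_grow(struct device *dev,
					       phys_addr_t phys, size_t size)
	{
		phys_addr_t tlb = swiotlb_try_map(dev, phys, size);

		if (tlb == (phys_addr_t)DMA_MAPPING_ERROR) {
			/* Pool exhausted: late-allocate another chunk,
			 * stitch it into the slot bookkeeping, retry. */
			if (swiotlb_late_grow(SZ_64M) == 0)
				tlb = swiotlb_try_map(dev, phys, size);
		}
		return tlb;
	}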

> 
> Thanks,
> Ashish
> 
> > > memory.
> > > 
> > > Using late_initcall() interface to invoke
> > > swiotlb_adjust() does not work as the size
> > > adjustment needs to be done before mem_encrypt_init()
> > > and reserve_crashkernel() which use the allocated
> > > SWIOTLB buffer size, hence calling it explicitly
> > > from setup_arch().
> > > 
> > > The SWIOTLB default size adjustment is added as an
> > > architecture specific interface/callback to allow
> > > architectures such as those supporting memory
> > > encryption to adjust/expand SWIOTLB size for their
> > > use.
> > > 
> > > Signed-off-by: Ashish Kalra 
> > > ---
> > >  arch/x86/kernel/setup.c   |  2 ++
> > >  arch/x86/mm/mem_encrypt.c | 42 +++
> > >  include/linux/swiotlb.h   |  1 +
> > >  kernel/dma/swiotlb.c  | 27 +
> > >  4 files changed, 72 insertions(+)
> > > 
> > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > index 3511736fbc74..b073d58dd4a3 100644
> > > --- a/arch/x86/kernel/setup.c
> > > +++ b/arch/x86/kernel/setup.c
> > > @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> > >   if (boot_cpu_has(X86_FEATURE_GBPAGES))
> > >   hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> > >  
> > > + swiotlb_adjust();
> > > +
> > >   /*
> > >* Reserve memory for crash kernel after SRAT is parsed so that it
> > >* won't consume hotpluggable memory.
> > > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > > index 3f248f0d0e07..e0deb157cddd 100644
> > > --- a/arch/x86/mm/mem_encrypt.c
> > > +++ b/arch/x86/mm/mem_encrypt.c
> > > @@ -489,7 +489,49 @@

Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

2020-11-04 Thread Konrad Rzeszutek Wilk
On Wed, Nov 04, 2020 at 10:08:04PM +0000, Ashish Kalra wrote:
> From: Ashish Kalra 
> 
> For SEV, all DMA to and from guest has to use shared
> (un-encrypted) pages. SEV uses SWIOTLB to make this
> happen without requiring changes to device drivers.
> However, depending on workload being run, the default
> 64MB of SWIOTLB might not be enough and SWIOTLB
> may run out of buffers to use for DMA, resulting
> in I/O errors and/or performance degradation for
> high I/O workloads.
> 
> Increase the default size of SWIOTLB for SEV guests
> using a minimum value of 128MB and a maximum value



64MB for a 1GB VM is not enough?

> of 512MB, determining on amount of provisioned guest

I like how the implementation is done... but
the choices of memory sizes seem very much
random. Could there be some math behind this?

> memory.
> 
> Using late_initcall() interface to invoke
> swiotlb_adjust() does not work as the size
> adjustment needs to be done before mem_encrypt_init()
> and reserve_crashkernel() which use the allocated
> SWIOTLB buffer size, hence calling it explicitly
> from setup_arch().
> 
> The SWIOTLB default size adjustment is added as an
> architecture specific interface/callback to allow
> architectures such as those supporting memory
> encryption to adjust/expand SWIOTLB size for their
> use.
> 
> Signed-off-by: Ashish Kalra 
> ---
>  arch/x86/kernel/setup.c   |  2 ++
>  arch/x86/mm/mem_encrypt.c | 42 +++
>  include/linux/swiotlb.h   |  1 +
>  kernel/dma/swiotlb.c  | 27 +
>  4 files changed, 72 insertions(+)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3511736fbc74..b073d58dd4a3 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
>   if (boot_cpu_has(X86_FEATURE_GBPAGES))
>   hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
>  
> + swiotlb_adjust();
> +
>   /*
>* Reserve memory for crash kernel after SRAT is parsed so that it
>* won't consume hotpluggable memory.
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 3f248f0d0e07..e0deb157cddd 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -489,7 +489,49 @@ static void print_mem_encrypt_feature_info(void)
>   pr_cont("\n");
>  }
>  
> +#define TOTAL_MEM_1G 0x40000000UL
> +#define TOTAL_MEM_4G 0x100000000UL
> +
> +#define SIZE_128M (128UL<<20)
> +#define SIZE_256M (256UL<<20)
> +#define SIZE_512M (512UL<<20)
> +
>  /* Architecture __weak replacement functions */
> +unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
> +{
> + unsigned long size = 0;
> +
> + /*
> +  * For SEV, all DMA has to occur via shared/unencrypted pages.
> +  * SEV uses SWOTLB to make this happen without changing device
> +  * drivers. However, depending on the workload being run, the
> +  * default 64MB of SWIOTLB may not be enough & SWIOTLB may
> +  * run out of buffers for DMA, resulting in I/O errors and/or
> +  * performance degradation especially with high I/O workloads.
> +  * Increase the default size of SWIOTLB for SEV guests using
> +  * a minimum value of 128MB and a maximum value of 512MB,
> +  * depending on amount of provisioned guest memory.
> +  */
> + if (sev_active()) {
> + phys_addr_t total_mem = memblock_phys_mem_size();
> +
> + if (total_mem <= TOTAL_MEM_1G)
> + size = clamp(iotlb_default_size * 2, SIZE_128M,
> +  SIZE_128M);
> + else if (total_mem <= TOTAL_MEM_4G)
> + size = clamp(iotlb_default_size * 4, SIZE_256M,
> +  SIZE_256M);
> + else
> + size = clamp(iotlb_default_size * 8, SIZE_512M,
> +  SIZE_512M);
> +
> + pr_info("SEV adjusted max SWIOTLB size = %luMB",
> + size >> 20);
> + }
> +
> + return size;
> +}
> +
>  void __init mem_encrypt_init(void)
>  {
>   if (!sme_me_mask)
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 046bb94bd4d6..01ae6d891327 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
>  int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
>  extern unsigned long swiotlb_nr_tbl(void);
>  unsigned long swiotlb_size_or_default(void);
> +extern void __init swiotlb_adjust(void);
>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
>  extern void __init swiotlb_update_mem_attributes(void);
>  
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c19379fabd20..66a9e627bb51 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -163,6 +163,33 @@ unsigned long swiotlb_size_or_defa
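
One answer to the "some math behind this" question above (my observation,
not from the thread): with equal lower and upper bounds, clamp() ignores
its first argument entirely, so the "* 2" / "* 4" / "* 8" multipliers in
the hunk are dead:

	/* clamp(val, lo, hi) == min(max(val, lo), hi), so with lo == hi: */
	clamp(iotlb_default_size * 2, SIZE_128M, SIZE_128M); /* always 128M */
	clamp(iotlb_default_size * 4, SIZE_256M, SIZE_256M); /* always 256M */
	clamp(iotlb_default_size * 8, SIZE_512M, SIZE_512M); /* always 512M */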

Re: [PATCH for-5.10] swiotlb: remove the tbl_dma_addr argument to swiotlb_tbl_map_single

2020-11-04 Thread Konrad Rzeszutek Wilk
On Tue, Nov 03, 2020 at 10:46:43AM +0100, Christoph Hellwig wrote:
> ping?

Hopefully this goes through. I am in the process of testing it but ran
into testing issues that I believe are unrelated.


> 
> On Fri, Oct 23, 2020 at 08:33:09AM +0200, Christoph Hellwig wrote:
> > The tbl_dma_addr argument is used to check the DMA boundary for the
> > allocations, and thus needs to be a dma_addr_t.  swiotlb-xen instead
> > passed a physical address, which could lead to incorrect results for
> > strange offsets.  Fix this by removing the parameter entirely and hard
> > code the DMA address for io_tlb_start instead.
> > 
> > Fixes: 91ffe4ad534a ("swiotlb-xen: introduce phys_to_dma/dma_to_phys 
> > translations")
> > Signed-off-by: Christoph Hellwig 
> > Reviewed-by: Stefano Stabellini 
> > ---
> >  drivers/iommu/intel/iommu.c |  5 ++---
> >  drivers/xen/swiotlb-xen.c   |  3 +--
> >  include/linux/swiotlb.h | 10 +++---
> >  kernel/dma/swiotlb.c| 16 ++--
> >  4 files changed, 12 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 8651f6d4dfa032..6b560e6f193096 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -3815,9 +3815,8 @@ bounce_map_single(struct device *dev, phys_addr_t 
> > paddr, size_t size,
> >  * page aligned, we don't need to use a bounce page.
> >  */
> > if (!IS_ALIGNED(paddr | size, VTD_PAGE_SIZE)) {
> > -   tlb_addr = swiotlb_tbl_map_single(dev,
> > -   phys_to_dma_unencrypted(dev, io_tlb_start),
> > -   paddr, size, aligned_size, dir, attrs);
> > +   tlb_addr = swiotlb_tbl_map_single(dev, paddr, size,
> > +   aligned_size, dir, attrs);
> > if (tlb_addr == DMA_MAPPING_ERROR) {
> > goto swiotlb_error;
> > } else {
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index 71ce1b7a23d1cc..2b385c1b4a99cb 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -395,8 +395,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device 
> > *dev, struct page *page,
> >  */
> > trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
> >  
> > -   map = swiotlb_tbl_map_single(dev, virt_to_phys(xen_io_tlb_start),
> > -phys, size, size, dir, attrs);
> > +   map = swiotlb_tbl_map_single(dev, phys, size, size, dir, attrs);
> > if (map == (phys_addr_t)DMA_MAPPING_ERROR)
> > return DMA_MAPPING_ERROR;
> >  
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index 513913ff748626..3bb72266a75a1d 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -45,13 +45,9 @@ enum dma_sync_target {
> > SYNC_FOR_DEVICE = 1,
> >  };
> >  
> > -extern phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
> > - dma_addr_t tbl_dma_addr,
> > - phys_addr_t phys,
> > - size_t mapping_size,
> > - size_t alloc_size,
> > - enum dma_data_direction dir,
> > - unsigned long attrs);
> > +phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
> > +   size_t mapping_size, size_t alloc_size,
> > +   enum dma_data_direction dir, unsigned long attrs);
> >  
> >  extern void swiotlb_tbl_unmap_single(struct device *hwdev,
> >  phys_addr_t tlb_addr,
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index b4eea0abc3f002..92e2f54f24c01b 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -441,14 +441,11 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
> > phys_addr_t tlb_addr,
> > }
> >  }
> >  
> > -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
> > -  dma_addr_t tbl_dma_addr,
> > -  phys_addr_t orig_addr,
> > -  size_t mapping_size,
> > -  size_t alloc_size,
> > -  enum dma_data_direction dir,
> > -  unsigned long attrs)
> > +phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
> > +   size_t mapping_size, size_t alloc_size,
> > +   enum dma_data_direction dir, unsigned long attrs)
> >  {
> > +   dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
> > unsigned long flags;
> > phys_addr_t tlb_addr;
> > unsigned int nslots, stride, index, wrap;
> > @@ -667,9 +664,8 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
> > trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size,
> >   

Re: [PATCH] fix swiotlb panic on Xen

2020-10-27 Thread Konrad Rzeszutek Wilk
> As the person who first found this and then confirmed this fixes a bug:
> 
> Tested-by: Elliott Mitchell 

Thank you!!

I changed the title and added the various tags and will put it in
linux-next later this week.

From a1eb2768bf5954d25aa0f0136b38f0aa5d92d984 Mon Sep 17 00:00:00 2001
From: Stefano Stabellini 
Date: Mon, 26 Oct 2020 17:02:14 -0700
Subject: [PATCH] swiotlb: fix "x86: Don't panic if can not alloc buffer for
 swiotlb"

kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to
allocate a buffer for the swiotlb. It does so by calling

  memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);

If the allocation fails, no_iotlb_memory is set.

Later during initialization swiotlb-xen comes in
(drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start
is != 0, it thinks the memory is ready to use when actually it is not.

When the swiotlb is actually needed, swiotlb_tbl_map_single gets called
and since no_iotlb_memory is set the kernel panics.
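
[The panic in question is this check at the top of swiotlb_tbl_map_single(),
quoted from memory of the era's kernel/dma/swiotlb.c and shown only to make
the failure mode concrete:]

	if (no_iotlb_memory)
		panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");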

Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been
initialized, it would do the initialization itself, which might still
succeed.

Fix the panic by setting io_tlb_start to 0 on swiotlb initialization
failure, and also by setting no_iotlb_memory to false on swiotlb
initialization success.
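
[The hand-off described above, paraphrased from this era's
drivers/xen/swiotlb-xen.c; a sketch of the relevant lines, not the full
function:]

	/*
	 * xen_swiotlb_init() treats io_tlb_start != 0 as "the early
	 * swiotlb buffer exists and is usable" - which is why the fix
	 * below resets it to 0 when the early allocation fails.
	 */
	if (io_tlb_start != 0) {
		xen_io_tlb_start = phys_to_virt(io_tlb_start);
		goto end;	/* reuse the early buffer */
	}
	/* otherwise allocate and initialize xen-swiotlb's own buffer */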

Fixes: ac2cbab21f31 ("x86: Don't panic if can not alloc buffer for swiotlb")

Reported-by: Elliott Mitchell 
Tested-by: Elliott Mitchell 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Christoph Hellwig 
CC: sta...@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk 
---
 kernel/dma/swiotlb.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 465a567678d9..e08cac39c0ba 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -229,6 +229,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
io_tlb_index = 0;
+   no_iotlb_memory = false;
 
if (verbose)
swiotlb_print_info();
@@ -260,9 +261,11 @@ swiotlb_init(int verbose)
if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
return;
 
-   if (io_tlb_start)
+   if (io_tlb_start) {
memblock_free_early(io_tlb_start,
PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
+   io_tlb_start = 0;
+   }
pr_warn("Cannot allocate buffer");
no_iotlb_memory = true;
 }
@@ -360,6 +363,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
}
io_tlb_index = 0;
+   no_iotlb_memory = false;
 
swiotlb_print_info();
 
-- 
2.13.6



Re: [PATCH] fix swiotlb panic on Xen

2020-10-27 Thread Konrad Rzeszutek Wilk
On Mon, Oct 26, 2020 at 05:02:14PM -0700, Stefano Stabellini wrote:
> From: Stefano Stabellini 
> 
> kernel/dma/swiotlb.c:swiotlb_init gets called first and tries to
> allocate a buffer for the swiotlb. It does so by calling
> 
>   memblock_alloc_low(PAGE_ALIGN(bytes), PAGE_SIZE);
> 
> If the allocation fails, no_iotlb_memory is set.
> 
> 
> Later during initialization swiotlb-xen comes in
> (drivers/xen/swiotlb-xen.c:xen_swiotlb_init) and given that io_tlb_start
> is != 0, it thinks the memory is ready to use when actually it is not.
> 
> When the swiotlb is actually needed, swiotlb_tbl_map_single gets called
> and since no_iotlb_memory is set the kernel panics.
> 
> Instead, if swiotlb-xen.c:xen_swiotlb_init knew the swiotlb hadn't been
> initialized, it would do the initialization itself, which might still
> succeed.
> 
> 
> Fix the panic by setting io_tlb_start to 0 on swiotlb initialization
> failure, and also by setting no_iotlb_memory to false on swiotlb
> initialization success.

Should this have a Fixes: flag?

> 
> Signed-off-by: Stefano Stabellini 
> 
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c19379fabd20..9924214df60a 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -231,6 +231,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>   io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
>   }
>   io_tlb_index = 0;
> + no_iotlb_memory = false;
>  
>   if (verbose)
>   swiotlb_print_info();
> @@ -262,9 +263,11 @@ swiotlb_init(int verbose)
>   if (vstart && !swiotlb_init_with_tbl(vstart, io_tlb_nslabs, verbose))
>   return;
>  
> - if (io_tlb_start)
> + if (io_tlb_start) {
>   memblock_free_early(io_tlb_start,
>   PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
> + io_tlb_start = 0;
> + }
>   pr_warn("Cannot allocate buffer");
>   no_iotlb_memory = true;
>  }
> @@ -362,6 +365,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>   io_tlb_orig_addr[i] = INVALID_PHYS_ADDR;
>   }
>   io_tlb_index = 0;
> + no_iotlb_memory = false;
>  
>   swiotlb_print_info();
>  


Re: [PATCH V2] dma-direct: Fix potential NULL pointer dereference

2020-09-17 Thread Konrad Rzeszutek Wilk
On Wed, Sep 16, 2020 at 02:51:06PM -0600, Thomas Tai wrote:
> When booting the kernel v5.9-rc4 on a VM, the kernel panics while
> printing a warning message in swiotlb_map(), because dev->dma_mask must
> not be a NULL pointer when the dma mapping layer is called. A NULL
> pointer check avoids the panic.
> 
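
[A hedged aside: the driver-side convention that keeps dev->dma_mask
populated is to negotiate the mask at probe time. The device pointer and
mask width below are illustrative only:]

	if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)))
		return -EIO;	/* no usable DMA addressing for this device */
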
> [drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 0
>  BUG: kernel NULL pointer dereference, address: 
>  #PF: supervisor read access in kernel mode
>  #PF: error_code(0x) - not-present page
>  PGD 0 P4D 0
>  Oops:  [#1] SMP PTI
>  CPU: 1 PID: 331 Comm: systemd-udevd Not tainted 5.9.0-rc4 #1
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1 04/01/2014
>  RIP: 0010:swiotlb_map+0x1ac/0x200
>  Code: e8 d9 fc ff ff 80 3d 92 ee 4c 01 00 75 51 49 8b 84 24 48 02 00 00 4d 8b 6c 24 50 c6 05 7c ee 4c 01 01 4d 8b bc 24 58 02 00 00 <4c> 8b 30 4d 85 ed 75 04 4d 8b 2c 24 4c 89 e7 e8 10 6b 4f 00 4d 89
>  RSP: 0018:9f96801af6f8 EFLAGS: 00010246
>  RAX:  RBX: 1000 RCX: 0080
>  RDX: 007f RSI: 0202 RDI: 0202
>  RBP: 9f96801af748 R08:  R09: 0020
>  R10:  R11: 8fabfffa3000 R12: 8faad02c7810
>  R13:  R14: 0020 R15: 
>  FS:  7fabc63588c0() GS:8fabf7c8() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2:  CR3: 000151496005 CR4: 00370ee0
>  DR0:  DR1:  DR2: 
>  DR3:  DR6: fffe0ff0 DR7: 0400
>  Call Trace:
>   dma_direct_map_sg+0x124/0x210
>   dma_map_sg_attrs+0x32/0x50
>   drm_gem_shmem_get_pages_sgt+0x6a/0x90 [drm]
>   virtio_gpu_object_create+0x140/0x2f0 [virtio_gpu]
>   ? ww_mutex_unlock+0x26/0x30
>   virtio_gpu_mode_dumb_create+0xab/0x160 [virtio_gpu]
>   drm_mode_create_dumb+0x82/0x90 [drm]
>   drm_client_framebuffer_create+0xaa/0x200 [drm]
>   drm_fb_helper_generic_probe+0x59/0x150 [drm_kms_helper]
>   drm_fb_helper_single_fb_probe+0x29e/0x3e0 [drm_kms_helper]
>   __drm_fb_helper_initial_config_and_unlock+0x41/0xd0 [drm_kms_helper]
>   drm_fbdev_client_hotplug+0xe6/0x1a0 [drm_kms_helper]
>   drm_fbdev_generic_setup+0xaf/0x170 [drm_kms_helper]
>   virtio_gpu_probe+0xea/0x100 [virtio_gpu]
>   virtio_dev_probe+0x14b/0x1e0 [virtio]
>   really_probe+0x1db/0x440
>   driver_probe_device+0xe9/0x160
>   device_driver_attach+0x5d/0x70
>   __driver_attach+0x8f/0x150
>   ? device_driver_attach+0x70/0x70
>   bus_for_each_dev+0x7e/0xc0
>   driver_attach+0x1e/0x20
>   bus_add_driver+0x152/0x1f0
>   driver_register+0x74/0xd0
>   ? 0xc0529000
>   register_virtio_driver+0x20/0x30 [virtio]
>   virtio_gpu_driver_init+0x15/0x1000 [virtio_gpu]
>   do_one_initcall+0x4a/0x1fa
>   ? _cond_resched+0x19/0x30
>   ? kmem_cache_alloc_trace+0x16b/0x2e0
>   do_init_module+0x62/0x240
>   load_module+0xe0e/0x1100
>   ? security_kernel_post_read_file+0x5c/0x70
>   __do_sys_finit_module+0xbe/0x120
>   ? __do_sys_finit_module+0xbe/0x120
>   __x64_sys_finit_module+0x1a/0x20
>   do_syscall_64+0x38/0x50
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Signed-off-by: Thomas Tai 

Reviewed-by: Konrad Rzeszutek Wilk 

Thank you!
> ---
>  include/linux/dma-direct.h |  3 ---
>  kernel/dma/mapping.c   | 11 +++
>  2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index 6e87225..0648708 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -62,9 +62,6 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size,
>  {
>   dma_addr_t end = addr + size - 1;
>  
> - if (!dev->dma_mask)
> - return false;
> -
>   if (is_ram && !IS_ENABLED(CONFIG_ARCH_DMA_ADDR_T_64BIT) &&
>   min(addr, end) < phys_to_dma(dev, PFN_PHYS(min_low_pfn)))
>   return false;
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 0d12942..7133d5c 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -144,6 +144,10 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
>   dma_addr_t addr;
>  
>   BUG_ON(!valid_dma_direction(dir));
> +
> + if (WARN_ON_ONCE(!dev->dma_mask))
> + return DMA_MAPPING_ERROR;
> +
>   if (dma_map_direct(dev, ops))
>   addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
>   else
> @@ -179,6 +183,10 @@ i

Re: [PATCH v2 1/3] swiotlb: Use %pa to print phys_addr_t variables

2020-09-09 Thread Konrad Rzeszutek Wilk
On Wed, Sep 09, 2020 at 06:59:13PM +0300, Andy Shevchenko wrote:
> On Wed, Sep 02, 2020 at 11:02:46PM -0300, Fabio Estevam wrote:
> > On Wed, Sep 2, 2020 at 2:31 PM Andy Shevchenko
> >  wrote:
> > >
> > > There is an extension to a %p to print phys_addr_t type of variables.
> > > Use it here.
> > >
> > > Signed-off-by: Andy Shevchenko 
> > > ---
> > > v2: dropped bytes replacement (Fabio)
> > 
> > Reviewed-by: Fabio Estevam 
> 
> Thanks!
> 
> Guys, can this series be applied?

Sure.
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
> 
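
[For anyone doing the same conversion elsewhere: %pa consumes a pointer to
the phys_addr_t rather than the value itself. A minimal sketch with
illustrative variable names:]

	phys_addr_t start = io_tlb_start, end = io_tlb_end;

	pr_info("software IO TLB: mapped [mem %pa-%pa]\n", &start, &end);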


Is: virtio_gpu_object_shmem_init issues? Was:Re: upstream boot error: general protection fault in swiotlb_map

2020-08-24 Thread Konrad Rzeszutek Wilk
On Thu, Aug 06, 2020 at 03:46:23AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    47ec5303 Merge git://git.kernel.org/pub/scm/linux/kernel/g..
> git tree:   upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=16fe1dea90
> kernel config:  https://syzkaller.appspot.com/x/.config?x=7c06047f622c5724
> dashboard link: https://syzkaller.appspot.com/bug?extid=3f86afd0b1e4bf1cb64c
> compiler:   gcc (GCC) 10.1.0-syz 20200507
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+3f86afd0b1e4bf1cb...@syzkaller.appspotmail.com
> 
> ceph: loaded (mds proto 32)
> NET: Registered protocol family 38
> async_tx: api initialized (async)
> Key type asymmetric registered
> Asymmetric key parser 'x509' registered
> Asymmetric key parser 'pkcs8' registered
> Key type pkcs7_test registered
> Asymmetric key parser 'tpm_parser' registered
> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 243)
> io scheduler mq-deadline registered
> io scheduler kyber registered
> io scheduler bfq registered
> hgafb: HGA card not detected.
> hgafb: probe of hgafb.0 failed with error -22
> usbcore: registered new interface driver udlfb
> uvesafb: failed to execute /sbin/v86d
> uvesafb: make sure that the v86d helper is installed and executable
> uvesafb: Getting VBE info block failed (eax=0x4f00, err=-2)
> uvesafb: vbe_init() failed with -22
> uvesafb: probe of uvesafb.0 failed with error -22
> vga16fb: mapped to 0x8aac772d
> Console: switching to colour frame buffer device 80x30
> fb0: VGA16 VGA frame buffer device
> input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
> ACPI: Power Button [PWRF]
> ioatdma: Intel(R) QuickData Technology Driver 5.00
> PCI Interrupt Link [GSIF] enabled at IRQ 21
> PCI Interrupt Link [GSIG] enabled at IRQ 22
> PCI Interrupt Link [GSIH] enabled at IRQ 23
> N_HDLC line discipline registered with maxframe=4096
> Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> Cyclades driver 2.6
> Initializing Nozomi driver 2.1d
> RocketPort device driver module, version 2.09, 12-June-2003
> No rocketport ports found; unloading driver
> Non-volatile memory driver v1.3
> Linux agpgart interface v0.103
> [drm] Initialized vgem 1.0.0 20120112 for vgem on minor 0
> [drm] Initialized vkms 1.0.0 20180514 for vkms on minor 1
> usbcore: registered new interface driver udl
> [drm] pci: virtio-vga detected at :00:01.0
> fb0: switching to virtiodrmfb from VGA16 VGA
> Console: switching to colour VGA+ 80x25
> virtio-pci :00:01.0: vgaarb: deactivate vga console
> Console: switching to colour dummy device 80x25
> [drm] features: -virgl +edid
> [drm] number of scanouts: 1
> [drm] number of cap sets: 0
> [drm] Initialized virtio_gpu 0.1.0 0 for virtio0 on minor 2
> general protection fault, probably for non-canonical address 0xdc00:  [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x-0x0007]
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0-syzkaller #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> RIP: 0010:swiotlb_map+0x5ac/0x700 kernel/dma/swiotlb.c:683
> Code: 28 04 00 00 48 c1 ea 03 80 3c 02 00 0f 85 4d 01 00 00 4c 8b a5 18 04 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 1e 01 00 00 48 8d 7d 50 4d 8b 24 24 48 b8 00 00
> RSP: :c934f3e0 EFLAGS: 00010246
> RAX: dc00 RBX:  RCX: 8162cc1d
> RDX:  RSI: 8162cc98 RDI: 88802971a470
> RBP: 88802971a048 R08: 0001 R09: 8c5dba77
> R10:  R11:  R12: 
> R13: 7ac0 R14: dc00 R15: 1000
> FS:  () GS:88802ce0() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2:  CR3: 09a8d000 CR4: 00350ef0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  dma_direct_map_page include/linux/dma-direct.h:170 [inline]
>  dma_direct_map_sg+0x3bb/0x670 kernel/dma/direct.c:368
>  dma_map_sg_attrs+0xd0/0x160 kernel/dma/mapping.c:183
>  drm_gem_shmem_get_pages_sgt drivers/gpu/drm/drm_gem_shmem_helper.c:700 [inline]
>  drm_gem_shmem_get_pages_sgt+0x1fc/0x310 drivers/gpu/drm/drm_gem_shmem_helper.c:679
>  virtio_gpu_object_shmem_init drivers/gpu/drm/virtio/virtgpu_object.c:153 [inline]
>  virtio_gpu_object_create+0x2fd/0xa70 drivers/gpu/drm/virtio/virtgpu_object.c:232
>  virtio_gpu_gem_create drivers/gpu/drm/virtio/virtgpu_gem.c:45 [inline]
>  virtio_gpu_mode_dumb_create+0x298/0x530 drivers/gpu/drm/virtio/virtgpu_gem.c:85
>  drm_mode_create_dumb
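
[Reading the oops: KASAN flags a null-ptr-deref in the first eight bytes of
the NULL page, which is the signature of an 8-byte load through a NULL
dev->dma_mask pointer - the same class of crash the dma-direct NULL-pointer
fix above guards against. A paraphrase of the offending read, not the exact
code:]

	u64 mask = *dev->dma_mask;	/* GPF/KASAN splat when dma_mask is NULL */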
