Re: [PATCH V3] swiotlb: Split up single swiotlb lock

2022-07-10 Thread Tianyu Lan

On 7/8/2022 1:07 AM, Christoph Hellwig wrote:

Thanks, this looks much better.  I think there is a small problem
with how default_nareas is set - we need to use 0 as the default
so that an explicit command line value of 1 works.  Also, have you
checked the interaction with swiotlb_adjust_size in detail?


Yes, the patch was tested in a Hyper-V SEV VM, which always calls
swiotlb_adjust_size() to adjust the bounce buffer size according to the
memory size. It rounds the bounce buffer size up to the next power of 2
when that size is not already a power of 2.



[PATCH V4] swiotlb: Split up single swiotlb lock

2022-07-10 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is returned to the area from which the memory was originally
allocated.

The number of areas can be set via the swiotlb kernel parameter and
defaults to the number of possible CPUs. If the possible CPU count is
not a power of 2, the area number is rounded up to the next power of 2.
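
A minimal sketch of why the power-of-2 requirement helps (the helper
name here is mine, not from the patch): a CPU can then be mapped to its
home area with a cheap mask instead of a division:

    /* Hypothetical helper; nareas must be a power of 2. */
    static inline unsigned int cpu_to_area(unsigned int cpu,
                                           unsigned int nareas)
    {
            return cpu & (nareas - 1);
    }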

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen 
Signed-off-by: Tianyu Lan 
---
Change since v3:
   * Make area number to be zero by default

Change since v2:
   * Use possible cpu number to adjust iotlb area number

Change since v1:
   * Move struct io_tlb_area to swiotlb.c
   * Fix some coding style issue.
---
 .../admin-guide/kernel-parameters.txt |   4 +-
 include/linux/swiotlb.h   |   5 +
 kernel/dma/swiotlb.c  | 230 +++---
 3 files changed, 198 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 2522b11e593f..4a6ad177d4b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5904,8 +5904,10 @@
it if 0 is given (See 
Documentation/admin-guide/cgroup-v1/memory.rst)
 
swiotlb=[ARM,IA-64,PPC,MIPS,X86]
-   Format: { <int> | force | noforce }
+   Format: { <int> [,<int>] | force | noforce }
 <int> -- Number of I/O TLB slabs
+<int> -- Second integer after comma. Number of swiotlb
+areas with their own lock. Must be power of 2.
force -- force using of bounce buffers even if they
 wouldn't be automatically used by the kernel
noforce -- Never use bounce buffers (for debugging)
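
As a usage illustration (my example, not part of the patch), a guest
with 8 vCPUs could boot with:

    swiotlb=65536,8

which requests 65536 I/O TLB slabs (128MB with the default 2KB slab)
split into 8 independently locked areas.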
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..5f898c5e9f19 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -89,6 +89,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @nareas:  The number of areas in the pool.
+ * @area_nslabs: The number of slots in each area.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +104,9 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..9f547d8ab550 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,6 +70,43 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas;
+
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:  The number of used IO TLB blocks.
+ * @index: The slot index to start searching in this area for next round.
+ * @lock:  The lock to protect the above data structures in the map and
+ * unmap calls.
+ */
+struct io_tlb_area {
+   unsigned long used;
+   unsigned int index;
+   spinlock_t lock;
+};
+
+static void swiotlb_adjust_nareas(unsigned int nareas)
+{
+   if (!is_power_of_2(nareas))
+   nareas = roundup_pow_of_two(nareas);
+
+   default_nareas = nareas;
+
+   pr_info("area num %d.\n", nareas);
+   /*
+* Round up number of slabs to the next power of 2.
+* The last area is going to be smaller than the rest if
+* default_nslabs is not power of two.
+*/
+   if (nareas && !is_power_of_2(default_nslabs)) {
+   default_nslabs = roundup_pow_of_two(default_nslabs);
+   pr_info("SWIOTLB bounce buffer size roundup to %luMB",
+   (default_nslabs << IO_TLB_SHIFT) >> 20);
+   }
+}
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +116,10 @@ setup_io_tlb_np
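
For context, a rough sketch of how the adjusted values are consumed at
pool initialization (an assumption based on the fields added to struct
io_tlb_mem above, not code from this hunk):

    /* Assumed split (falling back to one area when the parameter
     * is left at its default of 0); both values are powers of 2. */
    unsigned int nareas = default_nareas ? default_nareas : 1;

    mem->nareas = nareas;
    mem->area_nslabs = nslabs / nareas;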

[PATCH V3] swiotlb: Split up single swiotlb lock

2022-07-07 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is returned to the area from which the memory was originally
allocated.

The number of areas can be set via the swiotlb kernel parameter and
defaults to the number of possible CPUs. If the possible CPU count is
not a power of 2, the area number is rounded up to the next power of 2.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen 
Signed-off-by: Tianyu Lan 
---
Change since v2:
   * Use possible cpu number to adjust iotlb area number

Change since v1:
   * Move struct io_tlb_area to swiotlb.c
   * Fix some coding style issue.
---
 .../admin-guide/kernel-parameters.txt |   4 +-
 include/linux/swiotlb.h   |   5 +
 kernel/dma/swiotlb.c  | 222 +++---
 3 files changed, 191 insertions(+), 40 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 2522b11e593f..4a6ad177d4b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5904,8 +5904,10 @@
it if 0 is given (See 
Documentation/admin-guide/cgroup-v1/memory.rst)
 
swiotlb=[ARM,IA-64,PPC,MIPS,X86]
-   Format: { <int> | force | noforce }
+   Format: { <int> [,<int>] | force | noforce }
 <int> -- Number of I/O TLB slabs
+<int> -- Second integer after comma. Number of swiotlb
+areas with their own lock. Must be power of 2.
force -- force using of bounce buffers even if they
 wouldn't be automatically used by the kernel
noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..5f898c5e9f19 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -89,6 +89,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @nareas:  The number of areas in the pool.
+ * @area_nslabs: The number of slots in each area.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +104,9 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..9e7aeca8faf4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,6 +70,43 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas = 1;
+
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:  The number of used IO TLB blocks.
+ * @index: The slot index to start searching in this area for next round.
+ * @lock:  The lock to protect the above data structures in the map and
+ * unmap calls.
+ */
+struct io_tlb_area {
+   unsigned long used;
+   unsigned int index;
+   spinlock_t lock;
+};
+
+static void swiotlb_adjust_nareas(unsigned int nareas)
+{
+   if (!is_power_of_2(nareas))
+   nareas = roundup_pow_of_two(nareas);
+
+   default_nareas = nareas;
+
+   pr_info("area num %d.\n", nareas);
+   /*
+* Round up number of slabs to the next power of 2.
+* The last area is going to be smaller than the rest if
+* default_nslabs is not power of two.
+*/
+   if (nareas > 1) {
+   default_nslabs = roundup_pow_of_two(default_nslabs);
+   pr_info("SWIOTLB bounce buffer size roundup to %luMB",
+   (default_nslabs << IO_TLB_SHIFT) >> 20);
+   }
+}
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +116,10 @@ setup_io_tlb_npages(char *str)
default_nslabs =
ALIGN

Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT

2022-07-06 Thread Tianyu Lan

On 7/6/2022 5:02 PM, Christoph Hellwig wrote:

On Wed, Jul 06, 2022 at 04:57:33PM +0800, Tianyu Lan wrote:

swiotlb_init() is called in mem_init() of the different architectures,
and memblock free pages are released to the buddy allocator just after
swiotlb_init() returns, via memblock_free_all().


Yes.


The mem_init() is called before smp_init().


But why would that matter?  cpu_possible_map is set up from
setup_arch(), which is called before that.


Sorry, I was still focused on the online CPU number, which is only
available after smp_init(). The possible CPU count includes offline
CPUs as well. I will have a try. Thanks for the suggestion.



Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT

2022-07-06 Thread Tianyu Lan

On 7/6/2022 4:00 PM, Christoph Hellwig wrote:

On Fri, Jul 01, 2022 at 01:02:21AM +0800, Tianyu Lan wrote:

Can we reorder that initialization?  Because I really hate having
to have an arch hook in every architecture.


How about using the "flags" parameter of swiotlb_init() to pass the area
number, or adding a new parameter for it?

I just reposted patch 1 since there were only some coding style issues,
and the area number may also be set via the swiotlb kernel parameter. We
still need to figure out a good solution to pass the area number from
architecture code.


What is the problem with calling swiotlb_init after nr_possible_cpus()
works?


swiotlb_init() is called in mem_init() of the different architectures,
and memblock free pages are released to the buddy allocator just after
swiotlb_init() returns, via memblock_free_all().

mem_init() is called before smp_init(). If swiotlb_init() were called
after smp_init(), we could no longer allocate a large chunk of low-end
memory via memblock_alloc() in swiotlb; swiotlb_init() would need to be
reworked to allocate memory from the buddy allocator, just as
swiotlb_init_late() does, and this would limit the bounce buffer size.
Otherwise we would need to do the reordering for all architectures, and
there may be other unknown issues.

The swiotlb flags parameter of swiotlb_init() seems to be a good place
to pass the area number in the current code. If the area number/flag is
not set, the area number will be one, keeping the original behavior of a
single global spinlock protecting the IO TLB data structures.
















Re: [PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT

2022-06-30 Thread Tianyu Lan

On 6/29/2022 10:04 PM, Christoph Hellwig wrote:

On Mon, Jun 27, 2022 at 11:31:50AM -0400, Tianyu Lan wrote:

From: Tianyu Lan 

When the swiotlb bounce buffer is initialized, smp_init() has not been
called yet and the CPU number cannot be obtained from num_online_cpus().
Use the number of LAPIC entries to set the swiotlb area number and keep
it equal to the CPU number on the x86 platform.


Can we reorder that initialization?  Because I really hate having
to have an arch hook in every architecture.


How about using the "flags" parameter of swiotlb_init() to pass the area
number, or adding a new parameter for it?

I just reposted patch 1 since there were only some coding style issues,
and the area number may also be set via the swiotlb kernel parameter. We
still need to figure out a good solution to pass the area number from
architecture code.






[PATCH V2 1/1] swiotlb: Split up single swiotlb lock

2022-06-30 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is returned to the area from which the memory was originally
allocated.

The area number can be set via swiotlb_adjust_nareas() or the swiotlb
kernel parameter.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen 
Signed-off-by: Tianyu Lan 
---
Change since v1:
   * Move struct io_tlb_area to swiotlb.c
   * Fix some coding style issue.
---
 .../admin-guide/kernel-parameters.txt |   4 +-
 include/linux/swiotlb.h   |   5 +
 kernel/dma/swiotlb.c  | 218 ++
 3 files changed, 187 insertions(+), 40 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 2522b11e593f..4a6ad177d4b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5904,8 +5904,10 @@
it if 0 is given (See 
Documentation/admin-guide/cgroup-v1/memory.rst)
 
swiotlb=[ARM,IA-64,PPC,MIPS,X86]
-   Format: { <int> | force | noforce }
+   Format: { <int> [,<int>] | force | noforce }
 <int> -- Number of I/O TLB slabs
+<int> -- Second integer after comma. Number of swiotlb
+areas with their own lock. Must be power of 2.
force -- force using of bounce buffers even if they
 wouldn't be automatically used by the kernel
noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..5f898c5e9f19 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -89,6 +89,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @nareas:  The number of areas in the pool.
+ * @area_nslabs: The number of slots in each area.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +104,9 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..421bba62d4f1 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,6 +70,45 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas = 1;
+
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:  The number of used IO TLB blocks.
+ * @index: The slot index to start searching in this area for next round.
+ * @lock:  The lock to protect the above data structures in the map and
+ * unmap calls.
+ */
+struct io_tlb_area {
+   unsigned long used;
+   unsigned int index;
+   spinlock_t lock;
+};
+
+static void __init swiotlb_adjust_nareas(unsigned int nareas)
+{
+   if (!is_power_of_2(nareas)) {
+   pr_err("swiotlb: Invalid areas parameter %d.\n", nareas);
+   return;
+   }
+
+   default_nareas = nareas;
+
+   pr_info("area num %d.\n", nareas);
+   /*
+* Round up number of slabs to the next power of 2.
+* The last area is going to be smaller than the rest if
+* default_nslabs is not power of two.
+*/
+   if (nareas > 1) {
+   default_nslabs = roundup_pow_of_two(default_nslabs);
+   pr_info("SWIOTLB bounce buffer size roundup to %luMB",
+   (default_nslabs << IO_TLB_SHIFT) >> 20);
+   }
+}
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +118,10 @@ setup_io_tlb_npages(char *str)
default_nslabs =
ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
}
+   if (*str == ',')
+   ++str;
+   if (isdigit(*str))
+

[PATCH 1/2] swiotlb: Split up single swiotlb lock

2022-06-27 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is returned to the area from which the memory was originally
allocated.

The area number can be set via swiotlb_adjust_nareas() or the swiotlb
kernel parameter.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen 
Signed-off-by: Tianyu Lan 
---
 .../admin-guide/kernel-parameters.txt |   4 +-
 include/linux/swiotlb.h   |  27 +++
 kernel/dma/swiotlb.c  | 202 ++
 3 files changed, 194 insertions(+), 39 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 2522b11e593f..4a6ad177d4b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5904,8 +5904,10 @@
it if 0 is given (See 
Documentation/admin-guide/cgroup-v1/memory.rst)
 
swiotlb=[ARM,IA-64,PPC,MIPS,X86]
-   Format: { <int> | force | noforce }
+   Format: { <int> [,<int>] | force | noforce }
 <int> -- Number of I/O TLB slabs
+<int> -- Second integer after comma. Number of swiotlb
+areas with their own lock. Must be power of 2.
force -- force using of bounce buffers even if they
 wouldn't be automatically used by the kernel
noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..7157428cf3ac 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,6 +62,22 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
 
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:  The number of used IO TLB blocks.
+ * @index: The slot index to start searching in this area for next round.
+ * @lock:  The lock to protect the above data structures in the map and
+ * unmap calls.
+ */
+struct io_tlb_area {
+   unsigned long used;
+   unsigned int index;
+   spinlock_t lock;
+};
+
 /**
  * struct io_tlb_mem - IO TLB Memory Pool Descriptor
  *
@@ -89,6 +105,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @nareas:  The number of areas in the pool.
+ * @area_nslabs: The number of slots in each area.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +120,9 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -130,6 +151,7 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+void __init swiotlb_adjust_nareas(unsigned int nareas);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -162,6 +184,11 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline void swiotlb_adjust_nareas(unsigned int nareas)
+{
+}
+
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..17154abdfb34 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -70,6 +70,7 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas = 1;
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +80,10 @@ setup_io_tlb_npages(char *str)
default_nslabs =
ALIGN(simple_strtoul(str, &str, 0), IO_TLB_SEGSIZE);
}
+   if (*str == ',')
+   ++str;
+   if (isdigi

[PATCH 0/2] swiotlb: Split up single swiotlb lock

2022-06-27 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

Patch 1 introduces the swiotlb area concept and splits up the single
swiotlb lock.
Patch 2 sets the swiotlb area number from the LAPIC entry count.


Tianyu Lan (2):
  swiotlb: Split up single swiotlb lock
  x86/ACPI: Set swiotlb area according to the number of lapic entry in
MADT

 .../admin-guide/kernel-parameters.txt |   4 +-
 arch/x86/kernel/acpi/boot.c   |   3 +
 include/linux/swiotlb.h   |  27 +++
 kernel/dma/swiotlb.c  | 202 ++
 4 files changed, 197 insertions(+), 39 deletions(-)

-- 
2.25.1



[PATCH 2/2] x86/ACPI: Set swiotlb area according to the number of lapic entry in MADT

2022-06-27 Thread Tianyu Lan
From: Tianyu Lan 

When the swiotlb bounce buffer is initialized, smp_init() has not been
called yet and the CPU number cannot be obtained from num_online_cpus().
Use the number of LAPIC entries to set the swiotlb area number and keep
it equal to the CPU number on the x86 platform.

Based-on-idea-by: Andi Kleen 
Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/acpi/boot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 907cc98b1938..7e13499f2c10 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include <linux/swiotlb.h>
 
 #include 
 #include 
@@ -1131,6 +1132,8 @@ static int __init acpi_parse_madt_lapic_entries(void)
return count;
}
 
+   swiotlb_adjust_nareas(max(count, x2count));
+
x2count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_X2APIC_NMI,
acpi_parse_x2apic_nmi, 0);
count = acpi_table_parse_madt(ACPI_MADT_TYPE_LOCAL_APIC_NMI,
-- 
2.25.1



Re: [RFC PATCH V4 1/1] swiotlb: Split up single swiotlb lock

2022-06-23 Thread Tianyu Lan

On 6/22/2022 6:54 PM, Christoph Hellwig wrote:

Thanks,

this looks pretty good to me.  A few comments below:



Thanks for your review.


On Fri, Jun 17, 2022 at 10:47:41AM -0400, Tianyu Lan wrote:

+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:  The number of used IO TLB blocks.
+ * @index: The slot index to start searching in this area for next round.
+ * @lock:  The lock to protect the above data structures in the map and
+ * unmap calls.
+ */
+struct io_tlb_area {
+   unsigned long used;
+   unsigned int index;
+   spinlock_t lock;
+};


This can go into swiotlb.c.


struct io_tlb_area is used in the struct io_tlb_mem.




+void __init swiotlb_adjust_nareas(unsigned int nareas);


And this should be marked static.


+#define DEFAULT_NUM_AREAS 1


I'd drop this define, the magic 1 and a > 1 comparison seems to
convey how it is used much better as the checks aren't about default
or not, but about larger than one.

I also think that we want some good way to size the default, e.g.
by number of CPUs or memory size.


swiotlb_adjust_nareas() is exposed to platforms to set the area number.
When swiotlb_init() is called, smp_init() has not run yet, so the
standard APIs for checking the CPU number (e.g., num_online_cpus())
don't work. Platforms may have other ways to get the CPU number (e.g.,
x86 may use ACPI MADT table entries) and set the area number. I will
post a follow-up patch to set the CPU number via swiotlb_adjust_nareas().
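
The x86 patch in this series ("x86/ACPI: Set swiotlb area according to
the number of lapic entry in MADT") does exactly that from the MADT
parsing path, where count and x2count are the LAPIC and x2APIC entry
counts:

    swiotlb_adjust_nareas(max(count, x2count));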





+void __init swiotlb_adjust_nareas(unsigned int nareas)
+{
+   if (!is_power_of_2(nareas)) {
+   pr_err("swiotlb: Invalid areas parameter %d.\n", nareas);
+   return;
+   }
+
+   default_nareas = nareas;
+
+   pr_info("area num %d.\n", nareas);
+   /* Round up number of slabs to the next power of 2.
+* The last area is going be smaller than the rest if
+* default_nslabs is not power of two.
+*/


Please follow the normal kernel comment style with a /* on its own line.



OK. Will update.


[RFC PATCH V4 1/1] swiotlb: Split up single swiotlb lock

2022-06-17 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb bounce buffer pool into individual areas
which have their own lock. Each CPU tries to allocate in its own area
first. Only if that fails does it search other areas. On freeing, the
allocation is returned to the area from which the memory was originally
allocated.

The area number can be set via swiotlb_adjust_nareas() or the swiotlb
kernel parameter.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).

Based-on-idea-by: Andi Kleen 
Signed-off-by: Tianyu Lan 
---
 .../admin-guide/kernel-parameters.txt |   4 +-
 include/linux/swiotlb.h   |  27 +++
 kernel/dma/swiotlb.c  | 202 ++
 3 files changed, 194 insertions(+), 39 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 8090130b544b..5d46271982d5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5869,8 +5869,10 @@
it if 0 is given (See 
Documentation/admin-guide/cgroup-v1/memory.rst)
 
swiotlb=[ARM,IA-64,PPC,MIPS,X86]
-   Format: { <int> | force | noforce }
+   Format: { <int> [,<int>] | force | noforce }
 <int> -- Number of I/O TLB slabs
+<int> -- Second integer after comma. Number of swiotlb
+areas with their own lock. Must be power of 2.
force -- force using of bounce buffers even if they
 wouldn't be automatically used by the kernel
noforce -- Never use bounce buffers (for debugging)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..7157428cf3ac 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,6 +62,22 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
 
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:  The number of used IO TLB blocks.
+ * @index: The slot index to start searching in this area for next round.
+ * @lock:  The lock to protect the above data structures in the map and
+ * unmap calls.
+ */
+struct io_tlb_area {
+   unsigned long used;
+   unsigned int index;
+   spinlock_t lock;
+};
+
 /**
  * struct io_tlb_mem - IO TLB Memory Pool Descriptor
  *
@@ -89,6 +105,8 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @nareas:  The number of areas in the pool.
+ * @area_nslabs: The number of slots in each area.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +120,9 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int nareas;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -130,6 +151,7 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+void __init swiotlb_adjust_nareas(unsigned int nareas);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -162,6 +184,11 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline void swiotlb_adjust_nareas(unsigned int nareas)
+{
+}
+
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index cb50f8d38360..139d08068912 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -62,6 +62,8 @@
 
 #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
+#define DEFAULT_NUM_AREAS 1
+
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
@@ -70,6 +72,7 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_nareas = DEFAULT_NUM_AREAS;
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -79,6 +82,10 @@ setup_io_tlb_npages(char *str)
default_

Re: [RFC PATCH V3 2/2] net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc

2022-05-27 Thread Tianyu Lan



On 5/27/2022 2:43 AM, Dexuan Cui wrote:

From: Tianyu Lan 
Sent: Thursday, May 26, 2022 5:01 AM
...
@@ -119,6 +124,10 @@ static void netvsc_subchan_work(struct work_struct
*w)
nvdev->max_chn = 1;
nvdev->num_chn = 1;
}
+
+   /* Allocate bounce buffer. */
+   swiotlb_device_allocate(&hdev->device, nvdev->num_chn,
+   10 * IO_TLB_BLOCK_UNIT);
}


Looks like swiotlb_device_allocate() is not called if the netvsc device
has only 1 primary channel and no sub-channel, e.g. in the case of a
single-vCPU VM?


When there is only a single channel, there seems not to be much
performance penalty. But you are right, we should keep the same behavior
for the single-CPU and multi-CPU cases. Will update in the next version.
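
A minimal sketch of that fix (my assumption about the next version, not
posted code): issue the same call on the single-channel fallback path,
e.g.

    /* hypothetical placement in the nvdev->num_chn == 1 fallback: */
    swiotlb_device_allocate(&hdev->device, nvdev->num_chn,
                            10 * IO_TLB_BLOCK_UNIT);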


Thanks.


[RFC PATCH V3 1/2] swiotlb: Add Child IO TLB mem support

2022-05-26 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to resolve spinlock overhead
among a device's queues. Each device may allocate an IO TLB mem and set
up child IO TLB mems according to its queue number. The swiotlb code
allocates bounce buffers among the child IO TLB mems iteratively.

Introduce an IO TLB block unit (2MB) concept for allocating big bounce
buffers from the default pool for devices; the IO TLB segment (256k) is
too small for device bounce buffers.
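
The sizes implied by the definitions in this patch work out as follows
(arithmetic only; IO_TLB_SHIFT = 11, i.e. 2KB slots):

    IO_TLB_SEGSIZE    = 128 slots          -> 128 * 2KB = 256KB segment
    IO_TLB_BLOCKSIZE  = 8 * IO_TLB_SEGSIZE = 1024 slots
    IO_TLB_BLOCK_UNIT = 1024 << 11         = 2MB block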

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  38 +
 kernel/dma/swiotlb.c| 304 ++--
 2 files changed, 329 insertions(+), 13 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..a48a9d64e3c3 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -31,6 +31,14 @@ struct scatterlist;
 #define IO_TLB_SHIFT 11
 #define IO_TLB_SIZE (1 << IO_TLB_SHIFT)
 
+/*
+ * IO TLB BLOCK UNIT as device bounce buffer allocation unit.
+ * This allows device allocates bounce buffer from default io
+ * tlb pool.
+ */
+#define IO_TLB_BLOCKSIZE   (8 * IO_TLB_SEGSIZE)
+#define IO_TLB_BLOCK_UNIT  (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT)
+
 /* default to 64MB */
 #define IO_TLB_DEFAULT_SIZE (64UL<<20)
 
@@ -89,6 +97,11 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @num_child:  The number of child IO TLB mems in the pool.
+ * @child_nslot:The number of IO TLB slots in each child IO TLB mem.
+ * @child_nblock:The number of IO TLB blocks in each child IO TLB mem.
+ * @child_start:The child index to start searching in the next round.
+ * @block_start:The block index to start searching in the next round.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +115,16 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int num_child;
+   unsigned int child_nslot;
+   unsigned int child_nblock;
+   unsigned int child_start;
+   unsigned int block_index;
+   struct io_tlb_mem *child;
+   struct io_tlb_mem *parent;
+   struct io_tlb_block {
+   unsigned int list;
+   } *block;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -130,6 +153,10 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size);
+void swiotlb_device_free(struct device *dev);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -162,6 +189,17 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+void swiotlb_device_free(struct device *dev)
+{
+}
+
+int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size)
+{
+   return -ENOMEM;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e2ef0864eb1e..7ca22a5a1886 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -195,7 +195,8 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, 
phys_addr_t start,
unsigned long nslabs, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j;
+   unsigned int block_num = nslabs / IO_TLB_BLOCKSIZE;
 
mem->nslabs = nslabs;
mem->start = start;
@@ -207,7 +208,36 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->force_bounce = true;
 
spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
+
+   if (mem->num_child) {
+   mem->child_nslot = nslabs / mem->num_child;
+   mem->child_nblock = block_num / mem->num_child;
+   mem->child_start = 0;
+
+   /*
+* Initialize child IO TLB mem, divide IO TLB pool
+* into child number. Reuse parent mem->slot in the
+* child mem->slo

[RFC PATCH V3 2/2] net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc

2022-05-26 Thread Tianyu Lan
From: Tianyu Lan 

The netvsc driver allocates a device IO TLB mem by calling
swiotlb_device_allocate() and sets the child IO TLB mem number according
to the device queue number. Child IO TLB mems may reduce the overhead of
the single spinlock in the device IO TLB mem across multiple device
queues.

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/netvsc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 9442f751ad3a..26a8f8f84fc4 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -23,6 +23,7 @@
 
 #include 
 #include 
+#include 
 
 #include "hyperv_net.h"
 #include "netvsc_trace.h"
@@ -98,6 +99,7 @@ static void netvsc_subchan_work(struct work_struct *w)
struct netvsc_device *nvdev =
container_of(w, struct netvsc_device, subchan_work);
struct rndis_device *rdev;
+   struct hv_device *hdev;
int i, ret;
 
/* Avoid deadlock with device removal already under RTNL */
@@ -108,6 +110,9 @@ static void netvsc_subchan_work(struct work_struct *w)
 
rdev = nvdev->extension;
if (rdev) {
+   hdev = ((struct net_device_context *)
+   netdev_priv(rdev->ndev))->device_ctx;
+
ret = rndis_set_subchannel(rdev->ndev, nvdev, NULL);
if (ret == 0) {
netif_device_attach(rdev->ndev);
@@ -119,6 +124,10 @@ static void netvsc_subchan_work(struct work_struct *w)
nvdev->max_chn = 1;
nvdev->num_chn = 1;
}
+
+   /* Allocate bounce buffer. */
+   swiotlb_device_allocate(&hdev->device, nvdev->num_chn,
+   10 * IO_TLB_BLOCK_UNIT);
}
 
rtnl_unlock();
@@ -769,6 +778,7 @@ void netvsc_device_remove(struct hv_device *device)
 
/* Release all resources */
free_netvsc_device_rcu(net_device);
+   swiotlb_device_free(&device->device);
 }
 
 #define RING_AVAIL_PERCENT_HIWATER 20
-- 
2.25.1



[RFC PATCH V3 0/2] swiotlb: Add child io tlb mem support

2022-05-26 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to resolve spinlock overhead
among a device's queues. Each device may allocate an IO TLB mem and set
up child IO TLB mems according to its queue number. The number of child
IO TLB mems may be set equal to the device queue number, which helps
resolve swiotlb spinlock overhead among devices and queues.

Patch 1 introduces IO TLB block concepts and the swiotlb_device_allocate()
API to allocate per-device swiotlb bounce buffers. The new API accepts a
queue number as the number of child IO TLB mems to set up for the
device's IO TLB mem.

Patch 2 calls the new allocation function in the netvsc driver to
resolve the global spinlock issue, as shown below.
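
Taken together, the per-device flow from patch 2 looks like this (both
calls are quoted from that patch):

    /* subchannel setup: size the child mems from the channel count */
    swiotlb_device_allocate(&hdev->device, nvdev->num_chn,
                            10 * IO_TLB_BLOCK_UNIT);

    /* device removal */
    swiotlb_device_free(&device->device);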

Tianyu Lan (2):
  swiotlb: Add Child IO TLB mem support
  net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc

 drivers/net/hyperv/netvsc.c |  10 ++
 include/linux/swiotlb.h |  38 +
 kernel/dma/swiotlb.c| 299 ++--
 3 files changed, 334 insertions(+), 13 deletions(-)

-- 
2.25.1



Re: [RFC PATCH V2 1/2] swiotlb: Add Child IO TLB mem support

2022-05-16 Thread Tianyu Lan

On 5/16/2022 3:34 PM, Christoph Hellwig wrote:

I don't really understand how 'childs' fit in here.  The code also
doesn't seem to be usable without patch 2 and a caller of the
new functions added in patch 2, so it is rather impossible to review.


Hi Christoph:
 OK. I will merge the two patches and add a caller patch. The motivation
is to avoid the global spinlock when devices use the swiotlb bounce
buffer, since it introduces overhead in high-throughput cases. In my test
environment, the current code can achieve about 24 Gb/s network
throughput with SWIOTLB force enabled versus about 40 Gb/s without it.
Storage also has the same issue.
 Per-device IO TLB mem may resolve the global spinlock issue among
devices, but a device may still have multiple queues, and those queues
would still share one spinlock. This is why the previous patches
introduce child IO TLB mems, or areas: each device queue gets a separate
child IO TLB mem with its own spinlock to manage its IO TLB buffers.
 Otherwise, the global spinlock still costs CPU time at high throughput
on top of the performance regression, since each device queue spins on a
different CPU to acquire the global lock. Child IO TLB mem may also
resolve that CPU usage issue.



Also:

  1) why is SEV/TDX so different from other cases that need bounce
 buffering to treat it different and we can't work on a general
 scalability improvement


Other cases also have the global spinlock issue, but it depends on
whether they hit the bottleneck; for them the CPU usage issue may be
negligible.


  2) per previous discussions at how swiotlb itself works, it is
 clear that another option is to just make pages we DMA to
 shared with the hypervisor.  Why don't we try that at least
 for larger I/O?


For confidential VMs (both TDX and SEV), we need to use a bounce buffer
to copy between private memory, which the hypervisor can't access
directly, and shared memory. For security reasons, a confidential VM
should not share IO stack DMA pages with the hypervisor directly, to
avoid attacks from the hypervisor while the IO stack handles the DMA
data.
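
Conceptually (an illustrative pseudo-flow, not code from these patches),
the bounce slot is the only memory both sides may touch:

    /* map: stage private data into the shared (decrypted) slot */
    memcpy(bounce_vaddr, private_vaddr, len);
    /* ... the device DMAs only to/from bounce_vaddr ... */
    /* unmap: copy the result back into private memory */
    memcpy(private_vaddr, bounce_vaddr, len);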



[PATCH] swiotlb: Max mapping size takes min align mask into account

2022-05-10 Thread Tianyu Lan
From: Tianyu Lan 

swiotlb_find_slots() skips slots according to the IO TLB alignment mask
calculated from the min align mask and the original physical address
offset. This affects the max mapping size: the mapping size cannot reach
IO_TLB_SEGSIZE * IO_TLB_SIZE when the original offset is non-zero. This
causes a boot failure in Hyper-V Isolation VMs where swiotlb force is
enabled. The SCSI layer uses the return value of dma_max_mapping_size()
to set the max segment size, and that finally calls
swiotlb_max_mapping_size(). The Hyper-V storage driver sets the min
align mask to 4k - 1, so the SCSI layer may pass a 256k request buffer
with a 0~4k offset, and the driver then can't get a swiotlb bounce
buffer via the DMA API: swiotlb_find_slots() can't find a 256k bounce
buffer with that offset. Make swiotlb_max_mapping_size() take the min
align mask into account.

Signed-off-by: Tianyu Lan 
---
 kernel/dma/swiotlb.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 73a41cec9e38..0d6684ca7eab 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -743,7 +743,18 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
paddr, size_t size,
 
 size_t swiotlb_max_mapping_size(struct device *dev)
 {
-   return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
+   int min_align_mask = dma_get_min_align_mask(dev);
+   int min_align = 0;
+
+   /*
+* swiotlb_find_slots() skips slots according to
+* min align mask. This affects max mapping size.
+* Take it into account here.
+*/
+   if (min_align_mask)
+   min_align = roundup(min_align_mask, IO_TLB_SIZE);
+
+   return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE - min_align;
 }
 
 bool is_swiotlb_active(struct device *dev)
-- 
2.25.1
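
A worked example with the Hyper-V values from the commit message (my
arithmetic, not part of the patch):

    min_align_mask = 4096 - 1 = 4095
    min_align      = roundup(4095, IO_TLB_SIZE /* 2KB */) = 4096
    max mapping    = 128 * 2KB - 4KB = 252KB

so the SCSI layer now caps segments at 252KB, which still fits in one
256KB IO TLB segment even after the up-to-4KB offset skip.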



Re: [RFC PATCH V2 0/2] swiotlb: Add child io tlb mem support

2022-05-09 Thread Tianyu Lan

On 5/2/2022 8:54 PM, Tianyu Lan wrote:

From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to resolve spinlock overhead
among a device's queues. Each device may allocate an IO TLB mem and set
up child IO TLB mems according to its queue number. The number of child
IO TLB mems may be set equal to the device queue number, which helps
resolve swiotlb spinlock overhead among devices and queues.

Patch 2 introduces IO TLB block concepts and the swiotlb_device_allocate()
API to allocate per-device swiotlb bounce buffers. The new API accepts a
queue number as the number of child IO TLB mems to set up for the
device's IO TLB mem.


Gentle ping...

Thanks.


Tianyu Lan (2):
   swiotlb: Add Child IO TLB mem support
   Swiotlb: Add device bounce buffer allocation interface

  include/linux/swiotlb.h |  40 ++
  kernel/dma/swiotlb.c| 290 ++--
  2 files changed, 317 insertions(+), 13 deletions(-)




[RFC PATCH V2 1/2] swiotlb: Add Child IO TLB mem support

2022-05-02 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to resolve spinlock overhead
among a device's queues. Each device may allocate an IO TLB mem and set
up child IO TLB mems according to its queue number. The swiotlb code
allocates bounce buffers among the child IO TLB mems iteratively.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  7 +++
 kernel/dma/swiotlb.c| 97 -
 2 files changed, 94 insertions(+), 10 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..4a3f6a7b4b7e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -89,6 +89,9 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @child_nslot:The number of IO TLB slots in each child IO TLB mem.
+ * @num_child:  The number of child IO TLB mems in the pool.
+ * @child_start:The child index to start searching in the next round.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +105,10 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int num_child;
+   unsigned int child_nslot;
+   unsigned int child_start;
+   struct io_tlb_mem *child;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e2ef0864eb1e..32e8f42530b6 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -207,6 +207,26 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->force_bounce = true;
 
spin_lock_init(&mem->lock);
+
+   if (mem->num_child) {
+   mem->child_nslot = nslabs / mem->num_child;
+   mem->child_start = 0;
+
+   /*
+* Initialize child IO TLB mem, divide IO TLB pool
+* into child number. Reuse parent mem->slot in the
+* child mem->slot.
+*/
+   for (i = 0; i < mem->num_child; i++) {
+   mem->child[i].slots = mem->slots + i * mem->child_nslot;
+   mem->child[i].num_child = 0;
+
+   swiotlb_init_io_tlb_mem(&mem->child[i],
+   start + ((i * mem->child_nslot) << 
IO_TLB_SHIFT),
+   mem->child_nslot, late_alloc);
+   }
+   }
+
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
@@ -336,16 +356,18 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
get_order(array_size(sizeof(*mem->slots), nslabs)));
-   if (!mem->slots) {
-   free_pages((unsigned long)vstart, order);
-   return -ENOMEM;
-   }
+   if (!mem->slots)
+   goto error_slots;
 
set_memory_decrypted((unsigned long)vstart, bytes >> PAGE_SHIFT);
swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
 
swiotlb_print_info();
return 0;
+
+error_slots:
+   free_pages((unsigned long)vstart, order);
+   return -ENOMEM;
 }
 
 void __init swiotlb_exit(void)
@@ -483,10 +505,11 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
- size_t alloc_size, unsigned int alloc_align_mask)
+static int swiotlb_do_find_slots(struct io_tlb_mem *mem,
+struct device *dev, phys_addr_t orig_addr,
+size_t alloc_size,
+unsigned int alloc_align_mask)
 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -565,6 +588,46 @@ static int swiotlb_find_slots(struct device *dev, 
phys_addr_t orig_addr,
return index;
 }
 
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+ 

[RFC PATCH V2 0/2] swiotlb: Add child io tlb mem support

2022-05-02 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to resolve spinlock overhead
among a device's queues. Each device may allocate an IO TLB mem and set
up child IO TLB mems according to its queue number. The number of child
IO TLB mems may be set equal to the device queue number, which helps
resolve swiotlb spinlock overhead among devices and queues.

Patch 2 introduces IO TLB block concepts and the swiotlb_device_allocate()
API to allocate per-device swiotlb bounce buffers. The new API accepts a
queue number as the number of child IO TLB mems to set up for the
device's IO TLB mem.

Tianyu Lan (2):
  swiotlb: Add Child IO TLB mem support
  Swiotlb: Add device bounce buffer allocation interface

 include/linux/swiotlb.h |  40 ++
 kernel/dma/swiotlb.c| 290 ++--
 2 files changed, 317 insertions(+), 13 deletions(-)

-- 
2.25.1



[RFC PATCH V2 2/2] Swiotlb: Add device bounce buffer allocation interface

2022-05-02 Thread Tianyu Lan
From: Tianyu Lan 

In SEV/TDX confidential VMs, a device DMA transaction needs to use the
swiotlb bounce buffer to share data with the host/hypervisor. The
swiotlb spinlock introduces overhead among devices if they share an IO
TLB mem. To avoid this, introduce swiotlb_device_allocate() to allocate
a device bounce buffer from the default IO TLB pool and set up child IO
TLB mems for queue bounce buffer allocation according to the input queue
number. A device may have multiple IO queues, and setting up the same
number of child IO TLB mems may help resolve spinlock overhead among the
queues.

Introduce an IO TLB block unit (2MB) concept for allocating big bounce
buffers from the default pool for devices; the IO TLB segment (256k) is
too small.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  35 +++-
 kernel/dma/swiotlb.c| 195 +++-
 2 files changed, 225 insertions(+), 5 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 4a3f6a7b4b7e..efd29e884fd7 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -31,6 +31,14 @@ struct scatterlist;
 #define IO_TLB_SHIFT 11
 #define IO_TLB_SIZE (1 << IO_TLB_SHIFT)
 
+/*
+ * IO TLB BLOCK UNIT as device bounce buffer allocation unit.
+ * This allows device allocates bounce buffer from default io
+ * tlb pool.
+ */
+#define IO_TLB_BLOCKSIZE   (8 * IO_TLB_SEGSIZE)
+#define IO_TLB_BLOCK_UNIT  (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT)
+
 /* default to 64MB */
 #define IO_TLB_DEFAULT_SIZE (64UL<<20)
 
@@ -89,9 +97,11 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
- * @child_nslot:The number of IO TLB slots in each child IO TLB mem.
 * @num_child:  The number of child IO TLB mems in the pool.
+ * @child_nslot:The number of IO TLB slots in each child IO TLB mem.
+ * @child_nblock:The number of IO TLB blocks in each child IO TLB mem.
 * @child_start:The child index to start searching in the next round.
+ * @block_start:The block index to start searching in the next round.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -107,8 +117,16 @@ struct io_tlb_mem {
bool for_alloc;
unsigned int num_child;
unsigned int child_nslot;
+   unsigned int child_nblock;
unsigned int child_start;
+   unsigned int block_index;
struct io_tlb_mem *child;
+   struct io_tlb_mem *parent;
+   struct io_tlb_block {
+   size_t alloc_size;
+   unsigned long start_slot;
+   unsigned int list;
+   } *block;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -137,6 +155,10 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size);
+void swiotlb_device_free(struct device *dev);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -169,6 +191,17 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+void swiotlb_device_free(struct device *dev)
+{
+}
+
+int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size)
+{
+   return -ENOMEM;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 32e8f42530b6..f8a0711cd9de 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -195,7 +195,8 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, 
phys_addr_t start,
unsigned long nslabs, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j;
+   unsigned int block_num = nslabs / IO_TLB_BLOCKSIZE;
 
mem->nslabs = nslabs;
mem->start = start;
@@ -210,6 +211,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, 
phys_addr_t start,
 
if (mem->num_child) {
mem->child_nslot = nslabs / mem->num_child;
+   mem->child_nblock = block_num / mem->num_child;
mem->child_start = 0;
 
/*
@@ -219,15 +221,24 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
 */
for (i = 0; i < mem->num_child; i++) {
mem->child[i].slots = mem->slots + i * mem->child_nslot;
-   mem->c

Re: [RFC PATCH] swiotlb: Add Child IO TLB mem support

2022-04-29 Thread Tianyu Lan

On 4/29/2022 10:21 PM, Tianyu Lan wrote:

From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to resolve spinlock overhead
among a device's queues. Each device may allocate an IO TLB mem and set
up child IO TLB mems according to its queue number. The swiotlb code
allocates bounce buffers among the child IO TLB mems iteratively.



Hi Robin and Christoph:
 Following Robin's idea, I drafted this patch. Please have a look
and check whether it's the right direction.


Thanks.


Signed-off-by: Tianyu Lan 
---
  include/linux/swiotlb.h |  7 +++
  kernel/dma/swiotlb.c| 96 -
  2 files changed, 93 insertions(+), 10 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..4a3f6a7b4b7e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -89,6 +89,9 @@ extern enum swiotlb_force swiotlb_force;
   * @late_alloc:   %true if allocated using the page allocator
   * @force_bounce: %true if swiotlb bouncing is forced
   * @for_alloc:  %true if the pool is used for memory allocation
+ * @child_nslot:The number of IO TLB slots in each child IO TLB mem.
+ * @num_child:  The child io tlb mem number in the pool.
+ * @child_start:The child index to start searching in the next round.
   */
  struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +105,10 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int num_child;
+   unsigned int child_nslot;
+   unsigned int child_start;
+   struct io_tlb_mem *child;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e2ef0864eb1e..382fa2288645 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -207,6 +207,25 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->force_bounce = true;
  
	spin_lock_init(&mem->lock);

+
+   if (mem->num_child) {
+   mem->child_nslot = nslabs / mem->num_child;
+   mem->child_start = 0;
+
+   /*
+* Initialize the child IO TLB mems, dividing the IO TLB
+* pool among the children. Each child reuses a slice of
+* the parent's mem->slots array.
+*/
+   for (i = 0; i < mem->num_child; i++) {
+   /* A child must not have children of its own. */
+   mem->child[i].num_child = 0;
+   mem->child[i].slots = mem->slots + i * mem->child_nslot;
+   swiotlb_init_io_tlb_mem(&mem->child[i],
+   start + ((i * mem->child_nslot) << IO_TLB_SHIFT),
+   mem->child_nslot, late_alloc);
+   }
+   }
+
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
@@ -336,16 +355,18 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
 	mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
 			get_order(array_size(sizeof(*mem->slots), nslabs)));
-   if (!mem->slots) {
-   free_pages((unsigned long)vstart, order);
-   return -ENOMEM;
-   }
+   if (!mem->slots)
+   goto error_slots;
  
  	set_memory_decrypted((unsigned long)vstart, bytes >> PAGE_SHIFT);

swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
  
  	swiotlb_print_info();

return 0;
+
+error_slots:
+   free_pages((unsigned long)vstart, order);
+   return -ENOMEM;
  }
  
  void __init swiotlb_exit(void)

@@ -483,10 +504,11 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
   */
-static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
- size_t alloc_size, unsigned int alloc_align_mask)
+static int swiotlb_do_find_slots(struct io_tlb_mem *mem,
+struct device *dev, phys_addr_t orig_addr,
+size_t alloc_size,
+unsigned int alloc_align_mask)
  {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -565,6 +587,46 @@ static int swiotl

[RFC PATCH] swiotlb: Add Child IO TLB mem support

2022-04-29 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to resolve spinlock overhead
among a device's queues. Each device may allocate an IO TLB mem and set
up child IO TLB mems according to its queue number. The swiotlb code
allocates bounce buffers among the child IO TLB mems iteratively.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  7 +++
 kernel/dma/swiotlb.c| 96 -
 2 files changed, 93 insertions(+), 10 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..4a3f6a7b4b7e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -89,6 +89,9 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @child_nslot:The number of IO TLB slots in each child IO TLB mem.
+ * @num_child:  The child io tlb mem number in the pool.
+ * @child_start:The child index to start searching in the next round.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +105,10 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int num_child;
+   unsigned int child_nslot;
+   unsigned int child_start;
+   struct io_tlb_mem *child;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e2ef0864eb1e..382fa2288645 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -207,6 +207,25 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->force_bounce = true;
 
	spin_lock_init(&mem->lock);
+
+   if (mem->num_child) {
+   mem->child_nslot = nslabs / mem->num_child;
+   mem->child_start = 0;
+
+   /*
+* Initialize the child IO TLB mems, dividing the IO TLB
+* pool among the children. Each child reuses a slice of
+* the parent's mem->slots array.
+*/
+   for (i = 0; i < mem->num_child; i++) {
+   /* A child must not have children of its own. */
+   mem->child[i].num_child = 0;
+   mem->child[i].slots = mem->slots + i * mem->child_nslot;
+   swiotlb_init_io_tlb_mem(&mem->child[i],
+   start + ((i * mem->child_nslot) << IO_TLB_SHIFT),
+   mem->child_nslot, late_alloc);
+   }
+   }
+
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
@@ -336,16 +355,18 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 
mem->slots = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
get_order(array_size(sizeof(*mem->slots), nslabs)));
-   if (!mem->slots) {
-   free_pages((unsigned long)vstart, order);
-   return -ENOMEM;
-   }
+   if (!mem->slots)
+   goto error_slots;
 
set_memory_decrypted((unsigned long)vstart, bytes >> PAGE_SHIFT);
swiotlb_init_io_tlb_mem(mem, virt_to_phys(vstart), nslabs, true);
 
swiotlb_print_info();
return 0;
+
+error_slots:
+   free_pages((unsigned long)vstart, order);
+   return -ENOMEM;
 }
 
 void __init swiotlb_exit(void)
@@ -483,10 +504,11 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
- size_t alloc_size, unsigned int alloc_align_mask)
+static int swiotlb_do_find_slots(struct io_tlb_mem *mem,
+struct device *dev, phys_addr_t orig_addr,
+size_t alloc_size,
+unsigned int alloc_align_mask)
 {
-   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -565,6 +587,46 @@ static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
return index;
 }
 
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+ size_t a

Re: [RFC PATCH 1/2] swiotlb: Split up single swiotlb lock

2022-04-28 Thread Tianyu Lan

On 4/28/2022 10:44 PM, Robin Murphy wrote:

On 2022-04-28 15:14, Tianyu Lan wrote:

From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb into individual areas which have their
own lock. On a swiotlb map/allocate request, an IO TLB buffer is
allocated from the areas evenly and freed back to the associated
area. This prepares for resolving the overhead of a single spinlock
among a device's queues. Each device may have its own io tlb mem and
bounce buffer pool.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).
Rework it and make it work for an individual device's io tlb mem. The
device driver may determine the area number according to the device
queue number.
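
To make the locking structure concrete, here is a minimal sketch
(illustrative only, assuming the io_tlb_area fields from the diff
below; the real slot search is reduced to a simple free-slot counter,
since only the per-area lock is the point here):

static int area_try_alloc(struct io_tlb_mem *mem, unsigned int i,
			  unsigned int nslots)
{
	struct io_tlb_area *area = &mem->areas[i];
	unsigned long flags;
	int index = -1;

	/* Contention is limited to callers hitting the same area. */
	spin_lock_irqsave(&area->lock, flags);
	if (area->used + nslots <= mem->area_nslabs) {
		index = i * mem->area_nslabs + area->used;
		area->used += nslots;
	}
	spin_unlock_irqrestore(&area->lock, flags);

	return index;
}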


Rather than introduce this extra level of allocator complexity, how 
about just dividing up the initial SWIOTLB allocation into multiple 
io_tlb_mem instances?


Robin.


Agree. Thanks for the suggestion. That will be more generic and I will
update in the next version.

Thanks.



Re: [RFC PATCH 2/2] Swiotlb: Add device bounce buffer allocation interface

2022-04-28 Thread Tianyu Lan

On 4/28/2022 10:14 PM, Tianyu Lan wrote:

From: Tianyu Lan 

In an SEV/TDX Confidential VM, device DMA transactions need to use the
swiotlb bounce buffer to share data with the host/hypervisor. The
swiotlb spinlock introduces overhead among devices if they share an io
tlb mem. To avoid this issue, introduce swiotlb_device_allocate() to
allocate a device bounce buffer from the default io tlb pool and set up
areas according to the input queue number. A device may have multiple
IO queues, and setting up the same number of io tlb areas may help to
resolve spinlock overhead among the queues.

Introduce an IO TLB block unit (2MB) to allocate big bounce buffers
from the default pool for devices. The IO TLB segment (256KB) is too
small.
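
A minimal usage sketch of the new interface (illustrative only; foo_*,
nr_queues and the 4MB size are made-up examples, not from this patch):

static int foo_setup_bounce(struct device *dev, unsigned int nr_queues)
{
	/* One IO TLB area per hardware queue (the patch expects a
	 * power-of-2 area number), 4MB of bounce buffer in total.
	 */
	return swiotlb_device_allocate(dev, nr_queues, 4 * 1024 * 1024);
}

static void foo_teardown_bounce(struct device *dev)
{
	swiotlb_device_free(dev);
}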


Hi Christoph and Robin Murphy:

From Christoph:
"Yeah.  We're almost done removing all knowledge of swiotlb from 
drivers, so the very last thing I want is an interface that allows a 
driver to allocate a per-device buffer."

Please have a look at this patch. This patch provides an API for device
drivers to allocate a per-device buffer. Just providing a per-device
bounce buffer is not enough: a device may still have multiple queues.
The single io tlb mem has just one spinlock in the current code, and
this will introduce overhead among the queues' DMA transactions. So the
new API takes the queue number as the IO TLB area number, and this is
why we still need to create areas in the io tlb mem.

   This new API is the one mentioned in Christoph's comment.








Signed-off-by: Tianyu Lan 
---
  include/linux/swiotlb.h |  33 
  kernel/dma/swiotlb.c| 173 +++-
  2 files changed, 203 insertions(+), 3 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 489c249da434..380bd1ce3d0f 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -31,6 +31,14 @@ struct scatterlist;
  #define IO_TLB_SHIFT 11
  #define IO_TLB_SIZE (1 << IO_TLB_SHIFT)
  
+/*
+ * IO TLB BLOCK UNIT as device bounce buffer allocation unit.
+ * This allows a device to allocate bounce buffers from the
+ * default io tlb pool.
+ */
+#define IO_TLB_BLOCKSIZE   (8 * IO_TLB_SEGSIZE)
+#define IO_TLB_BLOCK_UNIT  (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT)
+
  /* default to 64MB */
  #define IO_TLB_DEFAULT_SIZE (64UL<<20)
  
@@ -72,11 +80,13 @@ extern enum swiotlb_force swiotlb_force;

   * @index:The slot index to start searching in this area for next round.
   * @lock: The lock to protect the above data structures in the map and
   *unmap calls.
+ * @block_index: The block index to start searching in this area for next round.
   */
  struct io_tlb_area {
unsigned long used;
unsigned int area_index;
unsigned int index;
+   unsigned int block_index;
spinlock_t lock;
  };
  
@@ -110,6 +120,7 @@ struct io_tlb_area {

   * @num_areas:  The area number in the pool.
   * @area_start: The area index to start searching in the next round.
   * @area_nslabs: The slot number in the area.
+ * @area_block_number: The number of blocks in each area.
   */
  struct io_tlb_mem {
phys_addr_t start;
@@ -126,7 +137,14 @@ struct io_tlb_mem {
unsigned int num_areas;
unsigned int area_start;
unsigned int area_nslabs;
+   unsigned int area_block_number;
+   struct io_tlb_mem *parent;
struct io_tlb_area *areas;
+   struct io_tlb_block {
+   size_t alloc_size;
+   unsigned long start_slot;
+   unsigned int list;
+   } *block;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -155,6 +173,10 @@ unsigned int swiotlb_max_segment(void);
  size_t swiotlb_max_mapping_size(struct device *dev);
  bool is_swiotlb_active(struct device *dev);
  void __init swiotlb_adjust_size(unsigned long size);
+int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size);
+void swiotlb_device_free(struct device *dev);
  #else
  static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
  {
@@ -187,6 +209,17 @@ static inline bool is_swiotlb_active(struct device *dev)
  static inline void swiotlb_adjust_size(unsigned long size)
  {
  }
+
+static inline void swiotlb_device_free(struct device *dev)
+{
+}
+
+static inline int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size)
+{
+   return -ENOMEM;
+}
  #endif /* CONFIG_SWIOTLB */
  
  extern void swiotlb_print_info(void);

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 00a16f540f20..7b95a140694a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -218,7 +218,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
unsigned long nslabs, bool late_alloc)
  {
void *vaddr = phys_to_virt(start);
-   unsigned long bytes

[RFC PATCH 2/2] Swiotlb: Add device bounce buffer allocation interface

2022-04-28 Thread Tianyu Lan
From: Tianyu Lan 

In an SEV/TDX Confidential VM, device DMA transactions need to use the
swiotlb bounce buffer to share data with the host/hypervisor. The
swiotlb spinlock introduces overhead among devices if they share an io
tlb mem. To avoid this issue, introduce swiotlb_device_allocate() to
allocate a device bounce buffer from the default io tlb pool and set up
areas according to the input queue number. A device may have multiple
IO queues, and setting up the same number of io tlb areas may help to
resolve spinlock overhead among the queues.

Introduce an IO TLB block unit (2MB) to allocate big bounce buffers
from the default pool for devices. The IO TLB segment (256KB) is too
small.
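
For reference, the arithmetic behind those sizes (IO_TLB_SHIFT is in
the diff below; IO_TLB_SEGSIZE is 128 slots in the existing header):

  slot    = 1 << IO_TLB_SHIFT = 1 << 11           = 2KB
  segment = IO_TLB_SEGSIZE slots = 128 * 2KB      = 256KB
  block   = 8 segments = 1024 slots = 1024 * 2KB  = 2MB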

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  33 
 kernel/dma/swiotlb.c| 173 +++-
 2 files changed, 203 insertions(+), 3 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 489c249da434..380bd1ce3d0f 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -31,6 +31,14 @@ struct scatterlist;
 #define IO_TLB_SHIFT 11
 #define IO_TLB_SIZE (1 << IO_TLB_SHIFT)
 
+/*
+ * IO TLB BLOCK UNIT as device bounce buffer allocation unit.
+ * This allows a device to allocate bounce buffers from the
+ * default io tlb pool.
+ */
+#define IO_TLB_BLOCKSIZE   (8 * IO_TLB_SEGSIZE)
+#define IO_TLB_BLOCK_UNIT  (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT)
+
 /* default to 64MB */
 #define IO_TLB_DEFAULT_SIZE (64UL<<20)
 
@@ -72,11 +80,13 @@ extern enum swiotlb_force swiotlb_force;
  * @index: The slot index to start searching in this area for next round.
  * @lock:  The lock to protect the above data structures in the map and
  * unmap calls.
+ * @block_index: The block index to start searching in this area for next round.
  */
 struct io_tlb_area {
unsigned long used;
unsigned int area_index;
unsigned int index;
+   unsigned int block_index;
spinlock_t lock;
 };
 
@@ -110,6 +120,7 @@ struct io_tlb_area {
  * @num_areas:  The area number in the pool.
  * @area_start: The area index to start searching in the next round.
  * @area_nslabs: The slot number in the area.
+ * @area_block_number: The number of blocks in each area.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -126,7 +137,14 @@ struct io_tlb_mem {
unsigned int num_areas;
unsigned int area_start;
unsigned int area_nslabs;
+   unsigned int area_block_number;
+   struct io_tlb_mem *parent;
struct io_tlb_area *areas;
+   struct io_tlb_block {
+   size_t alloc_size;
+   unsigned long start_slot;
+   unsigned int list;
+   } *block;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -155,6 +173,10 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size);
+void swiotlb_device_free(struct device *dev);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -187,6 +209,17 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline void swiotlb_device_free(struct device *dev)
+{
+}
+
+static inline int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size)
+{
+   return -ENOMEM;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 00a16f540f20..7b95a140694a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -218,7 +218,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
unsigned long nslabs, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j, k;
unsigned int block_list;
 
mem->nslabs = nslabs;
@@ -226,6 +226,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->end = mem->start + bytes;
mem->index = 0;
mem->late_alloc = late_alloc;
+   mem->area_block_number = nslabs / (IO_TLB_BLOCKSIZE * mem->num_areas);
 
if (swiotlb_force_bounce)
mem->force_bounce = true;
@@ -233,10 +234,18 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
for (i = 0, j = 0, k = 0; i < mem->nslabs; i++) {
if (!(i % mem->area_nslabs)) {
mem->areas[j].index = 0;
+   mem->area

[RFC PATCH 1/2] swiotlb: Split up single swiotlb lock

2022-04-28 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch splits the swiotlb into individual areas which have their
own lock. On a swiotlb map/allocate request, an IO TLB buffer is
allocated from the areas evenly and freed back to the associated
area. This prepares for resolving the overhead of a single spinlock
among a device's queues. Each device may have its own io tlb mem and
bounce buffer pool.

This idea is from Andi Kleen's patch
(https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45).
Rework it and make it work for an individual device's io tlb mem. The
device driver may determine the area number according to the device
queue number.

Based-on-idea-by: Andi Kleen 
Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  25 ++
 kernel/dma/swiotlb.c| 173 +++-
 2 files changed, 162 insertions(+), 36 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..489c249da434 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -62,6 +62,24 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
 #ifdef CONFIG_SWIOTLB
 extern enum swiotlb_force swiotlb_force;
 
+/**
+ * struct io_tlb_area - IO TLB memory area descriptor
+ *
+ * This is a single area with a single lock.
+ *
+ * @used:  The number of used IO TLB slots in this area.
+ * @area_index: The index of this io tlb area.
+ * @index: The slot index to start searching in this area for next round.
+ * @lock:  The lock to protect the above data structures in the map and
+ * unmap calls.
+ */
+struct io_tlb_area {
+   unsigned long used;
+   unsigned int area_index;
+   unsigned int index;
+   spinlock_t lock;
+};
+
 /**
  * struct io_tlb_mem - IO TLB Memory Pool Descriptor
  *
@@ -89,6 +107,9 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @num_areas:  The area number in the pool.
+ * @area_start: The area index to start searching in the next round.
+ * @area_nslabs: The slot number in the area.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +123,10 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int num_areas;
+   unsigned int area_start;
+   unsigned int area_nslabs;
+   struct io_tlb_area *areas;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e2ef0864eb1e..00a16f540f20 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -62,6 +62,8 @@
 
 #define INVALID_PHYS_ADDR (~(phys_addr_t)0)
 
+#define NUM_AREAS_DEFAULT 1
+
 static bool swiotlb_force_bounce;
 static bool swiotlb_force_disable;
 
@@ -70,6 +72,25 @@ struct io_tlb_mem io_tlb_default_mem;
 phys_addr_t swiotlb_unencrypted_base;
 
 static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
+static unsigned long default_area_num = NUM_AREAS_DEFAULT;
+
+static int swiotlb_setup_areas(struct io_tlb_mem *mem,
+   unsigned int num_areas, unsigned long nslabs)
+{
+   if (nslabs < 1 || !is_power_of_2(num_areas)) {
+   pr_err("swiotlb: Invalid areas parameter %u.\n", num_areas);
+   return -EINVAL;
+   }
+
+   /* Round up number of slabs to the next power of 2.
+* The last area is going to be smaller than the rest if
+* default_nslabs is not a power of two.
+*/
+   mem->area_start = 0;
+   mem->num_areas = num_areas;
+   mem->area_nslabs = nslabs / num_areas;
+   return 0;
+}
 
 static int __init
 setup_io_tlb_npages(char *str)
@@ -114,6 +135,8 @@ void __init swiotlb_adjust_size(unsigned long size)
return;
size = ALIGN(size, IO_TLB_SIZE);
default_nslabs = ALIGN(size >> IO_TLB_SHIFT, IO_TLB_SEGSIZE);
+   swiotlb_setup_areas(&io_tlb_default_mem, default_area_num,
+   default_nslabs);
pr_info("SWIOTLB bounce buffer size adjusted to %luMB", size >> 20);
 }
 
@@ -195,7 +218,8 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
unsigned long nslabs, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j;
+   unsigned int block_list;
 
mem-&

[RFC PATCH 0/2] swiotlb: Introduce swiotlb device allocation function

2022-04-28 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patchset splits the swiotlb into individual areas which have their
own lock. On a swiotlb map/allocate request, an IO TLB buffer is
allocated from the areas evenly and freed back to the associated
area.

Patch 2 introduces a helper function to allocate bounce buffers for
devices from the default IO TLB pool with the new IO TLB block unit,
and to set up IO TLB areas for device queues to avoid spinlock
overhead. The area number is set by the device driver according to the
queue number.

In a network test between a traditional VM and a Confidential VM, the
throughput improves from ~20Gb/s to ~34Gb/s with this patchset.

Tianyu Lan (2):
  swiotlb: Split up single swiotlb lock
  Swiotlb: Add device bounce buffer allocation interface

 include/linux/swiotlb.h |  58 +++
 kernel/dma/swiotlb.c| 340 +++-
 2 files changed, 362 insertions(+), 36 deletions(-)

-- 
2.25.1



Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-03-01 Thread Tianyu Lan

On 3/1/2022 7:53 PM, Christoph Hellwig wrote:

On Fri, Feb 25, 2022 at 10:28:54PM +0800, Tianyu Lan wrote:

  One more perspective is that one device may have multiple queues and
each queue should have an independent swiotlb bounce buffer to avoid
spinlock overhead. The number of queues is only available in the device
driver. This means the new API needs to be called in the device driver
according to the queue number.


Well, given how hell bent people are on bounce buffering we might
need some scalability work there anyway.


In my test on a local machine with two VMs, a Linux guest without the
swiotlb bounce buffer, or with the fix patch from Andi Kleen, can
achieve about 40Gb/s throughput, but only 24-25Gb/s with the current
swiotlb code. The spinlock contention also consumes more CPU.








Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-02-25 Thread Tianyu Lan

On 2/23/2022 5:46 PM, Tianyu Lan wrote:



On 2/23/2022 12:00 AM, Christoph Hellwig wrote:

On Tue, Feb 22, 2022 at 11:07:19PM +0800, Tianyu Lan wrote:

Thanks for your comment. That means we need to expose a
swiotlb_device_init() interface to allocate a bounce buffer and
initialize the io tlb mem entry. The current rmem_swiotlb_device_init()
DMA API only works for platforms with device tree. The new API should
be called in the bus driver or a new DMA API. Could you check whether
this is the right way before we start the work?


Do these VMs use ACPI?  We'd probably really want some kind of higher
level configuration and not have the drivers request it themselves.


Yes, Hyper-V isolation VM uses ACPI. Devices are enumerated via the
vmbus host and there is no child device information in the ACPI table.
The host driver seems to be the right place to call the new API.


Hi Christoph:
 One more perspective is that one device may have multiple queues
and each queue should have an independent swiotlb bounce buffer to
avoid spinlock overhead. The number of queues is only available in the
device driver. This means the new API needs to be called in the device
driver according to the queue number.


Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-02-23 Thread Tianyu Lan




On 2/23/2022 12:00 AM, Christoph Hellwig wrote:

On Tue, Feb 22, 2022 at 11:07:19PM +0800, Tianyu Lan wrote:

Thanks for your comment. That means we need to expose a
swiotlb_device_init() interface to allocate a bounce buffer and
initialize the io tlb mem entry. The current rmem_swiotlb_device_init()
DMA API only works for platforms with device tree. The new API should
be called in the bus driver or a new DMA API. Could you check whether
this is the right way before we start the work?


Do these VMs use ACPI?  We'd probably really want some kind of higher
level configuration and not have the drivers request it themselves.


Yes, Hyper-V isolation VM uses ACPI. Devices are enumerated via the
vmbus host and there is no child device information in the ACPI table.
The host driver seems to be the right place to call the new API.



Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-02-22 Thread Tianyu Lan




On 2/22/2022 4:05 PM, Christoph Hellwig wrote:

On Mon, Feb 21, 2022 at 11:14:58PM +0800, Tianyu Lan wrote:

Sorry. The boot failure is not related to these patches and the issue
has been fixed in the latest upstream code.

There is a performance bottleneck due to the io tlb mem's spinlock
during performance testing. All devices' IO queues use the same io tlb
mem entry, and its spinlock introduces overhead. There is a fix patch
from Andi Kleen on github. Could you have a look?

https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45


Please post these things to the list.

But I suspect the right answer for the "secure" hypervisor case is to
use the per-device swiotlb regions that we've recently added.


Thanks for your comment. That means we need to expose a
swiotlb_device_init() interface to allocate a bounce buffer and
initialize the io tlb mem entry. The current rmem_swiotlb_device_init()
DMA API only works for platforms with device tree. The new API should
be called in the bus driver or a new DMA API. Could you check whether
this is the right way before we start the work?


Thanks.



Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-02-21 Thread Tianyu Lan

On 2/15/2022 11:32 PM, Tianyu Lan wrote:

On 2/14/2022 9:58 PM, Christoph Hellwig wrote:

On Mon, Feb 14, 2022 at 07:28:40PM +0800, Tianyu Lan wrote:

On 2/14/2022 4:19 PM, Christoph Hellwig wrote:

Adding a function to set the flag doesn't really change much.  As Robin
pointed out last time you should find a way to just call
swiotlb_init_with_tbl directly with the memory allocated the way you
like it.  Or given that we have quite a few of these trusted hypervisor
schemes maybe add an argument to swiotlb_init that specifies how to
allocate the memory.


Thanks for your suggestion. I will try the first approach.


Take a look at the SWIOTLB_ANY flag in this WIP branch:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-init-cleanup 



That being said I'm not sure that either this flag or the existing
powerpc code is actually the right thing to do.  We still need the 4G
limited buffer to support devices with addressing limitations.  So I
think we need an additional io_tlb_mem instance for the devices without
addressing limitations instead.



Hi Christoph:
  Thanks for your patches. I tested these patches in a Hyper-V trusted
VM and the system can't boot up. I am debugging and will report back.


Sorry. The boot failure is not related to these patches and the issue
has been fixed in the latest upstream code.

There is a performance bottleneck due to the io tlb mem's spinlock
during performance testing. All devices' IO queues use the same io tlb
mem entry, and its spinlock introduces overhead. There is a fix patch
from Andi Kleen on github. Could you have a look?


https://github.com/intel/tdx/commit/4529b5784c141782c72ec9bd9a92df2b68cb7d45

Thanks.


Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-02-15 Thread Tianyu Lan

On 2/14/2022 9:58 PM, Christoph Hellwig wrote:

On Mon, Feb 14, 2022 at 07:28:40PM +0800, Tianyu Lan wrote:

On 2/14/2022 4:19 PM, Christoph Hellwig wrote:

Adding a function to set the flag doesn't really change much.  As Robin
pointed out last time you should find a way to just call
swiotlb_init_with_tbl directly with the memory allocated the way you
like it.  Or given that we have quite a few of these trusted hypervisor
schemes maybe add an argument to swiotlb_init that specifies how to
allocate the memory.


Thanks for your suggestion. I will try the first approach.


Take a look at the SWIOTLB_ANY flag in this WIP branch:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/swiotlb-init-cleanup

That being said I'm not sure that either this flag or the existing powerpc
code is actually the right thing to do.  We still need the 4G limited
buffer to support devices with addressing limitations.  So I think we need
an additional io_tlb_mem instance for the devices without addressing
limitations instead.



Hi Christoph:
 Thanks for your patches. I tested these patches in a Hyper-V trusted
VM and the system can't boot up. I am debugging and will report back.





Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-02-14 Thread Tianyu Lan

On 2/14/2022 4:19 PM, Christoph Hellwig wrote:

Adding a function to set the flag doesn't really change much.  As Robin
pointed out last time you should find a way to just call
swiotlb_init_with_tbl directly with the memory allocated the way you
like it.  Or given that we have quite a few of these trusted hypervisor
schemes maybe add an argument to swiotlb_init that specifies how to
allocate the memory.


Thanks for your suggestion. I will try the first approach.



[PATCH V2 2/2] x86/hyperv: Make swiotlb bounce buffer allocation not just from low pages

2022-02-09 Thread Tianyu Lan
From: Tianyu Lan 

In a Hyper-V Isolation VM, the swiotlb bounce buffer size may be 1GB at
most, and there may not be enough memory from 0 to 4GB according to the
memory layout. Devices in an Isolation VM can use memory above 4GB as
DMA memory, so call swiotlb_set_alloc_from_low_pages(false) so that the
swiotlb bounce buffer allocation is not limited to the 0-4GB range.

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cpu/mshyperv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 5a99f993e639..50ba4622c650 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -343,6 +343,7 @@ static void __init ms_hyperv_init_platform(void)
 * use swiotlb bounce buffer for dma transaction.
 */
swiotlb_force = SWIOTLB_FORCE;
+   swiotlb_set_alloc_from_low_pages(false);
 #endif
}
 
-- 
2.25.1



[PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-02-09 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VMs and AMD SEV VMs use the swiotlb bounce buffer to
share memory with the hypervisor. Currently the swiotlb bounce buffer
is only allocated from 0 to ARCH_LOW_ADDRESS_LIMIT, which defaults to
0xffffffffUL. Isolation VMs and AMD SEV VMs need a 1GB bounce buffer at
most. This will fail when there is not enough memory in the 0-4GB
address space, and devices may also use memory above the 4GB address
space as DMA memory. Expose swiotlb_alloc_from_low_pages, and a
platform may set it to false when it's not necessary to limit the
bounce buffer to the 0-4GB range.
Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  1 +
 kernel/dma/swiotlb.c| 18 --
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f6c3638255d5..2b4f92668bc7 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -39,6 +39,7 @@ enum swiotlb_force {
 extern void swiotlb_init(int verbose);
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 unsigned long swiotlb_size_or_default(void);
+void swiotlb_set_alloc_from_low_pages(bool low);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 extern int swiotlb_late_init_with_default_size(size_t default_size);
 extern void __init swiotlb_update_mem_attributes(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f1e7ea160b43..62bf8b5cc3e4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -73,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+static bool swiotlb_alloc_from_low_pages = true;
+
 phys_addr_t swiotlb_unencrypted_base;
 
 /*
@@ -116,6 +118,11 @@ void swiotlb_set_max_segment(unsigned int val)
max_segment = rounddown(val, PAGE_SIZE);
 }
 
+void swiotlb_set_alloc_from_low_pages(bool low)
+{
+   swiotlb_alloc_from_low_pages = low;
+}
+
 unsigned long swiotlb_size_or_default(void)
 {
return default_nslabs << IO_TLB_SHIFT;
@@ -284,8 +291,15 @@ swiotlb_init(int verbose)
if (swiotlb_force == SWIOTLB_NO_FORCE)
return;
 
-   /* Get IO TLB memory from the low pages */
-   tlb = memblock_alloc_low(bytes, PAGE_SIZE);
+   /*
+* Get IO TLB memory from the low pages if swiotlb_alloc_from_low_pages
+* is set.
+*/
+   if (swiotlb_alloc_from_low_pages)
+   tlb = memblock_alloc_low(bytes, PAGE_SIZE);
+   else
+   tlb = memblock_alloc(bytes, PAGE_SIZE);
+
if (!tlb)
goto fail;
if (swiotlb_init_with_tbl(tlb, default_nslabs, verbose))
-- 
2.25.1



[PATCH V2 0/2] x86/hyperv/Swiotlb: Add swiotlb_set_alloc_from_low_pages() switch function.

2022-02-09 Thread Tianyu Lan
From: Tianyu Lan 

A Hyper-V Isolation VM may fail to allocate the swiotlb bounce buffer
because in some cases there is not enough contiguous memory from 0 to
4GB. The current swiotlb code allocates the bounce buffer in low
memory. This patchset adds a new function,
swiotlb_set_alloc_from_low_pages(), to control whether the swiotlb
bounce buffer comes from low pages or has no such limitation. Devices
in a Hyper-V Isolation VM may use memory above 4GB as DMA memory and
switch the swiotlb allocation in order to avoid running out of
contiguous memory in low pages.

Tianyu Lan (2):
  Swiotlb: Add swiotlb_alloc_from_low_pages switch
  x86/hyperv: Make swiotlb bounce buffer allocation not just from low
pages

 arch/x86/kernel/cpu/mshyperv.c |  1 +
 include/linux/swiotlb.h|  1 +
 kernel/dma/swiotlb.c   | 18 --
 3 files changed, 18 insertions(+), 2 deletions(-)

-- 
2.25.1



Re: [PATCH] Netvsc: Call hv_unmap_memory() in the netvsc_device_remove()

2022-02-08 Thread Tianyu Lan

On 2/3/2022 1:05 AM, Michael Kelley (LINUX) wrote:

From: Tianyu Lan  Sent: Tuesday, February 1, 2022 8:32 AM

netvsc_device_remove() calls vunmap(), which should not be called in
interrupt context. The current code calls hv_unmap_memory() in
free_netvsc_device(), which is an RCU callback and may be called in
interrupt context. This will trigger the BUG_ON(in_interrupt()) in
vunmap(). Fix it by moving hv_unmap_memory() to
netvsc_device_remove().

I think this change can fail to call hv_unmap_memory() in an error case.

If netvsc_init_buf() fails after hv_map_memory() succeeds for the receive
buffer or for the send buffer, no corresponding hv_unmap_memory() will
be done.  The failure in netvsc_init_buf() will cause netvsc_connect_vsp()
to fail, so netvsc_add_device() will "goto close" where free_netvsc_device()
will be called.  But free_netvsc_device() no longer calls hv_unmap_memory(),
so it won't ever happen.   netvsc_device_remove() is never called in this case
because netvsc_add_device() failed.



Hi Michael:
  Thanks for your review. Nice catch; I will fix it in the next
version.



Re: [PATCH 0/2] x86/hyperv/Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-02-02 Thread Tianyu Lan

On 2/2/2022 4:12 PM, Christoph Hellwig wrote:

I think this interface is a little too hacky.  In the end all the
non-trusted hypervisor schemes (including the per-device swiotlb one)
can allocate the memory from everywhere and want for force use of
swiotlb.  I think we need some kind of proper interface for that instead
of setting all kinds of global variables.


Hi Christoph:
 Thanks for your review. I drafted the following patch to export an
interface, swiotlb_set_alloc_from_low_pages(). Could you have a look at
whether this looks good to you?

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f6c3638255d5..2b4f92668bc7 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -39,6 +39,7 @@ enum swiotlb_force {
 extern void swiotlb_init(int verbose);
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 unsigned long swiotlb_size_or_default(void);
+void swiotlb_set_alloc_from_low_pages(bool low);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 extern int swiotlb_late_init_with_default_size(size_t default_size);
 extern void __init swiotlb_update_mem_attributes(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f1e7ea160b43..62bf8b5cc3e4 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -73,6 +73,8 @@ enum swiotlb_force swiotlb_force;

 struct io_tlb_mem io_tlb_default_mem;

+static bool swiotlb_alloc_from_low_pages = true;
+
 phys_addr_t swiotlb_unencrypted_base;

 /*
@@ -116,6 +118,11 @@ void swiotlb_set_max_segment(unsigned int val)
max_segment = rounddown(val, PAGE_SIZE);
 }

+void swiotlb_set_alloc_from_low_pages(bool low)
+{
+   swiotlb_alloc_from_low_pages = low;
+}
+
 unsigned long swiotlb_size_or_default(void)
 {
return default_nslabs << IO_TLB_SHIFT;
@@ -284,8 +291,15 @@ swiotlb_init(int verbose)
if (swiotlb_force == SWIOTLB_NO_FORCE)
return;

-   /* Get IO TLB memory from the low pages */
-   tlb = memblock_alloc_low(bytes, PAGE_SIZE);
+   /*
+* Get IO TLB memory from the low pages if 
swiotlb_alloc_from_low_pages

+* is set.
+*/
+   if (swiotlb_alloc_from_low_pages)
+   tlb = memblock_alloc_low(bytes, PAGE_SIZE);
+   else
+   tlb = memblock_alloc(bytes, PAGE_SIZE);
+
if (!tlb)
goto fail;
if (swiotlb_init_with_tbl(tlb, default_nslabs, verbose))




[PATCH] Netvsc: Call hv_unmap_memory() in the netvsc_device_remove()

2022-02-01 Thread Tianyu Lan
From: Tianyu Lan 

netvsc_device_remove() calls vunmap(), which should not be called in
interrupt context. The current code calls hv_unmap_memory() in
free_netvsc_device(), which is an RCU callback and may be called in
interrupt context. This will trigger the BUG_ON(in_interrupt()) in
vunmap(). Fix it by moving hv_unmap_memory() to
netvsc_device_remove().
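
The resulting teardown order, in outline (a sketch of the pattern; the
full change is in the diff below):

	/* Process context: do the vunmap()-based teardown synchronously. */
	if (net_device->recv_original_buf)
		hv_unmap_memory(net_device->recv_buf);
	if (net_device->send_original_buf)
		hv_unmap_memory(net_device->send_buf);

	/* The vfree()/kfree() parts stay deferred via the RCU callback. */
	free_netvsc_device_rcu(net_device);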

Fixes: 846da38de0e8 ("net: netvsc: Add Isolation VM support for netvsc driver")
Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/netvsc.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index afa81a9480cc..f989f920d4ce 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -154,19 +154,15 @@ static void free_netvsc_device(struct rcu_head *head)
 
kfree(nvdev->extension);
 
-   if (nvdev->recv_original_buf) {
-   hv_unmap_memory(nvdev->recv_buf);
+   if (nvdev->recv_original_buf)
vfree(nvdev->recv_original_buf);
-   } else {
+   else
vfree(nvdev->recv_buf);
-   }
 
-   if (nvdev->send_original_buf) {
-   hv_unmap_memory(nvdev->send_buf);
+   if (nvdev->send_original_buf)
vfree(nvdev->send_original_buf);
-   } else {
+   else
vfree(nvdev->send_buf);
-   }
 
bitmap_free(nvdev->send_section_map);
 
@@ -765,6 +761,12 @@ void netvsc_device_remove(struct hv_device *device)
netvsc_teardown_send_gpadl(device, net_device, ndev);
}
 
+   if (net_device->recv_original_buf)
+   hv_unmap_memory(net_device->recv_buf);
+
+   if (net_device->send_original_buf)
+   hv_unmap_memory(net_device->send_buf);
+
/* Release all resources */
free_netvsc_device_rcu(net_device);
 }
-- 
2.25.1



[PATCH 2/2] x86/hyperv: Set swiotlb_alloc_from_low_pages to false

2022-01-26 Thread Tianyu Lan
From: Tianyu Lan 

In a Hyper-V Isolation VM, the swiotlb bounce buffer size may be 1GB at
most, and there may not be enough memory from 0 to 4GB according to the
memory layout. Devices in an Isolation VM can use memory above 4GB as
DMA memory. Set swiotlb_alloc_from_low_pages to false in Isolation VMs.

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cpu/mshyperv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 5a99f993e639..80a0423ac75d 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -343,6 +343,7 @@ static void __init ms_hyperv_init_platform(void)
 * use swiotlb bounce buffer for dma transaction.
 */
swiotlb_force = SWIOTLB_FORCE;
+   swiotlb_alloc_from_low_pages = false;
 #endif
}
 
-- 
2.25.1



[PATCH 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-01-26 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VMs and AMD SEV VMs use the swiotlb bounce buffer to
share memory with the hypervisor. Currently the swiotlb bounce buffer
is only allocated from 0 to ARCH_LOW_ADDRESS_LIMIT, which defaults to
0xffffffffUL. Isolation VMs and AMD SEV VMs need a 1GB bounce buffer at
most. This will fail when there is not enough contiguous memory in the
0-4GB address space, and devices may also use memory above the 4GB
address space as DMA memory. Expose swiotlb_alloc_from_low_pages, and a
platform may set it to false when it's not necessary to limit the
bounce buffer to the 0-4GB range.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  1 +
 kernel/dma/swiotlb.c| 13 +++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f6c3638255d5..55c178e8eee0 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -191,5 +191,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
 extern phys_addr_t swiotlb_unencrypted_base;
+extern bool swiotlb_alloc_from_low_pages;
 
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f1e7ea160b43..159fef80f3db 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -73,6 +73,12 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+/*
+ * Get IO TLB memory from the low pages if swiotlb_alloc_from_low_pages
+ * is set.
+ */
+bool swiotlb_alloc_from_low_pages = true;
+
 phys_addr_t swiotlb_unencrypted_base;
 
 /*
@@ -284,8 +290,11 @@ swiotlb_init(int verbose)
if (swiotlb_force == SWIOTLB_NO_FORCE)
return;
 
-   /* Get IO TLB memory from the low pages */
-   tlb = memblock_alloc_low(bytes, PAGE_SIZE);
+   if (swiotlb_alloc_from_low_pages)
+   tlb = memblock_alloc_low(bytes, PAGE_SIZE);
+   else
+   tlb = memblock_alloc(bytes, PAGE_SIZE);
+
if (!tlb)
goto fail;
if (swiotlb_init_with_tbl(tlb, default_nslabs, verbose))
-- 
2.25.1



[PATCH 0/2] x86/hyperv/Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-01-26 Thread Tianyu Lan
From: Tianyu Lan 

A Hyper-V Isolation VM may fail to allocate the swiotlb bounce buffer
because in some cases there is not enough contiguous memory from 0 to
4GB. The current swiotlb code allocates the bounce buffer in low
memory. This patchset adds a switch, "swiotlb_alloc_from_low_pages",
which is set to true by default. A platform may clear it if necessary.
Devices in a Hyper-V Isolation VM may use memory above 4GB as DMA
memory and set the switch to false in order to avoid running out of
contiguous memory in the low address space.

Tianyu Lan (2):
  Swiotlb: Add swiotlb_alloc_from_low_pages switch
  x86/hyperv: Set swiotlb_alloc_from_low_pages to false

 arch/x86/kernel/cpu/mshyperv.c |  1 +
 include/linux/swiotlb.h|  1 +
 kernel/dma/swiotlb.c   | 13 +++--
 3 files changed, 13 insertions(+), 2 deletions(-)

-- 
2.25.1



[PATCH] Swiotlb: Add CONFIG_HAS_IOMEM check around memremap() in the swiotlb_mem_remap()

2021-12-31 Thread Tianyu Lan
From: Tianyu Lan 

The HAS_IOMEM option may not be selected on some platforms (e.g., s390)
and this will cause a compile error due to the missing memremap()
implementation. Fix it by adding a HAS_IOMEM check around memremap() in
swiotlb.c.

Reported-by: kernel test robot 
Signed-off-by: Tianyu Lan 
---
 kernel/dma/swiotlb.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b36c1cdd0c4f..3de651ba38cc 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -167,6 +167,7 @@ static void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
 {
void *vaddr = NULL;
 
+#ifdef CONFIG_HAS_IOMEM
if (swiotlb_unencrypted_base) {
phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
 
@@ -175,6 +176,7 @@ static void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
 			pr_err("Failed to map the unencrypted memory %pa size %lx.\n",
 			       &paddr, bytes);
 	}
+#endif
 
return vaddr;
 }
-- 
2.25.1



Re: [PATCH V7 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM

2021-12-14 Thread Tianyu Lan




On 12/15/2021 6:40 AM, Dave Hansen wrote:

On 12/14/21 2:23 PM, Tom Lendacky wrote:

I don't really understand how this can be more general any *not* get
utilized by the existing SEV support.


The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is
meant to be used with a (relatively) un-enlightened guest. The idea is
that the C-bit in the guest page tables must be 0 for all accesses. It
is only the physical address relative to VTOM that determines if the
access is encrypted or not. So setting sme_me_mask will actually cause
issues when running with this feature. Since all DMA for an SEV-SNP
guest must still be to shared (unencrypted) memory, some enlightenment
is needed. In this case, memory mapped above VTOM will provide that via
the SWIOTLB update. For SEV-SNP guests running with VTOM, they are
likely to also be running with the Reflect #VC feature, allowing a
"paravisor" to handle any #VCs generated by the guest.

See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC"
in volume 2 of the AMD APM [1].


Thanks, Tom, that's pretty much what I was looking for.

The C-bit normally comes from the page tables.  But, the hardware also
provides an alternative way to effectively get C-bit behavior without
actually setting the bit in the page tables: Virtual Top-of-Memory
(VTOM).  Right?

It sounds like Hyper-V has chosen to use VTOM instead of requiring the
guest to do the C-bit in its page tables.

But, the thing that confuses me is when you said: "it (VTOM) is meant to
be used with a (relatively) un-enlightened guest".  We don't have an
unenlightened guest here.  We have Linux, which is quite enlightened.


Is VTOM being used because there's something that completely rules out
using the C-bit in the page tables?  What's that "something"?



For "un-enlightened" guest, there is an another system running insider
the VM to emulate some functions(tsc, timer, interrupt and so on) and
this helps not to modify OS(Linux/Windows) a lot. In Hyper-V Isolation
VM, we called the new system as HCL/paravisor. HCL runs in the VMPL0 and 
Linux runs in VMPL2. This is similar with nested virtualization. HCL

plays similar role as L1 hypervisor to emulate some general functions
(e.g, rdmsr/wrmsr accessing and interrupt injection) which needs to be
enlightened in the enlightened guest. Linux kernel needs to handle
#vc/#ve exception directly in the enlightened guest. HCL handles such
exception in un-enlightened guest and emulate interrupt injection which
helps not to modify OS core part code. Using vTOM also is same purpose.
Hyper-V uses vTOM avoid changing page table related code in OS(include
Windows and Linux)and just needs to map memory into decrypted address
space above vTOM in the driver code.

Linux has generic swiotlb bounce buffer implementation and so introduce
swiotlb_unencrypted_base here to set shared memory boundary or vTOM.
Hyper-V Isolation VM is un-enlightened guest. Hyper-V doesn't expose 
sev/sme capability to guest and so SEV code actually doesn't work.

So we also can't interact current existing SEV code and these code is
for enlightened guest support without HCL/paravisor. If other platforms
or SEV want to use similar vTOM feature, swiotlb_unencrypted_base can
be reused. So swiotlb_unencrypted_base is a general solution for all
platforms besides SEV and Hyper-V.
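
To make the address arithmetic concrete (an illustration only; the
39-bit boundary is the example value mentioned elsewhere in this
thread, and the real value is reported by the Hyper-V ISOLATION_CONFIG
CPUID leaf):

	/* Example: shared_gpa_boundary (vTOM) at bit 39. */
	#define VTOM_EXAMPLE	(1UL << 39)

	/*
	 * An access below vTOM is private (encrypted); the alias of the
	 * same page above vTOM is shared (decrypted) with the host.
	 */
	phys_addr_t shared_alias = paddr + VTOM_EXAMPLE;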












Re: [PATCH V7 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM

2021-12-13 Thread Tianyu Lan

On 12/14/2021 12:45 AM, Dave Hansen wrote:

On 12/12/21 11:14 PM, Tianyu Lan wrote:

In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space which is above shared_gpa_boundary (e.g., a
39-bit address line) reported by the Hyper-V ISOLATION_CONFIG CPUID
leaf. The accessed physical address will be the original physical
address + shared_gpa_boundary. The shared_gpa_boundary in the AMD
SEV-SNP spec is called the virtual top of memory (vTOM). Memory
addresses below vTOM are automatically treated as private while memory
above vTOM is treated as shared.


This seems to be independently reintroducing some of the SEV
infrastructure.  Is it really OK that this doesn't interact at all with
any existing SEV code?

For instance, do we need a new 'swiotlb_unencrypted_base', or should
this just be using sme_me_mask somehow?


Hi Dave:
   Thanks for your review. Hyper-V provides a para-virtualized
confidential computing solution based on the AMD SEV function and does
not expose SEV capabilities to the guest. So sme_me_mask is unset in a
Hyper-V Isolation VM. swiotlb_unencrypted_base is a more general
solution to handle the case of different address spaces for encrypted
and decrypted memory, and other platforms may also reuse it.


[PATCH V7 5/5] net: netvsc: Add Isolation VM support for netvsc driver

2021-12-12 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hypercall. vmbus_establish_gpadl() has
already done this for the netvsc rx/tx ring buffers. The page buffers
used by vmbus_sendpacket_pagebuffer() still need to be handled. Use the
DMA API to map/unmap this memory when sending/receiving packets, and
the Hyper-V swiotlb bounce buffer DMA address will be returned. The
swiotlb bounce buffer has been marked visible to the host during boot.

The rx/tx ring buffers are allocated via vzalloc() and need to be
mapped into the unencrypted address space (above vTOM) before being
shared with the host and accessed. Add hv_map/unmap_memory() to
map/unmap the rx/tx ring buffers.
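
In outline, the per-packet mapping looks like the following (a
simplified sketch, not the exact netvsc_dma_map() from the diff below;
unwinding of already-mapped entries on error is omitted):

static int map_page_buffers(struct hv_device *hv_dev,
			    struct hv_page_buffer *pb, u32 page_count)
{
	u32 i;

	for (i = 0; i < page_count; i++) {
		void *src = phys_to_virt(pb[i].pfn << HV_HYP_PAGE_SHIFT)
			    + pb[i].offset;
		dma_addr_t dma = dma_map_single(&hv_dev->device, src,
						pb[i].len, DMA_TO_DEVICE);

		if (dma_mapping_error(&hv_dev->device, dma))
			return -ENOMEM;

		/* Hand the bounce buffer address to the host. */
		pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
		pb[i].offset = dma & (HV_HYP_PAGE_SIZE - 1);
	}

	return 0;
}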

Signed-off-by: Tianyu Lan 
---
Change since v3:
   * Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn()
 with vmalloc_to_pfn() in the hv_map_memory()

Change since v2:
   * Add hv_map/unmap_memory() to map/umap rx/tx ring buffer.
---
 arch/x86/hyperv/ivm.c |  28 ++
 drivers/hv/hv_common.c|  11 +++
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|   5 ++
 8 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 69c7a57f3307..2b994117581e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visible)
kfree(pfn_array);
return ret;
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+   void *vaddr;
+   int i;
+
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / PAGE_SIZE; i++)
+   pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
+   (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+   vunmap(addr);
+}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 7be173a99f27..3c5cb1f70319 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
return HV_STATUS_INVALID_PARAMETER;
 }
 EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
+void __weak *hv_map_memory(void *addr, unsigned long size)
+{
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(hv_map_memory);
+
+void __weak hv_unmap_memory(void *addr)
+{
+}
+EXPORT_SYMBOL_GPL(hv_unmap_memory);
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..cf69da0e296c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI	10000
 #define RETRY_MAX	2000	/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..b7ade735a806 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   hv_unmap_memory(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   hv_unmap_memory(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }
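
For context, the (truncated) netvsc changes above pair with a map step
at ring-buffer setup time. A rough sketch of that call site, with
names inferred from the visible hunks rather than copied from the full
diff:

/* Sketch: remap the vzalloc()'ed receive buffer above vTOM after the
 * GPADL is established; keep the original pointer for vfree() later. */
static int netvsc_map_ring_sketch(struct netvsc_device *net_device,
				  unsigned long buf_size)
{
	void *vaddr = hv_map_memory(net_device->recv_buf, buf_size);

	if (!vaddr)
		return -ENOMEM;

	net_device->recv_original_buf = net_device->recv_buf;
	net_device->recv_buf = vaddr;
	return 0;
}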

[PATCH V7 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-12-12 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all shared memory with the host needs to be marked
visible to the host via a hypercall. vmbus_establish_gpadl() has
already done that for the storvsc rx/tx ring buffers. The page buffers
used by vmbus_sendpacket_mpb_desc() still need to be handled. Use the
DMA API (scsi_dma_map/unmap) to map this memory while sending/receiving
packets and return the swiotlb bounce buffer DMA address. In Isolation
VM, the swiotlb bounce buffer is marked visible to the host and the
swiotlb force mode is enabled.

Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 in order
to keep the original data offset in the bounce buffer.
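
A rough sketch of what that setting looks like at probe time (the
surrounding function here is assumed; only the dma_set_min_align_mask()
call reflects the change described above):

/* Sketch: preserve the low HV_HYP_PAGE_SIZE bits of the original
 * address when bouncing, so the data offset within a Hyper-V page
 * survives the swiotlb copy. */
static int storvsc_probe_sketch(struct hv_device *device)
{
	dma_set_min_align_mask(&device->device, HV_HYP_PAGE_SIZE - 1);
	return 0;
}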

Signed-off-by: Tianyu Lan 
---
 drivers/hv/vmbus_drv.c |  4 
 drivers/scsi/storvsc_drv.c | 37 +
 include/linux/hyperv.h |  1 +
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..ae6ec503399a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "hyperv_vmbus.h"
 
@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2118,6 +2120,8 @@ int vmbus_device_register(struct hv_device 
*child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
+   child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
continue;
}
request = (struct storvsc_cmd_request 
*)scsi_cmd_priv(scmnd);
+   scsi_dma_unmap(scmnd);
}
 
storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
struct hv_host_device *host_dev = shost_priv(host);
struct hv_device *dev = host_dev->dev;
struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-   int i;
struct scatterlist *sgl;
unsigned int sg_count;
struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload_sz = sizeof(cmd_request->mpb);
 
if (sg_count) {
-   unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-   u64 hvpfn;
+   struct scatterlist *sg;
+   unsigned long hvpfn, hvpfns_to_add;
+   int j, i = 0;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   sg_count = scsi_dma_map(scmnd);
+   if (sg_count < 0)
+   return SCSI_MLQUEUE_DEVICE_BUSY;
 
-   for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+   for_each_sg(sgl, sg, sg_count, j) {
/*
-* Init values for the current sgl entry. hvpgoff
-* and hvpfns_to_add are in units of Hyper-V size
-* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-* case also handles values of sgl->offset that are
-* larger than PAGE_SIZE. Such offsets are handled
-* even on other than the first sgl entry, provided
-* they are a multiple of PAGE_SIZE.
+* Init values for the current sgl entry. hvpfns_to_add
+* is in units of Hyper-V size pages. Handling the
+* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+* values of sgl->offset that are larger than PAGE_SIZE.
+* Such offsets are handled even on other than the first
+* sgl entry, provided they are a multiple of PAGE_SIZE.
 */
-   hvpgoff = HVPFN_DOWN(sgl->offset);
-   

[PATCH V7 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM

2021-12-12 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VM requires bounce buffer support to copy
data from/to encrypted memory, so enable swiotlb force
mode to use the swiotlb bounce buffer for DMA transactions.

In Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via an extra address space which is above shared_gpa_boundary
(e.g. the 39-bit address line) reported by the Hyper-V CPUID
ISOLATION_CONFIG leaf. The accessed physical address will be the
original physical address plus shared_gpa_boundary. The
shared_gpa_boundary in the AMD SEV-SNP spec is called the virtual
top of memory (vTOM). Memory addresses below vTOM are automatically
treated as private while memory above vTOM is treated as shared.

The swiotlb bounce buffer code calls set_memory_decrypted()
to mark the bounce buffer visible to the host and maps it in the
extra address space via memremap(). Populate the shared_gpa_boundary
(vTOM) via the swiotlb_unencrypted_base variable.

The map function memremap() can't work at that early stage
(e.g. in ms_hyperv_init_platform()), so call
swiotlb_update_mem_attributes() in hyperv_init().

Signed-off-by: Tianyu Lan 
---
Change since v6:
* Fix compile error when swiotlb is not enabled.

Change since v4:
* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
  in the hyperv_init().

Change since v3:
* Add comment in pci-swiotlb-xen.c to explain why add
  dependency between hyperv_swiotlb_detect() and pci_
  xen_swiotlb_detect().
* Return directly when fails to allocate Hyper-V swiotlb
  buffer in the hyperv_iommu_swiotlb_init().
---
 arch/x86/hyperv/hv_init.c  | 12 
 arch/x86/kernel/cpu/mshyperv.c | 15 ++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 24f4a06ac46a..749906a8e068 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int hyperv_init_cpuhp;
 u64 hv_current_partition_id = ~0ull;
@@ -502,6 +503,17 @@ void __init hyperv_init(void)
 
/* Query the VMs extended capability once, so that it can be cached. */
hv_query_ext_cap(0);
+
+#ifdef CONFIG_SWIOTLB
+   /*
+* Swiotlb bounce buffer needs to be mapped in extra address
+* space. Map function doesn't work in the early place and so
+* call swiotlb_update_mem_attributes() here.
+*/
+   if (hv_is_isolation_supported())
+   swiotlb_update_mem_attributes();
+#endif
+
return;
 
 clean_guest_os_id:
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 4794b716ec79..e3a240c5e4f5 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -319,8 +320,20 @@ static void __init ms_hyperv_init_platform(void)
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
0x%x\n",
ms_hyperv.isolation_config_a, 
ms_hyperv.isolation_config_b);
 
-   if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+   if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
static_branch_enable(&isolation_type_snp);
+#ifdef CONFIG_SWIOTLB
+   swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
+#endif
+   }
+
+#ifdef CONFIG_SWIOTLB
+   /*
+* Enable swiotlb force mode in Isolation VM to
+* use swiotlb bounce buffer for dma transaction.
+*/
+   swiotlb_force = SWIOTLB_FORCE;
+#endif
}
 
if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
-- 
2.25.1



[PATCH V7 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-12 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides Isolation VM for confidential computing support and
guest memory is encrypted in it. Places checking cc_platform_has()
with the GUEST_MEM_ENCRYPT attribute should return true in an
Isolation VM; e.g., the swiotlb bounce buffer size needs to be
adjusted according to memory size in sev_setup_arch(). Add the
GUEST_MEM_ENCRYPT check for Hyper-V Isolation VM.
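
For illustration, the sev_setup_arch() call site mentioned above looks
roughly like this (a simplified sketch; the sizing constants follow
the existing SEV code, not this patch):

/* Sketch: with the Hyper-V check in cc_platform_has(), Isolation VMs
 * also take this branch and scale the swiotlb size with memory size. */
void __init sev_setup_arch_sketch(void)
{
	phys_addr_t total_mem = memblock_phys_mem_size();
	unsigned long size;

	if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
		return;

	/* ~6% of RAM, clamped between the default swiotlb size and 1G */
	size = clamp_val(total_mem * 6 / 100, IO_TLB_DEFAULT_SIZE, SZ_1G);
	swiotlb_adjust_size(size);
}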

Signed-off-by: Tianyu Lan 
---
Change since v6:
* Change the order in the cc_platform_has() and check sev first.

Change since v3:
* Change code style of checking GUEST_MEM attribute in the
  hyperv_cc_platform_has().
---
 arch/x86/kernel/cc_platform.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..6cb3a675e686 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,12 +59,19 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }
 
+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+   return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
+}
 
 bool cc_platform_has(enum cc_attr attr)
 {
if (sme_me_mask)
return amd_cc_platform_has(attr);
 
+   if (hv_is_isolation_supported())
+   return hyperv_cc_platform_has(attr);
+
return false;
 }
 EXPORT_SYMBOL_GPL(cc_platform_has);
-- 
2.25.1



[PATCH V7 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)

2021-12-12 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall and
the guest needs to call the new hvcall to mark memory visible to the
host before sharing memory with the host. For security, network/storage
stack memory should not be shared with the host, so bounce
buffers are required.

The vmbus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the vmbus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access the shared memory
via the extra address space which is specified by the Hyper-V CPUID
HYPERV_CPUID_ISOLATION_CONFIG leaf. The accessed physical address of
the shared memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for the netvsc/storvsc
drivers in Isolation VM.

Change since v6:
* Fix compile error in hv_init.c and mshyperv.c when swiotlb
  is not enabled.
* Change the order in the cc_platform_has() and check sev first. 

Change since v5:
* Modify "Swiotlb" to "swiotlb" in commit log.
* Remove CONFIG_HYPERV check in the hyperv_cc_platform_has()

Change since v4:
* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
  in the hyperv_init().

Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
  Move calling of set_memory_decrypted() back from
  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
  and rmem_swiotlb_device_init().
* Change code style of checking GUEST_MEM attribute in the
  hyperv_cc_platform_has().
* Add comment in pci-swiotlb-xen.c to explain why add
  dependency between hyperv_swiotlb_detect() and pci_
  xen_swiotlb_detect().
* Return directly when fails to allocate Hyper-V swiotlb
  buffer in the hyperv_iommu_swiotlb_init().

Change since v2:
* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add
  hv_map/unmap_memory() to map/umap netvsc rx/tx ring into extra
  address space.
* Leave mem->vaddr in swiotlb code with phys_to_virt(mem->start)
  when fail to remap swiotlb memory.

Change since v1:
* Add Hyper-V Isolation support check in the cc_platform_has()
  and return true for guest memory encrypt attr.
* Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (5):
  swiotlb: Add swiotlb bounce buffer remap function for HV IVM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyper-v: Enable swiotlb bounce buffer for Isolation VM
  scsi: storvsc: Add Isolation VM support for storvsc driver
  net: netvsc: Add Isolation VM support for netvsc driver

 arch/x86/hyperv/hv_init.c |  12 +++
 arch/x86/hyperv/ivm.c |  28 ++
 arch/x86/kernel/cc_platform.c |   8 ++
 arch/x86/kernel/cpu/mshyperv.c|  15 +++-
 drivers/hv/hv_common.c|  11 +++
 drivers/hv/vmbus_drv.c|   4 +
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c|  37 
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|   6 ++
 include/linux/swiotlb.h   |   6 ++
 kernel/dma/swiotlb.c  |  43 +-
 15 files changed, 294 insertions(+), 22 deletions(-)

-- 
2.25.1



[PATCH V7 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM

2021-12-12 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space which is above shared_gpa_boundary (e.g.
the 39-bit address line) reported by the Hyper-V CPUID
ISOLATION_CONFIG leaf. The accessed physical address will be the
original physical address plus shared_gpa_boundary. The
shared_gpa_boundary in the AMD SEV-SNP spec is called the virtual
top of memory (vTOM). Memory addresses below vTOM are automatically
treated as private while memory above vTOM is treated as shared.

Expose swiotlb_unencrypted_base for platforms to set the unencrypted
memory base offset; the platform calls swiotlb_update_mem_attributes()
to remap the swiotlb memory to the unencrypted address space.
memremap() cannot be called in the early stage, so put the remapping
code into swiotlb_update_mem_attributes(). Store the remap address and
use it to copy data from/to the swiotlb bounce buffer.
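
In sketch form, the bounce-path change this implies (simplified from
swiotlb_bounce(); the offset math is condensed):

/* Sketch: copy through mem->vaddr, which may point at the remapped
 * unencrypted alias instead of phys_to_virt(mem->start). */
static void swiotlb_bounce_sketch(struct io_tlb_mem *mem,
				  phys_addr_t tlb_addr, void *buf,
				  size_t size, bool to_device)
{
	unsigned char *vaddr = mem->vaddr + (tlb_addr - mem->start);

	if (to_device)
		memcpy(vaddr, buf, size);
	else
		memcpy(buf, vaddr, size);
}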

Acked-by: Christoph Hellwig 
Signed-off-by: Tianyu Lan 
---
Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
  Move calling of set_memory_decrypted() back from
  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
  and rmem_swiotlb_device_init().

Change since v2:
* Leave mem->vaddr with phys_to_virt(mem->start) when fail
  to remap swiotlb memory.

Change since v1:
* Rework comment in the swiotlb_init_io_tlb_mem()
* Make swiotlb_init_io_tlb_mem() back to return void.
---
 include/linux/swiotlb.h |  6 ++
 kernel/dma/swiotlb.c| 43 +++--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ * may be remapped in the memory encrypted case and store virtual
+ * address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..34e6ade4f73c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
 * Max segment that we can provide which (if pages are contiguous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
 
+/*
+ * Remap swiotlb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+   void *vaddr = NULL;
+
+   if (swiotlb_unencrypted_base) {
+   phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+   vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+   if (!vaddr)
+   pr_err("Failed to map the unencrypted memory %llx size 
%lx.\n",
+  paddr, bytes);
+   }
+
+   return vaddr;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void)
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+
+   mem->vaddr = swiotlb_mem_remap(mem, bytes);
+   if (!mem->vaddr)
+   mem->vaddr = vaddr;
+
+   memset(mem->vaddr, 0, bytes);
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->slots[i].orig_addr = INVALID

Re: [PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM

2021-12-10 Thread Tianyu Lan



On 12/10/2021 9:25 PM, Tianyu Lan wrote:

@@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
  pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
0x%x\n",
  ms_hyperv.isolation_config_a, 
ms_hyperv.isolation_config_b);


-    if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+    if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
  static_branch_enable(&isolation_type_snp);
+    swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
+    }
+
+    /*
+ * Enable swiotlb force mode in Isolation VM to
+ * use swiotlb bounce buffer for dma transaction.
+ */
+    swiotlb_force = SWIOTLB_FORCE;
I'm good with this approach that directly updates the swiotlb settings
here rather than in IOMMU initialization code.  It's a lot more
straightforward.


However, there's an issue if building for X86_32 without PAE, in that
the swiotlb module may not be built, resulting in compile and link
errors.  The swiotlb.h file needs to be updated to provide a stub
function for swiotlb_update_mem_attributes().  swiotlb_unencrypted_base
probably needs wrapper functions to get/set it, which can be stubs when
CONFIG_SWIOTLB is not set.  swiotlb_force is a bit of a mess in that it
already has a stub definition that assumes it will only be read, and
not set.  A bit of thinking will be needed to sort that out.


Is it OK to fix the issue by selecting swiotlb when CONFIG_HYPERV is
set?

Sorry, ignore the previous statement. This code doesn't depend on
CONFIG_HYPERV.

How about putting this code under #ifdef CONFIG_X86_64 or
CONFIG_SWIOTLB?
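
For reference, the stub pattern suggested above would look roughly
like this in swiotlb.h (a sketch only; the get/set wrapper name is
assumed here, not an existing kernel API):

#ifdef CONFIG_SWIOTLB
void __init swiotlb_update_mem_attributes(void);
void swiotlb_set_unencrypted_base(phys_addr_t base);
#else
/* Stubs so non-swiotlb builds (e.g. X86_32 without PAE) still link. */
static inline void swiotlb_update_mem_attributes(void) { }
static inline void swiotlb_set_unencrypted_base(phys_addr_t base) { }
#endif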

Re: [PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM

2021-12-10 Thread Tianyu Lan

On 12/10/2021 4:09 AM, Michael Kelley (LINUX) wrote:

@@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
0x%x\n",
ms_hyperv.isolation_config_a, 
ms_hyperv.isolation_config_b);

-   if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+   if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
static_branch_enable(&isolation_type_snp);
+   swiotlb_unencrypted_base = 
ms_hyperv.shared_gpa_boundary;
+   }
+
+   /*
+* Enable swiotlb force mode in Isolation VM to
+* use swiotlb bounce buffer for dma transaction.
+*/
+   swiotlb_force = SWIOTLB_FORCE;

I'm good with this approach that directly updates the swiotlb settings here

rather than in IOMMU initialization code.  It's a lot more straightforward.

However, there's an issue if building for X86_32 without PAE, in that the
swiotlb module may not be built, resulting in compile and link errors.  The
swiotlb.h file needs to be updated to provide a stub function for
swiotlb_update_mem_attributes().   swiotlb_unencrypted_base probably
needs wrapper functions to get/set it, which can be stubs when
CONFIG_SWIOTLB is not set.  swiotlb_force is a bit of a mess in that it already
has a stub definition that assumes it will only be read, and not set.  A bit of
thinking will be needed to sort that out.


Is it OK to fix the issue by selecting swiotlb when CONFIG_HYPERV is
set?




}

if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b823311eac79..1f037e114dc8 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, 
unsigned int len,
  int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
void (*block_invalidate)(void *context,
 u64 block_mask));
+#if IS_ENABLED(CONFIG_HYPERV)
+int __init hyperv_swiotlb_detect(void);
+#else
+static inline int __init hyperv_swiotlb_detect(void)
+{
+   return 0;
+}
+#endif

I don't think hyperv_swiotlb_detect() is used any longer, so this change
should be dropped.

Yes, will update.


Re: [PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-10 Thread Tianyu Lan

On 12/10/2021 4:38 AM, Michael Kelley (LINUX) wrote:

From: Tianyu Lan  Sent: Monday, December 6, 2021 11:56 PM


Hyper-V provides Isolation VM which has memory encryption support. Add
hyperv_cc_platform_has() and return true for the check of the
GUEST_MEM_ENCRYPT attribute.

Signed-off-by: Tianyu Lan 
---
Change since v3:
* Change code style of checking GUEST_MEM attribute in the
  hyperv_cc_platform_has().
---
  arch/x86/kernel/cc_platform.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..47db88c275d5 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
  #include 
  #include 

+#include 
  #include 

  static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,9 +59,16 @@ static bool amd_cc_platform_has(enum cc_attr attr)
  #endif
  }

+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+   return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
+}

  bool cc_platform_has(enum cc_attr attr)
  {
+   if (hv_is_isolation_supported())
+   return hyperv_cc_platform_has(attr);
+
if (sme_me_mask)
return amd_cc_platform_has(attr);



Throughout Linux kernel code, there are about 20 calls to cc_platform_has()
with CC_ATTR_GUEST_MEM_ENCRYPT as the argument.  The original code
(from v1 of this patch set) only dealt with the call in sev_setup_arch().   But
with this patch, all the other calls that previously returned "false" will now
return "true" in a Hyper-V Isolated VM.  I didn't try to analyze all these other
calls, so I think there's an open question about whether this is the behavior
we want.



CC_ATTR_GUEST_MEM_ENCRYPT has been for SEV support so far. Hyper-V
Isolation VM is based on SEV or software memory encryption, so most
checks can be reused. The difference is that the SEV code uses the
encryption bit in the page table to encrypt and decrypt memory while
Hyper-V uses vTOM. But the SEV memory encryption mask "sme_me_mask" is
unset in a Hyper-V Isolation VM, which claims SEV and SME are
unsupported. The rest of the checks for the memory encryption bit are
still safe. So reuse CC_ATTR_GUEST_MEM_ENCRYPT for Hyper-V.




Re: [PATCH V6 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-12-09 Thread Tianyu Lan




On 12/9/2021 4:00 PM, Long Li wrote:

@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host
*host, struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;

+   sg_count = scsi_dma_map(scmnd);
+   if (sg_count < 0)
+   return SCSI_MLQUEUE_DEVICE_BUSY;

Hi Tianyu,

This patch (and this patch series) unconditionally adds code for dealing with 
DMA addresses for all VMs, including non-isolation VMs.

Does this add performance penalty for VMs that don't require isolation?



Hi Long:
    scsi_dma_map() in a traditional VM just saves sg->offset into
sg->dma_address with no data copy, because the swiotlb bounce buffer
code isn't active. The data copy only takes place in an Isolation VM
where swiotlb_force is set. So there is no additional overhead in a
traditional VM.


Thanks.
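
Conceptually, the mapping path described above reduces to the
following sketch (condensed from the direct-mapping code of that era;
not a verbatim copy):

/* Sketch: without forced bouncing, the DMA address is just the
 * physical address and no copy happens. */
static dma_addr_t map_page_sketch(struct device *dev, struct page *page,
				  unsigned long offset, size_t size,
				  enum dma_data_direction dir)
{
	phys_addr_t phys = page_to_phys(page) + offset;

	if (is_swiotlb_force_bounce(dev))	/* Isolation VM case */
		return swiotlb_map(dev, phys, size, dir, 0);

	return phys_to_dma(dev, phys);		/* traditional VM: no copy */
}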


Re: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver

2021-12-09 Thread Tianyu Lan




On 12/9/2021 4:14 AM, Haiyang Zhang wrote:

From: Tianyu Lan 
Sent: Tuesday, December 7, 2021 2:56 AM
Subject: [PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver

From: Tianyu Lan 

In Isolation VM, all shared memory with the host needs to be marked
visible to the host via a hypercall. vmbus_establish_gpadl() has
already done that for the netvsc rx/tx ring buffers. The page buffers
used by vmbus_sendpacket_pagebuffer() still need to be handled. Use
the DMA API to map/unmap this memory while sending/receiving packets;
the Hyper-V swiotlb bounce buffer DMA address will be returned. The
swiotlb bounce buffer has already been marked visible to the host
during boot.

The rx/tx ring buffers are allocated via vzalloc() and need to be
mapped into the unencrypted address space (above vTOM) before being
shared with the host and accessed. Add hv_map/unmap_memory() to
map/unmap the rx/tx ring buffers.

Signed-off-by: Tianyu Lan 
---
Change since v3:
* Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn()
  with vmalloc_to_pfn() in the hv_map_memory()

Change since v2:
  * Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer.
---
  arch/x86/hyperv/ivm.c |  28 ++
  drivers/hv/hv_common.c|  11 +++
  drivers/net/hyperv/hyperv_net.h   |   5 ++
  drivers/net/hyperv/netvsc.c   | 136 +-
  drivers/net/hyperv/netvsc_drv.c   |   1 +
  drivers/net/hyperv/rndis_filter.c |   2 +
  include/asm-generic/mshyperv.h|   2 +
  include/linux/hyperv.h|   5 ++
  8 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 69c7a57f3307..2b994117581e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int 
pagecount,
bool visibl
kfree(pfn_array);
return ret;
  }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+   void *vaddr;
+   int i;
+
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / PAGE_SIZE; i++)
+   pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
+   (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+   vunmap(addr);
+}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 7be173a99f27..3c5cb1f70319 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, 
void *output,
u32 input_s
return HV_STATUS_INVALID_PARAMETER;
  }
  EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
+void __weak *hv_map_memory(void *addr, unsigned long size)
+{
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(hv_map_memory);
+
+void __weak hv_unmap_memory(void *addr)
+{
+}
+EXPORT_SYMBOL_GPL(hv_unmap_memory);
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..cf69da0e296c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
  };

  #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {

/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {

/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
  #define RETRY_US_HI   10000
  #define RETRY_MAX

Re: [PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-07 Thread Tianyu Lan

Hi Borislav:
Thanks for your review.

On 12/7/2021 5:47 PM, Borislav Petkov wrote:

On Tue, Dec 07, 2021 at 02:55:58AM -0500, Tianyu Lan wrote:

From: Tianyu Lan 

Hyper-V provides Isolation VM which has memory encryption support. Add
hyperv_cc_platform_has() and return true for the check of the
GUEST_MEM_ENCRYPT attribute.


You need to refresh on how to write commit messages - never say what the
patch is doing - that's visible in the diff itself. Rather, you should
talk about *why* it is doing what it is doing.


Sure. Will update.




  bool cc_platform_has(enum cc_attr attr)
  {
+   if (hv_is_isolation_supported())
+   return hyperv_cc_platform_has(attr);


Is there any reason for the hv_is_.. check to come before...



Do you mean to check hyper-v before sev? If yes, no special reason.



+
if (sme_me_mask)
return amd_cc_platform_has(attr);


... the sme_me_mask check?

What's in sme_me_mask on hyperv?


sme_me_mask is unset in this case.



[PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all shared memory with the host needs to be marked
visible to the host via a hypercall. vmbus_establish_gpadl() has
already done that for the netvsc rx/tx ring buffers. The page buffers
used by vmbus_sendpacket_pagebuffer() still need to be handled. Use
the DMA API to map/unmap this memory while sending/receiving packets;
the Hyper-V swiotlb bounce buffer DMA address will be returned. The
swiotlb bounce buffer has already been marked visible to the host
during boot.

The rx/tx ring buffers are allocated via vzalloc() and need to be
mapped into the unencrypted address space (above vTOM) before being
shared with the host and accessed. Add hv_map/unmap_memory() to
map/unmap the rx/tx ring buffers.

Signed-off-by: Tianyu Lan 
---
Change since v3:
   * Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn()
 with vmalloc_to_pfn() in the hv_map_memory()

Change since v2:
   * Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer.
---
 arch/x86/hyperv/ivm.c |  28 ++
 drivers/hv/hv_common.c|  11 +++
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|   5 ++
 8 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 69c7a57f3307..2b994117581e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int 
pagecount, bool visibl
kfree(pfn_array);
return ret;
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+   void *vaddr;
+   int i;
+
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / PAGE_SIZE; i++)
+   pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
+   (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+   vunmap(addr);
+}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 7be173a99f27..3c5cb1f70319 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, 
void *output, u32 input_s
return HV_STATUS_INVALID_PARAMETER;
 }
 EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
+void __weak *hv_map_memory(void *addr, unsigned long size)
+{
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(hv_map_memory);
+
+void __weak hv_unmap_memory(void *addr)
+{
+}
+EXPORT_SYMBOL_GPL(hv_unmap_memory);
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..cf69da0e296c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI	10000
 #define RETRY_MAX	2000	/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..b7ade735a806 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   hv_unmap_memory(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   hv_unmap_memory(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }

[PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VM requires bounce buffer support to copy
data from/to encrypted memory, so enable swiotlb force
mode to use the swiotlb bounce buffer for DMA transactions.

In Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via an extra address space which is above shared_gpa_boundary
(e.g. the 39-bit address line) reported by the Hyper-V CPUID
ISOLATION_CONFIG leaf. The accessed physical address will be the
original physical address plus shared_gpa_boundary. The
shared_gpa_boundary in the AMD SEV-SNP spec is called the virtual
top of memory (vTOM). Memory addresses below vTOM are automatically
treated as private while memory above vTOM is treated as shared.

The swiotlb bounce buffer code calls set_memory_decrypted()
to mark the bounce buffer visible to the host and maps it in the
extra address space via memremap(). Populate the shared_gpa_boundary
(vTOM) via the swiotlb_unencrypted_base variable.

The map function memremap() can't work at that early stage
(e.g. in ms_hyperv_init_platform()), so call
swiotlb_update_mem_attributes() in hyperv_init().

Signed-off-by: Tianyu Lan 
---
Change since v4:
* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
  in the hyperv_init().

Change since v3:
* Add comment in pci-swiotlb-xen.c to explain why add
  dependency between hyperv_swiotlb_detect() and pci_
  xen_swiotlb_detect().
* Return directly when fails to allocate Hyper-V swiotlb
  buffer in the hyperv_iommu_swiotlb_init().
---
 arch/x86/hyperv/hv_init.c  | 10 ++
 arch/x86/kernel/cpu/mshyperv.c | 11 ++-
 include/linux/hyperv.h |  8 
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 24f4a06ac46a..9e18a280f89d 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int hyperv_init_cpuhp;
 u64 hv_current_partition_id = ~0ull;
@@ -502,6 +503,15 @@ void __init hyperv_init(void)
 
/* Query the VMs extended capability once, so that it can be cached. */
hv_query_ext_cap(0);
+
+   /*
+* Swiotlb bounce buffer needs to be mapped in extra address
+* space. Map function doesn't work in the early place and so
+* call swiotlb_update_mem_attributes() here.
+*/
+   if (hv_is_isolation_supported())
+   swiotlb_update_mem_attributes();
+
return;
 
 clean_guest_os_id:
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 4794b716ec79..baf3a0873552 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
0x%x\n",
ms_hyperv.isolation_config_a, 
ms_hyperv.isolation_config_b);
 
-   if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+   if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
static_branch_enable(&isolation_type_snp);
+   swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
+   }
+
+   /*
+* Enable swiotlb force mode in Isolation VM to
+* use swiotlb bounce buffer for dma transaction.
+*/
+   swiotlb_force = SWIOTLB_FORCE;
}
 
if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b823311eac79..1f037e114dc8 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, 
unsigned int len,
 int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
void (*block_invalidate)(void *context,
 u64 block_mask));
+#if IS_ENABLED(CONFIG_HYPERV)
+int __init hyperv_swiotlb_detect(void);
+#else
+static inline int __init hyperv_swiotlb_detect(void)
+{
+   return 0;
+}
+#endif
 
 struct hyperv_pci_block_ops {
int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
-- 
2.25.1



[PATCH V6 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all shared memory with the host needs to be marked
visible to the host via a hypercall. vmbus_establish_gpadl() has
already done that for the storvsc rx/tx ring buffers. The page buffers
used by vmbus_sendpacket_mpb_desc() still need to be handled. Use the
DMA API (scsi_dma_map/unmap) to map this memory while sending/receiving
packets and return the swiotlb bounce buffer DMA address. In Isolation
VM, the swiotlb bounce buffer is marked visible to the host and the
swiotlb force mode is enabled.

Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 in order
to keep the original data offset in the bounce buffer.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/vmbus_drv.c |  4 
 drivers/scsi/storvsc_drv.c | 37 +
 include/linux/hyperv.h |  1 +
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..ae6ec503399a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "hyperv_vmbus.h"
 
@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2118,6 +2120,8 @@ int vmbus_device_register(struct hv_device 
*child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
+   child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
continue;
}
request = (struct storvsc_cmd_request 
*)scsi_cmd_priv(scmnd);
+   scsi_dma_unmap(scmnd);
}
 
storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
struct hv_host_device *host_dev = shost_priv(host);
struct hv_device *dev = host_dev->dev;
struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-   int i;
struct scatterlist *sgl;
unsigned int sg_count;
struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload_sz = sizeof(cmd_request->mpb);
 
if (sg_count) {
-   unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-   u64 hvpfn;
+   struct scatterlist *sg;
+   unsigned long hvpfn, hvpfns_to_add;
+   int j, i = 0;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   sg_count = scsi_dma_map(scmnd);
+   if (sg_count < 0)
+   return SCSI_MLQUEUE_DEVICE_BUSY;
 
-   for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+   for_each_sg(sgl, sg, sg_count, j) {
/*
-* Init values for the current sgl entry. hvpgoff
-* and hvpfns_to_add are in units of Hyper-V size
-* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-* case also handles values of sgl->offset that are
-* larger than PAGE_SIZE. Such offsets are handled
-* even on other than the first sgl entry, provided
-* they are a multiple of PAGE_SIZE.
+* Init values for the current sgl entry. hvpfns_to_add
+* is in units of Hyper-V size pages. Handling the
+* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+* values of sgl->offset that are larger than PAGE_SIZE.
+* Such offsets are handled even on other than the first
+* sgl entry, provided they are a multiple of PAGE_SIZE.
 */
-   hvpgoff = HVPFN_DOWN(sgl->offset);
-   

[PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides Isolation VM which has memory encryption support. Add
hyperv_cc_platform_has() and return true for the check of the
GUEST_MEM_ENCRYPT attribute.

Signed-off-by: Tianyu Lan 
---
Change since v3:
* Change code style of checking GUEST_MEM attribute in the
  hyperv_cc_platform_has().
---
 arch/x86/kernel/cc_platform.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..47db88c275d5 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,9 +59,16 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }
 
+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+   return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
+}
 
 bool cc_platform_has(enum cc_attr attr)
 {
+   if (hv_is_isolation_supported())
+   return hyperv_cc_platform_has(attr);
+
if (sme_me_mask)
return amd_cc_platform_has(attr);
 
-- 
2.25.1



[PATCH V6 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space which is above shared_gpa_boundary (e.g.
the 39-bit address line) reported by the Hyper-V CPUID
ISOLATION_CONFIG leaf. The accessed physical address will be the
original physical address plus shared_gpa_boundary. The
shared_gpa_boundary in the AMD SEV-SNP spec is called the virtual
top of memory (vTOM). Memory addresses below vTOM are automatically
treated as private while memory above vTOM is treated as shared.

Expose swiotlb_unencrypted_base for platforms to set the unencrypted
memory base offset; the platform calls swiotlb_update_mem_attributes()
to remap the swiotlb memory to the unencrypted address space.
memremap() cannot be called in the early stage, so put the remapping
code into swiotlb_update_mem_attributes(). Store the remap address and
use it to copy data from/to the swiotlb bounce buffer.

Acked-by: Christoph Hellwig 
Signed-off-by: Tianyu Lan 
---
Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
  Move calling of set_memory_decrypted() back from
  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
  and rmem_swiotlb_device_init().

Change since v2:
* Leave mem->vaddr with phys_to_virt(mem->start) when fail
  to remap swiotlb memory.

Change since v1:
* Rework comment in the swiotlb_init_io_tlb_mem()
* Make swiotlb_init_io_tlb_mem() back to return void.
---
 include/linux/swiotlb.h |  6 ++
 kernel/dma/swiotlb.c| 43 +++--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ * may be remapped in the memory encrypted case and store virtual
+ * address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..34e6ade4f73c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
 * Max segment that we can provide which (if pages are contiguous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
 
+/*
+ * Remap swiotlb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+   void *vaddr = NULL;
+
+   if (swiotlb_unencrypted_base) {
+   phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+   vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+   if (!vaddr)
+   pr_err("Failed to map the unencrypted memory %llx size 
%lx.\n",
+  paddr, bytes);
+   }
+
+   return vaddr;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void)
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+
+   mem->vaddr = swiotlb_mem_remap(mem, bytes);
+   if (!mem->vaddr)
+   mem->vaddr = vaddr;
+
+   memset(mem->vaddr, 0, bytes);
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->slots[i].orig_addr = INVALID

[PATCH V6 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall and
the guest needs to call the new hvcall to mark memory visible to the
host before sharing memory with the host. For security, network/storage
stack memory should not be shared with the host, so bounce
buffers are required.

The vmbus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the vmbus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access the shared memory
via the extra address space which is specified by the Hyper-V CPUID
HYPERV_CPUID_ISOLATION_CONFIG leaf. The accessed physical address of
the shared memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for netvsc/storvsc
in Isolation VM.

This version follows Michael Kelley's suggestion in the following link.
https://lkml.org/lkml/2021/11/24/2044

Change since v5:
* Modify "Swiotlb" to "swiotlb" in commit log.
* Remove CONFIG_HYPERV check in the hyperv_cc_platform_has()

Change since v4:
* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
  in the hyperv_init().

Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
  Move calling of set_memory_decrypted() back from
  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
  and rmem_swiotlb_device_init().
* Change code style of checking GUEST_MEM attribute in the
  hyperv_cc_platform_has().
* Add comment in pci-swiotlb-xen.c to explain why add
  dependency between hyperv_swiotlb_detect() and pci_
  xen_swiotlb_detect().
* Return directly when fails to allocate Hyper-V swiotlb
  buffer in the hyperv_iommu_swiotlb_init().

Change since v2:
* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add
  hv_map/unmap_memory() to map/umap netvsc rx/tx ring into extra
  address space.
* Leave mem->vaddr in swiotlb code with phys_to_virt(mem->start)
  when fail to remap swiotlb memory.

Change since v1:
* Add Hyper-V Isolation support check in the cc_platform_has()
  and return true for guest memory encrypt attr.
* Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (5):
  swiotlb: Add swiotlb bounce buffer remap function for HV IVM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyper-v: Enable swiotlb bounce buffer for Isolation VM
  scsi: storvsc: Add Isolation VM support for storvsc driver
  net: netvsc: Add Isolation VM support for netvsc driver

 arch/x86/hyperv/hv_init.c |  10 +++
 arch/x86/hyperv/ivm.c |  28 ++
 arch/x86/kernel/cc_platform.c |   8 ++
 arch/x86/kernel/cpu/mshyperv.c|  11 ++-
 drivers/hv/hv_common.c|  11 +++
 drivers/hv/vmbus_drv.c|   4 +
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c|  37 
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|  14 +++
 include/linux/swiotlb.h   |   6 ++
 kernel/dma/swiotlb.c  |  43 +-
 15 files changed, 296 insertions(+), 22 deletions(-)

-- 
2.25.1



[PATCH V6 5/5] net: netvsc: Add Isolation VM support for netvsc driver

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all shared memory with host needs to mark visible
to host via hvcall. vmbus_establish_gpadl() has already done it for
netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
pagebuffer() stills need to be handled. Use DMA API to map/umap
these memory during sending/receiving packet and Hyper-V swiotlb
bounce buffer dma address will be returned. The swiotlb bounce buffer
has been masked to be visible to host during boot up.

rx/tx ring buffer is allocated via vzalloc() and they need to be
mapped into unencrypted address space(above vTOM) before sharing
with host and accessing. Add hv_map/unmap_memory() to map/umap rx
/tx ring buffer.

Signed-off-by: Tianyu Lan 
---
Change since v3:
   * Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn()
 with vmalloc_to_pfn() in the hv_map_memory()

Change since v2:
   * Add hv_map/unmap_memory() to map/umap rx/tx ring buffer.
---
 arch/x86/hyperv/ivm.c |  28 ++
 drivers/hv/hv_common.c|  11 +++
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|   5 ++
 8 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 69c7a57f3307..2b994117581e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int 
pagecount, bool visibl
kfree(pfn_array);
return ret;
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+   void *vaddr;
+   int i;
+
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / PAGE_SIZE; i++)
+   pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
+   (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+   vunmap(addr);
+}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 7be173a99f27..3c5cb1f70319 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, 
void *output, u32 input_s
return HV_STATUS_INVALID_PARAMETER;
 }
 EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
+void __weak *hv_map_memory(void *addr, unsigned long size)
+{
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(hv_map_memory);
+
+void __weak hv_unmap_memory(void *addr)
+{
+}
+EXPORT_SYMBOL_GPL(hv_unmap_memory);
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..cf69da0e296c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX  2000/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..b7ade735a806 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   hv_unmap_memory(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   hv_unmap_memory(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }

[PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VMs require bounce buffer support to copy data
from/to encrypted memory, so enable swiotlb force mode to use the
swiotlb bounce buffer for DMA transactions.

In Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space above shared_gpa_boundary (e.g. the 39th
address bit) reported by the Hyper-V ISOLATION_CONFIG CPUID leaf. The
access physical address is the original physical address plus
shared_gpa_boundary. In the AMD SEV-SNP spec, shared_gpa_boundary is
called the virtual top of memory (vTOM). Memory addresses below vTOM
are automatically treated as private while memory above vTOM is
treated as shared.

The swiotlb bounce buffer code calls set_memory_decrypted() to mark
the bounce buffer visible to the host and maps it into the extra
address space via memremap(). Populate the shared_gpa_boundary (vTOM)
via the swiotlb_unencrypted_base variable.

memremap() can't be called that early in boot (e.g. in
ms_hyperv_init_platform()), so call swiotlb_update_mem_attributes()
in hyperv_init() instead.
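
For illustration only (the helper is hypothetical), the shared alias
described above is plain address arithmetic:

#include <asm/mshyperv.h>

/* Sketch: physical addresses gain a shared (host-visible) alias above
 * vTOM; ms_hyperv.shared_gpa_boundary comes from the
 * ISOLATION_CONFIG CPUID leaf. */
static inline phys_addr_t hv_shared_alias(phys_addr_t paddr)
{
	return paddr + ms_hyperv.shared_gpa_boundary;
}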

Signed-off-by: Tianyu Lan 
---
Change since v4:
* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
  in the hyperv_init().

Change since v3:
* Add comment in pci-swiotlb-xen.c to explain why add
  dependency between hyperv_swiotlb_detect() and pci_
  xen_swiotlb_detect().
* Return directly when fails to allocate Hyper-V swiotlb
  buffer in the hyperv_iommu_swiotlb_init().
---
 arch/x86/hyperv/hv_init.c  | 10 ++
 arch/x86/kernel/cpu/mshyperv.c | 11 ++-
 include/linux/hyperv.h |  8 
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 24f4a06ac46a..9e18a280f89d 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int hyperv_init_cpuhp;
 u64 hv_current_partition_id = ~0ull;
@@ -502,6 +503,15 @@ void __init hyperv_init(void)
 
/* Query the VMs extended capability once, so that it can be cached. */
hv_query_ext_cap(0);
+
+   /*
+* Swiotlb bounce buffer needs to be mapped in extra address
+* space. Map function doesn't work in the early place and so
+* call swiotlb_update_mem_attributes() here.
+*/
+   if (hv_is_isolation_supported())
+   swiotlb_update_mem_attributes();
+
return;
 
 clean_guest_os_id:
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 4794b716ec79..baf3a0873552 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
0x%x\n",
ms_hyperv.isolation_config_a, 
ms_hyperv.isolation_config_b);
 
-   if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+   if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
static_branch_enable(&isolation_type_snp);
+   swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
+   }
+
+   /*
+* Enable swiotlb force mode in Isolation VM to
+* use swiotlb bounce buffer for dma transaction.
+*/
+   swiotlb_force = SWIOTLB_FORCE;
}
 
if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b823311eac79..1f037e114dc8 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, 
unsigned int len,
 int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
void (*block_invalidate)(void *context,
 u64 block_mask));
+#if IS_ENABLED(CONFIG_HYPERV)
+int __init hyperv_swiotlb_detect(void);
+#else
+static inline int __init hyperv_swiotlb_detect(void)
+{
+   return 0;
+}
+#endif
 
 struct hyperv_pci_block_ops {
int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V6 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides Isolation VMs, which support memory encryption. Add
hyperv_cc_platform_has(), returning true for the GUEST_MEM_ENCRYPT
attribute check.
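
A hedged caller-side sketch (the wrapper is hypothetical): generic
code keys off the attribute rather than the platform:

#include <linux/cc_platform.h>

/* Sketch: true on Hyper-V Isolation VMs after this patch, as it
 * already is for SEV guests, so callers fall back to bounce
 * buffering. */
static bool need_bounce_buffers(void)
{
	return cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT);
}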

Signed-off-by: Tianyu Lan 
---
Change since v3:
* Change code style of checking GUEST_MEM attribute in the
  hyperv_cc_platform_has().
---
 arch/x86/kernel/cc_platform.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..47db88c275d5 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,9 +59,16 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }
 
+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+   return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
+}
 
 bool cc_platform_has(enum cc_attr attr)
 {
+   if (hv_is_isolation_supported())
+   return hyperv_cc_platform_has(attr);
+
if (sme_me_mask)
return amd_cc_platform_has(attr);
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V6 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hvcall. vmbus_establish_gpadl() has already
done this for the storvsc rx/tx ring buffers. The page buffers used
by vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA
API (scsi_dma_map/unmap) to map this memory when sending/receiving
packets and return the swiotlb bounce buffer DMA address. In
Isolation VM, the swiotlb bounce buffer is marked visible to the host
and swiotlb force mode is enabled.

Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 in order
to keep the original data offset in the bounce buffer.
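
A minimal sketch of that setting (assuming 'dev' is the struct device
the SCSI layer maps through; the wrapper name is made up):

#include <linux/dma-mapping.h>
#include <asm/hyperv-tlfs.h>

/* Sketch: keep the intra-Hyper-V-page offset intact when swiotlb
 * picks a bounce slot for this device's mappings, so the page-buffer
 * offset/length math above stays valid. */
static void storvsc_set_dma_align(struct device *dev)
{
	dma_set_min_align_mask(dev, HV_HYP_PAGE_SIZE - 1);
}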

Signed-off-by: Tianyu Lan 
---
 drivers/hv/vmbus_drv.c |  4 
 drivers/scsi/storvsc_drv.c | 37 +
 include/linux/hyperv.h |  1 +
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..ae6ec503399a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "hyperv_vmbus.h"
 
@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2118,6 +2120,8 @@ int vmbus_device_register(struct hv_device 
*child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
+   child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
continue;
}
request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
+   scsi_dma_unmap(scmnd);
}
 
storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
struct hv_host_device *host_dev = shost_priv(host);
struct hv_device *dev = host_dev->dev;
struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-   int i;
struct scatterlist *sgl;
unsigned int sg_count;
struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload_sz = sizeof(cmd_request->mpb);
 
if (sg_count) {
-   unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-   u64 hvpfn;
+   struct scatterlist *sg;
+   unsigned long hvpfn, hvpfns_to_add;
+   int j, i = 0;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   sg_count = scsi_dma_map(scmnd);
+   if (sg_count < 0)
+   return SCSI_MLQUEUE_DEVICE_BUSY;
 
-   for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+   for_each_sg(sgl, sg, sg_count, j) {
/*
-* Init values for the current sgl entry. hvpgoff
-* and hvpfns_to_add are in units of Hyper-V size
-* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-* case also handles values of sgl->offset that are
-* larger than PAGE_SIZE. Such offsets are handled
-* even on other than the first sgl entry, provided
-* they are a multiple of PAGE_SIZE.
+* Init values for the current sgl entry. hvpfns_to_add
+* is in units of Hyper-V size pages. Handling the
+* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+* values of sgl->offset that are larger than PAGE_SIZE.
+* Such offsets are handled even on other than the first
+* sgl entry, provided they are a multiple of PAGE_SIZE.
 */
-   hvpgoff = HVPFN_DOWN(sgl->offset);
-   

[PATCH V6 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space above shared_gpa_boundary (e.g. the 39th
address bit) reported by the Hyper-V ISOLATION_CONFIG CPUID leaf. The
access physical address is the original physical address plus
shared_gpa_boundary. In the AMD SEV-SNP spec, shared_gpa_boundary is
called the virtual top of memory (vTOM). Memory addresses below vTOM
are automatically treated as private while memory above vTOM is
treated as shared.

Expose swiotlb_unencrypted_base for platforms to set the unencrypted
memory base offset; the platform then calls
swiotlb_update_mem_attributes() to remap the swiotlb memory to the
unencrypted address space. memremap() can not be called in the early
boot stage, so put the remapping code into
swiotlb_update_mem_attributes(). Store the remapped address and use
it to copy data from/to the swiotlb bounce buffer.
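
A minimal sketch of the remap rule this introduces (the standalone
helper is illustrative; the real logic lives in
swiotlb_update_mem_attributes() in the diff below):

#include <linux/io.h>
#include <linux/swiotlb.h>

/* Sketch: with swiotlb_unencrypted_base unset the pool keeps using
 * phys_to_virt(start); when it is set, bounce copies must go through
 * a mapping of the shared alias instead. */
static void *remap_swiotlb_unencrypted(phys_addr_t start,
				       unsigned long bytes)
{
	if (!swiotlb_unencrypted_base)
		return NULL;

	return memremap(start + swiotlb_unencrypted_base, bytes,
			MEMREMAP_WB);
}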

Acked-by: Christoph Hellwig 
Signed-off-by: Tianyu Lan 
---
Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
  Move calling of set_memory_decrypted() back from
  swiotlb_init_io_tlb_mem() to swiotlb_late_init_with_tbl()
  and rmem_swiotlb_device_init().

Change since v2:
* Leave mem->vaddr with phys_to_virt(mem->start) when fail
  to remap swiotlb memory.

Change since v1:
* Rework comment in the swiotlb_init_io_tlb_mem()
* Make swiotlb_init_io_tlb_mem() back to return void.
---
 include/linux/swiotlb.h |  6 ++
 kernel/dma/swiotlb.c| 43 +++--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ * may be remapped in the memory encrypted case and store virtual
+ * address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..34e6ade4f73c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
  * Max segment that we can provide which (if pages are contingous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
 
+/*
+ * Remap swioltb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+   void *vaddr = NULL;
+
+   if (swiotlb_unencrypted_base) {
+   phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+   vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+   if (!vaddr)
+   pr_err("Failed to map the unencrypted memory %llx size 
%lx.\n",
+  paddr, bytes);
+   }
+
+   return vaddr;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void)
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+
+   mem->vaddr = swiotlb_mem_remap(mem, bytes);
+   if (!mem->vaddr)
+   mem->vaddr = vaddr;
+
+   memset(mem->vaddr, 0, bytes);
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;

[PATCH V6 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and
the guest needs to call it to mark memory visible to the host before
sharing that memory with the host. For security, network/storage
stack memory should not be shared with the host, hence the need for
bounce buffering.

The VMBus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the VMBus channel ring buffer visible; a
sketch of that step follows.
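
As a hedged illustration (wrapper and argument names assumed; error
handling elided), establishing a GPADL is what flips a ring buffer to
host-visible in an Isolation VM:

#include <linux/hyperv.h>

/* Sketch: vmbus_establish_gpadl() internally issues the
 * host-visibility hvcall for the ring buffer pages. */
static int share_ring(struct vmbus_channel *chan, void *ring,
		      u32 ring_size, struct vmbus_gpadl *gpadl)
{
	return vmbus_establish_gpadl(chan, ring, ring_size, gpadl);
}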

For an SNP Isolation VM, the guest needs to access the shared memory
via an extra address space, which is specified by the Hyper-V CPUID
leaf HYPERV_CPUID_ISOLATION_CONFIG. The access physical address of
the shared memory is the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for netvsc/storvsc in
Isolation VM.

This version follows Michael Kelley suggestion in the following link.
https://lkml.org/lkml/2021/11/24/2044

Change since v5:
* Modify "Swiotlb" to "swiotlb" in commit log.
* Remove CONFIG_HYPERV check in the hyperv_cc_platform_has()

Change since v4:
* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
  in the hyperv_init().

Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
  Move calling of set_memory_decrypted() back from
  swiotlb_init_io_tlb_mem() to swiotlb_late_init_with_tbl()
  and rmem_swiotlb_device_init().
* Change code style of checking GUEST_MEM attribute in the
  hyperv_cc_platform_has().
* Add comment in pci-swiotlb-xen.c to explain why add
  dependency between hyperv_swiotlb_detect() and pci_
  xen_swiotlb_detect().
* Return directly when fails to allocate Hyper-V swiotlb
  buffer in the hyperv_iommu_swiotlb_init().

Change since v2:
* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add
  hv_map/unmap_memory() to map/unmap netvsc rx/tx ring into extra
  address space.
* Leave mem->vaddr in swiotlb code with phys_to_virt(mem->start)
  when fail to remap swiotlb memory.

Change since v1:
* Add Hyper-V Isolation support check in the cc_platform_has()
  and return true for guest memory encrypt attr.
* Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (5):
  swiotlb: Add swiotlb bounce buffer remap function for HV IVM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyper-v: Enable swiotlb bounce buffer for Isolation VM
  scsi: storvsc: Add Isolation VM support for storvsc driver
  net: netvsc: Add Isolation VM support for netvsc driver

 arch/x86/hyperv/hv_init.c |  10 +++
 arch/x86/hyperv/ivm.c |  28 ++
 arch/x86/kernel/cc_platform.c |   8 ++
 arch/x86/kernel/cpu/mshyperv.c|  11 ++-
 drivers/hv/hv_common.c|  11 +++
 drivers/hv/vmbus_drv.c|   4 +
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c|  37 
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|  14 +++
 include/linux/swiotlb.h   |   6 ++
 kernel/dma/swiotlb.c  |  43 +-
 15 files changed, 296 insertions(+), 22 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-12-06 Thread Tianyu Lan




On 12/5/2021 6:31 PM, Juergen Gross wrote:

On 05.12.21 09:48, Tianyu Lan wrote:



On 12/5/2021 4:34 PM, Juergen Gross wrote:

On 05.12.21 09:18, Tianyu Lan wrote:

From: Tianyu Lan 

hyperv Isolation VM requires bounce buffer support to copy
data from/to encrypted memory and so enable swiotlb force
mode to use swiotlb bounce buffer for DMA transaction.

In Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via extra address space which is above shared_gpa_boundary
(E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
The access physical address will be original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
spec is called virtual top of memory(vTOM). Memory addresses below
vTOM are automatically treated as private while memory above
vTOM is treated as shared.

Hyper-V initalizes swiotlb bounce buffer and default swiotlb
needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override
the setting, hyperv_swiotlb_detect() needs to run before
these detect functions which depends on the pci_xen_swiotlb_
init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb
_detect() to keep the order.


Why? Does Hyper-V plan to support Xen PV guests? If not, I don't see
the need for adding this change.



This is to keep the detect function calling order: the Hyper-V detect
callback needs to run before pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb(). This is the same reason why
pci_swiotlb_detect_override() needs to depend on
pci_xen_swiotlb_detect(). Hyper-V has the same requirement, so make
the Xen detect callback depend on the Hyper-V one.


And does this even work without CONFIG_SWIOTLB_XEN, i.e. without
pci_xen_swiotlb_detect() being in the system?


Hi Juergen:
	Thanks for your review. This is an issue, and I just sent out a
v5 which decouples the dependency between Xen and Hyper-V.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-12-06 Thread Tianyu Lan

On 12/6/2021 10:09 PM, Christoph Hellwig wrote:

Please spell swiotlb with a lower case s.  Otherwise this look good

Acked-by: Christoph Hellwig 

Feel free to carry this in whatever tree is suitable for the rest of the
patches.



Sure. Thanks for your ack; I will update "swiotlb" in the next version.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-06 Thread Tianyu Lan

Hi Christoph:
Thanks for your review.

On 12/6/2021 10:06 PM, Christoph Hellwig wrote:

On Sun, Dec 05, 2021 at 03:18:10AM -0500, Tianyu Lan wrote:

+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_HYPERV
+   return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
+#else
+   return false;
+#endif
+}


Can we even end up here without CONFIG_HYPERV?



Yes, I will update in the next version.

Thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V5 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hvcall. vmbus_establish_gpadl() has already
done this for the storvsc rx/tx ring buffers. The page buffers used
by vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA
API (scsi_dma_map/unmap) to map this memory during sending/receiving
packets and return the swiotlb bounce buffer DMA address. In
Isolation VM, the swiotlb bounce buffer is marked visible to the host
and swiotlb force mode is enabled.

Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 in order
to keep the original data offset in the bounce buffer.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/vmbus_drv.c |  4 
 drivers/scsi/storvsc_drv.c | 37 +
 include/linux/hyperv.h |  1 +
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..ae6ec503399a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "hyperv_vmbus.h"
 
@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2118,6 +2120,8 @@ int vmbus_device_register(struct hv_device 
*child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
+   child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
continue;
}
request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
+   scsi_dma_unmap(scmnd);
}
 
storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
struct hv_host_device *host_dev = shost_priv(host);
struct hv_device *dev = host_dev->dev;
struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-   int i;
struct scatterlist *sgl;
unsigned int sg_count;
struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload_sz = sizeof(cmd_request->mpb);
 
if (sg_count) {
-   unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-   u64 hvpfn;
+   struct scatterlist *sg;
+   unsigned long hvpfn, hvpfns_to_add;
+   int j, i = 0;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   sg_count = scsi_dma_map(scmnd);
+   if (sg_count < 0)
+   return SCSI_MLQUEUE_DEVICE_BUSY;
 
-   for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+   for_each_sg(sgl, sg, sg_count, j) {
/*
-* Init values for the current sgl entry. hvpgoff
-* and hvpfns_to_add are in units of Hyper-V size
-* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-* case also handles values of sgl->offset that are
-* larger than PAGE_SIZE. Such offsets are handled
-* even on other than the first sgl entry, provided
-* they are a multiple of PAGE_SIZE.
+* Init values for the current sgl entry. hvpfns_to_add
+* is in units of Hyper-V size pages. Handling the
+* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+* values of sgl->offset that are larger than PAGE_SIZE.
+* Such offsets are handled even on other than the first
+* sgl entry, provided they are a multiple of PAGE_SIZE.
 */
-   hvpgoff = HVPFN_DOWN(sgl->offset);
-   

[PATCH V5 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides Isolation VMs, which support memory encryption. Add
hyperv_cc_platform_has(), returning true for the GUEST_MEM_ENCRYPT
attribute check.

Signed-off-by: Tianyu Lan 
---
Change since v3:
* Change code style of checking GUEST_MEM attribute in the
  hyperv_cc_platform_has().
---
 arch/x86/kernel/cc_platform.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..7b66793c0f25 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,9 +59,20 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }
 
+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+#if IS_ENABLED(CONFIG_HYPERV)
+   return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
+#else
+   return false;
+#endif
+}
 
 bool cc_platform_has(enum cc_attr attr)
 {
+   if (hv_is_isolation_supported())
+   return hyperv_cc_platform_has(attr);
+
if (sme_me_mask)
return amd_cc_platform_has(attr);
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V5 5/5] net: netvsc: Add Isolation VM support for netvsc driver

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hvcall. vmbus_establish_gpadl() has already
done this for the netvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA
API to map/unmap this memory when sending/receiving packets; the
Hyper-V swiotlb bounce buffer DMA address will be returned. The
swiotlb bounce buffer has been marked visible to the host during boot.

The rx/tx ring buffers are allocated via vzalloc() and need to be
mapped into the unencrypted address space (above vTOM) before being
shared with the host and accessed. Add hv_map/unmap_memory() to
map/unmap the rx/tx ring buffers.

Signed-off-by: Tianyu Lan 
---
Change since v3:
   * Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn()
 with vmalloc_to_pfn() in the hv_map_memory()

Change since v2:
   * Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer.
---
 arch/x86/hyperv/ivm.c |  28 ++
 drivers/hv/hv_common.c|  11 +++
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|   5 ++
 8 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 69c7a57f3307..2b994117581e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int 
pagecount, bool visibl
kfree(pfn_array);
return ret;
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+   void *vaddr;
+   int i;
+
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / PAGE_SIZE; i++)
+   pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
+   (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+   vunmap(addr);
+}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 7be173a99f27..3c5cb1f70319 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, 
void *output, u32 input_s
return HV_STATUS_INVALID_PARAMETER;
 }
 EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
+void __weak *hv_map_memory(void *addr, unsigned long size)
+{
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(hv_map_memory);
+
+void __weak hv_unmap_memory(void *addr)
+{
+}
+EXPORT_SYMBOL_GPL(hv_unmap_memory);
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..cf69da0e296c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX  2000/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..b7ade735a806 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   hv_unmap_memory(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   hv_unmap_memory(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }

[PATCH V5 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VMs require bounce buffer support to copy data
from/to encrypted memory, so enable swiotlb force mode to use the
swiotlb bounce buffer for DMA transactions.

In Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space above shared_gpa_boundary (e.g. the 39th
address bit) reported by the Hyper-V ISOLATION_CONFIG CPUID leaf. The
access physical address is the original physical address plus
shared_gpa_boundary. In the AMD SEV-SNP spec, shared_gpa_boundary is
called the virtual top of memory (vTOM). Memory addresses below vTOM
are automatically treated as private while memory above vTOM is
treated as shared.

The swiotlb bounce buffer code calls set_memory_decrypted() to mark
the bounce buffer visible to the host and maps it into the extra
address space via memremap(). Populate the shared_gpa_boundary (vTOM)
via the swiotlb_unencrypted_base variable.

memremap() can't be called that early in boot (e.g. in
ms_hyperv_init_platform()), so call swiotlb_update_mem_attributes()
in hyperv_init() instead.

Signed-off-by: Tianyu Lan 
---
Change since v4:
* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
  in the hyperv_init().

Change since v3:
* Add comment in pci-swiotlb-xen.c to explain why add
  dependency between hyperv_swiotlb_detect() and pci_
  xen_swiotlb_detect().
* Return directly when fails to allocate Hyper-V swiotlb
  buffer in the hyperv_iommu_swiotlb_init().
---
 arch/x86/hyperv/hv_init.c  | 10 ++
 arch/x86/kernel/cpu/mshyperv.c | 11 ++-
 include/linux/hyperv.h |  8 
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 24f4a06ac46a..9e18a280f89d 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int hyperv_init_cpuhp;
 u64 hv_current_partition_id = ~0ull;
@@ -502,6 +503,15 @@ void __init hyperv_init(void)
 
/* Query the VMs extended capability once, so that it can be cached. */
hv_query_ext_cap(0);
+
+   /*
+* Swiotlb bounce buffer needs to be mapped in extra address
+* space. Map function doesn't work in the early place and so
+* call swiotlb_update_mem_attributes() here.
+*/
+   if (hv_is_isolation_supported())
+   swiotlb_update_mem_attributes();
+
return;
 
 clean_guest_os_id:
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 4794b716ec79..baf3a0873552 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
0x%x\n",
ms_hyperv.isolation_config_a, 
ms_hyperv.isolation_config_b);
 
-   if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+   if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
static_branch_enable(&isolation_type_snp);
+   swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
+   }
+
+   /*
+* Enable swiotlb force mode in Isolation VM to
+* use swiotlb bounce buffer for dma transaction.
+*/
+   swiotlb_force = SWIOTLB_FORCE;
}
 
if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b823311eac79..1f037e114dc8 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1726,6 +1726,14 @@ int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, 
unsigned int len,
 int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
void (*block_invalidate)(void *context,
 u64 block_mask));
+#if IS_ENABLED(CONFIG_HYPERV)
+int __init hyperv_swiotlb_detect(void);
+#else
+static inline int __init hyperv_swiotlb_detect(void)
+{
+   return 0;
+}
+#endif
 
 struct hyperv_pci_block_ops {
int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V5 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space above shared_gpa_boundary (e.g. the 39th
address bit) reported by the Hyper-V ISOLATION_CONFIG CPUID leaf. The
access physical address is the original physical address plus
shared_gpa_boundary. In the AMD SEV-SNP spec, shared_gpa_boundary is
called the virtual top of memory (vTOM). Memory addresses below vTOM
are automatically treated as private while memory above vTOM is
treated as shared.

Expose swiotlb_unencrypted_base for platforms to set the unencrypted
memory base offset; the platform then calls
swiotlb_update_mem_attributes() to remap the swiotlb memory to the
unencrypted address space. memremap() can not be called in the early
boot stage, so put the remapping code into
swiotlb_update_mem_attributes(). Store the remapped address and use
it to copy data from/to the swiotlb bounce buffer.

Signed-off-by: Tianyu Lan 
---
Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
  Move calling of set_memory_decrypted() back from
  swiotlb_init_io_tlb_mem() to swiotlb_late_init_with_tbl()
  and rmem_swiotlb_device_init().

Change since v2:
* Leave mem->vaddr with phys_to_virt(mem->start) when fail
  to remap swiotlb memory.

Change since v1:
* Rework comment in the swiotlb_init_io_tlb_mem()
* Make swiotlb_init_io_tlb_mem() back to return void.
---
 include/linux/swiotlb.h |  6 ++
 kernel/dma/swiotlb.c| 43 +++--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ * may be remapped in the memory encrypted case and store virtual
+ * address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..34e6ade4f73c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
  * Max segment that we can provide which (if pages are contingous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
 
+/*
+ * Remap swioltb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+   void *vaddr = NULL;
+
+   if (swiotlb_unencrypted_base) {
+   phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+   vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+   if (!vaddr)
+   pr_err("Failed to map the unencrypted memory %llx size 
%lx.\n",
+  paddr, bytes);
+   }
+
+   return vaddr;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void)
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+
+   mem->vaddr = swiotlb_mem_remap(mem, bytes);
+   if (!mem->vaddr)
+   mem->vaddr = vaddr;
+
+   memset(mem->vaddr, 0, bytes);
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;

[PATCH V5 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)

2021-12-06 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hvcall, and
the guest needs to call it to mark memory visible to the host before
sharing that memory with the host. For security, network/storage
stack memory should not be shared with the host, hence the need for
bounce buffering.

The VMBus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the VMBus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access the shared memory
via an extra address space, which is specified by the Hyper-V CPUID
leaf HYPERV_CPUID_ISOLATION_CONFIG. The access physical address of
the shared memory is the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for netvsc/storvsc in
Isolation VM.

This version follows Michael Kelley suggestion in the following link.
https://lkml.org/lkml/2021/11/24/2044

Change since v4:
* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
  in the hyperv_init().

Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
  Move calling of set_memory_decrypted() back from
  swiotlb_init_io_tlb_mem() to swiotlb_late_init_with_tbl()
  and rmem_swiotlb_device_init().
* Change code style of checking GUEST_MEM attribute in the
  hyperv_cc_platform_has().
* Add comment in pci-swiotlb-xen.c to explain why add
  dependency between hyperv_swiotlb_detect() and pci_
  xen_swiotlb_detect().
* Return directly when fails to allocate Hyper-V swiotlb
  buffer in the hyperv_iommu_swiotlb_init().

Change since v2:
* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add
  hv_map/unmap_memory() to map/unmap netvsc rx/tx ring into extra
  address space.
* Leave mem->vaddr in swiotlb code with phys_to_virt(mem->start)
  when fail to remap swiotlb memory.

Change since v1:
* Add Hyper-V Isolation support check in the cc_platform_has()
  and return true for guest memory encrypt attr.
* Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (5):
  Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyper-v: Enable swiotlb bounce buffer for Isolation VM
  scsi: storvsc: Add Isolation VM support for storvsc driver
  net: netvsc: Add Isolation VM support for netvsc driver

 arch/x86/hyperv/hv_init.c |  10 +++
 arch/x86/hyperv/ivm.c |  28 ++
 arch/x86/kernel/cc_platform.c |  12 +++
 arch/x86/kernel/cpu/mshyperv.c|  11 ++-
 drivers/hv/hv_common.c|  11 +++
 drivers/hv/vmbus_drv.c|   4 +
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c|  37 
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|  14 +++
 include/linux/swiotlb.h   |   6 ++
 kernel/dma/swiotlb.c  |  43 +-
 15 files changed, 300 insertions(+), 22 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V4 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-12-05 Thread Tianyu Lan




On 12/5/2021 4:34 PM, Juergen Gross wrote:

On 05.12.21 09:18, Tianyu Lan wrote:

From: Tianyu Lan 

hyperv Isolation VM requires bounce buffer support to copy
data from/to encrypted memory and so enable swiotlb force
mode to use swiotlb bounce buffer for DMA transaction.

In Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via extra address space which is above shared_gpa_boundary
(E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
The access physical address will be original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
spec is called virtual top of memory(vTOM). Memory addresses below
vTOM are automatically treated as private while memory above
vTOM is treated as shared.

Hyper-V initalizes swiotlb bounce buffer and default swiotlb
needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override
the setting, hyperv_swiotlb_detect() needs to run before
these detect functions which depends on the pci_xen_swiotlb_
init(). Make pci_xen_swiotlb_init() depends on the hyperv_swiotlb
_detect() to keep the order.


Why? Does Hyper-V plan to support Xen PV guests? If not, I don't see
the need for adding this change.



This is to keep the detect function calling order: the Hyper-V detect
callback needs to run before pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb(). This is the same reason why
pci_swiotlb_detect_override() needs to depend on
pci_xen_swiotlb_detect(). Hyper-V has the same requirement, so make
the Xen detect callback depend on the Hyper-V one.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V4 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-05 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides Isolation VMs, which support memory encryption. Add
hyperv_cc_platform_has(), returning true for the GUEST_MEM_ENCRYPT
attribute check.

Signed-off-by: Tianyu Lan 
---
Change since v3:
* Change code style of checking GUEST_MEM attribute in the
  hyperv_cc_platform_has().
---
 arch/x86/kernel/cc_platform.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..27c06b32e7c4 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,9 +59,20 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }
 
+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_HYPERV
+   return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
+#else
+   return false;
+#endif
+}
 
 bool cc_platform_has(enum cc_attr attr)
 {
+   if (hv_is_isolation_supported())
+   return hyperv_cc_platform_has(attr);
+
if (sme_me_mask)
return amd_cc_platform_has(attr);
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V4 5/5] hv_netvsc: Add Isolation VM support for netvsc driver

2021-12-05 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hvcall. vmbus_establish_gpadl() has already
done this for the netvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA
API to map/unmap this memory when sending/receiving packets; the
Hyper-V swiotlb bounce buffer DMA address will be returned. The
swiotlb bounce buffer has been marked visible to the host during boot.

The rx/tx ring buffers are allocated via vzalloc() and need to be
mapped into the unencrypted address space (above vTOM) before being
shared with the host and accessed. Add hv_map/unmap_memory() to
map/unmap the rx/tx ring buffers.

Signed-off-by: Tianyu Lan 
---
Change since v3:
   * Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn()
 with vmalloc_to_pfn() in the hv_map_memory()

Change since v2:
   * Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer.
---
 arch/x86/hyperv/ivm.c |  28 ++
 drivers/hv/hv_common.c|  11 +++
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|   5 ++
 8 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 69c7a57f3307..2b994117581e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int 
pagecount, bool visibl
kfree(pfn_array);
return ret;
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+   void *vaddr;
+   int i;
+
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / PAGE_SIZE; i++)
+   pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
+   (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+   vunmap(addr);
+}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 7be173a99f27..3c5cb1f70319 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, 
void *output, u32 input_s
return HV_STATUS_INVALID_PARAMETER;
 }
 EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
+void __weak *hv_map_memory(void *addr, unsigned long size)
+{
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(hv_map_memory);
+
+void __weak hv_unmap_memory(void *addr)
+{
+}
+EXPORT_SYMBOL_GPL(hv_unmap_memory);
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..cf69da0e296c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX  2000/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..b7ade735a806 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   hv_unmap_memory(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   hv_unmap_memory(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }

[PATCH V4 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-12-05 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hvcall. vmbus_establish_gpadl() has already
done this for the storvsc rx/tx ring buffers. The page buffers used
by vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA
API (scsi_dma_map/unmap) to map this memory during sending/receiving
packets and return the swiotlb bounce buffer DMA address. In
Isolation VM, the swiotlb bounce buffer is marked visible to the host
and swiotlb force mode is enabled.

Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 in order
to keep the original data offset in the bounce buffer.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/vmbus_drv.c |  1 +
 drivers/scsi/storvsc_drv.c | 37 +
 include/linux/hyperv.h |  1 +
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 0a64ccfafb8b..ae6ec503399a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2121,6 +2121,7 @@ int vmbus_device_register(struct hv_device 
*child_device_obj)
hv_debug_add_dev_dir(child_device_obj);
 
child_device_obj->device.dma_mask = &vmbus_dma_mask;
+   child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
continue;
}
request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
+   scsi_dma_unmap(scmnd);
}
 
storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
struct hv_host_device *host_dev = shost_priv(host);
struct hv_device *dev = host_dev->dev;
struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-   int i;
struct scatterlist *sgl;
unsigned int sg_count;
struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload_sz = sizeof(cmd_request->mpb);
 
if (sg_count) {
-   unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-   u64 hvpfn;
+   struct scatterlist *sg;
+   unsigned long hvpfn, hvpfns_to_add;
+   int j, i = 0;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   sg_count = scsi_dma_map(scmnd);
+   if (sg_count < 0)
+   return SCSI_MLQUEUE_DEVICE_BUSY;
 
-   for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+   for_each_sg(sgl, sg, sg_count, j) {
/*
-* Init values for the current sgl entry. hvpgoff
-* and hvpfns_to_add are in units of Hyper-V size
-* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-* case also handles values of sgl->offset that are
-* larger than PAGE_SIZE. Such offsets are handled
-* even on other than the first sgl entry, provided
-* they are a multiple of PAGE_SIZE.
+* Init values for the current sgl entry. hvpfns_to_add
+* is in units of Hyper-V size pages. Handling the
+* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+* values of sgl->offset that are larger than PAGE_SIZE.
+* Such offsets are handled even on other than the first
+* sgl entry, provided they are a multiple of PAGE_SIZE.
 */
-   hvpgoff = HVPFN_DOWN(sgl->offset);
-   hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
-   hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
-   hvpgoff;
+   hvpfn = HVPFN_DOWN(sg_dma_address(sg));
+   hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) +
+  

[PATCH V4 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-12-05 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VMs require bounce buffer support to copy data
from/to encrypted memory, so enable swiotlb force mode to use the
swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer must be accessed
via an extra address space above shared_gpa_boundary (e.g. a 39-bit
address line) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf. The
access physical address is the original physical address plus
shared_gpa_boundary, which the AMD SEV-SNP spec calls the virtual top
of memory (vTOM). Memory addresses below vTOM are automatically
treated as private while memory above vTOM is treated as shared.
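
The extra-address-space arithmetic is just an offset add. A sketch
(hv_shared_gpa() is a hypothetical helper name; the series exposes
vTOM as ms_hyperv.shared_gpa_boundary):

        /* Host-visible alias of a guest physical address in an SEV-SNP
         * Isolation VM; addresses below vTOM remain private. */
        static inline phys_addr_t hv_shared_gpa(phys_addr_t pa)
        {
                return pa + ms_hyperv.shared_gpa_boundary;
        }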

Hyper-V initializes its own swiotlb bounce buffer, so the default
swiotlb needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override that
setting, hyperv_swiotlb_detect() needs to run before these detect
functions, which already depend on pci_xen_swiotlb_detect(). Make
pci_xen_swiotlb_detect() depend on hyperv_swiotlb_detect() to keep
the order.

The swiotlb bounce buffer code calls set_memory_decrypted() to mark
the bounce buffer visible to the host and maps it into the extra
address space via memremap(). Populate the shared_gpa_boundary (vTOM)
via the swiotlb_unencrypted_base variable.

memremap() can't be used as early as hyperv_iommu_swiotlb_init()
runs, so call swiotlb_update_mem_attributes() in
hyperv_iommu_swiotlb_later_init() instead.
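
In other words, the setup is split into two phases, sketched here
(hyperv_iommu_swiotlb_init() appears in the diff below; the later-init
half just calls swiotlb_update_mem_attributes()):

        /* Early boot: memblock is usable, memremap() is not. */
        hyperv_iommu_swiotlb_init();       /* memblock_alloc() + swiotlb_init_with_tbl() */

        /* Late init: remap the pool above vTOM once memremap() works. */
        hyperv_iommu_swiotlb_later_init(); /* swiotlb_update_mem_attributes() */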

Signed-off-by: Tianyu Lan 
---
Change since v3:
* Add a comment in pci-swiotlb-xen.c to explain why the dependency
  between hyperv_swiotlb_detect() and pci_xen_swiotlb_detect() is
  added.
* Return directly when allocation of the Hyper-V swiotlb buffer
  fails in hyperv_iommu_swiotlb_init().
---
 arch/x86/xen/pci-swiotlb-xen.c | 12 ++-
 drivers/hv/vmbus_drv.c |  3 ++
 drivers/iommu/hyperv-iommu.c   | 58 ++
 include/linux/hyperv.h |  8 +
 4 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 46df59aeaa06..8e2ee3ce6374 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -90,7 +91,16 @@ int pci_xen_swiotlb_init_late(void)
 }
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 
+/*
+ * Hyper-V initializes its own swiotlb bounce buffer and the default
+ * swiotlb needs to be disabled. pci_swiotlb_detect_override() and
+ * pci_swiotlb_detect_4gb() enable the default one. To override the
+ * setting, hyperv_swiotlb_detect() needs to run before these detect
+ * functions, which already depend on pci_xen_swiotlb_detect(). Make
+ * pci_xen_swiotlb_detect() depend on hyperv_swiotlb_detect() to keep
+ * the order.
+ */
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
- NULL,
+ hyperv_swiotlb_detect,
  pci_xen_swiotlb_init,
  NULL);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..0a64ccfafb8b 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "hyperv_vmbus.h"
 
@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2118,6 +2120,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..44ba24d9e06c 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,14 +13,20 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -337,4 +343,56 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
.free = hyperv_root_irq_remapping_free,
 };
 
+static void __init hyperv_iommu_swiotlb_init(void)
+{
+   unsigned long hyperv_io_tlb_size;
+   void *hyperv_io_tlb_start;
+
+   /*
+* Allocate Hyper-V swiotlb bounce buffer at early place
+* to reserve large contiguous memory.
+*/
+   hyperv_io_tlb_size = swiotlb_size_or_default();
+   hyperv_io_tlb_start = memblock_alloc(hyperv_io_tlb_size, PAGE_SIZE);
+
+   if (!hyperv_io_tlb_start) {
+   pr_warn("Fail to allocate Hyper-V swiotlb buffer.\n");
+   return;
+   }
+
+   swiotlb_init_with_tbl(hyperv_io_tlb_start,
+ hyperv_io_tlb_size >> IO_TLB_SHIFT, true);
+}

[PATCH V4 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-12-05 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM with AMD SEV, the bounce buffer must be accessed
via an extra address space above shared_gpa_boundary (e.g. a 39-bit
address line) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf. The
access physical address is the original physical address plus
shared_gpa_boundary, which the AMD SEV-SNP spec calls the virtual top
of memory (vTOM). Memory addresses below vTOM are automatically
treated as private while memory above vTOM is treated as shared.

Expose swiotlb_unencrypted_base so platforms can set the unencrypted
memory base offset; the platform then calls
swiotlb_update_mem_attributes() to remap the swiotlb memory into the
unencrypted address space. memremap() cannot be called in the early
boot stage, so the remapping code lives in
swiotlb_update_mem_attributes(). Store the remapped address and use it
to copy data from/to the swiotlb bounce buffer.
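
For reference, the expected platform-side wiring is a one-liner (a
sketch; the actual Hyper-V hookup is in patch 3/5 of this series):

        /* Publish the vTOM offset before the swiotlb pool is remapped. */
        if (hv_isolation_type_snp())
                swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;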

Signed-off-by: Tianyu Lan 
---
Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
  Move calling of set_memory_decrypted() back from
  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
  and rmem_swiotlb_device_init().

Change since v2:
* Leave mem->vaddr as phys_to_virt(mem->start) when remapping
  swiotlb memory fails.

Change since v1:
* Rework comment in the swiotlb_init_io_tlb_mem()
* Make swiotlb_init_io_tlb_mem() back to return void.
---
 include/linux/swiotlb.h |  6 ++
 kernel/dma/swiotlb.c| 43 +++--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ * may be remapped in the memory encrypted case and store virtual
+ * address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..34e6ade4f73c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
  * Max segment that we can provide which (if pages are contingous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
 
+/*
+ * Remap swiotlb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+   void *vaddr = NULL;
+
+   if (swiotlb_unencrypted_base) {
+   phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+   vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+   if (!vaddr)
+   pr_err("Failed to map the unencrypted memory %llx size %lx.\n",
+  paddr, bytes);
+   }
+
+   return vaddr;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void)
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+
+   mem->vaddr = swiotlb_mem_remap(mem, bytes);
+   if (!mem->vaddr)
+   mem->vaddr = vaddr;
+
+   memset(mem->vaddr, 0, bytes);
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;

[PATCH V4 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)

2021-12-05 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hypercall, and
the guest needs to call it to mark memory visible to the host before
sharing that memory with the host. For security, network/storage stack
memory should not be shared with the host, hence the need for bounce
buffers.

The VMBus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the VMBus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access the shared memory
via an extra address space which is specified by the Hyper-V CPUID
leaf HYPERV_CPUID_ISOLATION_CONFIG. The access physical address of the
shared memory is the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for netvsc/storvsc in
Isolation VMs.

This version follows Michael Kelley's suggestion in the following link.
https://lkml.org/lkml/2021/11/24/2044

Change since v3:
* Fix boot up failure on the host with mem_encrypt=on.
  Move calling of set_memory_decrypted() back from
  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
  and rmem_swiotlb_device_init().
* Change code style of checking GUEST_MEM attribute in the
  hyperv_cc_platform_has().
* Add a comment in pci-swiotlb-xen.c to explain why the dependency
  between hyperv_swiotlb_detect() and pci_xen_swiotlb_detect() is
  added.
* Return directly when allocation of the Hyper-V swiotlb buffer
  fails in hyperv_iommu_swiotlb_init().

Change since v2:
* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add
  hv_map/unmap_memory() to map/unmap netvsc rx/tx ring into extra
  address space.
* Leave mem->vaddr in swiotlb code with phys_to_virt(mem->start)
  when remapping swiotlb memory fails.

Change since v1:
* Add Hyper-V Isolation support check in the cc_platform_has()
  and return true for guest memory encrypt attr.
* Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (5):
  Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  scsi: storvsc: Add Isolation VM support for storvsc driver
  hv_netvsc: Add Isolation VM support for netvsc driver

 arch/x86/hyperv/ivm.c |  28 ++
 arch/x86/kernel/cc_platform.c |  12 +++
 arch/x86/xen/pci-swiotlb-xen.c|  12 ++-
 drivers/hv/hv_common.c|  11 +++
 drivers/hv/vmbus_drv.c|   4 +
 drivers/iommu/hyperv-iommu.c  |  58 +
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c|  37 
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|  14 +++
 include/linux/swiotlb.h   |   6 ++
 kernel/dma/swiotlb.c  |  43 +-
 15 files changed, 349 insertions(+), 22 deletions(-)

-- 
2.25.1



Re: [PATCH V3 5/5] hv_netvsc: Add Isolation VM support for netvsc driver

2021-12-03 Thread Tianyu Lan

On 12/4/2021 2:59 AM, Michael Kelley (LINUX) wrote:

+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,

This should be just PAGE_SIZE, as this code is unrelated to communication
with Hyper-V.



Yes, agree. Will update.



Re: [PATCH V3 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-12-03 Thread Tianyu Lan




On 12/4/2021 3:17 AM, Michael Kelley (LINUX) wrote:

+static void __init hyperv_iommu_swiotlb_init(void)
+{
+   unsigned long hyperv_io_tlb_size;
+   void *hyperv_io_tlb_start;
+
+   /*
+* Allocate Hyper-V swiotlb bounce buffer at early place
+* to reserve large contiguous memory.
+*/
+   hyperv_io_tlb_size = swiotlb_size_or_default();
+   hyperv_io_tlb_start = memblock_alloc(hyperv_io_tlb_size, PAGE_SIZE);
+
+   if (!hyperv_io_tlb_start)
+   pr_warn("Fail to allocate Hyper-V swiotlb buffer.\n");

In the error case, won't swiotlb_init_with_tlb() end up panic'ing when
it tries to zero out the memory?   The only real choice here is to
return immediately after printing the message, and not call
swiotlb_init_with_tlb().



Yes, agree. Will update.


Re: [PATCH V3 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-12-03 Thread Tianyu Lan

On 12/4/2021 4:06 AM, Tom Lendacky wrote:

Hi Tom:
   Thanks for your test. Could you help to test the following 
patch and check whether it can fix the issue.


The patch is mangled. Is the only difference where 
set_memory_decrypted() is called?


I de-mangled the patch. No more stack traces with SME active.

Thanks,
Tom


Hi Tom:
Thanks a lot for your rework and test. I will update in the next 
version.

Thanks.

Re: [PATCH V3 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-12-03 Thread Tianyu Lan

On 12/2/2021 10:43 PM, Wei Liu wrote:

On Wed, Dec 01, 2021 at 11:02:54AM -0500, Tianyu Lan wrote:
[...]

diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 46df59aeaa06..30fd0600b008 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
  
  #include 

  #include 
+#include 
  #include 
  
  #include 

@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
  EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
  
  IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,

- NULL,
+ hyperv_swiotlb_detect,


It is not immediately obvious why this is needed just by reading the
code. Please consider copying some of the text in the commit message to
a comment here.



Thanks for the suggestion. Will update.


Re: [PATCH V3 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-03 Thread Tianyu Lan

On 12/2/2021 10:39 PM, Wei Liu wrote:

+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_HYPERV
+   if (attr == CC_ATTR_GUEST_MEM_ENCRYPT)
+   return true;
+   else
+   return false;

This can be simplified as

return attr == CC_ATTR_GUEST_MEM_ENCRYPT;


Wei.


Hi Wei: 
Thanks for your review. Will update.


Re: [PATCH V3 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-12-03 Thread Tianyu Lan



On 12/2/2021 10:42 PM, Tom Lendacky wrote:

On 12/1/21 10:02 AM, Tianyu Lan wrote:

From: Tianyu Lan 

In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
extra address space which is above shared_gpa_boundary (E.G 39 bit
address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
physical address will be original physical address + shared_gpa_boundary.
The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
memory(vTOM). Memory addresses below vTOM are automatically treated as
private while memory above vTOM is treated as shared.

Expose swiotlb_unencrypted_base for platforms to set unencrypted
memory base offset and platform calls swiotlb_update_mem_attributes()
to remap swiotlb mem to unencrypted address space. memremap() can
not be called in the early stage and so put remapping code into
swiotlb_update_mem_attributes(). Store remap address and use it to copy
data from/to swiotlb bounce buffer.

Signed-off-by: Tianyu Lan 


This patch results in the following stack trace during a bare-metal boot
on my EPYC system with SME active (e.g. mem_encrypt=on):

[    0.123932] BUG: Bad page state in process swapper  pfn:108001
[    0.123942] page:(ptrval) refcount:0 mapcount:-128 
mapping: index:0x0 pfn:0x108001

[    0.123946] flags: 0x17c000(node=0|zone=2|lastcpupid=0x1f)
[    0.123952] raw: 0017c000 88904f2d5e80 88904f2d5e80 

[    0.123954] raw:   ff7f 


[    0.123955] page dumped because: nonzero mapcount
[    0.123957] Modules linked in:
[    0.123961] CPU: 0 PID: 0 Comm: swapper Not tainted 
5.16.0-rc3-sos-custom #2

[    0.123964] Hardware name: AMD Corporation
[    0.123967] Call Trace:
[    0.123971]  
[    0.123975]  dump_stack_lvl+0x48/0x5e
[    0.123985]  bad_page.cold+0x65/0x96
[    0.123990]  __free_pages_ok+0x3a8/0x410
[    0.123996]  memblock_free_all+0x171/0x1dc
[    0.124005]  mem_init+0x1f/0x14b
[    0.124011]  start_kernel+0x3b5/0x6a1
[    0.124016]  secondary_startup_64_no_verify+0xb0/0xbb
[    0.124022]  

I see ~40 of these traces, each for different pfns.

Thanks,
Tom


Hi Tom:
  Thanks for your test. Could you help to test the following patch 
and check whether it can fix the issue.



diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do 
a quick
  * range check to see if the memory was in fact allocated 
by this

  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory 
pool
+ * may be remapped in the memory encrypted case and store 
virtual

+ * address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between 
@start and
  * @end. For default swiotlb, this is command line 
adjustable via

  * setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct 
device *dev)

 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */

+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..34e6ade4f73c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include 
 #include 

+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;

 struct io_tlb_mem io_tlb_default_mem;

+phys_addr_t swiotlb_unencrypted_base;
+
 /*
  * Max segment that we can provide which (if pages are contingous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }

+/*
+ * Remap swioltb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+   void *vaddr = NULL;
+
+   if (swiotlb_unencrypted_base) {
+   phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+   vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+   if (!vaddr)
+   pr_err("Failed to map the unencrypted memory 
%llx size %lx.\n",

+  paddr, bytes);
+   }
+
+   return vaddr;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This fu

[PATCH V3 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-12-01 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VMs require bounce buffer support to copy data
from/to encrypted memory, so enable swiotlb force mode to use the
swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer must be accessed
via an extra address space above shared_gpa_boundary (e.g. a 39-bit
address line) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf. The
access physical address is the original physical address plus
shared_gpa_boundary, which the AMD SEV-SNP spec calls the virtual top
of memory (vTOM). Memory addresses below vTOM are automatically
treated as private while memory above vTOM is treated as shared.

Hyper-V initializes its own swiotlb bounce buffer, so the default
swiotlb needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override that
setting, hyperv_swiotlb_detect() needs to run before these detect
functions, which already depend on pci_xen_swiotlb_detect(). Make
pci_xen_swiotlb_detect() depend on hyperv_swiotlb_detect() to keep
the order.

The swiotlb bounce buffer code calls set_memory_decrypted() to mark
the bounce buffer visible to the host and maps it into the extra
address space via memremap(). Populate the shared_gpa_boundary (vTOM)
via the swiotlb_unencrypted_base variable.

memremap() can't be used as early as hyperv_iommu_swiotlb_init()
runs, so call swiotlb_update_mem_attributes() in
hyperv_iommu_swiotlb_later_init() instead.

Signed-off-by: Tianyu Lan 
---
 arch/x86/xen/pci-swiotlb-xen.c |  3 +-
 drivers/hv/vmbus_drv.c |  3 ++
 drivers/iommu/hyperv-iommu.c   | 56 ++
 include/linux/hyperv.h |  8 +
 4 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 46df59aeaa06..30fd0600b008 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
- NULL,
+ hyperv_swiotlb_detect,
  pci_xen_swiotlb_init,
  NULL);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..0a64ccfafb8b 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "hyperv_vmbus.h"
 
@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2118,6 +2120,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..dd729d49a1eb 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -13,14 +13,20 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -337,4 +343,54 @@ static const struct irq_domain_ops hyperv_root_ir_domain_ops = {
.free = hyperv_root_irq_remapping_free,
 };
 
+static void __init hyperv_iommu_swiotlb_init(void)
+{
+   unsigned long hyperv_io_tlb_size;
+   void *hyperv_io_tlb_start;
+
+   /*
+* Allocate Hyper-V swiotlb bounce buffer at early place
+* to reserve large contiguous memory.
+*/
+   hyperv_io_tlb_size = swiotlb_size_or_default();
+   hyperv_io_tlb_start = memblock_alloc(hyperv_io_tlb_size, PAGE_SIZE);
+
+   if (!hyperv_io_tlb_start)
+   pr_warn("Fail to allocate Hyper-V swiotlb buffer.\n");
+
+   swiotlb_init_with_tbl(hyperv_io_tlb_start,
+ hyperv_io_tlb_size >> IO_TLB_SHIFT, true);
+}
+
+int __init hyperv_swiotlb_detect(void)
+{
+   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
+   return 0;
+
+   if (!hv_is_isolation_supported())
+   return 0;
+
+   /*
+* Enable swiotlb force mode in Isolation VM to
+* use swiotlb bounce buffer for dma transaction.
+*/
+   if (hv_isolation_type_snp())
+   swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
+   swiotlb_force = SWIOTLB_FORCE;
+   return 1;
+}
+
+static void __init hyperv_iommu_swiotlb_later_init(void)
+{
+   /*
+* Swiotlb bounce buffer needs to be mapped in extra address
+* space. Map function doesn't work in the early place and so
+* call swiotlb_update_mem_attributes() here.
+*/

[PATCH V3 5/5] hv_netvsc: Add Isolation VM support for netvsc driver

2021-12-01 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM, all memory shared with the host must be marked
visible to the host via a hypercall. vmbus_establish_gpadl() has
already done this for the netvsc rx/tx ring buffers. The page buffers
used by vmbus_sendpacket_pagebuffer() still need to be handled. Use
the DMA API to map/unmap this memory when sending/receiving packets;
the Hyper-V swiotlb bounce buffer DMA address is returned. The swiotlb
bounce buffer has been marked visible to the host during boot.
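
A sketch of the map side (va, len and hv_dev are placeholder names;
the series pairs this with the netvsc_dma_unmap() declared in
hyperv_net.h below):

        /* Map one packet page buffer; in forced swiotlb mode the
         * returned DMA address points into the host-visible bounce
         * buffer. */
        dma_addr_t dma = dma_map_single(&hv_dev->device, va, len,
                                        DMA_TO_DEVICE);
        if (dma_mapping_error(&hv_dev->device, dma))
                return -ENOMEM;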

The rx/tx ring buffers are allocated via vzalloc() and need to be
mapped into the unencrypted address space (above vTOM) before being
shared with the host and accessed. Add hv_map/unmap_memory() to
map/unmap the rx/tx ring buffers.
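
The ring-buffer side then becomes a remap-and-swap, sketched here from
the free path visible in the diff below (buf_size is assumed):

        /* Keep the vzalloc()'ed buffer around for vfree(); hand the
         * vTOM alias to the rest of the driver. */
        void *remap = hv_map_memory(net_device->recv_buf, buf_size);
        if (!remap)
                goto cleanup;
        net_device->recv_original_buf = net_device->recv_buf;
        net_device->recv_buf = remap;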

Signed-off-by: Tianyu Lan 
---
Change since v2:
   * Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer.
---
 arch/x86/hyperv/ivm.c |  28 ++
 drivers/hv/hv_common.c|  11 +++
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|   5 ++
 8 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 69c7a57f3307..9f78d8f67ea3 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visible)
kfree(pfn_array);
return ret;
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+   void *vaddr;
+   int i;
+
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / PAGE_SIZE; i++)
+   pfns[i] = virt_to_hvpfn(addr + i * PAGE_SIZE) +
+   (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+   vunmap(addr);
+}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 7be173a99f27..3c5cb1f70319 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
return HV_STATUS_INVALID_PARAMETER;
 }
 EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
+void __weak *hv_map_memory(void *addr, unsigned long size)
+{
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(hv_map_memory);
+
+void __weak hv_unmap_memory(void *addr)
+{
+}
+EXPORT_SYMBOL_GPL(hv_unmap_memory);
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..cf69da0e296c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
struct vmbus_gpadl recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
u32 send_buf_size;
struct vmbus_gpadl send_buf_gpadl_handle;
u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX  2000/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..b7ade735a806 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   hv_unmap_memory(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   hv_unmap_memory(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -338,6 +3

[PATCH V3 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver

2021-12-01 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM, all memory shared with the host must be marked
visible to the host via a hypercall. vmbus_establish_gpadl() has
already done this for the storvsc rx/tx ring buffers. The page buffers
used by vmbus_sendpacket_mpb_desc() still need to be handled. Use the
DMA API (scsi_dma_map/unmap) to map this memory when sending/receiving
packets; the swiotlb bounce buffer DMA address is returned. In an
Isolation VM, the swiotlb bounce buffer is marked visible to the host
and swiotlb force mode is enabled.

Set the device's DMA min align mask to HV_HYP_PAGE_SIZE - 1 in order
to keep the original data offset in the bounce buffer.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/vmbus_drv.c |  1 +
 drivers/scsi/storvsc_drv.c | 37 +
 include/linux/hyperv.h |  1 +
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 0a64ccfafb8b..ae6ec503399a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2121,6 +2121,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
hv_debug_add_dev_dir(child_device_obj);
 
child_device_obj->device.dma_mask = &vmbus_dma_mask;
+   child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
continue;
}
request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
+   scsi_dma_unmap(scmnd);
}
 
storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
struct hv_host_device *host_dev = shost_priv(host);
struct hv_device *dev = host_dev->dev;
struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-   int i;
struct scatterlist *sgl;
unsigned int sg_count;
struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload_sz = sizeof(cmd_request->mpb);
 
if (sg_count) {
-   unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-   u64 hvpfn;
+   struct scatterlist *sg;
+   unsigned long hvpfn, hvpfns_to_add;
+   int j, i = 0;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   sg_count = scsi_dma_map(scmnd);
+   if (sg_count < 0)
+   return SCSI_MLQUEUE_DEVICE_BUSY;
 
-   for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+   for_each_sg(sgl, sg, sg_count, j) {
/*
-* Init values for the current sgl entry. hvpgoff
-* and hvpfns_to_add are in units of Hyper-V size
-* pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-* case also handles values of sgl->offset that are
-* larger than PAGE_SIZE. Such offsets are handled
-* even on other than the first sgl entry, provided
-* they are a multiple of PAGE_SIZE.
+* Init values for the current sgl entry. hvpfns_to_add
+* is in units of Hyper-V size pages. Handling the
+* PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+* values of sgl->offset that are larger than PAGE_SIZE.
+* Such offsets are handled even on other than the first
+* sgl entry, provided they are a multiple of PAGE_SIZE.
 */
-   hvpgoff = HVPFN_DOWN(sgl->offset);
-   hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
-   hvpfns_to_add = HVPFN_UP(sgl->offset + sgl->length) -
-   hvpgoff;
+   hvpfn = HVPFN_DOWN(sg_dma_address(sg));
+   hvpfns_to_add = HVPFN_UP(sg_dma_address(sg) +
+  

[PATCH V3 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-01 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides Isolation VMs which have memory encryption support.
Add hyperv_cc_platform_has() and return true for the
CC_ATTR_GUEST_MEM_ENCRYPT attribute check.

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cc_platform.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..f3bb0431f5c5 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,9 +59,23 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }
 
+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+#ifdef CONFIG_HYPERV
+   if (attr == CC_ATTR_GUEST_MEM_ENCRYPT)
+   return true;
+   else
+   return false;
+#else
+   return false;
+#endif
+}
 
 bool cc_platform_has(enum cc_attr attr)
 {
+   if (hv_is_isolation_supported())
+   return hyperv_cc_platform_has(attr);
+
if (sme_me_mask)
return amd_cc_platform_has(attr);
 
-- 
2.25.1



[PATCH V3 1/5] Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-12-01 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM with AMD SEV, the bounce buffer must be accessed
via an extra address space above shared_gpa_boundary (e.g. a 39-bit
address line) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf. The
access physical address is the original physical address plus
shared_gpa_boundary, which the AMD SEV-SNP spec calls the virtual top
of memory (vTOM). Memory addresses below vTOM are automatically
treated as private while memory above vTOM is treated as shared.

Expose swiotlb_unencrypted_base so platforms can set the unencrypted
memory base offset; the platform then calls
swiotlb_update_mem_attributes() to remap the swiotlb memory into the
unencrypted address space. memremap() cannot be called in the early
boot stage, so the remapping code lives in
swiotlb_update_mem_attributes(). Store the remapped address and use it
to copy data from/to the swiotlb bounce buffer.

Signed-off-by: Tianyu Lan 
---
Change since v2:
* Leave mem->vaddr as phys_to_virt(mem->start) when remapping
  swiotlb memory fails.

Change since v1:
* Rework comment in the swiotlb_init_io_tlb_mem()
* Make swiotlb_init_io_tlb_mem() back to return void.
---
 include/linux/swiotlb.h |  6 ++
 kernel/dma/swiotlb.c| 47 -
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ * may be remapped in the memory encrypted case and store virtual
+ * address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */
 
+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..adb9d06af5c8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem io_tlb_default_mem;
 
+phys_addr_t swiotlb_unencrypted_base;
+
 /*
  * Max segment that we can provide which (if pages are contingous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val)
return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }
 
+/*
+ * Remap swiotlb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+   void *vaddr = NULL;
+
+   if (swiotlb_unencrypted_base) {
+   phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+   vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+   if (!vaddr)
+   pr_err("Failed to map the unencrypted memory %llx size %lx.\n",
+  paddr, bytes);
+   }
+
+   return vaddr;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations.  This function allows the architecture to
@@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void)
vaddr = phys_to_virt(mem->start);
bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+
+   mem->vaddr = swiotlb_mem_remap(mem, bytes);
+   if (!mem->vaddr)
+   mem->vaddr = vaddr;
+
+   memset(mem->vaddr, 0, bytes);
 }
 
 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +225,18 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
mem->slots[i].alloc_size = 0;
}
+
+   /*
+* If swiotlb_unencrypted_base is set, the bounce buffer memory will
+* be remapped and cleared in swiotlb_update_mem_attributes.
+*/
+   if (swiotlb_unencrypted_base)
+   return;

[PATCH V3 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)

2021-12-01 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host-visibility hypercall, and
the guest needs to call it to mark memory visible to the host before
sharing that memory with the host. For security, network/storage stack
memory should not be shared with the host, hence the need for bounce
buffers.

The VMBus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the VMBus channel ring buffer visible.

For an SNP Isolation VM, the guest needs to access the shared memory
via an extra address space which is specified by the Hyper-V CPUID
leaf HYPERV_CPUID_ISOLATION_CONFIG. The access physical address of the
shared memory is the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.

This patchset enables the swiotlb bounce buffer for netvsc/storvsc in
Isolation VMs.

This version follows Michael Kelley's suggestion in the following link.
https://lkml.org/lkml/2021/11/24/2044

Change since v2:
 * Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add
   hv_map/unmap_memory() to map/unmap netvsc rx/tx ring into extra
   address space.
 * Leave mem->vaddr in swiotlb code with phys_to_virt(mem->start)
   when remapping swiotlb memory fails.

Change since v1:
 * Add Hyper-V Isolation support check in the cc_platform_has()
   and return true for guest memory encrypt attr.
 * Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (5):
  Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  scsi: storvsc: Add Isolation VM support for storvsc driver
  hv_netvsc: Add Isolation VM support for netvsc driver

 arch/x86/hyperv/ivm.c |  28 ++
 arch/x86/kernel/cc_platform.c |  15 
 arch/x86/xen/pci-swiotlb-xen.c|   3 +-
 drivers/hv/hv_common.c|  11 +++
 drivers/hv/vmbus_drv.c|   4 +
 drivers/iommu/hyperv-iommu.c  |  56 
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c   | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c|  37 
 include/asm-generic/mshyperv.h|   2 +
 include/linux/hyperv.h|  14 +++
 include/linux/swiotlb.h   |   6 ++
 kernel/dma/swiotlb.c  |  47 +--
 15 files changed, 342 insertions(+), 25 deletions(-)

-- 
2.25.1


